Selective Muting

Here is a look at the fourth item on my list of possible solutions. Last week, I looked at the options for Content ID and determined that both muting and blocking the flagged video have problems.

Essentially, a flagging system needs two things to work well: accuracy and precision. Accuracy is a measure of how well it can correctly pick out videos which violate copyright (amount of correctly flagged or correctly unflagged audio / all audio). The more videos it fails to flag (which the music industry cares about), the lower the accuracy. However, the more videos it misflags (which the users care about), the lower the accuracy as well. This is the problem of false flags. Precision is similar but ignores missed videos (amount of correctly flagged audio / amount of flagged audio). The problem with muting or blocking an entire video is that it is not precise: it’s possible that only a small portion of the video contains copyrighted audio.
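To make the two ratios concrete, here is a minimal sketch that scores a flagging decision second by second. The function name, the per-second True/False representation, and the 10-second example video are all assumptions of mine for illustration; this is not how Content ID actually measures anything.

```python
def accuracy_and_precision(flagged, infringing):
    """Score per-second flags against what is actually infringing.

    flagged[i]    -- True if the system flagged second i
    infringing[i] -- True if second i really contains copyrighted audio
    """
    if len(flagged) != len(infringing):
        raise ValueError("both lists must cover the same seconds")
    pairs = list(zip(flagged, infringing))
    # Correct decisions: flagged and infringing, or unflagged and clean.
    correct = sum(1 for f, t in pairs if f == t)
    # Flags that landed on audio that really is infringing.
    correct_flags = sum(1 for f, t in pairs if f and t)
    total_flags = sum(1 for f, _ in pairs if f)
    accuracy = correct / len(pairs) if pairs else 0.0
    precision = correct_flags / total_flags if total_flags else 0.0
    return accuracy, precision


# Hypothetical 10-second video: only seconds 2 and 3 contain copyrighted audio.
infringing = [False, False, True, True, False, False, False, False, False, False]
whole_video = [True] * 10      # mute or block everything
segment_only = list(infringing)  # flag just the infringing seconds

print(accuracy_and_precision(whole_video, infringing))   # (0.2, 0.2)
print(accuracy_and_precision(segment_only, infringing))  # (1.0, 1.0)
```

In this toy case, muting the whole video gets only 2 of its 10 flagged seconds right, which is exactly the lack of precision described above.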

So, how can we solve this? Well, on the user side, there’s the option of changing the audio. If the audio is the problem, just switch to new audio, and everything would be fine. However, we want to fix the precision of the system, so we want the system to do the fix itself. Thus, we need to have the system mute only the portion which has copyrighted audio. There are two ways to do this: a simple way, and a more complicated way.

1. The simple way: Mute all of the audio wherever there is infringing audio. Content ID should already know which audio infringes copyright, and it can compare the audio in the flagged video with the audio in its database to pinpoint exactly where the infringing audio occurs. It can then mute the audio in only those segments and leave the rest of the video untouched. In most cases, this will be good enough. Admittedly, there will be some instances where someone will talk over copyrighted audio (e.g. a Let’s Player talking over in-game music, where viewers want to listen to what he says), in which case muting everything might be less than ideal…

2. The complicated way: Remove just the infringing audio. I don’t want to go into the technical part of how this is done, but it’s essentially like a subtraction problem (though much more complex). You have audio (A) with multiple people (p1 and p2) talking. You have one person’s speech (p1) that you’d like to remove from the audio. Since the audio A = p1 + p2, you just subtract p1 from it, and you’re left with p2. Similar techniques are used to remove background noise from audio, and YouTube has a beta version of such a system to remove the copyrighted audio. If this method could be perfected, this should solve the entire problem of precision.
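The two approaches can be sketched on a toy audio signal. Everything here is a simplifying assumption on my part: audio is a plain list of integer samples, the infringing segment's position is already known, and the reference track is perfectly aligned and at the same volume as in the mix. Real audio needs alignment, gain matching, and far more sophisticated separation, so treat this as an illustration of the idea rather than a description of what YouTube does. The function names are hypothetical.

```python
def mute_segments(samples, sample_rate, segments):
    """Simple way: silence everything inside each (start_sec, end_sec) range."""
    out = list(samples)
    for start_sec, end_sec in segments:
        start = max(0, int(start_sec * sample_rate))
        end = min(len(out), int(end_sec * sample_rate))
        for i in range(start, end):
            out[i] = 0
    return out


def subtract_reference(mix, reference, offset=0):
    """Complicated way (idealized): A = p1 + p2, so A - p1 leaves p2.

    Assumes `reference` is sample-aligned with `mix` starting at `offset`
    and has exactly the same level as it does inside the mix.
    """
    out = list(mix)
    for i, ref_sample in enumerate(reference):
        j = offset + i
        if 0 <= j < len(out):
            out[j] -= ref_sample
    return out


# Toy example at 4 samples per second: speech with music under the middle part.
sample_rate = 4
speech = [1, 2, 3, 4, 5, 6, 7, 8]
music = [10, 10, 10, 10]            # copyrighted track, starts at sample 2
mix = list(speech)
for i, m in enumerate(music):
    mix[2 + i] += m                 # mix == [1, 2, 13, 14, 15, 16, 7, 8]

print(mute_segments(mix, sample_rate, [(0.5, 1.5)]))  # [1, 2, 0, 0, 0, 0, 7, 8]
print(subtract_reference(mix, music, offset=2))       # [1, 2, 3, 4, 5, 6, 7, 8]
```

Muting loses the speech in the flagged segment along with the music, while the idealized subtraction recovers the speech in full. That is the trade-off between the two options.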

If at least one of these systems could be implemented well, Content ID would be greatly improved. Aside from that, the only problem left to tackle is that of accuracy, which is extremely difficult. It can be improved by limiting who can claim copyrighted material (Verification) and by giving approval ahead of time (Whitelisting). The system will likely never be perfect, but by implementing all the solutions that I have mentioned in some way, a flagging system should improve to the point where further gains are hard to find.
