Tag Archives: Content ID

Audio/Video Processing

It’s time to talk about the last topic from my list of possible solutions. This is by far the most difficult and is not yet a feasible option. However, it is definitely worth talking about.

Audio/Video processing. In particular, I’m talking about looking at video or listening to audio and characterizing it as… well, anything. Is the video a concert? a video game? animation? home video? Is the audio pop music? rock? rap? country? folk? Is it just talking? If you know who might be talking, is it possible to determine the speaker? For a human watching a video and listening to its audio, this is a simple task. You can tell just from looking at it whether it’s animated or real. You can easily tell a concert from a video game. You can differentiate different styles of music and tell apart different speakers. But for computers, this is a near impossible task.

Images are extremely difficult to process. The computer is shown a matrix of colors which represent the pixels of the image. It then has to use that information to figure out what each individual section represents, even though many things have similar shapes and colors, sizes vary from picture to picture, and images might not be entirely clear. A red bouncy ball may be indistinguishable from an apple. Even if the computer is looking for something, it has to realize that the target object could be in any size, position, or resolution and may not be shown in its entirety. So, if a single image is extremely difficult to process, consider how difficult a video (many, many images coming one after another) is to process…
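To make the ball-versus-apple point concrete, here is a toy sketch. The 4×4 "images" and the average-color feature are invented for illustration; no real system is this simple. It just shows how two different pixel grids can look identical to a naive measurement.

```python
# Each toy "image" is a grid of (R, G, B) pixels; real images have millions.
RED = (200, 30, 30)
WHITE = (255, 255, 255)

ball = [
    [WHITE, RED, RED, WHITE],
    [RED, RED, RED, RED],
    [RED, RED, RED, RED],
    [WHITE, RED, RED, WHITE],
]

# Hypothetical "apple": the same amount of red, arranged with a stem on top.
apple = [
    [WHITE, WHITE, RED, WHITE],
    [RED, RED, RED, RED],
    [RED, RED, RED, RED],
    [RED, RED, WHITE, RED],
]


def mean_color(image):
    """Average each color channel over every pixel in the image."""
    pixels = [pixel for row in image for pixel in row]
    return tuple(sum(p[channel] for p in pixels) / len(pixels) for channel in range(3))


# A naive "what color is it?" feature cannot tell the two objects apart.
same_average = mean_color(ball) == mean_color(apple)
```

Here `same_average` is true even though the two grids are arranged differently. Comparing pixel by pixel has the opposite weakness: the same ball moved over by one pixel would no longer match.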

Similarly, audio is presented to the computer as a waveform: a long series of numbers describing how the sound changes over time. If it has different sources, it may get multiple waveforms that play at the same time, but it is still very difficult to categorize anything unless the computer is told what waveform patterns are characteristic of different categories (e.g. pop, rock, rap, etc.). Simply put, we don’t have those characteristics, and whatever estimations we have are just that: estimations.

Why does this matter for automated copyright infringement detection systems like Content ID? Let me show you some examples of what we might want to do for the system.

  • Recognize a concert venue in a video. Just this alone can help categorize a video as music and can allow the system to be more strict with its flagging.
  • Recognize a video game based on screenshots of gameplay. This can help identify the game being played and can allow Let’s Players to play their games without worrying about the in-game music triggering a flag.
  • Separate talking from music. Even if people play music in the background, it would be great to know what part of the audio is just speech, so that can be left unmuted.
  • Identify voices from games based on samples. The more audio matches in-game dialogue, the more obvious it is that a game is being played. Identifying the game can make it clear whether or not there is copyright infringement.

These are just a few examples of what audio/video processing could do. Most importantly, it can differentiate music from video games, which is where the big problem of copyright comes into play. There is a conflict of interest here, as the music industry is much stricter with copyright than the video game industry. Being able to categorize videos as music videos/concerts/lyric videos, walkthroughs/reviews/Let’s Plays, and “other” would be a great step toward being able to enforce copyright without being overzealous and flagging game channels. That said, it may be too much effort for too little improvement over other alternatives.

So now I’d like to ask: Which of the options that I’ve presented seems the most feasible? What seems like it would work well? Would not work well? Is there anything I’ve missed?

Selective Muting

Here is a look at the fourth of my list of possible solutions. Last week, I looked at the options for Content ID and determined that both the options for muting and blocking the flagged video have problems.

Essentially, a flagging system needs two things to work well: accuracy and precision. Accuracy is a measure of how well it can correctly pick out videos which violate copyright (amount of correctly flagged or correctly unflagged audio / all audio). The more videos it fails to flag (which the music industry cares about), the lower the accuracy. However, the more videos it misflags (which the users care about), the lower the accuracy as well. This is the problem of false flags. Precision is similar but doesn’t care about missed videos (amount of correctly flagged audio / amount of flagged audio). The problem with muting or blocking an entire video is that it is not precise: it’s possible that only a small portion of the video contains copyrighted audio.
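The two ratios above can be written out as a short sketch. The numbers are hypothetical: a 60-minute video in which only 3 minutes contain copyrighted audio, counted in minutes of audio.

```python
def accuracy(true_pos, false_pos, true_neg, false_neg):
    """(correctly flagged + correctly unflagged audio) / all audio."""
    total = true_pos + false_pos + true_neg + false_neg
    return (true_pos + true_neg) / total if total else 0.0


def precision(true_pos, false_pos):
    """correctly flagged audio / all flagged audio."""
    flagged = true_pos + false_pos
    return true_pos / flagged if flagged else 0.0


# Hypothetical video: 60 minutes long, 3 of them infringing.
# Muting the whole video flags all 60 minutes; only 3 deserved it.
whole_video = {"true_pos": 3, "false_pos": 57, "true_neg": 0, "false_neg": 0}
# Muting only the infringing part flags exactly those 3 minutes.
selective = {"true_pos": 3, "false_pos": 0, "true_neg": 57, "false_neg": 0}

whole_precision = precision(whole_video["true_pos"], whole_video["false_pos"])
selective_precision = precision(selective["true_pos"], selective["false_pos"])
whole_accuracy = accuracy(**whole_video)
selective_accuracy = accuracy(**selective)
```

With these made-up numbers, muting the whole video gives a precision of 3/60, while muting only the infringing minutes gives a precision of 1.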

So, how can we solve this? Well, on the user side, there’s the option of changing the audio. If the audio is the problem, just switch to new audio, and everything would be fine. However, we want to fix the precision of the system, so we want the system to do the fix itself. Thus, we need to have the system mute only the portion which has copyrighted audio. There are two ways to do this: a simple way, and a more complicated way.

1. The simple way: Mute the entire audio where there is infringing audio. Content ID should already know what audio is infringing copyright and can compare the audio in the flagged video with the audio in its database to pinpoint where exactly the audio can be found. It can then mute the audio in only those segments and leave the rest of the video untouched. In most cases, this will be good enough. Admittedly, there will be some instances where someone will talk over copyrighted audio (e.g. a Let’s Player talking over in-game music, where viewers want to listen to what he says), in which case muting everything might be less than ideal…

2. The complicated way: Remove just the infringing audio. I don’t want to go into the technical part of how this is done, but it’s essentially like a subtraction problem (though much more complex). You have audio (A) with multiple people (p1 and p2) talking. You have one person’s speech (p1) that you’d like to remove from the audio. Since the audio A = p1 + p2, you just subtract p1 from it, and you’re left with p2. Similar techniques are used to remove background noise from audio, and YouTube has a beta version of such a system to remove the copyrighted audio. If this method could be perfected, this should solve the entire problem of precision.
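Both approaches can be sketched on a toy signal. This is only an illustration under generous assumptions: the audio is a short list of sample values, the infringing segment's position is already known, and the copyrighted track lines up perfectly with the mix at the same volume. Real audio is nowhere near this tidy, and this is not a description of how YouTube does it. The function names are made up.

```python
def mute_segments(samples, segments):
    """Simple way: zero out every [start, end) index range, speech included."""
    out = list(samples)
    for start, end in segments:
        for i in range(max(start, 0), min(end, len(out))):
            out[i] = 0.0
    return out


def subtract_known(mix, known):
    """Complicated way: remove a known, perfectly aligned signal from the mix."""
    return [m - k for m, k in zip(mix, known)]


# Toy data: someone talking (p2) while a copyrighted song (p1) plays briefly.
speech = [0.25, -0.5, 0.125, 0.75, -0.25, 0.5]
music = [0.0, 0.0, 0.5, -0.25, 0.0, 0.0]
mix = [s + m for s, m in zip(speech, music)]  # A = p1 + p2

muted = mute_segments(mix, [(2, 4)])   # loses the speech at samples 2 and 3
cleaned = subtract_known(mix, music)   # keeps the speech everywhere
```

In this idealized case `cleaned` comes out identical to `speech`, while `muted` is silent wherever the song played. The hard part in practice is everything the sketch assumes away: alignment, volume changes, compression, and echo.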

If at least one of these systems could be implemented well, Content ID would be greatly improved. Aside from that, the only problem to tackle is that of accuracy, which is extremely difficult. It can be improved by limiting who can claim copyrighted material (Verification) and by giving approval ahead of time (Whitelisting). The system will likely never be perfect, but by implementing all the solutions that I have mentioned in some way, a flagging system should be much improved, to a point where it should be difficult to further improve it.

Content ID Options

After quite a delay, I’m back to discuss the third of my possible solutions for the problem: changing the options for content owners (what they can do with videos containing their copyrighted audio). Let’s take a look at all the options that Content ID provides, and I’ll mention the pros and cons of them. For reference, this information is on the Content ID page, under the section “What options are available to copyright owners?”

1) Mute the video – Users may still watch the video, but the audio is muted. For instances where the entire audio is copyrighted, this is fine. The problem arises when only a small portion of the audio is copyrighted. This can be a bit overboard: as an extreme example, consider an hour-long class project where you play a 3-minute song in the middle. That small segment causes your entire video to be muted, ruining it. Yes, this is a necessary option. However, consider implementing selective muting, so the precision of the flags is increased. If you can mute just that 3-minute song, both parties would be happy.

2) Block the video – This is similar to muting but more extreme. Essentially, this should only be done when copyrighted video is used (for example, video game content). As with muting, there is the problem of precision, but if selective blocking could be implemented, this option is not a problem.

3) Monetize the video – This is the problematic option. Essentially, YouTube has a partner program, through which it runs ads on videos and gives a portion of the ad revenue to the uploader. The more ads are watched, the more the uploader gets. With the monetization, content holders can run ads and get ad revenue on videos flagged by Content ID. Admittedly, this replaces the mechanical license in the music industry (I explain music licenses in more detail here, for those of you interested in the music aspect of this blog). However, this option opens up the possibility for fraud: instead of just stopping uploaders from gaining money, the content holder earns money in their place. Any false flag which is monetized results in fraud: the content holder earns money from another person’s material, which is exactly what Content ID is supposed to prevent.

Now, this sounds bad, and it potentially is. However, as long as the content holders are policed or verified, this option shouldn’t be a problem. Still, perhaps it would be better to have a period of time during which neither uploader nor content holder gets the money, until the ownership can be settled. It worries me that money can change hands so quickly at just the word of an imperfect, automated system… I will admit that this has not become a big problem, and I believe that Content ID’s verification process is to be congratulated for that. However, any new system should be careful concerning monetization of copyrighted material…

4) Track the video’s statistics – The last in the list, this is the mildest and least problematic. Simply put, it doesn’t create a problem for the uploader, but it allows the content holder to see how that video compares with their own. For any allowed use of copyrighted materials, this is the option that will probably be chosen.


All in all, the options for Content ID are pretty good. Two problems exist, however. The first is one of precision: copyrighted content anywhere in a video causes the entire video to be affected. The second is one of potential fraud: monetization is based on an imperfect system and trust in the integrity of the content holders. The first can be fixed with other solutions. The second, however, might require changing the options and removing monetization.

What are your thoughts on these options? Is there anything I’ve missed? Are these problems as bad as I think they are? Do you have an idea of an option to add? Whatever you have to offer, I look forward to hearing!

Covers, Remixes, and Compilations – Copyright in the music industry

In my last post, I talked about the gray areas of video game copyright. Now, it’s time for music to take the stage.

First, covers. These are songs which do not belong to the band that plays them, but they play them anyway. Simply put, you need a license if you plan on playing these songs in public or for profit. If you want to play it in public, odds are the venue already has a license. If you plan to make recordings, you have to pay for a mechanical license (about 10¢) for each individual recording. If you wish to play a song in a video, it requires a synchronization license. However, if you wish to publish on a site like YouTube, the synchronization license is different: the copyright holder sets the price of the license. It could be as little or as much as they want, or they could simply not allow you to upload the video. For more detailed information on what you need where, this FAQ is helpful.

Second, remixes. This is a much grayer area of copyright law. A remix is when a song is altered, often by combining it with another song or by adjusting the genre of the music. Since it heavily relies upon existing music, most are derivative and require a license to use the music. However, in some instances, they are transformative (sufficiently altered, often for a different purpose than the original) and may be protected under fair use. Here is the gray area: how much must something be remixed in order to be considered transformative? In some cases, small changes can greatly change the genre of a song, while in others, large changes may not.

Third, compilations. These are the visual version of remixes, utilizing combination heavily, normally with a song or remix played alongside. Again, the question of derivative vs. transformative comes into play, with most being derivative. These often use substantially more sources than remixes, such as clips from multiple shows, images and art found online, and audio from possibly multiple sources. As such, the risk of infringing copyright is higher and carries a heavier penalty. Still, shouldn’t these to some extent be considered transformative?

So. Here we have three different types of audio/video uploaded to YouTube. Each of them would be flagged by Content ID in most cases, drawing attention to it when it might otherwise be ignored or overlooked. YouTube handles the synchronization license by allowing content holders to impose ads on the videos and earn the ad revenue. If they would prefer, they can block videos with their music instead. However, where are these works protected by fair use, and where should they be? Sure, a remix may be based on a song, but if it is substantially different, shouldn’t it be its own work? How much needs to be changed, or how different need it be? Sure, a compilation takes many works and combines them, but if it is substantially different, shouldn’t it be its own work? Where should the line be drawn between creative and derivative? Rights should be protected, but creativity should be encouraged as well. These questions need to be considered if systems such as Content ID are to be improved.

False Flags

You’re a musician, and you write your own music. In order to reach a wide audience, you put your songs up on YouTube. You get a good number of daily views and are happy to see that people are buying your songs on iTunes, even! Suddenly, you go to your account and find that your videos have been flagged as violating copyright, and you’re no longer receiving the ad revenue. Instead, it’s going to someone you’ve never even heard of. Understandably, you’re confused.

You review games for a living and receive thousands, sometimes millions of views daily for your YouTube videos. Many people enjoy your videos and trust your judgment, sometimes buying a game simply because you played it. You make sure to receive explicit permission from game makers before reviewing. Suddenly, one of these companies has flagged your videos and taken them down. Understandably, you’re angry.

These are false flags: Content ID incorrectly flagged videos as violating copyright. In the first, it’s because someone claimed ownership of someone else’s content. In the second, it’s because the system doesn’t know when someone other than the copyright holder has been given the right to use the material. How can we fix the system to stop this from happening?

For the first, there’s a preliminary check. You can make sure that the content doesn’t match anything already in the database. If it is in the database, you know one of the two doesn’t own the content. However, if you don’t know who does own the content, you have to make an assumption, and it’s normally first-come, first-served. The more comprehensive but painstaking process is requiring proof of copyright. This should be the way it works, as a DMCA takedown notice already requires the claimant to state, under penalty of perjury, that they own the copyright or are authorized to act for the owner. However, the process of verifying each individual claim is lengthy; there’s a reason an automated process was chosen.
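The preliminary check can be sketched as a small registry. Everything here is hypothetical: the class name, the method names, and especially the fingerprint, which is just a SHA-256 hash standing in for real audio matching (a hash only matches byte-for-byte identical files).

```python
import hashlib


class ClaimRegistry:
    """Toy first-come, first-served registry of ownership claims."""

    def __init__(self):
        self._owners = {}  # fingerprint -> first claimant

    @staticmethod
    def fingerprint(content: bytes) -> str:
        # Stand-in only: real matching must survive re-encoding, cropping, etc.
        return hashlib.sha256(content).hexdigest()

    def claim(self, claimant: str, content: bytes) -> bool:
        """Accept the claim unless someone else already claimed this content."""
        fp = self.fingerprint(content)
        owner = self._owners.setdefault(fp, claimant)
        return owner == claimant


registry = ClaimRegistry()
first = registry.claim("musician", b"my original song")   # accepted
second = registry.claim("stranger", b"my original song")  # rejected: already claimed
```

The weakness is visible in the sketch: whoever calls `claim` first wins. If the stranger had registered the song before the musician, the registry would have sided with the stranger, which is why proof of copyright is the more reliable check.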

For the second, you can upload proof of right to use the material. However, the verification process can be lengthy. Instead, the content holder can whitelist people: they can list those to whom they gave right of use. However, this requires action on their part; if they do not whitelist, someone who gained the right to use the material will be assumed not to have it. In addition, if there is a large number to whitelist, the process becomes lengthy again.

The best solution against these false flags is to verify each content-holder individually. However, this has to be done manually, since automated systems cannot be presumed perfect. Many would assume this not worth the work, but if it’s done right, it could be very helpful. However, the time taken to confirm a claim is time during which that copyright could be infringed… What do you think? What else could be done? Even if the situation can’t be fixed entirely, can it be improved?

P.S. If you’re interested in looking around, there have been some really hilarious false claims. For instance, someone received a copyright violation flag for a video which only included sounds of nature. Whoops.

Introduction

A first post, to set the scene, I suppose this is. Let’s bring the players in: YouTube and Twitch. The first I assume many are familiar with, the second… not so much. YouTube, for those of you who don’t know, is a very popular and successful video-sharing website which was bought by Google in 2006. Users can upload videos of whatever they want, so others may view them later. Twitch is a live-streaming website, where users may capture video and broadcast it live to anyone watching their channel. In addition, these broadcasts are split into chunks and stored as videos in case anyone wants to watch them later. Together, these sites make up much of the internet’s free video-viewing market. With video comes audio, and with both comes copyright.

When people may upload whatever, there’s always a concern that they’re not uploading their own material. Add the anonymity of the internet to that, and these sites are just waiting to be subject to infringement and piracy. Many of you, I assume, have visited YouTube to listen to music. Do you watch official videos? Do you watch unofficial ones? Admittedly, for some songs, there are only unofficial videos. To the copyright holders, these unofficial videos not only result in fewer sales but also let other users profit from their music. That’s right. YouTube gives portions of the ad revenue to uploaders, so the more views your video gets, the more you can profit. In fact, some reviewers, gamers, and musicians use this feature to make a living from their videos. It’s a nice system when things go well, but when someone’s video isn’t theirs… something has to change.

Of course, there is an enormous number of videos on YouTube. It would be infeasible to search through them all manually for whatever copyright violations might exist. Thus, the digital age spawned an automatic flagger: YouTube’s Content ID searches through all the videos and matches them to files in its system. It excludes the content holder’s own files, and the content holder can also whitelist people to whom they have given rights to use the material. If there is a match, the video is flagged as potentially violating copyright. Content ID tells the person who put the content in its database that this video on YouTube may be infringing on copyright, and that content holder determines what to do about it. If they think there’s no infringement, they may remove the flag. Otherwise, they can track the video’s statistics, mute or block the video, or reroute funding from that video to themselves. In this way, copyright holders can feel secure that their content is only profiting themselves.
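The flow described above can be sketched roughly like this. It is a simplified model for illustration, not Content ID’s actual design: the `Reference` record, the `check_upload` function, and the exact-match fingerprints are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Reference:
    """One piece of content a holder has put in the (hypothetical) database."""
    owner: str
    fingerprint: str
    policy: str = "track"  # what the holder chose: track, mute, block, or monetize
    whitelist: set = field(default_factory=set)  # uploaders given right of use


def check_upload(uploader, fingerprint, references):
    """Return (owner, policy) for the first claim that applies, else None."""
    for ref in references:
        if ref.fingerprint != fingerprint:
            continue  # no match with this reference
        if uploader == ref.owner or uploader in ref.whitelist:
            continue  # the holder's own upload, or an approved user
        return ref.owner, ref.policy
    return None


references = [Reference("label", "song-123", "monetize", {"reviewer"})]

own_upload = check_upload("label", "song-123", references)      # not flagged
approved = check_upload("reviewer", "song-123", references)     # not flagged
flagged = check_upload("stranger", "song-123", references)      # flagged
unrelated = check_upload("stranger", "song-999", references)    # not flagged
```

Only the stranger’s upload of the matching song is flagged, and the flag carries the policy the content holder picked. Note that the sketch trusts whoever registered the reference, which is exactly where false claims get in.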

Now, we have a system which finds all potentially copyright-infringing material on YouTube and notifies the content holder.  What do you think about it? Is it perfect? Is it faulty? In what ways? In making a system like this, what concerns are there? For the copyright holders? For gray-area uploaders? I’ll talk more about my own thoughts in upcoming posts, but this is for you. What questions do you have for me? What suggestions? What are your thoughts? I’ve set the scene, and now it’s time for you to figure out just where we’re headed…