Tag Archives: Music

Audio/Video Processing

It’s time to talk about the last topic from my list of possible solutions. This is by far the most difficult and is not yet a feasible option. However, it is definitely worth talking about.

Audio/Video processing. In particular, I’m talking about looking at video or listening to audio and characterizing it as… well, anything. Is the video a concert? a video game? animation? home video? Is the audio pop music? rock? rap? country? folk? Is it just talking? If you know who might be talking, is it possible to determine the speaker? For a human watching a video and listening to its audio, this is a simple task. You can tell just from looking at it whether it’s animated or real. You can easily tell a concert from a video game. You can differentiate different styles of music and tell apart different speakers. But for computers, this is a near impossible task.

Images are extremely difficult to process. The computer is shown a matrix of colors which represent the pixels of the image. It then has to use that information to figure out what each individual section represents, even though many things have similar shapes and colors, sizes vary from picture to picture, and images might not be entirely clear. A red bouncy ball may be indistinguishable from an apple. Even if the computer is looking for something, it has to realize that the target object could be in any size, position, or resolution and may not be shown in its entirety. So, if a single image is extremely difficult to process, consider how difficult a video (many, many images coming one after another) is to process…

Similarly, audio is presented to the computer as a waveform. If there are different sources, it may get multiple waveforms that play at the same time, but it is still very difficult to categorize anything unless the computer is told what waveform patterns are characteristic of different categories (e.g. pop, rock, rap, etc.). Simply put, we don’t have those characteristics, and whatever estimations we have are just that: estimations.

Why does this matter for automated copyright infringement detection systems like Content ID? Let me show you some examples of what we might want to do for the system.

  • Recognize a concert venue in a video. Just this alone can help categorize a video as music and can allow the system to be more strict with its flagging.
  • Recognize a video game based on screenshots of gameplay. This can help identify the game being played and can allow Let’s Players to play their games without worrying about the in-game music causing a flag of the system.
  • Separate talking from music. Even if people play music in the background, it would be great to know what part of the audio is just speech, so that can be left unmuted.
  • Identify voices from games based on samples. The more audio matches in-game dialogue, the more obvious it is that a game is being played. Identifying the game can make it clear whether or not there is copyright infringement.

These are just a few examples of what audio/video processing could do. Most importantly, it can differentiate music from video games, which is where the big problem of copyright comes into play. There is a conflict of interests, as the music industry is much more strict with copyright than the video game industry. Being able to categorize videos as music videos/concerts/lyric videos, walkthroughs/reviews/Let’s Plays, and “other” would be a great step toward being able to enforce copyright without being overzealous and flagging game channels. That said, it may be too much effort for too little improvement over other alternatives.

So now I’d like to ask: Which of the options that I’ve presented seems the most feasible? What seems like it would work well? Would not work well? Is there anything I’ve missed?

Selective Muting

Here is a look at the fourth of my list of possible solutions. Last week, I looked at the options for Content ID and determined that both the options for muting and blocking the flagged video have problems.

Essentially, a flagging system needs two things to work well: accuracy and precision. Accuracy is a measure of how well it can correctly pick out videos which violate copyright (amount of correctly flagged or correctly unflagged audio / all audio). The more videos it fails to flag (which the music industry cares about), the lower the accuracy. However, the more videos it misflags (which the users care about), the lower the accuracy as well. This is the problem of false flags. Precision is similar but doesn’t care about missed videos (amount of correctly flagged audio / amount of flagged audio). The problem with muting or blocking an entire video is that it is not precise: it’s possible that only a small portion of the video contains copyrighted audio.
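To make the two measures concrete, here is a minimal sketch in Python. The function names and the numbers are my own assumptions for illustration (a hypothetical 60-minute video that contains 3 minutes of copyrighted audio); they are not taken from Content ID itself.

```python
def accuracy(correct_flagged, wrong_flagged, correct_unflagged, missed):
    """(correctly flagged + correctly unflagged audio) / all audio."""
    total = correct_flagged + wrong_flagged + correct_unflagged + missed
    return (correct_flagged + correct_unflagged) / total if total else 0.0


def precision(correct_flagged, wrong_flagged):
    """correctly flagged audio / all flagged audio."""
    flagged = correct_flagged + wrong_flagged
    return correct_flagged / flagged if flagged else 0.0


# Hypothetical numbers, in minutes of audio.
# Muting the whole video flags all 60 minutes, but only 3 deserve it.
whole_video_precision = precision(correct_flagged=3, wrong_flagged=57)
whole_video_accuracy = accuracy(3, 57, 0, 0)

# Muting only the 3 copyrighted minutes flags nothing by mistake.
selective_precision = precision(correct_flagged=3, wrong_flagged=0)
selective_accuracy = accuracy(3, 0, 57, 0)
```

With these made-up numbers, flagging the whole video is right about only 3 of the 60 minutes it touches, while flagging just the copyrighted segment is right about all of them.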

So, how can we solve this? Well, on the user side, there’s the option of changing the audio. If the audio is the problem, just switch to new audio, and everything would be fine. However, we want to fix the precision of the system, so we want the system to do the fix itself. Thus, we need to have the system mute only the portion which has copyrighted audio. There are two ways to do this: a simple way, and a more complicated way.

1. The simple way: Mute the entire audio where there is infringing audio. Content ID should already know what audio is infringing copyright and can compare the audio in the flagged video with the audio in its database to pinpoint where exactly the audio can be found. It can then mute the audio in only those segments and leave the rest of the video untouched. In most cases, this will be good enough. Admittedly, there will be some instances where someone will talk over copyrighted audio (e.g. a Let’s Player talking over in-game music, where viewers want to listen to what he says), in which case muting everything might be less than ideal…

2. The complicated way: Remove just the infringing audio. I don’t want to go into the technical part of how this is done, but it’s essentially like a subtraction problem (though much more complex). You have audio (A) with multiple people (p1 and p2) talking. You have one person’s speech (p1) that you’d like to remove from the audio. Since the audio A = p1 + p2, you just subtract p1 from it, and you’re left with p2. Similar techniques are used to remove background noise from audio, and YouTube has a beta version of such a system to remove the copyrighted audio. If this method could be perfected, this should solve the entire problem of precision.
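The two approaches above can be sketched roughly as follows. This is a toy illustration, not how YouTube does it: audio is just a list of sample values, and the helper names (mute_segments, subtract_known) are hypothetical. The subtraction also assumes the known audio is perfectly aligned with the mix and at the same volume, which is rarely true in practice and is a large part of why the real problem is much more complex.

```python
import math


def mute_segments(samples, sample_rate, segments):
    """Simple way: return a copy of the audio with each
    (start_sec, end_sec) segment silenced and the rest untouched."""
    muted = list(samples)
    for start_sec, end_sec in segments:
        start = max(0, int(start_sec * sample_rate))
        end = min(len(muted), int(end_sec * sample_rate))
        for i in range(start, end):
            muted[i] = 0.0
    return muted


def subtract_known(mix, known):
    """Complicated way (idealized): A = p1 + p2, so A - p1 = p2.
    Assumes `known` is sample-aligned with `mix` and equally loud."""
    return [m - k for m, k in zip(mix, known)]


# Simple way: 10 samples at 2 samples per second = 5 seconds of audio.
audio = [0.5] * 10
muted = mute_segments(audio, sample_rate=2, segments=[(1.0, 2.5)])

# Complicated way: p1 is the audio to remove, p2 is what we want to keep.
p1 = [math.sin(0.3 * n) for n in range(8)]
p2 = [0.25 * math.cos(0.7 * n) for n in range(8)]
mix = [a + b for a, b in zip(p1, p2)]
recovered = subtract_known(mix, p1)
```

In the first case only the samples between 1.0 and 2.5 seconds are silenced; in the second, recovered matches p2 up to floating-point rounding.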

If at least one of these systems could be implemented well, Content ID would be greatly improved. Aside from that, the only problem to tackle is that of accuracy, which is extremely difficult. It can be improved by limiting who can claim copyrighted material (Verification) and by giving approval ahead of time (Whitelisting). The system will likely never be perfect, but by implementing all the solutions that I have mentioned in some way, a flagging system should be much improved, to a point where it should be difficult to further improve it.

Content ID Options

After quite a delay, I’m back to discuss the third of my possible solutions for the problem: changing the options for content owners (what they can do with videos containing their copyrighted audio). Let’s take a look at all the options that Content ID provides, and I’ll mention the pros and cons of them. For reference, this information is on the Content ID page, under the section “What options are available to copyright owners?”

1) Mute the video – Users may still watch the video, but the audio is muted. For instances where the entire audio is copyrighted, this is fine. The problem arises when only a small portion of the audio is copyrighted. This can be a bit overboard: as an extreme example, consider an hour long class project where you play a 3 minute song in the middle. That small segment causes your entire video to be muted, ruining it. Yes, this is a necessary option. However, consider implementing selective muting, so the precision of the flags is increased. If you can mute just that 3 minute song, both parties would be happy.

2) Block the video – This is similar to muting but more extreme. Essentially, this should only be done when copyrighted video is used (for example, video game content). As with muting, there is the problem of precision, but if selective blocking could be implemented, this option is not a problem.

3) Monetize the video – This is the problematic option. Essentially, YouTube has a partner program, through which it runs ads on videos and gives a portion of the ad revenue to the uploader. The more ads are watched, the more the uploader gets. With monetization, content holders can run ads and get ad revenue on videos flagged by Content ID. Admittedly, this replaces the mechanical license in the music industry (I explain music licenses in more detail here, for those of you interested in the music aspect of this blog). However, this option opens up the possibility for fraud: instead of just stopping uploaders from gaining money, the content holder earns money in their place. Any false flag which is monetized results in fraud: the content holder earns money from another person’s material, which is exactly what Content ID is supposed to prevent.

Now, this sounds bad, and it potentially is. However, as long as the content holders are policed or verified, this option shouldn’t be a problem. Still, perhaps it would be better to have a period of time during which neither uploader nor content holder gets the money, until the ownership can be settled. It worries me that money can change hands so quickly at just the words of an imperfect, automated system… I will admit that this has not become a big problem, and I believe that Content ID’s verification process is to be congratulated for that. However, any new system should be careful concerning monetization of copyrighted material…

4) Track the video’s statistics – The last in the list, this is the mildest and least problematic. Simply put, it doesn’t create a problem for the uploader, but it allows the content holder to see how that video compares with their own. For any allowed use of copyrighted materials, this is the option that will probably be chosen.


All in all, the options for Content ID are pretty good. Two problems exist, however. The first is one of precision: copyrighted content anywhere in a video causes the entire video to be affected. The second is one of potential fraud: monetization is based on an imperfect system and trust in the integrity of the content holders. The first can be fixed with other solutions. The second, however, might require changing the options and removing monetization.

What are your thoughts on these options? Is there anything I’ve missed? Are these problems as bad as I think they are? Do you have an idea of an option to add? Whatever you have to offer, I look forward to hearing!

Content Verification

As mentioned last week, there are many possible fixes to the system. This week, I will be looking at the first proposed solution: verification.

Regardless of whether you’re a gamer or musician, verification is extremely important. If just anyone can claim they own content, there is a problem. Someone can claim rights to a game or song that no one else has claimed yet, and they would reap the profits until someone noticed. Now, on YouTube, you may notice that some channels have little checks next to their names. This means they are verified as who they say they are. Yes, YouTube already has a verification system in place.

The solution sounds simple, then, right? Only allow verified users to claim content, and trust them to claim only their own content. This is definitely the first step to a good system, but there are a few things it does not take into account.

1) Can we trust everyone to know what they own and what they don’t?

When a musician publishes a work, oftentimes they go through companies. Take Adele for example. In the USA, she works with Columbia Records, who are a part of Sony Music Entertainment. Now, let’s look at her song Skyfall, which was featured in the movie of the same name. The movie was produced by Eon Productions and distributed by MGM and Sony.

With this one song, there are five different parties associated: Adele herself, Columbia Records, Sony, Eon Productions, and MGM. Each of them might include the song as content holders. They would all be verified, and they would all have reason to include the song, whether by itself or as part of a movie. Now, say someone uploads the clip of the song playing in the movie. Who decides whether that’s allowed?

Movies aren’t the only unclear cases. Many games use licensed songs. Games like Dance Dance Revolution, Guitar Hero, and Rock Band prominently feature licensed songs! If a game company gives permission to use their game in a video, does the video maker have to also get permission for the individual songs? Simply put, when the music is used as part of the game, the game maker determines whether or not it can be used. Will the system distinguish between these cases? It’s difficult to, and this is one source of false flags which verification cannot solve.

2) What about content holders who are ineligible for verification?

According to YouTube, there are certain requirements to be eligible for a verified name. The concerning one is “a substantial number of subscribers.” You may be able to verify that you are who you say you are. You may be able to verify that you own what you say you own. However, if you are not popular enough, it appears that YouTube doesn’t care if people rip off your songs. Perhaps this is not a big factor in the verification process, and perhaps it would only create a few problems. Regardless, it is something to consider when attempting a perfect system.


Even if verification is not perfect, the presence of verification is of great importance. Just this simple addition minimizes the chance of fraud and resultant false flags. Admittedly, even if the system is perfectly verified, so every content holder owns everything they say they own, it will not be perfect. There will be mismatches. There is no sense of fair use. However, verification is an easy step toward making the system as perfect as possible.

What are your thoughts on the above? How should verified content be handled in situations where there are multiple content holders? Any ideas about potential improvements to a verification system?

The Problem – What makes a good system?

Over the past few weeks, I’ve introduced you to the basics: the general problem, the game industry, and the music industry. From here on out, we’re going to be looking for a solution to this problem. What problem? Constructing a copyright detection system which is as good as possible by improving on the current ones. In order to do that, we need to look at what makes a good system.

The two industries I’ve brought up are important, as they show the two sides of the system. On the one hand, you have the music industry. They fiercely attempt to protect audio copyright, with cases such as Vanilla Ice eventually paying royalties to Queen and David Bowie because “Ice Ice Baby” copied part of “Under Pressure,” and Men at Work being sued by Larrikin Music for royalties on “Down Under” because it copied part of “Kookaburra.” They care very much that copyright is being followed, and they will not hesitate to sue if they believe a substantial (no matter how small!) part of a song is being copied without the appropriate license. They do want a harsh flagging system: it catches many things that they would miss.

On the other hand, you have the game industry. They are much more lax, allowing reviews, walkthroughs, and general game content to be uploaded, even for profit, without a problem, as it normally increases publicity and doesn’t detract from sales. Some things which, by law, are copyright infringement, they purposefully turn a blind eye to, when they don’t explicitly allow it. They do not want a harsh flagging system: too few relevant videos are caught to make it worthwhile.

How do you balance these conflicting opinions? You can’t just remove a content flagging system: the music industry would object. There cannot be a manual system: it is near impossible to search through the same amount of material (“Content ID scans over 400 years of video every day”). Even an automated system has its problems: the current system has had many false flags, as discussed earlier. There must be a blend of automated and manual systems, but there must also be improvement. The current system needs to return as few false flags as possible, basically improving its precision. In order to do this, we need to figure out why false flags are being returned, and eliminate the causes. I have some ideas, but I’m always looking for more…

  1. Verify – One big problem is when someone claims to own content that they don’t. Assuming the best, they may actually believe it theirs but are mistaken. Alternatively, someone’s own content is flagged on their behalf. To fix this, the system needs to verify who owns what. Admittedly, that’s already happening, as it is one of the simplest fixes, but it is not perfect.
  2. Whitelist – On a similar note to the above, people may give permission to use content in a video. If so, there needs to be a whitelist function. They exist, and they are reasonable for small cases, but game companies may want to whitelist many videos, resulting in much extra work for them. A mass whitelist is perhaps better in that case, but it ends up missing valid flags…
  3. Alter the Options – Currently, a content holder can choose to monitor, block, or monetize flagged videos. Monitoring is perfectly fine, but it doesn’t make the music industry happy. Blocking is better, but it means a few seconds can cause the removal of an entire video. Monetizing is worse, as it opens the potential for fraud, if people can game the system (e.g. there is no Verification). Still, monetizing can essentially take the place of a music license if done legitimately.
  4. Selective Muting – Following the above: since there is technology to detect matches in audio (it’s what these systems do), there is also technology to segment that audio and mute only the copyrighted parts instead of blocking the video or rerouting funding. YouTube offers it as a suggestion, but it doesn’t really solve the problem of false flags.
  5. Audio/Video Processing – This is a difficult task. Basically, the program looks at a video or listens to audio, scans it through the database, and, if there is a match, checks whether it is fair use or not. For simplicity, let’s just say it categorizes it as a “Gaming Video” or not. Perhaps machine learning could be a step in the right direction here, but this is the most difficult, albeit the best, fix to the system, in my opinion.

These are my thoughts so far, and I’ll be expanding on them in future posts. For now, I’d like to ask your thoughts. What problems do you see in the current system? What problems do you see in my suggested fixes? Do you have other possible routes to suggest? Please, throw out whatever thoughts you have! Whatever you give can only help open up more alternatives for research. I look forward to your suggestions!

Covers, Remixes, and Compilations – Copyright in the music industry

In my last post, I talked about the gray areas of video game copyright. Now, it’s time for music to take the stage.

First, covers. These are songs which do not belong to the band that plays them, but they play them anyway. Simply put, you need a license if you plan on playing these songs in public or for profit. If you want to play it in public, odds are the venue already has a license. If you plan to make recordings, you have to pay for a mechanical license (about 10¢) for each individual recording. If you wish to play a song in a video, it requires a synchronization license. However, if you wish to publish on a site like YouTube, the synchronization license is different: the copyright holder sets the price of the license. It could be as little or as much as they want, or they could simply not allow you to upload the video. For more detailed information on what you need where, this FAQ is helpful.

Second, remixes. This is a much grayer area of copyright law. A remix is when a song is altered, often by combining it with another song or by adjusting the genre of the music. Since it heavily relies upon existing music, most are derivative and require a license to use the music. However, in some instances, they are transformative (sufficiently altered, often for a different purpose than the original) and may be protected under fair use. Here is the gray area: how much must something be remixed in order to be considered transformative? In some cases, small changes can greatly change the genre of a song, while in others, large changes may not.

Third, compilations. These are the visual version of remixes, utilizing combination heavily, normally with a song or remix played alongside. Again, the question of derivative vs. transformative comes into play, with most being derivative. These often use substantially more sources than remixes, such as clips from multiple shows, images and art found online, and audio from possibly multiple sources. As such, the risk of infringing copyright is higher and carries a heavier penalty. Still, shouldn’t these to some extent be considered transformative?

So. Here we have three different types of audio/video uploaded to YouTube. Each of them would be flagged by Content ID in most cases, drawing attention to it when it might otherwise be ignored or overlooked. YouTube handles the synchronization license by allowing content holders to impose ads on the videos and earn the ad revenue. If they would prefer, they can block videos with their music instead. However, where are these works protected by fair use, and where should they be? Sure, a remix may be based off a song, but if it is substantially different, shouldn’t it be its own work? How much needs to be changed, or how different need it be? Sure, a compilation takes many works and combines them, but if it is substantially different, shouldn’t it be its own work? Where should the line be drawn between creative and derivative? Rights should be protected, but creativity should be encouraged as well. These questions need to be considered if systems such as Content ID are to be improved.

False Flags

You’re a musician, and you write your own music. In order to reach a wide audience, you put your songs up on YouTube. You get a good number of daily views and are happy to see that people are even buying your songs on iTunes! Suddenly, you go to your account and find that your videos have been flagged as violating copyright, and you’re no longer receiving the ad revenue. Instead, it’s going to someone you’ve never even heard of. Understandably, you’re confused.

You review games for a living and receive thousands, sometimes millions of views daily for your YouTube videos. Many people enjoy your videos and trust your judgment, sometimes buying a game simply because you played it. You make sure to receive explicit permission from game makers before reviewing. Suddenly, one of these companies has flagged your videos and taken them down. Understandably, you’re angry.

These are false flags: Content ID incorrectly flagged videos as violating copyright. In the first, it’s because someone claimed ownership of someone else’s content. In the second, it’s because the system doesn’t know when someone other than the copyright holder has been given the right to use the material. How can we fix the system to stop this from happening?

For the first, there’s a preliminary check. You can make sure that the content doesn’t match anything already in the database. If it is in the database, you know one of the two doesn’t own the content. However, if you don’t know who does own the content, you have to make an assumption, and it’s normally first-come, first-served. The more comprehensive but painstaking process is requiring proof of copyright. This should be the way it works, as a DMCA Takedown Notice first requires proof of copyright or authority to file a claim. However, the process of verifying each individual claim is lengthy; there’s a reason an automated process was chosen.
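As a rough sketch of that preliminary check, assume each piece of content can be reduced to some fingerprint string and that claims live in a simple lookup table. Both are my own simplifications, and register_claim is a hypothetical name, not part of any real system.

```python
def register_claim(registry, fingerprint, claimant):
    """First-come, first-served claim check.

    Returns (accepted, existing_owner). A claim is rejected when the
    same content was already claimed by someone else, which tells us
    only that one of the two claimants does not own it, not which one.
    """
    existing = registry.get(fingerprint)
    if existing is None:
        registry[fingerprint] = claimant  # nobody claimed it yet
        return True, None
    if existing == claimant:
        return True, existing  # repeat claim by the same party
    return False, existing  # conflict: needs proof of copyright


registry = {}
first = register_claim(registry, "song-abc", "alice")
second = register_claim(registry, "song-abc", "bob")
```

Here the first claim is accepted and the second is rejected, even though nothing in the check establishes that the first claimant is the real owner. That gap is what requiring proof of copyright would close.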

For the second, you can upload proof of right to use the material. However, the verification process can be lengthy. Instead, the content holder can whitelist people: they can list those to whom they gave right of use. However, this requires action on their part; if they do not whitelist, someone who gained the right to use the material will be assumed not to have it. In addition, if there is a large number to whitelist, the process becomes lengthy again.

The best solution against these false flags is to verify each content-holder individually. However, this has to be done manually, since automated systems cannot be presumed perfect. Many would assume this not worth the work, but if it’s done right, it could be very helpful. However, the time taken to confirm a claim is time during which that copyright could be infringed… What do you think? What else could be done? Even if the situation can’t be fixed entirely, can it be improved?

P.S. If you’re interested in looking around, there have been some really hilarious false claims. For instance, someone received a copyright violation flag for a video which only included sounds of nature. Whoops.

Introduction

A first post, to set the scene, I suppose this is. Let’s bring the players in: YouTube and Twitch. The first I assume many are familiar with, the second… not so much. YouTube, for those of you who don’t know, is a very popular and successful video-sharing website which was bought by Google in 2006. Users can upload videos of whatever they want, so others may view them later. Twitch is a live-streaming website, where users may capture video and broadcast it live to anyone watching their channel. In addition, these broadcasts are split into chunks and stored as videos in case anyone wants to watch them later. Together, these sites make up much of the internet’s free video-viewing market. With video comes audio, and with both comes copyright.

When people may upload whatever, there’s always a concern that they’re not uploading their own material. Add the anonymity of the internet to that, and these sites are just waiting to be subject to infringement and piracy. Many of you, I assume, have visited YouTube to listen to music. Do you watch official videos? Do you watch unofficial ones? Admittedly, for some songs, there are only unofficial videos. To the copyright holders, these unofficial videos not only result in fewer sales, but these other users are profiting from their music. That’s right. YouTube gives portions of the ad revenue to uploaders, so the more views your video gets, the more you can profit. In fact, some reviewers, gamers, and musicians use this feature to make a living from their videos. It’s a nice system when things go well, but when someone’s video isn’t theirs… something has to change.

Of course, there are billions upon billions of videos on YouTube. It would be infeasible to search through them all manually for whatever copyright violations might exist. Thus, the digital age spawned an automatic flagger: YouTube’s Content ID searches through all the videos and matches them to files in its system. It excludes the content holder’s own files, and the content holder can also whitelist people to whom they have given rights to use material. If there is a match, the video is flagged as potentially violating copyright. Content ID tells the person who put the content in its database that this video on YouTube may be infringing on copyright, and that content holder determines what to do about it. If they think there’s no infringement, they may remove the flag. Otherwise, they can track the video’s statistics, mute or block the video, or reroute funding from that video to themselves. In this way, copyright holders can feel secure that their content is only profiting themselves.
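The flow described above can be sketched in a few lines. This is only an illustration of the logic as I have described it: the Reference class, the find_flags function, and the idea of matching by exact fingerprint strings are all assumptions of mine, and real matching is far fuzzier than a set lookup.

```python
from dataclasses import dataclass, field


@dataclass
class Reference:
    """A file that a content holder has put into the database."""
    owner: str
    fingerprint: str
    whitelist: set = field(default_factory=set)


def find_flags(uploader, video_fingerprints, references):
    """Return the references that a video should be flagged against."""
    flags = []
    for ref in references:
        if ref.fingerprint not in video_fingerprints:
            continue  # no match, nothing to flag
        if uploader == ref.owner:
            continue  # the content holder's own upload is excluded
        if uploader in ref.whitelist:
            continue  # the content holder gave this uploader permission
        flags.append(ref)  # potential violation: notify ref.owner
    return flags


song = Reference(owner="label", fingerprint="song-1", whitelist={"reviewer"})
database = [song]

flags_for_stranger = find_flags("stranger", {"song-1", "speech-9"}, database)
flags_for_reviewer = find_flags("reviewer", {"song-1"}, database)
flags_for_owner = find_flags("label", {"song-1"}, database)
```

Only the stranger's video is flagged. What happens next (removing the flag, tracking, muting, blocking, or rerouting funding) is left to the content holder, so it is deliberately not part of the sketch.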

Now, we have a system which finds all potentially copyright-infringing material on YouTube and notifies the content holder.  What do you think about it? Is it perfect? Is it faulty? In what ways? In making a system like this, what concerns are there? For the copyright holders? For gray-area uploaders? I’ll talk more about my own thoughts in upcoming posts, but this is for you. What questions do you have for me? What suggestions? What are your thoughts? I’ve set the scene, and now it’s time for you to figure out just where we’re headed…