Companies create data, and copyright that data. They then sometimes try to restrict what we do with that data. This bothers a lot of people. I recently realized something that reveals a huge flaw in the copyright system: data is not copyrightable. I'll try to explain.
Any digital data is a number, and numbers have no meaning without context, apart from quantity, which is not the meaning copyright holders have in mind. A particular number is copyrighted because it represents an idea, and the point is to copyright the idea. Part of the problem is that the number is only a representation of the idea, not the idea itself. Getting from the number to the idea requires context.
An example: The band "Foo" writes a song. Call it "Bar". They copyright this song, or rather, they copyright the CD it is published on. The CD contains a 400MB number, which when put into a CD player, produces sound. There you have data, and a context. The data is on the CD, and the context is a CD player.
This much is pretty simple. But here's where the problem comes in. The copyrighted data can be converted into any number of other pieces of data, which are equivalent in a different context. The standard example is converting the data to an MP3, which uses an MP3 player as its context. That example even allows several different contexts (all the different MP3 players) that, given the same MP3 data, produce an equivalent result.
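To make this concrete, here is a small Python sketch of one piece of content stored as three different pieces of data, each with its own context. Real MP3 encoding needs a third-party library and is lossy, so this sketch assumes lossless stand-ins from the standard library instead: `zlib` compression and `base64` encoding. The "song" is just a run of made-up bytes.

```python
import base64
import zlib

# Stand-in for the song: made-up bytes, purely illustrative.
song = bytes(range(256)) * 4

# Three different pieces of data...
encodings = {
    "raw": song,
    "zlib": zlib.compress(song),
    "base64": base64.b64encode(song),
}

# ...each paired with the context that turns it back into the same song.
contexts = {
    "raw": lambda data: data,
    "zlib": zlib.decompress,
    "base64": base64.b64decode,
}

for name, data in encodings.items():
    same = contexts[name](data) == song
    print(f"{name:>6}: {len(data):5d} bytes, reproduces the song: {same}")
```

The three byte strings are different numbers, yet each one, read in its own context, gives back exactly the same song.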
The copyrighted idea is not a piece of data, and the copyright should not cover that data. Sometimes Band Foo will try to apply copyright law to the MP3 as if it were the same as the data on the CD. But it's completely different data, and it uses a completely different context. Band Foo assumes that, because the MP3 data can produce the same result, their copyright extends to that MP3 file, and to every other piece of data that can represent the same song.
But any piece of data is equivalent to any other piece of data, given the right context. More specifically: if data A has some meaning in context C, then for any other piece of data B there is a context D in which B has that same meaning. Data A and context C are known, and data B can be anything. It could be the number 3, or an image of the Virgin Mary. All you need is context D, and suddenly data B has the same meaning as data A. It doesn't matter at all what data A, data B, and context C are. Such a context D always exists, because D can simply be defined as "when given B, produce whatever A produces in C."
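Here is a minimal Python sketch of building such a context D. The names (`make_equivalent_context`, `to_text`) and the choice of "decode as UTF-8 text" for context C are assumptions made for the illustration; a context is modelled simply as a function from bytes to a meaning.

```python
from typing import Callable

# A "context" is modelled as a function that turns data into a meaning.
Context = Callable[[bytes], object]


def make_equivalent_context(a: bytes, c: Context, b: bytes) -> Context:
    """Build a context D in which data B means what data A means in context C."""
    meaning = c(a)  # what A produces in its original context C

    def d(data: bytes) -> object:
        if data == b:
            return meaning  # B now carries A's meaning
        return c(data)  # everything else is read the way C would read it

    return d


def to_text(data: bytes) -> str:
    """Context C for this example: read bytes as UTF-8 text."""
    return data.decode("utf-8")


a = "Bar, by Foo".encode("utf-8")  # data A
b = (3).to_bytes(1, "big")         # data B: just the number 3
d = make_equivalent_context(a, to_text, b)
print(d(b))  # prints: Bar, by Foo
```

Nothing about A, B, or C constrains the construction, which is the whole point: the same few lines work for any choice of the three.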
If that didn't make much sense, let me explain some of the implications. It means that, in order to fully copyright a piece of data that represents an idea, you would have to copyright every possible piece of data. You would have to own the copyright on every number there is. And good luck getting a copyright on the number 3.
Another example. Let's say you create a context, dict. You assign a number to each word in the dictionary. And everyone has the same dictionary. The word "lettuce" is represented by the number 58, and the word "tomato" is 59, for example. Then you encode a book in the dict context. It reads as a string of meaningless numbers. Until you interpret it in the dict context. Why would you want to copyright a string of meaningless numbers?
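A toy version of the dict context takes only a few lines of Python. Only the numbers 58 and 59 come from the example above; the other dictionary entries, and the `encode`/`decode` helper names, are made up for illustration.

```python
# A tiny hypothetical dictionary that everyone shares.
WORDS = {"i": 1, "like": 12, "and": 7, "lettuce": 58, "tomato": 59}
NUMBERS = {n: w for w, n in WORDS.items()}  # the reverse lookup


def encode(text: str) -> list[int]:
    """Turn a text into a string of (meaningless-looking) numbers."""
    return [WORDS[word] for word in text.lower().split()]


def decode(numbers: list[int]) -> str:
    """Interpret the numbers in the dict context to recover the text."""
    return " ".join(NUMBERS[n] for n in numbers)


book = "I like lettuce and tomato"
encoded = encode(book)
print(encoded)          # [1, 12, 58, 7, 59]
print(decode(encoded))  # i like lettuce and tomato
```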
Books are already distributed this way.
The word "the" is actually represented by the number 0x746865 on most computers: its three ASCII character codes, 0x74, 0x68 and 0x65, written side by side. The word "cat" is 0x636174. The context, really, is arbitrary, and it doesn't matter much which one you pick. It only needs to fit a particular piece of data to generate meaning. And the data can be anything. Literally anything. Every number has an infinite number of meanings, because there are infinitely many contexts to interpret it with.
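This is easy to check in Python: read the ASCII bytes of a word as one big-endian number, and you get exactly these values. The same number read in a different context (here, as a plain decimal quantity) means something else entirely.

```python
# In the ASCII context, the bytes of a word read as one big-endian number.
the = int.from_bytes("the".encode("ascii"), "big")
cat = int.from_bytes("cat".encode("ascii"), "big")
print(hex(the), hex(cat))  # 0x746865 0x636174

# The same number in another context: a plain decimal quantity...
print(the)  # 7628901

# ...or the word again, if you choose the ASCII context.
print(the.to_bytes(3, "big").decode("ascii"))  # the
```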
It is trivial to take any data and any context, and convert them to a new piece of data and a new context to produce exactly the same meaning. You can do this as many times as you like.
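One way to do this, sketched in Python under some assumptions: the transformation is XOR with a random key (an arbitrary choice; almost any reversible transformation would do), and a context is again just a function from bytes to a meaning. Each round produces new data plus a new context that undoes the XOR and then hands off to the previous context. The helper names `reencode` and `as_text` are hypothetical.

```python
import os
from typing import Callable

# A "context" is modelled as a function that turns data into a meaning.
Context = Callable[[bytes], object]


def reencode(data: bytes, context: Context) -> tuple[bytes, Context]:
    """Return new data and a new context with exactly the same meaning.

    XOR with a random key is an assumed, arbitrary choice of transformation.
    """
    key = os.urandom(len(data))
    new_data = bytes(x ^ k for x, k in zip(data, key))

    def new_context(d: bytes) -> object:
        # Undo the XOR, then read the result in the old context.
        return context(bytes(x ^ k for x, k in zip(d, key)))

    return new_data, new_context


def as_text(d: bytes) -> str:
    """The starting context: read bytes as UTF-8 text."""
    return d.decode("utf-8")


data, context = "Bar, by Foo".encode("utf-8"), as_text
for _ in range(5):  # repeat as many times as you like
    data, context = reencode(data, context)
print(context(data))  # prints: Bar, by Foo
```

After five rounds the data looks like random noise, yet the pair of data and context still means exactly what the original text meant.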
To make things worse, it is also trivial to combine copyrighted data X and public-domain data P into a single data set Y that produces the copyrighted meaning in one context and the public-domain meaning in another. Doing so opens a whole new can of worms for the idea of copyrighting data.
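A minimal Python sketch of such a combination, assuming a made-up layout: an 8-byte header holding both lengths, followed by the two inputs interleaved byte by byte (the shorter one padded with zero bytes). Neither input sits in the combined data as a contiguous block, yet each context pulls out exactly one of them. The function names and the layout are illustrative assumptions, not any real file format.

```python
def combine(copyrighted: bytes, public: bytes) -> bytes:
    """Interleave two inputs behind a header that records both lengths."""
    size = max(len(copyrighted), len(public))
    x = copyrighted.ljust(size, b"\0")  # pad so the two can be zipped
    p = public.ljust(size, b"\0")
    header = len(copyrighted).to_bytes(4, "big") + len(public).to_bytes(4, "big")
    body = bytes(byte for pair in zip(x, p) for byte in pair)
    return header + body


def copyrighted_context(d: bytes) -> bytes:
    """Read the even-position body bytes: the copyrighted meaning."""
    n = int.from_bytes(d[0:4], "big")
    return d[8::2][:n]


def public_context(d: bytes) -> bytes:
    """Read the odd-position body bytes: the public-domain meaning."""
    n = int.from_bytes(d[4:8], "big")
    return d[9::2][:n]


song = b"Bar, by Foo (copyrighted)"
poem = b"A public-domain poem"
mixed = combine(song, poem)
print(copyrighted_context(mixed))  # b'Bar, by Foo (copyrighted)'
print(public_context(mixed))       # b'A public-domain poem'
```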
Hopefully I have convinced you by now that it is meaningless to copyright data. So perhaps a copyright should cover both the data and the context. But there is another problem: for any piece of data, there are infinitely many contexts that produce exactly the same meaning.
How many different CD players are there in the world? Each CD player is a different context, but they all make basically the same sounds. Do you think Band "Foo" can copyright their CD, and also patent every CD player? Even if they could and did, someone would just make a new CD player. Could the band prevent this? It's pretty much impossible to control every context.
An extra problem is that each context can handle an infinite amount of data to produce an infinite number of meanings. It makes no sense to control a context for the purpose of protecting a finite number of meanings.
The point here is that data has no meaning by itself. Each piece of data can be related to each possible meaning in infinitely many ways, so the data can mean literally anything. That makes copyrighting data meaningless, and copyrighting the data together with every context that gives it a particular meaning impossible.
I think copyright and patent law need to be revoked or revised to deal with these problems, because our current system is neither adequate nor fair.
Some side notes:
Some recent laws deal with parts of this. The DMCA, for example, makes it illegal to circumvent copy protection, which in practice makes it illegal to create certain new contexts in which to interpret data. That seems incredibly unfair. It is illegal, for instance, to make your own DVD player that unlocks a disc's encryption without a license. To me, it appears that the people with power over the situation have almost no understanding of the problem they are trying to solve. I don't even see it as a problem, but that's a matter of opinion.
The new laws are not designed to ensure that people pay artists for their work. The laws are designed to give large corporations more control over what individuals do in private. That is a step in the wrong direction.