Wednesday, February 14, 2007

The Effect of File Sharing on Record Sales: An Empirical Analysis

Felix Oberholzer-Gee Harvard University and Koleman Strumpf University of Kansas

Abstract

For industries ranging from software to pharmaceuticals and entertainment, there is an intense debate about the appropriate level of protection for intellectual property. The Internet provides a natural crucible to assess the implications of reduced protection because it drastically lowers the cost of copying information. In this paper, we analyze whether file sharing has reduced the legal sales of music. While this question is receiving considerable attention in academia, industry, and Congress, we are the first to study the phenomenon employing data on actual downloads of music files. We match an extensive sample of downloads to U.S. sales data for a large number of albums. To establish causality, we instrument for downloads using data on international school holidays. Downloads have an effect on sales that is statistically indistinguishable from zero. Our estimates are inconsistent with claims that file sharing is the primary reason for the decline in music sales during our study period.

In the Journal of Political Economy.

Available in Volume 115, Number 1, February 2007

From Ars Technica via Lawrence Lessig

Tuesday, February 13, 2007

Java generics and the covariance and contravariance of arguments

Well, given we require 1.5 now for other reasons, and 1.5 complains if you don't constrain generic classes, I have finally bitten the bullet and started using generics. Unfortunately I was promptly bitten by what I suspect is going to be a very common mistake - failing to properly consider the type equivalence of parametrised method calls.

Consider the following code:

public interface TestInterface { }

public class TestClass implements TestInterface { }

import java.util.ArrayList;
import java.util.List;

public class Test {
 private List<TestClass> list;

 public TestInterface test() {
   list = new ArrayList<TestClass>();
   list.add(new TestClass());

   return covariant(list);
 }

 public TestInterface covariant(List<TestInterface> ilist) {
   return ilist.remove(0);
 }
}
Now there is absolutely no reason why this should not work. It is trivially inferable that the above code only ever takes elements out of ilist - in other words it treats the list as covariant in its element type - and that therefore this code is statically correct.

Of course Java's typing has never been particularly smart. List<T1>.add(T1) is contravariant in T1, and T2 List<T2>.get(int) is covariant in T2; so the Java compiler is correct to infer that in the general case List<T1> and List<T2> are substitutable iff T1 == T2.

If we can't declare an argument to be covariant in its type parameter we have a serious problem - it means that any non-trivial algorithm involving collections is going to run afoul of this. You might consider trying to cast your way around it:

 public TestInterface test() {
   list = new ArrayList<TestClass>();
   list.add(new TestClass());

   return covariant((List<TestInterface>)list);
 }
but not surprisingly that didn't work.
Test.java:11: inconvertible types
found   : java.util.List<TestClass>
required: java.util.List<TestInterface>
 return covariant((List<TestInterface>)list);
                                        ^
1 error
If you continue to hack at it you might try a double cast via a non-generic List.
 public TestInterface test() {
   list = new ArrayList<TestClass>();
   list.add(new TestClass());

   return covariant((List<TestInterface>)((List)list));
 }
This works but leaves us with the unchecked/unsafe operation warning:
Note: Test.java uses unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.
Now this is a perfectly reasonable warning - it is unchecked; it is unsafe; and more importantly it does violate encapsulation. The problem here is that the caller should not be defining the type invariants of the callee - that's the job of the method signature!
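To make the unsafety concrete, here is a small sketch of the failure mode the warning exists to prevent (OtherClass is a hypothetical second implementation of TestInterface, introduced purely for this illustration):

public class OtherClass implements TestInterface { }

 public void unsound() {
   List<TestClass> concrete = new ArrayList<TestClass>();

   // Launder the list through a raw type, as in the double cast above.
   List<TestInterface> general = (List<TestInterface>)((List)concrete);

   // Nothing now stops an unrelated implementation sneaking in...
   general.add(new OtherClass());

   // ...and the next typed read fails with a ClassCastException.
   TestClass boom = concrete.get(0);
 }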

The correct solution is to allow us to declare covariant() to be covariant in its argument; and fortunately Java does support this.

To declare an argument to be covariant in its type parameter you can use the extends keyword:

 public TestInterface covariant(List<? extends TestInterface> ilist) {
   return ilist.remove(0);
 }
To declare an argument to be contravariant in its type parameter you use the super keyword:
 public void contravariant(List<? super TestClass> clist, TestClass c) {
   clist.add(c);
 }
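With these signatures in place the original code compiles cleanly, and the contravariant version accepts broader lists too; a quick sketch of the call sites:

 public TestInterface test() {
   list = new ArrayList<TestClass>();
   list.add(new TestClass());

   // A List<TestClass> is accepted where List<? extends TestInterface>
   // is expected - no cast, no unchecked warning.
   TestInterface first = covariant(list);

   // And a List<Object> can receive TestClass elements through
   // List<? super TestClass>.
   List<Object> anything = new ArrayList<Object>();
   contravariant(anything, new TestClass());

   return first;
 }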
Without these two facilities generics would be badly broken, so I am glad Sun had the presence of mind to include them. By the way, if you are using Java 1.5 I strongly recommend you read the Java Generics Tutorial.

As an aside it is worth noting that, since Java includes a Top type 'Object', List<? extends Object> is a common covariant type - sufficiently common that Sun has included a syntactic sugar for it, List<?>. Personally I'm not sure this was such a good idea; List<? extends Object> would work anyway, and I think I would prefer to have kept the covariance explicit.
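For example, the sugared form drops straight in wherever the explicit one would be used (countNonNull is just an illustrative name):

 // List<?> reads exactly like List<? extends Object>: elements come
 // out as Object, and nothing but null can be added.
 public int countNonNull(List<?> anyList) {
   int count = 0;
   for (Object element : anyList) {
     if (element != null) {
       count++;
     }
   }
   return count;
 }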

Update: Corrected capitalisation error in the initial Java example.

Tuesday, February 06, 2007

Five things to consider when writing a Mulgara Resolver

I ended up writing a longer response than I had planned to a question about writing a resolver in Mulgara today. I'm putting it here to keep a handle on it, as it covers the basic structure of the resolve() method in reasonable detail.

First it is important to realise that resolvers *don't* return triples - they return Resolutions. These are Tuples that provide bindings for the variables in the Constraint passed to resolve(). So in the case of <http://www.example.com/path/subpath> $predicate $object the resulting Resolution should have two variables ($predicate $object). In the case of <../subpath> <http://www.schema.com#parent> $subpath it will have one ($subpath).

You should also be aware that a Resolution can be unevaluated! It is not uncommon for bindings required to evaluate the constraint to come from other parts of the query. Consider the following where clause:

$url $p $o in <rmi://localhost/server1#sample> 
and 
<myfile> <hasurl> $url
In this case your resolver will be asked to resolve ($url $p $o) and should return a Resolution that will later be passed the value of $url in the prefix argument to beforeFirst(). Evaluation would then occur either in beforeFirst() or in the calls to next() - we prefer it to happen in beforeFirst() if the memory requirement isn't unreasonable, as our algorithmic reasoning assumes a comparatively cheap next().

If you require that a particular variable be bound prior to final evaluation then you need to provide a MandatoryBindingAnnotation - this indicates to the join logic that it must ensure a specific binding is satisfied by other constraints in the query before you are evaluated (in this case $url).

It is also worth noting that, due to its support for intervals and the resulting interaction with query transformations, the XSDResolver is quite complicated as resolvers go. Without that, a call to resolve() consists of the following (sketched in code after the list):

  1. Obtain the model (constraint.getModel()).
  2. Do any preparatory work, especially any work that might prove the result empty (or a singleton).
  3. If you can't prove the result empty (or a singleton), defer further evaluation to the returned Resolution.
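In code, those three steps look roughly like this. This is a sketch only: the types and exception are approximated, and couldMatch(), EmptyResolution, and DeferredResolution are placeholders standing in for whatever your resolver actually does, not the real Mulgara API.

  public Resolution resolve(Constraint constraint) throws QueryException {
    // 1. Obtain the model the constraint is scoped to.
    ConstraintElement model = constraint.getModel();

    // 2. Preparatory work - anything cheap that might prove the result
    //    empty (or a singleton) without touching the data.
    if (!couldMatch(model, constraint)) {
      return new EmptyResolution(constraint);     // placeholder class
    }

    // 3. Otherwise defer the real work to the returned Resolution,
    //    which is driven later via beforeFirst()/next().
    return new DeferredResolution(constraint);    // placeholder class
  }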
Then inside the Resolution you need to consider how you handle the following key methods and annotations (a rough sketch of the deferred evaluation follows the list):
MandatoryBindingAnnotation
Are there any variables that *must* be bound for the deferred evaluation to terminate?
DefinablePrefixAnnotation
Can you cheaply reorder the variables in the result (log n or less)?
ReresolvableResolution
Can you cheaply reresolve the constraint if additional information becomes available (again, log n or less)? [Note: this will become an Annotation like the other two in the Mulgara 1.2 dev-cycle.]
beforeFirst()
You can ignore the suffixTruncation argument, but you can't ignore the prefix - these *are* the values of the first N variables of the resolution. If all the variables are passed as a prefix your only decision is 1 row or 0, but most of the time you will be passed fewer than this.
At this point you have either performed the evaluation, or you have set up the evaluation and deferred the rest to be done incrementally on each call to next().
next()
Does whatever is required to ensure that subsequent calls to getColumnValue() can return the values for the current row.
There is only one Tuples implementation that defers evaluation beyond this point (the implementation of count()). Naturally we don't want to go to the effort of evaluating an entire subquery until the user actually uses it - so we defer evaluation of the count() until the call to getColumnValue().
getColumnValue()
Normally this is just a matter of returning values calculated in either beforeFirst() or next(); occasionally it amounts to performing the evaluation itself, but this is uncommon.
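Putting the last three together, a deferred Resolution ends up shaped something like the sketch below. Again this is illustrative only: the signatures approximate the Tuples-style interface described above, and DeferredResolution/evaluate() are placeholders rather than Mulgara code.

import java.util.ArrayList;
import java.util.List;

public class DeferredResolution {   // would implement Resolution
  private final Constraint constraint;
  private List<long[]> rows;        // materialised in beforeFirst()
  private int current = -1;

  public DeferredResolution(Constraint constraint) {
    this.constraint = constraint;
  }

  // prefix carries the already-bound values of the first N variables.
  public void beforeFirst(long[] prefix, int suffixTruncation) {
    // Evaluate here if the memory requirement is reasonable...
    rows = evaluate(constraint, prefix);
    current = -1;
  }

  public boolean next() {
    // ...or do the work incrementally here, keeping next() cheap.
    return ++current < rows.size();
  }

  public long getColumnValue(int column) {
    // Normally just hands back what beforeFirst()/next() computed.
    return rows.get(current)[column];
  }

  private List<long[]> evaluate(Constraint c, long[] prefix) {
    // Placeholder for the resolver-specific evaluation.
    return new ArrayList<long[]>();
  }
}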

The whole point of the Resolution/Tuples/beforeFirst/next bother is to implement lazy evaluation in Java. We only scale to bignum levels when all query evaluation is done on a call-by-need basis.