Tuesday, February 06, 2007

Five things to consider when writing a Mulgara Resolver

Ended up writing a longer response than I had planned to a query about writing a resolver in Mulgara today. I'm putting it here to keep a handle to it as it does cover the basic structure of the resolve() method in reasonable detail.

First it is important to realise that resolvers *don't* return triples - they return Resolutions. These are Tuples that provide bindings for the variables in the Constraint passed to resolve(). So in the case of <http://www.example.com/path/subpath> $predicate $object the resulting Resolution should have two variables ($predicate $object). In the case of <../subpath> <http://www.schema.com#parent> $subpath it will have one ($subpath).
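As a rough illustration of which variables a Resolution has to bind, here is a minimal Java sketch. It is not Mulgara code: the class `ConstraintVariables`, the method `variablesOf` and the representation of a constraint as three strings are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Mulgara. A constraint is modelled as three
// strings, and any element starting with '$' is treated as a variable.
final class ConstraintVariables {
    private ConstraintVariables() {}

    // Returns the distinct variables of the constraint, in order of appearance.
    // These are the columns the resulting Resolution would have to bind.
    static List<String> variablesOf(String subject, String predicate, String object) {
        List<String> variables = new ArrayList<>();
        for (String element : new String[] {subject, predicate, object}) {
            if (element.startsWith("$") && !variables.contains(element)) {
                variables.add(element);
            }
        }
        return variables;
    }
}
```

With this sketch, `variablesOf("<http://www.example.com/path/subpath>", "$predicate", "$object")` gives the two variables `$predicate` and `$object`, while `variablesOf("<../subpath>", "<http://www.schema.com#parent>", "$subpath")` gives only `$subpath`.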

You should also be aware that a Resolution can be unevaluated! It is not uncommon for the bindings required to evaluate the constraint to come from other parts of the query. Consider the following where clause:

$url $p $o in <rmi://localhost/server1#sample> 
and 
<myfile> <hasurl> $url
In this case your resolver will be asked to resolve ($url $p $o) and should return a Resolution that will later be passed the value of $url in the prefix argument to beforeFirst(). Evaluation would then occur either in beforeFirst() or in the calls to next(). We prefer it to happen in beforeFirst() if the memory requirement isn't unreasonable, because our algorithmic reasoning assumes a comparatively cheap next().

If you require that a particular variable be bound prior to final evaluation then you need to provide a MandatoryBindingAnnotation. This indicates to the join logic that it must ensure a specific binding (in this case $url) is satisfied by other constraints in the query before you are evaluated.

It is also worth noting that due to the support of intervals and the resulting interaction with query transformations, the XSDResolver is quite complicated as resolvers go. Without that complication, a call to resolve() consists of:

  1. Obtain the model (constraint.getModel()).
  2. Do any preparatory work, especially any work that might be able to prove the result Empty (or a singleton).
  3. If you can't prove the result empty (or singleton), defer further evaluation to the returned Resolution.
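The three steps above can be sketched roughly as follows. This is a toy in plain Java, not the Mulgara SPI: `SketchResolver`, `LazyRows`, the string-keyed model map and the prefix filter are all hypothetical stand-ins chosen to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical toy resolver; none of these names come from Mulgara.
final class SketchResolver {
    // Stand-in for a Resolution: the rows are only computed on first use.
    static final class LazyRows {
        private final Supplier<List<String>> evaluation;
        private List<String> rows; // null until evaluated

        LazyRows(Supplier<List<String>> evaluation) {
            this.evaluation = evaluation;
        }

        boolean isEvaluated() {
            return rows != null;
        }

        List<String> rows() {
            if (rows == null) {
                rows = evaluation.get();
            }
            return rows;
        }
    }

    // Shared result for constraints that can be proven empty up front.
    static final LazyRows EMPTY = new LazyRows(() -> Collections.<String>emptyList());

    private final Map<String, List<String>> models = new HashMap<>();

    void addModel(String name, List<String> values) {
        models.put(name, values);
    }

    LazyRows resolve(String modelName, String requiredPrefix) {
        // Step 1: obtain the model named by the constraint.
        List<String> contents = models.get(modelName);
        // Step 2: cheap preparatory work that may prove the result empty.
        if (contents == null || contents.isEmpty()) {
            return EMPTY;
        }
        // Step 3: otherwise defer the real work to the returned object.
        return new LazyRows(() -> {
            List<String> matches = new ArrayList<>();
            for (String value : contents) {
                if (value.startsWith(requiredPrefix)) {
                    matches.add(value);
                }
            }
            return matches;
        });
    }
}
```

Nothing is filtered at resolve() time in this sketch; the scan only runs when rows() is first called, which is the shape the steps above describe.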
Then inside the Resolution you need to consider how you implement the following annotations, interfaces and methods:
MandatoryBindingAnnotation
are there any variables that *must* be bound for the deferred evaluation to terminate?
DefinablePrefixAnnotation
can you cheaply reorder the variables in the result (log n or less)?
ReresolvableResolution
can you cheaply reresolve the constraint if additional information becomes available (again log n or less)? [note: this will become an Annotation like the other two in the Mulgara 1.2 dev-cycle]
beforeFirst()
you can ignore the suffixTruncation arg, but you can't ignore the prefix - these *are* the values of the first N variables of the resolution. If all the variables are passed as a prefix your only decision is 1 row or 0, but most of the time you will be passed fewer than this.
At this point you have either performed the evaluation, or you have set up the evaluation and deferred the rest to be done incrementally on each call to next().
next()
does whatever is required to ensure that calls to getColumnValue() can return the values for the current row.
There is only one Tuple class that defers evaluation beyond this point (the implementation of count()). Naturally we don't want to go to the effort of evaluating an entire subquery until the user actually goes to use it - so we defer evaluation of the count() until the call to getColumnValue().
getColumnValue()
normally this is a matter of returning values calculated in either beforeFirst() or next() - occasionally this amounts to performing the evaluation itself (as with count() above), but this is uncommon.
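To make the beforeFirst()/next()/getColumnValue() protocol concrete, here is a small in-memory sketch. It borrows only the three method names from the text; the class `PrefixCursor`, its constructor and the use of a `long[][]` as the data source are assumptions, and the suffixTruncation argument is left out entirely. This version does its selection in beforeFirst(), so next() stays cheap.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical cursor over in-memory rows; not a Mulgara class.
final class PrefixCursor {
    private final long[][] rows;   // candidate rows, one long per variable
    private List<long[]> matching; // rows selected by the last beforeFirst()
    private int position;          // index of the current row, -1 before the first

    PrefixCursor(long[][] rows) {
        this.rows = rows;
    }

    // Selects the rows whose first prefix.length columns equal the prefix.
    // Assumes the prefix is no longer than a row.
    void beforeFirst(long[] prefix) {
        matching = new ArrayList<>();
        for (long[] row : rows) {
            boolean matches = true;
            for (int i = 0; i < prefix.length; i++) {
                if (row[i] != prefix[i]) {
                    matches = false;
                    break;
                }
            }
            if (matches) {
                matching.add(row);
            }
        }
        position = -1;
    }

    // Advances to the next selected row; returns false once exhausted.
    boolean next() {
        if (matching == null) {
            throw new IllegalStateException("beforeFirst() has not been called");
        }
        if (position < matching.size()) {
            position++;
        }
        return position < matching.size();
    }

    // Returns a value from the current row, as positioned by next().
    long getColumnValue(int column) {
        if (matching == null || position < 0 || position >= matching.size()) {
            throw new IllegalStateException("no current row");
        }
        return matching.get(position)[column];
    }
}
```

Given the distinct rows {1, 10}, {1, 11} and {2, 20}, a prefix of {1} selects two rows, while a full-width prefix such as {2, 20} or {2, 21} can only select one row or none.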

The whole point of the Resolution/Tuples/beforeFirst/next bother is to implement lazy evaluation in Java. We only scale to bignum-levels when all query evaluation is done on a call-by-need basis.
