|
I'm having trouble interpreting the intended semantics of filter scoping in SPARQL. In the 2009 conformance test suite, they provide the following query:
And the following dataset (I'm trimming the header and datatype declarations for brevity):
The expected result has 12 binding sets, one for each combination of ?v in {1,2,3,4} and ?w in {1,2,3}, with no bindings for ?v2.
Scope matters here in two ways: which block the FILTER restricts, and how widely the variables are visible. The SPARQL Query Language for RDF specification says (5.2.2) that a "constraint, expressed by the keyword FILTER, is a restriction on solutions over the whole group in which the filter appears". In the query above, I interpret the "group in which the filter appears" as everything inside the OPTIONAL, and no farther. (Is this correct?)
But how far can the variable reference (the "?v" in "?v=1") reach? The spec says (4.1.3) that variables have "global scope; use of a given variable name anywhere in a query identifies the same variable". So that filter ought to be able to "see" the outermost triple (:x :p ?v), right? If so, then why isn't the expected result for the above query this:
? The only alternative explanations I can come up with for the expected result involve limiting the visibility of that FILTER to a degree that restricts its use in several plausible real-world scenarios. Could this be a case where the test suite is wrong? Or is there something crucial I'm missing about the semantics here? |
|
The problem you are having in understanding the scoping is quite common among people new to SPARQL and stems from the fact that the SPARQL specification defines bottom-up semantics. The test case is entirely correct and is there to ensure that SPARQL engines respect the scoping and evaluation rules correctly.
What this means in practice is that SPARQL queries are not procedural; rather, they are declarative. So in your example the
The best way to understand the scoping of things in SPARQL (especially if something looks counter-intuitive) is to convert the query into its algebra form. For your example, the following algebra is given:
NB: I obtained this using the Query Validator at sparql.org and selecting the algebra option.
As you can see, this tallies with my explanation above: the filter expression applies within the
The query in question can easily be rewritten to give something nearer to the behavior you expect by moving the filter to the outermost graph pattern:
If we compare the algebra for this we can see the difference:
With this query the
However, this won't actually give you the expected answer from your question, because now we're eliminating all solutions where
Edit: On the topic of global scoping of variables. A mention of some variable
Note that this only applies to SPARQL 1.0; the almost-finished SPARQL 1.1 specification introduces sub-queries, so it is possible for you to use
Hard to imagine a better answer than this; thanks Rob. While I'm new to SPARQL, I'm fortunately not new to declarative languages, so that much went well. (I'm working on a translator to a KIF variant, so the algebra conversion is quite straightforward, aside from knowing the details of SPARQL semantics. The translator output wasn't agreeing with the test, but it was agreeing with my reading of the spec.)
Putting it together, then: if for some reason I did want the 21-row result in the OP, I'd add the FILTER you describe, anywhere outside the OPTIONAL in this case, it looks like. Right? It would have to be outside the |
|
Just to emphasise a point Rob makes, there is a slightly special case here:
In this one case, the filter becomes part of the leftjoin condition, not just a FILTER on the { :x :p ?v2 }, so it can refer to, for example, the ?w defined in the first part of the optional in your example. (Otherwise, the query writer would have to repeat the pattern from the fixed side of the optional inside the optional part.) |
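To make the bottom-up evaluation and the leftjoin condition concrete, here is a rough Python sketch rather than a real SPARQL engine. It models solutions as dicts and hard-codes the value sets mentioned in the question ({1,2,3,4} for ?v and ?v2, {1,2,3} for ?w). The helper names (`compatible`, `join`, `left_join`, `v_equals_1`) are made up for illustration, and treating the filter as false when ?v is unbound is a simplification of SPARQL's error handling.

```python
# Toy model of SPARQL algebra evaluation; solutions are dicts of bindings.
# All helper names here are hypothetical, not part of any SPARQL library.

P_VALUES = [1, 2, 3, 4]   # values matched by ":x :p ?v" and ":x :p ?v2"
W_VALUES = [1, 2, 3]      # values bound to ?w (assumed pattern on a second predicate)

def compatible(a, b):
    """Two solutions are compatible if they agree on every shared variable."""
    return all(b[k] == v for k, v in a.items() if k in b)

def join(left, right):
    """Join: merge every compatible pair of solutions."""
    return [{**a, **b} for a in left for b in right if compatible(a, b)]

def left_join(left, right, cond):
    """LeftJoin: extend a left solution where a compatible right solution
    satisfies cond; otherwise keep the left solution unextended."""
    out = []
    for a in left:
        extended = [{**a, **b} for b in right
                    if compatible(a, b) and cond({**a, **b})]
        out.extend(extended if extended else [a])
    return out

def v_equals_1(mu):
    # FILTER(?v = 1); an unbound ?v is treated as "filter fails" (simplification).
    return mu.get("v") == 1

bgp_v = [{"v": x} for x in P_VALUES]
bgp_w = [{"w": x} for x in W_VALUES]
bgp_v2 = [{"v2": x} for x in P_VALUES]

# Nested group: the inner leftjoin is evaluated first, where ?v is not yet bound,
# so the condition never holds and ?v2 is never bound.
nested = join(bgp_v, left_join(bgp_w, bgp_v2, v_equals_1))

# Flattened form: ?v is on the left side of the leftjoin, so the condition can see it.
flat = left_join(join(bgp_v, bgp_w), bgp_v2, v_equals_1)

print(len(nested))  # 12
print(len(flat))    # 21
```

Under these assumptions, the nested form yields the 12 rows of the test suite's expected result, while the flattened form yields the 21 rows discussed above, because only in the flattened form is ?v on the left-hand side of the leftjoin whose condition mentions it.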

