I'm having trouble interpreting the intended semantics of filter scoping in SPARQL.

In the 2009 conformance test suite, they provide the following query:

  SELECT *
  { 
      :x :p ?v . 
      { :x :q ?w 
        OPTIONAL {  :x :p ?v2 FILTER(?v = 1) }
      }
  }

And the following dataset (I'm trimming the header and datatype declarations for brevity):

  :x :p 1
  :x :p 2
  :x :p 3
  :x :p 4

  :x :q 1
  :x :q 2
  :x :q 3

The expected result has 12 binding sets, one for each combination of ?v in {1,2,3,4} and ?w in {1,2,3}, with no bindings for ?v2.

There are two aspects in which scope is important: which block the FILTER will restrict, and how broadly the variables are visible.

The SPARQL Query Language for RDF specification says (5.2.2) that a "constraint, expressed by the keyword FILTER, is a restriction on solutions over the whole group in which the filter appears". In the query above, I interpret the "group in which the filter appears" as everything inside the OPTIONAL, and no farther. (Is this correct?) But how far can the variable reference (the "?v" in "?v = 1") reach? The spec says (4.1.3) that variables have "global scope; use of a given variable name anywhere in a query identifies the same variable". So that filter ought to be able to "see" the outermost triple (:x :p ?v), right? If so, then why isn't the expected result for the above query this:

  v=1 w=1 v2=1   // v=1, so bind v2!
  v=1 w=1 v2=2
  v=1 w=1 v2=3
  v=1 w=1 v2=4
  v=1 w=2 v2=1
  v=1 w=2 v2=2
  v=1 w=2 v2=3
  v=1 w=2 v2=4
  v=1 w=3 v2=1
  v=1 w=3 v2=2
  v=1 w=3 v2=3
  v=1 w=3 v2=4
  v=2 w=1        // v!=1, so don't bind v2, and just keep v and w!
  v=2 w=2
  v=2 w=3
  v=3 w=1
  v=3 w=2
  v=3 w=3
  v=4 w=1
  v=4 w=2
  v=4 w=3

?

The only alternative explanations I can come up with for the expected result involve limiting the visibility of that FILTER to a degree that would rule out several potential real-world uses. Could this be a case where the test suite is wrong? Or is there something crucial I'm missing about the semantics here?

asked 07 Nov '12, 18:34

Paul Brinkley


The problem you are having in understanding the scoping is quite common among people new to SPARQL, and it stems from the fact that the SPARQL specification defines bottom-up semantics. The test case is entirely correct and is there to ensure that SPARQL engines respect the scoping and evaluation rules.

What this means in practice is that SPARQL queries are declarative rather than procedural, and inner groups are evaluated before the groups that enclose them. In your example the OPTIONAL sits in the innermost group and is therefore evaluated first. The variable ?v is not in scope in the patterns that make up the Left Join (the algebra equivalent of OPTIONAL), so the FILTER raises an error, which counts as false, for every candidate solution, and no values for ?v2 are preserved.

The best way to understand the scoping of things in SPARQL (esp. if something looks counter-intuitive) is to convert it into the algebra form. So for your example the following algebra is given:

(join
  (bgp (triple :x :p ?v))
  (leftjoin
    (bgp (triple :x :q ?w))
    (bgp (triple :x :p ?v2))
    (= ?v 1)))

NB - I obtained this using the Query Validator at sparql.org and selecting the algebra option.

As you can see, this tallies with my explanation above: the filter expression applies within the leftjoin, and ?v is not bound anywhere inside the leftjoin, so the expression can never evaluate to true and all values for ?v2 (from the RHS of the leftjoin) are discarded.
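To make the bottom-up evaluation concrete, here is a minimal sketch of a toy evaluator in Python. It is not a real SPARQL engine: the helper names (`compatible`, `join`, `leftjoin`) and the plain-integer encoding of the dataset are assumptions made for illustration, and only the handful of operators used by this algebra are modelled.

```python
# Toy bottom-up evaluation of:
#   (join (bgp :x :p ?v) (leftjoin (bgp :x :q ?w) (bgp :x :p ?v2) (= ?v 1)))
# Solutions are dicts mapping variable names to values.
P = [1, 2, 3, 4]  # objects of :x :p in the test dataset
Q = [1, 2, 3]     # objects of :x :q in the test dataset

def compatible(a, b):
    """Two solutions are compatible if they agree on every shared variable."""
    return all(a[k] == b[k] for k in a.keys() & b.keys())

def join(left, right):
    return [{**a, **b} for a in left for b in right if compatible(a, b)]

def leftjoin(left, right, cond):
    out = []
    for a in left:
        # Keep extensions that are compatible and satisfy the condition.
        ext = [{**a, **b} for b in right
               if compatible(a, b) and cond({**a, **b})]
        # If nothing qualifies, the left solution survives unextended.
        out.extend(ext if ext else [a])
    return out

def v_equals_1(mu):
    # An unbound ?v makes the comparison an error, which counts as not true.
    return "v" in mu and mu["v"] == 1

bgp_v = [{"v": o} for o in P]
bgp_w = [{"w": o} for o in Q]
bgp_v2 = [{"v2": o} for o in P]

inner = leftjoin(bgp_w, bgp_v2, v_equals_1)  # ?v is unbound at this point
rows = join(bgp_v, inner)

print(len(rows), any("v2" in r for r in rows))  # 12 False
```

The leftjoin is evaluated before the outer join ever supplies a value for ?v, which is why every one of the 12 rows comes out without ?v2.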

The query in question can be easily rewritten to give something nearer to the behavior you expect by moving the filter to the outermost graph pattern:

SELECT *
{ 
    :x :p ?v . 
    { 
      :x :q ?w 
      OPTIONAL {  :x :p ?v2  }
    }
    FILTER(?v = 1)
}

If we compare the algebra for this we can see the difference:

(filter (= ?v 1)
  (join
    (bgp (triple :x :p ?v))
    (leftjoin
      (bgp (triple :x :q ?w))
      (bgp (triple :x :p ?v2)))))

With this query the filter is the outermost operator, so it will be evaluated last and ?v will be in scope.
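The same kind of toy evaluation can be applied to this second algebra. Again this is only a sketch under assumptions: the helpers are hypothetical stand-ins for the algebra operators, not an engine, and the dataset is encoded as plain integers.

```python
# Toy bottom-up evaluation of:
#   (filter (= ?v 1)
#     (join (bgp :x :p ?v) (leftjoin (bgp :x :q ?w) (bgp :x :p ?v2))))
P = [1, 2, 3, 4]  # objects of :x :p
Q = [1, 2, 3]     # objects of :x :q

def compatible(a, b):
    return all(a[k] == b[k] for k in a.keys() & b.keys())

def join(left, right):
    return [{**a, **b} for a in left for b in right if compatible(a, b)]

def leftjoin(left, right, cond=lambda mu: True):
    out = []
    for a in left:
        ext = [{**a, **b} for b in right
               if compatible(a, b) and cond({**a, **b})]
        out.extend(ext if ext else [a])
    return out

bgp_v = [{"v": o} for o in P]
bgp_w = [{"w": o} for o in Q]
bgp_v2 = [{"v2": o} for o in P]

inner = leftjoin(bgp_w, bgp_v2)   # OPTIONAL with no condition: 3 x 4 rows
joined = join(bgp_v, inner)       # every ?v paired with every inner row
rows = [r for r in joined if r.get("v") == 1]  # outermost filter, ?v is bound

print(len(joined), len(rows), all("v2" in r for r in rows))  # 48 12 True
```

Here the filter sees ?v because it runs after the join, but it keeps only the ?v = 1 rows, all of which carry a ?v2 binding.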

However, this won't actually give you the expected answer from your question, because now we're eliminating all solutions where ?v isn't 1. I think you can get your expected answer, though, by moving the filter and changing the condition to the following:

FILTER( !BOUND(?v2) || ?v = 1 )
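As a hedged sanity check, the toy evaluation below (a sketch, not a real engine) tries that condition on this dataset. Under these assumptions the BOUND variant still yields only the 12 rows with ?v = 1, because the unconditioned OPTIONAL always binds ?v2 here. The second variant is not from the thread: it is one hypothetical rewrite that drops the inner braces, so that :x :p ?v sits in the same group as the OPTIONAL and is on the left side of the leftjoin.

```python
# Variant A: outer FILTER( !BOUND(?v2) || ?v = 1 ) over the nested query.
# Variant B (hypothetical rewrite): no inner braces, i.e.
#   :x :p ?v . :x :q ?w OPTIONAL { :x :p ?v2 FILTER(?v = 1) }
P = [1, 2, 3, 4]  # objects of :x :p
Q = [1, 2, 3]     # objects of :x :q

def compatible(a, b):
    return all(a[k] == b[k] for k in a.keys() & b.keys())

def join(left, right):
    return [{**a, **b} for a in left for b in right if compatible(a, b)]

def leftjoin(left, right, cond=lambda mu: True):
    out = []
    for a in left:
        ext = [{**a, **b} for b in right
               if compatible(a, b) and cond({**a, **b})]
        out.extend(ext if ext else [a])
    return out

bgp_v = [{"v": o} for o in P]
bgp_w = [{"w": o} for o in Q]
bgp_v2 = [{"v2": o} for o in P]

# Variant A: ?v2 is always bound after the OPTIONAL, so only ?v = 1 passes.
nested = join(bgp_v, leftjoin(bgp_w, bgp_v2))
bound_rows = [r for r in nested if "v2" not in r or r["v"] == 1]

# Variant B: ?v is bound on the left side of the leftjoin, so the
# condition can be tested per solution.
flat_left = join(bgp_v, bgp_w)  # both triple patterns in one group
flat_rows = leftjoin(flat_left, bgp_v2, lambda mu: mu.get("v") == 1)

print(len(bound_rows), len(flat_rows))  # 12 21
```

In this sketch only variant B reproduces the 21 rows from the question: 12 rows with ?v2 for ?v = 1, plus 9 rows without it.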

Edit

On the topic of global scoping of variables: a mention of some variable ?var always identifies the same variable throughout the query. However, whether that variable is visible at a given point during evaluation depends on the evaluation semantics of SPARQL, as I've explained above.

Note that this only applies to SPARQL 1.0. The almost-finished SPARQL 1.1 specification introduces sub-queries, so it is possible to use ?var and have it be a different variable depending on where it occurs in the query.

answered 07 Nov '12, 19:36

Rob Vesse ♦

edited 08 Nov '12, 05:49

AndyS ♦

Hard to imagine a better answer than this; thanks Rob.

While I'm new to SPARQL, I'm fortunately not new to declarative languages, so that much went well. (I'm working on a translator to a KIF variant, so the algebra conversion is quite straightforward, aside from knowing SPARQL semantics details. The translator output wasn't agreeing with the test, but it was agreeing with my reading of the spec.)

Putting it together, then, if for some reason I did want the 21-row result in the OP, I'd add the FILTER you describe, anywhere outside the OPTIONAL in this case, looks like. Right?

(08 Nov '12, 21:44) Paul Brinkley

It would have to be outside the { } that enclose the block containing the OPTIONAL (as otherwise ?v would still be out of scope), but in principle yes.

(09 Nov '12, 12:04) Rob Vesse ♦

Just to emphasize a point Rob makes, there is a slightly special case here:

OPTIONAL {  :x :p ?v2 FILTER(?v = 1) }

In this one case, the filter becomes part of the leftjoin condition, not just a FILTER on the { :x :p ?v2 }, so it can refer to, for example, ?w defined in the first part of the optional in your example. (Otherwise, the query writer would have to repeat the pattern from the fixed side of the optional inside the optional part.)
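A small sketch of that special case, using a toy leftjoin in Python rather than a real engine (the helper names and the integer encoding of the data are assumptions). It compares a condition on ?w, which is bound by the fixed side of the OPTIONAL, with the original condition on ?v, which is not.

```python
# Inner group only:  { :x :q ?w OPTIONAL { :x :p ?v2 FILTER(<cond>) } }
# The FILTER is the leftjoin condition, tested on the merged solution.
P = [1, 2, 3, 4]  # objects of :x :p
Q = [1, 2, 3]     # objects of :x :q

def compatible(a, b):
    return all(a[k] == b[k] for k in a.keys() & b.keys())

def leftjoin(left, right, cond):
    out = []
    for a in left:
        ext = [{**a, **b} for b in right
               if compatible(a, b) and cond({**a, **b})]
        out.extend(ext if ext else [a])
    return out

bgp_w = [{"w": o} for o in Q]
bgp_v2 = [{"v2": o} for o in P]

# FILTER(?w = 1): ?w comes from the fixed side, so the condition can hold.
on_w = leftjoin(bgp_w, bgp_v2, lambda mu: mu.get("w") == 1)

# FILTER(?v = 1): ?v is bound on neither side, so it never holds.
on_v = leftjoin(bgp_w, bgp_v2, lambda mu: mu.get("v") == 1)

print(len(on_w), len(on_v))  # 6 3
```

With the condition on ?w, the ?w = 1 solution is extended with all four ?v2 values while ?w = 2 and ?w = 3 stay unextended. With the condition on ?v, nothing is ever extended.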

answered 08 Nov '12, 05:48

AndyS ♦
