Hello, I'm playing with SPARQL and regex on dbpedia.org (dbpedia.org/sparql).

I use a query to get all movies a specific actor starred in, e.g.:

SELECT DISTINCT ?filmName WHERE {
    ?film foaf:name ?filmName .
    ?film dbpedia-owl:starring ?actress .
    ?actress foaf:name "Nicole Mary Kidman"@en.
}

This works fine, unless you forget that her middle name is "Mary". That's why I use a regex to get around this problem:

SELECT DISTINCT ?filmName WHERE {
    ?film foaf:name ?filmName .
    ?film dbpedia-owl:starring ?actress .
    ?actress foaf:name ?name.
    FILTER(regex(?name, "Nicole.*Kidman.*", "i"))
}

But this takes a long time to return a result, about 1 minute. Does anyone have an idea how I can speed up such a query?

asked 08 Jan '13, 10:09 by goto

edited 08 Jan '13, 10:11

I suggest you go back and try your queries again... Speeds have increased dramatically, for all.

In same order as above -- 1. http://bit.ly/1iyAew6; 2. http://bit.ly/1e4z0oE.

(21 Nov '13, 14:49) TallTed

You can try using Virtuoso's built-in "bif" functions for text search.

SELECT DISTINCT ?filmName WHERE {
  ?film foaf:name ?filmName .
  ?film dbpedia-owl:starring ?actress .
  ?actress foaf:name ?name.
  ?actress foaf:name ?name2.
  ?name bif:contains "Nicole" .
  ?name2 bif:contains "Kidman" .
}

In practice this is much faster than using filters. Unfortunately, judging by your comment, the Virtuoso query execution estimator disagrees.

The two name variables work around a limitation of the bif functions. More generally, the contains function is normally slightly faster than regex.

SELECT DISTINCT ?filmName WHERE {
  ?film foaf:name ?filmName .
  ?film dbpedia-owl:starring ?actress .
  ?actress foaf:name ?name.
  FILTER(contains(?name, "Nicole"))
  FILTER(contains(?name, "Kidman"))
}
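
One caveat when swapping regex for contains: the regex in the question uses the "i" flag and so ignores case, while contains is case-sensitive. Below is a minimal sketch of a case-insensitive variant, assuming the endpoint supports the SPARQL 1.1 string functions lcase and str. The PREFIX lines are only there to make the query self-contained, and whether the lowercasing costs noticeable time is something you would have to measure.

```sparql
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>

SELECT DISTINCT ?filmName WHERE {
  ?film foaf:name ?filmName .
  ?film dbpedia-owl:starring ?actress .
  ?actress foaf:name ?name .
  # Lowercase the name once per test so the match ignores case,
  # as the "i" flag did in the regex version.
  FILTER(contains(lcase(str(?name)), "nicole"))
  FILTER(contains(lcase(str(?name)), "kidman"))
}
```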

When you have a larger number of actors, you can try the following. The theory is that fewer variables need to be bound and that they can be passed into a single filter call faster. You will need to measure to see which is actually faster.

SELECT DISTINCT ?filmName WHERE {
  ?film foaf:name ?filmName .
  ?film dbpedia-owl:starring ?actress .
  ?actress foaf:name ?name.
  FILTER((contains(?name, "Nicole") && contains(?name, "Kidman")) || (contains(?name, "Tom") && contains(?name, "Cruise")))
}

answered 08 Jan '13, 10:40 by Jerven

edited 09 Jan '13, 03:33

bif:contains looks much faster, but it doesn't scale. With 2 actors instead of just 1 in the same movie, I get:

Virtuoso 42000 Error The estimated execution time 2114574080 (sec) exceeds the limit of 3000 (sec).

(2114574080 sec is about 67 years)

My query:

SELECT DISTINCT ?filmName WHERE {
  ?film foaf:name ?filmName .

  ?film dbpedia-owl:starring ?actress0.
  ?actress0 foaf:name ?name0.
  ?actress0 foaf:name ?name1.
  ?name0 bif:contains "Tom" .
  ?name1 bif:contains "Cruise" .

  ?film dbpedia-owl:starring ?actress1 .
  ?actress1 foaf:name ?name2.
  ?actress1 foaf:name ?name3.
  ?name2 bif:contains "Nicole" .
  ?name3 bif:contains "Kidman" .
}

But contains works a little bit faster, even with more actors. Thank you. :)
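
A two-actor query using contains might look like the sketch below. It mirrors the bif:contains query above but is not necessarily the exact query that was timed; the variable names and PREFIX lines are only illustrative, and no timings are claimed for it.

```sparql
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>

SELECT DISTINCT ?filmName WHERE {
  ?film foaf:name ?filmName .

  # First actor: one name variable is enough, both tests go in one filter.
  ?film dbpedia-owl:starring ?actress0 .
  ?actress0 foaf:name ?name0 .
  FILTER(contains(?name0, "Tom") && contains(?name0, "Cruise"))

  # Second actor in the same film.
  ?film dbpedia-owl:starring ?actress1 .
  ?actress1 foaf:name ?name1 .
  FILTER(contains(?name1, "Nicole") && contains(?name1, "Kidman"))
}
```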

(08 Jan '13, 12:23) goto

I suggest you go back and try your queries again... Speeds have increased dramatically, for all.

In reverse order of the above (within this answer) --

  1. http://bit.ly/I5V4VH; 2. http://bit.ly/1baAb84; 3. http://bit.ly/17RcV9d; 4. http://bit.ly/1iyzY00
(21 Nov '13, 14:50) TallTed