I'm building a small keyword search on a locally installed Virtuoso server containing the DBpedia dump. For example, if the user is searching for Japan, the keyword search will basically implement this query:

SELECT * WHERE {
  ?subject <http://www.w3.org/2000/01/rdf-schema#label> ?Literal
  FILTER(REGEX(STR(?Literal), '.*Japan.*', 'i'))
}

The problem is that this query takes too long to run. Is there an alternative, or a best practice, to overcome this?

I thought of adding LIMIT and OFFSET to the query and showing the user the results 10 at a time, but I suspect that matching with a large OFFSET will take even more time. What do you think?

asked 13 Mar '12, 21:07 by Hady Elsahar

edited 13 Mar '12, 22:39 by Signified
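On the LIMIT/OFFSET idea: the Python sketch below (the helper `paged_label_query` is hypothetical) only builds the query text for page *n* of 10 results; it does not talk to Virtuoso. Two assumptions are built into it. Paging does not make the REGEX scan itself cheaper, because to answer OFFSET n the engine generally still has to produce the first n matches and then discard them. And SPARQL does not guarantee any solution order without ORDER BY, so the sketch adds one to keep pages stable; that in turn generally means every match must be found before the first page can be returned.

```python
def paged_label_query(keyword: str, page: int, page_size: int = 10) -> str:
    """Build the SPARQL text for one page of the question's REGEX label search.

    Hypothetical helper for illustration only; it just returns a string.
    """
    # Restrict the sketch to plain alphanumeric keywords so that no regex or
    # SPARQL string escaping is needed.
    if not keyword.isalnum():
        raise ValueError("this sketch only accepts alphanumeric keywords")
    if page < 0 or page_size <= 0:
        raise ValueError("page must be >= 0 and page_size must be > 0")
    offset = page * page_size
    # REGEX already matches anywhere in the string, so the '.*' around the
    # keyword in the original query can be dropped without changing results.
    return (
        "SELECT * WHERE {\n"
        "  ?subject <http://www.w3.org/2000/01/rdf-schema#label> ?Literal .\n"
        f"  FILTER(REGEX(STR(?Literal), '{keyword}', 'i'))\n"
        "}\n"
        "ORDER BY ?subject ?Literal\n"
        f"LIMIT {page_size} OFFSET {offset}"
    )


if __name__ == "__main__":
    # Page 0 covers results 1-10; page 2 covers results 21-30.
    print(paged_label_query("Japan", 2))
```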


It's important to understand how such queries are run. Your query will scan through every rdfs:label value in the data (and DBpedia has a great many), looking for the substring "Japan". This is never going to be efficient: REGEX is not a substitute for full-text search.

Thankfully, many SPARQL engines support full-text search through non-standard extensions, Virtuoso included:

SELECT * WHERE {
  ?subject <http://www.w3.org/2000/01/rdf-schema#label> ?Literal .
  FILTER bif:contains(?Literal, "Japan")
}

Note that the bif:contains function is not defined in SPARQL; it is a custom function implemented by Virtuoso. This version of the full-text query runs as a lookup on an inverted index (rather than scanning lots of data) and should be much more efficient.


EDIT: Actually, the syntax might be:

SELECT * WHERE {
  ?subject  <http://www.w3.org/2000/01/rdf-schema#label> ?Literal .
  ?Literal bif:contains "Japan" .
}

(I don't have the necessary foo to find the official Virtuoso documentation, but various examples quote the above syntax.)
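The scan-versus-index difference described above can be illustrated with a toy model in Python. This is not how Virtuoso is implemented; the subjects, labels and function names are made up for illustration. `regex_scan` mirrors the REGEX query by testing every label one by one, while `build_inverted_index` and `index_lookup` build a word-to-subjects map once and then answer a keyword with a single dictionary access, which is the basic idea of an inverted index.

```python
import re
from collections import defaultdict

# Hypothetical (subject, rdfs:label) pairs standing in for the DBpedia data.
LABELS = {
    "dbr:Japan": "Japan",
    "dbr:Tokyo": "Tokyo",
    "dbr:Japan_Airlines": "Japan Airlines",
    "dbr:Nagoya": "Nagoya",
}


def regex_scan(labels, pattern):
    """Like the REGEX filter: test every label, so cost grows with the data."""
    rx = re.compile(pattern, re.IGNORECASE)
    return sorted(s for s, label in labels.items() if rx.search(label))


def build_inverted_index(labels):
    """Map each lower-cased word to the set of subjects whose label contains it."""
    index = defaultdict(set)
    for subject, label in labels.items():
        for word in re.findall(r"\w+", label.lower()):
            index[word].add(subject)
    return index


def index_lookup(index, word):
    """Like a full-text lookup: one dictionary access for the whole word."""
    return sorted(index.get(word.lower(), set()))


if __name__ == "__main__":
    index = build_inverted_index(LABELS)
    print(regex_scan(LABELS, "japan"))
    print(index_lookup(index, "Japan"))
    print(index_lookup(index, "apan"))  # substring, not a whole word
```

In this toy index the lookup only finds whole words: `regex_scan(LABELS, "apan")` finds both Japan labels, while `index_lookup(index, "apan")` returns an empty list. That trade-off is one reason a word-based full-text lookup is not a drop-in replacement for an arbitrary regex.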

answered 13 Mar '12, 22:33 by Signified

edited 13 Mar '12, 22:49

That's really fast, thank you. My only question is: does bif:contains do a regex match? And what do I do if I want to match against a regex?

(13 Mar '12, 22:48) Hady Elsahar

(13 Mar '12, 22:51) Signified

As Signified has already pointed out, bif:* is a Virtuoso "Built-In Function" from the SQL realm. Documentation for contains() and other SQL functions can be found at http://docs.openlinksw.com/virtuoso/fn_contains.html

(14 Mar '12, 09:47) Garry Biggs
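On the follow-up question about regexes: one common pattern, sketched here as an assumption rather than as what the commenters proposed, is to let the full-text index narrow the candidates and then apply the regex only to that much smaller set. The Python below assumes the (subject, label) pairs have already been fetched with a bif:contains query; `refine_with_regex` and the sample rows are hypothetical, and the pattern uses Python's `re` syntax, which differs in details from the XPath regex syntax used by SPARQL's REGEX. The same two-stage idea could presumably also be written as a single query by keeping the bif:contains pattern and adding a REGEX FILTER next to it, but how Virtuoso plans such a query is not verified here.

```python
import re
from typing import Iterable, List, Tuple

Row = Tuple[str, str]  # (subject, label), as returned by the full-text query


def refine_with_regex(candidates: Iterable[Row], pattern: str) -> List[Row]:
    """Keep only the candidate rows whose label matches `pattern` (case-insensitive)."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [(subject, label) for subject, label in candidates if rx.search(label)]


if __name__ == "__main__":
    # Hypothetical rows standing in for the result of a bif:contains "Japan" query.
    rows = [
        ("dbr:Japan", "Japan"),
        ("dbr:Japan_Airlines", "Japan Airlines"),
        ("dbr:Sea_of_Japan", "Sea of Japan"),
    ]
    # Only labels that *start* with "Japan" survive the regex stage.
    print(refine_with_regex(rows, r"^japan"))
```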
question was seen: 1,618 times

last updated: 14 Mar '12, 09:47