I have been trying to get a DBpedia resource using a keyword search in SPARQL. I tried FILTER, but it caused a query timeout every time I executed the query. I was somewhat successful using the bif:contains function; however, I couldn't get that query working through the Jena ARQ API, as Virtuoso has not yet opened the SPARQL port for DBpedia. The query I'm trying looks something like this:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbprop: <http://dbpedia.org/property/>
PREFIX dbpont: <http://dbpedia.org/ontology/>
SELECT *
{
    ?orgnzn a <http://dbpedia.org/ontology/Organisation>.
    ?orgnzn rdfs:label ?lbl.
    FILTER(regex(?lbl,"Harvard University","i")).
    ?orgnzn dbpont:City ?city.
    ?orgnzn dbprop:Country ?country
}
LIMIT 5

Description: find a university whose name matches ($string), and get the city and country of that university. 'University' is just an example; it could be any 'organization'. The ($string) may contain multiple words.
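Whatever query is used, the ($string) has to be spliced into the query text before it is sent. A minimal sketch in Python (standard library only), assuming the query is assembled as a string on the client: it escapes regular-expression metacharacters first and SPARQL string-literal characters second, so that a multi-word name such as 'Harvard University', or one containing a period, is matched literally. The helper names are hypothetical, not part of Jena or any DBpedia API, and this only makes the query safe to build; it does not make the regex FILTER any faster.

```python
# Hypothetical helpers: build the label-search query for an arbitrary,
# possibly multi-word, organisation name typed by a user.

# Characters that are special in XPath/SPARQL regular expressions.
_REGEX_META = set("\\.?*+(){}[]^$|-")


def regex_escape(text: str) -> str:
    """Escape regex metacharacters so that `text` is matched literally."""
    return "".join("\\" + ch if ch in _REGEX_META else ch for ch in text)


def sparql_string_literal(text: str) -> str:
    """Wrap `text` in double quotes, escaping characters SPARQL forbids there."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return '"' + escaped + '"'


def build_label_query(name: str, limit: int = 5) -> str:
    """Return a query matching English labels that contain `name` (case-insensitive)."""
    pattern = sparql_string_literal(regex_escape(name))
    return f"""PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbpont: <http://dbpedia.org/ontology/>
SELECT DISTINCT ?orgnzn ?lbl WHERE {{
  ?orgnzn a dbpont:Organisation ;
          rdfs:label ?lbl .
  FILTER(lang(?lbl) = 'en')
  FILTER(regex(str(?lbl), {pattern}, "i"))
}}
LIMIT {int(limit)}"""
```

For example, `build_label_query("Harvard University")` yields a query whose FILTER reads `regex(str(?lbl), "Harvard University", "i")`. The escaping order matters: regex escaping adds backslashes, which the literal escaping then doubles.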

How can I get this query working on DBpedia endpoint?

The broader information need: I want to retrieve the DBpedia resource for any type of organization and fetch its city and country. The organization could be anything, e.g. 'Harvard University', 'Microsoft', 'Dell', etc. I thought I could get its location information from DBpedia, so please also reply if there are other ways or sources to get this information.

Thanks!

asked 12 Feb '12, 07:42

metaweb87
accept rate: 5%

edited 12 Feb '12, 10:19


Hi

You could try a query like the following:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbprop: <http://dbpedia.org/property/>
PREFIX dbpont: <http://dbpedia.org/ontology/>

# SPARQL 1.1 requires (expr AS ?var) in parentheses, with ?var not already used in the WHERE pattern.
SELECT DISTINCT ?orgnzn (str(?lbl) AS ?name) (str(?city_label) AS ?cityName) (str(?country_label) AS ?countryName)
WHERE {

?orgnzn a <http://dbpedia.org/ontology/Organisation>.
?orgnzn rdfs:label ?lbl.
OPTIONAL{?orgnzn dbpont:city ?city. ?city rdfs:label ?city_label. FILTER(lang(?city_label)='en')}
OPTIONAL{?orgnzn dbpont:country ?country. ?country rdfs:label ?country_label. FILTER(lang(?country_label)='en')}
FILTER(lang(?lbl)='en')

FILTER(regex(str(?lbl),"Harvard","i")).
#FILTER(?orgnzn = <http://dbpedia.org/resource/Harvard_University>).

}

Please note that I use some FILTER statements on lang() to restrict the results to English labels. I also left a commented-out filter on a specific URI to give you an idea of how to run the query for one specific resource; as said before, this can be done to improve performance. If this approach fits your needs, you only have to:

1) retrieve all the URIs you are interested in with a full-text search:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbprop: <http://dbpedia.org/property/>
PREFIX dbpont: <http://dbpedia.org/ontology/>

SELECT DISTINCT ?orgnzn
WHERE {
?orgnzn a <http://dbpedia.org/ontology/Organisation>.
?orgnzn rdfs:label ?lbl.
FILTER(lang(?lbl)='en')
FILTER(regex(str(?lbl),"Harvard","i")).
}

2) then programmatically substitute each of them into the FILTER statement of this query:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbprop: <http://dbpedia.org/property/>
PREFIX dbpont: <http://dbpedia.org/ontology/>
SELECT DISTINCT ?orgnzn (str(?lbl) AS ?name) (str(?city_label) AS ?cityName) (str(?country_label) AS ?countryName)
WHERE {
?orgnzn a <http://dbpedia.org/ontology/Organisation>.
?orgnzn rdfs:label ?lbl.
OPTIONAL{?orgnzn dbpont:city ?city. ?city rdfs:label ?city_label. FILTER(lang(?city_label)='en')}
OPTIONAL{?orgnzn dbpont:country ?country. ?country rdfs:label ?country_label. FILTER(lang(?country_label)='en')}
FILTER(lang(?lbl)='en')
FILTER(?orgnzn = <http://dbpedia.org/resource/Harvard_University>).
}
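The "programmatically" part of these two steps could look roughly like the following Python sketch (standard library only). It assumes the step-1 results were fetched in the W3C SPARQL 1.1 JSON results format; it collects the distinct URIs bound to ?orgnzn and fills each one into a template mirroring the step-2 query. The function and template names are hypothetical, and the check on forbidden characters is a precaution added here, not part of the answer.

```python
import json

# Template mirroring the step-2 query; {uri} is filled in once per resource.
STEP2_TEMPLATE = """PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbpont: <http://dbpedia.org/ontology/>
SELECT DISTINCT ?orgnzn (str(?lbl) AS ?name) (str(?city_label) AS ?cityName) (str(?country_label) AS ?countryName)
WHERE {{
  ?orgnzn a dbpont:Organisation .
  ?orgnzn rdfs:label ?lbl .
  OPTIONAL {{ ?orgnzn dbpont:city ?city . ?city rdfs:label ?city_label . FILTER(lang(?city_label) = 'en') }}
  OPTIONAL {{ ?orgnzn dbpont:country ?country . ?country rdfs:label ?country_label . FILTER(lang(?country_label) = 'en') }}
  FILTER(lang(?lbl) = 'en')
  FILTER(?orgnzn = <{uri}>)
}}"""

# Characters that are not allowed inside a SPARQL <IRI> reference.
_FORBIDDEN_IN_IRI = set('<>"{}|^`\\')


def uris_from_results(results_json: str, var: str = "orgnzn") -> list:
    """Collect the distinct URIs bound to `var` in step-1 JSON results, in order."""
    uris = []
    for row in json.loads(results_json)["results"]["bindings"]:
        cell = row.get(var)
        if cell and cell.get("type") == "uri" and cell["value"] not in uris:
            uris.append(cell["value"])
    return uris


def step2_query(uri: str) -> str:
    """Fill one URI into the step-2 template, refusing values that would break the IRI."""
    if any(ch in _FORBIDDEN_IN_IRI or ord(ch) <= 0x20 for ch in uri):
        raise ValueError(f"not usable as a SPARQL IRI: {uri!r}")
    return STEP2_TEMPLATE.format(uri=uri)
```

Typical use would be a loop such as `for uri in uris_from_results(step1_json): run(step2_query(uri))`, where `step1_json` and `run` stand for however the results are fetched and the query is sent.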

answered 12 Feb '12, 12:41

seralf
accept rate: 14%

edited 12 Feb '12, 15:11

@seralf I understood your concept and tried your queries, but I am still getting a timeout. The part where the filters are added is failing: the query with only FILTER(lang(?lbl)='en') works, but adding FILTER(regex(str(?lbl),"Harvard","i")) causes a timeout!

(12 Feb '12, 14:33) metaweb87

Hmm, that's strange: for me there is no timeout, even though the response takes a long time (maybe it depends on the general load on the server? I don't know). Please consider identifying the URIs of the resources with the first full-text query and then running the second query on each specific resource: that runs well and fast.

(12 Feb '12, 14:51) seralf

I'm sorry, it's not a timeout. It's actually an empty result set; check the query and URL at http://tinypaste.com/0f910413. Thanks!

(12 Feb '12, 15:04) metaweb87

Now I get a timeout too. I think it could be because FILTER queries have a big overhead and the system is probably too loaded to handle them at the moment. Another option is to add more restrictions if you can, for example search only for universities, then for another kind of organization, and so on...

(12 Feb '12, 15:16) seralf

Please note that if I set an execution timeout, everything works fine and takes only a few seconds. It's probably an overload issue.

For example: http://tinypaste.com/803f41d6

(12 Feb '12, 15:17) seralf

Thanks @seralf for the prompt responses, I really appreciate it! The new changes indeed worked. I noticed that you increased the timeout to 20000, and I think that did the trick! :-)

(14 Feb '12, 13:29) metaweb87
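From code, the raised execution timeout has to travel with the request. A minimal Python sketch (standard library only), assuming the endpoint is https://dbpedia.org/sparql and that, like Virtuoso's HTML query form, it accepts a non-standard `timeout` parameter in milliseconds. The `query` parameter and the `Accept: application/sparql-results+json` header belong to the W3C SPARQL protocol; `timeout` does not, so verify it against the endpoint you actually use. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint URL for the public DBpedia SPARQL service.
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"


def build_request_url(query: str, timeout_ms: int = 20000,
                      endpoint: str = DBPEDIA_ENDPOINT) -> str:
    """GET URL carrying the query plus Virtuoso's (assumed) `timeout` parameter in ms."""
    params = {"query": query, "timeout": str(timeout_ms)}
    return endpoint + "?" + urllib.parse.urlencode(params)


def run_query(query: str, timeout_ms: int = 20000) -> dict:
    """Send the query and decode SPARQL JSON results (requires network access)."""
    request = urllib.request.Request(
        build_request_url(query, timeout_ms),
        headers={"Accept": "application/sparql-results+json"},
    )
    # Client-side socket timeout in seconds, a little longer than the server's budget.
    with urllib.request.urlopen(request, timeout=timeout_ms / 1000 + 10) as response:
        return json.loads(response.read().decode("utf-8"))
```

The client-side timeout is deliberately longer than the server-side one, so that a slow but successful response is not cut off by the socket before the server gives up.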

A key to working with DBpedia, or any large RDF store for that matter, is to realize that any string search is very inefficient relative to matching a triple pattern. In your case, I'd suggest using a minimal query to discover the label of the entity you want. Something like:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbprop: <http://dbpedia.org/property/>
PREFIX dbpont: <http://dbpedia.org/ontology/>
SELECT *
{  ?orgnzn a <http://dbpedia.org/ontology/Organisation>.
   ?orgnzn rdfs:label ?lbl.
   FILTER(fn:starts-with(?lbl,"Harv")).
}

Note that I'm using some heuristics here. I'm pretty sure the label will start with "Harv" and that not many others will start with that string. fn:starts-with only needs to compare the first n characters, so it can be used for a more efficient search than regex or contains. Of course, if you don't know whether the keyword appears at the beginning of the label, or you don't know its case, then you will need regex, etc. But this works in many cases. Note that shorter search strings will, of course, return quicker.
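One way to keep the cheap prefix test and still honour a full multi-word name is to let the endpoint do the coarse filtering and refine the (much smaller) result on the client. A sketch in Python, assuming results in the W3C SPARQL 1.1 JSON results format. It swaps fn:starts-with for STRSTARTS, the standard SPARQL 1.1 built-in with the same case-sensitive behaviour; the prefix length of 4 mirrors the "Harv" heuristic and is arbitrary, and the function names are hypothetical.

```python
import json


def prefix_query(name: str, prefix_len: int = 4) -> str:
    """Coarse server-side filter: labels starting with the first few characters."""
    prefix = name[:prefix_len].replace("\\", "\\\\").replace('"', '\\"')
    # STRSTARTS is case-sensitive, so the user's capitalisation must match the label.
    return f"""PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbpont: <http://dbpedia.org/ontology/>
SELECT ?orgnzn ?lbl WHERE {{
  ?orgnzn a dbpont:Organisation .
  ?orgnzn rdfs:label ?lbl .
  FILTER(STRSTARTS(str(?lbl), "{prefix}"))
}}"""


def refine(results_json: str, name: str) -> list:
    """Fine client-side filter: keep labels containing `name`, ignoring case and spacing."""
    wanted = " ".join(name.split()).casefold()
    matches = []
    for row in json.loads(results_json)["results"]["bindings"]:
        label = row["lbl"]["value"]
        if wanted in " ".join(label.split()).casefold():
            matches.append((row["orgnzn"]["value"], label))
    return matches
```

The case-insensitive substring test in `refine` plays the role of the `regex(..., "i")` FILTER, but it runs only over rows that already passed the prefix test.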

Then do the following to discover what properties are associated with the resource:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbprop: <http://dbpedia.org/property/>
PREFIX dbpont: <http://dbpedia.org/ontology/>
SELECT *
{  ?orgnzn a <http://dbpedia.org/ontology/Organisation>.
   ?orgnzn rdfs:label "Harvard University"@en .
   ?orgnzn ?p ?o .
} LIMIT 5

From this you will discover that the property names start with a lowercase letter, as is the custom, e.g. dbprop:city and dbprop:country. You will also discover that some other resources do not have these properties, so you may need to use OPTIONAL and the properties those resources do use.
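With OPTIONAL, a resource lacking a city or country property still appears in the results, but the corresponding variable is left unbound; in the W3C SPARQL 1.1 JSON results format an unbound variable is simply missing from that row. A small Python sketch of reading such rows, assuming a query that selects ?orgnzn, ?city and ?country (names taken from the question's query; the function name is hypothetical):

```python
import json
from typing import List, Optional, Tuple


def locations(results_json: str) -> List[Tuple[str, Optional[str], Optional[str]]]:
    """Return (resource, city, country) per row; None where an OPTIONAL did not match."""
    rows = []
    for binding in json.loads(results_json)["results"]["bindings"]:
        # An unbound variable is absent from the row, so fall back to an empty dict.
        city = binding.get("city", {}).get("value")
        country = binding.get("country", {}).get("value")
        rows.append((binding["orgnzn"]["value"], city, country))
    return rows
```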

Again, the key here is using SPARQL to iteratively discover how the data is represented in smaller chunks the service is able to process efficiently, then grow the query you are working on, testing for the ability of the service to handle the request as you go.

answered 12 Feb '12, 11:57

scotthenninger ♦
accept rate: 17%

edited 12 Feb '12, 17:05

@scotthenninger I didn't get the part about 'using a minimal query to discover the entity' working; that is where I am stuck. I know string searches are inefficient and time-consuming, but that is the first step for me to get the entity and then its related information. I do get your point about splitting the query into smaller chunks to get faster responses, that was helpful! :)

(12 Feb '12, 14:15) metaweb87

I edited to address this question.

(12 Feb '12, 17:06) scotthenninger ♦

Thanks for the changes and useful tips, it's working, and I also learned how to write efficient queries. I haven't tried it in code yet, but I was wondering whether the "fn:starts-with" function is in the SPARQL specification, or whether I will get parsing errors like I get for "bif:contains". Anyway, I'll update here if I run into any parsing issues.

(14 Feb '12, 13:38) metaweb87

After looking at some DBpedia resource pages, I also figured out that most entities have URIs with spaces replaced by the underscore (_) character, but NOT always! So one possible trick is to replace spaces with underscores to form the DBpedia resource URI and directly query the other details, as shown below:

Simple Query Text: Harvard University
New Text: Harvard_University

PREFIX : <http://dbpedia.org/resource/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbprop: <http://dbpedia.org/property/>
PREFIX dbpont: <http://dbpedia.org/ontology/>
SELECT DISTINCT (str(?city_label) AS ?cityName) (str(?country_label) AS ?countryName)
WHERE {
OPTIONAL{:Harvard_University dbpont:city ?city. ?city rdfs:label ?city_label. FILTER(lang(?city_label)='en')}
OPTIONAL{:Harvard_University dbpont:country ?country. ?country rdfs:label ?country_label. FILTER(lang(?country_label)='en')}
}
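The space-to-underscore trick can be automated. A Python sketch under the stated assumption that the guessed URI is only a heuristic: the real DBpedia URI may differ in capitalization, punctuation or encoding. It percent-encodes only the characters SPARQL forbids inside `<...>`, which is a choice made here rather than DBpedia's own encoding rule; the function names are hypothetical.

```python
# Heuristic from the answer: most DBpedia resource URIs are the name with
# spaces replaced by underscores, but NOT always.
DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"

# Characters that may not appear literally inside a SPARQL <IRI> reference.
_FORBIDDEN_IN_IRI = set('<>"{}|^`\\')


def guess_resource_uri(name: str) -> str:
    """Join whitespace-separated words with '_' and percent-encode forbidden characters."""
    local = "_".join(name.split())
    local = "".join(
        "%{:02X}".format(ord(ch)) if ch in _FORBIDDEN_IN_IRI or ord(ch) < 0x20 else ch
        for ch in local
    )
    return DBPEDIA_RESOURCE + local


def location_query(name: str) -> str:
    """City and country labels for the guessed resource, in the shape of the query above."""
    uri = guess_resource_uri(name)
    return f"""PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbpont: <http://dbpedia.org/ontology/>
SELECT DISTINCT (str(?city_label) AS ?cityName) (str(?country_label) AS ?countryName)
WHERE {{
  OPTIONAL {{ <{uri}> dbpont:city ?city . ?city rdfs:label ?city_label . FILTER(lang(?city_label) = 'en') }}
  OPTIONAL {{ <{uri}> dbpont:country ?country . ?country rdfs:label ?country_label . FILTER(lang(?country_label) = 'en') }}
}}"""
```

Under standard SPARQL semantics, if the guessed URI does not exist this OPTIONAL-only query still returns a single row with both columns unbound, which can serve as the signal to fall back to the label search.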

answered 14 Feb '12, 13:51

metaweb87
accept rate: 5%

Seen: 2,167 times

Last updated: 14 Feb '12, 13:51