|
I have been trying to find a DBpedia resource by keyword search in SPARQL. I tried FILTER, but the query timed out every time I ran it. I had some success with the bif:contains function; however, I couldn't get that query working through the Jena ARQ API, as Virtuoso has not yet opened the SPARQL port for DBpedia. The query I'm trying looks something like this:
Description: Find a university whose name matches ($string), and get the city and country of that university. 'University' is just an example; it could be any 'organization', and ($string) is multi-word. How can I get this query working on the DBpedia endpoint? The broader information need: I want to retrieve the DBpedia resource for any type of organization and fetch its city and country. The organization could be anything, e.g. 'Harvard University', 'Microsoft', 'Dell', etc. I thought I could get its location from DBpedia, so do reply if there are other ways or sources to get this information too. Thanks! |
|
Hi, you could try a query like the following:
Please note that I use some FILTER statements to restrict the results by language. I also left a commented-out filter on a specific URI to give you an idea of how to run the query for a specific resource; as said before, this can be done to improve performance. If this approach fits your needs, you only have to: 1) retrieve all the URIs you are interested in with a full-text search:
2) then programmatically use each of them in the FILTER statement of this query:
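A minimal sketch of the two steps above, written as Python helpers that only build the query strings. The function names, the `LIMIT`, and the exact shape of both queries are assumptions for illustration, not the queries from this answer; `bif:contains` is a Virtuoso extension, so the first query is only expected to work against a Virtuoso endpoint.

```python
RDFS = "http://www.w3.org/2000/01/rdf-schema#"


def fulltext_uri_query(phrase: str, limit: int = 20) -> str:
    """Step 1: find candidate URIs whose English label contains the phrase."""
    # Replace quotes and backslashes so the phrase cannot break out of the literal.
    clean = phrase.replace('"', " ").replace("'", " ").replace("\\", " ")
    clean = " ".join(clean.split())
    # A multi-word phrase is wrapped in double quotes inside the
    # single-quoted bif:contains search expression.
    return (
        f"PREFIX rdfs: <{RDFS}>\n"
        "SELECT DISTINCT ?s ?lbl WHERE {\n"
        "  ?s rdfs:label ?lbl .\n"
        f"  ?lbl bif:contains '\"{clean}\"' .\n"
        "  FILTER (lang(?lbl) = 'en')\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )


def resource_filter_query(uri: str) -> str:
    """Step 2: run the label query restricted to one URI found in step 1."""
    if any(ch in '<>"{}|^`\\' or ch.isspace() for ch in uri):
        raise ValueError(f"not a safe IRI: {uri!r}")
    return (
        f"PREFIX rdfs: <{RDFS}>\n"
        "SELECT ?s ?lbl WHERE {\n"
        "  ?s rdfs:label ?lbl .\n"
        f"  FILTER (?s = <{uri}>)\n"
        "  FILTER (lang(?lbl) = 'en')\n"
        "}"
    )


if __name__ == "__main__":
    print(fulltext_uri_query("Harvard University"))
    print(resource_filter_query("http://dbpedia.org/resource/Harvard_University"))
```

Each URI returned by the first query would be passed to `resource_filter_query` in turn, so the expensive text search runs only once.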
@seralf I understood your concept and even tried your queries, but I am still getting a timeout. The part where we add filters is failing: the query with the single FILTER(lang(?lbl)='en') works, but the next one, FILTER(regex(str(?lbl),"Harvard","i")), causes a timeout!
Hmm, that's strange: for me there is no timeout, even though the response takes a long time (maybe it depends on the general load of the server? I don't know). Please consider identifying the URI (about) of the resources with the first full-text query, and then running the second query on the specific resource: this runs well and fast.
I'm sorry, it's not a timeout. It's actually an empty result set; check the query and URL at http://tinypaste.com/0f910413. Thanks!
Now it gives me a timeout too. I think it could be because the FILTER queries have a big overhead and the system is probably too heavily loaded to handle them at the moment. Another option is to add some more restrictions if you can: for example, try to search only universities, then another kind of organization, and so on...
Please consider that if I set an Execution timeout, everything works fine and it takes a few seconds. It's probably an overload issue. For example: http://tinypaste.com/803f41d6
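A rough sketch of setting the execution timeout from code rather than from the query form. It assumes the endpoint accepts a `timeout` parameter in milliseconds and a `format` parameter, as Virtuoso's query form appears to; both parameter names, the default of 30000 ms, and the helper name are assumptions, while `query` and `default-graph-uri` come from the SPARQL protocol.

```python
from urllib.parse import urlencode

ENDPOINT = "http://dbpedia.org/sparql"


def sparql_get_url(query: str, timeout_ms: int = 30000,
                   endpoint: str = ENDPOINT) -> str:
    """Build a GET URL for a SPARQL query with an execution timeout."""
    params = {
        "default-graph-uri": "http://dbpedia.org",
        "query": query,
        # Assumed Virtuoso-specific parameters (not part of the SPARQL protocol).
        "format": "application/sparql-results+json",
        "timeout": str(int(timeout_ms)),
    }
    # urlencode percent-encodes the query text and joins the parameters.
    return endpoint + "?" + urlencode(params)


if __name__ == "__main__":
    print(sparql_get_url("SELECT * WHERE { ?s ?p ?o } LIMIT 1", timeout_ms=5000))
```

The resulting URL can be fetched with any HTTP client; whether a timed-out query returns partial results or an error depends on the server configuration.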
|
|
A key to working with DBpedia, or any large RDF store for that matter, is to realize that any string search is very inefficient relative to matching a triple pattern. In your case, I'd suggest using a minimal query to discover the label for the entity you want. Something like:
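A minimal sketch of such a label lookup, built as a query string in Python. It assumes the name is stored under `rdfs:label` and uses the SPARQL 1.1 built-in `STRSTARTS` as a stand-in for a prefix test; the helper names and the `LIMIT` are illustrative choices.

```python
def escape_literal(text: str) -> str:
    """Escape backslashes and double quotes for a SPARQL string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def label_prefix_query(prefix: str, limit: int = 50) -> str:
    """Find resources whose label starts with the given characters."""
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT DISTINCT ?s ?label WHERE {\n"
        "  ?s rdfs:label ?label .\n"
        # The prefix test is case-sensitive and anchored at the start.
        f'  FILTER (STRSTARTS(STR(?label), "{escape_literal(prefix)}"))\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


if __name__ == "__main__":
    print(label_prefix_query("Harv"))
```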
Note that I'm using some heuristics here. I'm pretty sure that the label will start with "Harv" and that not many others will start with that string. fn:starts-with only needs to examine the first n characters, so it can be used for a more efficient search than regex or contains. Of course, if you don't know whether the keyword appears at the beginning of the label, or the case isn't known, then you will need to use regex, etc. But this can work in many cases. Note that smaller search strings will, of course, return quicker. Then do the following to discover what properties are associated with the resource:
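A sketch of that property-discovery step, assuming the resource URI is already known from the label lookup. `properties_query` builds a plain triple-pattern query, and `predicates_from_results` reads the standard SPARQL JSON results layout; both names and the `LIMIT` are hypothetical.

```python
def check_iri(uri: str) -> str:
    """Reject characters that are not allowed inside <...> in SPARQL."""
    bad = set('<>"{}|^`\\')
    if not uri or any(ch in bad or ch.isspace() for ch in uri):
        raise ValueError(f"not a safe IRI: {uri!r}")
    return uri


def properties_query(uri: str, limit: int = 200) -> str:
    """List the predicates and values attached to one resource."""
    return (
        "SELECT DISTINCT ?p ?o WHERE {\n"
        f"  <{check_iri(uri)}> ?p ?o .\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )


def predicates_from_results(results: dict) -> list:
    """Collect the distinct predicate URIs from a SPARQL JSON result."""
    bindings = results.get("results", {}).get("bindings", [])
    return sorted({row["p"]["value"] for row in bindings if "p" in row})


if __name__ == "__main__":
    print(properties_query("http://dbpedia.org/resource/Harvard_University"))
```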
From this you will discover that the properties start with a lowercase letter, as is the custom, e.g. dbprop:city and dbprop:country, and you will discover that other resources do not have these properties, so you may need to use OPTIONAL and use the properties associated with those resources. Again, the key here is using SPARQL to iteratively discover how the data is represented, in smaller chunks the service is able to process efficiently, then grow the query you are working on, testing for the ability of the service to handle the request as you go.
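A small sketch of how the OPTIONAL patterns might look once the resource URI is known, assuming `dbprop:` stands for `http://dbpedia.org/property/`. The helper name is hypothetical, and other resources may need different properties in place of `dbprop:city` and `dbprop:country`.

```python
DBPROP = "http://dbpedia.org/property/"


def location_query(uri: str) -> str:
    """Ask for city and country, tolerating resources that lack either one."""
    if any(ch in '<>"{}|^`\\' or ch.isspace() for ch in uri):
        raise ValueError(f"not a safe IRI: {uri!r}")
    return (
        f"PREFIX dbprop: <{DBPROP}>\n"
        "SELECT ?city ?country WHERE {\n"
        # Each OPTIONAL leaves its variable unbound instead of dropping the row.
        f"  OPTIONAL {{ <{uri}> dbprop:city ?city }}\n"
        f"  OPTIONAL {{ <{uri}> dbprop:country ?country }}\n"
        "}"
    )


if __name__ == "__main__":
    print(location_query("http://dbpedia.org/resource/Harvard_University"))
```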
@scotthenninger I didn't get the 'using a minimal query to discover the entity' part working; that is where I am stuck. I know string searches are inefficient and time-consuming, but that is the first step for me to get the entity and then its related information. But I get your point about splitting the query into smaller chunks to get a faster response; that was helpful! :)
Thanks for the changes and useful tips; it's working, and with that I also learned how to make efficient queries. I haven't tried it in code yet, but I was wondering whether the "fn:starts-with" function is in the SPARQL specification, or will I get parsing errors like I do for "bif:contains"? Anyway, I'll update here if I get any parsing issues. |
|
After looking at some DBpedia resource pages, I also found that most entities have URIs with spaces replaced by the underscore (_) character, but NOT always! So one possible trick is to replace spaces with underscores to form the DBpedia resource URI and directly query the other details, as shown below: Simple Query Text: Harvard University
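A minimal sketch of the underscore trick, assuming resources live under `http://dbpedia.org/resource/`. Since the guess does not always hit, the sketch also builds an `ASK` query to check whether the guessed URI has any triples; the helper names and the percent-encoding of other characters are assumptions.

```python
from urllib.parse import quote

RESOURCE_BASE = "http://dbpedia.org/resource/"


def guess_resource_uri(label: str) -> str:
    """Turn 'Harvard University' into a guessed DBpedia resource URI."""
    # Collapse runs of whitespace and join the words with underscores.
    name = "_".join(label.split())
    # Percent-encode anything else that is unsafe in a URI (an assumption
    # about how such names should be encoded).
    return RESOURCE_BASE + quote(name)


def exists_query(uri: str) -> str:
    """Build an ASK query that is true if the URI appears as a subject."""
    return f"ASK {{ <{uri}> ?p ?o }}"


if __name__ == "__main__":
    uri = guess_resource_uri("Harvard University")
    print(uri)
    print(exists_query(uri))
```

If the `ASK` query comes back false, falling back to a label search is the safer route.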
|


