I was following along with this post and ended up with running code that I had to halt after 7 minutes out of impatience. I wrote a quick Groovy script using Jena to mimic the behavior, and it returned all the results in a second or two. Is rdflib known to be slow when it queries over the wire, or does my code contain a major inefficiency?
Python 2.7 + rdflib 3.0:
from rdflib import Graph, Namespace, RDF
store = Graph()
store.parse("http://source.data.gov.uk/data/education/bis-research-explorer/2010-03-04/education.data.gov.uk.nt", format="nt")
store.bind("PROJECT", "http://research.data.gov.uk/def/project/")
store.bind("FOAF", "http://xmlns.com/foaf/0.1/")
PROJECT = Namespace("http://research.data.gov.uk/def/project/")
FOAF = Namespace("http://xmlns.com/foaf/0.1/")
for organization in store.subjects(RDF.type, FOAF["Organization"]):
    for postcode in store.objects(organization, PROJECT["location"]):
        try:
            print postcode
            store.parse(postcode)
        except:
            print '404 not found'
Groovy 1.7.5 + Jena 2.6.4:
import com.hp.hpl.jena.rdf.model.*
String url= 'http://source.data.gov.uk/data/education/bis-research-explorer/2010-03-04/education.data.gov.uk.nt'
Model model= ModelFactory.createDefaultModel().read(url, 'N-TRIPLE')
Property rdfType= model.getProperty('http://www.w3.org/1999/02/22-rdf-syntax-ns#', 'type')
Property projectLocation= model.getProperty('http://research.data.gov.uk/def/project/', 'location')
Resource foafOrg= model.getResource('http://xmlns.com/foaf/0.1/Organization')
model.listResourcesWithProperty(rdfType, foafOrg).each { org ->
    model.listObjectsOfProperty(org, projectLocation).each { loc ->
        try {
            println loc
        }
        catch (Exception e) {
            println '404 not found'
        }
    }
}
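To narrow down where the time goes in the Python version, the per-URL fetches could be timed separately from the initial load. The following is only a rough sketch: `time_calls` and `summarize` are hypothetical helper names, not part of rdflib, and `fetch` stands in for whatever callable does the remote work (for example `store.parse`).

```python
from timeit import default_timer


def time_calls(fetch, urls):
    """Call fetch(url) for each URL and record (url, seconds, ok) per call."""
    timings = []
    for url in urls:
        start = default_timer()
        try:
            fetch(url)
            ok = True
        except Exception:
            # Treat any failure (e.g. a 404) as a failed fetch and keep going.
            ok = False
        timings.append((url, default_timer() - start, ok))
    return timings


def summarize(timings):
    """Reduce the per-call timings to a call count, failure count and total time."""
    total = sum(seconds for _, seconds, _ in timings)
    failures = sum(1 for _, _, ok in timings if not ok)
    return {"calls": len(timings), "failures": failures, "total_seconds": total}
```

With the script above, the postcode values would presumably be collected into a list first (so the graph is not modified while it is being iterated) and then passed in as `time_calls(store.parse, postcodes)`; comparing `total_seconds` with the overall run time should show how much of the 7 minutes is spent fetching the postcode URLs.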
asked 03 Feb '11, 15:16 by Ryan Kohl