|
I am trying to load the triples of the DBpedia 3.7 dump (English only) into my local Virtuoso server. The files are huge when uncompressed. When I load them one by one with SQL commands such as DB.DBA.TTLP_MT_LOCAL_FILE ('./geo_coordinates_en.nt','', 'geoCoordinates'); the loader sometimes reports errors inside the files themselves. After some research, I found that some URLs in the triples are much longer than 1024 characters, and that some people run scripts on the files to clean out such faults. Even so, errors of other kinds still appear when loading the triple files with the command above. This is an example of an error when loading the images_en.nt file: *** Error 37000: [OpenLink][Virtuoso ODBC Driver][Virtuoso Server]SP029: TURTLE RDF loader, line 1806041: Invalid characters in angle-bracketed name; this error can be suppressed by parser flags at < http://upload.wikimedia.org/wikipedia/com mons/thumb/e/e3/Guitar %28Zappa%29.jpg/200px-Guitar %28Zappa%29.jpg> Is there a 'clean', Virtuoso-friendly version of the DBpedia dump? |
|
The public DBpedia endpoint, which runs on Virtuoso, doesn't actually load all of DBpedia. The page "Datasets loaded into the public DBpedia SPARQL Endpoint" will tell you exactly which files are loaded. I can see from that list that images_en.nt is in fact loaded, which implies that the good folks at Virtuoso must either set some flags to make the load less strict or clean the data as a pre-step. I was once given a cleaned dump of the DBpedia data by the folks at OWLIM; I asked them for the data set after reading the blog entry "Loading DBpedia in a RDF database (e.g. OWLIM)". See also this question: "Why do the dbpedia dumps contain data not found in the endpoint?" It would help if he could post the code of the scripts or give the general idea. What I did was download the DBpedia 3.6 dump and refine it through code: I wrote some code to exclude faulty links. I think only one .nt file is defective in the 3.6 version. |
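For anyone wondering what such a cleaning pass might look like, here is a minimal sketch in Python. It is not the script used by the commenter, OWLIM, or the DBpedia team; it only illustrates the idea of dropping triples whose IRIs would likely be rejected. The 1024-character limit is taken from the question, and the set of "invalid" characters is my assumption based on what N-Triples disallows inside angle brackets. The names `triple_is_loadable`, `clean_lines` and `clean_file` are made up for this example.

```python
import re

# Assumption: IRIs longer than this are rejected (limit quoted in the question).
MAX_IRI_LENGTH = 1024

# Assumption: characters treated as invalid inside <...>: control characters,
# space, and < > " { } | ^ `. Backslash is left alone so \uXXXX escapes survive.
INVALID_IRI_CHARS = re.compile(r'[\x00-\x20<>"{}|^`]')

# Rough shape of one N-Triples statement: subject, predicate, object, final dot.
# IRIs are matched up to the next '>' so that IRIs containing spaces are still
# captured (and can then be flagged) instead of breaking the split.
TRIPLE = re.compile(
    r'^\s*(<[^>]*>|_:\S+)\s+(<[^>]*>)\s+(<[^>]*>|_:\S+|".*)\s*\.\s*$'
)


def iri_is_valid(iri):
    """Return True if the IRI is short enough and has no disallowed characters."""
    return len(iri) <= MAX_IRI_LENGTH and INVALID_IRI_CHARS.search(iri) is None


def triple_is_loadable(line):
    """Return True if the line looks like a triple and all of its IRIs pass."""
    match = TRIPLE.match(line)
    if match is None:
        return False
    for term in match.groups():
        # Only IRI terms are checked; blank nodes and literals (including any
        # datatype IRI attached to a literal) are passed through unchecked.
        if term.startswith("<") and not iri_is_valid(term[1:-1]):
            return False
    return True


def clean_lines(lines):
    """Yield blank lines, comments, and triples that pass the check."""
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#") or triple_is_loadable(line):
            yield line


def clean_file(src_path, dst_path):
    """Copy src_path to dst_path without rejected lines; return (kept, dropped)."""
    kept = dropped = 0
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            if next(clean_lines([line]), None) is None:
                dropped += 1
            else:
                dst.write(line)
                kept += 1
    return kept, dropped
```

Calling something like `clean_file("images_en.nt", "images_en.clean.nt")` would write a filtered copy and report how many lines were kept and dropped. The files are processed line by line, so the uncompressed size should not be a problem for memory. Note that this drops whole triples rather than repairing them, so it is worth checking the dropped count before loading.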
|
Hmm, well, DBpedia is actually run on Virtuoso, so the data may well be dumped out of Virtuoso, and even if not, they certainly manage to load it. Have you tried asking on their mailing list, dbpedia-discussion@lists.sf.net? They might be better placed to instruct you on how to get the DBpedia dumps loaded. |
|
Hi, there are many problems in the dumps, so it's difficult to have a general method. Here are some tips if you are using the .nt dumps: Hope this helps. |
|
The following details the use of the Virtuoso RDF Bulk Loader scripts we use for loading DBpedia datasets ... |


