
I am trying to load the triples of the DBpedia 3.7 dump (English only) into my local Virtuoso server. The files are huge when uncompressed. When I use SQL commands to load the files one by one, like:

DB.DBA.TTLP_MT_LOCAL_FILE ('./geo_coordinates_en.nt','', 'geoCoordinates');

it sometimes reports errors inside the files themselves. After doing some research, I found that some URLs in the triples are much longer than 1024 characters.
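As a rough illustration of the kind of cleaning script mentioned here, the following Python sketch flags N-Triples lines that contain an IRI longer than 1024 characters. The threshold is the figure from this question, not a documented Virtuoso constant. The IRI extraction is a regex simplification rather than a full N-Triples parser: it blanks out quoted literals first so that angle brackets inside literal text are ignored. The names `iris` and `has_overlong_iri` are hypothetical.

```python
import re

# A double-quoted N-Triples literal, allowing escaped characters such as \"
LITERAL = re.compile(r'"(?:[^"\\]|\\.)*"')
# An angle-bracketed IRI; group 1 captures its contents
IRI = re.compile(r'<([^>]*)>')


def iris(line: str) -> list[str]:
    """Return the IRIs on one N-Triples line, ignoring text inside literals."""
    return IRI.findall(LITERAL.sub('""', line))


def has_overlong_iri(line: str, limit: int = 1024) -> bool:
    """True if any IRI on the line is longer than `limit` characters (assumed limit)."""
    return any(len(iri) > limit for iri in iris(line))
```

`has_overlong_iri` can be used as a predicate while copying a dump line by line, so the whole file never has to be held in memory.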

I found that some people run scripts on the files to clean out such faults, but other kinds of errors still appear when loading the triple files into the Virtuoso server with the command above.

This is an example of an error when loading the images_en.nt file:

*** Error 37000: [OpenLink][Virtuoso ODBC Driver][Virtuoso Server]SP029: TURTLE RDF loader, line 1806041: Invalid characters in angle-bracketed name; this error can be suppressed by parser flags at < http://upload.wikimedia.org/wikipedia/com mons/thumb/e/e3/Guitar %28Zappa%29.jpg/200px-Guitar %28Zappa%29.jpg>
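To look at the triple at the reported line number without opening a multi-gigabyte file in an editor, something like the following Python sketch may help. It assumes the error is caused by characters that the N-Triples `IRIREF` grammar does not allow unescaped: U+0000 to U+0020 (including space), plus `<`, `>`, `"`, `{`, `}`, `|`, `^`, the backtick, and a backslash that is not part of a `\uXXXX` or `\UXXXXXXXX` escape. The helper names `read_line` and `bad_iri_chars`, and the script name `find_bad.py`, are hypothetical.

```python
import re
import sys
from itertools import islice

# A double-quoted literal (escaped quotes allowed); blanked out before looking for IRIs
LITERAL = re.compile(r'"(?:[^"\\]|\\.)*"')
# An angle-bracketed IRI; group 1 is its contents
IRI = re.compile(r'<([^>]*)>')
# Legal \uXXXX / \UXXXXXXXX escapes, removed before checking for a stray backslash
UCHAR = re.compile(r'\\u[0-9A-Fa-f]{4}|\\U[0-9A-Fa-f]{8}')
# Characters IRIREF does not allow unescaped: U+0000..U+0020 (incl. space) and <>"{}|^`\
FORBIDDEN = set('<>"{}|^`\\') | {chr(c) for c in range(0x21)}


def read_line(path: str, lineno: int) -> str:
    """Return 1-based line `lineno` of a possibly huge file, reading it as a stream."""
    with open(path, encoding="utf-8", errors="replace") as f:
        for line in islice(f, lineno - 1, lineno):
            return line.rstrip("\n")
    raise IndexError(f"{path} has fewer than {lineno} lines")


def bad_iri_chars(line: str) -> list[tuple[str, str]]:
    """Return (iri, character) pairs for each forbidden character found in an IRI."""
    problems = []
    for iri in IRI.findall(LITERAL.sub('""', line)):
        for ch in sorted(FORBIDDEN & set(UCHAR.sub("", iri))):
            problems.append((iri, ch))
    return problems


if __name__ == "__main__" and len(sys.argv) == 3:
    # e.g. python find_bad.py images_en.nt 1806041
    text = read_line(sys.argv[1], int(sys.argv[2]))
    print(text)
    for iri, ch in bad_iri_chars(text):
        print(f"  {ch!r} in <{iri}>")
```

Run against the line number from the Virtuoso message, this prints the offending triple and each forbidden character it finds, which tells you whether escaping or dropping the line is the better fix.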

I'm wondering: is there a 'clean', Virtuoso-friendly version of the DBpedia dump?

asked Feb 15 at 21:29

sherifkandeel

edited Feb 15 at 21:32


The Virtuoso DBpedia endpoint doesn't actually load all of DBpedia. The page "Datasets loaded into the public DBpedia SPARQL Endpoint" tells you exactly which files are loaded. I can see from that list that images_en.nt is in fact loaded, which implies the good folks at Virtuoso either set some flags to make the load less strict or clean the data as a pre-processing step. I was once given a dump of DBpedia data from the folks at OWLIM, which had been cleaned; I asked them for the data set after reading the blog entry "Loading DBpedia in a RDF database (e.g. OWLIM)".

See also this question: "Why do the dbpedia dumps contain data not found in the endpoint?"

answered Feb 18 at 23:23

harschware

It would help if he could post the code of the scripts or give the general idea. What I did was download the DBpedia 3.6 dump and refine it through code: I wrote some code to exclude faulty links. I think only one .nt file is defective in the 3.6 version.

(Feb 19 at 10:34) sherifkandeel
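A minimal Python sketch of the "exclude faulty links" approach described in this comment: it streams a dump line by line, passes through lines whose IRIs look loadable, and sets the rest aside with their original line numbers so they can be reviewed later. The checks (IRI longer than 1024 characters, unescaped characters forbidden in N-Triples IRIs) are assumptions based on the errors reported in this thread, not a complete list of what Virtuoso rejects. The names `is_loadable`, `clean` and `clean_nt.py` are hypothetical.

```python
import re
import sys

LITERAL = re.compile(r'"(?:[^"\\]|\\.)*"')  # quoted literal, escapes allowed
IRI = re.compile(r'<([^>]*)>')  # angle-bracketed IRI
UCHAR = re.compile(r'\\u[0-9A-Fa-f]{4}|\\U[0-9A-Fa-f]{8}')  # legal escapes
# Unescaped characters not allowed in an N-Triples IRI
FORBIDDEN = set('<>"{}|^`\\') | {chr(c) for c in range(0x21)}
MAX_IRI = 1024  # assumed limit, taken from the question


def is_loadable(line: str) -> bool:
    """Heuristic: every IRI is short enough and free of forbidden characters."""
    for iri in IRI.findall(LITERAL.sub('""', line)):
        if len(iri) > MAX_IRI or FORBIDDEN & set(UCHAR.sub("", iri)):
            return False
    return True


def clean(lines, keep, reject) -> tuple[int, int]:
    """Send loadable, blank and comment lines to `keep`; send
    'lineno<TAB>line' for everything else to `reject`. Returns (kept, dropped)."""
    kept = dropped = 0
    for lineno, line in enumerate(lines, start=1):
        if not line.strip() or line.lstrip().startswith("#") or is_loadable(line):
            keep(line)
            kept += 1
        else:
            reject(f"{lineno}\t{line}")
            dropped += 1
    return kept, dropped


if __name__ == "__main__" and len(sys.argv) == 4:
    src, dst, rej = sys.argv[1:]
    # errors="replace" turns invalid UTF-8 bytes into U+FFFD instead of crashing
    with open(src, encoding="utf-8", errors="replace") as fin, \
         open(dst, "w", encoding="utf-8") as fout, \
         open(rej, "w", encoding="utf-8") as frej:
        kept, dropped = clean(fin, fout.write, frej.write)
    print(f"kept {kept} lines, dropped {dropped}", file=sys.stderr)
```

For example, `python clean_nt.py images_en.nt images_en.clean.nt images_en.rejects.tsv`, then load the cleaned file with the same `DB.DBA.TTLP_MT_LOCAL_FILE` call as before. Dropping lines loses those triples; percent-encoding the offending characters (a space becomes `%20`) would be an alternative if you need to keep them.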

Hmmm, well, DBpedia is actually run on Virtuoso, so the data may well be dumped out of Virtuoso, and even if not, they certainly manage to load it.

Have you tried asking on their mailing list (dbpedia-discussion@lists.sf.net)? They might be better placed to instruct you on how to get the DBpedia dumps loaded.

answered Feb 16 at 12:05

Rob Vesse ♦

Hi,

there are many problems in the dumps, so it's difficult to have a general method. If you are using the N-Triples (.nt) dumps, you can find some tips here:

http://blog.acaro.org/entry/dbpedia4neo

Hope this helps.

answered Feb 17 at 06:02

seralf

The following details the use of the Virtuoso RDF Bulk Loader scripts we use for loading DBpedia datasets ...

answered Feb 21 at 09:47

hwilliams_opl

Asked: Feb 15 at 21:29

Seen: 421 times

Last updated: Feb 21 at 09:47