I'd like to set up a local replica of DBpedia. The DBpedia 3.8 downloads page links to a lot of files in different languages and so forth. Essentially, I am looking for a list of URLs for NQuads-format files that, taken together, would constitute a DBpedia dump containing (at least) all of the data you could obtain by dereferencing DBpedia URIs as RDF.

The closest I could find to such a list was here, but I guess this might miss some dereferenceable information from the ontology/ and property/ namespaces.

Can anyone suggest a better list, or say which additional files the list above might miss? (My last resort would be to screen-scrape all .nq links from the homepage, which I'm too lazy to do and hope will not be needed.)

asked 18 Nov '12, 17:26


Signified ♦

edited 18 Nov '12, 17:26


They used to package it all up for us, a la http://downloads.dbpedia.org/3.7/all_languages.tar, but it looks as if they have stopped doing that: I don't see an all_languages.tar file for 3.8. DBpedia has always been such a moving target... sigh. Short of an all-inclusive tarball, you could go to the top of the dump directory (http://downloads.dbpedia.org/3.7/) and run a web crawler/downloader on it. I found a list of Java-based ones at http://java-source.net/open-source/crawlers, and Crawler4j claims you can be up and running in 5 minutes.
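If a full crawler feels like overkill, the link-gathering step can be sketched in a few lines of Python. This is only a rough sketch under assumptions: that the download directories are served as plain HTML index pages, and that the NQuads files can be recognised by `.nq` appearing in the file name (for example a compressed `.nq.bz2`). The names `LinkCollector` and `extract_links` are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in an HTML page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(html, base_url):
    """Split the links on a listing page into NQuads files and subdirectories.

    Assumption: NQuads files have ".nq" somewhere in their file name.
    """
    parser = LinkCollector()
    parser.feed(html)
    files, subdirs = [], []
    for href in parser.hrefs:
        url = urljoin(base_url, href)
        if not url.startswith(base_url) or url == base_url:
            continue  # skip parent-directory links and links to other sites
        if url.endswith("/"):
            subdirs.append(url)
        elif ".nq" in url.rsplit("/", 1)[-1]:
            files.append(url)
    return files, subdirs
```

To walk the whole tree, you would fetch each page (for instance with `urllib.request.urlopen`), call `extract_links` on its HTML, and repeat for every URL returned in `subdirs`.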


answered 19 Nov '12, 13:39


harschware ♦

Thanks! I ended up just sticking with the EN list here:

http://downloads.dbpedia.org/3.8/en/contents-nq.txt

This should be sufficient for the moment. Otherwise yep, I'll probably throw together some script to go through the folder structure.

... seems like they should have something on their end though. The organisation of those downloads has become extremely confusing.

(19 Nov '12, 14:14) Signified ♦

If you are attempting to replicate the public DBpedia endpoint, then you will want to see this: http://wiki.dbpedia.org/DatasetsLoaded

(19 Nov '12, 21:42) harschware ♦

Ah yes, that's pretty much what I was looking for! Thanks!

(19 Nov '12, 21:57) Signified ♦

I recently needed to do this myself, and created a Groovy script that helps download everything (well, actually it just creates a text file listing the files needed and then shows you the wget command to recursively grab it all): https://bitbucket.org/tharsch/dbpediamirror

(16 Jan, 14:43) harschware ♦
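For readers who want the general shape of that approach without Groovy, here is a minimal Python sketch of the same idea: write the file URLs to a text file, then build a wget command that reads from it. It is not the linked script and makes no claim about how that script works. It uses wget's `--input-file` option instead of a recursive fetch, and the function names and the `urls.txt` and `dump` paths are hypothetical.

```python
import shlex


def write_url_list(urls, list_path):
    """Write one URL per line, de-duplicated and sorted, and return the count."""
    unique = sorted(set(urls))
    with open(list_path, "w", encoding="utf-8") as handle:
        for url in unique:
            handle.write(url + "\n")
    return len(unique)


def wget_command(list_path, target_dir):
    """Build a wget command line that downloads every URL in the list file."""
    args = [
        "wget",
        "--continue",                        # resume partially downloaded files
        f"--input-file={list_path}",         # read the URLs from the text file
        f"--directory-prefix={target_dir}",  # save everything under this folder
    ]
    return shlex.join(args)
```

Printing `wget_command("urls.txt", "dump")` gives a command you can inspect before running it, which is handy given how large the dump files are.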