
Probably a very simple question: I have multiple RDF file dumps that I want to merge into a single file dump. What is the easiest way to merge multiple RDF files from the command line, preferably with libraptor?

I think there must be something more clever than serializing every file to N-Triples (with libraptor), concatenating the files, and converting the result to some other, more readable serialization.

I have searched through libraptor's man page but I haven't found how to use it with multiple file input.

asked Jan 25 '11 at 09:48

jindrichm


Simply concatenating N-Triples files does not work, because blank nodes must not be shared among different RDF graphs. The fastest solution is probably to serialize to N-Triples, rename the blank nodes, and keep only the unique N-Triples lines. I just wrote my own Perl script, mainly to import RDF data into a triple store. You can also import multiple RDF files into an in-memory store and serialize them afterwards. But other tools for merging RDF data should exist (?).
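The N-Triples approach described above (rename blank nodes per input, then keep unique lines) could look roughly like the following Python sketch. It is only an illustration, not an existing tool: the function names `rename_blank_nodes` and `merge_ntriples` and the `f<index>_` prefix scheme are made up here, and the inputs are assumed to be valid N-Triples already (for example, produced by `rapper -o ntriples`).

```python
import re
import sys

# IRIs and literals are matched first so that a "_:" occurring inside them
# is left alone; only the third alternative (a blank node label) is rewritten.
TOKEN = re.compile(r'<[^>]*>|"(?:[^"\\]|\\.)*"|_:([A-Za-z0-9_][A-Za-z0-9_.-]*)')


def rename_blank_nodes(line, prefix):
    """Prepend `prefix` to every blank node label in one N-Triples line."""
    def repl(match):
        label = match.group(1)
        if label is None:          # an IRI or a literal: keep unchanged
            return match.group(0)
        return "_:" + prefix + label
    return TOKEN.sub(repl, line)


def merge_ntriples(documents):
    """Merge N-Triples texts (one string per source file) into unique lines.

    Blank nodes get a per-document prefix (hypothetical scheme: f0_, f1_, ...)
    so that equal labels from different files do not collide.
    """
    seen = set()
    merged = []
    for index, text in enumerate(documents):
        for line in text.splitlines():
            line = line.strip()
            if not line or line.startswith("#"):   # skip blanks and comments
                continue
            line = rename_blank_nodes(line, f"f{index}_")
            if line not in seen:
                seen.add(line)
                merged.append(line)
    return merged


if __name__ == "__main__" and len(sys.argv) > 1:
    texts = []
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as handle:
            texts.append(handle.read())
    print("\n".join(merge_ntriples(texts)))
```

Saved under a name of your choice, the script would take the N-Triples files as arguments and print the merged result. Note that the duplicate check is a plain string comparison, so it only removes triples that are serialized identically, and triples containing blank nodes from different files are never merged, which is the intended behaviour.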

Edit: rapper could get an additional parameter -x to scramble blank node identifiers (e.g. by XOR-ing them with a GUID that is unique per input file), and it could be extended to work with multiple input files. You could then convert multiple files to N-Triples and pipe the output like this:

rapper -o ntriples -g -x -I 'file://' - LIST_OF_FILES | sort | uniq

Maybe David Beckett is willing to implement such a feature, or does someone else want to dig into the C code?

answered Jan 25 '11 at 12:20

Jakob

edited Jan 26 '11 at 09:09

Yes, simple concatenation works only if you are not using blank nodes. I have also written a small script that loads multiple RDF files into an in-memory RDF store and then serializes them, but I think there must be an easier solution.

(Jan 25 '11 at 12:51) jindrichm

What do you mean by "easier solution"? Writing or extending an N-Triples parser that replaces blank node IDs for each input file should not be that hard, and the N-Triples approach is surely faster than loading the RDF into any store. I am only surprised that no such tool exists yet. So an easy solution would be for someone to implement it in an existing, widespread RDF toolset ;-)

(Jan 25 '11 at 20:26) Jakob

By something "easier" I mean something that is already built. Meanwhile, I have already written a simple merge script with an in-memory store (in Python with RDFlib) myself. But I think it would be better if there were a command line tool such as libraptor or cwm that would be able to do it.

(Jan 26 '11 at 08:33) jindrichm

Is the "-x" option available only in newer versions of libraptor? I use version 1.4.21 and it reports that the option is invalid.

(Jan 26 '11 at 09:17) jindrichm

The "-x" option does not exist; it is just a suggestion to implement it in rapper instead of each of us writing our own script.

(Jan 26 '11 at 11:16) Jakob

Ahh, I misread your edit. :-)

(Jan 26 '11 at 16:26) jindrichm

There is rdfcat, which lives as an executable tool in the "bin/" folder of the Jena distribution.

answered Jan 26 '11 at 10:24

Michael Schneider

Thanks, this seems to be easy enough.

(Jan 26 '11 at 11:01) jindrichm

Are you happy with this? I always end up with a ClassNotFoundException whenever I give a Java application a try. Unless the software can be easily installed as a Debian package, I consider it unusable.

(Feb 10 '11 at 13:23) Jakob

Asked: Jan 25 '11 at 09:48

Seen: 998 times

Last updated: Jan 26 '11 at 10:24