Greetings, everyone! I'm new to the Semantic Web and am trying to use TDBLoader to load a 4 GB RDF/XML file.

[OS] Windows 7 - 64-bit
[Jena Version] 2.7.3
[Java Version] 1.6.0_24-b07, 64-bit
[TDBLoader Command] tdbloader.bat -loc bigRDF myRDF.xml
[TDBLoader Command Output]
................................
10:22:56 WARN    riot    :: {W108} Not an XML Name: '8961d5a3-2964-4373-b53d-02c9f2e764f8'
10:22:56 WARN    riot    :: {W108} Not an XML Name: '7ff4d865-1693-43ed-8a6e-368360006b05'
................................
10:23:02 WARN    riot    :: {W108} Not an XML Name: '1ef2dfba-30cc-4efa-b20b-05b45e979649'
10:23:02 INFO    loader    :: -- Finish triples data phase
10:23:02 INFO    loader    :: 58,064,426 triples loaded in 1,523.02 seconds [Rate: 38,124.51 per second]
10:23:02 INFO    loader    :: -- Start triples index phase
10:23:02 INFO    loader    :: Index SPO->POS: 100,000 slots (Batch: 203,665 slots/s / Avg: 203,665 slots/s)
................................
10:41:48 INFO    loader    :: ** Index SPO->OSP: 58,064,426 slots indexed in 837.14 seconds [Rate: 69,360.14 per second]
10:41:48 INFO    loader    :: -- Finish triples index phase
10:41:48 INFO    loader    :: ** 58,064,426 triples indexed in 1,126.68 seconds [Rate: 51,535.68 per second]
10:41:48 INFO    loader    :: -- Finish triples load
10:41:48 INFO    loader    :: ** Completed: 58,064,426 triples loaded in 2,649.71 seconds [Rate: 21,913.51 per second]

Questions:

  • Should I worry about the warnings? How can I get rid of the warnings?

  • The following files are created under the "bigRDF" folder: GOSP.dat GPOS.dat GSPO.dat OSP.dat OSPG.dat POS.dat POSG.dat SPO.dat SPOG.dat journal.jrnl node2id.idn prefix2id.dat prefixIdx.dat prefixes.dat GOSP.idn GPOS.idn GSPO.idn OSP.idn OSPG.idn POS.idn POSG.idn SPO.idn SPOG.idn node2id.dat nodes.dat prefix2id.idn prefixIdx.idn stats.opt and they are all binary. How would I query them? And where can I find documentation about them?

asked 16 Jan '13, 01:02 by Charles Li

edited 18 Jan '13, 05:29 by Rob Vesse ♦

Cross-posted to a Jena mailing list.

(16 Jan '13, 05:06) AndyS ♦

You can find information about the file formats in the Jena documentation; the overview of the TDB architecture should get you started. I don't think you need to understand the file format to use TDB, though. It is more important to understand RDF itself.

There are several ways to query the TDB store you created. The first, and probably the easiest for you, is tdbquery.bat (it should be in the same folder as tdbloader.bat):

tdbquery.bat --loc path\to\bigRDF --query file\containing\query.rq

tdbquery has more options; run it with --help to list them (there is not much other documentation). On Linux, you would use the tdbquery script in the bin/ folder of your Jena distribution instead.

Another option is to use the Jena Java API to query your triplestore. You could also make the triplestore available over HTTP using Fuseki, which would let you use a range of SPARQL GUIs, as suggested here.
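For the Fuseki route, any client only needs to speak the SPARQL 1.1 Protocol over HTTP: send the query as the `query` parameter of a GET request to the dataset's query endpoint. Below is a minimal plain-JDK Java sketch (no Jena jars needed), assuming Fuseki has been started against the bigRDF directory and publishes it as a dataset named /ds on its default port 3030. Both the dataset name and the port are assumptions; adjust them to your setup. The class name FusekiQuery and the helper queryUrl are illustrative only.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class FusekiQuery {

    // Builds a SPARQL 1.1 Protocol GET URL: <endpoint>?query=<form-encoded query>
    static String queryUrl(String endpoint, String sparql) {
        return endpoint + "?query=" + URLEncoder.encode(sparql, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Assumption: Fuseki serves the TDB store as /ds on localhost:3030.
        String endpoint = "http://localhost:3030/ds/query";
        String sparql = "SELECT * WHERE { ?s ?p ?o } LIMIT 10";

        HttpURLConnection conn = (HttpURLConnection)
                URI.create(queryUrl(endpoint, sparql)).toURL().openConnection();
        // Ask for the standard SPARQL JSON results format.
        conn.setRequestProperty("Accept", "application/sparql-results+json");
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line); // raw JSON result set
            }
        } finally {
            conn.disconnect();
        }
    }
}
```

GUI tools such as the dotNetRDF Store Manager do essentially the same thing: they talk to this HTTP endpoint rather than reading the TDB files directly.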

I do not know about the severity of the warnings.

answered 16 Jan '13, 03:33 by knut_

edited 16 Jan '13, 03:37

Thanks a lot for the help! I can now query the TDB store using the Jena Java API. However, when I tried the same query in dotNetRDFTools-0.72 and twinkle-2.0, they both complained that there was no file. Do they only accept files? I thought someone mentioned that a TDB-backed store could also be used.

Thanks again!

(17 Jan '13, 18:33) Charles Li

Use Jena Fuseki for that (start it with --loc) and access the data over HTTP. Those tools don't understand the TDB database files directly.

(18 Jan '13, 03:26) AndyS ♦

@Charles Li The dotNetRDF Store Manager tool can talk to Jena databases but only via the Fuseki web server - http://jena.apache.org/documentation/serving_data/

(18 Jan '13, 05:31) Rob Vesse ♦

The warnings are just warnings. They may indicate a problem in the data, but TDB will still work. Exporting the data to other applications may be problematic.
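As for what triggers them: every value flagged in the log ('8961d5a3-…', '7ff4d865-…', '1ef2dfba-…') is a UUID that begins with a digit, and the XML 1.0 `Name` production does not allow a digit as the first character. Presumably these come from rdf:ID or rdf:nodeID attributes in the source file; that is an assumption, since the RDF/XML itself isn't shown. (RDF/XML actually requires the stricter NCName form, with no colon, for those attributes; checking plain `Name` is enough to show the digit problem.) The plain-JDK Java sketch below checks a string against the XML 1.0 (Fifth Edition) `Name` grammar. The class name XmlNameCheck is illustrative, and the `_` prefix in `main` is a hypothetical way to make such IDs valid if you regenerate the data, not something TDB requires.

```java
public class XmlNameCheck {

    // NameStartChar from the XML 1.0 (Fifth Edition) grammar; note: no digits.
    static boolean isNameStartChar(int c) {
        return c == ':' || c == '_'
            || (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
            || (c >= 0xC0 && c <= 0xD6) || (c >= 0xD8 && c <= 0xF6)
            || (c >= 0xF8 && c <= 0x2FF) || (c >= 0x370 && c <= 0x37D)
            || (c >= 0x37F && c <= 0x1FFF) || (c >= 0x200C && c <= 0x200D)
            || (c >= 0x2070 && c <= 0x218F) || (c >= 0x2C00 && c <= 0x2FEF)
            || (c >= 0x3001 && c <= 0xD7FF) || (c >= 0xF900 && c <= 0xFDCF)
            || (c >= 0xFDF0 && c <= 0xFFFD) || (c >= 0x10000 && c <= 0xEFFFF);
    }

    // NameChar adds digits, '-', '.', U+00B7 and two combining ranges.
    static boolean isNameChar(int c) {
        return isNameStartChar(c) || c == '-' || c == '.'
            || (c >= '0' && c <= '9') || c == 0xB7
            || (c >= 0x300 && c <= 0x36F) || (c >= 0x203F && c <= 0x2040);
    }

    // True if s matches Name ::= NameStartChar (NameChar)*
    static boolean isXmlName(String s) {
        if (s.isEmpty()) return false;
        int first = s.codePointAt(0);
        if (!isNameStartChar(first)) return false;
        for (int i = Character.charCount(first); i < s.length(); ) {
            int c = s.codePointAt(i);
            if (!isNameChar(c)) return false;
            i += Character.charCount(c);
        }
        return true;
    }

    public static void main(String[] args) {
        String[] ids = {
            "8961d5a3-2964-4373-b53d-02c9f2e764f8",  // from the log: starts with a digit
            "_8961d5a3-2964-4373-b53d-02c9f2e764f8", // hypothetical fix: '_' prefix
        };
        for (String id : ids) {
            System.out.println(id + " -> " + (isXmlName(id) ? "valid" : "not an XML Name"));
        }
    }
}
```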

answered 16 Jan '13, 05:06 by AndyS ♦

question was seen: 1,355 times

last updated: 18 Jan '13, 05:31