Hello!
I have five different triplestores on my local hard drive that I need to dump to text files (N-Triples). I tried doing so using tdbdump on a Windows machine. For three of the triplestores this was not a problem. The other two give me the following exception:
com.hp.hpl.jena.tdb.base.file.FileException: ObjectFileStorage.read[nodes.dat](5712499)[filesize=115185505][file.size()=115185505]: Impossibly large object : 1768974624 bytes > filesize-(loc+SizeOfInt)=109473002
at com.hp.hpl.jena.tdb.base.objectfile.ObjectFileStorage.read(ObjectFileStorage.java:319)
at com.hp.hpl.jena.tdb.lib.NodeLib.fetchDecode(NodeLib.java:72)
at com.hp.hpl.jena.tdb.nodetable.NodeTableNative.readNodeFromTable(NodeTableNative.java:178)
at com.hp.hpl.jena.tdb.nodetable.NodeTableNative._retrieveNodeByNodeId(NodeTableNative.java:103)
at com.hp.hpl.jena.tdb.nodetable.NodeTableNative.getNodeForNodeId(NodeTableNative.java:74)
at com.hp.hpl.jena.tdb.nodetable.NodeTableCache._retrieveNodeByNodeId(NodeTableCache.java:103)
at com.hp.hpl.jena.tdb.nodetable.NodeTableCache.getNodeForNodeId(NodeTableCache.java:74)
at com.hp.hpl.jena.tdb.nodetable.NodeTableWrapper.getNodeForNodeId(NodeTableWrapper.java:55)
at com.hp.hpl.jena.tdb.nodetable.NodeTableInline.getNodeForNodeId(NodeTableInline.java:67)
at com.hp.hpl.jena.tdb.lib.TupleLib.triple(TupleLib.java:137)
at com.hp.hpl.jena.tdb.lib.TupleLib.triple(TupleLib.java:114)
at com.hp.hpl.jena.tdb.lib.TupleLib.access$000(TupleLib.java:45)
at com.hp.hpl.jena.tdb.lib.TupleLib$3.convert(TupleLib.java:76)
at com.hp.hpl.jena.tdb.lib.TupleLib$3.convert(TupleLib.java:72)
at org.openjena.atlas.iterator.Iter$4.next(Iter.java:301)
at org.openjena.atlas.iterator.Iter$4.next(Iter.java:301)
at org.openjena.atlas.iterator.Iter.next(Iter.java:828)
at org.openjena.atlas.iterator.IteratorCons.next(IteratorCons.java:89)
at org.openjena.atlas.iterator.Iter.sendToSink(Iter.java:572)
at org.openjena.riot.out.NQuadsWriter.write(NQuadsWriter.java:45)
at org.openjena.riot.out.NQuadsWriter.write(NQuadsWriter.java:37)
at org.openjena.riot.RiotWriter.writeNQuads(RiotWriter.java:41)
at tdb.tdbdump.exec(tdbdump.java:49)
at arq.cmdline.CmdMain.mainMethod(CmdMain.java:101)
at arq.cmdline.CmdMain.mainRun(CmdMain.java:63)
at arq.cmdline.CmdMain.mainRun(CmdMain.java:50)
at tdb.tdbdump.main(tdbdump.java:31)
The curious thing is that one of the triplestores that cause this exception is a lot smaller than the ones that could be dumped without any problems. The sizes of the ones that worked are 1.3 GB, 0.7 GB and 3.75 GB. The sizes of the ones that cause the problem are 0.6 GB and 6.7 GB.
I guess my problem is related to this issue. Due to poor programming on my end, the program was terminated a few times during population without properly closing the triplestore. The suggestion in the referenced issue, simply rebuilding the triplestore, would work in theory but is not desirable: the triples were collected over an API, and collecting them again would probably take over a week.
The mentioned issue also points out that it could be a bug in TDB versions before 0.9, but I am using 0.9.3. I also did not use concurrent access (unless I started the triple collection program twice, which I am pretty certain did not happen).
So is there anything else that could be done? I already tried running tdbrecovery, which didn't help. I also tried iterating over the triples in Java, which caused the same exception. My first, probably very naive, approach was to iterate over all the triples in the model. Afterwards I tried to reduce the object size by iterating over the subjects and, for each subject, iterating over its statements.
Any help would be highly appreciated!
asked
21 Nov '12, 19:50
knut_
This is quite a specialist question. A few Jena devs and contributors hang out around here, but have you tried the mailing list?
http://jena.apache.org/help_and_support/index.html
You might get a faster response and you'll probably get more expert eyeballs on your question through that.
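If you do post there, it may be worth including a quick sanity check of the numbers in the exception message. The sketch below uses only values copied from that message. It recomputes the `filesize-(loc+SizeOfInt)` bound and reinterprets the reported object length as four ASCII bytes. The class and method names are made up for illustration, and the big-endian byte order is an assumption about how the length prefix is read, not something confirmed from the TDB source.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class LengthFieldCheck {
    // Values copied verbatim from the exception message.
    static final long LOC = 5712499L;              // offset into nodes.dat
    static final long FILE_SIZE = 115185505L;      // filesize
    static final int REPORTED_LENGTH = 1768974624; // "impossibly large" object size
    static final int SIZE_OF_INT = 4;              // width of the length prefix

    /** Bytes left in the file after the length prefix at LOC. */
    static long remainingBytes() {
        return FILE_SIZE - (LOC + SIZE_OF_INT);
    }

    /** The reported length as four ASCII characters (big-endian is an assumption). */
    static String lengthAsAscii() {
        byte[] raw = ByteBuffer.allocate(4).putInt(REPORTED_LENGTH).array();
        return new String(raw, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println("remaining = " + remainingBytes());
        System.out.println("hex       = 0x" + Integer.toHexString(REPORTED_LENGTH));
        System.out.println("ascii     = \"" + lengthAsAscii() + "\"");
    }
}
```

The remaining-bytes figure matches the 109473002 in the message. The reported length is 0x69706920, which reads as the text "ipi " when taken as ASCII. That would be consistent with the node id pointing into the middle of string data rather than at a real length prefix, though it is only a hint and does not say how the file got into that state.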
In any case, I wish you luck. Losing data like that sucks.