I have 5 different triplestores on my local hard drive which I needed to dump to text files (N-Triples). I tried doing so using tdbdump on a Windows machine. For 3 of the triplestores this was not a problem. The other two give me the following exception:
The curious thing is that one of the triplestores that causes this exception is a lot smaller than the ones that could be dumped without any problems. The sizes of the ones that worked are 1.3 GB, 0.7 GB, and 3.75 GB. The sizes of the ones that cause the problem are 0.6 GB and 6.7 GB.
I guess my problem is related to this issue. Due to poor programming on my end, the program was terminated a few times without properly closing the triplestore while it was being populated. The suggestion in the referenced issue of simply rebuilding the triplestore would work in theory, but is not desirable, since the triples were collected over an API and rebuilding would probably take over a week.
The mentioned issue also points out that it could be a bug in TDB versions before 0.9, but I am using 0.9.3. Also, I did not use concurrent access (unless I accidentally started the triple-collection program twice, which I am fairly certain did not happen).
So is there anything else that could be done? I already tried running tdbrecovery, which didn't help. I also tried iterating over the triples using Java, which caused the same problem. My probably very naive first approach was to iterate over all the triples in the model; afterwards I tried to reduce the object size by iterating over the subjects and, for each subject, iterating over its statements.
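For reference, the second (per-subject) approach looked roughly like the sketch below. The directory name "DB1" is a placeholder for the actual triplestore path:

```java
import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.ResIterator;
import com.hp.hpl.jena.rdf.model.Resource;
import com.hp.hpl.jena.rdf.model.StmtIterator;
import com.hp.hpl.jena.tdb.TDBFactory;

public class DumpBySubject {
    public static void main(String[] args) {
        // "DB1" is a placeholder for the TDB directory on disk
        Model model = TDBFactory.createDataset("DB1").getDefaultModel();
        ResIterator subjects = model.listSubjects();
        while (subjects.hasNext()) {
            Resource subject = subjects.next();
            // iterate only the statements of this one subject
            StmtIterator stmts = subject.listProperties();
            while (stmts.hasNext()) {
                System.out.println(stmts.next());
            }
        }
    }
}
```

Both variants fail with the same exception as tdbdump, on the same two stores.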
Any help would be highly appreciated!
This size test is usually triggered by a corrupt node table in the database. You don't have large objects; rather, the node table is broken and data has overwritten a length field. This is a data-write-time problem, even though it shows up later, at read time.
Not using transactions and crashing the JVM while the data was being written is a possible cause. The pre-/post-0.9 version comment refers to the use of transactions. It looks like, at some point in the past, you were not using transactions and exited the JVM without syncing the caches.
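For future loads, wrapping the writes in a TDB transaction makes a JVM crash survivable, because committed data is durable. A minimal sketch (the directory "DB1" is a placeholder):

```java
import com.hp.hpl.jena.query.Dataset;
import com.hp.hpl.jena.query.ReadWrite;
import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.tdb.TDBFactory;

public class TransactionalLoad {
    public static void main(String[] args) {
        // "DB1" is a placeholder for the TDB directory on disk
        Dataset dataset = TDBFactory.createDataset("DB1");
        dataset.begin(ReadWrite.WRITE);
        try {
            Model model = dataset.getDefaultModel();
            // ... add the triples fetched from the API to the model here ...
            dataset.commit(); // committed data is durable even if the JVM dies afterwards
        } finally {
            dataset.end();    // always release the transaction
        }
        dataset.close();
    }
}
```

Committing in reasonably sized batches (one transaction per API page, say) keeps the amount of work lost in a crash small.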
Version 0.9.4 is safer again and has had some systematic testing for crash situations. It fixed some cases of a bad restore after a crash, although these involved the indexes, not the node table. I'm afraid this does not help you directly.
With transactions, you can take a backup of a live database by simply starting a read transaction and writing the triples out. tdbdump, being a separate JVM, requires exclusive access to the database.
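That in-process backup can be sketched as follows; the read transaction gives a consistent snapshot while the database stays live (directory and file names are placeholders):

```java
import java.io.FileOutputStream;
import java.io.OutputStream;

import com.hp.hpl.jena.query.Dataset;
import com.hp.hpl.jena.query.ReadWrite;
import com.hp.hpl.jena.tdb.TDBFactory;

public class LiveBackup {
    public static void main(String[] args) throws Exception {
        // "DB1" is a placeholder for the TDB directory on disk
        Dataset dataset = TDBFactory.createDataset("DB1");
        dataset.begin(ReadWrite.READ); // consistent view; writers are not blocked out permanently
        try (OutputStream out = new FileOutputStream("backup.nt")) {
            // serialize the default graph as N-Triples
            dataset.getDefaultModel().write(out, "N-TRIPLES");
        } finally {
            dataset.end();
        }
    }
}
```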
answered 22 Nov '12, 03:52