|
If I wanted to do semantic web application development in [some obscure language] and nothing were currently available, where would I go to find out how triple stores work and how to build a production-quality one in my obscure language? What API standards are out there that I ought to adhere to? Specifically, how do I go about indexing triples efficiently in terms of space and search time? Are B-tree derivative data structures what I should look at, or is something else better? What optimisations are known for compaction and for speeding up, say, data retrieval to support reasoners and SPARQL queries? |
|
Thomas Neumann and Gerhard Weikum put a lot of thought into this question while developing their RDF-3X engine. The following document includes performance tests against multiple data sets, details on index optimizations, triple compression, etc.:
link
|
|
I recently wrote an article about the way we implemented a triple store at our company, Procurios. I go into the low-level details of implementing it in a relational database: Semantic web marvels in a relational database - part I: Case Study 1
Hi Patrick, thanks for the links. Although I am specifically interested in the design of a dedicated triple store, I'm also very interested in how your approach compares to such a solution in terms of space and time complexity for a standard query benchmark. Do you have such comparison figures yet? Also, just out of interest, what was your implementation language? Have you tried to use this store with an inference engine? How did it perform for that sort of use?
Hi Andrew. PHP is our implementation language. Since we don't use a semantic web query language, it's hard to compare performance with other stores. The article is just meant to give you some ideas. My advice is to set up several test situations and gather your own performance statistics. |
|
A search on Google brought up the paper "Design and implementation of an RDF Triple Store". If you prefer reading source code, you may want to take a look at the sources of TDB, which is the native storage engine in Jena, or at the sources of Sesame. Update: Another place where you could learn something about the topic is the BigData project. They write a lot about technical details in their blog, and it's open source, so you can read the sources as well.
I read the paper. Not very impressive. It does score points in the (some obscure language) stakes though. );^}> |
|
You might have a look at a recent paper on dipLODocus. Here's the introduction:
|
|
http://www.openlinksw.com/weblog/oerling I've found it hard to follow at times, but you get the idea that Orri has thought a lot about it. There have been some academic(-ish) papers here and there. I recently read a good one about a distributed triple store. I think it was about 4store(.org), but I can't remember where I found it. Does anyone else know? Otherwise, you probably have to ping the people who have built triple stores for ideas. For instance, in the SemWeb.NET [1] triple store that I built, I found that a simple MySQL structure [2] worked well enough to scale to 1B triples, though it was very space-hungry because of its many indexes.
[1] http://razor.occams.info/code/semweb/
[2] http://razor.occams.info/code/repo/?/semweb/src/SQLStore.cs |
|
Intellidimension uses a rather intuitive solution (maybe others, too) and they said it has a major impact on performance. They maintain two triple tables:
The big idea is that the second table is not only truly fixed-width (the content of TEXT-type cells is stored outside of the table, which has a rather bad effect) but also very narrow. They claim that the width of a SQL table strongly influences performance. Before executing a query, they calculate the MD5 values of the strings in the query and run the query with the hashed values against Table 2. This way, they find the matching rows efficiently and then retrieve the real values from Table 1. |
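To make the two-table idea concrete, here is a minimal sketch in Python using the standard `sqlite3` and `hashlib` modules. It is not Intellidimension's actual schema: the table names (`triples_text`, `triples_hash`), the column names, and the helper functions are all assumptions made for illustration, and SQLite is only a stand-in for whatever SQL database you use (it does not enforce `CHAR(32)` widths, for instance).

```python
import hashlib
import sqlite3


def md5_hex(value: str) -> str:
    """Return the fixed-width (32 hex characters) MD5 digest of a term."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()


def create_store() -> sqlite3.Connection:
    """Create an in-memory database with the two hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Assumed "Table 1": the real, variable-width values.
        CREATE TABLE triples_text (
            id INTEGER PRIMARY KEY,
            subject TEXT, predicate TEXT, object TEXT
        );
        -- Assumed "Table 2": narrow rows holding only hashes.
        CREATE TABLE triples_hash (
            id INTEGER PRIMARY KEY REFERENCES triples_text(id),
            s_hash CHAR(32), p_hash CHAR(32), o_hash CHAR(32)
        );
        CREATE INDEX idx_spo ON triples_hash (s_hash, p_hash, o_hash);
        """
    )
    return conn


def add_triple(conn, s, p, o):
    """Store the real values in one table and their hashes in the other."""
    cur = conn.execute(
        "INSERT INTO triples_text (subject, predicate, object) VALUES (?, ?, ?)",
        (s, p, o),
    )
    conn.execute(
        "INSERT INTO triples_hash (id, s_hash, p_hash, o_hash) VALUES (?, ?, ?, ?)",
        (cur.lastrowid, md5_hex(s), md5_hex(p), md5_hex(o)),
    )


def match(conn, s=None, p=None, o=None):
    """Filter on the hash table, then fetch real values; None is a wildcard."""
    clauses, params = [], []
    for column, term in (("s_hash", s), ("p_hash", p), ("o_hash", o)):
        if term is not None:
            clauses.append(f"h.{column} = ?")
            params.append(md5_hex(term))
    where = " AND ".join(clauses) if clauses else "1"
    sql = (
        "SELECT t.subject, t.predicate, t.object "
        "FROM triples_hash h JOIN triples_text t ON t.id = h.id "
        f"WHERE {where} ORDER BY t.id"
    )
    return conn.execute(sql, params).fetchall()
```

For example, after `add_triple(conn, "ex:alice", "foaf:knows", "ex:bob")`, calling `match(conn, s="ex:alice")` hashes the query term, filters on the narrow table, and joins back to the text table for the readable values. A real store would add more indexes (for predicate-first and object-first lookups) and would need a policy for hash collisions.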
|
Get 'Programming the Semantic Web' by Toby Segaran et al.: http://shop.oreilly.com/product/9780596153823.do The authors build a simple triple store in Python and explain how it works. Then they use RDFLib and SQLite before moving on to Sesame, Jena, etc. |
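To give a feel for what a simple in-memory triple store can look like, here is a rough sketch in Python. It is not the book's code: the class name `SimpleTripleStore` and its methods are made up for illustration. The idea shown is keeping three nested-dictionary indexes (subject-predicate-object, predicate-object-subject, object-subject-predicate) so that any pattern with wildcards can be answered from an index rather than by scanning every triple.

```python
from collections import defaultdict


class SimpleTripleStore:
    """Hypothetical in-memory store with three rotated indexes."""

    def __init__(self):
        self._spo = defaultdict(lambda: defaultdict(set))
        self._pos = defaultdict(lambda: defaultdict(set))
        self._osp = defaultdict(lambda: defaultdict(set))

    def add(self, s, p, o):
        """Insert one triple into all three indexes."""
        self._spo[s][p].add(o)
        self._pos[p][o].add(s)
        self._osp[o][s].add(p)

    def triples(self, s=None, p=None, o=None):
        """Yield triples matching the pattern; None acts as a wildcard."""
        if s is not None:
            if p is not None:
                objs = self._spo.get(s, {}).get(p, set())
                for obj in (objs if o is None else objs & {o}):
                    yield (s, p, obj)
            elif o is not None:
                for pred in self._osp.get(o, {}).get(s, set()):
                    yield (s, pred, o)
            else:
                for pred, objs in self._spo.get(s, {}).items():
                    for obj in objs:
                        yield (s, pred, obj)
        elif p is not None:
            if o is not None:
                for subj in self._pos.get(p, {}).get(o, set()):
                    yield (subj, p, o)
            else:
                for obj, subjs in self._pos.get(p, {}).items():
                    for subj in subjs:
                        yield (subj, p, obj)
        elif o is not None:
            for subj, preds in self._osp.get(o, {}).items():
                for pred in preds:
                    yield (subj, pred, o)
        else:
            # No term bound: fall back to a full scan of one index.
            for subj, preds in self._spo.items():
                for pred, objs in preds.items():
                    for obj in objs:
                        yield (subj, pred, obj)
```

Usage would look like `store = SimpleTripleStore()`, `store.add("ex:alice", "foaf:knows", "ex:bob")`, then `list(store.triples(p="foaf:knows"))`. The trade-off is memory: every triple is stored three times, which is the same space-for-speed bargain that disk-based stores make with multiple indexes.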

