To keep a complete log of changes, our project aims to avoid explicit deletion or modification of triples in our store. Instead, we would like to represent every change as an "add" operation with an associated timestamp and author, similar to distributed version control systems, which store a list of change sets. This gets rather tricky when we want to delete or modify something: the triples to be deleted should show up as additions labelled "delete", and queries should be evaluated against the latest state.
On the backend, we plan to use Sesame with named graphs, where each change set is represented as a separate named graph. But how could we run SPARQL queries over a list of change sets (i.e., the named graphs)?
Should we redundantly maintain both the change sets and the always-latest evaluation in parallel, the first for integrity and the second for queryability? Or are there other alternatives?
If you have experience on the subject, please share it with us!
asked 23 Mar '10, 11:44
I would definitely keep the current data redundantly!
You could of course theoretically implement something in Sesame that evaluates queries based on the change set history, but I expect that to be awfully slow, and the longer your list of change sets gets, the slower it will be.
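To illustrate why history-based evaluation slows down, here is a minimal sketch in plain Python (not Sesame code). It assumes the log is a list of hypothetical `(timestamp, author, op, triple)` entries, where `op` is either "add" or "delete"; these names are made up for the example. Every query has to fold the whole log first, so the cost grows with the length of the history.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]
# Hypothetical log entry: (timestamp, author, op, triple), op is "add" or "delete".
LogEntry = Tuple[int, str, str, Triple]


def latest_state(log: Iterable[LogEntry]) -> Set[Triple]:
    """Fold the append-only log into the set of triples that currently hold."""
    state: Set[Triple] = set()
    # Process entries in timestamp order; sorted() is stable, so entries
    # with equal timestamps keep their original log order.
    for _timestamp, _author, op, triple in sorted(log, key=lambda e: e[0]):
        if op == "add":
            state.add(triple)
        elif op == "delete":
            state.discard(triple)
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return state


def triples_about(log: Iterable[LogEntry], subject: str) -> Set[Triple]:
    """Answer a trivial query; note that it replays the entire log first."""
    return {t for t in latest_state(log) if t[0] == subject}


log = [
    (1, "alice", "add", ("ex:doc1", "ex:title", "Draft")),
    (2, "bob", "delete", ("ex:doc1", "ex:title", "Draft")),
    (3, "bob", "add", ("ex:doc1", "ex:title", "Final")),
]
current = triples_about(log, "ex:doc1")
```

Caching the result of `latest_state` and updating it on every new entry is effectively the same as keeping the current data redundantly.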
I had a similar setup for a project where I organised the store like this:
The main data set with the current state of the data is stored in the default graph and can thus be queried easily. Every change to the data potentially creates three new graphs: one holding the description of the change set, one holding the triples added by the change set, and one holding the triples deleted by the change set.
The description of the change set could be done with the change set vocabulary, but my requirements were a bit different, so I created my own: it doesn't exactly record change sets but user actions ("edits") that have to go through a review process and can be accepted, discarded or fail for other reasons (other edits, for example). This design was inspired by MusicBrainz's moderation system. If you're interested in the details, feel free to take a look at the ontology. It also points at graphs holding the changes, instead of using reified triples like the change set vocabulary does.
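The layout can be sketched roughly as follows. This is an in-memory model in plain Python rather than Sesame code, and the graph names (`urn:example:change/...`) and the `ex:` properties are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]


@dataclass
class Store:
    # Current state of the data, queried directly.
    default_graph: Set[Triple] = field(default_factory=set)
    # Graph name -> triples; holds change descriptions and their deltas.
    named_graphs: Dict[str, Set[Triple]] = field(default_factory=dict)


def record_change(store: Store, change_id: str, author: str,
                  added: Iterable[Triple], deleted: Iterable[Triple]) -> str:
    """Record a change as up to three named graphs; the default graph is untouched."""
    base = f"urn:example:change/{change_id}"  # hypothetical naming scheme
    added, deleted = set(added), set(deleted)
    description = {(base, "ex:author", author)}
    # The "added" and "deleted" graphs are only created when there is
    # something to put in them, hence "potentially" three graphs.
    if added:
        store.named_graphs[base + "/added"] = added
        description.add((base, "ex:additions", base + "/added"))
    if deleted:
        store.named_graphs[base + "/deleted"] = deleted
        description.add((base, "ex:deletions", base + "/deleted"))
    store.named_graphs[base + "/description"] = description
    return base


store = Store(default_graph={("ex:doc1", "ex:title", "Draft")})
change = record_change(
    store, "1", "alice",
    added=[("ex:doc1", "ex:title", "Final")],
    deleted=[("ex:doc1", "ex:title", "Draft")],
)
```

The description graph points at the graphs holding the deltas, so the history can be inspected without touching the default graph.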
Whenever the status of such an edit is changed to "applied" (reflected in its description, which is also saved in the store), its changes are applied to the main data set. You therefore always have an up-to-date and easily queryable view of the data, as well as an accurate record of the history, which you could potentially rewind and play back (if you keep track of the order in which the edits were applied) or revert, as in a wiki or versioning system.
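A minimal sketch of applying and reverting edits, again in plain Python and with made-up field names (`status`, `added`, `deleted`). It assumes each edit only deletes triples that were present and only adds triples that were absent, and that edits are reverted in the reverse of the order in which they were applied; otherwise a revert is not exact.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]


def apply_edit(current: Set[Triple], edit: Dict) -> bool:
    """Apply an edit to the main data set in place, only if it is 'applied'."""
    if edit["status"] != "applied":
        return False  # pending, discarded, or failed edits leave the data alone
    current -= edit["deleted"]
    current |= edit["added"]
    return True


def revert_edit(current: Set[Triple], edit: Dict) -> None:
    """Undo an applied edit by swapping the roles of its two deltas."""
    current -= edit["added"]
    current |= edit["deleted"]


rename = {
    "status": "applied",
    "added": {("ex:a", "ex:name", "Alicia")},
    "deleted": {("ex:a", "ex:name", "Alice")},
}
rejected = {
    "status": "discarded",
    "added": {("ex:b", "ex:name", "Bob")},
    "deleted": set(),
}

main = {("ex:a", "ex:name", "Alice")}
applied_rename = apply_edit(main, rename)
applied_rejected = apply_edit(main, rejected)
after_apply = set(main)   # snapshot of the state after both calls
revert_edit(main, rename)
after_revert = set(main)  # snapshot after undoing the rename
```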
PS: Also see this related question.