
I keep an OWL2 ontology in a git repository, and I've noticed that every time I edit and save the file in Protege, there's some non-deterministic behavior.

For example, OWL restrictions appear to be serialized to RDF/XML in an arbitrary order each time I save, so a very large percentage of lines in my .owl file change even when I make a tiny edit. This makes it impossible to tell what has happened using git diff, for instance.

Could anyone share or suggest approaches to using Protege with version control? In an ideal world I'd love to be able to make changes in multiple branches and merge -- but short of that, I'd at least like to be able to tell what's changed :-)

Thanks!

asked 13 Dec '11, 14:28

JoshM

edited 13 Dec '11, 15:00


This question of mine is very similar to yours. Thank you for pinpointing the problem.

One probably should not worry too much about the internals of version control systems. I have managed to put all versions of the NCI Thesaurus (about 8 GB of RDF/XML) into a Mercurial repository, and the whole repository (the .hg folder) is only 112 MB. Git, on the other hand, stores complete versions rather than diffs (a Git repository with the NCI Thesaurus takes about 1 GB).

The real problem, as you mentioned, is telling what has changed and performing a three-way merge. I did some research, and the only tool for diffing OWL ontologies (as opposed to RDF graphs) that I found was OWLDiff. It has no scripts for integrating with version control systems, although it would be possible to write them. Another shortcoming of OWLDiff is that it compares only the logical content, i.e. the axioms, and ignores changes to namespace prefixes, imports, ontology format, etc.
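To make the distinction concrete, here is a minimal Python sketch (not OWLDiff's or ontovcs's actual implementation) of a diff that compares axioms as sets and also reports prefix and import changes. It assumes each version has already been parsed into a hypothetical `OntologySnapshot` holding a prefix map, a set of import IRIs and a set of axioms rendered as normalized strings; a real tool would obtain these from the OWL API.

```python
from dataclasses import dataclass, field


@dataclass
class OntologySnapshot:
    """Hypothetical in-memory view of one ontology version.

    Axioms are assumed to be normalized strings; a real tool would use
    OWL API axiom objects instead.
    """
    prefixes: dict[str, str] = field(default_factory=dict)
    imports: set[str] = field(default_factory=set)
    axioms: set[str] = field(default_factory=set)


def diff_ontologies(old: OntologySnapshot, new: OntologySnapshot) -> list[str]:
    """Return a sorted, human-readable change list between two versions."""
    changes: list[str] = []
    # Logical content: set difference ignores serialization order entirely.
    changes += [f"+ Axiom: {a}" for a in sorted(new.axioms - old.axioms)]
    changes += [f"- Axiom: {a}" for a in sorted(old.axioms - new.axioms)]
    # Non-logical content that an axiom-only comparison would miss.
    changes += [f"+ Import: {i}" for i in sorted(new.imports - old.imports)]
    changes += [f"- Import: {i}" for i in sorted(old.imports - new.imports)]
    # Prefixes can be added, removed, or rebound to another namespace.
    for name in sorted(old.prefixes.keys() | new.prefixes.keys()):
        before, after = old.prefixes.get(name), new.prefixes.get(name)
        if before == after:
            continue
        if before is None:
            changes.append(f"+ Prefix: {name}: <{after}>")
        elif after is None:
            changes.append(f"- Prefix: {name}: <{before}>")
        else:
            changes.append(f"* Prefix: {name}: <{before}> -> <{after}>")
    return changes
```

Because axioms are compared as sets, saving the same ontology twice with a different internal ordering yields an empty change list.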

So I had to develop a tool to tame version control systems for ontology development. You can download it at http://code.google.com/p/ontovcs. It is still in beta and contains some flaws (especially the three-way merge tool), and I would appreciate any feedback, positive or negative.
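At the axiom level, a three-way merge can be sketched with plain set operations. This is a simplified illustration, assuming axioms are compared as normalized strings; it is not how ontovcs actually merges. Anything added on either branch is kept, and anything deleted on either branch relative to the common base is dropped.

```python
def merge_axiom_sets(base: set[str], ours: set[str], theirs: set[str]) -> set[str]:
    """Three-way merge of axiom sets against their common ancestor `base`."""
    # Axioms deleted on either branch (present in base, missing on that side).
    deleted = (base - ours) | (base - theirs)
    # Keep everything either branch has, minus anything a branch deleted.
    # Axioms newly added on a branch are never in `base`, so they survive.
    return (ours | theirs) - deleted
```

For example, with base {A, B}, ours {A, C} and theirs {A, B, D}, the result is {A, C, D}. Since sets have no order, there are no textual conflicts in this model; a semantic conflict (for example, a merge that makes the ontology inconsistent) would still need a reasoner to detect.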


UPDATE

I have started rewriting the tool so that it better matches the OWL API.

You can find the latest version of owl2vcs at https://github.com/utapyngo/owl2vcs.


answered 14 Dec '11, 02:03

utapyngo

edited 04 Jan '13, 08:44


@utapyngo I love the diff functionality -- very simple, very meaningful output computed on the real-life snapshots I fed in. I haven't played with merge yet.

(14 Dec '11, 21:23) JoshM

In terms of serializing output in a deterministic way, TopBraid Composer currently offers a Sorted Turtle option and some internal tools to find the graph-based differences in triples. The next version of the tool (TBC 3.6 in January) will support two serialization formats for version control systems, Sorted Turtle and c14n.
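One obstacle to deterministic output for OWL is that, once the file is parsed into triples, each restriction becomes a blank node with an arbitrary, parser-generated label, so sorting triples alone is not enough. As a rough Python sketch of the idea behind canonical forms (not TopBraid's algorithm and not a standard canonicalization), the code below assumes triples are tuples of N-Triples term strings and relabels each blank node with a hash of its immediate neighbourhood. It deliberately gives up on blank nodes it cannot tell apart rather than merging them.

```python
import hashlib

Triple = tuple[str, str, str]  # N-Triples terms, e.g. "<http://ex.org/Dog>", "_:b0"


def is_blank(term: str) -> bool:
    return term.startswith("_:")


def mask(term: str) -> str:
    # Hide neighbouring blank node labels, since those are arbitrary too.
    return "_:" if is_blank(term) else term


def relabel_blank_nodes(triples: set[Triple]) -> set[Triple]:
    """Replace arbitrary blank node labels with labels derived from content.

    A blank node's signature is its outgoing (predicate, object) and incoming
    (subject, predicate) pairs, with other blank nodes masked. That suffices
    for flat OWL restrictions, but not for arbitrarily nested structures.
    """
    signatures: dict[str, list[str]] = {}
    for s, p, o in triples:
        if is_blank(s):
            signatures.setdefault(s, []).append(f"out {p} {mask(o)}")
        if is_blank(o):
            signatures.setdefault(o, []).append(f"in {mask(s)} {p}")

    new_label: dict[str, str] = {}
    used: set[str] = set()
    for node in sorted(signatures):
        text = "\n".join(sorted(signatures[node]))
        label = "_:h" + hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]
        if label in used:
            # Two blank nodes look identical at this depth; refuse rather
            # than silently merging them into one node.
            raise ValueError(f"cannot distinguish blank node {node}")
        used.add(label)
        new_label[node] = label

    return {(new_label.get(s, s), p, new_label.get(o, o)) for s, p, o in triples}
```

Two saves of the same restrictions, parsed with different blank node labels, then produce identical triple sets, which can be sorted into stable output.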


answered 20 Dec '11, 10:21

scotthenninger ♦

An RDF file serializes a graph. Diffing a graph is fundamentally much more difficult than diffing a line-based text file.
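A small Python illustration of the gap, assuming each file is N-Triples-like text with one complete triple per line and no blank nodes: merely reordering the lines produces a non-empty line diff, while comparing the files as sets of triples reports nothing. The set comparison is only a naive graph diff; with blank nodes the labels would first have to be matched up between versions, which is where it gets genuinely hard.

```python
import difflib


def line_diff_size(old_text: str, new_text: str) -> int:
    """Count added/removed lines, roughly what a line-based `git diff` shows."""
    diff = difflib.ndiff(old_text.splitlines(), new_text.splitlines())
    return sum(1 for line in diff if line.startswith(("- ", "+ ")))


def triple_set_diff(old_text: str, new_text: str) -> tuple[set[str], set[str]]:
    """Naive graph diff: (added triples, removed triples), ignoring order."""
    old = {line.strip() for line in old_text.splitlines() if line.strip()}
    new = {line.strip() for line in new_text.splitlines() if line.strip()}
    return new - old, old - new
```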

I don't think there's much you can do here really, except ditching Protégé and editing your files by hand in Turtle (and that's not a totally insane suggestion).


answered 13 Dec '11, 15:32

cygri ♦

Fair enough, @cygri :-) For small edits I do just use vim, and that works well enough. There are efforts out there to provide normalized representations of a graph, and I thought perhaps someone might have applied these techniques to the 'problem' of saving ontologies.

(13 Dec '11, 16:27) JoshM

Protégé could do a better job of serializing its output; it could produce more deterministic output if it sorted triples by ?s and then by ?p, for instance.
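A minimal Python sketch of that suggestion, assuming the triples are already available as tuples of N-Triples terms with stable blank node labels: sort by subject, then predicate (and object, to break ties), and write one triple per line. The output then no longer depends on the order in which the editor happens to hold the triples, so a line-based `git diff` shows only real changes. This illustrates the idea; it is not how Protégé or any particular serializer works.

```python
Triple = tuple[str, str, str]  # (subject, predicate, object) as N-Triples terms


def serialize_sorted(triples: list[Triple]) -> str:
    """Deterministic N-Triples: one triple per line, sorted by ?s, ?p, then ?o."""
    # Tuples compare element by element, so sorting orders by subject first,
    # then predicate, then object; set() drops duplicate triples.
    unique = sorted(set(triples))
    return "".join(f"{s} {p} {o} .\n" for s, p, o in unique)
```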

(15 Dec '11, 18:28) database_animal ♦

If you don't care much about expressivity, you could use obo-format (http://oboformat.org), which has a deterministic serialization. You can see examples of diffs here: http://viewvc.geneontology.org/viewvc/GO-SVN/trunk/ontology/editors/gene_ontology_write.obo?view=log

This answer is 75% tongue-in-cheek, as obo-format is deprecated and everyone involved with it is urging users to switch to OWL. One of the main obstacles to switching is the diffs.

ontovcs is great; you should upvote @utapyngo. It would be great to have some way of getting this into web SVN views.

I do think that ontologies should be treated like source code; however, I don't think a new VCS stack should be developed to fit the IDEs. Rather, the IDEs should adapt to the stack.


answered 16 Jul '12, 12:27

Chris Mungall

Thank you @Chris. I am currently working on a plugin for Redmine to replace the internal diff view for ontologies and allow searching changes by OWL entities.

(16 Jul '12, 21:05) utapyngo

Asked: 13 Dec '11, 14:28

Seen: 2,458 times

Last updated: 04 Jan '13, 08:44