Dear all,

In a new project we are going to design a domain ontology and a triple store, and some advice would be welcome.

The data will be created, updated, and fully controlled within the project, with some links to external datasets such as GeoNames. The data will almost certainly not be published as linked data.

Given that the RDF data in this project lives in a kind of closed world, some features of RDFS or OWL that are certainly useful in many contexts may not be useful here, and might even make things worse in the present case.

The first question concerns property range/domain and inverse properties. Given that the T-Box and A-Box are "under control", and so are the SPARQL queries, and given the other important fact that the triple store will handle a lot of triples and user queries (so performance is a key point of the solution), I would tend to say that we should avoid property range/domain and inverse properties, which would only lead to an increasing number of useless triples. Am I wrong? Even in this context, are there good reasons to use those features?

The second main question concerns the classical idea that we should reuse existing ontologies as much as possible. In our case, Dublin Core could express only a small part of the model (we would use only two or three Dublin Core metadata terms such as Title, Creator, etc.), and we will thus have to create a domain ontology anyway. In this context, I would ask: why use Dublin Core instead of just creating exactly what we need, according to the reasoner we will use and the rules we will carefully select?
So is there any good reason here to use Dublin Core or even FOAF for data that will not be published? If I remember correctly, FOAF is OWL Full and would thus only bring trouble; couldn't a similar problem arise with other ontologies such as Dublin Core?

And if the data are published some day, it seems it would never be too late to generate a new, 'linked-data-friendly' dataset, no?

[Edit after Jerven's remark] Just to clarify regarding "correctness first", which is of course key: the idea of not using domain/range for properties comes from the fact that subjects and objects are all well identified, so rdf:type can be asserted directly instead of giving more work to a reasoner that could be used for other, more important and complex tasks. Again, my question is not a general one (where domain/range of course make sense), but one specific to our project.

About reusing existing ontologies: I am totally convinced that reusing existing ontologies is key to the semantic web, but it is a very hard requirement to achieve. So far I find it a bit of a mess to find the corresponding ontology, mix ontologies together, then see strange reasoning behavior, etc. But the point here is slightly different: if we are modeling a domain for which no ontology exists, why create an ontology and also use only two or three dcterms or FOAF terms? (The inverse would of course make more sense: an ontology covers 90% of the domain, and we just extend it.)

[Edit to answer Richard's remark - why would a reasoner be needed (text too long for a comment)] We are still in the early steps of a project where one of the goals is to see whether semantic technologies are of any help (compared to RDBs and NoSQL). There is no heavy logic in the data that would require a reasoner. But so far we think the main use of a reasoner would be to simplify queries over data with long property "chaining" paths: two resources (res1 and res2) are linked by following properties from res1 through many intermediate resources and finally to res2. A concrete example: I am in a shop and want to buy a shirt; I scan a code and get the origin of the shirt's material, its certificates (organic, GMO-free, etc.), and the place where the raw material was produced. Such queries are possible, but ugly and perhaps slow once we reach millions of triples and hundreds of users.

For queries that must serve many users efficiently, the information needed for the answer will therefore be computed at data load time. Then, instead of using transitive properties that would create loads of unnecessary triples all the way from res1 to res2, custom rules will be used to create only the triples that are needed. For queries that are issued more rarely, following property paths will not be a problem (the information is there, and fast answer time is not an issue). Does that make sense?

Thanks a lot for any advice.

Fabian

asked 05 May '11, 10:42

Fabian Cretton
accept rate: 2%

edited 06 May '11, 09:09

So do you actually plan to use a reasoner in the project? What problem are you trying to solve by using the reasoner?

(06 May '11, 06:21) cygri ♦

My answer is included as an edit of the original post.

(06 May '11, 09:09) Fabian Cretton

Think about correctness first, performance later. It is easy to increase performance when needed (buy a bigger machine). It is much harder to go back and correct wrong data (spend lots of man-hours fixing it).

Secondly, on the reuse of existing ontologies:

  1. When someone new joins the project, they are more likely to know the general existing ontologies than your internal ones, reducing training costs.
  2. The ontology is already specified; copying is cheaper than rewriting.
  3. You might later want to integrate existing tools that deal nicely with the existing ontologies but not with your own; not reusing them loses that value.
  4. When your dataset is disjoint from the public ones, you have to do a lot of work at once to make it publication-ready. If you keep it publication-ready every day, you can publish whenever you want. Publishing can also mean selling to the next organization, which can then use the data from day one with minimal integration costs.
  5. OWL Full is not really a problem with the FOAF terms. Most reasoners will simply ignore the OWL Full parts and won't even attempt to reason over them.
  6. When someone asks you to integrate data from an external source, you will need to map their creator to your creator anyway. So start from the beginning by reusing the common term, i.e. dcterms:creator.

answered 05 May '11, 11:41

Jerven
accept rate: 34%

Thanks a lot for your remarks

(06 May '11, 02:45) Fabian Cretton
Seen: 1,937 times

Last updated: 02 Nov '12, 08:45