So far I understand how resources can be identified using triples: Subject-Predicate-Object. That is awesome; you can really store all kinds of information this way, and do all kinds of funky things. However, I am a little confused about the vocabularies part. Vocabularies are simply a set of URIs that have a specific meaning, right? So someone can say "Luca is a human". The idea is that instead of "Luca", "is a" and "human" you have URIs that are dereferenceable, such as "http://foo.com/Luca" instead of just "Luca", right?

Say I am building a vocabulary about crops. I have things such as "crop name" or "plant height" or "stem rust", and it goes down to very scientific names such as "harvest index" or "suitability for intercropping". What am I supposed to do to make this vocabulary available to the world so that it can be used within someone's RDF data?

I imagine that if I publish these descriptors as URIs such as "http://myontology.org/CropName" then someone can use it in their RDF such as:

<http://foo.org/MyCrop> <http://myontology.org/CropName> "Cassava" .
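To make the shape of that statement concrete, here is a minimal Python sketch that splits the line into its three parts and shows the vocabulary term sitting in the predicate position. The `parse_triple` helper is hypothetical and only handles this simplest form (URIs in angle brackets, a plain quoted literal); real N-Triples has more cases.

```python
import re

# Matches only the simplest shape: <s> <p> <o> .  or  <s> <p> "literal" .
# Real N-Triples also allows blank nodes, escapes, language tags and datatypes.
TRIPLE = re.compile(r'^\s*<([^>]*)>\s+<([^>]*)>\s+(?:<([^>]*)>|"([^"]*)")\s*\.\s*$')

def parse_triple(line):
    """Split one simple triple line into (subject, predicate, object)."""
    match = TRIPLE.match(line)
    if match is None:
        raise ValueError(f"unsupported triple syntax: {line!r}")
    subject, predicate, uri_object, literal_object = match.groups()
    obj = uri_object if uri_object is not None else literal_object
    return subject, predicate, obj

line = '<http://foo.org/MyCrop> <http://myontology.org/CropName> "Cassava" .'
subject, predicate, obj = parse_triple(line)
# The vocabulary URI is the predicate; the data publisher supplies subject and object.
```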

I've read the W3C best practices on building a vocabulary, but I'm still a bit confused. Wouldn't having URIs for each of these "descriptors" be enough so that they can be used with RDF? What's the purpose of OWL and RDFS? I really don't understand what "Classes" and "Properties" are, or whether I need to be aware of them when I'm building my vocabulary.

But in essence, isn't an RDF vocabulary just a list of URIs?

asked 29 Jan '13, 03:50

Luca Matteis

edited 29 Jan '13, 03:52

But in essence, isn't an RDF vocabulary just a list of URIs?

In a way, yes. But it is assumed that a vocabulary also assigns a certain meaning to the URIs it is composed of. That is, randomly putting some URIs together would not form a vocabulary. So you need a way to communicate the meaning of terms in a vocabulary. There are many ways you can do that, but one way that is very well adapted to the architecture of the Web is to use a uniform identifier for the term (a URI) and choose it such that it can be dereferenced, and when it's dereferenced it provides all you need to know about the term.

This implies that (1) the description of the terms is accessible online and (2) it is discoverable automatically by following links, if a piece of data is using the term's identifier.
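As a rough illustration of what "dereferencing" a term looks like from the client side, the sketch below only builds the HTTP request and does not send it. The URI `http://myontology.org/vocab#CropName` and the helper `build_term_request` are assumptions made up for this example; the Accept header asks for an RDF serialisation, which is one common way a client requests the formal description rather than an HTML page.

```python
from urllib.parse import urldefrag
from urllib.request import Request

def build_term_request(term_uri):
    """Build (but do not send) a GET request for the document describing a term."""
    # A fragment such as #CropName is never sent to the server, so strip it first.
    document_url, _fragment = urldefrag(term_uri)
    return Request(
        document_url,
        headers={"Accept": "text/turtle, application/rdf+xml;q=0.9"},
    )

# Hypothetical hash-style term URI, used only for illustration.
request = build_term_request("http://myontology.org/vocab#CropName")
```

Passing `request` to `urllib.request.urlopen` would perform the actual lookup; whether anything useful comes back depends entirely on how the vocabulary publisher has set up their server.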

Then, you have two kinds of descriptions of a term: informal, for human consumption; and formal, for computer processing. Both are very important.

The informal description (usually natural language text) allows you, a person, to understand the intended meaning of the term. With this, you know how and when to use the term, and you can decide to programme custom code for the data that use the term. For instance, if you see that foaf:Person is intended to mean the class of all persons, you may programme a user interface that will display data about foaf:Persons as in a social network profile page. However, this kind of description does not inform the computer at all about what the term means, nor how to deal with it.

Then there is the formal description, which does not let the computer understand what the term means either, but which enables it to automate processes. For instance, if a file contains ex:Prosthodontist rdfs:subClassOf foaf:Person, the programme does not need to know anything about prosthodontists, nor even persons, to be able to conclude that any processes that are specific to instances of foaf:Person must also be applied to instances of ex:Prosthodontist.
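The prosthodontist example can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: triples are plain tuples, prefixed names stand in for full URIs, and `ex:alice` is an invented resource. The code knows nothing about dentistry or people; it only follows rdfs:subClassOf links.

```python
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"

def superclasses(cls, triples):
    """Return cls plus every class reachable from it via rdfs:subClassOf."""
    found = {cls}
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == SUBCLASS and o not in found:
                found.add(o)
                frontier.append(o)
    return found

def is_instance_of(resource, cls, triples):
    """True if resource is typed with cls or with any subclass of cls."""
    for s, p, o in triples:
        if s == resource and p == TYPE and cls in superclasses(o, triples):
            return True
    return False

graph = [
    ("ex:Prosthodontist", SUBCLASS, "foaf:Person"),
    ("ex:alice", TYPE, "ex:Prosthodontist"),  # ex:alice is a made-up resource
]
```

With this, a profile-page routine written for foaf:Person could check `is_instance_of("ex:alice", "foaf:Person", graph)` and handle prosthodontists without any special-case code.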

In the end, it's this combination of human understanding of what the terms mean, and the computerised deductions, that makes RDF vocabularies essential.

What's the purpose of OWL and RDFS?

OWL simply introduces more varied formal relationships between terms, such that more computational deductions can be made. RDFS provides only a limited set of them.

answered 29 Jan '13, 08:02

Antoine Zimm... ♦

Thanks. That all sounds clear. But sometimes I see that you can download RDF vocabularies as RDF itself. What does this mean? As a data provider, how do I use this RDF dump of a vocabulary? Wouldn't I just need the dereferenceable URIs? Sometimes I even see vocabularies in .owl format, with relationships. How am I supposed to use these when all I need is the URIs and what they actually mean?

(29 Jan '13, 08:13) Luca Matteis

What you need for publishing RDF is indeed URIs with a well-defined meaning. But what software agents need, for consuming RDF, is discoverability of new URIs through dereferenceability, and automatic processing via logical assertions like subClassOf, subPropertyOf, inverseOf, FunctionalProperty, disjointWith etc. These things have to be put in files, and the files have to be discoverable, and in SemWeb technologies, this is done using RDF as well.

(29 Jan '13, 14:42) Antoine Zimm... ♦
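To give a feel for how assertions other than subClassOf can drive automatic processing, here is a hedged Python sketch covering just two of them, rdfs:subPropertyOf and owl:inverseOf. The terms `my:cropName`, `my:grownIn` and `my:grows` are invented for the example, triples are plain tuples, and the function makes a single pass rather than iterating to a fixpoint as a real reasoner would.

```python
def apply_property_rules(triples):
    """Derive extra triples from rdfs:subPropertyOf and owl:inverseOf (one pass)."""
    facts = set(triples)
    sub_props = {(s, o) for s, p, o in facts if p == "rdfs:subPropertyOf"}
    inverses = {(s, o) for s, p, o in facts if p == "owl:inverseOf"}
    # owl:inverseOf works in both directions, so add the mirrored pairs.
    inverses |= {(right, left) for left, right in inverses}

    inferred = set()
    for s, p, o in facts:
        for narrow, broad in sub_props:
            if p == narrow:
                inferred.add((s, broad, o))   # s narrow o  =>  s broad o
        for prop, inverse in inverses:
            if p == prop:
                inferred.add((o, inverse, s))  # s prop o  =>  o inverse s
    return facts | inferred

# Vocabulary file: published once by the vocabulary author (hypothetical terms).
vocabulary = [
    ("my:cropName", "rdfs:subPropertyOf", "rdfs:label"),
    ("my:grownIn", "owl:inverseOf", "my:grows"),
]
# Data file: published by someone else who merely uses the terms.
data = [
    ("ex:MyCrop", "my:cropName", "Cassava"),
    ("ex:MyCrop", "my:grownIn", "ex:Zimbabwe"),
]
result = apply_property_rules(vocabulary + data)
```

The data publisher never wrote an rdfs:label or a my:grows statement; both appear only because the vocabulary file is itself RDF that software can load next to the data.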

I still don't get it. Say there's an RDFS vocabulary made public about crops, and it has statements such as myontology:Crop rdfs:subClassOf myontology:Plant. How do RDF consumers (or data providers) use this information (the fact that a Crop is a subclass of Plant)? Is it simply a way to tell the human RDF consumers that Crop is a subclass of Plant, or is there something else I'm missing that allows computers to use these relationships? If so, how and where do they use it?

(29 Jan '13, 14:49) Luca Matteis

@lmatteis, reasoners use this information to infer new triples from existing ones (in this example, it could infer that all X rdf:type :Crop are also X rdf:type :Plant). It depends a bit on the tool you use, but many reasoning implementations work by inferring new triples and just inserting these into the triplestore alongside the 'explicit' triples, so that when you later do a SPARQL query, the inferred information is used to answer the query as well.

(30 Jan '13, 14:18) Jeen Broekstra ♦
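The "infer new triples and insert them alongside the explicit ones" approach can be sketched as a small forward-chaining loop. This is a toy illustration, not how any particular triplestore implements it: triples are tuples, `:cassava` and `:Organism` are invented terms, and only two RDFS rules are applied (subclass transitivity, known as rdfs11, and type propagation, known as rdfs9).

```python
def materialize(triples):
    """Repeatedly apply two RDFS rules until no new triples appear."""
    store = set(triples)
    while True:
        new = set()
        for s, p, o in store:
            if p != "rdfs:subClassOf":
                continue
            for s2, p2, o2 in store:
                # rdfs11: s subClassOf o, o subClassOf o2  =>  s subClassOf o2
                if p2 == "rdfs:subClassOf" and s2 == o:
                    new.add((s, "rdfs:subClassOf", o2))
                # rdfs9: s subClassOf o, s2 type s  =>  s2 type o
                if p2 == "rdf:type" and o2 == s:
                    new.add((s2, "rdf:type", o))
        if new <= store:      # nothing new was derived: fixpoint reached
            return store
        store |= new

explicit = {
    (":Crop", "rdfs:subClassOf", ":Plant"),
    (":Plant", "rdfs:subClassOf", ":Organism"),
    (":cassava", "rdf:type", ":Crop"),
}
store = materialize(explicit)

# A later "query" is then just a lookup over explicit and inferred triples together.
plants = {s for s, p, o in store if p == "rdf:type" and o == ":Plant"}
```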

@Jeen. I see, so essentially you use reasoners to create new triples for you? The whole idea makes sense, but from an implementation point of view it would make more sense to me if the inference (or reasoning) part were done by the SPARQL implementation. So as a data provider, I simply annotate my data using RDF. I don't have to worry about running a reasoner against all the vocabularies I'm using to infer new triples from them. The SPARQL implementation could take care of that detail, and it would be easier for each of the parties, no?

(30 Jan '13, 15:47) Luca Matteis

@lmatteis, if you look at databases like Sesame, OWLIM or Stardog (to name just 3), you can configure the database to use a reasoning strategy, and then you don't have to worry about this afterwards - as soon as you add data to the store, the reasoner automatically kicks in. An alternative approach is to start reasoning when a query is evaluated, but that has the disadvantage that query performance is slower.

(30 Jan '13, 20:29) Jeen Broekstra ♦
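For contrast with reasoning at load time, the alternative of reasoning when the query is evaluated can be sketched as follows. Nothing is added to the store; instead, the class being asked for is expanded into itself plus all of its subclasses each time a query runs, which is the extra per-query work mentioned above. Triples are plain tuples and `:cassava` and `:oak` are made-up resources.

```python
def subclasses(cls, triples):
    """Return cls plus every class that is (transitively) a subclass of it."""
    found = {cls}
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if p == "rdfs:subClassOf" and o == current and s not in found:
                found.add(s)
                frontier.append(s)
    return found

def instances_of(cls, triples):
    """Answer 'which resources have type cls?' by expanding cls at query time."""
    wanted = subclasses(cls, triples)
    return {s for s, p, o in triples if p == "rdf:type" and o in wanted}

# Only explicit triples are stored; no inferred triples are ever written back.
store = [
    (":Crop", "rdfs:subClassOf", ":Plant"),
    (":cassava", "rdf:type", ":Crop"),
    (":oak", "rdf:type", ":Plant"),
]
```

Asking `instances_of(":Plant", store)` finds both resources even though only one is typed :Plant directly, at the cost of walking the class hierarchy on every call.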

If you think of what a dictionary looks like, defining the meaning of each term through its relationship to other terms, then you will better understand that a list of URIs does not capture meaning; it just lists the terms you use in your language. However, once you start defining the connections with simple typed relationships, you define the meaning. RDFS and OWL give you the tools to make basic claims about the entities identified by your URIs, as Antoine already stated.

answered 29 Jan '13, 11:27

Barna

You're saying that I can make relationships within my vocabulary? How is this part used by data providers? Can you give me an example? Are the relationships only useful to the human who's looking at the vocabulary, or do computers make sense of it as well?

(29 Jan '13, 14:42) Luca Matteis

As @Barna says, in real life, almost all things are defined in relation to something else. Using the example above, what is the relation between a crop and a plant? Well, it's a complex relationship, but RDFS allows us to say my:Crop rdfs:subClassOf my:Plant, such that anything that's considered to be a my:Crop must also be considered to be a my:Plant. That's a start. You've created a formal relationship between the two terms: namely rdfs:subClassOf. The semantics (meaning) of rdfs:subClassOf is precisely defined in the RDF Semantics standard.

(30 Jan '13, 14:51) Signified ♦

... now in practice, say you produce a bunch of data describing crops growing in Zimbabwe and I later want to query your data looking for plants growing in Zimbabwe. Using the semantics of rdfs:subClassOf, a reasoner can automatically figure out that all the crops you speak about should be included in my results (without any intervention on my part!).

Now scale that idea up to the Web, add a ton of well-defined relationships other than just rdfs:subClassOf, and include dereferenceable URIs. If you can picture that, that's roughly the original Semantic Web vision.

(30 Jan '13, 14:53) Signified ♦

Great, but in practice how would you implement that? Say some person has a crops.rdf resource with lots of triples that use mycropontology, and I want to query this data for all mycropontology:Plant instances. However, the crops.rdf resource only uses mycropontology:Crop instead of :Plant. In fact, there's no sign of :Plant in crops.rdf. It's only defined in the mycropontology vocabulary, where :Crop is a subClassOf it. As a user, what do I do to run my original query? Do I have to fire up a reasoner to make these inferences, or does the data provider do the work?

(30 Jan '13, 15:35) Luca Matteis

Good question, and it depends. Typically reasoning is up to consumers of data ... i.e., reasoning will be performed prior to or alongside querying. So typically speaking, whatever tool or site you're using for querying will need to have crops.rdf and mycropontology loaded (or to be able to dereference them) and will need to run (at least) RDFS reasoning. In other words, it's typically up to the provider to provide the raw data ... it's up to the consumer to make full use of it.

(30 Jan '13, 22:48) Signified ♦

I imagine a data provider annotating their HTML data with RDFa using a specific vocabulary. Essentially, all the inference benefits are lost unless the data provider gets their hands dirty with some sort of reasoner. This reasoning step, which generates the extra triples based on the vocabulary, seems like something that could be automated behind the scenes when the user runs a SPARQL query. After all, when a user runs a SPARQL query, they specify the RDF resource in the FROM clause, and that resource contains all the URIs of the vocabularies it uses, so the query engine could reason over them.

(31 Jan '13, 04:46) Luca Matteis

Why put extra burden on the data provider with a task that can be automated? And imho should be automated?

(31 Jan '13, 04:49) Luca Matteis

And by the way, is there maybe an online web service to which I can send an RDF resource and a vocabulary, and which spits out another RDF resource containing all the extra triples based on the vocabulary relationships? That would be quite cool to have, instead of having to install a complicated system like Jena. I feel like most RDF resources are not sitting behind a database with reasoning capabilities, so how do we use this data?

(31 Jan '13, 04:53) Luca Matteis

Well, your idea sounds good, but reasoning comes in multiple types: it can be based on RDFS or on the many OWL and OWL 2 semantics, not to mention the custom rules one might have. In my opinion, reasoning should be on the client side.

(31 Jan '13, 05:10) Barna
question asked: 29 Jan '13, 03:50


last updated: 31 Jan '13, 05:10