So far I understand how resources can be described using triples: subject, predicate, object. That is awesome; you can store all kinds of information this way and do all kinds of funky things. However, I am a little confused about the vocabularies part. Vocabularies are simply a set of URIs that have a specific meaning, right? So someone can say "Luca is a human". The idea is that instead of "Luca", "is a" and "human" you have URIs that are dereferenceable, such as "http://foo.com/Luca" instead of just "Luca", right?

Say I am building a vocabulary about crops. I have things such as "crop name", "plant height" or "stem rust", and it goes down to very scientific terms such as "harvest index" or "suitability for intercropping". What am I supposed to do to make this vocabulary available to the world so that it can be used within someone's RDF data? I imagine that if I publish these descriptors as URIs such as "http://myontology.org/CropName", then someone can use them in their RDF.

I've read the W3C best practices on building a vocabulary, but I'm still a bit confused. Wouldn't having URIs for each of these "descriptors" be enough for them to be used with RDF? What's the purpose of OWL and RDFS? I really don't understand what "Classes" and "Properties" are, or whether I need to be aware of them when I'm building my vocabulary. In essence, isn't an RDF vocabulary just a list of URIs?
In a way, yes. But it is assumed that a vocabulary also assigns a certain meaning to the URIs it is composed of. That is, randomly putting some URIs together would not form a vocabulary. So you need a way to communicate the meaning of the terms in a vocabulary.

There are many ways to do that, but one way that is very well adapted to the architecture of the Web is to use a uniform identifier for the term (a URI) and choose it such that it can be dereferenced, so that dereferencing it provides all you need to know about the term. This implies that (1) the description of the terms is accessible online and (2) it is discoverable automatically by following links, whenever a piece of data uses the term's identifier.

Then, you have two kinds of descriptions of a term: informal, for human consumption; and formal, for computer processing. Both are very important. The informal description (usually natural language text) allows you, a person, to understand the intended meaning of the term. With this, you know how and when to use the term, and you can decide to write custom code for the data that use the term. For instance, if you see that ...

So there is the formal description, which does not allow the computer to understand what the term means, but which enables it to automate processes. For instance, if a file contains ...

In the end, it's this combination of human understanding of what the terms mean and the computerised deductions that makes RDF vocabularies essential.
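To make the informal/formal distinction concrete, here is a minimal sketch in plain Python, with triples modelled as tuples. It is only an illustration under stated assumptions: the term `plantHeight`, the class `Crop`, the comment text and the `http://example.org/plot-1` resource are hypothetical, and a real setup would use an RDF library rather than sets of tuples. The label and comment stand for the human-readable description, while `rdfs:domain` stands for a formal statement that a program can act on.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
VOCAB = "http://myontology.org/"  # namespace borrowed from the question

# Hypothetical description of one term, as (subject, predicate, object) triples.
vocabulary = {
    (VOCAB + "plantHeight", RDFS + "label", "plant height"),
    (VOCAB + "plantHeight", RDFS + "comment", "Illustrative text only: height of a crop plant."),
    (VOCAB + "plantHeight", RDFS + "domain", VOCAB + "Crop"),
}

# Hypothetical data that uses the term.
data = {("http://example.org/plot-1", VOCAB + "plantHeight", "120")}


def describe(term, triples):
    """Informal part: collect the label and comment meant for human readers."""
    wanted = {RDFS + "label": "label", RDFS + "comment": "comment"}
    return {wanted[p]: o for s, p, o in triples if s == term and p in wanted}


def infer_types(data, vocabulary):
    """Formal part: if P has rdfs:domain C and (s, P, o) holds, then s is a C.

    Simplification: assumes at most one domain per property.
    """
    domains = {s: o for s, p, o in vocabulary if p == RDFS + "domain"}
    return {(s, RDF + "type", domains[p]) for s, p, o in data if p in domains}


print(describe(VOCAB + "plantHeight", vocabulary)["label"])  # plant height
print(infer_types(data, vocabulary))
```

The program never "understands" what a plant height is; it only applies the domain rule mechanically, which is the kind of automation the formal description is for.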
OWL simply introduces more varied formal relationships between terms, such that more computational deductions can be made. RDFS simply has a limited number of them.

Thanks. Sounds all clear. But sometimes I see that you can download RDF vocabularies as RDF itself. What does this mean? As a data provider, how do I use this RDF dump of a vocabulary? Wouldn't I just need the dereferenceable URIs? Sometimes I even see vocabularies in .owl format, with relationships. How am I supposed to use these when all I need is the URIs and what they actually mean?
What you need for publishing RDF is indeed URIs with a well-defined meaning. But what software agents need for consuming RDF is discoverability of new URIs through dereferenceability, and automatic processing via logical assertions like subClassOf, subPropertyOf, inverseOf, FunctionalProperty, disjointWith, etc. These things have to be put in files, the files have to be discoverable, and in SemWeb technologies this is done using RDF as well.

I still don't get it. Say there's an RDFS vocabulary made public about crops. And this has certain things such as maybe ...
@lmatteis, reasoners use this information to infer new triples from existing ones (in this example, it could infer that all ...

@Jeen. I see, so essentially you use reasoners to create new triples for you? The whole idea makes sense, but from an implementation point of view it would make more sense to me if the inference (or reasoning) part were done by the SPARQL implementation. So as a data provider, I simply annotate my data using RDF. I don't have to worry about running a reasoner against all the vocabularies I'm using to infer new triples from them. The SPARQL implementation could take care of that detail, and it would be easier for each of the parties, no?

@lmatteis, if you look at databases like Sesame, OWLIM or Stardog (to name just 3), you can configure the database to use a reasoning strategy, and then you don't have to worry about this afterwards: as soon as you add data to the store, the reasoner automatically kicks in. An alternative approach is to start reasoning when a query is evaluated, but that has the disadvantage that query performance is slower.
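The "reasoner kicks in as soon as you add data" strategy can be sketched as follows. This is a toy, stdlib-only Python illustration and not how Sesame, OWLIM or Stardog are implemented; the class `MaterializingStore`, the `ex:` names and the restriction to two RDFS rules (rdfs9 and rdfs11, both about `rdfs:subClassOf`) are assumptions made for the example.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


class MaterializingStore:
    """Toy triple store that derives subClassOf consequences whenever data is added."""

    def __init__(self):
        self.triples = set()

    def add(self, s, p, o):
        self.triples.add((s, p, o))
        self._materialize()

    def _materialize(self):
        # Apply the rules repeatedly until no new triple appears (a fixpoint).
        changed = True
        while changed:
            changed = False
            subclass = [(s, o) for s, p, o in self.triples if p == SUBCLASS]
            typed = [(s, o) for s, p, o in self.triples if p == RDF_TYPE]
            new = set()
            for sub, sup in subclass:
                # rdfs11: subClassOf is transitive.
                for sub2, sup2 in subclass:
                    if sup == sub2:
                        new.add((sub, SUBCLASS, sup2))
                # rdfs9: an instance of a subclass is an instance of the superclass.
                for inst, cls in typed:
                    if cls == sub:
                        new.add((inst, RDF_TYPE, sup))
            if not new <= self.triples:
                self.triples |= new
                changed = True

    def instances_of(self, cls):
        # Plain lookup: the inferred triples are already stored.
        return {s for s, p, o in self.triples if p == RDF_TYPE and o == cls}


store = MaterializingStore()
store.add("ex:Cereal", SUBCLASS, "ex:Crop")   # hypothetical vocabulary statements
store.add("ex:Crop", SUBCLASS, "ex:Plant")
store.add("ex:maize1", RDF_TYPE, "ex:Cereal")  # hypothetical data
print(store.instances_of("ex:Plant"))  # {'ex:maize1'}
```

The trade-off mentioned above shows up directly: `add` does extra work and the store grows, but `instances_of` stays a simple lookup with no reasoning at query time.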
If you think of what a dictionary looks like, which defines the meaning of a term through its relationships to other terms, then you will better understand that a list of URIs does not capture meaning; it just lists the terms you use in your language. However, once you start defining the connections with simple typed relationships, you define the meaning. RDFS and OWL give you the tools to make basic claims about the entities identified by your URIs, as Antoine already stated.

You're saying that I can make relationships within my vocabulary? How is this part used by data providers? Can you give me an example? Are the relationships only useful to the human who's looking at the vocabulary, or do computers make sense of them as well?
As @Barna says, in real life, almost all things are defined in relation to something else. Using the example above, what is the relation between a crop and a plant? Well, it's a complex relationship, but RDFS allows us to say ...
... now in practice, say you produce a bunch of data describing crops growing in Zimbabwe and I later want to query your data looking for plants growing in Zimbabwe. Using the semantics of ... Now scale that idea up to the Web, add a ton of well-defined relationships other than just ...

Great, but in practice how would you implement that? Say some person has a ...

Good question, and it depends. Typically reasoning is up to consumers of data, i.e., reasoning will be performed prior to or alongside querying. So typically speaking, whatever tool or site you're using for querying will need to have ...

I imagine a data provider marking up their HTML data with RDFa using a specific vocabulary. Essentially all the inference benefits are disabled unless the data provider gets their hands dirty with some sort of reasoner. This reasoning part, which generates the extra triples based on the vocabulary, seems like a step that can be automated behind the scenes by the SPARQL query made by the user. Because if you think about it, when a user runs a SPARQL query, they specify in the ... Why put an extra burden on the data provider with a task that can be automated? And IMHO should be automated?

And by the way, is there maybe an online web service to which I can send an RDF resource and a vocabulary, and which spits out another RDF resource containing all the extra triples based on the vocabulary relationships? That would be quite cool to have, instead of having to install a complicated system like Jena. I feel like most RDF resources are not sitting behind a database with reasoning capabilities, so how do we use this data?

Well, your idea sounds good, but reasoning can be of multiple types: based on RDFS or on the many OWL and OWL 2 semantics, not to mention the custom rules one might have. In my opinion reasoning should be on the client side.
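For the client-side option, one possibility is to leave the published data untouched and expand the query instead. The sketch below, in plain Python with triples as tuples, assumes a hypothetical vocabulary stating that `ex:Crop` is a subclass of `ex:Plant` and a hypothetical property `ex:growsIn`; none of these names come from a real vocabulary, and only `rdfs:subClassOf` is handled.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

# Hypothetical vocabulary, fetched by the client (for example by dereferencing term URIs).
vocabulary = {("ex:Crop", SUBCLASS, "ex:Plant")}

# Hypothetical provider data, published without any inferred triples.
data = {
    ("ex:maizeField", RDF_TYPE, "ex:Crop"),
    ("ex:maizeField", "ex:growsIn", "ex:Zimbabwe"),
    ("ex:oak7", RDF_TYPE, "ex:Plant"),
    ("ex:oak7", "ex:growsIn", "ex:France"),
}


def subclasses_of(cls, vocabulary):
    """Return cls together with all its direct and indirect subclasses."""
    found = {cls}
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in vocabulary:
            if p == SUBCLASS and o == current and s not in found:
                found.add(s)
                frontier.append(s)
    return found


def instances_growing_in(cls, place, data, vocabulary):
    """Find resources typed as cls (or any subclass of it) that grow in place."""
    wanted = subclasses_of(cls, vocabulary)
    typed = {s for s, p, o in data if p == RDF_TYPE and o in wanted}
    return {s for s, p, o in data if p == "ex:growsIn" and o == place and s in typed}


print(instances_growing_in("ex:Plant", "ex:Zimbabwe", data, vocabulary))  # {'ex:maizeField'}
```

Here the data provider does nothing beyond publishing triples; the consumer pays the reasoning cost on each query, and can choose which semantics (RDFS, OWL, custom rules) to apply.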

