|
I have some drawings by Rembrandt in a British Museum database, and some paintings by Rembrandt in an RKD database. He's referred to as The current representation is:
Use cases:
Aside: If you look in VIAF, you'll see Rembrandt correlated between 19 sources, including national libraries and Getty ULAN. There are a thousand names and a bunch of extra info about him. If you look at the VIAF RDF record, you'll see he's represented as a
There are also some owl:sameAs links equating the
It would have been great if the BM and RKD thesauri were correlated to VIAF, but they're not. End of aside.
The question: which is the best way to correlate
I know that both the SKOS Primer and Reference say one should use
If I use
Like this:
What's your advice? |
|
owl:sameAs has potentially unwelcome inferences. Imagine that I have a term "Monkey" which was added to my thesaurus today, and I claim it's owl:sameAs the concept "Monkey" in a dictionary that somebody else published many years ago.
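To make the "unwelcome inferences" concrete, here is a minimal sketch in plain Python (no RDF library) of what a naive owl:sameAs merge does to thesaurus bookkeeping data. The URIs, labels and creation dates are hypothetical, invented only for illustration, and the `smush` helper is a simplification: it only copies statements across subjects, whereas real owl:sameAs semantics also substitutes in the object position.

```python
from collections import defaultdict

SAME_AS = "owl:sameAs"

def smush(triples):
    """Copy every statement about a URI to all URIs declared owl:sameAs it.

    Simplified: only the subject position is substituted.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Build equivalence classes from the sameAs links.
    for s, p, o in triples:
        if p == SAME_AS:
            parent[find(s)] = find(o)

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)

    # Every member of a class inherits the statements of the others.
    inferred = set(triples)
    for s, p, o in triples:
        if p != SAME_AS and s in parent:
            for alias in groups[find(s)]:
                inferred.add((alias, p, o))
    return inferred

# Hypothetical data: a term added today, equated with an older dictionary entry.
triples = [
    ("my:Monkey", "skos:prefLabel", "Monkey"),
    ("my:Monkey", "dct:created", "2012-06-01"),
    ("dict:Monkey", "skos:prefLabel", "monkey (animal)"),
    ("dict:Monkey", "dct:created", "1995-01-01"),
    ("my:Monkey", SAME_AS, "dict:Monkey"),
]

result = smush(triples)
created = sorted(o for s, p, o in result if s == "my:Monkey" and p == "dct:created")
labels = sorted(o for s, p, o in result if s == "my:Monkey" and p == "skos:prefLabel")
print(created)
print(labels)
```

After the merge, each of the two URIs carries both creation dates and both preferred labels, which is exactly the kind of result that makes the data hard to maintain.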
The skos:Concept for a monkey doesn't represent an actual monkey, or the class of all monkeys. Think of it more as representing an entry in a thesaurus, dictionary, encyclopaedia or catalogue.
I understand how owl:sameAs can wreak havoc on thesaurus maintenance data, but my concern is as a user of thesaurus data. Should I use the propertyChainAxiom trick described above?
BTW, SKOS can span across ConceptSchemes: a concept can have several inScheme, and broaderMatch creates hierarchies spanning two schemes |
|
The matter has been discussed many times on W3C lists, so I won't add much more. The short answer is that in the SKOS context, the SKOS matching properties make much more sense than owl:sameAs. owl:sameAs already existed when we finished SKOS; if it had been a better fit, we'd have picked it. The arguments in the SKOS Primer (http://www.w3.org/TR/skos-primer/#secmapping) are not as weak as you claim. These are examples where using owl:sameAs results in invalid SKOS data. And not the kind of SKOS data used only for vocabulary management: having non-unique prefLabels will break many data consumption scenarios.
Side notes:
- Europeana is using owl:sameAs, yes, but not between SKOS Concepts
- there is a foaf:focus property which can be interesting for linking SKOS Concepts to the "real" entities they represent.
Ok Antoine, I see foaf:focus explained at http://lists.w3.org/Archives/Public/public-esw-thes/2010Aug/0002.html and http://xmlns.com/foaf/spec/#term_focus So if bm: is a BM thesaurus and rkd: is an RKD thesaurus, it would look as:
And when we represent data in eg CRM we need to use dbpedia:
instead of
The discussion so far basically comes down to:
Neither of which is quite a satisfactory answer for me...
Well, if you regard sameAs as the most efficient way to equate individuals, then this hints that you've already evaluated that you can live with its drawbacks, or that you are not applying its full semantics (which I'd find surprising given your background), or that you do it with some "protection" (say, named graphs). If so, then of course you can try to use it and see whether there are bad consequences for your case. I've said that in general it seems dangerous for SKOS cases, and we felt we had to provide an alternative with a lower ontological commitment that would better fit the observed data and be somewhat safer. I've never said that you can't use sameAs.
Yes, you can do it like that. In fact I'd say that you can also state rkd:Rembrandt crm:P14i_performed rkd/painting/2926/production. But that's because I'm comfortable with having a resource being both a skos:Concept and a (say) foaf:Person at the same time: the SKOS model does not say that concepts are distinct from persons. I know other people would strongly object to it. It's up to you. |
|
Overall I think owl:sameAs, with standard semantics, is not suitable for exchange across boundaries between semantic systems. If, for instance, you're scraping triples off the floor and throwing them into a processing chain, owl:sameAs will definitely give you entailments you don't want. Now, inside a perimeter that you control, the story is different. owl:sameAs behaves a particular way in your triple store, and if you like what owl:sameAs does, then go ahead and use it. I say it that way because you're using OWLIM, and OWLIM implements owl:sameAs in a way that may or may not be standards-correct or mathematically correct but that is certainly "correct" for building real applications. The case for owl:sameAs is much weaker if you use other tools. Personally I've dealt with these problems by creating a wrapper that normalizes identifiers that cross the system perimeter, but this doesn't address all the problems involved when two concepts in the KB get merged, |
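As a rough illustration of the wrapper idea mentioned above, here is a minimal sketch in plain Python of normalizing identifiers as they cross the system perimeter. It is not the poster's actual implementation: the `kb:` prefix, the `CANONICAL` mapping, the `ex:createdBy` property and the decision to drop incoming owl:sameAs statements are all assumptions made for the example.

```python
# Hypothetical mapping from external identifiers to one canonical internal URI.
CANONICAL = {
    "bm:Rembrandt": "kb:Rembrandt",
    "rkd:Rembrandt": "kb:Rembrandt",
}

def normalize(term):
    """Return the canonical internal identifier, or the term unchanged."""
    return CANONICAL.get(term, term)

def normalize_triples(triples):
    """Rewrite identifiers on the way in, so no sameAs reasoning is needed inside."""
    for s, p, o in triples:
        if p == "owl:sameAs":
            # Assumption: equivalences are handled by the mapping table,
            # so incoming sameAs statements are not stored.
            continue
        yield (normalize(s), p, normalize(o))

# Hypothetical incoming data from two sources.
incoming = [
    ("bm:drawing1", "ex:createdBy", "bm:Rembrandt"),
    ("rkd:painting1", "ex:createdBy", "rkd:Rembrandt"),
    ("bm:Rembrandt", "owl:sameAs", "rkd:Rembrandt"),
]

stored = list(normalize_triples(incoming))
works = sorted(s for s, p, o in stored if p == "ex:createdBy" and o == "kb:Rembrandt")
print(works)
```

With this approach, a single query against `kb:Rembrandt` finds works from both sources. The price is that the mapping table has to be maintained, and source-specific bookkeeping for the original identifiers has to be kept somewhere else.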
|
Antoine, I am also comfortable with having a person be both a Person and a skos:Concept, but if I cannot use sameAs then there's little value in it being a skos:Concept.
Seems to me VIAF got it right. First, they have a main URI that's a foaf:Person, and all URIs used in data (e.g. national library bibliographies) are given as sameAs (or "=" in Turtle):
This would mean that none of the source URIs is a skos:Concept. Then they copy all labels from all the sources to foaf:name. Unfortunately they lose the preferredness info, but I don't know how they could pick one source's prefLabel as globally preferred...
Finally they have one skos:Concept per source, with foaf:focus to the main URI:
So they use skos:Concept and foaf:focus only for "thesaurus bookkeeping info", but it seems the intention is to use the main URI (and sameAs source URIs) in business data.
The BnF data model also uses foaf:focus:
So... the answer is: don't use skos:Concept in business data; use other URIs that are amenable to sameAs. Or else, implement extra rules that propagate business relations across skos:exactMatch.
Note: it seems to me we need to assume the following rule:
Yes, the VIAF pattern is a good one, but we had already seen it before. Note that it won't solve all your owl:sameAs problems anyway. Whatever the types of the URIs, owl:sameAs may have unintended effects when reconciling descriptions with overlapping statements.
On "business data", I'm not sure I get your interpretation: there's not a business level and a non-business level. If the SKOS layer does not fit any business case then don't express the data for it.
The rule seems interesting indeed. Some kind of "co-denotation" axiom. But as SKOS always refused to endorse a strictly extensional approach to concept definition, this axiom would need to be endorsed by the creators of the "extension" property, i.e. foaf:focus.
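The other option raised in this thread, propagating selected business relations across skos:exactMatch instead of merging resources with owl:sameAs, could be sketched roughly as follows in plain Python. This is only an illustration under stated assumptions: the `ex:createdBy` property, the URIs and the labels are hypothetical, and treating exactMatch as symmetric and transitive follows the SKOS Reference.

```python
from collections import defaultdict

EXACT = "skos:exactMatch"

def match_classes(triples):
    """Group concepts linked by exactMatch (treated as symmetric and transitive)."""
    neighbours = defaultdict(set)
    for s, p, o in triples:
        if p == EXACT:
            neighbours[s].add(o)
            neighbours[o].add(s)
    classes = {}
    for start in neighbours:
        if start in classes:
            continue
        seen, stack = {start}, [start]
        while stack:
            node = stack.pop()
            for n in neighbours[node]:
                if n not in seen:
                    seen.add(n)
                    stack.append(n)
        for node in seen:
            classes[node] = seen
    return classes

def propagate(triples, relations):
    """Copy only the listed relations across matched concepts; leave the rest alone."""
    classes = match_classes(triples)
    out = set(triples)
    for s, p, o in triples:
        if p in relations:
            for s2 in classes.get(s, {s}):
                for o2 in classes.get(o, {o}):
                    out.add((s2, p, o2))
    return out

# Hypothetical data: two thesaurus concepts and one work attached to each.
data = [
    ("bm:Rembrandt", "skos:prefLabel", "Rembrandt"),
    ("rkd:Rembrandt", "skos:prefLabel", "Rembrandt van Rijn"),
    ("bm:Rembrandt", EXACT, "rkd:Rembrandt"),
    ("bm:drawing1", "ex:createdBy", "bm:Rembrandt"),
    ("rkd:painting1", "ex:createdBy", "rkd:Rembrandt"),
]

result = propagate(data, {"ex:createdBy"})
works = sorted(s for s, p, o in result if p == "ex:createdBy" and o == "bm:Rembrandt")
labels = sorted(o for s, p, o in result if s == "bm:Rembrandt" and p == "skos:prefLabel")
print(works)
print(labels)
```

Both works become reachable from either concept, while each concept keeps its single prefLabel. The trade-off is that the list of relations to propagate has to be chosen and maintained explicitly.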
I tried to separate and assign comments as best I could. Please keep answers as answers to the original question folks, thanks. If you want to comment on an answer, look for the "add a comment" button, not the "Post Your Answer" button. |



Maybe I haven't expressed my concern well...
IMHO the main purpose of thesauri is to provide controlled (well-known) URIs for things, to be used in business data. The internal organization of a thesaurus is a secondary concern.
Notwithstanding that sameAs can lead to unintended consequences, that's the standard way to say two things are the same. E.g. the Europeana enrichment (links) uses sameAs to geonames, dbpedia etc.