The Deutsche Nationalbibliothek (German National Library) and hbz are working on a service that aims to facilitate the linking of data from different catalogues (whether library catalogues or union catalogues) by providing matching information about different records.

For this project we need, among other things, to link catalog entries to the bibliographic identifiers they contain. (There are a lot of identifiers in the realm of bibliographic data: widely known identifiers like ISBN, ISSN, DOI, URN, Handle, OCLC number, LCCN and many more local, regional and national identifiers. Some of them identify a bibliographic record while others identify the described bibliographic entity, which doesn't make things easier in a LOD context.)

The question here is how to represent a link in RDF from a record or a bibliographic resource to the associated identifiers. We have to decide between 1.) using individual predicates for each identifier or 2.) using the global predicate dc:identifier and characterizing the identifier more precisely through an xsd data type.

Here is an example for each approach:

1. The "predicate approach"

@prefix biro:  <http://purl.org/spar/biro/> .
@prefix cg:    <http://culturegraph.org/vocab/example/> .
@prefix dc:    <http://purl.org/dc/elements/1.1/> .
@prefix ex:    <http://example.org/> .  # placeholder namespace for the missing predicate

<http://opac.bib-bvb.de:8080/InfoGuideClient.fasttestsis/start.do?Query=-1="BV035542944>
  a biro:BibliographicRecord ;
  dc:source <http://lobid.org/collection/bvbcatalog> ;
  cg:bvn "BV035542944" ;
  cg:oclcn "991052625" ;
  ex:describes [
    cg:isbn "3-8273-2774-1"
  ] .
# ex:describes is used because there is no inverse property of wdrs:describedby.
# Does anybody know an appropriate predicate for this?

2. The "datatype approach"

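@prefix biro:  <http://purl.org/spar/biro/> .
@prefix dc:    <http://purl.org/dc/elements/1.1/> .
@prefix ex:    <http://example.org/> .  # placeholder namespace for the missing predicate
@prefix xsd:   <http://www.w3.org/2001/XMLSchema#> .  # BVN, OCLCN and ISBN are proposed datatypes, not existing XSD types

<http://opac.bib-bvb.de:8080/InfoGuideClient.fasttestsis/start.do?Query=-1="BV035542944>
  a biro:BibliographicRecord ;
  dc:source <http://lobid.org/collection/bvbcatalog> ;
  dc:identifier "BV035542944"^^xsd:BVN ;
  dc:identifier "991052625"^^xsd:OCLCN ;
  ex:describes [
    dc:identifier "3-8273-2774-1"^^xsd:ISBN
  ] .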

What do you think is the way to go: Creating an XSD or a vocabulary?

Adrian

asked 10 Mar '11, 14:28

acka47

I just came upon the "Custom Datatype" part of Ian Davis' and Leigh Dodds' book "Linked Data Patterns" which gives a very good discussion of this question. It really helped me to understand this even better than after all your answers.

(04 Apr '12, 16:15) acka47

All I can say about the XSD is ouch!

(04 Apr '12, 16:19) database_animal ♦

The predicate approach. Make them subproperties of dc:identifier. Custom datatypes are awkward and not used much in practice for that reason.
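
What subproperties buy you can be shown with a minimal sketch in plain Python rather than a real RDF store. It assumes that the cg: properties from the question are declared as rdfs:subPropertyOf dc:identifier (a hypothetical schema statement, nothing in the question says the vocabulary already does this). The `values` helper and the `:rec` subject are made up for illustration and stand in for the RDFS inference a triple store would perform.

```python
# Assumed schema statements, e.g. cg:bvn rdfs:subPropertyOf dc:identifier.
SUBPROPERTY_OF = {
    "cg:bvn": "dc:identifier",
    "cg:oclcn": "dc:identifier",
    "cg:isbn": "dc:identifier",
}

# Triples as plain (subject, predicate, object) tuples; ":rec" is a
# stand-in for the record URI from the question.
TRIPLES = [
    (":rec", "cg:bvn", "BV035542944"),
    (":rec", "cg:oclcn", "991052625"),
    (":rec", "dc:source", "http://lobid.org/collection/bvbcatalog"),
]


def is_subproperty(prop, ancestor):
    """True if prop equals ancestor or reaches it via rdfs:subPropertyOf."""
    seen = set()
    while prop is not None and prop not in seen:
        if prop == ancestor:
            return True
        seen.add(prop)
        prop = SUBPROPERTY_OF.get(prop)
    return False


def values(subject, prop):
    """All objects of `subject` for `prop` or any of its subproperties."""
    return [o for s, p, o in TRIPLES if s == subject and is_subproperty(p, prop)]


# A specific question uses the specific predicate ...
print(values(":rec", "cg:bvn"))
# ... while a generic consumer asking for dc:identifier still finds everything.
print(values(":rec", "dc:identifier"))
```

The specific predicate keeps queries simple, and generic tools that only know dc:identifier lose nothing, provided the store applies subproperty inference.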

answered 11 Mar '11, 00:59

cygri ♦

I would prefer to define specific subproperties of dc:identifier for this, as is already done in the Bibliographic Ontology*, for example bibo:isbn. I guess the query time would be a bit shorter in this case.
Furthermore, I do not really understand why you would separate the ISBN identifier. I think this is not really necessary.

*) Their super property for this is called bibo:identifier (who knows why it is not aligned with dc:identifier)

PS: You do not create an "XSD". Rather, you use an identifier for a datatype. This can be a predefined one from the XSD namespace or a self-defined one from your own (or any other) namespace.

PPS: In case you are not already aware of this initiative, the W3C Library Linked Data Incubator Group might be interesting for you.

answered 10 Mar '11, 14:41

zazi

edited 10 Mar '11, 14:46

Fully agree with having a look at bibo. It's an excellent ontology and the group is very responsive.

(10 Mar '11, 15:43) Jerven ♦

Thanks, I will contact both the bibo mailing list and the W3C LLD group and call their attention to this question. @zazi: "Furthermore, I do not really understand why you would separate the ISBN identifier." --> I do this because in the LOD world you distinguish between a bibliographic resource and the bibliographic record that describes it. An ISBN identifies a bibliographic resource, while identifiers in a catalog identify the record, not the resource. Although this distinction is not made in traditional cataloging (and Bibo doesn't make it either), I think it is reasonable...

(10 Mar '11, 19:15) acka47

Actually, if you use literal identifiers, it doesn't matter that much whether you enrich them with a property or a datatype. In both cases it requires additional domain knowledge to make use of them beyond a simple display for human consumption. You could infer some kind of identity if two resources have the same identifier of the same datatype and/or the same property, but I doubt that this is done for the general case in practice. But there is also:

3. The "URI approach"

There already are URI forms of many identifier systems and for the rest you can create URI schemes. Not all of them are HTTP URIs, but preferring literal strings over non-HTTP-URIs seems to contradict the whole idea of RDF.

Not-so-good practice:

:X cg:bvn "BV035542944" .
:X dc:identifier "BV035542944"^^xsd:BVN .
:Y cg:oclcn "991052625" .
:Y dc:identifier "991052625"^^xsd:OCLCN .
:Z cg:isbn "3-8273-2774-1" .
:Z dc:identifier "3-8273-2774-1"^^xsd:ISBN .

Good practice:

:X owl:sameAs <http://uri.bvb.de/record/BV035542944> .
:Y owl:sameAs <info:oclcnum/991052625> .
:Z owl:sameAs <urn:isbn:9783827327741> . # normalized "3-8273-2774-1" to ISBN-13
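
The ISBN normalization noted in the last line can be sketched as follows. This is a minimal Python sketch and not part of the original answer: the function names are made up for illustration, and it only handles ISBN-10 input (hyphens or spaces allowed), which it validates before converting.

```python
def isbn13_check_digit(first12):
    """Check digit for the first 12 digits of an ISBN-13 (weights 1,3,1,3,...)."""
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    return str((10 - total % 10) % 10)


def isbn10_is_valid(isbn10):
    """ISBN-10 checksum: the sum with weights 10..1 must be divisible by 11."""
    if len(isbn10) != 10 or not isbn10[:9].isdigit() or isbn10[9] not in "0123456789X":
        return False
    digits = [10 if c == "X" else int(c) for c in isbn10]
    return sum(w * d for w, d in zip(range(10, 0, -1), digits)) % 11 == 0


def isbn10_to_urn(isbn):
    """Turn a (possibly hyphenated) ISBN-10 into a urn:isbn: URI with an ISBN-13."""
    compact = isbn.replace("-", "").replace(" ", "").upper()
    if not isbn10_is_valid(compact):
        raise ValueError(f"not a valid ISBN-10: {isbn!r}")
    first12 = "978" + compact[:9]  # drop the old check digit, add the 978 prefix
    return "urn:isbn:" + first12 + isbn13_check_digit(first12)


print(isbn10_to_urn("3-8273-2774-1"))  # urn:isbn:9783827327741
```

Normalizing before minting the URI matters because differently hyphenated or ISBN-10 versus ISBN-13 spellings of the same number would otherwise yield different URIs for the same book.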

If owl:sameAs is not the suitable predicate, you should ask yourself what your identifiers actually identify and how your resource is related to this entity. For the general case

:A dc:identifier "foo" .

means more or less:

:A my:isSomehowRelatedToButWeDontKnowHowExactely <someURINamespace:foo> .
<someURINamespace:foo> skos:prefLabel "foo" .

answered 15 Mar '11, 09:25

Jakob

edited 15 Mar '11, 10:15

However, from the example one can see that the application mints its own URIs for the bibliographic resources anyway. So I think there is nothing wrong with adding an identifier, such as an ISBN, as a literal. Creating same-as relations is another (orthogonal) task in my mind.

(15 Mar '11, 13:46) zazi

There is nothing wrong with adding identifiers as literals, but there is little value in it, because these identifiers do not uniquely identify a resource within the scope of RDF, but only in proprietary contexts that are unrelated to RDF.

(15 Mar '11, 15:29) Jakob

I think if the identifiers already uniquely identify a resource outside of a Semantic Web context, then they will also do so in a knowledge representation that is powered by RDF, won't they? Please remember that a unique name assumption cannot be delivered by the Semantic Web (see, e.g., http://www.semanticoverflow.com/questions/1837/common-semantic-web-misconceptions-youve-encountered/1838#1838). URIs can be as ambiguous as any other identifier type.

(15 Mar '11, 18:35) zazi

A basic assumption of RDF is that one URI reference always refers to one resource (no homonyms) in any context where RDF data is used. Without Unique Name Assumption there can be other URI references for the same resource (synonyms), but that's less critical. A literal identifier in contrast refers to one resource only in a specific context. Literal identifiers are just like names: in your family a given name may be unique, but among more people it is unlikely to identify one person.

(16 Mar '11, 14:56) Jakob

Ed already suggested using URIs where they exist, and we'll consider this. But I don't think it makes sense to mint even more new URIs where none exist. In fact, identifier+datatype is as much a unique name as a URI is. (And one might argue that every single literal is a unique identifier in the sense that its spelling is unique in contrast to every differently spelled literal. Uniqueness of identifiers as you postulate it is a question of their use and has nothing to do with the outward form of the identifier.)

(21 Mar '11, 10:06) acka47

In the modeling I would go with the "predicate" approach because

  1. These values are easier to change into links later.
  2. It's easier to write SPIN (or OWL constructs) to do data quality validation (e.g. it may be an error for one biro:BibliographicRecord to have two cg:bvn values).
  3. It is slightly easier to query for a SPARQL user who does not know about datatype functions.

I might then still use the custom xsd datatypes if that would make identifier validation easier. Hypothetical example: an identifier of type xsd:BVN must start with BV.
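
The two checks mentioned in this answer (no record with two cg:bvn values, and a BV number that starts with "BV") can be sketched without any RDF tooling. This is a minimal Python sketch under stated assumptions: the prefix rule is the hypothetical one from this answer, and the tuple-based triples, the record names and the function name `check_bvn` are made up for illustration. In a real setup, SPIN or OWL constraints would express the same rules inside the store.

```python
from collections import defaultdict


def check_bvn(triples):
    """Return a list of human-readable problems found for cg:bvn values."""
    by_subject = defaultdict(list)
    for s, p, o in triples:
        if p == "cg:bvn":
            by_subject[s].append(o)

    problems = []
    for subject, bvns in sorted(by_subject.items()):
        # Cardinality: one record should not carry two BV numbers.
        if len(bvns) > 1:
            problems.append(f"{subject}: {len(bvns)} cg:bvn values")
        # Lexical form: hypothetical rule that a BV number starts with "BV".
        for bvn in bvns:
            if not bvn.startswith("BV"):
                problems.append(f"{subject}: {bvn!r} does not start with 'BV'")
    return problems


triples = [
    (":rec1", "cg:bvn", "BV035542944"),
    (":rec2", "cg:bvn", "BV000000001"),  # made-up value
    (":rec2", "cg:bvn", "035542944"),    # made-up value, prefix missing
    (":rec2", "cg:oclcn", "991052625"),
]
for problem in check_bvn(triples):
    print(problem)
```

With one predicate per identifier type, both rules reduce to simple filters on the predicate. With the datatype approach, the same checks would first have to inspect the datatype of every dc:identifier literal.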

This answer is marked "community wiki".

answered 10 Mar '11, 15:34

Jerven ♦

edited 10 Mar '11, 20:20

Thanks for the input.

(10 Mar '11, 19:15) acka47

My preference would be to use dcterms:identifier until it becomes clear that you need something more. I also think it makes sense to express the identifier as a URI wherever possible, e.g.

<http://opac.bib-bvb.de:8080/InfoGuideClient.fasttestsis/start.do?Query=-1="BV035542944> dcterms:identifier <info:oclcnum/991052625> .

answered 10 Mar '11, 22:12

Ed Summers


question asked: 10 Mar '11, 14:28

question was seen: 3,248 times

last updated: 04 Apr '12, 20:08