|
What's the best RDF Schema vocabulary or OWL ontology for expressing BibTeX bibliographies in RDF? Ideally, the vocabulary would be actively maintained, have a decent spec, be supported by converters to and from BibTeX format, and be in use at several sites that publish bibliographic data. Extra points if it's easy to use with RDFa, for embedding into an HTML bibliography.

Edit: Jakob below asked about my use case, so here it is. My own publications currently live in at least four places: my BibTeX file, my homepage, my institute's homepage, and my university's library website. So a new publication is entered four times, by me, the institute webmaster, and the university librarian. Once should be enough. I'm tempted to make the BibTeX file the primary representation, because there are good tools for managing that file, and to find ways of deriving the other representations from it.

Edit: Candidates mentioned so far are:

Not RDF, but useful and with working implementations:

Potentially useful code:
|
|
I do not understand your question. BibTeX is a specific data format but RDF is a general data structuring language. If you only want a one-to-one mapping then BibTeX in RDF is trivial:
Becomes
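To illustrate how little such a one-to-one mapping involves, here is a minimal Python sketch. It assumes a hypothetical namespace `http://example.org/bibtex#` (not a published vocabulary); the helper names `escape_literal` and `entry_to_turtle` and the sample entry are invented for illustration. It is not a BibTeX parser: the fields are taken as an already-parsed mapping, and field names are assumed to be plain ASCII words.

```python
from typing import Mapping

# Hypothetical namespace, for illustration only; not a published vocabulary.
BIBTEX_NS = "http://example.org/bibtex#"


def escape_literal(value: str) -> str:
    """Escape backslashes, double quotes and newlines for a Turtle string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def entry_to_turtle(key: str, entry_type: str, fields: Mapping[str, str]) -> str:
    """Map one BibTeX entry to Turtle: one class for the type, one triple per field."""
    # The entry type (e.g. "article") becomes the RDF class of the resource.
    pairs = [f"a bibtex:{entry_type.capitalize()}"]
    # Every field becomes a property with the same (lowercased) name.
    pairs += [
        f'bibtex:{name.lower()} "{escape_literal(value)}"'
        for name, value in fields.items()
    ]
    body = " ;\n    ".join(pairs)
    return f"@prefix bibtex: <{BIBTEX_NS}> .\n\n<#{key}>\n    {body} .\n"


# Invented sample entry, corresponding to @article{doe2010, ...} in BibTeX.
print(
    entry_to_turtle(
        "doe2010",
        "article",
        {"author": "Jane Doe and John Smith", "title": "An Example", "year": "2010"},
    )
)
```

Note that the author list stays a single string here, which is exactly the limitation of a purely one-to-one mapping.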
If you really want to convert BibTeX to RDF then the only thing you need is a namespace prefix! But obviously you are not looking for a simple encoding of BibTeX in RDF but for a general RDF ontology for bibliographic data that happens to be able to express most of what is possible in BibTeX. In this case you need to make your use case clear. There are many ontologies for bibliographic data, and there surely will be more, because each has its specific focus. What do you want to achieve with the data that you would like to transform from BibTeX to some RDF ontology? If you want a "general bibliographic data format" then you will end up with something like Dublin Core, which will not be enough for many use cases. But if you have a more specific use case (for instance creating nice citations or doing bibliometric analysis) then I could name some more suitable specific formats (most of them are already mentioned).

Edit: I see no point in a one-to-one mapping of BibTeX to BibTeX in RDF. The only useful thing that you could do with this BibTeX ontology would be to create BibTeX again; to make real use of the data you need to map it to another data model anyway - for instance, you will want to split the list of author names into single authors. I think SPARQL or similar RDF tools are not suitable for this kind of mapping, and mapping tools from BibTeX to other formats already exist.

I think you are not looking for an RDF-based bibliographic format but for a tool to manage your bibliography. There should be a master file from which other formats for different use cases can be derived. Have a look at Zotero (open-source client and public sync server), Mendeley (more functionality), BibSonomy (based on BibTeX and well connected to the computer science community, but online only), or other reference management software.
You can easily manage your bibliography there and import and export the data via APIs in various formats - for instance BibTeX but also RDF-based formats. At least Zotero has an RDF-based export format; I don't know about the other ones. All these programs are maintained and developed, so more and better export options are being added. One format that we will soon see in Zotero (you can already see parts of it in the source code) is the input format of the Citation Style Language (CSL). I just gave a lightning talk about CSL, see http://www.slideshare.net/nichtich/voss-elag-csl2010. I am also going to write an RDF serialization of this format (yet another bibliographic ontology). The existing CSL processor citeproc-js is pretty cool.

In short: either you do not want to deal with the details of mapping bibliographic formats - then just use reference management software. Or you have special needs and like to dig into mapping bibliographic formats - then you definitely need a mapping from and to one of the formats that are commonly used by reference management software (BibTeX, COinS, MODS, BIBO...) - but in this case you could also invest your work directly in extending existing reference management software with additional import/export and other tools.

Thanks Jakob, good answer. I added a note about my motivation to the question. And as you know, it's part of the RDF ethos that you should re-use other people's terms whenever practical, and that terms should be well-defined and well-documented. So, indeed, all I need is a namespace prefix; but I'm hoping that some kind soul out there has already defined this namespace and set up the proper documentation for it, and maybe that namespace is even already used by some tools or data providers.

The question was about RDF vocabularies for bibliographic information, so I'm not sure it's really fair to say that it was really about applications for managing bibliographic citations... |
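The author-splitting step mentioned in the answer above is the kind of mapping that is easier in ordinary code than in SPARQL. A rough Python sketch, assuming the usual BibTeX convention that names in an `author` field are separated by the word `and`, and that an `and` inside braces belongs to a name; the function name `split_authors` is made up for illustration, and the sketch does not parse the individual names any further:

```python
from typing import List


def split_authors(author_field: str) -> List[str]:
    """Split a BibTeX author field into single names on top-level 'and'."""
    authors: List[str] = []
    current: List[str] = []
    depth = 0  # current brace nesting depth
    for token in author_field.split():
        # Only an unbraced, standalone "and" separates two names.
        if depth == 0 and token == "and":
            if current:
                authors.append(" ".join(current))
                current = []
            continue
        depth += token.count("{") - token.count("}")
        current.append(token)
    if current:
        authors.append(" ".join(current))
    return authors


# Invented sample value; the braced corporate name is kept as one author.
for name in split_authors("{Barnes and Noble} and Doe, Jane and John Smith"):
    print(name)
```

Each resulting name could then become its own resource or literal in whatever target vocabulary is chosen.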
|
We currently use SWRC on the SW Dog Food site. It does the job, seemed the best choice at the time, and is modelled very closely on BibTeX. We're not totally happy with it anymore, though, since it
For these reasons, we are currently thinking about moving to BIBO, which has all of the above. Regarding converters, I used bibtex2rdf a while back. Unfortunately, the source code is not publicly available, but I did get it by asking the author.

Thanks Knud. bibtex2rdf in turn uses JavaBib, which looks like a good BibTeX parser. http://www-plan.cs.colorado.edu/henkel/stuff/javabib/ |
|
Don't forget about Dublin Core. dcterms:title and dcterms:description are accepted by many consumers. The only realistic way to get a reliable round trip is to use a vocabulary that explicitly models the BibTeX structure, such as http://zeitkunst.org/bibtex/0.1/. Don't forget you can always overload your RDF: you could provide ALL the data in the BibTeX predicates, then add Dublin Core and BIBO where it's easy.

Thanks Christopher. Yes, DC as the lowest common denominator certainly has a role to play here. Overloading would be possible, although ideally I wouldn't have to worry about this when publishing instance data (it should be taken care of by mappings in the BibTeX-specific vocabulary). I hadn't seen the zeitkunst.org ontology before, but looking at it, it's the typical kludged-together-in-2004-using-Protégé stuff that I'd rather not use in the age of RDFa. |
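The overloading idea from this answer can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the BibTeX namespace `http://example.org/bibtex#` is hypothetical, the choice of which fields count as "easy" (`title` to `dcterms:title`, `abstract` to `dcterms:description`) is an assumed mapping, and `overloaded_triples` is an invented helper name. The Dublin Core terms namespace is the real one.

```python
from typing import Dict, List, Tuple

BIBTEX_NS = "http://example.org/bibtex#"  # hypothetical, for illustration only
DCTERMS_NS = "http://purl.org/dc/terms/"  # Dublin Core terms namespace

# Assumed mapping for the "easy" cases; extend or change as needed.
EASY_DC_MAPPING: Dict[str, str] = {"title": "title", "abstract": "description"}

Triple = Tuple[str, str, str]


def overloaded_triples(subject: str, fields: Dict[str, str]) -> List[Triple]:
    """Emit every field as a BibTeX-specific triple, plus a DC triple where easy."""
    triples: List[Triple] = []
    for name, value in fields.items():
        name = name.lower()
        # Lossless part: everything goes into the BibTeX-specific predicates.
        triples.append((subject, BIBTEX_NS + name, value))
        # Overloaded part: repeat the value under a widely understood term.
        if name in EASY_DC_MAPPING:
            triples.append((subject, DCTERMS_NS + EASY_DC_MAPPING[name], value))
    return triples


# Invented sample data.
for triple in overloaded_triples(
    "http://example.org/pub/doe2010",
    {"title": "An Example", "pages": "1--10"},
):
    print(triple)
```

The BibTeX-specific triples keep the round trip possible, while generic consumers can rely on the Dublin Core ones.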
|
i have to point to ShaRef because that's the project i worked on quite a while ago. it's XSD and pretty much derived from BibTeX, but better structured. however, it's not maintained, and the converters live in obscure java apps that probably are only used by one person on the planet these days (me), but there also is an online service. but still, it exists and is open source and easy to use.

project page (historical): http://dret.net/projects/sharef/
online converter: http://dret.net/bibconvert/
XSD: http://dret.net/bibconvert/xslt/sharef.xsd
example XML: http://dret.net/biblio/dret.xml

if there is a well-established RDFS or OWL vocabulary, i might even give it a try to write some XSLT to convert my XML to RDF. |
|
DBLP, a well-known bibliography index for computer science, now exposes references either in BibTeX or in RDF. For instance: http://dblp.l3s.de/d2r/page/publications/journals/ai/HendlerB10 The ontology used can be found at http://ontoware.org/swrc/ and is described in this paper: http://www.aifb.kit.edu/web/Inproceedings1003 But there is no converter either...

Nitpick: This RDF is not published by DBLP, but by a third party (Universität Hannover). But thanks anyway, YMombrun! SWRC is definitely a contender. |
|
Have you checked BibJSON and its export capabilities? They mainly use BIBO with some extensions. For examples, check with Jim Pitman (and if you do, bug him to just post them rather than leaving dummy links in). |
|
For completeness, it is worth pointing out that even without the planned BIBO support, Zotero imports a wide range of formats including BibTeX, and stores and exports its data as RDF. It fails your criteria in that "Zotero RDF" doesn't seem to have a standalone spec and isn't directly used for publication but is normally internal to the tool. They seem to use mostly DC with a bit of FOAF, PRISM and Biblio as the vocabularies. It's a great reference manager though. |
|
Dear folks, I have been alerted to this conversation very late in the day, so my apologies for reviving an old thread. However, since it mentions CiTO, let me comment. The mentions of CiTO in earlier posts relate to my original version of CiTO (v1.6, described in J. Biomedical Semantics 1 (Suppl. 1): S6. http://dx.doi.org/10.1186/2041-1480-1-S1-S6), which contained properties for typing and counting citations, and classes for describing the objects of citations, e.g. "book" and "journal article". In the second half of last year, Silvio Peroni and I cleaned up this mixed bag by splitting CiTO v1.6 into three separate and complementary ontologies:

1. CiTO, the Citation Typing Ontology (http://purl.org/spar/cito/), containing only the original object properties used for citation typing, as used by Egon Willighagen in CiteULike (http://chem-bla-ics.blogspot.com/2010/10/citeulike-cito-use-case-1-wordles.html) and Martin Fenner in WordPress blog posts (http://blogs.plos.org/mfenner/2011/02/14/how-to-use-citation-typing-ontology-cito-in-your-blog-posts/).
2. FaBiO, the FRBR-aligned Bibliographic Ontology (http://purl.org/spar/fabio/), which can be used for describing all things that are the objects of citations, from computer software via books and journal articles to blog posts. FaBiO has many elements in common with BIBO, but is richer both in extent and in being structured according to FRBR (http://www.ifla.org/en/publications/functional-requirements-for-bibliographic-records) into works, expressions, manifestations and items. BIBO is widely used and suitable for many purposes, but FaBiO's increased expressivity may be useful for describing things not covered by BIBO, and for avoiding potential semantic confusion (e.g. between a research paper and its expression either in a conference presentation or in a research article).
3. C4O, the Citation Counting and Context Characterization Ontology (http://purl.org/spar/c4o/), which does what it says on the tin.

In addition, we created five other complementary and orthogonal ontologies for the bibliographic domain, covering for example the relationship between references and bibliographies, and the semantic and structural components of bibliographic documents. Together with CiTO, FaBiO and C4O, these form the SPAR (Semantic Publishing and Referencing) Ontologies described at http://purl.org/spar/. Our current work involves using these ontologies as appropriate, in conjunction with Dublin Core, SWAN, PRISM, FOAF, etc., to fully describe bibliographic and citation information in RDF. We are also extending coverage to enable descriptions of and references to datasets, having mapped the DataCite Metadata Kernel to RDF (http://bit.ly/eOLN72), and with other colleagues we are developing best-practice recommendations for citing and referencing published datasets.

To come back to the original topic of this post, the SPAR ontologies are suitable for mapping reference management metadata into RDF. If you have questions about SPAR, you can contact me at david.shotton@zoo.ox.ac.uk and Silvio at speroni@cs.unibo.it. |


