Hello,

Does anybody have experience converting documents (e.g. HTML or XML) to RDF? What success rate did you get?

Can you also recommend a converter?

Thanks a lot.

Best Regards, Börteçin

asked 26 Oct '10, 13:26

Botticelli

edited 15 Dec '10, 08:33


You can start by taking a look at http://esw.w3.org/ConverterToRdf.

answered 26 Oct '10, 14:23

Antoine Zimm... ♦

TopBraid offers a "Semantic XML" feature that transforms XML into an RDF data structure, preserving all XML tag hierarchies, tag orderings, attributes, etc. You can also convert back to XML, thus supporting round-trip XML access. 100% success rate on well-formed XML files. RDFa is also supported.

There is a PDF describing this and other import capabilities at http://www.topquadrant.com/resources/Import_and_Transformation_with_TBS.pdf. There is also a screencam video titled "Importing arbitrary XML files" at http://www.topquadrant.com/resources/videos.html.

answered 26 Oct '10, 16:40

scotthenninger ♦

edited 26 Oct '10, 16:47

@scotthenninger: Just watched the video... pretty slick!

(27 Oct '10, 01:05) harschware ♦

It really depends what semantics you want to extract! Things like page title are easy, but section headings etc. are used in pretty random ways.

You can convert anything digital into RDF. The question isn't how to convert it into RDF, it's what you plan to do with the RDF (or hope other people will do with it).

Being able to pull out the title and actual content HTML from a page might be useful, also the link and meta tags etc. but it really depends on what you're trying to produce. Start with a use-case and work backwards.
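
To make the "title, link and meta tags" idea concrete, here is a minimal sketch using only the Python standard library. It is just one possible approach, not a recommended converter: the choice of Dublin Core terms and the XHTML `alternate` relation as target vocabulary, and the names `PageMetadataParser` and `page_to_ntriples`, are my own assumptions for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

DCTERMS = "http://purl.org/dc/terms/"
XHV_ALTERNATE = "http://www.w3.org/1999/xhtml/vocab#alternate"


class PageMetadataParser(HTMLParser):
    """Collects the <title> text plus <meta name/content> and <link rel/href> pairs."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = []    # (name, content) pairs
        self.links = []   # (rel, href) pairs
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name") and attrs.get("content"):
            self.meta.append((attrs["name"], attrs["content"]))
        elif tag == "link" and attrs.get("rel") and attrs.get("href"):
            self.links.append((attrs["rel"], attrs["href"]))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def literal(text):
    """Escape a string as a plain N-Triples literal."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
    return '"' + escaped + '"'


def page_to_ntriples(page_url, html):
    """Return N-Triples lines describing the page (assumed vocabulary, see above)."""
    parser = PageMetadataParser()
    parser.feed(html)
    subject = f"<{page_url}>"
    triples = []
    title = parser.title.strip()
    if title:
        triples.append(f"{subject} <{DCTERMS}title> {literal(title)} .")
    for name, content in parser.meta:
        if name.lower() == "description":
            triples.append(f"{subject} <{DCTERMS}description> {literal(content)} .")
    for rel, href in parser.links:
        if "alternate" in rel.lower().split():
            # Resolve relative hrefs against the page URL.
            triples.append(f"{subject} <{XHV_ALTERNATE}> <{urljoin(page_url, href)}> .")
    return triples
```

Calling `page_to_ntriples("http://example.org/page", html_text)` gives a list of N-Triples lines. Which tags you map, and to which properties, is exactly the part that should come from your use case.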

answered 27 Oct '10, 11:15

Christopher ...

You might also like to look into OpenLink Virtuoso's Sponger. This both groks many existing formats and is easily customized for your own sources as well.

(Disclaimer: OpenLink feeds me)

answered 22 Jun '11, 11:05

Tim.Haynes

Your question is very generic. My answer assumes that you are talking about extracting RDF metadata from documents. There are two open-source frameworks I would recommend you look at:

- Aperture: a Java framework for extracting and querying full-text content and metadata from various information systems (e.g. file systems, web sites, mail boxes) and the file formats (e.g. documents, images) occurring in these systems.

- Any23: a library, a web service and a set of command-line tools for extracting structured data in RDF format from a variety of Web documents. It was developed by the Sindice team at DERI and recently became an Apache Incubator project.

answered 13 May '13, 10:08

fellahst

Alchemy offers a pretty good API that supports entity extraction, relation extraction, sentiment analysis, etc. You can try it here:

http://www.alchemyapi.com/api/demo.html

As I understand it, Open Calais supports something similar:

http://www.opencalais.com/

answered 14 May '13, 12:00

William Greenly

To add to the previous answers: if the XML or HTML has a lot of unstructured text (paragraphs, etc.), then you may want to think about some kind of text-analytics software to do basic entity extraction. That way, the RDF version of the document at least has statements describing the entities mentioned in the running text.

I haven't tried TopBraid's tool listed above, but you may be better served just using XSLT, since, as I'm sure you know, the structure of an XML document does not really correlate exactly to the semantic representation (a big example of that today is any XBRL document).

answered 23 Dec '10, 21:37

Rob Gonzalez

You may want to try Tripliser, a library/command-line tool for XML to RDF conversion/extraction.

Unlike most other solutions, it does not use XSLT, which can become hard to read for more complex mappings.

http://daverog.github.com/tripliser/

It maps XML to RDF using XPath to extract the data.

For the source XML:

<person id="bart-simpson" friends="http://van-houten.name/milhouse">
  <name>Bart Simpson</name>
</person>

The mappings could look like this...

<resource query="person">
  <about prepend="http://people.name/" append="#person" query="@id"/>
  <properties>
    <property name="rdf:type" resource="true" value="foaf:Person"/>
    <property name="foaf:name" query="name" />
    <property name="foaf:knows" query="@friends" />
  </properties>
</resource>
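
To show what triples a mapping like this is meant to produce, here is a rough Python sketch that hard-codes the same three rules with the standard library. It is not Tripliser's implementation or API: `select` and `person_triples` are hypothetical names, `select` only imitates the two XPath forms used above, the resource is typed as `foaf:Person` (an assumption, since `foaf:name` is a property rather than a class), and the `friends` attribute is emitted as an IRI (also an assumption).

```python
import xml.etree.ElementTree as ET

PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def expand(curie):
    """Turn a prefixed name such as foaf:name into a bracketed IRI."""
    prefix, local = curie.split(":", 1)
    return f"<{PREFIXES[prefix]}{local}>"


def literal(text):
    """Escape a string as a plain N-Triples literal."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def select(element, query):
    """Tiny stand-in for XPath: '@attr' reads an attribute, anything else child text."""
    if query.startswith("@"):
        value = element.get(query[1:])
        return [value] if value is not None else []
    return [child.text for child in element.findall(query) if child.text]


def person_triples(xml_text):
    """Apply the three property rules of the mapping to every <person> element."""
    root = ET.fromstring(xml_text)
    triples = []
    for person in root.iter("person"):  # iter() also matches the root itself
        ids = select(person, "@id")
        if not ids:
            continue
        # <about prepend="http://people.name/" append="#person" query="@id"/>
        subject = f"<http://people.name/{ids[0]}#person>"
        triples.append((subject, expand("rdf:type"), expand("foaf:Person")))
        for name in select(person, "name"):
            triples.append((subject, expand("foaf:name"), literal(name)))
        for friend in select(person, "@friends"):
            triples.append((subject, expand("foaf:knows"), f"<{friend}>"))
    return triples
```

For the source XML above, `person_triples` returns three (subject, predicate, object) tuples: the type statement, the name literal and the `foaf:knows` link. The point of a declarative mapping is that these rules live in the mapping file rather than in code like this.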

answered 22 Jun '11, 06:28

daverog

Please try this http://x2r.aut.ac.nz/

Brimount

answered 20 Sep '13, 00:40

brimount

Please don't just post a link, but give a bit of an explanation as to why it's a good answer to the question as well.

(20 Sep '13, 00:41) Jeen Broekstra ♦

Asked: 26 Oct '10, 13:26

Seen: 4,499 times

Last updated: 20 Sep '13, 00:41