There seem to be three common combinations of semantic markup:

  1. RDFa + GoodRelations
  2. Microdata + Schema.org
  3. Microdata + GoodRelations

If I parse HTML pages, how do I know which markup combination is being used and which tags or attributes to look at? Also, is there an online tool that validates these different combinations, so that I can guarantee that a particular page is syntactically correct?

asked 01 Feb, 01:33

altruist

edited 04 Feb, 16:20

Signified ♦


I'm not sure I can give you a definitive answer, but here are a couple of thoughts.

  • You are mixing the data representation with the ontology. Microdata is the representation, or format, and Schema.org is the ontology. With that pointed out, don't you really just need to parse the target web page and, if it is RDFa, refer to the namespace of the ontology to figure out which one it is? If it is microdata, I'm a little less sure what to do. I suppose it depends on whether an online tool suffices or you need to do it from the command line. If the latter, there are parsers available to you; see the question "What is the best Java RDFa Parser?"
  • Be aware that schema.org has a Linked Data translation at http://schema.rdfs.org/
  • A microdata and RDFa validator is easy enough to find with Google. I found an announcement from W3C about a "New W3C Validation Service with RDFa 1.1 and microdata".

If you are looking for a validator that simply classifies a page as, say, "RDFa + Schema.org", I know of no such tool, but it seems it would be easy enough to build one from the existing tools for the formats you intend to support, by looking at the namespace of the ontology. It seems like there ought to be a good way to do this with microdata too.
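To make the "look at the namespace" idea concrete, here is a rough sketch of such a classifier using only Python's standard library. It is not a validator and not a real RDFa or microdata parser. The names `MarkupSniffer` and `detect_markup` are made up for this illustration, and the attribute lists are my assumption about where a vocabulary URI usually shows up: `itemtype` for microdata, and `vocab`, `prefix`, `typeof`, `property` or an `xmlns:*` declaration for RDFa.

```python
from html.parser import HTMLParser

# Substrings that identify each vocabulary's namespace URI.
VOCABULARIES = {
    "Schema.org": "schema.org/",
    "GoodRelations": "purl.org/goodrelations/",
}

# Attributes whose values may carry a vocabulary URI (assumed, not exhaustive).
MICRODATA_ATTRS = {"itemtype"}
RDFA_ATTRS = {"vocab", "prefix", "typeof", "property"}


class MarkupSniffer(HTMLParser):
    """Collects (syntax, vocabulary) pairs seen in start tags."""

    def __init__(self):
        super().__init__()
        self.found = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            value = value or ""  # valueless attributes arrive as None
            if name in MICRODATA_ATTRS:
                syntax = "Microdata"
            elif name in RDFA_ATTRS or name.startswith("xmlns:"):
                syntax = "RDFa"
            else:
                continue
            for vocabulary, needle in VOCABULARIES.items():
                if needle in value:
                    self.found.add((syntax, vocabulary))


def detect_markup(html):
    """Return a sorted list such as ['Microdata + Schema.org']."""
    sniffer = MarkupSniffer()
    sniffer.feed(html)
    return sorted(f"{syntax} + {vocab}" for syntax, vocab in sniffer.found)
```

Calling `detect_markup(page_source)` on a page gives a list of the combinations it spotted, or an empty list if none of the known namespaces appear. A page that uses RDFa with a prefix declared somewhere this sketch does not look would be missed, so for anything serious a proper parser is the safer route.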

answered 02 Feb, 13:49

harschware ♦

edited 03 Feb, 20:04

Thank you for your reply. As you mentioned, I supposed that for a parser, detecting RDFa would be achievable by looking at the xmlns declarations, and for microdata I would investigate further. However, I was hoping there would be some mechanism like a meta tag with a content attribute, but I think there isn't one.

Thanks!

(03 Feb, 00:15) altruist

Just adding to harschware's answer: if you just want to extract the structured data (whether it is in schema.org or something else is, as he said, another matter) you can also use services that would, internally, extract that data using different syntaxes and return a merged RDF graph. There is one at http://www.w3.org/2012/sde/; I think Gregg Kellogg may have one, too (http://greggkellogg.net). The newest (not-yet-released) RDFLib Python library also includes both RDFa and microdata parsers.

answered 05 Feb, 08:55

Ivan
