|
It seems that there are three common types of semantic markup methods.
If I were to parse HTML pages, how would I know which markup combination is being used and which tags, etc., to look at? Also, is there an online tool that validates these different combinations, so that I can confirm that a particular page is syntactically correct? |
|
I'm not sure I can give you a definitive answer, but here are a couple of thoughts.
If you are looking for a validator that simply classifies a page as, say, "RDFa + Schema.org", I know of no such tool, but it seems it would be easy enough to build one using the existing tools for the formats you intend to support and by looking at the namespace of the ontology. It seems like there ought to be a good way to do this with microdata too.
Thank you for your reply. As you mentioned, I suppose that for RDFa a parser could detect it by looking at the xmlns, and for microdata I will investigate further. I was hoping there would be some mechanism like a meta tag with a content attribute, but I don't think there is one. Thanks! |
|
Just adding to harschware's answer: if you just want to extract the structured data (whether it is in schema.org or something else is, as he said, another matter) you can also use services that would, internally, extract that data using different syntaxes and return a merged RDF graph. There is one at http://www.w3.org/2012/sde/; I think Gregg Kellogg may have one, too (http://greggkellogg.net). The newest (not-yet-released) RDFLib Python library also includes both RDFa and microdata parsers. |
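As a rough illustration of the detection idea from the first answer (looking at namespaces and syntax-specific attributes), here is a minimal sketch using only Python's standard `html.parser`. The names `MarkupSniffer` and `sniff` are hypothetical, the attribute lists are illustrative rather than exhaustive, and this is only a heuristic: it does not validate anything, and an `xmlns:*` declaration on its own is recorded as a namespace hint, not treated as proof of RDFa.

```python
from html.parser import HTMLParser

# Attributes that are characteristic of each syntax (illustrative, not exhaustive).
MICRODATA_ATTRS = {"itemscope", "itemtype", "itemprop"}
RDFA_ATTRS = {"vocab", "typeof", "property", "prefix"}


class MarkupSniffer(HTMLParser):
    """Hypothetical helper: collects hints about the structured-data syntaxes on a page."""

    def __init__(self):
        super().__init__()
        self.syntaxes = set()      # e.g. {"microdata", "rdfa"}
        self.vocabularies = set()  # URIs seen in itemtype, vocab, or xmlns:* attributes

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            value = value or ""  # boolean attributes such as itemscope have no value
            if name in MICRODATA_ATTRS:
                self.syntaxes.add("microdata")
                if name == "itemtype":
                    # itemtype may hold several space-separated type URIs
                    self.vocabularies.update(value.split())
            elif name in RDFA_ATTRS:
                self.syntaxes.add("rdfa")
                if name == "vocab":
                    self.vocabularies.add(value)
            elif name.startswith("xmlns:"):
                # A namespace declaration is only a hint, so record the URI
                # without claiming the page uses RDFa.
                self.vocabularies.add(value)


def sniff(html):
    """Return (set of detected syntaxes, whether any vocabulary URI mentions schema.org)."""
    sniffer = MarkupSniffer()
    sniffer.feed(html)
    sniffer.close()
    uses_schema_org = any("schema.org" in uri for uri in sniffer.vocabularies)
    return sniffer.syntaxes, uses_schema_org


if __name__ == "__main__":
    page = (
        '<div itemscope itemtype="http://schema.org/Person">'
        '<span itemprop="name">Ann</span></div>'
    )
    print(sniff(page))
```

A classifier like this only tells you which parser to hand the page to; checking that the markup is actually well-formed still requires a real RDFa or microdata processor such as the services mentioned above.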

