A lot of data is being published as Linked Open Data, but which datasets are the most valuable and still missing? Can we target them as a community and persuade their owners to publish them as linked data, or are there legal, social or economic issues to overcome first?

This could be a lot of work with current tools, but I'd love to see the basic who/what/when/where of newspaper stories published as linked data. That would open up a huge number of interesting mash-up and research possibilities.


I have to be careful what I say here, but geographic data is the obvious one. Geography is common to so much data that it provides a natural hub for linkage on the linked data web. However, there are definitely economic issues here that have been argued many, many times, and I'm not even going to go there right now :)


I would very much like to see old scientific literature made available as LOD, perhaps starting simply with the bibliographic data (journal, title, authors, ... BIBO stuff) and later perhaps with RDF reflecting the content too.
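To make the "bibliographic data first" idea concrete, here is a minimal sketch of what a single article record might look like as N-Triples, using the BIBO ontology together with Dublin Core terms and FOAF. The modelling choices (`bibo:AcademicArticle`, `dcterms:isPartOf` for the journal, `dcterms:creator` plus `foaf:name` for authors) are my assumption of a reasonable profile, not a prescribed one. The function `record_to_ntriples` is hypothetical, and every example.org URI and value is a placeholder; a real pipeline would more likely use an RDF library.

```python
# Minimal sketch (hypothetical helper): serialise one bibliographic record as
# N-Triples using BIBO, Dublin Core terms and FOAF. All example.org URIs and
# field values used with it are placeholders, not real records.
from typing import List, Tuple

BIBO = "http://purl.org/ontology/bibo/"
DCT = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def iri(value: str) -> str:
    # Wrap an absolute IRI in angle brackets, as N-Triples requires.
    return f"<{value}>"


def literal(value: str) -> str:
    # Escape backslashes first, then quotes and newlines, then quote the string.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def record_to_ntriples(article_iri: str, title: str, journal_iri: str,
                       journal_title: str,
                       authors: List[Tuple[str, str]]) -> str:
    """authors is a list of (person_iri, name) pairs."""
    triples = [
        (iri(article_iri), iri(RDF_TYPE), iri(BIBO + "AcademicArticle")),
        (iri(article_iri), iri(DCT + "title"), literal(title)),
        (iri(article_iri), iri(DCT + "isPartOf"), iri(journal_iri)),
        (iri(journal_iri), iri(RDF_TYPE), iri(BIBO + "Journal")),
        (iri(journal_iri), iri(DCT + "title"), literal(journal_title)),
    ]
    for person_iri, name in authors:
        # Link the article to each author, and describe the author with FOAF.
        triples.append((iri(article_iri), iri(DCT + "creator"), iri(person_iri)))
        triples.append((iri(person_iri), iri(RDF_TYPE), iri(FOAF + "Person")))
        triples.append((iri(person_iri), iri(FOAF + "name"), literal(name)))
    # One statement per line, each terminated by " ." as in N-Triples.
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


if __name__ == "__main__":
    print(record_to_ntriples(
        "http://example.org/article/1",
        "An Example Article Title",
        "http://example.org/journal/1",
        "Example Journal",
        [("http://example.org/person/1", "A. N. Author")],
    ))
```

Plain N-Triples keeps the sketch dependency-free; because each record uses shared journal and author URIs, the same people and journals can then be linked across many old articles.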

Old science is often outdated science, but more and more material is becoming available that still forms the foundation of current science. Having this information available in a machine-readable and machine-understandable way would be an enormous asset.

Scientific organisations and societies could be approached about making their out-of-copyright material available as RDF. Talis could provide them with the platform, as I am pretty sure these societies do not always have the means (the Royal Society of Chemistry in the UK excepted).


I agree on geographic data being important but I have another candidate that could be quite disruptive:

Product data

I want to see producers publishing complete specifications of their products. For food or health products there could even be descriptions of the production and distribution chains, a sort of provenance for the products. I want to see sellers publish their offers and all the conditions involved, linked to the producers' data. I want to see reviews linked to the products as well, and I want to see all of that combined with social trust networks, geographical data and what have you.

Eventually this could lead to a whole new way of buying things. You would be a well-informed customer, machines could find the things that really match your requirements, companies might have to change their processes and improve the quality of their products, and you could spend your money more deliberately and conscientiously.


Perhaps metadata about information resources. Curiously, a lot of this kind of metadata is not free or open, and even when it is, it is often hidden in systems that allow access only through HTML form-based interfaces or arcane, often non-standards-compliant protocols.

Opening up this kind of metadata would at least make it possible to find out what is available and where, even if it does not provide access to the documents themselves.


Academics, techies and info folks are competing with Google, Yahoo and soon Microsoft on this one.

The big guns are crossing the copyright minefield right now to pick out and build linked datasets from the most easily or effectively monetized heaps of undead metadata...

Then they will link it to the personal ontologies of users that they are RDFatizing, to disambiguate queries and preferences (gridiron football or association football?), and link it all to each user's unique walletspace.

For me, the most valuable datasets are the assertion-refutation conversations between experts (like Semantic Overflow!).

Capturing semantic relations such as Schopenhauer<->Hegel or compactification<->branes would touch humanity.


