I'm looking for a data set for testing that would:

  • contain a large variety of literal types

And, optionally, would be:

  • amenable to calculations (i.e. literals may be interrelated)
  • the larger the better
  • perhaps synthetically generatable so that the data size and variability can be tuned for testing

DBpedia and SP2 have already been considered.

asked 18 Jan '13, 00:32

harschware ♦

The UniProt dataset uses the following XML Schema literal datatypes:

  • decimal
  • date
  • normalizedString
  • float
  • boolean
  • int
  • long
  • gYear
  • gYearMonth
  • token

Plain literals are present as well. Some correlations would be:

?thing a up:Sequence .
?thing up:md5Checksum ?checkSumOfvalue .
?thing rdf:value ?valueThatWasChecksumed .
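The checksum correlation above could be tested along these lines. This is only a minimal sketch: it assumes the checksum literal is the hexadecimal MD5 digest of the UTF-8 bytes of the `rdf:value` string, which the thread does not confirm, and the `rows` list is hypothetical stand-in data for a query result set.

```python
import hashlib


def md5_matches(value: str, checksum: str) -> bool:
    """Return True if `checksum` is the hex MD5 digest of `value`.

    Assumption: the digest is computed over the UTF-8 bytes of the value
    and is compared case-insensitively.
    """
    digest = hashlib.md5(value.encode("utf-8")).hexdigest()
    return digest == checksum.strip().lower()


# Hypothetical (value, checksum) pairs, e.g. bindings of
# ?valueThatWasChecksumed and ?checkSumOfvalue from a result set.
rows = [
    ("MKT", hashlib.md5(b"MKT").hexdigest()),  # consistent pair
    ("MKT", "0" * 32),                         # deliberately wrong checksum
]

# Collect every value whose stored checksum does not match.
mismatches = [value for value, checksum in rows if not md5_matches(value, checksum)]
print(len(mismatches))
```

Any non-empty `mismatches` list would point either at bad data or at a wrong assumption about how the checksum was computed.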

Or:

?range a up:Range .
?range up:begin ?begin .
?range up:end ?end .
FILTER (?begin < ?end) # end should always be greater than begin
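The same begin/end invariant can also be checked outside the store, for example over exported bindings. A small sketch, assuming the positions have already been parsed into integers; the range identifiers and values in `sample` are made up for illustration.

```python
from typing import Iterable, List, Tuple


def invalid_ranges(ranges: Iterable[Tuple[str, int, int]]) -> List[str]:
    """Return the ids of ranges that violate begin < end.

    The comparison is strict, mirroring the FILTER in the pattern above.
    """
    return [range_id for range_id, begin, end in ranges if not begin < end]


# Hypothetical (range id, begin, end) triples.
sample = [
    ("range1", 1, 120),   # valid
    ("range2", 50, 50),   # begin equals end, rejected by the strict check
    ("range3", 300, 42),  # begin after end
]

print(invalid_ranges(sample))
```

If single-position ranges are legitimate in the data, the comparison would need to be `<=` instead.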

I can't think of any real calculations between the data values. rdf.wwpdb.org may have more numbers you can play with.

answered 18 Jan '13, 03:44

Jerven ♦

Thanks for the suggestion. Five years ago, @AndyS wrote about UniProt here: http://seaborne.blogspot.com/2008/06/tdb-loading-uniprot.html

(18 Jan '13, 11:49) harschware ♦

It's a lot larger these days: 6.4 billion triples after loading, compared with 1.5 billion back then. Also, at that time we did not declare the datatypes that we do today.

(18 Jan '13, 11:53) Jerven ♦
last updated: 19 Jan '13, 03:12