What are the best small datasets for testing algorithms built on semantic web technologies? Alternatively, what is the best way to rapidly create synthetic datasets using SPARQL? The only good article I have found on this topic is http://swat.cse.lehigh.edu/pubs/li10d.pdf. Ideally, the dataset would include both the ontology and the triples.

asked 13 Feb '13, 08:21

paxRoman

accept rate: 0%

edited 14 Feb '13, 11:27

Off the top of my head:

  1. The Lehigh University Benchmark (LUBM). Quite old and quite simplistic, but has an ontology and an associated synthetic data generator. Often used for SPARQL querying and some OWL (1) inference tests.
  2. Berlin SPARQL Benchmark (BSBM). A newer benchmark framework, mostly used for SPARQL querying. Not sure if there's an ontology attached.

Both benchmarks have published papers available (LUBM, BSBM).
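If you only need a handful of triples plus a toy ontology to smoke-test an algorithm, a few lines of script may be enough. The following is a minimal sketch in the spirit of a synthetic generator; it is not LUBM's or BSBM's actual generator. The `http://example.org/univ#` namespace, the class names, the `memberOf` property, and the function names are all assumptions made up for illustration. It writes plain N-Triples lines using only the Python standard library.

```python
import random

# Hypothetical namespace and vocabulary, chosen for illustration only.
EX = "http://example.org/univ#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
OWL_CLASS = "http://www.w3.org/2002/07/owl#Class"


def triple(s, p, o):
    """Format one N-Triples line; all three terms are IRIs."""
    return f"<{s}> <{p}> <{o}> ."


def ontology():
    """A tiny schema: three classes and one subclass axiom."""
    lines = [triple(EX + name, RDF_TYPE, OWL_CLASS)
             for name in ("Person", "Student", "Department")]
    lines.append(triple(EX + "Student", RDFS_SUBCLASS, EX + "Person"))
    return lines


def instances(n_departments, max_students, seed=0):
    """Generate departments, each with 1..max_students member students."""
    rng = random.Random(seed)  # fixed seed keeps the dataset reproducible
    lines = []
    for d in range(n_departments):
        dept = f"{EX}dept{d}"
        lines.append(triple(dept, RDF_TYPE, EX + "Department"))
        for s in range(rng.randint(1, max_students)):
            student = f"{EX}dept{d}_student{s}"
            lines.append(triple(student, RDF_TYPE, EX + "Student"))
            lines.append(triple(student, EX + "memberOf", dept))
    return lines


if __name__ == "__main__":
    # Print ontology and instance data together as one N-Triples document.
    print("\n".join(ontology() + instances(n_departments=3, max_students=5)))
```

Redirect the output to a `.nt` file and load it into whatever triple store you are testing. Scaling `n_departments` and `max_students` gives you datasets of different sizes with the same shape.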

answered 14 Feb '13, 12:32

Signified ♦
accept rate: 37%


Btw, congratulations to Signified for being the first to achieve 20k Karma points! :-)

(15 Feb '13, 03:43) Michael Schn... ♦

Not quite there yet, but many thanks! :)

(15 Feb '13, 09:10) Signified ♦

Thanks for the answer and congratulations to Signified!

(16 Feb '13, 12:50) paxRoman

question asked: 13 Feb '13, 08:21

question was seen: 1,423 times

last updated: 16 Feb '13, 12:50