I am trying to collect foaf.rdf files for my research project. One way I thought of collecting these files was to crawl the web. I was wondering whether there is a repository that maintains a list of these files, similar to CKAN, which is a repository for linked data sets. Can someone help with this question?

If there is no such repository, is there any method which is better than crawling?


asked 17 Jan '13, 02:04

rar_ind

edited 17 Jan '13, 07:12

I don't know of any repositories, but if you Google filetype:foaf it will return the .foaf files in its index. You can also be more specific. For example,

smith filetype:foaf

will return .foaf files containing the word smith. I haven't played with this much, but I'd imagine it could help.

answered 17 Jan '13, 07:51

Phil


I think foaf filetype:rdf might be better? Google is also very good at picking out the most prominent FOAF files out there.

(17 Jan '13, 13:18) Signified ♦

Yep, rar_ind, you should go with this.

(17 Jan '13, 20:00) Phil

I would have pointed you to pingthesemanticweb.com but that service seems to be offline.

You could also look through the contents of the BTC-2012 dataset. This will contain a lot of FOAF data alongside other stuff. You could filter the data you need from the dumps.
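As a rough illustration of that filtering step, here is a minimal Python sketch. It assumes the dumps are N-Quads (one statement per line, with the source document as the fourth element), and it uses a deliberately crude regular expression rather than a real parser. The function name `foaf_contexts` is made up for this example.

```python
import re
from typing import Iterable, Set

FOAF_NS = "http://xmlns.com/foaf/0.1/"

# Crude N-Quads pattern (an assumption, not a full parser):
# subject, <predicate>, object (anything), <graph>, final dot.
QUAD_RE = re.compile(r'^(\S+)\s+<([^>]*)>\s+(.*)\s+<([^>]*)>\s*\.\s*$')

def foaf_contexts(lines: Iterable[str]) -> Set[str]:
    """Return the graph (source document) IRIs of quads that use FOAF terms."""
    contexts = set()
    for line in lines:
        m = QUAD_RE.match(line)
        if not m:
            continue  # skip blank, comment, or unparseable lines
        _subj, pred, obj, ctx = m.groups()
        # Keep quads with a FOAF predicate or a FOAF IRI as the object
        # (for example rdf:type foaf:Person).
        if pred.startswith(FOAF_NS) or obj.startswith("<" + FOAF_NS):
            contexts.add(ctx)
    return contexts

# Hypothetical usage on a gzipped dump file:
#   import gzip
#   with gzip.open("data.nq.gz", "rt", encoding="utf-8") as fh:
#       docs = foaf_contexts(fh)
```

The returned set gives you candidate document URLs to fetch or inspect further. For anything beyond a quick pass, a proper N-Quads parser would be safer than the regular expression.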

You could also try queries against the LOD cache SPARQL endpoint. Here's one idea:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT ?d WHERE { GRAPH ?g { ?d a foaf:PersonalProfileDocument } }

And here are the results of that query with LIMIT 100.
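If you want to run a query like this from a script, the following Python sketch shows one way to build the request and read the document URIs out of a SPARQL JSON result. The endpoint address, the `SELECT DISTINCT ?d` clause, and the helper names `build_request` and `document_uris` are my assumptions for illustration, since only the WHERE pattern is given above.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Assumed address of the LOD cache endpoint; check it before relying on it.
ENDPOINT = "http://lod.openlinksw.com/sparql"

# The SELECT clause and LIMIT are assumptions wrapped around the WHERE pattern.
QUERY = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT ?d
WHERE { GRAPH ?g { ?d a foaf:PersonalProfileDocument } }
LIMIT 100
"""

def build_request(endpoint: str, query: str) -> Request:
    """Build a SPARQL protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})

def document_uris(results_json: str) -> list:
    """Extract the values bound to ?d from a SPARQL JSON results document."""
    data = json.loads(results_json)
    return [b["d"]["value"] for b in data["results"]["bindings"] if "d" in b]

# Hypothetical usage (performs a network call):
#   from urllib.request import urlopen
#   with urlopen(build_request(ENDPOINT, QUERY)) as resp:
#       uris = document_uris(resp.read().decode("utf-8"))
```

Paging through more than 100 results would need an OFFSET, and public endpoints often cap result sizes, so treat this as a starting point only.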

You could also try something similar over at Sindice.

I should add that a lot of the FOAF data available on the Web comes from a small number of sites like livejournal.com, identi.ca, vox.com, etc. They export a FOAF profile for each of their users, resulting in millions upon millions of FOAF files on the Web. It's important to be aware of this, as these exporters greatly outnumber the "homegrown" FOAF files on the Web, and due to the uniformity of their export code, they can often "skew" distributions and other analyses of large-scale RDF data crawled from the Web. For example, you'll find that a lot of FOAF files contain a "social outdegree" of precisely 1,000 simply because such sites limit the number of connections you can have.
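One simple way to see (and optionally dampen) this skew is to count collected FOAF URLs per site before analysing them. The sketch below is only an illustration: `site_key`, `site_counts`, and `cap_per_site` are hypothetical helpers, and taking the last two labels of the host name is a crude stand-in for a proper registered-domain lookup (it gets suffixes such as .co.uk wrong).

```python
from collections import Counter
from urllib.parse import urlparse

def site_key(url: str) -> str:
    """Crude site identifier: the last two labels of the host name."""
    host = urlparse(url).hostname or ""
    return ".".join(host.split(".")[-2:])

def site_counts(urls) -> Counter:
    """Count how many collected URLs come from each site."""
    return Counter(site_key(u) for u in urls)

def cap_per_site(urls, max_per_site: int) -> list:
    """Keep at most max_per_site URLs from any one site, preserving order."""
    seen = Counter()
    kept = []
    for u in urls:
        key = site_key(u)
        if seen[key] < max_per_site:
            seen[key] += 1
            kept.append(u)
    return kept
```

Calling `site_counts(urls).most_common(10)` on your collection should quickly show whether a handful of exporters dominate it. Whether to cap, sample, or analyse them separately depends on your research question.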

answered 17 Jan '13, 13:29

Signified ♦

edited 17 Jan '13, 13:31
