Dear all,

I would like to know whether it is possible to perform cluster analysis on some of the individuals in an ontology.

My idea was to run an ordinary cluster analysis, extract some similarities from it, and then use SWRL rules to apply them to new individuals. The problem is that SWRL does not let me take into account the uncertainty of the cluster analysis results.

Can anyone suggest some ideas for performing cluster analysis directly in the ontology language?

I hope this is clear.

Thanks in advance.

asked 03 Dec '12, 04:27


Roberta Perrone

Alexandre Passant wrote a very nice paper about similarity metrics for Linked Data concepts, which he then turned into a site.

The gist of this method is that one can infer

?s1 ?p ?o ^ ?s2 ?p ?o ==> ?s1 :similarTo ?s2 .
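As a minimal sketch of this rule, assuming the triples are already loaded as plain Python tuples (the prefixed names below are made up for illustration), one can count how many (?p, ?o) bindings each pair of subjects shares. That count is the simplest possible similarity score; `shared_pair_counts` is a hypothetical helper, not part of any RDF library.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]


def shared_pair_counts(triples: Iterable[Triple]) -> Dict[Tuple[str, str], int]:
    """Score every pair of subjects by how many (?p, ?o) pairs they share.

    Firing  ?s1 ?p ?o ^ ?s2 ?p ?o ==> ?s1 :similarTo ?s2  once per
    matching (?p, ?o) binding and counting the firings gives this score.
    """
    # Index: (predicate, object) -> set of subjects that carry it.
    subjects_by_pair: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for s, p, o in triples:
        subjects_by_pair[(p, o)].add(s)

    counts: Dict[Tuple[str, str], int] = defaultdict(int)
    for subjects in subjects_by_pair.values():
        # Each unordered pair of subjects sharing this (p, o) gains one match;
        # sorting makes the key (s1, s2) canonical.
        for s1, s2 in combinations(sorted(subjects), 2):
            counts[(s1, s2)] += 1
    return dict(counts)


triples = [
    (":Yesterday", ":artist", ":Beatles"),
    (":Yesterday", ":genre", ":Pop"),
    (":LetItBe", ":artist", ":Beatles"),
    (":LetItBe", ":genre", ":Pop"),
    (":Angie", ":genre", ":Pop"),
]
print(shared_pair_counts(triples))
# {(':LetItBe', ':Yesterday'): 2, (':Angie', ':LetItBe'): 1, (':Angie', ':Yesterday'): 1}
```

The index-by-(p, o) step avoids comparing every subject with every other subject directly; only subjects that actually share something are ever paired.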

Of course, this is a statistical calculation, so perhaps you count the ?p ?o pairs, or you use some kind of scoring function similar to the TF-IDF, Okapi, or KS divergence measures used in IR. I've thought a bit about how to design a good scoring function here, but the part I'm hung up on now is how to compare different scoring functions and decide which one I like better. It probably doesn't matter much: even with the simplest scoring function, the results don't disagree with "common sense", so long as you throw a large amount of varied data into the hopper.
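The IDF half of the TF-IDF idea can be sketched as follows, assuming each subject is described by the set of (?p, ?o) pairs it carries. This is my own illustration, not the metric from Passant's paper, and log(N / df) is just one common IDF form; `idf_similarity` and the sample data are hypothetical.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]
Pair = Tuple[str, str]


def idf_similarity(triples: Iterable[Triple]) -> Callable[[str, str], float]:
    """Return S(s1, s2) = sum of log(N / df) over the (p, o) pairs they share.

    N is the number of subjects and df the number of subjects carrying a
    given (p, o). Pairs found on every subject (df == N) contribute 0,
    while rare pairs contribute the most, as with IDF weighting in IR.
    """
    features: Dict[str, Set[Pair]] = defaultdict(set)
    for s, p, o in triples:
        features[s].add((p, o))

    n = len(features)
    df: Dict[Pair, int] = defaultdict(int)
    for pairs in features.values():
        for pair in pairs:
            df[pair] += 1

    def score(s1: str, s2: str) -> float:
        # .get avoids inserting unknown subjects into the defaultdict.
        shared = features.get(s1, set()) & features.get(s2, set())
        return sum(math.log(n / df[pair]) for pair in shared)

    return score


triples = [
    (":a", "rdf:type", ":Writer"),
    (":b", "rdf:type", ":Writer"),
    (":c", "rdf:type", ":Writer"),
    (":a", ":genre", ":Horror"),
    (":b", ":genre", ":Horror"),
    (":c", ":genre", ":Crime"),
]
S = idf_similarity(triples)
print(S(":a", ":b"), S(":a", ":c"))  # log(1.5) ≈ 0.405, then 0.0
```

With a flat count, `:a` would look as similar to `:c` as to nothing at all plus one shared type; with IDF weighting the type triple that every subject carries contributes nothing, so only the rarer shared genre counts.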

Given some similarity metric S(?s1, ?s2), there are many kinds of clustering methods that one could apply.
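One of the simplest choices is single-linkage clustering cut at a fixed threshold: link any two items whose similarity reaches the threshold and take the connected components. The sketch below assumes S is available as a Python function; the threshold value and the toy scores are illustrative assumptions, and other methods (average linkage, spectral clustering, and so on) would plug in the same S.

```python
from itertools import combinations
from typing import Callable, Dict, List, Sequence


def threshold_clusters(items: Sequence[str],
                       sim: Callable[[str, str], float],
                       threshold: float) -> List[List[str]]:
    """Single-linkage clustering: link every pair with sim >= threshold and
    return the connected components, found with union-find."""
    parent: Dict[str, str] = {x: x for x in items}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in combinations(items, 2):
        if sim(a, b) >= threshold:
            parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for x in items:
        groups.setdefault(find(x), []).append(x)
    return list(groups.values())


scores = {("a", "b"): 0.9, ("b", "c"): 0.8, ("c", "d"): 0.1}


def sim(x: str, y: str) -> float:
    """Symmetric lookup into the toy score table; unknown pairs score 0."""
    return scores.get((x, y), scores.get((y, x), 0.0))


print(threshold_clusters(["a", "b", "c", "d"], sim, 0.5))  # [['a', 'b', 'c'], ['d']]
```

Note that `a` and `c` end up together even though S(a, c) is 0 here: single linkage chains through `b`. Whether that is acceptable depends on the scoring function, which is one reason its requirements may differ for clustering.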

The requirements for the scoring function might be different for clustering than for presentation to end users. One problem I know about is similar to the one that Okapi solved for IR: how much bias a similarity function should have towards or against concepts of high subjective importance. For instance, if I asked you who is similar to some horror writer, you'd likely answer "Stephen King", and people would likely think they know what that means, even if it isn't the most precise or insightful answer.
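A minimal sketch of one way to control this bias, under the assumption that "important" concepts are the ones with many triples about them: compare a raw count of shared (p, o) pairs with a cosine-normalised version. Cosine normalisation is a swapped-in, simpler technique, only loosely analogous to the document-length normalisation in Okapi BM25; the feature sets below are invented.

```python
import math
from typing import Set, Tuple

Pair = Tuple[str, str]


def overlap(a: Set[Pair], b: Set[Pair]) -> int:
    """Raw number of shared (p, o) pairs; favours richly described entities."""
    return len(a & b)


def cosine(a: Set[Pair], b: Set[Pair]) -> float:
    """Shared pairs divided by sqrt(|a| * |b|), which discounts entities
    that simply have many (p, o) pairs attached to them."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


query = {(":genre", ":Horror"), (":form", ":Novel"),
         (":country", ":US"), (":decade", ":1980s")}
# A heavily described, "important" writer: 3 pairs in common plus 12 others.
famous = {(":genre", ":Horror"), (":form", ":Novel"), (":country", ":US")} | {
    (":award", f":Award{i}") for i in range(12)}
# A sparsely described writer: 2 pairs, both shared with the query.
obscure = {(":genre", ":Horror"), (":form", ":Novel")}

print(overlap(query, famous), overlap(query, obscure))  # 3 2
print(round(cosine(query, famous), 3), round(cosine(query, obscure), 3))  # 0.387 0.707
```

With raw counts the famous writer wins; after normalisation the obscure one does. How strongly to normalise is exactly the bias knob described above, and the right setting may differ between clustering and end-user presentation.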

If this bias is strong, it will create points of intense curvature in the metric space, and that might have a profound effect on how your clusters work out. If you try to project these into a low-dimensional space, you'll see strange stuff like the light caustics that dance on the bottom of a swimming pool on a sunny day.

Perhaps you could do better by recognizing that certain ?p ?o pairs define a cluster. For instance, given a collection of songs, one could link together all the Beatles songs this way. The trouble is that mutual exclusion isn't natural here, since one item carries many ?p ?o pairs and so can belong to many such clusters. On the other hand, most unsupervised systems have a hard time naming the clusters they discover, and the ?p ?o pairs could supply those names.
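A minimal sketch of this idea, assuming triples as plain tuples: treat every (?p, ?o) pair as a candidate cluster whose name is the pair itself, and drop pairs carried by only one subject. The `pair_clusters` helper, the `min_size` cutoff and the song data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]


def pair_clusters(triples: Iterable[Triple],
                  min_size: int = 2) -> Dict[Tuple[str, str], Set[str]]:
    """Group subjects by each (p, o) pair they carry; the pair names the group.

    Clusters may overlap: a subject joins one cluster per (p, o) it
    carries, so membership is not mutually exclusive.
    """
    clusters: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for s, p, o in triples:
        clusters[(p, o)].add(s)
    # Keep only pairs shared by at least min_size subjects.
    return {pair: members for pair, members in clusters.items()
            if len(members) >= min_size}


triples = [
    (":Yesterday", ":artist", ":Beatles"),
    (":LetItBe", ":artist", ":Beatles"),
    (":Yesterday", ":mood", ":Melancholy"),
    (":Angie", ":mood", ":Melancholy"),
    (":Angie", ":artist", ":RollingStones"),
]
for name, members in pair_clusters(triples).items():
    print(name, sorted(members))
```

`:Yesterday` lands in both the `(:artist, :Beatles)` and `(:mood, :Melancholy)` clusters, which shows both sides of the trade-off: the clusters come with readable names for free, but they are not a partition.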

There are also quite a few approaches to clustering that don't work. Link distances in the graph, for instance, don't work because of the "small world" character of such networks: most nodes are only a few links apart, so there is little dynamic range in the distances.
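A deliberately extreme toy illustrates the dynamic-range problem, assuming a single popular hub node (say, a widely shared class or a famous entity) linked to every entity, plus a ring of ordinary links. Breadth-first hop counts then collapse to just two values. Real graphs are less extreme; the graph and helper names here are hypothetical.

```python
from collections import deque
from typing import Dict, List

Graph = Dict[str, List[str]]


def bfs_distances(graph: Graph, start: str) -> Dict[str, int]:
    """Hop distance from start to every reachable node (breadth-first search)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def distance_histogram(graph: Graph) -> Dict[int, int]:
    """Number of ordered node pairs at each hop distance."""
    hist: Dict[int, int] = {}
    for start in graph:
        for node, d in bfs_distances(graph, start).items():
            if node != start:
                hist[d] = hist.get(d, 0) + 1
    return hist


def hub_and_ring(n: int) -> Graph:
    """n entities linked in a ring, each also linked to one popular hub."""
    graph: Graph = {"hub": []}
    graph.update({f"e{i}": [] for i in range(n)})
    for i in range(n):
        a, b = f"e{i}", f"e{(i + 1) % n}"
        graph["hub"].append(a)
        graph[a].append("hub")
        graph[a].append(b)
        graph[b].append(a)
    return graph


print(distance_histogram(hub_and_ring(10)))  # {1: 40, 2: 70}
```

Every one of the 110 ordered pairs is at distance 1 or 2, so hop counts cannot separate "near" from "far" neighbours beyond that single bit.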

I'm also skeptical of trying to infer similarity from positions in an ontology tree. For instance, one could conclude that "Homo sapiens" and "Canis lupus" are closer in the tree of living things than either is to "Crotalus atrox", because the first two are both mammals. The trouble is that most ontology trees are rather heterogeneous, so the meaning of a single step in the tree can vary a lot.
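To make the heterogeneity problem concrete, here is a hypothetical, heavily simplified taxonomy in which the wolf branch records many intermediate ranks and the rattlesnake branch almost none, as can happen when different parts of an ontology are curated at different levels of detail. Counting edges via the lowest common ancestor then makes the rattlesnake look closer to humans than the wolf is.

```python
from typing import Dict, List

# Hypothetical toy taxonomy (child -> parent), not a faithful classification.
# The Canis branch is detailed, the Crotalus branch deliberately coarse.
PARENT: Dict[str, str] = {
    "Homo sapiens": "Primates",
    "Primates": "Mammalia",
    "Canis lupus": "Canis",
    "Canis": "Canidae",
    "Canidae": "Caniformia",
    "Caniformia": "Carnivora",
    "Carnivora": "Mammalia",
    "Mammalia": "Amniota",
    "Crotalus atrox": "Reptilia",
    "Reptilia": "Amniota",
}


def ancestors(node: str, parent: Dict[str, str]) -> List[str]:
    """Path from node up to the root, node included."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def tree_distance(a: str, b: str, parent: Dict[str, str]) -> int:
    """Number of edges between a and b via their lowest common ancestor."""
    depth_from_b = {n: i for i, n in enumerate(ancestors(b, parent))}
    for i, n in enumerate(ancestors(a, parent)):
        if n in depth_from_b:
            return i + depth_from_b[n]
    raise ValueError(f"{a!r} and {b!r} have no common ancestor")


print(tree_distance("Homo sapiens", "Canis lupus", PARENT))     # 7
print(tree_distance("Homo sapiens", "Crotalus atrox", PARENT))  # 5
```

Nothing about the organisms changed; only the number of recorded ranks differs between branches, yet the edge count reverses the intuitive ordering.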


answered 06 Dec '12, 15:31


database_animal ♦

edited 06 Dec '12, 16:43
