I am developing an application that will allow employees (users) to classify tables and columns in the corporate databases according to a given taxonomy.

Once the database metadata has been imported, it is converted into an RDF graph. Each table and column is an instance of an internally developed RDF class for representing tables and columns. Note that actual table data are NOT imported.

So a particular column may look like this in Turtle format:

ex:emp_id a dbmap:Column ;
     dct:title "emp_id" ;
     dbmap:storageType "integer" ;
     dbmap:storageSize "10" .
# ... there may be more attributes here ...

Now let's say that we want to associate a logical type (sumo:EmployeeIdentifier) from the taxonomy with this column.

Should ex:emp_id be a sub-class of sumo:EmployeeIdentifier:

ex:emp_id a dbmap:Column ;
     rdfs:subClassOf sumo:EmployeeIdentifier ;
....

Or should the association be made by an attribute:

ex:emp_id a dbmap:Column ;
     dbmap:logicalType sumo:EmployeeIdentifier ;

What are pros/cons of each approach?

asked 14 Nov '12, 06:52

slmnhq

edited 14 Nov '12, 10:45

Signified ♦

Here are some considerations:

If you model ex:emp_id as a subclass of sumo:EmployeeIdentifier, any instance of the former is also an instance of the latter (instance inheritance). So one question is: are all conceivable instances of ex:emp_id really always sumo:EmployeeIdentifiers? Or could there be instances that are not? One case that springs to mind is a database value that is null or illegal: such values would be instances of your ex:emp_id column, but they would not be identifiers. I realize you do not currently plan to have instances of ex:emp_id, but it never hurts to think ahead: you might want them later (or someone else might).

The flip side of that argument is that if you model it as a property/association, you will not get any automatic inheritance of the sumo:EmployeeIdentifier classification.
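To make the difference concrete, here is a minimal sketch in plain Python (no RDF library) that applies only the RDFS subclass rule to both models. The triples are ordinary tuples, and ex:value_42 is a hypothetical instance standing in for a single column value; it does not come from the question.

```python
# Triples are plain (subject, predicate, object) tuples; no RDF library is used.
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def infer_types(triples):
    """Apply the RDFS subclass rule until no new rdf:type triples appear.

    Rule: if (x rdf:type C) and (C rdfs:subClassOf D) then (x rdf:type D).
    """
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        for (s, p, o) in list(inferred):
            if p != RDF_TYPE:
                continue
            for (c, p2, d) in list(inferred):
                is_superclass_link = p2 == SUBCLASS_OF and c == o
                if is_superclass_link and (s, RDF_TYPE, d) not in inferred:
                    inferred.add((s, RDF_TYPE, d))
                    changed = True
    return inferred


# ex:value_42 is a hypothetical instance of the column, used for illustration only.
subclass_model = {
    ("ex:emp_id", RDF_TYPE, "dbmap:Column"),
    ("ex:emp_id", SUBCLASS_OF, "sumo:EmployeeIdentifier"),
    ("ex:value_42", RDF_TYPE, "ex:emp_id"),
}
property_model = {
    ("ex:emp_id", RDF_TYPE, "dbmap:Column"),
    ("ex:emp_id", "dbmap:logicalType", "sumo:EmployeeIdentifier"),
    ("ex:value_42", RDF_TYPE, "ex:emp_id"),
}

target = ("ex:value_42", RDF_TYPE, "sumo:EmployeeIdentifier")
subclass_entails = target in infer_types(subclass_model)
property_entails = target in infer_types(property_model)
```

With the subclass model the instance picks up the sumo:EmployeeIdentifier type; with the property model it does not, because nothing in RDFS gives dbmap:logicalType any inheritance semantics.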

Another consideration is modeling complexity: typing ex:emp_id as a dbmap:Column makes it an individual instance. Making it a subclass as well means that you now have a concept that is both a class and an instance. While this is fine in principle, it has the potential to make your model more confusing. Using subclassing effectively implies that you do not consider ex:emp_id the 'lowest level' in your data model.

To make a long story short: if you never intend to have individual instances of ex:emp_id in your model, and instead consider ex:emp_id itself the individual, then I would recommend modeling the relation with sumo:EmployeeIdentifier via a property rather than using subclassing.
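With the property approach, finding the columns that carry a given classification is a plain lookup on dbmap:logicalType. A rough sketch, again with triples as tuples; ex:dept_name and ex:OtherLogicalType are made-up names for illustration and are not part of the question or of SUMO:

```python
def columns_with_logical_type(triples, logical_type):
    """Return the subjects whose dbmap:logicalType equals logical_type, sorted."""
    return sorted(
        s for (s, p, o) in triples
        if p == "dbmap:logicalType" and o == logical_type
    )


triples = [
    ("ex:emp_id", "rdf:type", "dbmap:Column"),
    ("ex:emp_id", "dbmap:logicalType", "sumo:EmployeeIdentifier"),
    # Hypothetical second column with a hypothetical logical type.
    ("ex:dept_name", "rdf:type", "dbmap:Column"),
    ("ex:dept_name", "dbmap:logicalType", "ex:OtherLogicalType"),
]

matches = columns_with_logical_type(triples, "sumo:EmployeeIdentifier")
```

Here ex:emp_id stays a simple individual of type dbmap:Column, and the classification is just another attribute that can be queried.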

answered 14 Nov '12, 09:27

Jeen Broekstra ♦

Thank you for the thoughts. Looking ahead, I do want to expose the actual data through a system like D2RQ but not necessarily import it into a triple-store.

What are your thoughts about using owl:equivalentClass?

ex:emp_id a dbmap:Column ; owl:equivalentClass sumo:EmployeeIdentifier .

The complexity issue remains because ex:emp_id is still an instance and a class. And I'm not sure if reasoners will treat column values as instances of sumo:EmployeeIdentifier.

(14 Nov '12, 10:51) slmnhq

@slmnhq, saying:

a owl:equivalentClass b .

is the same as saying

a rdfs:subClassOf b . b rdfs:subClassOf a .

In this case, you say that all instances of ex:emp_id are instances of sumo:EmployeeIdentifier and vice-versa.
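The equivalence can be sketched as a small rewrite step in plain Python. This is only an illustration of the entailment described above, with triples as tuples; it is not how any particular reasoner implements it.

```python
def expand_equivalent_classes(triples):
    """Add the two rdfs:subClassOf triples implied by each owl:equivalentClass triple."""
    expanded = set(triples)
    for (a, p, b) in triples:
        if p == "owl:equivalentClass":
            expanded.add((a, "rdfs:subClassOf", b))
            expanded.add((b, "rdfs:subClassOf", a))
    return expanded


model = {("ex:emp_id", "owl:equivalentClass", "sumo:EmployeeIdentifier")}
expanded = expand_equivalent_classes(model)
```

After expansion the model contains a subclass link in both directions, which is why every sumo:EmployeeIdentifier would also become an instance of ex:emp_id.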

(14 Nov '12, 11:02) Signified ♦

Exposing this data via D2RQ sounds fine to me: even if you do not store it in an actual triplestore, your data will still be RDF, and therefore it makes sense to model it correctly. Regarding what reasoners support, I'd advise separating your concerns: model in a conceptually correct way, and worry about tool support later. So if you want to consider column values as instances of sumo:EmployeeIdentifier, express that in your model.

(14 Nov '12, 11:11) Jeen Broekstra ♦

Thanks for the clarifications @Signified and @jeen-broekstra.

(14 Nov '12, 11:12) slmnhq

last updated: 14 Nov '12, 11:28