In Norway, we're working on numerous library datasets in various groups. Several of the groups have discussed data normalization, and there are two camps of library database programmers in Norway: those who use non-first normal form (NF2) and those who don't.

The NF2 model doesn't necessarily imply normalization, whereas the other (largely relational database) models typically do.

NF2 models have a perceived "usefulness built in", and the complex relations that are described are perhaps inherent in the datasets. On the other hand, complex relations might make the data more difficult to query, because they imply a particular (and unfamiliar?) semantics. I suspect that the extent to which one can query such datasets effectively is a large part of an answer to this question.

The question(s): To what extent does normalization improve the usefulness of data in the semantic web? Is it better to have generic data that are more easily used, or more complex data that return better "answers"?

asked 08 Jun '10, 10:38

brinxmat

edited 08 Jun '10, 12:53

No reason you can't do both. A simple example:

In my school's RDF, we model it like this: people have roles, and roles have phone numbers and email addresses (some people have roles with different email addresses).

That's the accurate view. We then also state directly that people have phone numbers and email addresses (those from all their roles). This is also true, and it is useful for less complex consumption.
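A minimal sketch of that "do both" approach, using plain Python tuples as triples rather than a real RDF library. The namespace, the property names (`hasRole`, `email`, `phone`) and the sample person are all made up for illustration; they are not the vocabulary actually used in the school data.

```python
# Hypothetical namespace and property names, chosen only for illustration.
EX = "http://example.org/"

# Detailed ("accurate") view: people hold roles; roles carry contact details.
detailed = {
    (EX + "alice", EX + "hasRole", EX + "alice-lecturer"),
    (EX + "alice", EX + "hasRole", EX + "alice-admissions"),
    (EX + "alice-lecturer", EX + "email", "mailto:alice@cs.example.org"),
    (EX + "alice-admissions", EX + "email", "mailto:admissions@example.org"),
    (EX + "alice-admissions", EX + "phone", "tel:+44-0000-000000"),
}

CONTACT_PROPERTIES = {EX + "email", EX + "phone"}


def add_shortcuts(triples):
    """Return the triples plus direct person -> contact statements."""
    # Map each role to the people who hold it.
    holders_of = {}
    for s, p, o in triples:
        if p == EX + "hasRole":
            holders_of.setdefault(o, set()).add(s)
    # Copy every contact detail of a role onto each holder of that role.
    shortcuts = {
        (person, p, o)
        for s, p, o in triples
        if p in CONTACT_PROPERTIES
        for person in holders_of.get(s, ())
    }
    return set(triples) | shortcuts


published = add_shortcuts(detailed)
```

The detailed triples stay untouched, so consumers who care about which role an address belongs to can still find out, while simpler consumers can read the contact details straight off the person.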


answered 09 Jun '10, 16:29

Christopher Gutteridge

Clear concise response!

(10 Jun '10, 09:06) brinxmat

You don't say whether you're interested in ontologies and OWL, or in data integration/exchange with RDF and linked data. I'll just talk about the latter.

RDF is rarely used as the primary form in which data is managed. More typically, RDF is just produced as a “view” on some existing data that lives in some other format, and often in a relational database.

The purpose of normalization in relational modelling is to prevent anomalies when inserting, updating and deleting data. For relational views, normalization is pretty much a non-issue, because views are not about data storage, but rather about presenting a different, more convenient perspective on the same data.

And that can be said about RDF as well. An RDF version of some data is usually produced because that's a convenient form for doing particular things, mostly around data integration. And it is not unusual to have redundancies in the RDF view, because that makes using and querying the data easier.
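As a rough illustration of RDF as a view, the sketch below keeps a small normalized SQLite database as the primary store and generates triples from it. The table layout, the `http://example.org/` namespace and the property names are assumptions invented for this example, not any particular library system's schema.

```python
import sqlite3

EX = "http://example.org/"  # hypothetical namespace

# Normalized primary store: the author's name is stored exactly once.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE book (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        author_id INTEGER NOT NULL REFERENCES author(id)
    );
    INSERT INTO author VALUES (1, 'Henrik Ibsen');
    INSERT INTO book VALUES (10, 'Peer Gynt', 1), (11, 'Brand', 1);
""")


def rdf_view(conn):
    """Yield triples; the store stays normalized, the view repeats the name."""
    rows = conn.execute(
        "SELECT b.id, b.title, a.id, a.name "
        "FROM book b JOIN author a ON a.id = b.author_id ORDER BY b.id"
    )
    for book_id, title, author_id, name in rows:
        book = f"{EX}book/{book_id}"
        yield (book, EX + "title", title)
        yield (book, EX + "author", f"{EX}author/{author_id}")
        # Redundant on purpose: saves consumers a hop to the author resource.
        yield (book, EX + "authorName", name)


triples = list(rdf_view(conn))
```

Updates still go through the relational tables, so the usual update anomalies are avoided there; the repeated `authorName` values exist only in the generated view.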


answered 08 Jun '10, 15:06

cygri ♦

I see your point, but I can't help feeling that providing views is a way of hindering re-use of the data.

I can certainly see the reasonableness of "convenience structures". However, isn't providing a view the job of SPARQL, if you see what I mean?

(09 Jun '10, 13:38) brinxmat

The job of SPARQL is to transform the graph back into a tabular form, which is usually more convenient for processing/templating. I'm assuming that data publisher and data consumer are different parties. There is a boundary between them. The data consumer has to figure out your data structure and he doesn't have access to your brain. The more complex the structure at the boundary, the more likely the consumer will be turned off. This boundary is the main impediment to data re-use on the Web. Normalization usually doesn't simplify the structure, quite the opposite.
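To make the "graph back into a tabular form" idea concrete, here is a toy matcher in plain Python. It is only a sketch of how a SELECT-style query turns triple patterns into rows, not how any real SPARQL engine is implemented; the data, the namespace and the `match` helper are all hypothetical.

```python
EX = "http://example.org/"  # hypothetical namespace

graph = {
    (EX + "book/10", EX + "title", "Peer Gynt"),
    (EX + "book/10", EX + "author", EX + "author/1"),
    (EX + "author/1", EX + "name", "Henrik Ibsen"),
}


def is_var(term):
    """Pattern terms starting with '?' are variables, as in SPARQL."""
    return term.startswith("?")


def match(graph, patterns, binding=None):
    """Yield one dict of variable bindings per solution of the patterns."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        candidate = dict(binding)
        for pat, value in zip(first, triple):
            if is_var(pat):
                # Bind the variable, or reject if it is bound to something else.
                if candidate.setdefault(pat, value) != value:
                    break
            elif pat != value:
                break
        else:
            yield from match(graph, rest, candidate)


# Roughly: SELECT ?title ?name
#          WHERE { ?b ex:title ?title ; ex:author ?a . ?a ex:name ?name }
patterns = [
    ("?b", EX + "title", "?title"),
    ("?b", EX + "author", "?a"),
    ("?a", EX + "name", "?name"),
]
rows = [(s["?title"], s["?name"]) for s in match(graph, patterns)]
```

Note that the consumer has to know about the book, author and name structure to write the three patterns at all, which is the boundary problem described above.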

(09 Jun '10, 17:05) cygri ♦

You make good points here, and I certainly see that normalization doesn't necessarily do anything good.

Not sure that I agree with you about the rôle of SPARQL, given that there are query forms other than SELECT.

(10 Jun '10, 08:57) brinxmat