In Norway, we're working on numerous library datasets in various groups. Several of the groups have discussed data normalization. There are two camps of library database programmers in Norway: those using non-first normal form (NF2) and those who don't.
The NF2 model doesn't necessarily imply normalization, whereas the other (largely relational database) models typically do.
NF2 models have a perceived "usefulness built in", and the complex relations that are described are perhaps inherent in the datasets. On the other hand, complex relations might make the data more difficult to query because they imply a particular (and unfamiliar?) semantics. I suspect that the extent to which one can query such datasets effectively is a large part of the answer to this question.
The question(s): To what extent does normalization improve the usefulness of data in the semantic web? Is it better to have generic data that are more easily used, or is complexity that returns better "answers" better?
No reason you can't do both. A simple example:
In my school RDF, we model: people have roles; roles have phone numbers and email addresses (some people may have roles with different email addresses).
That's the accurate view. We also directly state that people have phone numbers and emails (those from all their roles); this is also true, and useful for less complex consumption.
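The two-level modelling described above could be sketched roughly like this. It is only an illustration in plain Python, with triples as (subject, predicate, object) tuples; the identifiers (`alice`, `hasRole`, `email`, `phone`) and the helper `add_shortcuts` are made up for the example and are not taken from the actual school data.

```python
# Detailed view: people have roles, roles carry the contact details.
# All names below are hypothetical, chosen only for illustration.
detailed = {
    ("alice", "hasRole", "alice_teacher"),
    ("alice", "hasRole", "alice_admin"),
    ("alice_teacher", "email", "alice.teaching@example.org"),
    ("alice_admin", "email", "alice.admin@example.org"),
    ("alice_teacher", "phone", "555-0100"),
}


def add_shortcuts(triples, contact_predicates=("email", "phone")):
    """Return the triples plus direct person -> contact statements."""
    # Map each role to the people who hold it.
    holders = {}
    for s, p, o in triples:
        if p == "hasRole":
            holders.setdefault(o, set()).add(s)

    # Copy every contact detail of a role onto the people holding that role.
    shortcuts = set()
    for s, p, o in triples:
        if p in contact_predicates and s in holders:
            for person in holders[s]:
                shortcuts.add((person, p, o))

    return triples | shortcuts


combined = add_shortcuts(detailed)
```

A simple consumer can then ask for `("alice", "email", ?)` directly, while a consumer that cares which address belongs to which role still has the role-level statements.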
answered 09 Jun '10, 16:29
You don't say whether you're interested in ontologies and OWL, or in data integration/exchange with RDF and linked data. I'll just talk about the latter.
RDF is rarely used as the primary form in which data is managed. More typically, RDF is just produced as a “view” on some existing data that lives in some other format, and often in a relational database.
The purpose of normalization in relational modelling is to prevent anomalies when inserting, updating and deleting data. For relational views, normalization is pretty much a non-issue, because views are not about data storage, but rather about presenting a different, more convenient perspective on the same data.
And that can be said about RDF as well. An RDF version of some data is usually produced because that's a convenient form for doing particular things, mostly around data integration. And it is not unusual to have redundancies in the RDF view, because that makes using and querying the data easier.
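As a rough sketch of that idea: the data is stored normalized in a relational database, and a triple "view" with deliberate redundancy is regenerated from it. This uses Python's standard `sqlite3` module; the table layout, the URI-like strings, and the function names `build_store` and `rdf_view` are assumptions made up for the example.

```python
import sqlite3


def build_store():
    """Create a small normalized store (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE role (
            id INTEGER PRIMARY KEY,
            person_id INTEGER NOT NULL REFERENCES person(id),
            title TEXT NOT NULL,
            email TEXT NOT NULL
        );
        INSERT INTO person VALUES (1, 'Alice');
        INSERT INTO role VALUES (1, 1, 'teacher', 'alice.teaching@example.org');
        INSERT INTO role VALUES (2, 1, 'admin', 'alice.admin@example.org');
    """)
    return conn


def rdf_view(conn):
    """Produce triples from the store, including redundant shortcuts."""
    triples = set()
    for pid, name in conn.execute("SELECT id, name FROM person"):
        triples.add((f"person/{pid}", "name", name))
    rows = conn.execute("SELECT id, person_id, title, email FROM role")
    for rid, pid, title, email in rows:
        triples.add((f"person/{pid}", "hasRole", f"role/{rid}"))
        triples.add((f"role/{rid}", "title", title))
        triples.add((f"role/{rid}", "email", email))
        # Redundant on purpose: lets consumers skip the role hop.
        triples.add((f"person/{pid}", "email", email))
    return triples


conn = build_store()
view = rdf_view(conn)

# The update touches one row in the normalized store; the view is
# simply regenerated, so the redundancy cannot drift out of sync.
conn.execute("UPDATE role SET email = 'alice@example.org' WHERE id = 2")
refreshed = rdf_view(conn)
```

Because updates only ever hit the normalized tables, the usual update anomalies don't arise, even though the generated triples repeat the email address.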
answered 08 Jun '10, 15:06