
I'd like to write a Python program that:

  1. Queries an openrdf-sesame database through the REST interface
  2. Parses the result into a local graph with librdf
  3. Performs some manipulations that result in add and delete statements
  4. Executes those results back against the openrdf-sesame REST interface
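As a rough sketch of steps 1 and 4 over HTTP: a query is a GET on `/repositories/<id>` with the query text in the `query` parameter, and a transaction is a POST to `/repositories/<id>/statements` with content type `application/x-rdftransaction`. These paths, the `queryLn` parameter and the MIME types are my reading of the Sesame 2 HTTP protocol and should be checked against your server version. The server URL and repository id below are hypothetical placeholders, and the functions only build requests without sending them.

```python
import urllib.parse
import urllib.request

# Hypothetical server location and repository id; adjust for your setup.
SESAME = "http://localhost:8080/openrdf-sesame"
REPO = "myrepo"


def query_request(sparql, accept="text/plain"):
    """Build (but do not send) a GET request for a CONSTRUCT/DESCRIBE query.

    "text/plain" is assumed to be Sesame 2's MIME type for N-Triples;
    pass "application/rdf+xml" to ask for RDF/XML instead.
    """
    params = urllib.parse.urlencode({"query": sparql, "queryLn": "sparql"})
    url = "%s/repositories/%s?%s" % (SESAME, REPO, params)
    return urllib.request.Request(url, headers={"Accept": accept})


def transaction_request(transaction_xml):
    """Build (but do not send) a POST carrying an add/remove transaction document."""
    url = "%s/repositories/%s/statements" % (SESAME, REPO)
    return urllib.request.Request(
        url,
        data=transaction_xml.encode("utf-8"),
        headers={"Content-Type": "application/x-rdftransaction"},
        method="POST",
    )
```

Passing either request to `urllib.request.urlopen(...)` would actually contact the server; keeping construction separate makes it easy to inspect what is sent.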

The problem I'm having: librdf assigns its own blank node identifiers as the source graph is parsed, so if one of the steps in my transaction (step 4 above) is to delete a statement involving a blank node, the delete fails, because my local blank node identifier is different from the remote server's blank node identifier.

How can I tackle this? Is there a way to ask librdf to preserve blank node identifiers on parsing a graph? Or some other approach?

Thanks,

-B


Edit: The core issue

Based on Signified's suggestions, here (I suppose) is the crux of my problem: librdf ignores even explicit nodeIDs from the source graph:

>>> import RDF
>>> m = RDF.Model()
>>> p = RDF.Parser()

>>> s = """<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:nodeID="one_node">
        <rdf:type rdf:resource="http://my.type"/>
  </rdf:Description>
</rdf:RDF>
"""
>>> p.parse_string_into_model(m,s,"http://base_uri")
>>> ss = m.find_statements(RDF.Statement(None,None,None))
>>> list(ss)[0].subject.blank_identifier
'r1298398951r3374r3'

Same thing with NTriples:

>>> s = """_:my_node_name <is>; <okay>;.\n""" 
>>> p = RDF.NTriplesParser() 
>>> p.parse_string_into_model(m,s,"base_uri";) 
>>> ss = m.find_statements(RDF.Statement(None,None,None)) 
>>> list(ss)[0].subject.blank_identifier
 'r1298406586r15211r1'

asked 22 Feb '11, 17:14

Bosh

edited 22 Feb '11, 20:37

Strictly speaking, you can't have underscores in blank-node labels of N-Triples http://www.w3.org/2001/sw/RDFCore/ntriples/#name (the semi-colons in your N-Triples are also a little weird). But that's probably not the problem since the parser doesn't throw an exception. I'm really surprised that the librdf N-Triples parser goes to the bother of rewriting blank-nodes. (See comments below...)

(22 Feb '11, 22:04) Signified ♦

Since you speak of "graphs", I assume that your queries are CONSTRUCT/DESCRIBE?

Labelling of blank-nodes

Are the results from the REST interface in RDF/XML? If so, unless rdf:nodeID attributes are assigned, there are no obvious labels to assign to a blank-node. Even with rdf:nodeID values, parsers are not required to preserve these as blank-node labels, and may (variously) append them onto prefixes, add counters, etc.

For RDF/XML blank-nodes without rdf:nodeID, parsers typically use a prefix like bnode or genid and append a counter: _:bnode0, _:bnode1, etc. There's no agreed-upon rule of thumb for how this should be done (and different parsers might encounter bnodes in a different order).
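To make the prefix-plus-counter scheme concrete, here is a toy relabeller (not how any particular parser is implemented). It gives each distinct blank node `prefix + counter` in order of first appearance. Changing the prefix changes every label, and merely reordering the same statements makes `_:bnode0` denote a different node, so neither run preserves the source labels. The sample data and the `relabel` name are made up for illustration.

```python
import re

# Blank-node label as in the 2004 N-Triples grammar: a letter, then letters/digits.
# Toy only: this would also rewrite "_:" sequences that happen to sit inside literals.
BNODE = re.compile(r"_:([A-Za-z][A-Za-z0-9]*)")


def relabel(ntriples, prefix):
    """Rename blank nodes to <prefix>0, <prefix>1, ... in order of first appearance."""
    mapping = {}

    def fresh(match):
        old = match.group(1)
        if old not in mapping:
            mapping[old] = "%s%d" % (prefix, len(mapping))
        return "_:" + mapping[old]

    return BNODE.sub(fresh, ntriples)


doc = ('_:x <http://ex.org/p> _:y .\n'
       '_:y <http://ex.org/q> "leaf" .\n')
swapped = "".join(reversed(doc.splitlines(True)))  # same graph, other statement order

print(relabel(doc, "bnode"))      # _:x becomes _:bnode0
print(relabel(doc, "genid"))      # same structure, different prefix
print(relabel(swapped, "bnode"))  # now _:y becomes _:bnode0
```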

So, you shouldn't expect consistent blank-node labelling across different parsers. Similarly, I wouldn't imagine that the bnode labelling of librdf for RDF/XML parsing is configurable.

Solutions?

An easy solution might be to try to return results in N-Triples, where blank-nodes are explicitly named. (I think it's a safe assumption that librdf will keep the native labels for blank-nodes.)
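Since the edit to the question shows librdf relabelling even N-Triples blank nodes, one fallback is to read the server's N-Triples yourself. The sketch below is a deliberately minimal, hypothetical line parser, not a complete N-Triples implementation: it assumes one statement per line, does not decode escapes, and does not validate term positions. Its only job is to return `(s, p, o)` term strings with `_:` labels exactly as the server wrote them.

```python
import re

# One N-Triples term: IRI, blank node, or literal with optional language tag or datatype.
# Underscores are accepted in blank-node labels here, although the 2004 grammar forbids them.
TERM = r'(<[^>]*>|_:[A-Za-z0-9_-]+|"(?:[^"\\]|\\.)*"(?:@[A-Za-z][A-Za-z0-9-]*|\^\^<[^>]*>)?)'
TRIPLE = re.compile(r"^\s*" + TERM + r"\s+" + TERM + r"\s+" + TERM + r"\s*\.\s*$")


def parse_ntriples(text):
    """Return a list of (s, p, o) raw term strings, keeping _: labels verbatim."""
    triples = []
    for lineno, line in enumerate(text.splitlines(), 1):
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        m = TRIPLE.match(line)
        if m is None:
            raise ValueError("line %d is not a simple N-Triples statement: %r" % (lineno, line))
        triples.append(m.groups())
    return triples
```

For example, `parse_ntriples('_:node1 <http://ex.org/p> "x"@en .')` keeps `_:node1` as the subject string, so a later remove request can echo the server's own label back to it.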

If not possible (tricky)...

One option might be to not specify the blank node in the remove statement request. This will work fine if the remaining constants uniquely identify a particular triple. Otherwise, you can query Sesame with the blank-nodes as variables: this will return a set of candidate statements to remove. You can then try to figure out which candidate statement you want to delete (maybe using a simple entailment check :O).
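Both workarounds can be sketched on top of `(s, p, o)` term strings in N-Triples syntax. The hypothetical `candidates_query` turns one statement into a SPARQL query with each blank node replaced by a variable, to fetch candidate statements. The hypothetical `remove_transaction` builds a remove request in which blank-node positions are left as wildcards, which is only safe when the remaining constants pick out a single statement. The element names (`transaction`, `remove`, `uri`, `literal`, `null`, `contexts`), the `datatype` and `xml:lang` attributes, and the reading of `<null/>` as a wildcard and an empty `<contexts/>` as "any context" are my understanding of Sesame 2's `application/x-rdftransaction` format. Check them against the protocol documentation for your version.

```python
import re
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
# Backslash escapes allowed in N-Triples literals: \t \n \r \" \\ \uXXXX \UXXXXXXXX
ESCAPE = re.compile(r"\\(u[0-9A-Fa-f]{4}|U[0-9A-Fa-f]{8}|.)")
SIMPLE_ESCAPES = {"t": "\t", "n": "\n", "r": "\r", '"': '"', "\\": "\\"}


def candidates_query(s, p, o):
    """SPARQL query for one (s, p, o) pattern with each blank node as a variable."""
    names = {}

    def term(t):
        if t.startswith("_:"):
            names.setdefault(t, "?b%d" % len(names))
            return names[t]
        return t  # IRIs and plain literals in N-Triples syntax usually parse as SPARQL too

    pattern = "%s %s %s ." % (term(s), term(p), term(o))
    if not names:
        return "ASK { %s }" % pattern  # no blank nodes: just check the statement exists
    return "SELECT %s WHERE { %s }" % (" ".join(names.values()), pattern)


def _unescape(lexical):
    # Decode the backslash escapes of an N-Triples literal's lexical form.
    def decode(m):
        e = m.group(1)
        return chr(int(e[1:], 16)) if len(e) > 1 else SIMPLE_ESCAPES.get(e, e)
    return ESCAPE.sub(decode, lexical)


def _value(parent, t):
    # Append one transaction value element for an N-Triples term string.
    if t.startswith("_:"):
        ET.SubElement(parent, "null")  # wildcard instead of a label the server won't know
    elif t.startswith("<"):
        ET.SubElement(parent, "uri").text = t[1:-1]
    else:
        end = t.rindex('"')  # closing quote of the lexical form
        lit = ET.SubElement(parent, "literal")
        lit.text = _unescape(t[1:end])
        suffix = t[end + 1:]
        if suffix.startswith("@"):
            lit.set(XML_LANG, suffix[1:])
        elif suffix.startswith("^^<"):
            lit.set("datatype", suffix[3:-1])


def remove_transaction(triples):
    """Transaction body with one <remove> per (s, p, o); blank nodes become <null/>."""
    root = ET.Element("transaction")
    for triple in triples:
        op = ET.SubElement(root, "remove")
        for t in triple:
            _value(op, t)
        ET.SubElement(op, "contexts")  # empty list: assumed to mean "any context"
    return ET.tostring(root, encoding="unicode")
```

Running the SELECT first and inspecting the bindings is the cautious route; the wildcard remove is shorter but deletes every statement the remaining constants match.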


answered 22 Feb '11, 17:53

Signified ♦

edited 22 Feb '11, 19:01

Thanks very much for this answer. I've experimented with explicit nodeIDs and found that librdf ignores these, even when specified (see edit to my original post). This is getting complicated, for what should have been a simple operation :-)

(22 Feb '11, 18:31) Bosh

I should point out that an RDF/XML parser is not required to maintain the same labels as given in the rdf:nodeID attributes (edited answer). It's also very important to remember that blank-nodes suck! ;) Have you tried returning/parsing N-Triples? Should be the easiest solution.

(22 Feb '11, 18:58) Signified ♦

Sadly, the NTriples approach looks the same (see next edit to my original post).

I have lots of non-externally-referenceable elements as properties of URI nodes in my openrdf-sesame endpoint. I'd like to manipulate them with add/remove transactions in a client Python script... I agree that blank nodes suck; should I be making these all URIs?! This also feels wrong...

(22 Feb '11, 20:38) Bosh

That is strange. I guess your three options then are: (i) drop the blank-nodes; (ii) drop librdf for another Python (or Jython) parser... e.g., http://www.semanticoverflow.com/questions/3275/advantages-of-nxparser-over-jena ; (iii) try the workarounds mentioned after "If not possible (tricky)..." above. Good luck! ;)

(22 Feb '11, 22:07) Signified ♦
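For option (i), one way to drop the blank nodes is to replace each one with a freshly minted URI before the data goes into the store, so every later add/remove transaction can name each node exactly. A minimal sketch over `(s, p, o)` term strings in N-Triples syntax; `mint_uris` is a hypothetical helper, the base namespace `http://example.org/node/` is a placeholder for one you control, and `uuid4` is just one choice of unique suffix.

```python
import uuid


def mint_uris(triples, base="http://example.org/node/"):
    """Replace blank nodes with minted IRIs; within one call, same label -> same IRI."""
    minted = {}

    def term(t):
        if t.startswith("_:"):
            if t not in minted:
                minted[t] = "<%s%s>" % (base, uuid.uuid4().hex)
            return minted[t]
        return t  # IRIs and literals pass through unchanged

    return [tuple(term(t) for t in triple) for triple in triples]
```

The trade-off is the one raised in the question: these nodes now carry global names, but in exchange a remove statement can target them without any label mismatch.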