Hello,

I have the following sample information about some movies including the URLs to the movies in IMDb:

1|Toy Story (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Toy%20Story%20(1995)|
2|GoldenEye (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?GoldenEye%20(1995)|
3|Four Rooms (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Four%20Rooms%20(1995)|

However, when I search for these in LinkedMDB, I am not able to find these movie URLs.

I understand that LinkedMDB is the triples version of IMDb, but I am not sure how I should search it. Please help.

Regards Bonson

asked 02 Nov '12, 07:28

Bonson


I need to be able to automate this as there are 100K links that I need to resolve and get other information. Is there any way that I can connect the URI http://us.imdb.com/M/title-exact?Toy%20Story%20(1995) to either LinkedMDB or DBPedia?
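Since the goal is to automate this for many records, a first step is to pull the title and the IMDb URL out of each pipe-delimited line. The following is a minimal sketch in Python 3; the function name `parse_movie_line` is made up for illustration, and the field positions are assumed from the three sample lines in the question.

```python
def parse_movie_line(line):
    """Split one pipe-delimited record into (movie_id, title, imdb_url)."""
    fields = line.strip().split("|")
    # Assumed layout (from the sample): id, title, release date,
    # an empty field, then the IMDb URL.
    return int(fields[0]), fields[1], fields[4]


sample = [
    "1|Toy Story (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Toy%20Story%20(1995)|",
    "2|GoldenEye (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?GoldenEye%20(1995)|",
    "3|Four Rooms (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Four%20Rooms%20(1995)|",
]

records = [parse_movie_line(line) for line in sample]
for record in records:
    print(record)
```

If the real file has more fields after the URL, the same indices should still apply, but that is worth checking against the actual data.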

For LinkedMDB:

  1. Resolve the URI http://us.imdb.com/M/title-exact?Toy%20Story%20(1995), which points to http://www.imdb.com/title/tt0113101/

  2. Use this query on LinkedMDB:

    PREFIX ... SELECT * WHERE { ?s foaf:page <http://www.imdb.com/title/tt0081505> . }

Not all of the movies will be in there.
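As a rough illustration of step 2, the query text and the GET request URL can be built as plain strings before anything is sent. This is only a sketch in Python 3: the helper names `build_page_query` and `build_request_url` are invented here, the endpoint is the http://data.linkedmdb.org/sparql address used in this thread, and it is an assumption that the stored foaf:page value matches the resolved IMDb URL character for character (for example, with or without the trailing slash).

```python
from urllib.parse import urlencode

ENDPOINT = "http://data.linkedmdb.org/sparql"


def build_page_query(imdb_url):
    """Return a SPARQL query for resources whose foaf:page is imdb_url."""
    return (
        "PREFIX foaf: <http://xmlns.com/foaf/0.1/>\n"
        "SELECT ?s WHERE { ?s foaf:page <%s> . }" % imdb_url
    )


def build_request_url(query, endpoint=ENDPOINT):
    """Encode the query as the 'query' parameter of a GET request."""
    return endpoint + "?" + urlencode({"query": query})


query = build_page_query("http://www.imdb.com/title/tt0113101/")
print(query)
print(build_request_url(query))
```

Nothing is sent over the network here; the printed URL can be pasted into a browser to try the query by hand.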

answered 02 Nov '12, 11:00

Signified ♦

edited 02 Nov '12, 11:04

Hello,

Thanks for the response. Could you please elaborate on point 1? How do we resolve the IMDb URI so that it points to the actual URI?

Regards Bonson

(02 Nov '12, 23:47) Bonson

No problem. Use an HTTP lookup that follows redirects.

(03 Nov '12, 10:50) Signified ♦

Hello,

I apologize as I feel that this might be a very basic question but could you please elaborate on using HTTP lookup following redirects?

Where do I lookup? Also, I know the concept of redirecting the request to another page, but how do I use it in this context?

Regards, Bonson

(04 Nov '12, 00:54) Bonson

When you point your browser at "http://us.imdb.com/M/title-exact?Toy%20Story%20(1995)", it automatically redirects to "http://www.imdb.com/title/tt0113101/", which is the URL that would be used in LinkedMDB. For example, in Python:

    import urllib2
    URL = "http://us.imdb.com/M/title-exact?Toy%20Story%20(1995)"
    # urlopen follows redirects; geturl() returns the final URL
    response = urllib2.urlopen(URL)
    redirectedURL = response.geturl()
    print redirectedURL

(04 Nov '12, 07:49) Sweet Burlap
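For readers on Python 3, where urllib2 no longer exists, the same redirect lookup might look roughly like the sketch below. The function name `resolve_redirect`, the `opener` parameter, and the `FakeResponse` stand-in are all made up for illustration; the stand-in simply replays the Toy Story redirect mentioned above so the example runs without network access.

```python
import urllib.error
import urllib.request


def resolve_redirect(url, opener=urllib.request.urlopen):
    """Return the final URL after redirects, or None if the lookup fails."""
    try:
        response = opener(url)
    except urllib.error.URLError:
        # Covers HTTP errors too; with 100K links some are likely to fail.
        return None
    try:
        return response.geturl()
    finally:
        response.close()


class FakeResponse:
    """Hypothetical stand-in for an HTTP response (no network needed)."""

    def __init__(self, final_url):
        self._final_url = final_url

    def geturl(self):
        return self._final_url

    def close(self):
        pass


def fake_opener(url):
    # Replays the redirect described in this thread.
    return FakeResponse("http://www.imdb.com/title/tt0113101/")


resolved = resolve_redirect(
    "http://us.imdb.com/M/title-exact?Toy%20Story%20(1995)", opener=fake_opener
)
print(resolved)
```

Calling `resolve_redirect(url)` without the `opener` argument uses the real `urllib.request.urlopen`, which follows redirects by default.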

Spoiler Alert! (Full Code)

URL Method:

import urllib2
from SPARQLWrapper import SPARQLWrapper, JSON

URLList = ["http://us.imdb.com/M/title-exact?Toy%20Story%20(1995)", "http://us.imdb.com/M/title-exact?GoldenEye%20(1995)", "http://us.imdb.com/M/title-exact?Four%20Rooms%20(1995)"]
sparql = SPARQLWrapper("http://data.linkedmdb.org/sparql")
sparql.setReturnFormat(JSON)
for URL in URLList:
    # urlopen follows HTTP redirects; geturl() returns the final URL
    response = urllib2.urlopen(URL)
    redirectedURL = response.geturl()
    print redirectedURL
    # Find the LinkedMDB resource whose foaf:page is the resolved IMDb URL
    query = """
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?s
    WHERE { ?s foaf:page <%s> . }
    """ % redirectedURL
    sparql.setQuery(query)
    results = sparql.query().convert()

    for result in results["results"]["bindings"]:
        print URL
        print(result["s"]["value"])

Name Method:

from SPARQLWrapper import SPARQLWrapper, JSON

NameList = ["Four Rooms (1995)", "Toy Story (1995)", "GoldenEye (1995)"]
sparql = SPARQLWrapper("http://data.linkedmdb.org/sparql")
sparql.setReturnFormat(JSON)
for IMDBName in NameList:
    # Drop the trailing " (YYYY)" (7 characters) to get the bare title
    LMDBName = IMDBName[:-7]
    query = """
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT distinct ?film
    WHERE {
    ?film rdfs:label "%s".
    }
    """ % LMDBName

    sparql.setQuery(query)
    results = sparql.query().convert()

    for result in results["results"]["bindings"]:
        print LMDBName
        print(result["film"]["value"])

Combined Method:

import urllib2
from SPARQLWrapper import SPARQLWrapper, JSON

# Each record pairs the IMDb lookup URL with the IMDb-style title
CombinedList = [("http://us.imdb.com/M/title-exact?Toy%20Story%20(1995)", "Toy Story (1995)"),
                ("http://us.imdb.com/M/title-exact?GoldenEye%20(1995)", "GoldenEye (1995)"),
                ("http://us.imdb.com/M/title-exact?Four%20Rooms%20(1995)", "Four Rooms (1995)")]
sparql = SPARQLWrapper("http://data.linkedmdb.org/sparql")
sparql.setReturnFormat(JSON)
for record in CombinedList:
    # Resolve the redirect to get the canonical IMDb URL
    response = urllib2.urlopen(record[0])
    redirectedURL = response.geturl()
    IMDBName = record[1]
    # Drop the trailing " (YYYY)" (7 characters) to get the bare title
    LMDBName = IMDBName[:-7]
    # Match on either the IMDb page link or the title label
    query = """
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT DISTINCT ?film
    WHERE {
    { ?film foaf:page <%s> . }
    UNION
    { ?film rdfs:label "%s" . }
    }
    """ % (redirectedURL, LMDBName)
    sparql.setQuery(query)
    results = sparql.query().convert()

    for result in results["results"]["bindings"]:
        print record
        print(result["film"]["value"])

If you feel it is worthwhile, or if the source data is sufficiently heterogeneous, you could replace LMDBName = IMDBName[:-7] with .rfind() calls that find the positions of the brackets, so that the year and the title are extracted separately. Incorporating the year into the query should then help disambiguate two films with the same name that were made in different years.
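The .rfind() idea could be sketched as follows (Python 3). The function name `split_title_year` is invented for this example, and it assumes the year is a four-digit number in the last pair of brackets; anything else is returned as a title with no year. How the year would then be matched in LinkedMDB depends on which date property the dataset uses, which this thread does not state, so that part is left out.

```python
def split_title_year(imdb_name):
    """Split 'Title (YYYY)' into (title, year); year is None if absent."""
    open_pos = imdb_name.rfind("(")
    close_pos = imdb_name.rfind(")")
    if open_pos == -1 or close_pos < open_pos:
        return imdb_name.strip(), None
    year = imdb_name[open_pos + 1:close_pos]
    # Only accept a plain four-digit year; otherwise keep the name whole.
    if not (len(year) == 4 and year.isdigit()):
        return imdb_name.strip(), None
    return imdb_name[:open_pos].strip(), year


for name in ["Toy Story (1995)", "GoldenEye (1995)", "Four Rooms (1995)"]:
    print(split_title_year(name))
```

Unlike the fixed [:-7] slice, this does not silently truncate names that lack a trailing year.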

answered 04 Nov '12, 08:10

Sweet Burlap

edited 04 Nov '12, 20:22

Hello, thank you for the code. It has given me a better perspective. I am more of a Java programmer. Do you know whether the same thing can be done in Java?

Also, would it be better for me to learn Python? Is the kind of work I am trying to do more commonly, or more easily, done in Python?

Regards Bonson

(04 Nov '12, 21:27) Bonson

To be honest, I have never programmed in Java. I'm at best an amateur coder and find Python much easier to play with.

I'm sure something similar is fairly easy in Java, but I couldn't venture any specifics (check out Jena).

I like Python: it is fantastic for getting something working quickly! But if you're happy with Java, I don't necessarily see a reason to change. Plenty of people do semantic web work in Java. At the same time, Python should be relatively easy to pick up if you know Java, and it is extremely useful for quick and easy prototyping.

(05 Nov '12, 03:26) Sweet Burlap

I don't see any problem reproducing this with Java and the Jena framework. If you understand Java, then you can also read and understand Python.

(05 Nov '12, 14:50) AKSWMember

Try using SPARQL to search by title. I can't find any explicit links back to the IMDb sources. (The IMDb page seems to be a foaf:page for some films, e.g. The Shining; this could be incorporated using SPARQL UNION.)

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT distinct ?film 
WHERE {
?film rdfs:label "Toy Story".
}
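When this title query is generated from code for many films, titles that contain a double quote or a backslash would break a plain "%s" substitution. A small sketch of one way to guard against that in Python 3 is shown here; the helper names `sparql_string_literal` and `build_label_query` are made up, and only the two escapes needed for ordinary titles are handled (a title containing a line break would need more).

```python
def sparql_string_literal(text):
    """Wrap text in double quotes, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return '"%s"' % escaped


def build_label_query(title):
    """Return a query for films whose rdfs:label equals the title exactly."""
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT DISTINCT ?film\n"
        "WHERE { ?film rdfs:label %s . }" % sparql_string_literal(title)
    )


print(build_label_query("Toy Story"))
print(sparql_string_literal('Say "Hi"'))
```

The match is still exact, so a label that differs in case or punctuation from the IMDb title will not be found this way.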

answered 02 Nov '12, 08:29

Sweet Burlap

edited 07 Nov '12, 16:55

Hello,

I need to be able to automate this as there are 100K links that I need to resolve and get other information. Is there any way that I can connect the URI http://us.imdb.com/M/title-exact?Toy%20Story%20(1995) to either LinkedMDB or DBPedia?

Regards, Bonson

(02 Nov '12, 08:46) Bonson

The above-mentioned query doesn't seem to work. I guess "Toy Story" is mentioned in some other way. How do I find that out?

Regards, Bonson Mampilli

(07 Nov '12, 08:49) Bonson

Sorry, that was a formatting error: the <> brackets around the PREFIX URI were missing. The edited query should work.

(07 Nov '12, 16:56) Sweet Burlap

The query works but it returns [No Results]. So maybe the movie is mentioned as something else other than "Toy Story". How do I resolve this inconsistency when I am writing code to automate?

Regards Bonson

(07 Nov '12, 19:46) Bonson

What are you using to generate the query?

What does the query URL look like?

I'm getting "http://data.linkedmdb.org/page/film/38223" as the result of the query, with a query URL of:


http://data.linkedmdb.org/sparql?query=%0APREFIX+rdfs%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23%3E%0ASELECT+distinct+%3Ffilm+%0AWHERE+%7B%0A%3Ffilm+rdfs%3Alabel+%22Toy+Story%22.%0A%7D%0ALIMIT+100%0A

(08 Nov '12, 18:56) Sweet Burlap

    import urllib
    baseurl = "http://data.linkedmdb.org/sparql?"
    # The prefix URI must be wrapped in <> brackets
    params = urllib.urlencode({'query': """
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT distinct ?film
    WHERE {
    ?film rdfs:label "Toy Story".
    }
    LIMIT 100
    """})
    queryurl = baseurl + params
    f = urllib.urlopen(queryurl)
    print f.read()

(08 Nov '12, 18:59) Sweet Burlap

Not all the movies are in LinkedMDB unfortunately.

(08 Nov '12, 20:05) Signified ♦
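Because some films will simply be missing, an automated run probably needs to record which titles returned no bindings instead of assuming every lookup succeeds. The sketch below (Python 3) works on the standard SPARQL JSON result layout used in the code above; `film_uris`, `partition_matches`, and `fake_lookup` are invented names, and the canned data only replays the Toy Story result reported in this thread next to a made-up title that has no match.

```python
def film_uris(results):
    """Extract the ?film values from a SPARQL JSON result document."""
    return [
        binding["film"]["value"]
        for binding in results["results"]["bindings"]
        if "film" in binding
    ]


def partition_matches(titles, lookup):
    """Run lookup(title) for each title; split into matched and unmatched."""
    matched, unmatched = {}, []
    for title in titles:
        uris = film_uris(lookup(title))
        if uris:
            matched[title] = uris
        else:
            unmatched.append(title)
    return matched, unmatched


# Canned responses so the example runs offline (hypothetical stand-in
# for a real call to the SPARQL endpoint).
EMPTY = {"results": {"bindings": []}}
CANNED = {
    "Toy Story": {
        "results": {
            "bindings": [
                {"film": {"type": "uri",
                          "value": "http://data.linkedmdb.org/page/film/38223"}}
            ]
        }
    }
}


def fake_lookup(title):
    return CANNED.get(title, EMPTY)


matched, unmatched = partition_matches(
    ["Toy Story", "Some Unlisted Film"], fake_lookup
)
print(matched)
print(unmatched)
```

The unmatched list can then be retried with a different strategy, such as the foaf:page lookup, or set aside for manual review.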