I was wondering whether LDSpider could be used to crawl an entire website (i.e. all of its links), such as bbc.co.uk, using the sitemap referenced in robots.txt, and gather the triples from that crawl. If I give bbc.co.uk as the seed URI, I do not get any triples back. LDSpider does not seem to read robots.txt and follow it to sitemap.xml to crawl all the links listed in the sitemap.
Can this be accomplished using LDSpider?
I am using a breadth-first crawling strategy. I understand that the kind of crawling I am trying to do is not necessarily LOD in nature, but I want to crawl multiple sites and intend to leverage LDSpider because it can be hooked up with Any23Handler.
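To make the behaviour I am after concrete, here is a minimal sketch in plain Python of the sitemap discovery step. It is not LDSpider code and does not use LDSpider's API; the function names and the example.org data are made up for illustration, and fetching the two documents over HTTP is left out. It only shows reading the `Sitemap:` lines from a robots.txt body and collecting the `<loc>` entries from a sitemap in the standard sitemaps.org format.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org protocol for <urlset>, <url>, <loc>, etc.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

# Hypothetical sample data; a real crawler would download these documents.
ROBOTS = """User-agent: *
Disallow: /private/
Sitemap: http://example.org/sitemap.xml
"""

SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.org/</loc></url>
  <url><loc>http://example.org/news</loc></url>
</urlset>"""


def sitemap_urls_from_robots(robots_txt):
    """Return the URLs given in 'Sitemap:' lines of a robots.txt body."""
    urls = []
    for line in robots_txt.splitlines():
        # Drop trailing comments, then split the field name from its value
        # at the first colon (the URL itself contains further colons).
        line = line.split("#", 1)[0].strip()
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls


def page_urls_from_sitemap(sitemap_xml):
    """Return every <loc> value found in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]


if __name__ == "__main__":
    for sitemap_url in sitemap_urls_from_robots(ROBOTS):
        print("sitemap:", sitemap_url)
    for page_url in page_urls_from_sitemap(SITEMAP):
        print("seed:", page_url)
```

The URLs collected this way are what I would like the crawler to use as its frontier, either discovered automatically or supplied by me as a list of seed URIs.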
asked 08 Jan '13, 00:21