|
I am a fan of the work done on DBpedia Spotlight, and I would be very interested in creating this service for other languages. But before heading off on this adventure, I would like to know what challenges this implies and whether there are any recommendations, for example on creating dictionaries, training models, etc. Thanks! |
|
Update: We have created a step-by-step guide. Hi Ale, Pablo from DBpedia Spotlight here. Thanks for your interest! You will be pleased to know that there is active work in our group on internationalizing DBpedia Spotlight. We are working on Portuguese and Spanish as examples, and we hope that other languages will follow suit. I recommend that interested parties join our mailing list to keep in touch and avoid duplication of effort. Tim's comments point in the right direction (thanks Tim!), but let's see if I can describe the process in more detail:
Steps 4 and 5 are self-documented in code. :) But we are working on a higher-level description of the steps to save you some time. Finally, if you plan to change the source code, please consider committing it back to our repository so that other people can also benefit from it - and you can be acknowledged as an awesome contributor! :) |
|
I suspect you won't need to create dictionaries or train models as you say. The source code and instructions for installing a local copy are available, as are the datasets you will need, as described on the main page. If it were me, I would download the source and datasets and familiarize myself with them. I would then look into why the data does not include the foreign-language (non-English) data (is that even true?). If you confirm that it is missing, you can match it up against the DBpedia 3.6 dumps. It will probably be in the same format. If so, ingest the foreign-language data into the server you are dedicating to this cause (make sure the server has enough horsepower to handle it). I suspect that if the foreign-language data is missing, you may also need to create the disambiguation index as Lucene indexes, or merge them with those the project page provides. |
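For the "is the foreign-language data even missing?" check, something like the following might help. It is only a rough sketch in Python, not part of DBpedia Spotlight: it assumes the dump is in N-Triples format with language-tagged literals (e.g. `"Berlim"@pt`), and the function name `count_label_languages` and the sample triples are made up for illustration.

```python
import re
from collections import Counter

# Matches a language-tagged literal at the end of an N-Triples line,
# e.g.  "Berlim"@pt .
LANG_LITERAL = re.compile(
    r'"(?:[^"\\]|\\.)*"@([A-Za-z]+(?:-[A-Za-z0-9]+)*)\s*\.\s*$'
)

def count_label_languages(lines):
    """Count how many triples carry a literal in each language."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        # Skip blank lines and N-Triples comments.
        if not line or line.startswith("#"):
            continue
        match = LANG_LITERAL.search(line)
        if match:
            counts[match.group(1).lower()] += 1
    return counts

# Hypothetical sample lines, standing in for a real dump file.
sample = [
    '<http://dbpedia.org/resource/Berlin> '
    '<http://www.w3.org/2000/01/rdf-schema#label> "Berlin"@en .',
    '<http://dbpedia.org/resource/Berlin> '
    '<http://www.w3.org/2000/01/rdf-schema#label> "Berlim"@pt .',
    '<http://dbpedia.org/resource/Berlin> '
    '<http://www.w3.org/2000/01/rdf-schema#label> "Berl\\u00EDn"@es .',
]

counts = count_label_languages(sample)
for lang, n in sorted(counts.items()):
    print(lang, n)
```

On a real dump you would pass an open file handle instead of the `sample` list. If the language you care about never shows up in the counts, that is a hint you need the language-specific dump.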

