I'm building a question classifier for a question answering system, and I'd like to know whether anyone has worked on something like this before. I've researched the state-of-the-art algorithms for developing such a component; most of them use machine learning with semantic features.
Here's a list of the algorithms I've read about:
asked 09 Dec '12, 13:01
I'm inclined to believe that the results of those papers are substantially correct.
What all three of them have in common, and what failed semantic learning projects almost always lack, is that they classify queries into a specific ontology for which extensive training and evaluation data is available.
With this data available, it is straightforward to build classifiers and see how good the results are. Researchers, therefore, are a lot like the drunk who keeps looking for his keys under the streetlight because that is where the light is.
If your questions are sampled from the same prior distribution as the TREC questions (or you can pretend so) and you like that classification, it makes sense to pick an approach from the above papers that works and that you feel comfortable implementing in practice. You can pick up the same data they use and go.
If you want to classify some other space of questions with some other categories, the thing you need to replicate from those papers is the methodology of creating test and evaluation data. That's a lot more fundamental than which machine learning algorithm you choose, or whether you choose to develop heuristic rules by hand instead.
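To make that concrete, here is a minimal sketch of what such an evaluation harness could look like, using only the Python standard library. Everything in it is an illustrative assumption rather than something taken from the papers: the function names, the category labels (`HUMAN`, `LOCATION`, and so on), and the tiny rule set are all hypothetical. The point is only that once you have hand-labeled (question, label) pairs and a held-out evaluation set, any classifier, learned or hand-written, can be scored the same way.

```python
import random
from collections import Counter


def split_labeled(examples, eval_fraction=0.2, seed=0):
    """Shuffle (question, label) pairs and hold out a fraction for evaluation."""
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    n_eval = max(1, int(len(shuffled) * eval_fraction))
    # Return (training set, evaluation set).
    return shuffled[n_eval:], shuffled[:n_eval]


def accuracy(classify, examples):
    """Fraction of examples for which classify(question) equals the gold label."""
    correct = sum(1 for question, label in examples if classify(question) == label)
    return correct / len(examples)


def majority_baseline(train):
    """Build a classifier that always predicts the most frequent training label."""
    most_common = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda question: most_common


def rule_based(question):
    """Hypothetical hand-written rules keyed on the first word of the question."""
    words = question.lower().split()
    first = words[0] if words else ""
    rules = {"who": "HUMAN", "where": "LOCATION", "when": "NUMERIC"}
    return rules.get(first, "OTHER")
```

With your own labeled questions in a list called, say, `labeled`, you would call `train, held_out = split_labeled(labeled)` and then compare `accuracy(majority_baseline(train), held_out)` against `accuracy(rule_based, held_out)` or against any learned model. A classifier that cannot beat the majority baseline on held-out data is not yet doing anything useful, whatever algorithm is behind it.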