Selecting Features for Paraphrasing Question Sentences

Authors: Noriko Tomuro and Steven L. Lytinen
Journal-ref: In Proceedings of the workshop on Automatic Paraphrasing at the Natural Language Processing Pacific Rim Symposium (NLPRS 2001), Tokyo, Japan.

Abstract

In this paper, we investigate several schemes for selecting features that are useful for automatically classifying questions by their question type. We represent each question as a set of features and compare the performance of the C5.0 machine learning algorithm across the different representations. Experimental results show that a scheme based on NLP techniques categorizes question types with higher accuracy than a scheme based on IR techniques. The ultimate goal of this research is to use question type classification to help determine whether two questions are paraphrases of each other. We hypothesize that features which help identify question type will also be useful for generating question paraphrases.
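The abstract does not specify the feature sets, so the following is only a minimal sketch of the general setup under stated assumptions: each question becomes a set of binary features, and a decision tree is learned over them. Both extractors are hypothetical. `ir_features` mimics an IR-style bag of content words after stopword removal, and `nlp_features` uses the wh-word and the token after it as crude structural cues. The toy questions, labels, and stop list are invented for illustration, and a plain ID3 tree using information gain stands in for C5.0, which additionally uses the gain ratio, pruning, and other refinements.

```python
import math
from collections import Counter

# Hypothetical stop list for the IR-style scheme (not taken from the paper).
STOPWORDS = {"a", "an", "the", "is", "i", "do", "can", "my", "of", "to", "in",
             "what", "how", "why", "where", "when"}
WH_WORDS = {"what", "how", "why", "where", "when", "who", "which"}

def tokens(question):
    return question.lower().rstrip("?").split()

def ir_features(question):
    """Hypothetical IR-style features: content words left after stopword removal."""
    return frozenset(t for t in tokens(question) if t not in STOPWORDS)

def nlp_features(question):
    """Hypothetical structural features: the first wh-word and the token after it."""
    toks = tokens(question)
    feats = set()
    for i, t in enumerate(toks):
        if t in WH_WORDS:
            feats.add("wh=" + t)
            if i + 1 < len(toks):
                feats.add("next=" + toks[i + 1])
            break
    return frozenset(feats)

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def build_tree(examples, features):
    """ID3-style tree; each internal node tests 'is feature f present?'.
    Returns a label (leaf) or a tuple (feature, present_subtree, absent_subtree)."""
    labels = [y for _, y in examples]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority
    base = entropy(labels)
    best, best_gain = None, 0.0
    for f in sorted(features):  # sorted for deterministic tie-breaking
        yes = [y for x, y in examples if f in x]
        no = [y for x, y in examples if f not in x]
        if not yes or not no:
            continue  # a split that separates nothing carries no information
        remainder = (len(yes) * entropy(yes) + len(no) * entropy(no)) / len(labels)
        if base - remainder > best_gain:
            best, best_gain = f, base - remainder
    if best is None:
        return majority
    rest = features - {best}
    return (best,
            build_tree([e for e in examples if best in e[0]], rest),
            build_tree([e for e in examples if best not in e[0]], rest))

def classify(tree, feats):
    while isinstance(tree, tuple):
        f, present, absent = tree
        tree = present if f in feats else absent
    return tree

def train(data, extract):
    examples = [(extract(q), y) for q, y in data]
    vocab = frozenset().union(*(x for x, _ in examples))
    return build_tree(examples, vocab)

# Invented toy data with hypothetical question-type labels.
TRAIN = [
    ("How do I install a printer?", "procedure"),
    ("How can I reset my password?", "procedure"),
    ("Why is the sky blue?", "reason"),
    ("Why do cats purr?", "reason"),
    ("Where is the printer driver?", "location"),
    ("Where can I find my password?", "location"),
]

if __name__ == "__main__":
    tree = train(TRAIN, nlp_features)
    for q in ["How should I update the driver?", "Where are my files?"]:
        print(q, "->", classify(tree, nlp_features(q)))
```

With this hypothetical stop list, the IR-style extractor maps "How do I find the driver?" and "Where do I find the driver?" to the same feature set, while the wh-word feature keeps them apart. This is one plausible way a representation can lose question-type cues, not a description of the paper's actual feature sets.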
