Giovanni Semeraro, Marco Degemmis, Pasquale Lops, Pierpaolo Basile
Understanding user interests from text documents can support personalized information recommendation services. Typically, these services automatically infer the user profile, a structured model of the user's interests, from documents that the user has already deemed relevant. Traditional keyword-based approaches are unable to capture the semantics of the user's interests. This work proposes integrating linguistic knowledge into the process of learning semantic user profiles that capture the concepts underlying those interests. The proposed strategy consists of two steps. The first is based on a word sense disambiguation technique that exploits the lexical database WordNet to select the correct meaning (sense) of a polysemous word from among all its possible senses. In the second step, a naïve Bayes approach learns sense-based user profiles as binary text classifiers (user-likes and user-dislikes) from the disambiguated documents. Experiments were conducted to compare the performance of keyword-based profiles with that of sense-based profiles. Both the classification accuracy and the effectiveness of the ranking that each kind of profile imposes on the documents to be recommended were considered. The main outcome is that sense-based profiles increase classification accuracy but do not improve the ranking. The conclusion is that integrating linguistic knowledge into the learning process improves the classification of those documents whose classification score is close to the likes/dislikes threshold (the items for which the classification is highly uncertain).
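The second step of the strategy can be illustrated with a minimal sketch. It assumes that the disambiguation step has already run, so each document arrives as a list of sense identifiers rather than words. The class name `SenseNaiveBayes`, the sense labels such as `bank#finance`, the Laplace smoothing constant, and the 0.5 threshold are all hypothetical choices made for illustration; they are not taken from the paper, and the multinomial model shown here is only one plausible form of naïve Bayes.

```python
import math
from collections import Counter


class SenseNaiveBayes:
    """Two-class multinomial naive Bayes over bags of sense identifiers."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant (assumed value)
        self.class_docs = Counter()
        self.sense_counts = {"likes": Counter(), "dislikes": Counter()}
        self.vocab = set()

    def fit(self, docs, labels):
        # Each training set must contain at least one document per class.
        for senses, label in zip(docs, labels):
            self.class_docs[label] += 1
            self.sense_counts[label].update(senses)
            self.vocab.update(senses)
        return self

    def prob_likes(self, senses):
        """Posterior probability of the 'likes' class for one document."""
        total_docs = sum(self.class_docs.values())
        log_scores = {}
        for label, counts in self.sense_counts.items():
            log_p = math.log(self.class_docs[label] / total_docs)
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            for sense in senses:
                if sense in self.vocab:  # senses never seen in training are skipped
                    log_p += math.log((counts[sense] + self.alpha) / denom)
            log_scores[label] = log_p
        # Normalize in log space to avoid underflow on long documents.
        top = max(log_scores.values())
        weights = {k: math.exp(v - top) for k, v in log_scores.items()}
        return weights["likes"] / sum(weights.values())

    def classify(self, senses, threshold=0.5):
        return "likes" if self.prob_likes(senses) >= threshold else "dislikes"


# Hypothetical disambiguated documents: the two senses of "bank" are distinct features.
train_docs = [
    ["bank#finance", "loan#finance", "interest#finance"],
    ["bank#finance", "market#finance"],
    ["bank#river", "water#body", "fishing#sport"],
    ["bank#river", "boat#vessel"],
]
train_labels = ["likes", "likes", "dislikes", "dislikes"]

model = SenseNaiveBayes().fit(train_docs, train_labels)

# Ranking candidate documents by their 'likes' score, highest first.
candidates = [["bank#river"], ["bank#finance", "loan#finance"]]
ranked = sorted(candidates, key=model.prob_likes, reverse=True)
```

In this sketch a keyword-based profile would see the single feature "bank" in every training document, whereas the sense-based features separate the two classes. Documents whose `prob_likes` value lies near the threshold are the uncertain cases the abstract refers to.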
Subjects: 13. Natural Language Processing; 12. Machine Learning and Discovery
Submitted: Oct 9, 2006