First-order Frequent Patterns in Text Mining

Authors

BLAŤÁK Jan

Year of publication 2005
Type Article in Proceedings
Conference EPIA'05, 12th Portuguese Conference on Artificial Intelligence
MU Faculty or unit

Faculty of Informatics

Field Informatics
Keywords machine learning; first-order frequent patterns; text mining; distributed mining
Description In this paper, a universal framework for mining long first-order frequent patterns in text data is presented. It consists of RAP, an ILP system for mining maximal first-order frequent patterns, and two types of redefined background knowledge. Two methods of using the generated patterns to solve text mining tasks are described: propositionalization and CBA (Classification Based on Associations). A new variant of the CBA rule-based classifier is proposed. The framework is applied to three text mining tasks: information extraction from biomedical texts, context-sensitive text correction for English, and morphological disambiguation of Czech. Distributed mining of frequent patterns is described, and its influence on mining in text is discussed. It is shown that frequent patterns used as new features for propositionalization usually give better results than CBA.
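The propositionalization idea mentioned in the description (each mined pattern becomes a new boolean feature) can be sketched in Python. This is a minimal illustration under simplifying assumptions, not RAP and not the paper's method. Examples are plain sets of attribute=value items, a propositional stand-in for the first-order facts the paper works with. Patterns are itemsets found by a simple level-wise search. Only maximal frequent patterns are kept, loosely mirroring RAP's focus on maximal patterns. The toy data and all function names (`frequent_patterns`, `maximal`, `propositionalize`) are hypothetical.

```python
from itertools import combinations

# Hypothetical toy data: each example is the set of attribute=value items
# describing the context of one token (a propositional stand-in for the
# first-order facts a system like RAP would use).
EXAMPLES = [
    {"w-1=the", "w+1=of", "pos-1=DT"},
    {"w-1=the", "w+1=in", "pos-1=DT"},
    {"w-1=a", "w+1=of", "pos-1=DT"},
    {"w-1=the", "w+1=of", "pos-1=DT"},
]


def frequent_patterns(examples, min_support):
    """Level-wise search for itemsets contained in at least min_support
    examples. Returns a dict mapping frozenset pattern -> support."""
    def support(pattern):
        return sum(1 for ex in examples if pattern <= ex)

    # Level 1: frequent single items.
    level = {}
    for item in sorted({i for ex in examples for i in ex}):
        p = frozenset([item])
        s = support(p)
        if s >= min_support:
            level[p] = s

    result = {}
    while level:
        result.update(level)
        # Join two size-k patterns that differ in exactly one item.
        keys = list(level)
        candidates = {a | b for a, b in combinations(keys, 2)
                      if len(a | b) == len(a) + 1}
        # Keep candidates whose exact support meets the threshold.
        level = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                level[c] = s
    return result


def maximal(patterns):
    """Keep only patterns that have no frequent proper superset."""
    return [p for p in patterns if not any(p < q for q in patterns)]


def propositionalize(examples, patterns):
    """One boolean feature per pattern: 1 if the example contains it."""
    return [[int(p <= ex) for p in patterns] for ex in examples]


if __name__ == "__main__":
    pats = sorted(maximal(frequent_patterns(EXAMPLES, min_support=3)), key=sorted)
    for p in pats:
        print(sorted(p))
    # → ['pos-1=DT', 'w+1=of'] then ['pos-1=DT', 'w-1=the']
    print(propositionalize(EXAMPLES, pats))
    # → [[1, 1], [0, 1], [1, 0], [1, 1]]
```

The resulting 0/1 table can be passed to any propositional learner. A CBA-style classifier would instead turn patterns into class-association rules and classify directly with the rules; this sketch does not implement CBA.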