This book presents statistical models that have recently been developed within several research communities to access information contained in text collections. The problems considered are linked to applications aimed at facilitating information access.
To give the reader as complete a description as possible, the book focuses on the probability models used in these applications, highlighting the relationship between models and applications and illustrating the behavior of each model on real collections.
Textual Information Access is organized around four themes: information retrieval and ranking models; classification and clustering (logistic regression, kernel methods, Markov fields, etc.); multilingualism and machine translation; and emerging applications such as information exploration.
Part 1: Information Retrieval
1. Probabilistic Models for Information Retrieval
2. Learnable Ranking Models for Automatic Text Summarization and Information Retrieval
Part 2: Classification and Clustering
3. Logistic Regression and Text Classification
4. Kernel Methods for Textual Information Access
5. Topic-Based Generative Models for Text Information Access
6. Conditional Random Fields for Information Extraction
Part 3: Multilingualism
7. Statistical Methods for Machine Translation
Part 4: Emerging Applications
8. Information Mining: Methods and Interfaces for Accessing Complex Information
9. Opinion Detection as a Topic Classification Problem