This course provides advanced study of classical and web information retrieval, including web search and the related areas of text classification and text clustering. It gives an up-to-date treatment of the design and implementation of systems for gathering, indexing, and searching documents, covers methods for evaluating such systems, and introduces the use of machine learning methods on text collections. Designed as a primary course in information retrieval for graduate or advanced undergraduate students, it assumes that students have completed introductory courses in data structures and algorithms, linear algebra, and probability theory.
Topics that may be covered in this course include:
- Boolean retrieval
- The term vocabulary and postings lists
- Dictionaries and tolerant retrieval
- Index construction
- Index compression
- Scoring, term weighting, and the vector space model
- Computing scores in a complete search system
- Evaluation in information retrieval
- Relevance feedback and query expansion
- XML retrieval
- Probabilistic information retrieval
- Language models for information retrieval
- Text classification
- Vector space classification
- Support vector machines on documents
- Flat clustering
- Hierarchical clustering
- Web search basics
- Web crawling and indexes
- Link analysis