BIRLA INSTITUTE OF TECHNOLOGY & SCIENCE, PILANI
WORK INTEGRATED LEARNING PROGRAMMES
Digital Learning
Part A: Course Design
Course Title    | Information Retrieval
Course No(s)    | SS ZG537
Credit Units    | 4
Credit Model    |
Content Authors | Dr. V. Maheswari
Course Objectives

No  | Course Objectives
CO1 | To understand the structure and organization of the various components of an IR system
CO2 | To understand information representation models, term-scoring mechanisms, and other parts of a complete search system
CO3 | To understand the architecture of search engines, crawlers, and web search
CO4 | To understand cross-lingual retrieval and multimedia information retrieval
CO5 | To understand the concepts of recommender systems
Text Book(s)

T1 | C. D. Manning, P. Raghavan and H. Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008.

Reference Book(s) & other resources

R1 | Ricardo Baeza-Yates and Berthier Ribeiro-Neto, Modern Information Retrieval, Addison-Wesley, 2000. http://people.ischool.berkeley.edu/~hearst/irbook/
R2 | Ricci, F.; Rokach, L.; Shapira, B.; Kantor, P.B. (Eds.), Recommender Systems Handbook, 1st Edition, Springer, 2011, 845 p., 20 illus., Hardcover, ISBN 978-0-387-85819-7.
R3 | Jian-Yun Nie, Cross-Language Information Retrieval, Morgan & Claypool Publishers, 2010.
R4 | Stefan M. Rüger, Multimedia Information Retrieval, Morgan & Claypool Publishers, 2010.
R5 | S. Büttcher, C. Clarke and G. Cormack, Information Retrieval: Implementing and Evaluating Search Engines, MIT Press, 2010.
R6 | B. Liu, Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data, Second Edition, Springer, 2011.
R7 | Bruce Croft, Donald Metzler, and Trevor Strohman, Search Engines: Information Retrieval in Practice, Addison-Wesley, 2009.
R8 | P. Koehn, Statistical Machine Translation, Cambridge University Press, 2010.
Learning Outcomes:

No  | Learning Outcomes
LO1 | Students will gain an understanding of an information retrieval system as a whole and of its components.
LO2 | Students will have knowledge of the design issues, and their solutions, in different types of models, including Boolean and vector space models.
LO3 | Students will have a detailed understanding of text indexing, mining, weighting schemes, and related techniques.
LO4 | Students will acquire knowledge of cross-lingual and multimedia information retrieval.
LO5 | Students will be able to differentiate between different recommender systems and suggest a suitable system based on the problem and the data available.
LO6 | With the acquired knowledge, students will be able to design and build different kinds of information retrieval systems.
Content Structure

Modular Content Structure

No  | Title of the Module
M1  | Introduction, Boolean Retrieval
M2  | Boolean Retrieval, Term Vocabulary and Postings List
M3  | Postings List, Phrase Queries, Dictionaries
M4  | Tolerant Retrieval, Index Construction
M5  | Index Construction, Vector Space Model
M6  | Vector Space Model, Text Classification
M7  | Vector Space Classification, Evaluation of Classification
M8  | Text Clustering
M9  | Web Search, Web Crawler, Link Analysis
M10 | Multimedia IR, Cross-Lingual IR
Glossary of Terms:

1. Contact Hour (CH) stands for an hour-long live session with students, conducted either in a physical classroom or enabled through technology. In this model of instruction, instructor-led sessions will run for 20 CH.
   a. Pre CH = self-learning done prior to a given contact hour
   b. During CH = content to be discussed during the contact hour by the course instructor
   c. Post CH = self-learning done after the contact hour
2. RL stands for Recorded Lecture or Recorded Lesson. It is presented to the student through an online portal. A given RL unfolds as a sequence of video segments interleaved with exercises.
3. SS stands for Self-Study, to be done as a study of relevant sections from textbooks and reference books. It could also include study of external resources.
4. LE stands for Lab Exercises.
5. HW stands for Home Work, which will typically consist of a selection of problems from the text.
Part B: Contact Session Plan

Academic Term     | Second Semester 2017-2018
Course Title      | Information Retrieval
Course No         | SS ZG537
Content Developer | Dr. Maheswari V.
M1: Introduction, Boolean Retrieval

Type   | Description
RL 1.1 | RL 1.1.1 - Information vs. data retrieval; RL 1.1.2 - The IR task
RL 1.2 | RL 1.2.1 - Basic concepts; RL 1.2.2 - Logical view of the documents; RL 1.2.3 - The retrieval process
RL 1.3 | RL 1.3.1 - Classical IR models; RL 1.3.2 - Boolean model; RL 1.3.3 - Vector space model; RL 1.3.4 - Probabilistic model
RL 1.4 | RL 1.4.1 - Example of an IR problem; RL 1.4.2 - Term-document matrix; RL 1.4.3 - Boolean queries, inverted index
CS 1.1 | CS 1.1.1 - Difference between information and data retrieval, information need, the IR task; CS 1.1.2 - Relevance, motivation of IR; CS 1.1.3 - Examples of different IR models
CS 1.2 | CS 1.2.1 - Example of a term-document matrix, building an inverted index; CS 1.2.2 - Processing Boolean queries - related examples
LE 1.1 |
SS 1.1 | R1 - Ch1, Ch2; T1 - Ch2
HW 1.1 | Exercises at the end of T1 - Ch2
QZ 1.1 |
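The core of RL 1.4 and CS 1.2 fits in a few lines of code. Below is a minimal sketch (the toy corpus and function names are my own, not from the handout) of building an inverted index and answering a Boolean AND query by intersecting sorted postings lists, in the spirit of T1, Ch1-Ch2.

```python
# Sketch: inverted index + Boolean AND via postings-list intersection.
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to a sorted list of docIDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def intersect(p1, p2):
    """Merge two sorted postings lists (AND query)."""
    answer, i, j = [], 0, 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            answer.append(p1[i]); i += 1; j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return answer

docs = {1: "new home sales top forecasts",
        2: "home sales rise in july",
        3: "increase in home sales in july"}
index = build_inverted_index(docs)
print(intersect(index["home"], index["july"]))  # -> [2, 3]
```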
M2: Boolean Retrieval, Term Vocabulary and Postings List

Type   | Description
RL 2.1 | RL 2.1.1 - Query optimization; RL 2.1.2 - Term vocabulary - preprocessing
RL 2.2 | RL 2.2.1 - Vocabulary of terms - tokenization, stop words; RL 2.2.2 - Vocabulary of terms - normalization, stemming and lemmatization
CS 2.1 | CS 2.1.1 - Intersection of postings lists and query optimization; CS 2.1.2 - Preprocessing - document delineation
CS 2.2 | CS 2.2.1 - Tokenization; CS 2.2.2 - Stop words, normalization; CS 2.2.3 - Stemming and lemmatization
LE 2.1 |
SS 2.1 | T1 - Ch2
HW 2.1 | Exercises at the end of T1 - Ch2
QZ 2.1 |
M3: Postings List, Phrase Queries, Dictionaries

Type   | Description
RL 3.1 | RL 3.1.1 - Faster postings lists; RL 3.1.2 - Postings list intersection with skip pointers
RL 3.2 | RL 3.2.1 - Positional postings and phrase queries; RL 3.2.2 - Handling phrase queries
RL 3.3 | RL 3.3.1 - Dictionary data structures; RL 3.3.2 - Tolerant retrieval - wildcard queries
CS 3.1 | CS 3.1.1 - Faster postings lists via skip pointers; CS 3.1.2 - Skip pointers example, placing skip pointers
CS 3.2 | CS 3.2.1 - Phrase queries - examples; CS 3.2.2 - Biword indexes - example, extended biwords, issues; CS 3.2.3 - Positional indexes
CS 3.3 | CS 3.3.1 - Dictionary data structures - hash table, tree; CS 3.3.2 - Tolerant retrieval - wildcard queries; CS 3.3.3 - Wildcard query processing - permuterm indexes; CS 3.3.4 - Wildcard query processing - k-gram indexes
LE 3.1 |
SS 3.1 | T1 - Ch2, Ch3
HW 3.1 | Exercises at the end of T1 - Ch2, Ch3
QZ 3.1 |
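As a companion to CS 3.1, here is a small sketch of postings-list intersection with skip pointers (toy data mine; the sqrt-of-length skip span is the conventional heuristic discussed in T1, Ch2).

```python
# Sketch: AND-intersection that skips ahead when the skip target
# is still <= the other list's current docID.
import math

def intersect_with_skips(p1, p2):
    skip1 = round(math.sqrt(len(p1))) or 1   # conventional sqrt(L) skip span
    skip2 = round(math.sqrt(len(p2))) or 1
    answer, i, j = [], 0, 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            answer.append(p1[i]); i += 1; j += 1
        elif p1[i] < p2[j]:
            # follow skip pointers on p1 while they do not overshoot p2[j]
            while i + skip1 < len(p1) and p1[i + skip1] <= p2[j]:
                i += skip1
            i += 1
        else:
            while j + skip2 < len(p2) and p2[j + skip2] <= p1[i]:
                j += skip2
            j += 1
    return answer

print(intersect_with_skips([2, 4, 8, 16, 32, 64],
                           [1, 2, 3, 8, 17, 21, 31, 64]))  # -> [2, 8, 64]
```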
M4: Tolerant Retrieval

Type   | Description
RL 4.1 | RL 4.1.1 - Spelling correction - isolated word correction; RL 4.1.2 - Edit distance (Levenshtein distance - computation)
RL 4.2 | RL 4.2.1 - Using edit distance, k-gram index; RL 4.2.2 - Context-sensitive spelling correction; RL 4.2.3 - Phonetic correction, Soundex
CS 4.1 | CS 4.1.1 - Isolated word correction - standard lexicon, lexicon of the indexed corpus; CS 4.1.2 - Edit distance - example; CS 4.1.3 - Computation of Levenshtein distance
CS 4.2 | CS 4.2.1 - Context-sensitive spelling correction; CS 4.2.2 - Phonetic correction, Soundex algorithm, computation of Soundex codes
LE 4.1 |
SS 4.1 | T1 - Ch3
HW 4.1 | Exercises at the end of T1 - Ch3
QZ 4.1 | Compute the edit distance between "side" and "slide"
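QZ 4.1 can be answered with the standard Levenshtein dynamic program from T1, Ch3. The sketch below (function name mine) fills an (m+1) x (n+1) table of prefix distances; for "side" vs. "slide" the answer is 1, a single insertion of 'l'.

```python
# Sketch: dp[i][j] = edit distance between the first i chars of s
# and the first j chars of t.
def edit_distance(s, t):
    m, n = len(s), len(t)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i                      # delete all of s[:i]
    for j in range(n + 1):
        dp[0][j] = j                      # insert all of t[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution/match
    return dp[m][n]

print(edit_distance("side", "slide"))  # -> 1 (insert 'l')
```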
M5: Index Construction, Vector Space Model

Type   | Description
RL 5.1 | RL 5.1.1 - Blocked sort-based indexing; RL 5.1.2 - Single-pass in-memory indexing
RL 5.2 | RL 5.2.1 - Distributed indexing; RL 5.2.2 - Dynamic indexing
RL 5.3 | RL 5.3.1 - Term frequency and weighting; RL 5.3.2 - The vector space model for scoring
CS 5.1 | CS 5.1.1 - BSBI, BSBI index construction algorithm; CS 5.1.2 - SPIMI algorithm, merits and demerits of both algorithms
CS 5.2 | CS 5.2.1 - Distributed indexing, examples of search engines using distributed indexing; CS 5.2.2 - MapReduce architecture for distributed indexing, parsers, inverters, schema for index construction in MapReduce; CS 5.2.3 - Dynamic indexing, issues with main and auxiliary indexes, logarithmic merge, dynamic indexing used in search engines
CS 5.3 | CS 5.3.1 - Issues with Boolean retrieval, ranked retrieval models, query-document matching scores; CS 5.3.2 - Bag-of-words model, term frequency (tf), document frequency and idf, effect of idf on ranking, tf-idf weighting
LE 5.1 |
SS 5.1 | T1 - Ch4, Ch6
HW 5.1 | Exercises at the end of T1 - Ch4, Ch6
QZ 5.1 |
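To make RL 5.3 concrete, here is a compact sketch of tf-idf weighting (toy corpus mine), using the common variant w(t, d) = (1 + log10 tf) * log10(N / df) from T1, Ch6.

```python
# Sketch: tf-idf weights on a three-document toy corpus.
import math
from collections import Counter

docs = ["car insurance auto insurance",
        "best car deals",
        "auto repair and car repair"]
N = len(docs)
tokenized = [d.split() for d in docs]

# document frequency: number of docs containing each term
df = Counter(term for toks in tokenized for term in set(toks))

def tf_idf(term, toks):
    tf = toks.count(term)
    if tf == 0 or df[term] == 0:
        return 0.0
    return (1 + math.log10(tf)) * math.log10(N / df[term])

for term in ("insurance", "car"):
    print(term, [round(tf_idf(term, toks), 3) for toks in tokenized])
# "car" appears in all 3 docs, so idf = log10(3/3) = 0 and its weight vanishes.
```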
M6: Vector Space Model, Text Classification

Type   | Description
RL 6.1 | RL 6.1.1 - Documents and queries as vectors, similarity measured via the angle, cosine; RL 6.1.2 - Cosine similarity for length-normalized vectors; RL 6.1.3 - Computing cosine scores, tf-idf weighting variants
RL 6.2 | RL 6.2.1 - The text classification problem; RL 6.2.2 - Classification methods
RL 6.3 | RL 6.3.1 - Bayesian method of text classification; RL 6.3.2 - Bernoulli model; RL 6.3.3 - Time complexity of Naive Bayes
CS 6.1 | CS 6.1.1 - Difference between binary, count, and weight matrices; representation of documents and queries as vectors; similarity measured via the angle and its disadvantage; CS 6.1.2 - Cosine similarity between query and document, advantage of using length-normalized vectors, example of cosine similarity computation; CS 6.1.3 - Algorithm for computing cosine scores, weighting in queries vs. documents - examples
CS 6.2 | CS 6.2.1 - Introduction to the text classification problem, examples of search engines using text classification, supervised learning; CS 6.2.2 - Classification methods - manual, rule-based, statistical
CS 6.3 | CS 6.3.1 - Relevance feedback, Bayes rule, Bayes rule for text classification; CS 6.3.2 - Naive Bayes classifier model, conditional independence assumption, parameter estimation, positional independence assumption; CS 6.3.3 - The problem with maximum likelihood estimates (zeros), add-one smoothing, Naive Bayes training and testing, examples; CS 6.3.4 - Time complexity
LE 6.1 |
SS 6.1 | T1 - Ch6, Ch13
HW 6.1 | Exercises at the end of T1 - Ch6, Ch13
QZ 6.1 |
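CS 6.3 walks through Naive Bayes training and testing with add-one smoothing; the sketch below (variable names mine) implements the multinomial model in log space on the worked example from T1, Ch13.

```python
# Sketch: multinomial Naive Bayes with add-one (Laplace) smoothing.
import math
from collections import Counter, defaultdict

train = [("chinese beijing chinese", "china"),
         ("chinese chinese shanghai", "china"),
         ("chinese macao", "china"),
         ("tokyo japan chinese", "japan")]

vocab = {t for text, _ in train for t in text.split()}
class_docs = defaultdict(list)
for text, label in train:
    class_docs[label].extend(text.split())

priors = {c: sum(1 for _, l in train if l == c) / len(train) for c in class_docs}
cond = {}  # P(term | class) with add-one smoothing
for c, toks in class_docs.items():
    counts = Counter(toks)
    denom = len(toks) + len(vocab)
    cond[c] = {t: (counts[t] + 1) / denom for t in vocab}

def classify(text):
    # score in log space to avoid underflow; unseen terms are ignored
    scores = {c: math.log(priors[c]) +
                 sum(math.log(cond[c][t]) for t in text.split() if t in vocab)
              for c in priors}
    return max(scores, key=scores.get)

print(classify("chinese chinese chinese tokyo japan"))  # -> "china"
```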
M7: Vector Space Classification, Evaluation of Classification

Type   | Description
RL 7.1 | RL 7.1.1 - Vector space representation of classes; RL 7.1.2 - Rocchio classification
RL 7.2 | RL 7.2.1 - k-nearest-neighbour classification; RL 7.2.2 - Evaluation of classification
CS 7.1 | CS 7.1.1 - Vector space representation, contiguity hypothesis, classes in the vector space, vector space classification methods; CS 7.1.2 - Illustration of Rocchio text categorization, definition of the centroid, Rocchio algorithm, example
CS 7.2 | CS 7.2.1 - Explanation of k-nearest-neighbour classification, kNN example; CS 7.2.2 - kNN algorithm, impact of the parameter k, solved example; CS 7.2.3 - Precision P and recall R, F1 measure, averaging: micro vs. macro
LE 7.1 |
SS 7.1 | T1 - Ch14
HW 7.1 | Exercises at the end of T1 - Ch14
QZ 7.1 |
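For CS 7.2.3, the sketch below (contingency counts are invented) computes per-class precision, recall, and F1, then contrasts macro-averaging (mean of per-class F1) with micro-averaging (F1 over pooled counts).

```python
# Sketch: precision/recall/F1 with macro- vs. micro-averaging.
def prf(tp, fp, fn):
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# (tp, fp, fn) contingency counts for two classes
classes = {"c1": (10, 10, 10), "c2": (90, 10, 10)}

f1s = [prf(*c)[2] for c in classes.values()]
macro_f1 = sum(f1s) / len(f1s)          # average of per-class F1 scores

tp = sum(c[0] for c in classes.values())
fp = sum(c[1] for c in classes.values())
fn = sum(c[2] for c in classes.values())
micro_f1 = prf(tp, fp, fn)[2]           # F1 over pooled counts

print(round(macro_f1, 3), round(micro_f1, 3))
# macro 0.7 vs. micro ~0.833: micro-averaging is dominated by the large class c2.
```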
M8: Text Clustering

Type   | Description
RL 8.1 | RL 8.1.1 - Clustering concepts; RL 8.1.2 - Types of clustering: flat vs. hierarchical clustering
RL 8.2 | RL 8.2.1 - Flat algorithms - K-means
RL 8.3 | RL 8.3.1 - Hierarchical agglomerative clustering (HAC); RL 8.3.2 - Cluster similarity; RL 8.3.3 - Variants of HAC
CS 8.1 | CS 8.1.1 - Classification vs. clustering, cluster hypothesis, applications of clustering in IR, examples of search engines; CS 8.1.2 - Flat vs. hierarchical clustering, hard vs. soft clustering; CS 8.1.3 - Basic idea of K-means, RSS, K-means algorithm, example; CS 8.1.4 - Seed choice, time complexity, and optimality of K-means
CS 8.2 | CS 8.2.1 - HAC algorithm, dendrogram; CS 8.2.2 - Single-link and complete-link HAC - examples; CS 8.2.3 - Centroid and group-average clustering - examples
LE 8.1 |
SS 8.1 | T1 - Ch16, Ch17
HW 8.1 | Exercises at the end of T1 - Ch16, Ch17
QZ 8.1 |
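CS 8.1.3 introduces the K-means algorithm; here is a bare-bones sketch on 2-D points (the data, K, and seeding are my choices) showing the assign/recompute loop that runs until assignments stabilize, as in T1, Ch16.

```python
# Sketch: K-means on 2-D points with random seeding.
import random

def kmeans(points, k, iters=100, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)            # seed choice matters (CS 8.1.4)
    assignment = None
    for _ in range(iters):
        # assign each point to the nearest centroid (squared Euclidean distance)
        new_assignment = [min(range(k), key=lambda c:
                              (p[0] - centroids[c][0]) ** 2 +
                              (p[1] - centroids[c][1]) ** 2)
                          for p in points]
        if new_assignment == assignment:         # converged
            break
        assignment = new_assignment
        # recompute each centroid as the mean of its members
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:                          # keep old centroid if cluster empties
                centroids[c] = (sum(p[0] for p in members) / len(members),
                                sum(p[1] for p in members) / len(members))
    return centroids, assignment

pts = [(1, 1), (1.5, 2), (1, 0), (8, 8), (9, 9), (8, 9)]
centroids, labels = kmeans(pts, 2)
print(labels)  # the two well-separated groups get distinct labels
```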
M9: Web Search, Web Crawler, Link Analysis

Type   | Description
RL 9.1 | RL 9.1.1 - Web search basics; RL 9.1.2 - Web documents; RL 9.1.3 - Duplicate detection
RL 9.2 | RL 9.2.1 - Web crawler operation; RL 9.2.2 - Crawling process; RL 9.2.3 - Crawler architecture
RL 9.3 | RL 9.3.1 - Link analysis; RL 9.3.2 - PageRank; RL 9.3.3 - Hubs and authorities
CS 9.1 | CS 9.1.1 - Ads and web search; web IR: differences from traditional IR; user needs in web search; bowtie structure of the web; CS 9.1.2 - Web documents, dynamic content, multilinguality, spam techniques; CS 9.1.3 - Near-duplicate detection example, computation of Jaccard coefficients for sketches; CS 9.1.4 - Issues in estimating the size of the web
CS 9.2 | CS 9.2.1 - Web crawler architecture - the Mercator scheme; CS 9.2.2 - Distributed indexes for the web; CS 9.2.3 - PageRank algorithm - illustration with an example; CS 9.2.4 - Hubs and authorities - illustration with an example
LE 9.1 |
SS 9.1 | T1 - Ch19, Ch20, Ch21
HW 9.1 | Exercises at the end of T1 - Ch19, Ch20, Ch21
QZ 9.1 |
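CS 9.2.3 illustrates PageRank with an example; the sketch below (toy graph mine) runs the standard power iteration with damping factor 0.85 and uniform redistribution of dangling-node mass, in the spirit of T1, Ch21.

```python
# Sketch: PageRank by power iteration.
# rank(v) = (1 - d)/N + d * sum over in-links u->v of rank(u)/outdeg(u)
def pagerank(links, d=0.85, iters=50):
    nodes = sorted(set(links) | {v for vs in links.values() for v in vs})
    n = len(nodes)
    rank = {v: 1 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1 - d) / n for v in nodes}
        for u in nodes:
            out = links.get(u, [])
            if out:
                share = d * rank[u] / len(out)
                for v in out:
                    new[v] += share
            else:  # dangling node: distribute its rank uniformly
                for v in nodes:
                    new[v] += d * rank[u] / n
        rank = new
    return rank

links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
for node, r in sorted(pagerank(links).items()):
    print(node, round(r, 3))  # ranks sum to 1; C collects the most link mass
```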
M10: Multimedia IR, Cross-Lingual IR

Type    | Description
RL 10.1 | RL 10.1.1 - Multimedia IR - introduction, feature extraction; RL 10.1.2 - Search techniques
RL 10.2 | RL 10.2.1 - Cross-lingual IR; RL 10.2.2 - Query translation methods
CS 10.1 | CS 10.1.1 - Examples of spatial access methods in multimedia IR; CS 10.1.2 - Examples of image, video, and audio feature extraction
CS 10.2 | CS 10.2.1 - Examples of query translation methods; CS 10.2.2 - Examples of parallel corpora and comparable corpora
CS 10.3 | CS 10.3.1 - Recommender systems
LE 10.1 |
SS 10.1 | R3, R4
HW 10.1 |
QZ 10.1 |
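CS 10.3.1 only names recommender systems; as a pointer toward R2, here is a toy sketch of user-based collaborative filtering (the ratings data and the cosine-similarity choice are mine, reusing the vector space ideas from M6) that recommends unseen items from the most similar user.

```python
# Sketch: user-based collaborative filtering with cosine similarity.
import math

ratings = {  # user -> {item: rating}
    "alice": {"m1": 5, "m2": 3, "m3": 4},
    "bob":   {"m1": 4, "m2": 3, "m4": 5},
    "carol": {"m2": 1, "m3": 2, "m4": 4},
}

def cosine(u, v):
    common = set(u) & set(v)
    num = sum(u[i] * v[i] for i in common)
    den = math.sqrt(sum(x * x for x in u.values())) * \
          math.sqrt(sum(x * x for x in v.values()))
    return num / den if den else 0.0

def recommend(user):
    others = {o: cosine(ratings[user], r) for o, r in ratings.items() if o != user}
    nearest = max(others, key=others.get)
    # items the nearest neighbour rated that the target user has not seen
    return [i for i in ratings[nearest] if i not in ratings[user]]

print(recommend("alice"))  # -> ['m4'], borrowed from the most similar user
```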
Contact hour | Pre-contact hour prep   | During contact hour     | Post-contact hour
1            | None                    | CS 1.1                  |
2            | RL 1.1, RL 1.2, RL 1.3  | CS 1.2                  | HW 1.1
3            | RL 2.1                  | CS 2.1                  | HW 2.1
4            | RL 2.2                  | CS 2.2                  | HW 2.1
5            | RL 3.1, RL 3.2          | CS 3.1, CS 3.2          | HW 3.1
6            | RL 3.3                  | CS 3.3                  | HW 3.1
7            | RL 4.1, RL 4.2          | CS 4.1, CS 4.2          | HW 4.1, QZ 4.1
8            | RL 4.4                  | CS 4.2                  | LE 4.1, SS 4.1, HW 4.1
9            | RL 5.1, RL 5.2, RL 5.3  | CS 5.1, CS 5.2, CS 5.3  | HW 5.1
10           | RL 6.1                  | CS 6.1                  | HW 6.1
11           |                         | Review                  |
12           | RL 6.2, RL 6.3          | CS 6.2, CS 6.3          | HW 6.1
13           | RL 7.1, RL 7.2          | CS 7.1                  | HW 7.1
14           | RL 8.1, RL 8.2          | CS 8.1                  | HW 8.1
15           | RL 8.2, RL 8.3          | CS 8.2                  | HW 8.1
16           | RL 9.1, RL 9.2          | CS 9.1, CS 9.2          | HW 9.1
17           | RL 9.3                  | CS 9.2                  | HW 9.1
18           | RL 10.1                 | CS 10.1                 |
19           | RL 10.2                 | CS 10.2                 |
20           | -                       | CS 10.3                 |
21           |                         | Review                  |
22           |                         | Review                  |
Evaluation Scheme:

Legend: EC = Evaluation Component; AN = After Noon Session; FN = Fore Noon Session

No   | Name               | Type        | Duration | Weight | Day, Date, Session, Time
EC-1 | Quiz-I             | Online      | -        | 5%     | February 1 to 10, 2018
     | Quiz-II            | Online      | -        | 5%     | March 1 to 10, 2018
     | Quiz-III           | Online      | -        | 5%     | March 20 to 30, 2018
EC-2 | Mid-Semester Test  | Closed Book | 2 hours  | 35%    | 03/03/2018 (FN), 10 AM - 12 Noon
EC-3 | Comprehensive Exam | Open Book   | 3 hours  | 50%    | 21/04/2018 (FN), 9 AM - 12 Noon
Syllabus for Mid-Semester Test (Closed Book): Topics in Session Nos. 1 to 11.
Syllabus for Comprehensive Exam (Open Book): All topics (Session Nos. 1 to 22).

Important links and information:

Elearn portal: https://elearn.bits-pilani.ac.in

Students are expected to visit the Elearn portal on a regular basis and stay up to date with the latest announcements and deadlines.

Contact sessions: Students should attend the online lectures as per the schedule provided on the Elearn portal.
Evaluation Guidelines:

1. EC-1 consists of either two Assignments or three Quizzes. Students will attempt them through the course pages on the Elearn portal. Announcements will be made on the portal in a timely manner.
2. For Closed Book tests: no books or reference material of any kind will be permitted.
3. For Open Book exams: use of books and any printed/written reference material (filed or bound) is permitted. However, loose sheets of paper will not be allowed. Use of calculators is permitted in all exams. Laptops and mobiles of any kind are not allowed. Exchange of any material is not allowed.
4. If a student is unable to appear for the Regular Test/Exam due to genuine exigencies, the student should follow the procedure to apply for the Make-Up Test/Exam, which will be made available on the Elearn portal. The Make-Up Test/Exam will be conducted only at selected exam centres on dates to be announced later.

It shall be the responsibility of the individual student to be regular in maintaining the self-study schedule as given in the course handout, attend the online lectures, and take all the prescribed evaluation components such as Assignments/Quizzes, the Mid-Semester Test, and the Comprehensive Exam according to the evaluation scheme provided in the handout.