BIRLA INSTITUTE OF TECHNOLOGY & SCIENCE, PILANI
WORK INTEGRATED LEARNING PROGRAMMES
Digital
Part A: Content Design
Course Title | Advanced Data Mining
Course No(s) | SS ZG548
Credit Units | 4
Credit Model |
Content Authors | Kamlesh Tiwari
Course Objectives
No | Objective
CO1 | To learn how to mine complex data (beyond conventional record data) and complex structures such as trees/graphs, sequence data, web/text data, stream data, multivariate time series data, and high-dimensional data.
CO2 | To learn how to apply these techniques to specific applications such as web search, information retrieval, and social networks.
CO3 | To learn about distributed computing solutions for data-intensive applications in data mining.
Text Book(s)
T1 |
T2 |
Reference Book(s) & other resources
R1 | Tan P. N., Steinbach M. & Kumar V., “Introduction to Data Mining”, Pearson Education, 2006
R2 | Baeza-Yates R. & Ribeiro-Neto B., “Modern Information Retrieval”, Pearson Education, 2005
R3 | Han J. & Kamber M., “Data Mining: Concepts and Techniques”, Morgan Kaufmann Publishers, Second Edition, 2006
R4 | Manning C. D., Raghavan P. & Schütze H., “Introduction to Information Retrieval”, Cambridge UP, Online edition, 2009
R5 | Hadzic F., Tan H. & Dillon T. S., “Mining data with Complex Structures”, Springer, 20
R6 | Aggarwal Charu C. (Ed.), “Data Streams: Models and Algorithms”, Springer, 2007
Content Structure
1. Introduction
  1.1. Review of data mining
  1.2. Objectives
  1.3. Overview
2. Incremental & Stream Data Mining
  2.1. Incremental Algorithms for Data Mining
  2.2. Characteristics of Streaming Data
  2.3. Issues and Challenges
  2.4. Streaming Data Mining Algorithms
3. Distributed computing solutions for data mining
  3.1. MapReduce/Hadoop
  3.2. Spark
4. Sequence Mining
  4.1. Characteristics of Sequence Data
  4.2. Problem Modeling
  4.3. Sequence Pattern Discovery
  4.4. Timing Constraints
5. Text Mining
  5.1. Text Classification
  5.2. Vector Space Model
  5.3. Flat and Hierarchical Clustering
  5.4. Streaming Data Mining Algorithms
6. Web Search
  6.1. Crawling & Indexing
  6.2. Hyperlink analysis
    6.2.1. HITS and Page Rank Algorithms
7. Mining Complex Structures
  7.1. Mining Trees
    7.1.1. Tree Miner
    7.1.2. Tree Model Guided Framework
    7.1.3. TMG framework for mining ordered & unordered subtrees
  7.2. Mining Graphs
    7.2.1. Approaches to graph mining
  7.3. Case Study: Information Retrieval
  7.4. Case Study: Mining Social Networks
Learning Outcomes:
No | Learning Outcomes
LO1 | To understand how to update patterns incrementally when data arrives continuously
LO2 | To understand the role of distributed computing in data-intensive data mining
LO3 | To study how to analyze sequence data
LO4 | To understand how text mining differs from data mining and how to mine text
LO5 | To understand what goes into web search and to study web search methods and their improvements
LO6 | To understand how to mine complex structures other than records while retaining the relations among the entities
Part B: Learning Plan
Academic Term | First Semester 2017-2018
Course Title | Advanced Data Mining
Course No | SS ZG548
Lead Instructor | Kamlesh Tiwari
Contact Hour 1
Type | Content Ref. | Topic Title | Study/HW Resource Reference
Pre CH | Introduction: Review and Overview | |
During CH | | |
Post CH | | |
Contact Hour 2
Type | Content Ref. | Topic Title | Study/HW Resource Reference
Pre CH | Incremental Data Mining: Relook traditional algorithms | See Class Slides |
During CH | | |
Post CH | | |
Contact Hour 3
Type | Content Ref. | Topic Title | Study/HW Resource Reference
Pre CH | Incremental algorithms and their design and analysis | See Class Slides |
During CH | | |
Post CH | | |
Contact Hour 4
Type | Content Ref. | Topic Title | Study/HW Resource Reference
Pre CH | Incremental algorithms and their design and analysis | See Class Slides |
During CH | | |
Post CH | | |
Contact Hour 5
Type | Content Ref. | Topic Title | Study/HW Resource Reference
Pre CH | Incremental algorithms and their design and analysis | See Class Slides |
During CH | | |
Post CH | | |
Contact Hour 6
Type | Content Ref. | Topic Title | Study/HW Resource Reference
Pre CH | See Class Slides | Stream Data Mining Characteristics, Issues and Challenges | R6 Ch 1, 4
During CH | | |
Post CH | | |
Contact Hour 7
Type | Content Ref. | Topic Title | Study/HW Resource Reference
Pre CH | See Class Slides | Stream Data Mining Algorithms and their Comparison | R6 Ch 1, 4
During CH | | |
Post CH | | |
Contact Hour 8
Type | Content Ref. | Topic Title | Study/HW Resource Reference
Pre CH | See Class Slides | Stream Data Mining Algorithms and their Comparison | R6 Ch 1, 4
During CH | | |
Post CH | | |
Contact Hour 9
Type | Content Ref. | Topic Title | Study/HW Resource Reference
Pre CH | See Class Slides | Distributed computing solutions for data mining | See Class Slides
During CH | | |
Post CH | | |
Contact Hour 10
Type | Content Ref. | Topic Title | Study/HW Resource Reference
Pre CH | See Class Slides | Distributed computing solutions for data mining | See Class Slides
During CH | | |
Post CH | | |
Contact Hour 11
Type | Content Ref. | Topic Title | Study/HW Resource Reference
Pre CH | See Class Slides | Distributed computing solutions for data mining | See Class Slides
During CH | | |
Post CH | | |
Contact Hour 12
Type | Content Ref. | Topic Title | Study/HW Resource Reference
Pre CH | See Class Slides | Distributed computing solutions for data mining | See Class Slides
During CH | | |
Post CH | | |
Contact Hour 13
Type | Content Ref. | Topic Title | Study/HW Resource Reference
Pre CH | R1 7.4 | Sequence Mining: Characteristics and Problem Modeling | R1 7.4
During CH | | |
Post CH | | |
Contact Hour 14
Type | Content Ref. | Topic Title | Study/HW Resource Reference
Pre CH | R1 7.4 | Sequence Pattern Discovery; Timing Constraints | R1 7.4
During CH | | |
Post CH | | |
Contact Hour 15
Type | Content Ref. | Topic Title | Study/HW Resource Reference
Pre CH | R4 Ch 1, 13 | Text Mining: Data Representation and Characteristics | R4 Ch 1, 13; R2 Ch 7
During CH | | |
Post CH | | |
Contact Hour 16
Type | Content Ref. | Topic Title | Study/HW Resource Reference
Pre CH | R4 Ch 14 | Text Classification: Feature Selection & Models | R4 Ch 14; R2 Ch 7
During CH | | |
Post CH | | |
Contact Hour 17
Type | Content Ref. | Topic Title | Study/HW Resource Reference
Pre CH | R4 Ch 14 | Text Classification: Vector Space Model | R4 Ch 14; R2 Ch 7
During CH | | |
Post CH | | |
Contact Hour 18
Type | Content Ref. | Topic Title | Study/HW Resource Reference
Pre CH | R4 Ch 13, 14 | Text Classification: Multiclass classifiers for text | R4 Ch 13, 14
During CH | | |
Post CH | | |
Contact Hour 19
Type | Content Ref. | Topic Title | Study/HW Resource Reference
Pre CH | R4 Ch 16, 17 | Text Clustering: Flat and hierarchical | R4 Ch 16, 17
During CH | | |
Post CH | | |
Contact Hour 20
Type | Content Ref. | Topic Title | Study/HW Resource Reference
Pre CH | R4 Ch 1, 6, 19 | Web Search | R4 Ch 1, 6, 19
During CH | | |
Post CH | | |
Contact Hour 21
Type | Content Ref. | Topic Title | Study/HW Resource Reference
Pre CH | R4 Ch 20 | Crawling & Indexing | R4 Ch 20
During CH | | |
Post CH | | |
Contact Hour 22
Type | Content Ref. | Topic Title | Study/HW Resource Reference
Pre CH | R4 Ch 20 | Crawling & Indexing | R4 Ch 20
During CH | | |
Post CH | | |
Contact Hour 23
Type | Content Ref. | Topic Title | Study/HW Resource Reference
Pre CH | R4 Ch 20 | Crawling & Indexing | R4 Ch 20
During CH | | |
Post CH | | |
Contact Hour 24
Type | Content Ref. | Topic Title | Study/HW Resource Reference
Pre CH | R4 Ch 21; See Class slides | Link Analysis | R4 Ch 21
During CH | | |
Post CH | | |
Contact Hour 25
Type | Content Ref. | Topic Title | Study/HW Resource Reference
Pre CH | R5 Ch 1; See Class slides | Mining Complex Structures: Data Representation | R5 Ch 1
During CH | | |
Post CH | | |
Contact Hour 26
Type | Content Ref. | Topic Title | Study/HW Resource Reference
Pre CH | R5 Ch 2, 3; See Class slides | Tree Mining problem and Tree basics | R5 Ch 2, 3
During CH | | |
Post CH | | |
Contact Hour 27
Type | Content Ref. | Topic Title | Study/HW Resource Reference
Pre CH | R5 Ch 3; See Class slides | Tree Miner | R5 Ch 3
During CH | | |
Post CH | | |
Contact Hour 28
Type | Content Ref. | Topic Title | Study/HW Resource Reference
Pre CH | R5 Ch 4, 5, 6 | TMG Model Guided Framework | R5 Ch 4, 5, 6
During CH | | |
Post CH | | |
Contact Hour 29
Type | Content Ref. | Topic Title | Study/HW Resource Reference
Pre CH | R5 Ch 11; See Class slides | Graph Mining: Introduction and applications | R5 Ch 11
During CH | | |
Post CH | | |
Contact Hour 30
Type | Content Ref. | Topic Title | Study/HW Resource Reference
Pre CH | See Class slides | Case Study: Information Retrieval |
During CH | | |
Post CH | | |
Contact Hour 31
Type | Content Ref. | Topic Title | Study/HW Resource Reference
Pre CH | See Class slides | Case Study: Social Network Mining |
During CH | | |
Post CH | | |
Contact Hour 32
Type | Content Ref. | Topic Title | Study/HW Resource Reference
Pre CH | See Class slides | Case Study: Social Network Mining |
During CH | | |
Post CH | | |
Evaluation Scheme:
Legend: EC = Evaluation Component; AN = After Noon Session; FN = Fore Noon Session
No | Name | Type | Duration | Weight | Day, Date, Session, Time
EC-1 | Quiz-I/ Assignment-I | Online | - | 5% | August 26 to September 4, 2017
 | Quiz-II | | | 5% | September 26 to October 4, 2017
 | Quiz-III/ Assignment-II | | | 5% | October 20 to 30, 2017
EC-2 | Mid-Semester Test | Closed Book | 2 hours | 35% | 24/09/2017 (FN) 10 AM – 12 Noon
EC-3 | Comprehensive Exam | Open Book | 3 hours | 50% | 05/11/2017 (FN) 9 AM – 12 Noon
Syllabus for Mid-Semester Test (Closed Book): Topics in Session Nos. 1 to 16
Syllabus for Comprehensive Exam (Open Book): All topics (Session Nos. 1 to 32)
Important links and information:
Elearn portal: https://elearn.bits-pilani.ac.in
Students are expected to visit the Elearn portal on a regular basis and stay up to date with the latest announcements and deadlines.
Contact sessions: Students should attend the online lectures as per the schedule provided on the Elearn portal.
Evaluation Guidelines:
1. EC-1 consists of either two Assignments or three Quizzes. Students will attempt them through the course pages on the Elearn portal. Announcements will be made on the portal, in a timely manner.
2. For Closed Book tests: No books or reference material of any kind will be permitted.
3. For Open Book exams: Use of books and any printed / written reference material (filed or bound) is permitted. However, loose sheets of paper will not be allowed. Use of calculators is permitted in all exams. Laptops/Mobiles of any kind are not allowed. Exchange of any material is not allowed.
4. If a student is unable to appear for the Regular Test/Exam due to genuine exigencies, the student should follow the procedure to apply for the Make-Up Test/Exam which will be made available on the Elearn portal. The Make-Up Test/Exam will be conducted only at selected exam centres on the dates to be announced later.
It shall be the responsibility of the individual student to be regular in maintaining the self-study schedule as given in the course handout, to attend the online lectures, and to take all the prescribed evaluation components such as Assignment/Quiz, Mid-Semester Test and Comprehensive Exam according to the evaluation scheme provided in the handout.