BIRLA INSTITUTE OF TECHNOLOGY & SCIENCE, PILANI
WORK INTEGRATED LEARNING PROGRAMMES
Digital Learning
Part A: Course Design
| Course Title | Data Mining |
| Course No(s) | IS ZC415 |
| Credit Units | 3 |
| Credit Model | |
| Content Authors | ARUNA MALAPATI |
Course Objectives
| No | |
| CO1 | To introduce basic concepts of data mining. |
| CO2 | To familiarize students with practical technologies in data mining. |
| CO3 | To provide students with interesting problems in the field of data mining to solve. |
Text Book(s)
| T1 | Tan P. N., Steinbach M. & Kumar V., “Introduction to Data Mining”, Pearson Education, 2006 |
| T2 | Data Mining: Concepts and Techniques, Third Edition by Jiawei Han, Micheline Kamber and |
Reference Book(s) & other resources
| R1 | Predictive Analytics and Data Mining: Concepts and Practice with RapidMiner by Vijay Kotu and Bala Deshpande, Morgan Kaufmann Publishers, © 2015 |
| R2 | http://www.scikit-learn.org |
Content Structure
Modules
| No. | Title of the Module |
| M1 | Introduction to Data Mining |
| M2 | Data Preprocessing: To understand the need for data preprocessing and various techniques used in the context of Data Mining |
| M3 | Data Exploration: A preliminary exploration of the data to better understand its characteristics |
| M4 | Classification and prediction: To learn different techniques and algorithms for classification, a major predictive and supervised Data Mining task |
| M5 | Association Analysis: To understand the descriptive relation between the entities by identifying associations among them and to learn various algorithms to find them |
| M6 | Clustering: To learn different techniques and algorithms for clustering, a major descriptive and unsupervised Data Mining task |
| M7 | Anomaly Detection: Detecting outliers and noise in data sets is an important Data Mining task. This module focuses on techniques needed for anomaly detection |
| M8 | Data Mining on unstructured (Big) data: Graph Mining, Social Network Analysis, Multimedia Data Mining, Text Mining, Mining the World Wide Web |
| M9 | Data Mining Applications: Recommendation Systems, Fraud Detection, Sentiment Analysis |
Glossary of Terms:
1. Contact Hour (CH) stands for an hour-long live session with students, conducted either in a physical classroom or enabled through technology. In this model of instruction, instructor-led sessions will be for 20 CH.
a. Pre CH = Self-learning done prior to a given contact hour
b. During CH = Content to be discussed during the contact hour by the course instructor
c. Post CH = Self-learning done after the contact hour
2. RL stands for Recorded Lecture or Recorded Lesson. It is presented to the student through an online portal. A given RL unfolds as a sequence of video segments interleaved with exercises.
3. SS stands for Self-Study, to be done as a study of relevant sections from textbooks and reference books. It could also include study of external resources.
4. LE stands for Lab Exercises.
5. HW stands for Homework, which will consist of discussed/new problems; it could be a selection of problems from the text.
M1: Introduction to Data Mining
| Type | Description/Plan/Reference |
| RL1.1 | RL1.1.1 = Definition of Data Mining |
| | RL1.1.2 = What type of data can be mined? |
| RL1.2 | RL1.2.1 = What kind of patterns can be mined? |
| | RL1.2.2 = What kind of applications are targeted? |
| RL1.3 | DM Process (R1) & DM Challenges (T2) |
| | RL1.3.1 = Process/Technologies used in DM |
| | RL1.3.2 = Challenges in DM |
| CS1.1 | CS1.1.1 = Review of Data Mining basics; examples of patterns that can be mined |
| | CS1.1.2 = Examples of technologies used in DM; approaches to overcome challenges; discussion of one example case study for data mining |
| LE1.1 | Exploration of Weka: operations, features, ARFF files |
| SS1.1 | T1, Ch 1; T2, Ch 1 |
| HW1.1 | Exercises at the end of T2, Ch 1 |
| QZ1.1 | |
M2: Data Preprocessing
| Type | Description/Plan/Reference |
| RL2.1 | RL2.1.1 = Why does data need preprocessing? |
| | RL2.1.2 = Major tasks in data preprocessing |
| RL2.2 | RL2.2.1 = Data Cleaning techniques |
| | RL2.2.2 = Data discretization, transformation, integration, reduction |
| CS2.1 | CS2.1.1 = Review of concepts of data preprocessing |
| | CS2.1.2 = Examples of application of preprocessing techniques |
| LE2.1 | Experiments with Weka - filters, discretization |
| SS2.1 | |
| HW2.1 | |
| QZ2.1 | |
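The transformation and discretization topics in RL2.2.2 can be illustrated with a minimal Python sketch. It is an illustration only, not course material: it assumes min-max normalization and equal-width binning as representative techniques, and the function names and the sample `ages` list are hypothetical.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values into the range [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so map everything to the lower bound.
        return [new_min for _ in values]
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]


def equal_width_bins(values, n_bins):
    """Assign each value a bin index in 0..n_bins-1 using equal-width intervals."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Hypothetical sample attribute (made-up data for illustration).
ages = [13, 15, 16, 19, 20, 21, 25, 30, 35, 40, 52, 70]
normalized = min_max_normalize(ages)
bins = equal_width_bins(ages, 3)
print(normalized[0], normalized[-1])
print(bins)
```

Equal-width binning is sensitive to extreme values: here the single value 70 stretches the range so that most ages fall into the first bin. Equal-frequency binning is a common alternative when that matters.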
M3: Data Exploration
| Type | Description/Plan/Reference |
| RL3.1 | RL3.1.1 = Various types of data to be mined |
| | RL3.1.2 = Statistical descriptions of data |
| RL3.2 | RL3.2.1 = Measuring data similarity & dissimilarity |
| | RL3.2.2 = Data Visualization |
| CS3.1 | CS3.1.1 = Review of concepts of data exploration |
| | CS3.1.2 = Examples of similarities & dissimilarities |
| LE3.1 | |
| SS3.1 | |
| HW3.1 | |
| QZ3.1 | |
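The similarity and dissimilarity measures named in RL3.2.1 can be sketched in a few lines of Python. The handout does not say which measures are covered, so the three below (Euclidean distance, cosine similarity, Jaccard similarity) are an assumption, and the function names are hypothetical.

```python
import math


def euclidean_distance(x, y):
    """Dissimilarity between two numeric vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def cosine_similarity(x, y):
    """Cosine of the angle between two numeric vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_x * norm_y)


def jaccard_similarity(s, t):
    """Shared items divided by all distinct items of two sets."""
    s, t = set(s), set(t)
    if not s and not t:
        return 1.0  # convention chosen here: two empty sets count as identical
    return len(s & t) / len(s | t)


print(euclidean_distance([0, 0], [3, 4]))
print(cosine_similarity([1, 2, 3], [2, 4, 6]))
print(jaccard_similarity({"a", "b", "c"}, {"b", "c", "d"}))
```

Euclidean distance suits dense numeric attributes, cosine similarity ignores vector length (useful for document term vectors), and Jaccard similarity applies to sets or asymmetric binary attributes.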
M4: Classification and Prediction
| Type | Description/Plan/Reference |
| RL4.1 | RL4.1.1 = Introduction to classification and prediction |
| | RL4.1.2 = Decision trees for classification |
| | RL4.1.3 = Rule based classification, Bayesian classification, Support vector machines |
| RL4.2 | RL4.2.1 = Issues regarding classification and prediction |
| | RL4.2.2 = Linear Regression, Nonlinear Regression |
| CS4.1 | CS4.1.1 = Review of concepts of recorded lectures, Algorithm for Decision tree induction, Classification by backpropagation, Comparison of methods of classification |
| | CS4.1.2 = Prediction: Other Regression-Based Methods |
| LE4.1 | Experiments with Weka - decision trees, rules, prediction |
| SS4.1 | |
| HW4.1 | |
| QZ4.1 | |
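Decision tree induction (RL4.1.2, CS4.1.1) chooses a splitting attribute at each node. The sketch below assumes entropy-based information gain as the selection measure, which is one common choice and not necessarily the one used in the course; the function names and the toy weather data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, labels, attribute_index):
    """Entropy reduction obtained by splitting on one attribute."""
    total = len(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute_index], []).append(label)
    # Weighted average entropy of the partitions after the split.
    remainder = sum(len(part) / total * entropy(part)
                    for part in partitions.values())
    return entropy(labels) - remainder


# Made-up toy data: (outlook, windy) -> play
rows = [("sunny", "yes"), ("sunny", "no"), ("rain", "yes"), ("rain", "no")]
labels = ["no", "no", "yes", "yes"]
print(information_gain(rows, labels, 0))  # gain when splitting on outlook
print(information_gain(rows, labels, 1))  # gain when splitting on windy
```

In this toy data the first attribute separates the classes perfectly while the second carries no information, so a tree builder using this measure would split on the first attribute.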
M5: Association Analysis
| Type | Description/Plan/Reference |
| RL5.1 | RL5.1.1 = What is association rule mining? |
| | RL5.1.2 = Frequent Itemsets, Closed Itemsets, and Association Rules |
| RL5.2 | RL5.2.1 = What is the Apriori Algorithm? |
| | RL5.2.2 = Finding Frequent Itemsets Using Candidate Generation, Generating Association Rules from Frequent Itemsets |
| CS5.1 | CS5.1.1 = Review of concepts of recorded lectures, Improving the Efficiency of Apriori |
| | CS5.1.2 = Mining Frequent Itemsets without Candidate Generation |
| LE5.1 | Experiments with Weka - mining association rules |
| SS5.1 | |
| HW5.1 | |
| QZ5.1 | |
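The Apriori topics in RL5.2 (finding frequent itemsets by candidate generation, then deriving rules) can be sketched as follows. This is a simplified, unoptimized illustration: it assumes `min_support` is an absolute transaction count, and the function names and the market-basket transactions are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support_count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]

    def count_frequent(candidates):
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: n for c, n in counts.items() if n >= min_support}

    frequent = count_frequent({frozenset([i]) for t in transactions for i in t})
    all_frequent = dict(frequent)
    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: combine frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev
                             for s in combinations(c, k - 1))}
        frequent = count_frequent(candidates)
        all_frequent.update(frequent)
        k += 1
    return all_frequent


def confidence(frequent, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]


# Made-up transactions for illustration.
baskets = [
    {"bread", "milk"},
    {"bread", "diaper", "beer", "eggs"},
    {"milk", "diaper", "beer", "cola"},
    {"bread", "milk", "diaper", "beer"},
    {"bread", "milk", "diaper", "cola"},
]
frequent = apriori(baskets, min_support=3)
print(len(frequent))
print(confidence(frequent, {"diaper"}, {"beer"}))
```

The prune step relies on the Apriori property: a k-itemset can only be frequent if all of its (k-1)-subsets are frequent, so candidates failing that check are discarded before the database is scanned.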
M6: Clustering
| Type | Description/Plan/Reference |
| RL6.1 | RL6.1.1 = What is cluster analysis? Types of data in Cluster analysis |
| | RL6.1.2 = Partitioning methods: k-means |
| RL6.2 | RL6.2.1 = Hierarchical algorithms |
| | RL6.2.2 = Introduction to density based approach |
| CS6.1 | CS6.1.1 = Review of concepts of recorded lectures |
| | CS6.1.2 = Density based algorithm: DBSCAN |
| LE6.1 | Experiments with Weka - k-means |
| SS6.1 | |
| HW6.1 | |
| QZ6.1 | |
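The k-means partitioning method listed in RL6.1.2 alternates an assignment step and an update step until the centroids stop changing. The sketch below is a minimal illustration under stated assumptions: the caller supplies the initial centroids (so the run is deterministic), squared Euclidean distance is used, and the function name and sample points are hypothetical.

```python
def kmeans(points, initial_centroids, max_iter=100):
    """Basic k-means (Lloyd's algorithm) on tuples of numbers."""
    centroids = [tuple(c) for c in initial_centroids]
    clusters = []
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: sum((a - b) ** 2
                                  for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for old, members in zip(centroids, clusters):
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, clusters


# Made-up 2-D points forming two visually separate groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, initial_centroids=[(1, 1), (8, 8)])
print(centroids)
print(clusters)
```

In practice the initial centroids are usually chosen at random or with a seeding heuristic, and the result can depend on that choice, which is why implementations often run several restarts.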
M7: Anomaly Detection
| Type | Description/Plan/Reference |
| RL7.1 | RL7.1.1 = Preliminaries |
| | RL7.1.2 = Statistical approach |
| RL7.2 | RL7.2.1 = Proximity based outlier detection |
| | RL7.2.2 = Density based outlier detection |
| CS7.1 | CS7.1.1 = Review of concepts of recorded lectures |
| | CS7.1.2 = Clustering based techniques |
| LE7.1 | |
| SS7.1 | |
| HW7.1 | |
| QZ7.1 | |
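One simple instance of the statistical approach in RL7.1.2 is a z-score test, sketched below. The handout does not name a specific test, so this choice is an assumption, as are the threshold of 2.0, the function name, and the sample readings.

```python
import statistics


def zscore_outliers(values, threshold=2.0):
    """Return values lying more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        return []  # all values identical, so nothing can be an outlier
    return [v for v in values if abs(v - mean) / stdev > threshold]


# Made-up readings with one obviously unusual value.
readings = [10, 12, 11, 13, 12, 11, 10, 12, 11, 95]
print(zscore_outliers(readings))
```

The outlier itself inflates the mean and standard deviation, so on small samples a strict threshold such as 3.0 may fail to flag it. Robust variants based on the median and median absolute deviation are a common remedy.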
M8: Data mining on unstructured (Big) data
| Type | Description/Plan/Reference |
| RL8.1 | RL8.1.1 = Graph Mining methods and applications: Graph Indexing, Similarity Search, Classification, and Clustering |
| | RL8.1.2 = Multimedia Data Mining: Classification and Prediction Analysis of Multimedia Data, Mining Associations in Multimedia Data, Audio and Video Data Mining |
| RL8.2 | RL8.2.1 = Text Mining - Text Data Analysis and Information Retrieval |
| | RL8.2.2 = Dimensionality Reduction for Text, Text Mining Approaches |
| CS8.1 | CS8.1.1 = Social Network Analysis |
| | CS8.1.2 = Mining the World Wide Web |
| LE8.1 | |
| SS8.1 | |
| HW8.1 | |
| QZ8.1 | |
M9: Data Mining Applications
| Type | Description/Plan/Reference |
| RL9.1 | RL9.1.1 = Recommendation systems |
| | RL9.1.2 = Case study for Recommendation systems |
| RL9.2 | RL9.2.1 = Fraud Detection |
| | RL9.2.2 = Case study for Fraud Detection |
| CS9.1 | CS9.1.1 = Sentiment Analysis |
| | CS9.1.2 = Case study for Sentiment Analysis |
| LE9.1 | |
| SS9.1 | |
| HW9.1 | |
| QZ9.1 | |
Part B: Contact Session Plan
| Academic Term | Second Semester 2017-2018 |
| Course Title | Data Mining |
| Course No | IS ZC415 |
| Content Developer | ARUNA MALAPATI |
| Contact hour | Pre-contact hour prep | During Contact hour | Post-contact hour |
| 1 | RL1.1, RL1.2 | CS1.1 | |
| 2 | RL1.3 | CS1.2 | LE1.1, HW1.1, SS1.1 |
| 3 | RL2.1 | CS2.1 | |
| 4 | RL2.2 | CS2.2 | LE2.1, SS2.1, HW2.1 |
| 5 | RL3.1 | CS3.1 | |
| 6 | RL3.2, RL3.3 | CS3.2 | LE3.1, SS3.1, HW3.1 |
| 7 | RL4.1, RL4.2, RL4.3 | CS4.1 | |
| 8 | RL4.4 | CS4.2 | LE4.1, SS4.1, HW4.1 |
| 9 | RL5.1, RL5.2 | CS5.1 | |
| 10 | | Review | |
| 11 | | Review | |
| 12 | RL5.3, RL5.4 | CS5.2 | LE5.1, SS5.1, HW5.1 |
| 13 | RL6.1 | CS6.1 | |
| 14 | RL6.2, RL6.3, RL6.4 | CS6.2 | LE6.1, SS6.1, HW6.1 |
| 15 | RL7.1 | CS7.1 | |
| 16 | RL7.2, RL7.3 | CS7.2 | LE7.1, SS7.1, HW7.1 |
| 17 | RL8.1, RL8.2, RL8.3 | CS8.1, CS8.2 | |
| 18 | RL9.1, RL9.2, RL9.3 | CS9.1 | SS8.1, SS9.1, HW8.1 |
| 19 | Python basics, scikit-learn | Class notes/case study | |
| 20 | Earlier case study/Python basics | Class notes/case study | |
| 21 | | Review | |
| 22 | | Review | |
Notes:
Evaluation Scheme:
Legend: EC = Evaluation Component; AN = Afternoon Session; FN = Forenoon Session
| No | Name | Type | Duration | Weight | Day, Date, Session, Time |
| EC-1 | Quiz-I/Assignment-I | Online | - | 5% | February 1 to 10, 2018 |
| | Quiz-II | Online | | 5% | March 1 to 10, 2018 |
| | Lab | Online | | 10% | March 20 to 30, 2018 |
| EC-2 | Mid-Semester Test | Closed Book | 2 hours | 30% | 03/03/2018 (AN) 2 PM TO 4 PM |
| EC-3 | Comprehensive Exam | Open Book | 3 hours | 50% | 21/04/2018 (AN) 2 PM TO 5 PM |
Syllabus for Mid-Semester Test (Closed Book): Topics in Session Nos. 1 to 11
Syllabus for Comprehensive Exam (Open Book): All topics (Session Nos. 1 to 22)
Important links and information:
Elearn portal: https://elearn.bits-pilani.ac.in
Students are expected to visit the Elearn portal on a regular basis and stay up to date with the latest announcements and deadlines.
Contact sessions: Students should attend the online lectures as per the schedule provided on the Elearn portal.
Evaluation Guidelines:
1. EC-1 consists of either two Assignments or three Quizzes. Students will attempt them through the course pages on the Elearn portal. Announcements will be made on the portal in a timely manner.
2. For Closed Book tests: No books or reference material of any kind will be permitted.
3. For Open Book exams: Use of books and any printed/written reference material (filed or bound) is permitted. However, loose sheets of paper will not be allowed. Use of calculators is permitted in all exams. Laptops/Mobiles of any kind are not allowed. Exchange of any material is not allowed.
4. If a student is unable to appear for the Regular Test/Exam due to genuine exigencies, the student should follow the procedure to apply for the Make-Up Test/Exam, which will be made available on the Elearn portal. The Make-Up Test/Exam will be conducted only at selected exam centres on dates to be announced later.
It shall be the responsibility of the individual student to be regular in maintaining the self-study schedule as given in the course handout, to attend the online lectures, and to take all the prescribed evaluation components such as Assignment/Quiz, Mid-Semester Test and Comprehensive Exam according to the evaluation scheme provided in the handout.