BIRLA INSTITUTE OF TECHNOLOGY & SCIENCE, PILANI
WORK INTEGRATED LEARNING PROGRAMMES
Digital Learning
Part A: Course Design
| Course Title | Data Mining |
| Course No(s) | IS ZC415 |
| Credit Units | 3 |
| Credit Model |  |
| Content Authors | ARUNA MALAPATI |
Course Objectives
| No | Course Objective |
| CO1 | To introduce basic concepts of data mining. |
| CO2 | To familiarize students with practical technologies in data mining. |
| CO3 | To provide students with interesting problems in the field of data mining to solve. |
Text Book(s)
| T1 | Tan P. N., Steinbach M. & Kumar V., “Introduction to Data Mining”, Pearson Education, 2006 |
| T2 | Data Mining: Concepts and Techniques, Third Edition, by Jiawei Han, Micheline Kamber and |
Reference Book(s) & other resources
| R1 | Predictive Analytics and Data Mining: Concepts and Practice with RapidMiner, by Vijay Kotu and Bala Deshpande, Morgan Kaufmann Publishers, © 2015 |
| R2 | http://www.scikit-learn.org |
Content Structure
Modules
| No. | Title of the Module |
| M1 | Introduction to Data Mining |
| M2 | Data Preprocessing: To understand the need for data preprocessing and various techniques used in the context of Data Mining |
| M3 | Data Exploration: A preliminary exploration of the data to better understand its characteristics |
| M4 | Classification and Prediction: To learn different techniques and algorithms for classification, a major predictive and supervised Data Mining task |
| M5 | Association Analysis: To understand the descriptive relation between the entities by identifying associations among them and to learn various algorithms to find them |
| M6 | Clustering: To learn different techniques and algorithms for clustering, a major descriptive and unsupervised Data Mining task |
| M7 | Anomaly Detection: Detecting outliers and noise in data sets is an important Data Mining task. This module focuses on techniques needed for anomaly detection |
| M8 | Data Mining on unstructured (Big) data: Graph Mining, Social Network Analysis, Multimedia Data Mining, Text Mining, Mining the World Wide Web |
| M9 | Data Mining Applications: Recommendation Systems, Fraud Detection, Sentiment Analysis |
Glossary of Terms:
1. Contact Hour (CH) stands for an hour-long live session with students, conducted either in a physical classroom or enabled through technology. In this model of instruction, instructor-led sessions will be for 20 CH.
   a. Pre CH = Self-learning done prior to a given contact hour
   b. During CH = Content to be discussed during the contact hour by the course instructor
   c. Post CH = Self-learning done after the contact hour
2. RL stands for Recorded Lecture or Recorded Lesson. It is presented to the student through an online portal. A given RL unfolds as a sequence of video segments interleaved with exercises.
3. SS stands for Self-Study, to be done as a study of relevant sections from textbooks and reference books. It could also include study of external resources.
4. LE stands for Lab Exercises.
5. HW stands for Home Work, which will consist of discussed/new problems; it could be a selection of problems from the text.
M1: Introduction to Data Mining
| Type | Description/Plan/Reference |
| RL1.1 | RL1.1.1 = Definition of Data Mining |
|  | RL1.1.2 = What type of data can be mined? |
| RL1.2 | RL1.2.1 = What kind of patterns can be mined? |
|  | RL1.2.2 = What kind of applications are targeted? |
| RL1.3 | DM Process (R1) & DM Challenges (T2) |
|  | RL1.3.1 = Process/Technologies used in DM |
|  | RL1.3.2 = Challenges in DM |
| CS1.1 | CS1.1.1 = Review of Data Mining basics; examples of patterns that can be mined |
|  | CS1.1.2 = Examples of technologies used in DM; approaches to overcome challenges; discussion of one example case study for data mining |
| LE1.1 | Exploration of Weka: operations, features, ARFF files |
| SS1.1 | T1, Chapter 1; T2, Chapter 1 |
| HW1.1 | Exercises at the end of T2, Chapter 1 |
| QZ1.1 |  |
M2: Data Preprocessing
| Type | Description/Plan/Reference |
| RL2.1 | RL2.1.1 = Why does data need preprocessing? |
|  | RL2.1.2 = Major tasks in data preprocessing |
| RL2.2 | RL2.2.1 = Data cleaning techniques |
|  | RL2.2.2 = Data discretization, transformation, integration, reduction |
| CS2.1 | CS2.1.1 = Review of concepts of data preprocessing |
|  | CS2.1.2 = Examples of application of preprocessing techniques |
| LE2.1 | Experiments with Weka: filters, discretization |
| SS2.1 |  |
| HW2.1 |  |
| QZ2.1 |  |
M3: Data Exploration
| Type | Description/Plan/Reference |
| RL3.1 | RL3.1.1 = Various types of data to be mined |
|  | RL3.1.2 = Statistical descriptions of data |
| RL3.2 | RL3.2.1 = Measuring data similarity & dissimilarity |
|  | RL3.2.2 = Data Visualization |
| CS3.1 | CS3.1.1 = Review of concepts of data exploration |
|  | CS3.1.2 = Examples of similarities & dissimilarities |
| LE3.1 |  |
| SS3.1 |  |
| HW3.1 |  |
| QZ3.1 |  |
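As an illustration of the similarity and dissimilarity measures listed under RL3.2.1, the following is a minimal Python sketch and not part of the official course material. The function names are hypothetical, and the choice of Euclidean distance, cosine similarity, and Jaccard similarity is an assumption about which measures the lectures cover.

```python
import math


def euclidean_distance(a, b):
    """Dissimilarity between two numeric vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def jaccard_similarity(s, t):
    """Shared items divided by all distinct items (inputs must not both be empty)."""
    s, t = set(s), set(t)
    return len(s & t) / len(s | t)


if __name__ == "__main__":
    print(euclidean_distance([0, 0], [3, 4]))
    print(cosine_similarity([1, 0], [0, 1]))
    print(jaccard_similarity({"a", "b", "c"}, {"b", "c", "d"}))
```

Euclidean distance suits dense numeric attributes, while cosine and Jaccard are commonly used for sparse data such as document term vectors and item sets.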
M4: Classification and Prediction
| Type | Description/Plan/Reference |
| RL4.1 | RL4.1.1 = Introduction to classification and prediction |
|  | RL4.1.2 = Decision trees for classification |
|  | RL4.1.3 = Rule-based classification, Bayesian classification, Support vector machines |
| RL4.2 | RL4.2.1 = Issues regarding classification and prediction |
|  | RL4.2.2 = Linear Regression, Nonlinear Regression |
| CS4.1 | CS4.1.1 = Review of concepts of recorded lectures, Algorithm for decision tree induction, Classification by backpropagation, Comparison of methods of classification |
|  | CS4.1.2 = Prediction: Other Regression-Based Methods |
| LE4.1 | Experiments with Weka: decision trees, rules, prediction |
| SS4.1 |  |
| HW4.1 |  |
| QZ4.1 |  |
M5: Association Analysis
| Type | Description/Plan/Reference |
| RL5.1 | RL5.1.1 = What is association rule mining? |
|  | RL5.1.2 = Frequent Itemsets, Closed Itemsets, and Association Rules |
| RL5.2 | RL5.2.1 = What is the Apriori Algorithm? |
|  | RL5.2.2 = Finding Frequent Itemsets Using Candidate Generation, Generating Association Rules from Frequent Itemsets |
| CS5.1 | CS5.1.1 = Review of concepts of recorded lectures, Improving the Efficiency of Apriori |
|  | CS5.1.2 = Mining Frequent Itemsets without Candidate Generation |
| LE5.1 | Experiments with Weka: mining association rules |
| SS5.1 |  |
| HW5.1 |  |
| QZ5.1 |  |
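To make the idea of finding frequent itemsets by candidate generation (RL5.2.2) concrete, here is a minimal Python sketch in the spirit of Apriori. It is an illustration under stated assumptions, not the course's reference implementation: the function names `frequent_itemsets` and `confidence` are hypothetical, and support is expressed as an absolute transaction count.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support count} for itemsets seen in >= min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1 candidates: every single item that occurs in the data.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-itemsets.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def confidence(frequent, antecedent, consequent):
    """Confidence of antecedent -> consequent; both itemsets must be frequent."""
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]


if __name__ == "__main__":
    data = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "C"}]
    found = frequent_itemsets(data, min_support=2)
    for itemset, count in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), count)
    print(confidence(found, {"A"}, {"B"}))
```

The prune step relies on the Apriori property: every subset of a frequent itemset must itself be frequent, so any candidate with an infrequent subset can be discarded before counting.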
M6: Clustering
| Type | Description/Plan/Reference |
| RL6.1 | RL6.1.1 = What is cluster analysis? Types of data in cluster analysis |
|  | RL6.1.2 = Partitioning methods: k-means |
| RL6.2 | RL6.2.1 = Hierarchical algorithms |
|  | RL6.2.2 = Introduction to density-based approach |
| CS6.1 | CS6.1.1 = Review of concepts of recorded lectures |
|  | CS6.1.2 = Density-based algorithm: DBSCAN |
| LE6.1 | Experiments with Weka: k-means |
| SS6.1 |  |
| HW6.1 |  |
| QZ6.1 |  |
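The k-means partitioning method named in RL6.1.2 alternates between assigning points to the nearest centroid and recomputing centroids. The sketch below is a minimal pure-Python illustration, not the Weka or scikit-learn implementation; the function name `kmeans` and the use of caller-supplied initial centroids (instead of random initialization) are assumptions made to keep the example deterministic.

```python
def kmeans(points, centroids, max_iter=100):
    """Cluster points (tuples of floats) starting from the given initial centroids."""
    centroids = [tuple(c) for c in centroids]
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid
        # (squared Euclidean distance is enough for comparison).
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster;
        # an empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
        if new_centroids == centroids:
            break  # assignments are stable, so the algorithm has converged
        centroids = new_centroids
    return centroids, clusters


if __name__ == "__main__":
    pts = [(1.0, 1.0), (1.0, 2.0), (9.0, 9.0), (9.0, 10.0)]
    centers, groups = kmeans(pts, [(0.0, 0.0), (10.0, 10.0)])
    print(centers)
    print(groups)
```

Results depend on the initial centroids, which is why practical implementations usually run several random initializations and keep the best one.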
M7: Anomaly Detection
| Type | Description/Plan/Reference |
| RL7.1 | RL7.1.1 = Preliminaries |
|  | RL7.1.2 = Statistical approach |
| RL7.2 | RL7.2.1 = Proximity-based outlier detection |
|  | RL7.2.2 = Density-based outlier detection |
| CS7.1 | CS7.1.1 = Review of concepts of recorded lectures |
|  | CS7.1.2 = Clustering-based techniques |
| LE7.1 |  |
| SS7.1 |  |
| HW7.1 |  |
| QZ7.1 |  |
M8: Data Mining on unstructured (Big) data
| Type | Description/Plan/Reference |
| RL8.1 | RL8.1.1 = Graph Mining methods and applications: Graph Indexing, Similarity Search, Classification, and Clustering |
|  | RL8.1.2 = Multimedia Data Mining: Classification and Prediction Analysis of Multimedia Data, Mining Associations in Multimedia Data, Audio and Video Data Mining |
| RL8.2 | RL8.2.1 = Text Mining: Text Data Analysis and Information Retrieval |
|  | RL8.2.2 = Dimensionality Reduction for Text, Text Mining Approaches |
| CS8.1 | CS8.1.1 = Social Network Analysis |
|  | CS8.1.2 = Mining the World Wide Web |
| LE8.1 |  |
| SS8.1 |  |
| HW8.1 |  |
| QZ8.1 |  |
M9: Data Mining Applications
| Type | Description/Plan/Reference |
| RL9.1 | RL9.1.1 = Recommendation systems |
|  | RL9.1.2 = Case study for Recommendation systems |
| RL9.2 | RL9.2.1 = Fraud Detection |
|  | RL9.2.2 = Case study for Fraud Detection |
| CS9.1 | CS9.1.1 = Sentiment Analysis |
|  | CS9.1.2 = Case study for Sentiment Analysis |
| LE9.1 |  |
| SS9.1 |  |
| HW9.1 |  |
| QZ9.1 |  |
Part B: Contact Session Plan
| Academic Term | Second Semester 2017-2018 |
| Course Title | Data Mining |
| Course No | IS ZC415 |
| Content Developer | ARUNA MALAPATI |
| Contact hour | Pre-contact hour prep | During Contact hour | Post-contact hour |
| 1 | RL1.1, RL1.2 | CS1.1 |  |
| 2 | RL1.3 | CS1.2 | LE1.1, HW1.1, SS1.1 |
| 3 | RL2.1 | CS2.1 |  |
| 4 | RL2.2 | CS2.2 | LE2.1, SS2.1, HW2.1 |
| 5 | RL3.1 | CS3.1 |  |
| 6 | RL3.2, RL3.3 | CS3.2 | LE3.1, SS3.1, HW3.1 |
| 7 | RL4.1, RL4.2, RL4.3 | CS4.1 |  |
| 8 | RL4.4 | CS4.2 | LE4.1, SS4.1, HW4.1 |
| 9 | RL5.1, RL5.2 | CS5.1 |  |
| 10 |  | Review |  |
| 11 |  | Review |  |
| 12 | RL5.3, RL5.4 | CS5.2 | LE5.1, SS5.1, HW5.1 |
| 13 | RL6.1 | CS6.1 |  |
| 14 | RL6.2, RL6.3, RL6.4 | CS6.2 | LE6.1, SS6.1, HW6.1 |
| 15 | RL7.1 | CS7.1 |  |
| 16 | RL7.2, RL7.3 | CS7.2 | LE7.1, SS7.1, HW7.1 |
| 17 | RL8.1, RL8.2, RL8.3 | CS8.1, CS8.2 |  |
| 18 | RL9.1, RL9.2, RL9.3 | CS9.1 | SS8.1, SS9.1, HW8.1 |
| 19 | Python basics, scikit-learn | Class notes/case study |  |
| 20 | Earlier case study/Python basics | Class notes/case study |  |
| 21 |  | Review |  |
| 22 |  | Review |  |
Notes:
Evaluation Scheme:
Legend: EC = Evaluation Component; AN = Afternoon Session; FN = Forenoon Session
| No | Name | Type | Duration | Weight | Day, Date, Session, Time |
| EC-1 | Quiz-I/Assignment-I | Online | - | 5% | February 1 to 10, 2018 |
|  | Quiz-II | Online |  | 5% | March 1 to 10, 2018 |
|  | Lab | Online |  | 10% | March 20 to 30, 2018 |
| EC-2 | Mid-Semester Test | Closed Book | 2 hours | 30% | 03/03/2018 (AN) 2 PM to 4 PM |
| EC-3 | Comprehensive Exam | Open Book | 3 hours | 50% | 21/04/2018 (AN) 2 PM to 5 PM |
Syllabus for Mid-Semester Test (Closed Book): Topics in Session Nos. 1 to 11
Syllabus for Comprehensive Exam (Open Book): All topics (Session Nos. 1 to 22)
Important links and information:
Elearn portal: https://elearn.bits-pilani.ac.in
Students are expected to visit the Elearn portal on a regular basis and stay up to date with the latest announcements and deadlines.
Contact sessions: Students should attend the online lectures as per the schedule provided on the Elearn portal.
Evaluation Guidelines:
1. EC-1 consists of either two Assignments or three Quizzes. Students will attempt them through the course pages on the Elearn portal. Announcements will be made on the portal in a timely manner.
2. For Closed Book tests: No books or reference material of any kind will be permitted.
3. For Open Book exams: Use of books and any printed/written reference material (filed or bound) is permitted. However, loose sheets of paper will not be allowed. Use of calculators is permitted in all exams. Laptops/Mobiles of any kind are not allowed. Exchange of any material is not allowed.
4. If a student is unable to appear for the Regular Test/Exam due to genuine exigencies, the student should follow the procedure to apply for the Make-Up Test/Exam, which will be made available on the Elearn portal. The Make-Up Test/Exam will be conducted only at selected exam centres on dates to be announced later.
It shall be the responsibility of the individual student to be regular in maintaining the self-study schedule as given in the course handout, attend the online lectures, and take all the prescribed evaluation components such as Assignment/Quiz, Mid-Semester Test and Comprehensive Exam according to the evaluation scheme provided in the handout.
 