Data Mining - Course Handout


BIRLA INSTITUTE OF TECHNOLOGY & SCIENCE, PILANI
WORK INTEGRATED LEARNING PROGRAMMES
Digital Learning
Part A: Course Design

Course Title
Data Mining
Course No(s)
IS ZC415
Credit Units
3
Credit Model

Content Authors
ARUNA MALAPATI

Course Objectives
No

CO1
To introduce basic concepts of data mining.
CO2
To familiarize students with practical technologies in data mining.
CO3
To provide students with interesting problems in the field of data mining to solve.

Text Book(s)
T1
Tan P. N., Steinbach M. & Kumar V., “Introduction to Data Mining”, Pearson Education, 2006
T2
Han J., Kamber M. & Pei J., “Data Mining: Concepts and Techniques”, Third Edition, Morgan Kaufmann Publishers

Reference Book(s) & other resources
R1
Kotu V. & Deshpande B., “Predictive Analytics and Data Mining: Concepts and Practice with RapidMiner”, Morgan Kaufmann Publishers, 2015
R2
http://www.scikit-learn.org





Content Structure

Modules

No.
Title of the Module
M1
Introduction to Data Mining
M2
Data Preprocessing:
To understand the need for data preprocessing and various techniques used in the context of Data Mining
M3
Data Exploration:
A preliminary exploration of the data to better understand its characteristics
M4
Classification and prediction:
To learn different techniques and algorithms for classification, a major predictive and supervised Data Mining task
M5
Association Analysis:
To understand the descriptive relation between the entities by identifying associations among them and to learn various algorithms to find them
M6
Clustering:
To learn different techniques and algorithms for clustering, a major descriptive and unsupervised Data Mining task
M7
Anomaly Detection:
Detecting outliers and noise in data sets is an important Data Mining task. This module focuses on techniques needed for anomaly detection
M8
Data Mining on unstructured (Big) data:
Graph Mining, Social Network Analysis, Multimedia Data Mining, Text Mining, Mining the World Wide Web
M9
Data Mining Applications:
Recommendation Systems
Fraud Detection
Sentiment Analysis


Glossary of Terms:
1. Contact Hour (CH) stands for an hour-long live session with students, conducted either in a physical classroom or enabled through technology. In this model of instruction, instructor-led sessions will account for 20 CH.
a. Pre CH = Self Learning done prior to a given contact hour
b. During CH = Content to be discussed during the contact hour by the course instructor
c. Post CH = Self Learning done post the contact hour
2. RL stands for Recorded Lecture or Recorded Lesson. It is presented to the student through an online portal. A given RL unfolds as a sequence of video segments interleaved with exercises.
3. SS stands for Self-Study, to be done as a study of relevant sections from textbooks and reference books. It could also include study of external resources.
4. LE stands for Lab Exercises.
5. HW stands for Home Work, which will consist of discussed/new problems; it could be a selection of problems from the text.

M1: Introduction to Data Mining
Type
Description/Plan/Reference
RL1.1
RL1.1.1 = Definition of Data Mining
RL1.1.2 = What type of data can be mined?
RL1.2
RL1.2.1 = What kind of patterns can be mined?
RL1.2.2 = What kind of applications are targeted?
RL1.3
DM Process (R1) & DM Challenges (T2)
RL1.3.1 = Process/Technologies used in DM.
RL1.3.2 = Challenges in DM.
CS1.1
CS1.1.1 = Review of Data Mining basics; examples of patterns that can be mined
CS1.1.2 = Examples of technologies used in DM; approaches to overcome challenges; discussion of one example case study for data mining
LE1.1
Exploration of Weka: operations, features, ARFF files.
SS1.1
T1, Chapter 1; T2, Ch 1
HW1.1
Exercises at the end of T2, Ch 1
QZ1.1


M2: Data Preprocessing
Type
Description/Plan/Reference
RL2.1
RL2.1.1 = Why does data need preprocessing?
RL2.1.2 = Major tasks in data preprocessing
RL2.2
RL2.2.1 = Data Cleaning techniques
RL2.2.2 = Data discretization, transformation, integration, reduction
CS2.1
CS2.1.1 = Review of concepts of data preprocessing
CS2.1.2 = Examples of application of preprocessing techniques.
LE2.1
Experiments with Weka - filters, discretization
SS2.1

HW2.1

QZ2.1




M3: Data Exploration
Type
Description/Plan/Reference
RL3.1
RL3.1.1 = Various types of data to be mined
RL3.1.2 = Statistical descriptions of data
RL3.2
RL3.2.1 = Measuring data similarity & dissimilarity
RL3.2.2 = Data Visualization
CS3.1
CS3.1.1 = Review of concepts of data exploration
CS3.1.2 = Examples of similarities & dissimilarities.
LE3.1

SS3.1

HW3.1

QZ3.1


M4: Classification and Prediction
Type
Description/Plan/Reference
RL4.1
RL4.1.1 = Introduction to classification and prediction
RL4.1.2 = Decision trees for classification
RL4.1.3 = Rule based classification, Bayesian classification, Support vector machines
RL4.2
RL4.2.1 = Issues regarding classification and prediction
RL4.2.2 = Linear Regression, Nonlinear Regression
CS4.1
CS4.1.1 = Review of concepts of recorded lectures, algorithm for decision tree induction, classification by backpropagation, comparison of classification methods
CS4.1.2 = Prediction: Other Regression-Based Methods
LE4.1
Experiments with Weka - decision trees, rules, prediction
SS4.1

HW4.1

QZ4.1


M5: Association Analysis
Type
Description/Plan/Reference
RL5.1
RL5.1.1 = What is association rule mining?
RL5.1.2 = Frequent Itemsets, Closed Itemsets, and Association Rules
RL5.2
RL5.2.1 = What is the Apriori algorithm?
RL5.2.2 = Finding Frequent Itemsets Using Candidate Generation, Generating Association Rules from Frequent Itemsets
CS5.1
CS5.1.1 = Review of concepts of recorded lectures, Improving the Efficiency of Apriori
CS5.1.2 = Mining Frequent Itemsets without Candidate Generation
LE5.1
Experiments with Weka - mining association rules
SS5.1

HW5.1

QZ5.1


M6: Clustering
Type
Description/Plan/Reference
RL6.1
RL6.1.1 = What is cluster analysis? Types of data in Cluster analysis.
RL6.1.2 = Partitioning methods: k-means
RL6.2
RL6.2.1 = Hierarchical algorithms
RL6.2.2 = Introduction to density based approach
CS6.1
CS6.1.1 = Review of concepts of recorded lectures
CS6.1.2 = Density based algorithm: DBSCAN
LE6.1
Experiments with Weka - k-means
SS6.1

HW6.1

QZ6.1


M7: Anomaly Detection
Type
Description/Plan/Reference
RL7.1
RL7.1.1 = Preliminaries
RL7.1.2 = Statistical approach
RL7.2
RL7.2.1 = Proximity based outlier detection
RL7.2.2 = Density based outlier detection
CS7.1
CS7.1.1 = Review of concepts of recorded lectures
CS7.1.2 = Clustering based techniques
LE7.1

SS7.1

HW7.1

QZ7.1


M8: Data mining on unstructured (Big) data
Type
Description/Plan/Reference
RL8.1
RL8.1.1 = Graph Mining methods and applications: Graph Indexing, Similarity Search, Classification, and Clustering
RL8.1.2 = Multimedia Data Mining: Classification and Prediction Analysis of Multimedia Data, Mining Associations in Multimedia Data, Audio and Video Data Mining
RL8.2
RL8.2.1 = Text Mining - Text Data Analysis and Information Retrieval
RL8.2.2 = Dimensionality Reduction for Text, Text Mining Approaches
CS8.1
CS8.1.1 = Social Network Analysis
CS8.1.2 = Mining the World Wide Web
LE8.1

SS8.1

HW8.1

QZ8.1


M9: Data Mining Applications
Type
Description/Plan/Reference
RL9.1
RL9.1.1 = Recommendation systems
RL9.1.2 = Case study for Recommendation systems
RL9.2
RL9.2.1 = Fraud Detection
RL9.2.2 = Case study for Fraud Detection
CS9.1
CS9.1.1 = Sentiment Analysis
CS9.1.2 = Case study for Sentiment Analysis
LE9.1

SS9.1

HW9.1

QZ9.1










Part B: Contact Session Plan
Academic Term
Second Semester 2017-2018
Course Title
Data Mining
Course No
IS ZC415
Content Developer
ARUNA MALAPATI


Contact hour | Pre-contact hour prep | During contact hour | Post-contact hour
1 | RL1.1, RL1.2 | CS1.1 | -
2 | RL1.3 | CS1.2 | LE1.1, HW1.1, SS1.1
3 | RL2.1 | CS2.1 | -
4 | RL2.2 | CS2.2 | LE2.1, SS2.1, HW2.1
5 | RL3.1 | CS3.1 | -
6 | RL3.2, RL3.3 | CS3.2 | LE3.1, SS3.1, HW3.1
7 | RL4.1, RL4.2, RL4.3 | CS4.1 | -
8 | RL4.4 | CS4.2 | LE4.1, SS4.1, HW4.1
9 | RL5.1, RL5.2 | CS5.1 | -
10 | - | Review | -
11 | - | Review | -
12 | RL5.3, RL5.4 | CS5.2 | LE5.1, SS5.1, HW5.1
13 | RL6.1 | CS6.1 | -
14 | RL6.2, RL6.3, RL6.4 | CS6.2 | LE6.1, SS6.1, HW6.1
15 | RL7.1 | CS7.1 | -
16 | RL7.2, RL7.3 | CS7.2 | LE7.1, SS7.1, HW7.1
17 | RL8.1, RL8.2, RL8.3 | CS8.1, CS8.2 | -
18 | RL9.1, RL9.2, RL9.3 | CS9.1 | SS8.1, SS9.1, HW8.1
19 | Python basics, scikit-learn | Class notes/case study | -
20 | Earlier case study/Python basics | Class notes/case study | -
21 | - | Review | -
22 | - | Review | -
                                   
Notes:


Evaluation Scheme:  
Legend: EC = Evaluation Component; AN = After Noon Session; FN = Fore Noon Session
No | Name | Type | Duration | Weight | Day, Date, Session, Time
EC-1 | Quiz-I/Assignment-I | Online | - | 5% | February 1 to 10, 2018
EC-1 | Quiz-II | Online | - | 5% | March 1 to 10, 2018
EC-1 | Lab | Online | - | 10% | March 20 to 30, 2018
EC-2 | Mid-Semester Test | Closed Book | 2 hours | 30% | 03/03/2018 (AN), 2 PM to 4 PM
EC-3 | Comprehensive Exam | Open Book | 3 hours | 50% | 21/04/2018 (AN), 2 PM to 5 PM

Syllabus for Mid-Semester Test (Closed Book): Topics in Session Nos.  1 to 11
Syllabus for Comprehensive Exam (Open Book): All topics (Session Nos. 1 to 22)
Important links and information:
Elearn portal: https://elearn.bits-pilani.ac.in
Students are expected to visit the Elearn portal on a regular basis and stay up to date with the latest announcements and deadlines.
Contact sessions: Students should attend the online lectures as per the schedule provided on the Elearn portal.
Evaluation Guidelines:
1. EC-1 consists of either two Assignments or three Quizzes. Students will attempt them through the course pages on the Elearn portal. Announcements will be made on the portal in a timely manner.
2. For Closed Book tests: No books or reference material of any kind will be permitted.
3. For Open Book exams: Use of books and any printed/written reference material (filed or bound) is permitted. However, loose sheets of paper will not be allowed. Use of calculators is permitted in all exams. Laptops/Mobiles of any kind are not allowed. Exchange of any material is not allowed.
4. If a student is unable to appear for the Regular Test/Exam due to genuine exigencies, the student should follow the procedure to apply for the Make-Up Test/Exam, which will be made available on the Elearn portal. The Make-Up Test/Exam will be conducted only at selected exam centres on dates to be announced later.
It shall be the responsibility of the individual student to be regular in maintaining the self-study schedule as given in the course handout, attend the online lectures, and take all the prescribed evaluation components such as Assignment/Quiz, Mid-Semester Test and Comprehensive Exam according to the evaluation scheme provided in the handout.

