Data Mining Handwritten Notes
What is Data Mining?
Data mining refers to extracting, or mining, knowledge from large amounts of data. The term is actually a misnomer: what is extracted is knowledge, not the data itself. A more appropriate name would therefore have been knowledge mining from data, which keeps the emphasis on mining from large amounts of data.
What does data mining mean?
It is the computational process of discovering patterns in large data sets involving methods at the intersection of artificial intelligence, machine learning, statistics, and database systems. The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use.
What are the key properties of Data Mining?
1. Automatic discovery of patterns
2. Prediction of likely outcomes
3. Creation of actionable information
4. Focus on large datasets and databases
What are the tasks of Data Mining?
Data mining involves six common classes of tasks:
1. Anomaly detection (Outlier/change/deviation detection)
2. Association rule learning (Dependency modelling)
3. Clustering
4. Classification
5. Regression
6. Summarization
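As a rough illustration of the first task in the list, anomaly detection, the sketch below flags values that lie far from the mean of a small sample. The z-score rule, the function name find_outliers, the threshold of 2.0, and the sample readings are all assumptions chosen for illustration; they are not taken from these notes, and real anomaly detectors are usually more robust than this.

```python
from statistics import mean, pstdev


def find_outliers(values, threshold=2.0):
    """Return the values whose absolute z-score exceeds `threshold`.

    Hypothetical helper: a plain z-score rule, one of many possible
    anomaly-detection approaches.
    """
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        # All values are identical, so nothing deviates from the mean.
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Made-up sample: seven similar readings and one that is far away.
readings = [10, 11, 9, 10, 12, 10, 11, 50]
print(find_outliers(readings))
```

On very small samples a single extreme value inflates the standard deviation, so the threshold may need to be lowered or replaced with a median-based rule.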
Topics in our Data Mining Handwritten Lecture Notes PDF
In these “Data Mining Handwritten Lecture Notes PDF”, we introduce data mining techniques and enable you to apply them to real-life datasets. These notes focus on three main data mining techniques: classification, clustering, and association rule mining.
The topics we will cover will be taken from the following list:
Introduction to Data Mining – Applications of data mining, data mining tasks, motivation and challenges, types of data attributes and measurements, data quality.
Data Pre-processing – aggregation, sampling, dimensionality reduction, Feature Subset Selection, Feature Creation, Discretization and Binarization, Variable Transformation.
Classification: Basic Concepts, Decision Tree Classifier: Decision tree algorithm, attribute selection measures, Nearest Neighbour Classifier, Bayes Theorem and Naive Bayes Classifier.
Model Evaluation: Holdout Method, Random Subsampling, Cross-Validation, evaluation metrics, confusion matrix.
Association rule mining: Transaction dataset, Frequent Itemset, Support measure, Apriori Principle, Apriori Algorithm, Computational Complexity, Rule Generation, Confidence of association rule.
Cluster Analysis: Basic Concepts, Different Types of Clustering Methods, Different Types of Clusters.
K-means: The Basic K-means Algorithm, Strengths and Weaknesses of K-means algorithm.
Agglomerative Hierarchical Clustering: Basic Algorithm, Proximity between clusters.
DBSCAN: The DBSCAN Algorithm, Strengths and Weaknesses.
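To make two of the measures named under association rule mining concrete, here is a minimal sketch of support and confidence on a toy transaction dataset. It uses the standard definitions: support is the fraction of transactions containing an itemset, and confidence of a rule X -> Y is the number of transactions containing both X and Y divided by the number containing X. The function names and the five transactions are hypothetical, chosen only for illustration.

```python
def count(transactions, itemset):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t))


def support(transactions, itemset):
    """Fraction of transactions that contain `itemset`."""
    return count(transactions, itemset) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    ante = count(transactions, antecedent)
    both = count(transactions, set(antecedent) | set(consequent))
    # A rule whose antecedent never occurs has no evidence; report 0.0.
    return both / ante if ante else 0.0


# Made-up market-basket data, one set of items per transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

print(support(transactions, {"diapers", "beer"}))
print(confidence(transactions, {"diapers"}, {"beer"}))
```

Here {diapers, beer} appears in 3 of 5 transactions and diapers appears in 4, so the rule diapers -> beer has support 3/5 and confidence 3/4. The Apriori algorithm builds on the same counts, pruning candidate itemsets whose subsets are already infrequent.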