- LECTURE 8
- What is Data Mining?
- Typical Kinds of Patterns
- Example: Clusters
- Example: Frequent Itemsets
- Applications (Among Many)
- Cultures
- Models vs. Analytic Processing
- (Way too Simple) Example
- Meaningfulness of Answers
- Examples
- Rhine Paradox --- (1)
- Rhine Paradox --- (2)
- Rhine Paradox --- (3)
- What is Web Mining?
- How does it differ from “classical” Data Mining?
- The World-Wide Web
- Size of the Web
- Netcraft survey
- The web as a graph
- Power-law degree distribution
- Power-laws galore
- Searching the Web
- Ads vs. search results
- Ads vs. search results
- Sidebar: What’s in a name?
- The Long Tail
- Web Mining topics
- Web search basics
- Search engine components
- Knowledge Discovery in Databases
- Typical Tasks in Data Mining
- Typical Tasks in Data Mining
- Typical Tasks in Data Mining
- Typical Tasks in Data Mining
- Typical Tasks in Data Mining
- Typical Tasks in Data Mining
- Typical Tasks in Data Mining
- What is Data Mining?
- Data Mining Algorithms
- Data Mining Algorithms
- Data Mining Models
- Data Mining Models
- Data Mining Models
- Data Mining Models
- Data Mining Models
- Searching the Model Space
- Searching the Model Space
- THANK YOU
Knowledge Discovery in Databases
[Figure: the KDD process. Data → Cleaning and Integration → Data Warehouse → Selection and Transformation → Prepared data → Data Mining → Patterns → Evaluation and Visualization → Knowledge. A Knowledge Base also appears in the diagram.]
Typical Tasks in Data Mining
Classification
Prediction
Clustering
Association Analysis
Summarization
…
Typical Tasks in Data Mining
Classification
From data with known labels, create a classifier that determines which label to apply to a new observation.
E.g. Label loan applications as low, medium, or high risk.
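A minimal sketch of the loan example, assuming a k-nearest-neighbor classifier (one of many possible classifiers; the slides do not name one). The training rows, the two features, and the function name `classify` are all hypothetical, chosen only for illustration.

```python
from collections import Counter
import math

# Hypothetical labeled data: (income in $1000s, debt-to-income ratio) -> risk label.
TRAINING = [
    ((90, 0.10), "low"),
    ((85, 0.15), "low"),
    ((60, 0.30), "medium"),
    ((55, 0.35), "medium"),
    ((30, 0.60), "high"),
    ((25, 0.70), "high"),
]

def classify(applicant, training=TRAINING, k=3):
    """Label a new observation by majority vote among its k nearest labeled rows."""
    def distance(a, b):
        # Scale the ratio by 100 so both features span a comparable range.
        return math.hypot(a[0] - b[0], 100 * (a[1] - b[1]))

    nearest = sorted(training, key=lambda row: distance(row[0], applicant))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

For instance, `classify((88, 0.12))` looks up the three closest known applications and returns whichever label holds the majority among them.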
Typical Tasks in Data Mining
Prediction
Given a collection of data with known numeric outputs, create a function that outputs a predicted value from a new set of inputs.
E.g. Given historical consumption of milk in the U.S., predict what the consumption will be over the next five years.
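One way to build such a function is an ordinary least-squares line, sketched below. This is an assumed choice of model, and the consumption figures are made-up placeholder values, not real U.S. milk data.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up illustrative values: year -> consumption index.
years = [2015, 2016, 2017, 2018, 2019]
consumption = [100.0, 98.0, 96.0, 94.0, 92.0]

slope, intercept = fit_line(years, consumption)

def predict(year):
    """Predicted numeric output for a new input, using the fitted line."""
    return slope * year + intercept
```

Calling `predict(2020)` through `predict(2024)` would give the five-year forecast under the (strong) assumption that the historical trend stays linear.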
Typical Tasks in Data Mining
Clustering
Identify “natural” groupings in data
Unsupervised learning, no predefined groups
E.g. A city planner grouping houses by value, location, and house type.
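A rough sketch of the house-grouping example, assuming k-means as the clustering method (the slides do not specify one). The house data is hypothetical, and seeding the centroids with the first k points is a simplification; practical implementations usually pick starting points more carefully.

```python
import math

def kmeans(points, k, iterations=10):
    """Group points into k clusters; no labels are supplied in advance."""
    centroids = list(points[:k])  # simplistic seeding: first k points
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest centroid.
            idx = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[idx].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical houses: (value in $1000s, distance from city center in km).
houses = [(100, 1), (300, 9), (110, 2), (310, 10), (105, 1.5), (290, 8)]
centroids, clusters = kmeans(houses, k=2)
```

Note that the features here are on very different scales, so value dominates the distance; rescaling the features first is a common design choice.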
Typical Tasks in Data Mining
Association Analysis
Identify relationships in data from co-occurring terms or items.
E.g. Analyze grocery store purchases to identify items most commonly purchased together. This is often used to create coupons and sales: buy chips and get $0.50 off salsa.
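The grocery example can be sketched by counting how often pairs of items share a basket. The baskets below are hypothetical, and the helper names `frequent_pairs` and `confidence` are illustrative; the support and confidence measures used here are the standard ones from association-rule mining.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase data: one set of items per checkout.
baskets = [
    {"chips", "salsa", "soda"},
    {"chips", "salsa"},
    {"chips", "beer"},
    {"bread", "milk"},
    {"chips", "salsa", "beer"},
]

def frequent_pairs(baskets, min_support):
    """Return {pair: support} for item pairs appearing in at least min_support of baskets."""
    counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    n = len(baskets)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

def confidence(baskets, antecedent, consequent):
    """Fraction of baskets containing the antecedent that also contain the consequent."""
    with_antecedent = [b for b in baskets if antecedent in b]
    return sum(consequent in b for b in with_antecedent) / len(with_antecedent)
```

A high `confidence(baskets, "chips", "salsa")` is the kind of evidence that would motivate a "buy chips, save on salsa" coupon.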
Typical Tasks in Data Mining
Summarization
Given a data set, summarize the important characteristics of the data.
E.g. calculate mean and standard deviation, determine statistical distribution, identify most commonly appearing attribute values, etc.
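A small sketch of these summary statistics using Python's standard `statistics` module. The sample values and the function name `summarize` are hypothetical; population standard deviation is used here, though sample standard deviation (`statistics.stdev`) may be the better choice depending on the data.

```python
import statistics
from collections import Counter

def summarize(values):
    """Return a few basic characteristics of a numeric data set."""
    most_common_value, count = Counter(values).most_common(1)[0]
    return {
        "mean": statistics.mean(values),
        "std_dev": statistics.pstdev(values),  # population standard deviation
        "most_common": most_common_value,
        "most_common_count": count,
    }

# Hypothetical attribute values.
summary = summarize([2, 4, 4, 4, 5, 5, 7, 9])
```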
Typical Tasks in Data Mining
Sequence Analysis
Given data collected over time, identify trends in the data that may be used to predict future events.
E.g. Analyzing stock data to identify stocks that will perform well vs. those that will perform poorly.
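As a very rough illustration of trend identification, the sketch below smooths a series with a moving average and compares the first and last smoothed values. This is an assumed, deliberately simple technique with hypothetical function names; it describes the past direction of a series and is not a reliable predictor of stock performance.

```python
def moving_average(series, window):
    """Average each run of `window` consecutive values to smooth out noise."""
    return [
        sum(series[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(series))
    ]

def trend(series, window=3):
    """Classify the overall direction of a series as 'up', 'down', or 'flat'."""
    smoothed = moving_average(series, window)
    if smoothed[-1] > smoothed[0]:
        return "up"
    if smoothed[-1] < smoothed[0]:
        return "down"
    return "flat"
```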
What is Data Mining?
Data Mining Process
[Figure: flowchart. Fit a Model → Calculate Performance → Meet Criteria? If No, return to Fit a Model; if Yes, Interpret Model.]
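The fit, evaluate, and check loop of the data mining process can be sketched as follows. The two candidate models (a constant and a least-squares line), the mean-squared-error performance measure, and the name `mine` are all assumptions made for illustration; the slides do not prescribe particular models or criteria.

```python
def fit_constant(xs, ys):
    """Simplest candidate: always predict the mean of the outputs."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def fit_line(xs, ys):
    """Second candidate: least-squares straight line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

def mean_squared_error(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def mine(xs, ys, candidates, max_error):
    """Try candidates in order until one meets the performance criterion."""
    for name, fit in candidates:
        model = fit(xs, ys)                         # Fit a Model
        error = mean_squared_error(model, xs, ys)   # Calculate Performance
        if error <= max_error:                      # Meet Criteria?
            return name, model, error               # Yes: ready to interpret
    return None                                     # No candidate met the criteria

# Hypothetical data lying exactly on y = 2x + 1.
result = mine(
    [0, 1, 2, 3],
    [1, 3, 5, 7],
    candidates=[("constant", fit_constant), ("line", fit_line)],
    max_error=0.01,
)
```

Here performance is measured on the same data used for fitting, which keeps the sketch short; evaluating on held-out data is the more usual design choice.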
Data Mining Algorithms
Apply/create a model
A model is an abstract description of data
What is the model’s function? (i.e., what task does it perform?)
How is the model represented? (e.g., mathematical function, rules, Gaussian distribution)