Principles of Knowledge Discovery in Data
Fall 2007
Chapter 4: Data Classification
Dr. Osmar R. Zaïane, University of Alberta
© Dr. Osmar R. Zaïane, 1999, 2007

Course Content
• Introduction to Data Mining
• Association Analysis
• Sequential Pattern Analysis
• Classification and Prediction
• Contrast Sets
• Data Clustering
• Outlier Detection
• Web Mining
• Other topics if time permits (spatial data, biomedical data, etc.)
Chapter 4 Objectives
• Learn basic techniques for data classification and prediction.
• Introduce techniques such as Neural Networks, Naïve Bayesian Classification, k-Nearest Neighbors, Decision Trees, and Associative Classifiers.
• Realize the difference between supervised classification, prediction, and unsupervised classification of data.

What is Classification?
The goal of data classification is to organize and categorize data in distinct classes:
• A model is first created based on the data distribution.
• The model is then used to classify new data.
• Given the model, a class can be predicted for new data.
With classification, I can predict in which bucket to put the ball, but I can't predict the weight of the ball.
(Figure: balls being sorted into buckets 1, 2, 3, 4, …, n.)
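To make the two phases concrete, here is a minimal sketch, assuming scikit-learn and an invented toy dataset (the slides prescribe neither): a model is first created from labeled data, then used to classify new data.

```python
# Minimal sketch: create a model from the data distribution, then classify
# new data. scikit-learn and the toy data are assumptions for illustration.
from sklearn.tree import DecisionTreeClassifier

X_train = [[2, 25], [1, 35], [2, 45], [0, 28], [1, 50]]   # [income level, age]
y_train = ["bad", "good", "good", "bad", "good"]          # class labels

model = DecisionTreeClassifier().fit(X_train, y_train)    # create the model
print(model.predict([[2, 33]]))                           # classify new data
```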
Classification = Learning a Model
(Diagram: a labeled training set is used to learn a classification model; the model then assigns labels to new unlabeled data. Labeling = classification.)

Applications
• Credit approval
• Target marketing
• Medical diagnosis
• Defective parts identification in manufacturing
• Crime zoning
• Treatment effectiveness analysis
• Etc.
Classification is a three-step process
1. Model construction (Learning):
• Each tuple is assumed to belong to a predefined class, as determined by one of the attributes, called the class label.
• The set of all tuples used for construction of the model is called the training set.
• The model is represented in one of the following forms:
  – Classification rules (IF-THEN statements)
  – Decision trees
  – Mathematical formulae

2. Model Evaluation (Accuracy):
Estimate the accuracy rate of the model based on a test set:
– The known label of each test sample is compared with the classified result from the model.
– The accuracy rate is the percentage of test set samples that are correctly classified by the model.
– The test set is independent of the training set; otherwise over-fitting will occur.
3. Model Use (Classification):
The model is used to classify unseen objects:
• Give a class label to a new tuple.
• Predict the value of an actual attribute.

Classification with Holdout
(Diagram: the data is partitioned into training data, from which the classifier (model) is derived, and testing data, on which its accuracy is estimated.)
• Holdout (partition D in two: one part for training, one for testing)
• Random sub-sampling (repeated holdouts with different partitionings)
• K-fold cross validation (K partitions of D; do K experiments, each with a different test partition; no overlap as in random sub-sampling; typically K = 10)
• Bootstrapping (N samples, typically of size 63% of D, drawn with replacement)
• Leave-one-out (|D| experiments, each with |D|-1 tuples for training and one for testing)
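As a concrete sketch of two of these strategies, assuming scikit-learn and its bundled iris sample data (the slides do not prescribe a library):

```python
# Minimal sketch of holdout and K-fold cross validation.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Holdout: partition D in two, one part for training, one for testing.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=1/3, random_state=0)
model = DecisionTreeClassifier().fit(X_tr, y_tr)
print("holdout accuracy:", model.score(X_te, y_te))

# K-fold cross validation: K experiments, each with a different test partition.
scores = cross_val_score(DecisionTreeClassifier(), X, y, cv=10)
print("10-fold accuracy:", scores.mean())
```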
1. Classification Process (Learning)
Training data is fed to classification algorithms, which derive the classifier (model).

Training data:
Name     Income  Age       Credit rating
Bruce    Low     <30       bad
Dave     Medium  [30..40]  good
William  High    <30       good
Marie    Medium  >40       good
Anne     Low     [30..40]  good

Learned model:
IF Income = 'High' OR Age > 30
THEN CreditRating = 'Good'

2. Classification Process (Accuracy Evaluation)
The classifier (model) is applied to testing data: how accurate is the model?

Testing data:
Name   Income  Age       Credit rating
Tom    Medium  <30       bad
Jane   High    <30       bad
Wei    High    >40       good
Hua    Medium  [30..40]  good
Chris  Medium  <30       bad
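A small plain-Python sketch that applies the learned rule to the testing data above; representing each age interval by a midpoint value (<30 → 25, [30..40] → 35, >40 → 45) is an assumption made for illustration:

```python
# Apply the learned rule to the testing tuples and estimate accuracy.
testing = [("Tom", "Medium", 25, "bad"), ("Jane", "High", 25, "bad"),
           ("Wei", "High", 45, "good"), ("Hua", "Medium", 35, "good"),
           ("Chris", "Medium", 25, "bad")]

def predict(income, age):
    # IF Income = 'High' OR Age > 30 THEN CreditRating = 'Good'
    return "good" if income == "High" or age > 30 else "bad"

correct = sum(predict(inc, age) == label for _, inc, age, label in testing)
print(f"accuracy: {correct}/{len(testing)}")
```

Jane (High, <30, bad) contradicts the rule, so the estimated accuracy here is 4/5 = 80%.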
3. Classification Process (Classification)
The classifier (model) is used to label new data.

New data:
Name  Income  Age       Credit rating
Paul  High    [30..40]  ?
Credit Rating?

Improving Accuracy
(Diagram: the data is fed to Classifier 1, Classifier 2, Classifier 3, …, Classifier n; their votes are combined to form a composite classifier.)
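A minimal sketch of the composite-classifier idea, assuming scikit-learn and its iris data for illustration: n classifiers vote and the majority label wins.

```python
# Combine the votes of several classifiers into a composite prediction.
from collections import Counter
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
classifiers = [DecisionTreeClassifier().fit(X, y),
               KNeighborsClassifier().fit(X, y),
               GaussianNB().fit(X, y)]

new_data = [[5.9, 3.0, 5.1, 1.8]]
votes = [int(c.predict(new_data)[0]) for c in classifiers]
print("votes:", votes, "-> combined:", Counter(votes).most_common(1)[0][0])
```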
Framework (Supervised Learning)
(Diagram: labeled data is split into training data and testing data; the training data is used to derive the classifier (model), the testing data to estimate its accuracy; the model then labels new, unlabeled data.)

Evaluating Classification Methods
• Predictive accuracy
  – Ability of the model to correctly predict the class label
• Speed and scalability
  – Time to construct the model
  – Time to use the model
• Robustness
  – Handling noise and missing values
• Scalability
  – Efficiency in large databases (data that is not memory resident)
• Interpretability
  – The level of understanding and insight provided by the model
• Form of rules
  – Decision tree size
  – The compactness of classification rules
Classification Methods
• Neural Networks
• Bayesian Classification
• K-Nearest Neighbour
• Decision Tree Induction
• Associative Classifiers
• Support Vector Machines
• Case-Based Reasoning
• Genetic Algorithms
• Rough Set Theory
• Fuzzy Sets
• Etc.

Lecture Outline
Part I: Artificial Neural Networks (ANN) (1 hour)
• Introduction to Neural Networks
  – Biological Neural System
  – What is an artificial neural network?
  – Neuron model and activation function
  – Construction of a neural network
• Learning: Backpropagation Algorithm
  – Forward propagation of signal
  – Backward propagation of error
  – Example
Part II: Bayesian Classifiers (Statistical-based) (1 hour)
• What is Bayesian Classification
• Bayes theorem
• Naïve Bayes Algorithm
  – Using the Laplace Estimate
  – Handling Missing Values and Numerical Data
• Belief Networks
Human Nervous System
• We have only just begun to understand how our neural system operates.
• A huge number of neurons and interconnections between them:
  – 100 billion (i.e., 10^11) neurons in the brain
    • A full Olympic-sized swimming pool contains about 10^10 raindrops; the number of stars in the Milky Way is of the same magnitude.
  – 10^4 connections per neuron
• Biological neurons are slower than computers:
  – Neurons operate in 10^-3 seconds, computers in 10^-9 seconds.
  – The brain makes up for the slow rate of operation by the large number of neurons and connections (think about the speed of face recognition by a human, for example, and the time it takes fast computers to do the same task).

Biological Neurons
• The purpose of neurons: transmit information in the form of electrical signals.
  – A neuron accepts many inputs, which are all added up in some way.
  – If enough active inputs are received at once, the neuron is activated and fires; if not, it remains in its inactive state.
• Structure of a neuron:
  – Cell body: contains the nucleus holding the chromosomes.
  – Dendrites
  – Axon
  – Synapse: couples the axon with the dendrite of another cell; information is passed from one neuron to another through synapses; there is no direct linkage across the junction, it is a chemical one.
Operation of Biological Neurons
• Signals are transmitted between neurons by electrical pulses (action potentials, AP) traveling along the axon.
• When the potential at the synapse is raised sufficiently by the AP, it releases chemicals called neurotransmitters.
  – It may take the arrival of more than one AP before the synapse is triggered.
• The neurotransmitters diffuse across the gap and chemically activate gates on the dendrites, which allow charged ions to flow.
• The flow of ions alters the potential of the dendrite and provides a voltage pulse on the dendrite (post-synaptic potential, PSP).
  – Some synapses excite the dendrite they affect, while others inhibit it.
  – The synapses also determine the strength of the new input signal.
• Each PSP travels along its dendrite and spreads over the soma (cell body).
• The soma sums the effects of thousands of PSPs; if the resulting potential exceeds a threshold, the neuron fires and generates another AP.

What is an Artificial Neural Network (NN)?
A neural network is a data structure that supposedly simulates the behaviour of neurons in a biological brain. It is composed of layers of interconnected units. Messages are passed along the connections from one unit to the other. Messages can change based on the weight of the connection and the value in the node.
(Diagram: a feedforward network maps an input vector x through input nodes, hidden nodes, and output nodes to an output vector.)
What is an Artificial Neural Network (NN)?
A network of many simple units (neurons, nodes):
• The units are connected by connections.
• Each connection has an associated numeric weight.
• Units receive inputs (from the environment or from other units) via the connections. They produce output using their weights and the inputs (i.e., they operate locally).
• A NN can be represented as a directed graph.
NNs learn from examples and exhibit some capability for generalization beyond the training data:
• Knowledge is acquired by the network from its environment via learning and is stored in the weights of the connections.
• The training (learning) rule is a procedure for modifying the weights of the connections in order to perform a certain task.
• There are also some sophisticated techniques that allow learning by adding and pruning connections (between nodes).

A Neuron
(Diagram: an input vector x = (x_0, x_1, …, x_n) with weight vector w = (w_0, w_1, …, w_n) and a bias θ feeds a weighted sum Σ; an activation function f is applied to the sum to produce the output y.)
The n-dimensional input vector x is mapped into the variable y by means of the scalar product and a nonlinear function mapping.
Neuron Model
• Each connection from unit i to unit j has a numeric weight w_ij associated with it, which determines the strength and the sign of the connection.
• Each neuron first computes the weighted sum S of its inputs and then applies an activation function f to derive its output (activation) a.
• A neuron may have a special weight called the bias weight b.
• NNs represent a function of their weights (parameters). By adjusting the weights, we change this function. This is done by using a learning rule.

Example: if there are two inputs p_1 = 2 and p_2 = 3, and if w_11 = 3, w_12 = 1, and b = -1.5, then
    a = f(p_1*w_11 + p_2*w_12 + b) = f(2*3 + 3*1 - 1.5) = f(7.5)
What is f?

Activation Function
• Also called processing element, squashing function, firing rule, …
• It is applied by each neuron to its input values and weights (as well as the bias):
    S = θ + Σ_{i=1..n} x_i * w_ij
• It can be unipolar [0, 1] or bipolar [-1, 1].
• The function can be:
  – Linear: f(S) = cS
  – Thresholded (threshold or step): f(S) = 1 if S > T, 0 otherwise
  – Sigmoid: f(S) = 1 / (1 + e^(-cS))
  – Gaussian: f(S) = e^(-S²/v)
  – etc.
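A short sketch of these activation functions applied to the weighted sum from the example above, f(7.5); the parameter values c, T, and v below are assumptions:

```python
# The activation functions listed above, applied to S = 2*3 + 3*1 - 1.5.
import math

def linear(S, c=1.0):   return c * S
def step(S, T=0.0):     return 1 if S > T else 0
def sigmoid(S, c=1.0):  return 1 / (1 + math.exp(-c * S))
def gaussian(S, v=1.0): return math.exp(-S**2 / v)

S = 2*3 + 3*1 - 1.5     # weighted sum plus bias: 7.5
print(step(S), round(sigmoid(S), 4), gaussian(S))
# step -> 1; sigmoid -> 0.9994 (near saturation); gaussian -> ~0
```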
Correspondence Between Artificial and Biological Neurons
How does this artificial neuron relate to the biological one?
• input p (or input vector p) – input signal (or signals) at the dendrites
• weight w (or weight vector w) – strength of the synapse (or synapses)
• summer & transfer function – cell body
• neuron output a – signal at the axon

Constructing the Network
• Number of input nodes: generally corresponds to the dimensionality of the input tuples. Input is converted into binary and concatenated to form a bitstream.
  – E.g., age 20-80 split into 6 intervals:
    • [20, 30) → 000001, [30, 40) → 000010, …, [70, 80) → 100000; or
    • [20, 30) → 001, [30, 40) → 010, …, [70, 80) → 110
• Number of hidden nodes: determined by an expert or, in some cases, adjusted during training.
• Number of output nodes: generally the number of classes.
  – E.g., 10 classes:
    • 0000000001 → C1, 0000000010 → C2, …, 1000000000 → C10; or
    • 0001 → C1, 0010 → C2, …, 1010 → C10
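A small sketch of the two age encodings above (one bit per interval vs. a 3-bit index):

```python
# Encode an age in [20, 80) over 6 ten-year intervals, both ways.
def one_hot(age, lo=20, hi=80, bins=6):
    i = (age - lo) * bins // (hi - lo)    # interval index 0..5
    return format(1 << i, f"0{bins}b")    # e.g. [20, 30) -> 000001

def binary_index(age, lo=20, hi=80, bins=6):
    i = (age - lo) * bins // (hi - lo)
    return format(i + 1, "03b")           # e.g. [20, 30) -> 001

print(one_hot(25), binary_index(25))      # 000001 001
print(one_hot(72), binary_index(72))      # 100000 110
```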
Neural Networks – Pros and Cons
• Advantages
  – Prediction accuracy is generally high.
  – Robust: works when training examples contain errors.
  – Output may be discrete, real-valued, or a vector of several discrete or real-valued attributes.
  – Fast evaluation of the learned target function.
• Criticism
  – Long training time.
  – Difficult to understand the learned function (the weights).
  – Typically for numerical data.
  – Not easy to incorporate domain knowledge.
  – Design can be tedious and error prone (too small: slow learning; too big: instability or poor performance).

Learning Paradigms
(1) Classification: adjust the weights using Error = Desired - Actual.
(2) Reinforcement: adjust the weights using a reinforcement signal.
(Diagram: the training data provides inputs and labels; the actual output is compared with the target class, and the comparison drives the weight adjustment.)
Learning Algorithms
• Back propagation for classification
• Kohonen feature maps for clustering
• Recurrent back propagation for classification
• Radial basis function for classification
• Adaptive resonance theory
• Probabilistic neural networks

Major Steps for a Backpropagation Network
• Constructing the network
  – Input data representation
  – Selection of the number of layers and the number of nodes in each layer
• Training the network using training data
• Pruning the network
• Interpreting the results
Network Training
• The ultimate objective of training: obtain a set of weights that classifies almost all the tuples in the training data correctly.
• Steps:
  – Initial weights are set randomly.
  – Input tuples are fed into the network one by one.
  – Activation values for the hidden nodes are computed.
  – The output vector can be computed once the activation values of all hidden nodes are available.
  – Weights are adjusted using the error (desired output - actual output), which is propagated backwards.

Network Pruning
• A fully connected network is hard to articulate.
• n input nodes, h hidden nodes, and m output nodes lead to h(m+n) links (weights).
• Pruning: remove some of the links without affecting the classification accuracy of the network.
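A minimal numpy sketch of the per-tuple forward step described under Network Training above; the layer sizes and random seed are assumptions for illustration:

```python
# Forward step: hidden activations first, then the output vector.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 4, 3, 2
W1 = rng.uniform(-1, 1, (n_hidden, n_in))   # initial weights set randomly
b1 = rng.uniform(-1, 1, n_hidden)
W2 = rng.uniform(-1, 1, (n_out, n_hidden))
b2 = rng.uniform(-1, 1, n_out)

sigmoid = lambda S: 1 / (1 + np.exp(-S))

x = np.array([1, 0, 1, 0])                  # one input tuple
hidden = sigmoid(W1 @ x + b1)               # activation values, hidden nodes
output = sigmoid(W2 @ hidden + b2)          # output vector
error = np.array([1, 0]) - output           # desired - actual
print(hidden, output, error)
```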
Backpropagation Network – Architecture
• 1) A network with 1 or more hidden layers:
  – 1 output neuron for each class (output neurons)
  – hidden neurons (1 hidden layer)
  – 1 input neuron for each attribute (inputs)
• 2) Feedforward network: each neuron receives input only from the neurons in the previous layer.
• 3) Typically fully connected: all neurons in a layer are connected with all neurons in the next layer.
• 4) Weights initialization: small random values, e.g. in [-1, 1].

Example: the weather data
Outlook   Temperature  Humidity  Windy  Play
sunny     hot          high      false  No
sunny     hot          high      true   No
overcast  hot          high      false  Yes
rain      mild         high      false  Yes
rain      cool         normal    false  Yes
rain      cool         normal    true   No
overcast  cool         normal    true   Yes
sunny     mild         high      false  No
sunny     cool         normal    false  Yes
rain      mild         normal    false  Yes
sunny     mild         normal    true   Yes
overcast  mild         high      true   Yes
overcast  hot          normal    false  Yes
rain      mild         high      true   No

Backpropagation Network – Architecture 2
• 5) Neuron model: a weighted sum of the input signals plus a differentiable transfer function:
    a = f(wp + b)
• Any differentiable transfer function f can be used; most frequently the sigmoid and tan-sigmoid (hyperbolic tangent sigmoid) functions are used:
    a = 1 / (1 + e^(-n))        a = (e^n - e^(-n)) / (e^n + e^(-n))
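As a quick numerical check of the two transfer functions above (both differentiable, differing only in output range), note that the tan-sigmoid is a rescaled sigmoid: tanh(n) = 2*sigmoid(2n) - 1.

```python
# Verify that tan-sigmoid is a rescaled sigmoid and compare output ranges.
import math

sigmoid = lambda n: 1 / (1 + math.exp(-n))
tansig  = lambda n: (math.exp(n) - math.exp(-n)) / (math.exp(n) + math.exp(-n))

for n in (-2.0, 0.0, 0.5, 3.0):
    assert abs(tansig(n) - (2 * sigmoid(2 * n) - 1)) < 1e-12
    print(n, round(sigmoid(n), 4), round(tansig(n), 4))
```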
Architecture – Number of Input Units
• Numerical data: typically 1 input unit for each attribute.
• Categorical data: 1 input unit for each attribute value.
• How many input units for the weather data?
(Diagram: one input unit per attribute value, sunny/overcast/rainy, hot/mild/cool, high/normal, false/true, grouped under outlook, temperature, humidity, and windy, feeding the hidden layer(s).)
• Encoding of the input examples: typically binary, depending on the value of the attribute (on and off), e.g. "sunny hot high false" → 100 100 10 01.
• Other possibilities are also acceptable. For example, "Windy" could be coded with only one unit: true or false (1 or 0).

Architecture – Number of Output Units
• Typically 1 neuron for each class (here: No, Yes), forming the output layer above the hidden layer(s).
• Encoding of the targets (classes): typically binary, e.g. class 1 (no): 1 0; class 2 (yes): 0 1.
  – Target class for example 1 ("sunny hot high false / No"): 1 0
• Another possibility is to code the target class with only one unit: Yes or No (1 or 0).
• Example 1 fully encoded: inputs 1 0 0 1 0 0 1 0 1 0, target 1 0.
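A small sketch of this encoding, reproducing the bitstrings for example 1:

```python
# Binary (one unit per attribute value) input encoding and target encoding.
VALUES = {"outlook": ["sunny", "overcast", "rainy"],
          "temperature": ["hot", "mild", "cool"],
          "humidity": ["high", "normal"],
          "windy": ["false", "true"]}
CLASSES = ["No", "Yes"]

def encode(example, target):
    bits = ""
    for attr, vals in VALUES.items():
        bits += "".join("1" if v == example[attr] else "0" for v in vals)
    tbits = "".join("1" if c == target else "0" for c in CLASSES)
    return bits, tbits

ex1 = {"outlook": "sunny", "temperature": "hot",
       "humidity": "high", "windy": "false"}
print(encode(ex1, "No"))   # ('1001001010', '10')
```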
Number of Hidden Layers and Units in Them
• An art! Typically found by trial and error.
• The task constrains the number of input and output units, but not the number of hidden layers and the number of neurons in them.
  – Too many hidden layers and units (i.e., too many weights): overfitting.
  – Too few: underfitting, i.e., the NN is not able to learn the input-output mapping.
  – A heuristic to start with: 1 hidden layer with n hidden neurons, n = (inputs + output_neurons) / 2.

Learning in Backpropagation NNs
(Diagram: a labeled example p, e.g. "sunny hot high false / No", is encoded as 1 0 0 1 0 0 1 0 1 0 with target 1 0; p is propagated forward through the network to produce the output a; a is compared with the desired output d to calculate the error; the error adjustments are then propagated backward.)

Idea of backpropagation learning:
  For each training example p:
    – Propagate p through the network and calculate the output a. Compare the desired output d with the actual output a and calculate the error.
    – Update the weights of the network to reduce the error.
  Until the error over all examples < threshold.

• Why "backpropagation"? Because it adjusts the weights backwards (from the output to the input units) by propagating the weight change Δw:
    w_pq(new) = w_pq(old) + Δw_pq
• How to calculate the weight change? A sketch follows below.
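A compact numpy sketch of the whole loop, using the standard gradient-descent delta rule for the weight change (the slides stop short of deriving it); the XOR toy task, layer sizes, and learning rate are assumptions for illustration:

```python
# Backpropagation with one hidden layer of sigmoid units.
import numpy as np

sigmoid = lambda S: 1 / (1 + np.exp(-S))

X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])   # training examples p
D = np.array([[0.], [1.], [1.], [0.]])                   # desired outputs d (XOR)

rng = np.random.default_rng(1)
n_in, n_hidden, n_out = 2, 4, 1     # more hidden units than the heuristic
W1, b1 = rng.uniform(-1, 1, (n_in, n_hidden)), np.zeros(n_hidden)
W2, b2 = rng.uniform(-1, 1, (n_hidden, n_out)), np.zeros(n_out)
lr = 1.0                            # learning rate (assumed)

for _ in range(10000):
    # forward propagation of the signal
    H = sigmoid(X @ W1 + b1)
    A = sigmoid(H @ W2 + b2)
    # backward propagation of the error; sigmoid derivative is a(1 - a)
    d_out = (D - A) * A * (1 - A)
    d_hid = (d_out @ W2.T) * H * (1 - H)
    # apply the weight change: w_new = w_old + Δw
    W2 += lr * H.T @ d_out;  b2 += lr * d_out.sum(axis=0)
    W1 += lr * X.T @ d_hid;  b1 += lr * d_hid.sum(axis=0)

print(np.round(A.ravel(), 2))       # typically close to [0, 1, 1, 0]
```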
