Replace the step function in the perceptron with a continuous (differentiable) activation function, e.g., a linear activation.
For classification problems, use the step function only to determine the class and not to update the weights.
Note: this is the same algorithm we saw for regression. All that really differs is how the classes are determined.
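To make the distinction concrete, here is a minimal sketch (in Python, with illustrative names not taken from the notes) of a single linear unit: the continuous output is what the weight updates see, and the step function appears only when assigning a class label.

```python
import numpy as np

# A single unit with a linear (continuous, differentiable) activation.
# Training updates use net_output directly; the step function appears
# only in classify, to turn the continuous output into a class label.

def net_output(w, x):
    return np.dot(w, x)          # continuous output used for weight updates

def classify(w, x):
    return 1 if net_output(w, x) >= 0 else 0   # step applied only at prediction time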
Delta Rule:
Training by Gradient Descent Revisited
Construct a cost function E that measures how well the network has learned.
For example (one output node):

E = \frac{1}{2} \sum_{i=1}^{n} (t_i - y_i)^2

where

n = number of examples
t_i = desired target value associated with the i-th example
y_i = output of the network when the i-th input pattern is presented to the network
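As a quick check of the formula, here is a short sketch computing E for a few illustrative targets and outputs (the numbers are made up for the example):

```python
import numpy as np

# E = (1/2) * sum_i (t_i - y_i)^2, as defined above.
def cost(t, y):
    return 0.5 * np.sum((t - y) ** 2)

t = np.array([1.0, -1.0, 1.0])   # desired targets (illustrative)
y = np.array([0.8, -0.6, 0.9])   # network outputs (illustrative)
print(cost(t, y))                # 0.105
```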
To train the network, we adjust its weights so as to decrease the cost (this is where we require differentiability). This is called gradient descent.
Algorithm
Initialize the weights with some small random value
Until E is within the desired tolerance, update the weights according to

W(new) = W(old) - \mu \nabla E

where \nabla E is evaluated at W(old), \mu is the learning rate, and the gradient is

\nabla E = \left( \frac{\partial E}{\partial w_0}, \frac{\partial E}{\partial w_1}, \ldots \right)
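Putting the algorithm together, here is a minimal batch gradient descent sketch for a single linear output node (y_i = w . x_i), for which the gradient of E works out to -\sum_i (t_i - y_i) x_i. The data, learning rate, and tolerance below are illustrative assumptions, not values from the notes.

```python
import numpy as np

def train(X, t, mu=0.01, tol=1e-6, max_iters=10000):
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.1, size=X.shape[1])  # small random initial weights
    for _ in range(max_iters):
        y = X @ w                       # network outputs, one per example
        E = 0.5 * np.sum((t - y) ** 2)  # current cost
        if E < tol:                     # stop when E is within tolerance
            break
        grad = -(t - y) @ X             # gradient of E evaluated at w(old)
        w = w - mu * grad               # w(new) = w(old) - mu * gradient
    return w

# Tiny illustrative dataset: bias term in column 0, targets t = x.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
t = np.array([0.0, 1.0, 2.0])
print(train(X, t))   # approximately [0., 1.]
```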
More than Two Classes.
If there are more than two classes, we could still use the same network, but instead of having a binary target, we can let the target take on discrete values. For example, if there are 5 classes, we could have t = 1, 2, 3, 4, 5 or t = -2, -1, 0, 1, 2. It turns out, however, that the network has a much easier time if we have one output per class. We can think of each output node as trying to solve a binary problem (the example is either in the given class or it isn't).
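A short sketch of the one-output-per-class encoding (commonly called one-hot); the 5-class example and the argmax decision rule here are illustrative:

```python
import numpy as np

# With one output per class, the target for class k is a vector
# with 1 in position k and 0 elsewhere; each output node then
# faces a binary "in this class or not" problem.
def one_hot(label, num_classes=5):
    t = np.zeros(num_classes)
    t[label] = 1.0
    return t

print(one_hot(2))          # [0. 0. 1. 0. 0.]

# At prediction time, pick the class whose output node is largest.
outputs = np.array([0.1, 0.3, 0.9, 0.2, 0.05])
print(np.argmax(outputs))  # 2
```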