A 2-2-1 network (2 inputs, 2 hidden units, 1 output) learns XOR or AND by adjusting its weights. Backpropagation is the method that tells each weight how to change in order to reduce the error.
Step-by-step idea:

1. Do a forward pass to get a prediction.
2. Compute a loss (a number that says how wrong the prediction is).
3. Use the chain rule to assign responsibility for the error to each weight.
4. Update each weight a little in the direction that reduces the loss.
Loss (mean squared error), for a single example with target $y$ and prediction $\hat{y}$:

$$L = \tfrac{1}{2}\,(y - \hat{y})^2$$
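As a quick numeric check of the mean squared error (the target and prediction values here are made up for illustration):

```python
# Loss for one example: half the squared difference between
# target and prediction. Values are illustrative only.
target, prediction = 1.0, 0.73
loss = 0.5 * (target - prediction) ** 2
print(loss)  # approximately 0.0365
```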
Chain rule idea: the loss depends on a weight $w$ only through the values that weight influences, so the gradient is a product of local derivatives:

$$\frac{\partial L}{\partial w} = \frac{\partial L}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial w}$$
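The four steps can be sketched end to end in NumPy. This is a minimal sketch, not a reference implementation: sigmoid activations, a learning rate of 0.5, 10,000 epochs, and random initialization are all assumptions for illustration, not prescribed by the text.

```python
import numpy as np

# Minimal sketch: backprop for a 2-2-1 network on XOR.
# Sigmoid activations, lr=0.5, and 10000 epochs are assumptions.
rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # inputs
y = np.array([[0], [1], [1], [0]], dtype=float)              # XOR targets

W1 = rng.normal(size=(2, 2)); b1 = np.zeros(2)   # input -> hidden
W2 = rng.normal(size=(2, 1)); b2 = np.zeros(1)   # hidden -> output

lr, losses = 0.5, []
for _ in range(10000):
    # Step 1: forward pass
    h = sigmoid(X @ W1 + b1)
    y_hat = sigmoid(h @ W2 + b2)
    # Step 2: mean squared error over the 4 examples
    losses.append(0.5 * np.mean((y - y_hat) ** 2))
    # Step 3: chain rule, layer by layer
    d_out = (y_hat - y) * y_hat * (1 - y_hat)   # error signal at the output
    d_hid = (d_out @ W2.T) * h * (1 - h)        # error signal at the hidden layer
    # Step 4: move each weight a little against its gradient
    W2 -= lr * (h.T @ d_out) / len(X); b2 -= lr * d_out.mean(axis=0)
    W1 -= lr * (X.T @ d_hid) / len(X); b1 -= lr * d_hid.mean(axis=0)

print(losses[0], "->", losses[-1])  # the loss should shrink over training
```

The `d_out` and `d_hid` terms are exactly the chain-rule products above: the derivative of the loss with respect to each layer's pre-activation, pushed backward one layer at a time.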
For XOR, a successfully trained network produces outputs close to the targets:

| x1 | x2 | target y | output |
|---|---|---|---|
| 0 | 0 | 0 | close to 0 |
| 0 | 1 | 1 | close to 1 |
| 1 | 0 | 1 | close to 1 |
| 1 | 1 | 0 | close to 0 |
| Connection | Value |
|---|---|
Training, validation, and test sets. The training set is what the network learns from. The validation set is used to choose settings like the learning rate and the number of hidden units. The test set is kept separate until the end to check how well the model generalizes. A helpful analogy is studying for an exam: practice problems are training, a mock quiz is validation, and the final exam is the test set.
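A data split along these lines can be sketched as follows; the 70/15/15 ratio and the dataset size are assumptions for illustration, not from the text.

```python
import numpy as np

# Sketch of a train/validation/test split. The 70/15/15 ratio is an
# assumption; any split that keeps the three sets disjoint works.
rng = np.random.default_rng(42)
data = np.arange(100)   # stand-in for 100 examples
rng.shuffle(data)       # shuffle before splitting

n_train, n_val = 70, 15
train = data[:n_train]                   # learn the weights here
val = data[n_train:n_train + n_val]      # pick learning rate, hidden units
test = data[n_train + n_val:]            # touch only once, at the very end

print(len(train), len(val), len(test))  # 70 15 15
```

Keeping the three sets disjoint is the point of the exam analogy: if the final exam reuses practice problems, a good score no longer tells you anything about generalization.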