2.4. Implementing an XOR Gate from Scratch
This section demonstrates how to implement an XOR gate with a neural network, built from scratch with NumPy.
Complete Python code is available at: XOR-gate.py
[1] Prepare Inputs and Labels.
import numpy as np

# Inputs
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
# The ground-truth labels
Y = np.array([[0], [1], [1], [0]])
# Reshape each input and label into a column vector.
X = X.reshape(4, 2, 1)
Y = Y.reshape(4, 1, 1)
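After reshaping, each $X[i]$ is a 2x1 column vector and each $Y[i]$ is a 1x1 column vector, so they can be multiplied directly with the weight matrices created in the next step. For example:

print(X.shape)   # (4, 2, 1)
print(X[1])      # [[0]
                 #  [1]]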
[2] Create the Model
We create two weight matrices, $W$ and $U$, and two bias vectors, $b$ and $c$.
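These parameters define a network with one hidden layer. For an input $x$, it computes the hidden activation $h$ and the prediction $y$ as

$h = \sigma(Wx + b), \qquad y = \sigma(Uh + c),$

where $\sigma$ denotes the sigmoid function; this is exactly the forward propagation performed in the training loop in [3].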
input_nodes = 2
hidden_nodes = 3
output_nodes = 1
#
# Initialize random weights and biases
#
W = np.random.uniform(size=(hidden_nodes, input_nodes))
b = np.random.uniform(size=(hidden_nodes, 1))
U = np.random.uniform(size=(output_nodes, hidden_nodes))
c = np.random.uniform(size=(output_nodes, 1))
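The training loop below calls two helper functions, sigmoid and deriv_sigmoid, whose full definitions are in XOR-gate.py. Assuming the standard logistic function, a minimal sketch is:

def sigmoid(x):
    # Logistic function: 1 / (1 + e^(-x))
    return 1.0 / (1.0 + np.exp(-x))

def deriv_sigmoid(x):
    # Derivative of the sigmoid: sigmoid(x) * (1 - sigmoid(x))
    s = sigmoid(x)
    return s * (1.0 - s)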
[3] Training
The training loop alternates forward propagation and backpropagation:
- Forward propagation: computes the prediction $y$ for each input $X[i]$.
- Backpropagation: computes the gradients ($dL$, $db$, $dW$, $dc$, $dU$) and updates the parameters ($b$, $W$, $c$, $U$) with the gradient descent algorithm, which is explained in Appendix 2.3; the resulting gradient formulas are sketched below.
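For reference, with the squared-error loss $L = \frac{1}{2}(y - Y[i])^2$ and the forward pass defined in [2], the chain rule yields the gradients used in the code, where $\odot$ denotes the element-wise product:

$dc = \dfrac{\partial L}{\partial c} = (y - Y[i]) \odot \sigma'(y_h), \qquad dU = \dfrac{\partial L}{\partial U} = dc \, h^\top,$

$db = \dfrac{\partial L}{\partial b} = (U^\top dc) \odot \sigma'(h_h), \qquad dW = \dfrac{\partial L}{\partial W} = db \, X[i]^\top.$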
n_epochs = 15000 # Epochs
lr = 0.1 # Learning rate
#
# Training loop
#
for epoch in range(1, n_epochs + 1):
    loss = 0.0
    for i in range(0, len(Y)):
        #
        # Forward Propagation
        #
        h_h = np.dot(W, X[i]) + b
        h = sigmoid(h_h)
        y_h = np.dot(U, h) + c
        y = sigmoid(y_h)
        #
        # Back Propagation
        #
        loss += np.sum((y - Y[i]) ** 2 / 2)  # for measuring the training progress
        dL = (y - Y[i])
        # Computing the gradients
        db = np.dot(U.T, dL * deriv_sigmoid(y_h)) * deriv_sigmoid(h_h)
        dW = np.dot(db, X[i].T)
        dc = dL * deriv_sigmoid(y_h)
        dU = np.dot(dc, h.T)
        # Updating Weights and Biases using gradient descent.
        c -= lr * dc
        U -= lr * dU
        b -= lr * db
        W -= lr * dW
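The per-epoch loss values shown in the test output under [4] are presumably printed at the end of each epoch; a minimal sketch of such reporting (the exact format is defined in XOR-gate.py) is:

    # At the end of the epoch loop, after processing all four samples
    # (assumed reporting logic; see XOR-gate.py for the actual code):
    if epoch == 1 or epoch % 1000 == 0:
        print("epoch: %d / %d Loss = %f" % (epoch, n_epochs, loss))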
[4] Test
Run the following command to test the model:
$ python XOR-gate.py
-----------------------------------------------------------------
Layer (type)                 Output Shape              Param #
=================================================================
dense (Dense)                (None, 3)                 9
dense_1 (Dense)              (None, 1)                 4
=================================================================
Total params: 13
epoch: 1 / 15000 Loss = 0.608350
epoch: 1000 / 15000 Loss = 0.497935
epoch: 2000 / 15000 Loss = 0.446343
epoch: 3000 / 15000 Loss = 0.338422
epoch: 4000 / 15000 Loss = 0.294963
epoch: 5000 / 15000 Loss = 0.279844
epoch: 6000 / 15000 Loss = 0.272833
epoch: 7000 / 15000 Loss = 0.268903
epoch: 8000 / 15000 Loss = 0.266423
epoch: 9000 / 15000 Loss = 0.264728
epoch: 10000 / 15000 Loss = 0.263502
epoch: 11000 / 15000 Loss = 0.262577
epoch: 12000 / 15000 Loss = 0.261856
epoch: 13000 / 15000 Loss = 0.261279
epoch: 14000 / 15000 Loss = 0.260807
epoch: 15000 / 15000 Loss = 0.260415
------------------------
x0 XOR x1 => result
========================
0 XOR 0 => 0.0400
0 XOR 1 => 0.9671
1 XOR 0 => 0.9671
1 XOR 1 => 0.0330
========================
The outputs are close to the ground-truth labels $(0, 1, 1, 0)$, which indicates that the model has learned the XOR function.
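The results table is produced by a final forward pass over the four inputs with the trained parameters; a minimal sketch of that inference loop (the actual test code is in XOR-gate.py) is:

# Assumed test loop: forward pass with the trained weights.
for i in range(len(X)):
    h = sigmoid(np.dot(W, X[i]) + b)
    y = sigmoid(np.dot(U, h) + c)
    print("%d XOR %d => %.4f" % (X[i][0][0], X[i][1][0], y[0][0]))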