Results

Model settings

The single hidden layer neural network was tested under the following settings:

The gradient descent method used is mini-batch gradient descent with a batch size of 1,000, so that there are 50 batches per iteration.
The learning rate follows a harmonic sequence of the form \(\left(\frac{\alpha}{1 + k/\gamma}\right)_{k\in\mathbb{N}}\), with \(\alpha=0.0001\) and \(\gamma=1000\), such that the gradient descent is guaranteed to converge.
The number of iterations was chosen to be 300 for the first three cases and 1500 for the last.
Several values for the number of neurons (\(M\)) were tested, namely \(M=10,50,150\). The table shows that choosing the number of neurons by best accuracy on the validation set gives \(M=150\), which also performs best on the test set. The values were chosen with the typical range of 5 to 100 in mind. The value 10 is motivated by the fact that there are 10 different characters to classify: if \(M<10\), the nullity of \(\beta\) would necessarily be non-zero. The value 150 was chosen in the spirit of the general rule that the higher the number of neurons, the better, provided that there is a mechanism to avoid overfitting.
No regularization method was used. In this case, the number of iterations was not large enough to produce significant overfitting.
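
The learning-rate schedule and the batching described in the settings above can be sketched as follows. This is a minimal illustration, not the exercise module's code: the function names `learning_rate` and `batch_slices` are hypothetical, the step index `k` is assumed to start at 0, and the text does not say whether `k` counts iterations or individual batch updates. The training-set size of 50,000 is inferred from the batch size and the number of batches per iteration.

```python
ALPHA = 1e-4                 # alpha in the text
GAMMA = 1000.0               # gamma in the text
BATCH_SIZE = 1000
BATCHES_PER_ITERATION = 50


def learning_rate(k, alpha=ALPHA, gamma=GAMMA):
    """Harmonic step size alpha / (1 + k / gamma) for step index k = 0, 1, 2, ..."""
    return alpha / (1.0 + k / gamma)


def batch_slices(n_samples, batch_size=BATCH_SIZE):
    """Yield consecutive slices that split range(n_samples) into mini-batches."""
    for start in range(0, n_samples, batch_size):
        yield slice(start, min(start + batch_size, n_samples))


# Assumption: the training-set size implied by 50 batches of 1,000 samples.
n_train = BATCH_SIZE * BATCHES_PER_ITERATION

if __name__ == "__main__":
    for k in (0, 300, 1000, 1500):
        print(f"k = {k:5d}  learning rate = {learning_rate(k):.3e}")
    print("batches per iteration:", len(list(batch_slices(n_train))))
```

With \(\gamma=1000\) the step size halves only after 1,000 steps, so over a few hundred steps the decay is mild.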

The results are summarised in the following table, which was generated automatically by the exercise module:

	M = 10	M = 50	M = 150	M = 10 (5000 iters)
Time elapsed	0 days 00:02:11.794284	0 days 00:04:46.988723	0 days 00:11:01.507839	0 days 00:09:19.538305
Accuracy (training)	0.94208	0.95108	0.96298	0.94642
Accuracy (validation)	0.9368	0.946	0.9563	0.937
Accuracy (test)	0.9355	0.9413	0.956	0.9387
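
The selection rule discussed in the settings (pick the configuration with the best validation accuracy, then check it on the test set) can be reproduced directly from the accuracies in the table. The snippet below is only a sketch: the dictionary `results` is a hand-copied transcription of the table, with the column labels kept verbatim, and is not produced by the exercise module.

```python
# Accuracies transcribed from the table above; keys are the column labels.
results = {
    "M = 10": {"training": 0.94208, "validation": 0.9368, "test": 0.9355},
    "M = 50": {"training": 0.95108, "validation": 0.946, "test": 0.9413},
    "M = 150": {"training": 0.96298, "validation": 0.9563, "test": 0.956},
    "M = 10 (5000 iters)": {"training": 0.94642, "validation": 0.937, "test": 0.9387},
}

# Model selection: the configuration with the highest validation accuracy.
best_by_validation = max(results, key=lambda name: results[name]["validation"])
best_by_test = max(results, key=lambda name: results[name]["test"])

# Gap between training and test accuracy, a rough indicator of overfitting.
gaps = {name: acc["training"] - acc["test"] for name, acc in results.items()}

if __name__ == "__main__":
    print("best on validation:", best_by_validation)
    print("best on test:      ", best_by_test)
    for name, gap in gaps.items():
        print(f"{name:22s} training - test = {gap:.5f}")
```

Both selections give \(M=150\), and every gap between training and test accuracy is below 0.01, which is consistent with the remark that no significant overfitting occurred.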