Spring 2024
CSC 578 HW 3: Implementation of Neural Networks
- Graded out of 10 points.
Introduction
Your task in this assignment is to make some modifications to the NNDL book's "network.py" code (in Chapter 1) and write a small application that uses it. The objective of the assignment is to help you strengthen your understanding of the concepts and mathematics of neural networks through implementation.
The amount of code you write for this assignment won't be much. However, working through code written by somebody else teaches you not only the details of the code itself but also the concepts it implements. This is a great exercise for developing your programming skills as well.
Deliverables:
Submit these two things. More instructions are found at the end of this page.
- Code files
- A documentation file
Overview
You will develop the code through the following steps:
- Decide which development environment you will use for this assignment. Whether it is your local computer or a cloud platform, it must have Jupyter Notebook and Python installed.
- Run the startup code provided (a slightly modified NNDL network code and a test code) as is, to ensure your environment is compatible.
- Make required modifications in the network code and test it with the test code provided.
- Create a new test application of your own according to the specifications.
Part 1: Initial tests of application notebook
Download the network definition code (a Jupyter Notebook file) NN578_network.ipynb (and its html file), the iris dataset: iris-3.csv, the saved network file: iris-423.dat, and the initial test application code: 578hw3_Check1_CoLab.ipynb (and its html file). Run all cells in the initial application notebook. Execution should succeed, and you should see the output shown in those files.
Note that, if you are using your local machine/PC to write code, remove the few Colab-specific cells at the top of the CoLab file yourself.
Part 2: MODIFICATIONS to be made in the network code
Here, you will extend the network definition code (NN578_network.ipynb) in several ways.
IMPORTANT NOTE: Since a Jupyter notebook creates its own execution environment, whenever you make changes in a file you are importing (e.g., the network file in this case), you must first run all cells in that file (those related to the changes), AND THEN re-start the kernel (runtime) of the file(s) that import it, EVERY TIME.
Modifications to make:
- (A) Edit the function evaluate() (which is called after an epoch is complete) so that, in addition to the number of correctly classified instances in the test set, it computes the accuracy, the Mean Squared Error (MSE), the Cross-Entropy (CE) and the Log-Likelihood (LL). The function should return those five values (correct count, accuracy, MSE, CE, LL) in a dictionary, where the keys are 'Count', 'Accuracy', 'MSE', 'CE' and 'LL'. MSE is described in NNDL Chapter 1, Eq. (6), and Cross-Entropy is in NNDL Chapter 3, initially in Eq. (57) and more precisely in Eq. (63). Log-Likelihood is described in NNDL Chapter 3, Eq. (80), but the formula shown in Lecture note #4 (Optimizations), Slide 21, is easier to implement.
  As a hint, for MSE and Cross-Entropy, you can look at the two cost-function classes (QuadraticCost and CrossEntropyCost) in another code file from the book, network2.py (the original version; to be modified in a later homework in this course). NOTE: Each cost function should return a scalar value, NOT an array. It must also be the average over the data ('test_data' in the code). Also remember that NO PRINTING takes place inside the function evaluate(). (A sketch of one possible implementation is shown after this list.)

- (B) Edit the SGD() function to include the two modifications described below. (A formatting sketch is shown after this list.)

  1. Print and format the results returned from evaluate() in the specified way, printing exactly 4 decimal digits. If 'test_data' is not provided, print just the results for the training data, as in this example:

       [Epoch 0] Train: Count= 50, Accuracy=0.3333, MSE=0.3324, CE=1.9056, LL=1.0882
       [Epoch 1] Train: Count=100, Accuracy=0.6667, MSE=0.2862, CE=1.6983, LL=0.9408
       [Epoch 2] Train: Count=100, Accuracy=0.6667, MSE=0.2384, CE=1.4883, LL=0.7665

     If 'test_data' is provided, format the display as follows:

       [Epoch 0] Train: Count=36, Accuracy=0.3429, MSE=0.3339, CE=1.9118, LL=1.0294
                 Valid: Count=14, Accuracy=0.3111, MSE=0.3353, CE=1.9183, LL=1.0334
       [Epoch 1] Train: Count=32, Accuracy=0.3048, MSE=0.3335, CE=1.9107, LL=1.0930
                 Valid: Count=18, Accuracy=0.4000, MSE=0.3294, CE=1.8922, LL=1.0802
       [Epoch 2] Train: Count=32, Accuracy=0.3048, MSE=0.3241, CE=1.8674, LL=1.0823
                 Valid: Count=18, Accuracy=0.4000, MSE=0.3196, CE=1.8477, LL=1.0656
       [Epoch 3] Train: Count=69, Accuracy=0.6571, MSE=0.2853, CE=1.6942, LL=0.9433
                 Valid: Count=31, Accuracy=0.6889, MSE=0.2844, CE=1.6914, LL=0.9365

  2. Collect the performance results returned from evaluate() for all epochs, for training_data and test_data, into individual lists, and return the two lists in an enclosing list (i.e., a nested list of length 2: [list_of_training_results, list_of_test_results]). Note that, if test_data was not provided, list_of_test_results should be an empty list [].

- (C) Further edit the function SGD() so that the epoch loop can terminate early. Currently the loop terminates if the accuracy of the latest epoch becomes 1.0. Extend the condition so that the loop also stops early when the MSE loss is no longer decreasing. For the purpose of this assignment, terminate the loop when the MSE loss of the latest epoch is larger than or equal to the maximum of the MSE losses of the THREE preceding epochs (if they exist) -- simulating the concept of "patience" (used in optimizer schedules in many DL libraries): the number of epochs with no improvement after which training will be stopped. (A sketch of this check is shown after this list.)

- (D) Edit the function backprop() so that the local variable activations is allocated at the start with a structure that can hold the activations of ALL layers in the network. The original code starts with just the input layer (activations = [x]) and appends one layer at a time (activations.append(activation)). Change it so that, for instance if the network size were [4, 20, 3], you create a list containing three column vectors -- NumPy arrays whose shapes are (4,1), (20,1) and (3,1) respectively, initialized with zeros. Then, during forward propagation, the activation values of each layer are copied/assigned into the respective array (overwriting the zeros). (See the sketch after this list.)

- (E) Edit the function update_mini_batch() so that, inside the for-loop (for x, y in mini_batch:), for every instance (x, y), the delta values delta_nabla_b and delta_nabla_w increment the (mini-batch global) nabla accumulator variables nabla_b and nabla_w (which are initially allocated with zeros). In the original code, new nabla_b and nabla_w are created for every instance; that is grossly inefficient. Change it so that the per-instance delta_nabla_b and delta_nabla_w (both) are added to the respective nabla accumulators directly (inside the loop).

- (F) Further edit the function update_mini_batch() so that, similar to change (E) above, the lines self.weights = ... and self.biases = ... are rewritten so that self.weights and self.biases are modified/updated DIRECTLY, WITHOUT CREATING A NEW LIST every time. (A combined sketch of (E) and (F) is shown after this list.)
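Sketch for (A): a minimal standalone version of the evaluate() logic, assuming (as in the iris setup here) that every (x, y) pair carries y as a one-hot column vector and that feedforward(x) returns the output activations as a column vector. The function name, parameters, and data layout are assumptions; adapt them to the actual method in NN578_network.ipynb.

    import numpy as np

    def evaluate(feedforward, data):
        # Sketch only: `feedforward` maps an input column vector to the output
        # activations; each y in `data` is assumed one-hot (a column vector).
        outputs = [(feedforward(x), y) for (x, y) in data]
        n = len(data)
        count = sum(int(np.argmax(a) == np.argmax(y)) for (a, y) in outputs)
        # MSE: NNDL Ch.1 Eq.(6), 0.5*||y - a||^2 averaged over the data.
        mse = sum(0.5 * np.linalg.norm(a - y) ** 2 for (a, y) in outputs) / n
        # Cross-Entropy: NNDL Ch.3 Eq.(63); nan_to_num guards 0*log(0).
        ce = sum(np.sum(np.nan_to_num(-y * np.log(a) - (1 - y) * np.log(1 - a)))
                 for (a, y) in outputs) / n
        # Log-Likelihood: -ln of the output activation at the true class.
        ll = -sum(np.log(a[np.argmax(y), 0]) for (a, y) in outputs) / n
        return {'Count': count, 'Accuracy': count / n,
                'MSE': mse, 'CE': ce, 'LL': ll}

Note that all five values are scalars, and the three losses are averaged over the data, as required.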
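Sketch for (B): the printing can be isolated in a small helper. The name format_epoch and the dictionary-based records are assumptions (they match the dictionary returned in (A)), and the field widths are eyeballed from the sample output above.

    def format_epoch(epoch, train, valid=None):
        # Hypothetical helper: one epoch line, plus an optional Valid line,
        # with exactly 4 decimal digits per metric.
        def fmt(tag, r):
            return (f"{tag}: Count={r['Count']:3d}, Accuracy={r['Accuracy']:.4f}, "
                    f"MSE={r['MSE']:.4f}, CE={r['CE']:.4f}, LL={r['LL']:.4f}")
        line = f"[Epoch {epoch}] " + fmt("Train", train)
        if valid is not None:
            line += "\n" + " " * 10 + fmt("Valid", valid)  # align under "Train"
        return line

Inside SGD() you would then append each epoch's dictionaries to a training list and (when test_data is given) a test list, print the formatted line, and finally return [list_of_training_results, list_of_test_results].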
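Sketch for (C): one reading of the stopping rule, as a standalone helper (the name should_stop and the patience parameter are assumptions). It stops when accuracy reaches 1.0, or when the latest MSE is greater than or equal to the maximum of the (up to three) preceding MSEs.

    def should_stop(mse_history, latest_accuracy, patience=3):
        # Hypothetical helper: `mse_history` holds one MSE per completed epoch.
        if latest_accuracy == 1.0:
            return True
        if len(mse_history) < 2:
            return False                               # nothing to compare against
        preceding = mse_history[-(patience + 1):-1]    # up to 3 preceding MSEs
        return mse_history[-1] >= max(preceding)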
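Sketch for (D): the preallocation amounts to building the list of zeroed column vectors from the layer sizes up front, then assigning into the slots during forward propagation instead of appending. This toy example assumes the [4, 20, 3] network and the book's sigmoid forward pass; the random parameters and input are stand-ins.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    sizes = [4, 20, 3]
    rng = np.random.default_rng(0)
    biases = [rng.standard_normal((n, 1)) for n in sizes[1:]]
    weights = [rng.standard_normal((n, m)) for m, n in zip(sizes[:-1], sizes[1:])]

    # Preallocate one zeroed column vector per layer: shapes (4,1), (20,1), (3,1).
    activations = [np.zeros((n, 1)) for n in sizes]

    x = rng.standard_normal((sizes[0], 1))    # stand-in input instance
    activations[0][:] = x                     # copy into the slot, not append
    for l, (b, w) in enumerate(zip(biases, weights), start=1):
        activations[l][:] = sigmoid(w @ activations[l - 1] + b)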
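Sketch for (E) and (F): both changes are the same NumPy pattern, updating the existing arrays in place (+=, -=) rather than rebuilding the Python lists. The shapes and the stand-in gradients below are made up; in your code, delta_nabla_b and delta_nabla_w come from self.backprop(x, y), once per instance inside the mini-batch loop.

    import numpy as np

    sizes = [4, 20, 3]
    biases = [np.zeros((n, 1)) for n in sizes[1:]]
    weights = [np.zeros((n, m)) for m, n in zip(sizes[:-1], sizes[1:])]
    nabla_b = [np.zeros(b.shape) for b in biases]
    nabla_w = [np.zeros(w.shape) for w in weights]

    # (E) Accumulate per-instance gradients into the existing arrays in place.
    delta_nabla_b = [np.ones(b.shape) for b in biases]   # stand-in for backprop output
    delta_nabla_w = [np.ones(w.shape) for w in weights]
    for nb, dnb in zip(nabla_b, delta_nabla_b):
        nb += dnb        # in-place add: no new list, no new array
    for nw, dnw in zip(nabla_w, delta_nabla_w):
        nw += dnw

    # (F) Update the parameters in place as well, without creating new lists.
    eta, m = 0.1, 10     # made-up learning rate and mini-batch size
    for b, nb in zip(biases, nabla_b):
        b -= (eta / m) * nb
    for w, nw in zip(weights, nabla_w):
        w -= (eta / m) * nw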
Test application code
Test your modified network code with the next test application notebook: 578hw3_Check2_CoLab.ipynb (and its html file). Execution should succeed, and you should see the same output shown in those files. Note that this test application code also tests a deeper network, "iris4-20-7-3.dat". Your code should NOT die!
Again, if you are using your local machine/PC to write code, remove the few Colab-specific cells at the top of the CoLab file yourself.
