Spring 2024
CSC 578 HW 3: Implementation of Neural Networks
- Graded out of 10 points.
Introduction
Your task in this assignment is to make some modifications to the NNDL book's "network.py" code (in Chapter 1) and write a small application that uses it. The objective of the assignment is to help you strengthen your understanding of the concepts and mathematics of neural networks through implementation.
The amount of code you write for this assignment won't be much. However, working through code written by somebody else teaches you not only the details of the code itself but also the concepts it implements. This is a great exercise for developing your programming skills as well.
Deliverables:
Submit these two things. More instructions are found at the end of this page.
- Code files
- A documentation file
Overview
You will develop the code through the following steps:
- Decide which development environment you will use for this assignment. Whether it is your local computer or a cloud platform, it must have Jupyter Notebook and Python installed.
- Run the startup code provided (a slightly modified NNDL network code and a test code) as is, to ensure your environment is compatible.
- Make required modifications in the network code and test it with the test code provided.
- Create a new test application of your own according to the specifications.
Part 1: Initial tests of application notebook
Download the network definition code (a Jupyter Notebook file) NN578_network.ipynb (and its html file), the iris dataset: iris-3.csv, the saved network file: iris-423.dat, and the initial test application code: 578hw3_Check1_CoLab.ipynb (and its html file). Run all cells in the initial application notebook. Execution should succeed, and you should see the output shown in those files.
Note that, if you are using your local machine/PC to write code, remove the few Colab-specific cells at the top of the CoLab file yourself.
Part 2: MODIFICATIONS to be made in the network code
Here, you will extend the network definition code (NN578_network.ipynb) in several ways.
IMPORTANT NOTE: Since a Jupyter notebook creates its own execution environment, whenever you make changes in a file you are importing (e.g., the network file in this case), you must first run all cells in that file (those related to the changes), AND THEN re-start the kernel (runtime) of the file(s) that import it, EVERY TIME.
Modifications to make:
- (A) Edit the function evaluate() (which is called after an epoch is complete) so that, in addition to the number of correctly classified instances in the test set, it computes the accuracy, the Mean Squared Error (MSE), the Cross-Entropy (CE) and the Log-Likelihood (LL). The function should return those five values (correct count, accuracy, MSE, CE, LL) in a dictionary, where the keys are 'Count', 'Accuracy', 'MSE', 'CE' and 'LL'. MSE is described in NNDL Chapter 1, Eq. (6), and Cross-Entropy is in NNDL Chapter 3, initially in Eq. (57) and more precisely in Eq. (63). Log-Likelihood is described in NNDL Chapter 3, Eq. (80), but the formula shown in Lecture note #4 (Optimizations), Slide 21, is easier to implement.
  As a hint, for MSE and Cross-Entropy, you can look at the two cost-function classes (QuadraticCost and CrossEntropyCost) in another code file from the book, network2.py (the original version; to be modified in a later homework in this course). NOTE: Each cost function should return a scalar value, NOT an array. It must also be the average over the data ('test_data' in the code). Also remember that NO PRINTING takes place inside the function evaluate(). (A sketch of one possible implementation is shown after this list.)

- (B) Edit the SGD() function to include the two modifications described below. (A formatting sketch is shown after this list.)

  1. Print and format the results returned from evaluate() in the specified way, printing exactly 4 decimal digits. If 'test_data' is not provided, print just the results for the training data, as in this example:

       [Epoch 0] Train: Count= 50, Accuracy=0.3333, MSE=0.3324, CE=1.9056, LL=1.0882
       [Epoch 1] Train: Count=100, Accuracy=0.6667, MSE=0.2862, CE=1.6983, LL=0.9408
       [Epoch 2] Train: Count=100, Accuracy=0.6667, MSE=0.2384, CE=1.4883, LL=0.7665

     If 'test_data' is provided, format the display as follows:

       [Epoch 0] Train: Count=36, Accuracy=0.3429, MSE=0.3339, CE=1.9118, LL=1.0294
                 Valid: Count=14, Accuracy=0.3111, MSE=0.3353, CE=1.9183, LL=1.0334
       [Epoch 1] Train: Count=32, Accuracy=0.3048, MSE=0.3335, CE=1.9107, LL=1.0930
                 Valid: Count=18, Accuracy=0.4000, MSE=0.3294, CE=1.8922, LL=1.0802
       [Epoch 2] Train: Count=32, Accuracy=0.3048, MSE=0.3241, CE=1.8674, LL=1.0823
                 Valid: Count=18, Accuracy=0.4000, MSE=0.3196, CE=1.8477, LL=1.0656
       [Epoch 3] Train: Count=69, Accuracy=0.6571, MSE=0.2853, CE=1.6942, LL=0.9433
                 Valid: Count=31, Accuracy=0.6889, MSE=0.2844, CE=1.6914, LL=0.9365

  2. Collect the performance results returned from evaluate() for all epochs, for training_data and test_data, into individual lists, and return the two lists in an enclosing list (i.e., a nested list of length 2: [list_of_training_results, list_of_test_results]). Note that, if test_data was not provided, list_of_test_results should be an empty list [].

- (C) Further edit the function SGD() so that the epoch loop can terminate early. Currently the loop terminates if the accuracy of the latest epoch becomes 1.0. Extend the condition so that the loop also stops early when the MSE loss is no longer decreasing. For the purpose of this assignment, terminate the loop when the MSE loss of the latest epoch is larger than or equal to the maximum of the MSE losses of the THREE preceding epochs (if they exist) -- simulating the concept of "patience" (used in optimizer schedules in many DL libraries): the number of epochs with no improvement after which training will be stopped. (A sketch of this check is shown after this list.)

- (D) Edit the function backprop() so that the local variable activations is allocated at the start with a structure that can hold the activations of ALL layers in the network. The original code starts with just the input layer (activations = [x]) and appends one layer at a time (activations.append(activation)). Change it so that, for instance if the network size were [4, 20, 3], you create a list containing three column vectors -- NumPy arrays whose shapes are (4,1), (20,1) and (3,1) respectively, initialized with zeros. Then, during forward propagation, the activation values of each layer are copied/assigned into the respective array (overwriting the zeros). (See the sketch after this list.)

- (E) Edit the function update_mini_batch() so that, inside the for-loop (for x, y in mini_batch:), for every instance (x, y), the delta values delta_nabla_b and delta_nabla_w increment the (mini-batch global) nabla accumulator variables nabla_b and nabla_w (which are initially allocated with zeros). In the original code, new nabla_b and nabla_w are created for every instance; that is grossly inefficient. Change it so that the per-instance delta_nabla_b and delta_nabla_w (both) are added to the respective nabla accumulators directly (inside the loop).

- (F) Further edit the function update_mini_batch() so that, similar to change (E) above, the lines self.weights = ... and self.biases = ... are rewritten so that self.weights and self.biases are modified/updated DIRECTLY, WITHOUT CREATING A NEW LIST every time. (A combined sketch of (E) and (F) is shown after this list.)
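Sketch for (A): a minimal standalone version of the evaluate() logic, assuming (as in the iris setup here) that every (x, y) pair carries y as a one-hot column vector and that feedforward(x) returns the output activations as a column vector. The function name, parameters, and data layout are assumptions; adapt them to the actual method in NN578_network.ipynb.

    import numpy as np

    def evaluate(feedforward, data):
        # Sketch only: `feedforward` maps an input column vector to the output
        # activations; each y in `data` is assumed one-hot (a column vector).
        outputs = [(feedforward(x), y) for (x, y) in data]
        n = len(data)
        count = sum(int(np.argmax(a) == np.argmax(y)) for (a, y) in outputs)
        # MSE: NNDL Ch.1 Eq.(6), 0.5*||y - a||^2 averaged over the data.
        mse = sum(0.5 * np.linalg.norm(a - y) ** 2 for (a, y) in outputs) / n
        # Cross-Entropy: NNDL Ch.3 Eq.(63); nan_to_num guards 0*log(0).
        ce = sum(np.sum(np.nan_to_num(-y * np.log(a) - (1 - y) * np.log(1 - a)))
                 for (a, y) in outputs) / n
        # Log-Likelihood: -ln of the output activation at the true class.
        ll = -sum(np.log(a[np.argmax(y), 0]) for (a, y) in outputs) / n
        return {'Count': count, 'Accuracy': count / n,
                'MSE': mse, 'CE': ce, 'LL': ll}

Note that all five values are scalars, and the three losses are averaged over the data, as required.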
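Sketch for (B): the printing can be isolated in a small helper. The name format_epoch and the dictionary-based records are assumptions (they match the dictionary returned in (A)), and the field widths are eyeballed from the sample output above.

    def format_epoch(epoch, train, valid=None):
        # Hypothetical helper: one epoch line, plus an optional Valid line,
        # with exactly 4 decimal digits per metric.
        def fmt(tag, r):
            return (f"{tag}: Count={r['Count']:3d}, Accuracy={r['Accuracy']:.4f}, "
                    f"MSE={r['MSE']:.4f}, CE={r['CE']:.4f}, LL={r['LL']:.4f}")
        line = f"[Epoch {epoch}] " + fmt("Train", train)
        if valid is not None:
            line += "\n" + " " * 10 + fmt("Valid", valid)  # align under "Train"
        return line

Inside SGD() you would then append each epoch's dictionaries to a training list and (when test_data is given) a test list, print the formatted line, and finally return [list_of_training_results, list_of_test_results].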
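Sketch for (C): one reading of the stopping rule, as a standalone helper (the name should_stop and the patience parameter are assumptions). It stops when accuracy reaches 1.0, or when the latest MSE is greater than or equal to the maximum of the (up to three) preceding MSEs.

    def should_stop(mse_history, latest_accuracy, patience=3):
        # Hypothetical helper: `mse_history` holds one MSE per completed epoch.
        if latest_accuracy == 1.0:
            return True
        if len(mse_history) < 2:
            return False                               # nothing to compare against
        preceding = mse_history[-(patience + 1):-1]    # up to 3 preceding MSEs
        return mse_history[-1] >= max(preceding)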
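Sketch for (D): the preallocation amounts to building the list of zeroed column vectors from the layer sizes up front, then assigning into the slots during forward propagation instead of appending. This toy example assumes the [4, 20, 3] network and the book's sigmoid forward pass; the random parameters and input are stand-ins.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    sizes = [4, 20, 3]
    rng = np.random.default_rng(0)
    biases = [rng.standard_normal((n, 1)) for n in sizes[1:]]
    weights = [rng.standard_normal((n, m)) for m, n in zip(sizes[:-1], sizes[1:])]

    # Preallocate one zeroed column vector per layer: shapes (4,1), (20,1), (3,1).
    activations = [np.zeros((n, 1)) for n in sizes]

    x = rng.standard_normal((sizes[0], 1))    # stand-in input instance
    activations[0][:] = x                     # copy into the slot, not append
    for l, (b, w) in enumerate(zip(biases, weights), start=1):
        activations[l][:] = sigmoid(w @ activations[l - 1] + b)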
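Sketch for (E) and (F): both changes are the same NumPy pattern, updating the existing arrays in place (+=, -=) rather than rebuilding the Python lists. The shapes and the stand-in gradients below are made up; in your code, delta_nabla_b and delta_nabla_w come from self.backprop(x, y), once per instance inside the mini-batch loop.

    import numpy as np

    sizes = [4, 20, 3]
    biases = [np.zeros((n, 1)) for n in sizes[1:]]
    weights = [np.zeros((n, m)) for m, n in zip(sizes[:-1], sizes[1:])]
    nabla_b = [np.zeros(b.shape) for b in biases]
    nabla_w = [np.zeros(w.shape) for w in weights]

    # (E) Accumulate per-instance gradients into the existing arrays in place.
    delta_nabla_b = [np.ones(b.shape) for b in biases]   # stand-in for backprop output
    delta_nabla_w = [np.ones(w.shape) for w in weights]
    for nb, dnb in zip(nabla_b, delta_nabla_b):
        nb += dnb        # in-place add: no new list, no new array
    for nw, dnw in zip(nabla_w, delta_nabla_w):
        nw += dnw

    # (F) Update the parameters in place as well, without creating new lists.
    eta, m = 0.1, 10     # made-up learning rate and mini-batch size
    for b, nb in zip(biases, nabla_b):
        b -= (eta / m) * nb
    for w, nw in zip(weights, nabla_w):
        w -= (eta / m) * nw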
Test application code
Test your modified network code with the next test application notebook: 578hw3_Check2_CoLab.ipynb (and its html file). Execution should succeed, and you should see the same output shown in those files. Note that this test application code also tests a deeper network, "iris4-20-7-3.dat". Your code should NOT die!
Again, if you are using your local machine/PC to write code, remove the few Colab-specific cells at the top of the CoLab file yourself.
