Spring 2024

CSC 578 HW 3: Implementation of Neural Networks


Introduction

Your task in this assignment is to make some modifications to the NNDL book's "network.py" code (in Chapter 1) and write a small application that uses it. The objective of the assignment is to help you strengthen your understanding of the concepts and mathematics of neural networks through implementation.

The amount of code you write for this assignment won't be much. However, working through code written by somebody else teaches you not only the details of that code but also the concepts it implements. It is a good exercise for developing your programming skills as well.

Deliverables:

Submit the following two items. Detailed instructions are given at the end of this page.

  1. Code files
  2. A documentation file

Overview

Develop your code by following these steps:

  • Decide which development environment you will use for this assignment.  Whether it is your local computer or a cloud platform, it must have Jupyter notebook and Python installed.
  • Run the startup code provided (a slightly modified NNDL network code and a test code) as is, to ensure your environment is compatible.
  • Make the required modifications to the network code and test them with the test code provided.
  • Create a new test application of your own according to the specifications.

Part 1: Initial tests of application notebook

Download the network definition code (a Jupyter Notebook file) NN578_network.ipynb (and its html file), the iris dataset: iris-3.csv, the saved network file: iris-423.dat, and the initial test application code: 578hw3_Check1_CoLab.ipynb (and its html file).  Run all cells in the initial application notebook. Execution should succeed, and you should see the output shown in those files.

Note that if you are using your local machine/PC to write code, you need to remove the first few cells of the CoLab file yourself.

Part 2: MODIFICATIONS to be made in the network code

Here, you will extend the network definition code (NN578_network.ipynb) in several ways.

IMPORTANT NOTE: Since a Jupyter notebook creates its own execution environment, whenever you change the file you are importing (i.e., the network file in your case), you must first run all cells in that file (at least those related to the changes), AND THEN restart the kernel (runtime) of the notebook(s) that import it, EVERY TIME.

Modifications to make:

  1. Edit the function evaluate() (which is called after each epoch completes) so that, in addition to the number of correctly classified instances in the test set, it computes the accuracy, the Mean Squared Error (MSE), the Cross-Entropy (CE) and the Log-Likelihood (LL). The function should return those five values (correctcount, accuracy, MSE, CE, LL) in a dictionary whose keys are 'Count', 'Accuracy', 'MSE', 'CE' and 'LL'.

    MSE is described in NNDL Chapter 1, Eq. (6), and cross-entropy in NNDL Chapter 3, initially in Eq. (57) and more precisely in Eq. (63). Log-likelihood is described in NNDL Chapter 3, Eq. (80), but the formula shown in Lecture Note #4 (Optimizations), Slide 21, is easier to implement.
    As a hint, for MSE and cross-entropy, you can look at the two cost classes (QuadraticCost and CrossEntropyCost) in another code file from the book, network2.py (the original version, which you will modify in a later homework).

    NOTE: Each cost function should return a scalar value, NOT an array, and that value must be the average over the data ('test_data' in the code).
    Also remember that NO PRINTING takes place inside the function evaluate().

  2. Edit the SGD() function to make the two modifications described below:
    1. Print the results returned from evaluate() in the format specified below. If 'test_data' is not provided, print only the results for the training data, as in the following example. Be sure to print exactly 4 decimal digits.

      [Epoch 0] Train: Count= 50, Accuracy=0.3333, MSE=0.3324, CE=1.9056, LL=1.0882
      [Epoch 1] Train: Count=100, Accuracy=0.6667, MSE=0.2862, CE=1.6983, LL=0.9408
      [Epoch 2] Train: Count=100, Accuracy=0.6667, MSE=0.2384, CE=1.4883, LL=0.7665
      If 'test_data' is provided, format the display as follows.
      [Epoch 0] Train: Count=36, Accuracy=0.3429, MSE=0.3339, CE=1.9118, LL=1.0294
                Valid: Count=14, Accuracy=0.3111, MSE=0.3353, CE=1.9183, LL=1.0334
      [Epoch 1] Train: Count=32, Accuracy=0.3048, MSE=0.3335, CE=1.9107, LL=1.0930
                Valid: Count=18, Accuracy=0.4000, MSE=0.3294, CE=1.8922, LL=1.0802
      [Epoch 2] Train: Count=32, Accuracy=0.3048, MSE=0.3241, CE=1.8674, LL=1.0823
                Valid: Count=18, Accuracy=0.4000, MSE=0.3196, CE=1.8477, LL=1.0656
      [Epoch 3] Train: Count=69, Accuracy=0.6571, MSE=0.2853, CE=1.6942, LL=0.9433
                Valid: Count=31, Accuracy=0.6889, MSE=0.2844, CE=1.6914, LL=0.9365
      
    2. Collect the performance results returned from evaluate() for all epochs for training_data and test_data into individual lists, and return the two lists in an enclosing list (i.e., a nested list of length 2, [list_of_training_results, list_of_test_results]).  Note that, if test_data was not provided, 'list_of_test_results' should be an empty list [].   

  3. Further edit the function SGD() so that the epoch loop can terminate early. Currently, the loop terminates if the accuracy of the latest epoch becomes 1.0. Extend the condition so that the loop also stops early when the MSE loss is no longer decreasing. For the purpose of this assignment, terminate the loop when the MSE loss of the latest epoch is larger than or equal to the maximum of the MSE losses of the THREE preceding epochs (if they exist). This simulates the concept of "patience" used in optimizer schedules in many DL libraries: the number of epochs with no improvement after which training is stopped.

  4. Edit the function backprop() so that the local variable activations is allocated at the start with a structure that holds the activations of ALL layers in the network. The original code starts with just the input layer (activations = [x]) and appends one layer at a time (activations.append(activation)). Change it so that, for instance, if the network size were [4, 20, 3], you create a list containing three column vectors (NumPy arrays whose shapes are (4,1), (20,1) and (3,1), respectively) initialized to zeros. Then, during forward propagation, the activation values of each layer are copied/assigned into the respective array (overwriting the zeros).

  5. Edit the function update_mini_batch() so that, inside the for-loop (for x, y in mini_batch:), for every instance (x, y), the delta values delta_nabla_b and delta_nabla_w increment the (mini-batch global) nabla accumulator variables nabla_b and nabla_w (which are initially allocated with zeros). In the original code, new nabla_b and nabla_w lists are created for every instance, which is grossly inefficient. Change it so that the per-instance delta_nabla_b and delta_nabla_w are both added to the respective nabla accumulators directly (inside the loop).

  6. Further edit the function update_mini_batch() so that, similar to Modification 5 (E) above, the lines self.weights= and self.biases= update self.weights and self.biases DIRECTLY, WITHOUT CREATING A NEW LIST every time.
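
The metric computations in Modification 1 can be sketched as a standalone helper, shown here outside the Network class. This is a minimal sketch under stated assumptions, not the reference solution: the name evaluate_metrics is hypothetical, and it assumes the network outputs and targets are lists of (k, 1) NumPy column vectors with one-hot targets (convert integer labels first if your test data stores them that way). MSE follows NNDL Eq. (6) (with the 1/2 factor, as in QuadraticCost.fn), CE follows CrossEntropyCost.fn in network2.py, and LL applies NNDL Eq. (80), -ln a^L_y, literally. The lecture-note version of LL may differ, so these values are not guaranteed to reproduce the expected output above.

```python
import numpy as np


def evaluate_metrics(outputs, targets):
    """Hypothetical helper: compute the five evaluate() values for one dataset.

    outputs -- list of (k, 1) arrays, the network's final-layer activations
    targets -- list of (k, 1) one-hot arrays (assumption)
    """
    n = len(outputs)
    correct = 0
    mse = ce = ll = 0.0
    for a, y in zip(outputs, targets):
        label = int(np.argmax(y))
        correct += int(np.argmax(a) == label)
        # Quadratic cost, NNDL Eq. (6): 0.5 * ||y - a||^2 per instance
        mse += 0.5 * np.linalg.norm(a - y) ** 2
        # Cross-entropy, as in network2.CrossEntropyCost.fn
        ce += np.sum(np.nan_to_num(-y * np.log(a) - (1 - y) * np.log(1 - a)))
        # Log-likelihood, NNDL Eq. (80): -ln of the activation at the true class
        ll += -np.log(a[label, 0])
    # Average over the data and return plain Python scalars, not arrays
    return {'Count': correct, 'Accuracy': correct / n,
            'MSE': float(mse / n), 'CE': float(ce / n), 'LL': float(ll / n)}
```

Inside evaluate(), the outputs would come from self.feedforward(x) for each (x, y) in the data, and the function would return the dictionary without printing anything.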
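
The print format in Modification 2 (subpart 1) can be produced with str.format and 4-decimal float fields. A minimal sketch, assuming each result is a dictionary with the keys from Modification 1; the helper names format_metrics and format_epoch are hypothetical. The Valid: line is indented by the width of the "[Epoch N] " prefix so that it lines up under Train:. Count is printed without padding here; the first sample output pads it to three characters (e.g., "Count= 50"), so use a field width such as {Count:3d} if you need that exact layout.

```python
def format_metrics(results):
    """Format one evaluate() dictionary with exactly 4 decimal digits."""
    return ("Count={Count}, Accuracy={Accuracy:.4f}, MSE={MSE:.4f}, "
            "CE={CE:.4f}, LL={LL:.4f}").format(**results)


def format_epoch(epoch, train_results, valid_results=None):
    """Build the Train line and, if validation results exist, the aligned Valid line."""
    prefix = "[Epoch {}] ".format(epoch)
    lines = [prefix + "Train: " + format_metrics(train_results)]
    if valid_results is not None:
        # Indent by the prefix width so "Valid:" lines up under "Train:"
        lines.append(" " * len(prefix) + "Valid: " + format_metrics(valid_results))
    return "\n".join(lines)


# Example using the Epoch 0 values from the sample output above
print(format_epoch(0,
                   {'Count': 36, 'Accuracy': 36 / 105, 'MSE': 0.3339, 'CE': 1.9118, 'LL': 1.0294},
                   {'Count': 14, 'Accuracy': 14 / 45, 'MSE': 0.3353, 'CE': 1.9183, 'LL': 1.0334}))
```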
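
The early-stopping rule in Modification 3 can be isolated in a small predicate that SGD() calls at the end of each epoch. A minimal sketch; should_stop is a hypothetical name, and two interpretation choices are assumptions: the MSE check is skipped until three preceding epochs exist, and the caller chooses which MSE history (training or validation) to pass in.

```python
def should_stop(accuracies, mse_history, patience=3):
    """Hypothetical early-stopping predicate for the SGD() epoch loop.

    accuracies  -- accuracy of each epoch so far (latest last)
    mse_history -- MSE of each epoch so far (latest last)
    patience    -- number of preceding epochs to compare against
    """
    # Original condition: perfect accuracy on the latest epoch
    if accuracies and accuracies[-1] >= 1.0:
        return True
    # Assumption: only apply the MSE test once `patience` preceding epochs exist
    if len(mse_history) <= patience:
        return False
    # Stop if the latest MSE is >= the maximum MSE of the preceding `patience` epochs
    preceding = mse_history[-1 - patience:-1]
    return bool(mse_history[-1] >= max(preceding))
```

In the epoch loop, after appending the latest results, a line such as `if should_stop(accs, mses): break` would end training.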
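
The preallocated activations structure in Modification 4 can be sketched as a forward pass over explicit weight and bias lists. This assumes the network.py conventions (sigmoid activations, weights[l] of shape (sizes[l+1], sizes[l]), biases[l] of shape (sizes[l+1], 1)); forward_preallocated is a hypothetical standalone function, whereas in backprop() the same lines would use self.sizes, self.weights and self.biases. The [:] assignment copies each layer's values into the existing zero array instead of rebinding the list element.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward_preallocated(sizes, weights, biases, x):
    """Hypothetical forward pass mirroring the first half of backprop().

    sizes   -- layer sizes, e.g. [4, 20, 3]
    weights -- weights[l] has shape (sizes[l+1], sizes[l])
    biases  -- biases[l] has shape (sizes[l+1], 1)
    x       -- input column vector of shape (sizes[0], 1)
    """
    # Allocate one zero column vector per layer up front: (4,1), (20,1), (3,1)
    activations = [np.zeros((n, 1)) for n in sizes]
    activations[0][:] = x          # copy the input into layer 0
    zs = []                        # weighted inputs, collected as in the original code
    for l, (w, b) in enumerate(zip(weights, biases)):
        z = np.dot(w, activations[l]) + b
        zs.append(z)
        activations[l + 1][:] = sigmoid(z)   # overwrite the zeros in place
    return activations, zs
```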
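
Modifications 5 and 6 both rely on NumPy's augmented assignment (+=, -=) on an array, which updates the array in place, so nabla_b, nabla_w, self.weights and self.biases keep their original array objects. A minimal sketch; update_in_place is hypothetical, and for a standalone example it takes a precomputed list of (delta_nabla_b, delta_nabla_w) pairs, whereas in update_mini_batch() each pair would come from self.backprop(x, y) inside the for x, y in mini_batch: loop.

```python
import numpy as np


def update_in_place(weights, biases, per_instance_grads, eta):
    """Hypothetical mini-batch update that never builds new lists.

    weights, biases    -- lists of float arrays, modified in place
    per_instance_grads -- list of (delta_nabla_b, delta_nabla_w) pairs
    eta                -- learning rate
    """
    # Mini-batch accumulators, allocated once with zeros
    nabla_b = [np.zeros(b.shape) for b in biases]
    nabla_w = [np.zeros(w.shape) for w in weights]
    for delta_nabla_b, delta_nabla_w in per_instance_grads:
        # Modification 5: add each instance's gradients into the accumulators directly
        for nb, dnb in zip(nabla_b, delta_nabla_b):
            nb += dnb
        for nw, dnw in zip(nabla_w, delta_nabla_w):
            nw += dnw
    m = len(per_instance_grads)
    # Modification 6: update the existing parameter arrays instead of rebuilding the lists
    for w, nw in zip(weights, nabla_w):
        w -= (eta / m) * nw
    for b, nb in zip(biases, nabla_b):
        b -= (eta / m) * nb
```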

Test application code

Test your modified network code with the next test application notebook: 578hw3_Check2_CoLab.ipynb (and its html file). Execution should succeed, and you should see the same output as shown in that file. Note that this test application also tests a deeper network, "iris4-20-7-3.dat". Your code should NOT crash!

Again, if you are using your local machine/PC to write code, remove the first few cells of the CoLab file yourself.

Part 3: Your Own Application Code with Visualization

After passing the tests in the previous two Parts, create your own application code, which should be named "578hw3_VisApp.ipynb". Then do the following in sequence:

# Create a new network
net1 = network_nb.Network([4,7,3])

Note that, since the initial weights are randomly assigned now, you may want to re-train a few times until you see 'interesting' results.

HINT: You can use any library to plot. If you do not have experience plotting charts in Python, it's quite easy to learn. Here are some sites I recommend: (1) simple and good examples, (2) a matplotlib tutorial, (3) a Keras code example.
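
One way to plot the per-epoch results returned by the modified SGD() (the nested list [list_of_training_results, list_of_test_results] from Modification 2) is sketched below with matplotlib. The helper names are hypothetical, and the matplotlib import sits inside plot_metric so that metric_series can be used even where matplotlib is not installed.

```python
def metric_series(results, metric):
    """Extract one metric (e.g. 'MSE') per epoch from a list of evaluate() dictionaries."""
    return [r[metric] for r in results]


def plot_metric(history, metric):
    """Plot training (and, if present, validation) curves for one metric.

    history -- [list_of_training_results, list_of_test_results] as returned by SGD()
    """
    import matplotlib.pyplot as plt  # lazy import: metric_series works without matplotlib

    train_results, test_results = history
    plt.figure()
    plt.plot(metric_series(train_results, metric), label="Train")
    if test_results:  # empty list when SGD() was called without test_data
        plt.plot(metric_series(test_results, metric), label="Valid")
    plt.xlabel("Epoch")
    plt.ylabel(metric)
    plt.title(metric + " per epoch")
    plt.legend()
    plt.show()


# Usage (arguments depend on your modified SGD()):
# history = net1.SGD(...)
# for m in ['Accuracy', 'MSE', 'CE', 'LL']:
#     plot_metric(history, m)
```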


Submission

  1. Code:
    • Your modified NN578_network.ipynb AND its pdf/html version
    • The 578hw3_Check2.ipynb notebook with all outputs shown (saved in the file).
    • Your 578hw3_VisApp.ipynb AND its pdf/html version
    • Make sure you have your name, the course/section number and the assignment name at the top of EACH file.
  2. Report
    • In pdf or docx.  Name the file as you wish.
    • Make sure you have your name, the course/section number and the assignment name at the top of the file.
    • Minimum 2.5 pages (i.e., two full pages plus at least half of a third page), including figures.
    • Write as much as you can to demonstrate to me that you earned the points. I consider terse answers insufficient. Full credit will not be given if information is missing or implied. Create a strong, presentable report.
       
    • Answer the following questions. For questions 1 through 7, start your answer with one of these three indicators: Complete, meaning you wrote the code and verified that it worked; Not attempted, meaning you didn't get there; or Partial, meaning that you have some code but it did not completely work, in which case explain why.
      Note: Do NOT repeat the questions in your answers.
      1. In Part 2, Modification 1 (or A): the function evaluate().  State whether or not the output of your code matched the given output.  If your results were different, describe the discrepancies and speculate where the discrepancies came from.
      2. In Part 2, Modification 2 (or B): the SGD() function, for both subparts (1 and 2, or a and b). State whether or not the output of your code matched the given output. If your results were different, describe the discrepancies and speculate where the discrepancies came from.
      3. In Part 2, Modification 3 (or C): further edit the function SGD().  State whether or not the output of your code matched the given output.  If your results were different, describe the discrepancies and speculate where the discrepancies came from.
      4. In Part 2, Modification 4 (or D): edit the function backprop().  State whether or not you were able to correctly implement it.  If you were unable to, describe the difficulties you had.
      5. In Part 2, Modification 5 (or E): edit the function update_mini_batch().  State whether or not you were able to correctly implement it.  If you were unable to, describe the difficulties you had.
      6. In Part 2, Modification 6 (or F): further edit the function update_mini_batch().  State whether or not you were able to correctly implement it.  If you were unable to, describe the difficulties you had.
      7. In Part 3, the visualization results: Include the plots described above along with your comments/analysis.
      8. Your reaction and reflection on this assignment overall (e.g. difficulty level, challenges you had, future work). Describe in DETAIL.

Note that, regardless of your claims, we will read your code and run it on our end to verify its correctness.

DO NOT ZIP YOUR CODE OR WRITE UP. SUBMIT EACH FILE SEPARATELY.