CSC435 Distributed Systems
Gossip Calculations using UDP

Professor Clark Elliott

NOTE: UNDER CONSTRUCTION—This program will undergo minor adminstration changes for two submissions.

Precisely named submission files:

Gossip.java.txt (Includes GossipLog.txt and GossipPosts.txt as Java comments at the bottom of the source code).
ChecklistGossip.html

Summary:

Implement a fully distributed gossip algorithm for calculating or discovering (a) the average value in the network, (b) the size of the network, (c) the maximum and minimum values in the network and then some other gossip functions. Use UDP and serialized Java Objects exclusively for network communications. Nodes may only gossip with immediate neighbors (by node number). This is a relatively easy assignment to start, so at least partial credit should be within reach of everyone.

GossipStarter.java ←You are free to use this utility code (only!) however you like, if you wish, to get started. I strongly recommend that you TYPE IN THIS CODE if you are not yet fully comfortable with network programming.

Administration:

Submit your two precisely named, required files to D2L before the deadline. No late submissions accepted for credit.
Complete at least two posts (questions, answers, commentary, useful annotated links) on Gossip on the forums. Easy! Include as Java comments at the bottom of your source code.
Use copy and paste to gather sample output from one or more node consoles for your log file showing the console commands you have implemented. Include as Java comments at the bottom of your source code.
In the usual way, we will not grade assignments with more than 30% TII overlap flagging for credit. (Note: under 30% might still indicated plagiarism! Write your own code an dcomments).

Features of the assignment:

There is no central control node. All nodes are equal and run from exactly the same code. Node number is determined by an argument passed to the JVM at run time.
You must implement at least two threads at each node, because you must always be ready to accept console input from a user, and also, in the background, perform network gossip calcuations. That is, when performing the work of a gossip cycle, your acceptance of console input must remain active. You can have as many threads of excution as you like.
Each node should be started in its own terminal window, so you don't go nuts trying to keep the console output straight. (This is how we will run your system.)
Nodes are started from the command line, and stopped by either Delete, or Kill (see below).
Gossip takes place in Gossip Cycles wherein a viral gossip session takes place, changing values throughout the network, then ends. Once a Gossip Cycle ends, a new one can be triggered from the console of any network node.
Create an object to contain local values at your node (e.g., the stored data value, the currently-known minimum and maximum values, the current average, the current average for calculating size, number of cycles...) and another object to send and receive information from other nodes during a gosssip cycle.
One of the more difficult design aspects is that, as is true with all distributed systems, there is no global clock and there is no shared memory space. During a gossip cycle, all nodes might be sending and receiving at the same time, all the time and your algorithm must coordinate everything in a distributed way.
All nodes have a Node ID (0-9), a random, permanent (unless recalculated) local data value (integer, 0-99), a Size value (integer with a default always set to zero at the start of a cycle), an Average Value (changes as this is calculated during a cycle), and a Cycle Number from the beginning of time. Additionally, you can keep many other local values.
We will test the system with up to ten nodes with ID numbers 0-9, assigned by first argument at startup (see below), in any order, with any number of nodes, and any ID numbers. However, you are free to set your system to work with an unlimited number of nodes.
For simplicity, we will test all code on localost only.
The UDP server port numbers used start at 48,100 and are made unique by adding the Node ID number. E.g., for Node7 the server port number will be 48,107. (See the sample code from the Blockchain assignment that gives Java code for this.)
The system is undefined if two nodes are started with the same ID number. This means you can implement whatever behavior you like, including, for example, (a) determining a free Node number using a gossip protocol and using that instead, (b) allowing the system to blow up or act in strange ways, (c) issuing a warning and terminating the process, or (d) any other behavior you prefer. In other words, don't worry about it unless you want to have some bragging rights fun.
The local data bucket at each node is assigned a random integer value from 0-99 at startup.
Nodes may only communicate with neighbor nodes with ID numbers that are one greater or one less. For example, Node7 may only communicate with Node6 and Node8.
Use the viral gossip algorithms from Van Steen and Tanenbaum to calculate network size and average network value. Briefly: For average, nodes calculate the average with each neighbor node repeatedly, replacing their current average (both nodes the same). Example Node7 average is 5, Node8 average is 7, after the swap, both have average 6. For size of network, the trigger node starts with 1, all others start with 0. Calculate the average for THIS value (0s and a single 1). Take the reciprocal (e.g., for average of 0.25, 1/0.25 = 4, so network size is 4).
Networks (and partitioned sub-networks) are formed exclusively of nodes with consecutive node IDs.
The whole system may be split into two or more partitions (unconnected sub-networks) either at startup or when nodes are deleted, based entirely on the currently active node IDs (0-9). For example, active nodes 2,3,4,5 would compose a single network. Active nodes 0,2,3,5,6,8,9 would have four partitions into sub-networks, with each acting independently. If we later added node 7 we would have three sub-networks.
Use only UDP/IP datagrams, clients and servers for communication.
Nodes must both send and recieve datagram (UDP) packets to communicate with one another (and, as above, this must not interfere with console input).
Marshal all data as Java Serializable Objects, using Object Streams. (See ColorServer)
Console commands may be entered at any node, at any time. (See below.)
Nodes are started as follows: "java Gossip 7" which would start the Node7 process.
Nodes can be started and deleted at any time. (Start from the command line, delete from the console).
During any Calculate Cycle, stop gossiping with any neighbor when N messages have been sent. N is set by default to 20. You will need to store the value of N locally.

Console commands to be implemented:

All responses are printed on the ONE console, or ALL the consoles in the Gossip [sub-]network as indicated. For EACH of the following preceed all other output with The Local Node ID. MAKE YOUR OUTPUT SUCCINCT, ONE LINE (or fewer) per piece of information when possible. At the end of the output, print at least a blank line to make reading the console output easy.

t (Tell us all of the commands available [i.e., that YOU have implmeneted] at the console as a list, one command per line with character trigger, and description, like this list.) LOCAL NODE
l (Display the Local Values at this node: Node Value / Anything else you want. Label each value.) Display on ALL NODES.
p (Set up a "ping" function to check whether there is a neighbor node at the port above and the port below. Display the results on the console, including the node numbers. You may find this useful for implementing other methods. Hint: Boolean methods: IsNodeAbove / IsNodeBelow.) LOCAL NODE.
m (Display the minimum value and maximum value currently in the network, along with the Node ID associated with each of them. You must determine these values with gossip. Note that you have to retrieve these values from the [sub-]netowork each time because the network configuration can change at any moment by starting and stopping nodes.) ALL NODES
a (Calculate the average of all the local values in the [sub-]network. Display on ALL NODES, preceded by Local Node ID and Local Node Value).
z (Calculate the current [sub-]network size. Display on ALL NODES.
v (Create new random values throughout the network at each node. Display the old value and the new value on each node.) ALL NODES.
d (Delete the current node. Fully stop the process. Be sure to gracefully close the socket [this is non-trivial; see HostServer], so that we can restart a node later at the same node ID and port number.) Hint: If you are using a worker thread, you might want to implement this graceful full close by connecting to your own listening port to "wake up" your listener so you can actively close it. LOCAL NODE.
k (Kill the entire network—be careful, you may need nodes to hang around for a while to pass along viral gossip with the kill message!) ALL NODES.
y (Display the number of cycles since the beginning of time on every node in the network. Note that this may not be the same on all consoles if two sub-networks have been joined.) ALL NODES.
N (N is an integer. Set the number of gossip messages for the entire [sub-]network that can be sent to the same neighbor during any one cycle. For example: "15" would limit the messages to 15.) Note that it is possible that when sub-networks have been joined, nodes will have different values of N.
Extra command A of your own choosing that you implement (not required).
Extra command B of your own choosing that you implement (not required), etc.

Design considerations:

Although the system ultimately is not overly complicated, it will require some serious design thought ahead of time before you start coding. I strongly recommend that you use pencil and paper, and some blue-sky (creative) thinking before you start writing code. DISTRIBUTED SYSTEMS CODE CAN GET VERY COMPLEX IN A HURRY. We have all sorts of timing and communication issues to deal with. There is no global clock. There is no central shared memory. For this system, there is no "central server" because all nodes are equal.
Yet another challenging design problem is that all these nodes may be gossiping at the same time. You may have gossip going out to one neighbor or another, and at the same time you might have both of those neighbors sending gossip back in to your current node. Everyone is talking all at once!
Because there may be more than one gossip cycle going on at once, you may want to uniquely identify gossip cycles with a unique identifier that is included in every communication data object sent as part of that cycle. For example, if you are calculating an average, you need to make sure that the trade and caculate the average takes place at BOTH nodes as a multi-part transaction. There are several ways you could do this: (a) generate a UUID and insert that into your initial gossip trigger object that is sent, (b) use a two-part identifier: (1) the node ID of the initiator, and (2) a counter at that node, and insert that into the object that is sent.
It is a good idea to build a series of small programs trying out your ideas, then integrate them later in your full program. For example: starting your process with the right port used for UDP communication; trading values with a nearby node; parsing console commands; running a listener (at least one) thread AND a console input thread at the same time; communicating shared values among your various threads of execution; determining what the console input is (including a range of integrers for N); etc.
With UDP you will have to extract the IP address and Port number (reply address) from the datagram packet if you are going to send a reply. When trading values with a neighbor, you must treat the trade as a complete transaction and handle all the coordination yourself.
Your communication data object will likely have a collection of various values and coordination flags in it. For example you probably want to send your current local value, your currently known minimum and maximum along with the node ID associated with each of them, and so on.
You will need the reply address, for example to calculate an average value with neighboring nodes.
You may want to implement the concept of "Trade Values" or "Calculate Neighbor Average" which will require a request and a response from each neighbor. Keep in mind that, e.g., Node7 may be requesting a trade of data with Node8 at the same time Node8 is requesting a trade of data with Node7, AND Node6 is requesting a trade of data with Node7. Each trade / average exchange must complete before you start another one. Do you need to implement a critical section? Note, for the calculation of newtork average, you cannot use a "send only" algorithm, because the total sum of all values must remain constant: if you install the average of the two values at one node, you must also install the average at the other node as well.
If you wish, for coordination you can make use of even and odd nodes. (E.g., only odd nodes initiate trade / average procedures.)
UDP will usually not signal an error if there is no server listening at the address/port you have sent a package to. Does your logic require that you know whether you have upper and lower neighbors? If so, you will probably need to write a "ping" function: Node N sends a "ping" to neighbors node N+1 (and N-1) and expects to get a response saying "I'm alive." If no response comes back, you can assume there is no node there.
Another challenging problem is in knowing when a gossip cycle is complete, and that it is now time to display data on the console of the local process.
You will have to handle the edge case for Node0 which should not look for Node-minus1).
You may have to be clever about resetting flags to "ready" once a cycle is over and you have displayed information on the console. E.g., for caluclating the size of the network.
When nodes are deleted, the partitioned sub-networks should still operate independently.
When missing nodes are added, a joined network should now operate as a whole. So, you can't simply check once at startup for neighbors above and neighbors below. You have to check every time. Or, if you like you can always assume there are two neighbors (UDP doesn't care if it never arrives), then deal with replies as you get them.
QUESTION: We will test your system by running one cycle at a time. How would you have to design your system if we were to allow many overlapping gossip sessions to take place at the same time? (Hint: instead of a collection of values in a single local object, you might want to consider an array of objects, each with a full set of local values, identified by a particular gossip cycle number [UUID?])
QUESTION: It is trivial to accept another argument from the command line, such as with the secondary server for JokeServer. It is trivial to send UDP packets to a real IP address and port (instead of just the localhost loopback address). But what other (major?) design changes would we have to make for our Gossip Network to run on various nodes anywhere on the Internet?

Grading your system

Compile your code from the command line by issuing "javac Gossip.java" twice. Only after everything is fully running, with no more changes, copy your source code to Gossip.java.txt
Copy your GossipLog.txt file and your two (or more) scholarly postings to the Gossip forum to the bottom of your Gossip.java file, as java comments.
Turn in your single, commented, Gossip.java.txt file, and your single ChecklistGossip.html file.
In Windows we will run, for example, the following bat file (modified for Mac / Unix) (or similar):
```
start java Gossip 4
start java Gossip 2
start java Gossip 5
start java Gossip 6
start java Gossip 1
```
This will give us two sub-networks: 1,2 and 4,5,6 Later we might issue (or similar)...
```
start java Gossip 3
```
..at which point we will have a single network.
Run your system, trying all the functions.
Read your code, including checking for plagiarism.
Read your pedagogical COMMENTS in the program.
Verify all the checklist features listed as "yes".
Read your posts.
Read your log file output.

Suggested development order:

Write a collection of small utility programs, one per sub-directory. This is "throw-away" code, designed only for you to learn the techniques. In general, you will want to print out information on the consoles of the various processes as a result of these untility methods.

Start up multiple processes with different arguments to set the Node ID. Create a script to do this so you can easily create several processes (each in its own terminmal window) at once.
Write the "t" function to display on the console (tell us) the different different functions you will have implemented. For now, use a dummy method for each one, so if we enter "c" for example, the method only tells us that we "Will get a calculate function here."
Write some UDP client and server code to make sure you know how UDP works. Send some data back and forth. The protocol is simple, but you may not have done this before so there is a ramp-up period to get used to it. TYPE IN MY SAMPLE UDP CODE?? (Recommended!)
Next you will need to asynchronously start your listener in a separate thread, then continue in the current thread to listen for commands. (See the JokeServer for the Admin logic, except adapt it to accept console commands rather than admin connections.) Your UDP listener/server will always be active and processing all incoming data.
Implement a "ping" between two processes, initiated by console input, to see which neighbors are present.
Write the random value code to create a new random local data value (0-99) for the local node only (small steps!).
Write the random value code to create a new random local data value (0-99) for ALL nodes. Because this is a "send only" gossip function, it may be easiest to implement this one first.)
Write a MinMax method so that each node keeps an updated minimum network value, and maximum network value. Each node sends the values to each of its neighors repeatedly, until you have decided that the cycle is over. When a node gets a greater value than the max, it is replaced. When it gets a lesser value than the min it is replaced. This also can be implemented as a "send only" gossip function.
Write a "trade" method so that two nodes trade local data values and store the average of the two values locally on each node in the local data object.
Figure out how to determine when a Gossip Cycle is complete and test it. For example, in each cycle you might only be allowed to send 20 messages to any one neighbor, and when this is done for both neighbors, for all nodes, the Gossip Cycle is over.
Write the method to calculate the average value of the network (see trade above).
Write the method to calculate the size of the network.
Write the method to gracefully delete (kill off) the current node. Be sure to close the UDP listener socket first, if you can. (See HostServer which shows how to "reach back" and close an identified socket.) See above re. sending a "wake up so you can die" message to your own listener.
Write the gossip method to kill off ALL the nodes in the network. Be careful: you may need a node to stay alive long enough to pass on the kill message to other nodes.
Etc. Then, start combining your small utility code into the full program you have designed.

Commentary:

Unlike TCP, UDP never establishes a connection. Once you send a datagram packet off to the network, there is no certain way to know it arrives unless you, yourself, arrange for an ack from the destination. What this means, in practice, is that if no one is listening at that destination, you might not know it. (Some of the time you might get a "destination unreachable" ICMP message from the last router in the chain, but you shouldn't count on this.)

You might want to review our discussions of request / reply / acknowledge protocols which might be useful in your design.

CSC435 Distributed Systems Gossip Calculations using UDP Professor Clark Elliott