Simple Multi-threaded Web Server
Elliott
(Only for background reading this quarter. This is NOT THE CODING ASSIGNMENT.)
Overview:
MyWebServer Checklist
Firefox Browser tools (Quick: Ctrl-Shift-E to raise console. Network
/ Inspector tabs | drag top up for larger console window.)
All MyWebserver programs MUST communicate with the Firefox browser.
In this program you will follow through the steps of capturing the http
stream between existing clients and servers, and write a web server that
supports this same protocol. It builds on the JokeServer, an application that
does much of the same work. While the text of the assignment is quite long,
the application itself is quite straightforward, and you might be surprised
at how easily it can be written.
There are four+ phases in the development process:
- Capture the HTTP protocol first-hand by developing some hacking /
debugging skills (hacking in the good sense).
- Return simple, static files on request from a browser client.
- Return dynamically created HTML (build a directory HTML page
dynamically)
- Accept FORM input from the user and do back-end processing on the
server to return computed values in (simple!) dynamically-created HTML.
- Add features of your own choosing, if you like.
See the MyWebServer Tips file for some
suggestions once you get coding.
Run at port 2540 in the server directory!
In all cases the following specifications take precedence: The web server
must run at port 2540 (http://localhost:2540). It must, by default, serve files
from the directory in which the web server is started, including dog.txt and
cat.html. The source code should be contained in a single,
stand-alone file named MyWebServer.java, ready to compile and
run. Subdirectories should be recursively traversed from the default directory in which
the server is started.
Grading procedure:
- Run our various plagiarism checkers on your submission.
- Extract your zip file into a directory, and run a script file that:
- Executes > javac MyWebServer.java
- Populates the new directory with .txt files, .html
files and .java files such as dog.txt, cat.html, MyWebserver.java and the file addnums.html (with an action
statement that points to port 2540 on localhost), then creates subdirectories and
populates those with .txt files and .html files.
- Executes "> java MyWebServer" to start your webserver at port 2540.
- In firefox read your directory listing for the directory where the
server is running, using port 2540.
- Select checklist-mywebserver.html from your listing and read it.
- Browse the .txt .java (treated like .txt) and .html files with which we have populated your directory.
- Select the addnums.html file and submit data through it.
- Select http-streams.txt and read it.
- Select serverlog.txt and read it.
- Select MyWebserver.java, read your source code, and look at the comments. Note: you should display
.java files the same as .txt files by sending the data as text/plain.
- Navigate to the subdirectories and read .txt, .java and .html files there.
Special Security Note:
I expect that you will find that in its most basic form this is not a particularly difficult
assignment. If so, you will soon have a viable, running webserver of your
own creation. If you are developing on a machine that is also connected to
the Internet this means that you might well expose all of the files on your
local machine (or any remote machine where you might be running) to evil
hackers from around the world who are anxious to steal information from your
files. In the worst case this information would allow them write access to
your disk, and/or put financial/personal information in their hands. So—be careful. Hard-code into your server that you only return files from your
root server directory of unimportant files, keep your firewall on,
etc. Be careful about the "../.." form of URLs, which would allow someone to
retrieve files from above your server's directory. For particularly
sensitive machines you can always simply unplug your Internet
connection
while running your server.
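One way to guard against the "../.." form of URLs is to normalize the requested path and refuse anything that ends up outside the server's root directory. The following is only a minimal sketch under that assumption; the class name SafePath and the method resolve are hypothetical names invented for illustration, not part of the assignment.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: maps a URL path onto the server root and rejects escapes.
class SafePath {
    // Returns the resolved file path, or null if the request would leave the root.
    static Path resolve(Path root, String urlPath) {
        Path base = root.toAbsolutePath().normalize();
        // Strip leading slashes so the URL path is treated as relative to the root.
        String relative = urlPath.replaceFirst("^/+", "");
        Path candidate = base.resolve(relative).normalize();
        return candidate.startsWith(base) ? candidate : null;
    }

    public static void main(String[] args) {
        Path root = Paths.get(".");
        System.out.println(resolve(root, "/dog.txt"));
        System.out.println(resolve(root, "/../../etc/passwd"));
    }
}
```

This sketch does not decode percent-encoded characters, so a real server would want to decode the URL before making this check.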
Server Directories
For this assignment your server must serve files from the directory where
the server is started. Place all of your submission files in this
same directory.
Administration:
- Submission files: MyWebServer.java,
http-streams.txt, serverlog.txt, checklist-mywebserver.html You
MUST use these exact names.
- Copy the checklist
for this programming assignment. Fill in the blanks. Update it as you
make progress. NEVER change no to yes unless you have completed the
work. Turn it in to D2L along with your assignment.
- Zip your files into one flat directory, and submit to D2L (no
subdirectories!). Verify that your submission has not been corrupted.
- Concatenate MyWebServer.java, http-streams.txt, serverlog.txt into a
single text file and submit to MyWebserverTII at D2L
- "javac *.java" must work to compile your source code.
- Make sure that you are familiar with the assignment submission rules
(see assignment one, which covers this in detail). Programs that do not
precisely conform to the rules will not be graded. Please do not ask for an
exception to this policy.
- Your webserver must, by default, serve directories, and files, from the directory in which
it runs so that we can test it. If you also want to implement something more
sophisticated, such as a default webserver directory, then pass a flag as an
argument to your webserver, but keep the default as the current directory.
- Refer to the InetServer PDF
document, and the lecture, along with your JokeServer if you have completed
it, for the basic program on which you build. Most of you will have
completed this assignment, and extended it, well in advance of the
MyWebServer program.
Capturing HTTP:
- Goal: Be hackers in the good sense... See what a Web browser, and a
webserver are saying to one another for simple browser requests, so that you
can later copy that functionality into your own server program.
- Note that you can use WireShark (see the labs) to capture these streams, as an
alternative to the hacking methods that follow. You probably can also
capture the streaming data directly in the Firefox Browser (search on
inspection tools). Any method is valid.
- IF YOU WANT TO DO (PART OF) THIS YOURSELF USING JAVA:
- Use the given
MyListener.java code, based on Inet. Modify, and simplify, the code as
desired so that it runs at port 2540 and on the console it simply displays
everything sent to it, and optionally writes it to a log file as well. If
you want, have it send back a valid text/plain response to the client,
acknowledging receipt of the "request" (but note that this is just some
minor elegance, not really needed).
That is, if some simple client were to send the message, "ABC Hello there in Server land! "
then the server would display the message "ABC Hello there in Server land! "
on the server console, and optionally might send some message such as "Got your request"
back to the client. (If you don't return a message to a browser, the browser will just
hang, but we don't care.)
You now have a simple "listener" program which echoes all input on the server
console.
If you want to be fancy, your MyListener program can, in addition to the
console display, also send all of the information back to the client as
HTML-formatted (or plain text) data. This is not required but could be generally useful as
an echo-server showing the full format of requests. Note that you will have to
send back the correct MIME type for HTML: "Content-Type: text/html [cr/lf]
[cr/lf]" (see below).
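If it helps to see the shape of such a listener, here is a rough sketch. It is NOT the given MyListener.java code; the class name ListenerSketch and the method readRequestHead are made up for illustration, and it handles one connection at a time rather than using threads. It prints each request line on the console and sends back a short text/plain acknowledgement.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a listener: prints whatever a client sends.
class ListenerSketch {
    // Reads lines up to (not including) the blank line that ends the request headers.
    static List<String> readRequestHead(BufferedReader in) {
        List<String> lines = new ArrayList<>();
        try {
            String line;
            while ((line = in.readLine()) != null && !line.isEmpty()) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket server = new ServerSocket(2540)) {
            System.out.println("Listener sketch running at 2540.");
            while (true) {
                try (Socket sock = server.accept()) {
                    BufferedReader in = new BufferedReader(
                        new InputStreamReader(sock.getInputStream(), StandardCharsets.ISO_8859_1));
                    for (String line : readRequestHead(in)) {
                        System.out.println(line);
                    }
                    // Optional acknowledgement so the browser does not hang.
                    byte[] body = "Got your request\r\n".getBytes(StandardCharsets.US_ASCII);
                    String head = "HTTP/1.1 200 OK\r\n"
                        + "Content-Type: text/plain\r\n"
                        + "Content-Length: " + body.length + "\r\n"
                        + "Connection: close\r\n\r\n";
                    OutputStream out = sock.getOutputStream();
                    out.write(head.getBytes(StandardCharsets.US_ASCII));
                    out.write(body);
                    out.flush();
                }
            }
        }
    }
}
```

The Content-Length is computed from the body bytes rather than typed in by hand, which avoids one common source of browser hangs.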
- Start MyListener and connect to it with Firefox as follows:
Make valid webserver
requests of MyListener by entering URLs such as
http://localhost:2540/dog.txt, and
http://localhost:2540/cat.html.
Notice, and record, what the
browser sends your MyListener program in each case (it is displayed on the
server console). This is the HTTP stream that the browser sends when it is
requesting files from a web server. You have now hacked it.
- Capture the console output from MyListener into some file as well (or
simply copy it from the console window and paste into a file), for submission as
part of the assignment.
- For example, following the above procedure, while running my listener
at port 2540, I get the following information for a request of dog.txt in
the root web server directory.
C:\dp\435\java>java MyListener
Clark Elliott's Port listener running at 2540.
GET /dog.txt HTTP/1.1
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, application/x-shockwave-flash, */*
Accept-Language: en-us
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)
Host: localhost:2540
Connection: Keep-Alive
(Note: you may wish to experiment with "Connection: close" with your
webserver if you are having buffering problems.)
- Put this captured output into your http-streams.txt file for submission with the
assignment. Copy and paste from the console is fine. We ONLY need the data
from the http streams you've captured.
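The first line of the captured stream ("GET /dog.txt HTTP/1.1") carries the method, the requested path, and the protocol version, separated by spaces. A small sketch of pulling those three pieces apart follows; the class name RequestLine and the method parse are hypothetical, chosen only for this example.

```java
// Hypothetical helper for picking apart the first line of a captured request.
class RequestLine {
    // Returns {method, path, version}, or null if the line does not have three parts.
    static String[] parse(String line) {
        if (line == null) return null;
        String[] parts = line.trim().split("\\s+");
        return parts.length == 3 ? parts : null;
    }

    public static void main(String[] args) {
        String[] p = parse("GET /dog.txt HTTP/1.1");
        System.out.println(p[0] + " | " + p[1] + " | " + p[2]); // GET | /dog.txt | HTTP/1.1
    }
}
```

The middle element is the piece a web server would map onto a file name in its directory.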
- We are now going to use the HTTP stream we have just captured to
manually retrieve files from a web server. As an example, you can retrieve files from my faculty account at:
condor.depaul.edu/elliott/dog.txt and
condor.depaul.edu/elliott/cat.html (But note: Tech support regularly
moves my directories around. If elliott does not work, try it with a tilde
[~elliott].) Or, if you have a webserver on your PC you can just use that. Or you can
install the
SourceForge Uniform Server and run that, which is a version of the
Apache server that
runs on every unix machine. Or you can start the apache web server that
runs on your Mac (sudo apachectl start?). But in all cases be careful
because you are now serving files from your file system to the network!
- Either use Wireshark, or use/modify a MyTelnetClient.java program by modifying
your InetClient, or JokeClient, so that it allows you to type in an
arbitrary text string, and send this (via port 80) to some webserver. Note
that while telnet is disabled on Windows by default it is still there and
can be activated.
- Use your MyTelnetClient program to manually enter into a dialog with the
condor.depaul.edu (or some other) web server. Write the appropriate input
and output for your MyTelnetClient program to a log file (or copy it from
your console window, or capture it in Wireshark), for later submission to D2L as part of your
http-streams.txt file, but I don't need to see your source code for
this simple program either.
We are working with condor.depaul.edu for convenience because that is where
we put our files. However, we could just as easily manually get files from
the web server at www.cnn.com if our files were on that machine.
You will connect at port 80 instead of the default telnet port of 23,
because you want to talk to the web server, instead of the telnet server.
Do this by entering the shell command,
MyTelnetClient condor.depaul.edu 80 <-- or whichever server you are using
The condor.depaul.edu web server is now waiting for input from you.
You can use the following static files in the step below, or similar files
that you have created on your own webserver:
http://condor.depaul.edu/elliott/cat.html
http://condor.depaul.edu/elliott/dog.txt
- Enter the valid HTTP request stream that you captured using your
listener, for retrieving the file dog.txt from a web server. Note that you
will have to be careful to include all of the necessary information,
including carriage return / linefeeds (cr/lfs), and that you will have to
make changes as needed for different servers. You could probably use copy
and paste if you are clever, but unless you connect many times it is
probably not worth it.
Hint: some of the information, such as "Accept" and "User-Agent", is
not needed by the web server, and you can find what you can leave out through
experimentation.
- If you enter the HTTP correctly the web server will now send your
requested file back to you as a text stream response to your MyTelnetClient program.
If you enter it incorrectly you will still usually get some kind of
valid response, albeit one containing an error message.
- Here is a sample session. Yours will be similar, but may differ in some
of the details, depending on which webserver you are using, on which
machine. (Note: server configurations change, so you may have to vary what
you send to get a response. Follow what your browser sends. My account on
condor moves all the time and you may only get a (valid!) error message.)
> java MyTelnetClient condor.depaul.edu
Clark Elliott's MyTelnet Client, 1.0.
Using server: condor.depaul.edu, Port: 80
Enter text to send to the server, <stop> to end: GET /elliott/dog.txt HTTP/1.1
Enter text to send to the server, <stop> to end: Host: condor.depaul.edu:80
Enter text to send to the server, <stop> to end:
Enter text to send to the server, <stop> to end:
Enter text to send to the server, <stop> to end: stop
HTTP/1.1 200 OK
Date: Wed, 03 Oct 2018 20:40:45 GMT
Server: Apache/2.2.3 (Red Hat)
Last-Modified: Wed, 07 Oct 2015 20:29:55 GMT
ETag: "8a1bfc-30-521899bff76c0"
Accept-Ranges: bytes
Content-Length: 48
Content-Type: text/plain
Connection: close
This is Elliott's dog file on condor. Good job!
- Note: you may get a different response. What we are looking for is SOME
HTTP / HTML response from the webserver. For example, if the file has been
moved somewhere else, you might get back a well-formed error message. This
is fine. In either case you are successfully talking with the webserver.
- Put your captured output into http-streams.txt for submission with the
assignment.
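The manual session above can also be scripted. The sketch below is not the MyTelnetClient program; the class name ManualGet and the method buildRequest are hypothetical. It builds the same two-line request, ends it with the blank line (cr/lf cr/lf), and adds "Connection: close" as an assumption so that the server closes the connection and the read loop ends. As noted above, you may get back a valid error message rather than the file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical scripted version of the manual GET session.
class ManualGet {
    // Builds a minimal request; every header line ends in cr/lf, then one blank line.
    static String buildRequest(String host, String path) {
        return "GET " + path + " HTTP/1.1\r\n"
             + "Host: " + host + "\r\n"
             + "Connection: close\r\n"
             + "\r\n";
    }

    public static void main(String[] args) throws IOException {
        String host = args.length > 0 ? args[0] : "condor.depaul.edu";
        String path = args.length > 1 ? args[1] : "/elliott/dog.txt";
        try (Socket sock = new Socket(host, 80)) {
            OutputStream out = sock.getOutputStream();
            out.write(buildRequest(host, path).getBytes(StandardCharsets.US_ASCII));
            out.flush();
            // Copy the raw response (headers and data) to the console.
            InputStream in = sock.getInputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                System.out.write(buf, 0, n);
            }
            System.out.flush();
        }
    }
}
```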
- [Note: You can use Wireshark, and also the Firefox browser console to see
network traffic. In the past Firefox has allowed you to download and
install a plug-in called HTTPFox (tools -> add-ons -> get add-ons). After
HTTPFox is installed you'll see a small icon in the bottom right corner of
your browser window. With HTTPFox you will be able to see all outgoing
traffic from your web browser, as well as all of the server responses
coming back. (Similar to Fiddler for IE) (Thanks Arkadiusz)]
- So, in summary: Create the simple files dog.txt, cat.html,
in your home web directory (or use my files). Verify that they can be reached from the
web. Retrieve your files manually using MyTelnetClient to port 80, or
WireShark, or HTTPFox and add these to http-streams.txt along with your
MyListener data.
- You have now captured both the request coming from a web client, and the
response coming from a web server. Ta-duh.
MIME headers
For this assignment we will use two mime types: Content-Type: text/plain
and Content-Type: text/html. These must be
followed by two cr/lf and then your data.
MIME types are determined by the server from the file extension of the files
that are requested. .html will use text/html, and .txt and .java files will both
use text/plain. (This is just a trick so we can view your java source code
through your webserver.)
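The extension-to-MIME-type rule described above might be sketched as follows. The class name MimeTypes and the method forFile are hypothetical, and falling back to text/plain for unknown extensions is an assumption of this sketch, not a requirement stated here.

```java
import java.util.Locale;

// Hypothetical helper: picks a Content-Type from the file extension.
class MimeTypes {
    static String forFile(String name) {
        String lower = name.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".html")) return "text/html";
        if (lower.endsWith(".txt") || lower.endsWith(".java")) return "text/plain";
        return "text/plain"; // assumption: treat anything else as plain text
    }

    public static void main(String[] args) {
        System.out.println(forFile("cat.html"));         // text/html
        System.out.println(forFile("dog.txt"));          // text/plain
        System.out.println(forFile("MyWebServer.java")); // text/plain
    }
}
```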
Modify your MultiThreaded server so that it becomes a simple web
server.
Goal: Your web server must correctly return requests for files with
extensions of .txt, and .html [and also .java which are treated as the same
as .txt]. This means that it must return the correct MIME headers (That is,
the Content-type [followed by two cr/lf], and Content-length headers), as
well as the data. This is a server that operates on static data.
- Copy your MyListener.java source into a file called MyWebServer.java.
- Copy over your files dog.txt, cat.html to your local
machine into the directory where you are developing your web server, for later use.
- Using the manual responses you captured from the web server (see above),
which contains ALL of the information that the web server sends back to a
client, including, specifically the MIME type information (Content-Type:) and
Content-Length:, modify your listener so that it becomes a valid web server
by sending back a valid text stream, including headers, to the web client.
See HTTP
protocol for some hints.
- In practice you need not send back all of the responses. You WILL want
to include:
HTTP/1.1 200 OK
Content-Length: 47 [Where 47 is changed to the real length of the data --
but note that you might make initial tests by just setting this value high]
Content-Type: text/plain [Where text/plain might also be: text/html]
[followed by two carriage return / linefeeds (crlf), and then the data.]
Modern browsers handle requests for the mini favicon files (the tiny logo that can appear
in the URL window) in different ways. If your Firefox browser sends a
request for a favicon, you should write code to ignore it. That is, for this assignment we just want
those requests to go away any way we can manage it. If you put a favicon.ico
file in your server's root directory it may solve problems for you. Here is
the WikiPedia article on
favicons
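One possible way to make favicon requests "go away" is to recognize the path and answer with an empty "not found" reply. This is only a sketch of one option; the class name FaviconSketch and both method names are hypothetical, and replying with 404 (rather than, say, silently closing the socket) is an assumption of the example.

```java
// Hypothetical helper for brushing off favicon requests.
class FaviconSketch {
    static boolean isFaviconRequest(String path) {
        return path != null && path.equals("/favicon.ico");
    }

    // A minimal "not found" reply with an empty body, so the browser stops waiting.
    static String notFoundResponse() {
        return "HTTP/1.1 404 Not Found\r\n"
             + "Content-Length: 0\r\n"
             + "Connection: close\r\n"
             + "\r\n";
    }

    public static void main(String[] args) {
        System.out.println(isFaviconRequest("/favicon.ico")); // true
        System.out.println(isFaviconRequest("/dog.txt"));     // false
    }
}
```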
The following end of line hints might be useful:
static final byte[] EOL = {(byte) '\r', (byte) '\n'};
or:
outstream.writeBytes("Content-Type: " + ConType + "\r\n\r\n");
or:
outstream.print("\r\n\r\n");
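Putting the status line, the two headers, and the end-of-line hints together, a response might be assembled roughly as below. The class name ResponseSketch and the method build are hypothetical. Note that the first cr/lf after Content-Type ends that header line, and the second one is the blank line that separates the headers from the data.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: glues the headers and the data into one byte array.
class ResponseSketch {
    static byte[] build(String contentType, byte[] body) {
        String head = "HTTP/1.1 200 OK\r\n"
            + "Content-Length: " + body.length + "\r\n" // real length of the data, in bytes
            + "Content-Type: " + contentType + "\r\n"
            + "\r\n";                                   // blank line ends the headers
        byte[] headBytes = head.getBytes(StandardCharsets.US_ASCII);
        byte[] out = new byte[headBytes.length + body.length];
        System.arraycopy(headBytes, 0, out, 0, headBytes.length);
        System.arraycopy(body, 0, out, headBytes.length, body.length);
        return out;
    }

    public static void main(String[] args) {
        byte[] body = "hello".getBytes(StandardCharsets.US_ASCII);
        System.out.print(new String(build("text/plain", body), StandardCharsets.US_ASCII));
    }
}
```

Counting bytes rather than characters matters once a file contains anything other than plain ASCII.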
- Configure your server so that it sends back the correct MIME type
headers for .txt, and .html files [text/plain, and text/html, respectively].
- Use your MyListener, and the MyTelnetClient tricks, or WireShark, for debugging as needed.
Extend your server to include directories:
Goal: Extend your server so that it sends back dynamically constructed data:
in this case the HTML-formatted current contents of a directory.
This will now be a server that operates on dynamic data.
[Intermediate step: If you are struggling with this assignment, you might
want to first simply create some dynamically created HTML, by sending back
a very simple HTML file with dynamic data in it, such as the current
time. This way you can at least say you have written back dynamic HTML to
the client. Then once you are getting the text/html mime type working with
dynamic data, go on to creating a directory listing.]
- Note: Most webservers no longer allow the promiscuous display of
a directory's contents. But we will provide it from our server as an exercise.
- See the ReadFiles.java
program for hints on how to read the contents of a directory in Java.
[Note: a directory is simply a more-or-less regular file that contains the
names of other files in it, along with some associated information.]
- Modify your webserver so that it correctly returns a promiscuous
display of the server's directory as requested by the client. Note that
you may want to include some security here, since you WILL be writing
a valid, albeit simple, web server. For example, you might want to restrict
access to a certain subdirectory of where the server is running.
- The first step is to simply send back a plain text listing of the
files in the directory, along with a text/plain MIME header, and the
length of your data.
- The second step is to send back some kind of formatted HTML with
a text/html MIME header.
- The third step (really not that hard) is to send back the names of the
files as hot-link references such that "clicking-on" them in the browser
will cause your server to send back the contents of that file.
- Using our MyTelnetClient hack we used to be able to see what a regular
server would send back as an
html listing of hot-links for files. (For security reasons, most servers no
longer give directory listings.) For example, for the condor.depaul.edu
request "GET /elliott/435/.xyz/" we used to get back the following:
[...]
<h1>Index of /elliott/435/.xyz</h1>
<pre><img src="/icons/blank.gif" alt="Icon "> <a href="?C=N;O=D">Name</a> <a href="?C=M;O=A">Last modified</a> <a href="?C=S;O=A">Size</a> <a href="?C=D;O=A">Description</a><hr><img src="/icons/back.gif" alt="[DIR]"> <a href="/elliott/435/">Parent Directory</a> -
<img src="/icons/text.gif" alt="[TXT]"> <a href="dog.txt">dog.txt</a> 16-Sep-2005 14:09 39
<img src="/icons/text.gif" alt="[TXT]"> <a href="cat.html">cat.html</a> 16-Sep-2005 14:09 67
<img src="/icons/text.gif" alt="[TXT]"> <a href="MyWebServer.class">MyWebServer.class</a> 16-Sep-2005 14:09 222
<img src="/icons/folder.gif" alt="[DIR]"> <a href="z-directory/">z-directory/</a> 16-Sep-2005 15:08 -
</pre>
Which displays as:
Index of /elliott/435/.xyz
Name Last modified Size Description
Parent Directory -
dog.txt 16-Sep-2005 14:09 39
cat.html 16-Sep-2005 14:09 67
MyWebServer.class 16-Sep-2005 14:09 222
z-directory/ 16-Sep-2005 15:08 -
We can simplify this as follows:
<pre>
<h1>Index of /elliott/435/.xyz</h1>
<a href="/elliott/435/">Parent Directory</a> <br>
<a href="dog.txt">dog.txt</a> <br>
<a href="cat.html">cat.html </a><br>
<a href="MyWebServer.class">MyWebServer.class</a><br>
<a href="z-directory/">z-directory/</a><br>
Which displays as:
Index of /elliott/435/.xyz
Parent Directory
dog.txt
cat.html
MyWebServer.class
z-directory/
- Lastly, modify the return from your server so that it sends back links to
subdirectories as subdirectory URL hot links, if you have not already done
so. The only hard part is identifying a file as a directory, and typically
you can look for a trailing slash ("/"). For grading we will use the convention that if
the URL ends in a slash ("/") then the server will look for a subdirectory with that
name. Thus, when listing subdirectories, you should send subdirectory hotlinks back to
the web client with trailing slashes in your prepared URL.
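The directory steps above (list the files, wrap them in HTML, make them hot links, add trailing slashes to subdirectories) could be sketched as follows. This is not the ReadFiles.java program; the class name DirListingSketch and its methods are hypothetical. The sketch assumes the URL path passed to render already ends in "/", uses absolute hrefs, leaves out the Parent Directory link, and relies on file names having no spaces or special characters.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: turns a directory into a page of hot links.
class DirListingSketch {
    // Names of the entries in dir; subdirectories get a trailing slash.
    static List<String> entryNames(File dir) {
        List<String> names = new ArrayList<>();
        File[] entries = dir.listFiles();
        if (entries == null) return names; // not a directory, or unreadable
        Arrays.sort(entries);
        for (File f : entries) {
            names.add(f.isDirectory() ? f.getName() + "/" : f.getName());
        }
        return names;
    }

    // Renders the names as links; urlPath is expected to end in "/".
    static String render(String urlPath, List<String> names) {
        StringBuilder html = new StringBuilder();
        html.append("<html><body>\n");
        html.append("<h1>Index of ").append(urlPath).append("</h1>\n");
        for (String name : names) {
            html.append("<a href=\"").append(urlPath).append(name).append("\">")
                .append(name).append("</a><br>\n");
        }
        html.append("</body></html>\n");
        return html.toString();
    }

    public static void main(String[] args) {
        System.out.print(render("/", entryNames(new File("."))));
    }
}
```

Building each href from the full URL path means the links still work if the browser's idea of the current directory differs from the server's.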
- For some browsers, and browser settings, you may have some difficulties
with the directories—e.g., you might have to send your request
twice. We may also have trouble translating between the directory systems of
Unix, Mac, and Windows operating systems. So be sure to show us that your
directory traversal works in your serverlog.txt file.
Also, you might want to experiment with: Connection: Keep-Alive / Connection: close.
- Also, you may want to experiment with the socket.close() method if your
browser is not displaying the data but all else is working.
- You should now have a relatively complete, working, web server, that
can return correct MIME types for different types of files, recurse
subdirectories, and return dynamically-created html. Because it is
multi-threaded it should be able to handle many hundreds of requests. Good
work!
Server-Side scripting and program execution.
Goal: write simple code to run arbitrary program code on the server processing user
input from the web, and send the results back to the web client.
In this section we add back-end programming capability to your server, or at
least simulate it. We create a simple
addnums web form, accept input from a user, pass this to our webserver,
process the information, and return a computed response based on the
input.
For those who are more ambitious you might look into Java's JNI,
which allows us to call native code, by loading it into the virtual machine,
and then running it. In this way we might write programs that actually
run arbitrary scripts/programs under the web server.
Alternatively, for those writing in C, the "system()" function will execute
any executables as subprocesses, making the running of programs and scripts
trivial. Note: be very security conscious of running
user-input shell commands with the "system()" call, because, e.g., they
might have you execute a command to erase all of your files!
Neither method is required. Instead, to keep the programming scope
reasonable, we will only simulate the running of back-end scripts.
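To give a feel for what "simulating" a back-end script might look like, here is a rough sketch that pulls the form fields out of the request and computes a sum. Everything specific in it is an assumption made for illustration: the field names person, num1, and num2, the request path /addnums, and the class and method names are all hypothetical, so check the actual addnums.html form for the real names.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical back-end simulation for an add-two-numbers form.
class AddNumsSketch {
    // Splits "path?key=value&key=value" into a map of decoded fields.
    static Map<String, String> parseQuery(String target) {
        Map<String, String> fields = new LinkedHashMap<>();
        int q = target.indexOf('?');
        if (q < 0 || q == target.length() - 1) return fields;
        for (String pair : target.substring(q + 1).split("&")) {
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            fields.put(decode(key), decode(value));
        }
        return fields;
    }

    static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Builds the (simple!) dynamically created HTML reply.
    static String respond(String target) {
        Map<String, String> f = parseQuery(target);
        try {
            int a = Integer.parseInt(f.getOrDefault("num1", ""));
            int b = Integer.parseInt(f.getOrDefault("num2", ""));
            String who = f.getOrDefault("person", "stranger");
            return "<html><body><p>Dear " + who + ", the sum of " + a + " and " + b
                 + " is " + (a + b) + ".</p></body></html>";
        } catch (NumberFormatException e) {
            return "<html><body><p>Please enter two whole numbers.</p></body></html>";
        }
    }

    public static void main(String[] args) {
        System.out.println(respond("/addnums?person=Elliott&num1=4&num2=5"));
    }
}
```

The sketch echoes the user's name back without escaping it, which is acceptable for a localhost exercise but not for a server exposed to the network.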
CGI (the Common Gateway Interface) has been around since the beginning of
the web, so there are thousands of references on how to use it.
Ta-duh! You have now built a multi-threaded web server that can handle files,
directory traversals, and server-side scripting after getting input from the
user through a web client. Good job!
What you turn in
- Capture the HTTP stream from a client using your MyListener Program or WireShark.
Capture the HTTP stream from a server using your MyTelnetClient program or WireShark.
Concatenate these streams together, adding header comments, or in-line comments as
needed or helpful, about what the file contains, and put it in a file
called http-streams.txt.
- Produce simple "debugging" console output from your webserver showing a
series of connections that have been made, what the request string is
(dumping the first, informative, characters of the GET request is
fine—no editing needed), and the file names that were returned. Rough
output is fine, just showing the general working of your server. We are
particularly interested in you showing that you can traverse
subdirectories which are sometimes problematic for us to grade. Or, you can produce a log file if you
wish with the same information in it. Put the text of this console or file log, along with clearly-delineated explanatory
comments as needed or helpful, in a file named serverlog.txt.
- Put all of your source code into a single file named
MyWebServer.java. Include the standard header comments, and make
sure it compiles and runs at the command line. Your server MUST
serve files from the directory in which it is started.
- Do NOT submit either MyListener.java or MyTelnetClient.java. These were for
your own utility use, and the worker methods might present a conflict with
MyWebServer.java compilation.
- Fill in your checklist-mywebserver.html file
representing what you have done. NEVER change a "no" to a "yes" without
having completed that portion of the assignment! (See the academic integrity
link.)
- Put everything IN ONE DIRECTORY. No subdirectories. Make sure that you
do not have a conflict with the worker methods of MyListener, and MyWebServer.
- Collect the four files (possibly more if you have bragging rights) into a .zip file, and submit to D2L before the
due date.
- Concatenate all your files except the checklist into a single text file
and submit to the D2L TII link for this assignment.
- Good work!
Grading note:
You can assume we will not have any spaces in file names.
We MUST be able to retrieve files from the directory in which your
MyWebserver program is running. That is, consider the case where the files listed below are
together in the indicated directory. You can assume we will put a
trailing slash if we enter a directory name in the address bar of a
browser. Your root directory should display if there is no further
information beyond the port number.
/users/elliott/students/435/Web/
MyWebserver.class
dog.txt
cat.html
/sub-a
/sub-b
cat.html
We should be able to retrieve your files from:
http://localhost:2540/dog.txt
http://localhost:2540/cat.html
http://localhost:2540/sub-a/sub-b/cat.html
and
http://localhost:2540/ or...
http://localhost:2540
should show us:
addnums.html
checklist-mywebserver.html
dog.txt
http-streams.txt
cat.html
serverlog.txt
sub-a/
MyWebServer.class
MyWebServer.java
...or at least something similar.
As per the grading specifications above, we should be able to retrieve all
your files through your webserver from this kind of directory listing.
Bragging rights (not required):
- Store the MIME types in a table of MIME types and file
extensions. Read the table in when the server starts, and also again, while
the server is running, if a file extension is not recognized. This way,
adding a new MIME type is as simple as adding an entry in your table, and
putting files with that extension in your directory. Be SURE that your
MimeTypes file is included in your submission and note this in your
comments.html file and at the top of your MyWebServer.java file.
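A table-driven version might look roughly like this. The table format (one "extension type" pair per line, with # for comments) is an assumption of this sketch, as are the class name MimeTableSketch and its methods; falling back to text/plain for an unknown extension is also just a choice made here.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical MIME table: extension -> Content-Type.
class MimeTableSketch {
    // Each line: "<extension> <mime type>"; blank lines and lines starting with # are skipped.
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> table = new HashMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) continue;
            String[] parts = line.split("\\s+");
            if (parts.length >= 2) {
                table.put(parts[0].toLowerCase(Locale.ROOT), parts[1]);
            }
        }
        return table;
    }

    // Looks up the type for a file name; unknown extensions fall back to text/plain.
    static String lookup(Map<String, String> table, String fileName) {
        int dot = fileName.lastIndexOf('.');
        String ext = dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return table.getOrDefault(ext, "text/plain");
    }

    public static void main(String[] args) {
        Map<String, String> table = parse(Arrays.asList("# ext type", "html text/html", "txt text/plain"));
        System.out.println(lookup(table, "cat.html")); // text/html
    }
}
```

The lines could be read from your table file with Files.readAllLines when the server starts, and read again when a lookup misses.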
- Bragging rights: HTTP has components for storing cookies on the client
through the browser. Implement this, and write a small application that
shows this interaction with your server such that the cookie is sent back to
the server by the browser on a later invocation.
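For the cookie item, the two halves of the exchange are a Set-Cookie header in the server's response and a Cookie header in the browser's later requests. A minimal sketch of producing the first and reading the second follows; the class name CookieSketch, the method names, and the cookie name "visits" are hypothetical, and cookie attributes such as Path or Max-Age are left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for the two cookie headers.
class CookieSketch {
    // Header line the server adds to its response.
    static String setCookieHeader(String name, String value) {
        return "Set-Cookie: " + name + "=" + value + "\r\n";
    }

    // Parses a request header line such as "Cookie: visits=3; user=elliott".
    static Map<String, String> parseCookieHeader(String headerLine) {
        Map<String, String> cookies = new LinkedHashMap<>();
        int colon = headerLine.indexOf(':');
        if (colon < 0) return cookies;
        if (!headerLine.substring(0, colon).trim().equalsIgnoreCase("Cookie")) return cookies;
        for (String pair : headerLine.substring(colon + 1).split(";")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                cookies.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
            }
        }
        return cookies;
    }

    public static void main(String[] args) {
        System.out.print(setCookieHeader("visits", "4"));
        System.out.println(parseCookieHeader("Cookie: visits=3; user=elliott"));
    }
}
```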
- Bragging rights: implement a security policy for your server. This can
become major bragging rights, depending on how far you go.
- Major bragging rights: Implement all of the above using HTTPS as well
as HTTP. (Note: this is hard.)
- Major bragging rights (not recommended): Implement true, if limited,
CGI capability by spawning subprocesses to execute back-end programs in real
scripting languages. But note: this is actually quite simple if you write
your webserver in a native language like C, or Perl, which supports the
direct spawning of shell processes on the local machine.
Side note: Unix (Apache) servers usually serve files from
USERACCOUNT/public_html. For example, if I put dog.txt on this unix/Apache machine
as /condor/cscfclt/elliott/public_html/dog.txt we would find it on
the web as http://condor.depaul.edu/elliott/dog.txt
or http://condor.depaul.edu/~elliott/dog.txt.