Using TCP sockets, you will write a simplified version of an HTTP client and server. The client program will use the HTTP GET method to fetch a web page (stored in a file) from the server and cache it. It will then use conditional GET requests to fetch the file again only if it has been modified.

The HTTP client will perform the following functions:

1. Take a single command-line argument specifying a web URL that contains the hostname and port where the server is running, as well as the name of the file to be fetched, in the appropriate format.

Example: localhost:12000/filename.html

2. If the file is not yet cached, use an HTTP GET operation to fetch the file named in the URL

  • Print out the contents of the file
  • Cache the file

3. If the file is cached, use a Conditional GET operation for the file named in the URL

  • If the server indicates the file has not been modified since it was last downloaded, print a message saying so (no need to print the file contents in this case)
  • Otherwise, indicate that the file has been modified, then print and cache the new contents
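The client steps above need two small parsing jobs: splitting the command-line URL into host, port and path, and splitting the server's reply into status code, headers and body. The sketch below is a minimal, hypothetical version of both. The helper names `parse_url` and `parse_response` are illustrative. It assumes the complete response has already been read from the socket, and it falls back to port 80 when the URL gives none (also an assumption).

```python
def parse_url(url: str) -> tuple[str, int, str]:
    """Split 'host:port/path' into (host, port, path)."""
    if "://" in url:  # tolerate an optional "http://" prefix (assumption)
        url = url.split("://", 1)[1]
    host_port, _, rest = url.partition("/")
    host, colon, port_str = host_port.partition(":")
    port = int(port_str) if colon else 80  # assume port 80 when none is given
    return host, port, "/" + rest


def parse_response(raw: bytes) -> tuple[int, dict[str, str], bytes]:
    """Split a complete raw HTTP response into (status code, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")  # the blank line ends the header
    lines = head.decode("iso-8859-1").split("\r\n")
    status_code = int(lines[0].split(" ", 2)[1])  # "HTTP/1.1 304 Not Modified" -> 304
    headers = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep:  # skip malformed lines instead of failing
            headers[name.strip().lower()] = value.strip()  # names are case-insensitive
    return status_code, headers, body
```

With these helpers, a client could print the headers and then branch on the status code. On 304 it would report "not modified". On 200 it would print the body and store `headers["last-modified"]` in the cache.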

The HTTP server will perform the following functions:

1. Read command-line arguments specifying the IP address and port the server is to listen on, e.g. 127.0.0.1 12000

2. Open a TCP socket on the above address and port, and listen for incoming HTTP GET and Conditional GET requests from one or more HTTP clients

3. In the case of an HTTP GET request:

  • Read the named file and return an HTTP response (200 OK) that includes the Last-Modified header field

4. In the case of an HTTP Conditional GET request:

  • If the file has not been modified since the time given in the If-Modified-Since header, return the appropriate Not Modified response (status code 304)
  • If the file has been modified, return the file contents as in step 3

5. If the named file does not exist, return the appropriate "Not Found" error (status code 404)

6. The server must ignore all header fields in HTTP requests that it does not understand
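Steps 3 to 5 amount to choosing one of three status codes. The sketch below is a minimal, hypothetical `choose_status` helper. It assumes the server has already mapped the request path (e.g. `/filename.html`) to a local filename, for instance by stripping the leading "/". It compares times at whole-second resolution, because HTTP dates carry no fractions of a second.

```python
import calendar
import os
import time
from typing import Optional

HTTP_DATE_FORMAT = "%a, %d %b %Y %H:%M:%S GMT"


def choose_status(filename: str, if_modified_since: Optional[str] = None) -> int:
    """Pick 200, 304 or 404 for a GET or conditional GET of `filename`."""
    if not os.path.isfile(filename):
        return 404  # named file does not exist
    if if_modified_since is None:
        return 200  # plain GET: always send the file
    try:
        since = calendar.timegm(time.strptime(if_modified_since.strip(), HTTP_DATE_FORMAT))
    except ValueError:
        return 200  # unparseable date: safest to send the full file
    # HTTP dates have one-second resolution, so drop the fractional part of mtime
    mtime = int(os.path.getmtime(filename))
    return 304 if mtime <= since else 200
```

Treating an unparseable If-Modified-Since as a plain GET is a design choice here. It keeps the server from reporting an error on headers it cannot interpret.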

Simplifying Assumptions:

  • Only GET and Conditional GET requests need be supported in client and server
  • Only a subset of header fields need to be supported in HTTP Requests and Responses (see Message Format section)
  • However, the client and server must ignore all header fields they do not understand. For example, a "real" web browser sends many more header fields in GET requests than the server is expected to implement. The server MUST ignore these fields and continue processing as if they were not part of the GET request. The server MUST NOT report an error in these cases.
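One way to honour the "ignore unknown headers" rule is to parse every header line but keep only the fields the server actually uses. The sketch below uses a hypothetical `parse_request` helper and a `KNOWN_HEADERS` set that are assumptions, not part of the specification. Browser headers such as User-Agent or Accept-Language are dropped without raising an error.

```python
KNOWN_HEADERS = {"host", "if-modified-since"}  # fields this server understands


def parse_request(raw: str) -> tuple[str, str, dict[str, str]]:
    """Return (method, path, headers), keeping only the supported header fields."""
    head = raw.split("\r\n\r\n", 1)[0]  # everything before the blank line
    request_line, *header_lines = head.split("\r\n")
    method, path, _version = request_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, sep, value = line.partition(":")
        if not sep:
            continue  # malformed line: skip it rather than fail
        name = name.strip().lower()  # header names are case-insensitive
        if name in KNOWN_HEADERS:
            headers[name] = value.strip()
        # Any other field (User-Agent, Accept, Cookie, ...) is silently ignored
    return method, path, headers
```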

Cache Implementation:

  • The cache must be implemented as a file so that it persists across client instantiations
  • The file(s) used to implement the cache must include "cache" in the filename so that they are easy to identify, e.g. cache.txt
  • The program must pass the test cases below across multiple client runs, one run per test case
  • The client program must work if the cache file does not exist (in which case, the implication is that no files have been cached)
    • Note that this covers the case where the cache file existed after previous runs of your program but was deleted before the client was restarted
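A file-backed cache only needs a load step that tolerates a missing file and a save step. The sketch below is one minimal option under stated assumptions. It stores a JSON object keyed by URL in a file named `cache.json` (a hypothetical name; any name containing "cache" satisfies the rule above). It also assumes the cached pages are text, so each body is kept as a string.

```python
import json
import os

CACHE_FILE = "cache.json"  # hypothetical name; it only has to contain "cache"


def load_cache(path: str = CACHE_FILE) -> dict:
    """Return {url: {"last_modified": ..., "body": ...}}, or {} if there is no cache file."""
    if not os.path.exists(path):
        return {}  # no cache file means nothing has been cached yet
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def save_cache(cache: dict, path: str = CACHE_FILE) -> None:
    """Overwrite the cache file so it persists across client runs."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(cache, f)
```

On each run, the client would check whether the URL is a key in `load_cache()`. If it is, it sends a conditional GET with the stored `last_modified`; otherwise it sends a plain GET.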

Test Cases:

Enable Wireshark during all of the following test cases. (One Wireshark .pcap is fine for the test cases in this section, but the .pcap must show the test cases in this order.) Run the client four times, once for each test case.

1. Run the client when the web object is not cached (or no cache exists): Using your HTTP client, fetch the contents of a text-based HTML file named filename.html from your HTTP server using the appropriate URL, e.g. localhost:12000/filename.html. The client must:

  • Print out the contents of the header in the HTTP Request
  • Print out the contents of the header in the HTTP Response (should indicate a "200 OK" response)
  • Print out the file contents (you can print them as-is; no formatting is required)

2. Run the client when the web object is cached but not modified on the server: Using your HTTP client, send a conditional GET request to your HTTP server. The client must:

  • Print out the contents of the header in the HTTP Request
  • Print out the contents of the header in the HTTP Response (should indicate a "304 Not Modified" Response)

3. Run the client when the web object is cached but modified on the server: Using your HTTP client, send a conditional GET request to your HTTP server. The client must:

  • Print out the contents of the header in the HTTP Request
  • Print out the contents of the header in the HTTP Response.
  • Display the new file contents

4. Web object does not exist: Using your HTTP Client, send a GET request for a filename that does not exist. The client must:

  • Print out the contents of the header in the HTTP Request
  • Print out the contents of the header in the HTTP Response

Using a web browser such as Firefox or Chrome (note: Safari may not implement conditional GET as expected), perform the same test cases as above on your server. Enable Wireshark during all of the following test cases; one Wireshark .pcap is fine for the web browser test cases.

1. Enter the URL in the web browser's address bar and press <Return>. The web browser should display the contents of the file downloaded from your server.

2. Re-enter or refresh the URL in the web browser's address bar and press <Return>. The web browser should show the same web page contents as in step 1 (assuming the file has not been modified), and the Wireshark trace should show a Conditional GET and a "Not Modified" response.

3. Modify the file. Re-enter the URL in the web browser's address bar and press <Return>. The web browser should now show the updated web contents.

4. Enter the URL of a file that does not exist; the browser should indicate "Not Found".

Message Format:

HTTP messages are encoded as strings in a specific format defined according to the HTTP specification.

As part of this assignment, your HTTP Client and HTTP Server programs are only expected to handle the following header fields:

HTTP Client GET Request Message:

Your GET Request must include the following:

  • Request line containing method (GET), object (from URL) and version (HTTP/1.1)
  • Host: includes hostname (and port, if specified, separated by ':')
  • Blank line: signifies the end of the header, expressed as "\r\n"

Example:

GET /filename.html HTTP/1.1\r\n
Host: localhost:12000\r\n
\r\n
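The request format above is a few CRLF-terminated lines followed by an empty line. The sketch below uses a hypothetical `build_request` helper that produces it. It takes an optional If-Modified-Since value, so the same function can also produce the conditional GET variant.

```python
from typing import Optional


def build_request(host_port: str, path: str, if_modified_since: Optional[str] = None) -> bytes:
    """Build a GET (or, with if_modified_since, a conditional GET) request header."""
    lines = [f"GET {path} HTTP/1.1", f"Host: {host_port}"]
    if if_modified_since is not None:
        lines.append(f"If-Modified-Since: {if_modified_since}")
    # Each line ends with CRLF, and an empty line (a second CRLF) ends the header
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")
```

For example, `build_request("localhost:12000", "/filename.html")` yields exactly the three-line example above.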

HTTP Server Response to Client GET Request (assuming file exists):

The response from the HTTP Server must include the following:

  • Status line including version (HTTP/1.1), status code (200), and status phrase (OK)
  • Date: header field containing the current date and time in the following format (must be in the UTC/GMT time zone):
    • Example of Date format: Mon, 23 Jan 2017 15:55:47 GMT
  • Last-Modified: header field containing the date and time the file was last modified. Must follow the same format as Date: above
  • Content-Length: length of the body in bytes
  • Content-Type (can be hard-coded): text/html; charset=UTF-8
  • Blank line: signifies the end of the header
  • Body: Contents of requested file
  • Body: Contents of requested file

Example:

HTTP/1.1 200 OK\r\n
Date: Sun, 04 Mar 2018 21:24:58 GMT\r\n
Last-Modified: Fri, 02 Mar 2018 21:06:02 GMT\r\n
Content-Length: 75\r\n
Content-Type: text/html; charset=UTF-8\r\n
\r\n
<html><p>First Line<br />Second Line<br />Third Line<br />COMPLETE<p><html>
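The 200 response can be assembled from the body bytes and two timestamps. The sketch below uses a hypothetical `build_ok_response` helper. It relies on the standard library's `email.utils.formatdate(..., usegmt=True)`, which produces the "Sun, 04 Mar 2018 21:24:58 GMT" format. Content-Length is taken from the encoded body, so it counts bytes rather than characters.

```python
import email.utils


def build_ok_response(body: bytes, mtime: float, now: float) -> bytes:
    """Build a 200 OK response; mtime and now are seconds since the epoch."""
    headers = [
        "HTTP/1.1 200 OK",
        f"Date: {email.utils.formatdate(now, usegmt=True)}",
        # Drop fractional seconds: HTTP dates have one-second resolution
        f"Last-Modified: {email.utils.formatdate(int(mtime), usegmt=True)}",
        f"Content-Length: {len(body)}",  # byte count, not character count
        "Content-Type: text/html; charset=UTF-8",
    ]
    return ("\r\n".join(headers) + "\r\n\r\n").encode("ascii") + body
```

In a server, `mtime` would typically come from `os.path.getmtime(filename)` and `now` from `time.time()`.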

HTTP Client Conditional GET Request Message:

Your Conditional GET request must include the following:

  • Request line containing method (GET), object (from URL) and version (HTTP/1.1)
  • Host: Same as in GET request
  • If-Modified-Since: Echoes back the value of the "Last-Modified" header from the earlier HTTP GET response
  • Blank line: signifies the end of the header

Example:

GET /filename.html HTTP/1.1\r\n
Host: localhost:12000\r\n
If-Modified-Since: Fri, 02 Mar 2018 21:06:02 GMT\r\n
\r\n

HTTP Server Conditional Response Message (Not Modified):

HTTP/1.1 304 Not Modified\r\n
Date: Sun, 04 Mar 2018 21:24:58 GMT\r\n
\r\n

HTTP Server Response when file not found:

HTTP/1.1 404 Not Found\r\n
Date: Sun, 04 Mar 2018 21:24:58 GMT\r\n
Content-Length: 0\r\n
\r\n
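The 304 and 404 responses above differ only in their status line and whether Content-Length is sent. The sketch below shows them as two hypothetical helpers. It uses the same `email.utils.formatdate` call for the Date header, and neither response carries a body.

```python
import email.utils


def build_not_modified(now: float) -> bytes:
    """304 response: status line, Date, blank line, and no body."""
    date = email.utils.formatdate(now, usegmt=True)
    return f"HTTP/1.1 304 Not Modified\r\nDate: {date}\r\n\r\n".encode("ascii")


def build_not_found(now: float) -> bytes:
    """404 response with an explicit empty body."""
    date = email.utils.formatdate(now, usegmt=True)
    return f"HTTP/1.1 404 Not Found\r\nDate: {date}\r\nContent-Length: 0\r\n\r\n".encode("ascii")
```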

Get current time in UTC/GMT time zone and convert to string in HTTP format:

import time
t = time.gmtime()  # current time as a UTC time tuple
# Hard-code "GMT": %Z would give e.g. "UTC" or a local zone name instead
date = time.strftime("%a, %d %b %Y %H:%M:%S GMT\r\n", t)

Determining a file's modification time (in seconds since 1 Jan, 1970 on Unix machines)

import os.path
secs = os.path.getmtime(filename)

Convert above time to UTC/GMT (returns a time tuple):

import time
t = time.gmtime(secs)

Convert above time tuple to a string in HTTP format:

last_mod_time = time.strftime("%a, %d %b %Y %H:%M:%S GMT\r\n", t)

Convert a date/time in string format back to a time tuple and seconds since 1 Jan, 1970:

import calendar, time
t = time.strptime(last_mod_time, "%a, %d %b %Y %H:%M:%S %Z\r\n")
secs = calendar.timegm(t)  # treats t as UTC; time.mktime() would wrongly assume local time
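The standard library's `email.utils` module offers an alternative to the `time`-module calls above. Its `formatdate` and `parsedate_to_datetime` functions handle this date format with fixed English day and month names, so they do not depend on the current locale. The sketch below wraps them in three hypothetical helpers: `http_date`, `parse_http_date` and `last_modified`.

```python
import email.utils
import os


def http_date(secs: float) -> str:
    """Format seconds since the epoch as e.g. 'Fri, 02 Mar 2018 21:06:02 GMT'."""
    return email.utils.formatdate(secs, usegmt=True)


def parse_http_date(value: str) -> int:
    """Convert an HTTP date string back to whole seconds since the epoch (UTC)."""
    return int(email.utils.parsedate_to_datetime(value.strip()).timestamp())


def last_modified(filename: str) -> str:
    """Last-Modified value for a file, truncated to whole seconds."""
    return http_date(int(os.path.getmtime(filename)))
```

Because both directions use whole seconds, `parse_http_date(last_modified(f))` can be compared directly with `int(os.path.getmtime(f))` when deciding between 200 and 304.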