Description:

  • Rename the socket1.py program from our textbook to URL_reader.py.
  • Modify the URL_reader.py program to use urllib instead of a socket. The code should still process the incoming data in chunks, and the size of each chunk should be 512 characters. The idea is to allow for the processing of very large files, which may be too large to fit into working memory. You cannot use a chunk (buffer) that grows beyond the 512-character limit! Do not use the value 512 as a magic number; establish it as a named constant instead.
  • Add code that prompts the user for the URL so it can read any web page.
  • Add error checking using try and except to handle the condition where the user enters an improperly formatted or non-existent URL.
  • Count the number of characters received (read), and stop displaying any text after the program has shown exactly 3000 characters. Space, tab, and newline characters are characters too, and should therefore be included in your count. Since your chunk size will not divide evenly into 3000, you will need to add some logic to ensure that exactly 3000 characters (no more, and no less) are displayed. When you print your blocks, it is okay for them to be separated by a newline character. Do not use magic numbers! The values 3000 and 512 should be established as named constants, and any other values that you use should be derived from these two named constants. The idea is to allow the values 3000 and 512 to be adjusted (to 5000 and 256, for example) while your code still operates correctly.
  • Very important! You are not allowed to calculate manually or programmatically the number of iterations needed to reach your 3000 character print limit. In other words, instead of counting iterations (or divisions), you need to count the number of characters that you have read so far.
  • Continue to retrieve the entire document, count the total number of characters, and display (i.e., print) the total number of characters (i.e., how many characters are in the entire document).
  • Be sure that all constant numeric values are established as named constants, and that they adhere to the Python convention of naming constants.
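
The reading side of these requirements might be sketched as follows. This is a minimal illustration under stated assumptions, not a reference solution: the names `CHUNK_SIZE`, `open_url`, and `iter_chunks` are hypothetical, and the document is assumed to be UTF-8 encoded. Note that `read(CHUNK_SIZE)` returns bytes, so each decoded chunk holds at most `CHUNK_SIZE` characters.

```python
import codecs
import urllib.error
import urllib.request

CHUNK_SIZE = 512  # named constant instead of a magic number


def open_url(url):
    """Return an open response, or None if the URL cannot be opened."""
    try:
        return urllib.request.urlopen(url)
    except (ValueError, urllib.error.URLError) as err:
        # ValueError: improperly formatted URL.
        # URLError (and its subclass HTTPError): unreachable host or missing page.
        print("Could not open URL:", err)
        return None


def iter_chunks(stream, chunk_size=CHUNK_SIZE):
    """Yield decoded text, reading at most chunk_size bytes per call."""
    # Assumption: UTF-8 content. The incremental decoder holds back a
    # multi-byte character that is split across two reads.
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    while True:
        data = stream.read(chunk_size)
        if not data:
            break
        yield decoder.decode(data)
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail
```

A caller could pass the result of `open_url(input("Enter URL: "))` to `iter_chunks` once it has checked that the result is not `None`. Because only one chunk is held at a time, memory use stays bounded regardless of the document size.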

Note: Because you are printing characters one chunk at a time, and you need to stop printing when you reach 3000 characters, there is a point where you will need to print only the portion of a chunk that brings you to exactly the 3000-character print limit. Printing this calculated portion of the last printable chunk is arguably the most challenging aspect of this assignment.
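
One way to handle that final partial chunk is to slice each chunk by the number of characters still allowed, based only on the running character count. The sketch below is illustrative rather than definitive; `PRINT_LIMIT`, `printable_part`, and `process` are hypothetical names, and `chunks` stands for any iterable of already-decoded text chunks.

```python
PRINT_LIMIT = 3000  # maximum number of characters to display
CHUNK_SIZE = 512    # maximum size of each chunk


def printable_part(chunk, shown_so_far, limit=PRINT_LIMIT):
    """Return the leading part of chunk that still fits under limit."""
    remaining = max(0, limit - shown_so_far)
    return chunk[:remaining]


def process(chunks, limit=PRINT_LIMIT):
    """Print up to limit characters; return (shown, total) character counts."""
    shown = 0
    total = 0
    for chunk in chunks:
        total += len(chunk)  # keep counting the whole document
        part = printable_part(chunk, shown, limit)
        if part:
            print(part, end="")
            shown += len(part)
    return shown, total
```

No iteration count is computed anywhere: the slice length is derived from `limit` and the characters shown so far, so changing either constant does not require changing the logic.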

import socket

mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
mysock.connect(('data.pr4e.org', 80))
cmd = 'GET http://data.pr4e.org/romeo.txt HTTP/1.0\r\n\r\n'.encode()
mysock.send(cmd)

while True:
    data = mysock.recv(512)
    if len(data) < 1:
        break
    print(data.decode(), end='')

mysock.close()