Introduction

Servers and software often keep logs, which record events as they happen. Many kinds of events can be logged: access, errors, updates, and so on. As an IT professional, you may need to review some of these logs to help solve a problem. An access log tracks every request made to the server. Imagine the number of entries logged during a DDoS attack. With that much data to sort through, human eyes alone will not suffice.

In this project you have been provided with an access log for an Apache server. You will write a program that examines its contents and presents the following information:

  • what is the most commonly accessed resource?
    • how many times was it accessed?
    • who (what IP or domain) requested it the most?
    • how many requests were made by that top requester?
  • who (what IP or domain) made the most requests?
    • how many requests were made?
    • what was their most requested resource?
    • how many times was this resource requested?

See the instructions for more details.

Instructions

Write a program that meets the following requirements:

  • the overall design and implementation of this project is up to you, except for the few details that follow.
  • your program must print
    • the most commonly accessed resource, how many times it was accessed, who accessed it the most
    • the most common requester, how many requests were made, their most requested resource, and how many times it was requested
  • make use of dictionaries - you must use at least one.
    • Use a dictionary as a counter - to count requests per unique resource or requester (like the word counting example)
    • Use a dictionary to map requesters to resources, and resources to requesters. For example, the key could be a resource and the value a list of requesters
  • your program must use at least 1 regular expression to access the data in the file
  • your program should be organized with functions
  • you are welcome to use lists and tuples, but the file data itself must be accessed with regular expressions
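
The dictionary requirements above can be sketched as follows. This is a minimal illustration under stated assumptions, not the required design: `count_requests` and `top_entry` are hypothetical names, and the (requester, resource) pairs are made up for the example rather than taken from the log. Nested dictionaries hold the per-pair counts here, where the text suggests a list of requesters; either would work.

```python
def count_requests(pairs):
    """Tally requests per requester, per resource, and per pairing of the two."""
    by_requester = {}            # requester -> total requests
    by_resource = {}             # resource -> total requests
    resource_to_requesters = {}  # resource -> {requester: count}
    requester_to_resources = {}  # requester -> {resource: count}
    for requester, resource in pairs:
        by_requester[requester] = by_requester.get(requester, 0) + 1
        by_resource[resource] = by_resource.get(resource, 0) + 1
        inner = resource_to_requesters.setdefault(resource, {})
        inner[requester] = inner.get(requester, 0) + 1
        inner = requester_to_resources.setdefault(requester, {})
        inner[resource] = inner.get(resource, 0) + 1
    return by_requester, by_resource, resource_to_requesters, requester_to_resources


def top_entry(counts):
    """Return the (key, count) pair with the highest count."""
    return max(counts.items(), key=lambda item: item[1])


# Made-up pairs for illustration only; real pairs would come from the log.
pairs = [
    ("10.0.0.1", "/index.html"),
    ("10.0.0.2", "/index.html"),
    ("10.0.0.1", "/index.html"),
    ("10.0.0.1", "/robots.txt"),
]
by_requester, by_resource, res_to_req, req_to_res = count_requests(pairs)
top_resource, hits = top_entry(by_resource)
top_fan, fan_hits = top_entry(res_to_req[top_resource])
print(top_resource, hits, top_fan, fan_hits)  # /index.html 3 10.0.0.1 2
```

When two keys tie for the highest count, `max` returns whichever it encounters first, so a report may want to mention ties explicitly.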

Write a report in a Word document or PDF that details the algorithms behind your code, analyzes your results, and answers the following questions:

  • how did you identify the key parts of the file?
  • how did you figure out who requested the most?
  • how did you find out which resource was most requested?
  • what is the most commonly accessed resource?
    • how many times was it accessed?
    • who (what IP or domain) requested it the most?
    • how many requests were made by that top requester?
  • who (what IP or domain) made the most requests?
    • how many requests were made?
    • what was their most requested resource?
    • how many times was this resource requested?
  • This should be clearly formatted - imagine your boss asked you to answer these questions and you needed to present them the results in a formal way.

access_log sample

64.242.88.10 - - [07/Mar/2004:16:05:49 -0800] "GET /twiki/bin/edit/Main/Double_bounce_sender?topicparent=Main.ConfigurationVariables HTTP/1.1" 401 12846
64.242.88.10 - - [07/Mar/2004:16:06:51 -0800] "GET /twiki/bin/rdiff/TWiki/NewUserTemplate?rev1=1.3&rev2=1.2 HTTP/1.1" 200 4523
64.242.88.10 - - [07/Mar/2004:16:10:02 -0800] "GET /mailman/listinfo/hsdivision HTTP/1.1" 200 6291
64.242.88.10 - - [07/Mar/2004:16:11:58 -0800] "GET /twiki/bin/view/TWiki/WikiSyntax HTTP/1.1" 200 7352
64.242.88.10 - - [07/Mar/2004:16:20:55 -0800] "GET /twiki/bin/view/Main/DCCAndPostFix HTTP/1.1" 200 5253
64.242.88.10 - - [07/Mar/2004:16:23:12 -0800] "GET /twiki/bin/oops/TWiki/AppendixFileSystem?template=oopsmore&param1=1.12&param2=1.12 HTTP/1.1" 200 11382
64.242.88.10 - - [07/Mar/2004:16:24:16 -0800] "GET /twiki/bin/view/Main/PeterThoeny HTTP/1.1" 200 4924
64.242.88.10 - - [07/Mar/2004:16:29:16 -0800] "GET /twiki/bin/edit/Main/Header_checks?topicparent=Main.ConfigurationVariables HTTP/1.1" 401 12851
64.242.88.10 - - [07/Mar/2004:16:30:29 -0800] "GET /twiki/bin/attach/Main/OfficeLocations HTTP/1.1" 401 12851
64.242.88.10 - - [07/Mar/2004:16:31:48 -0800] "GET /twiki/bin/view/TWiki/WebTopicEditTemplate HTTP/1.1" 200 3732
64.242.88.10 - - [07/Mar/2004:16:32:50 -0800] "GET /twiki/bin/view/Main/WebChanges HTTP/1.1" 200 40520
64.242.88.10 - - [07/Mar/2004:16:33:53 -0800] "GET /twiki/bin/edit/Main/Smtpd_etrn_restrictions?topicparent=Main.ConfigurationVariables HTTP/1.1" 401 12851
64.242.88.10 - - [07/Mar/2004:16:35:19 -0800] "GET /mailman/listinfo/business HTTP/1.1" 200 6379
64.242.88.10 - - [07/Mar/2004:16:36:22 -0800] "GET /twiki/bin/rdiff/Main/WebIndex?rev1=1.2&rev2=1.1 HTTP/1.1" 200 46373
64.242.88.10 - - [07/Mar/2004:16:37:27 -0800] "GET /twiki/bin/view/TWiki/DontNotify HTTP/1.1" 200 4140
64.242.88.10 - - [07/Mar/2004:16:39:24 -0800] "GET /twiki/bin/view/Main/TokyoOffice HTTP/1.1" 200 3853
64.242.88.10 - - [07/Mar/2004:16:43:54 -0800] "GET /twiki/bin/view/Main/MikeMannix HTTP/1.1" 200 3686
64.242.88.10 - - [07/Mar/2004:16:45:56 -0800] "GET /twiki/bin/attach/Main/PostfixCommands HTTP/1.1" 401 12846
64.242.88.10 - - [07/Mar/2004:16:47:12 -0800] "GET /robots.txt HTTP/1.1" 200 68
64.242.88.10 - - [07/Mar/2004:16:47:46 -0800] "GET /twiki/bin/rdiff/Know/ReadmeFirst?rev1=1.5&rev2=1.4 HTTP/1.1" 200 5724
64.242.88.10 - - [07/Mar/2004:16:49:04 -0800] "GET /twiki/bin/view/Main/TWikiGroups?rev=1.2 HTTP/1.1" 200 5162
64.242.88.10 - - [07/Mar/2004:16:50:54 -0800] "GET /twiki/bin/rdiff/Main/ConfigurationVariables HTTP/1.1" 200 59679
64.242.88.10 - - [07/Mar/2004:16:52:35 -0800] "GET /twiki/bin/edit/Main/Flush_service_name?topicparent=Main.ConfigurationVariables HTTP/1.1"
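
One way to pull the requester and the resource out of lines like those in the sample above is a single compiled regular expression. The sketch below is a minimal illustration, not the required solution: `LOG_PATTERN` and `parse_line` are hypothetical names, and the pattern assumes every line follows the layout shown in the sample.

```python
import re

# Assumed layout: requester, two fields, [timestamp], "METHOD resource PROTOCOL".
# Status and size are not captured, so a line cut off after the request still matches.
LOG_PATTERN = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(\S+) (\S+)[^"]*"')


def parse_line(line):
    """Return (requester, resource) for a matching line, or None otherwise."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    return match.group(1), match.group(3)


sample = ('64.242.88.10 - - [07/Mar/2004:16:47:12 -0800] '
          '"GET /robots.txt HTTP/1.1" 200 68')
print(parse_line(sample))  # ('64.242.88.10', '/robots.txt')
```

The captured resource keeps any query string, so `/page?rev=1.2` and `/page` would be counted separately. Whether to strip the query string before counting is a design choice to explain in the report.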