Description

This project will be about building a hash table. You will read words from a given file into your program. Each word consists strictly of the letters A-Z and a-z. Any whitespace, punctuation, digit, etc. will separate words. An example is "this!is A&Word." This text would represent the 4 words this, is, a, and word. Each word should be converted to lowercase so that only lowercase letters remain. As you read the words, you should insert each word into a hash table. The word itself is the key value, and each entry in the hash table also needs to store a list of the line numbers where the word appears. When inserting a word that already appears in the hash table, simply add the current line number to the list for the word. You will also count the number of collisions that occur while inserting words into the hash table. Every probe to a cell that contains a word other than the one being inserted or searched should count as an additional collision, so that inserting a word may have multiple collisions. Probes to empty cells or cells already containing the word do not count as collisions.

Once you have read all of the words, you should output the total number of words read, the number of distinct words, and the total number of collisions that occurred in the hash table.

Finally, you will read words from the query file and report the line numbers where the word occurs, and the number of collisions while searching for the word in the table.

Requirements

Please carefully read the following requirements:

  • There will be four command line arguments to your program.
    • The name of the input file
    • The name of the query file
    • The size of the hash table
    • The collision resolution strategy, Ip for linear probing, qp for quadratic probing, and dh for double hashing. If the strategy is double hashing, the double hash function will be of the form h2(x) = a - (x % a), and the integer parameter a will be the fifth command line argument.
  • You must format your output as shown in the examples below.
  • You must use the "djb2" hash function. (http://www.cse.yorku.ca/moz/hash.html)
  • h1(key) = djb2(key) % size
  • h2(key) = a - (djb2(key) % a)
  • You must define and use a class named HashTable.

Examples

$ cat sample1.txt
Cryptography is both the practice and study of the techniques used to communicate and/or store information or data privately and securely, without being intercepted by third parties. This can include processes such as encryption, hashing, and steganography. Until the modern era, cryptography almost exclusively referred to encryption, but now cryptography is a broad field with applications in many critical areas of our lives.

$cat q1.txt
cryptography
of
lives
hash
$ g++ -std=C++11 p3.cpp
$./a.out sample1.txt 91.txt 79 lp
The number of words found in the file was 64
The number of unique words found in the file was 52
The number of collisions was 30
cryptography appears on lines (1,5,6)
The search had 0 collisions
of appears on lines (1,7)
The search had 0 collisions
lives appears on lines [7]
The search had 1 collisions
hash appears on lines[]
The search had 2 collisions
$./a.out sample1.txt q1.txt 79 qp
The number of words found in the file was 64
The number of unique words found in the file was 52
The number of collisions was 26
cryptography appears on lines (1,5,6]
The search had 0 collisions
of appears on lines (1,7)
The search had 0 collisions
lives appears on lines [7]
The search had 1 collisions
Academic Honesty!
It is not our intention to break the school's academic policy. Posted solutions are meant to be used as a reference and should not be submitted as is. We are not held liable for any misuse of the solutions. Please see the frequently asked questions page for further questions and inquiries.
Kindly complete the form. Please provide a valid email address and we will get back to you within 24 hours. Payment is through PayPal, Buy me a Coffee or Cryptocurrency. We are a nonprofit organization however we need funds to keep this organization operating and to be able to complete our research and development projects.