Introduction

This project focuses specifically on hash tables and should not take much time to implement. You will implement hash table solutions to two common situations (general - unknown data, and specific - known data) with the goal of minimizing the number of key collisions. While Java and other languages all have implementations of some kind of hash table, in this project, you are required to write your own hash table and hash functions.

Assumptions and Rules

The keys that we will use for this project will all be Strings which are "tokens" (described next)

For this project, a "token" will be defined as any sequence of characters (other than the white space characters: tab, space, newline) delimited by one or more white space characters. For example, the following are all tokens:

-123.34
computer
and
result.
*hello
abc)
{}
(
x
x,y

Do NOT maintain duplicate entries of a key in your hash table. In other words, if a token occurs multiple times, only put it in the hash table once.

Keep in mind: the idea behind hash functions is that they are O(1), so do not "hand map keys to indices in order to reduce collisions (i.e. do not manually assign indexes for each key).

What To Do

1 Build a Hash Table

Create a hash table class

public class HashTable< T >

and build a hash table of an "optimal" length that you choose (around 100, but not more than 150). The hash table should be an array of NGen< T > (recall NGEN< T > is the generic linked list node class; very similar to the Node class used for linked lists in Project 3), and should use chaining (as discussed in lecture) to handle collided elements. You can use the provided NGen.java file.

For the general case, do not attempt to build a table that is large enough to contain all the data. Rather, it will be the goal of your hash function to evenly distribute the chains of collided elements over the length of your table.

For the specific hash table, you will want to minimize the length of the table without creating large numbers of collisions - but a few evenly distributed collisions are just fine.

To keep things simple, and since our purpose here is to develop good hash functions, the only public methods required for your Hash class are:

public void add(T item) // adds item to the hash table
public void display() // displays the hash table along with stats about the hash
public static void main() // main method. detailed below

You are free to write the constructor however you see fit. You will also write a main() method to run your general and specific case hash. You can add your own helper methods if you feel the need.

2 Reading Tokens

To make testing of your hash table quick and simple, it is best to start with a way to easily read symbols (or tokens) from a file. The file TextScan.java has been provided for you to use (or modify) in order to read tokens from an arbitrary file. The main() driver method in TextScan. java reads (and displays) all tokens found in the file specified on the command line when running TextScan. Try it on some arbitrary file-for example, itself. You may borrow TextScan.java and modify it to meet the requirements of this project, but if you do, be sure to properly credit the source within your program.

3 Display the Hash Table

It will be necessary to print out the contents of your hash table so that we can tell how well your hash functions are working. Write a method:

public void display()

that will display each location in your hash table along with the number of keys that "hashed" to that location. You should also report the length of the longest chain and the average length of the chains. Additionally, print out the number of unique tokens, number of empty and non-empty indices.

Example output with hash table of length 5 and 6 tokens:

0: 1
1: 0
2: 2
3: 3
4: 0
average collision length: 2
longest chain: 3
unique tokens: 6
empty indices: 2
non-empty indices: 3

Average collision length is calculated as (number of unique tokens / number of non-empty indices) assuming all tokens (without duplicates) are successfully added to the hash table.

4 A General Hash Table

Write several hash functions:

private int hash1(T key)
private int hash2(T key)
private int hash3(T key)
...

that attempt to distribute tokens "evenly across the hash table. Sample files have been provided for you to test your hash table and hash functions. (See proverbs, canterbury, gettysburg, that bad.) You may also use files of your own. Note that the hash table does not always need to grow to handle larger files, but the hash function should distribute the collided values evenly across the hash table.

You are NOT expected to create the greatest hash function. You are required to implement at least three hash functions and explain and compare them in your program comments, showing you tried different approaches to writing the hash function.

Example: For a good hash function, the average collision length should be <= 6 with our provided .txt files. An example of how to calculate average collision length is in Section 3. The length of the hash table should not be longer than 150.

5 A Specific Hash Table

In some cases, the data that will go into the hash table is known in advance. One such example of this is the reserved words for a programming language. In the file keywords.txt are the 50 reserved keywords in the Java programming language. With some effort, it should be possible to map all the keywords (without collision) to a hash table. But, the table ends up having to be very long in order to achieve nearly 0 collisions. Instead, find a hash function that does a good job of minimizing and evenly distributing the few collisions that will occur in a table that is not overly huge. Keep your hash function to yourself, but explain how it works in your program comments.

What is the smallest your hash table can be before the collisions start to collect on a single index? What happens when you size the table very near to the number of keywords? Can you keep the collisions somewhat evenly distributed?

Example: For a good hash function, the longest collision length should be 3 with keywords. txt. Note for the 50 Java reserved words, you will need a table length that is probably well over 150 - even up to 500.

6 Expected Output

You are required to implement and compare at least 3 hash functions. You should try different approaches which should be non-trivially different. You will need to provide some metrics to determine how well your hash functions distribute keys over the table. You will need to provide explanations for how your hash functions work, how they compare with each other, and what hash functions work the best for the general and specific cases.

In the display() method, you will be printing out the length of the longest collision, the average length of collisions in the hash table, and other statistics to help gauge the performance of your hash functions.

Do not expect perfect hash functions, but good ones will have very few unused locations, and most chains of collided elements at about the same length. The primary emphasis will be on your explanations and comparisons of your hash functions.

Academic Honesty!
It is not our intention to break the school's academic policy. Posted solutions are meant to be used as a reference and should not be submitted as is. We are not held liable for any misuse of the solutions. Please see the frequently asked questions page for further questions and inquiries.
Kindly complete the form. Please provide a valid email address and we will get back to you within 24 hours. Payment is through PayPal, Buy me a Coffee or Cryptocurrency. We are a nonprofit organization however we need funds to keep this organization operating and to be able to complete our research and development projects.