it2051229 CS 323 Hashing

This is a programming assignment to conduct experimentation on hash tables. The program must be written in Oracle Standard Edition compliant Java or ISO/ANSI standard compliant C++.

Part 1 Implement the following linear congruential pseudo-random number generator:

x_n+1 = (a * x_n) mod M

where a = 7⁵ = 16807 and M = 2³¹ - 1 = 2147483647.The seed value X₀ is determined appropriately. Write a function "pseudoRandom()" that returns X_{n + 1} from X_n. Use 64-bit integer type for the variables and constants (long type in Java, long long type in C++). Despite its simplicity, it is known to generate good pseudo-random numbers for many purposes, including this assignment.

The program will have to compute the mean (average) as well as the standard deviation of data values. Let X₁, ..., X_n be data values. The mean (average) of the X_i is u = (E_{1 <= i <= n} X_i) / n. The standard deviation is s = [(E_{1 <= i <= n} (X_i - u)²) / (n - 1)]^1/2. The standard deviation is a fundamental statistical quantity that measures the degree of dispersion of data values. The larger the standard deviation is, the more dispersed the data values are. A basic law of standard deviation tells us that there should be a large concentration of values in the range [s, +s].

In the following, m is the array size, n is the # of key values inserted into the table, and f(k, m) = floor(m(fractional part of kA)) = floor(m(kA mod 1)), A = (sqrt(5) 1)/2 = 0.6180339887498949.

Write a program that performs the following experiments and displays the required statistical data legibly on the screen. Use 64-bit double type for floating-point numbers.

Part 2 This part is experimentation on chaining method. Set the array size m to 1000003, which is a prime number.

a.Using the above random number generator, generate n integer key values starting from the seed value x0 = 98760053, and insert them into the hash table using the hash function:

h(k) = k mod m, where k is the generated key value

Then compute and display the following data:

distribution of bucket sizes in terms of bucket size, # of buckets of this size, (# of buckets of this size)/(total # of buckets), listed in increasing order of bucket size
load factor a = n/m
standard deviation of bucket sizes from load factor (load factor is the mean value in this case)

Repeat this experiment for n = 800000, 1000000, 2000000, 3000000, each of the four starting from the same seed value 98760053.

b.Repeat the experiments in part (a) using the hash function:

h(k) = f(k, m)

Part 3 This part is experimentation on open addressing. A cluster is a maximal contiguous sequence of occupied positions, an empty cluster is a maximal contiguous sequence of unoccupied positions. As discussed in class, linear probing suffers from primary clustering - generation of long clusters. Quadratic probing improves linear probing by eliminating primary clustering but still generates milder secondary clustering. Double hashing improves quadratic probing by eliminating secondary clustering. Experimental results should confirm this.

Set the array size m to 2²⁰ = 1048576.

a.Using the above random number generator, generate n key values starting from the seed value x0 = 98760053, and insert them into the hash table using the linear probing sequence:

h(k, i) = (h'(k) + i) mod m,
h'(k) = f(k, m)

Then compute and display the following data:

load factor a = n/m
average # of probes performed by insertion procedure
distribution of clusters in terms of cluster size, # of clusters of this size, (# of clusters of this size)/(total # of clusters), listed in increasing order of cluster size
total # of clusters
average cluster size
standard deviation of cluster sizes
distribution of empty clusters in term of empty cluster size, # of empty clusters of this size, (# of empty clusters of this size)/(total # of empty clusters), listed in increasing order of empty cluster size
total # of empty clusters
average empty cluster size
standard deviation of empty cluster sizes

Repeat this experiment for n = 500000, 800000, 1000000, 10485761=1048575, each of the four starting from the same seed value 98760053.

b.Repeat the experiments in part (a) using the quadratic probing sequence:

h(k, i) = (h'(k) + i(i+1)/2) mod m,
h'(k) = f(k, m)

c.Repeat the experiments in part (a) using the double-hashing probing sequence:

h(k, i) = (h1(k) + i h2(k)) mod m,
h1(k) = f(k, m),
h2(k) = k mod m if k mod m is odd, (k mod m)+1 if k mod m is even

You should conduct experiments using different seed values. Optionally you might also conduct experiments using other pseudo-random number generators, e.g., ones available in Java/C++ libraries. As far as the elementary statistical data collected in this assignment are concerned, different seed values and random number generators should yield similar experimental results for fairly large values of m and n like those used in this assignment.

Academic Honesty!

It is not our intention to break the school's academic policy. Posted solutions are meant to be used as a reference and should not be submitted as is. We are not held liable for any misuse of the solutions. Please see the frequently asked questions page for further questions and inquiries.

Kindly complete the form. Please provide a valid email address and we will get back to you within 24 hours. Payment is through PayPal, Buy me a Coffee or Cryptocurrency. We are a nonprofit organization however we need funds to keep this organization operating and to be able to complete our research and development projects.