it2051229 CSci 2002-51 Hashed Data Structures

Hashed Data Structures

Although they are covered in the text to some extent, I would like to elaborate on the hashed data structures. One of the reasons is their heavy usage in cryptography, including the algorithms that run on mobile devices.

Hashing is the transformation of a string of characters into a usually shorter fixed-length value or key that represents the original string. Hashing is used to index and retrieve items in a database because it is faster to find an item using the shorter hashed key than to find it using the original value. That's why it is used in many encryption algorithms.

Each of the data structures we have studied so fararray-based structures, restricted structures (Stacks and Queues), and linked listshave strengths and weaknesses. All of these structures have high densities. Restricted structures are fast, but they do not support access in the key field mode.

The array-based and linked list structures do offer access in the key field mode, but are slow because they use a sequential search to locate a node. Hashing is an alternative search technique (more accurately, a set of techniques) for locating a node in the key field mode. Unlike the Sequential Search algorithm, hashing algorithms are fast. When you want speed, think hashing.

Data structures that use hashing access algorithms are in wide use because of their ability to rapidly locate a node. The corresponding data structures that use them, called hashed data structures, vary in speed. However, when properly implemented, all of them are faster than the structures we have studied thus far.

There is a downside to hashed structures. For some applications, their overhead can be very high. However, there are many applications where the overhead of even the fastest hashed structure approaches zero. With their promise of speed and the possibility of low overhead, hashed data structures should always be considered in the designs.

Hashing access algorithms are a collection of algorithms that share a common characteristic: a given key is used to compute an index or a location into a primary storage area. Since the primary storage area is a group of sequentially numbered storage cells, it is normally implemented as an array.

Sometimes the nodes themselves are stored in the primary storage area at the computed location, and sometimes paths to the nodes are stored there.

Programming Project 2

Use a HashMap to store the frequency counts for all the words in a large text document. When you are done, display the contents of this HashMap. Next, create a set view of the Map and store its contents in an array. Then sort the array based on key value and display it. Finally, sort the array in decreasing order by frequency and display it.

Assume that the document file has no formatting, except indication of end-of-line. The program must support files with multiple lines. The output must list all words and their frequencies as shown in the following example:

...
Word1, 21
Word2, 43
...

The program should allow for selecting a file to be processed. This can be done through a visual interface displaying a file tree; otherwise indicate to the user where the program is looking for the file (an absolute path or a relative path to the directory where the program is running).

A user should not need to look at your code to learn the expected location of the file.

Academic Honesty!

It is not our intention to break the school's academic policy. Posted solutions are meant to be used as a reference and should not be submitted as is. We are not held liable for any misuse of the solutions. Please see the frequently asked questions page for further questions and inquiries.

Kindly complete the form. Please provide a valid email address and we will get back to you within 24 hours. Payment is through PayPal, Buy me a Coffee or Cryptocurrency. We are a nonprofit organization however we need funds to keep this organization operating and to be able to complete our research and development projects.