Executive Summary

You will develop a set of routines for creating and looking up a dictionary. Your code will be graded on correctness, efficiency, and documentation.

Purpose

  • Learn how to manipulate strings and files
  • Implement algorithms from high-level descriptions
  • Learn to document to best practices

Deliverables

You will be provided with a header file main.h, a main function file main.c, a file assignment-5.c defining the API (application program interface) for your code, and three input files: small-dict.txt,medium-dict.txt,large-dict.txt. File assignment-5.c contains stub functions with short descriptions.

The main.c file is used for testing your design.You can make changes on test cases to make sure that your program works correctly for each possible input. Your program will be tested with a different main.c file, which will include different test cases.

Your will deliver a complete program implementing the functionalities described below. All of your functionality is in the form of functions that take parameters and return values, i.e., there is no terminal or file I/O.

Sort Dictionary

Given an input file containing a long string with words on every new line, create a function that puts these words in an array of strings and sorts the array. The function to read the string and puts the words in an array is already provided to you in main.c.

The sort order should be dictionary order, i.e., the ordering given by strcmp. The array of strings will be a global 2D array, whose size we select - you do not have to worry about the array being too short or the words being too long.

Example input: “datenapplenelderberryncantaloupenblackberryn”

Lookup

Write a lookup function that checks to see if a word is in the dictionary. Implement via binary search to find if a string is present in the dictionary. If it is present, return 1, otherwise 0.

Range Search

Write a lookup function that determines where in the dictionary a given word would lie.

For example if the dictionary consists of

apple blackberry cantaloupe date elderberry

then a range search for batuan should return (0,1), for cantaloupe should return (1,3), for apple should return (-1,1), for elderberry should return (3,5), for ackee should return (INT_MIN, 0), for fig should return (4,INT_MAX).

Since a function can return only one int, and we have not covered structures, you will return the range by setting two caller integer variables which you are passed pointers to.

Prefix Lookup

Given a string prefix, return strings that starts with that prefix.

Your function will set these strings in a global array of 10 strings, and return the number of strings set. The array should be in sorted order.

For example, for the dictionary above, if prefix is “a”, you should set the first string in the global array to “apple” and return 1.

If there was more than one match, you should return all of them up to 10 matches. E.g., if the dictionary included apricot, and prefix is “ap” return 2, and set the global array’s first and second elements to “apple” and “apricot”.

Approximate lookup

Given a string w, return 10 words in the dictionary that are closest to w and the smallest edit distance found. The distance between strings, D(s,t) is defined below:

  • D(“”,s) = |s|
  • D(c1.s,c2.t) = min(D(s,t) + d(c1, c2), D(s, c2.t) + 3, D(c1.s,t)+3))

Here |s| is the length of string s. Furthermore, by c1.s we refer to the string consisting of the character c1 followed by the string s. (So the . here is not the . used in matching expressions.)

Here d(x,y) is the distance on a QWERTY keyboard from the keys corresponding to c1 and c2 (and its 0 if x = y), which will be computed by a function we provide in main.c.

Requirements

You are required to write a program that implements the functions described above: Sort dictionary, Plain lookup, Range search, Prefix lookup and Approoximate lookup.

Input format

Inputs for your functions will come from tests cases written in main.c file. You can assume that input format will always be correct. You do not need to check the validity of input in your functions. (For example, you do not need to worry about words that have blanks or characters outsize of the range ‘a’-’z’, or dictionary size overflowing the global array of strings).

However, your code should handle other legal input cases e.g., empty string, sorted dictionary, words that appear before any word in the dictionary, more than 10 matches, etc.

Output Format

  • Sort dictionary: your code returns the sorted character strings with all dictionary words.
  • Plain lookup: your code simply returns a 1/0.
  • Range search: you will return the position of two dictionary words where the given word would lie if it was in the dictionary.
  • Prefix lookup: your code returns the number of matching strings and sets the global result.
  • Approximate lookup: your code returns the smallest edit distance and sets the global result (similar to prefix lookup). The closest strings should be in dictionary order.

For example if you are given the word “null” and it is in the dictionary, your function should return 0. If “null” is not in the dictionary and its closest match is “nullify” which has edit distance of 3 compared to “null”, your function should return 3.

Implementation Notes

For the sorting function use the standard C library functions qsort and strcmp.

Use the provided array of strings to store the dictionary, and the provided global integer size variable for keeping track of the number of entries in the dictionary.

Use the provided array of strings to store results, and the provided global integer size variable for keeping track of the number of entries in the results array.

Your code will be tested for speed among other things (see grading). Thus, it’s important that your code executes in the most efficient way. You should consider using a 2D array to cache calls to the string distance function. Similarly, you might find that you can speed up the prefix matching by keeping some additional data in global variables (that you define).

Academic Honesty!
It is not our intention to break the school's academic policy. Posted solutions are meant to be used as a reference and should not be submitted as is. We are not held liable for any misuse of the solutions. Please see the frequently asked questions page for further questions and inquiries.
Kindly complete the form. Please provide a valid email address and we will get back to you within 24 hours. Payment is through PayPal, Buy me a Coffee or Cryptocurrency. We are a nonprofit organization however we need funds to keep this organization operating and to be able to complete our research and development projects.