Problem 1

A text concordance is an alphabetical listing of all the distinct words in a piece of text, and in this project we consider the problem of constructing one concordance for a document stored in a file. Because the words in a concordance are arranged in alphabetical order, a concordance is an ordered list(sorted list). To construct such a concordance, we begin with an empty list. As each word is read, we will check the list whether it is in one of list or not. If the word is a new word, it is inserted into the list in the appropriate place. If the word you read appears in the list already, the frequency of the word will be increased by one. Obviously, it may be necessary to insert words at any point in this list, and thus a sorted list using double-linked implementation is appropriate. Each node consists of four fields, one for word, one for its frequency, and two for prev/next node link(reference) fields.

Each time a word is read from a document, this Sorted list as a doubly linked list must be searched sequentially, always beginning with the first node. The doubly-linked list would be in the alphabetically ascending order. (The data type of the word field would be a string. If the word starts with non-alphabet(number or special symbol) do not insert it in the list. ALGORITHM:

(* Input : A Text file (concordinput.txt)
Function : Construct a text concordance from a document stored in a file. The concordance is stored in a data structure composed of a double linked list.
Output : A list of distinct words with their frequencies on the screen and into the output file in an ascending alphabetic order. *)
• Create two empty doubly linked list.
• Open the input file and get the first Word.
• While there are more words to process, do the following:
• Search the lists of words having the same Word to see if it already contains Word.
• Increase its frequency by one if it is already there.
• Insert Word into list if it is not already in the list .
• Get the next Word.

Output (to an output file and on the screen simultaneously)

• Your output file should be called OUT.TXT
• The source file should be called CONCORD
• You will be expected to hand in the hard copy of your source files in a folder and output as well as upload your program to blackboard.

Problem 2. (general BST)

Write a program to evaluate empirically the following strategies for removing nodes with two children:

• Replace with the largest node, X, in TL and recursively remove X.
• Alternately replace with the largest node in TL and the smallest node in TR, and recursively remove the appropriate node.
• Replace with either the largest node in TL or the smallest node in TR (recursively removing the appropriate node), making the choice randomly.(Use a random function.)

Input: use a file, BSTinput.txt

+ 3 means inserting 3 into the current BST.
- 5 means removing 5 from the current BST if 5 is in the BST.

Output:

• Print out the BST height after every insert/remove operation for each method.
• Print out the final resulting BST of each method in preorder node list with the depth of each node.