1 Introduction

This assignment will require you to write a number of functions (in C) which conform to a very precise specification.

1.1 Basic behavior

Your program (named "index") will analyze input from stdin and determine how many times certain words occur. More precisely, your program will:

1. open a file named "keywords.txt" in the current directory, and read, convert to upper-case, and save a list of keywords (one word per line);

2. read input from stdin, ignoring anything other than whitespace and letters (a-z and A-Z);

3. break the input into words, where a "word" is any string of letters (a-z or A-Z) separated from following words by any amount of whitespace (or followed by EOF);

4. for each word that is read, convert it to upper-case, and if that word is found in keywords.txt, add 1 to a running tally of occurrences for that word; and

5. at the end of all input, print the list of words (one per line) from keywords.txt with the number of occurrences next to each word.

1.2 Assumptions

  • There will be a maximum of 100 keywords in keywords.txt.
  • Each word in that file will be no more than 31 characters (32 if you count the null terminator) and will consist of only letters (mixed case possible).
  • Each word you read from stdin will end up being no more than 31 characters.
  • keywords.txt and the input from stdin will consist of only plain ASCII characters (no Unicode).

2 Other Requirements

  • All code must be well commented. Each .c file must include a comment block at the beginning, listing your name, the date, the class, the assignment number, and a description in your own words of what the code in this file does. Additionally, code should be commented throughout, explaining the purpose of each statement or small group of statements.
  • Do not assume keywords.txt exists; you must check for it. If it does not exist or is not readable, return -1 from the readKeyWords function, print an error message from your main program, and return from the main program. (If the file exists, you can assume each line contains a single word consisting of only letters a-z and A-Z.)
  • You must convert all keywords and input words to upper case.
  • You may use functions from stdio.h, string.h and ctype.h only.
  • All code must be your own, written by you and you alone.
  • There is no limit to how long the input may be, so you must process it as you read it. Do not ingest the entire input stream first and then process it; you must process as you read each word.
  • You must only read keywords.txt once, beginning at the start, reading until the end, and then closing it. You should store its contents in memory, which you may access as often as needed.
  • You must implement the Required Functions described in the next section, and they must match the indicated prototypes exactly.
  • Only the displayResults() function should produce any output; all other functions operate silently. Your main program may print error messages, but no other output should be produced by your main program.
  • You must make an "index.h" file which includes all function prototypes for the Required Functions, and include this in any code that uses these functions.
  • You must include a "good quality" Makefile (meaning it only compiles files that need to be re-compiled); your makefile must also include a "clean" target.
  • Your executable program must be buildable with the single "make" command, which must work correctly after a "make clean".
  • Your compilations should not produce any warnings (or errors).
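As an illustration of the "index.h" requirement above, the header could look roughly like the sketch below. The prototypes are copied from the Required Functions section; the include-guard macro name INDEX_H is my own choice and is not mandated by the assignment.

```c
/* index.h: prototypes for the Required Functions (illustrative sketch). */
#ifndef INDEX_H   /* include guard; the macro name is an assumption */
#define INDEX_H

int readKeyWords(char *filename, char keyWords[100][32]);
void tally(char *word,char keyWords[100][32],int count[100],int numWords);
char *getWord(char *word);
int myGetChar();
void displayResults(char keyWords[100][32],int tally[100],int numWords);

#endif /* INDEX_H */
```

Each .c file that defines or calls one of these functions would then contain #include "index.h", so the compiler can check every definition and call against the same prototype.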

3 Required Functions

The indicated functions must each appear in a .c file named as specified, and must be prototyped exactly as indicated.

readKeyWords.c

int readKeyWords(char *filename, char keyWords[100][32]);

This function opens the given filename, reads it one line at a time (each line containing a single word), converts the word to uppercase, and saves each word in the next location in the keyWords array. The first word is stored in keyWords[0], the next in keyWords[1], and so on. The input file should be closed once the end of file is reached.

The function returns the number of words read. If the file cannot be opened, the function returns -1.
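One possible shape for this function is sketched below. It is only a sketch under stated assumptions, not a reference solution: the 64-byte line buffer and the decision to skip blank lines are my own choices, and the code relies on the stated guarantee that each line holds one word of at most 31 letters.

```c
#include <stdio.h>
#include <ctype.h>

int readKeyWords(char *filename, char keyWords[100][32])
{
    FILE *fp = fopen(filename, "r");
    char line[64];  /* assumed size: room for 31 letters, newline, terminator */
    int n = 0;      /* number of keywords stored so far */

    if (fp == NULL)
        return -1;  /* file missing or unreadable */

    /* Read one line per keyword, never storing more than 100 of them. */
    while (n < 100 && fgets(line, sizeof line, fp) != NULL) {
        int i = 0;

        /* Copy the leading letters, converting each to upper case.
           Copying stops at the newline (or any other non-letter). */
        while (i < 31 && isalpha((unsigned char)line[i])) {
            keyWords[n][i] = (char)toupper((unsigned char)line[i]);
            i++;
        }
        keyWords[n][i] = '\0';

        if (i > 0)  /* assumption: blank lines are skipped, not stored */
            n++;
    }

    fclose(fp);
    return n;
}
```

A caller would typically write numWords = readKeyWords("keywords.txt", keyWords); and treat a result of -1 as the signal to print an error message and stop.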

tally.c

void tally(char *word,char keyWords[100][32],int count[100],int numWords);

This function takes in a single word (already converted to uppercase), and compares it to the words in the keyWords array. If it matches the nth element in that array, then count[n] is incremented. Only the first numWords elements in the keyWords array should be considered.

If word is not found, no action is taken.
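The comparison described above can be sketched as a linear search with strcmp(). This is one straightforward reading of the specification; in particular, the handling of duplicate keywords (every matching entry is incremented) is my assumption, since the assignment does not say whether keywords.txt can contain repeats.

```c
#include <string.h>

void tally(char *word, char keyWords[100][32], int count[100], int numWords)
{
    int n;

    /* Look only at the first numWords entries of the keyword array. */
    for (n = 0; n < numWords; n++) {
        /* strcmp() returns 0 when the two strings are identical. */
        if (strcmp(word, keyWords[n]) == 0)
            count[n]++;  /* assumption: a duplicated keyword is counted at each index */
    }
    /* If nothing matched, count[] is left unchanged. */
}
```

The caller is responsible for setting every element of count to zero before the first call.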

getWord.c

char *getWord(char *word);

This function returns the next "word" from stdin. A word is a collection of consecutive non-whitespace characters, which are followed in the input stream by whitespace or EOF. Note that words never contain non-letters (as recognized by the isalpha() function). When reading from stdin, any non-letter non-whitespace character should be ignored (do not add it to the word, do not count it as a word separator). Lowercase letters must be converted to uppercase. See the Sample Execution section for more details.

If a word is successfully read from stdin, it is copied to the "word" argument, which is also returned as the function value. If EOF is reached, this function should return NULL.
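A minimal sketch of getWord follows, assuming it is built on top of myGetChar as the Sample Execution section suggests. A simple myGetChar is repeated here only so that the block compiles on its own; in the real project it would live in myGetChar.c. Silently dropping letters beyond the 31st is my own defensive choice, since the assignment states such words will not occur.

```c
#include <stdio.h>
#include <ctype.h>

/* Simplified stand-in for myGetChar.c, included so this sketch is self-contained. */
int myGetChar()
{
    int c;

    while ((c = getchar()) != EOF) {
        if (isalpha(c))
            return toupper(c);  /* letters come back upper-cased */
        if (isspace(c))
            return ' ';         /* all whitespace becomes a single space */
        /* any other character is ignored */
    }
    return EOF;
}

char *getWord(char *word)
{
    int c;
    int len = 0;

    /* Skip any whitespace in front of the next word. */
    while ((c = myGetChar()) == ' ')
        ;

    if (c == EOF)
        return NULL;            /* no more words in the input */

    /* c is now a letter; collect letters until whitespace or EOF. */
    do {
        if (len < 31)           /* assumption: extra letters are dropped */
            word[len++] = (char)c;
        c = myGetChar();
    } while (c != ' ' && c != EOF);

    word[len] = '\0';
    return word;
}
```

A typical caller loops with while (getWord(word) != NULL) and passes each word to tally, which satisfies the requirement to process the input as it is read.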

myGetChar.c

int myGetChar();

This function operates like getchar(), returning the next character from stdin (or EOF at end-of-file), but with the following modifications:

  • if a letter is read, it is converted to upper-case before being returned;
  • if any whitespace is read, a single space (' ') is returned;
  • if a non-letter, non-whitespace character is read, myGetChar should ignore it and read the next character from stdin, and repeat the above processing.
  • Like getchar(), the return value from this function is an integer, either containing the character or EOF.
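The rules above can be sketched as a small loop around getchar(). This is an illustrative sketch rather than the required implementation; it uses only stdio.h and ctype.h, which the assignment permits.

```c
#include <stdio.h>
#include <ctype.h>

int myGetChar()
{
    int c;  /* int, not char, so that EOF can be told apart from real characters */

    /* Keep reading until there is something worth returning. */
    while ((c = getchar()) != EOF) {
        if (isalpha(c))
            return toupper(c);  /* letter: return it in upper case */
        if (isspace(c))
            return ' ';         /* space, tab, newline, etc.: return one space */
        /* non-letter, non-whitespace: ignore it and read the next character */
    }

    return EOF;  /* end of input */
}
```

Because getchar() returns either EOF or a value in the range of unsigned char, c can be passed to isalpha() and isspace() directly.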

displayResults.c

void displayResults(char keyWords[100][32],int tally[100],int numWords);

This function prints numWords lines of output. The nth line should be keyWords[n] followed by a colon (':') followed by tally[n].
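A short sketch of this output loop is shown below. It assumes that nothing is printed around the colon (no spaces or padding), which is my literal reading of the description above.

```c
#include <stdio.h>

void displayResults(char keyWords[100][32], int tally[100], int numWords)
{
    int n;

    /* One line per keyword, in the order the keywords were read. */
    for (n = 0; n < numWords; n++)
        printf("%s:%d\n", keyWords[n], tally[n]);  /* keyword, colon, count */
}
```

For example, with keywords HELLO and FUN and counts 3 and 0, the output would be the two lines HELLO:3 and FUN:0.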

4 Sample Execution

The most confusing part of this assignment is the getWord function (which uses myGetChar to simplify its work). Suppose the input stream is:

He-llo. Th..Is i
s FUN!

Each successive call to myGetChar would return (I'm separating these by commas for clarity):

H,E,L,L,O, ,T,H,I,S, ,I, ,S, ,F,U,N, ,EOF

Notice that the - . and ! are ignored. A space separates the end of HELLO from the start of THIS, so those are separate words. The I on the first line is followed by a newline, which is whitespace, so I is the third word. The s in the beginning of the second line is followed by a space, so S is the fourth word. FUN is the final word, being followed by a newline (whitespace) and EOF.

The corresponding calls to getWord would thus return the following words (separated by commas):

HELLO,THIS,I,S,FUN,NULL (the value NULL, not the word "NULL")