For this computer assignment, you are to write and implement a Python program that scans and processes an input text stream. Assume that the words in the input stream are separated by whitespace.

After reading each word, filter it as follows: if the first letter of the word is preceded, or its last letter is followed, by non-alphabetical characters (punctuation marks and/or numerical digits), erase them from the word; however, if the word contains non-alphabetical characters in the middle, ignore the alphabetical characters beyond them, and if the parts of a word are separated by dashes, consider each part as a separate word. For example, the word "William?Mary" should be filtered as the word William, and the word fish-net should be filtered as the word fish and the word net. You also need to convert each uppercase letter to lowercase for the final list.
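The filtering rule amounts to keeping, for each dash-separated part, only the first run of letters: that drops leading and trailing non-letters and ignores everything after a non-letter in the middle. A minimal sketch, assuming "alphabetical" means the ASCII letters A-Z and a-z; `filterWord` is a hypothetical helper name, not one of the required subroutines:

```python
import re

# Assumption: "alphabetical" means the ASCII letters A-Z and a-z.
_LETTERS = re.compile(r"[A-Za-z]+")


def filterWord(word):
    """Hypothetical helper: return the cleaned, lowercased words in one token.

    Each dash-separated part contributes its first run of letters; parts
    that contain no letters at all are discarded.
    """
    cleaned = []
    for part in word.split("-"):
        match = _LETTERS.search(part)
        if match:
            cleaned.append(match.group().lower())
    return cleaned


print(filterWord('"William?Mary"'))  # ['william']
print(filterWord("fish-net"))        # ['fish', 'net']
print(filterWord("--42--"))          # []
```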

The usage of your program should be: prog5.py inputFile prog5.out, where inputFile is the name of the input file that contains a text stream and prog5.out is the name of the output file that will contain the individual words and their frequencies from the input stream.

In your program, use a subroutine for each individual task as explained below:

  • def checkArgs ( ): Checks the number of input arguments. If the program does not have the correct number of arguments, it prints a diagnostic error message on stderr, stating the correct usage of the Python program, and aborts the program. (You can use the function exit ( ) from the module sys for this task.) For the correct number of arguments, it returns the names of the input and output files (as a tuple) given on the command line, which are stored in argv [ ] from the sys module.
  • def openFiles ( files ): Opens the input and output files. The input argument files [ ] contains the names of these files. It prints a diagnostic error message on stderr and aborts the execution of the program if either file cannot be opened. If both files can be opened, it returns the generated file objects for these files (as a tuple).
  • def closeFiles ( fobjects ): Closes the input and output files. The input argument fobjects [ ] contains the file objects generated from these files. It prints a diagnostic error message on stderr and aborts the execution of the program if either file cannot be closed.
  • def createList ( inFileObj ): Creates a list of words from the text stored in the input file, where the input argument inFileObj is the object for this file. It splits the input stream using whitespace and dashes as separators and returns the final list.
  • def createDictionary ( words ): Removes all non-alphabetical characters from each word in the input argument words [ ] and puts each word and its corresponding frequency in a dictionary, excluding empty words and converting each word to lowercase letters before inserting it in the dictionary. At the end, the routine returns the dictionary. One of the easiest ways to remove non-alphabetical characters from an input string is to use a regular expression (RE). The module re provides all the RE features you might need.
  • def main ( ): Prints the size of the dictionary and its contents: each word and its frequency. To save space, it prints only three words per line, where each word is left-aligned and its frequency is right-aligned. This routine first calls the routine openFiles ( checkArgs ( ) ), and then calls the routines createList ( ) and createDictionary ( ). Before writing the cleaned words in the dictionary to the output file, it prints the full-path name of the input file. To get the full-path name of a file, import the module os in your program and use the function os.path.abspath ( file-object.name ). At the end, it calls the routine closeFiles ( ) to close both the input and output files.
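Putting the subroutines together, a minimal sketch of prog5.py might look like the following. It assumes that "alphabetical" means the ASCII letters A-Z and a-z, that the full path, the dictionary size, and the entries are all written to the output file, and that entries appear in alphabetical order. The three-per-line layout is delegated to a hypothetical helper, formatEntries, whose field widths (15 and 5) are arbitrary choices.

```python
import os
import re
import sys

# Assumption: "alphabetical" means the ASCII letters A-Z and a-z.
LETTERS = re.compile(r"[A-Za-z]+")


def checkArgs():
    """Return (inputFile, outputFile) from argv, or abort with a usage message."""
    if len(sys.argv) != 3:
        sys.stderr.write("Usage: prog5.py inputFile prog5.out\n")
        sys.exit(1)
    return sys.argv[1], sys.argv[2]


def openFiles(files):
    """Open files[0] for reading and files[1] for writing; abort on failure."""
    try:
        inFileObj = open(files[0], "r")
    except OSError as err:
        sys.stderr.write(f"Cannot open input file {files[0]}: {err}\n")
        sys.exit(1)
    try:
        outFileObj = open(files[1], "w")
    except OSError as err:
        inFileObj.close()  # do not leak the input file when aborting
        sys.stderr.write(f"Cannot open output file {files[1]}: {err}\n")
        sys.exit(1)
    return inFileObj, outFileObj


def closeFiles(fobjects):
    """Close every file object in fobjects; abort if any close fails."""
    for fobj in fobjects:
        try:
            fobj.close()
        except OSError as err:
            sys.stderr.write(f"Cannot close file {fobj.name}: {err}\n")
            sys.exit(1)


def createList(inFileObj):
    """Split the whole input stream on runs of whitespace and dashes."""
    return re.split(r"[\s-]+", inFileObj.read())


def createDictionary(words):
    """Map each cleaned, lowercased word to its frequency, skipping empty words."""
    freq = {}
    for word in words:
        # The first run of letters drops leading/trailing non-letters and
        # ignores everything after a non-letter in the middle of the word.
        match = LETTERS.search(word)
        if match:
            key = match.group().lower()
            freq[key] = freq.get(key, 0) + 1
    return freq


def formatEntries(freq, perLine=3, width=15):
    """Hypothetical helper: perLine entries per line, word left, count right."""
    cells = [f"{word:<{width}}{freq[word]:>5}" for word in sorted(freq)]
    return ["   ".join(cells[i:i + perLine]) for i in range(0, len(cells), perLine)]


def main():
    inFileObj, outFileObj = openFiles(checkArgs())
    freq = createDictionary(createList(inFileObj))
    # Assumption: the path, the size, and the entries all go to the output file.
    outFileObj.write(f"Input file: {os.path.abspath(inFileObj.name)}\n")
    outFileObj.write(f"Dictionary size: {len(freq)}\n\n")
    for line in formatEntries(freq):
        outFileObj.write(line + "\n")
    closeFiles((inFileObj, outFileObj))


if __name__ == "__main__":
    main()
```

Run it as `python3 prog5.py inputFile prog5.out`. Note that createDictionary keeps only the first run of letters in each word, following the filtering rule stated at the top of the assignment; deleting every non-letter instead (for example with `re.sub(r"[^A-Za-z]", "", word)`) would turn "William?Mary" into "williammary".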