Do writings by individual authors have statistical signatures? They certainly do, and while such signatures may say little about the quality of an author's art, they can say something about the literary styles of an era, and can even help clarify historical controversies about authorship. Statistical studies, for example, have shown that the Iliad and the Odyssey were not written by a single individual.

  • Use inheritance to specialize the functionality provided by existing code.
  • Write statements that process lines of text from a file.
  • Use arrays to record observations about a data set.
  • Write a class that conforms to an existing specification.

For this assignment you are to create a program that analyzes samples of text -- novels perhaps, or newspaper articles -- and produces two statistics about these texts: word size frequency, and average word length.

The program consists of three classes: FileAccessor, WordPercentagesDriver and WordPercentages. For this project you will write the WordPercentages class, which must compile and work with the FileAccessor and WordPercentagesDriver classes provided. The FileAccessor class provides basic file I/O functionality. The driver class reads in the name of a file that contains the text to be analyzed, creates an instance of WordPercentages, obtains the statistics and prints them to the console.
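The division of labor described above, where a superclass owns the file reading and a subclass adds only the analysis, can be sketched as follows. This is a minimal illustration, not the provided code: `LineSource`, `processAll`, `processLine` and `TokenCounter` are hypothetical names, and the real FileAccessor may expose a different constructor and different methods. The sketch also reads from a `Reader` rather than a file name so that it runs without any input file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

final class InheritanceSketch {

    /** Hypothetical stand-in for a line-reading superclass; NOT the provided FileAccessor. */
    abstract static class LineSource {
        private final Reader source;

        LineSource(Reader source) {
            this.source = source;
        }

        /** Reads every line and passes each one to the subclass hook. */
        final void processAll() {
            try (BufferedReader in = new BufferedReader(source)) {
                String line;
                while ((line = in.readLine()) != null) {
                    processLine(line);
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        /** Hook for subclasses; the name and signature are assumptions. */
        protected abstract void processLine(String line);
    }

    /** The subclass adds only the new behavior (counting); all I/O is inherited. */
    static final class TokenCounter extends LineSource {
        private int tokens;

        TokenCounter(Reader source) {
            super(source);
        }

        @Override
        protected void processLine(String line) {
            // Split on whitespace; a blank line yields one empty token, which is skipped.
            for (String token : line.trim().split("\\s+")) {
                if (!token.isEmpty()) {
                    tokens++;
                }
            }
        }

        int tokens() {
            return tokens;
        }
    }

    public static void main(String[] args) {
        TokenCounter counter = new TokenCounter(new StringReader("Down the\nrabbit hole"));
        counter.processAll();
        System.out.println(counter.tokens()); // prints 4
    }
}
```

The point of the pattern is that the subclass contains no reading loop of its own. It calls `super(...)` and overrides a single hook.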

Here is a sample run with output:


Enter a text file name to analyze:
> AliceInWonderland.txt

Analyzed text: AliceInWonderland.txt
words of length 1: 6.98%
words of length 2: 14.30%
words of length 3: 24.41%
words of length 4: 20.86%
words of length 5: 12.96%
words of length 6: 7.81%
words of length 7: 6.01%
words of length 8: 2.79%
words of length 9: 1.85%
words of length 10: 0.80%
words of length 11: 0.48%
words of length 12: 0.21%
words of length 13: 0.17%
words of length 14: 0.32%
words of length 15 or greater: 0.06%
average word length: 4.08

Notice that the output formatting is NOT produced by the WordPercentages code. It is done by the printWordSizePercentages method in the driver class.

Your job, then, is to code a solution to this problem and provide these two statistics: word size percentage, for word lengths from 1 to 15 and greater, and average word length. In the example given, 6.98 percent of the words are of length 1, 14.30 percent of the words are of length 2, and so forth, and the average word length is 4.08.
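To see how an unrounded statistic relates to a line of the sample run, the following sketch formats one value to two decimal places. It is an illustration only and not the driver's actual printWordSizePercentages code: `FormatSketch` and `formatLine` are hypothetical names. The input 12.958167330677291 is the unrounded index-5 value quoted in the project requirements.

```java
import java.util.Locale;

final class FormatSketch {

    /** Formats one unrounded percentage in the style of the sample run. */
    static String formatLine(int length, double percentage) {
        // %.2f rounds to two decimal places; %% prints a literal percent sign.
        // Locale.US fixes the decimal separator to a period.
        return String.format(Locale.US, "words of length %d: %.2f%%", length, percentage);
    }

    public static void main(String[] args) {
        System.out.println(formatLine(5, 12.958167330677291)); // words of length 5: 12.96%
    }
}
```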

The source code files provided are WordPercentagesDriver.java and FileAccessor.java. Remove the package statements before using them.

/JavaCS1/src/scrabble/wordpercentages/FileAccessor.java

/JavaCS1/src/scrabble/wordpercentages/WordPercentagesDriver.java

Here are some files to use for testing:

  • AliceInWonderland.txt
  • HeartOfDarkness.txt
  • sampletext.txt

You can obtain interesting sample texts by, for example, visiting the Project Gutenberg website (gutenberg.org) and downloading books from there.

PROJECT REQUIREMENTS:

  • Your WordPercentages class must extend the FileAccessor class to read the lines of the text file. Points will be deducted if you do not do this properly; for example, points will be taken off if you repeat all of the superclass code in your subclass.
  • Your WordPercentages class must have a constructor that takes the file name String as a parameter.
  • You must define and implement the getWordPercentages method in your WordPercentages class, which takes no parameters and returns an array of type double. This array has length 16. The index of each cell is a word length, and the value of the cell is the percentage of all words in the text that have that length. For example, in the sample run the cell at index 5 has the value 12.958167330677291, which means that approximately 13% of all words in the text have a length of 5. The cell at index 0 is not used, since there are no words of length 0. The cell at index 15 holds the percentage of words of length 15 and greater. NOTE: The values in your array are not formatted. The output is formatted to a precision of 2 decimal places by the printWordSizePercentages method.
  • You must define and implement the getAvgWordLength method in your WordPercentages class which takes no parameters and returns a single double value which is the average word length that was observed in the text.
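The arithmetic behind the two statistics can be sketched independently of any file handling. The code below is a minimal illustration under stated assumptions, not the required WordPercentages class: `WordLengthSketch` and its method names are hypothetical, a "word" is assumed to be a maximal run of ASCII letters (so "don't" would count as two words), and the average is assumed to use actual word lengths rather than lengths capped at 15. The assignment text does not specify either rule.

```java
import java.util.ArrayList;
import java.util.List;

final class WordLengthSketch {
    static final int CELLS = 16; // indices 0..15; index 0 stays unused

    /** Assumed tokenization: maximal runs of ASCII letters count as words. */
    static List<String> wordsIn(String line) {
        List<String> words = new ArrayList<>();
        for (String token : line.split("[^A-Za-z]+")) {
            if (!token.isEmpty()) {
                words.add(token);
            }
        }
        return words;
    }

    /** Returns 16 cells; cell i holds the percentage of words of length i (15 means 15 or more). */
    static double[] percentages(List<String> words) {
        int[] counts = new int[CELLS];
        int total = 0;
        for (String word : words) {
            if (word.isEmpty()) {
                continue; // there are no words of length 0
            }
            counts[Math.min(word.length(), CELLS - 1)]++;
            total++;
        }
        double[] result = new double[CELLS];
        if (total == 0) {
            return result; // avoid dividing by zero on empty input
        }
        for (int i = 1; i < CELLS; i++) {
            result[i] = 100.0 * counts[i] / total;
        }
        return result;
    }

    /** Mean of the actual (uncapped) word lengths; 0.0 for empty input. */
    static double averageLength(List<String> words) {
        long letters = 0;
        int total = 0;
        for (String word : words) {
            if (!word.isEmpty()) {
                letters += word.length();
                total++;
            }
        }
        return total == 0 ? 0.0 : (double) letters / total;
    }

    public static void main(String[] args) {
        List<String> words = wordsIn("Alice was beginning to get very tired");
        double[] pct = percentages(words);
        for (int i = 1; i < CELLS; i++) {
            System.out.println(i + ": " + pct[i]);
        }
        System.out.println("average: " + averageLength(words));
    }
}
```

Capping the index with `Math.min` is what lets a single cell collect every word of length 15 and greater. The multiplication by `100.0` forces floating-point division, so the percentages are not truncated to integers.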