In this project, we will work on text processing and analysis. Text analyzers could be used to identify the language in which a text has been written (language detection), to identify keywords in the text (keyword extraction) or to summarize and categorize a text. You will calculate the letter (character) frequency in a text. Letter frequency measurements can be used to identify languages as well as in cryptanalysis. You will also explore the concept of n-grams in Natural Language Processing. N-grams are sequential patterns of n-words that appear in a document. In this project, we are just considering uni-grams and bi-grams. Uni-grams are the unique words that appear in a text whereas bi-grams are patterns of two-word sequences that appear together in a document.

Write a Java application that implements a basic Text Analyzer. The Java application will analyze text stored in a text file. The user should be able to select a file to analyze and the application should produce the following text metrics:

  • Number of characters in the text.
  • Relative frequency of letters in the text in descending order. (How the relative frequency that you calculated compares with relative letter frequencies in English already published?)
  • Number of words in the text.
  • The sizes of the longest and the shortest word.
  • The twenty most repeated uni-grams (single words) in the text in descending order.
  • The twenty most repeated bi-grams (pairs of words) in the text in descending order.

Test your program in the file TheGoldBug1.txt, which is provided.

Academic Honesty!
It is not our intention to break the school's academic policy. Posted solutions are meant to be used as a reference and should not be submitted as is. We are not held liable for any misuse of the solutions. Please see the frequently asked questions page for further questions and inquiries.
Kindly complete the form. Please provide a valid email address and we will get back to you within 24 hours. Payment is through PayPal, Buy me a Coffee or Cryptocurrency. We are a nonprofit organization however we need funds to keep this organization operating and to be able to complete our research and development projects.