This assignment requires you to use lists and dictionaries, and carry out some file processing (read from and write into files).

Problem 1

Numbering lines in a text file. An instructor for a writing-intensive course evaluates students on the basis of written essays. Students submit essays electronically as text files. The instructor then provides comments on specific lines or specific paragraphs of the essay, so that the students may revise and improve them. In order to provide this kind of feedback, it would be extremely useful to have an automated way to number the lines and the paragraphs of a text file. In this problem, you are asked to write a program to accomplish this task. Further details are described below.

1. Write a function called insert_line_para_nums with two parameters, infilename and outfilename. This function reads from a file called infilename (this is the file containing an essay) and writes into a file called outfilename. Each line of infilename is written into outfilename exactly as it is, except that a pair of numbers, namely the paragraph number and the line number, is inserted at the start of each line. The two numbers are separated by a comma and are followed by a space and then the line of text. Remember to open and close all files as necessary.

Note the following points:

  • Blank lines separate paragraphs. There may be more than one blank line between paragraphs.
  • Line numbers start at 1 and increase sequentially until the end of the file. Even blank lines have a line number.
  • Paragraph numbers also start at 1 and increase sequentially until the end of the file. All the lines in a given paragraph have the same paragraph number. However, blank lines do not belong to any paragraph number. Therefore, all blank lines should indicate the paragraph number as 0 (zero).
  • The paragraph number and the line number are each right-justified in a field of width 4 (we'll assume that none of the essays have more than 9999 lines).

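For example, suppose the essay is as follows (the file begins with one blank line, and three blank lines separate the first and second paragraphs, as the numbered output shows):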

My essay is kind of short. It
is only going to have a few inarticulate
lines and even
fewer paragraphs.

The second paragraph
has arrived, and you can see
it’s not much.

The third paragraph now arrives
and departs as hastily.

The output file should look like this:

   0,   1
   1,   2 My essay is kind of short. It
   1,   3 is only going to have a few inarticulate
   1,   4 lines and even
   1,   5 fewer paragraphs.
   0,   6
   0,   7
   0,   8
   2,   9 The second paragraph
   2,  10 has arrived, and you can see
   2,  11 it’s not much.
   0,  12
   3,  13 The third paragraph now arrives
   3,  14 and departs as hastily.
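The numbering rules above can be sketched as follows. This is only an illustration, not a reference solution: the helper name number_lines is hypothetical, it works on a list of strings instead of opening files, and it assumes the prefix layout is the paragraph number in width 4, a comma, the line number in width 4, and one space.

```python
def number_lines(lines):
    """Return each line prefixed with its paragraph and line number.

    Hypothetical helper: the assignment's function reads and writes
    files; this sketch only shows the counting logic.
    """
    numbered = []
    para_count = 0         # paragraphs seen so far
    in_paragraph = False   # True while the previous line was non-blank
    for line_num, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if text.strip() == "":
            # Blank lines end the current paragraph and get paragraph 0.
            in_paragraph = False
            para = 0
        else:
            if not in_paragraph:
                # First non-blank line after a gap starts a new paragraph.
                para_count += 1
                in_paragraph = True
            para = para_count
        # Assumed layout: width-4 paragraph, comma, width-4 line, space.
        numbered.append(f"{para:4d},{line_num:4d} {text}")
    return numbered


if __name__ == "__main__":
    sample = ["\n", "first\n", "second\n", "\n", "\n", "third\n"]
    for out in number_lines(sample):
        print(out)
```

Tracking whether the previous line was blank is what lets several consecutive blank lines count as a single paragraph break.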

2. Write a function with three parameters: lpnumfilename, start, and finish. The first parameter, lpnumfilename, is the name of a file that contains paragraph and line numbers inserted at the beginning of each line (as described above). The second and third parameters are integers with start <= finish. The function should print all lines of the file starting at line number start up to (and including) line number finish. Note that all lines, including blank lines, should be printed. The lines may also span one or more paragraphs. If start is larger than the last line number in the file, nothing is printed. The function is responsible for opening and closing the file.

3. Write a function called print_paragraph that has two parameters: lpnumfilename and paranum. As above, the first parameter lpnumfilename is the name of a file that contains paragraph and line numbers inserted at the beginning of each line. The function should print all lines (and only those lines) of the file that belong to paragraph number paranum. If there is no paragraph numbered paranum, nothing is printed.
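Both printing functions need to read the two numbers back from the start of each line. One possible way to do that is sketched here. The helper names split_prefix, lines_in_range, and lines_in_paragraph are hypothetical, the functions return lists instead of printing, and the prefix is assumed to be a width-4 paragraph number, a comma, a width-4 line number, and one space.

```python
def split_prefix(numbered_line):
    """Split a numbered line into (paragraph, line number, text).

    Assumes the prefix is 'pppp,llll ' (an assumption about the layout).
    """
    para_field, rest = numbered_line.rstrip("\n").split(",", 1)
    line_field = rest[:4]   # the width-4 line number
    text = rest[5:]         # skip the single space after the number
    return int(para_field), int(line_field), text


def lines_in_range(numbered_lines, start, finish):
    """Return the lines whose line number lies in [start, finish]."""
    return [ln for ln in numbered_lines
            if start <= split_prefix(ln)[1] <= finish]


def lines_in_paragraph(numbered_lines, paranum):
    """Return the lines whose paragraph number equals paranum."""
    return [ln for ln in numbered_lines
            if split_prefix(ln)[0] == paranum]


if __name__ == "__main__":
    sample = ["   0,   1 ", "   1,   2 a", "   1,   3 b", "   0,   4 ", "   2,   5 c"]
    print(lines_in_range(sample, 2, 4))
    print(lines_in_paragraph(sample, 2))
```

int() ignores the leading spaces of a right-justified field, so no explicit stripping is needed before converting.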

Problem 2

Frequency distribution of characters in a file. Several problems in computer science require the frequency distribution of characters in a file. For example, a famous and widely used method called Huffman coding is used to compress data by using fewer bits to encode more frequently occurring characters. In order to do this, the frequency distribution of the characters in the file must be determined first. Compression methods used to transmit digital images and video over the internet use some form of Huffman coding.

In this problem, you are asked to find the frequency distribution of the characters in a given text file, i.e., the number of times each character occurs in the file. The character could be any printable non-whitespace character on the keyboard (for example, upper/lower case letters, digits, punctuation marks etc.). Note that we ignore whitespace characters (space, tab, and newline characters). You will use a dictionary to carry out this task. In particular, you are asked to implement the following functions:

1. A function called freq_distribution with two parameters: infile and distfile. The first parameter, infile, is the name of the file for which we want to compute the frequency distribution. The second parameter, distfile, is the name of the file into which the frequency distribution is to be written in sorted order of the characters. That is, the characters should be listed in alphabetically sorted order in the output file. The list method sort() (or the built-in function sorted()) will be useful here. Each line of the file should contain the character and its frequency. Use string formatting to make sure the output is neatly formatted.

In order to carry out the above task, you must first create a dictionary to store the frequency distribution. You will do this by calling the helper function freq_dictionary described in item 3 below. Remember to close distfile when you are done writing into it.

2. A function called ordered_freq_distribution with two parameters: infile and ordered_distfile. The first parameter, infile, is the name of the file for which we want to compute the frequency distribution. The second parameter, ordered_distfile, is the name of the file into which the frequency distribution is to be written in sorted order of the frequency. That is, the characters should be printed into the file in decreasing order of frequency. If several characters have the same frequency, they should be printed in alphabetical order. Once again, the list method sort() (or the built-in function sorted()) will be useful here. Use string formatting to make sure that the output in ordered_distfile is neatly formatted.

Once again, in order to carry out this task, you must first create a dictionary to store the frequency distribution by calling the helper function freq_dictionary described in item 3 below. Remember to close ordered_distfile when you are done writing into it.

3. A function called freq_dictionary with a single parameter infile, the name of the text file for which we want to compute the frequency distribution. The function should first open infile for reading. It should then create a dictionary with key:value pairs, where the key is a character occurring in infile and its associated value is the frequency of that character in the file (i.e., the number of times it occurs in the file). Note that every non-whitespace character (characters other than newline, space, and tab) occurring in the file should be included in the dictionary. Upper and lower case letters should be kept distinct. Characters that do not occur in the file should not be included in the dictionary. The function must return the dictionary. Remember to close the file prior to the return statement.

For example, suppose essay.txt is the name of a file containing the following (uninteresting) essay:

how now brown cow

Then, after the function call freq_distribution("essay.txt", "freq.txt"), a file called freq.txt should contain the following:

b 1
c 1
h 1
n 2
o 4
r 1
w 4

After the function call ordered_freq_distribution("essay.txt", "ordfreq.txt"), a file called ordfreq.txt should contain the following:

o 4
w 4
n 2
b 1
c 1
h 1
r 1
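The counting and the two sort orders can be illustrated on the sample essay. This is a minimal sketch, not the required solution: the helper name char_frequencies is hypothetical and it takes the text as a string, whereas the assignment's helper opens a file by name.

```python
def char_frequencies(text):
    """Count every character except space, tab, and newline."""
    freq = {}
    for ch in text:
        if ch in " \t\n":
            continue                     # whitespace is ignored
        freq[ch] = freq.get(ch, 0) + 1   # start at 0 for a new character
    return freq


freq = char_frequencies("how now brown cow\n")

# Order by character. sorted() compares code points, so upper case
# letters come before lower case ones.
by_char = sorted(freq.items())

# Order by decreasing frequency, ties broken by character.
by_freq = sorted(freq.items(), key=lambda item: (-item[1], item[0]))

if __name__ == "__main__":
    for ch, count in by_freq:
        print(f"{ch} {count:4d}")
```

Negating the count in the sort key gives decreasing frequency while keeping the tie-break on the character in increasing order, so one call to sorted() handles both criteria.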

Problem 3

Calculating student grades. This problem requires you to read data from several files (containing numeric scores on homeworks, quizzes, and exams for students in a course), use a dictionary to collect information from the data in those files, and finally to write a combined course grade roster into an output file. Evaluation criteria for the course are described below:

1. Four homework assignments are handed out. Each homework is out of 100 points. The lowest homework score is dropped, and the remaining three scores determine the final homework score. The homeworks are worth 30% of the final grade.

2. Eight quizzes are given in class. Each quiz is out of 50 points. The lowest and the highest quiz scores are dropped, and the remaining six scores determine the final quiz score. The quizzes are also worth 30% of the final grade.

3. Two comprehensive exams are given. Each exam is out of 200 points. (No exam scores are dropped!) The exams are worth 40% of the final grade.

The final total score is simply the sum of the combined homework score (out of 30), the combined quiz score (out of 30), and the combined exam score (out of 40), each calculated as described above. All scores are real values. The letter grade is determined according to the usual scale: a final total score of 90.0 or higher is an A, 85.0 or higher is a B+, 80.0 or higher is a B, 75.0 or higher is a C+, 70.0 or higher is a C, 60.0 or higher is a D, and anything lower than 60.0 is an F.
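The arithmetic can be sketched as below. The function names are hypothetical, each list is assumed to already hold the full number of scores (4, 8, and 2), and the scaling is an assumption: the kept scores are summed and scaled in proportion to the maximum possible points for that category.

```python
def homework_score(scores):
    """Drop the lowest of 4 scores; scale the remaining 300 points to 30."""
    kept = sorted(scores)[1:]
    return sum(kept) / 300 * 30


def quiz_score(scores):
    """Drop the lowest and highest of 8 scores; scale 300 points to 30."""
    kept = sorted(scores)[1:-1]
    return sum(kept) / 300 * 30


def exam_score(scores):
    """Keep both exams; scale 400 points to 40."""
    return sum(scores) / 400 * 40


def letter_grade(total):
    """Map a total out of 100 to a letter grade."""
    cutoffs = [(90.0, "A"), (85.0, "B+"), (80.0, "B"),
               (75.0, "C+"), (70.0, "C"), (60.0, "D")]
    for cutoff, grade in cutoffs:
        if total >= cutoff:
            return grade
    return "F"


if __name__ == "__main__":
    # Illustrative data only.
    hw = homework_score([98, 89, 92, 75])
    qz = quiz_score([45, 36, 42, 50, 29, 27, 40, 41])
    ex = exam_score([175, 157])
    total = hw + qz + ex
    print(f"{hw:.2f} {qz:.2f} {ex:.2f} {total:.2f} {letter_grade(total)}")
```

Sorting first makes dropping scores a matter of slicing, and listing the cutoffs from highest to lowest means the first match is the right grade.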

The data for this course is available in four separate input files:

1. A file called "studentids.txt" that contains simply the list of identification (ID) numbers for the students in the course with one entry per student. Each line of the file contains just an ID, which is a string of characters in the usual format xxx-xx-xxxx.

2. A file called "hwscores.txt" that contains a dump of all the homework scores for all the students in the course. Each line of the file contains simply an ID followed by a single homework score. A homework score is an integer value between 0 and 100 (inclusive). Observe that the same ID will, in general, appear on several lines in the file, once for every homework submission made by the student with that ID. The number of entries for a student is equal to the number of homework submissions made by that student. If a student submits fewer than 4 homeworks, the remaining homework scores are simply zero (keep this in mind when dropping the lowest score). The IDs may appear in any order in the file (in other words, you should not assume that the same ID will appear consecutively in the file).

3. A file called "quizscores.txt" that contains a dump of all the quiz scores for all the students in the course. Each line of the file contains simply an ID followed by a single quiz score. A quiz score is an integer value between 0 and 50 (inclusive). As above, the same ID may appear several times in the file, once for every quiz taken by the student with that ID. The number of entries for a student is equal to the number of quizzes taken by that student. If a student takes fewer than 8 quizzes, the remaining quiz scores are zero. Again, the IDs may appear in any order in the file.

4. A file called "examscores.txt" that contains a dump of all the exam scores for all the students in the course. Each line of the file contains simply an ID followed by a single exam score. Once again, the same ID may appear more than once in the file and the IDs appear in any order.

Your program should create a neatly formatted output file called "graderoster.txt" that contains one line of data per student in the format described below:

1. The first line should contain the column headings, which are RUID, HW(30), QUIZ(30), EXAM(40), TOTAL(100), GRADE. The first column must be 14 characters wide. The second, third, and fourth columns should each be 10 characters wide, the fifth column should be 12 characters wide, and the sixth column should be 10 characters wide. Print the first column header left-justified and the remaining ones right-justified.

2. The next line should be a series of hyphens ('-') (as seen in the sample output file).

3. The rest of the file should contain one line of information per student. Each line should contain the ID in the first column (width 14), the final homework score out of 30 (width 10), the final quiz score out of 30 (width 10), the final exam score out of 40 (width 10), the final total score out of 100 (width 12), and the letter grade (width 10). The RUID must be left-justified, the scores must be right-justified, and the letter grade must be right-justified as well. Each score should be printed as a real value with a precision of 2.

4. After the above has been printed, print a blank line, followed by the maximum, minimum, and average scores for the homeworks, quizzes, exams, and totals.
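The column widths above map directly onto format specifications. The sketch below is one way to write them. The names HEADER, RULE, and roster_line are hypothetical, and the length of the hyphen line is an assumption (here it matches the header width).

```python
# Header: first column left-justified, the rest right-justified.
HEADER = (f"{'RUID':<14}{'HW(30)':>10}{'QUIZ(30)':>10}"
          f"{'EXAM(40)':>10}{'TOTAL(100)':>12}{'GRADE':>10}")

# Assumption: the rule is as wide as the header (66 characters).
RULE = "-" * len(HEADER)


def roster_line(ruid, hw, quiz, exam, grade):
    """Format one student's row; scores use two decimal places."""
    total = hw + quiz + exam
    return (f"{ruid:<14}{hw:>10.2f}{quiz:>10.2f}"
            f"{exam:>10.2f}{total:>12.2f}{grade:>10}")


if __name__ == "__main__":
    print(HEADER)
    print(RULE)
    # Illustrative data only.
    print(roster_line("123-45-6789", 27.9, 23.3, 33.2, "B"))
```

Using the same widths in the header and in the rows is what keeps the columns aligned.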

You are asked to accomplish the above task by implementing the following in the module problem3.py:

1. A function called create_dictionary with four parameters: idfilename, hwfilename, qzfilename, and examfilename. These parameters are, respectively, the names of the files containing student IDs, homework scores, quiz scores, and exam scores (in the format described above). The function should open and close all necessary files. This function returns a dictionary of key:value pairs in which the key is the student ID and the value is itself a dictionary containing the keys "hw", "quiz", and "exam". The values associated with these keys are lists of length 4, 8, and 2 containing, respectively, the homework, quiz, and exam score data for that student. An example of a key:value pair in such a dictionary is shown below:

"123-45-6789":{"hw":[98,89,92,75], "quiz":[45,36,42,50,29,27,40,41], "exam":[175,157]}

2. A function called create_graderoster with two parameters: sdata_dict and outfilename. The first parameter, sdata_dict, is a dictionary of the type returned by the above function create_dictionary. The second parameter, outfilename, is the name of the output file in which a neatly formatted grade roster will be printed (exactly as described above). The function is responsible for opening and closing the file.

3. So that I may run your program in the shell, include appropriate calls to the above functions in the body of the following if statement:

if __name__ == "__main__":

A set of sample input files is available on Canvas (these are the files studentids.txt, hwscores.txt, quizscores.txt, and examscores.txt) along with the output file created for that data (this is the file SRgraderoster.txt that was created by my Python code for this problem). The output file created by your create_graderoster function should look similar. Feel free to create additional data files to test your functions.
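The nested dictionary described in item 1 could be assembled along the following lines. This is a sketch under several assumptions: the helper name build_records is hypothetical, it takes iterables of lines instead of file names, the ID and the score on each line are separated by whitespace, and every score (including exam scores) is an integer.

```python
def build_records(id_lines, hw_lines, quiz_lines, exam_lines):
    """Collect each student's scores, padding missing ones with zeros."""
    records = {}
    for line in id_lines:
        sid = line.strip()
        if sid:
            records[sid] = {"hw": [], "quiz": [], "exam": []}

    # (key in the inner dictionary, source lines, required list length)
    sources = (("hw", hw_lines, 4),
               ("quiz", quiz_lines, 8),
               ("exam", exam_lines, 2))
    for key, lines, length in sources:
        for line in lines:
            parts = line.split()
            if len(parts) != 2:
                continue                  # skip blank or malformed lines
            sid, score = parts
            if sid in records:            # ignore IDs not on the roster
                records[sid][key].append(int(score))
        # Missing submissions count as zero.
        for entry in records.values():
            entry[key] += [0] * (length - len(entry[key]))
    return records


if __name__ == "__main__":
    # Illustrative data only.
    recs = build_records(
        ["123-45-6789\n"],
        ["123-45-6789 98\n", "123-45-6789 89\n"],
        ["123-45-6789 45\n"],
        ["123-45-6789 175\n", "123-45-6789 157\n"],
    )
    print(recs["123-45-6789"])
```

Creating an entry for every ID before reading any scores means that a student with no submissions at all still ends up with lists of zeros.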
