In this homework, you have been given a selection of random biography pages from Wikipedia. This information can be found in the file "206_hw5_wiki_bios.txt". Your assignment is to use regular expressions to extract information from these biographies. To be clear, each function (except for read_file) must pull the appropriate pieces from the text using regex.

To do so, you will complete the following functions in HW5.py:

1. find_bio_names(string_list)

This function returns a dictionary whose keys are the numbers 1 through 10 and whose values are the names of the biography subjects. It should use a regular expression to find the biography pattern, then add each name to the dictionary, keyed by that name's position in the list of biographies.

The expected output should be in the format:

{1: "Mike Kearney", 2: "Margit Symo", ... }
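The numbering step on its own can be sketched as follows. This is only a minimal illustration: the helper name number_names is hypothetical, and the regular expression that extracts the names is deliberately left out because it depends on how the biographies are laid out in the text file.

```python
def number_names(names):
    """Map each name to its 1-based position in the list."""
    # enumerate(..., start=1) makes the first key 1 rather than 0.
    return {position: name for position, name in enumerate(names, start=1)}


# Assumed input: names already extracted with a regex, in file order.
print(number_names(["Mike Kearney", "Margit Symo"]))
```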

2. find_possessives(string_list)

This function finds all possessives used in the text file and returns them in a list. A word counts as a possessive if it includes letters before and after an apostrophe. Valid possessives might include: Holden's, Julia's, etc. Note that there are many apostrophes in the text file that don't meet these criteria.
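One possible reading of the "letters before and after an apostrophe" rule is sketched below. The helper name possessives_in and the sample lines are made up for illustration, and the pattern only covers the plain ASCII apostrophe. Note that, by this definition, contractions such as "don't" would also match.

```python
import re

# One or more letters, a single apostrophe, then one or more letters.
POSSESSIVE = re.compile(r"\b[A-Za-z]+'[A-Za-z]+\b")


def possessives_in(lines):
    """Collect every match from every line, in order of appearance."""
    found = []
    for line in lines:
        found.extend(POSSESSIVE.findall(line))
    return found


# Hypothetical sample text, not taken from the assignment file.
sample = ["Holden's band toured in '98.", "They played rock 'n' roll."]
print(possessives_in(sample))
```

Apostrophes that follow a space or precede a digit (as in '98 or 'n') have no letter on one side, so they are skipped.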

3. find_section_headings(string_list)

This function finds and returns all the section headings which match the following conditions:

  • The heading should start with 2 or more equals signs, like "=="
  • There are words between the equals signs
  • The heading ends with 2 or more equals signs, like "=="
  • The heading has the same number of equals signs on each side

These are examples of valid section headings:

==Albums==
===Producer Compilation Albums===
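The "same number of equals signs on each side" condition is a natural fit for a backreference. The sketch below assumes each heading sits on its own line and that the function should return the heading with its equals signs included; the assignment text does not say either way, and the helper name headings_in is hypothetical.

```python
import re

# (={2,}) captures the opening run of equals signs; \1 requires the
# closing run to be exactly the same text. The middle part must contain
# at least one word character and no equals signs.
HEADING = re.compile(r"(={2,})([^=]*\w[^=]*)\1")


def headings_in(lines):
    """Return the lines that consist entirely of a balanced heading."""
    result = []
    for line in lines:
        match = HEADING.fullmatch(line.strip())
        if match:
            result.append(match.group(0))
    return result


sample = ["==Albums==\n", "===Producer Compilation Albums===\n", "===Uneven==\n"]
print(headings_in(sample))
```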

5. find_birth_years(string_list)

This function returns a dictionary where the keys are the names of the biography subjects and the values are integers representing each subject's year of birth. Where the year of birth is unknown, you should save the string 'unknown' instead of a year.

Example:

{'Mike Kearney': 1953, 'Alexander Champion': 'unknown', ... }
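The int-or-'unknown' fallback might look something like the sketch below. It rests on a loud assumption: that a birth year, when present, appears as a four-digit number after the word "born" inside the opening parenthetical. The real file may be formatted differently, so treat the pattern, the helper name birth_year_from, and the sample sentences as hypothetical.

```python
import re

# Assumption: "born", then anything up to a closing parenthesis or line
# break, then the first four-digit number.
BORN_YEAR = re.compile(r"\bborn\b[^)\n]*?\b(\d{4})\b")


def birth_year_from(sentence):
    """Return the year as an int, or the string 'unknown' if none is found."""
    match = BORN_YEAR.search(sentence)
    return int(match.group(1)) if match else "unknown"


# Made-up sentences for illustration only.
print(birth_year_from("Mike Kearney (born 1953) is a musician."))
print(birth_year_from("Alexander Champion was a merchant."))
```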

6. Make at least 3 test cases each for find_bio_names, find_possessives, find_section_headings, and find_birth_years.

Extra Credit

Write a function count_mid(string_list, middle) to return a count of the number of times a specified string appears in a file. It should match the string only when it is in the middle of a word (not at the beginning or the end). For example, if called with "be" it should match "number" but not "vibe". Make sure to account for punctuation (e.g., ',' or '?') in your regular expression. You MUST use a regular expression to earn credit for this part. (We will not be checking if you make tests for the extra credit, but feel free to write your own tests if it will help you complete this problem!)
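One way to express "in the middle of a word" is with lookarounds that require a letter on both sides of the target string. The sketch below assumes words are made of ASCII letters only; the function name count_mid_sketch and the sample lines are hypothetical.

```python
import re


def count_mid_sketch(lines, middle):
    """Count occurrences of `middle` that have a letter on both sides."""
    # re.escape keeps characters such as "." or "?" in `middle` literal.
    pattern = re.compile(
        r"(?<=[A-Za-z])" + re.escape(middle) + r"(?=[A-Za-z])"
    )
    return sum(len(pattern.findall(line)) for line in lines)


# Made-up sample lines for illustration only.
sample = ["A number, a vibe, and a vibe?", "Remember to behave."]
print(count_mid_sketch(sample, "be"))
```

Because punctuation is not a letter, "vibe," and "vibe?" fail the lookahead and are not counted, and "behave" fails the lookbehind because "be" starts the word.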

Starter Code

# Your name:
# Your student id:
# Your email:
# List who you have worked with on this homework:

import re, os, unittest

def read_file(filename):
    """ Return a list of the lines in the file with the passed filename """

    # Open the file and get the file object
    source_dir = os.path.dirname(__file__) #<-- directory name
    full_path = os.path.join(source_dir, filename)
    infile = open(full_path,'r', encoding='utf-8')

    # Read the lines from the file object into a list
    lines = infile.readlines()

    # Close the file object
    infile.close()

    # return the list of lines
    return lines

def find_bio_names(string_list):
    """
    This function returns a dictionary with the keys being numbers, (1 - 10)
    and the values being the names of each biography subject
    """
    pass

def find_possessives(string_list):
    """
    This function finds all (real, English language) words with an apostrophe in them
    """
    pass

def find_section_headings(string_list):
    """
    This function returns a list of section headings in the list of strings
    """
    pass

def find_birth_years(string_list):
    """
    This function returns a dictionary where the keys are names and the values are corresponding birth years
    If the birth year is unknown, use the string 'unknown' in place of a birth year
    Hint: you could call your find_bio_names function here to help
    """
    pass

## Extra credit
def count_mid(string_list, middle):
    """
    This function returns a count of the number of times a specified string appears
    in the text. The matched string should be in the middle of a word, not at
    the start or end
    """
    pass

#Implement your own tests
class TestAllMethod(unittest.TestCase):

    def test_find_bio_names(self):
        pass

    def test_find_possessives(self):
        pass

    def test_find_section_headings(self):
        pass

    def test_find_birth_years(self):
        pass

    #Uncomment if working on Extra Credit
    #def test_count_mid(self):
    #    pass

def main():
    #Feel free to run your functions here as well!
    pass  # placeholder so the empty function body is valid Python

if __name__ == '__main__':
    main()
    print()
    unittest.main(verbosity=2)