Problem Statement: ACTG DNA Stats Processor

PART 1:

1. Write a program that randomly generates a one-dimensional matrix (array) of characters. The allowed characters are 'A', 'C', 'T' and 'G' (all capitals). Your program should be named 'part1.cpp'.

2. Your program should ask the user for the number of characters to be generated. Ideally, it should accept the largest number of characters that your memory subsystem can hold; the higher the limit, the better.

3. Now generate all 4-character combinations of 'A', 'C', 'T' and 'G', including repetitions of any character, and generate statistics for each combination, stating how many times it appears in the random matrix of characters. You should ideally print the statistics to the standard console; each entry must be followed by a line feed. For example, three lines in the console could look like this:

ACCC, 42421
ACCG, 44211
ACCT, 33221

Use arrays, and nested loops within the main function, where permissible.
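The three steps of Part 1 can be sketched as follows. This is a minimal illustration, not a reference solution: the names `generateBases`, `countFourMers`, `printFourMers` and `baseCode` are hypothetical, the `seed` parameter is added only to make runs repeatable, and `std::vector` stands in for a raw array. The sketch also assumes that overlapping occurrences are counted, which the statement does not specify. In 'part1.cpp' itself these loops would sit inside the main function, as the statement requests.

```cpp
#include <cstddef>
#include <cstdio>
#include <random>
#include <vector>

// Bases in alphabetical order, matching the sample output (ACCC, ACCG, ACCT).
static const char kBases[4] = {'A', 'C', 'G', 'T'};

// Hypothetical helper: map a base to its position in kBases.
int baseCode(char b) {
    switch (b) {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        default:  return 3;  // 'T'
    }
}

// Fill a one-dimensional array with n random bases.
// The seed parameter is an assumption, added so runs can be repeated.
std::vector<char> generateBases(std::size_t n, unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> pick(0, 3);
    std::vector<char> data(n);
    for (std::size_t i = 0; i < n; ++i) data[i] = kBases[pick(rng)];
    return data;
}

// Count every overlapping 4-character window in one pass.
// Each window is read as a base-4 number in [0, 256), used as the array index.
std::vector<unsigned long long> countFourMers(const std::vector<char>& data) {
    std::vector<unsigned long long> counts(256, 0);
    for (std::size_t i = 0; i + 4 <= data.size(); ++i) {
        int index = 0;
        for (int j = 0; j < 4; ++j) index = index * 4 + baseCode(data[i + j]);
        ++counts[index];
    }
    return counts;
}

// Print all 256 combinations with nested loops, one "COMBO, count" per line.
void printFourMers(const std::vector<unsigned long long>& counts) {
    for (int a = 0; a < 4; ++a)
        for (int b = 0; b < 4; ++b)
            for (int c = 0; c < 4; ++c)
                for (int d = 0; d < 4; ++d) {
                    int index = ((a * 4 + b) * 4 + c) * 4 + d;
                    std::printf("%c%c%c%c, %llu\n", kBases[a], kBases[b],
                                kBases[c], kBases[d], counts[index]);
                }
}
```

Counting all windows in a single pass avoids rescanning the data once per combination, which matters when the number of characters is large. A main function would read the count from the user, then call `generateBases`, `countFourMers` and `printFourMers` in that order.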

PART 2:

4. Using your program from PART 1, write a separate library called 'data.hpp', and write another program named 'part2.cpp'. The file 'part2.cpp' should include the 'data.hpp' library, together with any others it needs. The 'data.hpp' library should contain all data-related functions, e.g.:

a. The function to generate data
b. The function to create the next combination using the current combination
c. The function to count the number of occurrences in the data for a given combination
d. The function to print the outputs in a separate file called 'analysis.csv', in a comma separated format; each entry must be followed by a line feed. For example, three lines in the file could look like this:

ACCC, 42421
ACCG, 44211
ACCT, 33221

e. You should use call by reference for all functions above, when permissible, and use loops and arrays, where possible.
f. Make your main function parameterizable, so that you can instruct the internal functions to generate and analyse the data as follows:

./part2.exe -n 87385783 -t 5

Here 'part2.exe' is the executable generated from the program, '-n' denotes the number of characters, followed by its value, and '-t' denotes the size of the character tokens, followed by its value. Note that, instead of the 4-character tokens used in Part 1, this parameter asks for tokens of five characters each, e.g. 'AACCT' and 'AACGA'.

Your solution to this should include passing arguments from 'part2.cpp' and using variadic functions, if possible, in 'data.hpp', where you declare and define the functions.
g. Generate some timing statistics for each function using clock() from the 'time.h' header (see examples in the class or elsewhere), and record how their execution times vary with increasing 'n' and 't' values. Save the execution times for each n and t value in an organised table within an Excel file named 'stats.xlsx'.
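
Some of the functions described in items b, c, f and g might look like the sketch below. All names here (`nextCombination`, `countOccurrences`, `parseArgs`, `timeSeconds`) are hypothetical, and several choices are assumptions rather than requirements: combinations advance in alphabetical order (A, C, G, T), overlapping occurrences are counted, the data is held in a `std::string`, and the "variadic function" is read as a variadic template rather than a C-style `va_list` function.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <ctime>
#include <string>
#include <utility>

// Item b: advance combo in place to the next combination, like an odometer.
// Returns false when the last combination (all 'T') wraps back to all 'A'.
bool nextCombination(std::string& combo) {
    for (std::size_t i = combo.size(); i-- > 0;) {
        switch (combo[i]) {
            case 'A': combo[i] = 'C'; return true;
            case 'C': combo[i] = 'G'; return true;
            case 'G': combo[i] = 'T'; return true;
            default:  combo[i] = 'A';  // 'T' rolls over; carry to the left
        }
    }
    return false;
}

// Item c: count overlapping occurrences of combo in data (assumption:
// overlaps count, so "AA" appears 4 times in "AAAAA").
std::size_t countOccurrences(const std::string& data, const std::string& combo) {
    if (combo.empty() || combo.size() > data.size()) return 0;
    std::size_t count = 0;
    for (std::size_t i = 0; i + combo.size() <= data.size(); ++i)
        if (data.compare(i, combo.size(), combo) == 0) ++count;
    return count;
}

// Item f: read "-n <count> -t <token size>" into n and t (passed by reference).
// Returns false if a flag is unknown or either flag is missing.
bool parseArgs(int argc, char* argv[], std::size_t& n, std::size_t& t) {
    bool haveN = false, haveT = false;
    for (int i = 1; i + 1 < argc; i += 2) {
        if (std::strcmp(argv[i], "-n") == 0) {
            n = std::strtoull(argv[i + 1], nullptr, 10);
            haveN = true;
        } else if (std::strcmp(argv[i], "-t") == 0) {
            t = std::strtoull(argv[i + 1], nullptr, 10);
            haveT = true;
        } else {
            return false;
        }
    }
    return haveN && haveT;
}

// Item g: time any callable with clock(); variadic so it can wrap every
// function above regardless of its parameter list. Returns CPU seconds.
template <typename F, typename... Args>
double timeSeconds(F&& fn, Args&&... args) {
    std::clock_t start = std::clock();
    fn(std::forward<Args>(args)...);
    return static_cast<double>(std::clock() - start) / CLOCKS_PER_SEC;
}
```

In 'part2.cpp', a main function could call `parseArgs(argc, argv, n, t)`, start from a combination of `t` 'A' characters, and loop with `nextCombination` until it returns false, calling `countOccurrences` for each combination. Wrapping a call as `timeSeconds(countOccurrences, data, combo)` gives the elapsed CPU time for one row of the timing table.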
