Objective

You have received two DNA/RNA sequences from the lab, each in a text file in a format like the following:

43
attacgatagagctcgatttagcggggctgcggctggcgatat

and

43
atcttcttatcgtagtatgatcgggctagatgatcgatgcagt

You are to analyze each sequence in the following manner:

  • Detect the length of each sequence (43 for both sequences above)
  • Compute the ratio of each nucleotide in the sequence (for example, the first sequence contains 9 adenine out of 43, which is 0.2093 or 20.93%)
  • Identify its type: DNA, RNA, or undetermined (both example sequences are DNA)

Also, you are to show where the two DNA/RNA sequences bond. A DNA strand is a series of nucleotides (Adenine, Thymine, Cytosine, Guanine). RNA is the same except that Thymine is replaced by Uracil. The following bonds are possible: Adenine-Thymine, Cytosine-Guanine, Guanine-Cytosine, and Adenine-Uracil. Note that the bonds are symmetrical, so Thymine-Adenine and Uracil-Adenine bond as well. Additionally, a space (' ') means that the lab has detected a gap in the sequence.

Thus the example above will have the following bonding pattern (the | shows bonding between the top sequence and the bottom sequence):

a t t a c g a t a g a g c t c g a t t t a g c g g g g c t g c g g c t g g c g a t a t
      |   | |             | |         | | | |     |                             |
a t c t t c t t a t c g t a g t a t g a t c g g g c t a g a t g a t c g a t g c a g t

Your analysis should be written to a file named output.

Inputs

Your program should accept two filenames, each naming a file that contains one of the DNA/RNA sequences to be paired. You can either ask the user for the filenames, as in your previous assignment, or optionally use program parameters (command-line arguments). Assume filenames are no longer than 256 characters.

Example of asking for filenames (the user types seq and seq2):

Which file contains your first sequence? seq
Which file contains your second sequence? seq2
Finished processing your sequence data.
Output will be found in the file 'output'.

File contents format:

[Length of the sequence]
[String of characters representing sequence (see assumptions)]

The following skeleton code is provided; complete the missing parts.

// Required by the given code (FILE, printf, fprintf). Remember to include
// any other headers you need.
#include <stdio.h>
/* Start of given declarations */
#define NUM_COUNTERS 5 // Same number of items in CounterType
#define MAX_SEQ_LEN 100
#define PRINT_LEN 35
#define MAX_FILENAME_LEN 256
#define NUM_BOND_TYPES 3 // Same number of items in BondType

typedef enum {
    DNA,
    RNA,
    INVALID,
    UNDETERMINED
} SequenceType;

// Note that you may use these to access arrays too! array[ADENINE] will be the
// same as array[0]. Follow this ordering presented here. CYTOSINE equals 1 and
// so forth.
typedef enum {
    ADENINE = 0,
    CYTOSINE,
    GUANINE,
    THYMINE,
    URACIL,
    ERROR
} NucleotideType;

// Translation function from symbol to counter index
NucleotideType GetNucleotideType(char symbol);
// Translates the counter index to its character name
char GetCounterName(int type);
// Standardizes any sequence character.
char GetStandardChar(char input);
// Debug function
void PrintCounters(const int counter[]);
// Prints the name of the sequence type to file output
void PrintSequenceType(const SequenceType type, FILE *output);
/* End of given declarations */

// Identify the type of sequence from the counter information
SequenceType GetSequenceType(const int counter[]);

// Counts nucleotides in the sequence and updates the values in counters
void CountNucleotides(const char seq[], int counters[]);

// Logic to determine whether a nucleotide pairing can form a bond.
// 1 means true, 0 means false, -1 means maybe
int IsMatch(const char sym1, const char sym2);

// Prints the ratio of each nucleotide given the counters and length.
void PrintRatios(const int counter[], int total_len, FILE *output);

// Prints sequences in the following format
// A T C A G A T A C
// | |   |       |
// T A C T T C T T T
void PrintMatches(const int length, const char seq1[], const char seq2[], FILE *output);

// Hint: you need a function to process the file data

// Main
int main(void) {
    // Fill this or change as necessary
    return 0;
}
/* Given code definitions */

NucleotideType GetNucleotideType(char symbol) {
    // Sift through the data
    switch (symbol) {
        case 'a':
        case 'A':
            return ADENINE;
        case 'c':
        case 'C':
            return CYTOSINE;
        case 'g':
        case 'G':
            return GUANINE;
        case 't':
        case 'T':
            return THYMINE;
        case 'u':
        case 'U':
            return URACIL;
        default:
            // Unexpected character. Error to invalidate the sequence.
            return ERROR;
    }
}

char GetCounterName(int type) {
    switch (type) {
        case ADENINE: return 'A';
        case CYTOSINE: return 'C';
        case GUANINE: return 'G';
        case THYMINE: return 'T';
        case URACIL: return 'U';
        default: return 0;
    }
}

char GetStandardChar(char input) {
    return GetCounterName(GetNucleotideType(input));
}

void PrintCounters(const int counter[]) {
    for (int i = 0; i < NUM_COUNTERS; i++) {
        printf("%s%c%s%d\n", "Number of '", GetCounterName(i), "': ", counter[i]);
    }
}

void PrintSequenceType(const SequenceType type, FILE *output) {
    switch (type) {
        case UNDETERMINED:
            fprintf(output, "%s", "Undetermined");
            break;
        case DNA:
            fprintf(output, "%s", "DNA");
            break;
        case RNA:
            fprintf(output, "%s", "RNA");
            break;
        default:
            fprintf(output, "%s", "Invalid");
            break;
    }
}

/* End of given definitions */

SequenceType GetSequenceType(const int counter[]) {
    // Fill this in
}

void CountNucleotides(const char seq[], int counters[]) {
    // Fill this in
}

int IsMatch(const char sym1, const char sym2) {
    // Fill this in
}

void PrintRatios(const int counter[], int total_len, FILE *output) {
    // Fill this in
}

void PrintMatches(const int length, const char seq1[], const char seq2[], FILE *output) {
    // Fill this in
}