CodeStepByStep

dna_errors

Language/Type: Python string parameters return
Author: Keith Schwarz (on 2017/07/08)

Write a function named dna_errors that accepts two strings representing DNA sequences as parameters and returns an integer representing the number of errors found between the two sequences, using a formula described below. DNA contains nucleotides, which are represented by four different letters A, C, T, and G. DNA is made up of a pair of nucleotide strands, where a letter from the first strand is paired with a corresponding letter from the second. The letters are paired as follows:

  • A is paired with T and vice-versa.
  • C is paired with G and vice-versa.

Below are two perfectly matched DNA strands. Notice how the letters are paired up according to the above rules.

"GCATGGATTAATATGAGACGACTAATAGGATAGTTACAACCCTTACGTCACCGCCTTGA"
 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
"CGTACCTAATTATACTCTGCTGATTATCCTATCAATGTTGGGAATGCAGTGGCGGAACT"
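
The pairing rules can be checked mechanically. The following is a minimal sketch, not part of the exercise: the names COMPLEMENT and is_perfect_match are hypothetical helpers introduced here for illustration, and the sketch assumes uppercase input with no dashes.

```python
# Complement of each nucleotide under the pairing rules (A-T, C-G).
# COMPLEMENT and is_perfect_match are illustrative names, not part of the exercise.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_perfect_match(strand1, strand2):
    """Return True if the strands have equal length and every pair is complementary."""
    if len(strand1) != len(strand2):
        return False
    # .get() returns None for any character outside A, C, T, G, so it never matches.
    return all(COMPLEMENT.get(a) == b for a, b in zip(strand1, strand2))


top = "GCATGGATTAATATGAGACGACTAATAGGATAGTTACAACCCTTACGTCACCGCCTTGA"
bottom = "CGTACCTAATTATACTCTGCTGATTATCCTATCAATGTTGGGAATGCAGTGGCGGAACT"
print(is_perfect_match(top, bottom))  # True
```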

In some cases, errors occur within DNA molecules. The task of your function is to find two particular kinds of errors:

  • Unmatched nucleotides, in which one strand contains a dash ('-') at a given index, or does not contain a nucleotide at the given index (if the strings are not the same length). Each of these counts as 1 error.
  • Point mutations, in which a letter from one strand is matched against the wrong letter in the other strand. For example, A might accidentally pair with C, or G might pair with G. Each of these counts as 2 errors.
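
The two rules above can be sketched as a score for a single index. This is one possible reading, not an official solution: errors_at is a hypothetical helper, None stands for "this strand has already ended", and the sketch assumes that a dash facing a dash, or a dash facing a missing nucleotide, still counts as a single error (the problem statement does not say).

```python
# Hypothetical helper for illustration; assumes uppercase letters.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def errors_at(a, b):
    """Score one index. a or b is None where that strand has no character."""
    if a is None or b is None or a == "-" or b == "-":
        return 1  # unmatched nucleotide (assumption: dash vs. dash also counts once)
    if COMPLEMENT[a] != b:
        return 2  # two letters that are not a valid pair
    return 0      # correctly paired


print(errors_at("A", "T"))   # 0
print(errors_at("A", "C"))   # 2
print(errors_at("G", "G"))   # 2
print(errors_at("-", "A"))   # 1
print(errors_at("C", None))  # 1
```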

For example, consider these two DNA strands:

index 01234567890123456789012
     "GGGA-GAATCTCTGGACT"
     "CTCTACTTA-AGACCGGTACAGG"

This pair of strands has three point mutations (at indexes 1, 15, and 17) and seven unmatched nucleotides (dashes at indexes 4 and 9, and nucleotides in the second string with no match at indexes 18-22). The point mutations count as a total of 3 * 2 = 6 errors, and the unmatched nucleotides count as 7 * 1 = 7 errors, so your function would return an error count of 6 + 7 = 13 total errors if passed the two above strands.

You may assume that each string consists purely of the characters A, C, T, G, and - (the dash character), but the letters could appear in either upper or lowercase. The strings might be the same length, or the first or second might be longer than the other. Either string could be very long, very short, or even the empty string. If the strings match perfectly with no errors as defined above, your function should return 0.
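
Putting the rules and assumptions together, one possible approach is sketched below. It is an illustration under stated assumptions rather than the reference solution: it pads the shorter strand with dashes using itertools.zip_longest (a missing nucleotide and a dash both cost 1 error, so they can be treated alike), uppercases both strands to handle mixed case, and assumes a dash facing a dash counts as one error. COMPLEMENT is a name chosen here for illustration.

```python
from itertools import zip_longest

# Valid pairings: A-T and C-G. COMPLEMENT is an illustrative name.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def dna_errors(strand1, strand2):
    """Count errors between two strands: 1 per unmatched index, 2 per wrong pairing."""
    errors = 0
    # fillvalue="-" makes positions past the end of the shorter strand look like dashes.
    for a, b in zip_longest(strand1.upper(), strand2.upper(), fillvalue="-"):
        if "-" in (a, b):
            errors += 1  # unmatched (assumption: dash vs. dash counts once)
        elif COMPLEMENT[a] != b:
            errors += 2  # letters that are not a valid pair
    return errors


print(dna_errors("GGGA-GAATCTCTGGACT", "CTCTACTTA-AGACCGGTACAGG"))  # 13
print(dna_errors("acgt", "TGCA"))  # 0
print(dna_errors("", "ACG"))       # 3
print(dna_errors("", ""))          # 0
```

Padding with a dash keeps the loop body to two cases; an explicit length comparison followed by a loop over the common prefix would work equally well.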

Function: Write a Python function as described, not a complete program.
