CS101 - Lab 3

Introduction to Java programming: arrays

The goal for this lab is to put together everything you've seen so far in this class: basic instructions, conditionals, loops and arrays. Start by reviewing the problems we did in class and reading the handouts. Then go ahead and start on the problems below.

Programming style: Pay attention to issues of programming style. I will take points off if you do not follow the programming style guidelines. Carefully read the handout on programming style given out in class.

Work individually, and call me if you need help. You are encouraged to discuss ideas and techniques broadly with other class members, but not specifics. Discussions should be limited to questions that can be asked and answered without using any written medium (e.g. pencil and paper or email). You should at no point look at the screen of your colleagues.


    DNA computing

    The human genome is composed of a sequence of approximately 3 billion nucleotides, each of which is one of only four chemical compounds: Adenine, Cytosine, Thymine, Guanine. These nucleotides are usually referred to by their first letter: A, C, T and G. Thus, our DNA, the basis of our life, turns out to be a very long sequence of letters, written in a four-letter alphabet.

    ....T A G C C A G T A A C T A A G C T...
    

    The DNA is made up of genes, which are smaller sequences of nucleotides, typically tens of thousands of nucleotides long. Genes are responsible for basic functions of our body. Gene and DNA sequences for many organisms are being collected in large databases, and a rich set of computational methods is available to analyze patterns in them. Gene analysis using computers is now a self-contained field at the intersection of Computer Science, Biology and Genetics, called Bioinformatics. The goal is to understand the function and evolution of genes, which will bring a deeper understanding of the evolution of life, as well as new means to treat diseases.

    For the sake of this lab we will not work with entire DNA sequences, but with small fractions.

  1. Write a program that reads two DNA sequences from the user, each precisely (say) 10 letters long. Your program should find out whether the two DNA sequences are the same or not. Assume that two DNAs are the same if they match letter by letter, either forwards or backwards. For example (using 7-letter sequences for brevity):
    A C A A G T C     and      A C A A G T C
    
    match. Also
    A C A A G T C     and      C T G A A C A
    
    match, because DNA2 read backwards is A C A A G T C.

    Your program should print one of the following:

    DNA1 matches with DNA2 forward.
    DNA1 matches with DNA2 backwards.
    DNA1 does not match with DNA2.
    

    Before you start coding, think about how you are going to solve the problem. That is, how are you going to check if two arrays are identical? You cannot test arrays for equality with ==; i.e. if (dna1 == dna2) only checks whether the two variables refer to the same array object, not whether the arrays hold the same letters. You will have to write a loop that checks whether every pair of corresponding elements is equal.
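
To make the point about == concrete, here is a minimal sketch of an element-by-element check. It is an illustration, not the required solution: the class name ArrayEqualitySketch and the helper method sameLetters are made up for this example, and you may prefer to write the loop directly in your main method.

```java
public class ArrayEqualitySketch {

    // Hypothetical helper: returns true only if a and b have the same
    // length and hold the same character at every index.
    public static boolean sameLetters(char[] a, char[] b) {
        if (a.length != b.length) {
            return false;
        }
        for (int i = 0; i < a.length; i++) {
            if (a[i] != b[i]) {
                return false; // the first difference settles the question
            }
        }
        return true;
    }

    public static void main(String[] args) {
        char[] dna1 = {'A', 'C', 'A', 'A', 'G', 'T', 'C'};
        char[] dna2 = {'A', 'C', 'A', 'A', 'G', 'T', 'C'};

        // == compares the two array references, not their contents.
        System.out.println(dna1 == dna2);            // prints false
        // The loop compares the letters themselves.
        System.out.println(sameLetters(dna1, dna2)); // prints true
    }
}
```

The backward check can follow the same pattern, comparing position i of one array with position length - 1 - i of the other.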

    Design the interface of your program so that it is user friendly. For example,

    Enter DNA1: 
    ...
    Enter DNA2: 
    ...
    You entered: 
    DNA1: A C A A G T C
    DNA2: A C A A G T C
    
    DNA1 matches with DNA2 forward.
    Goodbye.
    

  2. Same problem as before, except that now the two DNAs count as matching if they differ in at most (say) 4 letters, either forward or backward.

    Here are some examples, assuming, for simplicity, that the length of a DNA is 6 and the number of allowed mismatches is 3.

     G T G G C A   and     A T A G C G 
    match forward, with 3 mismatches (first, third and last position).
     G T G A C A   and     A C A G T C 
    If we try to align them forward, we find 6 mismatches, so they do not match forward. However, when we try to match them backwards (that is, reading the second DNA from right to left: C T G A C A), we see that they in fact match with 1 mismatch (the first position).
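
The counts in these two examples can be reproduced with a short sketch. This is one possible way to count mismatches, assuming both arrays have the same length; the class and method names (MismatchSketch, forwardMismatches, backwardMismatches) are invented for illustration.

```java
public class MismatchSketch {

    // Counts positions i where a[i] differs from b[i].
    // Assumes a and b have the same length.
    public static int forwardMismatches(char[] a, char[] b) {
        int count = 0;
        for (int i = 0; i < a.length; i++) {
            if (a[i] != b[i]) {
                count++;
            }
        }
        return count;
    }

    // Counts positions i where a[i] differs from b read right to left,
    // i.e. from b[b.length - 1 - i]. Assumes equal lengths.
    public static int backwardMismatches(char[] a, char[] b) {
        int count = 0;
        for (int i = 0; i < a.length; i++) {
            if (a[i] != b[b.length - 1 - i]) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        char[] dna1 = {'G', 'T', 'G', 'A', 'C', 'A'};
        char[] dna2 = {'A', 'C', 'A', 'G', 'T', 'C'};

        System.out.println(forwardMismatches(dna1, dna2));  // prints 6
        System.out.println(backwardMismatches(dna1, dna2)); // prints 1
    }
}
```

Note that the backward count never builds a reversed copy of the second array; it only changes which index is read.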

    Your program should print one of the following:

    DNA1 matches with DNA2 forward with xx mismatches.
    DNA1 matches with DNA2 backwards with xx mismatches.
    DNA1 does not match with DNA2.
    

    For this problem, if you get tired of typing in the DNAs every single time, you are allowed to use inline initialization of the DNA arrays, i.e.:

    char[] dna1 = {'A', 'C', 'T', 'C', 'T', 'T', 'A', 'G', 'G'};
    char[] dna2 = {'G', 'C', 'C', 'C', 'T', 'T', 'A', 'G', 'G'};
    
    Feel free to play with sizes other than 10. Use final variables whenever appropriate.

    What to turn in:

    Note that the first problem is actually a special case of the second problem, with the number of allowed mismatches set to 0.

    If your second program is written elegantly, it will have a final variable that stores the number of allowed mismatches. I will check your first program by setting this variable to 0. If your program allows me to do this, then you only need to turn in the second program.

    If, for whatever reason, I cannot easily check your solution to the first problem using your solution to the second problem, then send me both files.
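
As a rough sketch of what a final variable for the allowed mismatches might look like: the names ALLOWED_MISMATCHES and verdict below are hypothetical, and the choice to report a forward match first when both directions qualify is an assumption, since the lab does not specify it. The mismatch counts are passed in as plain numbers to keep the example short.

```java
public class AllowedMismatchesSketch {

    // Hypothetical constant: change the value here and nowhere else.
    public static final int ALLOWED_MISMATCHES = 3;

    // Chooses the message from the two mismatch counts.
    // Assumption: forward is tried first when both directions qualify.
    public static String verdict(int forward, int backward, int allowed) {
        if (forward <= allowed) {
            return "DNA1 matches with DNA2 forward with " + forward + " mismatches.";
        }
        if (backward <= allowed) {
            return "DNA1 matches with DNA2 backwards with " + backward + " mismatches.";
        }
        return "DNA1 does not match with DNA2.";
    }

    public static void main(String[] args) {
        // Counts for G T G A C A and A C A G T C: 6 forward, 1 backward.
        System.out.println(verdict(6, 1, ALLOWED_MISMATCHES)); // backward match
        // With 0 allowed mismatches the same test behaves like problem 1.
        System.out.println(verdict(6, 1, 0));                  // no match
    }
}
```

With the limit set to 0, only exact forward or backward matches pass, which is the special case described above.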