[leetcode] Repeated DNA Sequences

Repeated DNA Sequences

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: “ACGAATTCCG”. When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,
Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",

Return:
["AAAAACCCCC", "CCCCCAAAAA"].

Just use an array as hashtable to avoid Memory Limit Exceed.

The total number of combinations of “ACGT” string in the length of 10 is 4^10 which is 2^20. We allocate a sort data type to each combination storing the number of occurrence of it.

The memory use is 1MB.

O(n) in time, O(1) in space.

Warning, when we use “<<” operator, remember to use parentheses to ensure the result is exactly what you want.

//A C G T
//0 1 2 3

class Solution {
public:
    vector<string> findRepeatedDnaSequences(string s) {
        if(s.size() <= 10) return vector<string>();
        short map[0x00100000];
        memset(map, 0, sizeof(map));
        vector<string> ans;
        unordered_map<char, int> idxTable;
        idxTable['A'] = 0;
        idxTable['C'] = 1;
        idxTable['G'] = 2;
        idxTable['T'] = 3;
        for(size_t i = 0; i < s.size() - 9; i++){
            string subs = s.substr(i, 10);
            int idx = 0;
            for(size_t j = 0; j < subs.size(); j++){
                idx = (idx << 2) + idxTable[subs[j]];//BUG HERE, the priority of operators.
            }
            map[idx]++;
            if(map[idx] > 3) map[idx] = 3; //avoid overflow
            if(map[idx] == 2){
                ans.push_back(subs);
            }
        }
        return ans;
    }
};

Untitled

O(n)space，force solution, Memory Limit Exceed.

class Solution {
public:
    vector<string> findRepeatedDnaSequences(string s) {
        unordered_map<string, int> map;
        if(s.size() <= 10) return vector<string>();
        for(size_t i = 0; i < s.size() - 9; i++){
            string subs = s.substr(i, 10);
            if(map.find(subs) != map.end()){
                map[subs] += 1;
            }
            else{
                map[subs] = 1;
            }
        }
        vector<string> ans;
        for(auto it = map.begin(); it != map.end(); it++){
            if((*it).second > 1){
                ans.push_back((*it).first);
            }
        }
        return ans;
    }
};

Bo Song

Life is beautiful. Enjoy.

Life is beautiful. Enjoy.

[leetcode] Repeated DNA Sequences

Leave a comment Cancel reply