Given two sequences of data (of equal length) and quality values for each data point, I want to calculate a similarity score based upon a given scoring matrix.

What is the most efficient way to vectorize the following loop:

score = 0
for i in xrange(len(seq1)):
    score += similarity[seq1[i], seq2[i], qual1[i], qual2[i]]

similarity is a 4-dimensional float array with shape (32, 32, 100, 100); seq1, seq2, qual1 and qual2 are 1-dimensional int arrays of equal length (on the order of 1,000 to 40,000 elements).

2 Answers

Shouldn't this Just Work(tm)?

>>> score = 0
>>> for i in range(len(seq1)):
...     score += similarity[seq1[i], seq2[i], qual1[i], qual2[i]]
...
>>> score
498.71792400493433
>>> similarity[seq1, seq2, qual1, qual2].sum()
498.71792400493433

Code:

import numpy as np

similarity = np.random.random((32, 32, 100, 100))
n = 1000
seq1, seq2, qual1, qual2 = [np.random.randint(0, s, n) for s in similarity.shape]

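def slow():
    # Reference implementation: look up one element per position.
    score = 0
    for i in range(len(seq1)):
        score += similarity[seq1[i], seq2[i], qual1[i], qual2[i]]
    return score

def fast():
    # Integer-array indexing gathers all n elements in a single call.
    return similarity[seq1, seq2, qual1, qual2].sum()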

gives:

>>> timeit slow()
100 loops, best of 3: 3.59 ms per loop
>>> timeit fast()
10000 loops, best of 3: 143 us per loop
>>> np.allclose(slow(), fast())
True
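
To see why the one-liner matches the loop, here is a minimal sketch on a tiny made-up array. The names table, a, b, c and d are illustrative only and are not from the question. When every axis is indexed with an integer array of the same length, NumPy pairs the arrays up position by position, so element i of the result is table[a[i], b[i], c[i], d[i]].

```python
import numpy as np

# Hypothetical small table: the entry at [i, j, k, l] equals 60*i + 20*j + 5*k + l.
table = np.arange(2 * 3 * 4 * 5, dtype=float).reshape(2, 3, 4, 5)

# Hypothetical index arrays, one per axis, all of the same length.
a = np.array([0, 1, 1])
b = np.array([2, 0, 1])
c = np.array([3, 3, 0])
d = np.array([4, 1, 2])

# Integer-array indexing picks one element per position i.
picked = table[a, b, c, d]

# The same elements gathered one at a time, for comparison.
looped = np.array([table[a[i], b[i], c[i], d[i]] for i in range(len(a))])

total = picked.sum()
print(picked, total)
```

The result has the shape of the index arrays, not of the table, which is why a single .sum() afterwards gives the score.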

3 Comments

Well that's just great. I like this better than my own answer (though I may leave mine just for contrast). +1.
That's a really neat numpy feature I didn't know about.
Thanks! - Timings on my machine (including John Zwinck's answer as 'third') for 1000 iterations of length 10000: slow: 17.1289219856, fast: 0.61208987236, third: 15.7027080059

How about this?

score = numpy.sum(map(similarity.__getitem__, zip(seq1, seq2, qual1, qual2)))

You can also try itertools.imap and itertools.izip. The zip is needed because __getitem__ takes a single tuple rather than four separate numbers; maybe that can be improved somehow by looking in a darker corner of the itertools module.
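
For anyone trying this on Python 3, here is a rough sketch of the same element-by-element idea. It assumes Python 3, where map and zip return lazy iterators rather than lists, so the values are collected with np.fromiter before summing. The small array shape, the length n and the seeded generator are made up for illustration. The last line compares the result against direct integer-array indexing.

```python
import numpy as np

# Made-up data: a small 4-D table and four index arrays of length n.
rng = np.random.default_rng(0)
similarity = rng.random((4, 4, 5, 5))
n = 50
seq1, seq2, qual1, qual2 = [rng.integers(0, s, n) for s in similarity.shape]

# zip builds one (i, j, k, l) tuple per position; __getitem__ looks each one up.
# np.fromiter consumes the lazy map object and stores the results as floats.
values = np.fromiter(
    map(similarity.__getitem__, zip(seq1, seq2, qual1, qual2)),
    dtype=float,
    count=n,
)
score_itemwise = values.sum()

# Direct integer-array indexing on the same inputs, for comparison.
score_indexed = similarity[seq1, seq2, qual1, qual2].sum()

print(np.isclose(score_itemwise, score_indexed))
```

This still does one Python-level lookup per element, so it is not expected to beat plain integer-array indexing.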
