Here are two functions which (I thought) should do the same thing but actually do not.

It seems that with the list comprehension, the index taken is the first one that matches, so when the same value appears at several indices, there is an ambiguity.

Is there a way to modify the list comprehension in filter2 to get the same result as in filter1?

L = [98.75011926342906,
     97.8178200008178,
     98.6138182016438,
     98.55520874507613,
     98.25262038791283,
     98.75011926342906,
     99.06770073738875,
     98.66970163697574,
     98.56611283001895,
     98.47751713985852,
     98.66970163697574,
     97.8178200008178]


def filter1(L, threshold=98.7):
    items = []
    for i in range(len(L)):
        if L[i] < threshold:
            items.append(i)
    return items

def filter2(L, threshold=98.7):
    items = [L.index(x) for x in L if x <= threshold]
    return items

print(filter1(L))
# [1, 2, 3, 4, 7, 8, 9, 10, 11]
print(filter2(L))
# [1, 2, 3, 4, 7, 8, 9, 7, 1]
1
  • Pay attention to the small details in the future: filter1 uses the < sign, while filter2 uses <=. The results also differ because the two filters use different logic: index returns the first index at which the value is found (that is where the flaw is). Commented Dec 6, 2016 at 19:37

3 Answers

15

You can use enumerate as a helper here:

bad_items = [i for i, x in enumerate(L) if x <= threshold]

enumerate yields (index, value) pairs, which you can unpack in the comprehension
(into i, x). You then keep i only if x <= threshold.
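
As a minimal sketch, the comprehension can be dropped into the question's filter2. The short list `values` is a made-up sample (not the question's data), and the strict `<` is chosen here only to match filter1:

```python
def filter2(L, threshold=98.7):
    # enumerate yields (position, value) pairs, so each duplicate
    # keeps its own position instead of the first matching one.
    return [i for i, x in enumerate(L) if x < threshold]

# Hypothetical sample with duplicated values at positions 2/4 and 1/5.
values = [98.75, 97.81, 98.66, 99.06, 98.66, 97.81]
print(filter2(values))  # [1, 2, 4, 5]
```

Swap `<` for `<=` if values equal to the threshold should be included.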

3 Comments

I think you mean for i, x in enumerate(L)
Does enumerate generate indices, or simply count iterations? It is often written as generating indices, but if you apply this code to a dataframe column with indices that do not start at 0 or are non-sequential (for instance, a slice of a dataframe), then this will not return those indices. Rather a sequence of integers beginning at 0 is returned.
enumerate just yields pairs. The first value of the pair is an incrementing integer, the second value is pulled from the iterable passed to enumerate.
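
A small sketch of that behaviour, using a plain list slice rather than a dataframe. The names `data`, `tail`, and `labels` are illustrative assumptions, not taken from the thread:

```python
data = [10, 20, 30, 40, 50]
tail = data[2:]  # the slice starts at original position 2

# enumerate counts iterations from 0; it knows nothing about original positions.
print(list(enumerate(tail)))           # [(0, 30), (1, 40), (2, 50)]

# The optional start argument only shifts the counter.
print(list(enumerate(tail, start=2)))  # [(2, 30), (3, 40), (4, 50)]

# With arbitrary labels (hypothetical here), pair labels and values explicitly.
labels = [7, 3, 9]
picked = [label for label, x in zip(labels, tail) if x >= 40]
print(picked)                          # [3, 9]
```

For a container that carries its own labels, iterating over those labels together with the values is one way to get them back, since the counter from enumerate is unrelated to them.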
3

The reason you get index 7 instead of 10 is that the list contains duplicate elements, and index returns the smallest index at which the value is present. Besides, each index call is itself a linear-time search, so the whole comprehension runs in quadratic time.
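
A short illustration of the duplicate problem, on a made-up list `values` (an assumption for this sketch, not the question's data):

```python
values = [5, 3, 5, 3]

# index always reports the first position where the value occurs.
print(values.index(3))  # 1

# So looking up each element's index collapses duplicates onto the first hit,
# and every lookup rescans the list from the start.
first_hits = [values.index(x) for x in values]
print(first_hits)       # [0, 1, 0, 1]

# enumerate carries the actual position along in a single pass.
positions = [i for i, x in enumerate(values)]
print(positions)        # [0, 1, 2, 3]
```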

0

You can use enumerate, which assigns the current position in the loop to i and the current value to x.

def filter2(L, threshold=98.7):
    items = [i for (i, x) in enumerate(L) if x <= threshold]
    return items
