0

and nothing be else.

[1, 28]
[2, 14]
[3, 5]
2
  • 2
    these are lists not tuples Commented Jan 28, 2022 at 15:11
  • 1
    Can you sort your list by the second element after the first? Commented Jan 28, 2022 at 15:15

6 Answers 6

2

This solution is really simple, but does not exploit the fact that your data is sorted according to the first column.

import collections

data = [
    (1, 50),
    (1, 95),
    (1, 28),
    (2, 104),
    (2, 14),
    (3, 5),
    (3, 28),
]

mins = collections.defaultdict(lambda: float('inf'))
for a, b in data:
    if mins[a] > b:
        mins[a] = b
data_reduced = list(mins.items())
print(data_reduced)

It should be plenty fast!

The slightly advanced collections.defaultdict(lambda: float('inf')) expression results in a special kind of dictionary, which returns float('inf') (infinity) if you look up an element that is not in the dictionary. With this, we can do the mins[a] > b test without worrying about whether mins[a] fails because a might not already be in the dictionary.

Sign up to request clarification or add additional context in comments.

3 Comments

decently fast, 1 sec for 2,000,000 tuples.
@EliHarold Thanks for the testing. What if we upgrade the mins = ... line to mins = collections.defaultdict(lambda _=float('inf'): _)?
sorry I already deleted my test code, kinda not in the mood to rewrite it xD
2

Given that the outer list is already sorted by the first element (which is the premise of the question), I'd use itertools.groupby:

from itertools import groupby

for _, group in groupby(data, lambda t: t[0]):
    print(min(group, key=lambda g: g[1]))

This outputs

[1, 28]
[2, 14]
[3, 5]

1 Comment

This is the fastest solution by far. well under 1 sec for 2,000,000 tuples. others are 1sec and the np.array solution is 3 sec.
1

This runs in seconds for 2,000,000 tuples including creating and reducing the list:

from random import randint
l = []
output = []
for i in range(2000000):
    l.append((randint(1,5), randint(1,50)))
l = sorted(l)
d = {}
for tup in l:
    try:
        d[tup[0]].append(tup[1])
    except:
        d[tup[0]] = [tup[1]]
for k,v in d.items():
    output.append((k, min(v)))
print(output)

Output:

[(1, 1), (2, 1), (3, 1), (4, 1), (5, 1)]

Solution without setup given sorted l list:

d = {}
for tup in l:
    try:
        d[tup[0]].append(tup[1])
    except:
        d[tup[0]] = [tup[1]]
for k,v in d.items():
    output.append((k, min(v)))
print(output)

Comments

1

here is one way, this is O(n^2) for sorting and O(n) for finding the min:

li.sort()
res = []
for i in li:
    if not res or not i[0] == res[-1][0]:
        res.append(i)

print(res)

output:

[[1, 28], [2, 14], [3, 5]]

another method which should be way faster ( doesn't need sorting) : this should be O(n)

res = {}
for a, b in l: res[a] = min(res.get(a,b) , b)
print([*res.items()])

Comments

1

I would go with something like this (modified @jmd_dk answer). No need to use any dictionary here since the elements are sorted on the first index. That will get rid of the memory footprint associated with this dictionary and if you data set is very large that could be a big plus.

data = [
    (1, 50),
    (1, 95),
    (1, 28),
    (2, 104),
    (2, 14),
    (3, 5),
    (3, 28),
]

last_a = None
minimum = None
for a, b in data:
    # Detects the change of the first index. That's how we know it's time to start a new group and look for it's minimum
    if a != last_a:
       if last_a is not None:
          print([last_a, minimum])
       last_a = a
       minimum = None
    else:
        if minimum is None:
            minimum = b
        else:
            minimum = min(minimum, b)
print([a, minimum])

Output:

[1, 28]
[2, 14]
[3, 28]

Comments

0

A short answer with numpy and comprehensions would be:

import numpy as np

a = np.array(a)
data = [(i, j) for i, j in {k: v for k, v in a[np.argsort((-a[:,1]))].tolist()}.items()]

Assuming the input is:

a = [
[1, 50],
[1, 95],
[1, 28],
[2, 104],
[2, 14],
[3, 5],
[3, 28]
]

Output would be:

[(2, 14), (1, 28), (3, 5)]

4 Comments

This is pretty slow.
3x slower than all other solutions
@EliHarold well, its the best I came up with
that's no problem, just giving info for OP and future readers.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.