and nothing be else.
[1, 28]
[2, 14]
[3, 5]
and nothing be else.
[1, 28]
[2, 14]
[3, 5]
This solution is really simple, but does not exploit the fact that your data is sorted according to the first column.
import collections
data = [
(1, 50),
(1, 95),
(1, 28),
(2, 104),
(2, 14),
(3, 5),
(3, 28),
]
mins = collections.defaultdict(lambda: float('inf'))
for a, b in data:
if mins[a] > b:
mins[a] = b
data_reduced = list(mins.items())
print(data_reduced)
It should be plenty fast!
The slightly advanced collections.defaultdict(lambda: float('inf')) expression results in a special kind of dictionary, which returns float('inf') (infinity) if you look up an element that is not in the dictionary. With this, we can do the mins[a] > b test without worrying about whether mins[a] fails because a might not already be in the dictionary.
mins = ... line to mins = collections.defaultdict(lambda _=float('inf'): _)?Given that the outer list is already sorted by the first element (which is the premise of the question), I'd use itertools.groupby:
from itertools import groupby
for _, group in groupby(data, lambda t: t[0]):
print(min(group, key=lambda g: g[1]))
This outputs
[1, 28]
[2, 14]
[3, 5]
This runs in seconds for 2,000,000 tuples including creating and reducing the list:
from random import randint
l = []
output = []
for i in range(2000000):
l.append((randint(1,5), randint(1,50)))
l = sorted(l)
d = {}
for tup in l:
try:
d[tup[0]].append(tup[1])
except:
d[tup[0]] = [tup[1]]
for k,v in d.items():
output.append((k, min(v)))
print(output)
Output:
[(1, 1), (2, 1), (3, 1), (4, 1), (5, 1)]
Solution without setup given sorted l list:
d = {}
for tup in l:
try:
d[tup[0]].append(tup[1])
except:
d[tup[0]] = [tup[1]]
for k,v in d.items():
output.append((k, min(v)))
print(output)
here is one way, this is O(n^2) for sorting and O(n) for finding the min:
li.sort()
res = []
for i in li:
if not res or not i[0] == res[-1][0]:
res.append(i)
print(res)
output:
[[1, 28], [2, 14], [3, 5]]
another method which should be way faster ( doesn't need sorting) : this should be O(n)
res = {}
for a, b in l: res[a] = min(res.get(a,b) , b)
print([*res.items()])
I would go with something like this (modified @jmd_dk answer). No need to use any dictionary here since the elements are sorted on the first index. That will get rid of the memory footprint associated with this dictionary and if you data set is very large that could be a big plus.
data = [
(1, 50),
(1, 95),
(1, 28),
(2, 104),
(2, 14),
(3, 5),
(3, 28),
]
last_a = None
minimum = None
for a, b in data:
# Detects the change of the first index. That's how we know it's time to start a new group and look for it's minimum
if a != last_a:
if last_a is not None:
print([last_a, minimum])
last_a = a
minimum = None
else:
if minimum is None:
minimum = b
else:
minimum = min(minimum, b)
print([a, minimum])
Output:
[1, 28]
[2, 14]
[3, 28]
A short answer with numpy and comprehensions would be:
import numpy as np
a = np.array(a)
data = [(i, j) for i, j in {k: v for k, v in a[np.argsort((-a[:,1]))].tolist()}.items()]
Assuming the input is:
a = [
[1, 50],
[1, 95],
[1, 28],
[2, 104],
[2, 14],
[3, 5],
[3, 28]
]
Output would be:
[(2, 14), (1, 28), (3, 5)]