
I'm having some performance problems with 'append' in Python. I'm writing an algorithm that checks if there are two overlapping circles in a (large) set of circles. I start by putting the extreme points of the circles (x_i-R_i & x_i+R_i) in a list and then sorting the list.

class Circle:
    def __init__(self, middle, radius):
        self.m = middle
        self.r = radius
In between I generate N random circles and put them in the 'circles' list.

"""
Makes a list with all the extreme points of the circles.
Format = [Extreme, left/right ~ 0/1 extreme, index]
Separate function for performance reasons; Python handles local variables faster.
Garbage collect is temporarily disabled since a bug in Python makes list.append run in O(n) time instead of O(1)
"""
def makeList():
    """gc.disable()"""
    list = []
    append = list.append
    for circle in circles:
        append([circle.m[0]-circle.r, 0, circles.index(circle)])
        append([circle.m[0] + circle.r, 1, circles.index(circle)])
    """gc.enable()"""
    return list

When running this with 50k circles it takes over 75 seconds to generate the list. As you can see in the comments I wrote, I disabled garbage collection, put it in a separate function, and used

append = list.append
append(foo)

instead of just

list.append(foo)

I disabled gc because, after some searching, it seemed there's a bug in Python causing append to run in O(n) instead of O(1) time.
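To check whether gc really makes a difference on my machine, I also put together a small timing sketch (the build function and sizes below are just placeholders, not my real code) that compares appending with the collector enabled and disabled:

    import timeit

    def build(n=100000):
        out = []
        append = out.append
        for i in range(n):
            append([i, 0, i])
        return out

    # timeit disables the garbage collector by default, so re-enable it
    # explicitly in the setup of the first measurement
    with_gc = timeit.timeit("build()", number=10,
                            setup="import gc; from __main__ import build; gc.enable()")
    without_gc = timeit.timeit("build()", number=10,
                               setup="from __main__ import build")

    print("gc enabled: ", with_gc)
    print("gc disabled:", without_gc)

From what I've read, append itself should be amortized O(1); the slowdowns people report apparently come from the collector periodically walking the growing set of tracked objects, which is why I tried disabling it.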

So is this way the fastest way or is there a way to make this run faster? Any help is greatly appreciated.

4 Comments

  • list is not a good variable name in Python. Commented Apr 14, 2011 at 12:29
  • list is never a good variable name in any language... Commented Apr 14, 2011 at 12:30
  • """String literals""" are not # comments. And docstrings must go inside the function, not before the function. Commented Apr 14, 2011 at 12:37
  • @eumiro, CrazyJugglerDrummer: true, changed to cirkeList instead. @Sven: Not used to the Python way of commenting things yet, I'll keep your advice in mind. Commented Apr 14, 2011 at 12:57

5 Answers


Instead of

for circle in circles:
    ... circles.index(circle) ...

use

for i, circle in enumerate(circles):
    ... i ...

This could decrease your O(n^2) to O(n).
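For example, keeping the structure of your makeList but with enumerate, it could look like this (a sketch; I also renamed list so it doesn't shadow the built-in):

    def make_list(circles):
        points = []
        append = points.append
        for i, circle in enumerate(circles):
            # i replaces the linear circles.index(circle) lookup
            append([circle.m[0] - circle.r, 0, i])
            append([circle.m[0] + circle.r, 1, i])
        return points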

Your whole makeList could be written as:

sum([[[circle.m[0]-circle.r, 0, i], [circle.m[0]+circle.r, 1, i]] for i, circle in enumerate(circles)], [])

6 Comments

@Harm this is good advice - you should also consider using a deque for this (see collections module) as it has reportedly marginally better performance for append operations. You may then want to convert it to a list at the end though, so the savings might not outweigh the overhead.
Ah ok, I was looking at the wrong thing the whole time.. Thanks for the answer :) Care to explain why getting the index of a circle is O(N²)? O(n) I would understand, since getting the index of an object in a list equals linearly searching through the list until you find that object.
@Harm: The linear search is done in each of the n iterations of the for-loop, giving a total of O(n*n).
@Harm: one single .index() is O(n), but doing it for every element of the list makes it O(n^2).
He's misusing sum to concatenate lots of lists: sum([x, y, z], default) is default + x + y + z. It has quite suboptimal performance, by the way: it first creates a large list in memory, then performs n O(n) concatenations on it. A better one-liner would be list(item for i, circle in enumerate(circles) for item in ([circle.m[0]-circle.r, 0, i], [circle.m[0]+circle.r, 1, i])), which avoids having all the data in memory before constructing the list and builds the whole result list at once without concatenations.

Your performance problem is not in the append() method, but in your use of circles.index(), which makes the whole thing O(n^2).

A further (comparatively minor) improvement is to use a list comprehension instead of list.append():

mylist = [[circle.m[0] - circle.r, 0, i]
          for i, circle in enumerate(circles)]
mylist += [[circle.m[0] + circle.r, 1, i]
           for i, circle in enumerate(circles)]

Note that this will give the data in a different order (which should not matter as you are planning to sort it anyway).
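If you do want the two extremes of each circle to stay next to each other, a single nested comprehension keeps the original interleaved order (a sketch, using the same attributes as in your question):

    mylist = [point
              for i, circle in enumerate(circles)
              for point in ([circle.m[0] - circle.r, 0, i],
                            [circle.m[0] + circle.r, 1, i])]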

1 Comment

I'm quite new to Python and am still learning about python-esque things like list comprehensions. Thanks for the answer :)

I've just tried several tests to improve the speed of "append". They will definitely be helpful for you.

  1. Using Python
  2. Using list(map(lambda ...)) - known as a slightly faster approach than for + append
  3. Using Cython
  4. Using Numba - jit

CODE CONTENT: take the numbers 0 ~ 9999999, square them, and put them into a new list using append.

  1. Using Python

    import timeit
    
    st1 = timeit.default_timer()
    
    def f1():
    
        a = range(0, 10000000)
    
        result = []
        append = result.append
    
        for i in a:
            append( i**2 )
    
        return result
    
    f1()
    
    
    st2 = timeit.default_timer()
    print("RUN TIME : {0}".format(st2-st1))
    

RUN TIME : 3.7 s

  2. Using list(map(lambda ...))

    import timeit
    
    st1 = timeit.default_timer()
    
    result = list(map(lambda x : x**2 ,  range(0,10000000) ))
    
    st2 = timeit.default_timer()
    print("RUN TIME : {0}".format(st2-st1))
    

RUN TIME : 3.6 s

  3. Using Cython

    • the code in a .pyx file

      def f1():
          cdef double i
          a = range(0, 10000000)

          result = []
          append = result.append

          for i in a:
              append( i**2 )

          return result


and I compiled it and ran it from a .py file.

  • in .py file

    import timeit
    from c1 import *
    
    st1 = timeit.default_timer()
    
    f1()
    
    st2 = timeit.default_timer()
    print("RUN TIME : {0}".format(st2-st1))
    

RUN TIME : 1.6 s

  4. Using Numba - jit

    import timeit
    from numba import jit
    
    st1 = timeit.default_timer()
    
    @jit(nopython=True, cache=True)
    def f1():
    
        a = range(0, 10000000)
    
        result = []
        append = result.append
    
        for i in a:
            append( i**2 )
    
        return result
    
    f1()
    
    st2 = timeit.default_timer()
    print("RUN TIME : {0}".format(st2-st1))
    

RUN TIME : 0.57 s

CONCLUSION :

As you mentioned above, changing the simple append form boosts the speed a bit, and using Cython is much faster than plain Python. However, it turned out that Numba is the best choice for speeding up 'for + append'!
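If you want to reproduce numbers like these more reliably, timeit.repeat smooths out warm-up and background noise better than a single default_timer difference (a rough sketch; the size N is just a placeholder):

    import timeit

    N = 10000000

    def loop_append():
        result = []
        append = result.append
        for i in range(N):
            append(i ** 2)
        return result

    def comprehension():
        return [i ** 2 for i in range(N)]

    for fn in (loop_append, comprehension):
        # take the best of three single runs; the minimum is the least noisy
        best = min(timeit.repeat(fn, number=1, repeat=3))
        print(fn.__name__, best)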

Comments


Try using deque from the collections package to append large numbers of rows without the performance degrading, and then convert the deque to a DataFrame using a list comprehension.

Sample Case:

from collections import deque
import pandas as pd

d = deque()
for row in rows:  # rows, value_x and value_y are placeholders for your own data
    d.append([value_x, value_y])

df = pd.DataFrame({'column_x': [item[0] for item in d],
                   'column_y': [item[1] for item in d]})

This is a real time-saver.
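Depending on your pandas version you may not even need the two list comprehensions: the DataFrame constructor accepts a list of rows plus column names directly (a sketch with the same placeholder names as above):

    df = pd.DataFrame(list(d), columns=['column_x', 'column_y'])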

Comments


If performance were an issue, I would avoid using append. Instead, preallocate an array and then fill it up. I would also avoid using index to find position within the list "circles". Here's a rewrite. It's not compact, but I'll bet it's fast because of the unrolled loop.

def makeList():
    """gc.disable()"""
    mylist = 6*len(circles)*[None]
    for i in range(len(circles)):
        j = 6*i
        mylist[j] = circles[i].m[0]-circles[i].r
        mylist[j+1] = 0
        mylist[j+2] = i
        mylist[j+3] = circles[i].m[0] + circles[i].r
        mylist[j+4] = 1
        mylist[j+5] = i
    return mylist
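Note that this gives one flat list with six entries per circle instead of the [extreme, side, index] triples from the question; if you need that structure back, you can regroup it afterwards (a sketch):

    triples = [mylist[j:j + 3] for j in range(0, len(mylist), 3)]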

Comments
