python iterating list of lists performance differences

Question

When iterating over a list of lists in Python 2.7.3, I noticed a performance difference depending on the order of iteration.

I have a list of 200 lists of 500000 strings. I then iterate in the following ways:

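import time

# columns: a list of 200 lists, each holding 500000 strings
numberOfRows = len(columns[0])
numberOfColumns = len(columns)

# Row-first: the inner loop jumps from one column list to the next
t1 = time.clock()
for i in xrange(numberOfRows):
    for j in xrange(numberOfColumns):
        cell = columns[j][i]
print time.clock() - t1

# Column-first: the inner loop walks through a single column list
t1 = time.clock()
for i in xrange(numberOfColumns):
    for j in xrange(numberOfRows):
        cell = columns[i][j]
print time.clock() - t1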

The program repeatedly produces outputs similar to this:

33.97
29.39

I expected lists to provide efficient random access. Where do these 4 seconds come from? Is it only caching?

Also how many columns and rows, respectively, do you have? That's probably the reason, since it has nothing to do with efficient random access.
– jamylak, Commented Apr 17, 2013 at 8:03
Have you tried replacing the cell = ... lines with pass, just to see how much time the creation of xrange objects takes?
– Reinstate Monica, Commented Apr 17, 2013 at 8:13
You're not testing random access, you're iterating either columns then rows or rows then columns.
– MattH, Commented Apr 17, 2013 at 8:28
I have 200 columns and 500000 rows. If I replace the cell = ... lines with pass, I get numbers similar to 7.67 and 8.15. I know that I am iterating through columns and rows, but since lists provide efficient random access, I expected both iterations to take the same amount of time.
– jzwiener, Commented Apr 17, 2013 at 8:29

Reinstate Monica · Accepted Answer · 2013-04-17 08:32:23Z

1

I get something like

30.509407822896037
29.88344778700383

for

columns = [[0] * 500000 for x in range(200)]

If I replace the cell = ... lines with pass, I get

8.44722739915369
10.23647023463866

So it's definitely not an issue with creating the xrange objects or anything similar.

It's caching of the columns (by the computer's hardware, not by Python). If I use

columns = [[0] * 500000] * 200

I get

27.725353873145195
29.592749434295797

Here, the same column object is used every time, so there is (almost) no difference in caching between the two orders. What remains is (about) the same timing difference as in the pass variant.
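The difference between the two ways of building columns can be checked directly. This is only a minimal sketch: the sizes are reduced for illustration, the names distinct and aliased are mine rather than from the question, and it is written for Python 3.

```python
# Sizes reduced from the question's 200 x 500000 so this runs instantly.
n_columns, n_rows = 4, 10

# List comprehension: every iteration builds a new inner list.
distinct = [[0] * n_rows for _ in range(n_columns)]

# List multiplication: the outer list holds n_columns references
# to one and the same inner list.
aliased = [[0] * n_rows] * n_columns

# Count how many different inner list objects each outer list refers to.
distinct_ids = len(set(id(col) for col in distinct))
aliased_ids = len(set(id(col) for col in aliased))

# A write through one reference of the aliased list shows up in all "columns".
aliased[0][0] = 1
shared_write_visible = all(col[0] == 1 for col in aliased)

# A write to one of the distinct lists leaves the others untouched.
distinct[0][0] = 1
distinct_write_isolated = distinct[1][0] == 0

print(distinct_ids, aliased_ids)
```

With the aliased version, both loop orders keep touching the same single inner list, which is why that variant is useful as a control for the memory layout effect.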

answered Apr 17, 2013 at 8:32


2 Comments

MattH Over a year ago

Please elaborate on "It's the caching (not by Python, by the computer)".

Reinstate Monica Over a year ago

@MattH: I am not aware of Python caching objects. The processor's cache works better if the elements of one column are accessed one after the other (they are stored next to each other) than if the 200 columns (which are located far apart in memory) are accessed in turn. (At least that's what I think.)
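To try the two access patterns on your own machine, something like the following could serve as a starting point. It is a sketch under assumptions: the helper names (build_columns, row_first, column_first, timed) are hypothetical, the sizes are much smaller than in the question, and it uses Python 3's time.perf_counter instead of time.clock. Whether a gap shows up, and how large it is, depends on the hardware and the data size.

```python
import time


def build_columns(n_columns, n_rows):
    # Distinct inner lists, so each column is a separate object in memory.
    return [[r for r in range(n_rows)] for _ in range(n_columns)]


def row_first(columns):
    # Outer loop over rows: consecutive accesses hop between column lists.
    n_rows, n_columns = len(columns[0]), len(columns)
    total = 0
    for i in range(n_rows):
        for j in range(n_columns):
            total += columns[j][i]
    return total


def column_first(columns):
    # Outer loop over columns: consecutive accesses stay in one column list.
    total = 0
    for i in range(len(columns)):
        column = columns[i]
        for j in range(len(column)):
            total += column[j]
    return total


def timed(func, columns):
    # Returns (elapsed seconds, function result).
    start = time.perf_counter()
    result = func(columns)
    return time.perf_counter() - start, result


if __name__ == "__main__":
    cols = build_columns(50, 20000)
    for func in (row_first, column_first):
        elapsed, _ = timed(func, cols)
        print(func.__name__, elapsed)
```

Both functions visit exactly the same cells and return the same sum, so any timing difference comes from the access order alone. Running each variant several times and comparing the fastest runs gives a steadier picture than a single measurement.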
