
I am trying out Haskell to compute partition functions of models in statistical physics. This involves traversing quite large lists of configurations and summing various observables - which I would like to do as efficiently as possible.

The current version of my code is here: https://gist.github.com/2420539

Some strange things happen when trying to choose between lists and vectors to enumerate the configurations; in particular, to truncate the list, using V.toList . V.take (3^n) . V.fromList (where V is Data.Vector) is faster than just using take, which feels a bit counter-intuitive. In both cases the list is evaluated lazily.
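A minimal sketch of the two truncation strategies, assuming a hypothetical finite configuration list standing in for the one in the gist (note that V.fromList forces the whole spine, so the source list must be finite):

```haskell
import qualified Data.Vector as V

-- Hypothetical stand-in for the configuration list built in the gist.
-- It must be finite, since V.fromList evaluates the entire list.
configs :: Int -> [Int]
configs n = take (4 ^ n) (iterate (+ 1) 0)

-- Plain list truncation:
truncList :: Int -> [Int]
truncList n = take (3 ^ n) (configs n)

-- Round trip through a boxed Vector; V.take is O(1) slicing on vectors:
truncVec :: Int -> [Int]
truncVec n = V.toList (V.take (3 ^ n) (V.fromList (configs n)))
```

Both produce the same list; which one is faster depends on how the rest of the pipeline consumes it, so benchmarking (e.g. with criterion) is the only reliable way to tell.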

The list itself is built using iterate; if instead I use Vectors as much as possible and build the list by using V.iterateN, again it becomes slower ...

My question is, is there a way (other than splicing V.toList and V.fromList at random places in the code) to predict which one will be the quickest? (BTW, I compile everything using ghc -O2 with the current stable version.)

  • BTW -funbox-strict-fields will help your Stats data type. Commented Apr 19, 2012 at 12:26
  • It does! About 10% faster overall using it ... Optimizing like this is kind of fun :-) Commented Apr 19, 2012 at 12:36
  • BTW - I did a benchmark implementation in C++, using the same algorithm in an imperative way with std::vector. On my computer for n=15, the Haskell version finishes in 4.6 seconds, and the C++ one in about 1.8 seconds. I would say that this is quite satisfactory :-) Commented Apr 19, 2012 at 21:39
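For reference, -funbox-strict-fields only acts on strict (!) fields. A minimal sketch of a Stats-like accumulator that benefits from it (the real Stats type and its fields are in the gist; this one is hypothetical):

```haskell
-- Hypothetical Stats-like accumulator. The bangs make the fields strict;
-- with -funbox-strict-fields (or the explicit UNPACK pragmas below) GHC
-- stores the Int and Double unboxed in the constructor instead of behind
-- a pointer, which saves allocation in a tight summation loop.
data Stats = Stats
  { sCount :: {-# UNPACK #-} !Int
  , sSum   :: {-# UNPACK #-} !Double
  } deriving Show

addSample :: Stats -> Double -> Stats
addSample (Stats n s) x = Stats (n + 1) (s + x)
```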

1 Answer


Vectors are strict, and have O(1) subsets (e.g. take). They also have optimized bulk-update operations. So you will sometimes see performance improvements by switching data structures on the fly. However, it is usually the wrong approach -- keeping all the data in one form or the other is better. (And you're using UArrays as well -- further confusing the issue.)

General rules:

  • If the data is large and transformed only in bulk fashion, dense, efficient structures like vectors make sense.

  • If the data is small, and traversed linearly and only rarely, lists make sense.

Remember that operations on lists and vectors have different complexities: building a list with iterate or replicate is O(n) but lazy, while the same pattern on vectors will not necessarily be as efficient (prefer the built-in generators in vector, such as iterateN and replicate, for constructing arrays).
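As a sketch of that point, compare a lazy list generator with vector's built-in iterateN (the step function here is hypothetical, not the one from the gist):

```haskell
import qualified Data.Vector as V

-- List version: lazy; elements are materialized only as they are consumed.
firstN :: Int -> [Double]
firstN n = take n (iterate (* 2) 1)

-- Vector version: strict; V.iterateN allocates all n elements up front
-- in a single array, which is what you want if you will use them all.
firstNVec :: Int -> V.Vector Double
firstNVec n = V.iterateN n (* 2) 1
```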

Generally, vectors should always be better for numerical operations, though you may have to use different functions than you do with lists.
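For instance, a numeric reduction over an unboxed vector (Data.Vector.Unboxed rather than the boxed Data.Vector; a sketch, not code from the gist):

```haskell
import qualified Data.Vector.Unboxed as U

-- Sum of squares of 1..n; with -O2 the map and sum fuse into a single
-- loop over unboxed Doubles, with no intermediate vector allocated.
sumSquares :: Int -> Double
sumSquares n = U.sum (U.map (\x -> x * x) (U.enumFromTo 1 (fromIntegral n)))
```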

I would stick to vectors only. Avoid UArrays, and avoid lists except as generators.


2 Comments

Thanks for the reply. Indeed it looks wrong to mix (which is why I am asking the question), but all the "uniform" ways I tried ended up being slower than the strange mix I have now, sometimes by a factor of 3 or 4. Maybe I missed one ... I will try other things!
About avoiding UArrays: I tried replacing accumArray with V.accum or V.accumulate which seem to be the equivalent, and they are a bit slower, which is why I stayed with the array option.
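For comparison, the two accumulation APIs line up roughly like this (a hypothetical histogram, not the actual computation from the gist):

```haskell
import Data.Array (accumArray, elems)
import qualified Data.Vector as V

-- Histogram of values in [0, n) built with Data.Array's accumArray ...
histArray :: Int -> [Int] -> [Int]
histArray n xs = elems (accumArray (+) 0 (0, n - 1) [(i, 1 :: Int) | i <- xs])

-- ... and the same thing with Data.Vector's accum. Which is faster
-- depends on the workload, so benchmarking both is the way to decide.
histVector :: Int -> [Int] -> [Int]
histVector n xs = V.toList (V.accum (+) (V.replicate n 0) [(i, 1) | i <- xs])
```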
