
I am trying out Haskell to compute partition functions of models in statistical physics. This involves traversing quite large lists of configurations and summing various observables - which I would like to do as efficiently as possible.

The current version of my code is here: https://gist.github.com/2420539

Some strange things happen when trying to choose between lists and vectors to enumerate the configurations; in particular, to truncate the list, using V.toList . V.take (3^n) . V.fromList (where V is Data.Vector) is faster than just using take, which feels a bit counter-intuitive. In both cases the list is evaluated lazily.
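A minimal sketch of the two truncation strategies, assuming a hypothetical finite configuration list standing in for the one in the gist (note that V.fromList forces the whole spine, so the source list must be finite):

```haskell
import qualified Data.Vector as V

-- Hypothetical stand-in for the configuration list built in the gist.
-- It must be finite, since V.fromList evaluates the entire list.
configs :: Int -> [Int]
configs n = take (4 ^ n) (iterate (+ 1) 0)

-- Plain list truncation:
truncList :: Int -> [Int]
truncList n = take (3 ^ n) (configs n)

-- Round trip through a boxed Vector; V.take is O(1) slicing on vectors:
truncVec :: Int -> [Int]
truncVec n = V.toList (V.take (3 ^ n) (V.fromList (configs n)))
```

Both produce the same list; which one is faster depends on how the rest of the pipeline consumes it, so benchmarking (e.g. with criterion) is the only reliable way to tell.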

The list itself is built using iterate; if instead I use Vectors as much as possible and build the list by using V.iterateN, again it becomes slower ...

My question is, is there a way (other than splicing V.toList and V.fromList at random places in the code) to predict which one will be the quickest? (BTW, I compile everything using ghc -O2 with the current stable version.)

  • BTW -funbox-strict-fields will help your Stats data type. Commented Apr 19, 2012 at 12:26
  • It does! About 10% faster overall using it ... Optimizing like this is kind of fun :-) Commented Apr 19, 2012 at 12:36
  • BTW - I did a benchmark implementation in C++, using the same algorithm in an imperative way with std::vector. On my computer for n=15, the Haskell version finishes in 4.6 seconds, and the C++ one in about 1.8 seconds. I would say that this is quite satisfactory :-) Commented Apr 19, 2012 at 21:39
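For reference, -funbox-strict-fields only acts on strict (!) fields. A minimal sketch of a Stats-like accumulator that benefits from it (the real Stats type and its fields are in the gist; this one is hypothetical):

```haskell
-- Hypothetical Stats-like accumulator. The bangs make the fields strict;
-- with -funbox-strict-fields (or the explicit UNPACK pragmas below) GHC
-- stores the Int and Double unboxed in the constructor instead of behind
-- a pointer, which saves allocation in a tight summation loop.
data Stats = Stats
  { sCount :: {-# UNPACK #-} !Int
  , sSum   :: {-# UNPACK #-} !Double
  } deriving Show

addSample :: Stats -> Double -> Stats
addSample (Stats n s) x = Stats (n + 1) (s + x)
```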

1 Answer


Vectors are strict, and have O(1) subsets (e.g. take). They also have optimized bulk-update operations. So you will sometimes see performance improvements by switching data structures on the fly. However, it is usually the wrong approach -- keeping all the data in one form or the other is better. (And you're using UArrays as well -- further confusing the issue.)

General rules:

  • If the data is large and transformed only in bulk fashion, dense, efficient structures like vectors make sense.

  • If the data is small, and traversed linearly and only rarely, lists make sense.

Remember that operations on lists and vectors have different complexities: building a list with iterate or replicate is O(n) but lazy, while the same pattern on vectors will not necessarily be as efficient (prefer the built-in generators in vector, such as iterateN and replicate, for constructing arrays).
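As a sketch of that point, compare a lazy list generator with vector's built-in iterateN (the step function here is hypothetical, not the one from the gist):

```haskell
import qualified Data.Vector as V

-- List version: lazy; elements are materialized only as they are consumed.
firstN :: Int -> [Double]
firstN n = take n (iterate (* 2) 1)

-- Vector version: strict; V.iterateN allocates all n elements up front
-- in a single array, which is what you want if you will use them all.
firstNVec :: Int -> V.Vector Double
firstNVec n = V.iterateN n (* 2) 1
```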

Generally, vectors should always be better for numerical operations, though you may have to use different functions than you do with lists.
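For instance, a numeric reduction over an unboxed vector (Data.Vector.Unboxed rather than the boxed Data.Vector; a sketch, not code from the gist):

```haskell
import qualified Data.Vector.Unboxed as U

-- Sum of squares of 1..n; with -O2 the map and sum fuse into a single
-- loop over unboxed Doubles, with no intermediate vector allocated.
sumSquares :: Int -> Double
sumSquares n = U.sum (U.map (\x -> x * x) (U.enumFromTo 1 (fromIntegral n)))
```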

I would stick to vectors only. Avoid UArrays, and avoid lists except as generators.


2 Comments

Thanks for the reply. Indeed it looks wrong to mix (which is why I am asking the question), but all the "uniform" ways I tried ended up being slower than the strange mix I have now, sometimes by a factor of 3 or 4. Maybe I missed one ... I will try other things!
About avoiding UArrays: I tried replacing accumArray with V.accum or V.accumulate which seem to be the equivalent, and they are a bit slower, which is why I stayed with the array option.
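For comparison, the two accumulation APIs line up roughly like this (a hypothetical histogram, not the actual computation from the gist):

```haskell
import Data.Array (accumArray, elems)
import qualified Data.Vector as V

-- Histogram of values in [0, n) built with Data.Array's accumArray ...
histArray :: Int -> [Int] -> [Int]
histArray n xs = elems (accumArray (+) 0 (0, n - 1) [(i, 1 :: Int) | i <- xs])

-- ... and the same thing with Data.Vector's accum. Which is faster
-- depends on the workload, so benchmarking both is the way to decide.
histVector :: Int -> [Int] -> [Int]
histVector n xs = V.toList (V.accum (+) (V.replicate n 0) [(i, 1) | i <- xs])
```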
