Mutable, random-access array/vector with high performance in Haskell

Question

This is a topic on Haskell that has been discussed a lot (e.g. mutable-array-implementation), but I am still not sure what the best practice is for a case requiring frequent modification and random access of an array/vector.

Say we have a vector of length 1,000,000. An operation on it involves accessing a small subset of it (e.g. 1000 elements) based on input, and modifying the values based on the input. Furthermore, such operations are repeated 2,000,000 times. The task itself can be implemented with a pure data structure such as a list, for example as follows, though very inefficiently:

type Vect = [Int]

f :: Vect -> [[Int]] -> Vect
f x indsList = foldl g x indsList

-- g is just an example of random-access and modifications on the values.
g :: Vect -> [Int] -> Vect
g x inds = map h $ zip x [0..]
    -- v is the element itself, so no further indexing is needed
    where h (v, i) = if i `elem` inds then v + 1 else v

A hash/map data structure (e.g. IntMap) could be used to handle large numbers of random accesses efficiently, but an array/vector should do that too. More importantly, the large number of modifications still needs to be addressed by a mutable structure to avoid memory replication. Is there indeed a mutable, random-access array/vector in Haskell? If the ST/IO monads are used, do such controls affect performance in my setting?

Are you sure you can't do this more elegantly (and perhaps even more efficiently!) when approaching the entire problem from a more functional angle? Why do you need to do these step-by-step modifications, can't you perhaps obtain the desired result "directly", e.g. with some lazy redirections...
– leftaroundabout, Commented Jul 26, 2013 at 22:39
[] is not a type. map h $ zip x [0..] – I think you mean zipWith h x [0..], otherwise you need to uncurry h.
– leftaroundabout, Commented Jul 26, 2013 at 23:11
@leftaroundabout, Updated the code, though the implementation of g serves as an example only. Approaching it from a higher functional perspective would be nice, but is difficult in this case. I thought about caching each modification and re-compacting the modifications once in a while. However, for millions of modifications, mimicking a mutable structure with an immutable one is certainly far less efficient than using a real mutable one.
– Causality, Commented Jul 26, 2013 at 23:25

leftaroundabout · Accepted Answer · 2013-07-27 00:58:47Z

Haskell does have efficient mutable arrays.

There is STUArray, which uses the quite sophisticated but often unnecessary Ix-indexing machinery, with many bounds checks and little special optimisation; this makes it a bit slower than theoretically possible.
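As a rough illustration of the STUArray route, here is a minimal sketch using the array package that ships with GHC. The name bumpAll and the choice to start from an all-zero array are assumptions made for the example, not part of the question; the update (adding 1 at each listed index) just mirrors the example g.

```haskell
import Control.Monad (forM_)
import Data.Array.ST (newArray, readArray, writeArray, runSTUArray)
import Data.Array.Unboxed (UArray, elems)

-- Hypothetical helper: build an unboxed array of n zeros, then add 1 at
-- every listed index (once per occurrence), mutating in place inside ST.
bumpAll :: Int -> [[Int]] -> UArray Int Int
bumpAll n indsList = runSTUArray $ do
  arr <- newArray (0, n - 1) 0
  forM_ indsList $ \inds ->
    forM_ inds $ \i -> do
      v <- readArray arr i        -- bounds-checked read
      writeArray arr i (v + 1)    -- in-place write, no copying
  return arr
```

For instance, elems (bumpAll 5 [[0,2],[2,4]]) evaluates to [1,0,2,0,1]. runSTUArray freezes the array without copying, so the only allocation is the initial array.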

All the Data.Vector variants have very little overhead, make heavy use of stream fusion optimisation, and prefer a simple, "list-like" interface. That means you can transfer your example very easily to immutable vectors, and may still get better performance than you might expect:

import qualified Data.Vector.Unboxed as VU

type Vect = VU.Vector Int

f :: Vect -> [[Int]] -> Vect
f x indsList = foldl g x indsList  -- indsList is an ordinary list, so this is the Prelude foldl


g :: Vect -> [Int] -> Vect
g x inds = VU.imap h x             -- imap passes each index along with its element
    -- h is just an example of modifications on the values.
    where h i v
           | i`elem`inds   = v + 1
           | otherwise     = v

Yes, you probably want to work in the ST monad for mutable updates. Not sure what you mean by "controls" affecting performance: there isn't really any "control" that's not also present in imperative languages, once the compiler has optimised away the proven type-safety. GHC can do that quite well; you can get pretty close to C performance with Data.Vector.Unboxed. There is always some inevitable overhead, but that mostly comes from garbage collection and similar issues, which you would also get in Java.
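To make the ST approach concrete, here is a minimal sketch with Data.Vector.Unboxed.Mutable from the vector package. The name fST is made up for the example, and the update (adding 1 at each listed index) is only a stand-in for whatever the real modification is.

```haskell
import Control.Monad (forM_)
import qualified Data.Vector.Unboxed as VU
import qualified Data.Vector.Unboxed.Mutable as VUM

type Vect = VU.Vector Int

-- Hypothetical helper: add 1 at every listed index, once per occurrence.
-- VU.modify runs the ST action on a mutable copy of the input (made at
-- most once), so each individual update is O(1) and allocates nothing.
fST :: Vect -> [[Int]] -> Vect
fST x indsList = VU.modify
  (\mv ->
     forM_ indsList $ \inds ->
       forM_ inds $ \i -> do
         v <- VUM.read mv i        -- bounds-checked read
         VUM.write mv i (v + 1))   -- in-place write
  x
```

The whole batch of updates stays behind a pure interface: callers see an ordinary function from Vect to Vect, and the mutation never escapes. If profiling shows the bounds checks matter, VUM.unsafeRead and VUM.unsafeWrite skip them, at the cost of undefined behaviour on a bad index.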

Since ST and IO are monads, the compiler can in fact do some more high-level optimisations that wouldn't be possible in an imperative language, though compilers aren't really that far yet.

Performance, particularly of array operations, is discussed in many places, for instance in Real World Haskell (RWH).

leventov · Accepted Answer · 2013-07-27 00:21:24Z

Foreign UArrays from Yarr are mutable, random-access and as fast as possible. They are also "quick and dirty", i.e. they don't impose freezing/thawing boilerplate for every mutation operation.

Disadvantage: almost all "low-level" operations are under IO.
