
I've been dabbling in Haskell - so still very much a beginner.

I've been thinking about counting the frequency of items in a list. In languages with mutable data structures, this is typically solved with a hash table - a dict in Python or a HashMap in Java, for example. The complexity of such a solution is O(n), assuming the hash table fits entirely in memory.

In Haskell, there seem to be two (mainstream) choices: sort the data, then group and count it, or use a Data.Map. If a sort is used, it dominates the run time of the solution, so the complexity is O(n log n). Likewise, Data.Map uses a balanced tree, so inserting n elements into it also has complexity O(n log n).
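For concreteness, here is a minimal sketch of both approaches (assuming only base and containers; freqSort and freqMap are names I've made up):

```haskell
import Data.List (group, sort)
import qualified Data.Map.Strict as Map

-- Sort, group equal neighbours, count each run: the sort is O(n log n).
freqSort :: Ord a => [a] -> [(a, Int)]
freqSort = map (\g -> (head g, length g)) . group . sort

-- Fold the list into a balanced tree: n inserts at O(log n) each.
freqMap :: Ord a => [a] -> Map.Map a Int
freqMap = Map.fromListWith (+) . map (\x -> (x, 1))
```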

If my analysis is correct, then I assume that this particular problem is most efficiently solved by resorting to a mutable data structure. Are there other types of problems where this is also true? How in general do people using Haskell approach something like this?
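For comparison, a mutable-hash-table version is possible in Haskell too; here is a rough sketch assuming the hashtables package (Data.HashTable.IO), at the cost of working in IO:

```haskell
import qualified Data.HashTable.IO as H

-- A mutable hash table in IO: one lookup and one insert per element.
freqHT :: [String] -> IO [(String, Int)]
freqHT xs = do
  ht <- H.new :: IO (H.BasicHashTable String Int)
  mapM_ (\x -> do
           mv <- H.lookup ht x
           H.insert ht x (maybe 1 (+ 1) mv))
        xs
  H.toList ht
```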

  • Your benchmark for efficiency is Python? Commented Feb 27, 2014 at 17:26
  • For all n, log n < 30. Commented Feb 27, 2014 at 17:32
  • See Can I always convert mutable-only algorithms to single-assignment and still be efficient? for a question from someone who was convinced immutability would be a problem and felt he'd found an example that would necessitate exponential time; I wrote a linear Haskell solution. Commented Feb 27, 2014 at 17:37
  • He means that unless you have more than 2^30 pieces of data (a billion), a factor of log n is essentially a constant. Commented Feb 27, 2014 at 17:59
  • (It's a humourous way of pointing out that constant factors can outweigh logarithmic factors for real problems.) Commented Feb 27, 2014 at 18:05

1 Answer


Whether we can implement any algorithm with optimal complexity in a pure language is currently an open question. Nicholas Pippenger has proven that there is a problem which must necessarily incur a log(n) penalty in a pure strict language compared to the optimal algorithm. However, there is a follow-up paper showing that this problem has an optimal solution in a lazy language. So at the end of the day we really don't know, though most people seem to think there is an inherent log(n) penalty for some problems, even in lazy languages.


5 Comments

I seem to recall that someone proved it's possible to implement any mutable data structure using laziness and immutability with at most a constant-factor penalty. You wouldn't happen to be familiar with this result? (I hope I'm not making it up)
I've also heard a similar argument, but I don't have a reference and I don't think I've ever seen a paper claiming such a result. But intuitively it seems that any imperative algorithm mutating random-access memory could be simulated with a state monad using a binary tree to model the memory (see the sketch after these comments). That ought to give the log(n) penalty.
Hmm. I suppose that's close enough for many purposes, thanks.
What I took from the comments on my post is that even if there is a log n penalty to be paid sometimes, I probably shouldn't spend too much time worrying about it.
Indeed. For most practical applications Haskell should be fast enough for you.
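
To make the simulation described in the comments concrete, here is a minimal sketch (assuming the containers and mtl packages; readCell, writeCell, and bump are hypothetical names of mine) that models "mutable" memory as a balanced tree inside a State monad, so every read or write pays the O(log n) tree cost:

```haskell
import qualified Data.Map.Strict as Map
import Control.Monad.State (State, gets, modify, execState)

-- Memory as a balanced tree from addresses to values.
type Mem = Map.Map Int Int

readCell :: Int -> State Mem Int
readCell addr = gets (Map.findWithDefault 0 addr)

writeCell :: Int -> Int -> State Mem ()
writeCell addr val = modify (Map.insert addr val)

-- An "imperative" loop: increment cell 0 n times.
bump :: Int -> State Mem ()
bump n = mapM_ (\_ -> readCell 0 >>= writeCell 0 . (+ 1)) [1 .. n]

-- execState (bump 5) Map.empty == Map.fromList [(0, 5)]
```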

