Functional Programming Performance

Question

I've recently started to use Scala to solve some programming challenges at Codeforces in order to exercise functional programming skills. Doing so I came across one particular challenge I wasn't able to solve in a way respecting the given execution time limit of 1000ms; the Painting Fence problem.

I tried various different ways, starting with a straight forward recursive solution, trying a similar approach using streams instead of lists, and eventually trying to reduce list manipulations by working a bit more with indices. I ended up having stack overflow exceptions on larger tests that I was able to fix using Scala's TailCall. But still, while the solution correctly solves the problem, it is too slow to complete within 1000ms. In addition to that, there's a C++ implementation shown that is ridiculously fast in comparison (< 50ms). Now I do understand that Scala will be slower compared to C++ in many cases, and I also understand that I could write a more imperative-style solution in Scala that would probably perform a lot better. Still, I'm wondering if I am missing something more fundamental here, because I have a hard time believing that functional programming is that much slower in general (and I am pretty new to functional programming).

Here is my scala code that you can paste in the REPL including the example that takes >1000ms:

import scala.util.control.TailCalls._

def solve(l: List[(Int, Int)]): Int = {

  def go(from: Int, to: Int, prevHeight: Int): TailRec[Int] = {
    val max = to - from
    val currHeight = l.slice(from, to).minBy(_._1)._1
    val hStrokes = currHeight - prevHeight
    val splits = l.slice(from, to).filter(_._1 - currHeight == 0).map(_._2)
    val indices = from :: splits.flatMap(x => List(x, x+1)) ::: List(to)
    val subLists = indices.grouped(2).filter(xs => xs.last - xs.head > 0)

    val trampolines = subLists.map(xs => tailcall(go(xs.head, xs.last, currHeight)))
    val sumTrampolines = trampolines.foldLeft(done(hStrokes))((b, a) => b.flatMap(bVal =>
      a.map(aVal => aVal + bVal)))
    sumTrampolines.flatMap(v => done(max).map(m => Math.min(m, v)))
  }
  go(0, l.size, 0).result
}

val lst = (1 to 5000).toList.zipWithIndex
val res = solve(lst)

And for comparison, here's a C++ example achieving the same thing written by Bugman (includes some read/write from console that I didn't include in the Scala version above):

#include <iostream>
#include <cstdio>
#include <cstdlib>
#include <algorithm>
#include <vector>
#include <string>
#include <set>
#include <map>
#include <cmath>
#include <memory.h>
using namespace std;
typedef long long ll;

const int N = 1e6+6;
const int T = 1e6+6;

int a[N];
int t[T], d;

int rmq(int i, int j){
    int r = i;
    for(i+=d,j+=d; i<=j; ++i>>=1,--j>>=1){
        if(i&1) r=a[r]>a[t[i]]?t[i]:r;
        if(~j&1) r=a[r]>a[t[j]]?t[j]:r;
    }
    return r;
}

int calc(int l, int r, int h){
    if(l>r) return 0;

    int m = rmq(l,r);
    int mn = a[m];
    int res = min(r-l+1, calc(l,m-1,mn)+calc(m+1,r,mn)+mn-h);
    return res;
}

int main(){
    //freopen("input.txt","r",stdin);// freopen("output.txt","w",stdout);

    int n, m;

    scanf("%d",&n);
    for(int i=0;i<n;++i) scanf("%d",&a[i]);

    a[n] = 2e9;
    for(d=1;d<n;d<<=1);
    for(int i=0;i<n;++i) t[i+d]=i;
    for(int i=n+d;i<d+d;++i) t[i]=n;
    for(int i=d-1;i;--i) t[i]=a[t[i*2]]<a[t[i*2+1]]?t[i*2]:t[i*2+1];

    printf("%d\n",calc(0,n-1,0));

    return 0;
}

At least before I introduced the explicit tail calls, the more functional style seemed more natural to me to solve the problem than a more imperative solution. So I'd be really happy to learn more on what I should be attentive of when writing functional code in order to still get acceptable performance.

One of the pain points is val indices calculation -- once I optimized this line alone, total time dropped from 1.360s to 0.672s on my machine. I do believe if you will use collections in a wiser way, you will get way more better results. — om-nom-nom
– om-nom-nom, Commented Jul 22, 2014 at 20:33

Travis Brown · Accepted Answer · 2014-07-24 00:26:58Z

4

Relying so heavily on indices is arguably not really idiomatic functional style, and combining indexing and lists is a recipe for less-than-ideal performance.

Here's an index-free implementation:

import scala.util.control.TailCalls._

def solve(xs: Vector[Int]): Int = {
  def go(xs: Vector[Int], previous: Int): TailRec[Int] = {
    val min = xs.min

    splitOn(xs, min).foldLeft(done(min - previous)) {
      case (acc, part) => for {
        total <- acc
        cost  <- go(part, min)
      } yield total + cost
    }.map(math.min(xs.size, _))
  }

  go(xs, 0).result
}

That's not quite the whole story, though—I've factored out the splitting part into a method named splitOn that takes a sequence and a delimiter. Because this is a very simple and generic operation, it's a good candidate for optimization. The following is a quick attempt:

def splitOn[A](xs: Vector[A], delim: A): Vector[Vector[A]] = {
  val builder = Vector.newBuilder[Vector[A]]
  var i = 0
  var start = 0

  while (i < xs.size) {
    if (xs(i) == delim) {
      if (i != start) {
        builder += xs.slice(start, i)
      }
      start = i + 1
    }
    i += 1
  }

  if (i != start) builder += xs.slice(start, i)

  builder.result
}

While this implementation is imperative, from the outside the method is perfectly functional—it doesn't have side effects, etc.

This is often a good way to improve the performance of functional code: we've separated our program into a generic piece (splitting a list on a delimiter) and our problem-specific logic. Because the former is so simple, we can ask treat it (and test it) as a black box, while keeping the code we're using to solve the problem clean and functional.

In this case the performance still isn't great—this implementation is about twice as fast as yours on my machine—but I don't think you're going to get much better than that while trampolining with TailCalls.

answered Jul 24, 2014 at 0:26

Travis Brown

139k12 gold badges384 silver badges689 bronze badges

Sign up to request clarification or add additional context in comments.

4 Comments

Marc Over a year ago

Thanks a lot @TravisBrown for your answer, cleared up many things. However, it makes me wonder if functional programming is a good approach to compete in coding challenges. While I do see benefits of functional programming regarding compose-ability of solutions, it seems to me that the often heard claim that algorithms can be expressed more naturally is only half-true if the simple natural expression will not usually perform well enough. I think the final solution we arrived at, while really nice, is not necessarily easier to understand for less experienced FP programmers than the C solution.

Marc Over a year ago

Just as a reference I found that stackoverflow.com/questions/13942106/… discusses a similar topic.

Travis Brown Over a year ago

@Marc: a big part of the problem is the fact that we have to trampoline the recursion here because of the limitations of the Scala language and compiler—that's a huge performance hit, and one you probably won't run into all that often.

Dmitry Leskov Over a year ago

@Marc: In a coding challenge, timing requirements for solutions in different languages would normally differ. E.g. HackerRank currently allows 2 sec for C/C++ solutions, 4 for Java, 7 for Scala, 10 for Ruby, and so on: hackerrank.com/environment But the time you are given to produce that solutions is the same. I'd say that if you want to win, stick to the language/paradigm you are most productive with (these may vary between problems), but if you want to learn, stick to what you are learning, possibly passing on problems that you don't know how to solve effectively (yet).

Collectives™ on Stack Overflow

Functional Programming Performance

1 Answer 1

4 Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

4 Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related