I've recently started to use Scala to solve some programming challenges at Codeforces in order to exercise functional programming skills. Doing so I came across one particular challenge I wasn't able to solve in a way respecting the given execution time limit of 1000ms; the Painting Fence problem.
I tried various different ways, starting with a straight forward recursive solution, trying a similar approach using streams instead of lists, and eventually trying to reduce list manipulations by working a bit more with indices. I ended up having stack overflow exceptions on larger tests that I was able to fix using Scala's TailCall. But still, while the solution correctly solves the problem, it is too slow to complete within 1000ms. In addition to that, there's a C++ implementation shown that is ridiculously fast in comparison (< 50ms). Now I do understand that Scala will be slower compared to C++ in many cases, and I also understand that I could write a more imperative-style solution in Scala that would probably perform a lot better. Still, I'm wondering if I am missing something more fundamental here, because I have a hard time believing that functional programming is that much slower in general (and I am pretty new to functional programming).
Here is my scala code that you can paste in the REPL including the example that takes >1000ms:
import scala.util.control.TailCalls._
def solve(l: List[(Int, Int)]): Int = {
def go(from: Int, to: Int, prevHeight: Int): TailRec[Int] = {
val max = to - from
val currHeight = l.slice(from, to).minBy(_._1)._1
val hStrokes = currHeight - prevHeight
val splits = l.slice(from, to).filter(_._1 - currHeight == 0).map(_._2)
val indices = from :: splits.flatMap(x => List(x, x+1)) ::: List(to)
val subLists = indices.grouped(2).filter(xs => xs.last - xs.head > 0)
val trampolines = subLists.map(xs => tailcall(go(xs.head, xs.last, currHeight)))
val sumTrampolines = trampolines.foldLeft(done(hStrokes))((b, a) => b.flatMap(bVal =>
a.map(aVal => aVal + bVal)))
sumTrampolines.flatMap(v => done(max).map(m => Math.min(m, v)))
}
go(0, l.size, 0).result
}
val lst = (1 to 5000).toList.zipWithIndex
val res = solve(lst)
And for comparison, here's a C++ example achieving the same thing written by Bugman (includes some read/write from console that I didn't include in the Scala version above):
#include <iostream>
#include <cstdio>
#include <cstdlib>
#include <algorithm>
#include <vector>
#include <string>
#include <set>
#include <map>
#include <cmath>
#include <memory.h>
using namespace std;
typedef long long ll;
const int N = 1e6+6;
const int T = 1e6+6;
int a[N];
int t[T], d;
int rmq(int i, int j){
int r = i;
for(i+=d,j+=d; i<=j; ++i>>=1,--j>>=1){
if(i&1) r=a[r]>a[t[i]]?t[i]:r;
if(~j&1) r=a[r]>a[t[j]]?t[j]:r;
}
return r;
}
int calc(int l, int r, int h){
if(l>r) return 0;
int m = rmq(l,r);
int mn = a[m];
int res = min(r-l+1, calc(l,m-1,mn)+calc(m+1,r,mn)+mn-h);
return res;
}
int main(){
//freopen("input.txt","r",stdin);// freopen("output.txt","w",stdout);
int n, m;
scanf("%d",&n);
for(int i=0;i<n;++i) scanf("%d",&a[i]);
a[n] = 2e9;
for(d=1;d<n;d<<=1);
for(int i=0;i<n;++i) t[i+d]=i;
for(int i=n+d;i<d+d;++i) t[i]=n;
for(int i=d-1;i;--i) t[i]=a[t[i*2]]<a[t[i*2+1]]?t[i*2]:t[i*2+1];
printf("%d\n",calc(0,n-1,0));
return 0;
}
At least before I introduced the explicit tail calls, the more functional style seemed more natural to me to solve the problem than a more imperative solution. So I'd be really happy to learn more on what I should be attentive of when writing functional code in order to still get acceptable performance.
val indicescalculation -- once I optimized this line alone, total time dropped from1.360sto0.672son my machine. I do believe if you will use collections in a wiser way, you will get way more better results.