
Can someone please help me understand why accessing an array with a linearly incrementing index is approximately 3-4 times faster than accessing it with a random index?

Is there any way of making the random index access times faster?

Please consider the following test code; it returns ~3 seconds for linear access and ~9-10 s for random access:

    public static void test()
    {
        var arr = new byte[64 * 1024 * 1024];
        byte b = 0;

        var sw = new Stopwatch();

        double timeSum = 0;

        for (var i = 0; i < arr.Length; i++)
        {

            sw.Restart();
            b = arr[i];
            sw.Stop();
            timeSum += sw.Elapsed.TotalMilliseconds;
        }


        Console.WriteLine("Linear access : " + timeSum + " ms");


        timeSum = 0;

        var rng = new Random();
        var rnum = 0;
        for (var i = 0; i < arr.Length; i++)
        {
            rnum = rng.Next(0, arr.Length - 1);
            sw.Restart();
            b = arr[rnum];
            sw.Stop();
            timeSum += sw.Elapsed.TotalMilliseconds;
        }


        Console.WriteLine("Random access : " + timeSum + " ms");

    }
  • This is the way CPU caches work. They load data in contiguous blocks, so if you access the array from beginning to end, there is a high chance the data is already in the cache. Commented Nov 25, 2018 at 17:31
  • stackoverflow.com/questions/3928995/how-do-cache-lines-work Commented Nov 25, 2018 at 17:32
  • @KevinGosse I agree, but post it as an answer. Commented Nov 25, 2018 at 17:33
  • @InBetween That's a fair point. I reopened the question. Commented Nov 25, 2018 at 18:13
  • @JKurcik Fixing the array, no. Accessing the values through pointers, yes. Commented Nov 25, 2018 at 19:44
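
The cache-line behaviour described in the comments can be seen with a minimal sketch (assuming a typical 64-byte cache line; exact timings vary by machine). Both loops below touch every element of the same array exactly once, but the second jumps 4096 bytes between accesses, so nearly every access lands on a line that is not yet cached:

```csharp
using System;
using System.Diagnostics;

class CacheLineDemo
{
    static void Main()
    {
        var arr = new byte[64 * 1024 * 1024];

        // Sequential pass: each 64-byte cache line is loaded once and the
        // following 63 accesses hit the already-cached line.
        var sw = Stopwatch.StartNew();
        long sum = 0;
        for (var i = 0; i < arr.Length; i++)
            sum += arr[i];
        sw.Stop();
        Console.WriteLine($"Sequential: {sw.ElapsedMilliseconds} ms (sum={sum})");

        // Strided pass: same total number of accesses, but jumping 4096
        // bytes between them means almost every access misses the cache.
        sw.Restart();
        sum = 0;
        for (var stride = 0; stride < 4096; stride++)
            for (var i = stride; i < arr.Length; i += 4096)
                sum += arr[i];
        sw.Stop();
        Console.WriteLine($"Strided:    {sw.ElapsedMilliseconds} ms (sum={sum})");
    }
}
```

On most hardware the strided pass is several times slower even though it does the same amount of "work", which is the same effect the random-index loop in the question runs into.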

1 Answer


The difference you are seeing in your benchmark (4 to 5 times) can't be explained by cache lines and sequential array access alone. It's true that sequential, predictable access will be faster, but unless you are managing huge arrays, I'd be surprised if the performance gains were even close to those figures.

EDIT: With the size of the array in your benchmark (64 × 1024 × 1024), the difference is staggering, way more than I expected, to be honest. So my first impression was dead wrong!

The problem is your benchmark. You are micro-measuring; there is no way you can measure individual look-ups with any sort of confidence using System.Diagnostics.Stopwatch.
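
To see why, compare the timer's granularity with the cost of the operation being timed. A rough sketch (the ~100 ns figure is typical for Windows; on .NET on Linux the frequency is often 1 GHz, but a cached array read still costs well under one tick's worth of noise from `Restart`/`Stop` themselves):

```csharp
using System;
using System.Diagnostics;

class StopwatchResolution
{
    static void Main()
    {
        // Stopwatch.Frequency is in ticks per second, so this is the
        // smallest interval the timer can distinguish, in nanoseconds.
        double nsPerTick = 1_000_000_000.0 / Stopwatch.Frequency;
        Console.WriteLine($"Resolution: {nsPerTick:F1} ns per tick");

        // Timing a single look-up therefore mostly measures the timer
        // and its call overhead, not the array access.
        var arr = new byte[16];
        var sw = Stopwatch.StartNew();
        var b = arr[7];
        sw.Stop();
        Console.WriteLine($"Single look-up measured as {sw.ElapsedTicks} tick(s), value {b}");
    }
}
```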

Coming up with a fair benchmark is surprisingly tricky because there is no easy way to isolate the randomness generation from the look-up itself. I haven't given it much thought, but the following at least tries to compare apples with apples: the trick is to pregenerate random and sequential index arrays and then benchmark double look-ups:

static void Main(string[] args)
{
    lookUpArray(1, new[] { 0 }, new[] { 0 }); // warm up the JIT

    var r = new Random();
    const int arraySize = 10000;
    const int repetitions = 10000;

    var lookupArray = new int[arraySize]; // the values don't matter, only the look-up cost
    var sequentialArray = Enumerable.Range(0, arraySize).ToArray();
    var randomArray = sequentialArray.Select(i => r.Next(0, arraySize)).ToArray();

    for (var i = 0; i < 10; i++)
    {
        var sw = Stopwatch.StartNew();
        lookUpArray(repetitions, lookupArray, randomArray);
        sw.Stop();
        Console.WriteLine($"Random: {sw.ElapsedMilliseconds} ms");

        sw.Reset();
        sw.Start();
        lookUpArray(repetitions, lookupArray, sequentialArray);
        sw.Stop();
        Console.WriteLine($"Sequential: {sw.ElapsedMilliseconds} ms");
    }
}

private static void lookUpArray(int repetitions, int[] lookupArray, int[] indexArray)
{
    for (var r = 0; r < repetitions; r++)
    {
        for (var i = 0; i < indexArray.Length; i++)
        {
            var _ = lookupArray[indexArray[i]];
        }
    }
}

I am no benchmarking expert by any means, so this is probably awful in many ways, but I think it's a fairer comparison.


2 Comments

@JKurcik Yup, I was the first one not to expect this difference. Your benchmark is still giving wrong results, but it's biased in the opposite direction from what I anticipated.
Sorry for deleting my previous comment, I posted it before your edit. I believe your benchmark is much better, as it really makes the difference shine. I found that fixing the lookup array makes the difference much less pronounced, but I don't want to go unsafe in my application. And yes, truth is, the problem appeared only after the amount of data being processed ramped up recently.
