Is it better to store a 2D image in a flat array or in an array of arrays?

Question

I have two options.

Receiving and processing the pixel data in a flat array.

If I choose this, I have to use a single pointer, and I have to call this helper every time I access a pixel:

int getIndex(int row, int col){
    return row*(Image_Width) + col;
}

for (int i ...){
    for (int j ...){
        R[getIndex(i,j)]
        G[getIndex(i,j)]
        B[getIndex(i,j)]
    }
}

Receiving and processing the pixel data in a double array (an array of arrays).

If I choose this, I have to use a pointer-to-pointer, but I don't need getIndex().

I want to know the pros and cons of the two options, and which one is more efficient.

Here is an example that uses a single array and then a double array to multiply every pixel value by a constant.

B = (unsigned char *)malloc(ImgSize);
G = (unsigned char *)malloc(ImgSize);
R = (unsigned char *)malloc(ImgSize);
for (int i = 0; i < ImgSize; i++){
    B[i] = fgetc(f);
    G[i] = fgetc(f);
    R[i] = fgetc(f);
}
for (int i = 0; i < Height; i++){
    for(int j = 0; j < Width; j++){
        B[getIndex(i,j)] = int(B[getIndex(i,j)]*num);
        G[getIndex(i,j)] = int(G[getIndex(i,j)]*num);
        R[getIndex(i,j)] = int(R[getIndex(i,j)]*num);
    }
}

dB = (unsigned char **)malloc(hInfo.biHeight * sizeof(unsigned char *));
dG = (unsigned char **)malloc(hInfo.biHeight * sizeof(unsigned char *));
dR = (unsigned char **)malloc(hInfo.biHeight * sizeof(unsigned char *));
for (int i = 0 ; i < hInfo.biHeight; i++){
    *(dB+i) = (unsigned char *)malloc(hInfo.biWidth * sizeof(unsigned char));
    *(dG+i) = (unsigned char *)malloc(hInfo.biWidth * sizeof(unsigned char));
    *(dR+i) = (unsigned char *)malloc(hInfo.biWidth * sizeof(unsigned char));
}
for (int i = 0; i < hInfo.biHeight; i++){
    for (int j = 0; j < hInfo.biWidth; j++){
        dB[i][j] = fgetc(f);
        dG[i][j] = fgetc(f);
        dR[i][j] = fgetc(f);
    }
}
for (int i = 0; i < Height; i++){
    for(int j = 0; j < Width; j++){
        dB[i][j] = int(dB[i][j]*num);
        dG[i][j] = int(dG[i][j]*num);
        dR[i][j] = int(dR[i][j]*num);
    }
}

If you look at the tag description that pops up when you hover your mouse over the double-pointer tag, you'll read "The term "double pointer" is sometimes confusingly used to refer a data type which can point to another pointer. This name is confusing because it may mean a pointer to a double floating-point object. Please use pointer-to-pointer instead." This is exactly why your question confused me at first: "single" and "double" usually refer to the 4-byte and 8-byte floating-point formats.
– Cris Luengo, Commented Apr 7, 2023 at 14:10

Cris Luengo · Accepted Answer · 2023-04-07 14:48:29Z

I'm sure many answers on this site and over on Code Review comment on this issue, but I haven't been able to find a question explicitly asking this, so I'll try to summarize pros and cons here.

In short:

Array of arrays. There are no pros here, it's all cons. At first it looks like the syntax img[i][j] is better than img[i * width + j], but it is only superficially so.
Flat array. I cannot recommend you use anything else. This is the way to go.

Differences:

Memory management. Not only do you need a loop and height number of calls to malloc in the array of arrays, but you also need the same for freeing the memory. Memory management is much more complicated and expensive than in the flat array case.
Indexing cost. img[i][j] requires two pointer lookups. First it goes into memory to find the pointer to row i, then it goes into memory to find the datum at column j within that row. In contrast, img[i * width + j] does only one pointer lookup. Memory access in modern hardware is much more expensive than the trivial index computation.
Data locality. When filtering, you often need to look up neighboring pixels. In the array of arrays style, each row of pixels is in a separate memory block, so the neighbor at i+1 could be in a totally different part of the address space. This is really hard on the cache.
Indexing simplicity. It is not often that you need to access the pixel at a given set of coordinates. For most image processing, you either iterate over all pixels in the image, or you iterate over the neighborhood of a pixel. With the array of arrays style, you always need to use indices. To find a neighbor, you need to compute the neighbor's indices, and dereference two pointers. In the flat array style, you only need to do a pointer addition and one dereference: if you're at index i, the neighbors are at i-width, i+width, i-1, i+1, etc. Most of the time you're not actually using the coordinates themselves (e.g. checking for out-of-bounds reads when accessing neighbors can often be done without knowing the coordinates of a pixel, by maintaining a matching array that marks border pixels).

This is an important point, so I'm going to illustrate it from a different perspective. With the "prettier" img[i][j] syntax, you need to deal with two indices; if you want to visit all 8 neighbors of a pixel, you need to write complex code to manage these two indices. With the "ugly" img[i * width + j] syntax you're only dealing with a single index; if you want to visit all 8 neighbors, you just keep a list of index offsets for the 8 neighbors, and loop over the list. What looked like an advantage for the array of arrays style actually becomes a disadvantage when you want to do more than just access one single, isolated pixel.
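As a rough sketch of the offset-list idea: the function name sumNeighbors8 and its parameters are made up for illustration, the image is assumed to be single-channel and row-major, and the pixel is assumed not to lie on the image border (the assert only documents that assumption).

```cpp
#include <assert.h>

/* Hypothetical helper, not part of the original answer: sums the 8 neighbors
   of the pixel at flat index idx in a single-channel, row-major image.
   Assumes the pixel is not on the border, so every neighbor exists. */
int sumNeighbors8(const unsigned char *img, int width, int height, int idx) {
    /* Coordinates are computed only to check the precondition. */
    int row = idx / width;
    int col = idx % width;
    assert(row > 0 && row < height - 1 && col > 0 && col < width - 1);

    /* Index offsets of the 8 neighbors relative to the current pixel. */
    const int offsets[8] = {
        -width - 1, -width, -width + 1,
                -1,                  1,
         width - 1,  width,  width + 1
    };

    int sum = 0;
    for (int k = 0; k < 8; k++) {
        sum += img[idx + offsets[k]];  /* one addition, one dereference */
    }
    return sum;
}
```

The loop body never touches row or column coordinates; the same offsets table works for every interior pixel of the image.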
Color images. For the RGB case shown in the OP, it is not necessary to have three separate arrays, one can use a single array, and compute the index as follows: img[(i * width + j) * 3 + c]. This simplifies all the code shown, since there is now just one statement where previously there were three nearly identical ones:

img = (unsigned char *)malloc(ImgSize * 3);
for (int i = 0; i < ImgSize * 3; i++) {
    img[i] = fgetc(f);
}
for (int i = 0; i < Height; i++) {
    for (int j = 0; j < Width; j++) {
        for (int c = 0; c < 3; c++) {
            img[getIndex(i,j) * 3 + c] = int(img[getIndex(i,j) * 3 + c] * num);
        }
    }
}

The statement

img[getIndex(i,j) * 3 + c] = int(img[getIndex(i,j) * 3 + c] * num);

can of course be simplified to

img[getIndex(i,j) * 3 + c] *= num;

but most importantly, we don't need so many loops. The data is contiguous, we can use a single loop (and this is the case for maybe 95% of things we want to do with the pixel data):

for (int i = 0; i < ImgSize * 3; i++) {
    img[i] *= num;
}

This is easier to read, easier to maintain, and it's a bit faster to boot.
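To make the memory-management difference mentioned earlier concrete, here is a minimal sketch of both allocation styles for a single channel. The names allocFlat, allocRows and freeRows are hypothetical, and the buffers are zero-initialized with calloc purely for convenience.

```cpp
#include <assert.h>
#include <stdlib.h>

/* Flat layout: one allocation, released later with a single free(). */
unsigned char *allocFlat(int width, int height) {
    assert(width > 0 && height > 0);
    return (unsigned char *)calloc((size_t)width * (size_t)height, 1);
}

/* Array of arrays: one allocation for the row pointers plus one per row. */
unsigned char **allocRows(int width, int height) {
    assert(width > 0 && height > 0);
    unsigned char **rows =
        (unsigned char **)malloc((size_t)height * sizeof(unsigned char *));
    if (!rows) return NULL;
    for (int i = 0; i < height; i++) {
        rows[i] = (unsigned char *)calloc((size_t)width, 1);
        if (!rows[i]) {
            /* Undo the partial allocation before reporting failure. */
            for (int k = 0; k < i; k++) free(rows[k]);
            free(rows);
            return NULL;
        }
    }
    return rows;
}

/* Freeing mirrors the allocation: every row first, then the pointer table. */
void freeRows(unsigned char **rows, int height) {
    if (!rows) return;
    for (int i = 0; i < height; i++) free(rows[i]);
    free(rows);
}
```

The flat version needs no matching cleanup function at all, while the array of arrays version needs height + 1 allocations, the same number of frees, and extra error handling for the case where a row allocation fails halfway through.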
