
I've got a question regarding the sizeof() operator. While browsing open source projects written in C, I noticed that some developers use the sizeof() operator in one place and the literal array size in another.

For example, there is code written like this:

...
char   buf[2048];

memset(buf, 0, sizeof(buf));

... = snprintf(buf, 2048, ...);

...

The developer used sizeof(buf) for memset() but passed the literal buffer size 2048 directly to snprintf() instead of sticking to sizeof(buf). Is this just a bad habit/inconsistency in writing code, or was/is the use of sizeof() inappropriate for functions such as snprintf()?

I see this often and really wonder: why not always use sizeof() in such cases? Wouldn't this ease the maintenance of the code? Imagine the snprintf() call were placed much lower in the function body and a developer changed the buffer size at the top of the function to e.g. 1024: memset() would pick up the change, but snprintf() could cause an overflow, as in the sketch below.
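
To make the hazard concrete, here is a contrived sketch of my own (the string passed to snprintf() is made up purely for illustration):

#include <stdio.h>
#include <string.h>

int main(void) {
    /* The buffer used to be 2048 bytes; a maintainer shrinks it to 1024. */
    char buf[1024];

    /* sizeof(buf) follows the change automatically. */
    memset(buf, 0, sizeof(buf));

    /* The hard-coded 2048 does not: snprintf() now believes the buffer is
       twice its real size and may write past its end for long output. */
    snprintf(buf, 2048, "%s", "imagine output longer than 1024 bytes here");

    return 0;
}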

  • "Why not always use sizeof() in such cases?" -- it's impossible to answer for "such cases" in general, because context matters. It's unclear to what extent the pattern you have presented is representative of any particular code or codebase, or whether some variation that you would lump into that category indeed does have a reason for being written as it is. Commented Feb 3 at 14:38
  • The operator is sizeof, not sizeof(). There's absolutely no need for the parens you've used here. Commented Feb 5 at 16:19

2 Answers

10

Hard-coding the array size or length is indeed a risky habit, as the code may become incorrect if the array length is changed to a smaller value and the code is not updated consistently.

Using sizeof(buf) or countof(buf) defined as

#define countof(a) (sizeof(a)/sizeof(*(a)))

is recommended, but may also be incorrect if buf is not defined as an array in the scope of the function call. If buf is a pointer, received as a function argument, sizeof(buf) will be the size of a pointer, not that of the array it points to. What makes it especially confusing is the case of function arguments specified using the array syntax:

// !!! This definition is equivalent to void initialize_array(char *buf)
void initialize_array(char buf[1024]) {
    // BUG! sizeof(buf) is sizeof(char *), much less than 1024
    memset(buf, 0, sizeof(buf));
}

The C Standard does not provide a simple way to evaluate an array length only for actual arrays, so I would recommend this approach:

    ...
    char buf[2048];
    const size_t buf_size = sizeof(buf); // always next to the array definition
    
    memset(buf, 0, buf_size);
    
    ... = snprintf(buf, buf_size, ...);
    
    ...

The compiler will eliminate the buf_size variable and generate optimal code. Should you receive buf as a function argument instead, this idiom ensures that you also receive the buffer size or length as an additional argument, as sketched below.
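
For example (a minimal sketch; fill_greeting is just a made-up name), a function that receives the buffer takes the size alongside the pointer:

    #include <stdio.h>
    #include <string.h>

    /* The array does not carry its size across a function call,
       so the size travels as a separate parameter. */
    static void fill_greeting(char *buf, size_t buf_size) {
        memset(buf, 0, buf_size);
        snprintf(buf, buf_size, "hello, %s", "world");
    }

    int main(void) {
        char buf[2048];
        const size_t buf_size = sizeof(buf); // always next to the array definition

        fill_greeting(buf, buf_size);
        puts(buf);
        return 0;
    }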

For element types larger than char, passing the element count is preferable, and naming it buf_length or buf_count is much safer:

    ...
    int buf[2048];
    const size_t buf_count = sizeof(buf) / sizeof(*buf);
    
    memset(buf, 0, buf_count * sizeof(*buf));
    
    for (size_t i = 0; i < buf_count; i++) {
        if (handle_case(i))
            buf[i] = 1;
    }
    ...

2 Comments

Awesome reply, thanks a lot! I appreciate the effort you put into answering my question. This made it pretty obvious, especially the initialize_array() example, which I had never thought of. So the compiler will eliminate the const size_t buf_count = sizeof ... and no additional overhead is generated, perfect. I'll stick to your recommendation :)
I believe that some readers will not be aware that if a function uses array syntax as its parameter, then this array type will be adjusted (i.e. decay) to a pointer. Therefore, I believe it may be appropriate to explicitly state this in your answer, instead of only implying it.

Is this only a bad habit/inconsistency in writing code

Yes, the code is using bad practices such as "magic numbers". It inconsistently switches between using a hardcoded size and sizeof. That's sloppy.

The correct way to write readable/maintainable code would have been:

#define BUF_SIZE 2048  // [comment about why 2048 was picked goes here]

...
char buf[BUF_SIZE];

memset(buf, 0, BUF_SIZE);

... = snprintf(buf, BUF_SIZE, ...);

Here BUF_SIZE is used consistently every single time we need the buffer size. KISS principle: no messing around, no programmer showing off their knowledge of C by making the program needlessly complex for the heck of it.


is the use of sizeof() inappropriate for functions such as snprintf()

No. Using sizeof would have been ok in theory, but since we already have BUF_SIZE, that constant should be used consistently all over the correctly written program.

Programmers doing intricate things with sizeof instead of keeping the code simple are a well-known source of bugs.

8 Comments

This approach will fail spectacularly if someone changes the type of buf to int[] and uses memset for initialization.
@tstanisl Yeah, which is why we don't let complete amateurs modify source code in a professional setting. Anyone who thinks you can just go ahead and change the type of a variable without considering every single line of code where that variable is used is a complete fool and has a long way to go before they can work as a professional programmer.
"change the type of a variable without considering every single line of code where that variable is used is a complete fool". Of course not. One can write robust C code where one can change a single type without inspecting every line. Forcing inspection is actually very-unprofessional practice that can easily introduce bugs. Better use memset(&buf, 0, sizeof buf) (minus some odd platforms with NULL with non-zero representation). This will either fail for non-lvalue or set a pointer to zero (typical NULL) causing segfault or it will be easily diagnosed by sanitizers.
@tstanisl It isn't obvious to me why a "setting a pointer to zero" bug is better than a "not every value was initialized" bug. Setting a pointer to zero doesn't necessarily lead to a segfault. Trading one bug for another doesn't make the program any better. I'm tired of these strange "what if the programmer changes the type" arguments, which seem to be an SO culture thing rather than something taken from real-world code maintenance - I can't recall encountering a single such bug during the past 20 years of maintaining many code bases of diverse quality.
Bottom line: if the programmer doesn't know what they are doing while maintaining code, no coding style or standard in the world will save them from themselves.