Undefined Behavior and Apple’s Secure Coding Guide

Recently Apple released its Secure Coding Guide (dated 2014-02-11). This is filled the kind of good advice you’d expect to see from a high-tech firm that is committed to helping developers on their platform create secure code.

But I want to call your attention to the section called Avoiding Integer Overflows and Underflows. On page 28 is this code snippet:

size_t bytes = n * m;
if (bytes < n || bytes < m) { /* BAD BAD BAD */
    ... /* allocate "bytes" space */
}

Apple’s document doesn’t explicitly say that n and m are signed integers, but we’ll assume they are because the undefined behavior discussed in the reference only occurs for overflows of signed integers.

What this code is attempting to do is to detect if the expression n * m has overflowed. As the document states, this approach won’t work. It references CWE-733, CERT VU#162289. The issue here is that in order for the condition to be true, the expression must have overflowed. But overflow of signed integer is undefined behavior in C and C++. So the condition can never be true unless the program exhibits undefined behavior. So the compiler is free to remove the if statement.

In other words, in the case where the condition is false, removing the if statement doesn’t matter (because it wouldn’t be executed anyway and removing it speeds up the code) and in the case where the condition is true, the program exhibits undefined behavior and the compiler can emit any code at all in that case, including code without the if statement. So either way, no if statement.

This is a real-world optimization that is produced by modern compilers. They optimize for the case where the code doesn’t produce undefined behavior and they ignore the undefined behavior case.

So Apple is correct to warn about this. The problem is that the document then recommends an example solution with the following snippet.

size_t bytes = n * m;
if (n > 0 && m > 0 && SIZE_MAX/n >= m) {
    ... /* allocate "bytes" space */
}

I have to say I’m a little surprised by  this!

How can a document that just explained the problem of undefined behavior of signed integer overflow recommend a solution that has undefined behavior triggered by signed integer overflow?

The confusion here is that the document has not correctly determined the real issue. The document assumes that the problem with the first snippet is that the condition in the if statement can only be true in the case of undefined behavior, so the complier can and will remove the test.

While that is true as far as it goes, that isn’t the crux of the problem. The undefined behavior happens on the first line of the snippet. If the expression n * m results in signed integer overflow, then it doesn’t matter what follows, the code has entered the realm of undefined behavior and all bets are off.

In the first code snippet the assumption is that the first line exhibits the wrapping behavior that most processors will perform. The document correctly points out that isn’t a correct assumption. But in the second snippet, offered by the document as a solution, the assumption is that the first line won’t abort the program or do something else unexpected if the value overflows. But that is exactly what the language does not promise. How you detect undefined behavior after the fact is irrelevant. Once you’ve stepped outside the line of defined behavior, it is too late to pull back.

So what is the solution to a situation like this?

The document is on the right track to a solution. The key is to be able to detect the overflow situation without triggering it. Or in this specific case, detect that n * m would overflow, without actually calculating the value of bytes. But putting the detection after the calculation of bytes defeats that purpose because by then we’ve triggered undefined behavior. We need to use the detection to avoid the calculation that would result in undefined behavior. Something like this:

if (n > 0 && m > 0 && SIZE_MAX/n >= m) {
    size_t bytes = n * m; /* will not overflow */
    ... /* allocate "bytes" space */ }
else {
    /* handle the overflow case */
}

With this approach, bytes is not calculated until after we have determined that the calculation will not overflow so undefined behavior is avoided.

Marshall Clow has a talk on undefined behavior at the upcoming C++Now.

Shout out to Microsoft MVP Bruce for pointing out this issue.

Please post comments on Google Plus.