Conflicting Code Conventions

a case where the coding convention of assuming true==success bit me

2016-01-18

I love that C, my favorite language, is still getting popular posts written about it. How to C in 2016 has a lot of good tips I’ll be delving into and using, and Errata Security’s notes on that article is a good response to it with even more info I’ll look at.

In particular, I liked what they said about true/false, success/failure return values. In C, an expression that evaluates to 0 is considered false (for instance, if used in an if statement) and a 1 is considered true. It’s common that if you want a function to have a return value indicating if it succeeded or not, you return true for success and false for failure. The ErrataSec post says that’s not always true.

You may ask, “Who would not follow that convention?” Well let me tell you of a time that happened in real, professional code.

Working for a huge company, we had a different group that did work for us. They were responsible for libraries and tools that helped us with our product. We got a new release of a new library from them. I decided to write tests for the library. It’s not so much that I didn’t trust them, I’m just thourough that way.

For one function, I think it was just something like a constructor or initialization for the library, it’s return type was an explicit boolean and the documentation said the return value indicates success or failure. Ok, true is success and false is failure I assumed.

It took an object to connect to the IO as input; if that object reported it can successfully communicate then their function should succeed, and failure should make them fail. I created a mock object that I can control its success and tested if the new code succeeds or fails as expected. I always test both cases to avoid things like false positives.

Both cases failed. The test for success wouldn’t be too strange, the test returning what I thought was failure when expecting success could mean it was broke. But that scenario would mean the failure testcase should “pass” and return failure as well. But it returned success when I was expecting failure. What was going on?

I of course reported this to the team right away. They looked at it, and came right back saying no it worked fine. Puzzled, I ran my tests again. Nope, failure when it should succeed, succeed when it should fail.

I had them look at my test cases. I showed it was returning our explicit true value in my testcase that should fail, and returned false when it should succeed.

They said, “Yes, that’s how it works.”

What.

Their reasoning was programs, like ones run on the command line, return zero for success and non-zero for failure. They wanted to use that convention of zero=success, non-zero=failure and decided to use a type that had two values instead of an int. Well, we were already using a type that had two values that were zero and non-zero: our boolean. They made their function return false for succeed because false was zero, and true for fail because true was non-zero.

zero = success = false
non-zero = failure = true

This weird mixing of conventions was one of the most unexpected bugs I’ve had and a good reason to use SUCCESS to mean success.