When a Byte is Not 8 Bits

not just for old computers

2020-01-19

I was working through my backlog of C books and resources when I came across my copy of Portable C by Henry Rabinowitz¹, which I bought after reading the post C Portability Lessons from Weird Machines.

The blog post lists old weird machines with addressable units that might not be your typical 8-bit bytes. One might think that’s well and good, but not a concern since typical modern architectures have 8-bit bytes. That’s not entirely the case.

I work on products that have a Texas Instruments C2000 microcontroller. This is a modern microcontroller in use now. However, it has 16-bit bytes instead of 8.

I understand that the C2000 isn’t a part you see every day, but the fact remains that I have to support code for this.

If you want to play with this part, you can get their LaunchPad kit that has it.

So the addressable units and char are 16 bits. The standard says sizeof(char)==1, so the size of every object is a multiple of 16 bits.

The standard also says int is at least 16 bits, but could for instance be 32. The C2000 just happens to have int be the minimum 16 bits. Interestingly, this means sizeof(int)==1 when we’re used to int being larger than char.
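A minimal sketch of how code can avoid hard-coding these widths: CHAR_BIT from <limits.h> gives the number of bits in a byte, so it should be 16 on a part like this and 8 on a typical desktop. The names BITS_OF, octets_per_byte, and check_width_assumptions are my own for illustration, not anything from TI's headers.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Storage width of a type in bits: size in bytes times bits per byte. */
#define BITS_OF(type) (sizeof(type) * CHAR_BIT)

/* How many whole octets fit in one byte: 1 when CHAR_BIT is 8, 2 when it is 16. */
static size_t octets_per_byte(void)
{
    return CHAR_BIT / 8;
}

/* Sanity checks that hold on any conforming implementation. */
static void check_width_assumptions(void)
{
    assert(sizeof(char) == 1);   /* true by definition, whatever CHAR_BIT is */
    assert(CHAR_BIT >= 8);       /* the standard's minimum */
    assert(BITS_OF(int) >= 16);  /* int may still have sizeof(int) == 1 */
    assert(BITS_OF(long) >= 32);
}
```

Code that needs "the number of octets in a long" can then compute BITS_OF(long) / 8 instead of assuming sizeof(long) is that number.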

A multi-byte word like a 32-bit unsigned long is then made up of two 16-bit bytes. The C2000 is little-endian, so if we had an unsigned long with the value 0x01020304 at address 0x00000426, it would look like this in memory:

0x00000426 0x0304
0x00000427 0x0102
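The same layout can be reproduced with shifts, which behave identically on an 8-bit-byte host. This is only a sketch; split_le16 is a hypothetical helper name, and it models each 16-bit addressable unit as a uint16_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Split a 32-bit value into two 16-bit units, least significant unit first,
 * which mirrors the little-endian layout shown above. */
static void split_le16(uint32_t value, uint16_t out[2])
{
    assert(out != NULL);
    out[0] = (uint16_t)(value & 0xFFFFu); /* lower address:  0x0304 for 0x01020304 */
    out[1] = (uint16_t)(value >> 16);     /* higher address: 0x0102 for 0x01020304 */
}
```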

Portability becomes a concern when we have to take something off the network. We can’t reuse existing code that converts from network order to host order, because it assumes the host has 8-bit bytes. We had to write our own just for the C2000.
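This is not our production code, just a sketch of the general idea: build the value from individual octets with shifts, so the result doesn't depend on the host's endianness or byte width. It assumes each array element holds one octet in its low 8 bits, and load_be32 is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit big-endian (network order) value from four octets.
 * Assumption: one octet per array element, in the low 8 bits.
 * The mask discards any upper bits on hosts where a byte is wider than 8 bits. */
static uint32_t load_be32(const unsigned char *octets)
{
    assert(octets != NULL);
    return ((uint32_t)(octets[0] & 0xFFu) << 24) |
           ((uint32_t)(octets[1] & 0xFFu) << 16) |
           ((uint32_t)(octets[2] & 0xFFu) << 8)  |
            (uint32_t)(octets[3] & 0xFFu);
}
```

Because nothing here casts a byte pointer to a wider type, the same function gives the same answer with 8-bit and 16-bit bytes.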

Endianness is a worry when you have multi-byte words. But also what about 8-bit byte arrays coming in from the network? Do you store each 8-bit network byte in its own 16-bit byte? Or do you pack two 8-bit network bytes into one 16-bit host byte?
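To make the packed option concrete, here is a sketch that stores two octets per 16-bit unit. Putting the first octet in the upper half is an arbitrary choice made for this example, and pack_octets and get_octet are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack pairs of octets into 16-bit units.
 * Assumption for this sketch: the first octet of each pair goes in the upper 8 bits. */
static void pack_octets(const unsigned char *octets, size_t n_octets, uint16_t *packed)
{
    size_t i;
    assert(n_octets % 2 == 0); /* an odd count would need a padding rule */
    for (i = 0; i < n_octets; i += 2) {
        packed[i / 2] = (uint16_t)(((octets[i] & 0xFFu) << 8) |
                                    (octets[i + 1] & 0xFFu));
    }
}

/* Fetch octet number `index` back out of the packed representation. */
static unsigned get_octet(const uint16_t *packed, size_t index)
{
    unsigned unit = packed[index / 2];
    return (index % 2 == 0) ? ((unit >> 8) & 0xFFu) : (unit & 0xFFu);
}
```

Packing halves the memory used, but every access to a single octet now costs a shift and a mask. Storing one octet per 16-bit byte wastes half of each byte and keeps indexing simple.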

Similarly, when we’re sending something out, does the host put just 8 bits in the 16-bit register that holds values going out onto the wire? And are those the upper or lower 8 bits? Or do we pack two 8-bit bytes again?
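For the outgoing direction, one possible answer is sketched below: serialize a 32-bit value into four octets in network order, each in the low 8 bits of its own element. Whether a real transmit register wants the octet in the low or high half depends on the peripheral, so treat the low-half choice and the name store_be32 as assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write a 32-bit value as four octets, most significant first (network order).
 * Assumption: each output element carries one octet in its low 8 bits. */
static void store_be32(uint32_t value, unsigned char *octets)
{
    assert(octets != NULL);
    octets[0] = (unsigned char)((value >> 24) & 0xFFu);
    octets[1] = (unsigned char)((value >> 16) & 0xFFu);
    octets[2] = (unsigned char)((value >> 8) & 0xFFu);
    octets[3] = (unsigned char)(value & 0xFFu);
}
```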

It’s certainly awkward talking about 8-bit bytes inside our 16-bit bytes, so we just call them octets.

I look forward to learning anything from those portability books that I can apply to the C2000.

¹ That blog post must have driven up demand for that book. When I first ordered it on Amazon for a reasonable price, the seller canceled my order, but I didn’t think much of it. When I searched for another copy, I saw that the same seller had the book again, but this time for over $300! It’s a niche area of programming, but demand shouldn’t be that crazy.