Endianness, how I loathe you
(originally posted in February, but got lost in time)
I have been busy writing my own implementation of SHA-1, to better understand why so many people depend on it for basically everything from SSL to tamper detection. I have a bigger project idea, but that is not important right now. What is important is that SHA-1 does everything in big endian, and I am on x86-64, which is a little-endian machine.
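Remember that a big-endian machine stores the most significant byte first (at the lowest address), and a little-endian machine stores the most significant byte last.
For example, let’s say I want a 64-bit integer to hold the number 1. This is how it’ll be stored in memory, lowest address on the left, with each group of four digits standing for one byte: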
Big endian:
1 = 0000 0000 0000 0000 0000 0000 0000 0001
Little endian:
1 = 0001 0000 0000 0000 0000 0000 0000 0000
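To see this layout on a real machine, here is a minimal C sketch that copies the in-memory bytes of a 64-bit value into a buffer, lowest address first. The function names (memory_bytes_u64, host_is_little_endian, print_memory_bytes_u64) are made up for illustration and are not part of the SHA-1 implementation described here.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Copy the in-memory bytes of a 64-bit value into out[0..7],
 * lowest address first. memcpy avoids any pointer-cast tricks. */
void memory_bytes_u64(uint64_t value, unsigned char out[8])
{
    memcpy(out, &value, sizeof value);
}

/* Returns 1 if the least significant byte sits at the lowest
 * address (little endian), 0 otherwise. */
int host_is_little_endian(void)
{
    unsigned char bytes[8];
    memory_bytes_u64(1, bytes);
    return bytes[0] == 1;
}

/* Print the eight bytes as two hex digits each, in memory order. */
void print_memory_bytes_u64(uint64_t value)
{
    unsigned char bytes[8];
    memory_bytes_u64(value, bytes);
    for (size_t i = 0; i < sizeof bytes; i++)
        printf("%02x ", bytes[i]);
    printf("\n");
}
```

Calling print_memory_bytes_u64(1) on a little-endian host such as x86-64 should show the 01 byte first and seven 00 bytes after it; on a big-endian host the 01 byte comes last.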
SHA-1 stores the size of the message (in bits) as a 64-bit integer at the end of the last block during padding (each block is 512 bits). Since I have a little-endian machine, I wrote a function that swaps the byte order, and now the 1 appears where it should.
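One way to sketch that step in C is to skip the in-place swap and write the length out byte by byte with shifts, which gives the same big-endian result on any host. This is not necessarily the function I wrote; store_be64 is a hypothetical name used only for this example.

```c
#include <stdint.h>

/* Write a 64-bit value into out[0..7], most significant byte first.
 * Shifting extracts bytes by numeric significance, so the result
 * does not depend on the byte order of the host. */
void store_be64(unsigned char out[8], uint64_t value)
{
    for (int i = 0; i < 8; i++)
        out[i] = (unsigned char)(value >> (56 - 8 * i));
}
```

Assuming the final 64-byte block lives in an array called block, the length would go into its last eight bytes with something like store_be64(block + 56, bit_length), where both names are placeholders.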
However, SHA-1 processes each block as a sequence of 32-bit integers, so those 64 bits get read back as two 32-bit words:
*((unsigned int*)0000 0000 0000 0000) = 0
*((unsigned int*)0000 0000 0000 0001) = 16,777,216 on a little-endian machine, instead of the 1 I expect
So when reading, I have to do another endian change, this time a 32-bit one, so that the second word appears as:
0001 0000 0000 0000
so I get back 1.
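The 32-bit read can be sketched in C as follows. The names load_native32, swap32 and load_be32 are hypothetical, chosen for this example. load_native32 mimics the pointer-cast read above, swap32 is the 32-bit endian change, and load_be32 is an alternative that assembles the word with shifts so no swap is needed on any host.

```c
#include <stdint.h>
#include <string.h>

/* Read four bytes the way the host CPU would: this is the read that
 * turns 00 00 00 01 into 16,777,216 on a little-endian machine. */
uint32_t load_native32(const unsigned char *p)
{
    uint32_t w;
    memcpy(&w, p, sizeof w);
    return w;
}

/* Reverse the byte order of a 32-bit word. */
uint32_t swap32(uint32_t w)
{
    return (w >> 24)
         | ((w >> 8) & 0x0000FF00u)
         | ((w << 8) & 0x00FF0000u)
         | (w << 24);
}

/* Read four bytes as a big-endian word, independent of host order. */
uint32_t load_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24)
         | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)
         |  (uint32_t)p[3];
}
```

On a little-endian host, swap32(load_native32(p)) and load_be32(p) should agree; the shift-based version has the advantage of also being correct on a big-endian host, where the swap would have to be skipped.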
This is a PITA, and a frustrating one, mainly because I couldn’t figure it out for a few days. But I feel so accomplished for figuring it out. Accomplished and embarrassed.