03: Learning Integrity With Hashing

 
Integrity
 
The integrity of a message means that the correctness and completeness of the transmitted information can be verified.
 
Message Digests And Cryptographic Hash Functions
 
  • A message is a piece of data processed by a cryptographic algorithm.
  • A Cryptographic Hash Function (CHF) is an algorithm.
  • A CHF takes a message of arbitrary size and maps it to a relatively short, fixed-size array of bits.
  • This fixed-size array of bits is called a Message Digest or a Cryptographic Hash.
  • A message digest is the output of running a CHF on a message.
  • Requirements for a good CHF:
    • Deterministic: The same message must always produce the same message digest.
    • Irreversible: It must be computationally infeasible to recover the original message from the digest.
    • Collision resistant: It must be computationally infeasible to find two distinct messages that produce the same message digest. Two messages with the same digest are known as a hash collision.
    • Sensitive to change: Any change to the message, big or small, must result in an extensive change to the digest, so extensive that the two digests appear unrelated.
      • This extreme reaction is called the Avalanche Effect.
  • The goal of all this is to verify the correctness and completeness of information.
  • REMEMBER: The set of all possible inputs to a CHF is much larger than the set of all possible hash values, so hash collisions (two different inputs that produce the same hash) must exist, and true uniqueness is impossible to realise. This is because an input can be of any length, but the hash value always has the same length.
  • In practice, we therefore only require that it be difficult, not impossible, to find different inputs that map to the same hash value.
  • A hash function with this property is called a collision-resistant hash function and can be used for integrity-related purposes.
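
The requirements above can be observed directly. The following is a minimal sketch using Python's standard hashlib module with SHA-256; the helper names `sha256_bits` and `differing_bits` and the sample messages are made up for illustration.

```python
import hashlib


def sha256_bits(message: bytes) -> str:
    """Return the SHA-256 digest of `message` as a 256-character bit string."""
    digest = hashlib.sha256(message).digest()
    return "".join(f"{byte:08b}" for byte in digest)


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions in which the two SHA-256 digests differ."""
    return sum(x != y for x, y in zip(sha256_bits(a), sha256_bits(b)))


# Deterministic: hashing the same message twice gives the same digest.
same = hashlib.sha256(b"hello").hexdigest() == hashlib.sha256(b"hello").hexdigest()

# Fixed size: a 5-byte and a 5000-byte message both give a 32-byte digest.
short_len = len(hashlib.sha256(b"hello").digest())
long_len = len(hashlib.sha256(b"hello" * 1000).digest())

# Avalanche effect: changing one character should flip roughly half of the 256 bits.
changed = differing_bits(b"hello", b"hellp")

print(same, short_len, long_len, changed)
```

The exact number of differing bits depends on the two messages, but for a good CHF it should land close to half of the digest length.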
 
Applications of A Message Digest
 
  • Data integrity verification: verify that the data you have downloaded is the same data that was meant to be downloaded.
  • It forms the basis of HMAC, which combines a secret key with a CHF for data authentication.
  • It is an integral part of creating digital signatures, such as those on the X.509 certificates used in the TLS protocol.
  • It is used in network protocols such as TLS and SSH.
  • It is used in password verification.
  • It is used in source control management as a content identifier. Git and Mercurial use hashes to uniquely identify stored objects such as files, commits, branches and tags.
  • Blockchains and cryptocurrencies.
  • Proof-of-work systems.
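
As an illustration of the HMAC item in the list above, here is a small sketch with Python's standard hmac module. The key, the messages and the helper names `make_tag` and `verify_tag` are hypothetical and only serve the example.

```python
import hashlib
import hmac


def make_tag(key: bytes, message: bytes) -> str:
    """Compute an HMAC-SHA256 tag over `message` with the shared `key`."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify_tag(key: bytes, message: bytes, tag: str) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(make_tag(key, message), tag)


key = b"hypothetical-shared-secret"   # assumed to be known to both parties
message = b"transfer 100 to alice"

tag = make_tag(key, message)

ok = verify_tag(key, message, tag)                            # untouched message
tampered_ok = verify_tag(key, b"transfer 900 to alice", tag)  # modified message
wrong_key_ok = verify_tag(b"some-other-key", message, tag)    # wrong key

print(ok, tampered_ok, wrong_key_ok)
```

Unlike a plain digest, the tag cannot be recomputed by someone who changes the message but does not know the key.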
 
The security level of a CHF depends on the size of the message digest. If the message digest is n bits, the maximum attack complexity is 2^(n/2) for a collision attack and 2^n for a preimage attack. The collision attack complexity cannot be higher than 2^(n/2) because the birthday attack, based on the birthday paradox, can always find collisions in about 2^(n/2) operations. For example, SHA-256 has a collision attack complexity of 2^128, so its security level is 128 bits.
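
To get a feel for the birthday bound, the sketch below attacks a deliberately weak toy digest: SHA-256 truncated to 24 bits. The truncation and the helper names are assumptions made for illustration only. With n = 24, a collision is expected after roughly 2^12 attempts, far fewer than the 2^24 possible digest values.

```python
import hashlib


def truncated_digest(message: bytes, n_bytes: int) -> bytes:
    """Toy hash: keep only the first `n_bytes` bytes of the SHA-256 digest."""
    return hashlib.sha256(message).digest()[:n_bytes]


def find_collision(n_bytes: int):
    """Hash the messages b"0", b"1", b"2", ... until two share a truncated digest."""
    seen = {}  # truncated digest -> first message that produced it
    counter = 0
    while True:
        message = str(counter).encode()
        digest = truncated_digest(message, n_bytes)
        if digest in seen:
            return seen[digest], message, counter + 1
        seen[digest] = message
        counter += 1


# 3 bytes = 24 bits, so the birthday bound is about 2^12 attempts.
first, second, attempts = find_collision(3)
print(first, second, attempts)
```

The same search against the full 256-bit digest would need on the order of 2^128 attempts, which is why SHA-256 is considered collision resistant.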
 
Reviewing Popular Hash Functions
 
  • SHA-2 Family of Hashes
    • Contains the most widely used hashing algorithm: SHA-256.
    • SHA-256 outputs a 256-bit digest and has a collision resistance level of 128 bits.
    • SHA-256 is the most commonly used hash function in the TLS protocol.
    • It is the default hash function used when signing X.509 certificates and SSH keys.
    • Bitcoin uses SHA-256 to verify transactions and for proof-of-work.
    • Git is migrating to SHA-256 hashes for its object identification process.
    • Also used in SSH, IPsec, DNSSEC, PGP, etc.
    • Other SHA-2 hash functions:
      • SHA-224: A truncated variant of SHA-256. Security level: 112 bits.
      • SHA-512: The algorithm is similar to SHA-256 but works on 64-bit words. Security level: 256 bits.
    • Developed by the NSA and published by NIST in 2001 as a federal standard.
    • The algorithm is patented but available under a royalty-free license.
 
  • SHA-3 Family of Hashes
    • Chosen through an algorithm competition, similar to how the AES algorithm was chosen.
    • NIST organized the competition because of successful attacks on the predecessors of SHA-2, namely SHA-1, SHA-0, MD5, etc.
    • SHA-3 is based on the Keccak algorithm, designed by a team of mostly Belgian cryptographers. One of the team members, Joan Daemen, also co-authored AES.
    • The SHA-3 standard defines:
      • SHA3-224
      • SHA3-256
      • SHA3-384
      • SHA3-512
      • SHAKE128
      • SHAKE256
    • In software, the SHA-3 Keccak algorithm is generally slower than SHA-2 because of more sequential operations.
    • As a result, SHAKE128 and SHAKE256 were developed.
    • Strictly speaking, they are not hash functions but Extendable Output Functions (XOFs), which can produce output of any requested length.
    • The same authors also introduced the KangarooTwelve extendable output function, which is reported to be about 13 times faster than SHA3-256 and also has a security level of 128 bits.
    • Ethereum uses Keccak-256, the original Keccak variant that differs slightly from the standardized SHA3-256. It was also part of Ethereum’s former proof-of-work algorithm.
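
The difference between a fixed-length SHA-3 hash and an extendable output function can be sketched with Python's hashlib, which exposes both. The message is arbitrary; note that `hexdigest` on a SHAKE object takes the output length in bytes.

```python
import hashlib

message = b"integrity"

# SHA3-256 always produces 32 bytes (64 hex characters).
fixed = hashlib.sha3_256(message).hexdigest()

# SHAKE128 is an XOF: the caller chooses the output length in bytes.
short = hashlib.shake_128(message).hexdigest(16)   # 16 bytes -> 32 hex characters
longer = hashlib.shake_128(message).hexdigest(32)  # 32 bytes -> 64 hex characters

print(len(fixed), len(short), len(longer))

# A longer XOF output extends the shorter one instead of replacing it.
print(longer.startswith(short))
```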
 
  • Other Notable Hash Functions
  • SHA-1
    • Designed by the NSA and published in 1995.
    • It is not secure anymore.
    • Broken by Google and the Centrum Wiskunde & Informatica research center, who demonstrated a practical collision in 2017.
    • Was used for X.509, PGP, S/MIME, DSA, and the Git and Mercurial SCMs.
    • NIST deprecated SHA-1 in 2011.
    • Web browsers stopped supporting it in 2017.
 
  • MD Family
    • MD2, MD4, MD5, MD6
    • MD1 was proprietary
    • MD3 was experimental
    • Designed by Ronald Rivest, who also invented the RC family of symmetric ciphers (RC2, RC4, RC5, etc.)
    • MD4 was used for hashing passwords in Windows NT, 2000 and XP
    • You can still enable MD password hashing
 
  • BLAKE2
    • Comes in two variants: BLAKE2s and BLAKE2b
    • BLAKE2s produces a 256-bit message digest
    • BLAKE2b produces a 512-bit message digest
    • Security is similar to SHA-3
    • Faster than MD5, SHA-2 and SHA-3 on modern CPUs
    • Based on the ChaCha stream cipher
    • Used in WhatsApp, 7-Zip, WinRAR, rsync, Chef and WireGuard
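
The digest sizes quoted in this review can be checked against Python's hashlib, which ships most of these functions. This is only a sketch: the collision security level is computed with the generic n/2 birthday bound from earlier, which is an upper bound and says nothing about algorithm-specific weaknesses. The selection of algorithms in `ALGORITHMS` is my own.

```python
import hashlib

# Functions from the review that hashlib guarantees on every platform.
ALGORITHMS = ["sha224", "sha256", "sha512", "sha3_256", "sha3_512", "blake2s", "blake2b"]


def digest_bits(name: str) -> int:
    """Digest length in bits, as reported by the hashlib constructor `name`."""
    return getattr(hashlib, name)().digest_size * 8


sizes = {name: digest_bits(name) for name in ALGORITHMS}

# Generic collision security level: half the digest length (birthday bound).
levels = {name: bits // 2 for name, bits in sizes.items()}

for name in ALGORITHMS:
    print(f"{name:10s} digest: {sizes[name]:3d} bits   collision level: {levels[name]:3d} bits")
```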
 
Calculating Message Digest using OpenSSL
 
  • Check which message digest algorithms are supported:
    • openssl dgst -list
      Supported digests:
      -blake2b512 -blake2s256 -md4 -md5 -md5-sha1 -mdc2 -ripemd -ripemd160 -rmd160 -sha1 -sha224 -sha256 -sha3-224 -sha3-256 -sha3-384 -sha3-512 -sha384 -sha512 -sha512-224 -sha512-256 -shake128 -shake256 -sm3 -ssl3-md5 -ssl3-sha1 -whirlpool
  • Let’s calculate a SHA3-256 digest:
    • seq 2000 > message.txt
      openssl dgst -sha3-256 message.txt
      SHA3-256(message.txt)= 6cea69b64fbbcb58732abb54a1f02557886b9935ddcd89aa9d2f6211443a1732
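
For comparison, the same digest can be computed without OpenSSL. The sketch below rebuilds in memory the bytes that `seq 2000` writes (the numbers 1 to 2000, each followed by a newline) and hashes them with hashlib. As long as message.txt holds exactly those bytes, both tools should print the same value, since they implement the same algorithm.

```python
import hashlib

# The same bytes that `seq 2000` writes: "1\n2\n...2000\n".
content = "".join(f"{number}\n" for number in range(1, 2001)).encode("ascii")

digest = hashlib.sha3_256(content).hexdigest()

# Mimic the output format of `openssl dgst`.
print(f"SHA3-256(message.txt)= {digest}")
```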
 
 
Examples of using other Hash Functions
 
First, let’s look at the release section of the openssl.org website.
 
 
Each of the downloads listed there has a SHA256 checksum, a PGP signature and a SHA1 checksum attached to it. Let’s take openssl-1.1.1w.tar.gz as an example. Copy the link to the file and download it:
 
wget https://www.openssl.org/source/openssl-1.1.1w.tar.gz --no-check-certificate
 
Let’s get the SHA1 and the SHA256 checksums as well. Copy the link and download the files.
 
>wget https://www.openssl.org/source/openssl-1.1.1w.tar.gz.sha256 --no-check-certificate
>wget https://www.openssl.org/source/openssl-1.1.1w.tar.gz.sha1 --no-check-certificate
 
 
The idea is to calculate the SHA1 checksum of the downloaded file ourselves and compare it against the published value to verify the integrity of the file. If the values match, the file was not altered on its way to us. If the values don’t match, the file was corrupted, either intentionally or unintentionally.
 
Let’s calculate the SHA1 hash of the downloaded file. If you are using:
  • macOS: use the shasum command
  • Linux: use the sha1sum command
 
I am using macOS, where the shasum command calculates the SHA1 checksum of a file by default:
 
cat openssl-1.1.1w.tar.gz.sha1
shasum openssl-1.1.1w.tar.gz
 
Both checksum values should match, which verifies the integrity of the file.
 
Let’s do the same with SHA256:
 
  • macOS: use the shasum command with the option -a 256
  • Linux: use the sha256sum command
 
cat openssl-1.1.1w.tar.gz.sha256
shasum -a 256 openssl-1.1.1w.tar.gz
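
The manual comparison can also be scripted. The sketch below hashes a file in chunks, so that a large download does not have to fit in memory, and compares the result with the first token of a checksum file. The function names are my own, and a small generated file named example.tar.gz (hypothetical) stands in for the real download so that the snippet runs on its own.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path, algorithm="sha256", chunk_size=64 * 1024):
    """Hash a file in fixed-size chunks so large downloads need little memory."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def matches_checksum_file(path, checksum_path, algorithm="sha256"):
    """Compare the file's digest with the first token of the checksum file."""
    tokens = Path(checksum_path).read_text().split()
    return bool(tokens) and tokens[0].lower() == file_digest(path, algorithm)


# Stand-in for a real download: a 1 MiB file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "example.tar.gz"            # hypothetical file name
    archive.write_bytes(b"\x00" * (1024 * 1024))
    published = Path(tmp) / "example.tar.gz.sha256"   # plays the published checksum
    published.write_text(hashlib.sha256(archive.read_bytes()).hexdigest() + "\n")

    streamed = file_digest(archive)
    one_shot = hashlib.sha256(archive.read_bytes()).hexdigest()
    verified = matches_checksum_file(archive, published)

# Chunked hashing gives the same digest as hashing everything at once.
print(streamed == one_shot, verified)
```

With the real files from above, the call would be matches_checksum_file("openssl-1.1.1w.tar.gz", "openssl-1.1.1w.tar.gz.sha256"). The parser assumes the digest is the first whitespace-separated token, which covers a bare digest and the "digest  filename" layout, but not the "SHA2-256(file)= digest" layout that openssl writes.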
 
 
We can also use OpenSSL directly to calculate the hash of a file.
Just use the openssl sha256 <filename> or openssl sha1 <filename> command:
 
cat openssl-1.1.1w.tar.gz.sha1
openssl sha256 openssl-1.1.1w.tar.gz
openssl sha1 openssl-1.1.1w.tar.gz
 
You can also use openssl to create a checksum file. As an exercise, let’s create a checksum file for openssl-1.1.1w.tar.gz and compare it against the downloaded checksum file:
 
>openssl sha256 -hex -out openssl.sha256 openssl-1.1.1w.tar.gz
>cat openssl.sha256
SHA2-256(openssl-1.1.1w.tar.gz)= cf3098950cb4d853ad95c0841f1f9c6d3dc102dccfcacd521d93925208b76ac8
>cat openssl-1.1.1w.tar.gz.sha256
cf3098950cb4d853ad95c0841f1f9c6d3dc102dccfcacd521d93925208b76ac8
 
The command format is openssl sha256 -hex -out <output filename> <filename of which we need to calculate the checksum>
 
Let’s create a checksum file for a sample file of our own:
 
>echo "hello" > hello.txt
>cat hello.txt
hello
>openssl sha256 -hex -out hello.txt.sha256 hello.txt
>cat hello.txt.sha256
SHA2-256(hello.txt)= 5891b5b522d5df086d0ff0b110fbd9d21bb4fc7163af34d08286a2e846f6be03
 
Now you can share hello.txt along with its checksum file hello.txt.sha256 with someone else, and they can use the same method as above to verify that the integrity of the file was maintained during the transfer.
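
The whole round trip can be sketched in Python as well: the sender writes a checksum file, the receiver recomputes the digest and compares. The helper names `write_checksum` and `verify_checksum` are made up for this example, and the checksum file here holds only the bare hex digest, not the "SHA2-256(hello.txt)= ..." line that openssl writes. The file content is `hello` plus a newline, exactly what the echo command above produces.

```python
import hashlib
import tempfile
from pathlib import Path


def write_checksum(path: Path) -> Path:
    """Sender side: store the SHA-256 hex digest next to the file."""
    checksum_path = path.with_name(path.name + ".sha256")
    checksum_path.write_text(hashlib.sha256(path.read_bytes()).hexdigest() + "\n")
    return checksum_path


def verify_checksum(path: Path, checksum_path: Path) -> bool:
    """Receiver side: recompute the digest and compare it with the stored one."""
    expected = checksum_path.read_text().strip()
    actual = hashlib.sha256(path.read_bytes()).hexdigest()
    return actual == expected


with tempfile.TemporaryDirectory() as tmp:
    hello = Path(tmp) / "hello.txt"
    hello.write_bytes(b"hello\n")          # same content as: echo "hello" > hello.txt

    checksum = write_checksum(hello)
    recorded = checksum.read_text().strip()
    intact = verify_checksum(hello, checksum)

    hello.write_bytes(b"hellp\n")          # simulate a change during transfer
    tampered = verify_checksum(hello, checksum)

print(recorded)
print(intact, tampered)
```

The recorded digest should equal the value openssl printed for hello.txt, and the check fails as soon as a single character of the file changes.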