Hashing
What does a hashing algorithm do?
A hashing algorithm is a mathematical function that takes input of any size and converts it to a fixed-length alphanumeric output. The output is called the hash value or message digest.
The output is always the same if the input is exactly the same. If even a single letter differs, or only its capitalization, the whole output changes. In other words, there is no visible correlation between the input and the output, and no pattern to be "cracked".
A hashing algorithm is a "one-way function": you can hash an input, but there is no practical way to recover the original input from the hashed value.
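The two properties above (same input gives the same output, a tiny change gives a completely different output) can be checked with a short Python sketch. It uses the standard `hashlib` module; SHA-256 is an assumption made here for illustration, since these notes do not name a specific algorithm, and `sha256_hex` is a hypothetical helper name.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of `text` as a hexadecimal string.

    SHA-256 is an illustrative choice; the notes do not name an algorithm.
    """
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


a = sha256_hex("hello")
b = sha256_hex("hello")  # exactly the same input
c = sha256_hex("Hello")  # only the capitalization differs
long_digest = sha256_hex("a much longer input " * 1000)

print(a == b)  # True: identical input, identical digest
print(a == c)  # False: one changed letter changes the digest
print(len(a), len(long_digest))  # 64 64: fixed length regardless of input size
```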
What can a hashing algorithm be used for?
Hashing algorithms have many uses. The most common one is verification.
Usage example 1: File verification
Since the input can be of any size, even a large file can be hashed to get a "fingerprint" for that file.
A software company might publish this hash value on their website, next to the download button. After downloading the file, a user can run the same hashing algorithm on it. If the resulting hash value matches the one on the website, the user knows that the file has not been tampered with or corrupted in transit.
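As a rough sketch of the check a user could run after a download, the Python below hashes a file in chunks, so that even a large file never has to fit in memory, and compares the result with a published value. SHA-256, the chunk size, and the names `file_sha256` and `matches_published_hash` are assumptions made for illustration; a real download page states which algorithm it used.

```python
import hashlib


def file_sha256(path: str, chunk_size: int = 65536) -> str:
    """Hash a file chunk by chunk and return the hex digest.

    SHA-256 and the 64 KiB chunk size are illustrative assumptions.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, published_hex: str) -> bool:
    """Compare the file's digest with the value copied from the website."""
    # Hex digests are case-insensitive, so normalize the published value.
    return file_sha256(path) == published_hex.strip().lower()
```

A call such as `matches_published_hash("installer.bin", "<hash from the website>")` would return `True` only if the downloaded bytes are identical to the ones the publisher hashed; the file name here is a placeholder.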
Usage example 2: Passwords
Saving users' passwords as cleartext is a security risk. Employees with bad intentions and access to the data could grab all the passwords, and a data breach would expose every one of them.
Instead, hashing algorithms are used. When a user creates a profile, their password is hashed and only the hashed value is saved. The next time the same user wants to log in:
- They enter their password
- Their password is hashed
- The resulting hash is checked against the saved hash
- If the hashes match, the user is authenticated
Another security measure is the "salt". A salt is a unique, random string that is added to the password before it is hashed, and it is stored alongside the hash so the same salt can be applied at login. This means that even if two users have the same password, their hashed values will be different. It also makes precomputed attacks, where a bad actor has a huge list of already-hashed common passwords, much harder to pull off. It's still not a great idea to reuse passwords!
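The login steps and the salt can be sketched together in Python. Note the substitution: instead of literally concatenating salt and password and hashing once, this sketch uses PBKDF2 (`hashlib.pbkdf2_hmac` from the standard library), which takes the salt as a separate argument and repeats the hashing many times to slow down guessing. The iteration count, the salt length, and the names `hash_password` and `verify_password` are assumptions made for illustration, not recommendations from these notes.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 200_000  # illustrative value, not a recommendation from the notes


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Create a fresh random salt and return (salt, hash) for storage."""
    salt = secrets.token_bytes(16)  # unique per user
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password: str, salt: bytes, stored_digest: bytes) -> bool:
    """Hash the entered password with the stored salt and compare."""
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(candidate, stored_digest)


# Two users pick the same password but get different salts.
salt_a, hash_a = hash_password("correct horse")
salt_b, hash_b = hash_password("correct horse")

print(hash_a != hash_b)  # True, unless the two random salts happen to collide
print(verify_password("correct horse", salt_a, hash_a))  # True
print(verify_password("wrong guess", salt_a, hash_a))  # False
```

Only the salt and the hash are stored; the cleartext password is never saved.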