Checksum & Hash – What are they and what do I need them for?

Posted on January 14, 2015 by Angela

In an earlier article, we talked about the difference between encoding, encryption, and hashing. Today, we want to take a closer look at the last of the three: the hashing function and the resulting hash or checksum.

For this purpose, we will have a brief look at what a checksum is before we concentrate on what checksums are used for.

What’s Hashing?

A hashing function turns input data into a hash or checksum: a seemingly random string of numbers, letters, and, depending on the algorithm, special characters. For a given algorithm, the output usually has a fixed length, no matter how long the input is.

For example, hashing “Testing Hashing Functions” with the Blowfish-based bcrypt function produces the following hash:

$2a$08$TYehtX/wnmSaNEkL2q4ER.ujOhEQvL6GHgM6Ue6xzQKTGidnDBxny

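For a given algorithm and settings, this hash is, for all practical purposes, unique to the sentence used to create it. (bcrypt also mixes a random salt into the hash and stores it right after “$2a$08$”, so the same salt has to be reused for a meaningful comparison.) This lets you determine whether a set of data has been altered: create another hash and compare the two. Hashing “Testing Hashing Function”, without the plural ‘s’, for example, produces a different hash: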

$2a$08$VAYz8xtY3MXlCaW4NuoahO1zzb92Ej8yr3x99P.CKdmKYEAEKf1ki
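The bcrypt hashes above depend on a random salt and a third-party library, so they cannot be reproduced exactly here. As a minimal sketch of the same idea, the snippet below swaps in SHA-256 from Python's standard hashlib module (a different algorithm from the example above) to show that the output always has the same length and that dropping a single letter changes it completely. The helper name sha256_hex is hypothetical.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Hash a string with SHA-256 and return the digest as 64 hex characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


original = sha256_hex("Testing Hashing Functions")
altered = sha256_hex("Testing Hashing Function")  # plural 's' removed

print(original)
print(altered)
print(len(original), len(altered))  # both digests are 64 hex characters long
print("altered" if original != altered else "unchanged")
```

Unlike bcrypt, plain SHA-256 uses no salt, so hashing the same sentence twice always gives the same result.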

Hashes are thus a way to protect the integrity of your data, for example when sending it over the internet: they reveal whether it has been changed, although they do not hide its content. The hashing function is also not reversible. This means that the input used to create the hash or checksum cannot be reconstructed from it, or only with enormous effort, even if the function used is known.

Examples of hashing functions include GOST, SHA-3, Tiger-160, and Whirlpool, among many others.

What is Hashing used for?

Hashes and checksums can be used for several purposes, some of which are listed below.

Searching in a big database

Hashing database entries helps users, administrators, or programs find certain entries a lot faster than a character-by-character search would. Names, titles, headlines, or anything else worth searching for are each assigned a hash. When a keyword is entered into the search, it is hashed as well, and that hash is used to find the matching entries directly instead of comparing the keyword with every entry in the database.
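A minimal sketch of the idea, assuming a small in-memory table of titles: each title is hashed once into one of a fixed number of buckets, and a search hashes the keyword the same way, so only the entries in that one bucket need to be compared. The HashIndex class and the bucket count are hypothetical illustrations; real databases implement hash indexes internally. Note that this approach finds exact matches only, not partial ones.

```python
import hashlib


class HashIndex:
    """Hypothetical minimal hash index mapping titles to row ids."""

    def __init__(self, num_buckets: int = 1024):
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key: str) -> int:
        # Turn the key's SHA-256 digest into an integer and pick a bucket from it.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.buckets)

    def add(self, key: str, row_id: int) -> None:
        self.buckets[self._bucket(key)].append((key, row_id))

    def find(self, key: str) -> list:
        # Only the entries in one bucket are compared, not the whole table.
        return [row_id for k, row_id in self.buckets[self._bucket(key)] if k == key]


titles = ["Checksum & Hash", "Encoding vs. Encryption", "Password Storage"]
index = HashIndex()
for row_id, title in enumerate(titles):
    index.add(title, row_id)

print(index.find("Password Storage"))  # [2]
print(index.find("Password"))          # [] (exact matches only)
```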

Checking downloads

Some sites that offer programs, example files, or other downloads also publish the corresponding checksum. This matters because you surely don’t want to end up with a corrupted file or one with a virus. You can generate the hash or checksum of the file you downloaded and compare it with the one given on the website you got it from. If the two differ, your file is not the same one that was used to generate the published checksum.
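A minimal sketch of that check, assuming the website publishes a SHA-256 checksum as a hex string (some sites use other algorithms, in which case hashlib offers those as well). The file is read in chunks so that large downloads do not have to fit into memory. The function names are hypothetical.

```python
import hashlib


def file_sha256(path: str, chunk_size: int = 65536) -> str:
    """Compute a file's SHA-256 digest, reading it in chunks to keep memory use low."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_published(path: str, published_hex: str) -> bool:
    """Compare the local file's digest with the one published on the download page."""
    return file_sha256(path) == published_hex.strip().lower()
```

A call such as matches_published("installer.exe", "<checksum copied from the website>") would then return True only if the file is unchanged; the file name here is just a placeholder. On Linux, the sha256sum command from GNU coreutils prints the same hex digest.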

Checking transmitted data

Checksums and hashes can also be used to compare data that has been written to a disk or external hard drive with the original, to rule out transmission errors. Sometimes a transfer to another device is interrupted or ends up incomplete for some other reason. Instead of checking the data piece by piece, you can simply compare the checksums of the original data and of the copy on the device.
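For copies to a disk, a simple non-cryptographic checksum is often enough. A minimal sketch using CRC-32 from Python's built-in zlib module: it is designed to catch accidental damage such as flipped bits or a truncated write, but unlike SHA-256 it is not meant to detect deliberate tampering. The function names are hypothetical.

```python
import zlib


def file_crc32(path: str, chunk_size: int = 65536) -> int:
    """Compute a file's CRC-32 checksum incrementally, chunk by chunk."""
    crc = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)  # continue the running checksum
    return crc


def copy_is_intact(original_path: str, copy_path: str) -> bool:
    """Return True if the copy has the same CRC-32 checksum as the original."""
    return file_crc32(original_path) == file_crc32(copy_path)
```

Only the two checksums are compared, so the check needs no more memory for a large backup than for a small file.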

Password comparison

If you own or administer a forum, online shop, or other website with a password function, it’s wiser not to store the users’ passwords themselves in your database. Databases can be breached (some more easily than others), and the users’ passwords would then be available to any attacker interested in them. Saving the hash value or checksum of each password instead is a lot safer, since the original password cannot be derived from it. Yet comparing the stored hash with the hash created when the user enters their password still shows whether the same password has been entered.
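The bcrypt function from the example above needs a third-party package in Python. As a minimal standard-library sketch, the snippet below swaps in PBKDF2 (hashlib.pbkdf2_hmac) with a random salt per user. Salting means two users with the same password get different stored hashes, and the deliberately repeated derivation makes guessing passwords by brute force more expensive than a single fast hash. The function names and the iteration count are illustrative assumptions; check current guidance for a suitable work factor.

```python
import hashlib
import hmac
import os

# Illustrative work factor (assumption): higher values slow down brute-force guessing.
ITERATIONS = 200_000


def hash_password(password: str):
    """Return (salt, derived_key) to store instead of the plain-text password."""
    salt = os.urandom(16)  # random 16-byte salt, stored alongside the key
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, key


def check_password(password: str, salt: bytes, stored_key: bytes) -> bool:
    """Re-derive the key from the entered password and compare it with the stored one."""
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(key, stored_key)  # constant-time comparison


salt, stored = hash_password("correct horse battery staple")
print(check_password("correct horse battery staple", salt, stored))  # True
print(check_password("wrong guess", salt, stored))  # False
```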

Duplicate content

Looking for duplicate content, whether across the web or within a large file, can be a very time-consuming task. Here, too, comparing hash values simplifies and speeds up the work. Services that search the web for duplicate content, as well as plagiarism-detection software, often rely on checksums or hashes.
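A minimal sketch for the “large file” case, assuming the file is processed line by line: each line is hashed, and only the fixed-size digest of every distinct line is kept, so repeats are detected by comparing hashes. The function name is hypothetical, and this catches exact duplicates only; reworded or near-duplicate text needs more elaborate methods.

```python
import hashlib


def find_duplicate_lines(lines):
    """Return (line_number, first_seen_line_number) pairs for repeated lines.

    Only a 32-byte digest per distinct line is kept in memory, not the line itself.
    """
    first_seen = {}
    duplicates = []
    for number, line in enumerate(lines, start=1):
        digest = hashlib.sha256(line.rstrip("\n").encode("utf-8")).digest()
        if digest in first_seen:
            duplicates.append((number, first_seen[digest]))
        else:
            first_seen[digest] = number
    return duplicates


text = ["alpha\n", "beta\n", "alpha\n", "gamma\n", "beta\n"]
print(find_duplicate_lines(text))  # [(3, 1), (5, 2)]
```

Passing an open file object instead of a list works the same way, since iterating over a text file yields its lines.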