Cryptography: Verify file integrity with an MD5 checksum

A checksum is used to verify the integrity of files. It can detect errors introduced while transmitting data between devices or over the internet. The checksum functions as a digital fingerprint of a file. Programs such as md5sum, sha1sum and sha256sum calculate and verify checksums: md5sum works with 128-bit MD5 hashes, sha1sum with 160-bit SHA-1 hashes, and sha256sum with 256-bit SHA-256 hashes. MD5 is adequate for detecting accidental corruption, but it is no longer considered collision-resistant, so it should not be the only defence against deliberate tampering.
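As a quick illustration of the fingerprint idea, the sketch below hashes the same short input with the three tools and prints the length of each digest in hexadecimal characters (each character encodes 4 bits). The input string 'hello' is an arbitrary sample, not something from this article's setup.

```shell
# Hash the same input with three tools and report each digest's length.
# 'hello' is an arbitrary sample input.
for tool in md5sum sha1sum sha256sum; do
    # Each tool prints "<digest>  -" for stdin; keep only the digest.
    digest=$(printf 'hello' | "$tool" | cut -d ' ' -f 1)
    echo "$tool: ${#digest} hex characters"
done
```

On a system with GNU coreutils this should report 32, 40 and 64 characters, which corresponds to 128, 160 and 256 bits.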

Verifying locally stored files

Suppose you have computed the checksums of your files and stored them in a separate file. Later, to check whether those files have been modified or corrupted, recompute the checksums and compare them with the stored values. If they match, your data has not been altered or corrupted. If they do not match, the file has changed since the checksum was recorded. Each file is verified individually.

Verifying downloaded files

Sometimes data is lost or corrupted during a download. In that case, you can verify the integrity of the downloaded files with a checksum. This requires that the checksums of the files were computed and stored on the server beforehand. Download the file containing those checksums, compute the checksums of your downloaded files, and check that they match the previously stored values. If they match, your data has not been altered or corrupted. If they do not, the download is damaged and should be repeated. Each file is verified individually.
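The comparison for a downloaded file can be sketched as follows. No real server is involved here, so the download is simulated with a local copy. The file names original.bin and downloaded.bin and the variable names are hypothetical placeholders.

```shell
# Work in a throwaway directory so nothing else is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Stand-in for the file on the server and its published checksum.
printf 'release payload\n' > original.bin
published=$(md5sum original.bin | cut -d ' ' -f 1)

# Stand-in for the download (a plain copy in this sketch).
cp original.bin downloaded.bin

# Recompute the checksum locally and compare it with the published one.
actual=$(md5sum downloaded.bin | cut -d ' ' -f 1)
if [ "$actual" = "$published" ]; then
    result="OK"
else
    result="MISMATCH"
fi
echo "downloaded.bin: $result"
```

In a real setup the published value would come from the checksum file on the server instead of being computed locally.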

Working Example

In this article, I will show how to compute the checksums of files, store them in a new file, and then verify the integrity of the files using md5sum. I am using Ubuntu Linux as my operating system.

1. Create files and folders

I go to my home folder, create a new directory there named ‘crypto’, and change into it. The crypto folder is the working directory for this example.


cd ~
mkdir crypto
cd crypto

I create two directories named ‘folder1’ and ‘folder2’ and five files. Three files sit at the top level of the working directory, and folder1 and folder2 contain one file each.


mkdir folder1 folder2
touch file1.txt file2.txt file3.txt folder1/file4.txt folder2/file5.txt

2. Print all files

Let's check the files and folders present inside our working directory.

Print all files


find . -type f -print

output:

./file1.txt
./file2.txt
./file3.txt
./folder1/file4.txt
./folder2/file5.txt

Print all directories


find . -type d -print

output:

.
./folder1
./folder2

3. Compute md5 hash of all files and store it in a new file named ‘md5’


md5sum * > md5
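To cover files in subdirectories as well, one possible approach is to let find hand every regular file to md5sum. This is a sketch and not part of the original walkthrough. It runs in a temporary directory that mirrors the layout from step 1, and the output file name all.md5 is an arbitrary choice.

```shell
# Recreate the layout from step 1 in a throwaway directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir folder1 folder2
touch file1.txt file2.txt file3.txt folder1/file4.txt folder2/file5.txt

# Hash every regular file, including those in subdirectories.
# The checksum list itself is excluded, because the shell creates it
# before find starts walking the tree.
find . -type f ! -name all.md5 -exec md5sum {} + > all.md5

# Verify all five files against the stored list.
md5sum -c all.md5
```

The paths in all.md5 are recorded relative to the working directory (for example ./folder1/file4.txt), so the verification has to be run from the same directory.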

4. Print content of the new file ‘md5’

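This file has the MD5 hash of each file at the top level of our working folder. The files inside folder1 and folder2 are not included, because * does not descend into subdirectories, and md5sum only reports an error for the directories themselves.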


cat md5

output:

d41d8cd98f00b204e9800998ecf8427e  file1.txt
d41d8cd98f00b204e9800998ecf8427e  file2.txt
d41d8cd98f00b204e9800998ecf8427e  file3.txt
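All three hashes are identical because touch created three empty files, and identical content always produces the same MD5 value. A quick way to confirm this is to hash empty input directly (the variable name empty_hash is only for illustration):

```shell
# Hash zero bytes of input; md5sum reads stdin and reports the name as "-".
empty_hash=$(printf '' | md5sum | cut -d ' ' -f 1)
echo "$empty_hash"
# prints d41d8cd98f00b204e9800998ecf8427e
```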

5. Making changes

Let’s update one of the existing files. We write “some text” to file1.txt.


echo "some text" > file1.txt

6. Verify md5 hash

Now, we check and verify the integrity of all the files in our working directory.


md5sum -c md5

output:

file1.txt: FAILED
file2.txt: OK
file3.txt: OK
md5sum: WARNING: 1 computed checksum did NOT match

We can see that it reports one file (file1.txt) as FAILED because we edited it after the checksum was stored.
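If the check is run from a script rather than read by eye, the exit status of md5sum -c may be more convenient than its text output. GNU md5sum exits with a non-zero status when any checksum fails, and its --status option suppresses the per-file lines. The sketch below repeats steps 3 to 6 in a temporary directory. The verdict variable is only for illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch file1.txt file2.txt file3.txt

# Store checksums for the three files. They are listed explicitly so
# that the checksum file itself is never hashed.
md5sum file1.txt file2.txt file3.txt > md5

# Change one file after the checksums were recorded.
echo "some text" > file1.txt

# --status prints nothing; the result is carried by the exit status.
if md5sum -c --status md5; then
    verdict="all files intact"
else
    verdict="at least one file changed"
fi
echo "$verdict"
```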

Hope this helps. Thanks.