This article deals with implementing Hash Table using Python programming language. Hash Table is a data structure where data are stored in an associative manner (in key, value format). The key/index is unique. This kind of storage makes it easier to find the data later on.
Hash Table stores data into an array format. It uses a hashing function that generates a slot or an index to store/insert any element or value.
We will be creating our own hashing function and hash table. But before that, let’s look into the built-in Python dictionary and Python list.
Python Dictionary
- This is a Python data type which is same as what is called “associative array” in other programming languages.
- Data are stored in key-value pairs.
- It’s easy to retrieve data from Python Dictionary.
Example:
country_code = {'25': 'USA' ,'20': 'India', '10': 'Nepal'}
print country_code['10'] # Output: Nepal
print country_code['20'] # Output: India
Python List
- A list is another data type in Python.
- Data can be stored as tuples.
- It’s not as easy as python dictionary to retrieve items.
- We need to use a loop to search for any item in the list.
- It’s time-consuming to retrieve data.
Example:
country_code = [('25', 'USA'), ('20', 'India'), ('10', 'Nepal')]
def insert(item_list, key, value):
item_list.append((key, value))
def search(item_list, key):
for item in item_list:
if item[0] == key:
return item[1]
print (search(country_code, '20')) # Output: India
print (insert(country_code, '100')) # Output: None, python returns 'None' by default if the searched item is not found in the list
Very Simple Implementation
Here is a very simple implementation of the hash table and hashing function.
Hash Function / Hashing
This deals with generating slot or index to any “key” value. Perfect hashing or perfect hash function is the one which assigns a unique slot for every key value. Sometimes, there can be cases where the hash function generates the same index for multiple key values. The size of the hash table can be increased to improve the perfection of the hash function.
First of all, let’s create a hash table of size 10 with empty data.
hash_table = [None] * 10
print (hash_table)
# Output:
# [None, None, None, None, None, None, None, None, None, None]
Below is a simple hash function that returns the modulus of the length of the hash table. In our case, the length of the hash table is 10.
Modulo operator (%) is used in the hashing function. The % (modulo) operator yields the remainder from the division of the first argument by the second.
def hashing_func(key):
return key % len(hash_table)
print (hashing_func(10)) # Output: 0
print (hashing_func(20)) # Output: 0
print (hashing_func(25)) # Output: 5
Inserting Data into Hash Table
Here’s a simple implementation of inserting data/values into the hash table. We first use the hashing function to generate a slot/index and store the given value into that slot.
def insert(hash_table, key, value):
hash_key = hashing_func(key)
hash_table[hash_key] = value
insert(hash_table, 10, 'Nepal')
print (hash_table)
# Output:
# ['Nepal', None, None, None, None, None, None, None, None, None]
insert(hash_table, 25, 'USA')
print (hash_table)
# Output:
# ['Nepal', None, None, None, None, 'USA', None, None, None, None]
Collision
A collision occurs when two items/values get the same slot/index, i.e. the hashing function generates same slot number for multiple items. If proper collision resolution steps are not taken then the previous item in the slot will be replaced by the new item whenever the collision occurs.
Example:
In the example code above, we have inserted items Nepal and USA with key 10 and 25 respectively. If we try to insert a new item with key 20 then the collision occurs because our hashing function will generate slot 0 for key 20. But, slot 0 in the hash table has already been assigned to item ‘Nepal’.
insert(hash_table, 20, 'India')
print (hash_table)
# Output:
# ['India', None, None, None, None, 'USA', None, None, None, None]
As you can see, ‘Nepal’ is replaced by ‘India’ as the first item of the hash table because the result of hashing_func for keys 10 and 20 is the same (i.e. 0).
Collision Resolution
There are generally two ways to resolve a collision:
- Linear Probing
- Chaining
1. Linear Probing
One way to resolve collision is to find another open slot whenever there is a collision and store the item in that open slot. The search for open slot starts from the slot where the collision happened. It moves sequentially through the slots until an empty slot is encountered. The movement is in a circular fashion. It can move to the first slot while searching for an empty slot. Hence, covering the entire hash table. This kind of sequential search is called Linear Probing.
2. Chaining
The other way to resolve collision is Chaining. This allows multiple items exist in the same slot/index. This can create a chain/collection of items in a single slot. When the collision happens, the item is stored in the same slot using chaining mechanism.
While implementing Chaining in Python, we first create the hash table as a nested list (lists inside a list).
hash_table = [[] for _ in range(10)]
print (hash_table)
# Output:
# [[], [], [], [], [], [], [], [], [], []]
The hashing function will be the same as we have done in above example.
def hashing_func(key):
return key % len(hash_table)
print (hashing_func(10)) # Output: 0
print (hashing_func(20)) # Output: 0
print (hashing_func(25)) # Output: 5
We change the insert function. We use append()
function to insert key-value pairs in the hash table.
def insert(hash_table, key, value):
hash_key = hashing_func(key)
hash_table[hash_key].append(value)
insert(hash_table, 10, 'Nepal')
print (hash_table)
# Output:
# [['Nepal'], [], [], [], [], [], [], [], [], []]
insert(hash_table, 25, 'USA')
print (hash_table)
# Output:
# [['Nepal'], [], [], [], [], ['USA'], [], [], [], []]
insert(hash_table, 20, 'India')
print (hash_table)
# Output:
# [['Nepal', 'India'], [], [], [], [], ['USA'], [], [], [], []]
Standard Implementation
A more standard implementation of Hash Table with Python is presented below. We create three different functions to insert, search, and delete items from the hash table.
Python’s built-in “hash” function is used to create a hash value of any key. This function is useful as it creates an integer hash value for both string and integer key. The hash value for integer will be same as it is, i.e. hash(10) will be 10, hash(20) will be 20, and so on.
In the below code, note the difference in output while using 10 and ’10’. 10 (without quote) is treated as an integer and ’10’ (with quote) is treated as a string.
hash_key = hash('xyz')
print (hash_key) # Output: -5999452984703080694
hash_key = hash('10')
print (hash_key) # Output: 6272037681056609
hash_key = hash(10)
print (hash_key) # Output: 10
hash_key = hash('20')
print (hash_key) # Output: 6400038450057764
hash_key = hash(10)
print (hash_key) # Output: 20
hash_key = hash('25')
print (hash_key) # Output: 6400038450057761
hash_key = hash(25)
print (hash_key) # Output: 25
hash_key = hash('10') % len(hash_table)
print (hash_key) # Output: 9
hash_key = hash('20') % len(hash_table)
print (hash_key) # Output: 4
hash_key = hash('25') % len(hash_table)
print (hash_key) # Output: 1
hash_key = hash(10) % len(hash_table)
print (hash_key) # Output: 0
hash_key = hash(20) % len(hash_table)
print (hash_key) # Output: 0
hash_key = hash(25) % len(hash_table)
print (hash_key) # Output: 5
Create empty Hash Table
Creating the hash table as a nested list (lists inside a list).
hash_table = [[] for _ in range(10)]
print (hash_table)
# Output:
# [[], [], [], [], [], [], [], [], [], []]
Insert Data into the Hash Table
Let’s name each individual list inside the hash table list as ‘bucket’.
While inserting a new element into the hash table, we first search if the key already exists in the hash table.
– If the key is already present in the hash table, then we update its value with the new one.
– Otherwise, we insert a new key-value pair into the hash table.
def insert(hash_table, key, value):
hash_key = hash(key) % len(hash_table)
key_exists = False
bucket = hash_table[hash_key]
for i, kv in enumerate(bucket):
k, v = kv
if key == k:
key_exists = True
break
if key_exists:
bucket[i] = ((key, value))
else:
bucket.append((key, value))
insert(hash_table, 10, 'Nepal')
insert(hash_table, 25, 'USA')
insert(hash_table, 20, 'India')
print (hash_table)
# Output:
# [[(10, 'Nepal'), (20, 'India')], [], [], [], [], [(25, 'USA')], [], [], [], []]
Search Data from the Hash Table
While searching for any key in the hash table, we have to loop through each individual sublist.
def search(hash_table, key):
hash_key = hash(key) % len(hash_table)
bucket = hash_table[hash_key]
for i, kv in enumerate(bucket):
k, v = kv
if key == k:
return v
print (search(hash_table, 10)) # Output: Nepal
print (search(hash_table, 20)) # Output: India
print (search(hash_table, 30)) # Output: None
Delete Data from the Hash Table
Deleting element (key-value pair) from the hash table is somewhat similar to inserting element. The loop is almost the same.
While deleting any existing element from the hash table, we first search if the key already exists in the hash table.
– If the key is present (found) in the hash table, then we simply delete it. We delete that particular key-value pair from the hash table.
– Otherwise, no operation is done. We can simply print a message saying that the key was not found in the hash table.
def delete(hash_table, key):
hash_key = hash(key) % len(hash_table)
key_exists = False
bucket = hash_table[hash_key]
for i, kv in enumerate(bucket):
k, v = kv
if key == k:
key_exists = True
break
if key_exists:
del bucket[i]
print ('Key {} deleted'.format(key))
else:
print ('Key {} not found'.format(key))
delete(hash_table, 100)
print (hash_table)
# Output:
# Key 100 not found
# [[(10, 'Nepal'), (20, 'India')], [], [], [], [], [(25, 'USA')], [], [], [], []]
delete(hash_table, 10)
print (hash_table)
# Output:
# Key 10 deleted
# [[(20, 'India')], [], [], [], [], [(25, 'USA')], [], [], [], []]
References:
Hope this helps. Thanks.