Hashing

Shivani Sundriyal
6 min readJan 1, 2021

Hashing is a technique to find the address where the data is located with the help of a key by using mathematical function called hash function .In simple words, the process of mapping key to address is known as hashing.

Why do we need hashing?

Hashing is useful for searching. Let’s suppose we have a list of keys and we want to search some key from them then hashing will come into force .However, we already have some known search procedures namely linear search and binary search.

On analysis, we got to know that the time complexity for linear search is O(n) and binary search is O(logn) which makes binary search a faster searching method.

We still want a faster method than Binary search whose time complexity should be smaller than O(logn) i.e. O(1).Hashing will try to search the keys in less than O(logn) times that means in almost constant time. Sounds great! Isn’t it?

Mathematical model for hashing

Suppose , we are given some set of keys, consider them as key space. These keys should be mapped on hash table.

· Hash table is a data structure which stores keys and their associated values and it uses hash function for mapping keys with their associated values.

· Hash table is an array [0 to M-1] having size M.

It is just like domain and range. So, the elements in domain are mapped in elements in range which makes a relationship.

Mapping domain upon range is relation or a function.

There are four relational mapping namely:

  • One to One (function)
  • One to many
  • Many to One(function)
  • Many to Many

We are storing elements through hash function rather than storing them directly into the hash table as it saves lots of space from being wasted. That hash function will give us the index and will store a key at that index.

Given a set of elements, a hash function which maps each item into a unique slot is known as a perfect hash function. Our goal is to create a hash function that minimizes the no. of collisions, which is easy to compute, and evenly distributes the elements in the hash table

Features of good Hash Function:

  • Addresses (or index values) generated from the key are distributed uniformly and randomly.
  • If there is small variation in key values then it will cause large variation in indexes so that they are distributed evenly.
  • The hash function should minimize the collisions.

Implementation of Hash Function:

  • Division Method: Hash Function: Hash(Key)= Key % M,where M is hash table size
  • Mid Square: Take the square of key and take middle digits of squared key as address. Eg. Key= 2341 Square= 5480281 Hashed Address= 802
  • Folding: key is divided into subparts of equal length (besides the last part) which are combined or folded to form a new address. Eg: Key is 876743. For fold shift, the sum is 87+67+43= 197;Discard 1 , address is 97

So, for the next time if we want to search for a key whom should we approach first?

The Hash function of course!

Now, have a look at some more important terms related to hashing:

  • Load Factor: It is the no. of elements in hash table divided by size of the table.
  • Probe: Each action of address calculation and checking for success.

When two or more keys hash to the same address it leads to collisions. The process of finding the alternate location of the key is called as collision resolution .Now, we will learn some collision resolution techniques.

COLLISON RESOLUTION TECHNIQUES

  1. Open Hashing: When extra space is taken beyond the hash table it is called open hashing.

Chaining

· Collision resolution by separate chaining combines linked representation with hash table.

· When two or more records are hashed to the same location, these records are inserted into a singly-linked list called a chain.

· However, pointers are needed to handle to form a chain. The extra memory is required to store pointers.

· Loading factor(lambda):n/size

Drawbacks:

  • If all keys gets the same index then non uniform distribution is seen.
  • One should have the idea of selecting proper hash function.

2.Closed Hashing: When no extra space is taken beyond the hash table it is called closed hashing.

Linear Probing

  • In linear probing, the collision is resolved by storing the element in the next empty slot.
  • It is implemented by using linear search for an empty slot.
  • The function used for linear probing is as follows:(H(k)+p(i)) MOD M

As p(i)=i for linear probing, the function becomes

(H(k)+i) MOD M

where H(k)=k MOD M, M is table size, i={0,1,2,… M-1}

Drawbacks:

  • It should be half filled.
  • Lots of keys allocating at one place forms a cluster(primary clusting).
  • Deletion is not easy in linear probing.

Quadratic Probing

  • The primary clustering can be eliminated by using Quadratic Probing method by examining certain slots away from the original probed position.
  • Here, Collision Function is F(i)=i2 is quadratic.
  • The function used for quadratic probing is as follows: H(k)+i2) MOD M where H(k)=k MOD M, M is table size.

Drawbacks:

  • In quadratic probing, items which hash to the same address will follow the same path or alternative cells, which is called as secondary clustering.

Double Hashing

  • When a collision occurs, we apply a second hash function in double hashing.
  • The function used for double hashing is as follows:H(k,i)= (H1(k) + i H2(k) ) mod M where , M is table size and i=0,1,…

example: Given, Table size=11, h1(k) = k Mod 11, h2(key) = 7- (k mod 7)

Insert following keys: 47, 14, 91, 25

H(k,i) = (H1(k) + i H2(k) ) mod 11

H(47,0)= 47 mod 11 = 3

H(14,0)=14 mod 11 = 3

H(14,1)= (3 + 7- (14 mod 7)) mod 11 = 10

H(91,0)= 91 mod 11 = 3

H(91,1)= (3 +7 –(91 mod 7)) mod 11 =10

H(91,2)= 3+ 2* 7 mod 11 = 6

H(25,0)= 25 mod 11 = 3

H(25,1)=(3 + 7- (25 mod 7)) mod 11=6

H(25,2)= (3 + 2 * 7- (25 mod 7)) mod 11= 9

Drawbacks:

  • Double hashing is more complicated to implement than quadratic probing.

Thank you for reading!

--

--