What is Hashing in Data Structure? Explore Hashing Techniques, Benefits, Limitations, and More
Updated on Feb 24, 2025 | 26 min read | 184.8k views
Hashing in data structure is a technique that assigns each piece of data (often called a key) to a specific index in a hash table. This mapping happens through a function known as a hash function, which converts the key into a fixed-size code. Once the data lands at its correct spot, you can access it on average in constant time.
In simpler terms, hashing speeds up lookups. It makes life easier when you need quick results from large lists of items — such as user passwords, dictionary words, or catalog records.
In this blog, you’ll see why hashing is considered a powerful method for fast data handling. You’ll learn how it works, which hashing techniques are common, how to build a custom hash table, and much more. Let’s get started!
Hashing in data structure assigns each data element, called a key, to a slot in a hash table through a function that converts the key into a numeric output. This output, or hash code, points to the exact place in the table for quick searches and updates. The goal is to keep operations (like inserting or finding items) at a constant average time, even when the dataset grows large.
Each key gives the same code every time, which makes the system consistent. The technique is generally one-way, so you cannot work backwards from a hash code to recover the original key.
Let’s explore the components of hashing in detail now.
1. Key
A key is the piece of data you want to store and retrieve. It can be a number, text, or any unique identifier. The system treats each key as the starting point for generating a hash code.
When you use meaningful keys, locating or updating entries becomes much easier. A key must remain consistent across operations so that the same input always produces the same code.
2. Hash Function
The hash function converts each key into a numeric output called a hash code. A well-designed hash function distributes keys evenly across the hash table and reduces collisions.
Simple versions might use modulo arithmetic (key % tableSize), while advanced ones use more complex math or encryption-based formulas.
A good hash function should do the following things:
- Produce the same code for the same key every time (determinism).
- Spread keys evenly across all slots of the table to minimize collisions.
- Compute quickly, even for long or complex keys.
- Use the full range of available indices rather than clustering values.
3. Hash Table
A hash table is an array or similar structure that reserves spots for each hash code. The table’s size depends on factors like expected data volume and desired efficiency. Each index in the table points to a location where data is kept.
When a new key arrives, the hash function determines which slot the data should occupy. If a collision occurs, the system uses a separate strategy — like chaining or open addressing — to store items without discarding anything.
Each component in hashing works in unison: the key defines the data, the hash function determines where to place it, and the table holds it for instant lookups. This setup allows efficient insertion, retrieval, and deletion in large databases or any data-heavy scenario.
Here’s a step-by-step guide on how hashing works:
1. Take the key you want to store or look up.
2. Pass the key through the hash function to produce a hash code.
3. Map the hash code to an index within the table’s bounds (often with a modulo operation).
4. Store the value at that index, or read it back from there.
5. If another key already occupies the slot, apply a collision-resolution strategy such as chaining or open addressing.
Let’s understand this better through two examples that illustrate how hashing works.
Example 1: Numeric Hashing
Suppose you have three numbers: 14, 19, and 23, along with a simple hash function h(key) = key mod 5.
The table below shows how each key maps to a slot in a hash table of size 5 (indexes 0 through 4):
| Key | h(key) = key mod 5 | Resulting Index |
|-----|--------------------|-----------------|
| 14  | 14 % 5 = 4         | 4               |
| 19  | 19 % 5 = 4         | 4 (collision)   |
| 23  | 23 % 5 = 3         | 3               |
You place 14 at index 4, but 19 also maps to index 4, which causes a collision. The last key, 23, goes to index 3 with no issue. Collisions get handled through methods like chaining or open addressing, which you’ll learn more about later.
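The table above can be reproduced in a few lines of Python. This is a minimal sketch: each slot holds a list so colliding keys can share an index (chaining, covered later in this article).

```python
def h(key, table_size=5):
    # Division-method hash: the remainder picks the slot
    return key % table_size

# Each slot holds a list so colliding keys can share an index
table = [[] for _ in range(5)]
for key in [14, 19, 23]:
    table[h(key)].append(key)

print(table[4])  # [14, 19] -- both keys collide at index 4
print(table[3])  # [23]
```

Running this shows 14 and 19 sharing slot 4, exactly the collision the table describes.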
Example 2: String Hashing
When hashing a string like "CAT," you might add the ASCII values of each character (C=67, A=65, T=84) for a total of 216. If you apply 216 % 10, you get 6 as the final index.
Below is a table for three strings:
| String | Sum of ASCII Values | Computation  | Final Index |
|--------|---------------------|--------------|-------------|
| CAT    | 67 + 65 + 84 = 216  | 216 % 10 = 6 | 6           |
| DOG    | 68 + 79 + 71 = 218  | 218 % 10 = 8 | 8           |
| KEY    | 75 + 69 + 89 = 233  | 233 % 10 = 3 | 3           |
Each string generates an index in the table, although collisions can still occur if different strings produce the same final value. The main advantage is that hashing can point you directly to each item's spot instead of forcing you to check every element in the table.
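Here is a short Python sketch of the ASCII-sum approach from the table (illustrative only; real string hash functions mix characters far more thoroughly):

```python
def ascii_sum_hash(s, table_size=10):
    # Add up the ASCII code of every character, then take the remainder
    return sum(ord(ch) for ch in s) % table_size

for word in ["CAT", "DOG", "KEY"]:
    print(word, "->", ascii_sum_hash(word))  # CAT -> 6, DOG -> 8, KEY -> 3
```

One weakness worth noticing: anagrams like "CAT" and "ACT" collide because simple addition ignores character order, which is why production string hashes weight each position differently.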
Hash functions follow different approaches, or hashing techniques, to turn keys into compact codes. While all hashing techniques in a data structure aim to minimize collisions, they handle numeric or string inputs in unique ways.
Below is a closer look at widely used hashing techniques, complete with examples to show how keys transform into hash codes.
The division method is one of the simplest techniques. You take a key and divide it by the size of the hash table, then use the remainder as the index. This approach is quick because it involves a single division operation, but picking the right table size is important to reduce collisions.
How Does It Work?
1. Choose a table size m (a prime number often gives a better spread).
2. Compute the remainder: index = key % m.
3. Store or look up the key at that index.
Let’s understand this through an example: Suppose you have keys [13, 27, 18, 42] and a hash table of size 10:
| Key | Computation | Remainder | Index |
|-----|-------------|-----------|-------|
| 13  | 13 ÷ 10     | 3         | 3     |
| 27  | 27 ÷ 10     | 7         | 7     |
| 18  | 18 ÷ 10     | 8         | 8     |
| 42  | 42 ÷ 10     | 2         | 2     |
You place each key at its remainder index. Even if different keys produce the same remainder, collisions can still happen, so you must have a plan to handle them (for example, chaining or open addressing).
The mid-square method squares the key and extracts the middle digits to form the index. This technique can work well if the squared values vary enough to reduce collisions.
How Does It Work?
1. Square the key.
2. Extract the middle digits of the squared value.
3. Take that number modulo the table size to get the index.

Example: Use a table size of 100 for clarity. Let’s take the key 56: 56² = 3136, and the middle two digits are 13, so the key maps to index 13.

If the key is 123, then 123² = 15129; padding to 015129 and taking the middle two digits gives 51, so the index is 51.

This method taps into more of the key’s digits by squaring it, which often spreads values better than basic division. However, if certain patterns repeat in the squared values, collisions can still occur.
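A possible Python sketch of the mid-square method (the zero-padding rule for odd-length squares is an assumption made here so a “middle” always exists):

```python
def mid_square_hash(key, table_size=100):
    # Square the key, then pull the middle two digits of the result.
    squared = str(key * key)
    if len(squared) % 2:              # zero-pad to an even width (assumption)
        squared = "0" + squared
    mid = len(squared) // 2
    middle_two = int(squared[mid - 1 : mid + 1])
    return middle_two % table_size

print(mid_square_hash(56))   # 56^2 = 3136 -> middle digits 13
print(mid_square_hash(123))  # 123^2 = 15129 -> 015129 -> middle digits 51
```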
In folding, you slice the key into equal parts (except for the last piece if the key length isn't divisible evenly) and sum those parts to get the index. This method is useful when keys contain multiple segments, such as phone numbers or identification codes.
How Does It Work?
1. Split the key’s digits into equal-sized chunks.
2. Add the chunks together.
3. Take the sum modulo the table size to get the index.

Example: Take a phone-number-like key such as 123456 and split it into two-digit segments: 12, 34, 56. Their sum is 12 + 34 + 56 = 102. With a table of size 100, the index is 102 % 100 = 2.

If the key has an odd number of digits, the last chunk simply has fewer digits. You still add it to the sum.
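The folding steps translate into a compact Python sketch (two-digit chunks and a table of size 100 are just the choices from the example):

```python
def folding_hash(key, chunk=2, table_size=100):
    # Split the key's digits into fixed-size chunks, sum them, take the remainder
    digits = str(key)
    parts = [int(digits[i:i + chunk]) for i in range(0, len(digits), chunk)]
    return sum(parts) % table_size

print(folding_hash(123456))  # 12 + 34 + 56 = 102 -> 102 % 100 = 2
print(folding_hash(12345))   # 12 + 34 + 5 = 51 (last chunk is shorter)
```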
The multiplication method multiplies the key by a constant (often a fraction between 0 and 1), then extracts the fractional part and multiplies it by the table size. The goal is to spread out keys more uniformly than basic division.
How Does It Work?
1. Multiply the key by a constant A, where 0 < A < 1.
2. Take the fractional part of the product.
3. Multiply that fraction by the table size and take the floor to get the index.

Example: Let’s use A = 0.7 and a table size of 10 for the key 45: 45 × 0.7 = 31.5, the fractional part is 0.5, and 0.5 × 10 = 5, so the key maps to index 5.
The multiplication method can still collide but often handles data better than a plain division approach.
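A Python sketch of the multiplication method. Fraction is used so the arithmetic matches the worked example exactly; with plain floats, 45 × 0.7 evaluates to just below 31.5 and the floor would come out one lower.

```python
import math
from fractions import Fraction

def multiplication_hash(key, table_size=10, a=Fraction(7, 10)):
    # Multiply by A, keep the fractional part, scale by the table size
    product = key * a
    frac = product - math.floor(product)
    return math.floor(frac * table_size)

print(multiplication_hash(45))  # 45 * 0.7 = 31.5 -> frac 0.5 -> floor(0.5 * 10) = 5
```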
Universal hashing picks a random function from a set of possible options. This makes it harder for any adversary or pattern to consistently cause collisions. It can be more secure for certain applications, though it can add complexity.
How Does It Work?
1. Define a family of hash functions, such as h(k) = ((a·k + b) mod p) mod m, where p is a prime and m is the table size.
2. Pick random values for a and b when the table is created.
3. Use the chosen function for all subsequent operations on that table.

Example: Say we keep a set of arithmetic functions like h₁(k) = (3k + 1) mod 7 and h₂(k) = (5k + 2) mod 7.
We pick one function at run time. Since it’s random, a worst-case scenario for one function may not apply to the next one.
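A standard universal family is h(k) = ((a·k + b) mod p) mod m, with a and b drawn at random. A hedged Python sketch (p = 101 is an illustrative choice; p should be a prime larger than any expected key):

```python
import random

def make_universal_hash(table_size, p=101):
    # a and b are chosen once, at table-creation time
    a = random.randint(1, p - 1)
    b = random.randint(0, p - 1)
    return lambda key: ((a * key + b) % p) % table_size

h = make_universal_hash(10)
print(h(42))  # some slot in [0, 9]; varies run to run because a and b are random
```

Once chosen, the function is fixed, so the same key keeps mapping to the same slot for the life of the table.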
Though often used for security (like passwords or checksums), cryptographic hashing techniques can also serve in data structures if you need robust collision resistance.
How Does It Work?
1. Feed the key into a cryptographic hash algorithm such as SHA-256.
2. The algorithm produces a fixed-size digest that changes drastically with any small change in the input.
3. Reduce the digest (for example, modulo the table size) if you need a table index.
Example: When you apply SHA-256 to “apple,” you get a 64-character hexadecimal string. It’s more expensive computationally than simpler methods but offers greater security.
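Python’s standard hashlib module demonstrates this directly:

```python
import hashlib

digest = hashlib.sha256("apple".encode("utf-8")).hexdigest()
print(digest)       # a 64-character hexadecimal string
print(len(digest))  # 64
```

The same input always yields the same digest, which is what makes SHA-256 usable for both integrity checks and (heavier-weight) table indexing.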
Many languages include built-in data structures that rely on hashing for quick operations. These structures typically accept a key, compute a hash code behind the scenes, and store the value in a table.
Below are some short Python, Java, and C++ examples that illustrate how each handles key-value pairs.
Python Hash Method
Python’s built-in dictionary uses a hash function on each key.
Below is a short code snippet that stores and retrieves data.
def demo_python_dict():
    # Create a dictionary
    phonebook = {
        "Asha": 12345,
        "Rahul": 67890
    }

    # Insert a new key-value pair
    phonebook["Meera"] = 55555

    # Retrieve values
    print("Asha's number:", phonebook["Asha"])
    print("Rahul's number:", phonebook["Rahul"])
    print("Meera's number:", phonebook["Meera"])

if __name__ == "__main__":
    demo_python_dict()
Output:
Asha's number: 12345
Rahul's number: 67890
Meera's number: 55555
Python’s internal hash function generates an index for each name. When you call phonebook["Asha"], it uses that hash code to locate her number in constant average time.
Java Hash Method
Java provides the HashMap class, which maps keys to values using hashing. It automatically resizes the table when the load factor becomes high and handles collisions by linking entries in buckets.
import java.util.HashMap;

public class DemoHashMap {
    public static void main(String[] args) {
        HashMap<String, Integer> phonebook = new HashMap<>();

        // Insert entries
        phonebook.put("Asha", 12345);
        phonebook.put("Rahul", 67890);

        // Insert a new one
        phonebook.put("Meera", 55555);

        // Retrieve values
        System.out.println("Asha's number: " + phonebook.get("Asha"));
        System.out.println("Rahul's number: " + phonebook.get("Rahul"));
        System.out.println("Meera's number: " + phonebook.get("Meera"));
    }
}
Output:
Asha's number: 12345
Rahul's number: 67890
Meera's number: 55555
When you add a key, HashMap turns that key into a hash code and places the data in the corresponding bucket. If two keys end up in the same bucket, Java uses a linked list or tree to differentiate them internally.
C++ Hash Method
C++ provides an unordered_map for hashing. When you insert a key, it figures out the hash code and finds a spot for the value. Collisions are often handled by chaining.
#include <iostream>
#include <unordered_map>
#include <string>

int main() {
    std::unordered_map<std::string, int> phonebook;

    // Insert entries
    phonebook["Asha"] = 12345;
    phonebook["Rahul"] = 67890;

    // Insert a new one
    phonebook["Meera"] = 55555;

    // Retrieve values
    std::cout << "Asha's number: " << phonebook["Asha"] << std::endl;
    std::cout << "Rahul's number: " << phonebook["Rahul"] << std::endl;
    std::cout << "Meera's number: " << phonebook["Meera"] << std::endl;

    return 0;
}
Output:
Asha's number: 12345
Rahul's number: 67890
Meera's number: 55555
unordered_map calculates a hash code for each name. When you call phonebook["Asha"], it looks up the code and jumps to the correct spot. This method keeps data retrieval at a constant average time.
Hashing makes large-scale lookups feel instant. You can store thousands or even millions of records and still reach the item you want in constant average time. Many data-driven systems rely on this technique for real-time performance, whether it’s checking passwords or searching entries in a phone directory.
The idea is straightforward: convert each key into a numeric code, then use that code to jump directly to the data’s location. You skip repetitive scans through entire lists, which boosts speed dramatically. By cutting down the overhead of sequential searches, hashing keeps data operations efficient, even as your collection expands.
Below are some key reasons why hashing offers strong benefits for data retrieval:
- Constant average time: inserts, lookups, and deletions all average O(1).
- Direct access: the hash code points straight to a slot, so no scanning is needed.
- Scalability: performance stays steady as the dataset grows, provided the load factor is managed.
- Flexible keys: numbers, strings, or composite identifiers can all serve as keys.
Collisions happen when two or more distinct keys map to the same slot in a hash table. The table has a finite number of indices, but the possible set of keys might be far larger, so different inputs can generate identical hash codes.
When this occurs, one slot tries to hold multiple items.
Collisions are normal in hashing — no function guarantees absolute uniqueness in every case. However, you can reduce them by choosing a strong hash function and adjusting the table size. Even with those measures, you still need a proper plan to store colliding entries without losing data.
Here are some common reasons why collisions happen:
- Limited table size: far more possible keys exist than available slots.
- Weak hash functions: poor distribution funnels many keys into a few indices.
- High load factor: a nearly full table leaves few empty slots to absorb new keys.
- Patterned input: keys that share structure (sequential IDs, similar strings) can cluster at the same indices.
Now, let’s explore the common collision-resolution methods you can use.
1. Separate Chaining (Open Hashing)
Separate chaining attaches items with the same hash index in a linked list. Each slot in the hash table can point to a separate chain of entries. This helps when collisions appear frequently because items keep extending the list instead of overwriting each other.
Adding or searching for an element takes extra time if a chain becomes long, but it’s straightforward to implement and scales naturally.
How Does It Work?
1. Compute the key’s index with the hash function.
2. If the slot is empty, start a new linked list there.
3. If the slot already holds entries, append the new key-value pair to that list.
4. To search, hash the key and walk the list at that index until you find a match.
2. Open Addressing (Closed Hashing)
Open addressing keeps all data within the main array. When a collision happens, the hash table probes other slots until it finds an empty space. The table never expands beyond its original array, which helps if you prefer a contiguous structure. However, overfilling can increase collision chains, so a balanced load factor is crucial.
Below are three probing techniques under open addressing:
- Linear probing: check the next slot (index + 1, index + 2, ...) until an empty one appears.
- Quadratic probing: jump by squared offsets (index + 1², index + 2², ...) to reduce clustering.
- Double hashing: use a second hash function to determine the step size for each probe.
How Does It Work (Linear Probing Example)?
Suppose keys 14 and 19 both hash to index 4 in a table of size 5. Key 14 takes slot 4. When 19 arrives, the table checks slot (4 + 1) % 5 = 0, finds it empty, and stores 19 there. A search for 19 follows the same probe sequence until it finds the key or hits an empty slot.
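A minimal Python sketch of linear probing for integer keys (insert only; deletion needs tombstone markers and is omitted here):

```python
def insert_linear_probe(table, key):
    # Start at key % len(table); on collision, step forward one slot
    # at a time (wrapping around) until an empty slot is found.
    n = len(table)
    index = key % n
    for step in range(n):
        probe = (index + step) % n
        if table[probe] is None:
            table[probe] = key
            return probe
    raise RuntimeError("hash table is full")

table = [None] * 5
insert_linear_probe(table, 14)  # 14 % 5 = 4 -> slot 4
insert_linear_probe(table, 19)  # also hashes to 4; probes to slot (4 + 1) % 5 = 0
print(table)                    # [19, None, None, None, 14]
```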
A custom hash table gives you full control over how data gets stored and how collisions are resolved. You can pick a hash function, decide on the table size, and choose whether to use chaining or open addressing.
Below is a step-by-step approach, followed by an example in pseudo-code.
Step 1. Choose a Hash Function
Select a function that spreads your keys. A simple choice is the division method:
index = key % tableSize
If you need a better distribution, you can choose more advanced approaches (mid-square, multiplication). Consider data patterns before deciding.
Step 2. Decide on Table Size
A table that is too small leads to excessive collisions. Depending on the function, a prime number or power of two is common. Ensure you keep an eye on the load factor — if too many items fill up your table, your performance may decline.
Step 3. Select a Collision Resolution Method
Pick separate chaining if you want simple inserts and don’t mind traversing linked lists, or open addressing if you prefer keeping everything inside one contiguous array.

Step 4. Implement Basic Operations
Write insert, search, and delete routines. Each one starts by hashing the key to an index, then follows your chosen collision strategy from there.

Step 5. Manage the Load Factor and Rehashing
Track the ratio of stored items to table size. When it passes a threshold (commonly around 0.7), create a larger table and rehash every existing key into it.
Below is a short pseudo-code showing how you might implement insertion and search using separate chaining.
Initialize array hashTable of size N
For i in [0..N-1]:
    hashTable[i] = EmptyLinkedList()

function hashFunction(key):
    return key % N

function insert(key, value):
    index = hashFunction(key)
    // Insert at the head or tail of the linked list
    hashTable[index].addNode(key, value)

function search(key):
    index = hashFunction(key)
    // Traverse linked list at hashTable[index]
    for node in hashTable[index]:
        if node.key == key:
            return node.value
    return null  // Not found
How Does It Work?
The insert function hashes the key to pick a slot, then appends a node to the linked list stored there. The search function hashes the key the same way and walks that one list, so it never has to examine the rest of the table.

What About Collision Handling?
Here, collisions occur when more than one key lands on the same index. The linked list approach stores them all in that slot. This keeps the insert logic simple, although searches can take longer if the list grows too big. Balancing the table size and keeping an eye on the load factor help maintain a fast average case.
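The pseudo-code above translates almost directly into Python. This sketch uses plain lists as the chains and integer keys for simplicity:

```python
class ChainedHashTable:
    # Separate-chaining hash table: each slot holds a list of (key, value) pairs.

    def __init__(self, size=8):
        self.size = size
        self.slots = [[] for _ in range(size)]

    def _hash(self, key):
        return key % self.size

    def insert(self, key, value):
        bucket = self.slots[self._hash(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:                 # update an existing key in place
                bucket[i] = (key, value)
                return
        bucket.append((key, value))      # otherwise extend the chain

    def search(self, key):
        for k, v in self.slots[self._hash(key)]:
            if k == key:
                return v
        return None                      # not found

table = ChainedHashTable(size=8)
table.insert(13, "a")
table.insert(21, "b")                    # 21 % 8 = 5 = 13 % 8 -> same bucket
print(table.search(13))                  # a
print(table.search(21))                  # b
print(table.search(99))                  # None
```

Keys 13 and 21 deliberately collide (both map to slot 5), yet both remain retrievable because the chain holds them side by side.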
A solid hash function lies at the center of efficient data handling. It controls how keys turn into numeric codes, which affects how well a table avoids collisions and maintains quick lookups.
If the function repeatedly sends different keys to the same index, you lose most of hashing’s benefits. By focusing on certain qualities — such as balanced distribution and low collision rates — you ensure that each stored item remains easy to access.
Below are the main features that define an effective hash function:
- Deterministic: the same key always yields the same code.
- Uniform: keys spread evenly across all available indices.
- Fast: computation stays cheap even for long or complex keys.
- Collision-resistant: distinct keys rarely map to the same slot.
- Sensitive: small changes in a key produce very different codes.
A carefully chosen hash function keeps operations near constant time. While no single formula works perfectly for every scenario, testing different approaches and measuring collision rates helps you pick the right one for your data.
Hashing techniques excel at direct lookups but do not handle every scenario equally well. If you rely on sequential order or frequently run range queries, you might find that a self-balancing tree or another data structure delivers smoother performance.
Hash-based tables also lack a natural way to support prefix searches for text. In those cases, specialized structures like tries can handle data more effectively.
Below is a quick overview of situations where hashing may not be your best bet, along with alternative solutions:
| Scenario | Why Hashing Falls Short | Suggested Alternative |
|----------|-------------------------|-----------------------|
| You need sorted data for easy traversal | Hash tables store elements in arbitrary positions, losing any natural ordering. | Self-balancing BST (e.g., AVL, Red-Black) or a B-tree |
| You run frequent range queries (like 5 < x < 20) | Hash lookups require exact key matches; they are not built for range-based searches. | BST structures, segment trees, or skip lists |
| You need prefix searches on strings | Hashing locates exact matches by key but does not easily support partial matches. | Trie (prefix tree) |
| You require floor and ceiling operations | Hashing cannot quickly find the next smaller or larger element if the exact key is missing. | Balanced BST (e.g., AVL, Red-Black) |
| You must preserve insertion order | A standard hash table may reorder items internally, losing the sequence in which they were added. | Linked list, or specialized structures that track order |
Hashing techniques are known for delivering quick access to data, whether it’s a single lookup or a frequent insert. They let you manage large collections without scanning the entire dataset, making them a reliable choice for performance-focused applications.
When your keys vary in structure or arrive in huge volumes, a well-chosen hash function and table size can keep operations from slowing down.
Below are some specific advantages you gain by using hashing:
- Constant average time for inserts, lookups, and deletions.
- No need to scan or sort the dataset before searching it.
- Works with many key types: numbers, strings, or composite identifiers.
- Scales well: performance holds up as collections grow, as long as the load factor stays reasonable.
Hashing in data structure delivers swift lookups, but it isn’t a universal fix for data management. Certain operations — like finding the smallest element — may be slow if your structure relies solely on hash codes.
Some functions can produce frequent collisions, reducing the benefits of constant-time access. Once you go beyond direct lookups, you start to see the trade-offs that come with a hash-based system.
Below are some key drawbacks to consider:
- Collisions degrade performance; worst-case operations can fall to O(n).
- No natural ordering, so sorted traversal and range queries are awkward.
- Rehashing is expensive when the table must grow.
- Extra memory sits reserved in empty slots to keep the load factor low.
- Choosing a good hash function and table size takes care and testing.
Hashing underpins everything from password validation in online services to database indexing at large scale. By converting keys into fixed-size outputs, hashing allows systems to locate, verify, or compare data in constant average time, even when volumes soar.
Below is a quick overview of how various industries and technologies rely on hashing:
| Application | How Hashing Helps |
|-------------|-------------------|
| Database Indexing | Hash-based indices let you jump directly to rows without scanning entire tables. |
| Password Storage | Storing hashed (and salted) passwords prevents anyone who sees the database from reading plain-text details. |
| Message Integrity | Hash codes (like checksums) detect tampering or corruption, since any slight modification changes the result. |
| Caching | Hashing speeds up lookups by computing a code for the requested item and returning its cached data if present. |
| Blockchain | Each block references the previous block's hash, so altering any block breaks the chain's consistency. |
| Symbol Tables | A hash table can store variable and function names, mapping them to memory addresses or attribute details. |
| File Integrity | Hashes (like MD5, SHA-256) confirm that the file you receive matches the sender's original version. |
| Load Balancing | Consistent hashing assigns each incoming call to a server based on a hash value, which avoids excessive overlaps. |
Both hashing and encryption transform data, yet they serve distinct purposes. Hashing aims to produce a fixed code that helps confirm identity or integrity without revealing the original input. Encryption, on the other hand, encodes data in a way that can be reversed if the recipient knows the key.
Below is a quick contrast of these two approaches:
| Aspect | Hashing | Encryption |
|--------|---------|------------|
| Reversibility | Irreversible in practice; cannot retrieve original data. | Reversible with the proper key. |
| Primary Purpose | Data integrity, authentication, or quick lookups. | Protecting data confidentiality and privacy. |
| Output Length | Typically a fixed-size string or numeric code. | Often as large as or larger than the original data. |
| Keys | Not required for basic hashing; salted hashes use extra data. | Requires a key (public/private or shared) to lock and unlock data. |
| Collision Resistance | Aims to minimize two distinct inputs producing the same output. | Focuses on preventing unauthorized decryption rather than collisions. |
| Use Cases | Password storage, checksums, verifying file integrity. | Encrypting messages, securing files in transit, ensuring secrecy. |
A hash table works great at first, but it can slow down if too many items pile into the same space. The load factor tracks how many elements fill the table relative to its capacity. As that ratio grows, collisions become more likely, which makes it harder to keep lookups at constant time.
To fix this problem, you can expand or rebuild the table — an action known as rehashing.
Below is a closer look at how these two ideas keep hashing efficient.
Load Factor
The load factor is the fraction of occupied slots over the total table size. If you have 75 items in a table of 100 slots, the load factor is 0.75. A higher number means more collisions and longer searches because each index handles more entries.
Rehashing
Rehashing involves creating a new table, often bigger than the old one, and computing fresh indices for every key. You repeat the hash function (or pick a new one) so items spread out again in a larger space.
Load factor and rehashing go hand in hand. One checks the table’s capacity, and the other fixes collisions by spreading data across a roomier table. By managing both, you keep operations at constant average time, even as your dataset scales.
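A compact sketch of both ideas together. The 0.75 threshold and doubling growth are common conventions, not fixed rules:

```python
class ResizableHashTable:
    # Tracks the load factor and rehashes into a table twice the size
    # whenever the fraction of stored items passes MAX_LOAD.

    MAX_LOAD = 0.75

    def __init__(self, size=4):
        self.size = size
        self.count = 0
        self.slots = [[] for _ in range(size)]

    def load_factor(self):
        return self.count / self.size

    def insert(self, key, value):
        self.slots[key % self.size].append((key, value))
        self.count += 1
        if self.load_factor() > self.MAX_LOAD:
            self._rehash()

    def _rehash(self):
        old_items = [pair for bucket in self.slots for pair in bucket]
        self.size *= 2
        self.slots = [[] for _ in range(self.size)]
        for key, value in old_items:    # every key gets a fresh index
            self.slots[key % self.size].append((key, value))

t = ResizableHashTable(size=4)
for k in range(4):
    t.insert(k, str(k))
print(t.size)           # grew from 4 to 8 once the load factor passed 0.75
print(t.load_factor())  # back down to 0.5 after rehashing
```

After the fourth insert, the load factor hits 1.0, the table doubles to 8 slots, and every stored key is re-placed using the new size.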