Database Indexing Explained: A Comprehensive Guide

Database indexing is one of the most powerful yet often misunderstood concepts in database optimization. In this post, we'll dive deep into what indexes are, how they work, and when to use them effectively.

How Database Indexes Work

Think of a database index like the index at the back of a book. Instead of flipping through every page to find a specific topic, you can quickly look up the page number in the index. Database indexes work similarly, creating a separate data structure that points to the location of data in your tables.

Let's consider a simple example using a users table:

1CREATE TABLE users (
2    id INT PRIMARY KEY,
3    email VARCHAR(255),
4    last_name VARCHAR(100),
5    created_at TIMESTAMP
6);
7
8-- Without an index, this query would require a full table scan
9SELECT * FROM users WHERE email = 'john@example.com';

When you create an index, the database engine stores a sorted copy of the selected columns along with a pointer to the full record in the main table. This sorted structure allows for much faster searching, especially in large datasets.

Types of Database Indexes

1. Single-Column Index

The most basic form of indexing, where the index is created on a single column.

1-- Creating a single-column index
2CREATE INDEX idx_users_email ON users(email);
3
4-- Query that will use this index
5SELECT * FROM users WHERE email = 'john@example.com';

2. Composite (Multi-Column) Index

These indexes include multiple columns and are particularly useful for queries that frequently filter or sort by the same combination of columns.

1-- Creating a composite index
2CREATE INDEX idx_users_lastname_email ON users(last_name, email);
3
4-- Queries that will use this index effectively
5SELECT * FROM users WHERE last_name = 'Smith' AND email = 'john@example.com';
6SELECT * FROM users WHERE last_name = 'Smith';  -- Uses the index
7SELECT * FROM users WHERE email = 'john@example.com';  -- Won't use the index efficiently

3. Unique Index

This type ensures that no two rows can have the same value in the indexed column(s).

1-- Creating a unique index
2CREATE UNIQUE INDEX idx_users_email_unique ON users(email);
3
4-- This will now fail if the email already exists
5INSERT INTO users (email, last_name) VALUES ('john@example.com', 'Smith');

4. Clustered vs. Non-Clustered Indexes

Clustered Indexes

1-- In most databases, the primary key automatically creates a clustered index
2CREATE TABLE orders (
3    order_id INT PRIMARY KEY,  -- This creates a clustered index
4    order_date DATE,
5    customer_id INT
6);

Non-Clustered Indexes

1-- Creating a non-clustered index
2CREATE INDEX idx_orders_date ON orders(order_date);
3
4-- Query that will use this index
5SELECT * FROM orders WHERE order_date = '2024-01-01';

Performance Impact Example

Let's look at a real-world scenario:

1-- Creating a large table for demonstration
2CREATE TABLE transactions (
3    id INT PRIMARY KEY,
4    user_id INT,
5    amount DECIMAL(10,2),
6    transaction_date DATE
7);
8
9-- Without index
10EXPLAIN ANALYZE
11SELECT * FROM transactions 
12WHERE transaction_date BETWEEN '2024-01-01' AND '2024-01-31';
13-- This might show a full table scan
14
15-- Creating an index
16CREATE INDEX idx_transactions_date ON transactions(transaction_date);
17
18-- With index
19EXPLAIN ANALYZE
20SELECT * FROM transactions 
21WHERE transaction_date BETWEEN '2024-01-01' AND '2024-01-31';
22-- This should show an index scan/seek

When to Use Indexes - Practical Examples

Good Index Usage:

1-- High-selectivity column frequently used in WHERE clause
2CREATE INDEX idx_orders_status ON orders(status);
3SELECT * FROM orders WHERE status = 'PENDING';
4
5-- Columns used in JOINs
6CREATE INDEX idx_orders_customer_id ON orders(customer_id);
7SELECT o.*, c.name 
8FROM orders o 
9JOIN customers c ON o.customer_id = c.id;

Poor Index Usage:

1-- Low-selectivity column (few unique values)
2CREATE INDEX idx_users_is_active ON users(is_active);  -- Not recommended
3SELECT * FROM users WHERE is_active = true;  -- Poor selectivity
4
5-- Frequently updated column
6CREATE INDEX idx_posts_view_count ON posts(view_count);  -- Be cautious
7UPDATE posts SET view_count = view_count + 1 WHERE id = 123;  -- Frequent updates

Best Practices

Monitor Index Usage

1-- PostgreSQL example to find unused indexes
2SELECT schemaname || '.' || tablename as table_name,
3       indexname as index_name,
4       idx_scan as number_of_scans
5FROM pg_stat_user_indexes
6WHERE idx_scan = 0
7ORDER BY schemaname, tablename;

Balance the Number of Indexes

1-- Check index size in PostgreSQL
2SELECT pg_size_pretty(pg_total_relation_size('users')) as total_size,
3       pg_size_pretty(pg_indexes_size('users')) as index_size;

Conclusion

Database indexing is a powerful tool for optimizing query performance, but it requires careful consideration and ongoing maintenance. By understanding how indexes work and following best practices, you can significantly improve your database's performance while avoiding common pitfalls.

Remember: indexes are not a "set and forget" solution. They require regular monitoring and adjustment as your application's needs evolve. The key is finding the right balance between read performance, write performance, and maintenance overhead for your specific use case.

Use the SQL examples provided above as a starting point, but always remember to:

Test index performance with your actual data
Monitor index usage in production
Regularly maintain and update indexes based on changing query patterns