Sunday, Dec 10th

Last update12:59:40 PM GMT

Database Indexing

Write e-mail

Introduction to Indexing 

argaiv1865

We need Indexes to search for information. Just like the indexes in a book, an index entry for a table enables easy and faster location of specific information. A table scan is the most costly database operation and indexes prevent engines from performing this.

Databases employ a form of Binary Search when searching rows in a table. Each comparison of this sort becomes a file seek operation on a disk (or cache). Now, file seeks are expensive operations as even the fastest disks in the market are slower than RAM operations. Therefore, the main concern for applications is to minimize the number of disk reads when querying databases. Indexes reduce i\o operations by reducing the number of data pages to be read to locate a row in a table. When searching for a row, it only needs to now read a small number of pages in the index rather than going through the entire table.

A point worth noting here: Just as indexes improve read performance, they degrade write performances. This is because, for every new row written, an associated index will need to be updated.

Database Index: A B-Tree Index example

Figure 1. An example of a B-Tree Search Index

Types of Indexes 

The most common types of indexes are

  • Unique Indexes 
    There are No duplicate entries in a unique index. Unique indexes are generally (not necessarily) used for the primary key of a table.
  • Non-unique Indexes 
    These May have duplicate values and are used anywhere.

Some database engines also create the following two types

  • Clustered Index 
    The actual rows in the table reside on the leaf pages of the index.
  • Non-clustered Index 
    The leaf pages of the index are pointers to the data pages containing the rows of the table.

Performance wise clustered indexes are better than non-clustered indexes, as the actual table data resides at the lowest level of the index. This means that there is always one less seek involved in the file. However, chose your clustered index carefully as there can only be one clustered index per table.

What to Index 

Indexes enable faster retrieval of data when using SELECT queries, because the database engine is prevented from the hassle of performing a full table scan looking for the required information.

The types of data which are good candidates for indexing:

  • Columns used in joins 
    Joins will always benefit from having an index available on both sides of the join.
  • Columns used in a WHERE query 
    If it is a frequently run query, an index may improve query performance. For less frequently used queries, consider the cost in terms of inserts, updates, and deletes against the gain in select queries.
  • Columns used in a ORDER BY query
    Sort performance can be improved to a great extent by indexing the columns used in the ORDER BY clause of the query. Also, a clustered index improves performance for the sort by storing data in the sorted order)
  • Columns used in a GROUP BY query 
    Grouping operations will be improved by indexing especially if the range of values being grouped is small as compared to number of rows in the table.

What Not to Index 

This list indicates columns that may not benefit from indexing.

  • Tables with a small number of rows 
    If a table has only a handful of rows, the query optimizer will determine that a table scan is more efficient than using an index. In such a case, the index would only slow down inserts, updates, and deletes in the table.
  • Columns with a wide range of values 
    If the data in a column is different for nearly every row (such as a table of addresses) the index may not be useful because again the database engine might determine that table scans would be more economical than traversing such a large index. However, if a clustered index is created using the same sorting order as a query, the data would be stored in sorted order in the table.
  • Tables with heavy transaction loads but limited decision support load 
    If a table has a lot of insert, update, and delete activity but there are only a few SELECT queries to run, indexes will mostly result in a net penalty in the overall performance.
  • Columns not used in queries 
    Columns which are rarely retrieved do not need to be indexed since the index only enhances performance for SELECT queries. Even if the column is part of many queries but is never included in criteria, sorting, etc., an index will not benefit it.

An Endnote: Don’t go overboard with indexing. If you decided to maximize read performance by indexing every column in a table and creating a multiple column index for every possible search or sort, performance of inserts, updates, and deletes will come to a grinding halt.

References
Oracle Downloads

Share this post



Add comment

Please refrain from using slang or abusive language in the comments.
To avoid waiting for your comment to be published after being reviewed, please log in as a registered user.


Security code
Refresh

Web Hosting