The most important resource to any SQL optimizer is the statistics collected for different tables within the catalog. Statistics is the information about indexes and their distribution with respect to each other. Optimizer uses this information to decide the least expensive path that satisfies a query. Outdated or missing statistics information will cause the optimizer to take a less optimized path hence increasing the overall response time.Optimizers always tend to select the least expensive path u2013 one that returns least number of rows in fastest time. Why do optimizers rely on statistics? Consider the following query that is run against our sample database to answer this question.
Oracle: ANALYZE command or DBMS_UTILITY package
DB2: RUNSTATS command
MS SQL Server: UPDATE STATISTICS
Create optimized indexes
SQL optimizer heavily depends on indexes defined for a particular table. Indexes are double-edged sword: no index will degrade performance of your SELECT statements and too many indexes will slow down your DML (INSERT, UPDATE, and DELETE) queries. Therefore, it is important to have a right balance of index on tables. Besides the number of indexes, fields that are involved and their order is also very important.
Composite index – Indexes containing more than one field are called composite index. Such indexes should be created if you expect to run queries that will have multiple fields in the WHERE clause and all fields combined will give significantly less rows than the first field alone.
Clustered index – A clustered index determines the physical order of data in a table – meaning the actual data is sorted according to the fields in the index. This is similar to a telephone directory, which arranges data by last name. There can be only one clustered index per table. These indexes are particularly efficient on columns that are often searched for range of values.
Avoid functions on RHS of the operator
Often developers use functions or method with their SQL queries. Consider the following example.
select * from Orders where YEAR(AccountCreatedOn) == 2005 and MONTH(AccountCreatedOn) = 6
Note that even though AccountCreatedOn has an index, the above query changes the where clause such a way that this index cannot be used anymore.
Rewriting the query in the following way will increase the performance tremendously.
Select * From Orders Where AccountCreatedOn between ‘6/1/2013’ and ‘6/30/2013’
Predetermine expected growth
As mentioned earlier indexes have a negative impact on DML queries. One way to minimize this negative affect is to specify an appropriate value for fill factor when creating indexes.When an index is created, the data for indexed columns is stored on the disk. When new rows of data are inserted into the table or the values in the indexed columns are changed, the database may have to reorganize the storage of the data to make room for the new rows. This reorganization can take additional toll on DML queries. However, if you expect new rows on a regular basis in any table, you can specify the expected growth for an index. The terminology used for this expected growth is different in every database. The following table lists the terms used by different RDBMS for expected growth.
Oracle: PCTFREE – Percent Free
DB2: PCTFREE – Percent Free
MS SQL Server: FILL FACTOR
Specify optimizer hints in SELECT
Although in most cases the query optimizer will pick the appropriate index for a particular table based on statistics, sometimes it is better to specify the index name in your
Example: SELECT * FROM customer WITH ( Index(IdxPhone)) WHERE city = ‘New York City’ and phone = ‘212-555-1212’
Most databases return the execution plan for any SELECT statement that is created by the optimizer. This plan is very useful in fine tuning SQL queries. The following table lists SQL syntax for different databases.
Oracle: EXPLAIN PLAN FOR >Your query<
DB2: EXPLAIN PLAN SET queryno = xxx for >Your query<
MS SQL Server: Set SHOWPLAN_ALL ON >Your query<
Avoid foreign key constraints
Foreign keys constraints ensure data integrity at the cost of performance. Therefore, if performance is your primary goal you can push the data integrity rules to your application layer. A good example of a database design that avoids foreign key constraints is the System tables in most databases. Every major RDBMS has a set of tables known as system tables. These tables contain meta data information about user databases. Although there are relationships among these tables, there is no foreign key relationship. This is because the client, in this case the database itself, enforces these rules.
Two heads are better than one
Hard disk I/O is among the slowest resource on a computer, which becomes apparent as the size of your database increase. Many databases allow users to split their database onto multiple physical hard drives. In fact, some even go a step further and allow splitting the contents of a table on multiple disks. When you use multiple physical disks, I/O operations speed up significantly since more heads fetch data in parallel.
Select limited data
The less data retrieved, the faster the query will run. Rather than filtering on the client, push as much filtering as possible on the server-end. This will result in less data being sent on the wire and you will see results much faster. Eliminate any obvious or computed columns. Consider the following example.
Example: Select FirstName, LastName, City Where City = ‘New York City’
Drop indexes before loading data
Consider dropping the indexes on a table before loading a large batch of data. This makes the insert statement run faster. Once the inserts are completed, you can recreate the index again.
If you are inserting thousands of rows in an online system, use a temporary table to load data. Ensure that this temporary table does not have any index. Since moving data from one table to another is much faster than loading from an external source, you can now drop indexes on your primary table, move data from temporary to final table, and finally recreate the indexes.