Monday, May 2, 2011

Blog Post: Extra Logical Reads Are Not So Logical

When normalizing a database, foreign keys are commonly created in order to maintain data integrity between the parent and child tables. Oftentimes, however, the foreign key columns are overlooked when it comes to proper indexing strategy.

Using the Adventureworks2008R2 sample database, available at http://msftdbprodsamples.codeplex.com/, let’s look at what happens when a join is performed on a table where an index is omitted on the foreign key columns, when the index is not properly optimized, and then at an optimal index.

USE AdventureWorks2008R2  GO  SET STATISTICS IO ON  GO    --The first query is run against the Production.WorkOrderRouting table that does   --not have an index on the LocationID column (which is a Foreign Key column)  SELECT L.*  FROM Production.Location L INNER JOIN Production.WorkOrderRouting WR  ON L.LocationID = WR.LocationID  WHERE WR.LocationID = 30  GO    --Notice the Logical Reads below for the Production.WorkOrderRouting table that does not have  --an index on the LocationID column  --Table 'WorkOrderRouting'. Scan count 1, logical reads 699, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.  --Table 'Location'. Scan count 0, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.    --Next, create an index on the LocationID column to support join on the foreign key  CREATE INDEX IDX_LocationID ON Production.WorkOrderRouting (LocationID)  GO    --Rerun the query and observe the difference.  The Logical Reads for the query decreased from 699 to 6!  SELECT L.*  FROM Production.Location L INNER JOIN Production.WorkOrderRouting WR  ON L.LocationID = WR.LocationID  WHERE WR.LocationID = 30  --Table 'WorkOrderRouting'. Scan count 1, logical reads 6, physical reads 0, read-ahead reads 3, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.  --Table 'Location'. Scan count 0, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.    --Lastly, does the column position in the index matter?   --Drop the existing index and recreate a composite index with LocationID as the second position in the index  DROP INDEX IDX_LocationID ON Production.WorkOrderRouting  GO  CREATE INDEX IDX_LocationID ON Production.WorkOrderRouting (ProductID, LocationID)  GO    --Rerun the query and observe the difference again.  The Logical Reads for the query is now 129.  Although this is   --an improvement over having no index (699 reads), it is much higher than the 6 logical reads that occurred with  --an optimal index.    SELECT L.*  FROM Production.Location L INNER JOIN Production.WorkOrderRouting WR  ON L.LocationID = WR.LocationID  WHERE WR.LocationID = 30  --Table 'WorkOrderRouting'. Scan count 1, logical reads 129, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.  --Table 'Location'. Scan count 0, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.    --Now that we have seen the effect of no index, a non-optimal index, and an optimal index on a single   --column foreign key, what is the effect of non-optimal indexing on a multi-column foreign key?  --Let's create some tables to observe the difference.  USE AdventureWorks2008R2  GO  SET STATISTICS IO OFF  GO    CREATE TABLE dbo.Employee  (LastName varchar(50),  FirstName varchar(50),  Address varchar(100)  CONSTRAINT PK_Employee PRIMARY KEY (LastName, FirstName))  GO    --Populate the Employee Table  DECLARE @i int  SET @i = 1  DECLARE @LastName varchar(25)  SET @LastName = 'Proactive' + CAST (@i AS varchar(7))  DECLARE @FirstName varchar(25)  SET @FirstName = 'Miss T. ' + CAST (@i AS varchar(7))  DECLARE @Address varchar(25)  SET @Address = '123 ' + CAST (@i AS varchar(7)) + ' Street'  WHILE @i < 100000  BEGIN   INSERT INTO dbo.Employee (LastName, FirstName, Address)     VALUES (@LastName, @FirstName, @Address)   SET @i = @i + 1   SET @LastName = 'Proactive' + CAST (@i AS varchar(7))   SET @FirstName = 'Miss T. ' + CAST (@i AS varchar(7))   SET @Address = '123 ' + CAST (@i AS varchar(7)) + ' Street'  END  GO    CREATE TABLE dbo.EmployeeDepartment  (LastName varchar(50),  FirstName varchar(50),  DepartmentName varchar(100),  EffectiveDate datetime)  GO    --Populate the EmployeeDepartment Table  DECLARE @i int  SET @i = 1  DECLARE @LastName varchar(25)  SET @LastName = 'Proactive' + CAST (@i AS varchar(7))  DECLARE @FirstName varchar(25)  SET @FirstName = 'Miss T. ' + CAST (@i AS varchar(7))  DECLARE @DepartmentName varchar(25)  SET @DepartmentName = 'IT -  ' + CAST (@i AS varchar(7))  WHILE @i < 100000  BEGIN   INSERT INTO dbo.EmployeeDepartment (LastName, FirstName, DepartmentName, EffectiveDate)    VALUES (@LastName, @FirstName, @DepartmentName, GETDATE())   SET @i = @i + 1   SET @LastName = 'Proactive' + CAST (@i AS varchar(7))   SET @FirstName = 'Miss T. ' + CAST (@i AS varchar(7))   SET @DepartmentName = 'IT -  ' + CAST (@i AS varchar(7))  END  GO    --Create the Foreign Key constraint  ALTER TABLE dbo.EmployeeDepartment  ADD CONSTRAINT FK_EmployeeName FOREIGN KEY (LastName, FirstName) REFERENCES dbo.Employee (LastName, FirstName)  GO    --Create an index on the Foreign Key constraint  CREATE INDEX IDX_LastName ON dbo.EmployeeDepartment (LastName, FirstName)  GO    --Turn on Statistics IO  SET STATISTICS IO ON  GO    --Run the query with the index created as LastName, FirstName which matches the index key ordinal   --position of the Foreign Key.  Observe the Logical Reads for EmployeeDepartment.  SELECT E.lastname, E.firstname  FROM dbo.Employee E  INNER JOIN dbo.EmployeeDepartment ED  ON E.LastName = ED.LastName AND E.FirstName = ED.FirstName  --Table 'EmployeeDepartment'. Scan count 1, logical reads 586, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.  --Table 'Employee'. Scan count 1, logical reads 1195, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.    --Drop the index and recreate it swapping the position to First, LastName  DROP INDEX IDX_LastName ON dbo.EmployeeDepartment  GO  CREATE INDEX IDX_LastName ON dbo.EmployeeDepartment (FirstName, LastName)  GO    --Run the query with the index created as FirstName, LastName.  Observe the difference in Logical Reads for   --EmployeeDepartment, as well as the addition of a "Worktable" object.  SELECT E.lastname, E.firstname  FROM dbo.Employee E  INNER JOIN dbo.EmployeeDepartment ED  ON E.LastName = ED.LastName AND E.FirstName = ED.FirstName  --Table 'Employee'. Scan count 5, logical reads 1337, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.  --Table 'EmployeeDepartment'. Scan count 5, logical reads 645, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.  --Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.  

Now that we have seen the effects of a missing index and an index that is not optimal for the foreign key, we will look at identifying foreign keys that do not have an index and that have the index order reversed from the order of the foreign key.

--Identify Foreign Keys columns that are not indexed, including multi-column Foreign Keys,  --or Foreign Key columns that are indexed, but the position of the key in the index is not  --the same position of the column in the index.  The query for identifying each missing index  --on the Foreign Key as well as key ordinal position issues is identical and will be rerun   --below a number of times to demonstrate each scenario.  DROP INDEX IDX_LocationID ON Production.WorkOrderRouting   GO  --Index is missing  SELECT FK.name ForeignKeyName, SCHEMA_NAME(FK.schema_id) SchemaName,    OBJECT_NAME(FK.parent_object_id) TableName,    COL.name ForeignKeyColumnName, IDXCOL.object_id, FKC.constraint_column_id ForeignKeyColumnPosition,   IDXCOL.key_ordinal IndexKeyColumnPosition  FROM sys.foreign_keys FK  INNER JOIN sys.foreign_key_columns FKC  ON FK.object_id = FKC.constraint_object_id  INNER JOIN sys.columns COL  ON FKC.parent_column_id = COL.column_id AND FKC.parent_object_id = COL.object_id  LEFT JOIN sys.index_columns IDXCOL  ON FKC.parent_column_id = IDXCOL.column_id AND FKC.parent_object_id = IDXCOL.object_id  WHERE OBJECT_NAME(FK.parent_object_id) = 'WorkOrderRouting'  AND (IDXCOL.object_id IS NULL   OR IDXCOL.key_ordinal <> FKC.constraint_column_id)  ORDER BY ForeignKeyName, SchemaName, TableName, ForeignKeyColumnPosition  GO    --Create an index with Location ID in the second position in the index  CREATE INDEX IDX_LocationID ON Production.WorkOrderRouting (ProductID, LocationID)  GO    --Identify where the key ordinal of the Foreign Key does not match the key ordinal of the index  SELECT FK.name ForeignKeyName, SCHEMA_NAME(FK.schema_id) SchemaName,    OBJECT_NAME(FK.parent_object_id) TableName,    COL.name ForeignKeyColumnName, IDXCOL.object_id, FKC.constraint_column_id ForeignKeyColumnPosition,   IDXCOL.key_ordinal IndexKeyColumnPosition  FROM sys.foreign_keys FK  INNER JOIN sys.foreign_key_columns FKC  ON FK.object_id = FKC.constraint_object_id  INNER JOIN sys.columns COL  ON FKC.parent_column_id = COL.column_id AND FKC.parent_object_id = COL.object_id  LEFT JOIN sys.index_columns IDXCOL  ON FKC.parent_column_id = IDXCOL.column_id AND FKC.parent_object_id = IDXCOL.object_id  WHERE OBJECT_NAME(FK.parent_object_id) = 'WorkOrderRouting'  AND (IDXCOL.object_id IS NULL   OR IDXCOL.key_ordinal <> FKC.constraint_column_id)  ORDER BY ForeignKeyName, SchemaName, TableName, ForeignKeyColumnPosition  GO    --What about composite Foreign Keys?  DROP INDEX IDX_LastName ON dbo.EmployeeDepartment  GO    --Missing index on composite Foreign Key  SELECT FK.name ForeignKeyName, SCHEMA_NAME(FK.schema_id) SchemaName,    OBJECT_NAME(FK.parent_object_id) TableName,    COL.name ForeignKeyColumnName, IDXCOL.object_id, FKC.constraint_column_id ForeignKeyColumnPosition,   IDXCOL.key_ordinal IndexKeyColumnPosition  FROM sys.foreign_keys FK  INNER JOIN sys.foreign_key_columns FKC  ON FK.object_id = FKC.constraint_object_id  INNER JOIN sys.columns COL  ON FKC.parent_column_id = COL.column_id AND FKC.parent_object_id = COL.object_id  LEFT JOIN sys.index_columns IDXCOL  ON FKC.parent_column_id = IDXCOL.column_id AND FKC.parent_object_id = IDXCOL.object_id  WHERE OBJECT_NAME(FK.parent_object_id) = 'EmployeeDepartment'  AND (IDXCOL.object_id IS NULL   OR IDXCOL.key_ordinal <> FKC.constraint_column_id)  ORDER BY ForeignKeyName, SchemaName, TableName, ForeignKeyColumnPosition  GO    --Create an index with the key positions reversed from the Foreign Key positions  CREATE INDEX IDX_LastName ON dbo.EmployeeDepartment (FirstName, LastName)  GO    --Key ordinal positions in the index are not the same as in the Foreign Key  SELECT FK.name ForeignKeyName, SCHEMA_NAME(FK.schema_id) SchemaName,    OBJECT_NAME(FK.parent_object_id) TableName,    COL.name ForeignKeyColumnName, IDXCOL.object_id, FKC.constraint_column_id ForeignKeyColumnPosition,   IDXCOL.key_ordinal IndexKeyColumnPosition  FROM sys.foreign_keys FK  INNER JOIN sys.foreign_key_columns FKC  ON FK.object_id = FKC.constraint_object_id  INNER JOIN sys.columns COL  ON FKC.parent_column_id = COL.column_id AND FKC.parent_object_id = COL.object_id  LEFT JOIN sys.index_columns IDXCOL  ON FKC.parent_column_id = IDXCOL.column_id AND FKC.parent_object_id = IDXCOL.object_id  WHERE OBJECT_NAME(FK.parent_object_id) = 'EmployeeDepartment'  AND (IDXCOL.object_id IS NULL   OR IDXCOL.key_ordinal <> FKC.constraint_column_id)  ORDER BY ForeignKeyName, SchemaName, TableName, ForeignKeyColumnPosition  GO      --Cleanup  DROP TABLE dbo.EmployeeDepartment  DROP TABLE dbo.Employee  DROP INDEX IDX_LocationID ON Production.WorkOrderRouting  GO  

In conclusion, remember that extra logical reads are not really logical at all. Let’s work to prevent them by creating optimal indexes for Foreign Keys that are used in join scenarios.

Katie Holmes Victoria Beckham Missy Peregrym Sarah Gellman Eliza Dushku

No comments:

Post a Comment