04 September, 2010

Sql Server Optimization

This article answer questions like:

1. Do restart of sql server service result in loss of optimization?

2. Do indexing helps in gaining performance? To what extent?

3. To what extent sql server optimize query when executed multiple times?

I have put 10 million rows in the test table to clearly see any performance gain. Upto 10,000 rows size of a table, the execution time is mostly flat.

PREPARATION

STEP 1: Create a table.

CREATE TABLE [dbo].[Person](
[ID] [int] IDENTITY(1,1) NOT NULL,
[Name] [varchar](50) NOT NULL,
[NICNo] [varchar](50) NOT NULL,
[Addresss] [varchar](100) NOT NULL,
[Dob] [datetime] NOT NULL,
[Employed] [bit] NULL,
CONSTRAINT [PK_Person_Duplicate] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

GO



STEP 2: Put 10 million non-unique rows in the table.

DECLARE @i INT
SET @i = 1

WHILE @i <= 1000000
BEGIN
INSERT INTO person(Name, NICNo, [Address] , Dob)
VALUES('Person No. ' + Convert(VARCHAR, @i), Convert(VARCHAR, @i), 'Some address ' + CONVERT(VARCHAR, @i), GETDATE() - 2)

SET @i = @i + 1
END



STEP 3: Make half "Persons" unemployed and other half employed.

UPDATE TABLE Person SET Employed = 0 WHERE ID <= 5000000

UPDATE TABLE Person SET Employed = 1 WHERE ID > 5000000



STEP 4: Write Paging With Sorting Stored Procedure For Person

CREATE Procedure [dbo].[MoreEfficient_Paging_Person]
@SortColumn varchar(50),
@SortDirection varchar(4),
@PageSize int,
@PageNumber int
AS

DECLARE @MaxRows int
SET @MaxRows = @PageSize * @PageNumber

DECLARE @Count int
SET @Count = (SELECT COUNT(*) FROM Person)

IF( (@MaxRows - @Count) > 0)
SET @PageSize = @Count - ( (@PageNumber - 1) * @PageSize )

IF(@SortDirection = 'ASC')
BEGIN
SELECT t.ID, t.Name, t.NicNo, t.Addresss, t.Dob, t.Employed
FROM (
SELECT Top (@PageSize) ID, Name, NicNo, Addresss, Dob, Employed
FROM (
SELECT TOP (@MaxRows) ID, Name, NicNo, Addresss, Dob, Employed
FROM Person
--WHERE --conditions
ORDER BY (SELECT CASE @SortColumn
WHEN 'ID' THEN convert(sql_variant, ID)
WHEN 'Name' THEN Name
WHEN 'NicNo' THEN NicNo
WHEN 'Address' THEN Addresss
WHEN 'Dob' THEN Dob
WHEN 'Employed' THEN Employed
END),
ID
)
AS Foo
ORDER BY (SELECT CASE @SortColumn
WHEN 'ID' THEN convert(sql_variant, ID)
WHEN 'Name' THEN Name
WHEN 'NicNo' THEN NicNo
WHEN 'Address' THEN Addresss
WHEN 'Dob' THEN Dob
WHEN 'Employed' THEN Employed
END) DESC,
ID DESC
)
AS bar
INNER JOIN Person AS t ON bar.ID = t.ID
ORDER BY (SELECT CASE @SortColumn
WHEN 'ID' THEN convert(sql_variant, bar.ID)
WHEN 'Name' THEN bar.Name
WHEN 'NicNo' THEN bar.NicNo
WHEN 'Address' THEN bar.Addresss
WHEN 'Dob' THEN bar.Dob
WHEN 'Employed' THEN bar.Employed
END),
bar.ID
END
ELSE IF(@SortDirection = 'DESC')
BEGIN
SELECT t.ID, t.Name, t.NicNo, t.Addresss, t.Dob, t.Employed
FROM (
SELECT Top (@PageSize) ID, Name, NicNo, Addresss, Dob, Employed
FROM (
SELECT TOP (@MaxRows) ID, Name, NicNo, Addresss, Dob, Employed
FROM Person
--WHERE --conditions
ORDER BY (SELECT CASE @SortColumn
WHEN 'ID' THEN convert(sql_variant, ID)
WHEN 'Name' THEN Name
WHEN 'NicNo' THEN NicNo
WHEN 'Address' THEN Addresss
WHEN 'Dob' THEN Dob
WHEN 'Employed' THEN Employed
END),
ID

)
AS Foo
ORDER BY (SELECT CASE @SortColumn
WHEN 'ID' THEN convert(sql_variant, ID)
WHEN 'Name' THEN Name
WHEN 'NicNo' THEN NicNo
WHEN 'Address' THEN Addresss
WHEN 'Dob' THEN Dob
WHEN 'Employed' THEN Employed
END) DESC,
ID DESC
)
AS bar
INNER JOIN Person AS t ON bar.ID = t.ID
ORDER BY (SELECT CASE @SortColumn
WHEN 'ID' THEN convert(sql_variant, bar.ID)
WHEN 'Name' THEN bar.Name
WHEN 'NicNo' THEN bar.NicNo
WHEN 'Address' THEN bar.Addresss
WHEN 'Dob' THEN bar.Dob
WHEN 'Employed' THEN bar.Employed
END) DESC,
bar.ID
END




TEST SEQUENCE

Query : EXEC MoreEfficient_Paging_Person 'Employed', 'ASC', 10, 100000
Basic : A table Person with 10 million rows. A column Employed of type bit.
Format : (nth execution : Execution time in seconds).

Test No. 1
Specific : Automatic index PK_Person, clustered.
System : Memory usage 1.87 GB out of 2 GB.

1: 117

Test No. 2
Specific : No index at all, not even PK_Person.
System : Memory usage 1.87 GB out of 2 GB.

1: 131 seconds

Test No. 3
Specific : No index at all, not even PK_Person.
Specific : Restarted sql server service. The following 11 runs are in a row, without restarting sql server. After 1st exec, memory usage is 1.87 GB.
System : Memory usage 678 MB out of 2 GB at start.

1: 202
2: 157
3: 135
4: 123
5: 115
6: 114
7: 112
8: 110
9: 123
10: 123
11: 123


Test No. 4
Specific : No index at all, not even PK_Person.
Specific : Restarted sql server service. The following 10 runs are in a row, without restarting sql server. After 1st exec, memory usage is 1.87 GB.
System : Memory usage 678 MB out of 2 GB at start.

1: 231
2: 137
3: 118
4: 112
5: 104
6: 112
7: 119
8: 114
9: 110
10: 117


Test No. 5
Specific : No index at all, not even PK_Person.
Specific : Restarted sql server service. The following 10 runs are in a row, without restarting sql server. After 1st exec, memory usage is 1.87 GB.
System : Memory usage 678 MB out of 2 GB at start.

1: 212
2: 134
3: 102
4: 115
5: 119
6: 119
7: 108
8: 116
9: 107
10: 120

Test No. 6
Specific : Person has only one index, PK_Person.
Specific : Restarted sql server service. The following 10 runs are in a row, without restarting sql server. After 1st exec, memory usage is 1.87 GB.
System : Memory usage 678 MB out of 2 GB.
Setup : I made a new table Person_Duplicate, set its primary key (that automatically set its PK_Person_Duplicate index). Then copy all data from all

columns (except ID column) from Person to Person_Duplicate. Then deleted Person. Then renamed Person_Duplicate to Person.

1: 123
2: 87
3: 84
4: 81
5: 82
6: 103
7: 85
8: 89
9: 82
10: 84


Test No. 7
Specific : Person has two indexex, a clustered index on ID column (identity column) and a non-clustered index on Employed column.
Specific : Restarted sql server service. The following 10 runs are in a row, without restarting sql server. After 1st exec, memory usage is 1.87 GB.

1: 131
2: 82
3: 79
4: 80
5: 80
6: 79
7: 81
8: 82
9: 80
10: 81

ANALYSIS

Finding No. 1 : Comparing Test No. 1 with Test No. 2

The difference between the two tests is that first test have primary index, second don't. When the table has default index, searching took 117 seconds, when

it don't, searching took 131 seconds. Its a difference of 11%, not a huge one but may be important. Note that both tests run in identical system conditions,

1.87 GB busy RAM. The sql server service consume more than 1 GB RAM.

Conclusion => Removal of primary index don't affect 1st execution time much (increased 11%) in the first run.

Finding No. 2: Comparing Test No.s 3 to 5

3 4 5

1: 202 231 212
2: 157 137 134
3: 135 118 102
4: 123 112 115
5: 115 104 119
6: 114 112 119
7: 112 119 108
8: 110 114 116
9: 123 110 107
10: 123 117 120
11: 123

The only difference between these three tests is that after ending of each test, the sql server service is restarted.

It can be easily noted that as a test continues, from first run to tenth run, the sql server optimized the query reducing execution time. This somewhat

settled down at the 4rth run, a hyper optimization is done from 5th to 8th run, from 9th run onwards the execution time stabilizes to where it was at the

4rth run. The average of first runs is (202 + 231 + 212 = ) 645, the average of 4rth runs is (123 + 112 + 115 = ) 350, the average of 8th runs is (110 + 114

+ 116) = 340, the average of 10th runs is (123 + 117 + 120 = ) 360.

Conclusion => At 4th run, execution time is halved (reduced 46%).

Conclusion => At 8th run, execution time is hyper optimized to half (reduced 48%)

Conclusion => At 10th run, execution time stabilized to half (reduced 45%).

It can also be noted that once the sql server service is restarted, all the gains in optimization achieved at the end of previous test is lost, the execution

time of first run of all tests is approximately the same, so do every nth run. From one test to another, even though gains in optimization are lost due to

restart of sql server service, still some gains in optimization are lost, so there is a general trend downwards of nth query as test numbers proceed (look

horizontally at chart, there is a slop downwards from left to right).

Conclusion => Restarting sql server service results in lost of all previous optimizations.

Conclusion => If sql server service restart at end of each test, then nth run of each test executes in same time, with a slight downward trend forward.

Comparing these tests with test no. 1 we can find that although the first run in these tests are made when system was at a lower usage level of RAM, still

execution times are considerably high (202, 231, 212) as compare to test no. 1 (117). Since the only difference between the two series of tests (1st series =

test no.1, 2nd series = test no. 3, 4, 5) is absence of any index in the latter series, therefore the higher execution time of latter series must be due to

absence of that index. Taking averages of each series (117,215), the difference is of 47% or almost half. So, the conclusion here is:

Conclusion => Just by putting index on primary key, execution time of first run can be halved.

Finding No. 3: Comparing Test No. 1, 6, 7.

Test No.1 run with PK_Person (primary index) but under heavy system load and with a lots of queries ran before. Test No. 6 and 7 each ran with restart of sql

server service, test no. 6 has only one index PK_Person (similar to test no. 1) and test no. 7 has two indexes. These two indexes are PK_Person and

IDX_Person_Employed. Note that all the queries in this entire article order the page by Employed column. Also note that half rows in Person table have

Employed column set to 0 and other half have it set to 1.

1 6 7

1: 117 123 131
2: 87 82
3: 84 79
4: 81 80
5: 82 80
6: 103 79
7: 85 81
8: 89 82
9: 82 80
10: 84 81

The first run of test 6 takes almost same execution time (a rise of 5%) as first run of test 1. Nothing new there.

Conclusion => Restart of sql server service, don't result in any difference in execution time of first runs.

Comparing averages of 6 and 7, 90 vs 85.5, there is a reduction of 5%, nothing important.

Conclusion => Putting a non-clustered index on a column that can have only two values, and have half rows set to value 1 and rest half rows set to value 2,

and then ordering by this column, there is almost no difference (a reduction of 5%) in execution time between having that non-clustered index and not.

No comments:

Post a Comment