Included columns don't form part of the key for the index, but they do exist on the index. Essentially the values will be duplicated, so there is a storage overhead, but there is a greater chance that your index will cover (i.e. be selected by the query optimizer for) more queries. This duplication also improves performance when querying, since the database engine can return the value without having to look at the table itself.
Only nonclustered indexes can have included columns, because in a clustered index, every column is effectively included.
Index key columns are part of the b-tree of the index. Included columns are not.
Take two indexes:
CREATE INDEX index1 ON table1 (col1, col2, col3)
CREATE INDEX index2 ON table1 (col1) INCLUDE (col2, col3)
index1 is better suited for this kind of query:
SELECT * FROM table1 WHERE col1 = x AND col2 = y AND col3 = z
Whereas index2 is better suited for this kind of query:
SELECT col2, col3 FROM table1 WHERE col1 = x
In the first query, index1 provides a mechanism for quickly identifying the rows of interest. The query will (probably) execute as an index seek, followed by a bookmark lookup to retrieve the full row(s).
In the second query, index2 acts as a covering index. SQL Server doesn't have to hit the base table at all, since the index provides all the data it needs to satisfy the query. index1 could also act as a covering index in this case.
If you want a covering index, but don't want to add all columns to the b-tree because you don't seek on them, or can't because they aren't an allowed datatype (eg, XML), use the INCLUDE clause.
Let's think about the book. Every page in the book has the page number. All information in this book is presented sequentially based on this page number. Speaking in the database terms, a page number is the clustered index. Now think about the glossary at the end of the book. This is in alphabetical order and allows you to quickly find the page number specific glossary term belongs to. This represents a non-clustered index with a glossary term as the key column.
Now assuming that every page also shows "chapter" title at the top. If you want to find in what chapter is the glossary term, you have to look up what page # describes glossary term, next - open the corresponding page, and see the chapter title on the page. This clearly represents key lookup - when you need to find the data from a non-indexed column, you have to find actual data record (clustered index) and look at this column value. Included column helps in terms of performance - think about glossary where each chapter title includes in addition to glossary term. If you need to find out what chapter the glossary term belongs - you don't need to open actual page - you can get it when you look up the glossary term.
So included columns are like those chapter titles. Non clustered Index (glossary) has an additional attribute as part of the non-clustered index. An index is not sorted by included columns - it just additional attributes that help to speed up the lookup (e.g. you don't need to open the actual page because the information is already in the glossary index).
CREATE Proc [dbo].[InsertIntoProfileTable]
As
BEGIN
SET NOCOUNT ON
Declare @currentRow int
Declare @Details varchar(Max)
Declare @dob Date
set @currentRow =1;
set @Details ='Let''s think about the book. Every page in the book has the page number. All information in this book is presented sequentially based on this page number. Speaking in the database terms, a page number is the clustered index. Now think about the glossary at the end of the book. This is in alphabetical order and allows you to quickly find the page number specific glossary term belongs to. This represents non-clustered index with glossary term as the key column. Now assuming that every page also shows "chapter" title at the top. If you want to find in what chapter is the glossary term, you have to look up what page # describes glossary term, next - open the corresponding page, and see the chapter title on the page. This clearly represents key lookup - when you need to find the data from non-indexed column, you have to find actual data record (clustered index) and look at this column value. Included column helps in terms of performance - think about glossary where each chapter title includes in addition to glossary term. If you need to find out what chapter the glossary term belongs - you don''t need to open the actual page - you can get it when you look up the glossary term. So included columns are like those chapter titles. Non clustered Index (glossary) has an additional attribute as part of the non-clustered index. Index is not sorted by included columns - it just additional attributes that help to speed up the lookup (e.g. you don''t need to open the actual page because the information is already in the glossary index).'
while(@currentRow <=200000)
BEGIN
insert into dbo.Profile values( 'FName'+ Cast(@currentRow as varchar), 'MName' + Cast(@currentRow as varchar), 'MName' + Cast(@currentRow as varchar), 'NickName' + Cast(@currentRow as varchar), DATEADD(DAY, ROUND(10000*RAND(),0),'01-01-1980'),NULL, NULL, @currentRow%3, NULL,NULL,NULL,NULL,NULL, 1000+@currentRow,@Details,@Details,@Details,@Details)
set @currentRow +=1;
END
SET NOCOUNT OFF
END
GO
Using the above SP you can insert 200000 records at one time.
You can see that there is a clustered index on column “EnrollMentId”.
Now Create a non-Clustered index on “ UIDNO” Column.
Script
CREATE NONCLUSTERED INDEX [NonClusteredIndex-20140216-223309] ON [dbo].[Profile]
(
[UIDNO] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
Now Run the following Query
select UIDNO,FName,DOB, MaritalStatus, Detail1 from dbo.Profile
--Takes about 30-50 seconds and return 200,000 results.
Query 2
select UIDNO,FName,DOB, MaritalStatus, Detail1 from dbo.Profile
where DOB between '01-01-1980' and '01-01-1985'
--Takes about 10-15 seconds and return 36,479 records.
Now drop the above non-clustered index and re-create with the following script
CREATE NONCLUSTERED INDEX [NonClusteredIndex-20140216-231011] ON [dbo].[Profile]
(
[UIDNO] ASC,
[FName] ASC,
[DOB] ASC,
[MaritalStatus] ASC,
[Detail1] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
It will throw the following error
Msg 1919, Level 16, State 1, Line 1
Column 'Detail1' in table 'dbo.Profile' is of a type that is invalid for use as a key column in an index.
Because we can not use varchar(Max) datatype as key column.
Now Create a non-Clustered Index with included columns using following script
CREATE NONCLUSTERED INDEX [NonClusteredIndex-20140216-231811] ON [dbo].[Profile]
(
[UIDNO] ASC
)
INCLUDE ( [FName],
[DOB],
[MaritalStatus],
[Detail1]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
Now Run the following Query
select UIDNO,FName,DOB, MaritalStatus, Detail1 from dbo.Profile --Takes about 20-30 seconds and return 200,000 results.
Query 2
select UIDNO,FName,DOB, MaritalStatus, Detail1 from dbo.Profile
where DOB between '01-01-1980' and '01-01-1985'
--Takes about 3-5 seconds and return 36,479 records.
Included columns dont form part of the key for the index, but they do exist on the index. Essentially the values will be duplicated
Below Take two type of indexes with example column
CREATE clustered INDEX NC_index1 ON tableName (column1, column1, column1,column4)
CREATE clustered INDEX NC_index2 ON tableName (column1) INCLUDE (column2, column3,column4)
NC_index1 is better suited for this kind of query:
SELECT * FROM tableName WHERE column1 = x AND column1 = y AND column1 = z and column4=n
Whereas NC_index2 is better suited for this kind of query:
SELECT column1, column2 FROM tableName WHERE column1 = a
because sql server can't allowed to create index on datatype (eg, XML,text, etc)
I would like to add to the other answers more detailed information about index key columns and included columns, the benefits of included columns usage. For this answer, I have taken information from a post "A Close Look at the Index Include Clause" by Markus Winand published 2019-04-30 here:https://use-the-index-luke.com/blog/2019-04/include-columns-in-btree-indexes
A brief summary of how index key columns differ from included columns
To understand the include clause, you must first understand that using
an index affects up to three layers of data structures:
The B-tree
The doubly linked list at the leaf node level of the B-tree
The table
The first two structures together form an index so they could be
combined into a single item, i.e. the “B-tree index”.
In the general case, the database software starts traversing the B-tree to find the
first matching entry at the leaf node level (1). It then follows the
doubly linked list until it has found all matching entries (2) and
finally it fetches each of those matching entries from the table (3).
When loading a few rows, the B-tree makes the greatest contribution to
the overall effort. As soon as you need to fetch just a handful of
rows from the table, this step takes the lead. In either case—few or
many rows—the doubly linked list is usually a minor factor because it
stores rows with similar values next to each other so that a single
read operation can fetch 100 or even more rows. The most generic idea about optimization > is to do less work to achieve the same goal. When it comes to index access, this means that the database software omits accessing a data structure if it doesn’t need any data from it. The index-only scan does exactly that: it omits the table access if
the required data is available in the doubly linked list of the index.
It is a common misconception that indexes only help the where clause. B-tree indexes can also help the order by, group by, select and other clauses. It is just the B-tree part of an index—not the doubly linked list—that cannot be used by other clauses.
The include clause allows us to make a distinction between columns we
would like to have in the entire index (key columns) and columns we
only need in the leaf nodes (include columns). That means it allows us
to remove columns from the non-leaf nodes if we don’t need them there.
How included columns affect multiple aspects of query execution and the benefits of their usage
The order of the leaf node entries does not take the included columns
into account. The index is solely ordered by its key columns. This has
two consequences: included columns cannot be used to prevent sorting
nor are they considered for uniqueness.
The term “covering index” is sometimes used in the context of
index-only scans or include clauses. What matters is whether a given
index can support a given query by means of an index-only scan.
Whether or not that index has an include clause or contains all table
columns is not relevant.
The new definition with the include clause has some advantages:
The tree might have fewer levels (<~40%)
As the tree nodes above the doubly linked
list do not contain the include columns, the database
can store more branches in each block so that the tree might have
fewer levels.
The index is slightly smaller (<~3%)
As the non-leaf nodes of the tree don’t contain
include columns, the overall size of that index is
slightly less. However, the leaf node level of the index needs the
most space anyway so that the potential savings in the remaining nodes
is very little.
It documents its purpose
This is definitely the most underestimated benefit of the include clause: the reason
why the column is in the index is document in the index definition itself. When extending an existing index, it is very important to know exactly why the index is currently defined the way it happens to be defined. The freedoms you have in changing the index without breaking any other
queries is a direct result of this knowledge.
CREATE INDEX idx
ON sales ( subsidiary_id )
INCLUDE ( eur_value )
As the eur_value column is in the include clause, it is not in the
non-leaf nodes and thus neither useful for navigating the tree nor for
ordering. Adding a new column to the end of the key part is relatively
safe.
CREATE INDEX idx
ON sales ( subsidiary_id, ts )
INCLUDE ( eur_value )
Even though there is still a small risk of negative impacts for other
queries, it is usually worth taking that risk.
Filtering on included columns
Until now we have focused on how the include clause can enable index-only scans. Let’s
also look at another case where it is beneficial to have an extra
column in the index.
SELECT * FROM sales
WHERE subsidiary_id = ?
AND notes LIKE '%search term%'
I’ve made the search term a literal value to show the leading and
trailing wildcards—of course you would use a bind parameter in your
code. Now, let’s think about the right index for this query. Obviously, the
subsidiary_id needs to be in the first position. If we take the
previous index from above, it already satisfies this requirement:
CREATE INDEX idx
ON sales ( subsidiary_id, ts )
INCLUDE ( eur_value )
The database software can use that index with the three-step procedure
as described at the beginning: (1) it will use the B-tree to find the
first index entry for the given subsidiary; (2) it will follow the
doubly linked list to find all sales for that subsidiary; (3) it will
fetch all related sales from the table, remove those for which the
like pattern on the notes column doesn’t match and return the
remaining rows.
The problem is the last step of this procedure: the table access loads
rows without knowing if they will make it into the final result. Quite
often, the table access is the biggest contributor to the total effort
of running a query. Loading data that is not even selected is a huge
performance no-no.
The challenge with this particular query is that it uses an in-fix
like pattern. Normal B-tree indexes don’t support searching such
patterns. However, B-tree indexes still support filtering on such
patterns. Note the emphasis: searching vs. filtering.
In other words, if the notes column was present in the doubly linked
list, the database software could apply the like pattern before
fetching that row from the table (not PostgreSQL, see below). This
prevents the table access if the like pattern doesn’t match. If the
table has more columns, there is still a table access to fetch those
columns for the rows that satisfy the where clause—due to the select
*.
CREATE INDEX idx
ON sales ( subsidiary_id, ts )
INCLUDE ( eur_value, notes )
If there are more columns in the table, the index does not enable an
index-only scan. Nonetheless, it can bring the performance close to
that of an index-only scan if the portion of rows that match the like
pattern is very low. In the opposite case—if all rows match the
pattern—the performance is a little bit worse due to the increased
index size. However, the breakeven is easy to reach: for overall
performance improvement, it is often enough that the like filter
removes a small percentage of the rows. Your mileage will vary
depending on the size of the involved columns.
Unique indexes with the include clause
An entirely different aspect of the include clause: unique indexes with an include clause only consider the key columns for the uniqueness. That
allows us to create unique indexes that have additional columns in the
leaf nodes, e.g. for an index-only scan.
CREATE UNIQUE INDEX …
ON … ( id )
INCLUDE ( payload )
This index protects against duplicate values in the id column, yet it supports an index-only scan for the next query.
SELECT payload
FROM …
WHERE id = ?
Note that the include clause is not strictly required for this
behavior: databases that make a proper distinction between unique constraints and unique indexes just need an index with the unique key columns as the leftmost columns—additional
columns are fine.