Microsoft SQL Server Filtered Indexes & Sparse Columns Feb 2011

Microsoft SQL Server
Filtered Indexes and Sparse Columns:
Together, Separately
Speaker: Don Vilen
Chief Scientist, Buysight

February 2011

Mark Ginnebaugh, User Group Leader
www.bayareasql.org

15 Feb 2011

Filtered Indexes and Sparse Columns:
Together, Separately

Don Vilen, Chief Scientist, Buysight
DVilen@buysight.com

Agenda
◦ Filtered Indexes
◦ Filtered Statistics
◦ Wide Tables
◦ Sparse Columns

◦ Together …
◦ … and Separately

◦ Everything is SQL Server 2008 (and later), in
all editions

The Scenario
◦ 100,000 rows in the table
 99,500 rows are historical, remaining 500 rows are current
 Indicated by NULL EndDate column or IsActive bit, etc.
◦ All queries on current data use index
◦ But why index all the historical 99.5% of the table?

◦ 1,000 columns in a table
◦ BikeColor column is relevant only if ItemType is
‘Bicycle’
 For 0.5% of the rows; remainder are NULL
◦ But why index all the rows regardless of ItemType
value?

Filtered Indexes
◦ Indexes only rows with values that match WHERE clause
 CREATE INDEX xyz ON table(columns, …)
 WHERE EndDate IS NULL
 WHERE IsActive = 1
 WHERE ItemType = 'Bicycle'
◦ Uses:
 Ranges of values for smaller portion of large table
 Avoid the common 80-90% of data where the index wouldn’t be helpful
 For categories of row data
 Index on Column120 and Column121 only useful when C1 = 37
 Table partitions, where index is needed only on the ‘current’ partition(s)
 Each partition will have the index structure, but only ‘current’ partitions will have any
rows in the index
◦ Benefits
 Better query performance
 Reduction in storage costs
 Reduction in maintenance cost/time
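A minimal sketch of the two scenario indexes. The table dbo.Items, its column types, and the index names are hypothetical and chosen only for illustration:

```sql
-- Hypothetical table combining both scenarios; names and types are assumptions.
CREATE TABLE dbo.Items
(
    ItemID    int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    ItemType  varchar(20) NOT NULL,
    BikeColor varchar(20) NULL,      -- meaningful only when ItemType = 'Bicycle'
    StartDate datetime    NOT NULL,
    EndDate   datetime    NULL       -- NULL marks a current row
);

-- Index only the current rows (EndDate IS NULL), not the historical ones.
CREATE NONCLUSTERED INDEX IX_Items_Current
    ON dbo.Items (StartDate)
    WHERE EndDate IS NULL;

-- Index BikeColor only for the rows where it is relevant.
CREATE NONCLUSTERED INDEX IX_Items_BikeColor
    ON dbo.Items (BikeColor)
    WHERE ItemType = 'Bicycle';
```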

Filtered Index – Allowed Syntax
◦ WHERE <filter_predicate> [from BOL: CREATE INDEX]
 <filter_predicate> ::= <conjunct> [ AND <conjunct> ]
 <conjunct> ::= <disjunct> | <comparison>
 <disjunct> ::= column_name IN (constant ,…)
 <comparison> ::= column_name <comparison_op> constant
 <comparison_op> ::= { IS | IS NOT | = | <> | != | > | >= | !> | < | <= | !< }

◦ No BETWEEN, no LIKE, no subquery, no variables

◦ So must be simple and deterministic
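A sketch of predicates that fit the grammar above, assuming a hypothetical dbo.Orders table. A BETWEEN range is rewritten as two comparisons joined by AND, and a set of categories is expressed with IN over constants:

```sql
-- Hypothetical table; names and types are assumptions for illustration.
CREATE TABLE dbo.Orders
(
    OrderID    int         NOT NULL PRIMARY KEY,
    CustomerID int         NOT NULL,
    Status     varchar(10) NOT NULL,
    OrderDate  datetime    NOT NULL
);

-- BETWEEN is not allowed, so the range is written as >= AND <.
CREATE NONCLUSTERED INDEX IX_Orders_2010
    ON dbo.Orders (CustomerID)
    WHERE OrderDate >= '20100101' AND OrderDate < '20110101';

-- IN with a list of constants is allowed.
CREATE NONCLUSTERED INDEX IX_Orders_Open
    ON dbo.Orders (CustomerID)
    WHERE Status IN ('Open', 'Pending');
```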

Filtered Indexes – Requirements
◦ Always some comparison involved, so must agree
on how operations work, so requires standard
SET options
 ON for ANSI_NULLS, ANSI_PADDING,
ANSI_WARNINGS, ARITHABORT,
CONCAT_NULL_YIELDS_NULL,
QUOTED_IDENTIFIER
 OFF for NUMERIC_ROUNDABORT
◦ Else:
 If not set when index is created, won’t create the index
 If not set when INSERT, UPDATE, DELETE, MERGE
affects the data, gives error and rolls back
 If not set when the index might be used to optimize the
query, it will not be considered
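One way to set and verify the required options for the current session before creating or modifying data under a filtered index. This is a sketch; many client libraries already set most of these, so check rather than assume:

```sql
-- Options that must be ON.
SET ANSI_NULLS, ANSI_PADDING, ANSI_WARNINGS, ARITHABORT,
    CONCAT_NULL_YIELDS_NULL, QUOTED_IDENTIFIER ON;

-- Option that must be OFF.
SET NUMERIC_ROUNDABORT OFF;

-- Verify the session settings: 1 = ON, 0 = OFF.
SELECT SESSIONPROPERTY('ANSI_NULLS')              AS ansi_nulls,
       SESSIONPROPERTY('ANSI_PADDING')            AS ansi_padding,
       SESSIONPROPERTY('ANSI_WARNINGS')           AS ansi_warnings,
       SESSIONPROPERTY('ARITHABORT')              AS arithabort,
       SESSIONPROPERTY('CONCAT_NULL_YIELDS_NULL') AS concat_null_yields_null,
       SESSIONPROPERTY('QUOTED_IDENTIFIER')       AS quoted_identifier,
       SESSIONPROPERTY('NUMERIC_ROUNDABORT')      AS numeric_roundabort;
```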

Filtered Indexes – Applicability
◦ Non-clustered indexes only (rather obviously)
◦ For UNIQUE indexes, only the indexed rows
must have unique index values
 Duplicates in the non-indexed rows are not checked, but
be careful that an update to a qualifying column doesn’t
cause a duplicate to occur
 CREATE UNIQUE INDEX ix1 ON xyz (c3)
WHERE c2 = 10
 So now there is a way to create a unique index on
column with multiple NULL values; create index WHERE
ColY IS NOT NULL
◦ Filtered indexes do not apply to:
 XML indexes
 Full-text indexes
 Spatial indexes
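A sketch of the "unique except for NULLs" pattern described above, using a hypothetical dbo.Employees table:

```sql
-- Hypothetical table; BadgeNumber is optional but must be unique when present.
CREATE TABLE dbo.Employees
(
    EmployeeID  int NOT NULL PRIMARY KEY,
    BadgeNumber int NULL
);

-- Uniqueness is enforced only for rows that satisfy the filter.
CREATE UNIQUE NONCLUSTERED INDEX UX_Employees_BadgeNumber
    ON dbo.Employees (BadgeNumber)
    WHERE BadgeNumber IS NOT NULL;

-- Multiple NULLs are accepted because those rows are not in the index.
INSERT INTO dbo.Employees (EmployeeID, BadgeNumber)
VALUES (1, NULL), (2, NULL), (3, 1001);

-- A second row with BadgeNumber = 1001 would be rejected as a duplicate key:
-- INSERT INTO dbo.Employees (EmployeeID, BadgeNumber) VALUES (4, 1001);
```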

Filtered Indexes – Getting Them Used 1
◦ QO can only use the index when it knows the index will
match the conditions in the query’s WHERE clause
◦ Assume Column120 and Column121 useful only when
C1 = 37
 So CREATE INDEX i1 on dbo.t1 (Column120, Column121)
WHERE C1 = 37
 SELECT Column121
FROM dbo.t1
WHERE Column120 = 13
Cannot use the index even if Column120 and Column121 only
appear for C1 = 37
 As far as the QO knows, there may be other Column120 or Column121
values that are not in the index
◦ Help the QO by adding more limiting predicates to
WHERE clause
 Make it WHERE Column120 = 13 AND C1 = 37
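The case above as a runnable sketch. The slide names dbo.t1, C1, Column120, Column121, and i1; the key column ID and the column types are assumptions. Compare the two execution plans to see the difference:

```sql
-- Column types and the ID key are assumed for illustration.
CREATE TABLE dbo.t1
(
    ID        int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    C1        int NOT NULL,
    Column120 int NULL,
    Column121 int NULL
);

CREATE NONCLUSTERED INDEX i1
    ON dbo.t1 (Column120, Column121)
    WHERE C1 = 37;

-- Not eligible for i1: nothing tells the QO that only C1 = 37 rows qualify.
SELECT Column121
FROM dbo.t1
WHERE Column120 = 13;

-- Eligible for i1: the query predicate now implies the index filter.
SELECT Column121
FROM dbo.t1
WHERE Column120 = 13 AND C1 = 37;
```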

◦ WHERE with a variable rather than a literal
◦ Assume index is on WHERE IsActive > 0
 DECLARE @IsActive int; SET @IsActive = 1;
 SELECT xyz FROM table WHERE IsActive = @IsActive
◦ QO doesn’t know value of variable, so doesn’t
know if index fits
 So shouldn’t use variables as if they were constants
◦ Again, help the QO by adding more limiting
predicates to WHERE clause
 Make it WHERE IsActive = @IsActive AND IsActive > 0

 But perhaps that doesn’t really make sense here
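The variable case as a sketch, assuming a hypothetical dbo.Accounts table with an int IsActive column. The redundant predicate repeats the index filter as a literal so the QO can match it at optimization time:

```sql
-- Hypothetical table and index; names and types are assumptions.
CREATE TABLE dbo.Accounts
(
    AccountID int         NOT NULL PRIMARY KEY,
    IsActive  int         NOT NULL,
    Name      varchar(50) NOT NULL
);

CREATE NONCLUSTERED INDEX IX_Accounts_Active
    ON dbo.Accounts (IsActive)
    INCLUDE (Name)
    WHERE IsActive > 0;

DECLARE @IsActive int;
SET @IsActive = 1;

-- Value of @IsActive is unknown at optimization time: filtered index not considered.
SELECT Name
FROM dbo.Accounts
WHERE IsActive = @IsActive;

-- Adding the filter as a literal predicate makes the index eligible.
SELECT Name
FROM dbo.Accounts
WHERE IsActive = @IsActive AND IsActive > 0;
```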

◦ WHERE with a function or conversion on the filter
predicate
 Obvious: WHERE ABS(C1) = 37
 Cannot use index on WHERE C1 = 37
 Could change it to WHERE C1 = ABS(37) if same meaning .. but not in
this case
 Implicit conversions:
 Assume index is WHERE c3 > 100
 DECLARE @varR real; SET @varR = 1000.5;
 SELECT * FROM tv2 WHERE c3 = @varR
 Requires conversion of c3 to real before comparison, so can’t use
index
 SELECT * FROM tv2 WHERE c3 = cast(@varR as int)
 At least it requires no conversion of c3, but is unknown value at
optimization time, so can’t use index
 So add a limiting predicate … assuming you know it will always be
right
 SELECT * FROM tv2 WHERE c3 = cast(@varR as int) AND c3 > 100
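The three variants side by side, assuming c3 is an int column; the slide does not state its type, and the key column c1 and the index name are also assumptions:

```sql
-- Assumed schema: c3 is int, so comparing it to a real forces a conversion of c3.
CREATE TABLE dbo.tv2
(
    c1 int NOT NULL PRIMARY KEY,
    c3 int NOT NULL
);

CREATE NONCLUSTERED INDEX ix_tv2_c3
    ON dbo.tv2 (c3)
    WHERE c3 > 100;

DECLARE @varR real;
SET @varR = 1000.5;

-- 1. c3 is implicitly converted to real: filtered index cannot be matched.
SELECT * FROM dbo.tv2 WHERE c3 = @varR;

-- 2. No conversion of c3, but the compared value is unknown at optimization time.
SELECT * FROM dbo.tv2 WHERE c3 = CAST(@varR AS int);

-- 3. The literal predicate repeats the index filter, so the index is eligible.
SELECT * FROM dbo.tv2 WHERE c3 = CAST(@varR AS int) AND c3 > 100;
```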

A Mis-Application of Filtered Indexes
◦ Create a filtered index on c and b with
WHERE on c

◦ Attempt to use the index as a validation table

◦ In code, use the index in a hint and expect to
get no row back for a b where c is a match,
but it gets an error instead, because the hint
prevents a plan from being created
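A sketch of how this goes wrong, assuming a hypothetical table dbo.t with columns a, b, c and a filter of c = 1. The hinted query only works when its WHERE clause implies the index filter; otherwise the statement is expected to fail with a "could not produce a query plan because of the hints" error (typically error 8622) rather than return zero rows:

```sql
-- Hypothetical table and filter value; not from the talk.
CREATE TABLE dbo.t
(
    a int NOT NULL PRIMARY KEY,
    b int NOT NULL,
    c int NOT NULL
);

CREATE NONCLUSTERED INDEX ix_cb
    ON dbo.t (c, b)
    WHERE c = 1;

-- Works: the predicate includes the index filter, so the hint can be honored.
SELECT b
FROM dbo.t WITH (INDEX (ix_cb))
WHERE b = 5 AND c = 1;

-- Expected to raise an error instead of returning an empty result,
-- because the hinted index cannot answer a query over all values of c.
SELECT b
FROM dbo.t WITH (INDEX (ix_cb))
WHERE b = 5;
```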

Filtered Indexes – And Views
◦ Cannot create a Filtered index on a view, not
even a non-clustered index on an indexed view
 But a filtered index can be chosen by the QO for the
query formed from a view .. or function

Filtered Indexes – Considerations 1
◦ Storage size differences
 Fewer index rows take less space
 Less IO, more information fits in memory
 4,000 pages vs. 1 page
◦ Limits auto-parameterization
 QO will not auto-parameterize if predicate is used in a
filtered index (“in most cases”, per BOL)
 Otherwise would inhibit use of filtered index
 So can affect plan reuse
◦ Index maintenance – same rebuild and reorganize
as regular index
 But hopefully much less work to do

Filtered Indexes – Considerations 2
◦ Covering index
 Consider INCLUDEing other columns so more
likely to be selected by QO
◦ DTA can suggest a filtered index
 ColX IS NOT NULL – only of this form
 But the missing-indexes functionality does not flag
them as missing
◦ When not to use:
 When non-filtered index already exists, or another
access path is likely better or adequate
 Avoid the extra index maintenance
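A sketch of a covering filtered index, assuming a hypothetical dbo.Tasks table. The INCLUDE columns let the query below be answered from the index alone:

```sql
-- Hypothetical table; names and types are assumptions.
CREATE TABLE dbo.Tasks
(
    TaskID   int          NOT NULL PRIMARY KEY,
    OwnerID  int          NOT NULL,
    Title    varchar(100) NOT NULL,
    DueDate  datetime     NULL,
    IsActive bit          NOT NULL
);

-- Key on OwnerID, extra columns carried at the leaf level, active rows only.
CREATE NONCLUSTERED INDEX IX_Tasks_Active
    ON dbo.Tasks (OwnerID)
    INCLUDE (Title, DueDate)
    WHERE IsActive = 1;

-- All referenced columns are available in the index, and the predicate
-- repeats the filter, so no lookup into the base table is needed.
SELECT Title, DueDate
FROM dbo.Tasks
WHERE OwnerID = 42 AND IsActive = 1;
```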

Filtered Statistics
◦ CREATE STATISTICS stats1 ON table (cols)
WHERE <condition>
◦ Uses:
 Can create filtered statistics on skewed data to assist QO
 Filtered Statistics will likely be more precise because they cover only the
data in the filtered subset (or filtered index)
 Table partitions, where statistics are needed only on ‘current’ partition(s)
◦ Cannot reference a computed column, a UDT column, a spatial
data type column, or a hierarchyID data type column

◦ AutoCreateStats will create statistics on Filtered Index key
columns
◦ AutoCreateStats will not create filtered statistics on other
columns
 You have to create them yourself
◦ AutoUpdateStats will keep them updated once they are created
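A sketch of manually created filtered statistics, assuming a hypothetical dbo.Catalog table where BikeColor is populated only for bicycles. WITH FULLSCAN is optional and shown here only as one possible sampling choice:

```sql
-- Hypothetical table; names and types are assumptions.
CREATE TABLE dbo.Catalog
(
    ItemID    int         NOT NULL PRIMARY KEY,
    ItemType  varchar(20) NOT NULL,
    BikeColor varchar(20) NULL
);

-- Statistics on BikeColor built only from the bicycle rows.
CREATE STATISTICS st_Catalog_BikeColor
    ON dbo.Catalog (BikeColor)
    WHERE ItemType = 'Bicycle'
    WITH FULLSCAN;
```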

Metadata for Indexes, Statistics
◦ sys.indexes
 has_filter, filter_definition
◦ sys.stats
 has_filter, filter_definition

◦ SSMS
 Indexes and Statistics Properties have a Filter tab
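The catalog views can be queried directly to list every filtered index and filtered statistics object in the current database:

```sql
-- Filtered indexes and their filter expressions.
SELECT OBJECT_NAME(i.object_id) AS table_name,
       i.name                   AS index_name,
       i.filter_definition
FROM sys.indexes AS i
WHERE i.has_filter = 1;

-- Filtered statistics and their filter expressions.
SELECT OBJECT_NAME(s.object_id) AS table_name,
       s.name                   AS stats_name,
       s.filter_definition
FROM sys.stats AS s
WHERE s.has_filter = 1;
```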

Questions on Filtered Indexes,
Statistics
 Any questions?

 Now we’ll move on to Wide Tables,
Sparse Columns

Wide Tables
◦ Up to 30,000 Columns
 Great for SharePoint-like “a row is an object, some
attributes depend on other attributes”
◦ Some limits:
 Columns per non-wide table: 1,024
 Columns per wide table: 30,000
 Columns per SELECT statement: 4,096
 Columns per INSERT statement: 4,096
 Indexes per table: 1,000
 Statistics per table: 30,000
 BOL: Maximum Capacity Specifications for SQL Server

Wide Table
◦ A wide table has defined a column set, using sparse
columns
 New row structure for sparse columns
 {column, value}, {column, value} …
 Can create flexible schemas within an application
 Can add or drop columns whenever you want without
having to touch each row
◦ The maximum size of a wide table row is 8,018
bytes, so most of the data in a row has to be NULL
 Or has to be varchar-type columns so it can overflow to
another page
◦ Limit is still 1,024 for number of non-sparse
columns plus computed columns, even in a wide
table
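A sketch of the flexible-schema idea, assuming a hypothetical dbo.Objects table. The column set is declared when the table is created, and sparse attribute columns are then added or dropped as the application evolves:

```sql
-- Hypothetical wide table; the column set is defined up front.
CREATE TABLE dbo.Objects
(
    ObjectID   int         NOT NULL PRIMARY KEY,
    ObjectType varchar(20) NOT NULL,
    Attributes xml COLUMN_SET FOR ALL_SPARSE_COLUMNS
);

-- Add attribute columns later as sparse columns.
ALTER TABLE dbo.Objects ADD FrameSize int SPARSE NULL;
ALTER TABLE dbo.Objects ADD Author varchar(50) SPARSE NULL;

-- Drop an attribute that is no longer needed.
ALTER TABLE dbo.Objects DROP COLUMN FrameSize;
```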

Wide Tables – Performance Impact
◦ Performance considerations:
 Increased run-time and compile-time memory
requirements
 Wide tables can have up to 30,000 columns defined;
this can increase compile time
 There can be up to 1,000 indexes on a wide table,
which increases the index maintenance time
 Nonclustered indexes should be filtered indexes to
minimize their impact

 For more information, see BOL: Performance Considerations
for Wide Tables

Sparse Columns
◦ CREATE TABLE … (…, c1 int SPARSE NULL,
…)
◦ New row format for sparse columns

◦ Column:
 Must be NULLable
 Cannot be part of a clustered index
 Cannot be part of a primary key index
 Cannot have a DEFAULT
 Cannot be a computed column
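A sketch of declaring a sparse column and switching an existing column to and from sparse storage, assuming a hypothetical dbo.Documents table:

```sql
-- Hypothetical table; ReviewNotes is sparse from the start, Reviewer is not.
CREATE TABLE dbo.Documents
(
    DocID       int          NOT NULL PRIMARY KEY,
    Title       varchar(200) NOT NULL,
    ReviewNotes varchar(400) SPARSE NULL,
    Reviewer    varchar(50)  NULL
);

-- Convert an existing nullable column to sparse storage.
ALTER TABLE dbo.Documents ALTER COLUMN Reviewer ADD SPARSE;

-- Convert it back to regular storage.
ALTER TABLE dbo.Documents ALTER COLUMN Reviewer DROP SPARSE;
```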

Sparse Columns – Some More Cannots
◦ Some types cannot be sparse:
 geography
 geometry
 image
 ntext
 text
 timestamp
 User-defined data types

◦ Some attributes cannot be on sparse columns
 No Filestream
 Not Identity
 Not RowGuidCol

Sparse Columns – Types and Size
◦ Size impact
 An important consideration but not the only one

◦ At what percentage of NULLs does a sparse
column take less space than a non-sparse
column?
          Non-Sparse    Sparse           Null Estimate
 BIT     1/8th byte    4 1/8th bytes    –> 98%
 BIGINT  8 bytes       12 bytes         –> 52%

 See BOL: Using Sparse Columns for a complete table of types

Column Sets
◦ How do you know which columns ‘exist’ for a row?
◦ You could just SELECT them; those that don’t exist are NULL
◦ Can define a “Column set”
 Optional, only one per table
◦ Include a column:
 MyColSet XML COLUMN_SET FOR ALL_SPARSE_COLUMNS
◦ Selecting from MyColSet returns an XML description of the sparse
columns in that row
 <c25>ABC</c25><c34>599</c34>
◦ Can INSERT / UPDATE sparse columns by
 Referring to them by name as usual, or
 Specifying the XML for the Column_Set column

 See BOL: Using Column Sets for more details
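A sketch of reading and writing through a column set. The column names c25, c34, and MyColSet come from the slide; the table dbo.Products and the other columns are assumptions:

```sql
-- Hypothetical table with two sparse columns and a column set.
CREATE TABLE dbo.Products
(
    ProductID int         NOT NULL PRIMARY KEY,
    Name      varchar(50) NOT NULL,
    c25       varchar(20) SPARSE NULL,
    c34       int         SPARSE NULL,
    MyColSet  xml COLUMN_SET FOR ALL_SPARSE_COLUMNS
);

-- Insert by naming the sparse columns as usual.
INSERT INTO dbo.Products (ProductID, Name, c25, c34)
VALUES (1, 'Widget', 'ABC', 599);

-- Insert by supplying XML for the column set instead.
INSERT INTO dbo.Products (ProductID, Name, MyColSet)
VALUES (2, 'Gadget', '<c25>XYZ</c25>');

-- The column set returns only the sparse columns that have values in each row.
SELECT ProductID, MyColSet
FROM dbo.Products;

-- Sparse columns can still be selected by name.
SELECT ProductID, c25, c34
FROM dbo.Products;
```

Note that once a column set is defined, SELECT * returns the column set in place of the individual sparse columns, and updating the column set replaces the values of all sparse columns in the row, setting any column not mentioned in the XML to NULL.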

Feature / Technology Support
◦ Sparse columns and column sets are not fully
supported by some SQL Server technologies

◦ Sparse Columns not supported by:
 Merge Replication

◦ Column Sets not supported by:
 Replication, Distributed Query, Change Data
Capture

 See BOL: Using Column Sets for more details

Meta Data for Sparse Columns
◦ sys.columns – is_sparse, is_column_set
 And in:
 sys.system_columns
 sys.all_columns
 sys.computed_columns
 sys.identity_columns

◦ Do not confuse with sparse files as used for
Database Snapshots
 The is_sparse in sys.database_files, sys.master_files

Together
◦ Sparse Columns together with Filtered Index
◦ On Sparse column, filtered index with
xx IS NOT NULL
avoids indexing all the rows with no value

◦ Makes a lot of sense, and likely the driving
force behind filtered indexes
◦ But not needed on every sparse column
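The combination as a sketch, assuming a hypothetical dbo.Inventory table where BikeColor is NULL for most rows:

```sql
-- Hypothetical table; BikeColor is sparse because most rows have no value.
CREATE TABLE dbo.Inventory
(
    ItemID    int         NOT NULL PRIMARY KEY,
    ItemType  varchar(20) NOT NULL,
    BikeColor varchar(20) SPARSE NULL
);

-- Index only the rows that actually have a BikeColor.
CREATE NONCLUSTERED INDEX IX_Inventory_BikeColor
    ON dbo.Inventory (BikeColor)
    WHERE BikeColor IS NOT NULL;

-- An equality predicate implies IS NOT NULL, so this query should be
-- eligible to use the filtered index.
SELECT ItemID, BikeColor
FROM dbo.Inventory
WHERE BikeColor = 'Red';
```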

Separately
◦ Filtered Index without Sparse Column
 Filtered indexes on skewed data
 Filtered statistics on skewed data

◦ Sparse Column without Filtered Index
 Sparse columns on sparse data, perhaps no index to
go with it

Summary
◦ Filtered Indexes
◦ Filtered Statistics
◦ Wide Tables
◦ Sparse Columns

◦ Together …
◦ … and Separately

◦ Don Vilen
 Chief Scientist, Buysight
 DVilen@buysight.com

To learn more or inquire about speaking opportunities, please
contact:

Mark Ginnebaugh, User Group Leader mark@designmind.com
