Indexing

INDEXING Davood Pour Yousefian Barfeh

INDEXING
- What is an index?
- Why is it needed?
- When should it be used?
- Types of indexes

What is an index?
- a data structure
- a way of sorting records on a field
- holds the field value and a pointer to the record it relates to

Why is an index needed? (Advantage)
- speeds up retrieval of data
- without an index: linear search (N = number of records)
  - key field (unique value): N/2 accesses on average
  - non-key field: N accesses
- using an index: binary search, about log2 N accesses
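The access counts above can be sketched in a few lines of Python. This is only an illustration of the slide's cost model (N/2 on average for a unique key, N for a non-key field, about log2 N for binary search); the function names are hypothetical and not part of any database API.

```python
import math

def linear_search_cost(n, unique_key):
    """Expected number of entries examined by a linear search over n entries."""
    # A unique key is found after n/2 entries on average; a non-key field
    # forces a full pass, because further matches may follow the first one.
    return n / 2 if unique_key else n

def binary_search_cost(n):
    """Approximate number of probes for a binary search over n sorted entries."""
    return math.ceil(math.log2(n))

n = 1_000_000
costs = (
    linear_search_cost(n, unique_key=True),
    linear_search_cost(n, unique_key=False),
    binary_search_cost(n),
)
print(costs)
```

The gap widens quickly as N grows: doubling N doubles the linear cost but adds only one probe to the binary search.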

Indexing (Disadvantages)
- additional space on the disk
- slows down inserts, deletes and updates, because the index must be maintained along with the table

Ex. Without Indexing

Field name         Data type      Size on disk
id (primary key)   Unsigned INT   4 bytes
firstName          Char(50)       50 bytes
lastName           Char(50)       50 bytes
emailAddress       Char(100)      100 bytes

* char was used in place of varchar to allow for an accurate size-on-disk value
* the database contains five million rows and is unindexed

r = 5,000,000 records, record length R = 204 bytes, block size B = 1,024 bytes
bfr = floor(B / R) = floor(1024 / 204) = 5 records per disk block
total number of blocks required: N = r / bfr = 5,000,000 / 5 = 1,000,000 blocks
linear search for a key field: N / 2 = 500,000 blocks on average
  (if the file is sorted on the key, a binary search is possible: log2 N = 19.93, rounded up to 20 blocks)
linear search for a non-key field: N = 1,000,000 blocks
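The arithmetic for the unindexed table can be reproduced with a short script. It is a sketch that assumes fixed-length records and whole records per block, as the slide does; the variable names are chosen here for illustration.

```python
import math

RECORD_COUNT = 5_000_000            # r
RECORD_LENGTH = 4 + 50 + 50 + 100   # R = 204 bytes (id, firstName, lastName, emailAddress)
BLOCK_SIZE = 1024                   # B, in bytes

# Blocking factor: whole records that fit in one disk block.
blocking_factor = BLOCK_SIZE // RECORD_LENGTH

# Blocks needed to store the whole table.
total_blocks = math.ceil(RECORD_COUNT / blocking_factor)

key_linear_blocks = total_blocks // 2        # unique key, average case
nonkey_linear_blocks = total_blocks          # non-key field, every block is read
sorted_key_binary_blocks = math.ceil(math.log2(total_blocks))  # only if sorted on the key

print(blocking_factor, total_blocks, key_linear_blocks,
      nonkey_linear_blocks, sorted_key_binary_blocks)
```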

Ex. Using Indexing

Field name         Data type   Size on disk
firstName          Char(50)    50 bytes
(record pointer)   Special     4 bytes

* pointers in MySQL are 2, 3, 4 or 5 bytes in length, depending on the size of the table

r = 5,000,000 records, index record length R = 54 bytes, block size B = 1,024 bytes
bfr = floor(B / R) = floor(1024 / 54) = 18 index records per disk block
total number of blocks required to hold the index: N = r / bfr = 5,000,000 / 18, rounded up to 277,778 blocks
binary search: log2 N = log2 277,778 = 18.08, rounded up to 19 blocks
  (compared with 1,000,000 blocks for the unindexed linear search on the non-key field firstName)
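The same calculation for the index on firstName, again as a hedged sketch: the 4-byte pointer is the slide's assumption, and the variable names are illustrative only.

```python
import math

RECORD_COUNT = 5_000_000        # r, one index entry per table row
INDEX_ENTRY_LENGTH = 50 + 4     # R = firstName Char(50) + assumed 4-byte record pointer
BLOCK_SIZE = 1024               # B, in bytes

# Index entries that fit in one disk block.
index_blocking_factor = BLOCK_SIZE // INDEX_ENTRY_LENGTH

# Blocks needed to hold the whole index.
index_blocks = math.ceil(RECORD_COUNT / index_blocking_factor)

# Binary search over the sorted index blocks.
index_search_blocks = math.ceil(math.log2(index_blocks))

print(index_blocking_factor, index_blocks, index_search_blocks)
```

Because each index entry is about a quarter of the size of a full record, the index occupies far fewer blocks than the table, and its sorted order is what makes the binary search possible.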

When should indexing be used?
General rule: index anything that limits the number of results you are trying to find.
- to speed up finding data
- on columns with high cardinality
- on tables that reference other tables

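When should indexing be used?
- an index speeds up finding data but slows down inserting, deleting or updating data: not only the table must be updated, but the index as well
- Ex. an index on a bank account number is better than one on the balance, since lookups search by account number and it rarely changes, while the balance is updated often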

When should indexing be used?
Cardinality: the number of distinct values in a column
[Figure: Binary Search vs. Linear Search]

When should indexing be used? Cardinality
Index selectivity = (number of distinct values) / (number of records)
- Ex. good selectivity: a table has 100,000 records and one of its indexed columns has 88,000 distinct values, so the selectivity of this index is 88,000 / 100,000 = 88%
- Ex. bad selectivity: a table of 100,000 records has only 200 distinct values in the indexed column, so the index's selectivity is 200 / 100,000 = 0.2%
  - number of records in each group = 100,000 / 200 = 500
  - a full table scan is more efficient than using such an index, because much more I/O is needed to repeatedly scan the index and the table
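The selectivity formula is simple enough to check directly. The helper names below are hypothetical; they only restate the ratio defined on the slide.

```python
def index_selectivity(distinct_values, record_count):
    """Selectivity = number of distinct values / number of records."""
    return distinct_values / record_count

def average_rows_per_value(distinct_values, record_count):
    """Average number of rows sharing one indexed value."""
    return record_count / distinct_values

good = index_selectivity(88_000, 100_000)      # high-cardinality column
bad = index_selectivity(200, 100_000)          # low-cardinality column
group_size = average_rows_per_value(200, 100_000)

print(f"good: {good:.1%}, bad: {bad:.1%}, rows per value: {group_size:.0f}")
```

A selectivity close to 100% means each index lookup narrows the search to very few rows; a value near 0% means every lookup still returns a large group of rows.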

When should indexing be used? Tables that reference other tables (joins)

Ex.
  CREATE TABLE newsitem (
    newsid INT PRIMARY KEY,
    newstitle VARCHAR(255),
    newscontent TEXT,
    authorid INT,
    newsdate TIMESTAMP
  );

  CREATE TABLE authors (
    authorid INT PRIMARY KEY,
    username VARCHAR(255),
    firstname VARCHAR(255),
    lastname VARCHAR(255)
  );

  SELECT newstitle, firstname, lastname
  FROM newsitem n, authors a
  WHERE n.authorid = a.authorid;

  CREATE INDEX newsitem_authorid ON newsitem(authorid);

General rule: any fields involved in a table join should be indexed. Here authors.authorid is already indexed as the primary key, so only newsitem.authorid needs a new index.
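One way to see the effect of the index on newsitem.authorid is to ask a database for its query plan before and after creating it. The sketch below uses SQLite (through Python's standard sqlite3 module) as a stand-in engine, which is an assumption: the slides target MySQL, and other engines word their plans differently. It examines a lookup by author, which is the step a join repeats for each author row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE newsitem (
        newsid INT PRIMARY KEY,
        newstitle VARCHAR(255),
        newscontent TEXT,
        authorid INT,
        newsdate TIMESTAMP
    )
""")

def plan(sql):
    """Return SQLite's description of how it would execute the statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 holds the plan text

# Simplified lookup (not the full join from the slide): all items by one author.
lookup = "SELECT newstitle FROM newsitem WHERE authorid = 1"

plan_before = plan(lookup)   # no usable index: the table is scanned
conn.execute("CREATE INDEX newsitem_authorid ON newsitem(authorid)")
plan_after = plan(lookup)    # the new index can be searched instead

print("before:", plan_before)
print("after: ", plan_after)
```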

When should indexing be used?

Ex.
  CREATE TABLE newsitem (
    newsid INT PRIMARY KEY,
    newstitle VARCHAR(255),
    newscontent TEXT,
    authorid INT,
    newsdate TIMESTAMP
  );

  CREATE TABLE newsitem_categories (
    newsid INT,
    categoryid INT
  );

  CREATE TABLE categories (
    categoryid INT PRIMARY KEY,
    categoryname VARCHAR(255)
  );

  SELECT n.newstitle, c.categoryname
  FROM categories c, newsitem_categories nc, newsitem n
  WHERE c.categoryid = nc.categoryid
    AND nc.newsid = n.newsid;

These fields should be indexed:
- newsitem.newsid
- newsitem_categories.newsid
- newsitem_categories.categoryid
- categories.categoryid

newsitem.newsid and categories.categoryid are primary keys and therefore already indexed; the other two need explicit indexes:

  CREATE INDEX newscat_news ON newsitem_categories(newsid);
  CREATE INDEX newscat_cats ON newsitem_categories(categoryid);

Combination on Indexing

  CREATE INDEX newscat_news ON newsitem_categories(newsid);
  CREATE INDEX newscat_cats ON newsitem_categories(categoryid);

  CREATE INDEX news_cats ON newsitem_categories(newsid, categoryid);

Can the two columns be combined in a single index? Yes, but with limitations.

Conjunctions in Combinations on Indexing

  CREATE TABLE example (
    a int,
    b int,
    c int
  );

  CREATE INDEX example_index ON example(a, b, c);

- It will be used when you check against 'a'.
- It will be used when you check against 'a' and 'b'.
- It will be used when you check against 'a', 'b' and 'c'.
- It will not be used if you check against 'b' and 'c', or if you check only 'b' or only 'c'.
- It will be used when you check against 'a' and 'c', but only for the 'a' column; it will not be used to check the 'c' column as well.
- A query against 'a' OR 'b' like this:

    SELECT a, b, c FROM example WHERE a = 1 OR b = 2;

  can use the index only for the 'a' condition, not for the 'b' condition. Since rows matching 'b' still have to be found some other way, the optimizer may fall back to a full scan.
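The leftmost-prefix behaviour described above can be observed by asking a database for its query plans. The sketch below uses SQLite via Python's sqlite3 module as a stand-in, which is an assumption: the slides describe MySQL, and each engine's optimizer and plan wording differ. In SQLite's output, SEARCH means the index is used to jump to matching entries, while SCAN means every entry is visited.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE example (a INT, b INT, c INT)")
conn.execute("CREATE INDEX example_index ON example(a, b, c)")

def plan(where):
    """Return SQLite's plan text for a SELECT with the given WHERE clause."""
    sql = "EXPLAIN QUERY PLAN SELECT a, b, c FROM example WHERE " + where
    rows = conn.execute(sql).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 holds the plan text

conditions = [
    "a = 1",
    "a = 1 AND b = 2",
    "a = 1 AND b = 2 AND c = 3",
    "a = 1 AND c = 3",   # only the 'a' part can narrow the search
    "b = 2",             # no leading 'a': cannot seek into the index
    "b = 2 AND c = 3",
    "a = 1 OR b = 2",
]
plans = {where: plan(where) for where in conditions}

for where, detail in plans.items():
    print(f"{where:<28} {detail}")
```

Conditions that start with 'a' can seek into the index; conditions that skip 'a' cannot, even though 'b' and 'c' are stored in it.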

Types of indexes (1): Clustered and Non-clustered Indexes
Clustered:
- the order of the rows in the data pages corresponds to the order of the rows in the index
- only one per table (the primary key)
- faster to read than non-clustered, as data is physically stored in index order
Non-clustered:
- can be used many times per table
- quicker for insert, delete and update operations than a clustered index
- the physical order of the rows is not important

Types of indexes (2): Unique and Non-unique Indexes
Unique:
- help maintain data integrity by ensuring that no two rows of data in a table have identical key values
- uniqueness is enforced
Non-unique:
- improve query performance by maintaining a sorted order of data values that are used frequently
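How a unique index enforces integrity can be shown with a small experiment. This sketch uses SQLite through Python's sqlite3 module as a stand-in engine; the users table and the e-mail value are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table, used only for this illustration.
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("CREATE UNIQUE INDEX users_email ON users(email)")

conn.execute("INSERT INTO users (email) VALUES ('ana@example.com')")

try:
    # Second row with the same key value: the unique index rejects it.
    conn.execute("INSERT INTO users (email) VALUES ('ana@example.com')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(duplicate_rejected, row_count)
```

With a non-unique index on the same column, both inserts would succeed and the index would serve only to speed up lookups.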

Types of indexes (3)
- Bitmap index: stores the bulk of its data as bit arrays; suited to columns whose values repeat very frequently
- Dense index: an index record appears for every search-key value in the file; the record contains the search-key value and a pointer to the actual record
- Sparse index: index records are created only for some of the records; used with the primary key
- Reverse index: reverses the key value before entering it in the index; useful for sequence numbers, where new key values monotonically increase
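The difference between a dense and a sparse index can be sketched with in-memory lists. This is a toy model under stated assumptions: the data file is sorted by key and split into fixed-size blocks, and all names (blocks, dense_lookup, sparse_lookup) are invented for the illustration.

```python
from bisect import bisect_right

# Toy data file: records sorted by key, grouped into blocks of three.
BLOCK_SIZE = 3
records = [(k, f"row-{k}") for k in (2, 5, 8, 11, 14, 17, 20, 23)]
blocks = [records[i:i + BLOCK_SIZE] for i in range(0, len(records), BLOCK_SIZE)]

# Dense index: one entry per search key, pointing at (block number, slot).
dense_index = {
    key: (b, s)
    for b, block in enumerate(blocks)
    for s, (key, _) in enumerate(block)
}

# Sparse index: one entry per block, holding only that block's first key.
sparse_keys = [block[0][0] for block in blocks]

def dense_lookup(key):
    """Follow the pointer straight to the record."""
    pos = dense_index.get(key)
    if pos is None:
        return None
    b, s = pos
    return blocks[b][s][1]

def sparse_lookup(key):
    """Find the block whose first key is <= key, then scan inside that block."""
    b = bisect_right(sparse_keys, key) - 1
    if b < 0:
        return None
    for k, value in blocks[b]:
        if k == key:
            return value
    return None

print(len(dense_index), len(sparse_keys), dense_lookup(14), sparse_lookup(14))
```

The sparse index is much smaller, but it only works because the file is sorted on the key, and each lookup ends with a short scan inside one block.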

Types of indexes (4)
Fulltext:
- the search engine examines all of the words in every stored document as it tries to match the search words supplied by the user
- supports many other types of search:
  - two words near each other
  - any word derived from a particular root (for example run, ran, or running)
  - multiple words with distinct weightings
  - a word or phrase close to the search word or phrase
Spatial:
- allows users to treat data within a data store as existing within a two-dimensional context
- an extended index that allows you to index a spatial column; a spatial column is a table column that contains data of a spatial data type, such as geometry or geography

Syntax of Index (1)
Creation:

  CREATE [UNIQUE|FULLTEXT|SPATIAL] INDEX index_name
    [index_type]
    ON tbl_name (index_col_name, ...)
    [index_type]

  index_col_name:
    col_name [(length)] [ASC | DESC]

  index_type:
    USING {BTREE | HASH}

Access Method
BTree:
- keys have some locality of reference
- keys can be sorted well
- neighborhood: expect that a query for a given key will likely be followed by a query for one of its neighbors
Hash:
- the dataset is extremely large
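Why an ordered access method suits neighbor queries while hashing suits exact matches can be illustrated with Python's built-in structures. This is an analogy only: a sorted list searched with bisect stands in for the ordered keys of a B-tree, and a dict stands in for a hash index; neither is how a database engine actually stores them.

```python
from bisect import bisect_left, bisect_right

keys = [42, 8, 23, 3, 16, 15]

# Stand-in for a B-tree: keys kept in sorted order.
sorted_index = sorted(keys)

# Stand-in for a hash index: key -> record, with no ordering between keys.
hash_index = {k: f"row-{k}" for k in keys}

def range_query(low, high):
    """All keys in [low, high]; neighbors sit next to each other in sorted order."""
    lo = bisect_left(sorted_index, low)
    hi = bisect_right(sorted_index, high)
    return sorted_index[lo:hi]

def point_query(key):
    """Exact-match lookup; the hash gives no help in finding neighboring keys."""
    return hash_index.get(key)

print(range_query(8, 16), point_query(23))
```

A range or "next key" query is a single slice of the ordered structure, whereas the hash structure would have to test every candidate key individually.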

Syntax of Index (2)
Displaying index information:

  SHOW INDEX FROM table_name

Deletion:

  DROP INDEX index_name ON table_name

Summary
- What is an index? A data structure that sorts a number of records
- Why is it needed? Advantages and disadvantages
- When should it be used? For finding data
- Types of indexes: clustered and non-clustered, unique and non-unique
- Syntax: creation, display, deletion
