INDEXING Davood Pour Yousefian Barfeh
INDEXING
  - What is an index?
  - Why is it needed?
  - When should it be used?
  - Types of indexes
What is an index?
  - A data structure
  - A way of sorting records
  - Holds the field value and a pointer to the record it relates to
Why is an index needed? (Advantage)
  - Speeds up retrieval of data
  - Without an index: linear search, where N = number of records
      key field (unique value): N/2 comparisons on average
      non-key field: N comparisons
  - With an index: binary search, about log2 N comparisons
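The difference between the two search strategies can be sketched in a few lines of Python. This is only an illustration on an in-memory sorted list, not how a database engine stores its data; the function names and the sample target are assumptions made for the example.

```python
import math


def linear_search_probes(keys, target):
    """Count how many entries are examined before target is found."""
    for probes, key in enumerate(keys, start=1):
        if key == target:
            return probes
    return len(keys)


def binary_search_probes(sorted_keys, target):
    """Count how many entries binary search examines on a sorted list."""
    lo, hi = 0, len(sorted_keys) - 1
    probes = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if sorted_keys[mid] == target:
            return probes
        if sorted_keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return probes


keys = list(range(1_000_000))   # hypothetical sorted key values
target = 765_432                # hypothetical key to look up

linear = linear_search_probes(keys, target)
binary = binary_search_probes(keys, target)
bound = math.ceil(math.log2(len(keys)))  # log2 N, rounded up

print(f"linear search examined {linear} entries")
print(f"binary search examined {binary} entries (log2 N rounded up = {bound})")
```

The linear count grows with the position of the key, while the binary count stays within about log2 N regardless of where the key sits.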
Indexing (Disadvantage)
  - Additional space on the disk
  - Slows down writes (insert, delete, update)
Ex. Without Indexing
  Field name          Data type       Size on disk
  id (Primary key)    Unsigned INT    4 bytes
  firstName           Char(50)        50 bytes
  lastName            Char(50)        50 bytes
  emailAddress        Char(100)       100 bytes
  * Char was used in place of varchar to allow for an accurate size-on-disk value.
  * The table contains five million rows and is unindexed.
  r = 5,000,000 records, record length R = 204 bytes, block size B = 1,024 bytes
  bfr = floor(B / R) = floor(1024 / 204) = 5 records per disk block
  Total number of blocks required: N = r / bfr = 5,000,000 / 5 = 1,000,000 blocks
  Linear search on a key field: N / 2 = 500,000 blocks on average
    (if the file is sorted on the key, binary search needs only log2 N = 19.93, rounded up to 20 blocks)
  Linear search on a non-key field: N = 1,000,000 blocks
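The arithmetic on this slide can be reproduced with a short Python sketch. The variable names follow the slide's symbols (r, R, B, bfr, N); the names for the three results are assumptions made for the example.

```python
import math

r = 5_000_000            # number of records
R = 4 + 50 + 50 + 100    # record length in bytes: id + firstName + lastName + emailAddress
B = 1024                 # block size in bytes

bfr = B // R                 # blocking factor: whole records that fit in one block
N = math.ceil(r / bfr)       # blocks needed to hold the table

linear_key = N // 2                        # average blocks read, unique key, unsorted file
linear_non_key = N                         # every block must be read for a non-key field
binary_sorted = math.ceil(math.log2(N))    # blocks read if the file is sorted on the key

print(f"R = {R} bytes, bfr = {bfr}, N = {N:,} blocks")
print(f"linear (key): {linear_key:,}  linear (non-key): {linear_non_key:,}  binary: {binary_sorted}")
```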
Ex. Using Indexing
  Field name          Data type       Size on disk
  firstName           Char(50)        50 bytes
  (record pointer)    Special         4 bytes
  * Pointers in MySQL are 2, 3, 4 or 5 bytes in length depending on the size of the table.
  r = 5,000,000 records, index record length R = 54 bytes, block size B = 1,024 bytes
  bfr = floor(B / R) = floor(1024 / 54) = 18 index records per disk block
  Total number of blocks required to hold the index: N = r / bfr = 5,000,000 / 18 = 277,778 blocks (rounded up)
  Binary search on the index: log2 N = log2 277,778 = 18.08, rounded up to 19 blocks
  Reading the record that the pointer refers to costs one more block access: 19 + 1 = 20,
    compared with 1,000,000 blocks for a linear search on the unindexed firstName field
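As a rough check of the index figures, the same calculation can be wrapped in a small helper. The function name `index_block_accesses` is hypothetical, and the final "+ 1" assumes that fetching the record through its pointer costs exactly one extra block read.

```python
import math


def index_block_accesses(records, entry_bytes, block_bytes):
    """Return (entries per block, index blocks, blocks read by binary search)."""
    entries_per_block = block_bytes // entry_bytes
    index_blocks = math.ceil(records / entries_per_block)
    search_blocks = math.ceil(math.log2(index_blocks))
    return entries_per_block, index_blocks, search_blocks


# firstName Char(50) plus a 4-byte record pointer, 1,024-byte blocks
entries_per_block, index_blocks, search_blocks = index_block_accesses(
    records=5_000_000, entry_bytes=50 + 4, block_bytes=1024
)

# Assumption: one additional block read to fetch the row the pointer refers to.
total_accesses = search_blocks + 1

print(f"{entries_per_block} entries per block, {index_blocks:,} index blocks")
print(f"{search_blocks} index blocks searched, {total_accesses} block accesses in total")
```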
When should indexing be used?
  General rule: index anything that limits the number of results you are trying to find.
  - To speed up finding data
  - On columns with high cardinality
  - On tables that reference other tables
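When should indexing be used?
  - Indexes speed up finding data but slow down inserting, deleting or updating data: not only the table must be updated, but the index as well.
  - Ex. an index on a bank account number is better than one on the balance (the account number rarely changes, while the balance changes with every transaction).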
When should indexing be used?
  Cardinality: the number of distinct values for a column
  [Figure: Binary Search vs. Linear Search]
When should indexing be used? Cardinality
  Index selectivity = number of distinct values / number of records
  Ex. good selectivity: a table has 100,000 records and one of its indexed columns has 88,000 distinct values, so the selectivity of this index is 88,000 / 100,000 = 88%.
  Ex. bad selectivity: an indexed column in a table of 100,000 records has only 200 distinct values, so the index's selectivity is 200 / 100,000 = 0.2%.
    Number of records in each group = 100,000 / 200 = 500
    A full table scan is more efficient than using such an index, because much more I/O is needed to scan the index and the table repeatedly.
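The selectivity ratio is simple enough to check directly. The sketch below uses the two examples from the slide; the function name `index_selectivity` is an assumption made for the example.

```python
def index_selectivity(distinct_values, total_records):
    """Selectivity = number of distinct values / number of records."""
    return distinct_values / total_records


good = index_selectivity(88_000, 100_000)   # many distinct values
bad = index_selectivity(200, 100_000)       # few distinct values

# With 200 distinct values, each value matches this many rows on average.
rows_per_value = 100_000 // 200

print(f"good selectivity: {good:.1%}")
print(f"bad selectivity:  {bad:.1%} ({rows_per_value} rows per distinct value)")
```

A lookup through the low-selectivity index still has to visit hundreds of rows per value, which is why a full scan can end up cheaper.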
When should indexing be used? Tables that reference other tables (joins)
  General rule: any fields involved in a table join should be indexed.
  Ex.
    CREATE TABLE newsitem (
      newsid INT PRIMARY KEY,
      newstitle VARCHAR(255),
      newscontent TEXT,
      authorid INT,
      newsdate TIMESTAMP
    );
    CREATE TABLE authors (
      authorid INT PRIMARY KEY,
      username VARCHAR(255),
      firstname VARCHAR(255),
      lastname VARCHAR(255)
    );
    SELECT newstitle, firstname, lastname
      FROM newsitem n, authors a
      WHERE n.authorid = a.authorid;
    CREATE INDEX newsitem_authorid ON newsitem(authorid);
  (authors.authorid is already indexed because it is the primary key.)
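Why an index on the join column helps can be sketched in Python. A dictionary stands in for the index on newsitem(authorid); this is a simplification (a dictionary behaves like a hash index, not a B-tree), and the sample rows and function names are invented for the example.

```python
from collections import defaultdict

# Hypothetical sample rows: (authorid, firstname, lastname) and (newsid, newstitle, authorid)
authors = [(1, "Ada", "Lovelace"), (2, "Alan", "Turing"), (3, "Grace", "Hopper")]
newsitem = [(10, "Engines", 1), (11, "Machines", 2), (12, "Compilers", 3), (13, "Notes", 1)]


def nested_loop_join(newsitem, authors):
    """Join without an index: compare every news item with every author."""
    rows, comparisons = [], 0
    for _, title, news_author in newsitem:
        for authorid, first, last in authors:
            comparisons += 1
            if news_author == authorid:
                rows.append((title, first, last))
    return rows, comparisons


def build_authorid_index(newsitem):
    """Map each authorid to the positions of its news items."""
    index = defaultdict(list)
    for position, (_, _, authorid) in enumerate(newsitem):
        index[authorid].append(position)
    return index


def index_join(newsitem, authors, index):
    """Join with an index: one lookup per author instead of a scan."""
    rows, lookups = [], 0
    for authorid, first, last in authors:
        lookups += 1
        for position in index.get(authorid, []):
            rows.append((newsitem[position][1], first, last))
    return rows, lookups


plain_rows, comparisons = nested_loop_join(newsitem, authors)
indexed_rows, lookups = index_join(newsitem, authors, build_authorid_index(newsitem))

print(f"without index: {comparisons} comparisons, with index: {lookups} lookups")
```

Both joins return the same rows; only the amount of work differs, and the gap widens as the tables grow.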
When should indexing be used?
  Ex.
    CREATE TABLE newsitem (
      newsid INT PRIMARY KEY,
      newstitle VARCHAR(255),
      newscontent TEXT,
      authorid INT,
      newsdate TIMESTAMP
    );
    CREATE TABLE newsitem_categories (
      newsid INT,
      categoryid INT
    );
    CREATE TABLE categories (
      categoryid INT PRIMARY KEY,
      categoryname VARCHAR(255)
    );
    SELECT n.newstitle, c.categoryname
      FROM categories c, newsitem_categories nc, newsitem n
      WHERE c.categoryid = nc.categoryid AND nc.newsid = n.newsid;
  These fields should be indexed:
    newsitem.newsid                  (already indexed as the primary key)
    newsitem_categories.newsid
    newsitem_categories.categoryid
    categories.categoryid            (already indexed as the primary key)
    CREATE INDEX newscat_news ON newsitem_categories(newsid);
    CREATE INDEX newscat_cats ON newsitem_categories(categoryid);
Combined Indexes
    CREATE INDEX newscat_news ON newsitem_categories(newsid);
    CREATE INDEX newscat_cats ON newsitem_categories(categoryid);
    CREATE INDEX news_cats ON newsitem_categories(newsid, categoryid);
  Can several columns be combined in one index? Yes, but with limitations.
Conjunctions in Combined Indexes
    CREATE TABLE example (
      a INT,
      b INT,
      c INT
    );
    CREATE INDEX example_index ON example(a, b, c);
  - It will be used when you check against 'a'.
  - It will be used when you check against 'a' and 'b'.
  - It will be used when you check against 'a', 'b' and 'c'.
  - It will not be used if you check against 'b' and 'c', or if you check only 'b' or only 'c'.
  - It will be used when you check against 'a' and 'c', but only for the 'a' column; it will not be used to check the 'c' column as well.
  - A query against 'a' OR 'b', such as
      SELECT a, b, c FROM example WHERE a = 1 OR b = 2;
    can use the index only for the 'a' condition, not for the 'b' condition, so the rows matching b = 2 still have to be found by scanning.
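The rules above amount to a "leftmost prefix" check, which can be modelled in a few lines of Python. This is a simplified model, not the MySQL optimizer: it assumes every condition is an equality test and that the conditions are combined with AND. The function name `usable_prefix` is hypothetical.

```python
def usable_prefix(index_columns, checked_columns):
    """Return the leading index columns that the checked columns can use."""
    prefix = []
    for column in index_columns:
        if column not in checked_columns:
            break  # the first gap ends the usable part of the index
        prefix.append(column)
    return prefix


example_index = ["a", "b", "c"]

print(usable_prefix(example_index, {"a"}))             # ['a']
print(usable_prefix(example_index, {"a", "b"}))        # ['a', 'b']
print(usable_prefix(example_index, {"a", "b", "c"}))   # ['a', 'b', 'c']
print(usable_prefix(example_index, {"b", "c"}))        # []
print(usable_prefix(example_index, {"a", "c"}))        # ['a']
```

An empty result means the combined index cannot help with that set of columns, which matches the 'b' and 'c' case on the slide.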
Types of indexes (1): Clustered and Non-clustered Indexes
  Clustered index
    - The order of the rows in the data pages corresponds to the order of the rows in the index
    - Only one per table, typically on the primary key
    - Faster to read than a non-clustered index, as data is physically stored in index order
  Non-clustered index
    - A table can have many of them
    - Quicker for insert, delete and update operations than a clustered index
    - The physical order of the rows is not important
Types of indexes (2): Unique and Non-unique Indexes
  Unique index
    - Helps maintain data integrity by ensuring that no two rows of data in a table have identical key values
    - Uniqueness is enforced
  Non-unique index
    - Uniqueness is not enforced
    - Improves query performance by maintaining a sorted order of data values that are used frequently
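How a unique index enforces integrity can be tried out with Python's built-in sqlite3 module. SQLite is used here only as a convenient stand-in for MySQL; the table layout is borrowed from the authors table earlier in the slides, while the index name and the sample username are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE authors (authorid INTEGER PRIMARY KEY, username TEXT)")

# The unique index rejects any second row with the same username.
conn.execute("CREATE UNIQUE INDEX authors_username ON authors(username)")

conn.execute("INSERT INTO authors (username) VALUES ('alice')")
try:
    conn.execute("INSERT INTO authors (username) VALUES ('alice')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM authors").fetchone()[0]
print(f"duplicate rejected: {duplicate_rejected}, rows stored: {row_count}")
```

With a non-unique index on the same column, the second insert would succeed and the index would only serve lookups.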
Types of indexes (3)
  Bitmap index: stores the bulk of its data as bit arrays; suited to columns whose values repeat very frequently.
  Dense index: an index record appears for every search-key value in the file. The record contains the search-key value and a pointer to the actual record.
  Sparse index: index records are created only for some of the records; used when the file is ordered on the search key, such as the primary key.
  Reverse index: reverses the key value before entering it in the index; useful for sequence numbers, where new key values increase monotonically.
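As an illustration of the bitmap idea, the sketch below keeps one Python integer per distinct value and sets bit i when row i holds that value. The column names and sample data are invented for the example, and real bitmap indexes add compression and other details that are left out here.

```python
def build_bitmap_index(values):
    """Map each distinct value to an int whose bit i is set when row i holds it."""
    bitmaps = {}
    for row, value in enumerate(values):
        bitmaps[value] = bitmaps.get(value, 0) | (1 << row)
    return bitmaps


def rows_of(bitmap):
    """List the row numbers whose bits are set in the bitmap."""
    return [row for row in range(bitmap.bit_length()) if (bitmap >> row) & 1]


# Hypothetical low-cardinality columns, one entry per row
status = ["draft", "published", "published", "draft", "archived", "published"]
language = ["en", "en", "fa", "en", "fa", "fa"]

status_index = build_bitmap_index(status)
language_index = build_bitmap_index(language)

# Combining two conditions is a single bitwise AND of the two bitmaps.
matches = rows_of(status_index["published"] & language_index["fa"])
print(matches)  # [2, 5]
```

Because each bitmap needs only one bit per row, the approach pays off when a column has few distinct values that repeat often.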
Types of indexes (4)
  Fulltext: the search engine examines all of the words in every stored document as it tries to match the search words supplied by the user.
    Many other types of search are possible:
    - Two words near each other
    - Any word derived from a particular root (for example run, ran, or running)
    - Multiple words with distinct weightings
    - A word or phrase close to the search word or phrase
  Spatial: allows users to treat data within a data store as existing within a two-dimensional context.
    An extended index that allows you to index a spatial column, that is, a table column that contains data of a spatial data type, such as geometry or geography.
Syntax of Index (1)
  Creation:
    CREATE [UNIQUE | FULLTEXT | SPATIAL] INDEX index_name
      [index_type]
      ON tbl_name (index_col_name, ...)
      [index_type]
    index_col_name:
      col_name [(length)] [ASC | DESC]
    index_type:
      USING {BTREE | HASH}
Access Method
  BTree: suitable when
    - Keys have some locality of reference
    - Keys can be sorted well
    - A query for a given key is likely to be followed by a query for one of its neighbors
  Hash: suitable when
    - The dataset is extremely large
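The practical difference between the two access methods can be sketched with Python's standard containers. A sorted list searched with bisect stands in for a B-tree and a dict stands in for a hash index; both are simplifications, and the sample keys are invented for the example.

```python
import bisect

keys = [3, 8, 15, 16, 23, 42, 57]  # kept in sorted order, as a B-tree would keep them

# Hash-style index: key -> position, good for exact-match lookups only.
hash_index = {key: position for position, key in enumerate(keys)}


def range_lookup(sorted_keys, low, high):
    """Return all keys in [low, high]; neighbors sit next to each other."""
    start = bisect.bisect_left(sorted_keys, low)
    end = bisect.bisect_right(sorted_keys, high)
    return sorted_keys[start:end]


in_range = range_lookup(keys, 10, 30)   # ordered structure answers range queries
exact = hash_index[42]                  # hash structure answers exact matches

print(in_range)  # [15, 16, 23]
print(exact)     # 5
```

The ordered structure finds a key's neighbors without extra work, which is the locality the slide refers to; the hash structure has no notion of neighboring keys.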
Syntax of Index (2)
  Displaying index information:
    SHOW INDEX FROM table_name;
  Deletion:
    DROP INDEX index_name ON table_name;
Summary
  - What is an index? A data structure; a way of sorting a number of records
  - Why is it needed? Advantages and disadvantages
  - When should it be used? For finding data
  - Types of indexes: clustered and non-clustered, unique and non-unique
  - Syntax: creation, display, deletion
