Indexes are data structures that improve retrieval speed for data in a database. They work by sorting field values and storing pointers to records, allowing for faster searching. Indexes should be used on fields involved in searches, joins, or with high cardinality. There are different types of indexes including clustered, non-clustered, unique, non-unique, bitmap and full text. Indexes are created using SQL commands and their information can be displayed and deleted as needed.
Overview and importance of indexing in data management, types of indexes included.
Definition of index, advantages for data retrieval efficiency. Disadvantages include additional disk space and potential slowdowns.
Comparison of data storage size, records, and index efficiencies with examples: unindexed linear search vs indexed binary search.
Guidelines for effective use of indexing to enhance speed of data retrieval while noting limitations such as slower updates.
Importance of indexing on fields involved in table joins, providing examples of SQL queries and indexing methods.
Strategies for creating composite indexes, including limitations on effectiveness, and combining multiple fields to optimize queries.
Different types of indexes: clustered, non-clustered, unique, bitmap, sparse, full-text, and spatial indexes.
Syntax for creating, displaying, and deleting indexes, explaining methods such as BTree and Hash for access.
A concluding summary covering the definition, necessity, application timing, types of indexes, and syntax.
INDEXING
- What is an index?
- Why is it needed?
- When should it be used?
- Types of indexes
3.
What is an index?
- a data structure
- a way of sorting
- holds the field value and a pointer to the record it relates to
4.
Why is an index needed?
Advantage: speeds up retrieval of data
- without an index: linear search (N = number of records)
  - key (unique value): N/2 on average
  - non-key: N
- using an index: binary search, log2 N
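The gap between the two strategies can be seen in a small sketch. The function names and the sample key list are illustrative assumptions, not part of the slides, and the code counts comparisons in memory rather than disk block reads.

```python
def linear_search(values, target):
    """Scan values in order; return (position, comparisons made)."""
    comparisons = 0
    for position, value in enumerate(values):
        comparisons += 1
        if value == target:
            return position, comparisons
    return -1, comparisons


def binary_search(sorted_values, target):
    """Halve the search range each step; return (position, comparisons made)."""
    low, high = 0, len(sorted_values) - 1
    comparisons = 0
    while low <= high:
        mid = (low + high) // 2
        comparisons += 1
        if sorted_values[mid] == target:
            return mid, comparisons
        if sorted_values[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return -1, comparisons


# Hypothetical data: one million sorted keys, searching for the last one.
keys = list(range(1_000_000))
linear_position, linear_comparisons = linear_search(keys, 999_999)
binary_position, binary_comparisons = binary_search(keys, 999_999)
print(linear_comparisons, binary_comparisons)
```

The linear scan touches every key before the target, while the binary search needs at most about log2 N steps.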
Ex. Without Indexing
Field name         Data type      Size on disk
id (primary key)   Unsigned INT   4 bytes
firstName          Char(50)       50 bytes
lastName           Char(50)       50 bytes
emailAddress       Char(100)      100 bytes
* char was used in place of varchar to allow for an accurate size-on-disk value
* the database contains five million rows and is unindexed
- r = 5,000,000 records, record length R = 204 bytes, block size B = 1,024 bytes
- bfr = floor(B/R) = floor(1024/204) = 5 records per disk block
- total number of blocks required: N = r/bfr = 5,000,000 / 5 = 1,000,000 blocks
- linear search for a key field: N/2 = 500,000 blocks on average; because the rows are sorted on the key, a binary search is also possible: log2 N = 19.93, rounded up to 20 blocks
- linear search for a non-key field: N = 1,000,000 blocks
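The arithmetic on this slide can be reproduced with a short sketch. The helper name `blocks_needed` is an assumption made for illustration; the figures are the ones given on the slide.

```python
import math


def blocks_needed(record_count, record_bytes, block_bytes):
    """Return (records per block, total blocks) for fixed-length records."""
    records_per_block = block_bytes // record_bytes  # blocking factor, rounded down
    total_blocks = math.ceil(record_count / records_per_block)
    return records_per_block, total_blocks


# Record length: id (4) + firstName (50) + lastName (50) + emailAddress (100)
record_bytes = 4 + 50 + 50 + 100
records_per_block, table_blocks = blocks_needed(5_000_000, record_bytes, 1024)

linear_key_reads = table_blocks // 2       # average case for a unique key
linear_non_key_reads = table_blocks        # every block must be read
binary_key_reads = math.ceil(math.log2(table_blocks))  # rows sorted on the key

print(record_bytes)          # 204
print(records_per_block)     # 5
print(table_blocks)          # 1000000
print(linear_key_reads)      # 500000
print(binary_key_reads)      # 20
```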
7.
Ex. Using Indexing
Field name         Data type   Size on disk
firstName          Char(50)    50 bytes
(record pointer)   Special     4 bytes
* pointers in MySQL are 2, 3, 4 or 5 bytes in length depending on the size of the table
- r = 5,000,000 records, index record length R = 54 bytes, block size B = 1,024 bytes
- bfr = floor(B/R) = floor(1024/54) = 18 records per disk block
- total number of blocks required to hold the index: N = r/bfr = 5,000,000 / 18 = 277,778 blocks (rounded up)
- binary search: log2 N = log2 277,778 = 18.08, rounded up to 19 blocks
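The same calculation for the index can be sketched as follows. The helper name `index_blocks` is an illustrative assumption; the sizes are those on the slide. Note that after the index entry is found, one further block read would normally be needed to fetch the record itself, which the slide does not count.

```python
import math


def index_blocks(record_count, key_bytes, pointer_bytes, block_bytes):
    """Return (index entries per block, blocks needed to hold the index)."""
    entry_bytes = key_bytes + pointer_bytes
    entries_per_block = block_bytes // entry_bytes  # rounded down
    return entries_per_block, math.ceil(record_count / entries_per_block)


# firstName is Char(50); the record pointer is assumed to be 4 bytes.
entries_per_block, index_block_count = index_blocks(5_000_000, 50, 4, 1024)
binary_search_reads = math.ceil(math.log2(index_block_count))

print(entries_per_block)     # 18
print(index_block_count)     # 277778
print(binary_search_reads)   # 19
```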
8.
When should indexing be used?
General rule: index anything that limits the number of results you are trying to find.
- to speed up finding data
- on columns with high cardinality
- on a table that references another table
9.
When should indexing be used?
- Indexes speed up finding data but slow down inserting, deleting or updating data: not only the table must be updated but the index as well.
- Ex. an index on a bank account number is better than one on the balance.
10.
When should indexing be used?
Cardinality: the number of distinct values for a column
[Figure: binary search vs. linear search]
11.
When should indexing be used? Cardinality
Index selectivity = (number of distinct values) / (number of records)
- Ex. good selectivity: a table has 100,000 records and one of its indexed columns has 88,000 distinct values, so the selectivity of this index is 88,000 / 100,000 = 88%.
- Ex. bad selectivity: a table of 100,000 records has only 200 distinct values, so the index's selectivity is 200 / 100,000 = 0.2%. Number of records in each group = 100,000 / 200 = 500. Here a full table scan is more efficient than using the index, because much more I/O is needed to scan the index and the table repeatedly.
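The selectivity formula can be checked with a few lines of code. The function name `index_selectivity` is an assumption for illustration; the inputs are the two examples from the slide.

```python
def index_selectivity(distinct_values, total_records):
    """Selectivity = number of distinct values / number of records."""
    return distinct_values / total_records


good = index_selectivity(88_000, 100_000)
bad = index_selectivity(200, 100_000)
records_per_value = 100_000 // 200  # average group size for the bad example

print(f"{good:.1%}")        # 88.0%
print(f"{bad:.1%}")         # 0.2%
print(records_per_value)    # 500
```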
12.
When should indexing be used?
- table that references another table (join)
General rule: any fields involved in a table join should be indexed.
Ex.
CREATE TABLE newsitem (
    newsid INT PRIMARY KEY,
    newstitle VARCHAR(255),
    newscontent TEXT,
    authorid INT,
    newsdate TIMESTAMP
);
CREATE TABLE authors (
    authorid INT PRIMARY KEY,
    username VARCHAR(255),
    firstname VARCHAR(255),
    lastname VARCHAR(255)
);
SELECT newstitle, firstname, lastname
FROM newsitem n, authors a
WHERE n.authorid = a.authorid;
CREATE INDEX newsitem_authorid ON newsitem(authorid);
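The join example can be tried out in a minimal sketch. It assumes Python's built-in `sqlite3` module as a stand-in for MySQL, so the query planner output differs from what MySQL's EXPLAIN would show; the inserted rows are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE authors (
    authorid INT PRIMARY KEY,
    username VARCHAR(255),
    firstname VARCHAR(255),
    lastname VARCHAR(255)
);
CREATE TABLE newsitem (
    newsid INT PRIMARY KEY,
    newstitle VARCHAR(255),
    newscontent TEXT,
    authorid INT,
    newsdate TIMESTAMP
);
CREATE INDEX newsitem_authorid ON newsitem(authorid);
""")

# Hypothetical sample rows, only to make the join return something.
conn.execute("INSERT INTO authors VALUES (1, 'jdoe', 'Jane', 'Doe')")
conn.execute(
    "INSERT INTO newsitem VALUES (10, 'Indexes explained', 'text', 1, '2020-01-01')"
)

join_rows = conn.execute("""
    SELECT n.newstitle, a.firstname, a.lastname
    FROM newsitem n, authors a
    WHERE n.authorid = a.authorid
""").fetchall()

# Ask SQLite how it would look up news items by author; the last column
# of each plan row is a human-readable description.
lookup_plan = [
    row[-1]
    for row in conn.execute(
        "EXPLAIN QUERY PLAN SELECT newstitle FROM newsitem WHERE authorid = 1"
    )
]

print(join_rows)
print(lookup_plan)
```

The plan for the lookup by `authorid` should mention `newsitem_authorid`, showing that the index on the join column is what the engine uses to find matching rows.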
13.
When should indexing be used?
Ex.
CREATE TABLE newsitem (
    newsid INT PRIMARY KEY,
    newstitle VARCHAR(255),
    newscontent TEXT,
    authorid INT,
    newsdate TIMESTAMP
);
CREATE TABLE newsitem_categories (
    newsid INT,
    categoryid INT
);
CREATE TABLE categories (
    categoryid INT PRIMARY KEY,
    categoryname VARCHAR(255)
);
SELECT n.newstitle, c.categoryname
FROM categories c, newsitem_categories nc, newsitem n
WHERE c.categoryid = nc.categoryid AND nc.newsid = n.newsid;
These fields must be indexed:
- newsitem: newsid
- newsitem_categories: newsid
- newsitem_categories: categoryid
- categories: categoryid
newsitem.newsid and categories.categoryid are primary keys and so are already indexed; only the two newsitem_categories columns need new indexes:
CREATE INDEX newscat_news ON newsitem_categories(newsid);
CREATE INDEX newscat_cats ON newsitem_categories(categoryid);
14.
Combination on Indexing
CREATE INDEX newscat_news ON newsitem_categories(newsid);
CREATE INDEX newscat_cats ON newsitem_categories(categoryid);
Can we combine the two into one index?
CREATE INDEX news_cats ON newsitem_categories(newsid, categoryid);
Yes, but with limitations.
15.
Conjunctions in Combinations on Indexing
CREATE TABLE example (
    a int,
    b int,
    c int
);
CREATE INDEX example_index ON example(a, b, c);
- It will be used when you check against ‘a’.
- It will be used when you check against ‘a’ and ‘b’.
- It will be used when you check against ‘a’, ‘b’ and ‘c’.
- It will not be used if you check against ‘b’ and ‘c’, or if you only check ‘b’ or only check ‘c’.
- It will be used when you check against ‘a’ and ‘c’, but only for the ‘a’ column; it will not be used to check the ‘c’ column as well.
- A query against ‘a’ OR ‘b’ like this:
  SELECT a, b, c FROM example WHERE a = 1 OR b = 2;
  can use the index only for the ‘a’ condition, not for the ‘b’ condition, so the database may still have to scan all rows to evaluate ‘b’.
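The leftmost-prefix behaviour can be observed in a small experiment. This is a sketch that assumes SQLite (through Python's `sqlite3` module) rather than MySQL, so the exact plan text is SQLite's own; the helper name `plan_for` is hypothetical. In SQLite's output, SEARCH means the index is used to jump to matching entries, while SCAN means every entry is visited.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE example (a int, b int, c int)")
conn.execute("CREATE INDEX example_index ON example(a, b, c)")


def plan_for(where_clause):
    """Return SQLite's query plan description for a given WHERE clause."""
    sql = f"EXPLAIN QUERY PLAN SELECT a, b, c FROM example WHERE {where_clause}"
    return " | ".join(row[-1] for row in conn.execute(sql))


conditions = [
    "a = 1",
    "a = 1 AND b = 2",
    "a = 1 AND b = 2 AND c = 3",
    "b = 2 AND c = 3",
    "a = 1 AND c = 3",
    "a = 1 OR b = 2",
]
plans = {condition: plan_for(condition) for condition in conditions}

for condition, plan in plans.items():
    print(f"{condition:28} -> {plan}")
```

Conditions that start with the leading column ‘a’ should be reported as a SEARCH on `example_index`, while a condition on ‘b’ and ‘c’ alone falls back to a SCAN.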
16.
Types of indexes (1): Clustered and Non-clustered Indexes
Clustered index
- the order of the rows in the data pages corresponds to the order of the rows in the index
- only one per table (typically the primary key)
- faster to read than non-clustered, as data is physically stored in index order
Non-clustered index
- can be used many times per table
- quicker for insert, delete, and update operations than a clustered index
- order of rows is not important
17.
Types of indexes (2): Unique and Non-unique Indexes
Unique index
- helps maintain data integrity by ensuring that no two rows of data in a table have identical key values
- uniqueness is enforced
Non-unique index
- uniqueness is not enforced
- improves query performance by maintaining a sorted order of data values that are used frequently
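The difference can be seen in a short sketch, again assuming SQLite via Python's `sqlite3` module instead of MySQL. The `accounts` table, its columns, and the index names are hypothetical, chosen to echo the earlier bank account example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (account_number INT, balance INT)")
conn.execute("CREATE UNIQUE INDEX accounts_number ON accounts(account_number)")
conn.execute("CREATE INDEX accounts_balance ON accounts(balance)")  # non-unique

conn.execute("INSERT INTO accounts VALUES (1001, 500)")
# A repeated balance is accepted: the non-unique index enforces nothing.
conn.execute("INSERT INTO accounts VALUES (1002, 500)")

# A repeated account number violates the unique index and is rejected.
try:
    conn.execute("INSERT INTO accounts VALUES (1001, 750)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM accounts").fetchone()[0]
print(duplicate_rejected, row_count)
```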
18.
Types of indexes (3)
- Bitmap index: stores the bulk of its data as bit arrays; suited to columns whose values repeat very frequently
- Dense index: an index record appears for every search-key value in the file; the record contains the search-key value and a pointer to the actual record
- Sparse index: index records are created only for some of the records; used on the primary key
- Reverse index: reverses the key value before entering it in the index; useful for sequence numbers, where new key values monotonically increase
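The bitmap idea can be illustrated with a toy in-memory version. This is a sketch, not how any particular database stores bitmaps; the function names and the two sample columns are assumptions made for illustration. Each distinct value gets one integer used as a bit array, with bit i set when row i holds that value.

```python
def build_bitmap_index(column):
    """Map each distinct value to an int whose bit i is set if row i has it."""
    bitmaps = {}
    for row_number, value in enumerate(column):
        bitmaps[value] = bitmaps.get(value, 0) | (1 << row_number)
    return bitmaps


def rows_from_bitmap(bitmap):
    """Return the row numbers whose bits are set in the bitmap."""
    rows = []
    row_number = 0
    while bitmap:
        if bitmap & 1:
            rows.append(row_number)
        bitmap >>= 1
        row_number += 1
    return rows


# Hypothetical low-cardinality columns of a five-row table.
gender = ["F", "M", "F", "F", "M"]
status = ["active", "active", "closed", "active", "closed"]

gender_index = build_bitmap_index(gender)
status_index = build_bitmap_index(status)

# Combining two conditions is a single bitwise AND of their bitmaps.
matching_rows = rows_from_bitmap(gender_index["F"] & status_index["active"])
print(matching_rows)  # [0, 3]
```

Because each value needs only one bit per row, this layout is compact when a column has few distinct values that repeat often.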
19.
Types of indexes (4)
Full-text
- the search engine examines all of the words in every stored document as it tries to match search words supplied by the user
- supports many other types of search:
  - two words near each other
  - any word derived from a particular root (for example run, ran, or running)
  - multiple words with distinct weightings
  - a word or phrase close to the search word or phrase
Spatial
- allows users to treat data within a data store as existing within a two-dimensional context
- an extended index that allows you to index a spatial column; a spatial column is a table column that contains data of a spatial data type, such as geometry or geography
20.
Syntax of Index (1)
Creation:
CREATE [UNIQUE|FULLTEXT|SPATIAL] INDEX index_name [index_type]
    ON tbl_name (index_col_name, ...) [index_type]
index_col_name: col_name [(length)] [ASC | DESC]
index_type: USING {BTREE | HASH}
21.
Access Method
BTree:
- keys have some locality of reference
- keys can be sorted well
- neighborhood: expect that a query for a given key will likely be followed by a query for one of its neighbors
Hash:
- dataset is extremely large
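The practical difference between the two access methods can be sketched with in-memory stand-ins: a sorted list searched with `bisect` plays the role of an ordered (BTree-like) index, and a `dict` plays the role of a hash index. The keys and the helper name `range_lookup` are illustrative assumptions; real database indexes are far more elaborate.

```python
import bisect

keys = [17, 3, 42, 8, 23, 15]

# Ordered index: keys kept sorted, so neighbors sit next to each other.
ordered_index = sorted(keys)

# Hash index: direct lookup by exact key, with no ordering between keys.
hash_index = {key: f"row-for-{key}" for key in keys}


def range_lookup(sorted_keys, low, high):
    """Return all keys with low <= key <= high using two binary searches."""
    start = bisect.bisect_left(sorted_keys, low)
    end = bisect.bisect_right(sorted_keys, high)
    return sorted_keys[start:end]


keys_in_range = range_lookup(ordered_index, 8, 23)
exact_match = hash_index[42]

print(keys_in_range)  # [8, 15, 17, 23]
print(exact_match)    # row-for-42
```

The ordered structure answers range and neighbor queries cheaply, which matches the locality points above, while the hash structure only answers exact-key lookups.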
22.
Syntax of Index (2)
Displaying index information: SHOW INDEX FROM table_name
Deletion: DROP INDEX index_name ON table_name
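The create, display, and delete cycle can be walked through in a runnable sketch. It assumes SQLite via Python's `sqlite3` module, where the syntax differs from the MySQL forms above: `PRAGMA index_list` stands in for SHOW INDEX, and DROP INDEX takes no ON clause. The helper name `index_names` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE newsitem_categories (newsid INT, categoryid INT)")

# Creation (same form as in MySQL for a plain index).
conn.execute("CREATE INDEX newscat_news ON newsitem_categories(newsid)")


def index_names(table_name):
    """List index names for a table; SQLite's analogue of SHOW INDEX."""
    return [row[1] for row in conn.execute(f"PRAGMA index_list({table_name})")]


indexes_before_drop = index_names("newsitem_categories")

# Deletion. In MySQL: DROP INDEX newscat_news ON newsitem_categories
conn.execute("DROP INDEX newscat_news")
indexes_after_drop = index_names("newsitem_categories")

print(indexes_before_drop)  # ['newscat_news']
print(indexes_after_drop)   # []
```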
23.
Summary
- What is an index? A data structure; a way of sorting a number of records
- Why is it needed? Advantages and disadvantages
- When should it be used? For finding data
- Types of indexes: clustered & non-clustered, unique & non-unique
- Syntax: creation, display, deletion