CS 542 Database Management Systems
Controlling Database Integrity and Performance
J Singh
January 31, 2011
Today’s Topics
- Database Integrity
  - Primary Key Constraints – Prevent Duplicates
  - Foreign Key Constraints – Prevent Dangling References
  - Attribute Constraints – Prevent Inconsistent Attribute Values
  - Tuple Constraints – More vigilant checking of attribute values
  - Assertions – Paranoid integrity checking
- Views
- Performance Topics
  - Indexes
- Discussion of presentation topic proposals
Primary Key Constraints
What are Primary Keys good for?
- Uniquely identify the subject of each tuple
- Ensure that there are no duplicates
- Cannot be NULL – that would imply a NULL subject
- A table may not have more than one primary key
- A primary key may consist of one or more columns
- Multiple unique keys are OK
For table R, <P1, P2, …, Pm> together constitute a primary key if
- no two tuples in R have the same values for <P1, P2, …, Pm>, and
- P1, P2, …, Pm are non-null in every tuple
<U1, U2, …, Um> together constitute a unique key if
- no two tuples in R have the same values for <U1, U2, …, Um>,
- but U1, U2, …, Um can be NULL
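A minimal sketch of these rules, using Python's built-in sqlite3 module as a stand-in DBMS. The CountryLanguage table and its ISOCode column are hypothetical. One SQLite quirk matters here: unlike the SQL standard, SQLite lets a non-INTEGER primary-key column hold NULL unless NOT NULL is declared, so the sketch declares it explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: (CountryCode, Language) is a composite primary key and
# ISOCode is a unique key. NOT NULL is spelled out because SQLite, unlike the
# SQL standard, otherwise accepts NULL in non-INTEGER primary-key columns.
conn.execute("""
    CREATE TABLE CountryLanguage (
        CountryCode CHAR(3) NOT NULL,
        Language    TEXT    NOT NULL,
        ISOCode     TEXT    UNIQUE,
        PRIMARY KEY (CountryCode, Language)
    )""")


def try_insert(row):
    """Return True if the row is accepted, False if a key constraint rejects it."""
    try:
        conn.execute("INSERT INTO CountryLanguage VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    try_insert(("FIN", "Finnish", "fi")),  # accepted
    try_insert(("FIN", "Swedish", None)),  # accepted: a unique key may be NULL
    try_insert(("NLD", "Dutch", None)),    # accepted: two NULLs do not collide
    try_insert(("FIN", "Finnish", "xx")),  # rejected: duplicate primary key
    try_insert(("NLD", None, "nl")),       # rejected: NULL in a primary-key column
    try_insert(("SWE", "Swedish", "fi")),  # rejected: duplicate unique key
]
print(results)  # [True, True, True, False, False, False]
```

The second and third rows show why a unique key is weaker than a primary key: SQL does not treat two NULLs as equal, so several tuples may leave the unique key empty.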
Foreign Key Constraints (p1)
Main Idea: Prevent Dangling Tuples
[Figure: a Foreign Key pointing to a Key Reference]
Foreign Key
- Must point to a Key Reference
    CREATE TABLE City (
        …
        CountryCode char(3) REFERENCES Country(Code)
    )
Key Reference
- Must be a unique or primary key
Try: INSERT INTO city (Name, CountryCode) VALUES ('xyzzy', 'XYZ');
Try: UPDATE city SET CountryCode = 'XYZ' WHERE CountryCode = 'FIN';
Both are rejected: the key reference must already exist before a referencing tuple can be added or changed to point to it
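The two Try statements can be replayed with Python's sqlite3 module, as a sketch under assumptions: trimmed, hypothetical City and Country tables (including the made-up country 'Xyzzyland'), and SQLite standing in for the course's DBMS. SQLite only enforces foreign keys after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE Country (Code CHAR(3) PRIMARY KEY, Name TEXT)")
conn.execute("""CREATE TABLE City (
                    Name        TEXT,
                    CountryCode CHAR(3) REFERENCES Country(Code))""")
conn.execute("INSERT INTO Country VALUES ('FIN', 'Finland')")
conn.execute("INSERT INTO City VALUES ('Helsinki', 'FIN')")


def attempt(sql):
    """Run one statement; report whether the foreign key let it through."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"


outcomes = [
    attempt("INSERT INTO City (Name, CountryCode) VALUES ('xyzzy', 'XYZ')"),
    attempt("UPDATE City SET CountryCode = 'XYZ' WHERE CountryCode = 'FIN'"),
]
conn.execute("INSERT INTO Country VALUES ('XYZ', 'Xyzzyland')")  # create the key reference first
outcomes.append(attempt("INSERT INTO City (Name, CountryCode) VALUES ('xyzzy', 'XYZ')"))
print(outcomes)  # ['rejected', 'rejected', 'ok']
```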
Foreign Key Constraints (p2)
Alternative methods of defining a foreign key:
- On the column itself:
    CREATE TABLE City (CountryCode char(3) REFERENCES COUNTRY(Code), …)
- As a separate table constraint:
    CREATE TABLE City (CountryCode char(3), …,
        [CONSTRAINT [ctyREFcntry]] FOREIGN KEY (CountryCode) REFERENCES COUNTRY(Code))
- After the table exists:
    CREATE TABLE City (CountryCode char(3), …)
  Then, later,
    ALTER TABLE City ADD [CONSTRAINT [ctyREFcntry]]
        FOREIGN KEY (CountryCode) REFERENCES COUNTRY(Code);
Notation: [] signifies optional
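SQLite accepts the first two forms but has no ALTER TABLE … ADD CONSTRAINT, so this sketch (Python's sqlite3) only checks that the column form and the named table-constraint form record the same foreign key, reading it back with PRAGMA foreign_key_list. City1 and City2 are hypothetical table names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Country (Code CHAR(3) PRIMARY KEY)")
# Form 1: REFERENCES attached to the column definition.
conn.execute("CREATE TABLE City1 (Name TEXT, CountryCode CHAR(3) REFERENCES Country(Code))")
# Form 2: a named table constraint listed after the columns.
conn.execute("""CREATE TABLE City2 (
                    Name        TEXT,
                    CountryCode CHAR(3),
                    CONSTRAINT ctyREFcntry FOREIGN KEY (CountryCode) REFERENCES Country(Code))""")


def fk_summary(table):
    """(referenced table, local column, referenced column) for each foreign key.
    PRAGMA foreign_key_list rows are (id, seq, table, from, to, on_update, on_delete, match)."""
    return [(row[2], row[3], row[4]) for row in conn.execute(f"PRAGMA foreign_key_list({table})")]


print(fk_summary("City1"))  # [('Country', 'CountryCode', 'Code')]
print(fk_summary("City2"))  # [('Country', 'CountryCode', 'Code')]
```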
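Foreign Key Constraints (p3)
[Figure: a Foreign Key pointing to a Key Reference]
Referential Integrity Options, applied when a Key Reference changes
- Restrict (default): reject the request
- Cascade: propagate the change to the referencing tuples (delete them, or update their foreign key)
- Set Null: set the foreign key of the referencing tuples to NULL
Changes to Key References
Try: DELETE FROM country WHERE code = 'FIN';
Try: UPDATE country SET Code = 'XYZ' WHERE Code = 'FIN';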
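A sketch of the three options, plus a cascading update, in SQLite via Python's sqlite3 on hypothetical one-row tables; each ON DELETE or ON UPDATE clause is attached to the foreign key. SQLite's own default, NO ACTION, also rejects the change, but checks at the end of the statement rather than immediately as RESTRICT does.

```python
import sqlite3


def run(fk_action, statement):
    """Build a fresh Country/City pair whose foreign key carries fk_action,
    run one change to the key reference, and return City's rows afterwards
    (or 'rejected' if the change was refused)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE Country (Code CHAR(3) PRIMARY KEY)")
    conn.execute(f"""CREATE TABLE City (
                         Name        TEXT,
                         CountryCode CHAR(3) REFERENCES Country(Code) {fk_action})""")
    conn.execute("INSERT INTO Country VALUES ('FIN')")
    conn.execute("INSERT INTO City VALUES ('Helsinki', 'FIN')")
    try:
        conn.execute(statement)
    except sqlite3.IntegrityError:
        return "rejected"
    return conn.execute("SELECT Name, CountryCode FROM City").fetchall()


DELETE_FIN = "DELETE FROM Country WHERE Code = 'FIN'"
UPDATE_FIN = "UPDATE Country SET Code = 'XYZ' WHERE Code = 'FIN'"

print(run("ON DELETE RESTRICT", DELETE_FIN))  # rejected
print(run("ON DELETE CASCADE", DELETE_FIN))   # []
print(run("ON DELETE SET NULL", DELETE_FIN))  # [('Helsinki', None)]
print(run("ON UPDATE CASCADE", UPDATE_FIN))   # [('Helsinki', 'XYZ')]
```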
Foreign Key Constraints (p4)
Chicken and Egg definitions
    CREATE TABLE chicken (cID INT PRIMARY KEY,
                          eID INT REFERENCES egg(eID));
    CREATE TABLE egg (eID INT PRIMARY KEY,
                      cID INT REFERENCES chicken(cID));
Consistently fails
- Can’t define a foreign key to a table before it has been defined
Solution
- Define the tables without constraints:
    CREATE TABLE chicken (cID INT PRIMARY KEY, eID INT);
    CREATE TABLE egg (eID INT PRIMARY KEY, cID INT);
- And then add the foreign keys:
    ALTER TABLE chicken ADD CONSTRAINT c_e
        FOREIGN KEY (eID) REFERENCES egg(eID);
    ALTER TABLE egg ADD CONSTRAINT e_c
        FOREIGN KEY (cID) REFERENCES chicken(cID);
Foreign Key Constraints (p5)
Chicken and Egg insertion
    INSERT INTO chicken VALUES (1, 1001);
    INSERT INTO egg VALUES (1001, 1);
Still consistently fails: the first INSERT references egg 1001, which does not exist yet
- Need a way to postpone constraint checking
- How long to postpone? Until transaction commit
Solution
- Define the foreign keys with deferred constraint checking:
    ALTER TABLE chicken ADD CONSTRAINT c_e
        FOREIGN KEY (eID) REFERENCES egg(eID)
        DEFERRABLE INITIALLY DEFERRED;
    ALTER TABLE egg ADD CONSTRAINT e_c
        FOREIGN KEY (cID) REFERENCES chicken(cID)
        DEFERRABLE INITIALLY DEFERRED;
- And then
    INSERT INTO chicken VALUES (1, 1001);
    INSERT INTO egg VALUES (1001, 1);
    COMMIT;
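SQLite supports deferred foreign keys, so the chicken-and-egg insertion can be sketched with Python's sqlite3 module. Two assumptions: the constraints are declared inline (SQLite lacks ALTER TABLE … ADD CONSTRAINT, and it tolerates the forward reference from chicken to egg), and the connection uses isolation_level=None so that BEGIN and COMMIT are issued explicitly.

```python
import sqlite3

# isolation_level=None: the sqlite3 module issues no implicit BEGIN, so the
# transaction boundaries below are exactly the ones written out.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("""CREATE TABLE chicken (
                    cID INT PRIMARY KEY,
                    eID INT REFERENCES egg(eID) DEFERRABLE INITIALLY DEFERRED)""")
conn.execute("""CREATE TABLE egg (
                    eID INT PRIMARY KEY,
                    cID INT REFERENCES chicken(cID) DEFERRABLE INITIALLY DEFERRED)""")

conn.execute("BEGIN")
conn.execute("INSERT INTO chicken VALUES (1, 1001)")  # egg 1001 does not exist yet: no error
conn.execute("INSERT INTO egg VALUES (1001, 1)")
conn.execute("COMMIT")                                # every reference resolves: commit succeeds

conn.execute("BEGIN")
conn.execute("INSERT INTO chicken VALUES (2, 2002)")  # egg 2002 never arrives
try:
    conn.execute("COMMIT")
    outcome = "committed"
except sqlite3.IntegrityError:
    outcome = "rejected at commit"
    if conn.in_transaction:                           # undo the dangling insert if still open
        conn.execute("ROLLBACK")

print(outcome)                                           # rejected at commit
print(conn.execute("SELECT * FROM chicken").fetchall())  # [(1, 1001)]
```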
Attribute-Based Constraints
NOT NULL
- The most common
Reasonability Constraints
- Validate incoming data, e.g., population density <= 30000
- Specification:
    Population INT(11) NOT NULL CHECK (Population <= 30000 * SurfaceArea),
- The condition in CHECK(cond) can be any condition allowed in WHERE(cond), including subqueries
- The attribute constraint is checked only when the attribute is assigned
  - It can be violated underneath as long as it is not re-evaluated
  - For example, if we update SurfaceArea, the violation won’t be flagged
- Not implemented in all databases, e.g., MySQL, which as of this writing parses CHECK clauses but ignores them
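Systems such as SQLite and PostgreSQL re-check a CHECK whenever its row changes, so the “violated underneath” behavior is easiest to see in a toy model. The class below is purely illustrative (not any DBMS's API): it evaluates the condition only when the guarded attribute is assigned, and the figures are made up.

```python
class AttributeCheckedRow:
    """Toy model of an attribute-based CHECK (illustrative only, not a DBMS API):
    the condition is evaluated only when the guarded attribute is assigned."""

    def __init__(self, row, guarded, check):
        self.guarded = guarded  # the attribute that carries the CHECK
        self.check = check      # predicate over the whole row
        if not check(row):      # an INSERT assigns every attribute, so the CHECK runs
            raise ValueError(f"CHECK on {guarded} violated")
        self.row = dict(row)

    def update(self, name, value):
        new_row = dict(self.row, **{name: value})
        # Only an assignment to the guarded attribute re-runs the CHECK.
        if name == self.guarded and not self.check(new_row):
            raise ValueError(f"CHECK on {self.guarded} violated")
        self.row = new_row


def density_ok(row):
    return row["Population"] <= 30000 * row["SurfaceArea"]


# Made-up figures for illustration.
country = AttributeCheckedRow({"Population": 5_000_000, "SurfaceArea": 338_000},
                              guarded="Population", check=density_ok)
country.update("SurfaceArea", 1)  # not checked: the constraint is now silently violated
print(density_ok(country.row))    # False

try:
    country.update("Population", 5_000_001)  # assigning Population re-runs the CHECK
    message = "accepted"
except ValueError as err:
    message = str(err)
print(message)                    # CHECK on Population violated
```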
Tuple-Based Constraints
Validate the entire tuple whenever anything in that tuple is inserted or updated
- More integrity enforcement than with attribute-based constraints, e.g., population density <= 30000
- Specification (a separate element of CREATE TABLE, not attached to one attribute):
    Population INT(11) NOT NULL,
    CHECK (Population <= 30000 * SurfaceArea),
- The condition in CHECK(cond) can be any condition allowed in WHERE(cond), including subqueries
- The tuple constraint is checked whenever the tuple is updated
  - If we update SurfaceArea, the violation will be flagged
  - But a violation of
        CHECK (Population >= (SELECT SUM(Population)
                              FROM City WHERE City.CountryCode = Code))
    which specifies a subquery involving another table, will not be flagged when City changes: the check runs only when the Country tuple itself changes
- Not implemented in all databases, e.g., MySQL
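SQLite evaluates every CHECK whenever a row is inserted or updated, which matches the tuple-based behavior above. A sketch via Python's sqlite3 with a trimmed, hypothetical Country table (the codes AAA and BBB are made up). SQLite forbids subqueries in CHECK constraints, so the cross-table example cannot be written there at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Table-level CHECK over two attributes of the same tuple.
conn.execute("""CREATE TABLE Country (
                    Code        CHAR(3) PRIMARY KEY,
                    Population  INTEGER NOT NULL,
                    SurfaceArea REAL    NOT NULL,
                    CHECK (Population <= 30000 * SurfaceArea))""")


def attempt(sql, params=()):
    """Run one statement; report whether the CHECK let it through."""
    try:
        conn.execute(sql, params)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"


results = [
    attempt("INSERT INTO Country VALUES (?, ?, ?)", ("AAA", 1000, 1.0)),   # density 1000
    attempt("INSERT INTO Country VALUES (?, ?, ?)", ("BBB", 40000, 1.0)),  # density 40000
    attempt("UPDATE Country SET SurfaceArea = 0.01 WHERE Code = 'AAA'"),   # density would be 100000
]
print(results)  # ['ok', 'rejected', 'rejected']
```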
Assertions
Validate the entire database whenever anything in the database is updated
- Part of the database, not any specific table
- Specification: table-like
    CREATE ASSERTION CountryPop CHECK (
        NOT EXISTS
            (SELECT * FROM Country
             WHERE Population <
                 (SELECT SUM(Population)
                  FROM City WHERE City.CountryCode = Code)))
- Difficult to implement efficiently
  - Often not implemented
  - I don’t know of any implementations
  - Can be implemented for specific cases using Triggers, see Section 7.5
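As a sketch of the trigger route, the SQLite trigger below (run through Python's sqlite3) covers one specific case of CountryPop: inserting a City row. The trigger name and all figures are hypothetical, and updates to City or Country would need triggers of their own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Country (Code CHAR(3) PRIMARY KEY, Population INTEGER);
    CREATE TABLE City (Name TEXT, CountryCode CHAR(3), Population INTEGER);

    -- After a City row is inserted, re-check its country's total.
    CREATE TRIGGER CountryPopOnCityInsert AFTER INSERT ON City
    WHEN (SELECT Population FROM Country WHERE Code = NEW.CountryCode)
         < (SELECT SUM(Population) FROM City WHERE CountryCode = NEW.CountryCode)
    BEGIN
        SELECT RAISE(ABORT, 'CountryPop violated');  -- undoes the offending INSERT
    END;

    INSERT INTO Country VALUES ('FIN', 5000000);
""")


def add_city(name, population):
    """Insert a Finnish city; return 'ok' or the trigger's error message."""
    try:
        conn.execute("INSERT INTO City VALUES (?, 'FIN', ?)", (name, population))
        return "ok"
    except sqlite3.DatabaseError as err:
        return str(err)


outcomes = [add_city("Helsinki", 600000), add_city("Espoo", 300000), add_city("Bogus", 10_000_000)]
print(outcomes)
print(conn.execute("SELECT COUNT(*) FROM City").fetchone())  # (2,)
```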
Views
Also called Virtual Views
- Don’t actually exist in the database, but behave as if they do
- Can be subsets of the data or joins – actually, arbitrary queries
Subset example:
    CREATE VIEW ct AS
        SELECT c.Name AS nm, c.countrycode AS cntry
        FROM city c WHERE population > 0;
Join example:
    CREATE VIEW CityLanguage AS
        SELECT city.name, city.countrycode, lang.language AS Language
        FROM city, countrylanguage AS lang
        WHERE city.countrycode = lang.countrycode
          AND lang.isOfficial = 'T';
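Both views can be created in SQLite through Python's sqlite3 module; the base tables are trimmed and the rows are made up. The last step shows that a view holds no rows of its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE city (Name TEXT, CountryCode CHAR(3), Population INTEGER);
    CREATE TABLE countrylanguage (CountryCode CHAR(3), Language TEXT, IsOfficial CHAR(1));

    CREATE VIEW ct AS
        SELECT c.Name AS nm, c.countrycode AS cntry
        FROM city c WHERE population > 0;

    CREATE VIEW CityLanguage AS
        SELECT city.name, city.countrycode, lang.language AS Language
        FROM city, countrylanguage AS lang
        WHERE city.countrycode = lang.countrycode
          AND lang.isOfficial = 'T';

    -- Made-up rows for illustration.
    INSERT INTO city VALUES ('Amsterdam', 'NLD', 700000), ('Ghost Town', 'NLD', 0);
    INSERT INTO countrylanguage VALUES ('NLD', 'Dutch', 'T'), ('NLD', 'Frisian', 'F');
""")

print(conn.execute("SELECT * FROM ct").fetchall())                   # [('Amsterdam', 'NLD')]
print(conn.execute("SELECT COUNT(*) FROM CityLanguage").fetchone())  # (2,): official language only

# A view stores no rows of its own, so a change to a base table shows up at once.
conn.execute("INSERT INTO city VALUES ('Rotterdam', 'NLD', 600000)")
print(conn.execute("SELECT COUNT(*) FROM CityLanguage").fetchone())  # (3,)
```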
Operations on Views (p1)
SELECT
    SELECT * FROM CityLanguage WHERE Language = 'Dutch';
- Shouldn’t “temporarily” create the view’s table and SELECT from it
- Should substitute the definition of CityLanguage into the query, i.e.,
    SELECT *
    FROM (SELECT …
          FROM city, countrylanguage AS lang
          WHERE city.countrycode = lang.countrycode
            AND lang.isOfficial = 'T') AS CityLanguage
    WHERE Language = 'Dutch';
- The optimizer then sees a single query and can, e.g., apply the Language filter before the join
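A quick sketch of the substitution in SQLite via Python's sqlite3, with made-up rows: querying the view and querying its definition as a derived table return the same result. The SELECT list inside the derived table is spelled out here from the view definition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE city (Name TEXT, CountryCode CHAR(3));
    CREATE TABLE countrylanguage (CountryCode CHAR(3), Language TEXT, IsOfficial CHAR(1));
    CREATE VIEW CityLanguage AS
        SELECT city.name, city.countrycode, lang.language AS Language
        FROM city, countrylanguage AS lang
        WHERE city.countrycode = lang.countrycode
          AND lang.isOfficial = 'T';

    -- Made-up rows for illustration.
    INSERT INTO city VALUES ('Amsterdam', 'NLD'), ('Brussels', 'BEL'), ('Helsinki', 'FIN');
    INSERT INTO countrylanguage VALUES
        ('NLD', 'Dutch', 'T'), ('BEL', 'Dutch', 'T'), ('BEL', 'French', 'T'), ('FIN', 'Finnish', 'T');
""")

via_view = conn.execute("SELECT * FROM CityLanguage WHERE Language = 'Dutch'").fetchall()

# The same query with the view's definition substituted in as a derived table.
inlined = conn.execute("""
    SELECT *
    FROM (SELECT city.name, city.countrycode, lang.language AS Language
          FROM city, countrylanguage AS lang
          WHERE city.countrycode = lang.countrycode
            AND lang.isOfficial = 'T') AS CityLanguage
    WHERE Language = 'Dutch'""").fetchall()

print(sorted(via_view) == sorted(inlined), len(via_view))  # True 2
```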
Operations on Views (p2)
UPDATE and INSERT through a view are not always possible
- Modifications are permitted when the view is derived from a single table R and
  - The WHERE clause does not involve R in a subquery
  - The FROM clause consists of only one occurrence of R
  - The values of all attributes not specified in the view definition can be “manufactured” by the database (NULL or a default value)
- Other cases can sometimes be implemented using INSTEAD OF triggers
Example. For the view ct
    CREATE VIEW ct AS
        SELECT c.Name AS nm, c.countrycode AS cntry
        FROM city c WHERE population > 0;
  the query
    INSERT INTO ct (nm, cntry) VALUES ('FirSPA', 'FIN');
  can be automatically rewritten as
    INSERT INTO city (Name, CountryCode) VALUES ('FirSPA', 'FIN');
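SQLite does not perform this rewrite automatically: its views are read-only, and an INSERT into one fails unless an INSTEAD OF trigger handles it. The sketch below (Python's sqlite3) writes the rewrite by hand; the trigger name ct_insert is hypothetical, and the DEFAULT 0 on Population is an assumption standing in for the value the database “manufactures”.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed default: the value the database manufactures for Population.
    CREATE TABLE city (Name TEXT, CountryCode CHAR(3), Population INTEGER DEFAULT 0);

    CREATE VIEW ct AS
        SELECT c.Name AS nm, c.countrycode AS cntry
        FROM city c WHERE population > 0;

    -- SQLite views are read-only, so the rewrite is spelled out by hand.
    CREATE TRIGGER ct_insert INSTEAD OF INSERT ON ct
    BEGIN
        INSERT INTO city (Name, CountryCode) VALUES (NEW.nm, NEW.cntry);
    END;
""")

conn.execute("INSERT INTO ct (nm, cntry) VALUES ('FirSPA', 'FIN')")
print(conn.execute("SELECT * FROM city").fetchall())  # [('FirSPA', 'FIN', 0)]
print(conn.execute("SELECT * FROM ct").fetchall())    # []
```

The new tuple gets Population 0, fails the view's own WHERE condition, and so is invisible through ct even though the insert succeeded.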
Top-Down Datalog Recursion Revisited
IDBs are conceptualized (and implemented) as Views
    for IDB predicate p(x, y, …)
        FOR EACH subgoal of p DO
            IF subgoal is IDB, recursive call;
            IF subgoal is EDB, look up
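A Python sketch of the top-down scheme for a hypothetical transitive-closure program: path is the IDB predicate and edge the EDB relation. The visiting set is an addition not in the pseudocode; without it, the recursion would not terminate on cyclic data.

```python
# Hypothetical EDB relation: edge(X, Y) facts, including a cycle a -> b -> c -> a.
EDB = {"edge": {("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")}}


def path_from(x, visiting=frozenset()):
    """Answer the IDB query path(x, Y) top-down for the rules
         path(X, Y) :- edge(X, Y).
         path(X, Y) :- edge(X, Z), path(Z, Y).
    `visiting` holds the nodes already on the call stack, so cycles terminate."""
    visiting = visiting | {x}
    answers = set()
    for (src, dst) in EDB["edge"]:      # subgoal edge(X, Z) is EDB: look it up
        if src != x:
            continue
        answers.add(dst)                # first rule: edge(X, Y) alone suffices
        if dst not in visiting:         # subgoal path(Z, Y) is IDB: recursive call
            answers |= path_from(dst, visiting)
    return answers


print(sorted(path_from("a")))  # ['a', 'b', 'c', 'd']
print(sorted(path_from("d")))  # []
```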
Indexes
Main Idea: Data Structures for Fast Search
Motivation: avoid linear searches through a big table
- Example query:
    SELECT * FROM City WHERE CountryCode = 'FIN';
- Another:
    SELECT * FROM City
    WHERE Population > (0.4 * (SELECT Population FROM Country
                               WHERE CountryCode = Code));
- Expected time without indexes: O(n) for the first example; O(n²) for the second, since the subquery scans Country once per City tuple
Declaration
    CREATE INDEX CityIndex ON City(CountryCode);
    CREATE INDEX CityPopIndex ON City(Population);
    CREATE INDEX CountryPopIndex ON Country(Population);
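SQLite's EXPLAIN QUERY PLAN (here via Python's sqlite3) shows the difference an index makes for the first example query: a full scan before CityIndex exists, an index search after. The exact wording of the plan varies between SQLite versions, so the output is not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE City (Name TEXT, CountryCode CHAR(3), Population INTEGER)")
QUERY = "SELECT * FROM City WHERE CountryCode = 'FIN'"


def plan(sql):
    """SQLite's plan for sql: the detail column of each EXPLAIN QUERY PLAN row."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


before = plan(QUERY)  # a full scan of City
conn.execute("CREATE INDEX CityIndex ON City(CountryCode)")
after = plan(QUERY)   # a search through CityIndex
print(before)
print(after)
```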
Selection of Indexes (p1)
Why not create an index for every attribute?
Useful indexes, and not so useful ones:
- Primary key?
- Unique key?
- From previous examples: CityIndex? CityPopIndex? CountryPopIndex?
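One data point for the primary-key and unique-key questions: SQLite, like many systems, already builds an index for each PRIMARY KEY and UNIQUE constraint, so an explicit index on those columns would usually be redundant. A sketch via Python's sqlite3 with a trimmed Country table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Country (
                    Code       CHAR(3) PRIMARY KEY,
                    Name       TEXT UNIQUE,
                    Population INTEGER)""")


def index_names(table):
    """Names of the indexes SQLite keeps on table (second column of PRAGMA index_list)."""
    return sorted(row[1] for row in conn.execute(f"PRAGMA index_list({table})"))


print(index_names("Country"))  # two automatic indexes, one per key
conn.execute("CREATE INDEX CountryPopIndex ON Country(Population)")
print(index_names("Country"))  # the two automatic ones plus CountryPopIndex
```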
Selection of Indexes (p2)
The Mantra:
- Don’t define indexes too early: know your workload first
- Be as empirical as is practical
The Greedy approach to index selection:
- Start with no indexes
- Evaluate candidate indexes, choose the one potentially most effective
- Repeat until no remaining candidate improves the workload
Query execution will take advantage of defined indexes
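A sketch of the greedy loop in Python. The cost model is entirely hypothetical (invented per-query and per-update costs); in practice, following the mantra, the estimates would come from the optimizer or from measuring the real workload. The loop stops when no remaining candidate lowers the estimated cost.

```python
# Hypothetical workload model, with invented costs: each query is the set of
# candidate indexes that could serve it; a served query costs 5, an unserved
# one 100 (a scan), and every chosen index adds 3 per update to maintain it.
QUERIES = [{"CityIndex"}, {"CityIndex"}, {"CityPopIndex"}]
UPDATES = 10


def workload_cost(indexes):
    query_cost = sum(5 if q & indexes else 100 for q in QUERIES)
    return query_cost + 3 * UPDATES * len(indexes)


def greedy_index_selection(candidates, cost):
    """Start with no indexes; repeatedly add the candidate that lowers the
    estimated cost the most; stop when no remaining candidate lowers it."""
    chosen = frozenset()
    best = cost(chosen)
    remaining = set(candidates)
    while remaining:
        new_cost, pick = min((cost(chosen | {c}), c) for c in remaining)
        if new_cost >= best:
            break
        chosen, best = chosen | {pick}, new_cost
        remaining.discard(pick)
    return chosen, best


chosen, total = greedy_index_selection(
    ["CityIndex", "CityPopIndex", "CountryPopIndex"], workload_cost)
print(sorted(chosen), total)  # ['CityIndex', 'CityPopIndex'] 75
```

With these invented numbers, CountryPopIndex is never chosen: no query in the model uses it, so it only adds maintenance cost.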
CS 542 Database Management Systems
Report Proposals
J Singh
January 31, 2011
Report Proposals – General Observations
Simply Impressive!
Corrective Themes
- When in doubt, prefer depth over breadth
- Tilt the balance toward obtaining and working with real data
- Focus on your contributions
- Separate the report from the project
  - If your intent in the project is to do a significant piece of development, make the report about the design
  - Go light on implementation; a toy application is good for getting your feet wet, but leave the heavy lifting for the project
- For big papers, don’t try to swallow them whole. Take a piece and focus on that.
Next meeting
February 7
Index Structures, Chapter 14