Terms

   ‡    Population: the totality of all possible values (measurements or counts) of a particular
        characteristic for a specified group of objects

   ‡    Sample: part of a population selected according to some rule or plan

   ‡    Parameter: a descriptive property of a population

   ‡    Statistic: any numerical value describing a characteristic of a sample

   ‡    Sampling: the process of choosing a representative portion of a population (reading assignment:
        SAMPLING METHODS)

   ‡    Statistical Method: a procedure used in the collection, presentation and analysis of data

STATISTICS

           -     the presentation and interpretation of chance outcomes that occur in a planned or scientific
                 investigation

           -     deals with either NUMERICAL DATA representing COUNTS or MEASUREMENTS, or
                 CATEGORICAL DATA that can be classified according to some criterion

           -     looks at TRENDS and patterns in the data

Uses of Statistics

   1. Measuring probability and predicting odds

   2. Maintaining quality: use a statistic as a basis or benchmark

   3. Verifying claims

   4. Predicting outcomes (interpolation)

   5. Verifying correlations

2 Major Categories of Statistical Methods

   1. DESCRIPTIVE STATISTICS: collecting and describing a set of data; no inferences or conclusions
      are drawn about a larger set of data

   2. INFERENTIAL STATISTICS: analyzing a subset of data, leading to predictions or inferences about the
      entire set of data; a sample is used to gauge the behaviour of the population

        NOTE: A statistical inference is subject to uncertainty
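
The terms above can be illustrated with a minimal sketch. The population here is made-up data (hypothetical exam scores), and simple random sampling is only one possible sampling rule; the point is just to show a parameter, a sample, and a statistic side by side.

```python
import random
import statistics

# Hypothetical population: exam scores of 1,000 students (made-up data).
rng = random.Random(0)  # fixed seed so the sketch is repeatable
population = [rng.randint(50, 100) for _ in range(1000)]

# Parameter: a descriptive property of the whole population.
population_mean = statistics.mean(population)

# Sample: part of the population chosen by a rule (here, simple random sampling).
sample = rng.sample(population, 30)

# Statistic: a numerical value describing the sample.
sample_mean = statistics.mean(sample)

# Inference: the sample mean is used to gauge the population mean.
# The two values generally differ, which is why an inference is uncertain.
print(f"parameter (population mean): {population_mean:.2f}")
print(f"statistic (sample mean):     {sample_mean:.2f}")
```
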
Introduction to Notations

 If variable X is the variable of interest, and n measurements are taken, then the notation X1, X2, X3, ...,
 Xn will be used to represent the n observations.

 Sigma (Σ) indicates summation.

 Summation Notation

 If variable X is the variable of interest, and n measurements are taken, the sum of the n observations can be written
 as

     $\sum_{i=1}^{n} X_i = X_1 + X_2 + X_3 + \dots + X_n$

 THEOREMS:

     1.




2.                                                        3.
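
 The statements of the three theorems did not survive in these notes. The block below is an assumption, not a recovery of the original: it lists the three summation rules that introductory texts usually state at this point. Here c is a constant, and Y and Z are additional variables introduced only for illustration.

```latex
\begin{align}
\sum_{i=1}^{n} (X_i + Y_i + Z_i) &= \sum_{i=1}^{n} X_i + \sum_{i=1}^{n} Y_i + \sum_{i=1}^{n} Z_i \\
\sum_{i=1}^{n} c X_i &= c \sum_{i=1}^{n} X_i \\
\sum_{i=1}^{n} c &= n c
\end{align}
```
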
MEASURES

   ‡   Measures of Central Tendency
         ± Mean
         ± Median
         ± Mode
   ‡   Measures of Variability and Dispersion
         ± Range
         ± Average deviation
         ± Variance
         ± Standard deviation

Measures of Central Tendency
MEAN
  ‡ The sum of all values of the observations divided by the total number of observations
  ‡ The sum of all scores divided by the total frequency

     $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$


   Properties
   ‡ The most stable measure of central tendency
   ‡ Can be affected by extreme values
   ‡ Its value may not be an actual value in the data set
   ‡ If a constant c is added to/subtracted from all values, the new mean will increase/decrease by the same
      amount c
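
A minimal sketch of the mean and three of its properties, using made-up scores. The function name `mean` and the data are illustrative assumptions only.

```python
def mean(values):
    """Arithmetic mean: sum of the observations divided by their count."""
    if not values:
        raise ValueError("mean of an empty data set is undefined")
    return sum(values) / len(values)

scores = [4, 8, 15, 16, 23, 42]   # hypothetical scores
base = mean(scores)               # 108 / 6

# The mean need not be one of the observed values.
print(base, base in scores)       # 18.0 False

# Adding a constant c to every score shifts the mean by c.
c = 10
shifted = mean([x + c for x in scores])
print(shifted - base)             # 10.0

# A single extreme value pulls the mean noticeably.
print(mean(scores + [1000]))
```
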

MEDIAN
  ‡ Positional middle of an array of data
  ‡ Divides ranked values into halves with 50% larger than and 50% smaller than the median value.




   Properties
   ‡ The median is a positional measure
   ‡ Can be determined only if the data are arranged in order
   ‡ Its value may not be an actual value in the data set
   ‡ It is affected by the position of items in the series but not by the value of each item
   ‡ Less affected by extreme values than the mean
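
A minimal sketch of finding the median, assuming the usual convention: the middle value of the ranked data when n is odd, and the average of the two middle values when n is even. The function name and data are illustrative.

```python
def median(values):
    """Positional middle of the ranked data."""
    if not values:
        raise ValueError("median of an empty data set is undefined")
    ordered = sorted(values)          # the data must be arranged in order first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]           # odd n: the single middle value
    # even n: average of the two middle values (may not be an actual data value)
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([7, 1, 5, 3, 9]))        # 5
print(median([7, 1, 5, 3]))           # 4.0

# Replacing the largest value with an extreme one leaves the median unchanged.
print(median([7, 1, 5, 3, 9000]))     # 5
```
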
MODE
  ‡ Value that occurs most frequently in the data set
  ‡ Locates the point where scores occur with the greatest density
  ‡ Less popular compared to mean and median measures
  Properties
  ‡ It may not exist, or if it does, it may not be unique
  ‡ Not affected by extreme values
  ‡ Applicable for both qualitative and quantitative data
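
A minimal sketch of finding the mode. It assumes one common convention: when every distinct value occurs equally often, the data set is treated as having no mode. The function name `modes` is illustrative.

```python
from collections import Counter

def modes(values):
    """Return every value tied for the highest frequency.

    Returns [] when all distinct values occur equally often (no mode),
    which is a convention assumed for this sketch.
    """
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return [value for value, c in counts.items() if c == highest]

print(modes([1, 2, 2, 3, 3, 4]))        # [2, 3]  (not unique)
print(modes([1, 2, 3]))                 # []      (does not exist)
print(modes(["red", "blue", "red"]))    # ['red'] (qualitative data)
print(modes([1, 2, 2, 3, 100]))         # [2]     (extreme value has no effect)
```
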

Measures of Variability and Dispersion
RANGE
   ‡ Measure of the distance along the number line over which the data exist
   ‡ Exclusive and inclusive range
          ± Exclusive range = largest score - smallest score
          ± Inclusive range = upper limit - lower limit
   Properties
   ‡ Rough and general measure of dispersion
   ‡ Largest and smallest extreme values determine the range
   ‡ Does not describe distribution of values within the upper and lower extremes
   ‡ Does not depend on the number of observations
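
A minimal sketch of both ranges. The notes do not define the upper and lower limits; this sketch assumes they are the real limits of the scores, lying half a measurement unit beyond the largest and smallest scores. The `unit` parameter is a hypothetical name for that measurement unit.

```python
def exclusive_range(scores):
    """Largest score minus smallest score."""
    return max(scores) - min(scores)

def inclusive_range(scores, unit=1):
    """Upper limit minus lower limit.

    Assumption: the limits sit half a unit beyond the extreme scores.
    """
    upper_limit = max(scores) + unit / 2
    lower_limit = min(scores) - unit / 2
    return upper_limit - lower_limit

scores = [12, 15, 15, 18, 20]
print(exclusive_range(scores))    # 8
print(inclusive_range(scores))    # 9.0

# Same extremes, very different distribution, same range.
print(exclusive_range([12, 12, 12, 12, 20]))    # 8
```
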

ABSOLUTE DEVIATION
Average of absolute deviations of scores from the mean (Mean Deviation) or the median (Median Absolute
Deviation)




   Properties
   ‡ Measures variability of values in the data set
   ‡ Indicates how compact the group is on a certain measure
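
A minimal sketch of both averages of absolute deviations, following the definition in these notes (the average of |X - center|). Note that many references and libraries use "median absolute deviation" for the median of the absolute deviations instead, so the second function name is chosen to avoid that ambiguity. The data are made up.

```python
import statistics

def mean_deviation(values):
    """Average absolute deviation of the scores from the mean."""
    center = statistics.mean(values)
    return sum(abs(x - center) for x in values) / len(values)

def average_deviation_from_median(values):
    """Average absolute deviation of the scores from the median."""
    center = statistics.median(values)
    return sum(abs(x - center) for x in values) / len(values)

data = [2, 4, 6, 8, 30]                       # hypothetical scores, one extreme value
print(mean_deviation(data))                   # 8.0  (mean is 10)
print(average_deviation_from_median(data))    # 6.4  (median is 6)
```
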

VARIANCE
   ‡ Average of the square of deviations measured from the mean
   ‡ Population variance ($\sigma^2$) and sample variance ($s^2$)
Properties
   ‡     Addition/subtraction of a constant c to each score will not change the variance of the scores
   ‡     Multiplying each score by a constant c changes the variance, resulting in a new variance multiplied
         by $c^2$
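
A minimal sketch of the two variances and the two properties above. It assumes the usual divisors: N for the population variance and n - 1 for the sample variance. Function names and data are illustrative.

```python
def population_variance(values):
    """Average squared deviation from the mean (divisor N)."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n

def sample_variance(values):
    """Squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    xbar = sum(values) / n
    return sum((x - xbar) ** 2 for x in values) / (n - 1)

scores = [2, 4, 4, 4, 5, 5, 7, 9]             # hypothetical scores, mean 5
print(population_variance(scores))            # 4.0
print(round(sample_variance(scores), 3))      # 4.571

c = 3
# Adding a constant leaves the variance unchanged.
print(population_variance([x + c for x in scores]))    # 4.0
# Multiplying by a constant multiplies the variance by c squared.
print(population_variance([c * x for x in scores]))    # 36.0
```
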

STANDARD DEVIATION
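   ‡ Square root of the average of the square of deviations measured from the mean; that is, the square
     root of the variance
   ‡ Population standard deviation ($\sigma$) and sample standard deviation ($s$)

     $\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (X_i - \mu)^2}$     (N = population size)

     $s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2}$     (n = sample size)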

  Why n-1?
  ‡ Degrees of freedom
         ± Measure of how much precision an estimate of variation has
         ± General rule is that the degrees of freedom decrease as more parameters have to be
           estimated
  ‡ $\bar{X}$ (the sample mean) estimates $\mu$ (the population mean)
  ‡ Using an estimated mean to find the standard deviation causes the loss of ONE degree of freedom
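
  The effect of the lost degree of freedom can be checked on a tiny example. The sketch below is an illustration under stated assumptions, not part of the original notes: it takes a made-up population of five values, lists every ordered sample of size 3 drawn with replacement, and averages the sample variance computed with divisor n and with divisor n - 1. Exact fractions are used so no rounding is involved.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 4, 5]                  # hypothetical tiny population
N = len(population)
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N    # population variance

def spread_about_sample_mean(sample, divisor):
    """Sum of squared deviations from the sample mean, over a chosen divisor."""
    xbar = Fraction(sum(sample), len(sample))
    return sum((x - xbar) ** 2 for x in sample) / divisor

n = 3
samples = list(product(population, repeat=n))  # all 125 samples, with replacement

avg_div_n = sum(spread_about_sample_mean(s, n) for s in samples) / len(samples)
avg_div_n_minus_1 = sum(spread_about_sample_mean(s, n - 1) for s in samples) / len(samples)

print(sigma2)               # 2
print(avg_div_n)            # 4/3  (too small on average)
print(avg_div_n_minus_1)    # 2    (matches the population variance on average)
```

  Dividing by n underestimates the population variance on average because the deviations are measured from the sample's own mean; dividing by n - 1 corrects for this.
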

   Properties
   ‡ Most used measure of variability
   ‡ Affected by every value of every observation
   ‡ Less affected by sampling fluctuations than other measures of variability, but sensitive to extreme values
   ‡ Addition/subtraction of a constant c to each score will not change the standard deviation of the scores
   ‡ Multiplying each score by a constant c changes the standard deviation, resulting in a new standard
      deviation multiplied by |c|
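
A minimal sketch of the standard deviation properties, using `pstdev` (population) and `stdev` (sample, divisor n - 1) from Python's standard `statistics` module. The scores are made up.

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]              # hypothetical scores

sigma = statistics.pstdev(scores)              # population standard deviation
s = statistics.stdev(scores)                   # sample standard deviation (n - 1)
print(round(sigma, 3))                         # 2.0
print(round(s, 3))                             # 2.138

c = 3
# Adding a constant leaves the standard deviation unchanged.
shifted_sigma = statistics.pstdev([x + c for x in scores])
# Multiplying by a constant scales it by the absolute value of that constant.
scaled_sigma = statistics.pstdev([-c * x for x in scores])
print(round(shifted_sigma, 3))                 # 2.0
print(round(scaled_sigma, 3))                  # 6.0
```
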


CHOOSING A MEASURE
  ‡ Range
        ± Data are too few or too scattered to justify more precise and laborious measures
        ± Need to know only the total spread of scores
  ‡ Absolute Deviation
        ± Find and weigh deviations from the mean/median
        ± Extreme values would unduly skew the standard deviation
  ‡ Standard Deviation
        ± Need a measure with the best stability
        ± Effect of extreme values has been deemed acceptable
        ± Compare and correlate with other data sets
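
To see how the choice plays out, the sketch below compares the three measures on made-up data with and without one extreme value. The helper name `describe_spread` and the data are illustrative assumptions.

```python
import statistics

def describe_spread(values):
    """Range, mean deviation, and sample standard deviation of the values."""
    center = statistics.mean(values)
    return {
        "range": max(values) - min(values),
        "mean_deviation": sum(abs(x - center) for x in values) / len(values),
        "standard_deviation": statistics.stdev(values),
    }

typical = [48, 50, 51, 52, 49, 50]             # hypothetical scores
with_outlier = typical + [95]                  # same scores plus one extreme value

before = describe_spread(typical)
after = describe_spread(with_outlier)

for name in before:
    growth = after[name] / before[name]
    print(f"{name}: {before[name]:.2f} -> {after[name]:.2f} (x{growth:.1f})")
```

In this data set the extreme value inflates the standard deviation proportionally more than the mean deviation, because deviations are squared before averaging.
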
