BASIC STATISTICS
Some important concepts:
Statistics - Analysis and interpretation of numerical data
Data - Collection and compilation of relevant information
Nature of data
- Raw and processed
Sources of Data
- Surveys, Clinical trials
- Questionnaires and personal interviews
- Secondary sources
Probability
- (No. of favorable outcomes) / (Total no. of mutually exclusive, equally likely and exhaustive events)
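As a small illustration of the probability ratio above, the sketch below counts favorable outcomes for a fair six-sided die. The die example and the helper name `classical_probability` are assumptions made for illustration; they are not part of the slides.

```python
from fractions import Fraction


def classical_probability(favorable, total):
    """Classical probability: favorable outcomes / total equally likely outcomes."""
    if total <= 0 or not 0 <= favorable <= total:
        raise ValueError("need 0 <= favorable <= total and total > 0")
    return Fraction(favorable, total)


# Hypothetical example: probability of rolling an even number on a fair die.
outcomes = [1, 2, 3, 4, 5, 6]
even_outcomes = [x for x in outcomes if x % 2 == 0]
p_even = classical_probability(len(even_outcomes), len(outcomes))
print(p_even)
```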
Distribution of data:
• Tabulation plan
• Determination of class intervals
• Determination of number of class intervals
• Quartiles - $Q_1, Q_2, Q_3, Q_4$
• Centiles/Percentiles
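Quartiles and percentiles can be sketched with Python's standard `statistics.quantiles`. The data values and the choice of the "inclusive" interpolation method are assumptions for illustration; other conventions give slightly different cut points.

```python
from statistics import quantiles

# Hypothetical observations, already a complete data set (not a sample of a larger one).
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# n=4 returns the three cut points that split the data into four equal parts.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")

# n=100 returns 99 percentile cut points; index 89 is the 90th percentile.
p90 = quantiles(data, n=100, method="inclusive")[89]

print(q1, q2, q3, p90)
```

Here the second quartile coincides with the median, and the largest observation plays the role of the fourth quartile.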
Prevalence and Incidence Rates:
• Two fundamental statistics in epidemiology
• Expressed per 100
• Expressed per 1000 or 10,000
$\text{Prevalence} = \dfrac{\text{Number of cases with the disease}}{\text{Population at risk}}$
$\text{Incidence} = \dfrac{\text{Number of new cases in a given time period}}{\text{Population at risk during the above period}}$
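The two rates can be sketched as simple ratios scaled to a chosen base (per 100, per 1000, and so on). The function names and the counts used below are hypothetical and only illustrate the arithmetic.

```python
def prevalence(cases, population_at_risk, per=100):
    """Existing cases divided by population at risk, scaled to `per`."""
    return cases * per / population_at_risk


def incidence(new_cases, population_at_risk, per=1000):
    """New cases in a given period divided by population at risk in that period."""
    return new_cases * per / population_at_risk


# Hypothetical figures: 50 existing cases and 12 new cases in a population of 2000.
prev = prevalence(50, 2000, per=100)    # expressed per 100
inc = incidence(12, 2000, per=1000)     # expressed per 1000
print(prev, inc)
```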
Types of Studies/Clinical Trial:
• Cross sectional surveys
• Longitudinal cohort based studies
- Retrospective
- Prospective
• Case-control studies (observational, usually retrospective)
• Randomized Controlled Trials or Experiments
- Drug evaluation trials
- Simple (Placebo-controlled)
- Blinded (single or double)
DESCRIPTIVE STATISTICS
• Describes
• Summarises
• Presents
• Interprets data
• Makes meaning out of numbers
VARIABLES
• Variables are attributes that vary between subjects
• Height, weight, intelligence, achievement
• Can be grouped as
- Qualitative variable
- Quantitative variable
QUANTITATIVE VARIABLES
• Discrete variable
- Countable, but only whole numbers
• Continuous variable
- Measurable on a continuum (can take any value within a range)
QUALITATIVE VARIABLES
• Do not possess numerical values
• Colour of hair, gender, blood group
• Three types
- Ordinal
- Dichotomous
- Nominal
MEASURES OF CENTRAL TENDENCY
• The mean
• The median
• The mode
Measures of central tendency:
• Mean
- Arithmetic mean ($\bar{x}$): $\bar{x} = \dfrac{\sum x}{n}$
- Geometric mean: $\sqrt[n]{\text{Product of all the obs.}}$
- Harmonic mean
• Mode
- Most frequently occurring observation
• Median
- Middle value
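The measures listed on this slide can be computed with Python's standard `statistics` module. This is a minimal sketch; the four observations are made up for illustration.

```python
from statistics import mean, geometric_mean, harmonic_mean, median, mode

# Hypothetical observations.
data = [2, 4, 4, 8]

arithmetic = mean(data)            # sum of observations divided by n
geometric = geometric_mean(data)   # n-th root of the product of all observations
harmonic = harmonic_mean(data)     # n divided by the sum of reciprocals
middle = median(data)              # middle value of the sorted observations
most_common = mode(data)           # most frequently occurring observation

print(arithmetic, geometric, harmonic, middle, most_common)
```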
THE MEAN
• Arithmetic average of all observations
• Influenced by extreme values
• Non-resistant measure
THE MEDIAN
• Middle value of all observations when arranged in order
• Resistant measure
• Not influenced by extreme values
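To see why the mean is called non-resistant and the median resistant, the sketch below replaces one observation with an extreme value. The numbers are hypothetical.

```python
from statistics import mean, median

# Hypothetical observations, then the same set with one extreme value.
typical = [2, 3, 4, 5, 6]
with_outlier = [2, 3, 4, 5, 60]

mean_typical, median_typical = mean(typical), median(typical)
mean_outlier, median_outlier = mean(with_outlier), median(with_outlier)

# The mean is pulled toward the extreme value; the median stays where it was.
print(mean_typical, median_typical)
print(mean_outlier, median_outlier)
```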
Two data sets
Data set 1: 8, 10, 12 (Mean = 10)
Data set 2: 6, 10, 14 (Mean = 10)
Are the two data sets the same?
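The question on this slide can be checked numerically: both data sets share the same mean but differ in spread. A minimal sketch using the slide's own values, with the range and sample standard deviation as the measures of spread:

```python
from statistics import mean, stdev

set_1 = [8, 10, 12]
set_2 = [6, 10, 14]

for name, values in (("set 1", set_1), ("set 2", set_2)):
    spread = max(values) - min(values)   # range as maximum minus minimum
    print(name, "mean:", mean(values), "range:", spread, "SD:", stdev(values))

sd_1, sd_2 = stdev(set_1), stdev(set_2)
```

The means agree, so a measure of central tendency alone cannot tell the two sets apart; a measure of dispersion can.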
Measures of Dispersion:
• Range - (minimum, maximum)
• Variance and Standard deviation
Variance = $\dfrac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$
Standard deviation ($\sigma$) = $\sqrt{\text{Variance}}$
Standard error = $\sigma \div \sqrt{n}$
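The three formulas above can be written out step by step. This is a sketch with made-up observations; the function name `dispersion` is an assumption for illustration.

```python
from math import sqrt


def dispersion(values):
    """Return (variance, standard deviation, standard error) with n - 1 in the denominator."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar = sum(values) / n
    variance = sum((x - x_bar) ** 2 for x in values) / (n - 1)
    sd = sqrt(variance)
    se = sd / sqrt(n)
    return variance, sd, se


# Hypothetical observations.
variance, sd, se = dispersion([2, 4, 4, 4, 5, 5, 7, 9])
print(round(variance, 3), round(sd, 3), round(se, 3))
```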
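STANDARD DEVIATION
• Measure of spread
• Used extensively with the normal distribution
• Calculated using mathematical formulae
• Large SD means the observations are widely scattered around the mean
• Small SD means the observations are clustered close to the mean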
STANDARD DEVIATION (contd.)
• Advantages of SD
- Measures the variability in a single figure
- Helps estimate the probability of observed differences between two means
• Expressed in the same unit as the mean
Graphical presentation/distribution of data:
• Bar diagram
• Histogram
• Line diagram
• Pie-chart
• Scatter-plot
Some fundamental distributions:
• Bernoulli distribution
• Binomial distribution
• Poisson distribution
• Negative binomial distribution
• Normal distribution
Testing of Hypothesis:
• Hypothesis
- A statement relating to objective
• Null hypothesis ($H_0$)
- Hypothesis of no difference or no effect
• Alternative hypothesis ($H_a$)
- Hypothesis of one-way or two-way difference or effect
Common Statistical Tests
• Large sample tests (z test)
• Small sample tests (Student's t test)
• Paired t test
• Chi-square test
Chi Square test for finding association:
• Non-parametric test
• Easy to understand and execute
• Does not assume any particular distribution of the data, but the expected cell frequency should not fall below 5
• If a cell frequency falls below 5, apply Yates' correction
General formula for Chi-square test
$\chi^2 = \sum \dfrac{(O - E)^2}{E}$
For the previous example: $\chi^2 = 3.82$, giving p<0.001 with one d.f.
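The general formula can be sketched for a 2 x 2 table as below. The table counts are hypothetical and are not the "previous example" referred to on the slide; the function name `chi_square_2x2` and the optional `yates` flag are likewise assumptions for illustration. Expected frequencies are taken as (row total x column total) / grand total.

```python
def chi_square_2x2(table, yates=False):
    """Chi-square statistic for a 2 x 2 table of observed counts.

    With yates=True, 0.5 is subtracted from each |O - E| before squaring.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(observed - expected)
            if yates:
                diff = max(diff - 0.5, 0.0)
            chi2 += diff ** 2 / expected
    return chi2


# Hypothetical counts: rows are two groups, columns are outcome present / absent.
observed = [[20, 30], [30, 20]]
chi2_plain = chi_square_2x2(observed)
chi2_yates = chi_square_2x2(observed, yates=True)
print(chi2_plain, chi2_yates)
```

The resulting statistic is compared with the tabulated chi-square value at one d.f., since a 2 x 2 table has (2 - 1)(2 - 1) = 1 degree of freedom.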
Student’s t-test:
• Small sample test preferably up to 30 observations
• For comparing means
• Easy to understand and apply
• Most popular and frequently used
• Types of t-test
- Paired t-test for comparing pre and post treatment means in
the same set of subjects
- Take differences between obs.
- Compute mean of the differences
- Compute standard error of the differences
- Divide mean by the standard error to get t-statistic
- Compare the calculated value of t with the tabulated value at the
required d.f.
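The paired t-test steps above can be sketched as follows. The pre- and post-treatment values are invented for illustration, and `paired_t` is a hypothetical helper name.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(pre, post):
    """Return (t statistic, degrees of freedom) for paired observations."""
    if len(pre) != len(post):
        raise ValueError("paired samples must have the same length")
    differences = [after - before for before, after in zip(pre, post)]
    n = len(differences)
    mean_diff = mean(differences)                 # mean of the differences
    se_diff = stdev(differences) / sqrt(n)        # standard error of the differences
    return mean_diff / se_diff, n - 1


# Hypothetical pre- and post-treatment measurements on the same four subjects.
pre_treatment = [10, 12, 14, 16]
post_treatment = [12, 15, 15, 18]
t_stat, df = paired_t(pre_treatment, post_treatment)
print(round(t_stat, 3), df)
```

The calculated t is then compared with the tabulated value at n - 1 degrees of freedom.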
Student’s t-test (contd.):
Two sample t-test
- Applicable with two independent samples
- Not necessarily of the same size
- Compute mean and standard deviations of the two samples
- Compute difference of the two means
- Compute standard error of the above difference
- Divide the difference of the means by the standard error to obtain the value of the t-statistic
- Compare calculated value of t with the tabulated value at the required d.f.
• With large sample size the t-statistic tends to the Z-statistic
• Hence for large samples the Z-test (standard normal test) can be used in place of the t-test
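A sketch of the two-sample steps is given below. The slides do not say how the standard error of the difference is formed; this example assumes the classical pooled-variance form with n1 + n2 - 2 degrees of freedom. The sample values and the name `two_sample_t` are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def two_sample_t(sample_a, sample_b):
    """Return (t statistic, degrees of freedom) using a pooled variance (assumed form)."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / df
    se_diff = sqrt(pooled_var * (1 / n_a + 1 / n_b))   # standard error of the difference
    return (mean(sample_a) - mean(sample_b)) / se_diff, df


# Hypothetical independent samples of unequal size.
group_a = [5, 7, 9]
group_b = [1, 2, 3, 4, 5]
t_stat, df = two_sample_t(group_a, group_b)
print(round(t_stat, 3), df)
```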
Test of difference between two proportions:
• Two sample test
• Clearly defined dichotomy
• Use $\chi^2$-test or Z-test
- Proportion of people with the attribute under investigation should be known in the two samples
• Easy to use and understand
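A Z-test for the difference between two proportions can be sketched as below. The pooled-proportion standard error is an assumption, since the slide gives no formula; the counts and the name `two_proportion_z` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """Return (z statistic, two-sided p-value) using a pooled proportion (assumed form)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts: 30 of 100 with the attribute in sample 1, 20 of 100 in sample 2.
z_stat, p_value = two_proportion_z(30, 100, 20, 100)
print(round(z_stat, 3), round(p_value, 3))
```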
Non-parametric tests of significance:
• For t-test normality of parent population is assumed
• In case of non-normality of the parent population, non-parametric tests
should be employed to assess significance of the difference e.g.
- Sign test, Runs test, Mann-Whitney U-test etc.
• Deal with positional information (signs and ranks)
• Do not estimate the parameters
• Make it possible to analyze qualitative data
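As one example of a non-parametric test, the sign test uses only the signs of the paired differences and refers the count to a binomial distribution with probability 0.5. The sketch below is illustrative; the differences are made up, zero differences are dropped (a common convention, assumed here), and `sign_test` is a hypothetical name.

```python
from math import comb


def sign_test(differences):
    """Return (positives, negatives, two-sided exact p-value) for a sign test."""
    positives = sum(1 for d in differences if d > 0)
    negatives = sum(1 for d in differences if d < 0)
    n = positives + negatives            # zero differences are ignored
    k = min(positives, negatives)
    # Probability of k or fewer of the rarer sign under Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return positives, negatives, min(1.0, 2 * tail)


# Hypothetical paired differences (post minus pre).
pos, neg, p_value = sign_test([2, 3, 1, 2, -1, 4, 2, 1])
print(pos, neg, p_value)
```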
Determination of sample size:
• Importance of Sample size
- Appropriateness
- Validity of results
- Applicability of statistical tools
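• The situation determines the requirement, e.g.
- Testing whether a new treatment is better than the standard
- The investigator is willing to discard the new treatment if it is only slightly better
- The investigator does not wish to drop the new treatment if it is substantially superior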
Thank You

statistics introduction
