statistics introduction

BASIC STATISTICS

Some important concepts:
Statistics - Analysis and interpretation of numerical data
Data - Collection and compilation of relevant information
Nature of data
- Raw and processed
Sources of Data
- Surveys, Clinical trials
- Questionnaires and personal interviews
- Secondary sources
Probability
- (No. of favorable outcomes) / (Total no. of mutually exclusive, equally likely and exhaustive events)
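
The probability ratio above can be sketched in a few lines of Python. The fair six-sided die and the function name `classical_probability` are assumptions chosen for illustration; they do not come from the slides.

```python
from fractions import Fraction


def classical_probability(favorable, total):
    """Favorable outcomes divided by all equally likely outcomes."""
    if total <= 0 or not 0 <= favorable <= total:
        raise ValueError("need 0 <= favorable <= total and total > 0")
    return Fraction(favorable, total)


# Hypothetical example: rolling an even number with a fair six-sided die.
outcomes = [1, 2, 3, 4, 5, 6]
favorable = [o for o in outcomes if o % 2 == 0]

p_even = classical_probability(len(favorable), len(outcomes))
print(p_even)  # 1/2
```

Using `Fraction` keeps the result exact, which mirrors the counting definition more closely than a floating-point ratio.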

Distribution of data:
• Tabulation plan
• Determination of class intervals
• Determination of number of class intervals
• Quartiles - $Q_1, Q_2, Q_3, Q_4$
• Centiles/Percentiles
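
As a rough sketch, quartiles and percentiles can be read off with Python's standard `statistics.quantiles`. The data values are hypothetical, and the function returns only the cut points between groups (three for quartiles), so treating the fourth quartile as the maximum is an assumption made here.

```python
import statistics

# Hypothetical observations, used only for illustration.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# n=4 returns the three cut points that split the data into four groups.
quartiles = statistics.quantiles(data, n=4)

# n=100 returns 99 cut points; index 89 is the 90th percentile.
percentiles = statistics.quantiles(data, n=100)
p90 = percentiles[89]

# Assumption: the fourth quartile is taken to be the largest observation.
q4 = max(data)

print(quartiles)  # [2.5, 5.0, 7.5]
```

Different software packages interpolate quantiles in slightly different ways, so small samples may give slightly different cut points elsewhere.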

Prevalence and Incidence Rates:
• Two fundamental statistics in epidemiology
• Expressed per 100
• Expressed per 1000 or 10,000
$\text{Prevalence} = \frac{\text{Number of cases with the disease}}{\text{Population at risk}}$
$\text{Incidence} = \frac{\text{Number of new cases in a given time period}}{\text{Population at risk during the above period}}$
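
A minimal sketch of the two rates, scaled to a chosen base population. All counts are hypothetical, and the helper name `rate_per` is an assumption introduced for illustration.

```python
def rate_per(cases, population_at_risk, per=100):
    """Return the number of cases per `per` people at risk."""
    if population_at_risk <= 0:
        raise ValueError("population at risk must be positive")
    return cases * per / population_at_risk


# Hypothetical counts, for illustration only.
# Prevalence: all existing cases at one point in time.
prevalence_per_100 = rate_per(cases=50, population_at_risk=2000, per=100)

# Incidence: only new cases arising during the observation period.
incidence_per_1000 = rate_per(cases=12, population_at_risk=4000, per=1000)

print(prevalence_per_100, incidence_per_1000)  # 2.5 3.0
```

The same function serves both rates; what differs is which cases are counted in the numerator.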

Types of Studies/Clinical Trial:
• Cross sectional surveys
• Longitudinal cohort based studies
- Retrospective
- Prospective
• Randomized Controlled Trials or Experiments
- Case-Control studies
- Drug evaluation trials
- Simple (Placebo-controlled)
- Blinded (single or double)

DESCRIPTIVE STATISTICS
• Describes
• Summarises
• Presents
• Interprets data
• Makes meaning out of numbers

VARIABLES
• Variables are attributes that vary between subjects
• Height, weight, intelligence, achievement
• Can be grouped as
- Qualitative variable
- Quantitative variable

QUANTITATIVE VARIABLES
• Discrete variable
- Countable, but only in whole numbers
• Continuous variable
- Measurable on a continuum

QUALITATIVE VARIABLES
• Do not possess numerical values
• Colour of hair, gender, blood group
• Three types
- Ordinal
- Dichotomous
- Nominal

MEASURES OF CENTRAL TENDENCY
• The mean
• The median
• The mode

Measures of central tendency:
• Mean
- Arithmetic mean ($\bar{x}$): $\bar{x} = \frac{\sum x}{n}$
- Geometric mean: $\sqrt[n]{\text{Product of all the obs.}}$
- Harmonic mean
• Mode
- Most frequently occurring observation
• Median
- Middle value
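
The measures listed above are all available in Python's standard `statistics` module, so a short sketch may help compare them on one sample. The data values are hypothetical. The harmonic mean, for which the slides give no formula, is the reciprocal of the mean of the reciprocals.

```python
import statistics

# Hypothetical observations, used only for illustration.
data = [2, 4, 4, 8]

arithmetic_mean = statistics.mean(data)            # sum of values / n
geometric_mean = statistics.geometric_mean(data)   # n-th root of the product
harmonic_mean = statistics.harmonic_mean(data)     # n / sum of reciprocals
mode = statistics.mode(data)                       # most frequent value
median = statistics.median(data)                   # middle value

print(arithmetic_mean, geometric_mean, harmonic_mean, mode, median)
```

For positive data that are not all equal, the harmonic mean is smaller than the geometric mean, which in turn is smaller than the arithmetic mean.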

THE MEAN
• Arithmetic average of all observations
• Influenced by extreme values
• Non-resistant measure

THE MEDIAN
• Middle value of all observations
• Resistant measure
• Not influenced by extreme values

Two data sets
Data set 1: 8, 10, 12 (Mean = 10)
Data set 2: 6, 10, 14 (Mean = 10)
Are the two data sets the same?
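
A quick check of the two data sets suggests the answer: the means agree, but the spread does not. The sketch below uses the values from the slide and compares them with the range, taken here as maximum minus minimum.

```python
import statistics

set_1 = [8, 10, 12]
set_2 = [6, 10, 14]

for name, values in (("Data set 1", set_1), ("Data set 2", set_2)):
    mean = statistics.mean(values)
    value_range = max(values) - min(values)
    print(f"{name}: mean = {mean}, range = {value_range}")
```

Both sets have a mean of 10, yet the second is spread twice as widely, which is why a measure of dispersion is needed alongside a measure of central tendency.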

Measures of Dispersion:
• Range - (minimum, maximum)
• Variance and Standard deviation
Variance = $\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$
Standard deviation ($\sigma$) = $\sqrt{\text{Variance}}$
Standard error = $\sigma \div \sqrt{n}$
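
The three formulas above translate almost directly into code. This is a minimal sketch using the n - 1 divisor shown in the variance formula; the function names are assumptions, and the sample reuses the values 6, 10, 14 from the earlier two-data-set slide.

```python
import math


def variance(values):
    """Sample variance: squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def standard_deviation(values):
    """Square root of the variance."""
    return math.sqrt(variance(values))


def standard_error(values):
    """Standard deviation divided by the square root of n."""
    return standard_deviation(values) / math.sqrt(len(values))


sample = [6, 10, 14]
print(variance(sample), standard_deviation(sample), standard_error(sample))
```

For this sample the variance is 16 and the standard deviation is 4, in the same unit as the observations.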

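STANDARD DEVIATION
• Measure of spread
• Used extensively with the normal distribution
• Calculated using mathematical formulae
• Large SD means the observations are widely scattered around the mean
• Small SD means the observations are clustered closely around the mean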

STANDARD DEVIATION
• Advantages of SD
- Measures the variability in a single figure
- Helps in estimating the probability of observed differences between two means
• Expressed in the same unit as the mean

Graphical presentation/distribution of data:
• Bar diagram
• Histogram
• Line diagram
• Pie-chart
• Scatter-plot

Some fundamental distributions:
• Bernoulli distribution
• Binomial distribution
• Poisson distribution
• Negative binomial distribution
• Normal distribution

Testing of Hypothesis:
• Hypothesis
- A statement relating to the study objective
• Null hypothesis ($H_0$)
- Hypothesis of no difference or no effect
• Alternative hypothesis ($H_a$)
- Hypothesis of one-way or two-way difference or effect

Common Statistical Tests:
• Large sample tests (z test)
• Small sample tests (Student's t test)
• Paired t test
• Chi-square test

Chi Square test for finding association:
• Non-parametric test
• Easy to understand and execute
• Does not involve any distributional assumptions, but the cell frequency should not fall below 5
• If cell frequency falls below 5, apply Yates’ Correction
General formula for Chi-square test:
$\chi^2 = \sum \frac{(O - E)^2}{E}$
For the previous example (giving p<0.001 with one d.f.):
$\chi^2 = 3.822$
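
The general formula can be sketched for a contingency table as follows. The 2 x 2 counts are hypothetical and are not the example the slides refer to. Expected frequencies are computed as row total times column total divided by the grand total, the usual convention for a test of association, and no Yates' correction is applied.

```python
def chi_square(observed):
    """Return (chi-square statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # Expected count under the hypothesis of no association.
            e = row_totals[i] * col_totals[j] / grand_total
            statistic += (o - e) ** 2 / e

    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df


# Hypothetical 2 x 2 table: rows are groups, columns are outcomes.
table = [[10, 20], [30, 40]]
chi2, df = chi_square(table)
print(round(chi2, 4), df)
```

The statistic is then compared with the tabulated chi-square value at the computed degrees of freedom, which is 1 for a 2 x 2 table.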

Student’s t-test:
• Small-sample test, preferably for up to 30 observations
• For comparing means
• Easy to understand and apply
• Most popular and frequently used
• Types of t-test
- Paired t-test for comparing pre and post treatment means in
the same set of subjects
- Take differences between obs.
- Compute mean of the differences
- Compute standard error of the differences
- Divide mean by the standard error to get t-statistic
- Compare the calculated value of t with the tabulated value at the
required d.f.
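
The paired t-test steps above can be followed one by one in code. This is a sketch with hypothetical pre and post treatment values. The function name `paired_t` is an assumption, and the degrees of freedom are taken as n - 1, the usual choice for paired data.

```python
import math
import statistics


def paired_t(before, after):
    """Return (t statistic, degrees of freedom) for paired observations."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")

    # Step 1: take differences between paired observations.
    differences = [a - b for b, a in zip(before, after)]
    n = len(differences)

    # Step 2: mean of the differences.
    mean_diff = statistics.mean(differences)

    # Step 3: standard error of the differences (SD / sqrt(n)).
    se_diff = statistics.stdev(differences) / math.sqrt(n)

    # Step 4: divide the mean by its standard error.
    return mean_diff / se_diff, n - 1


# Hypothetical measurements on the same four subjects.
pre = [10, 12, 14, 16]
post = [12, 15, 15, 20]

t_stat, df = paired_t(pre, post)
print(round(t_stat, 3), df)
```

The final step, comparing the calculated t with the tabulated value at the required d.f., is left to a t-table or a statistics package.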

Student’s t-test (contd.):
Two sample t-test
- Applicable with two independent samples
- Not necessarily of the same size
- Compute mean and standard deviations of the two samples
- Compute difference of the two means
- Compute standard error of the above difference
- Divide the difference of the means by the standard error to obtain the value of the t-statistic
- Compare calculated value of t with the tabulated value at the
required d.f.
• With a large sample size the t-statistic tends to the Z-statistic
• Hence for large samples the Z-test (standard normal test) should be used in place of the t-test
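
A sketch of the two-sample procedure described above. The slides do not say how the standard error of the difference is obtained, so this example assumes the pooled-variance (equal variances) form with n1 + n2 - 2 degrees of freedom. The data and the function name `two_sample_t` are hypothetical.

```python
import math
import statistics


def two_sample_t(sample_1, sample_2):
    """Pooled-variance t statistic and d.f. for two independent samples."""
    n1, n2 = len(sample_1), len(sample_2)
    mean_1, mean_2 = statistics.mean(sample_1), statistics.mean(sample_2)
    var_1, var_2 = statistics.variance(sample_1), statistics.variance(sample_2)

    # Assumption: both populations share a common variance.
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * var_1 + (n2 - 1) * var_2) / df

    # Standard error of the difference between the two means.
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))

    return (mean_1 - mean_2) / se_diff, df


# Hypothetical independent samples of unequal size.
group_a = [1, 2, 3, 4, 5]
group_b = [2, 4, 6, 8]

t_stat, df = two_sample_t(group_a, group_b)
print(round(t_stat, 4), df)
```

If the two variances differ markedly, an unequal-variance form of the standard error would be the more cautious choice.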

Test of difference between two proportions:
• Two sample test
• Clearly defined dichotomy
• Use $\chi^2$-test or Z-test
- Proportion of people with the attribute under investigation should be known in the two samples
• Easy to use and understand
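
One way to sketch the Z-test for two proportions is shown here. The counts are hypothetical, the function name `two_proportion_z` is an assumption, and the standard error uses the pooled proportion, which is a common choice under the null hypothesis of no difference.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Return (z statistic, two-sided p-value) for two sample proportions."""
    p1, p2 = x1 / n1, x2 / n2

    # Pooled proportion under the null hypothesis of equal proportions.
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))

    z = (p1 - p2) / se
    # Two-sided tail area of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 30 of 100 versus 45 of 100 with the attribute.
z, p_value = two_proportion_z(30, 100, 45, 100)
print(round(z, 3), round(p_value, 3))
```

For a 2 x 2 table, the square of this z equals the uncorrected chi-square statistic, which is why either test can be used.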

Non-parametric tests of significance:
• For t-test normality of parent population is assumed
• In case of non-normality of the parent population, non-parametric tests
should be employed to assess significance of the difference e.g.
- Sign test, Run test, Mann Whitney U-test etc.
• Deal with positional information (signs and ranks)
• Do not estimate the parameters
• Possible to analyze qualitative data
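
As an illustration of the first test named above, here is a minimal sketch of a two-sided sign test on paired differences. It assumes zero differences are discarded and uses an exact binomial probability with p = 0.5. The data and the function name `sign_test` are hypothetical.

```python
from math import comb


def sign_test(differences):
    """Return (positives, negatives, two-sided exact p-value)."""
    positives = sum(1 for d in differences if d > 0)
    negatives = sum(1 for d in differences if d < 0)
    n = positives + negatives  # zero differences are dropped
    if n == 0:
        raise ValueError("all differences are zero")

    # Probability of a split at least this uneven when + and - are equally likely.
    k = min(positives, negatives)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return positives, negatives, min(1.0, 2 * tail)


# Hypothetical paired differences (after minus before).
diffs = [2, 3, 1, 4, -1, 2, 5, 0, 3]
positives, negatives, p_value = sign_test(diffs)
print(positives, negatives, p_value)
```

Only the direction of each difference is used, not its size, which is what makes the test free of the normality assumption.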

Determination of sample size:
• Importance of Sample size
- Appropriateness
- Validity of results
- Applicability of statistical tools
• Situation demands
- New treatment is better than the standard
- Discard the new treatment if it is only slightly better
- Do not wish to drop the new treatment if it is substantially superior
