Course Objectives:
• At the completion of this subject, students are expected to be able to:
• Know the various terminologies and methods used in biostatistics
• Know the various statistical techniques used to solve statistical problems
• Understand the measures of central tendency, their formulae, and the calculation of the mean,
median, mode, range, variation of the mean, standard deviation, variance,
coefficient of variation, and standard error of the mean
• Construct and label graphs
• Understand the basics of hypothesis testing, and parametric and non-parametric tests
• Appreciate the significance of statistical software such as SPSS, Epi Info, and SAS
• Know the different research methodology methods, with their merits and demerits
• Know the operation of MS Excel, SPSS, R, MINITAB®, and DoE (Design of Experiments)
• Appreciate statistical techniques in solving problems
Syllabus – Content
Unit-IV
1. Blocking and confounding system for Two-level
factorials
2. Regression modeling: Hypothesis testing in Simple and
Multiple regression models
3. Introduction to the practical components of industrial and clinical trial problems:
statistical analysis using Excel, SPSS, MINITAB®, Design of Experiments, and R, with
online statistical software applied to the industrial and clinical trial approach
Blocking and Confounding System for Two-level Factorials
Introduction
• When an experiment is performed on specific issues related to problems of a
population, many factors affect the situation, and these factors, or components
of the issue, need to be analyzed systematically.
• Such experiments therefore involve studying the effects of two or more factors.
• Even when the general causes are found quickly, in many studies it is essential
to organize the factors so that the experiments are performed efficiently.
• This type of design for experimentation is termed a factorial design.
• All possible combinations of the factors' levels are investigated.
What is a Factorial Design?
• A factorial experimental design investigates the effect of two or more
independent variables on one dependent variable.
• For example, a researcher wanted to investigate components for
increasing CET Scores.
• The three components are:
CET intensive class (yes or no).
CET Prep book (yes or no).
Extra homework (yes or no).
• The researcher plans to manipulate each of these independent variables.
• Each of the independent variables is called a factor, and each factor has two
levels (yes or no).
• As this experiment has 3 factors, each with 2 levels, it is a 2 × 2 × 2 = 2³
factorial design.
• An experiment with 3 factors and 3 levels would be a 3³ factorial design, and
an experiment with 2 factors and 3 levels would be a 3² factorial design.
• The vast majority of factorial experiments have only two levels per factor.
• In some experiments where the number of level/factor combinations is
unmanageable, the experiment can be split into parts (for example, by half),
creating a fractional factorial design.
• The eight treatment combinations of the CET example are enumerated in the
sketch below.
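A minimal Python sketch, assuming the three CET factors named above, that enumerates the eight runs of this 2³ design:

```python
# Hypothetical sketch: listing the 2 x 2 x 2 = 8 treatment combinations
# for the CET example (factor names are illustrative).
from itertools import product

factors = {
    "intensive_class": ["no", "yes"],
    "prep_book":       ["no", "yes"],
    "extra_homework":  ["no", "yes"],
}

# A full factorial crosses every level of every factor.
runs = list(product(*factors.values()))
print(f"2^{len(factors)} design -> {len(runs)} treatment combinations")
for run in runs:
    print(dict(zip(factors.keys(), run)))
```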
Introduction..
• For example, suppose there is a two-factor factorial design with factors A and B;
each complete trial or replicate of the experiment runs every level of A with
every level of B.
• In that case, each replicate contains all A × B treatment combinations.
• When factors are arranged in a factorial design, they are said to be crossed. If
there are two factors, each at two levels, the simple factorial design is denoted
by 2².
• If there are k factors, each at two levels, the design is termed a two-level
factorial design in k factors, denoted symbolically by 2^k.
Introduction…
• In many situations, performing all the runs of a 2^k factorial experiment under
homogeneous conditions is impossible.
• For instance, a single batch of raw material may not be large enough to make all
the required runs (compare farming a field across three seasons).
• In other cases, it might be desirable to deliberately vary the experimental
conditions to ensure that the treatments are equally effective (robust) across the
many situations likely to be encountered in practice.
• For example, a pharmaceutical chemist may run a pilot-plant experiment with
several batches of raw material, knowing that batches of different quality grades
are likely to be used in the actual full-scale process.
• The design technique used in these situations is blocking.
Blocking
• In a factorial experiment, the treatment structure consists of all possible
combinations of all levels of all the factors under investigation.
• Factorial experimentation is highly efficient because each observation provides
information about every factor in the experiment.
• Factorial experimentation also provides a systematic method of investigating the
relationships among the effects of different factors (i.e., interactions).
• A common objective in research is to investigate the effect of a number of
variables, or factors, on some response variable.
• Sometimes, performing all 2^k factorial runs under homogeneous conditions is
impossible.
• In this case, blocking is used to keep the treatment comparisons effective across
many situations. In brief, blocking involves the following:
• Blocking is a technique for dealing with controllable nuisance variables within
blocks (replicates)
• Within each block, all treatments (level combinations) are conducted.
• The run order within each block is randomized.
• The analysis follows the general blocked factorial design.
• When k is large, it is not possible to conduct all the treatments within each
block.
• Blocking is another way of increasing precision.
• This is the basis of the increased precision accomplished by using the
two-way design.
• In these designs, the patients in a block have similar (and relevant)
characteristics.
• For example, if age and sex are variables that affect the therapeutic
response of two comparative drugs, patients may be “blocked” on these
variables.
• Thus, if a male aged 55 years is assigned to drug A, another male of
approximately 55 years will be assigned to drug B.
• In practice, patients with similar characteristics are grouped together in a
block and then randomly assigned to the treatments, as in the sketch below.
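A minimal Python sketch of this idea, assuming invented patient records and blocks defined by age decade and sex:

```python
# Hypothetical sketch: block patients on age group and sex, then randomize
# drug A / drug B within each block (patient data are invented).
import random
from collections import defaultdict

random.seed(1)
patients = [
    {"id": 1, "age": 55, "sex": "M"}, {"id": 2, "age": 56, "sex": "M"},
    {"id": 3, "age": 34, "sex": "F"}, {"id": 4, "age": 36, "sex": "F"},
]

# Group (block) patients with similar, relevant characteristics.
blocks = defaultdict(list)
for p in patients:
    key = (p["age"] // 10, p["sex"])        # age decade + sex defines the block
    blocks[key].append(p)

# Within each block, assign the two treatments in random order.
for key, members in blocks.items():
    random.shuffle(members)
    treatments = ["Drug A", "Drug B"] * (len(members) // 2 + 1)
    for p, t in zip(members, treatments):
        print(f"block {key}: patient {p['id']} -> {t}")
```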
Confounding system for Two-Level Factorials
• In many situations when an experiment is performed, it is impossible to run a
complete factorial design in one block.
• Confounding is therefore an incomplete-block design technique for arranging a
complete factorial experiment in blocks, where the block size is smaller than the
number of treatment combinations in one replicate.
• The technique causes information about certain treatment effects (typically
high-order interactions) to be indistinguishable from, or confounded with, blocks.
• Consider the construction and analysis of the 2^k factorial design in 2^p
incomplete blocks, with p < k.
• If the number of factors or levels in a factorial experiment increases, the
number of treatment combinations rises rapidly.
• It then becomes challenging to obtain blocks of sufficiently large size to
accommodate all the treatment combinations.
• Under such situations, one may use incomplete but connected block designs.
• Example: Balanced Incomplete Block Designs (BIBD), in which all the main-effect
and interaction contrasts can be estimated; alternatively, one may use unconnected
designs, in which not all of these contrasts can be estimated.
• The contrasts that are non-estimable are said to be confounded.
• For example, the 16 runs of a 2⁴ design could be confounded into two blocks of
size 8, four blocks of size 4, or eight blocks of size 2.
• Like all incomplete blocking techniques, confounding carries some inefficiency;
a small sketch of confounding via a defining contrast follows below.
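A minimal Python sketch, assuming a 2³ design and the common choice of confounding the highest-order interaction (ABC) with blocks through its defining contrast:

```python
# Hypothetical sketch: confound the ABC interaction with blocks in a 2^3
# design, so each of the two blocks holds 4 of the 8 runs.
from itertools import product

runs = list(product([-1, 1], repeat=3))          # coded levels of A, B, C

block_1, block_2 = [], []
for a, b, c in runs:
    # The sign of the ABC contrast (a*b*c) decides the block, so the ABC
    # interaction becomes indistinguishable from the block effect.
    (block_1 if a * b * c == -1 else block_2).append((a, b, c))

print("Block 1 (ABC = -1):", block_1)
print("Block 2 (ABC = +1):", block_2)
```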
OVERALL
• Blocking involves grouping experimental units into homogeneous sets to
minimize the impact of extraneous variation.
• Confounding, on the other hand, refers to the deliberate association of
certain effects with specific blocks to reduce the number of treatment
combinations required.
Blocking: control for extraneous variation; homogeneous groups of experimental units.
Confounding: associate effects with specific blocks; reduce the number of treatment
combinations required.
COMPLETE AND PARTIAL CONFOUNDING
• If the allocation of treatments to the blocks is kept the same for all
replications, this type of confounding is termed complete confounding.
• If the treatment effects confounded are not the same in different replications,
that is, if the block contents vary from replication to replication, this type of
confounding is termed partial confounding.
ARRANGEMENT FOR CONFOUNDING
• The arrangement of treatment combinations in different blocks, such that
pre-determined effect contrasts are confounded, is called an arrangement for
confounding.
ADVANTAGES OF CONFOUNDING
• If "subsidiary factors" are introduced into an experiment to ensure that the
results apply across a range of situations, confounding reduces the experimental
error by keeping the comparisons of interest within homogeneous blocks.
DISADVANTAGES OF CONFOUNDING
• In a confounding scheme, the increased precision is obtained at the cost of
sacrificing partial or complete information on certain relatively unimportant
interactions.
• The confounded contrasts are replicated fewer times than the other contrasts;
as such, there is a loss of information, and they are estimated with a lower
degree of precision because the number of replications is reduced.
• The total possible number of treatment combinations increases rapidly as the
number of factors increases.
• Indiscriminate use of confounding may result in complete or partial loss of
information on the contrasts or comparisons of greater importance.
DISADVANTAGES OF CONFOUNDING..
• Therefore, the experimenter should confound only those treatment
combinations or contrasts that are of relatively less importance.
• Higher-order interactions are usually more difficult to interpret, and the
statistical analysis is complex, especially when some units (observations) are
missing.
• Many problems arise if the treatments interact with the blocks.
Extraneous variables
• Extraneous variables are factors that can affect the outcome of a study or
experiment but are not the focus of the research.
• It is crucial to control them; if left uncontrolled, they can lead to inaccurate
results.
• Here are some examples of extraneous variables:
• Demographics: in a study of physical performance, age and gender could be
extraneous variables.
• Testing environment: The time of day of testing could be an extraneous variable.
• Participant variables: Extraneous variables could include participant characteristics
such as educational background, sex, or marital status.
• Experimenter effect: Unintentional actions of the experimenter can influence the
outcome.
Importance in Two-level Factorial Designs
Blocking and confounding are crucial for two-level factorial designs as they
allow researchers to efficiently study the effects of multiple factors while
minimizing the impact of extraneous variation.
1. Reduce experiment size: confounding enables a reduction in the number of runs,
leading to cost and time savings.
2. Control for unwanted variation: blocking helps isolate and manage extraneous
variation, leading to more accurate results.
3. Increase efficiency: by reducing the number of runs and controlling variation,
blocking and confounding enhance the efficiency of experiments.
Choosing Appropriate Schemes
Choosing the right blocking and confounding schemes depends on
several factors, including the number of factors, levels, available
resources, and the importance of specific effects.
• Number of factors: the number of factors determines the complexity of the design
and the potential for confounding.
• Available resources: resource constraints, such as time and budget, may limit
the number of runs possible.
• Importance of effects: the researcher should prioritize the effects of interest
and minimize confounding for those effects.
THE 2² DESIGN WITH TWO BLOCKS
• Suppose there are two factors (A, B), each with 2 levels, and two blocks
(b1, b2), each containing two runs (treatments).
• Since b1 and b2 are interchangeable, there are three possible blocking schemes.
Confounding Patterns in Two-level Factorial Designs
In two-level factorial designs, specific confounding patterns
emerge based on the chosen blocking and confounding
schemes. Understanding these patterns is essential for
interpreting the results.
Factor 1   Factor 2   Factor 3   Confounded effect
A          B          C          AB
A          B          C          AC
A          B          C          BC
Analyzing Blocked and Confounded Designs
Analyzing blocked and confounded designs requires special attention to the confounding patterns and the
effects of blocking. Specific techniques and statistical software can be used to interpret the results.
• Data analysis: analyze the data collected from the blocked and confounded
experiment.
• Factor effects: estimate the effects of individual factors and interactions.
• Confounding adjustments: adjust the estimated effects to account for
confounding, if necessary.
• Interpretation: interpret the results and draw conclusions about the factors'
influence.
A minimal ANOVA sketch with a block term follows below.
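A minimal Python sketch, assuming a 2² factorial run in two blocks with invented responses, where the block enters the ANOVA as a nuisance factor:

```python
# Hypothetical sketch: ANOVA for a blocked two-level factorial; the block is a
# categorical nuisance term, A, B and A:B are the factorial effects of interest.
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

data = pd.DataFrame({
    "A":     [-1,  1, -1,  1, -1,  1, -1,  1],
    "B":     [-1, -1,  1,  1, -1, -1,  1,  1],
    "block": [ 1,  1,  1,  1,  2,  2,  2,  2],   # two replicates, one per block
    "y":     [10.1, 14.2, 11.0, 17.3, 9.8, 13.9, 11.4, 16.8],
})

model = smf.ols("y ~ C(block) + A * B", data=data).fit()
print(anova_lm(model, typ=2))     # block, A, B and A:B sums of squares
print(model.params)               # estimated block and factorial effects
```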
Blocking schemes (three)

A    B    Response    Scheme 1    Scheme 2    Scheme 3
-    -    y--         b1          b1          b2
+    -    y+-         b1          b2          b1
-    +    y-+         b2          b1          b1
+    +    y++         b2          b2          b2
Comparing blocking schemes:

Scheme 1:
Block effect: b = ȳ(b2) − ȳ(b1) = (1/2)(−y-- − y+- + y-+ + y++)
Main effect of B: B = (1/2)(−y-- − y+- + y-+ + y++)
Here the block effect and the main effect of B are the same, i.e., confounded.

Scheme 2:
Block effect: b = ȳ(b2) − ȳ(b1) = (1/2)(−y-- + y+- − y-+ + y++)
Main effect of A: A = (1/2)(−y-- + y+- − y-+ + y++)
Here the block effect and the main effect of A are the same, i.e., confounded.

Scheme 3:
Block effect: b = ȳ(b2) − ȳ(b1) = (1/2)(y-- − y+- − y-+ + y++)
Interaction effect: AB = (1/2)(y-- − y+- − y-+ + y++)
Here the block effect is confounded with the AB interaction, so both main effects
remain estimable; a numeric check of these contrasts appears in the sketch below.
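A minimal Python check of these contrasts, using invented responses, showing that the scheme 3 block effect coincides with the AB interaction contrast:

```python
# Hypothetical numeric check of the three blocking schemes, with made-up
# responses for the four runs of the 2^2 design.
y = {(-1, -1): 10.0, (1, -1): 14.0, (-1, 1): 11.0, (1, 1): 17.0}

def contrast(signs):
    """Half the signed sum of the four responses, (1/2) * sum(sign * y)."""
    return 0.5 * sum(s * y[run] for run, s in signs.items())

A  = contrast({(-1, -1): -1, (1, -1): +1, (-1, 1): -1, (1, 1): +1})
B  = contrast({(-1, -1): -1, (1, -1): -1, (-1, 1): +1, (1, 1): +1})
AB = contrast({(-1, -1): +1, (1, -1): -1, (-1, 1): -1, (1, 1): +1})

# Scheme 3 puts {(-,-), (+,+)} in b2 and {(+,-), (-,+)} in b1, so its block
# effect (mean of b2 minus mean of b1) reproduces the AB contrast exactly.
block_effect_3 = 0.5 * (y[(-1, -1)] + y[(1, 1)]) - 0.5 * (y[(1, -1)] + y[(-1, 1)])
print("A =", A, "B =", B, "AB =", AB, "block effect (scheme 3) =", block_effect_3)
```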
The relationship between x and y
• Correlation: is there a relationship between 2 variables?
• Regression: how well a specific independent variable predicts the
dependent variable?
• Regression measures the relation between the mean value of one variable
(e.g., output) and the corresponding values of other variables (e.g., time
and cost).
• CORRELATION ≠ CAUSATION
• To infer causality, manipulate the independent variable and observe the effect
on the dependent variable. (A small computational sketch of correlation versus
regression follows below.)
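A minimal Python sketch, using invented data, that computes the correlation coefficient and fits the least-squares regression line:

```python
# Hypothetical sketch: correlation asks "is there a relationship?", regression
# asks "how well does x predict y?" (the data below are invented).
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])      # e.g. time
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.2])    # e.g. output

r, r_pvalue = stats.pearsonr(x, y)                 # strength of association
fit = stats.linregress(x, y)                       # least-squares line y = a + b*x

print(f"correlation r = {r:.3f} (p = {r_pvalue:.4f})")
print(f"regression:  y = {fit.intercept:.2f} + {fit.slope:.2f} * x")
# Note: a strong correlation alone does not establish causation.
```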
Regression Modelling
• A regression model provides a function that describes the relationship
between one or more independent variables and a response, dependent, or
target variable.
• For example, the relationship between height and weight may be described
by a linear regression model.
• A regression analysis is the basis for many prediction types and determining
the effects on target variables.
• When you hear about studies on the news that talk about fuel efficiency, or
the cause of pollution, or the effects of screen time on learning, there is often
a regression model being used to support their claims.
Regression Modelling
• Regression analysis describes an average relationship between two or more
variables, which can be used to estimate the value of an unknown variable from a
given set of values of the other variables.
• A regression model is a statistical procedure that allows a researcher to
estimate the linear, or straight-line, relationship between two or more variables.
Hypothesis Testing
• To test a hypothesis in statistics, we must perform the following steps:
1. Formulate a null hypothesis and an alternative hypothesis on
population parameters
2. Build a statistic to test the hypothesis made.
3. Define a decision rule to reject or not to reject the null hypothesis.
A one-sample example of these three steps is sketched below.
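A minimal Python sketch of the three steps, assuming invented height measurements and the null hypothesis μ = 1.63 m:

```python
# Hypothetical sketch of the three steps, using a one-sample t-test on
# invented height data.
import numpy as np
from scipy import stats

heights = np.array([1.60, 1.66, 1.71, 1.58, 1.65, 1.69, 1.62, 1.64])

# Step 1: formulate H0: mu = 1.63  versus  H1: mu != 1.63.
# Step 2: build a test statistic (here, a one-sample t statistic).
t_stat, p_value = stats.ttest_1samp(heights, popmean=1.63)

# Step 3: decision rule -- reject H0 if the p-value is below alpha.
alpha = 0.05
decision = "reject H0" if p_value < alpha else "do not reject H0"
print(f"t = {t_stat:.3f}, p = {p_value:.3f} -> {decision}")
```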
Hypothesis Testing in Simple Regression Models
• Simple linear regression is a statistical method that allows us to summarize
and study relationships between two continuous (quantitative) variables: one
variable, denoted x, is regarded as the predictor, explanatory, or independent
variable, and the other, denoted y, as the response or dependent variable.
• Some rules bound these two variables together.
• Before establishing how to formulate the null and alternative
hypothesis, it is crucial to focus on the following terms:
• Null hypothesis, Simple hypothesis, and Composite hypotheses
Null hypothesis: the statistical hypothesis under test in a sample study is called
the null hypothesis.
Alternative hypothesis: for every null hypothesis, it is desirable to state an
alternative hypothesis.
• It is complementary to the null hypothesis.
• Alternative hypotheses are as follows:
• Simple hypotheses: The hypotheses made through one or more
equalities are called simple hypotheses.
Composite hypotheses: The hypotheses are called composite if they are
formulated using the operators "inequality," i.e., "greater than" and "smaller
than."
• The null hypothesis is always simple, although it is possible to make composite
null hypotheses in the context of the regression model.
• To formulate a null hypothesis denoted by H0, the operator "equality" is used.
• Each equality implies a restriction on the parameters of the model.
• It is usually taken that the observations are the results of the actual effects and
chance variation.
• Let us look at a few examples of null hypotheses concerning the regression model
(where A1 and A2 denote regression coefficients):
a) H0: A1 = 0
b) H0: A1 + A2 = 0
c) H0: A1 = A2 = 0
d) H0: A1 + A2 = 0
A sketch of testing the most common of these, H0: slope = 0, follows below.
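A minimal Python sketch, with invented height-weight data, of testing H0: slope = 0 in a simple linear regression (scipy's linregress reports the two-sided p-value for this test):

```python
# Hypothetical sketch: test whether the slope of weight on height is zero.
import numpy as np
from scipy import stats

height = np.array([1.55, 1.60, 1.65, 1.70, 1.75, 1.80, 1.85])
weight = np.array([52.0, 56.5, 60.1, 65.3, 69.8, 74.2, 79.0])

fit = stats.linregress(height, weight)   # least-squares fit of weight ~ height

# fit.pvalue is the two-sided p-value for H0: slope = 0.
print(f"slope = {fit.slope:.2f}, p-value = {fit.pvalue:.4g}")
if fit.pvalue < 0.05:
    print("Reject H0: the slope differs significantly from zero.")
else:
    print("Do not reject H0.")
```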
• Definition: a statistical hypothesis is an assumption or a statement, which may
or may not be true, concerning one or more populations.
Examples:
• The mean height of the college students is 1.63 m.
• There is no difference between Pf and Pv malaria distribution in India (they
are distributed in equal proportions).
The null & alternative hypotheses
• The main hypothesis which we wish to test is
called the null hypothesis, since acceptance of it
commonly implies “no effect” or “ no difference.”
• It is denoted by the symbol H0.
HYPOTHESIS
Null hypothesis: it is often referred to as the hypothesis of no difference. In
the testing process, the null hypothesis is either rejected or not rejected. If
the null hypothesis is not rejected, the data on which the test is based do not
provide sufficient evidence to cause rejection.
Alternative hypothesis: if the testing procedure leads to rejection, we conclude
that the data at hand are not compatible with the null hypothesis but are
supportive of some other hypothesis. That hypothesis is called the alternative
hypothesis.
Type I Error
• A Type I error is committed by rejecting the null hypothesis when in reality it
is true; the probability of committing a Type I error is denoted by α.
• α = P(Type I error)
     = P(rejecting the null hypothesis when it is true)
Type II Error
• A Type II error is committed when we accept the null hypothesis when in reality
it is false; the probability of committing a Type II error is denoted by β.
• β = P(Type II error)
     = P(accepting the null hypothesis when it is false)
• Ideally, testing a hypothesis would involve no chance or probability of any
error.
• But, in practice, eliminating both types of errors is impossible.
• Hence, we fix the probability of one error (the Type I error, α) and try to
minimize the probability of the other (the Type II error, β), as illustrated in
the simulation sketch below.
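A minimal Python simulation sketch, with α fixed at 0.05 and samples generated under a true null hypothesis, showing that the observed rejection rate approximates α:

```python
# Hypothetical sketch: estimate the Type I error rate by repeatedly testing a
# null hypothesis that is actually true.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, n_sim, rejections = 0.05, 5000, 0

for _ in range(n_sim):
    # H0 is true here: each sample really comes from a mean-zero population.
    sample = rng.normal(loc=0.0, scale=1.0, size=30)
    _, p = stats.ttest_1samp(sample, popmean=0.0)
    rejections += p < alpha

print(f"Observed Type I error rate ~ {rejections / n_sim:.3f} (target {alpha})")
```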
Examples
• 1) H0: μ = 1.63 m (from the previous example).
• 2) At present, only 60% of patients with leukemia survive more than 6 years.
• A pharmacist develops a new drug. Of 40 patients, chosen at random, on whom the
new drug is tested, 26 are alive after 6 years.
• Is the new drug better than the former treatment? (A binomial-test sketch of
this question follows below.)
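A minimal Python sketch of this example as a one-sided binomial test of H0: p = 0.60 against H1: p > 0.60:

```python
# Hypothetical sketch: 26 of 40 patients on the new drug survive more than
# 6 years; is that evidence of survival better than the historical 60%?
from scipy import stats

result = stats.binomtest(k=26, n=40, p=0.60, alternative="greater")
print(f"observed proportion = {26 / 40:.2f}, p-value = {result.pvalue:.3f}")

# If the p-value exceeds the chosen alpha, the data do not provide sufficient
# evidence that the new drug is better than the former treatment.
```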
Hypothesis testing offers us two choices:
1. Conclude that the difference between the two groups is so large
that it is unlikely to be due to chance alone. Reject the null
hypothesis and conclude that the groups really do differ.
2. Conclude that the difference between the two groups could be
explained just by chance. Accept the null hypothesis, at least for
now.
Note that you could be making a mistake, either way!
Hypothesis testing outcomes

Decision                        If null hypothesis true   If null hypothesis false
Do not reject null hypothesis   Correct decision          Type II error
Reject null hypothesis          Type I error              Correct decision
Hypothesis testing in Multiple regression models
• For the multiple linear regression model, there are three different hypothesis
tests for the slopes that one could conduct: a test that a single slope parameter
is 0, a test that all of the slope parameters are 0, and a test that a subset of
the slope parameters is 0.
• Multiple linear regression attempts to model the relationship between two or
more explanatory variables and a response variable by fitting a linear equation
to the observed data.
• Every value of the independent variable x is associated with a value of the
dependent variable y.
• The purpose of multiple regression (the term was first used by Pearson, 1908)
is to learn more about the relationship between several independent or predictor
variables and a dependent or criterion variable.
Hypothesis testing in Multiple regression models
• The term was first used by Pearson in 1908.
• Multiple regression models study the relationship between a response
variable and multiple predictor variables. In multiple regression, hypothesis
testing is used to determine whether the predictor variables significantly
affect the response variable and, if so, the nature and magnitude of these
effects.
• The following are the steps involved in hypothesis testing in multiple
regression models:
• State the null and alternative hypotheses: The null hypothesis states that
there is no significant relationship between the predictor variables and the
response variable. In contrast, the alternative hypothesis states a significant
relationship.
• Conduct the F-test: The F-test determines whether the overall
regression model is significant. If the p-value associated with the F-test
is less than the significance level (usually set at 0.05), then the null
hypothesis is rejected, and the model is concluded to be significant.
• Conduct t-tests for individual predictors: Once it has been established
that the overall model is significant, t-tests are conducted for each
predictor variable to determine its individual significance. The t-test
measures the significance of each predictor variable’s effect on the
response variable, while holding all other predictor variables constant.
• An F-test is any statistical test used to compare the
variances of two samples or the ratio of variances between
multiple samples.
What is the Difference Between an F-Test and T-Test?
• The t-test is used to compare the means of two groups and determine whether
they are significantly different, while the F-test is used to compare the
variances of two or more groups and assess whether they are significantly
different. (A small sketch of both tests follows below.)
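A minimal Python sketch, with invented samples, contrasting a two-sample t-test of the means with a variance-ratio F-test:

```python
# Hypothetical sketch: the t-test compares group means, while the F-test here
# compares group variances via the variance ratio.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
group1 = rng.normal(loc=10.0, scale=2.0, size=25)
group2 = rng.normal(loc=11.0, scale=3.0, size=25)

# t-test for equality of means.
t_stat, t_p = stats.ttest_ind(group1, group2)

# F-test for equality of variances: F = s1^2 / s2^2 with (n1-1, n2-1) df.
F = np.var(group1, ddof=1) / np.var(group2, ddof=1)
f_p = 2 * min(stats.f.cdf(F, 24, 24), stats.f.sf(F, 24, 24))   # two-sided p

print(f"t-test: t = {t_stat:.2f}, p = {t_p:.3f}")
print(f"F-test: F = {F:.2f}, p = {f_p:.3f}")
```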
• Evaluate the coefficient estimates: The coefficient estimates indicate the strength and
direction of the relationship between each predictor variable and the response variable.
The sign of the coefficient indicates the direction of the effect (positive or negative),
while the magnitude of the coefficient indicates the strength of the effect.
• Evaluate the goodness-of-fit: The goodness-of-fit measures how well the model fits the
data. The most commonly used goodness-of-fit measure is the R-squared value, which
measures the proportion of the variability in the response variable that is explained by
the predictor variables.
• Check for assumptions: Before interpreting the results of the hypothesis tests, it is
important to check that the assumptions of multiple regression are met. These include
the assumptions of linearity, independence, normality, and equal variance.
• In conclusion, hypothesis testing in multiple regression models involves testing
the overall significance of the model, testing the significance of individual
predictors, and evaluating the model's goodness-of-fit. It is important to check
that the assumptions of multiple regression are met before interpreting the
results of the hypothesis tests. (An end-to-end sketch of these steps follows
below.)
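A minimal Python sketch of the full sequence, using invented data: the overall F-test, the per-predictor t-tests, the coefficient estimates, and R-squared:

```python
# Hypothetical sketch: fit a multiple regression and read off the quantities
# discussed above (the data are simulated, not from a real study).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 60
x1 = rng.normal(size=n)                  # predictor 1, e.g. dose
x2 = rng.normal(size=n)                  # predictor 2, e.g. age
y = 5.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(scale=1.0, size=n)

X = sm.add_constant(np.column_stack([x1, x2]))
model = sm.OLS(y, X).fit()

print(f"Overall F = {model.fvalue:.2f}, p = {model.f_pvalue:.3g}")   # F-test
print("t-test p-values:", np.round(model.pvalues, 4))                # per predictor
print("coefficients:   ", np.round(model.params, 3))                 # sign & size
print(f"R-squared = {model.rsquared:.3f}")                           # goodness-of-fit
```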

Editor's Notes

  • #20 Confounding: a design technique for arranging a complete factorial experiment in blocks; it causes information about certain treatment effects (high-order interactions) to be indistinguishable from, or confounded with, blocks.
  • #29 https://www.theopeneducator.com/doe/Blocking-and-Confounding-in-2K-Design/What-is-Blocking
  • #42 https://www.imsl.com/blog/what-is-regression-model#:~:text=A%20regression%20model%20provides%20a,by%20a%20linear%20regression%20model.