IS 4800 Empirical Research Methods
for Information Science
Class Notes Feb. 24, 2012
Instructor: Prof. Carole Hafner, 446 WVH
hafner@ccs.neu.edu Tel: 617-373-5116
Course Web site: www.ccs.neu.edu/course/is4800sp12/
2
Types of Quantitative Studies We’ve
Discussed
• Observational
• Survey
• Experimental
– One-factor, two-level, between-subjects
– One-factor, two-level, within-subjects
• aka “repeated measures” or “crossover”
– Matched pairs
3
Types of Experimental Designs
• Between-Subjects Design
– Different groups of subjects are randomly assigned to the levels
of your independent variable
– Data are averaged for analysis
– Use t-test for independent means
– Example: “single factor, two-level, between subjects” design
• Level A=Word vs. Level B=Wizziword
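As a concrete illustration of the t-test for independent means, here is a minimal Python sketch; the task-time numbers and the use of scipy are assumptions added for illustration, not part of the original notes.

```python
# Hypothetical task-completion times (minutes) for two independently assigned groups.
from scipy import stats

word_times      = [23.5, 19.2, 27.8, 21.1, 24.6, 22.3]   # Level A: Word
wizziword_times = [18.4, 16.9, 20.2, 17.5, 19.8, 21.0]   # Level B: Wizziword

# t-test for independent means: are the two population means the same (null hypothesis)?
t, p = stats.ttest_ind(word_times, wizziword_times)
print(f"t = {t:.2f}, p = {p:.3f}")   # a small p suggests the populations differ
```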
4
Types of Experimental Designs
• Within-Subjects Design
– A single group of subjects is exposed to all levels of the
independent variable
– Data are averaged for analysis
– aka “repeated measures design”, “crossover design”
– Use t-test for dependent means aka “paired samples t-test”
– We will discuss “single factor, two-level, within subjects”
designs.
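The within-subjects counterpart, again a sketch with made-up numbers: the same subjects provide a score under each level, so the test is on the paired differences.

```python
# Hypothetical task times for the SAME six subjects under both conditions.
from scipy import stats

times_word      = [23.5, 19.2, 27.8, 21.1, 24.6, 22.3]   # each subject using Word
times_wizziword = [18.4, 16.9, 20.2, 17.5, 19.8, 21.0]   # the same subjects using Wizziword

# t-test for dependent means (paired-samples t-test):
# is the mean within-subject difference zero?
t, p = stats.ttest_rel(times_word, times_wizziword)
print(f"t = {t:.2f}, p = {p:.3f}")
```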
5
Between-Subjects Design
• Each group is a sample from a population
• Big question: are the populations the same
(null hypothesis) or are they significantly
different?
6
Sidebar:
Randomization
• Crucial: method must not be applied
subjectively
• Point in time at which randomization occurs
is important
[Timeline: recruiting → randomization → experiment → final measures]
7
Sidebar:
Randomization
• Simple randomization
– Flip a coin
– Random number generator
– Table of random numbers
– Partition numeric range into number of conditions
• Problems?
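A sketch of simple randomization with a random number generator (the condition labels are hypothetical). Running it a few times also shows the problem the last bullet hints at: with small samples, the group sizes can end up badly unbalanced.

```python
import random

conditions = ["A", "B"]

# Simple randomization: each subject is assigned independently,
# e.g. by partitioning the RNG's range across the conditions.
assignments = [random.choice(conditions) for _ in range(10)]
print(assignments)
print({c: assignments.count(c) for c in conditions})   # counts are often unequal
```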
8
Sidebar:
Randomization
• Blocked randomization
– Avoids serious imbalances in assignments of subjects to
conditions
– Guarantees that imbalance will never be larger than a specified
amount
– Example: want to ensure that after every 4 subjects we have an equal
number assigned to each of 2 conditions => “block size of 4”
– Method: write down every ordering of one block in which each of the N
conditions appears equally often (for B = block size)
• Example: AABB, ABAB, ABBA, BAAB, BABA, BBAA
– At the start of each block, select one of the orderings at random
– Should use block sizes > 2
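A minimal sketch of blocked randomization for two conditions and a block size of 4, following the method above; the function name and defaults are illustrative only.

```python
import itertools
import random

def blocked_assignments(n_subjects, conditions=("A", "B"), block_size=4):
    """Blocked randomization: within each block every condition appears equally often."""
    per_condition = block_size // len(conditions)
    # All distinct orderings of one block, e.g. AABB, ABAB, ABBA, BAAB, BABA, BBAA.
    block_orders = sorted(set(itertools.permutations(conditions * per_condition)))
    assignments = []
    while len(assignments) < n_subjects:
        assignments.extend(random.choice(block_orders))   # pick one ordering per block
    return assignments[:n_subjects]

print(blocked_assignments(10))   # imbalance can never exceed half a block
```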
Sidebar:
Randomization
• Stratified randomization
– First stratify Ss based on measured factors (prior
to randomization) (e.g., gender)
– Within each stratum, randomize
• Either simple or blocked
Stratum   Sex   Condition assignment
   1       M    ABBA BABA …
   2       F    BABA BBAA …
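A sketch of stratified randomization on sex, with balanced (blocked-style) assignment inside each stratum; the roster, IDs, and labels are hypothetical.

```python
import random

def stratified_assignments(roster, conditions=("A", "B")):
    """Stratify subjects on a measured factor (here sex), then randomize within each stratum."""
    strata = {}
    for subject_id, sex in roster:
        strata.setdefault(sex, []).append(subject_id)
    assignments = {}
    for members in strata.values():
        # Balanced label list for this stratum, then shuffled (a blocked-style randomization).
        labels = (list(conditions) * ((len(members) // len(conditions)) + 1))[:len(members)]
        random.shuffle(labels)
        assignments.update(zip(members, labels))
    return assignments

roster = [(1, "M"), (2, "F"), (3, "M"), (4, "F"), (5, "M"), (6, "F")]
print(stratified_assignments(roster))
```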
10
Within-Subjects Designs
Benefits
• More Power! Why?
– Controls for all inter-subject variability
– Randomized between-subjects design just balances
the effects between groups
– (Matched-pair controls for identified and matched
extraneous variables)
11
The Problem of Error Variance
• Error variance is the variability among scores not
caused by the independent variable
– Error variance is common to all experimental designs
– Error variance is handled differently in each design
• Sources of error variance (“extraneous variables”)
– Individual differences among subjects
– Environmental conditions not constant across levels of the
independent variable
– Fluctuations in the physical/mental state of an individual
subject
[Diagram: Independent Variable + Error Variance (Individual Differences, Environmental Conditions) → Measured Outcomes]
13
Handling Error Variance
• Taking steps to reduce error variance
– Hold extraneous variables constant by treating subjects as
similarly as possible
– Match subjects on crucial characteristics
• Increasing the effectiveness of the independent
variable
– Strong manipulations yield less error variance than weak
manipulations
14
Matched Group Design
• Use when you know some third variable has
significant correlation with outcome
• A between-subjects design
• Use paired-samples t-test!
[Diagram: match pairs → randomize each pair → Treatment 1 / Treatment 2]
15
Handling Error Variance
• Randomizing error variance across groups
– Distribute error variance equivalently across levels of the
independent variable
– Accomplished with random assignment of subjects to levels
of the independent variable
• Statistical analysis
– Random assignment tends to equalize error variance across
groups, but does not guarantee that it will
– You can estimate the probability that observed differences
are due to error variance by using inferential statistics
16
Within-Subjects Designs
• Subjects are not randomly assigned to treatment
conditions
– The same subjects are used in all conditions
– Closely related to the matched-groups design
• Advantages
– Reduces error variance due to individual differences among
subjects across treatment groups
– Reduced error variance results in a more powerful design
• Effects of independent variable are more likely to be detected
17
Within-Subjects Designs
Disadvantages
• More demanding on subjects, especially in complex
designs
• Subject attrition is a problem
• Carryover effects: exposure to a previous treatment
affects performance in a subsequent treatment
Carryover Example
• Embodied Conversational Agents to Promote Health
Literacy for Older Adults
[Study timeline: Diabetes Knowledge Assessment at T0, Brochure condition, Assessment at T1, Computer condition, Assessment at T2]
19
Sources of Carryover
• Learning
– Learning a task in the first treatment may affect performance in the second
• Fatigue
– Fatigue from earlier treatments may affect performance in later treatments
• Habituation
– Repeated exposure to a stimulus may lead to unresponsiveness to that stimulus
• Sensitization
– Exposure to one stimulus may make a subject respond more strongly to a later stimulus
• Contrast
– Subjects may compare treatments, which may affect behavior
• Adaptation
– If a subject undergoes adaptation (e.g., dark adaptation), then earlier results may
differ from later ones
20
Dealing With Carryover Effects
• Counterbalancing
– The various treatments are presented in a different order for
different subjects
– May be complete or partial
– Balances the effects of carryover on each treatment
– Assumes carryover effect is independent of the order
21
Dealing With Carryover Effects
• Taking Steps to Minimize Carryover
– Techniques such as pre-training, practice sessions, or
rest periods between treatments can reduce some forms
of carryover
• Make Treatment Order an Independent Variable
– Allows you to measure the size of carryover effects,
which can be taken into account in future experiments
22
Dealing With Carryover Effects
• The Latin Square Design
– An example of a partial counterbalancing approach
– Used when you make the number of treatment orders equal to the number of
treatments (each treatment occurs once in every row and column)
– Example: want to evaluate 4 different word processors, using 4 admins in 4
departments. A completely counterbalanced design would require 4x4x4=64
trials.
– Latin square attempts to eliminate systematic bias in assignment of treatment to
departments & subjects.
             Department
Subj      1    2    3    4
  1       C    B    A    D      (Treatments A–D)
  2       B    A    D    C
  3       D    C    B    A
  4       A    D    C    B
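For reference, a cyclic Latin square can be generated mechanically, as in the sketch below; the square on the slide is a different but equally valid arrangement. In practice a balanced Latin square is often preferred, so that each treatment also immediately follows every other treatment equally often.

```python
def latin_square(treatments):
    """Cyclic Latin square: each treatment occurs once in every row and every column."""
    n = len(treatments)
    return [[treatments[(row + col) % n] for col in range(n)] for row in range(n)]

for row in latin_square(["A", "B", "C", "D"]):
    print(" ".join(row))
# A B C D
# B C D A
# C D A B
# D A B C
```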
Example of a Counterbalanced Single-Factor
Design With Two Treatments

Order   Treatment Sequence
  1           A  B
  2           B  A

Subject   Order   Treatment A   Treatment B
   1        2        23.5          14.2
   2        1        14.6          11.5
   …        …          …             …
How do you test for “order effects”?
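One common way to answer that question (a sketch of one approach, not the only analysis): compute each subject's A−B difference and compare those differences between the two order groups with an independent-means t-test. If order did not matter, the two groups' mean differences should be about the same. The numbers below are hypothetical.

```python
from scipy import stats

# Hypothetical Treatment A minus Treatment B difference scores, split by order.
diffs_order_AB = [9.3, 7.1, 8.0, 6.5]   # subjects who received A first
diffs_order_BA = [3.1, 2.4, 4.0, 1.8]   # subjects who received B first

t, p = stats.ttest_ind(diffs_order_AB, diffs_order_BA)
print(f"t = {t:.2f}, p = {p:.3f}")   # a significant difference suggests an order effect
```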
Types of Studies We’ve Discussed
• Review pros and cons of between-subjects
and within-subjects designs. What is matched pairs?
25
Example – Best Design?
• You’ve developed a new web-based help
system for your email client. You want to
compare your system to the old printed
manual.
26
Example – Best Design?
• You’ve just developed the “Matchmaker” – a
handheld device that beeps when you are in the
vicinity of a compatible person who is also
carrying a Matchmaker.
• You evaluate the number of users who are
married after six months of use compared to a
non-intervention control group.
27
Example – Best Design?
• You’ve just developed “Reado Speedo” that
reads print books using OCR and speaks
them to you at twice your normal reading
rate. You want to evaluate your product
against the old fashioned way on reading
rate, comprehension and satisfaction.
Introduction to Usability Testing
I. Summative evaluation: Measure/compare user
performance and satisfaction
•Quantitative measures
•Statistical methods
II. Formative Evaluation: Identify Usability Problems
•Quantitative and Qualitative measures
•Ethnographic methods such as interviews, focus
groups
Usability Goals (Nielsen)
1. Learnability
2. Efficiency
3. Memorability
4. Error avoidance/recovery
5. User satisfaction
Operationalize these goals to evaluate usability
What is a Usability Experiment?
Usability testing in a controlled environment
•There is a test set of users
•They perform pre-specified tasks
•Data is collected (quantitative and qualitative)
•Take mean and/or median value of measured attributes
•Compare to goal or another system
Contrasted with “expert review” and “field study” evaluation
methodologies
The growth of usability groups and usability laboratories
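To make “take mean and/or median value of measured attributes” concrete, a tiny sketch with hypothetical task times; note how the median is less distorted by the one very slow session.

```python
import statistics

task_times = [23.5, 19.2, 47.8, 21.1, 24.6]   # hypothetical seconds per task
print(statistics.mean(task_times))    # 27.24
print(statistics.median(task_times))  # 23.5
```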
Experimental factors
Subjects
representative
sufficient sample
Variables
independent variable (IV)
characteristic changed to produce different conditions,
e.g. interface style, number of menu items.
dependent variable (DV)
characteristic measured in the experiment,
e.g. time taken, number of errors.
Experimental factors (cont.)
Hypothesis
prediction of outcome framed in terms of IV and DV
null hypothesis: states no difference between conditions
aim is to disprove this.
Experimental design
within groups design
each subject performs experiment under each condition.
transfer of learning possible
less costly and less likely to suffer from user variation.
between groups design
each subject performs under only one condition
no transfer of learning
more users required
variation can bias results.
Summative Analysis
What to measure? (and its relationship to a usability goal)
Total task time
User “think time” (dead time??)
Time spent not moving toward goal
Ratio of successful actions/errors
Commands used/not used
frequency of user expression of:
confusion, frustration, satisfaction
frequency of reference to manuals/help system
percent of time such reference provided the needed answer
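As a sketch of how several of these measures could be pulled out of a session log; the log format here is purely hypothetical.

```python
# Hypothetical event log for one session: (seconds_from_start, event_type) pairs.
log = [
    (0.0, "task_start"), (12.4, "action_ok"), (15.1, "error"),
    (21.8, "action_ok"), (30.2, "help_lookup"), (47.9, "task_end"),
]

total_task_time = log[-1][0] - log[0][0]
successes       = sum(1 for _, event in log if event == "action_ok")
errors          = sum(1 for _, event in log if event == "error")
help_lookups    = sum(1 for _, event in log if event == "help_lookup")

print(f"total task time: {total_task_time:.1f} s")
print(f"successful actions per error: {successes / max(errors, 1):.1f}")
print(f"references to help/manual: {help_lookups}")
```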
Measuring User Performance
Measuring learnability
Time to complete a set of tasks
Learnability/efficiency trade-off
Measuring efficiency
Time to complete a set of tasks
How to define and locate “experienced” users
Measuring memorability
The most difficult, since “casual” users are hard
to find for experiments
Memory quizzes may be misleading
Measuring User Performance (cont.)
Measuring user satisfaction
Likert scale (agree or disagree)
Semantic differential scale
Physiological measure of stress
Measuring errors
Classification of minor v. serious
Reliability and Validity
Reliability means repeatability. Statistical significance is a
measure of reliability
Validity means the results will transfer to a real-life situation.
It depends on matching the users, tasks, and environment.
Reliability - difficult to achieve because of high variability
in individual user performance
Formative Evaluation
What is a Usability Problem??
Unclear - the planned method for using the system is not
readily understood or remembered (info. design level)
Error-prone - the design leads users to stray from the
correct operation of the system (any design level)
Mechanism overhead - the mechanism design creates awkward
work flow patterns that slow down or distract users.
Environment clash - the design of the system does not
fit well with the users’ overall work processes. (any design level)
Ex: incomplete transaction cannot be saved
Qualitative methods for collecting usability
problems
Thinking aloud studies
Difficult to conduct
Experimenter prompting, non-directive
Alternatives: constructive interaction, coaching
method, retrospective testing
Output: notes on what users did and expressed: goals,
confusions or misunderstandings, errors, reactions expressed
Questionnaires
Should be usability-tested beforehand
Focus groups, interviews
Observational Methods - Think Aloud
user observed performing task
user asked to describe what they are doing and why, what they think is
happening, etc.
Advantages
simplicity - requires little expertise
can provide useful insight
can show how the system is actually used
Disadvantages
subjective
selective
act of describing may alter task performance
Observational Methods - Cooperative evaluation
variation on think aloud
user collaborates in evaluation
both user and evaluator can ask each other questions throughout
Additional advantages
less constrained and easier to use
user is encouraged to criticize system
clarification possible
Observational Methods - Protocol analysis
paper and pencil
cheap, limited to writing speed
audio
good for think aloud, difficult to match with other protocols
video
accurate and realistic, needs special equipment, obtrusive
computer logging
automatic and unobtrusive, large amounts of data difficult to analyze
user notebooks
coarse and subjective, useful insights, good for longitudinal studies
Mixed use in practice.
Transcription of audio and video is difficult and requires skill.
Some automatic support tools are available.
Query Techniques - Interviews
analyst questions user on a one-to-one basis
usually based on prepared questions
informal, subjective and relatively cheap
Advantages
can be varied to suit context
issues can be explored more fully
can elicit user views and identify unanticipated problems
Disadvantages
very subjective
time consuming
Query Techniques - Questionnaires
Set of fixed questions given to users
Advantages
quick and reaches large user group
can be analyzed more rigorously
Disadvantages
less flexible
less probing
Laboratory studies: Pros and Cons
Advantages:
specialist equipment available
uninterrupted environment
Disadvantages:
lack of context
difficult to observe several users cooperating
Appropriate
if the actual system location is dangerous or impractical,
or to allow controlled manipulation of use.
Steps in a usability experiment
1. The planning phase
2. The execution phase
3. Data collection techniques
4. Data analysis
The planning phase
Who, what, where, when and how much?
•Who are test users, and how will they be recruited?
•Who are the experimenters?
•When, where, and how long will the test take?
•What equipment/software is needed?
•How much will the experiment cost?
Prepare detailed test protocol
*What test tasks? (written task sheets)
*What user aids? (written manual)
*What data collected? (include questionnaire)
How will results be analyzed/evaluated?
Pilot test protocol with a few users
Detailed Test Protocol
What tasks?
Criteria for completion?
User aids
What will users be asked to do (thinking aloud studies)?
Interaction with experimenter
What data will be collected?
All materials to be given to users as part of the test,
including detailed description of the tasks.
Execution phase
Prepare environment, materials, software
Introduction should include:
purpose (evaluating software)
voluntary and confidential
explain all procedures
recording
question-handling
invite questions
During experiment
give user written task description(s), one at a time
only one experimenter should talk
De-briefing
Execution phase: ethics of human
experimentation applied to usability testing
Users feel exposed using unfamiliar tools and making errors
Guidelines:
•Re-assure that individual results not revealed
•Re-assure that user can stop any time
•Provide comfortable environment
•Don’t laugh or refer to users as subjects or guinea pigs
•Don’t volunteer help, but don’t allow user to struggle too long
•In de-briefing
•answer all questions
•reveal any deception
•thank the user for helping
Execution Phase: Designing Test Tasks
Tasks:
Are representative
Cover most important parts of UI
Don’t take too long to complete
Goal or result oriented (possibly with scenario)
Not frivolous or humorous (unless part of product goal)
First task should build confidence
Last task should create a sense of accomplishment
Data collection - usability labs and equipment
Pad and paper the only absolutely necessary data collection tool!
Observation areas (for other experimenters, developers,
customer reps, etc.) - should be shown to users
Videotape (may be overrated) - users must sign a release
Video display capture
Portable usability labs
Usability kiosks
Analysis of data
Before you start to do any statistics:
look at data
save original data
Choice of statistical technique depends on
type of data
information required
Type of data
discrete - finite number of values
continuous - any value
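A small sketch of how the type of data steers the choice of technique (counts are discrete, times are continuous); the numbers and the particular scipy calls are illustrative, not prescriptive.

```python
from scipy import stats

# Discrete data: completion counts per system (hypothetical 2x2 contingency table).
completed     = [18, 11]   # system A vs. system B
not_completed = [2, 9]
chi2, p, dof, expected = stats.chi2_contingency([completed, not_completed])
print(f"chi-square = {chi2:.2f}, p = {p:.3f}")

# Continuous data: task times, compared with a t-test as in the earlier slides.
times_a = [23.5, 19.2, 27.8, 21.1]
times_b = [18.4, 16.9, 20.2, 17.5]
print(stats.ttest_ind(times_a, times_b))
```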
Testing usability in the field
1. Direct observation in actual use
discover new uses
take notes, don’t help, chat later
2. Logging actual use
objective, not intrusive
great for identifying errors
which features are/are not used
privacy concerns
Testing Usability in the Field (cont.)
3. Questionnaires and interviews with real users
ask users to recall critical incidents
questionnaires must be short and easy to return
4. Focus groups
6-9 users
skilled moderator with pre-planned script
computer conferencing??
5. On-line direct feedback mechanisms
initiated by users
may signal change in user needs
trust but verify
6. Bulletin boards and user groups
Field Studies: Pros and Cons
Advantages:
natural environment
context retained (though observation may alter it)
longitudinal studies possible
Disadvantages:
distractions
noise
Appropriate
for “beta testing”
where context is crucial
for longitudinal studies
