CS 221: Artificial Intelligence Lecture 3: Probability and Bayes Nets Sebastian Thrun and Peter Norvig Naranbaatar Bayanbat, Juthika Dabholkar, Carlos Fernandez-Granda, Yan Largman, Cameron Schaeffer, Matthew Seal Slide Credit: Dan Klein (UC Berkeley)
Goal of Today  Structured representation of probability distributions
Probability Expresses uncertainty Pervasive in all of AI Machine learning Information Retrieval (e.g., Web) Computer Vision Robotics Based on mathematical calculus Disclaimer: We only discuss finite distributions
Probability  Probability of a fair coin: P(heads) = P(tails) = 0.5
Probability  Probability of cancer: P(cancer) = 0.02, P(¬cancer) = 0.98
Joint Probability  Multiple events: cancer, test result

Has cancer?   Test positive?   P(C, TP)
yes           yes              0.018
yes           no               0.002
no            yes              0.196
no            no               0.784
Joint Probability  The problem with joint distributions: it takes 2^D − 1 numbers to specify a joint over D binary variables! (For D = 20 that is already more than a million entries.)
Conditional Probability  Describes the cancer test: P(+ | cancer) = 0.9, P(+ | ¬cancer) = 0.2. Put this together with the prior probability: P(cancer) = 0.02.
Conditional Probability  We have P(TP | C) and the prior P(C). We can now calculate joint probabilities via the product rule P(TP, C) = P(TP | C) P(C):

Has cancer?   Test positive?   P(TP, C)
yes           yes              0.018
yes           no               0.002
no            yes              0.196
no            no               0.784
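As a sanity check, here is a minimal Python sketch (not from the slides; variable names are mine) that rebuilds the joint table from the prior and the conditional:

```python
# Model from the slides: prior P(C) and test conditional P(TP | C).
p_cancer = 0.02
p_pos_given = {True: 0.9, False: 0.2}  # P(TP = yes | C = c)

# Product rule: P(C, TP) = P(TP | C) * P(C)
joint = {}
for cancer in (True, False):
    p_c = p_cancer if cancer else 1.0 - p_cancer
    for positive in (True, False):
        p_tp = p_pos_given[cancer] if positive else 1.0 - p_pos_given[cancer]
        joint[(cancer, positive)] = p_c * p_tp

# Reproduces the table: 0.018, 0.002, 0.196, 0.784 (up to float rounding).
for entry, p in joint.items():
    print(entry, round(p, 3))
```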
Conditional Probability  “Diagnostic” question: how likely is cancer given a positive test?

Has cancer?   Test positive?   P(TP, C)
yes           yes              0.018
yes           no               0.002
no            yes              0.196
no            no               0.784
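The worked arithmetic, read off the joint table above (a filled-in step; the slide shows only the question):

```latex
P(\text{cancer} \mid +) \;=\; \frac{P(\text{cancer}, +)}{P(+)}
  \;=\; \frac{0.018}{0.018 + 0.196} \;=\; \frac{0.018}{0.214} \;\approx\; 0.084
```

So even after a positive test, cancer is still unlikely: the small prior dominates.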
Bayes Network  We just encountered our first Bayes network: Cancer → Test positive. P(cancer) and P(Test positive | cancer) are called the “model”. Calculating P(Test positive) is called “prediction”. Calculating P(Cancer | test positive) is called “diagnostic reasoning”.
Bayes Network  We just encountered our first Bayes network: Cancer → Test positive, versus Cancer and Test positive with no arc between them.
Independence  Suppose Cancer and Test positive were independent: P(Test positive | Cancer) = P(Test positive). What does this mean for our test? Don’t take it!
Independence  Two variables are independent if P(X, Y) = P(X) P(Y). This says that their joint distribution factors into a product of two simpler distributions. This implies P(X | Y) = P(X). We write X ⊥ Y. Independence is a simplifying modeling assumption. Empirical joint distributions: at best “close” to independent.
Example: Independence  N fair, independent coin flips, each with P(h) = 0.5 and P(t) = 0.5.
Example: Independence?

Joint 1:                 Joint 2:
T     W     P            T     W     P
warm  sun   0.4          warm  sun   0.3
warm  rain  0.1          warm  rain  0.2
cold  sun   0.2          cold  sun   0.3
cold  rain  0.3          cold  rain  0.2

Marginals:
T     P                  W     P
warm  0.5                sun   0.6
cold  0.5                rain  0.4
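A quick way to see which of the two joints factors into the marginals; a minimal Python sketch (the dictionaries and function name are mine):

```python
# Two candidate joint distributions over (T, W) from the slide.
joint1 = {('warm', 'sun'): 0.4, ('warm', 'rain'): 0.1,
          ('cold', 'sun'): 0.2, ('cold', 'rain'): 0.3}
joint2 = {('warm', 'sun'): 0.3, ('warm', 'rain'): 0.2,
          ('cold', 'sun'): 0.3, ('cold', 'rain'): 0.2}

def is_independent(joint, tol=1e-9):
    """True if P(t, w) = P(t) * P(w) for every entry of the table."""
    p_t, p_w = {}, {}
    for (t, w), p in joint.items():
        p_t[t] = p_t.get(t, 0.0) + p
        p_w[w] = p_w.get(w, 0.0) + p
    return all(abs(p - p_t[t] * p_w[w]) < tol for (t, w), p in joint.items())

print(is_independent(joint1))  # False: P(warm, sun) = 0.4, but 0.5 * 0.6 = 0.3
print(is_independent(joint2))  # True:  every entry equals the product of marginals
```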
Conditional Independence  P(Toothache, Cavity, Catch). If I have a toothache, a dental probe might be more likely to catch. But if I have a cavity, the probability that the probe catches doesn't depend on whether I have a toothache: P(+catch | +toothache, +cavity) = P(+catch | +cavity). The same independence holds if I don’t have a cavity: P(+catch | +toothache, ¬cavity) = P(+catch | ¬cavity). Catch is conditionally independent of Toothache given Cavity: P(Catch | Toothache, Cavity) = P(Catch | Cavity). Equivalent conditional independence statements: P(Toothache | Catch, Cavity) = P(Toothache | Cavity); P(Toothache, Catch | Cavity) = P(Toothache | Cavity) P(Catch | Cavity). One can be derived from the other easily. We write Catch ⊥ Toothache | Cavity.
Bayes Network Representation  Cavity → Catch, Cavity → Toothache. P(cavity): 1 parameter; P(catch | cavity): 2 parameters; P(toothache | cavity): 2 parameters. Total: 1 + 2 + 2 = 5 parameters, versus 2^3 − 1 = 7 parameters for the full joint.
A More Realistic Bayes Network
Example Bayes Network: Car
Graphical Model Notation Nodes: variables (with domains) Can be assigned (observed) or unassigned (unobserved) Arcs: interactions Indicate  “direct influence” between variables Formally: encode conditional independence (more later) For now: imagine that arrows mean direct causation  (they may not!)
Example: Coin Flips N independent coin flips No interactions between variables:  absolute independence X 1 X 2 X n
Example: Traffic Variables: R: It rains T: There is traffic Model 1: independence Model 2: rain causes traffic Why is an agent using model 2 better? R T
Example: Alarm Network  Variables: B: Burglary, A: Alarm goes off, M: Mary calls, J: John calls, E: Earthquake! Graph: Burglary → Alarm ← Earthquake; Alarm → John calls; Alarm → Mary calls.
Bayes Net Semantics  A set of nodes, one per variable X. A directed, acyclic graph. A conditional distribution for each node: a collection of distributions over X, one for each combination of parents’ values (CPT: conditional probability table); a description of a noisy “causal” process. (Diagram: node X with parents A_1, …, A_n.) A Bayes net = Topology (graph) + Local Conditional Probabilities.
Probabilities in BNs  Bayes nets implicitly encode joint distributions, as a product of local conditional distributions. To see what probability a BN gives to a full assignment, multiply all the relevant conditionals together; the product is written out below, with a worked entry. This lets us reconstruct any entry of the full joint. Not every BN can represent every joint distribution: the topology enforces certain conditional independencies.
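Written out, the product the slide refers to is the following (the alarm-network entry is an illustrative stand-in for the slide's example, using the CPTs given a few slides below):

```latex
P(x_1, \ldots, x_n) \;=\; \prod_{i=1}^{n} P\big(x_i \mid \mathrm{parents}(X_i)\big),
\qquad \text{e.g.}\quad
P(+b, \neg e, +a, +j, +m) \;=\; P(+b)\,P(\neg e)\,P(+a \mid +b, \neg e)\,P(+j \mid +a)\,P(+m \mid +a).
```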
Example: Coin Flips  X_1, X_2, …, X_n, each with P(h) = 0.5 and P(t) = 0.5. Only distributions whose variables are absolutely independent can be represented by a Bayes’ net with no arcs.
Example: Traffic  R → T

P(R):        +r 1/4    ¬r 3/4
P(T | +r):   +t 3/4    ¬t 1/4
P(T | ¬r):   +t 1/2    ¬t 1/2

Joint P(R, T):
+r +t   3/16
+r ¬t   1/16
¬r +t   3/8
¬r ¬t   3/8
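Each joint entry is just the prior times the conditional, for example (a worked step consistent with the tables above):

```latex
P(+r, +t) = P(+r)\,P(+t \mid +r) = \tfrac{1}{4}\cdot\tfrac{3}{4} = \tfrac{3}{16},
\qquad
P(\neg r, \neg t) = P(\neg r)\,P(\neg t \mid \neg r) = \tfrac{3}{4}\cdot\tfrac{1}{2} = \tfrac{3}{8}.
```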
Example: Alarm Network  Burglary, Earthquake, Alarm, John calls, Mary calls. How many parameters? B: 1, E: 1, A: 4, J: 2, M: 2, for a total of 10.
Example: Alarm Network  Burglary, Earthquake, Alarm, John calls, Mary calls.

B    P(B)            E    P(E)
+b   0.001           +e   0.002
¬b   0.999           ¬e   0.998

B    E    A    P(A|B,E)
+b   +e   +a   0.95
+b   +e   ¬a   0.05
+b   ¬e   +a   0.94
+b   ¬e   ¬a   0.06
¬b   +e   +a   0.29
¬b   +e   ¬a   0.71
¬b   ¬e   +a   0.001
¬b   ¬e   ¬a   0.999

A    J    P(J|A)      A    M    P(M|A)
+a   +j   0.9         +a   +m   0.7
+a   ¬j   0.1         +a   ¬m   0.3
¬a   +j   0.05        ¬a   +m   0.01
¬a   ¬j   0.95        ¬a   ¬m   0.99
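A minimal Python sketch of reading one full-joint entry off these CPTs (the dictionary layout and function name are mine):

```python
# CPTs from the slide, indexed by assignments (True = '+', False = negated).
p_b = {True: 0.001, False: 0.999}
p_e = {True: 0.002, False: 0.998}
p_a = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(+a | B, E)
p_j = {True: 0.9, False: 0.05}                        # P(+j | A)
p_m = {True: 0.7, False: 0.01}                        # P(+m | A)

def joint(b, e, a, j, m):
    """P(B=b, E=e, A=a, J=j, M=m) as the product of local conditionals."""
    pa = p_a[(b, e)] if a else 1.0 - p_a[(b, e)]
    pj = p_j[a] if j else 1.0 - p_j[a]
    pm = p_m[a] if m else 1.0 - p_m[a]
    return p_b[b] * p_e[e] * pa * pj * pm

# e.g. P(+b, -e, +a, +j, +m) = 0.001 * 0.998 * 0.94 * 0.9 * 0.7 ~= 0.00059
print(joint(True, False, True, True, True))
```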
Example: Alarm Network  Burglary → Alarm ← Earthquake; Alarm → John calls; Alarm → Mary calls.
Bayes’ Nets  A Bayes’ net is an efficient encoding of a probabilistic model of a domain. Questions we can ask: Inference: given a fixed BN, what is P(X | e)? Representation: given a BN graph, what kinds of distributions can it encode? Modeling: what BN is most appropriate for a given domain?
Remainder of this Class Find Conditional (In)Dependencies Concept of “d-separation”
Causal Chains  This configuration is a “causal chain”: X → Y → Z (e.g., X: Low pressure, Y: Rain, Z: Traffic). Is X independent of Z given Y? Yes! Evidence along the chain “blocks” the influence.
Common Cause  Another basic configuration: two effects of the same cause, X ← Y → Z (e.g., Y: Alarm, X: John calls, Z: Mary calls). Are X and Z independent? Not in general. Are X and Z independent given Y? Yes! Observing the cause blocks influence between effects.
Common Effect  Last configuration: two causes of one effect (v-structures), X → Y ← Z (e.g., X: Raining, Z: Ballgame, Y: Traffic). Are X and Z independent? Yes: the ballgame and the rain cause traffic, but they are not correlated. (Still need to prove they must be; try it, then see the one-line proof below.) Are X and Z independent given Y? No: seeing traffic puts the rain and the ballgame in competition as explanations. This is backwards from the other cases: observing an effect activates influence between possible causes.
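The proof the slide invites is one marginalization, using the v-structure's factorization P(x, y, z) = P(x) P(z) P(y | x, z):

```latex
P(x, z) \;=\; \sum_{y} P(x)\,P(z)\,P(y \mid x, z)
        \;=\; P(x)\,P(z) \sum_{y} P(y \mid x, z)
        \;=\; P(x)\,P(z).
```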
The General Case Any complex example can be analyzed using these three canonical cases General question: in a given BN, are two variables independent (given evidence)? Solution: analyze the graph
Reachability  Recipe: shade evidence nodes. Attempt 1: remove shaded nodes; if two nodes are still connected by an undirected path, they are not conditionally independent. Almost works, but not quite. Where does it break? Answer: the v-structure at T doesn’t count as a link in a path unless “active”. (Example graph nodes: R, T, B, D, L.)
Reachability (D-Separation)  Question: are X and Y conditionally independent given evidence vars {Z}? Yes, if X and Y are “separated” by Z. Look for active paths from X to Y; no active paths = independence! A path is active if each triple is active: Causal chain A → B → C where B is unobserved (either direction); Common cause A ← B → C where B is unobserved; Common effect (aka v-structure) A → B ← C where B or one of its descendants is observed. All it takes to block a path is a single inactive segment. (See the sketch below for this check in code.)
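For concreteness, a compact Python sketch of this check (the graph encoding and names are mine; it follows the standard reachability procedure over (node, direction) pairs):

```python
from collections import deque

def d_separated(children, x, y, evidence):
    """True if x and y are d-separated given `evidence` in a DAG.

    children: dict mapping each node to the list of its children.
    Search over (node, direction) pairs, where 'down' means we arrived
    from a parent and 'up' means we arrived from a child.
    """
    parents = {n: [] for n in children}
    for n, kids in children.items():
        for k in kids:
            parents[k].append(n)

    # Evidence nodes and their ancestors: exactly the nodes whose
    # v-structures are active (collider or a descendant observed).
    anc = set()
    stack = list(evidence)
    while stack:
        n = stack.pop()
        if n not in anc:
            anc.add(n)
            stack.extend(parents[n])

    visited = set()
    queue = deque([(x, 'up')])
    while queue:
        node, direction = queue.popleft()
        if (node, direction) in visited:
            continue
        visited.add((node, direction))
        if node == y:
            return False  # reached y along an active path
        if direction == 'up' and node not in evidence:
            # Chain (continue up) or common cause (turn down).
            for p in parents[node]:
                queue.append((p, 'up'))
            for c in children[node]:
                queue.append((c, 'down'))
        elif direction == 'down':
            if node not in evidence:
                # Chain: keep flowing down to children.
                for c in children[node]:
                    queue.append((c, 'down'))
            if node in anc:
                # V-structure: collider (or a descendant) observed,
                # so influence flows back up to the other parents.
                for p in parents[node]:
                    queue.append((p, 'up'))
    return True

# Alarm network: B -> A <- E, A -> J, A -> M.
g = {'B': ['A'], 'E': ['A'], 'A': ['J', 'M'], 'J': [], 'M': []}
print(d_separated(g, 'B', 'E', set()))   # True:  unobserved v-structure blocks
print(d_separated(g, 'B', 'E', {'A'}))   # False: observed collider activates
print(d_separated(g, 'B', 'E', {'M'}))   # False: observed descendant activates
print(d_separated(g, 'J', 'M', {'A'}))   # True:  observed common cause blocks
```

On the alarm network this reproduces the three canonical cases above.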
Example  (Graph over R, T, B, T′.) Yes
Example  (Graph over R, T, B, D, L, T′.) Yes, Yes, Yes
Example  Variables: R: Raining, T: Traffic, D: Roof drips, S: I’m sad. Questions (on the graph over T, S, D, R): Yes
A Common BN  Unobservable cause A; tests T1, T2, T3, …, TN taken over time. Diagnostic reasoning: compute P(A | T1, …, TN).
Causality?  When Bayes’ nets reflect the true causal patterns: often simpler (nodes have fewer parents); often easier to think about; often easier to elicit from experts. BNs need not actually be causal: sometimes no causal net exists over the domain; you end up with arrows that reflect correlation, not causation. What do the arrows really mean? Topology may happen to encode causal structure, but topology is only guaranteed to encode conditional independence.
Summary Bayes network:  Graphical representation of joint distributions Efficiently encode conditional independencies Reduce number of parameters from exponential to linear (in many cases) Thursday: Inference in (general) Bayes networks