CSCE 582 {=STAT 582} (Spring 2006): Lecture Log

January 10 (Tue), 2006 Introduction to the Course: Goals and Topics to Be Covered. The bulletin description, explained: "582 -- Bayesian Networks and Decision Graphs. {=STAT 582} (3) (Prereq: CSCE 350 and STAT 509) Normative approaches to uncertainty in artificial intelligence. Probabilistic and causal modeling with Bayesian networks and influence diagrams. Applications in decision analysis and support. Algorithms for probability update in graphical models." Examples of plausible reasoning and causal graphs: icy roads and earthquake.

January 12 (Thu), 2006 More examples of plausible reasoning and causal graphs: wet grass and car start. Reasoning in the evidential and causal direction. Intercausal reasoning and explaining away. Uncertain reasoning is non-functional: the certainty of a conclusion is not a function of the certainties of its premises alone. Serial, diverging, and converging connections. Definition of d-separation, with several examples. Complexity of checking d-separation using the naive algorithm based on the definition: list all chains, check every connection on each chain.
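
A minimal Python sketch of that naive algorithm (enumerate all chains, classify every connection); the parents-dict encoding and node names are illustrative, not from class:

    def descendants(dag, v):
        """All nodes reachable from v by directed edges (dag maps node -> list of parents)."""
        children = {n: [] for n in dag}
        for n, parents in dag.items():
            for p in parents:
                children[p].append(n)
        seen, stack = set(), [v]
        while stack:
            for c in children[stack.pop()]:
                if c not in seen:
                    seen.add(c)
                    stack.append(c)
        return seen

    def chains(dag, x, y):
        """All simple undirected paths from x to y."""
        nbrs = {n: set(parents) for n, parents in dag.items()}
        for n, parents in dag.items():
            for p in parents:
                nbrs[p].add(n)
        def extend(path):
            if path[-1] == y:
                yield path
                return
            for n in nbrs[path[-1]] - set(path):
                yield from extend(path + [n])
        yield from extend([x])

    def blocked(dag, a, b, c, z):
        """Is the connection a-b-c blocked by the evidence set z?"""
        if b in dag[a] or b in dag[c]:      # serial or diverging at b
            return b in z
        # converging at b: blocked unless b or one of its descendants is in z
        return b not in z and not (descendants(dag, b) & z)

    def d_separated(dag, x, y, z):
        """x and y are d-separated given z iff every chain is blocked somewhere."""
        return all(any(blocked(dag, p[i], p[i + 1], p[i + 2], z)
                       for i in range(len(p) - 2))
                   for p in chains(dag, x, y))

    # icy roads: Icy -> Watson, Icy -> Holmes
    dag = {'Icy': [], 'Watson': ['Icy'], 'Holmes': ['Icy']}
    print(d_separated(dag, 'Watson', 'Holmes', {'Icy'}))   # True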

January 17 (Tue), 2006 HW1 assigned, due Tuesday, January 24. Probability. The three axioms of Kolmogorov [1950]. Outcome space, event space, probability measure. Three historically important interpretations in which the axioms of Kolmogorov are true (i.e., three models of them): classical, frequentist, and subjective. Detailed presentation of the classical approach, including proof of the three properties that are the axioms of Kolmogorov. Brief presentation of the frequentist approach. Introduction to the subjective approach.
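
For reference, the three axioms over an outcome space \(\Omega\) (finite-additivity form, as used in class):

    \begin{align*}
    \text{(K1)} &\quad P(A) \ge 0 \text{ for every event } A \subseteq \Omega\\
    \text{(K2)} &\quad P(\Omega) = 1\\
    \text{(K3)} &\quad P(A \cup B) = P(A) + P(B) \text{ whenever } A \cap B = \emptyset
    \end{align*}

In the classical model, P(A) = |A| / |\Omega| over equiprobable outcomes, and all three properties follow by counting.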

January 19 (Thu), 2006 Subjective probability defined from betting behavior: "[De Finetti] defines the probability P(E) of an event E as the fraction of a whole unit value which one would feel is the fair amount to exchange for the promise that one would receive a whole unit of value if E turns out to be true and zero units if E turns out to be false" [Neapolitan, 1990, p.56]. Similarly, subjective probability could be defined using the equivalent urn model. The probability P(E) of an event E is the fraction of red balls in an urn containing red and brown balls such that one would feel indifferent between the statement "E will occur" and "a red ball would be extracted from the urn." (I believe that this definition is due to D.V. Lindley.) Kolmogorov's axioms and the "definition" of conditional probability can be derived from the definition of subjective probability and the assumption of coherence in (betting) behavior, as explained on pp.56-58 of the handout. "Definition" of conditional probability derived in the classical approach (Theorem 2.2), with the assumption that equiprobable outcomes remain equiprobable in a new state of knowledge. Definition of conditional probability as an additional axiom of probability, which is true in the three major models of probability (classical, frequentist, subjective).
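
The "definition" in question, in the usual notation (with the fundamental rule as its product form):

    \[
    P(A \mid B) \;=\; \frac{P(A \cap B)}{P(B)} \quad (P(B) > 0),
    \qquad\text{equivalently}\qquad
    P(A \cap B) \;=\; P(A \mid B)\, P(B).
    \]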

January 24 (Tue), 2006 HW1 collected. Display system does not work. Handout: Section 1.5 of Korb and Nicholson. Bayes rule, likelihood, prior probability, posterior probability, conditionalization used to update beliefs. Where do the numbers come from? Soft evidence and Jeffrey's rule.
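
A small Python sketch of both updates; distributions are plain dicts, and the flu/fever numbers are made up for illustration:

    def bayes(prior, likelihood):
        """Conditionalization: P(h|e) proportional to P(e|h) P(h)."""
        joint = {h: prior[h] * likelihood[h] for h in prior}
        z = sum(joint.values())               # P(e), the normalizing constant
        return {h: p / z for h, p in joint.items()}

    def jeffrey(cond, soft):
        """Jeffrey's rule: P_new(h) = sum_e q(e) P(h|e), for soft evidence q on a partition."""
        hyps = next(iter(cond.values()))
        return {h: sum(soft[e] * cond[e][h] for e in soft) for h in hyps}

    prior = {'flu': 0.1, 'noflu': 0.9}
    likelihood = {'flu': 0.9, 'noflu': 0.2}          # P(fever | h)
    print(bayes(prior, likelihood))                  # {'flu': 1/3, 'noflu': 2/3}

    cond = {'fever': {'flu': 1/3, 'noflu': 2/3},     # P(h | e)
            'nofever': {'flu': 0.02, 'noflu': 0.98}}
    print(jeffrey(cond, {'fever': 0.7, 'nofever': 0.3}))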

January 26 (Thu), 2006 HW2 assigned: exercises 1.6 through 1.10 [J01], due Thursday, February 2. Handout: pages 472-475 of Russell and Norvig's _Artificial Intelligence: A Modern Approach, 2nd edition_. Utility, expected utility: introduction. Dutch books. Examples of Bayesian reasoning: breast cancer and "people vs. Collins," from the Korb and Nicholson handout. Probability tables.
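
A one-line sketch of expected utility; the umbrella numbers are made up, not from the handout:

    def expected_utility(p, u):
        """EU(a) = sum over states s of P(s) * U(a, s)."""
        return sum(p[s] * u[s] for s in p)

    p = {'rain': 0.3, 'dry': 0.7}
    print(expected_utility(p, {'rain': 60, 'dry': 80}))    # take umbrella: 74.0
    print(expected_utility(p, {'rain': 0, 'dry': 100}))    # leave it: 70.0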

January 31 (Tue), 2006 HW1 returned. Solution discussed in class. HW2 due date changed to Tuesday, February 7. Probability tables. Conditional and unconditional independence. Richard Neapolitan's definition of Bayesian network (on web slides).

February 2 (Thu), 2006 Detailed computation of prior and posterior probabilities in the icy road example: exploiting conditional independence arising from qualitative considerations, using Bayes' rule and the fundamental rule, table pointwise multiplication and division, marginalization. Start of similar treatment of the wet grass example.
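
The same computation in Python, using the numbers usually quoted for icy roads (P(I=y) = 0.7, P(crash | icy) = 0.8, P(crash | not icy) = 0.1 for both drivers); substitute the class's tables if they differ:

    p_i = {'y': 0.7, 'n': 0.3}                        # P(Icy)
    p_w = {'y': {'y': 0.8, 'n': 0.2},                 # P(Watson | Icy), outer key = Icy
           'n': {'y': 0.1, 'n': 0.9}}
    p_h = p_w                                         # Holmes has the same table

    # pointwise multiplication: P(Icy, Watson=y), then marginalize and divide
    joint = {i: p_i[i] * p_w[i]['y'] for i in p_i}    # {'y': 0.56, 'n': 0.03}
    z = sum(joint.values())                           # P(Watson=y) = 0.59
    post_i = {i: joint[i] / z for i in joint}         # P(Icy | Watson=y) ~ (0.95, 0.05)

    # P(Holmes=y | Watson=y) = sum_i P(Holmes=y | i) P(i | Watson=y) ~ 0.764
    print(sum(p_h[i]['y'] * post_i[i] for i in post_i))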

February 7 (Tue), 2006 HW2 collected. HW3 assigned: exercises 11-15 in Chapter 1 [J01]. Note that Figure 1.3 should be used for exercise 14, as noted in the errata sheet prepared by the author. Wet grass example, with numbers (partly). Definition of Bayesian networks [Neapolitan, 1990]. The visit to Asia example. The chain rule for Bayesian networks, claimed.
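
The claim, in the notation of [J01] (pa(A) denotes the parent set of A in the dag):

    \[
    P(\mathcal{U}) \;=\; \prod_{A \in \mathcal{U}} P\bigl(A \mid \mathrm{pa}(A)\bigr)
    \]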

February 9 (Thu), 2006 HW3 due date set for Thursday, February 16. Completion of the wet grass number example, in full. Variable families, clusters, calibration. Review of dags, topological sort, partial and linear orders. Proof of the chain rule for Bayesian networks (version without explicit induction).

February 14 (Tue), 2006 HW3 (PR1) due date reset to Tuesday, February 21. See departmental dropbox. HW2 returned, exercises corrected in class. Introduction to Hugin, with a small example. Review for midterm, which will be on Thursday, February 16: midterm of 2000 (now linked to the web site).

February 16 (Thu), 2006 MT1.

February 21 (Tue), 2006 MT1 returned. Handout: Rina Dechter, "Bucket Elimination: A Unifying Framework for Probabilistic Inference," in Proceedings of the Twelfth Conference on Uncertainty in Artificial Intelligence (UAI-96), 1996, pp. 211-219. The variable elimination (Dechter's "bucket elimination" variant) algorithm for computation of posterior probabilities ("belief assessment"), with detailed example using "Visit to Asia."
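
A compact factor-based sketch of the algorithm (domains in a global dict DOM; a factor is a (variable list, table keyed by value tuples) pair; names and the two-node example are illustrative, not the Asia network):

    from itertools import product

    DOM = {'A': ['y', 'n'], 'B': ['y', 'n']}

    def multiply(f, g):
        """Pointwise product of two factors."""
        fv, ft = f
        gv, gt = g
        nv = fv + [v for v in gv if v not in fv]
        table = {}
        for vals in product(*(DOM[v] for v in nv)):
            a = dict(zip(nv, vals))
            table[vals] = (ft[tuple(a[v] for v in fv)] *
                           gt[tuple(a[v] for v in gv)])
        return nv, table

    def sum_out(f, var):
        """Marginalize var out of a factor."""
        fv, ft = f
        i = fv.index(var)
        table = {}
        for vals, p in ft.items():
            key = vals[:i] + vals[i + 1:]
            table[key] = table.get(key, 0.0) + p
        return fv[:i] + fv[i + 1:], table

    def eliminate(factors, order):
        """Bucket elimination: for each variable, multiply the factors that
        mention it and sum it out; returns the product of what remains."""
        for var in order:
            bucket = [f for f in factors if var in f[0]]
            factors = [f for f in factors if var not in f[0]]
            if bucket:
                h = bucket[0]
                for f in bucket[1:]:
                    h = multiply(h, f)
                factors.append(sum_out(h, var))
        result = factors[0]
        for f in factors[1:]:
            result = multiply(result, f)
        return result

    # P(A) and P(B|A) for a two-node net A -> B; eliminating A yields P(B)
    fa = (['A'], {('y',): 0.3, ('n',): 0.7})
    fb = (['A', 'B'], {('y', 'y'): 0.9, ('y', 'n'): 0.1,
                       ('n', 'y'): 0.2, ('n', 'n'): 0.8})
    print(eliminate([fa, fb], ['A']))    # P(B) = (0.41, 0.59)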

February 23 (Thu), 2006 Bucket elimination for the MPE. MAP restated. Bertelè and Brioschi's Non-Serial Dynamic Programming: basic definitions and example.
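
In symbols: with evidence e and the remaining variables X, bucket elimination finds the MPE by replacing the marginalization sum with a max,

    \[
    \mathrm{MPE}(e) \;=\; \arg\max_{X} \; \prod_{A \in \mathcal{U}} P\bigl(A \mid \mathrm{pa}(A)\bigr)\Big|_{e},
    \]

while MAP maximizes over only a subset of the variables, summing out the rest.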

February 28 (Tue), 2006 HW4 (PR2) assigned, due March 7: exercises 2.1-2.8 except 2.5 [J01]. Variable elimination continued. The order of elimination does not affect the result, but it does affect the complexity of computing a marginal: better elimination orders lead to intermediate (h) functions with smaller domains. Definition of interaction graph. Changes in the interaction graph depend on the order of elimination. Minimum degree heuristic. Fill-in. Minimum deficiency heuristic. Asia example. Moralization. BOBLO network: handout from [J96] and network from samples in the Hugin distribution.
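
A greedy Python sketch of both heuristics on an interaction graph (adjacency sets; the small example graph is illustrative):

    def fill_in(adj, v):
        """Edges that eliminating v would add between its neighbors."""
        nbrs = list(adj[v])
        return [(a, b) for i, a in enumerate(nbrs) for b in nbrs[i + 1:]
                if b not in adj[a]]

    def greedy_order(adj, cost):
        """Eliminate the cheapest node first, adding fill-in edges as we go."""
        adj = {v: set(ns) for v, ns in adj.items()}
        order = []
        while adj:
            v = min(adj, key=lambda u: cost(adj, u))
            for a, b in fill_in(adj, v):
                adj[a].add(b)
                adj[b].add(a)
            for n in adj[v]:
                adj[n].discard(v)
            del adj[v]
            order.append(v)
        return order

    def min_degree(adj, v):              # minimum degree heuristic
        return len(adj[v])

    def min_fill(adj, v):                # minimum deficiency (fill-in) heuristic
        return len(fill_in(adj, v))

    g = {'A': {'B', 'C'}, 'B': {'A', 'C'}, 'C': {'A', 'B', 'D'}, 'D': {'C'}}
    print(greedy_order(g, min_degree))   # e.g. ['D', 'A', 'B', 'C']
    print(greedy_order(g, min_fill))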

March 2 (Thu), 2006 PR2 is due on March 16 (Thu). Hypothesis variables. Infected milk, one-day and seven-day versions. Hidden Markov Models (HMMs). Non-HMM version of the seven-day infected milk model.
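
A minimal forward-filtering sketch for the HMM view; the infected-milk-style numbers below are made up, not the tables from [J01]:

    def forward(prior, trans, emit, obs_seq):
        """P(state_t | obs_1..t): each day, predict one step, weigh by the
        evidence, and renormalize. prior is the day-0 belief."""
        belief = dict(prior)
        for o in obs_seq:
            pred = {s2: sum(belief[s1] * trans[s1][s2] for s1 in belief)
                    for s2 in belief}
            weighted = {s: pred[s] * emit[s][o] for s in pred}
            z = sum(weighted.values())
            belief = {s: p / z for s, p in weighted.items()}
        return belief

    # states: cow infected (y/n); observations: daily milk test (pos/neg)
    prior = {'y': 0.001, 'n': 0.999}
    trans = {'y': {'y': 0.7, 'n': 0.3}, 'n': {'y': 0.001, 'n': 0.999}}
    emit = {'y': {'pos': 0.99, 'neg': 0.01}, 'n': {'pos': 0.01, 'neg': 0.99}}
    print(forward(prior, trans, emit, ['pos', 'pos', 'neg']))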

March 14 (Tue), 2006 Cold, hay fever and sneezing. The intervention-based notion of causality, again. The stratum method for constructing Bayesian networks (from independence information derived from qualitative descriptions). Cold and angina.

March 16 (Thu), 2006 PR2 returned. More examples of the stratum method: telescopes, bell and coins. Reconstruction of CPTs for bell and coins.

March 21 (Tue), 2006 PR3 assigned: exercises 2.9 and 2.12 [J01]. More on bell and coins. Poker game example, also as an influence diagram. Monty Hall puzzle solved with BNs (started).

March 23 (Thu), 2006 Monty Hall completed. Several other examples in Hugin, including stud farm and symbol transmission.
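
The finished puzzle can be checked by brute-force enumeration of the joint distribution the BN encodes (a sketch, with doors numbered 1-3):

    from itertools import product

    doors = [1, 2, 3]
    joint = {}
    for prize, choice in product(doors, doors):    # both uniform, independent
        options = [d for d in doors if d not in (prize, choice)]
        for opens in options:                      # Monty picks uniformly among legal doors
            joint[(prize, choice, opens)] = (1 / 3) * (1 / 3) / len(options)

    # condition on FirstChoice = 1, MontyOpens = 3
    ev = {k: p for k, p in joint.items() if k[1] == 1 and k[2] == 3}
    z = sum(ev.values())
    for prize in doors:
        p = sum(q for k, q in ev.items() if k[0] == prize) / z
        print(f"P(Prize={prize} | chose 1, opened 3) = {p:.3f}")
    # -> 0.333, 0.667, 0.000: switching doubles the chance of winning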

March 28 (Tue), 2006 HW3 (note that old HW3 and HW4 were renamed PR1 and PR2, respectively; note that in the departmental dropbox, this is HW2!!!) assigned: exercise 2.17 [J01]. Family out example. Noisy-or. Representing relations. Socks example.
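
A noisy-or CPT in a few lines, following the usual parameterization (each active cause independently fails to produce the effect with its inhibitor probability); the fever numbers are illustrative:

    def noisy_or(inhibitor, active):
        """P(effect = yes | active causes) = 1 - product of the inhibitors."""
        q = 1.0
        for cause in active:
            q *= inhibitor[cause]
        return 1.0 - q

    inhibitor = {'flu': 0.1, 'cold': 0.4}        # 1 - P(fever | that cause alone)
    print(noisy_or(inhibitor, ['flu']))          # 0.9
    print(noisy_or(inhibitor, ['flu', 'cold']))  # 1 - 0.1 * 0.4 = 0.96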

March 30 (Thu), 2006 Divorcing (2.3.3), with loan example; noisy functional dependence (2.3.4), with fever example; time-stamped models (2.3.5), with infected milk example.

April 4 (Tue), 2006 Expert disagreement (2.3.6). Special features (2.4): joint probability tables (two methods), MPE, data conflict, sensitivity analysis, with examples.
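
For the record, the data-conflict measure on evidence e = e_1, ..., e_m (as I have it from [J01]; check against section 2.4):

    \[
    \mathrm{conf}(e) \;=\; \log_2 \frac{P(e_1) \cdots P(e_m)}{P(e)},
    \]

with positive values suggesting that the findings do not fit the model well together.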

April 6 (Thu), 2006 PR5 assigned, due April 18, 2006: exercises 2.8, 2.10 (i,ii), 2.11, 2.12, 2.13, 2.15, 2.18 [J01], and exercise based on section 2.3.6 of [J01]: compute P(S) for the BN whose structure is given in Figure 2.36, with n = 5, P(S_i|S_{i-1}) (i=2..5) as given in Table 2.17, P(A) given on p.66, P(D|B,C,S) given in Table 2.15, P(S_1) = (0.25, 0.25, 0.5), P(B_i) = (0.6, 0.4), P(C_i=y|A_i) = (0.9, 0.1), and the five cases (A,B,C,D) = {(n,n,n,y), (y,n,n,y), (n,y,n,n), (n,y,n,y), (n,y,y,n)}. Second midterm will be on Tuesday, April 11. Introduction to learning, adaptation, and tuning (beginning of ch.3 [J01]). Distance measure for probability distributions (3.1): the logarithmic scoring rule and cross-entropy (KL divergence); the quadratic (or Brier) scoring rule.
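
Both distance notions in a few lines of Python (distributions as dicts; the example numbers are arbitrary):

    from math import log2

    def kl(p, q):
        """Cross-entropy distance (KL divergence) from p to q, in bits."""
        return sum(p[x] * log2(p[x] / q[x]) for x in p if p[x] > 0)

    def brier(p, actual):
        """Quadratic (Brier) score of forecast p against the realized outcome."""
        return sum((p[x] - (1.0 if x == actual else 0.0)) ** 2 for x in p)

    p = {'y': 0.8, 'n': 0.2}
    q = {'y': 0.6, 'n': 0.4}
    print(kl(p, q))        # ~0.132 bits; 0 iff p == q
    print(brier(q, 'y'))   # (0.6 - 1)^2 + 0.4^2 = 0.32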

April 11 (Tue), 2006 Value of information (section 4.3 [J01]), with examples.
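
A myopic value-of-information sketch in Python: the expected value of observing a test T is E_T[max_a EU(a | T)] - max_a EU(a). All numbers below are made up:

    def max_eu(p, utilities):
        """Best expected utility over actions; utilities[a][s] = U(a, s)."""
        return max(sum(p[s] * u[s] for s in p) for u in utilities.values())

    def voi(p, utilities, test):
        """Myopic EVI of a test; test[s][o] = P(o | s)."""
        outcomes = next(iter(test.values()))
        evi = 0.0
        for o in outcomes:
            po = sum(p[s] * test[s][o] for s in p)           # P(T = o)
            post = {s: p[s] * test[s][o] / po for s in p}    # P(s | T = o)
            evi += po * max_eu(post, utilities)
        return evi - max_eu(p, utilities)

    p = {'sick': 0.1, 'well': 0.9}
    utilities = {'treat': {'sick': 90, 'well': 60},
                 'skip': {'sick': 0, 'well': 100}}
    test = {'sick': {'pos': 0.9, 'neg': 0.1},
            'well': {'pos': 0.2, 'neg': 0.8}}
    print(voi(p, utilities, test))    # ~0.9: observing the test is worth at most 0.9 utiles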

April 13 (Thu), 2006 PR6: exercise 4.5 (Hugin), due Monday, April 24. Sections 4.1, 4.2, 4.3 (non-utility-based value functions), non-myopic VOI, with the two-test infected milk example, shown as an abbreviated decision tree.

April 18 (Tue), 2006 Guest lecturer: Yimin Huang. Batch Learning: 3.2.1, 3.2.2.

April 20 (Thu), 2006 Guest lecturer: Yimin Huang. Adaptation: 3.3.1, 3.3.2, 3.3.5.