Lugano, Tuesday 27 July
Speaker: Scott Ferson
These five approaches redress, or comprehensively solve, several major deficiencies
of Monte Carlo simulations and of standard probability theory in risk assessments.
For instance, it is almost always difficult, if not impossible, to completely
characterize precise distributions of all the variables in a risk assessment,
or the multivariate dependencies among the variables. As a result, in the
practical situations where empirical data are limiting, analysts are often
forced to make assumptions that can result in assessments that are arbitrarily
over-specified and therefore misleading. In practice, the assumptions typically
made in these situations, such as independence, (log)normality of distributions,
and linear relationships, can understate the true uncertainty in the results.
More fundamentally, it can be argued that probability theory has an inadequate
model of ignorance because it uses equiprobability as a model for incertitude
and thus cannot distinguish uniform risk from pure lack of knowledge. In most
practical risk assessments, some uncertainty is epistemic rather than aleatory,
that is, it is incertitude rather than variability. For example, uncertainty
about the shape of a probability distribution and most other instances of
model uncertainty are typically epistemic. Treating incertitude as though
it were variability is even worse than overspecification because it confounds
epistemic and aleatory uncertainty and leads to risk conclusions that are
simply wrong. The five approaches based on interval and imprecise probabilities
allow an analyst to keep these kinds of uncertainty separate and treat them
differently as necessary to maintain the interpretation of risk as the frequency
of adverse outcomes.
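As a minimal illustration of the distinction, and not part of the tutorial
materials, the Python sketch below propagates two quantities known only up to
bounds, once with interval arithmetic (incertitude kept as incertitude) and
once by Monte Carlo under the common "uniform for unknown" assumption
(incertitude recast as variability). The bounds and the threshold are invented
numbers.

    # Illustrative sketch (all numbers hypothetical): adding two quantities that
    # are known only to lie in given intervals, i.e. the uncertainty is epistemic.
    import random

    a_lo, a_hi = 1.0, 3.0   # all we know about a: 1 <= a <= 3
    b_lo, b_hi = 2.0, 5.0   # all we know about b: 2 <= b <= 5

    # Interval (incertitude-preserving) treatment: the sum is only known to lie
    # in [a_lo + b_lo, a_hi + b_hi]; no probabilities inside the interval are claimed.
    print("interval result:", (a_lo + b_lo, a_hi + b_hi))   # (3.0, 8.0)

    # Monte Carlo treatment under the "uniform for unknown" assumption:
    # incertitude is recast as variability, yielding one precise distribution.
    random.seed(0)
    samples = [random.uniform(a_lo, a_hi) + random.uniform(b_lo, b_hi)
               for _ in range(100_000)]
    frac = sum(s < 4.0 for s in samples) / len(samples)
    print("P(sum < 4) under the uniform assumption: about", round(frac, 3))
    # Interval analysis would only conclude that P(sum < 4) lies somewhere in
    # [0, 1]; the Monte Carlo answer looks precise, but its precision comes
    # entirely from the equiprobability assumption.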
The five approaches also make backcalculations possible and practicable in
risk assessments. Backcalculation is required to compute cleanup goals, remediation
targets and performance standards from available knowledge and constraints
about uncertain variables. The needed calculations are notoriously difficult
with standard probabilistic methods and cannot be done at all with straightforward
Monte Carlo simulation, except by approximate, trial-and-error strategies.
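To make the last point concrete, here is a hypothetical Python sketch of the
trial-and-error strategy alluded to above: a forward Monte Carlo calculation
wrapped in a bisection search for a cleanup level whose 95th-percentile dose
meets a target. The dose model, the intake distribution, and the limit are all
invented for illustration.

    # Hypothetical trial-and-error backcalculation with Monte Carlo. Goal: find
    # the largest cleanup concentration C such that the 95th percentile of
    # dose = C * intake stays below a regulatory limit. All numbers are made up.
    import random

    random.seed(1)
    LIMIT = 1.0                            # allowed dose at the 95th percentile
    intakes = sorted(random.lognormvariate(-3.0, 0.5) for _ in range(20_000))

    def p95_dose(conc):
        """Forward Monte Carlo step: 95th-percentile dose for a trial cleanup level."""
        return conc * intakes[int(0.95 * len(intakes))]

    # Trial and error (here, bisection) over the cleanup level; every trial just
    # re-runs the forward calculation instead of solving the problem directly.
    lo, hi = 0.0, 1000.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if p95_dose(mid) <= LIMIT:
            lo = mid
        else:
            hi = mid
    print("approximate cleanup goal:", round(lo, 3))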
Although the five approaches arose from distinct scholarly traditions and
have many important differences, the tutorial emphasizes that they share a
commonality of purpose and employ many of the same ideas and methods. They
can be viewed as complementary, and they constitute a single perspective on
risk analysis that is sharply different from both traditional worst-case and
standard probabilistic approaches. Each approach is illustrated with a numerical
case study and summarized by a checklist of reasons to use, and not to use,
the approach.
The presentation style will be casual and interactive. Participants will receive
a CD of some demonstration software and the illustrations used during the
tutorial.
Overview of topics
What's missing from Monte Carlo?
Correlations are special cases of dependencies
Probability theory has an inadequate model of ignorance
Model uncertainty is epistemic rather than aleatory in nature
Backcalculation cannot be done with Monte Carlo methods
Interval probability
Conjunction and disjunction (ANDs and ORs)
Fréchet case (no assumption about dependence; see the sketch after this overview)
Mathematical programming solution
Case study 1: fault-tree for a pressurized tank system
Why and why not use interval probability
Robust Bayes and Bayesian sensitivity analysis
Bayes' rule and the joy of conjugate pairs
Dogma of Ideal Precision
Classes of priors and classes of likelihoods
Robustness and escaping subjectivity
Case study 2: extinction risk and conservation of pinnipeds
Why and why not use robust Bayes
Dempster-Shafer theory
Indistinguishability in evidence
Belief and plausibility
Convolution via the Cartesian product
Case study 3: reliability of dike construction
Case study 4: human health risk from ingesting PCB-contaminated waterfowl
Why and why not use Dempster-Shafer theory
Probability bounds analysis
Marrying interval analysis and probability theory
Fréchet case in convolutions
Case study 5: environmental exposure of wild mink to mercury contamination and
of birds to an agricultural insecticide
Backcalculation
Case study 6: planning cleanup for selenium contamination in San Francisco Bay
Why and why not use probability bounds analysis
Imprecise probabilities
Comparative probabilities
Closed convex sets of probability distributions
Multifurcation of the concept of independence
Case study 7: medical diagnosis
Why and why not use imprecise probabilities
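As a concrete taste of the Fréchet case mentioned in the outline above, here
is a minimal Python sketch (with arbitrary numbers, not part of the tutorial
materials): when nothing is assumed about the dependence between two events,
their conjunction and disjunction can still be bounded, and with interval-valued
inputs the bounds are attained at the interval endpoints.

    # Fréchet bounds for AND/OR of two events when nothing is assumed about
    # their dependence. For precise p = P(A), q = P(B):
    #   max(0, p + q - 1) <= P(A and B) <= min(p, q)
    #   max(p, q)         <= P(A or B)  <= min(1, p + q)
    # Both bounds are nondecreasing in p and q, so for interval-valued inputs
    # they are evaluated at the interval endpoints.

    def and_frechet(p, q):
        """p, q are (low, high) probability intervals; bounds on P(A and B)."""
        return (max(0.0, p[0] + q[0] - 1.0), min(p[1], q[1]))

    def or_frechet(p, q):
        """p, q are (low, high) probability intervals; bounds on P(A or B)."""
        return (max(p[0], q[0]), min(1.0, p[1] + q[1]))

    p = (0.2, 0.3)   # arbitrary example: P(A) known only to lie in [0.2, 0.3]
    q = (0.1, 0.4)   # arbitrary example: P(B) known only to lie in [0.1, 0.4]
    print("P(A and B) in", and_frechet(p, q))   # (0.0, 0.3)
    print("P(A or B)  in", or_frechet(p, q))    # (0.2, 0.7)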
***
Slides of Scott's talk and exercises are available.
Wednesday 28 July
Speaker: Gert
In the afternoon session, we move on to the notion of conditioning, and shed
more light on fundamental results such as the Generalised Bayes Rule
and the Marginal Extension Theorem. These lead to techniques, based
on the rationality criterion of coherence, that allow us to construct a conditional
model from an unconditional one, and to combine conditional and marginal models.
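For reference, one standard statement of the Generalised Bayes Rule (following
Walley's formulation; the lectures may state it in a different but equivalent
form) is: for a gamble f and a conditioning event B with \underline{P}(B) > 0,
the conditional lower prevision \underline{P}(f|B) is the unique solution x of

    \underline{P}\bigl( I_B \, (f - x) \bigr) = 0,

where I_B denotes the indicator of B. When \underline{P} is a precise (linear)
prevision, this reduces to ordinary Bayesian conditioning.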
The classroom exercises are intended to help the students become more familiar
with the notions discussed in the theory part.
***
Slides of Gert's talk and exercises are available.
Thursday 29 July
Speaker: Teddy
We begin with a discussion of canonical expected utility theory and of topics
related to static (non-sequential) decisions. These include criteria relating
to coherence, avoiding sure loss – sometimes called “Book” – and admissibility.
We will also consider criteria relating to ordering assumptions, and review
results that do not require an Archimedean (or continuity) condition.
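To fix ideas about avoiding sure loss, consider a made-up illustration (not
taken from the lecture): announcing a buying price of 0.6 both for a bet on an
event and for a bet on its complement commits the buyer to paying 1.2 for
gambles that jointly pay exactly 1 in every state. The small Python check below
confirms the guaranteed loss.

    # Hypothetical check for sure loss ("Book") arising from announced betting rates.
    # Two states of the world: event A occurs, or it does not.
    states = ["A", "not A"]
    gambles = {                                   # payoff of each gamble per state
        "bet on A":     {"A": 1.0, "not A": 0.0},
        "bet on not A": {"A": 0.0, "not A": 1.0},
    }
    prices = {"bet on A": 0.6, "bet on not A": 0.6}   # buying prices (made up)

    # Net gain in each state from buying one unit of every gamble at those prices.
    net = {s: sum(gambles[g][s] - prices[g] for g in gambles) for s in states}
    print(net)                                    # {'A': -0.2, 'not A': -0.2}
    if max(net.values()) < 0:
        print("sure loss: the buyer loses money in every state of the world")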
Following that, the discussion will focus on criteria that bear on sequential
decision theory, including the equivalence of normal- and extensive-form decisions,
and various notions of “dynamic” coherence.
Next, we will examine what becomes of these same criteria with various decision
rules that apply when either probability or utility is allowed to go indeterminate.
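As a small self-contained sketch of one such rule (with invented numbers, not
the lecturer's material): when the credal set is represented by finitely many
probability vectors, an act is E-admissible if it maximizes expected utility
under at least one of them.

    # Hypothetical example of E-admissibility when probability is indeterminate.
    # The credal set is approximated by a finite set of probability vectors over
    # two states; in general an act may be optimal only for a non-extreme member
    # of the convex set, so checking only these points is a simplification.
    credal_set = [(0.3, 0.7), (0.6, 0.4)]         # made-up extreme points
    acts = {                                      # utility of each act per state
        "a1": (10.0, 0.0),
        "a2": (0.0, 8.0),
        "a3": (4.0, 4.0),
    }

    def expected_utility(p, u):
        return sum(pi * ui for pi, ui in zip(p, u))

    # An act is E-admissible if it maximizes expected utility for at least one
    # probability distribution in the credal set.
    e_admissible = {max(acts, key=lambda a: expected_utility(p, acts[a]))
                    for p in credal_set}
    print("E-admissible acts:", sorted(e_admissible))   # ['a1', 'a2']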
The class will include some practice with tools for elicitation and for sequential
decision analysis.
***
Slides of Teddy's talk are available.
Friday 30 July
Speakers: Serafin and Marco
Bayesian networks are models to represent complex and uncertain relationships
between a large number of variables. They are based on an explicit representation
of independence relationships by means of a graph, and procedures to exploit
the factorization associated to independence in order to produce fast inferences.
They have been very successful in real applications, but one of their main
drawbacks is that it is very often necessary to give precise estimates of a
large number of probability values, sometimes from very small sample sizes.
Credal networks try to avoid this difficulty by allowing the use of imprecise
probabilities. We will review the work that has been done on credal networks,
focusing on two main points: inference (much more difficult than with precise
probabilities) and learning from data (in general, probabilistic procedures
are applied to learn the structure, and very few genuine methods based on
imprecision have been proposed).
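As a toy illustration of why inference is harder in credal networks (a
hypothetical sketch with made-up intervals, not taken from the lectures): even
in a two-node network A -> B with interval-valued tables, bounds on P(B) call
for optimizing over the allowed table entries; since P(B) is multilinear in
those entries, its extremes over the box of allowed values occur at the
vertices, which the sketch simply enumerates.

    # Two-node credal network A -> B with interval-valued probability tables.
    # All interval endpoints are made-up numbers.
    from itertools import product

    P_A       = (0.2, 0.4)    # P(A = true) is only known to lie in this interval
    P_B_given = {             # P(B = true | A) for A = true / false
        True:  (0.7, 0.9),
        False: (0.1, 0.3),
    }

    lo, hi = 1.0, 0.0
    # P(B) = P(B|A) P(A) + P(B|not A) (1 - P(A)) is multilinear in the table
    # entries, so its extremes over the box of allowed values occur at vertices.
    for pa, pb_a, pb_na in product(P_A, P_B_given[True], P_B_given[False]):
        pb = pb_a * pa + pb_na * (1.0 - pa)
        lo, hi = min(lo, pb), max(hi, pb)
    print("P(B = true) lies in [%.3f, %.3f]" % (lo, hi))   # [0.220, 0.540]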
More generally speaking, we will show that the task called knowledge discovery
from data sets can benefit from adopting imprecise probability methods.
Knowledge discovery typically assumes that data are the only source of information
about a domain, and aims at inferring models that make domain knowledge explicit.
Learning from data thus starts in conditions of prior ignorance; moreover,
the data are often available only in incomplete form, such as when values are
missing from the data set, which involves another form of ignorance, this time
about the data themselves. Where pattern classification is concerned, the
inferred models are used in practice for medical diagnosis, fraud detection,
or image recognition, to name just a few applications. Modeling ignorance
carefully is central to making these models and applications reliable, and is
closely related to the possibility of stating, and working with, weak assumptions.
Initially we will show how imprecise probability allows incomplete data to be
dealt with reliably, in a way that departs significantly from established
approaches. Missing data are a serious problem in knowledge discovery
applications and can severely limit the credibility of the inferred models.
Imprecise probability makes robust modeling of missing data possible by making
it unnecessary to assume anything about the mechanism that turns complete data
into incomplete data. The issues of learning from, and classifying, incomplete
data will be treated in a unified framework by a generalized updating rule.
This naturally produces generalized classifiers, called credal classifiers,
with the novel characteristic of being able to (partially) suspend judgment
when there are reasonable doubts about the correct classification. Credal
classifiers will also be shown to deal carefully with the problem of prior
ignorance, by relying on the imprecise Dirichlet model.
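For orientation, here is a minimal sketch of the imprecise Dirichlet model in
its usual presentation (Walley's IDM; the lectures may use a particular
variant): after observing counts n_1, ..., n_k over N trials, the chance of
category j is bounded by [n_j / (N + s), (n_j + s) / (N + s)], where s > 0 is
a fixed hyperparameter (s = 1 and s = 2 are common choices). The counts below
are invented.

    # Imprecise Dirichlet model (IDM): probability intervals from counts.
    # Hypothetical counts for three categories; s is the IDM hyperparameter.

    def idm_intervals(counts, s=2.0):
        """Lower/upper probabilities for each category under the IDM."""
        n = sum(counts)
        return [(c / (n + s), (c + s) / (n + s)) for c in counts]

    counts = [8, 3, 1]                    # made-up observed frequencies, N = 12
    for cat, (low, high) in enumerate(idm_intervals(counts)):
        print("category %d: [%.3f, %.3f]" % (cat, low, high))
    # With no data at all (all counts zero), every interval is [0, 1]: the model
    # starts from prior ignorance and tightens as evidence accumulates.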
Finally, we will focus on the practical design of credal classifiers. We will
consider the naive Bayes, TAN, and C4.5 classifiers.
Naive Bayes and TAN are special cases of Bayesian networks, while C4.5 is a
classification tree. These are traditional classifiers, which are very popular
and widely recognized as good performers in the knowledge discovery community.
We will review the extension of these models to credal classification, showing
how to infer them from data and how to carry out the classification. Real case
studies will be presented to show the impact of credal classification.
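To give the flavour of credal classification, here is a hypothetical sketch
based on interval dominance: each class receives a probability interval (for
instance from the IDM above), a class is discarded only when some other class's
lower probability exceeds its upper probability, and the prediction is the set
of surviving classes. The credal classifiers discussed in the tutorial, such as
the naive credal classifier, use related but more refined dominance criteria.

    # Hypothetical credal classification by interval dominance.
    # Each class has a posterior probability interval (numbers are made up).
    intervals = {
        "healthy": (0.45, 0.80),
        "disease": (0.15, 0.50),
        "other":   (0.01, 0.10),
    }

    def interval_dominance(intervals):
        """Keep class c unless some other class d has lower(d) > upper(c)."""
        survivors = []
        for c, (_, c_high) in intervals.items():
            dominated = any(d_low > c_high
                            for d, (d_low, _) in intervals.items() if d != c)
            if not dominated:
                survivors.append(c)
        return survivors

    print("predicted class set:", interval_dominance(intervals))
    # -> ['healthy', 'disease']: judgment is suspended between these two classes,
    #    while 'other' is ruled out because 0.15 > 0.10.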
***
Slides of Serafin's talk and exercises are available.
Slides of Marco's talk and exercises are available.
Saturday 31 July
Speaker: Thomas
***
Slides of Thomas' talk and summary lecture are available.
Typical schedule
08:30-10:30 Talk
10:30-11:00 Coffee break
11:00-13:00 Exercises
13:00-14:30 Lunch
14:30-16:30 Talk
16:30-17:00 Coffee break
17:00-19:00 Exercises