Hidden Markov Models (HMMs) are one of the most powerful statistical
tools developed in the last twenty years to model
sequences of data such as time series, speech signals or biological sequences.
One of their distinctive features lies in the fact that they can handle
sequences of varying lengths, through the use of an internal state variable.
HMMs are usually used in a generative framework in which one constructs
a different HMM for each class of data to be modeled. For instance in speech,
this could be one HMM per phoneme; in biological sequence analysis, it could be
one HMM per sequence family. Each trained HMM can then be used to compute the
likelihood of a given unknown sequence. When one has to decide the class of
an unknown sequence, one usually uses either the Maximum Likelihood (ML)
criterion, which assigns the sequence to the class whose HMM gives the
highest likelihood over the sequence, or more generally
the Maximum A Posteriori (MAP) criterion, which weights the likelihood
of the sequence given the HMM by the a priori probability of the HMM.
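As a minimal sketch of this decision rule, the toy example below scores a sequence under one hand-set discrete HMM per class (all parameters are hypothetical; in practice each HMM would be trained, e.g. with Baum-Welch) and picks the class that maximizes the log-posterior:

```python
import numpy as np

def log_likelihood(obs, pi, A, B):
    """Forward algorithm in log space: returns log p(obs | HMM)."""
    alpha = np.log(pi) + np.log(B[:, obs[0]])          # initial step
    for o in obs[1:]:
        alpha = (np.logaddexp.reduce(alpha[:, None] + np.log(A), axis=0)
                 + np.log(B[:, o]))                    # recursion
    return np.logaddexp.reduce(alpha)                  # termination

# Shared toy transition structure; only the emissions differ per class
# (hypothetical numbers, not trained values).
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
B = {0: np.array([[0.9, 0.1],    # class 0: both states favour symbol 0
                  [0.8, 0.2]]),
     1: np.array([[0.2, 0.8],    # class 1: both states favour symbol 1
                  [0.1, 0.9]])}

seq = [1, 1, 0, 1, 1]
log_prior = {0: np.log(0.5), 1: np.log(0.5)}  # equal priors: MAP == ML
scores = {c: log_likelihood(seq, pi, A, B[c]) + log_prior[c] for c in (0, 1)}
pred = max(scores, key=scores.get)
print(pred)  # 1: the sequence is dominated by symbol 1
```

With equal priors the MAP rule reduces to ML; unequal priors simply shift each score by its log-prior.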
Unfortunately, it is well known that for classification problems, a
better solution should in theory be to use a discriminant framework. In
that case, instead of constructing a model independently for each class,
one constructs a unique model that decides where the frontiers between
the classes lie.
There have been many developments in discriminant approaches
for classification, one of the most recent being the use of
support vector machines (SVMs), which
compute a linear combination of the most important examples (the support
vectors) in a high-dimensional space (the kernel space).
This model has nice theoretical properties
but, unfortunately, cannot be used easily for problems expressed in terms
of variable size sequences (such as temporal patterns) or any kind of higher
order structures (such as trees).
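For concreteness, an SVM decision function has the form f(x) = sign(sum_i alpha_i y_i K(x_i, x) + b): a weighted combination of kernel evaluations against the support vectors. The sketch below uses made-up support vectors and dual coefficients with an RBF kernel (a real SVM would obtain them by solving the dual optimization problem), and illustrates the fixed-size-input assumption that makes direct use on variable-length sequences difficult:

```python
import numpy as np

def rbf(a, b, gamma=0.5):
    """RBF (Gaussian) kernel between two fixed-size vectors."""
    return np.exp(-gamma * np.sum((a - b) ** 2))

# Hypothetical support vectors, labels and dual coefficients (alpha_i).
support_vectors = np.array([[0.0, 0.0], [2.0, 2.0]])
labels = np.array([-1.0, 1.0])
alphas = np.array([1.0, 1.0])
bias = 0.0

def decide(x):
    """f(x) = sign(sum_i alpha_i * y_i * K(x_i, x) + b)."""
    s = sum(a * y * rbf(sv, x)
            for a, y, sv in zip(alphas, labels, support_vectors))
    return np.sign(s + bias)

print(decide(np.array([1.8, 1.9])))   # 1.0: near the positive support vector
print(decide(np.array([0.1, 0.1])))   # -1.0: near the negative one
```

Note that every input must be a vector of the same fixed dimension, which is exactly what a variable-length sequence fails to provide.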
However, a series of recent papers has suggested techniques
that could be used to combine generative models such as
HMMs (to handle the sequential aspects) with discriminant
models such as SVMs (to provide the discriminant decision). In
one of these papers [1], the authors show how to compute a
fixed size vector (named the Fisher score) containing a
sufficient statistic of a given sequence for an already trained
HMM, and use this score to train an SVM to decide in a
discriminant way which class the sequence belongs to. This
method has recently been applied successfully to biological
sequence classification. Besides its practical
advantages, such an approach has also been proved to perform
at least as well as the generative MAP criterion.
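The Fisher score of a sequence is the gradient of its log-likelihood with respect to the model parameters, so every sequence, whatever its length, maps to a vector of fixed dimension. The sketch below illustrates this on a toy first-order Markov chain rather than a full HMM, computing the score by finite differences and ignoring the row-sum constraint on the parameters for simplicity; Jaakkola and Haussler derive the score analytically for HMMs:

```python
import numpy as np

def seq_loglik(theta, seq):
    """Log-likelihood of a binary sequence under a 2-state Markov chain
    whose flattened transition matrix is theta."""
    A = theta.reshape(2, 2)
    return sum(np.log(A[a, b]) for a, b in zip(seq[:-1], seq[1:]))

def fisher_score(theta, seq, eps=1e-6):
    """Finite-difference gradient of log p(seq | theta): the Fisher score."""
    g = np.zeros_like(theta)
    for i in range(theta.size):
        up, down = theta.copy(), theta.copy()
        up[i] += eps
        down[i] -= eps
        g[i] = (seq_loglik(up, seq) - seq_loglik(down, seq)) / (2 * eps)
    return g

theta = np.array([0.7, 0.3, 0.4, 0.6])   # toy "trained" parameters
x1, x2 = [0, 0, 1, 0, 0], [1, 1, 0, 1, 1]
u1, u2 = fisher_score(theta, x1), fisher_score(theta, x2)
# Each sequence now maps to a fixed-size vector regardless of its length.
print(u1.shape, u2.shape)  # (4,) (4,)
```

Stacking such score vectors gives a fixed-size representation on which a standard SVM can then be trained discriminatively.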
The purpose of the present project is thus to study, experiment
with (on different kinds of sequential data), enhance, and adapt
these new approaches to integrating discriminant models such
as SVMs into generative models for sequence processing such as
HMMs. While the focus will be on the generic properties of these new
models, they will also be tested on various sequence processing
tasks of interest to IDIAP, such as speaker verification and
speech recognition, allowing the comparison with state-of-the-art
systems (as available at IDIAP).
In speaker verification, one has to validate, on the basis
of a speech signal, the claimed identity of a
speaker. This is clearly a classification task (in fact, a
hypothesis test) where a system has to discriminate between
clients and impostors. In speech recognition, one has to
translate a speech signal into a sequence of words, each word
being represented by a sequence of phonemes. Each phoneme is
usually modeled using an HMM and again the methods discussed
above could be used to discriminate between the different
phoneme models. Of course, in this case, these methods will have
to be adapted to handle the problem of signal segmentation.
Keywords: learning algorithms, hidden Markov models,
kernel methods, support vector machines, sequence processing,
speech processing, speaker verification, speech recognition.
[1] T. Jaakkola and D. Haussler, Exploiting generative models in
discriminative classifiers, in Advances in Neural Information Processing
Systems 11: Proceedings of the 1998 Conference, M. Kearns, S. Solla, and
D. Cohn, eds., MIT Press, 1999, pp. 487-493.