Link to the paper: http://homepages.inf.ed.ac.uk/srenals/plda-spl2014.pdf
PLDA is formulated as a generative model, in which an acoustic feature vector \boldsymbol{y}_t from the j-th HMM state at time index t can be expressed as
\boldsymbol{y}_t | j, m = \boldsymbol{U}_m \boldsymbol{x}_{jmt} + \boldsymbol{G}_m \boldsymbol{z}_{jm} + \boldsymbol{b}_m + \epsilon_{mt},
where m is the Gaussian component index of the GMM for state j.
\boldsymbol{z}_{jm} is the component-dependent latent variable, shared by the whole set of acoustic feature frames generated by the j-th state's m-th Gaussian.
\boldsymbol{x}_{jmt} is the channel variable which explains the per-frame variations.
In their work, the prior distributions of \boldsymbol{z}_{jm} and \boldsymbol{x}_{jmt} are assumed to be \mathcal{N}(\boldsymbol{0}, \boldsymbol{I}).
\boldsymbol{b}_m denotes the bias.
\epsilon_{mt} is the residual noise, which is Gaussian with zero mean and diagonal covariance, i.e. \epsilon_{mt} \sim \mathcal{N}(\boldsymbol{0}, \boldsymbol{\lambda}_m).
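To make the generative story concrete, here is a minimal sketch that samples frames for a single (state j, component m) pair, following the equation above. It is an illustration only, not the paper's implementation: the helper names (matvec, std_normal, sample_frames), the toy dimensions, and all parameter values are made up, and plain Python lists are used to keep it dependency-free.

```python
import random


def matvec(A, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def std_normal(dim, rng):
    """Draw a dim-dimensional vector from N(0, I)."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def sample_frames(U, G, b, noise_var, num_frames, rng):
    """Sample frames y_t for one (state j, component m) pair.

    U: d x q_x loading matrix for the channel variable (U_m)
    G: d x q_z loading matrix for the component variable (G_m)
    b: length-d bias vector (b_m)
    noise_var: length-d diagonal of the residual noise covariance
    """
    # z_jm is drawn once and shared by every frame of this component.
    z = std_normal(len(G[0]), rng)
    Gz = matvec(G, z)
    frames = []
    for _ in range(num_frames):
        # x_jmt is redrawn for every frame t (per-frame variation).
        x = std_normal(len(U[0]), rng)
        Ux = matvec(U, x)
        # Diagonal covariance: each dimension gets independent noise.
        eps = [rng.gauss(0.0, var ** 0.5) for var in noise_var]
        frames.append([ux + gz + bi + e
                       for ux, gz, bi, e in zip(Ux, Gz, b, eps)])
    return z, frames


# Toy, made-up parameters: 3-dim features, 2-dim latent variables.
rng = random.Random(0)
d, q_x, q_z = 3, 2, 2
U = [[rng.gauss(0.0, 1.0) for _ in range(q_x)] for _ in range(d)]
G = [[rng.gauss(0.0, 1.0) for _ in range(q_z)] for _ in range(d)]
b = [0.5, -1.0, 2.0]
noise_var = [0.1, 0.2, 0.3]

z, frames = sample_frames(U, G, b, noise_var, 5, rng)
for y in frames:
    print([round(v, 3) for v in y])
```

Because all five frames share the same \boldsymbol{z}_{jm}, they scatter around the common point \boldsymbol{G}_m \boldsymbol{z}_{jm} + \boldsymbol{b}_m, while the per-frame \boldsymbol{x}_{jmt} and the residual noise account for the spread around it.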