Thursday, May 22, 2014

[paper] Probabilistic Linear Discriminant Analysis for Acoustic Modelling

Link to the paper: http://homepages.inf.ed.ac.uk/srenals/plda-spl2014.pdf

PLDA is formulated as a generative model in which an acoustic feature vector $\boldsymbol{y}_t$ from the $j$-th HMM state at time index $t$ is expressed as

$\boldsymbol{y}_t | j, m = \boldsymbol{U}_m \boldsymbol{x}_{jmt} + \boldsymbol{G}_m \boldsymbol{z}_{jm} + \boldsymbol{b}_m + \epsilon_{mt}$,

where $m$ is the Gaussian component index of the GMM for state $j$.

$\boldsymbol{z}_{jm}$ is the component-dependent variable, shared by all acoustic feature frames generated by the $m$-th Gaussian of the $j$-th state.

$\boldsymbol{x}_{jmt}$ is the channel variable, which explains the per-frame variation.

In their work, the prior distributions of $\boldsymbol{z}_{jm}$ and $\boldsymbol{x}_{jmt}$ are assumed to be $\mathcal{N}(\boldsymbol{0}, \boldsymbol{I})$.
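Because both priors are standard normal and the model is linear, the latent variables can be integrated out in closed form. The short derivation below is a sketch under the assumption, not stated explicitly in this note, that $\boldsymbol{x}_{jmt}$, $\boldsymbol{z}_{jm}$ and $\epsilon_{mt}$ are mutually independent. Integrating out only the per-frame variable gives

$\boldsymbol{y}_t | \boldsymbol{z}_{jm}, j, m \sim \mathcal{N}(\boldsymbol{G}_m \boldsymbol{z}_{jm} + \boldsymbol{b}_m, \; \boldsymbol{U}_m \boldsymbol{U}_m^\top + \mathrm{Cov}(\epsilon_{mt}))$,

and integrating out $\boldsymbol{z}_{jm}$ as well gives, for a single frame,

$\boldsymbol{y}_t | j, m \sim \mathcal{N}(\boldsymbol{b}_m, \; \boldsymbol{U}_m \boldsymbol{U}_m^\top + \boldsymbol{G}_m \boldsymbol{G}_m^\top + \mathrm{Cov}(\epsilon_{mt}))$.

The second expression holds per frame only: frames generated by the same component share $\boldsymbol{z}_{jm}$, so they are not independent once it is integrated out.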

$\boldsymbol{b}_m$ denotes the bias.

$\epsilon_{mt}$ is the residual noise, which is Gaussian with zero mean and diagonal covariance, i.e. $\epsilon_{mt} \sim \mathcal{N}(\boldsymbol{0}, \boldsymbol{\Lambda}_m)$.
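To make the generative story concrete, here is a minimal NumPy sketch that samples frames for a single (state, component) pair. It is an illustration under stated assumptions, not the paper's implementation: the function name `sample_plda_frames`, the dimensions, and the randomly drawn parameter values are all hypothetical, and the latent variables and noise are assumed to be independent of each other.

```python
import numpy as np


def sample_plda_frames(U, G, b, noise_var, num_frames, rng):
    """Draw frames for one (state j, component m) pair from the PLDA model.

    U: (d, p) matrix multiplying the per-frame variable x_{jmt}
    G: (d, q) matrix multiplying the shared variable z_{jm}
    b: (d,) bias vector
    noise_var: (d,) diagonal of the residual noise covariance
    """
    d, p = U.shape
    q = G.shape[1]
    # z_{jm} ~ N(0, I): drawn once and shared by every frame.
    z = rng.standard_normal(q)
    # x_{jmt} ~ N(0, I): drawn independently for each frame t.
    x = rng.standard_normal((num_frames, p))
    # Residual noise with zero mean and diagonal covariance.
    eps = rng.standard_normal((num_frames, d)) * np.sqrt(noise_var)
    # y_t = U x_{jmt} + G z_{jm} + b + eps, one row per frame.
    y = x @ U.T + G @ z + b + eps
    return y, z


rng = np.random.default_rng(0)
d, p, q = 5, 2, 3  # hypothetical feature and latent dimensions
U = rng.standard_normal((d, p))
G = rng.standard_normal((d, q))
b = rng.standard_normal(d)
noise_var = np.full(d, 0.1)

y, z = sample_plda_frames(U, G, b, noise_var, num_frames=20000, rng=rng)
```

Because `z` is fixed within the component, the sample mean of `y` should sit near `G @ z + b`, and the sample covariance near `U @ U.T + np.diag(noise_var)`; the `G` term only contributes to the covariance when frames from many different draws of `z` are pooled.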