Link to the paper: http://homepages.inf.ed.ac.uk/srenals/plda-spl2014.pdf
PLDA is formulated as a generative model, in which an acoustic feature vector $\boldsymbol{y}_t$ from the $j$-th HMM state at time index $t$ is expressed as
$\boldsymbol{y}_t | j, m = \boldsymbol{U}_m \boldsymbol{x}_{jmt} + \boldsymbol{G}_m \boldsymbol{z}_{jm} + \boldsymbol{b}_m + \epsilon_{mt}$,
where $m$ is the Gaussian component index of the GMM for state $j$.
$\boldsymbol{z}_{jm}$ is the component-dependent variable, shared by all acoustic feature frames generated by the $m$-th Gaussian of the $j$-th state.
$\boldsymbol{x}_{jmt}$ is the channel variable, which explains the per-frame variations.
In their work, the prior distributions of $\boldsymbol{z}_{jm}$ and $\boldsymbol{x}_{jmt}$ are assumed to be $\mathcal{N}(\boldsymbol{0}, \boldsymbol{I})$.
$\boldsymbol{b}_m$ denotes the bias.
$\epsilon_{mt}$ is the residual noise, which is Gaussian with zero mean and diagonal covariance, i.e. $\epsilon_{mt} \sim \mathcal{N}(\boldsymbol{0}, \boldsymbol{\lambda})$.
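To make the roles of the two latent variables concrete, here is a minimal NumPy sketch that samples frames from the generative model for a single state and component pair $(j, m)$. It is only an illustration under stated assumptions, not the paper's implementation: the function name `sample_plda_frames`, the dimensions, and the noise variance are all made up for the example. The point to notice is that $\boldsymbol{z}_{jm}$ is drawn once and reused for every frame, while $\boldsymbol{x}_{jmt}$ and the noise are drawn afresh per frame.

```python
import numpy as np


def sample_plda_frames(U, G, b, noise_var, num_frames, rng):
    """Sample frames y_t for one (state j, component m) pair.

    U: (d, p) loading matrix for the per-frame variable x_{jmt}
    G: (d, q) loading matrix for the shared variable z_{jm}
    b: (d,) bias vector
    noise_var: (d,) diagonal of the residual noise covariance
    All names and shapes here are illustrative assumptions.
    """
    d, p = U.shape
    q = G.shape[1]
    # z_{jm} ~ N(0, I): drawn once, shared by every frame of this component.
    z = rng.standard_normal(q)
    # x_{jmt} ~ N(0, I): one independent draw per frame.
    x = rng.standard_normal((num_frames, p))
    # Residual noise with zero mean and diagonal covariance.
    eps = rng.standard_normal((num_frames, d)) * np.sqrt(noise_var)
    # y_t = U x_{jmt} + G z_{jm} + b + eps, stacked row-wise over frames.
    y = x @ U.T + G @ z + b + eps
    return y, z


rng = np.random.default_rng(0)
d, p, q = 5, 3, 2  # hypothetical feature and latent dimensions
U = rng.standard_normal((d, p))
G = rng.standard_normal((d, q))
b = rng.standard_normal(d)
noise_var = np.full(d, 0.1)  # hypothetical diagonal noise variances

y, z = sample_plda_frames(U, G, b, noise_var, num_frames=1000, rng=rng)
print(y.shape)  # (1000, 5)
```

Because $\boldsymbol{z}_{jm}$ is fixed across the sampled frames, their empirical mean should sit near $\boldsymbol{G}_m \boldsymbol{z}_{jm} + \boldsymbol{b}_m$, and the spread around that mean comes only from the $\boldsymbol{U}_m \boldsymbol{x}_{jmt}$ term and the noise.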