1). Lexicon preparation.
According to the language specification, there are 25 consonants and 45 vowels in total (12 monophthongs, 25 diphthongs and 8 triphthongs). The total number of phonemes is 54, which may be too many for a "limited" language. In the Kaldi setup, both the diphthongs and the triphthongs are mapped back to monophthongs with the following option:
phoneme_mapping="i@U=i @ U;oaI=o a I;oaI:=o a I:;u@I=u @ I;uI@= u I @;1@I=1 @ I;1@U=1 @ U;
a:I=a: I; a:U=a: U; aU=a U; @U=@ U; aI=a I; @I=@ I; EU=E U; eU=e U; i@=i @; iU=i U; Oa:=O a: ; Oa=O a;
OE=O E; OI=O I; oI=o I; @:I=@: I; u@=u @; 1@=1 @; ue=u e; uI=u I; 1I=1 I; u@:=u @:; 1U=1 U; ui:=u i:"
Through this processing, the number of individual phonemes is reduced to the number of consonants and monophthongs.
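To make the effect of the option concrete, here is a minimal Python sketch of how such a mapping string could be parsed and applied to a pronunciation. It is not the code Kaldi uses; the function names `parse_phoneme_mapping` and `apply_phoneme_mapping` are hypothetical, and the only assumption about the format is what the option above shows: entries separated by `;`, each of the form `source=target target ...`.

```python
def parse_phoneme_mapping(spec):
    """Parse 'src=t1 t2;src2=t3 t4' into {src: [t1, t2], ...}.

    Hypothetical helper: tolerates the stray spaces and line breaks
    that appear around ';' and '=' in the option string.
    """
    mapping = {}
    for entry in spec.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # skip empty pieces, e.g. after a trailing ';'
        source, targets = entry.split("=", 1)
        mapping[source.strip()] = targets.split()
    return mapping


def apply_phoneme_mapping(phonemes, mapping):
    """Replace every mapped phoneme by its sequence of monophthongs."""
    result = []
    for phoneme in phonemes:
        # Unmapped phonemes (consonants, monophthongs) pass through unchanged.
        result.extend(mapping.get(phoneme, [phoneme]))
    return result


# A few entries copied from the option above.
mapping = parse_phoneme_mapping("i@U=i @ U;uI@= u I @;aI=a I; Oa:=O a: ")
print(apply_phoneme_mapping(["w", "aI"], mapping))  # ['w', 'a', 'I']
```

After this step only consonants and monophthongs remain in the lexicon, which is what shrinks the phoneme inventory.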
2). Tone information.
Although in theory tones apply only to vowels, in the Kaldi setup the tone is applied to all the phonemes of the corresponding syllable. One possible explanation is that the tone carried by a vowel may also affect the realization of the neighboring consonants through co-articulation.
One original lexicon item:
Amway a: m _1 . w aI _1
The corresponding Kaldi item:
Amway a:_1 m_1 w_1 a_1 I_1
The period in the pronunciation indicates the syllable boundary. With this tonal information, the number of phonemes increases roughly sixfold, as there are 6 different tones.
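The conversion from the original item to the Kaldi item can be sketched in a few lines of Python. This is an illustration only, not the actual Kaldi script. It assumes that the tone marker is the last token of each syllable and starts with `_`, that `.` separates syllables, and that diphthongs are split with a mapping such as `aI=a I`. The function name `spread_tones` is hypothetical.

```python
def spread_tones(pronunciation, mapping=None):
    """Attach each syllable's tone marker to every phoneme in that syllable.

    Assumptions (not taken from Kaldi's code): syllables are separated
    by '.', and the tone marker, if present, is the last token of the
    syllable and starts with '_'.
    """
    mapping = mapping or {}
    result = []
    for syllable in pronunciation.split("."):
        tokens = syllable.split()
        if not tokens:
            continue
        tone = ""
        if tokens[-1].startswith("_"):
            tone = tokens.pop()  # e.g. '_1'
        for phoneme in tokens:
            # Split diphthongs/triphthongs first, then tag each piece.
            for unit in mapping.get(phoneme, [phoneme]):
                result.append(unit + tone)
    return " ".join(result)


print(spread_tones("a: m _1 . w aI _1", {"aI": ["a", "I"]}))
# a:_1 m_1 w_1 a_1 I_1
```

The output reproduces the Kaldi item for "Amway" shown above: the consonants `m` and `w` carry the same tone tag as the vowels of their syllables.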
3). Position dependent phonemes.
The phonemes used in Kaldi are further distinguished by their position in the word. Four position markers are used: (B)egin, (E)nd, (I)nternal and (S)ingleton. In this setup, even SIL is given the following variants:
SIL SIL_B SIL_E SIL_I SIL_S
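As a rough illustration of how the four markers are assigned, the Python sketch below tags the phonemes of a single word. The function name `add_word_positions` is hypothetical, and appending the position marker after the tone tag is an assumption made for the example rather than something stated above.

```python
def add_word_positions(phonemes):
    """Append _B, _I, _E or _S to each phoneme of one word.

    A one-phoneme word gets _S (singleton); otherwise the first
    phoneme gets _B, the last gets _E and the rest get _I.
    """
    if len(phonemes) == 1:
        return [phonemes[0] + "_S"]
    last = len(phonemes) - 1
    marked = []
    for index, phoneme in enumerate(phonemes):
        if index == 0:
            suffix = "_B"
        elif index == last:
            suffix = "_E"
        else:
            suffix = "_I"
        marked.append(phoneme + suffix)
    return marked


print(add_word_positions(["a:_1", "m_1", "w_1", "a_1", "I_1"]))
# ['a:_1_B', 'm_1_I', 'w_1_I', 'a_1_I', 'I_1_E']
print(add_word_positions(["SIL"]))
# ['SIL_S']
```

Each tonal phoneme can therefore appear in up to four position-dependent variants, which multiplies the size of the phone set once more.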
4). Features.
In the Kaldi setup, PLP features are used together with pitch features and/or FFV (fundamental frequency variation) features.