Tuesday, January 14, 2014

Dryrun

To get started with this year's OpenKWS, I set up the submission dry run first, before having any system at all.

Detailed instructions can be found at http://www.nist.gov/itl/iad/mig/openkws14dryrun.cfm

These are the steps done so far:

1. Vietnamese data download, provided by Prof.
2. IndusDB - not available
3. SCTK installed
4. JobRunner extracted
5. F4DE: so much stuff in there. KWSEval alone is probably enough, but just in case, install everything!
         make
         sudo apt-get install gnuplot libxml2 sqlite3
         make perl_install
6. Account application - needs PI


Some notes from the evaluation plan (http://www.nist.gov/itl/iad/mig/upload/KWS14-evalplan-v11.pdf) to keep in mind:

1. The KWS task is to find all occurrences of a keyword (a sequence of one or more words) in a corpus of unsegmented speech data.

2. the lexicon provided in the "build pack" for training contains entries for both the training and development test data. The lexical items that exist only in the development test data must be excluded during model training. 

3. A keyword, a sequence of contiguous lexical items, will be specified in the language's native orthographic representation, encoded in UTF-8.

4. Homographs, words with the same written form but different meanings, will not be differentiated. Morphological variations of a keyword will not be considered positive matches.

5. transcript comparisons will be case insensitive

6. The silence gap between adjacent words in a keyword must be <= 0.5 seconds.
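
To make notes 3 to 6 concrete for myself, here is a minimal Python sketch of what counts as a keyword occurrence in a time-aligned transcript. This is only my reading of the rules, not the official scorer (that is KWSEval in F4DE). The function name `find_keyword` and the `(token, start, end)` tuple layout are my own assumptions, not a NIST format.

```python
MAX_GAP_SECONDS = 0.5  # note 6: maximum silence between adjacent keyword words


def find_keyword(words, keyword, max_gap=MAX_GAP_SECONDS):
    """Return (start, end) times for every occurrence of `keyword`.

    `words` is a list of (token, start_time, end_time) tuples in time order.
    This layout is an assumption made for illustration only.
    """
    target = [w.casefold() for w in keyword.split()]
    n = len(target)
    hits = []
    if n == 0:
        return hits
    for i in range(len(words) - n + 1):
        window = words[i:i + n]
        # Notes 4 and 5: compare exact written forms, ignoring case.
        # Morphological variants are different strings, so they do not match.
        if [tok.casefold() for tok, _, _ in window] != target:
            continue
        # Note 6: each gap between adjacent words must be <= max_gap.
        gaps_ok = all(
            nxt[1] - cur[2] <= max_gap
            for cur, nxt in zip(window, window[1:])
        )
        if gaps_ok:
            hits.append((window[0][1], window[-1][2]))
    return hits


words = [
    ("Hello", 0.0, 0.4), ("world", 0.6, 1.0),   # gap 0.2 s: counts
    ("hello", 2.0, 2.3), ("world", 3.0, 3.4),   # gap 0.7 s: too long
]
print(find_keyword(words, "hello world"))
```

Only the first "hello world" is returned, because the second pair is separated by more than 0.5 seconds of silence.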