Reference guide

This is your starting point for programming in or with the toolkit. Class reference can be found in the menu above.

I use the excellent KDevelop IDE to edit and build my source code, but if you want to build the source code manually, this is how KDevelop does it. Start in the main shout directory and type:

  • gmake -f Makefile.cvs
  • mkdir optimized
  • cd optimized
  • CXXFLAGS="-O3 -funroll-loops -march=pentium4 -malign-double -mfpmath=sse -msse -msse2" ../configure
  • gmake -j1

The building blocks of Shout

This software package contains multiple applications; a short description of each is given below. If you just want to use the toolkit and do not intend to develop with it yourself, it is better to read the user manual.


shout

The heart of the software package, the decoder, is called shout. This is where all the models come together. During the early days of development the decoder was called 'whisper', but unfortunately another decoder with that name already existed. I have changed the decoder name, but the main class that handles top-level recognition is still called Whisper. Whisper loads all needed models; after that, most of the work is done by the LexicalTree class.


shout_adapt_am

This application reads an acoustic model file and a training/adaptation phone directory and creates a new acoustic model file that is adapted to the training data using the Structured Maximum a Posteriori Linear Regression (SMAPLR) method. The main class of shout_adapt_am is Adapt_AM.
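SMAPLR belongs to the MLLR family of adaptation methods, whose shared core operation is an affine transform of the Gaussian means. The sketch below illustrates only that operation, not Shout's Adapt_AM implementation: the transform matrix A, the bias b, the regression-class tree and SMAPLR's prior are all assumed to be estimated elsewhere.

```cpp
#include <array>
#include <cassert>

// Hypothetical illustration of the operation shared by MLLR-family
// adaptation methods such as SMAPLR: each Gaussian mean mu is replaced by
// mu' = A*mu + b, an affine transform estimated from adaptation data.
// Estimating A and b (and SMAPLR's hierarchical prior) is not shown.
using Vec3 = std::array<double, 3>;
using Mat3 = std::array<Vec3, 3>;

Vec3 adaptMean(const Mat3& A, const Vec3& b, const Vec3& mu) {
    Vec3 out{};
    for (int i = 0; i < 3; ++i) {
        out[i] = b[i];
        for (int j = 0; j < 3; ++j) out[i] += A[i][j] * mu[j];
    }
    return out;
}
```

With an identity matrix and a pure bias, the transform reduces to shifting every mean by b; in practice one transform is shared by all Gaussians in a regression class.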


shout_cluster

This is the speaker diarization application. See Shout_Cluster.


shout_dct2lextree

This application translates a dictionary file (a text file containing rows of: word, then tabs or spaces, then the phone pronunciation) into a lexical prefix tree, also called a Pronunciation Prefix Tree (PPT), suitable for the decoder to read. The main class of shout_dct2lextree is Shout_dct2lextree.
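To illustrate why a prefix tree is attractive here: words that share the first phones of their pronunciation share tree nodes, so the decoder evaluates the shared prefix only once. The sketch below is a toy in-memory PPT, not Shout's on-disk format or the Shout_dct2lextree class.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Toy pronunciation prefix tree (PPT): each edge is labelled with a phone,
// and words whose pronunciations share a prefix share the prefix's nodes.
struct PptNode {
    std::map<std::string, std::unique_ptr<PptNode>> children;  // phone -> subtree
    std::vector<std::string> words;  // words whose pronunciation ends here
};

void addPronunciation(PptNode& root, const std::string& word,
                      const std::vector<std::string>& phones) {
    PptNode* node = &root;
    for (const auto& p : phones) {
        auto& child = node->children[p];
        if (!child) child = std::make_unique<PptNode>();
        node = child.get();
    }
    node->words.push_back(word);
}

int countNodes(const PptNode& n) {
    int total = 1;  // count this node
    for (const auto& kv : n.children) total += countNodes(*kv.second);
    return total;
}
```

For example, "an" (ae n) and "and" (ae n d) share the "ae"/"n" nodes, so the tree holds four nodes instead of five.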


shout_lm2bin

Shout can handle uni-, bi-, tri- and four-gram ARPA language models (depending on how the distribution is compiled). The application shout_lm2bin reads an ARPA LM and translates it into a binary format suitable for the decoder. The main class of shout_lm2bin is Shout_lm2bin.
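For reference, an ARPA LM is a plain-text format whose \data\ header lists the number of n-grams per order. The sketch below parses only that header; it is an illustration of the ARPA side, not the shout_lm2bin implementation, and the binary output format is not shown.

```cpp
#include <cassert>
#include <cstdio>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Read the "\data\" header of an ARPA language model and return the
// n-gram count per order (lines such as "ngram 1=4981"). The n-gram
// sections themselves are not parsed here.
std::map<int, long> readArpaCounts(std::istream& in) {
    std::map<int, long> counts;
    std::string line;
    while (std::getline(in, line) && line != "\\data\\") {}  // skip preamble
    while (std::getline(in, line)) {
        int order = 0;
        long n = 0;
        if (std::sscanf(line.c_str(), "ngram %d=%ld", &order, &n) == 2)
            counts[order] = n;
        else if (!line.empty())
            break;  // first n-gram section ("\1-grams:") ends the header
    }
    return counts;
}
```

The header tells a converter exactly how much storage to allocate per order before reading the probability tables, which is one reason a one-pass binary conversion is straightforward.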


shout_maketrainset

Shout_maketrainset reads a hypothesis file (the output of Shout in native format) or a Master Label File (MLF) and stores all phones in the training directory. This data directory is later used by the training application (shout_train_master). The main class of shout_maketrainset is Shout_MakeTrainSet. It is possible to create a training directory for ASR models, SAD models, diarization models or VTLN models.
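As an illustration of the MLF input side: an HTK-style Master Label File lists, per utterance, one label per line between a '"*/name.lab"' header and a '.' terminator. The sketch below reads one utterance's labels; it is not the Shout_MakeTrainSet implementation, and the phone directory it writes is not shown.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Read the labels of the first utterance in an HTK-style MLF. Each label
// line may carry optional "start end" times; we keep only the last field,
// which is the label itself.
std::vector<std::string> readMlfLabels(std::istream& in) {
    std::vector<std::string> labels;
    std::string line;
    std::getline(in, line);  // "#!MLF!#" magic line
    std::getline(in, line);  // '"*/utt.lab"' utterance header
    while (std::getline(in, line) && line != ".") {
        std::istringstream fields(line);
        std::string tok, last;
        while (fields >> tok) last = tok;
        labels.push_back(last);
    }
    return labels;
}
```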


shout_merge_am

If you do not want to re-train all phones, this application lets you choose phone models from two AM files and store them in a new AM file. See ShoutMergeAm.


shout_segment

This is the speech/non-speech detector. The main class is ShoutSegment.
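For illustration only: the simplest possible speech/non-speech segmentation thresholds per-frame energies and merges consecutive speech frames into segments. Shout's ShoutSegment is model-based, so nothing below reflects its actual algorithm; it only shows the kind of output such a detector produces.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Toy energy-based segmentation: return (start, end) frame ranges
// (end exclusive) where the frame energy exceeds the threshold.
std::vector<std::pair<std::size_t, std::size_t>> energySegments(
        const std::vector<double>& energy, double threshold) {
    std::vector<std::pair<std::size_t, std::size_t>> segments;
    std::size_t start = 0;
    bool inSpeech = false;
    for (std::size_t i = 0; i < energy.size(); ++i) {
        if (energy[i] > threshold && !inSpeech) {
            start = i;
            inSpeech = true;
        } else if (energy[i] <= threshold && inSpeech) {
            segments.emplace_back(start, i);
            inSpeech = false;
        }
    }
    if (inSpeech) segments.emplace_back(start, energy.size());
    return segments;
}
```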


shout_vtln

This application determines the VTLN warping factor. The main class of shout_vtln is Shout_VTLN.
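For context, a common way to apply a VTLN warping factor alpha is a piecewise-linear warp of the frequency axis. The sketch below applies a given alpha to a normalised frequency; estimating alpha, which is what shout_vtln actually does, is not shown, and the breakpoint f0 = 0.875 is an assumed value, not one taken from Shout.

```cpp
#include <cassert>
#include <cmath>

// Piecewise-linear VTLN warp of a normalised frequency f in [0, 1]:
// scale by alpha up to the breakpoint f0, then continue linearly so that
// the warp still maps 1.0 to 1.0 (keeping the full band covered).
double warpFrequency(double f, double alpha, double f0 = 0.875) {
    if (f <= f0) return alpha * f;
    double slope = (1.0 - alpha * f0) / (1.0 - f0);
    return alpha * f0 + slope * (f - f0);
}
```

Factors alpha > 1 compress the spectrum (typically for shorter vocal tracts) and alpha < 1 stretch it; the second linear piece keeps the warped axis ending at 1.0 either way.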