PhoneModel Class Reference

The PhoneModel class handles likelihood calculation of phones given an observation sequence.

Public Member Functions

 PhoneModel (int dim=ASR_DEFAULT_VECTORSIZE)
 PhoneModel (FILE *inFile, int dim=ASR_DEFAULT_VECTORSIZE, bool onlyUseFastP=false)
 PhoneModel (MixGaussian *mix, double toNext)
 ~PhoneModel ()
int dim () const
bool isSilModel () const
void setSilTrans (double tr)
int getStateNr (int contextKey, int state)
ModelStats * getStatistics (void)
int getNumberOfGaussians ()
virtual int touchPDF (int contextKey, int t, MixGaussian **updateThese, double **resultHere)
virtual void processVector (int contextKey, Vector *v, int t, int index, TokenType **token, int tLength, TokenType *inToken, DecoderSettings *settings, float *bestL)
virtual void getOutput (int contextKey, TokenType **token, int tLength, TokenType **outToken, DecoderSettings *settings, float *bestL)
virtual double getLookaheadLogP (double *vectorList, int timeStamp, bool doSecondHalf)
void copyGaussians (MixGaussian *destMixGaussian, int maxNmbr)
void resetPhoneAdmin ()
void mapAdaptMeans ()
void adapt_setInitialNode (Adapt_AM_TreeNode *node)
void adapt_setNode ()
void adapt_addAcumulatorData (int state, int contextKey, Vector *observation, double probability=1.0)
void adapt_setHelperMatrices ()
void adapt_clear ()
void adapt_adapt ()
void adapt_setVarTrans ()
void adapt_adaptVar ()
void adapt_unAdapt ()
void adapt_setAcumulators (int useLabel, int useSegmentation, FeaturePool *usePool)
void writeAccumulators (FILE *file, FILE *fileF=NULL, FILE *fileST=NULL, bool doBinary=false)
void addAccumulators (FILE *file)
void writeModel (FILE *outFile)
double getLogPDFProbability (int contextKey, Vector *v)
double getPDFProbability (int contextKey, Vector *v, int stateNr, int time)
double getTransition (int contextKey, bool toSelf, int stateNr)
void getSilSumHist (Vector **histogram)
void printModel (FILE *fileMean, FILE *fileVariance, FILE *fileWeight)
int printInfo (Vector *v)

Static Public Member Functions

static void addChain (TokenType **tokenNew, float likelihood, TokenType *token, int stateNr, int curTime, DecoderSettings *settings, float *bestL, bool checkCollission)
static void addChain_ordered (TokenType **tokenNew, float likelihood, TokenType *token, int stateNr, int curTime, DecoderSettings *settings, float *bestL)
static void initialiseToken (TokenType **token)
static PLRType * copyPhonePath (PLRType *pP)
static void initialisePhonePath (PLRType *t)

Public Attributes

MixtureSet * mixtureSetData

Static Protected Member Functions

static bool replaceTokenLM (TokenType *nt, TokenType *token, float like, float *bestL)
static void addChain_lmla (TokenType **tokenNew, float likelihood, TokenType *token, int index, DecoderSettings *settings, float *bestL)

Protected Attributes

ModelStats statistics
 Statistical information about this acoustic model.
int * timeStamp
 The model only calculates some context info once for every 'timestamp' frame.
int * stateMix_1
 An array with the index i for mixtureSetData[i] for every context (state 1).
int * stateMix_2
 An array with the index i for mixtureSetData[i] for every context (state 2).
int * stateMix_3
 An array with the index i for mixtureSetData[i] for every context (state 3).
bool isSil
int dimensions
int silRinglastPos
float * weightRinglastPos

Detailed Description

The PhoneModel class handles likelihood calculation of phones given an observation sequence.

For this implementation, the phone models are represented by Hidden Markov Models (HMM). Training the HMM is done by the TrainPhoneModel class. TrainPhoneModel stores the HMM parameters in a binary file (all models together form the binary acoustic model file) and PhoneModel can load the model into memory at startup.

PhoneModel objects are able to determine the likelihood that a phone is pronounced, given an observation sequence AND a TokenType string. PhoneModel does not store state tokens itself; it only handles the (static) parameters needed to calculate HMM likelihoods. Because the user handles the token administration (in LexicalTree), it is possible to use a phone model in more than one node without copying its parameters.

The acoustic models are context-dependent models. During training (TrainPhoneModel) it is decided which contexts share one or more states and transition probabilities. Each model contains a pool of states, stored in the variable mixtureSetData. The arrays stateMix_1, stateMix_2 and stateMix_3 contain, for each context, an index into mixtureSetData. stateMix_1 determines which state of the pool should be used for the first state of the HMM, stateMix_2 does the same for the second state and stateMix_3 for the third. The index into stateMix_x is calculated as follows:

contextKey = leftContext * numberOfPhones + rightContext

Therefore, with 54 phones in total, the first state of the context 'A' - 'l' - 's' (where this model is 'l', the left/right context is 'A'/'s', 'A' is the 2nd phone and 's' the 8th) can be found by:

First HMM-state = mixtureSetData[stateMix_1[2*54+8]]
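
The lookup above can be sketched as follows. This is a minimal, self-contained illustration and not the Shout source: MixtureSetSketch, PhoneModelSketch, makeContextKey() and firstState() are hypothetical stand-ins, and the phone count of 54 is taken from the example above.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for one pooled HMM state (MixtureSet in the real class).
struct MixtureSetSketch {
    int id;  // placeholder payload so that lookups can be checked
};

// Hypothetical, simplified view of the context tables of a phone model.
struct PhoneModelSketch {
    int numberOfPhones;                            // 54 in the example above
    std::vector<MixtureSetSketch> mixtureSetData;  // pool of states
    std::vector<int> stateMix_1;                   // per-context index for state 1
};

// contextKey = leftContext * numberOfPhones + rightContext
inline int makeContextKey(int leftContext, int rightContext, int numberOfPhones) {
    assert(leftContext >= 0 && leftContext < numberOfPhones);
    assert(rightContext >= 0 && rightContext < numberOfPhones);
    return leftContext * numberOfPhones + rightContext;
}

// First HMM state for the given left/right context.
inline const MixtureSetSketch &firstState(const PhoneModelSketch &m, int left, int right) {
    const int key = makeContextKey(left, right, m.numberOfPhones);
    return m.mixtureSetData[m.stateMix_1[key]];
}
```

With 54 phones, makeContextKey(2, 8, 54) yields 2*54+8 = 116, so firstState() reads stateMix_1[116]. The second and third states would use stateMix_2 and stateMix_3 with the same key.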


Constructor & Destructor Documentation

PhoneModel::PhoneModel ( int  dim = ASR_DEFAULT_VECTORSIZE  ) 

PhoneModel::PhoneModel ( FILE *  inFile,
int  dim = ASR_DEFAULT_VECTORSIZE,
bool  onlyUseFastP = false 
)

PhoneModel::~PhoneModel (  ) 

During destruction, the context information (stateMix_x, mixtureSetData and timeStamp) is deleted.

References mixtureSetData, ModelStats::nrOfContexts, MixtureSet::state, stateMix_1, stateMix_2, stateMix_3, statistics, timeStamp, and weightRinglastPos.


Member Function Documentation

void PhoneModel::adapt_adapt (  ) 

After all adaptation matrices are calculated, the adapt_adapt() method will adapt the PhoneModel.

References MixGaussian::adapt_adapt(), mixtureSetData, ModelStats::nrOfContexts, MixtureSet::state, and statistics.

Referenced by Adapt_AM::Adapt_AM(), ShoutPrepareAdapt::adaptingModel(), and Adapt_Segmenter::adaptModel().

void PhoneModel::adapt_adaptVar (  ) 

Todo:
Docs

References MixGaussian::adapt_adaptVar(), mixtureSetData, ModelStats::nrOfContexts, MixtureSet::state, and statistics.

Referenced by Adapt_AM::Adapt_AM().

void PhoneModel::adapt_addAcumulatorData ( int  state,
int  contextKey,
Vector *  observation,
double  probability = 1.0 
)

The LexicalTree will align acoustic data and call this method with the state- and phone context of the observed feature Vector. The Vector will be used for training the MixGaussian object that is used in the model at this state and for this context.

References mixtureSetData, MixtureSet::state, stateMix_1, stateMix_2, stateMix_3, and MixGaussian::train().

Referenced by LexicalTree::adaptAMs(), and LexicalTree::latticeBaumWelch_mmi_accumulatorsPosteriors().

void PhoneModel::adapt_clear (  ) 

This method will clear all adaptation data and make the PhoneModel ready for a new adaptation run.

References MixGaussian::adapt_clear(), mixtureSetData, ModelStats::nrOfContexts, MixtureSet::state, and statistics.

Referenced by ShoutPrepareAdapt::adaptingModel(), Adapt_Segmenter::adaptModel(), ShoutPrepareAdapt::ShoutPrepareAdapt(), SpkrecStats::SpkrecStats(), and Whisper::Whisper().

void PhoneModel::adapt_setAcumulators ( int  useLabel,
int  useSegmentation,
FeaturePool *  usePool 
)

Performs the adaptation training run. This run will set all accumulators for SIL models only!

References FeaturePool::getFirstVectorFirstSegment(), FeaturePool::getFirstVectorNextSegment(), FeaturePool::getNextVector(), FeaturePool::getSegmentID(), isSil, mixtureSetData, MixtureSet::state, stateMix_1, and MixGaussian::train().

Referenced by TrainPhoneModel::adapt_setAcTrain(), and SpkrecStats::SpkrecStats().

void PhoneModel::adapt_setHelperMatrices (  ) 

This method will call the MixGaussian::adapt_setHelperMatrices() method of the MixGaussian object of all contexts in this model.

References MixGaussian::adapt_setHelperMatrices(), mixtureSetData, ModelStats::nrOfContexts, MixtureSet::state, and statistics.

Referenced by Adapt_AM_TreeNode::setHelperMatrices_1().

void PhoneModel::adapt_setInitialNode ( Adapt_AM_TreeNode *  node  ) 

The SMAPLR adaptation starts with creating one single cluster (node) at which all gaussians in the system need to register. The method adapt_setInitialNode() will call MixGaussian::adapt_setInitialNode() on the MixGaussian object of all contexts in this model (of this PhoneModel object).

References MixGaussian::adapt_setInitialNode(), mixtureSetData, ModelStats::nrOfContexts, MixtureSet::state, and statistics.

Referenced by Adapt_AM::Adapt_AM(), ShoutPrepareAdapt::adaptingModel(), and Adapt_Segmenter::adaptModel().

void PhoneModel::adapt_setNode (  ) 

The PhoneModel has registered at least once to an adaptation node when this method is called. The first time, adapt_setInitialNode() shall be used. This method will call the MixGaussian::adapt_setNode() method of the MixGaussian object of all contexts of the model (of this PhoneModel object).

References MixGaussian::adapt_setNode(), mixtureSetData, ModelStats::nrOfContexts, MixtureSet::state, and statistics.

Referenced by Adapt_AM::Adapt_AM().

void PhoneModel::adapt_setVarTrans (  ) 

Todo:
Docs

References MixGaussian::adapt_setVarTrans(), mixtureSetData, ModelStats::nrOfContexts, MixtureSet::state, and statistics.

Referenced by Adapt_AM::Adapt_AM().

void PhoneModel::adapt_unAdapt (  ) 

After adapting the PhoneModel, the old mean vectors of all gaussians can be restored by calling adapt_unAdapt(). This cannot be done recursively: only one old Gaussian is stored, so calling adapt_adapt() twice will delete the original vectors.
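
The single-level restore described above can be illustrated with a small sketch. GaussianSketch and its members are hypothetical and only mimic the behaviour stated in the text (one backup slot per Gaussian); the real storage layout in MixGaussian is not shown in this reference.

```cpp
#include <vector>

// Hypothetical Gaussian with exactly one backup slot for its mean vector.
struct GaussianSketch {
    std::vector<double> mean;
    std::vector<double> backup;  // holds only the mean from before the last adapt()

    // Store the current mean in the single backup slot, then overwrite it.
    void adapt(const std::vector<double> &newMean) {
        backup = mean;
        mean = newMean;
    }

    // Restore the mean that was active before the most recent adapt().
    void unAdapt() { mean = backup; }
};
```

After one adapt() call, unAdapt() restores the original mean. After two consecutive adapt() calls, the backup slot holds the first adapted mean, so the original vector can no longer be recovered.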

References MixGaussian::adapt_unAdapt(), mixtureSetData, ModelStats::nrOfContexts, MixtureSet::state, and statistics.

void PhoneModel::addAccumulators ( FILE *  file  ) 

This method will read training accumulators from file.

References MixGaussian::addAccumulators(), mixtureSetData, ModelStats::nrOfContexts, MixtureSet::state, and statistics.

void PhoneModel::addChain_lmla ( TokenType **  tokenNew,
float  likelihood,
TokenType *  token,
int  index,
DecoderSettings *  settings,
float *  bestL 
) [static, protected]

addChain_lmla() does exactly the same as addChain(), only the tokens are sorted with the lookahead value of the node with 'index'. This will speed up LMLA for output tokens.

Because addChain_lmla() is only used for output tokens, which are always coming from one single node and the tokens coming from one node are always sorted, it is not needed to sort the list except for lookahead!

References addChain(), doPhoneAlignment, TokenType::likelihood, TokenType::lmLookAhead, LMLAGlobalListType::lookAhead, TokenType::lookAheadV, TokenType::next, TokenType::path, TokenType::phonePath, and DecoderSettings::prune_Lmla.

Referenced by processVector().

void PhoneModel::addChain_ordered ( TokenType **  tokenNew,
float  likelihood,
TokenType *  token,
int  stateNr,
int  curTime,
DecoderSettings *  settings,
float *  bestL 
) [static]

Because of the internal structure of Shout, it has to be possible to store multiple tokens in one state. Therefore each state has its own token list. Each token contains language model history information. Only tokens with an identical history are merged together.

The addChain() method adds a token list (an input token list or a list from another state) to an existing token list. A fixed likelihood (an input parameter) is added to the likelihood of every token in the input list. A single value suffices because the state transition and observation likelihoods calculated by processVector(), which must be added to obtain the new correct value, are the same for all tokens in the list.

The result token list is directly sorted to speed up pruning in a later stage. A static beam value is defined for the biggest difference between best likelihood and worst likelihood inside one state. If this threshold is reached, the token is not added to the list to begin with.
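
The merging behaviour described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the Shout implementation: TokenSketch and addChainSketch() are hypothetical, the token list is a std::vector rather than a linked list, the language model history is reduced to a single integer, and likelihoods are assumed to be log values where larger is better.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical, reduced token: a log-likelihood plus a stand-in for the LM history.
struct TokenSketch {
    float likelihood;
    int lmHistory;
};

// Merge `input` into `target`, adding the fixed `offset` to every input likelihood.
// `target` is kept sorted from best to worst. Tokens with an identical history are
// merged (the better one survives), and a token lying more than `stateBeam` below
// the current best token is not inserted at all.
void addChainSketch(std::vector<TokenSketch> &target,
                    const std::vector<TokenSketch> &input,
                    float offset, float stateBeam) {
    for (TokenSketch t : input) {
        t.likelihood += offset;

        // Merge with an existing token that has the same history.
        auto same = std::find_if(target.begin(), target.end(),
            [&](const TokenSketch &o) { return o.lmHistory == t.lmHistory; });
        if (same != target.end()) {
            if (same->likelihood >= t.likelihood) continue;  // existing token wins
            target.erase(same);
        }

        // State beam: skip tokens that are too far below the best token.
        if (!target.empty() && t.likelihood < target.front().likelihood - stateBeam)
            continue;

        // Sorted insert, so the best token is always first.
        auto pos = std::find_if(target.begin(), target.end(),
            [&](const TokenSketch &o) { return o.likelihood < t.likelihood; });
        target.insert(pos, t);
    }
}
```

Keeping the list sorted on insertion means the best token is always at the front, which makes the beam comparison a single subtraction per token.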

Todo:
Solve this strange bug!

References copyPhonePath(), doPhoneAlignment, TokenType::likelihood, TokenType::lmLookAhead, TokenType::lookAheadV, TokenType::next, TokenType::path, TokenType::phonePath, DecoderSettings::prune_Beam, DecoderSettings::prune_StateBeam, replaceTokenLM(), PLRType::stateOffset, PLRType::timeStamp, and tokCount.

Referenced by LexicalTree::processWord().

void PhoneModel::copyGaussians ( MixGaussian *  destMixGaussian,
int  maxNmbr 
)

The copyGaussians() method copies at most maxNmbr gaussians of each HMM state to the HMM of destMixGaussian. The gaussians that represent the HMM state the most will be used.

We use the first SIL model as context...

References MixGaussian::copyGaussians(), mixtureSetData, ModelStats::nrOfContexts, MixtureSet::state, and statistics.

PLRType * PhoneModel::copyPhonePath ( PLRType *  pP  )  [static]

int PhoneModel::dim (  )  const [inline]

double PhoneModel::getLogPDFProbability ( int  contextKey,
Vector *  v 
)

double PhoneModel::getLookaheadLogP ( double *  vectorList,
int  timeStamp,
bool  doSecondHalf 
) [virtual]

Returns the PDF probability of the first state of this phone.

References MixGaussian::getLookaheadLogP(), mixtureSetData, and MixtureSet::state.

Referenced by SpeakerRecognition::runTrials().

int PhoneModel::getNumberOfGaussians ( void   ) 

Returns the total number of gaussians in this model (sum of all clusters)

References MixGaussian::getNumberOfGaussians(), mixtureSetData, ModelStats::nrOfContexts, MixtureSet::state, and statistics.

Referenced by LexicalTree::checkAMs(), Adapt_Segmenter::proceedMerge(), and writeModel().

void PhoneModel::getOutput ( int  contextKey,
TokenType **  token,
int  tLength,
TokenType **  outToken,
DecoderSettings *  settings,
float *  bestL 
) [virtual]

With this method, the user can obtain the output token list of this phone model at this moment in time.

With processVector(), new input can be added to the model. The 'index' parameter is used when language model look-ahead is used. It will re-arrange the sorted output list according to the LMLA tables output.

References addChain(), isSil, TokenType::likelihood, mixtureSetData, ModelStats::name, silRinglastPos, stateMix_1, stateMix_3, statistics, MixtureSet::transitionP_toNext, and weightRinglastPos.

Referenced by LexicalTree::processNodeOutput(), and ArticulatoryStream::testStream().

double PhoneModel::getPDFProbability ( int  contextKey,
Vector *  v,
int  stateNr,
int  time 
)

Returns the PDF probability of one of the states of this phone.

References MixtureSet::currentVectorP, MixGaussian::getP(), mixtureSetData, MixtureSet::state, stateMix_1, stateMix_2, stateMix_3, and timeStamp.

Referenced by LexicalTree::latticeBaumWelch_setLikelihoods(), and Train_Segmenter::writePosteriors().

void PhoneModel::getSilSumHist ( Vector **  histogram  ) 

References MixGaussian::getSumHist(), mixtureSetData, MixtureSet::state, and stateMix_1.

Referenced by FeaturePool::createNewPool().

int PhoneModel::getStateNr ( int  contextKey,
int  state 
)

Returns the state number of this phone.

References stateMix_1, stateMix_2, and stateMix_3.

Referenced by Shout_Preprocess::Shout_Preprocess().

double PhoneModel::getTransition ( int  contextKey,
bool  toSelf,
int  stateNr 
)

void PhoneModel::initialiseToken ( TokenType **  token  )  [static]

This method deletes an entire token list. The token 'token' and all tokens that are next in the list are deleted. When phone alignment is activated, also the phone history paths are deleted.
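
A minimal sketch of such a list deletion is given below. TokenNode, PhonePathSketch, liveTokens and deleteTokenList() are hypothetical names; that the head pointer is reset to null afterwards and that a counter (standing in for tokCount) is decremented per token are assumptions of this sketch, not confirmed behaviour.

```cpp
// Hypothetical stand-in for a phone history path.
struct PhonePathSketch {
    int phoneID;
};

// Hypothetical, reduced token: a singly linked list node with an optional path.
struct TokenNode {
    TokenNode *next;
    PhonePathSketch *phonePath;
};

int liveTokens = 0;  // stand-in for a global token counter (assumption)

// Delete `*token` and every token that follows it in the list.
// When phone alignment is active, the phone history paths are deleted as well.
void deleteTokenList(TokenNode **token, bool doPhoneAlignment) {
    TokenNode *t = *token;
    while (t != nullptr) {
        TokenNode *next = t->next;  // remember the successor before freeing
        if (doPhoneAlignment) {
            delete t->phonePath;    // deleting a null pointer is a no-op
        }
        delete t;
        --liveTokens;
        t = next;
    }
    *token = nullptr;  // leave the caller with an empty list (assumption)
}
```

Taking a pointer to the list head lets the function leave the caller with a well-defined empty list instead of a dangling pointer.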

References doPhoneAlignment, initialisePhonePath(), TokenType::next, TokenType::phonePath, and tokCount.

Referenced by LexicalTree::createLattice(), LexicalTree::deleteNodes(), LexicalTree::initialiseNode(), LexicalTree::initialiseSystem(), LexicalTree::processNodeOutput(), processVector(), LexicalTree::processVector_processNodes(), and ArticulatoryStream::testStream().

bool PhoneModel::isSilModel (  )  const [inline]

void PhoneModel::mapAdaptMeans (  ) 

Todo:
docs

References MixGaussian::mapAdaptMeans(), mixtureSetData, ModelStats::nrOfContexts, MixtureSet::state, and statistics.

Referenced by Adapt_AM::Adapt_AM().

int PhoneModel::printInfo ( Vector *  v  ) 

References mixtureSetData, ModelStats::nrOfContexts, MixGaussian::printInfo(), MixtureSet::state, and statistics.

void PhoneModel::printModel ( FILE *  fileMean,
FILE *  fileVariance,
FILE *  fileWeight 
)

References mixtureSetData, MixGaussian::printModel(), and MixtureSet::state.

Referenced by SpeakerRecognition::SpeakerRecognition(), and SpkrecStats::SpkrecStats().

void PhoneModel::processVector ( int  contextKey,
Vector *  v,
int  time,
int  index,
TokenType **  token,
int  tLength,
TokenType *  inToken,
DecoderSettings *  settings,
float *  bestL 
) [virtual]

For each observation (Vector) of a sequence, processVector() needs to be called to update the internal structure of the phone model. The user can request the output probability token list at any time by calling getOutput().

The inToken variable may be a single input token, but it is also allowed to be an entire list of tokens, sorted on likelihood. The most likely token needs to be the first in the list.

The token pointer shall point to an array of three token lists; each list holds the tokens for one of the HMM states. The inToken list will be inserted in these lists according to the HMM likelihoods. The token variable is updated and needs to be provided by the user when the next observation Vector is processed.

The time variable is used to add timing history and to determine if preprocessing is needed (by preProcessVector()).
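
The per-observation update can be illustrated with a generic Viterbi token-passing step for a three-state left-to-right HMM. This is a textbook-style sketch, not the processVector() implementation: each state keeps only its best log-likelihood instead of a token list, and HmmSketch, stepSketch(), outputSketch() and kNoToken are hypothetical names.

```cpp
#include <algorithm>
#include <array>
#include <limits>

// Marks a state (or input) that currently holds no token.
constexpr float kNoToken = -std::numeric_limits<float>::infinity();

// Hypothetical three-state left-to-right HMM with log transition probabilities.
struct HmmSketch {
    std::array<float, 3> toSelf;
    std::array<float, 3> toNext;
};

// One update for a single observation. `state` holds the best log-likelihood per
// HMM state before the observation, `inToken` is the best incoming likelihood and
// `obsLogP` the observation log-likelihood of each state.
std::array<float, 3> stepSketch(const HmmSketch &hmm,
                                const std::array<float, 3> &state,
                                float inToken,
                                const std::array<float, 3> &obsLogP) {
    std::array<float, 3> next;
    next[0] = std::max(state[0] + hmm.toSelf[0], inToken) + obsLogP[0];
    next[1] = std::max(state[1] + hmm.toSelf[1], state[0] + hmm.toNext[0]) + obsLogP[1];
    next[2] = std::max(state[2] + hmm.toSelf[2], state[1] + hmm.toNext[1]) + obsLogP[2];
    return next;
}

// Likelihood leaving the model: the last state plus its exit transition.
float outputSketch(const HmmSketch &hmm, const std::array<float, 3> &state) {
    return state[2] + hmm.toNext[2];
}
```

The caller keeps the returned array and passes it back in for the next observation, in the same way the token array must be handed back to processVector() for every new Vector.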

References addChain(), addChain_lmla(), LMLAGlobalListType::collissionNode, LMLAGlobalListType::collissionTime, copyPhonePath(), MixtureSet::currentVectorP, doPhoneAlignment, MixGaussian::getLogP(), initialisePhonePath(), initialiseToken(), isSil, TokenType::likelihood, TokenType::lmLookAhead, TokenType::lookAheadV, mixtureSetData, ModelStats::name, TokenType::next, TokenType::path, TokenType::phonePath, DecoderSettings::prune_Beam, DecoderSettings::prune_StateBeam, silRinglastPos, MixtureSet::state, stateMix_1, stateMix_2, stateMix_3, statistics, timeStamp, tokCount, MixtureSet::transitionP_toSelf, uniqueNumber, weightRinglastPos, and DecoderSettings::weights_SilPenalty.

Referenced by LexicalTree::processNode(), and ArticulatoryStream::testStream().

bool PhoneModel::replaceTokenLM ( TokenType *  nt,
TokenType *  token,
float  like,
float *  bestL 
) [static, protected]

This is a helper function for addChain(). It checks if the language model history of the new token is identical to the history of a token from the existing list. If so, it returns true, otherwise it returns false. When the token history is identical, the function checks if the new token likelihood is better than the old. If this is the case, the old token is replaced by the new token.

Note that this method is responsible for the fact that internally, only the first-best recognition is stored.
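
The check described above can be sketched as follows. HistoryToken and replaceIfSameHistory() are hypothetical; the language model history is reduced to an integer, and updating bestL when a better token is stored is an assumption of this sketch.

```cpp
// Hypothetical, reduced token: a log-likelihood plus a stand-in for the LM history.
struct HistoryToken {
    float likelihood;
    int lmHistory;
};

// Returns true when both tokens share the same history. In that case the existing
// token is overwritten only if the candidate likelihood is better, so a single
// (first-best) hypothesis survives per history.
bool replaceIfSameHistory(HistoryToken &existing, const HistoryToken &candidate,
                          float candidateLikelihood, float &bestL) {
    if (existing.lmHistory != candidate.lmHistory) {
        return false;  // different history: the caller keeps both tokens
    }
    if (candidateLikelihood > existing.likelihood) {
        existing = candidate;
        existing.likelihood = candidateLikelihood;
        if (candidateLikelihood > bestL) {
            bestL = candidateLikelihood;  // assumption: track the overall best value
        }
    }
    return true;
}
```

Because the worse of two tokens with identical histories is discarded immediately, alternative paths that lead to the same history are not retained.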

References copyPhonePath(), doPhoneAlignment, initialisePhonePath(), TokenType::likelihood, WLRType::lmHistory, TokenType::path, and TokenType::phonePath.

Referenced by addChain_ordered().

void PhoneModel::resetPhoneAdmin (  ) 

References silRinglastPos.

Referenced by Segmenter::createLexicalTree().

void PhoneModel::setSilTrans ( double  tr  ) 

int PhoneModel::touchPDF ( int  contextKey,
int  time,
MixGaussian **  updateThese,
double **  resultHere 
) [virtual]

void PhoneModel::writeAccumulators ( FILE *  file,
FILE *  fileF = NULL,
FILE *  fileST = NULL,
bool  doBinary = false 
)

This method will write training accumulators to file.

References mixtureSetData, ModelStats::nrOfContexts, MixtureSet::state, statistics, and MixGaussian::writeAccumulators().

Referenced by SpkrecStats::SpkrecStats(), and Whisper::Whisper().


Member Data Documentation

MixtureSet* PhoneModel::mixtureSetData

Todo:
This is a quick fix to check if this can help SAT training. Will change it back to protected later!
The entire set of states and transition parameters used by this model.

Referenced by adapt_adapt(), adapt_adaptVar(), adapt_addAcumulatorData(), adapt_clear(), adapt_setAcumulators(), adapt_setHelperMatrices(), adapt_setInitialNode(), adapt_setNode(), adapt_setVarTrans(), adapt_unAdapt(), addAccumulators(), TrainPhoneModel::addCountedGaussians(), TrainPhoneModel::addGaussian(), TrainPhoneModel::appendSAT(), TrainPhoneModel::baumWelch(), copyGaussians(), TrainPhoneModel::count(), TrainPhoneModel::fillDistanceArray(), TrainPhoneModel::finishSAT(), TrainPhoneModel::getClusterP(), TrainPhoneModel::getCoSim(), TrainPhoneModel::getDominantGaussian(), TrainPhoneModel::getKLDistance(), getLogPDFProbability(), getLookaheadLogP(), TrainPhoneModel::getNormDistance(), getNumberOfGaussians(), getOutput(), getPDFProbability(), TrainPhoneModel::getSilP(), getSilSumHist(), getTransition(), mapAdaptMeans(), TrainPhoneModel::maxNrOfGaussians(), TrainPhoneModel::moveModelGaussians(), TrainPhoneModel::normalize(), PhoneModel(), printInfo(), printModel(), processVector(), TrainPhoneModel::readModel(), TrainPhoneModel::setMaxGaussians(), setSilTrans(), TrainPhoneModel::startCount(), TrainPhoneModel::stopCount(), touchPDF(), TrainPhoneModel::train(), TrainPhoneModel::trainMMI(), TrainPhoneModel::TrainPhoneModel(), TrainPhoneModel::viterbi(), writeAccumulators(), writeModel(), TrainPhoneModel::writeSAT(), and ~PhoneModel().

int* PhoneModel::timeStamp [protected]

The model only calculates some context info once for every 'timestamp' frame.

Referenced by getPDFProbability(), PhoneModel(), processVector(), touchPDF(), TrainPhoneModel::TrainPhoneModel(), and ~PhoneModel().
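
The once-per-frame calculation that timeStamp supports can be illustrated with a small caching sketch. CachedStateSketch, expensiveDensity() and cachedProbability() are hypothetical; the sketch assumes one stored time stamp per context and a scalar observation, and the density function is a placeholder rather than the real mixture evaluation.

```cpp
#include <cmath>

// Hypothetical per-context cache: the frame for which cachedP is valid.
struct CachedStateSketch {
    int timeStamp = -1;   // no frame evaluated yet
    double cachedP = 0.0;
    int evaluations = 0;  // counts real evaluations, for illustration only
};

// Placeholder for an expensive PDF evaluation (unnormalised Gaussian shape).
double expensiveDensity(double x) {
    return std::exp(-0.5 * x * x);
}

// Evaluate the density at most once per time frame and reuse the stored value
// for any further request with the same time stamp.
double cachedProbability(CachedStateSketch &s, double observation, int time) {
    if (s.timeStamp != time) {
        s.cachedP = expensiveDensity(observation);
        s.timeStamp = time;
        ++s.evaluations;
    }
    return s.cachedP;
}
```

When several tree nodes share the same model and context, only the first request in a frame pays for the evaluation.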