Table 1.
Comparison of the forced-alignment algorithms under consideration.
| Algorithm | Engine | Alignment | English training set | Remark |
|---|---|---|---|---|
| P2FA (Yuan & Liberman, 2008) | HMM-GMM on PLP features. HTK backend. | Monophone | 25 hr of U.S. Supreme Court oral arguments | Not trainable. |
| Prosodylab (Gorman et al., 2011) | HMM-GMM on MFCC features. HTK backend. | Monophone | 10 hr of laboratory-recorded North American speech | |
| Kaldi (Povey et al., 2011) | HMM-GMM on MFCC features. Kaldi backend. | Two passes: monophone, triphone | Librispeech (Panayotov et al., 2015): 1,000 hr of adult-read audiobooks | Kaldi is a speech recognition engine, but recipes are available for forced alignment. |
| MFA-No-SAT (McAuliffe et al., 2017) | HMM-GMM on MFCC features. Kaldi backend. | Two passes: monophone, triphone | Librispeech | Automates Kaldi alignment recipes; developed by the same lab as Prosodylab. |
| MFA-SAT (McAuliffe et al., 2017) | HMM-GMM on MFCC features. Kaldi backend. | Three passes: monophone, triphone, speaker-adapted triphone | Librispeech | |
Note. P2FA = Penn Phonetics Lab Forced Aligner; HMM = hidden Markov model; GMM = Gaussian mixture model; PLP = perceptual linear prediction; HTK = Hidden Markov Model Toolkit (Young et al., 2015); MFCC = Mel-frequency cepstral coefficient; MFA = Montreal Forced Aligner; No-SAT = no speaker-adaptive training; SAT = speaker-adaptive training.