At Exotel Labs we analyze tons of anonymized voice samples to address varied research problems in the voice space. While there are multiple techniques for this, a proven methodology for voice analysis begins with converting audio into a form that Hidden Markov Model (HMM) tools can process.
HTK is the Hidden Markov Model Toolkit, originally developed at the Machine Intelligence Lab of the Cambridge University Engineering Department, which helps with speech analysis among a variety of other applications. This post enumerates the steps to convert a standard .wav audio file into its corresponding .htk format.
The audio samples are primarily stored in the WAV format, with the following characteristics:
- 16 kHz sampling rate
- Mono (single channel)
- 16-bit samples
- Big-endian format
(Courtesy – Machine Intelligence Lab, University of Cambridge)
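Before converting anything, it can help to confirm that a recording actually has these characteristics. The sketch below uses Python's standard wave module to read the sampling rate, channel count and sample width; the function names describe_wav and matches_expected are our own, and byte order is left out because the wave module does not report it.

```python
import os
import tempfile
import wave


def describe_wav(path):
    """Return (sample_rate_hz, channels, bits_per_sample) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate(), wav.getnchannels(), wav.getsampwidth() * 8


def matches_expected(path, rate=16000, channels=1, bits=16):
    """True if the file is 16 kHz, mono, 16-bit (the defaults listed above)."""
    return describe_wav(path) == (rate, channels, bits)


if __name__ == "__main__":
    # Build a tiny synthetic WAV so the sketch runs without a real recording.
    demo = os.path.join(tempfile.mkdtemp(), "example.wav")
    with wave.open(demo, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(16000)
        wav.writeframes(b"\x00\x00" * 160)  # 10 ms of silence
    print(describe_wav(demo), matches_expected(demo))
```

Pointing describe_wav at your own file is a quick way to catch stereo or 8 kHz recordings before they reach HTK.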
Here is a quick primer on how to get started with HTK.
1. Register and receive a password for downloading HTK software from:
http://htk.eng.cam.ac.uk/download.shtml
2. Compile directories in the following order:
HTKLib, HLMLib, HTKTools, HLMTools as documented in the README.
If compiling for CPU, run "make -f MakefileCPU all" in each directory.
If compiling for GPU (assuming you have Nvidia's nvcc compiler and CUDA drivers installed), run "make -f MakefileNVCC all" instead.
3. Configuration settings and running the demo:
cd HTKDemo
// Create empty directories accessed by the demo
mkdir -p samples/HTKDemo/accs samples/HTKDemo/hmms samples/HTKDemo/hmms/hmm.0 samples/HTKDemo/hmms/hmm.1 samples/HTKDemo/hmms/hmm.2 samples/HTKDemo/hmms/hmm.3 samples/HTKDemo/proto samples/HTKDemo/test
// Add the HTKTools directory (which holds HInit, HCopy and the other tools) to your PATH
cat /etc/environment
…
export PATH=$PATH:/path/to/htk/HTKTools
// Ensure the HCopy command now works at your command prompt
htk# HCopy
USAGE: HCopy [options] src [ + src …] tgt …
Option                                          Default
-a i      Use level i labels                    1
-e t      End copy at time t                    EOF
-i mlf    Save labels to mlfs                   null
-l dir    Output target label files to dir      current
-m t      Set margin of t around x/n segs       0
-n i [j]  Extract i'th [to j'th] label          off
-s t      Start copy at time t                  0
-t n      Set trace line width to n             70
-x s [n]  Extract [n'th occ of] labels          off
-A        Print command line arguments          off
-C cf     Set config file to cf                 default
-D        Display configuration variables       off
-F fmt    Set source data format to fmt         as config
-G fmt    Set source label format to fmt        as config
-I mlf    Load master label file mlfs
-L dir    Set input label (or net) dir          current
-O fmt    Set target data format to fmt         as config
-P fmt    Set target label format to fmt        as config
-S f      Set script file to f                  none
-T N      Set trace flags to N                  0
-V        Print version information             off
-X ext    Set input label (or net) file ext     lab
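If you drive HTK from scripts, the same check can be automated. This is a minimal sketch using shutil.which from the Python standard library; find_tool is a hypothetical helper name, and it only confirms that an executable called HCopy is on PATH, not that it runs correctly.

```python
import shutil


def find_tool(name="HCopy"):
    """Return the full path of an executable found on PATH, or None."""
    return shutil.which(name)


if __name__ == "__main__":
    location = find_tool("HCopy")
    if location:
        print("HCopy found at", location)
    else:
        print("HCopy not found; check the HTKTools entry in PATH")
```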
// Run the demo to verify settings
./runDemo configs/monPlainM1S3.dcf
The output should look like this:
……..
HResults -A -s -L labels/bcplabs/mon lists/bcplist test/te1.rec test/te2.rec test/te3.rec
====================== HTK Results Analysis =======================
Date: Fri Nov 18 10:24:15 2016
Ref : labels/bcplabs/mon
Rec : test/te1.rec
    : test/te2.rec
    : test/te3.rec
------------------------ Overall Results --------------------------
SENT: %Correct=0.00 [H=0, S=3, N=3]
WORD: %Corr=64.66, Acc=56.39 [H=86, D=36, S=11, I=11, N=133]
===================================================================
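The percentages on the WORD line follow directly from the counts in brackets: H is the number of correct labels, D deletions, S substitutions, I insertions and N the total in the reference. HTK reports %Corr as H/N and Acc as (H - I)/N. The sketch below (helper names are our own) parses the line from the output above and recomputes both figures.

```python
import re

WORD_LINE = "WORD: %Corr=64.66, Acc=56.39 [H=86, D=36, S=11, I=11, N=133]"


def parse_counts(line):
    """Extract the H, D, S, I, N counts from an HResults summary line."""
    return {key: int(val) for key, val in re.findall(r"\b([HDSIN])=(\d+)", line)}


def word_scores(counts):
    """Return (%Corr, Acc) rounded to two decimals, as HResults prints them."""
    n, h, i = counts["N"], counts["H"], counts["I"]
    percent_correct = 100.0 * h / n        # ignores insertions
    accuracy = 100.0 * (h - i) / n         # penalises insertions
    return round(percent_correct, 2), round(accuracy, 2)


if __name__ == "__main__":
    counts = parse_counts(WORD_LINE)
    print(counts)
    print(word_scores(counts))
```

Note that N = H + D + S (86 + 36 + 11 = 133), which is a handy sanity check when reading these summaries.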
At this stage, the environment is set up with all prerequisites in place.
4. To define the specific conversion, create a config file similar to this:
htk# cat config.code
SOURCEFORMAT = WAV        # format of the input
TARGETFORMAT = HTK        # format of the output
NATURALBYTEORDER = TRUE   # store the output in the machine's natural byte order
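When many files need converting, the config and the HCopy call can be generated from a script. The sketch below writes the three settings to a file and builds the argument list in the "HCopy -C cf src tgt" form shown in the usage text earlier. The names write_config, hcopy_command and convert are hypothetical, and convert assumes HCopy is already on PATH.

```python
import os
import subprocess

CONFIG_LINES = [
    "SOURCEFORMAT = WAV        # format of the input",
    "TARGETFORMAT = HTK        # format of the output",
    "NATURALBYTEORDER = TRUE   # machine's natural byte order",
]


def write_config(path):
    """Write the conversion settings to path and return the path."""
    with open(path, "w") as handle:
        handle.write("\n".join(CONFIG_LINES) + "\n")
    return path


def hcopy_command(config_path, wav_path):
    """Build the HCopy argument list; the target reuses the WAV base name."""
    htk_path = os.path.splitext(wav_path)[0] + ".htk"
    return ["HCopy", "-C", config_path, wav_path, htk_path]


def convert(config_path, wav_path):
    """Run HCopy (assumed to be on PATH); raises CalledProcessError on failure."""
    subprocess.run(hcopy_command(config_path, wav_path), check=True)


if __name__ == "__main__":
    print(hcopy_command("config.code", "sample_two_speakers.wav"))
```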
5. Generate the HTK format file from an input WAV:
htk# HCopy -C config.code sample_two_speakers.wav sample_two_speakers.htk
htk#
htk# ll sample_two_speakers*
… 2658572 Nov 18 11:18 ../sample_two_speakers.htk
… 2658604 Nov 15 13:35 ../sample_two_speakers.wav
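You will notice that the resultant file is exactly 32 bytes smaller than the original WAV file (2658604 vs. 2658572 bytes). The sample data is unchanged; the difference comes from the header, since the standard 44-byte WAV header is replaced by the 12-byte HTK header.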
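To see what ended up in the converted file, the 12-byte HTK header can be read directly. It holds four fields: the number of samples (4 bytes), the sample period in units of 100 ns (4 bytes), the bytes per sample (2 bytes) and the parameter kind (2 bytes, 0 for a raw waveform). The sketch below is an illustration under two assumptions: the WAV header is the canonical 44 bytes, and the file was written little-endian (NATURALBYTEORDER = TRUE on an x86 machine). Pass ">" as the byte order for HTK's default big-endian files. The function name read_htk_header is our own.

```python
import os
import struct
import tempfile

HTK_HEADER_BYTES = 12  # nSamples (4) + sampPeriod (4) + sampSize (2) + parmKind (2)
WAV_HEADER_BYTES = 44  # canonical PCM WAV header, assuming no extra chunks


def read_htk_header(path, byteorder="<"):
    """Parse the HTK header; byteorder is '<' (little) or '>' (big endian)."""
    with open(path, "rb") as handle:
        raw = handle.read(HTK_HEADER_BYTES)
    n_samples, samp_period, samp_size, parm_kind = struct.unpack(byteorder + "iihh", raw)
    return {
        "n_samples": n_samples,
        "sample_rate_hz": 10_000_000 / samp_period,  # period is in 100 ns units
        "bytes_per_sample": samp_size,
        "parm_kind": parm_kind,
    }


if __name__ == "__main__":
    # Synthetic header matching the file sizes above: (2658572 - 12) / 2 samples.
    demo = os.path.join(tempfile.mkdtemp(), "demo.htk")
    with open(demo, "wb") as handle:
        handle.write(struct.pack("<iihh", (2658572 - 12) // 2, 625, 2, 0))
    print(read_htk_header(demo))
    print("header difference:", WAV_HEADER_BYTES - HTK_HEADER_BYTES)
```

If the sample rate comes out as an implausible number, the file was most likely written in the other byte order.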
The original version of this post appeared on gogoingo.