Converting audio WAV format to HTK format on Ubuntu 14.04

At Exotel Labs we analyze tons of anonymized voice samples to address a variety of research problems in the voice space. While there are multiple techniques for this, a proven methodology for voice analysis begins with modeling the audio using Hidden Markov Models (HMMs).

HTK is the Hidden Markov Model Toolkit, originally developed at the Machine Intelligence Lab of the Cambridge University Engineering Department. It helps with speech analysis, among a variety of other applications. This post enumerates the steps to convert a standard .wav audio file into its corresponding .htk format.

Though primarily stored in the WAV format, the audio that HTK works with has the following characteristics:

- 16 kHz sampling rate

- Mono (single channel)

- 16-bit samples

- Big-endian byte order

(Courtesy – Machine Intelligence Lab, University of Cambridge)
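Before converting, it can help to confirm that an input file matches these characteristics. The sketch below reads three fields straight from the WAV header. It assumes the canonical 44-byte PCM header layout (channel count at byte 22, sample rate at byte 24, bits per sample at byte 34); files with extra chunks before the audio data would need a real parser. The function names `le_uint` and `wav_summary` are our own, not part of HTK. Note that WAV itself stores these fields little-endian, so the big-endian item in the list applies to the HTK side.

```shell
# Read an unsigned little-endian integer from a file.
# $1: file, $2: byte offset, $3: number of bytes
le_uint() {
    od -An -tu1 -j "$2" -N "$3" "$1" |
        awk '{ v = 0; for (i = NF; i >= 1; i--) v = v * 256 + $i; print v }'
}

# Print channel count, sample rate and bit depth.
# Assumption: canonical PCM header, so the offsets are 22, 24 and 34.
wav_summary() {
    echo "channels=$(le_uint "$1" 22 2) rate=$(le_uint "$1" 24 4) bits=$(le_uint "$1" 34 2)"
}
```

Running `wav_summary sample_two_speakers.wav` on a file that meets the list above would be expected to report one channel, a rate of 16000 and 16 bits.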

Here is a quick primer on how to get started with HTK.

1. Register and receive a password for downloading HTK software from:

http://htk.eng.cam.ac.uk/download.shtml

2. Compile directories in the following order:

HTKLib, HLMLib, HTKTools, HLMTools as documented in the README.

If compiling for CPU, run “make -f MakefileCPU all” in each directory.

If compiling for GPU (assuming you have Nvidia’s nvcc compiler and CUDA drivers installed), run “make -f MakefileNVCC all” in each directory instead.
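The build order can be captured in a small loop so that no directory is skipped or built out of turn. This is only a sketch: the function name `build_htk`, the `DRY_RUN` switch and the `/path/to/htk` root are our own placeholders, while the directory names and makefile names are the ones listed above. By default it only prints the commands; set `DRY_RUN=0` to actually run make.

```shell
# Build order as documented in the README.
HTK_BUILD_DIRS="HTKLib HLMLib HTKTools HLMTools"

build_htk() {
    # $1: root of the unpacked HTK source tree (placeholder path)
    # $2: makefile to use, MakefileCPU (default) or MakefileNVCC
    root="$1"
    makefile="${2:-MakefileCPU}"
    for dir in $HTK_BUILD_DIRS; do
        echo "(cd $root/$dir && make -f $makefile all)"
        if [ "${DRY_RUN:-1}" != "1" ]; then
            # Stop at the first failure: later directories depend on earlier ones.
            (cd "$root/$dir" && make -f "$makefile" all) || return 1
        fi
    done
}

# Dry run: prints the four commands in order without invoking make.
build_htk /path/to/htk
```

For a GPU build the same function would be called as `build_htk /path/to/htk MakefileNVCC`.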

3. Configuration settings and running the demo:

cd HTKDemo

// Create empty directories accessed by the demo

mkdir -p samples/HTKDemo/accs samples/HTKDemo/hmms samples/HTKDemo/hmms/hmm.0 samples/HTKDemo/hmms/hmm.1 samples/HTKDemo/hmms/hmm.2 samples/HTKDemo/hmms/hmm.3 samples/HTKDemo/proto samples/HTKDemo/test

// Add the HTKTools directory (which holds HInit, HCopy and the other compiled tools) to your environment path

cat /etc/environment

…

export PATH=$PATH:/path/to/htk/HTKTools

// Ensure the HCopy command now works at your command prompt

htk# HCopy
USAGE: HCopy [options] src [ + src ...] tgt ...
Option                                        Default

-a i      Use level i labels                  1
-e t      End copy at time t                  EOF
-i mlf    Save labels to mlfs                 null
-l dir    Output target label files to dir    current
-m t      Set margin of t around x/n segs     0
-n i [j]  Extract i'th [to j'th] label        off
-s t      Start copy at time t                0
-t n      Set trace line width to n           70
-x s [n]  Extract [n'th occ of] labels        off
-A        Print command line arguments        off
-C cf     Set config file to cf               default
-D        Display configuration variables     off
-F fmt    Set source data format to fmt       as config
-G fmt    Set source label format to fmt      as config
-I mlf    Load master label file mlfs
-L dir    Set input label (or net) dir        current
-O        Set target data format to fmt       as config
-P        Set target label format to fmt      as config
-S f      Set script file to f                none
-T N      Set trace flags to N                0
-V        Print version information           off
-X ext    Set input label (or net) file ext   lab
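The same check can be scripted for the current shell session. The sketch below appends the tools directory to PATH only if it is missing and then looks for HCopy with `command -v`. The names `add_to_path`, `have_tool` and `HTK_TOOLS_DIR` are our own, and `/path/to/htk/HTKTools` remains a placeholder. One caveat, as far as we understand Ubuntu's behaviour: /etc/environment is not a shell script and does not expand variables such as $PATH, so a shell profile such as ~/.profile may be a safer home for the export line.

```shell
# Placeholder location of the compiled tools; adjust to your own tree.
HTK_TOOLS_DIR="${HTK_TOOLS_DIR:-/path/to/htk/HTKTools}"

# Append a directory to PATH unless it is already listed.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already present, nothing to do
        *) PATH="$PATH:$1" ;;
    esac
    export PATH
}

# Succeed if the named command can be found on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

add_to_path "$HTK_TOOLS_DIR"
if have_tool HCopy; then
    echo "HCopy found at $(command -v HCopy)"
else
    echo "HCopy not found on PATH"
fi
```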

// Run the demo to verify settings

./runDemo configs/monPlainM1S3.dcf

The output should look like this –

……..

HResults -A -s -L labels/bcplabs/mon lists/bcplist test/te1.rec test/te2.rec test/te3.rec

====================== HTK Results Analysis =======================
Date: Fri Nov 18 10:24:15 2016
Ref : labels/bcplabs/mon
Rec : test/te1.rec
    : test/te2.rec
    : test/te3.rec
------------------------ Overall Results --------------------------
SENT: %Correct=0.00 [H=0, S=3, N=3]
WORD: %Corr=64.66, Acc=56.39 [H=86, D=36, S=11, I=11, N=133]
===================================================================

At this stage, the environment is set up with all prerequisites in place.
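As a sanity check on the demo output, the WORD line can be reproduced from its own counts. HTK reports %Corr as H/N and Acc as (H - I)/N, where N = H + D + S (hits, deletions, substitutions) and I is the number of insertions. The helper name `word_scores` below is ours; the figures are the ones from the sample output.

```shell
# Recompute N, %Corr and Acc from the counts on the WORD line.
# $1: H (hits)  $2: D (deletions)  $3: S (substitutions)  $4: I (insertions)
word_scores() {
    awk -v h="$1" -v d="$2" -v s="$3" -v i="$4" 'BEGIN {
        n = h + d + s
        printf "N=%d Corr=%.2f Acc=%.2f\n", n, 100 * h / n, 100 * (h - i) / n
    }'
}

word_scores 86 36 11 11   # prints: N=133 Corr=64.66 Acc=56.39
```

If your own run prints different counts, feeding them to the same function should still match the percentages HResults shows.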

4. To define the conversion, create a config file similar to this:

htk# cat config.code

SOURCEFORMAT = WAV          # format of the input

TARGETFORMAT = HTK          # format of the output

NATURALBYTEORDER = TRUE     # preserve byte ordering
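The config file can also be generated from a script, which keeps the annotations out of the values (HTK config files treat `#` as the start of a comment). This is a minimal sketch; the function name `write_hcopy_config` is our own, and the three settings are copied as listed above.

```shell
# Write the HCopy configuration used in this post.
# $1: path of the config file to create, e.g. config.code
write_hcopy_config() {
    cat > "$1" <<'EOF'
# HCopy configuration: WAV in, HTK out
SOURCEFORMAT = WAV
TARGETFORMAT = HTK
NATURALBYTEORDER = TRUE
EOF
}

# Usage: write_hcopy_config config.code
```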

5. Generate an HTK format file from an input WAV file

htk# HCopy -C config.code sample_two_speakers.wav sample_two_speakers.htk

htk#

htk# ll sample_two_speakers*

… 2658572 Nov 18 11:18 ../sample_two_speakers.htk

… 2658604 Nov 15 13:35 ../sample_two_speakers.wav

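You will notice that the resulting file is 32 bytes smaller than the original WAV file (2658572 versus 2658604 bytes). The audio samples are carried over, but the WAV header, which is 44 bytes in its canonical PCM form, is replaced by the 12-byte HTK header.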
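The size difference follows from the two header lengths. The sketch below predicts the size of the HTK file from the size of the WAV file, assuming a canonical 44-byte PCM WAV header (files carrying extra chunks will differ) and the 12-byte HTK header (4-byte sample count, 4-byte sample period, 2-byte sample size, 2-byte parameter kind). The function name `expected_htk_size` is ours.

```shell
WAV_HEADER_BYTES=44   # assumption: canonical PCM WAV header
HTK_HEADER_BYTES=12   # nSamples(4) + sampPeriod(4) + sampSize(2) + parmKind(2)

# $1: size of the WAV file in bytes
expected_htk_size() {
    echo $(( $1 - $WAV_HEADER_BYTES + $HTK_HEADER_BYTES ))
}

expected_htk_size 2658604   # prints 2658572, matching the listing
```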

The original version of this post appeared on gogoingo.

Manisha Mishra
