    At Exotel Labs, we analyze large volumes of anonymized voice samples to address a variety of research problems in the voice space. There are multiple ways to do this; one proven approach to voice analysis begins with converting the audio into a format that Hidden Markov Model (HMM) tools can work with.

    HTK is the Hidden Markov Model Toolkit, originally developed at the Machine Intelligence Lab of the Cambridge University Engineering Department. It is used for speech analysis, among a variety of other applications. This post walks through the steps to convert a standard .wav audio file into its corresponding .htk format.

    Though the input audio is primarily stored in the WAV format, its characteristics include:

    - 16 kHz sampling rate

    - Mono (single channel)

    - 16-bit samples

    - Big Endian format

    (Courtesy – Machine Intelligence Lab, University of Cambridge)
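
    Before converting anything, it can help to confirm that a WAV file actually has these properties. The following is a minimal sketch, assuming Python 3 and its standard wave module; the helper names (wav_properties, mismatches, silent_wav) are hypothetical and not part of HTK. It checks the sampling rate, channel count and sample width only; byte order is not checked, since the wave module does not expose it.

```python
import io
import wave

# Expected properties, taken from the list above.
EXPECTED = {"sample_rate_hz": 16000, "channels": 1, "sample_width_bytes": 2}


def wav_properties(source):
    """Read the basic format fields of a WAV file (a path or a binary file object)."""
    with wave.open(source, "rb") as wav:
        return {
            "sample_rate_hz": wav.getframerate(),
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
        }


def mismatches(props, expected=EXPECTED):
    """Return {field: (actual, expected)} for every field that differs."""
    return {k: (props[k], v) for k, v in expected.items() if props[k] != v}


def silent_wav(sample_rate_hz=16000, channels=1, frames=160):
    """Build a tiny in-memory 16-bit WAV of silence, for demonstration only."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes = 16-bit samples
        wav.setframerate(sample_rate_hz)
        wav.writeframes(b"\x00\x00" * frames * channels)
    buf.seek(0)
    return buf


# Usage: an empty dict means the file matches the expected properties.
# print(mismatches(wav_properties("sample_two_speakers.wav")))
```

    An empty result means the file matches; otherwise each entry shows the actual value next to the expected one.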

    Here is a quick primer on how to get started with HTK.

    1. Register and receive a password for downloading HTK software from:

        http://htk.eng.cam.ac.uk/download.shtml

    2. Compile the directories in the following order, as documented in the README:

    HTKLib, HLMLib, HTKTools, HLMTools.

    If compiling for CPU, run "make -f MakefileCPU all" in each directory.

    If compiling for GPU (assuming you have Nvidia's nvcc compiler and CUDA drivers installed), run "make -f MakefileNVCC all" instead.
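
    The build order above can also be scripted so that no directory is skipped or built out of turn. This is only a sketch, assuming Python 3, that make is installed, and that the four directories sit directly under the HTK source root; the function names and the htk_root argument are hypothetical.

```python
import os
import subprocess

# Directory order as given in the step above.
BUILD_ORDER = ["HTKLib", "HLMLib", "HTKTools", "HLMTools"]


def build_commands(makefile="MakefileCPU"):
    """Return (directory, argv) pairs in build order, without running anything."""
    return [(directory, ["make", "-f", makefile, "all"]) for directory in BUILD_ORDER]


def build(htk_root, makefile="MakefileCPU"):
    """Run make in each directory in order; stop at the first failure."""
    for directory, argv in build_commands(makefile):
        subprocess.run(argv, cwd=os.path.join(htk_root, directory), check=True)


# Usage (htk_root is wherever the HTK sources were unpacked):
# build("/path/to/htk")                            # CPU build
# build("/path/to/htk", makefile="MakefileNVCC")   # GPU build
```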

    3. Configuration settings and running the demo:

    // Create the empty directories accessed by the demo (run from the directory that contains samples/)

    mkdir -p samples/HTKDemo/accs samples/HTKDemo/hmms samples/HTKDemo/hmms/hmm.0 samples/HTKDemo/hmms/hmm.1 samples/HTKDemo/hmms/hmm.2 samples/HTKDemo/hmms/hmm.3 samples/HTKDemo/proto samples/HTKDemo/test

    // Switch to the demo directory

    cd samples/HTKDemo

    // Add the HTK tools (HInit, HCopy and the others) to your environment path

    cat /etc/environment

    export PATH=$PATH:/path/to/htk/HTKTools

    // Ensure the HCopy command now works from your command prompt

    htk# HCopy
    USAGE: HCopy [options] src [ + src …] tgt …
    Option                                        Default

    -a i      Use level i labels                  1
    -e t      End copy at time t                  EOF
    -i mlf    Save labels to mlfs                 null
    -l dir    Output target label files to dir    current
    -m t      Set margin of t around x/n segs     0
    -n i [j]  Extract i’th [to j’th] label        off
    -s t      Start copy at time t                0
    -t n      Set trace line width to n           70
    -x s [n]  Extract [n’th occ of] labels        off
    -A        Print command line arguments        off
    -C cf     Set config file to cf               default
    -D        Display configuration variables     off
    -F fmt    Set source data format to fmt       as config
    -G fmt    Set source label format to fmt      as config
    -I mlf    Load master label file mlfs
    -L dir    Set input label (or net) dir        current
    -O        Set target data format to fmt       as config
    -P        Set target label format to fmt      as config
    -S f      Set script file to f                none
    -T N      Set trace flags to N                0
    -V        Print version information           off
    -X ext    Set input label (or net) file ext   lab
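
    The same check can be done from a script, which is handy when setting up several machines. A minimal sketch, assuming Python 3; missing_tools is a hypothetical helper, and the default tool names are simply the ones mentioned in this post.

```python
import shutil


def missing_tools(tools=("HCopy", "HInit", "HResults")):
    """Return the tools from `tools` that cannot be found on the current PATH."""
    return [name for name in tools if shutil.which(name) is None]


# Usage: an empty list means every tool was found.
# print(missing_tools())
```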

    // Run the demo to verify settings

    ./runDemo configs/monPlainM1S3.dcf

    The output should look like this –

    ……..

    HResults -A -s -L labels/bcplabs/mon lists/bcplist test/te1.rec test/te2.rec test/te3.rec

    ====================== HTK Results Analysis =======================

     Date: Fri Nov 18 10:24:15 2016

     Ref : labels/bcplabs/mon

     Rec : test/te1.rec

      : test/te2.rec

      : test/te3.rec

    ------------------------ Overall Results --------------------------

    SENT: %Correct=0.00 [H=0, S=3, N=3]

    WORD: %Corr=64.66, Acc=56.39 [H=86, D=36, S=11, I=11, N=133]

    ===================================================================
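
    The percentages on the WORD line follow from the bracketed counters: H is the number of correct labels, D deletions, S substitutions, I insertions and N the total number of reference labels, with %Corr = H/N and Acc = (H - I)/N. The sketch below, assuming Python 3, re-derives the figures from the output shown above; parse_counts and word_scores are hypothetical helper names, not HTK functions.

```python
import re


def parse_counts(line):
    """Extract the bracketed counters (H, D, S, I, N) from an HResults summary line."""
    match = re.search(r"\[(.*?)\]", line)
    if match is None:
        raise ValueError("no bracketed counters found")
    return {key: int(value) for key, value in re.findall(r"([A-Z])=(\d+)", match.group(1))}


def word_scores(counts):
    """Return (%Corr, Acc) rounded to two decimals, as HResults prints them."""
    percent_correct = 100.0 * counts["H"] / counts["N"]
    accuracy = 100.0 * (counts["H"] - counts["I"]) / counts["N"]
    return round(percent_correct, 2), round(accuracy, 2)


counts = parse_counts("WORD: %Corr=64.66, Acc=56.39 [H=86, D=36, S=11, I=11, N=133]")
print(word_scores(counts))  # (64.66, 56.39)
```

    Note that H + D + S = 86 + 36 + 11 = 133 = N, and that insertions lower Acc without affecting %Corr.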

    At this stage, the environment is set up with all prerequisites in place.

    4. To define the conversion, create a config file similar to this:

    htk# cat config.code

    # format of the input
    SOURCEFORMAT = WAV

    # format of the output
    TARGETFORMAT = HTK

    # preserve byte ordering
    NATURALBYTEORDER = TRUE

    5. Generate the HTK format file from an input WAV file:

    htk# HCopy -C config.code sample_two_speakers.wav sample_two_speakers.htk

    htk#

    htk# ll sample_two_speakers*

    … 2658572 Nov 18 11:18 ../sample_two_speakers.htk

    … 2658604 Nov 15 13:35 ../sample_two_speakers.wav

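    You will notice that the resulting file is 32 bytes smaller than the original WAV file. This is because the WAV header (44 bytes in a canonical PCM file) is replaced by the 12-byte HTK header, while the sample data keeps the same size.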
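
    To see where the 32 bytes go, it helps to look at the HTK header itself: 12 bytes holding the number of samples (4-byte integer), the sample period in 100 ns units (4-byte integer), the bytes per sample (2-byte integer) and the parameter kind (2-byte integer, 0 for a plain waveform). A canonical PCM WAV header is 44 bytes, and 44 - 12 = 32. The sketch below assumes Python 3 and a WAV file without extra chunks; the function names are hypothetical. HTK files are big-endian by default, so the byte_order argument is an assumption to adjust ("<") if your file was written in little-endian natural order.

```python
import struct

HTK_HEADER_BYTES = 12
WAV_HEADER_BYTES = 44  # canonical PCM WAV header; files with extra chunks are larger


def parse_htk_header(raw, byte_order=">"):
    """Decode the 12-byte HTK header from the first bytes of a file."""
    if len(raw) < HTK_HEADER_BYTES:
        raise ValueError("need at least 12 bytes for an HTK header")
    n_samples, samp_period, samp_size, parm_kind = struct.unpack(
        byte_order + "iihh", raw[:HTK_HEADER_BYTES]
    )
    return {
        "n_samples": n_samples,
        "sample_period_100ns": samp_period,
        "sample_size_bytes": samp_size,
        "parm_kind": parm_kind,  # 0 means WAVEFORM
        # The period is in 100 ns units, so 625 corresponds to 16 kHz.
        "sample_rate_hz": 1e7 / samp_period if samp_period > 0 else None,
    }


def expected_htk_size(wav_size_bytes):
    """Size of the HTK waveform file produced from a canonical WAV of the given size."""
    return wav_size_bytes - WAV_HEADER_BYTES + HTK_HEADER_BYTES


print(expected_htk_size(2658604))  # 2658572, matching the listing above

# Usage on a real file:
# with open("sample_two_speakers.htk", "rb") as f:
#     print(parse_htk_header(f.read(HTK_HEADER_BYTES)))
```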

    The original version of this post appeared on gogoingo.

    Manisha Mishra

    © 2024, Exotel Techcom Pvt. Ltd. All Rights Reserved