Segmentation and my first Python script

The next thing I did was write a Python script to segment the larger audio file into 50 individual files with useful filenames. I adapted a script that uses Python bindings for FFmpeg, a command-line tool for converting and editing audio and video. The script takes the following arguments:

  1. Audio file to be segmented
  2. File with timing information
  3. Speaker name
  4. List number

Aside from the first, these arguments need some explanation, so let's go through them.

For the timing information, I used the CSV file that Gentle produced in the previous step. A few lines of it look like this:

which,which,20.8,21.51
doc,doc,21.91,22.46
https,<unk>,22.94,23.98

Each line gives the expected word, then the word Gentle actually found, then the start and end times of the word in seconds. As you can see in the sample above, some words weren't recognized by Gentle but were given accurate timing info anyway. I can only assume that the "word" "https" was not in the dictionary Gentle uses in its alignment process. To avoid issues like this, my script looks only at the expected word and the timestamps.
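Reading only those columns takes a few lines with Python's `csv` module. This is a minimal sketch rather than the original script; the function name `read_gentle_timings` is hypothetical, and skipping rows with empty timestamps is my assumption about how a word Gentle could not align at all would appear.

```python
import csv
import io


def read_gentle_timings(lines):
    """Yield (expected_word, start, end) tuples from Gentle's CSV output.

    Column 0 is the expected word; columns 2 and 3 are the start and end
    times in seconds. Column 1 (the word Gentle found, e.g. '<unk>') is
    ignored on purpose.
    """
    for row in csv.reader(lines):
        # Assumption: rows without both timestamps can't be cut, so skip them.
        if len(row) < 4 or not row[2] or not row[3]:
            continue
        yield row[0], float(row[2]), float(row[3])


sample = io.StringIO(
    "which,which,20.8,21.51\n"
    "doc,doc,21.91,22.46\n"
    "https,<unk>,22.94,23.98\n"
)
for word, start, end in read_gentle_timings(sample):
    print(word, start, end)
```

With a real file, `open("1-50-gentle.csv", newline="")` can be passed in place of the `StringIO` sample.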

For each word, the script tells FFmpeg to create a new audio file beginning at the word's start time. It then subtracts the start time from the end time and uses the result as the length of the new file.
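The cut for a single word can be sketched as an FFmpeg command. Note that this swaps in a plain `subprocess` call to the `ffmpeg` command-line tool instead of the Python bindings my script used; the function names are hypothetical, while `-i`, `-ss`, `-t` and `-y` are standard FFmpeg options.

```python
import subprocess


def ffmpeg_cut_command(source, start, end, output):
    """Build an ffmpeg command that copies one word out of source.

    The clip starts at `start` and lasts `end - start` seconds.
    """
    duration = end - start
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file without asking
        "-i", source,
        "-ss", f"{start:.3f}",    # after -i: decode up to the start, then cut (slower but precise)
        "-t", f"{duration:.3f}",  # keep only the word's duration
        output,
    ]


def cut_word(source, start, end, output):
    # Needs the ffmpeg binary on PATH; raises CalledProcessError if ffmpeg fails.
    subprocess.run(ffmpeg_cut_command(source, start, end, output), check=True)


print(" ".join(ffmpeg_cut_command("1-50.wav", 20.8, 21.51, "which-i-1.wav")))
```

Formatting the times to three decimal places also hides floating-point noise: `21.51 - 20.8` is not exactly `0.71` in binary floating point, but it prints as `0.710`.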

I expect that, as the collection of word files grows, some words will repeat. We may also involve additional speakers, whose lists of words may overlap, and we may want to do multiple takes where words are spoken with different affect. Hence the speaker name and list number. If the speaker name argument is 'aaron' or 'ian' (my collaborator and me, the two most common speakers), the script appends '-a' or '-i' to the filename. For other speakers it simply appends their name, like '-fernanda'. The list number then follows after another dash. So a typical file from my list might be named 'any-i-1.wav'.
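The naming rule fits in a small lookup table. This is a sketch of the rule as described, not the original code; `SPEAKER_SUFFIXES` and `word_filename` are hypothetical names, and the `.wav` default extension is taken from the example filename.

```python
# Short suffixes for the two most common speakers; anyone else keeps
# their name exactly as passed in.
SPEAKER_SUFFIXES = {"aaron": "a", "ian": "i"}


def word_filename(word, speaker, list_number, ext="wav"):
    """Return a name like 'any-i-1.wav' for one segmented word."""
    suffix = SPEAKER_SUFFIXES.get(speaker, speaker)
    return f"{word}-{suffix}-{list_number}.{ext}"


print(word_filename("any", "ian", 1))       # any-i-1.wav
print(word_filename("any", "fernanda", 1))  # any-fernanda-1.wav
```

Because unrecognized names pass through unchanged, passing just `i` as the speaker, as in the example command in this post, also produces `-i`.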

Here is what the command to run the script might look like:

python3 splitnames.py 1-50.wav 1-50-gentle.csv i 1

After running each set of 50 words, I listened to the files to make sure they were segmented properly and manually re-edited any that weren't. For an entire 1,000-word list, fewer than ten files needed fixing, which saved me a lot of time over chopping each word by hand.
