More phones

Since phones describe possible sounds regardless of language or meaning, it would seem that they are the same worldwide. One organization that has catalogued them is the Speech Group at Carnegie Mellon University. Their CMU Pronouncing Dictionary lists mostly English words with their pronunciations given in phones. Their GitHub repository has a handy list of its 39 phones, which correspond to those used by Gentle, so it was a good place to start building up a database of phone information for my words.

At first I worked on building a database for each phone. The phone “eh,” for example, would have its own CSV file. Each line would list a word containing “eh,” its corresponding sound file, and the position of “eh” in the word (B, I, or E, as discussed in the last post). As I narrowed down my plans in Max/MSP, however, I realized I needed a different approach. I settled on a suite of objects called MuBu, which can store a variety of data quite handily, but it expects each audio file to be paired with data about that file, rather than a database that refers to many audio files.

The MuBu objects store data in a matrix that expects a fixed number of columns. Words will always have a first and last phone, but as the number of inner phones is variable, I wasn’t sure how to deal with them. For now, then, I have decided to discard those inner phones.

MuBu with phone data for the first and last phones of the word “any”, interpreted as numbers.

With that decision made, I needed to write code that would go through a directory of JSON files with phone data about my words, assign numbers to the first and last phone of each word, and write those into a text file in a format that Max/MSP can understand.

First I gave number values to each phone. The CMU list and Gentle both have them in alphabetical order, which is what I’ve adopted for now — so “aa” has a value of 1 and “zh” is 39. Eventually I may order them by sound and/or sound type (vowel, fricative, etc.).
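A minimal sketch of that numbering, assuming the lowercase phone names Gentle writes and the alphabetical CMU order. `CMU_PHONES`, `number_phones` and `write_phone_csv` are hypothetical helper names for illustration, not the code behind this post. Starting the count at 1 leaves 0 free as a placeholder value.

```python
import csv
import io

# The 39 CMU phones, lowercased to match Gentle's labels, in alphabetical order.
CMU_PHONES = [
    "aa", "ae", "ah", "ao", "aw", "ay", "b", "ch", "d", "dh",
    "eh", "er", "ey", "f", "g", "hh", "ih", "iy", "jh", "k",
    "l", "m", "n", "ng", "ow", "oy", "p", "r", "s", "sh",
    "t", "th", "uh", "uw", "v", "w", "y", "z", "zh",
]


def number_phones(phones):
    """Give each phone its 1-based position; 0 stays free as a placeholder."""
    return {phone: n for n, phone in enumerate(phones, start=1)}


def write_phone_csv(mapping, f):
    """Write one 'phone,number' row per phone to an open text file.

    When writing to a real file, open it with newline="" as the csv docs advise.
    """
    writer = csv.writer(f)
    for phone, number in mapping.items():
        writer.writerow([phone, number])


phone_numbers = number_phones(CMU_PHONES)
print(phone_numbers["aa"], phone_numbers["eh"], phone_numbers["iy"], phone_numbers["zh"])
# prints: 1 11 18 39

buf = io.StringIO()
write_phone_csv(phone_numbers, buf)
print(buf.getvalue().splitlines()[:2])
# prints: ['aa,1', 'ae,2']
```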

The code then loads this list as a CSV dictionary. It reads through a directory of JSON files, one for each recorded word, and first checks whether Gentle aligned the word. Some words weren’t aligned properly, and these are given phone values of 0 and 0. In the future I’d like to fix such words manually, but this at least gets them into the Max patch.
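One way that directory pass could look, as a sketch that assumes Gentle’s JSON layout: a `words` list whose entries carry a `case` field (“success” when aligned) and a `phones` list of objects with a `phone` key. It also assumes one recorded word per JSON file and filenames that sort in the same order as the audio files (zero-padded numbers, for example). `phones_from_gentle` and `load_directory` are hypothetical names.

```python
import json
from pathlib import Path


def phones_from_gentle(data):
    """Return the phone labels of the first word in a parsed Gentle result.

    Returns None when the word was not aligned. Assumes one recorded word per
    file and that Gentle marks aligned words with "case": "success".
    """
    words = data.get("words", [])
    if not words or words[0].get("case") != "success":
        return None
    labels = [p["phone"] for p in words[0].get("phones", [])]
    return labels or None


def load_directory(json_dir):
    """Parse every *.json file in sorted filename order.

    Assumes the filenames sort in the same order as the audio files
    (for example zero-padded numbers), so list positions line up with them.
    """
    results = []
    for path in sorted(Path(json_dir).glob("*.json")):
        with open(path, encoding="utf-8") as f:
            results.append(phones_from_gentle(json.load(f)))
    return results


# Hand-written examples in the assumed layout (not real Gentle output)
aligned = {"words": [{"word": "any", "case": "success",
                      "phones": [{"phone": "eh_B"}, {"phone": "n_I"}, {"phone": "iy_E"}]}]}
missed = {"words": [{"word": "any", "case": "not-found-in-audio"}]}
print(phones_from_gentle(aligned))  # ['eh_B', 'n_I', 'iy_E']
print(phones_from_gentle(missed))   # None
```

A `None` entry marks a word that should be written out with the placeholder values 0 and 0.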

For words that were handled well, the code ignores the inner phones, strips “_B” and “_E” off the first and last phone, looks them up in the dictionary, and replaces them with a number. So the word “any” gets the values 11 and 18, for the phones “eh” and “iy.”

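elif "_I" not in j:
    # j is one of Gentle's phone labels for the word, e.g. "eh_B";
    # inner ("_I") phones skip this branch, so only first and last are kept

    # remove the "_B" or "_E" tag before comparing to dictionary
    j = j.split("_", 1)[0]

    # look phone up in dictionary, replace with its number value
    # (falls back to the phone name itself if it isn't in the dictionary)
    k = lookup_dict.get(j, j)

    # add number to phone list
    phone_to_num.phones.append(k)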
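For anyone who wants something they can run on its own, here is a self-contained sketch of the same per-word step. It reads the first and last labels directly, which skips the inner phones without testing for “_I”, and it falls back to 0 for an unknown phone so every row stays numeric. `first_last_numbers` and `LOOKUP` are hypothetical names; the numbering follows the alphabetical order described above.

```python
# The 39 CMU phones in alphabetical order, numbered from 1 ("aa") to 39 ("zh").
CMU_PHONES = ("aa ae ah ao aw ay b ch d dh eh er ey f g hh ih iy jh k "
              "l m n ng ow oy p r s sh t th uh uw v w y z zh").split()
LOOKUP = {phone: n for n, phone in enumerate(CMU_PHONES, start=1)}


def first_last_numbers(phones, lookup=LOOKUP):
    """Reduce a word's phone labels (e.g. ["eh_B", "n_I", "iy_E"]) to two numbers.

    Only the first and last labels are used, so inner phones are ignored.
    An unaligned word (None or an empty list) becomes (0, 0), and a phone
    missing from the lookup also becomes 0 so every row stays numeric.
    A one-phone word yields the same number twice.
    """
    if not phones:
        return (0, 0)
    # Strip the position tag: "eh_B" -> "eh", "iy_E" -> "iy"
    first = phones[0].split("_", 1)[0]
    last = phones[-1].split("_", 1)[0]
    return (lookup.get(first, 0), lookup.get(last, 0))


print(first_last_numbers(["eh_B", "n_I", "iy_E"]))                          # (11, 18)
print(first_last_numbers(["eh_B", "n_I", "iy_I", "th_I", "ih_I", "ng_E"]))  # (11, 24)
print(first_last_numbers(None))                                             # (0, 0)
```

Returning a fixed-size pair for every word, aligned or not, is what lets each row fit MuBu’s fixed number of columns.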

Another Python function writes this data into a text file meant for Max. A few lines might look like this:

1, 0 11 18;
2, 0 11 18;
3, 0 11 24;
4, 0 11 24;

The first value is the index, which follows the order of both the audio files and the JSON files. The zero goes into the “time” column of the MuBu, which tells it that the data refers to the beginning of the file, or, as I’m treating it, the entire file. The next two numbers are the first and last phone. This snippet refers to the words “any” and “anything,” each of which occurs twice in my corpus because they appeared on both my list and Aaron’s.
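A minimal sketch of such a writer, assuming each word has already been reduced to a (first, last) pair of phone numbers; `max_lines` and `write_max_file` are hypothetical names. Fed the pairs for “any” and “anything,” it reproduces the four lines shown above.

```python
def max_lines(pairs):
    """Format (first, last) phone numbers as 'index, time first last;' lines.

    The index starts at 1 and follows file order; time is always 0 because
    each row describes the whole audio file.
    """
    return [f"{i}, 0 {first} {last};" for i, (first, last) in enumerate(pairs, start=1)]


def write_max_file(pairs, path):
    """Write the formatted lines to a plain-text file for Max/MSP to read."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(max_lines(pairs)) + "\n")


# "any" and "anything", each appearing twice in the corpus
pairs = [(11, 18), (11, 18), (11, 24), (11, 24)]
print("\n".join(max_lines(pairs)))
```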

With all of this data collected, I could finally use it. In Max/MSP. For creative purposes.

This entry is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license.
