First steps

Hello. I’m working on an independent study. Please see the About page for general information.

This is a skeletal look at my progress so far, to be fleshed out soon.

What I’m working on:

Learning about the workings of the voice — how sound is produced and varied, what the results are. Reading The Science of the Singing Voice by Johan Sundberg.
Familiarizing myself with analysis tools — spectrograms, how to read them, how different settings change their effectiveness.
Learning the basics of Python and libraries that can process and analyze audio.
Identifying and segmenting recordings of speech with the aid of forced aligners.
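
On the spectrogram settings mentioned above: the setting I find most consequential is the analysis window length, which trades frequency detail against time detail. A minimal sketch of that arithmetic follows; the function name and the example values are just for illustration, not taken from any particular tool.

```python
def stft_resolution(sample_rate, window_size):
    """Return (frequency spacing in Hz, window duration in ms) for one analysis window."""
    # Spacing between adjacent frequency bins: longer windows give finer spacing.
    freq_spacing_hz = sample_rate / window_size
    # Stretch of audio covered by one window: longer windows blur events in time.
    window_ms = 1000 * window_size / sample_rate
    return freq_spacing_hz, window_ms


# Compare a short and a long window at a 44.1 kHz sample rate (illustrative values).
for size in (256, 4096):
    spacing, duration = stft_resolution(44100, size)
    print(f"window {size}: {spacing:.1f} Hz per bin, {duration:.1f} ms per window")
```

A short window shows fast events such as consonant onsets more sharply, while a long window separates individual harmonics more clearly.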

One compositional area I’m interested in involves exploring and exaggerating vocal resonance, so I’m continuing to learn about formants and other frequency regions that can be accessed and altered.

In another area, I’m working on a piece with my colleague Aaron Snyder using lists of words collected by computer password programs. In performance, we will likely read from these lists live, as well as have accompanying playback of pre-recorded speech. We want to build up a large set of recorded words that can be flexibly called up by the computer, and to look into ways of organizing the words that could bring out compositionally interesting patterns.

This has been the bulk of my work the last few weeks. The first goal was to find a way to automate (or at least semi-automate) segmentation of a recording of 1,000 words. I did this by:

Breaking the recording down into sets of 50 words
Feeding each set of 50, along with a “transcription” of the words in it, into the Gentle forced aligner, which returns a CSV file with timestamps for the start and end of each word.
Running each audio file and its CSV file through a Python script I adapted, which outputs a separate file for each word, with a naming convention that I hope will be useful for later organization and manipulation.
Listening back and manually correcting words that Gentle did not align well.

The process takes some time, but is still much less work than cutting 1,000 words manually.
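
The cutting step above can be sketched roughly as follows. This is not the script I adapted, only a minimal standard-library illustration. It assumes each CSV row has the layout word, aligned word, start, end (times in seconds, no header row), and that unaligned words have empty time fields. The function name, the `set_label` parameter, and the file naming pattern are hypothetical.

```python
import csv
import wave
from pathlib import Path


def slice_words(wav_path, csv_path, out_dir, set_label="set01"):
    """Write one WAV file per aligned word and return the paths written."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []

    with wave.open(str(wav_path), "rb") as src, open(csv_path, newline="") as f:
        params = src.getparams()
        rate = src.getframerate()
        for index, row in enumerate(csv.reader(f), start=1):
            # Assumed row layout: word, aligned word, start (s), end (s).
            if len(row) < 4 or not row[2] or not row[3]:
                continue  # no timestamps: leave this word for manual correction
            word, start, end = row[0], float(row[2]), float(row[3])
            first = int(round(start * rate))
            count = int(round(end * rate)) - first
            src.setpos(first)
            frames = src.readframes(count)
            # Hypothetical naming pattern: set label, position in the set, word.
            out_path = out_dir / f"{set_label}_{index:03d}_{word}.wav"
            with wave.open(str(out_path), "wb") as dst:
                dst.setparams(params)  # frame count in the header is fixed on close
                dst.writeframes(frames)
            written.append(out_path)
    return written
```

Calling `slice_words("set01.wav", "set01.csv", "words", set_label="set01")` (hypothetical file names) would write one file per aligned word into a `words` folder. Keeping the position in the set as part of the file name makes it easy to see which words were skipped and need cutting by hand.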

Gentle also returns the phonemes found in each word. One of my next steps will be to find a way to store that and other extracted information (perhaps fundamental frequency and MFCCs) in a list that can be called by Python for offline composing, or by Max/MSP for real-time calling of words and/or audio files.
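
One possible shape for that list is a JSON file with one entry per word. The sketch below is only a starting point under some assumptions: that Gentle's JSON output has a `words` list whose items carry `word`, `case`, `start`, `end`, and `phones` (each phone a name such as `hh_B` plus a duration), and that JSON is a convenient exchange format for both Python and Max/MSP. The function names and the `f0_hz` and `mfcc` fields are hypothetical placeholders, and the sample alignment is made up for illustration.

```python
import json


def build_word_index(alignment, file_for_word):
    """Turn a Gentle-style alignment dict into a list of per-word entries."""
    entries = []
    for item in alignment.get("words", []):
        if item.get("case") != "success":
            continue  # skip words the aligner could not place
        entries.append({
            "word": item["word"],
            "file": file_for_word(item["word"]),
            "start": item["start"],
            "end": item["end"],
            # Drop the assumed position suffix ("hh_B" becomes "hh").
            "phonemes": [p["phone"].split("_")[0] for p in item.get("phones", [])],
            "f0_hz": None,  # placeholder for a later fundamental frequency estimate
            "mfcc": None,   # placeholder for later MFCC values
        })
    return entries


def words_with_phoneme(entries, phoneme):
    """Return the words whose phoneme list contains the given phoneme."""
    return [e["word"] for e in entries if phoneme in e["phonemes"]]


# Made-up sample in the assumed Gentle JSON layout.
sample = {"words": [
    {"word": "hello", "case": "success", "start": 0.1, "end": 0.4,
     "phones": [{"phone": "hh_B", "duration": 0.08}, {"phone": "ow_E", "duration": 0.1}]},
    {"word": "qwerty", "case": "not-found-in-audio"},
]}
index = build_word_index(sample, lambda w: f"{w}.wav")
print(json.dumps(index, indent=2))
```

Writing the result out with `json.dump` would give a single file that a Python script can load back for offline sorting and filtering, for example grouping words that share a phoneme with `words_with_phoneme`.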

This entry is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license.
