Fruehwald, Josef
PI: Josef Fruehwald
Affiliation: University of Kentucky
Job Title: Assistant Professor
Acoustic analysis of large sets of audio data
There is the potential to process hundreds of hours of recordings of human speech in two steps. The first is “forced alignment”: forced alignment takes audio data and an associated transcription and returns timestamps for where in the audio the individual speech sounds of interest occur. Second, once the timestamps have been identified, automated acoustic analysis can be conducted. This acoustic analysis ranges from relatively simple, such as extracting acoustic values given a fixed set of parameters, to more complex, where the parameters of the analysis can be individualized for each speaker.
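As a minimal sketch of this two-step pipeline (the alignment, audio, and measurement here are all hypothetical stand-ins: real timestamps would come from a forced aligner such as the MFA or FAVE, and real acoustic measurements from Praat):

```python
import math

# Step 1 output (hypothetical): forced alignment gives (phone, start, end) in seconds.
alignment = [("ay", 0.10, 0.25), ("s", 0.25, 0.32), ("ow", 0.32, 0.50)]

# Stand-in for audio: one second of a 220 Hz sine at 16 kHz. Real work reads a WAV file.
RATE = 16_000
audio = [math.sin(2 * math.pi * 220 * t / RATE) for t in range(RATE)]

def rms(samples):
    """Root-mean-square amplitude: a deliberately simple acoustic measure."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def measure(audio, alignment, rate=RATE):
    """Step 2: slice the audio at the aligned timestamps and measure each phone."""
    return {label: rms(audio[int(start * rate):int(end * rate)])
            for label, start, end in alignment}

print(measure(audio, alignment))
```

The "more complex" end of the scale would replace `rms` with a per-speaker parameterized analysis (e.g. formant tracking with speaker-specific settings).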
Statistical analysis and computational modelling of the derived acoustic data
The acoustic analysis can produce very large datasets. We are increasingly using Bayesian statistical techniques that involve iterative sampling from a posterior distribution.
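For concreteness, "iterative sampling from a posterior distribution" can be illustrated with a toy Metropolis sampler (a stdlib stand-in for the Hamiltonian Monte Carlo that Stan actually runs; the data and priors are invented for the example):

```python
import math
import random

random.seed(1)

# Toy data assumed Normal(mu, 1); prior on mu is Normal(0, 10).
data = [4.8, 5.1, 5.3, 4.9, 5.2]

def log_posterior(mu):
    log_prior = -mu * mu / (2 * 10 ** 2)
    log_lik = sum(-(x - mu) ** 2 / 2 for x in data)
    return log_prior + log_lik

def metropolis(n_samples, step=0.5, mu=0.0):
    """Iteratively propose a new value and accept/reject it,
    yielding a chain of draws from the posterior."""
    samples = []
    for _ in range(n_samples):
        proposal = mu + random.gauss(0, step)
        if math.log(random.random()) < log_posterior(proposal) - log_posterior(mu):
            mu = proposal
        samples.append(mu)
    return samples

draws = metropolis(5000)
posterior_mean = sum(draws[1000:]) / len(draws[1000:])  # discard warm-up draws
print(posterior_mean)
```

The computational cost comes from the iteration itself: each of the thousands of draws re-evaluates the model, which is why large acoustic datasets make these fits expensive.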
Software:
The underlying software packages that have been utilized in the past for the acoustic analysis discussed above are:
- HTK 3.4.1 (http://htk.eng.cam.ac.uk)
- Kaldi (http://kaldi-asr.org)
- Praat (https://www.fon.hum.uva.nl/praat/)
- Python
- R
- Stan
This underlying software has been bundled for use in:
- The FAVE Suite (https://github.com/JoFrhwld/FAVE)
- extra documentation (https://github.com/JoFrhwld/FAVE/wiki)
- The Montreal Forced Aligner (https://montreal-forced-aligner.readthedocs.io)
Moving forward, most projects in this vein will probably utilize PolyglotDB and ISCAN:
- PolyglotDB (https://github.com/MontrealCorpusTools/PolyglotDB/)
- ISCAN (https://github.com/MontrealCorpusTools/iscan-spade-server)
Personnel:
Josef Fruehwald, PI
Connor Bechler, Fellowship, added on LCC resources, 11/08/2022
Timing of formant dynamics
We will explore to what extent changes in the relative timing of vowel formant dynamics are implicated in a number of sound changes in Philadelphia. Data will be drawn from vowel formant tracks extracted from audio using the FAVE suite, and we will use the brms package to fit Bayesian non-linear models to the vowels /ay/, /aw/, /ey/ and /ow/.
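A minimal sketch of the kind of non-linear trajectory model involved (an ordinary least-squares grid search over a logistic curve in Python, standing in for the Bayesian fit that brms/Stan would perform; the formant track is simulated and the curve's endpoints and slope are held fixed for simplicity):

```python
import math
import random

random.seed(0)

def logistic(t, lo, hi, crossover, slope):
    """Logistic formant trajectory rising from `lo` to `hi` Hz;
    `crossover` is the relative-timing parameter of interest."""
    return lo + (hi - lo) / (1 + math.exp(-slope * (t - crossover)))

# Simulated F2 track (Hz) over normalized time, true crossover at 0.4.
times = [i / 20 for i in range(21)]
track = [logistic(t, 1200, 1800, 0.4, 12) + random.gauss(0, 10) for t in times]

def fit_crossover(times, track):
    """Grid-search the crossover time that minimizes squared error."""
    best, best_sse = None, float("inf")
    for c in [i / 100 for i in range(101)]:
        sse = sum((y - logistic(t, 1200, 1800, c, 12)) ** 2
                  for t, y in zip(times, track))
        if sse < best_sse:
            best, best_sse = c, sse
    return best

print(fit_crossover(times, track))  # near the true value of 0.4
```

The brms version would instead place priors on all four curve parameters and sample their joint posterior, allowing the timing parameter to vary by vowel, speaker, and date of birth.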
Personnel:
Josef Fruehwald, PI
Software:
R, RStan, Stan, brms
Grants:
Publications:
Center for Computational Sciences