Fruehwald, Josef
PI: Josef Fruehwald
Affiliation: University of Kentucky
Job Title: Assistant Professor
Acoustic analysis of large sets of audio data
There is the potential to process hundreds of hours of recordings of human speech in two steps. The first is “forced alignment”: forced alignment takes audio data and an associated transcription and returns timestamps for where in the audio the individual speech sounds of interest occur. Second, once the timestamps have been identified, automated acoustic analysis can be conducted. This acoustic analysis ranges from relatively simple, such as extracting acoustic values given a fixed set of parameters, to more complex, where the parameters of the analysis can be individualized for each speaker.
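As a minimal sketch of this two-step pipeline (the alignment, audio, and measurement here are all hypothetical stand-ins: real timestamps would come from a forced aligner such as the MFA or FAVE, and real acoustic measurements from Praat):

```python
import math

# Step 1 output (hypothetical): forced alignment gives (phone, start, end) in seconds.
alignment = [("ay", 0.10, 0.25), ("s", 0.25, 0.32), ("ow", 0.32, 0.50)]

# Stand-in for audio: one second of a 220 Hz sine at 16 kHz. Real work reads a WAV file.
RATE = 16_000
audio = [math.sin(2 * math.pi * 220 * t / RATE) for t in range(RATE)]

def rms(samples):
    """Root-mean-square amplitude: a deliberately simple acoustic measure."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def measure(audio, alignment, rate=RATE):
    """Step 2: slice the audio at the aligned timestamps and measure each phone."""
    return {label: rms(audio[int(start * rate):int(end * rate)])
            for label, start, end in alignment}

print(measure(audio, alignment))
```

The "more complex" end of the scale would replace `rms` with a per-speaker parameterized analysis (e.g. formant tracking with speaker-specific settings).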
Statistical analysis and computational modelling of the derived acoustic data
The acoustic analysis can produce very large datasets. We are increasingly using Bayesian statistical techniques that involve iterative sampling from a posterior distribution.
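For concreteness, "iterative sampling from a posterior distribution" can be illustrated with a toy Metropolis sampler (a stdlib stand-in for the Hamiltonian Monte Carlo that Stan actually runs; the data and priors are invented for the example):

```python
import math
import random

random.seed(1)

# Toy data assumed Normal(mu, 1); prior on mu is Normal(0, 10).
data = [4.8, 5.1, 5.3, 4.9, 5.2]

def log_posterior(mu):
    log_prior = -mu * mu / (2 * 10 ** 2)
    log_lik = sum(-(x - mu) ** 2 / 2 for x in data)
    return log_prior + log_lik

def metropolis(n_samples, step=0.5, mu=0.0):
    """Iteratively propose a new value and accept/reject it,
    yielding a chain of draws from the posterior."""
    samples = []
    for _ in range(n_samples):
        proposal = mu + random.gauss(0, step)
        if math.log(random.random()) < log_posterior(proposal) - log_posterior(mu):
            mu = proposal
        samples.append(mu)
    return samples

draws = metropolis(5000)
posterior_mean = sum(draws[1000:]) / len(draws[1000:])  # discard warm-up draws
print(posterior_mean)
```

The computational cost comes from the iteration itself: each of the thousands of draws re-evaluates the model, which is why large acoustic datasets make these fits expensive.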
Software:
The underlying software packages that have been utilized in the past for the acoustic analysis discussed above are:
- HTK 3.4.1 (http://htk.eng.cam.ac.uk)
- Kaldi (http://kaldi-asr.org)
- Praat (https://www.fon.hum.uva.nl/praat/)
- Python
- R
- Stan
This underlying software has been bundled for use in:
- The FAVE Suite (https://github.com/JoFrhwld/FAVE)
- extra documentation (https://github.com/JoFrhwld/FAVE/wiki)
- The Montreal Forced Aligner (https://montreal-forced-aligner.readthedocs.io)
Moving forward, most projects in this vein will probably utilize PolyglotDB and ISCAN:
- PolyglotDB (https://github.com/MontrealCorpusTools/PolyglotDB/)
- ISCAN (https://github.com/MontrealCorpusTools/iscan-spade-server)
Personnel:
Josef Fruehwald, PI
Connor Bechler, Fellowship, added on LCC resources, 11/08/2022
Timing of formant dynamics
We will explore to what extent changes in the relative timing of vowel formant dynamics are implicated in a number of sound changes in Philadelphia. Data will be drawn from vowel formant tracks extracted from audio using the FAVE suite, and we will use the brms package to fit Bayesian non-linear models to the vowels /ay/, /aw/, /ey/ and /ow/.
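A minimal sketch of the kind of non-linear trajectory model involved (an ordinary least-squares grid search over a logistic curve in Python, standing in for the Bayesian fit that brms/Stan would perform; the formant track is simulated and the curve's endpoints and slope are held fixed for simplicity):

```python
import math
import random

random.seed(0)

def logistic(t, lo, hi, crossover, slope):
    """Logistic formant trajectory rising from `lo` to `hi` Hz;
    `crossover` is the relative-timing parameter of interest."""
    return lo + (hi - lo) / (1 + math.exp(-slope * (t - crossover)))

# Simulated F2 track (Hz) over normalized time, true crossover at 0.4.
times = [i / 20 for i in range(21)]
track = [logistic(t, 1200, 1800, 0.4, 12) + random.gauss(0, 10) for t in times]

def fit_crossover(times, track):
    """Grid-search the crossover time that minimizes squared error."""
    best, best_sse = None, float("inf")
    for c in [i / 100 for i in range(101)]:
        sse = sum((y - logistic(t, 1200, 1800, c, 12)) ** 2
                  for t, y in zip(times, track))
        if sse < best_sse:
            best, best_sse = c, sse
    return best

print(fit_crossover(times, track))  # near the true value of 0.4
```

The brms version would instead place priors on all four curve parameters and sample their joint posterior, allowing the timing parameter to vary by vowel, speaker, and date of birth.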
Personnel:
Josef Fruehwald, PI
Software:
R, RStan, Stan, brms
Grants:
Publications:
Center for Computational Sciences