Hey all-
I've been using Pd to mix audio, provide sound effects, and record audio for my dissertation project -- a serious game for teaching team members to coordinate. As the next step in this research, I need to be able to detect when each player is speaking to the others, i.e. voice activity detection. Essentially, I want to log the times at which participants start and stop talking.
Each player has a Bluetooth headset with a noise-cancelling microphone. Looking at the waveforms of the audio I've recorded directly from the headsets, it's pretty obvious by eye when a participant is speaking and when they are not, as the SNR is fairly high.
I am trying to use a combination of an envelope follower ([env~]) and a thresholder ([threshold~]) to generate the start and stop events. I am having trouble working out what the proper settings for [threshold~] should be, and I'm also wondering whether any other Pd objects might be useful. One big problem is that the thresholder treats the quiet portions of speech as turn-offs, which are generally followed almost immediately by another turn-on. Does anyone out there have any experience with this, or any advice?
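To make the behavior I'm after concrete, here is a rough sketch in Python rather than Pd. It is only an illustration under my own assumptions, not how [env~] or [threshold~] are actually implemented: the function names (envelope_db, detect_speech), the threshold values, and the "hang" time are all made up for the example. The idea is an RMS envelope on a dB scale where an RMS of 1 maps to 100 dB, a higher "on" threshold and a lower "off" threshold, and a requirement that the level stay below the "off" threshold for some number of frames before a stop event is logged.

```python
import math


def envelope_db(samples, window=1024, hop=512):
    """RMS level of each analysis frame, in dB, with an RMS of 1.0 mapped to 100 dB.

    Assumption: plain rectangular-window RMS, used here only as a stand-in
    for an envelope follower.
    """
    levels = []
    for start in range(0, len(samples) - window + 1, hop):
        frame = samples[start:start + window]
        rms = math.sqrt(sum(x * x for x in frame) / window)
        # Clamp very small values so silence maps to 0 dB instead of -infinity.
        levels.append(100.0 + 20.0 * math.log10(max(rms, 1e-5)))
    return levels


def detect_speech(levels, on_db=60.0, off_db=50.0, hang_frames=20):
    """Turn a dB envelope into ("start", frame) / ("stop", frame) events.

    Hypothetical parameters:
      on_db       -- level that must be reached to declare speech
      off_db      -- level the signal must fall below to count as quiet
      hang_frames -- consecutive quiet frames required before declaring a stop
    """
    events = []
    talking = False
    quiet = 0
    for i, level in enumerate(levels):
        if not talking:
            if level >= on_db:
                talking = True
                quiet = 0
                events.append(("start", i))
        elif level < off_db:
            quiet += 1
            if quiet >= hang_frames:
                talking = False
                # Report the first frame of the quiet run as the stop time.
                events.append(("stop", i - hang_frames + 1))
        else:
            # Level recovered before the hang time ran out: still talking.
            quiet = 0
    return events


if __name__ == "__main__":
    # Speech, a short dip, more speech, then a long silence.
    demo = [40.0] * 5 + [70.0] * 10 + [45.0] * 3 + [70.0] * 10 + [40.0] * 30
    print(detect_speech(demo, hang_frames=5))
```

In this sketch the short dip in the middle of the demo envelope does not produce a stop event, because it is shorter than hang_frames. A frame index converts to seconds as index * hop / sample_rate. What I can't work out is how to get the equivalent behavior, with sensible values, out of [threshold~].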
Thanks!
-Zach