|Title||:||Speaker Recognition on VoIP Speech|
|Speaker||:||Anil Kumar Chilli (IITM)|
|Details||:||Tue, 20 Sep, 2016 3:30 PM @ BSB 361|
|Abstract||:||Automatic speaker recognition (ASR) systems trained on direct microphone speech perform worse on Voice over Internet Protocol (VoIP) speech. The codecs used in VoIP communication rely on lossy compression methods developed to reduce the bit rate needed for transmission. These codecs alter the spectral characteristics of the speech at the receiving end. The choice of codec depends on the service provider and on the bandwidth available at the time of communication.
In this work, we address the issues in building ASR systems that perform well on VoIP speech. Histogram equalization and regression-based techniques are explored to estimate the direct speech features from the codec speech features. The ASR system trained on direct speech features then performs recognition on these estimated features. The effect of different codecs on the speech features is studied for different categories of speech sounds, and the ASR system for VoIP speech is built taking these effects into account. We also explore different approaches to building ASR systems for VoIP speech using the Universal Background Model-Gaussian Mixture Model (UBM-GMM) framework. The three approaches explored are: (i) ASR using parallel codec-specific models, (ii) codec identification followed by ASR using codec-specific models, and (iii) ASR using codec-independent models. ASR systems based on the i-vector framework are explored to reduce the variation in performance across speech from different codecs. We present the performance of the proposed methods on benchmark speech databases such as TIMIT, NTIMIT and NIST SRE-2003.
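
As a rough illustration of the histogram equalization idea mentioned in the abstract, the sketch below maps one codec-speech feature dimension onto the distribution of the corresponding direct-speech feature by matching quantiles. It is not the speaker's implementation: the function names `quantile_table` and `equalize`, the choice of 101 quantiles, and the synthetic Gaussian data are all assumptions made for illustration.

```python
import bisect
import random


def quantile_table(values, n_quantiles=101):
    """Return n_quantiles evenly spaced order statistics of `values` (min to max)."""
    s = sorted(values)
    last = len(s) - 1
    return [s[round(i * last / (n_quantiles - 1))] for i in range(n_quantiles)]


def equalize(x, codec_q, direct_q):
    """Map a codec feature value to the direct-speech value at the same quantile.

    Values outside the codec quantile range are clamped to the ends of the
    direct table; values inside are linearly interpolated between quantiles.
    """
    if x <= codec_q[0]:
        return direct_q[0]
    if x >= codec_q[-1]:
        return direct_q[-1]
    j = bisect.bisect_right(codec_q, x)  # codec_q[j - 1] <= x < codec_q[j]
    lo, hi = codec_q[j - 1], codec_q[j]
    t = (x - lo) / (hi - lo)
    return direct_q[j - 1] + t * (direct_q[j] - direct_q[j - 1])


# Synthetic stand-ins for one feature dimension (assumed data, not real speech).
rng = random.Random(0)
direct = [rng.gauss(0.0, 1.0) for _ in range(5000)]
codec = [0.5 * rng.gauss(0.0, 1.0) + 2.0 for _ in range(5000)]  # shifted, compressed

codec_q = quantile_table(codec)
direct_q = quantile_table(direct)
mapped = [equalize(x, codec_q, direct_q) for x in codec]
```

In a real system the same mapping would be fitted per feature dimension (for example, per cepstral coefficient) and possibly per codec, and the mapped features would then be scored by the model trained on direct speech.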