
## TRAP Language Identification

### Citations

6475 | LIBSVM: a library for support vector machines, 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm
- Chang, Lin
Citation Context: ...re 1), especially for shorter duration trials. (We will discuss it in more detail in Section 4.2.) One-versus-all binary SVM classifiers for the target languages were trained using the LIBSVM package [13]. 3.2. USC With CI-SAD, USC implemented 4 sub-systems using simplified and supervised i-vector modeling [14, 15] based on 4 different frontend features, to each of which feature warping was applied: ...
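The one-versus-all setup described in this excerpt can be sketched without LIBSVM. The following is a minimal numpy illustration, not the system's actual training code: each target language gets one binary linear SVM (here trained with Pegasos-style subgradient descent as a stand-in for LIBSVM's solvers), and prediction picks the language whose classifier scores highest. All function names are hypothetical.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """One binary linear SVM via Pegasos-style subgradient descent
    on the hinge loss; labels y must be in {-1, +1}."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w, b, t = np.zeros(d), 0.0, 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            t += 1
            eta = 1.0 / (lam * t)            # Pegasos step size
            margin = y[i] * (X[i] @ w + b)
            w *= 1.0 - eta * lam             # shrink (regularization)
            if margin < 1:                   # margin violated: push
                w += eta * y[i] * X[i]
                b += eta * y[i]
    return w, b

def one_vs_all_train(X, labels):
    """One classifier per target language: that language vs. all others."""
    return {lang: train_linear_svm(X, np.where(labels == lang, 1.0, -1.0))
            for lang in np.unique(labels)}

def one_vs_all_predict(models, X):
    """Decide for the language whose classifier gives the highest score."""
    langs = sorted(models)
    scores = np.column_stack([X @ models[l][0] + models[l][1] for l in langs])
    return np.array(langs)[scores.argmax(axis=1)]
```

In practice the per-language scores are kept (rather than argmax-ed) so they can be calibrated and fused downstream.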

1413 | LIBLINEAR: A library for large linear classification
- Fan, Chang, et al.
Citation Context: ...e provided in [14, 15]. Within-Class Covariance Normalization (WCCN) [19] was applied on the resulting i-vectors before SVM. For fast training of SVM models, we used the 2nd-order polynomial mappings [20] in the LIBLINEAR package [21], which resulted in a multi-class SVM classifier for each test duration. Moreover, we sub-sampled the data in the SVM training to make it more balanced and efficient. 4. Ex...
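The trick behind the "2nd-order polynomial mappings" cited here is to expand each vector with an explicit feature map whose inner product equals the degree-2 polynomial kernel, so a fast linear solver such as LIBLINEAR behaves like a polynomial-kernel SVM. This is a minimal sketch of that idea, not the exact mapping of [20]:

```python
import numpy as np

def poly2_map(X, gamma=1.0, r=1.0):
    """Explicit feature map phi with phi(x).phi(y) == (gamma*x.y + r)^2,
    the degree-2 polynomial kernel, so a *linear* model trained on
    phi(X) mimics a polynomial-kernel SVM at a fraction of the cost."""
    n, d = X.shape
    iu, ju = np.triu_indices(d, k=1)
    return np.hstack([
        np.full((n, 1), r),                          # constant term
        np.sqrt(2.0 * r * gamma) * X,                # linear terms
        gamma * X ** 2,                              # squared terms
        np.sqrt(2.0) * gamma * X[:, iu] * X[:, ju],  # cross terms (i < j)
    ])
```

The mapped dimensionality is 1 + 2d + d(d-1)/2, which stays manageable for i-vector-sized inputs while avoiding kernel-matrix training entirely.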

1009 | Speaker verification using adapted Gaussian mixture models
- Reynolds, Quatieri, et al.
- 2000
Citation Context: ...ee parts for system training, calibration and internal evaluation: TRAIN, COMB, and TEST. The TRAIN data set was used to capture background statistics and train the Universal Background Models (UBMs) [2]. This data set was also utilized to find subspace projections for compact feature representations and backend classifiers. The COMB data set was prepared to calibrate parameters for score combination...
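A UBM is simply a GMM trained on pooled background data. The toy EM sketch below shows the mechanics under stated simplifications (diagonal covariances, deterministic quantile initialization, two components); real UBMs use hundreds to thousands of components and hours of speech:

```python
import numpy as np

def train_ubm(X, n_comp=2, n_iter=50, floor=1e-4):
    """Diagonal-covariance GMM trained with EM: a toy stand-in for a
    Universal Background Model."""
    n, d = X.shape
    # deterministic spread-out initialization via per-dimension quantiles
    mu = np.quantile(X, np.linspace(0.05, 0.95, n_comp), axis=0)
    var = np.tile(X.var(axis=0), (n_comp, 1))
    w = np.full(n_comp, 1.0 / n_comp)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each frame
        logp = (np.log(w)
                - 0.5 * (((X[:, None, :] - mu) ** 2 / var)
                         + np.log(2 * np.pi * var)).sum(axis=2))
        logp -= logp.max(axis=1, keepdims=True)
        resp = np.exp(logp)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, variances
        nk = resp.sum(axis=0)
        w = nk / n
        mu = (resp.T @ X) / nk[:, None]
        var = np.maximum((resp.T @ X ** 2) / nk[:, None] - mu ** 2, floor)
    return w, mu, var
```

The "background statistics" mentioned above are exactly these responsibilities accumulated against the trained UBM; they feed the subspace (T-matrix) estimation described later.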

394 | Greedy layer-wise training of deep networks
- Bengio, Lamblin, et al.
- 2007
Citation Context: ...ognition (ASR) [24]. In ASR applications, the discriminative pre-training of DBNs is done by training a single-hidden-layer MLP which is used as an initialization for MLPs with multiple hidden layers [25]. For LID applications, we use a single-hidden-layer MLP as a feature transformation before SVM classification. The i-vectors are used as features for the MLP and the dimensionality of the hidden layer is ...
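The greedy, discriminative pre-training described here can be sketched in plain numpy: train a one-hidden-layer MLP, keep its trained first layer as the initialization of a deeper network, then fine-tune the whole stack. The sizes, learning rate, and function names below are illustrative, not the recipe of [24, 25]:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_mlp(X, Y, Ws, bs, lr=0.1, epochs=300):
    """Plain full-batch backprop for a ReLU MLP with softmax output;
    the weight lists Ws/bs are updated in place."""
    for _ in range(epochs):
        acts = [X]                                    # forward pass
        for W, b in zip(Ws[:-1], bs[:-1]):
            acts.append(relu(acts[-1] @ W + b))
        P = softmax(acts[-1] @ Ws[-1] + bs[-1])
        G = (P - Y) / len(X)                          # backward pass
        for i in range(len(Ws) - 1, -1, -1):
            gW, gb = acts[i].T @ G, G.sum(axis=0)
            if i > 0:                                 # propagate before update
                G = (G @ Ws[i].T) * (acts[i] > 0)
            Ws[i] -= lr * gW
            bs[i] -= lr * gb
    return Ws, bs

def pretrain_deep(X, Y, dims, rng):
    """Discriminative pre-training: train a 1-hidden-layer MLP, then
    reuse its first layer to initialize a 2-hidden-layer MLP."""
    d_in, h, c = X.shape[1], dims[0], Y.shape[1]
    init = lambda a, b: rng.normal(0, np.sqrt(2 / a), (a, b))
    # stage 1: shallow MLP
    Ws, bs = [init(d_in, h), init(h, c)], [np.zeros(h), np.zeros(c)]
    train_mlp(X, Y, Ws, bs)
    # stage 2: insert a new hidden layer, keep the trained first layer
    Ws = [Ws[0], init(h, dims[1]), init(dims[1], c)]
    bs = [bs[0], np.zeros(dims[1]), np.zeros(c)]
    return train_mlp(X, Y, Ws, bs)
```

In the LID system the quantity of interest is the trained hidden-layer activations, which replace the raw i-vectors as SVM features.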

315 | Front-End Factor Analysis For Speaker Verification
- Dehak, Kenny, et al.
- 2010
Citation Context: ...plified and supervised i-vector modeling framework [14, 15], which not only achieved good results but also reduced computational time by more than 100 times. In this framework, traditional i-vectors [18] are extended to label-regularized supervised vectors by concatenating GMM mean supervectors (GSVs) and the total variability matrix (T-matrix) with a label vector and a linear classifier matrix, resp...

89 | Deep belief networks for phone recognition, NIPS Workshop on Deep Learning for Speech Recognition and Related Applications
- Mohamed, Dahl, et al.
- 2009
Citation Context: ...for LID [23], our proposal of a “deep” architecture for LID using an MLP-based non-linear projection was inspired by the advances in Deep Belief Networks (DBNs) for Automatic Speech Recognition (ASR) [24]. In ASR applications, the discriminative pre-training of DBNs is done by training a single-hidden-layer MLP which is used as an initialization for MLPs with multiple hidden layers [25]. For LID appli...

79 | Application-Independent Evaluation of Speaker Detection
- Brummer, J
- 2006
Citation Context: ...s combined by multi-class logistic regression using the FoCal toolkit [22] in a duration-specific manner. [Figure 3 caption: Scatter plot of two-dimensional t-SNE projections for the input i-vectors as well as the MLP hidden layer outputs.] For the Contrastive I system, we reduced the number of sub-systems by choosing ten system configurations that could achieve similar performance with the Primary system ...
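FoCal-style fusion learns a weight per sub-system plus per-class offsets so that the weighted sum of sub-system score vectors is well calibrated under a multi-class cross-entropy objective. This is a minimal numpy stand-in for that idea, not the FoCal toolkit itself:

```python
import numpy as np

def fuse_scores(score_mats, labels, n_iter=500, lr=0.5):
    """Multi-class logistic-regression score fusion: learn one fusion
    weight per sub-system (alpha) and a per-class offset (beta) by
    gradient descent on the mean cross-entropy."""
    S = np.stack(score_mats)                 # (n_sys, n_trials, n_class)
    n_sys, n, c = S.shape
    Y = np.eye(c)[labels]                    # one-hot targets
    alpha, beta = np.ones(n_sys), np.zeros(c)
    for _ in range(n_iter):
        fused = np.tensordot(alpha, S, axes=1) + beta   # (n_trials, n_class)
        fused -= fused.max(axis=1, keepdims=True)
        P = np.exp(fused)
        P /= P.sum(axis=1, keepdims=True)
        G = (P - Y) / n                      # softmax cross-entropy gradient
        alpha -= lr * np.tensordot(S, G, axes=([1, 2], [0, 1]))
        beta -= lr * G.sum(axis=0)
    return alpha, beta
```

A useful property of this objective is that uninformative sub-systems are driven toward zero weight, so fusion degrades gracefully when one stream is noisy.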

65 | Visualizing high-dimensional data using t-SNE
- Maaten, Hinton
- 2008
Citation Context: ...one retained and these features are used for SVM classification. We illustrate the usefulness of the MLP-based transformation with the Stochastic Neighbor Embedding (SNE) based data visualization tool [26]. The input i-vectors as well as the high-dimensional MLP hidden layer outputs are projected to two dimensions and this scatter plot is shown in Figure 3. We use 200 random utterances from two differe...

64 | Multiresolution spectrotemporal analysis of complex sounds
- Chi, Ru, et al.
Citation Context: ...Cortical modulation (CORT): Two-dimensional (2-D) spectrographic representations are derived for a given signal by emulating various processing stages in the periphery of the human auditory system [8]. The auditory spectrogram is then converted to a modulation representation using Fourier transforms along the spectral and temporal axes, and modulation filtering is applied to extract key dynamics in t...

42 | Autoregressive modelling of temporal envelopes
- Athineos, Ellis
- 2007
Citation Context: ...rm windows (32ms with a shift of 10ms) to derive a spectrographic representation of the signal, which is used as the power spectral representation for the second autoregressive (AR) model across the bands [7]. The output of the second AR model is converted to 14-dimensional cepstral features, which are added with delta and acceleration coefficients. Cortical modulation (CORT): Two-dimensional (2-D) spec...

36 | Qualcomm-ICSI-OGI features for ASR
- Adami, Burget, et al.
- 2002
Citation Context: ...re used with delta and acceleration components. CI-SAD was used for MFCC and SDC while CD-SAD for the other features. All the frontends are Wiener-filtered before SAD to suppress channel noise effects [11]. For each feature stream except WB-MFCC1, two separate projection/classifier backends were developed. One backend consists of PCA-based feature space projection and SVM classification with the 5th-or...

27 | The RATS radio traffic collection system
- Walker, Strassel
- 2012
Citation Context: ...(LID) Evaluation in the DARPA Robust Automatic Transcription of Speech (RATS) program. In the RATS program, noisy speech data transmitted on eight different high-frequency radio communication channels [1] are studied for four tasks: Speech Activity Detection (SAD), Keyword Spotting (KWS), Speaker Identification (SID) and Language Identification (LID). For the LID task, four durations (120s, 30s, 10s a...

22 | Approaches to language identification using Gaussian mixture models and shifted delta cepstral features
- Torres-Carrasquillo, Singer, et al.
Citation Context: ...700Hz for every 32ms frame with a shift of 10ms. Then they are added with delta and acceleration components to yield 57-dimensional features. Shifted Delta Cepstrum (SDC): 7-3-3-7 SDC configuration [5] for MFCCs. It is then concatenated with the base MFCC features to create a 56-dimensional feature vector per frame. Frequency-Domain Linear Prediction (FDLP): Windowing of the Discrete Cosine Trans...
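The 7-3-3-7 SDC recipe (N=7 cepstra, delta spread d=3, shift P=3, k=7 blocks) stacks k shifted delta blocks per frame, giving 49 SDC coefficients; concatenating the 7 base MFCCs yields the 56 dimensions mentioned above. A small numpy sketch of that computation:

```python
import numpy as np

def sdc(cep, d=3, P=3, k=7):
    """Shifted Delta Cepstrum: at each frame t, stack the k delta
    blocks c(t + i*P + d) - c(t + i*P - d) for i = 0..k-1.
    Edge frames are handled by repeating the first/last frame."""
    T, N = cep.shape
    off = d + (k - 1) * P                  # padding so all shifts stay valid
    pad = np.pad(cep, ((off, off), (0, 0)), mode="edge")
    blocks = [pad[off + i * P + d: off + i * P + d + T]
              - pad[off + i * P - d: off + i * P - d + T]
              for i in range(k)]
    return np.hstack(blocks)               # shape (T, N * k)

def sdc_features(mfcc):
    """7-3-3-7 SDC stacked on the 7 base MFCCs -> 56 dims per frame."""
    return np.hstack([mfcc, sdc(mfcc, d=3, P=3, k=7)])
```

The appeal of SDC for LID is that each frame carries roughly (k-1)*P frames of temporal context, approximating long-span dynamics without a sequence model.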

14 | Robust speaker identification using auditory features and computational auditory scene analysis
- Shao, Wang
- 2008
Citation Context: ...: 7-1-3-7 SDC configuration for MFCCs. It is then concatenated with the base MFCC features including C0 into a 56-dimensional feature vector per frame. Gammatone Frequency Cepstral Coefficient (GFCC) [16]: 44-dimensional feature vector per frame generated using 64 Gammatone filter banks (22-dimensional base features without C0, and their first derivatives). Gabor Filtering (GABF) [17]: Gabor filters...

11 | Speaker verification using simplified and supervised i-vector modeling
- Li, Tsiartas, et al.
- 2013
Citation Context: ...s-all binary SVM classifiers for the target languages were trained using the LIBSVM package [13]. 3.2. USC With CI-SAD, USC implemented 4 sub-systems using simplified and supervised i-vector modeling [14, 15] based on 4 different frontend features, to each of which feature warping was applied: MFCC: 25ms Hamming window applied with a 10ms shift. 18-dimensional base features are appended with their delt...

10 | Patrol team language identification system for DARPA RATS P1 evaluation
- Matějka, Plchot, et al.
- 2012
Citation Context: ...to adapt the GMM means and the GSVs are transformed to i-vectors using the total variability matrix (T-matrix). In contrast to the past approach of using i-vectors as features to an MLP classifier for LID [23], our proposal of a “deep” architecture for LID using an MLP-based non-linear projection was inspired by the advances in Deep Belief Networks (DBNs) for Automatic Speech Recognition (ASR) [24]. In ASR...

9 | Power-normalized cepstral coefficients for robust speech recognition
- Kim, Stern
- 2012
Citation Context: ...n features are appended to obtain 42-dimensional features. Power-Normalized Cepstral Coefficient (PNCC): The power-law nonlinearity is applied on temporal envelopes estimated from Gammatone filters [10]. This is followed by a noise suppression procedure using asymmetric filters and a power normalization module using a long window span. Frequency smoothing is applied and cepstral features are deriv...

7 | Feature extraction using 2-D autoregressive models for speaker recognition
- Ganapathy, Thomas, et al.
- 2012
Citation Context: ...ing of the Discrete Cosine Transform (DCT) of a long-term segment (1,000ms) of a given signal is followed by linear prediction of the sub-band DCT components to yield temporal envelopes in each band [6]. The sub-band envelopes are then integrated in short-term windows (32ms with a shift of 10ms) to derive a spectrographic representation of the signal which is used as the power spectral representation for...

6 | Speech Activity Detection for Noisy Data Using Adaptation Techniques, Proceedings of Interspeech
- Omar
Citation Context: ...ntation and fusion of multiple feature streams [3], and the other is channel-independent (CI) SAD with a two-pass modified Cumulative Sum (CUSUM) approach based on Maximum A Posteriori (MAP) adaptation [4]. Each setup has distinct ingredients as follows: CD-SAD: Channel detection with eight channel-dependent Gaussian Mixture Models (GMMs), followed by speech/non-speech HMM Viterbi segmentation using c...
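At the core of any CUSUM-based SAD is a one-sided cumulative-sum change detector over per-frame log-likelihood ratios. The system above layers MAP adaptation and a second pass on top of that recursion; the sketch below shows only the basic detector, with a hypothetical threshold:

```python
import numpy as np

def cusum_changes(llr, threshold=8.0):
    """One-sided CUSUM over per-frame speech/non-speech log-likelihood
    ratios: accumulate positive evidence, reset at zero, and flag a
    change once the running sum crosses the threshold."""
    s, flags = 0.0, []
    for x in llr:
        s = max(0.0, s + x)          # reset-at-zero recursion
        flags.append(s > threshold)
    return np.array(flags)
```

The threshold trades detection delay against false alarms: larger values need more consecutive speech-like frames before the segment is flagged.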

6 | Spectrotemporal Modulation Subspace-Spanning Filter Bank Features for Robust Automatic Speech Recognition
- Schädler, Meyer, et al.
- 2011
Citation Context: ...fficient (GFCC) [16]: 44-dimensional feature vector per frame generated using 64 Gammatone filter banks (22-dimensional base features without C0, and their first derivatives). Gabor Filtering (GABF) [17]: Gabor filters applied for spectro-temporal information to yield 153-dimensional feature vectors. For these frontends, we adopted the simplified and supervised i-vector modeling framework [14, 15], wh...

4 | A multistream feature framework based on bandpass modulation filtering for robust speech recognition
- Nemala, Patil, et al.
- 2013
Citation Context: ...d to a modulation representation using Fourier transforms along the spectral and temporal axes, and modulation filtering is applied to extract key dynamics in the scale and rate dimensions, respectively [9]. The modulation filters used in this feature extraction scheme are broad enough to cover a wide range of dynamics (0–2 cycles per octave in the scale dimension and 0.25–25Hz in the rate dimension). C...

4 | On the use of nonlinear polynomial kernel SVMs in language recognition
- Yaman, Pelecanos, et al.
- 2012
Citation Context: ...eam except WB-MFCC1, two separate projection/classifier backends were developed. One backend consists of PCA-based feature space projection and SVM classification with the 5th-order polynomial kernel [12]. In [12], higher-order polynomial kernels such as 5th or 6th were experimentally proven to outperform lower orders like 2nd or 3rd in SVM classification. The other, “advanced backend”, con...

2 | The IBM speech activity detection system for the DARPA RATS program, submitted to Interspeech
- Saon, Thomas, et al.
- 2013
Citation Context: ...m configurations. It is based on two SAD setups, one of which is a channel-dependent (CD) SAD utilizing multi-pass Hidden Markov Model (HMM) Viterbi segmentation and fusion of multiple feature streams [3], and the other is channel-independent (CI) SAD with a two-pass modified Cumulative Sum (CUSUM) approach based on Maximum A Posteriori (MAP) adaptation [4]. Each setup has distinct ingredients as follow...

2 | Simplified supervised i-vector modeling and sparse representation with application to robust language recognition, submitted to Comp. Speech Lang.
- Li, Narayanan
Citation Context: ...s-all binary SVM classifiers for the target languages were trained using the LIBSVM package [13]. 3.2. USC With CI-SAD, USC implemented 4 sub-systems using simplified and supervised i-vector modeling [14, 15] based on 4 different frontend features, to each of which feature warping was applied: MFCC: 25ms Hamming window applied with a 10ms shift. 18-dimensional base features are appended with their delt...

2 | Low-degree polynomial mapping of data for SVM
- Chang, Hsieh, et al.
- 2009
Citation Context: ...-Class Covariance Normalization (WCCN) [19] was applied on the resulting i-vectors before SVM. For fast training of SVM models, we used the 2nd-order polynomial mappings [20] in the LIBLINEAR package [21], which resulted in a multi-class SVM classifier for each test duration. Moreover, we sub-sampled the data in the SVM training to make it more balanced and efficient. 4. Experimental Results 4.1. Discus...

1 | Generalized linear kernels for one-versus-all classification: application to speaker recognition
- Hatch, Stolcke
- 2006
Citation Context: ...further enhance the efficiency by using a pre-computed table. More details about the simplified and supervised i-vector modeling are provided in [14, 15]. Within-Class Covariance Normalization (WCCN) [19] was applied on the resulting i-vectors before SVM. For fast training of SVM models, we used the 2nd-order polynomial mappings [20] in the LIBLINEAR package [21], which resulted in a multi-class SVM c...
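WCCN, as applied to the i-vectors here, estimates the average within-class covariance W and projects each vector by B from the Cholesky factorization B B^T = W^{-1}, so that within-class scatter is whitened before the linear-kernel SVM. A minimal numpy sketch of that estimation:

```python
import numpy as np

def wccn_transform(ivecs, labels):
    """Within-Class Covariance Normalization: average the per-class
    covariances into W and return B with B @ B.T == inv(W); projecting
    x -> B.T @ x (i.e. ivecs @ B row-wise) whitens within-class scatter,
    down-weighting directions with high within-class variability."""
    classes = np.unique(labels)
    d = ivecs.shape[1]
    W = np.zeros((d, d))
    for c in classes:
        Xc = ivecs[labels == c]
        Xc = Xc - Xc.mean(axis=0)        # center within the class
        W += (Xc.T @ Xc) / len(Xc)
    W /= len(classes)
    return np.linalg.cholesky(np.linalg.inv(W))

# usage: projected = ivecs @ wccn_transform(ivecs, labels)
```

After this projection the within-class covariance of the training vectors is (numerically) the identity, which is exactly the condition under which a plain inner-product kernel is a sensible similarity for class discrimination.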