Results 1 - 4 of 4
Combinatorial QSAR Modeling of Chemical Toxicants Tested against
, 2007
Cited by 6 (2 self)
Selecting the most rigorous quantitative structure-activity relationship (QSAR) approaches is of great importance in the development of robust and predictive models of chemical toxicity. To address this issue in a systematic way, we have formed an international virtual collaboratory consisting of six independent groups with shared interests in computational chemical toxicology. We have compiled an aqueous toxicity data set containing 983 unique compounds tested in the same laboratory over a decade against Tetrahymena pyriformis. A modeling set of 644 compounds was selected randomly from the original set and distributed to all groups, which used their own QSAR tools for model development. The remaining 339 compounds in the original set (external set I), as well as 110 additional compounds (external set II) published recently by the same laboratory (after this computational study was already in progress), were used as two independent validation sets to assess the external predictive power of individual models. In total, our virtual collaboratory has developed 15 different types of QSAR models of aquatic toxicity for the training set. The internal
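The random partition described in this abstract (983 compounds split into a 644-compound modeling set and a 339-compound external set I) can be illustrated with a minimal sketch. The compound identifiers, the seed, and the function name `split_modeling_external` are hypothetical; the abstract does not say how the authors drew the random sample.

```python
import random


def split_modeling_external(compound_ids, n_modeling, seed=0):
    """Randomly partition compounds into a modeling set and an external set.

    The seed is an assumption added for reproducibility; it is not taken
    from the abstract.
    """
    rng = random.Random(seed)
    shuffled = list(compound_ids)  # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    return shuffled[:n_modeling], shuffled[n_modeling:]


# Hypothetical identifiers standing in for the 983 unique compounds.
compounds = [f"cmpd_{i:03d}" for i in range(983)]
modeling_set, external_set_1 = split_modeling_external(compounds, 644)
print(len(modeling_set), len(external_set_1))
```

Because the two sets are slices of one shuffled list, they are disjoint and together cover every compound exactly once.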
Beyond the “Best” Match: Machine Learning Annotation of Protein Sequences by Integration of Different Sources of Information
Accurate automatic assignment of protein functions remains a challenge for genome annotation. We have developed and compared the automatic annotation of four bacterial genomes employing a 5-fold cross-validation procedure and several machine learning methods. The analyzed genomes were manually annotated with FunCat categories in MIPS providing a gold standard. Features describing a pair of sequences rather than each sequence alone were used. The descriptors were derived from sequence alignment scores, InterPro domains, synteny information, sequence length, and calculated protein properties. Following training we scored all pairs from the validation sets, selected a pair with the highest predicted score and annotated the target protein with functional categories of the prototype protein. The data integration using machine-learning methods provided significantly higher annotation accuracy compared to the use of individual descriptors alone. The neural network approach showed the best performance. The descriptors derived from the InterPro domains and sequence similarity provided the highest contribution to the method performance. The predicted annotation scores allow differentiation of reliable vs. non-reliable annotations. The developed approach was applied to annotate the protein sequences from 180 complete bacterial genomes. The FUNcat Annotation Tool (FUNAT) is available on-line as Web Services at
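The annotation-transfer step described in this abstract (score every target/prototype pair, keep the highest-scoring pair, copy the prototype's categories to the target) might look roughly like the sketch below. The pair scores are assumed to come from an already trained model; all identifiers, scores, and category labels are hypothetical placeholders, not data from the paper.

```python
def annotate_targets(pair_scores, prototype_categories):
    """Transfer categories from the best-scoring prototype to each target.

    pair_scores: dict mapping (target_id, prototype_id) -> predicted score.
    prototype_categories: dict mapping prototype_id -> list of categories.
    """
    best = {}
    for (target, prototype), score in pair_scores.items():
        # Keep only the highest-scoring prototype seen so far for this target.
        if target not in best or score > best[target][1]:
            best[target] = (prototype, score)
    return {
        target: {
            "prototype": prototype,
            "score": score,
            "categories": prototype_categories[prototype],
        }
        for target, (prototype, score) in best.items()
    }


# Hypothetical scores and placeholder category labels.
pair_scores = {
    ("target_1", "proto_A"): 0.35,
    ("target_1", "proto_B"): 0.91,
    ("target_2", "proto_A"): 0.62,
}
prototype_categories = {"proto_A": ["cat_X"], "proto_B": ["cat_Y", "cat_Z"]}
annotations = annotate_targets(pair_scores, prototype_categories)
print(annotations["target_1"]["prototype"])
```

Keeping the winning score next to each annotation is what would let a later step separate reliable from non-reliable annotations by thresholding, as the abstract suggests; the threshold itself is not given in the source.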
Application of GA-MLR method in QSPR modeling of stability constants of diverse 15-crown-5 complexes with sodium cation
A genetic algorithm based multiple linear regression (GA-MLR) method was applied for quantitative structure-property relationship (QSPR) modeling of stability constants for 65 complexes of 1,4,7,10,13-pentaoxacyclopentadecane ethers (15C5) with the sodium cation (Na⁺). The best subset of molecular descriptors was selected by a genetic algorithm subset selection procedure from a variety of theoretical molecular descriptors calculated with the Dragon software. The MLR model was developed with particular attention to external validation and applicability domain (AD). The validation was performed on the internal and external validation sets. The QSPR model presented in this study gave accurate predictions, with a leave-one-out cross-validated variance of Q²(LOO-CV) = 0.88 and an external-validated variance of Q²(ext) = 0.82. The AD of the model was analysed by the leverage approach.
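To make the two validation ideas in this abstract concrete, the sketch below computes a leave-one-out Q² and per-compound leverages. It is deliberately simplified to a single descriptor (ordinary least squares with one predictor) so it runs without external libraries; the paper's model uses multiple descriptors. The data values are invented for illustration, and the warning leverage h* = 3(p + 1)/n is a commonly used convention assumed here, not a value stated in the abstract.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for a one-descriptor model."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def q2_loo(xs, ys):
    """Leave-one-out Q2 = 1 - PRESS / SS_total (overall-mean convention)."""
    press = 0.0
    for i in range(len(xs)):
        # Refit without compound i, then predict compound i.
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    mean_y = sum(ys) / len(ys)
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss_total


def leverages(xs):
    """Hat-matrix diagonal for a one-descriptor model with intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return [1.0 / n + (x - mean_x) ** 2 / sxx for x in xs]


# Invented descriptor values and responses, for illustration only.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0, 12.0]
response = [2.1, 3.9, 6.2, 8.1, 9.8, 24.5]

q2 = q2_loo(descriptor, response)
h = leverages(descriptor)
h_star = 3 * (1 + 1) / len(descriptor)  # assumed warning leverage, p = 1
outside_domain = [i for i, h_i in enumerate(h) if h_i > h_star]
print(round(q2, 3), outside_domain)
```

Compounds whose leverage exceeds h* sit far from the bulk of the training descriptors, so predictions for them are extrapolations and would be flagged as outside the applicability domain.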
BIOINFORMATICS ORIGINAL PAPER doi:10.1093/bioinformatics/btm633 Sequence analysis
, 2008
Beyond the ‘best’ match: machine learning annotation of protein sequences by integration of different sources of information