
Computer-assisted drug design is used to increase the chances of finding valuable drug candidates by applying a wide range of computational methods, such as machine learning, structure-activity relationships, quantitative structure-activity relationships, molecular mechanics, quantum mechanics, molecular dynamics, and drug-protein docking. Machine learning is an important field of artificial intelligence and includes a diversity of methods and algorithms that extract rules and functions from large datasets. The most important algorithms are linear discriminant analysis, artificial neural networks, decision trees, lazy learning, k-nearest neighbors, Bayesian methods, Gaussian processes, support vector machines, and kernel algorithms. This special issue presents a representative selection of machine learning applications for the virtual screening of chemical libraries.
Machine learning is a rich and dynamic field, with new methods proposed constantly, which makes it difficult to estimate the quality of predictions expected from a particular algorithm. Schwaighofer et al. explore the theoretical and practical aspects of estimating the confidence (error bars) of predictions obtained with quantitative structure-activity relationships based on three prevalent nonlinear regression methods, namely support vector regression, Gaussian processes, and decision trees. This practical aspect of estimating biological activities is currently overlooked in many structure-activity models, but the algorithms presented in this paper demonstrate an efficient approach to computing confidence levels for activity predictions.
Naive Bayesian classifiers are robust and efficient algorithms for the rapid virtual screening of large compound libraries. Klon presents a substantial and comprehensive review of Bayesian classifiers that are currently used in drug design and discovery. Bayesian models have consistently been shown to be tolerant of noisy training data, often outperforming more elaborate machine learning algorithms, and may provide reliable predictions even when trained with limited amounts of experimental data. In addition, Bayesian classifiers have been used as an effective post-processing technique to integrate sets of predictions obtained with other machine learning methods.
Ligand-protein docking is an effective approach to selecting promising inhibitors, but its main drawback is the long computation time needed to screen large chemical libraries. Plewczynski et al. propose a hybrid method in which a fast machine learning algorithm, random forest, is coupled with ligand-protein docking to obtain a virtual screening procedure that is both fast and reliable in practical applications. The random forest model is trained with predictions obtained from ligand-protein docking and scoring, and thus the virtual screening procedure may be applied even when only a limited amount of experimental data is available.
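The idea of training a random forest on docking scores can be illustrated with a minimal Python sketch. It is not the implementation of Plewczynski et al.: the `mock_docking_score` function, the 8-bit fingerprints, the small hand-written forest, and all parameter values are hypothetical choices made only for illustration. The sketch "docks" a small training subset, fits a random forest regressor to those scores, and then uses the fast model to rank a larger library so that only a shortlist would need real docking.

```python
import random

def avg(values):
    return sum(values) / len(values)

def mock_docking_score(fp):
    # Hypothetical stand-in for an expensive docking run (assumption:
    # a more negative score means better predicted binding).
    return -6.0 - 2.0 * fp[0] - 1.5 * fp[3] + 1.0 * fp[5]

def build_tree(rows, targets, depth, rng, n_try):
    """Grow a regression tree on binary descriptors; leaves hold mean scores."""
    if depth == 0 or len(rows) < 4 or len(set(targets)) == 1:
        return avg(targets)
    best_sse, best_f = None, None
    # Random forests consider only a random subset of features per split.
    for f in rng.sample(range(len(rows[0])), n_try):
        left = [t for r, t in zip(rows, targets) if r[f] == 0]
        right = [t for r, t in zip(rows, targets) if r[f] == 1]
        if not left or not right:
            continue
        ml, mr = avg(left), avg(right)
        sse = sum((t - ml) ** 2 for t in left) + sum((t - mr) ** 2 for t in right)
        if best_sse is None or sse < best_sse:
            best_sse, best_f = sse, f
    if best_f is None:
        return avg(targets)
    lo = [(r, t) for r, t in zip(rows, targets) if r[best_f] == 0]
    hi = [(r, t) for r, t in zip(rows, targets) if r[best_f] == 1]
    return (best_f,
            build_tree([r for r, _ in lo], [t for _, t in lo], depth - 1, rng, n_try),
            build_tree([r for r, _ in hi], [t for _, t in hi], depth - 1, rng, n_try))

def predict_tree(node, row):
    # Internal nodes are (feature, left, right) tuples; leaves are floats.
    while isinstance(node, tuple):
        feature, left, right = node
        node = right if row[feature] else left
    return node

def fit_forest(rows, targets, n_trees=25, depth=4, seed=0):
    rng = random.Random(seed)
    n_try = max(1, len(rows[0]) // 3)
    forest = []
    for _ in range(n_trees):
        # Each tree sees a bootstrap sample of the docked compounds.
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx],
                                 [targets[i] for i in idx], depth, rng, n_try))
    return forest

def predict_forest(forest, row):
    return avg([predict_tree(tree, row) for tree in forest])

rng = random.Random(42)

def random_fp():
    return [rng.randint(0, 1) for _ in range(8)]

# Dock only a small training subset, then rank the full library with the forest.
training = [random_fp() for _ in range(150)]
docked = [mock_docking_score(fp) for fp in training]
forest = fit_forest(training, docked)

library = [random_fp() for _ in range(500)]
ranked = sorted(library, key=lambda fp: predict_forest(forest, fp))
shortlist = ranked[:20]  # candidates that would be passed on to real docking
```

In practice the descriptors would come from a cheminformatics toolkit and the scores from an actual docking program; the point of the sketch is only that the expensive step is run on a subset while the forest screens the rest.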