Deep learning in drug discovery: opportunities, challenges and future prospects
An important decision before undertaking either generative or predictive modeling of molecules is the choice of representation. Transforming a molecule into numerical vectors that learning systems can accept is called molecular featurization. Four kinds of molecular representation can serve as input to DL models: molecular descriptors (e.g. logP, polar surface area) and fingerprints (vectors that encode molecular structure), molecular graphs, simplified molecular input line entry system (SMILES) strings, and grids for convolutional neural networks (CNNs). An in-depth description of these representations is beyond the scope of this review; for this, the reader is referred to a number of excellent reviews and textbooks on the subject.
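As an illustration of featurization for the SMILES representation, the minimal sketch below one-hot encodes a SMILES string character by character, one common way of feeding SMILES into sequence models. The vocabulary `VOCAB`, the function `one_hot_smiles` and the padding scheme are hypothetical choices made for this example; practical pipelines usually derive the vocabulary from the training data and treat multi-character atoms such as `Cl` or `[nH]` as single tokens.

```python
from typing import List

# Hypothetical character vocabulary for this sketch only; index 0 is the padding token.
VOCAB = ["<pad>", "C", "N", "O", "c", "n", "o", "(", ")", "=", "1", "2"]
CHAR_TO_INDEX = {ch: i for i, ch in enumerate(VOCAB)}


def one_hot_smiles(smiles: str, max_len: int) -> List[List[int]]:
    """Encode a SMILES string as a max_len x len(VOCAB) one-hot matrix.

    Strings shorter than max_len are right-padded with "<pad>". Characters missing
    from VOCAB raise KeyError instead of being silently dropped.
    """
    if len(smiles) > max_len:
        raise ValueError(f"SMILES longer than max_len={max_len}: {smiles}")
    tokens = list(smiles) + ["<pad>"] * (max_len - len(smiles))
    matrix = []
    for token in tokens:
        row = [0] * len(VOCAB)
        row[CHAR_TO_INDEX[token]] = 1  # exactly one active position per token
        matrix.append(row)
    return matrix


if __name__ == "__main__":
    # Ethanol ("CCO") padded to length 5: three atom rows, then two padding rows.
    for row in one_hot_smiles("CCO", max_len=5):
        print(row)
```

The fixed `max_len` gives every molecule a matrix of the same shape, which is what batched neural network training expects.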
Convolutional neural networks (CNNs)
CNNs are feed-forward neural networks that have achieved remarkable results in image recognition. They are inspired by the mammalian visual cortex, which contains small neuronal cells that are sensitive to specific regions of the visual field, called receptive fields. These cells act as local filters over the input space. Hubel and Wiesel studied this idea further in 1962. In their experiments, they showed that some neurons in the brain responded (fired) only in the presence of edges or lines of a certain orientation: for example, some neurons responded to vertical edges and others to horizontal or diagonal edges. Together, these neurons, which appeared to be spatially arranged in columnar structures, produce visual perception. The idea of specialized components within a system, each with a specific task, is also used by machines and forms the basis of CNNs. A simple CNN is a succession of layers: it takes an image and passes it through a series of convolution, activation, pooling (subsampling) and fully connected layers to produce an output. This output can be a single class or a set of class probabilities that best describe the image. Depending on the task, some networks omit the pooling or fully connected layers.
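The layer sequence described above (convolution, activation, pooling, fully connected) can be sketched as a toy forward pass in plain Python. The hand-set edge filters, the hypothetical output-layer weights and the helper names (`conv2d`, `relu`, `max_pool2d`, `dense`, `softmax`) are illustrative assumptions: a real CNN learns its filters from data and runs on an optimized framework. In the example, the vertical-edge filter responds strongly along the boundary of the toy image while the horizontal-edge filter does not respond at all, echoing the orientation-selective neurons described by Hubel and Wiesel.

```python
import math
from typing import List

Matrix = List[List[float]]


def conv2d(image: Matrix, kernel: Matrix) -> Matrix:
    """'Valid' 2D convolution (stride 1, no padding).

    Like most DL libraries, this computes cross-correlation (the kernel is not flipped).
    Each output value depends only on a small window: the unit's receptive field.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(x: Matrix) -> Matrix:
    """Element-wise activation: negative responses are set to zero."""
    return [[max(0.0, v) for v in row] for row in x]


def max_pool2d(x: Matrix, size: int = 2) -> Matrix:
    """Non-overlapping max pooling (subsampling); incomplete edge windows are dropped."""
    return [
        [
            max(x[i + a][j + b] for a in range(size) for b in range(size))
            for j in range(0, len(x[0]) - size + 1, size)
        ]
        for i in range(0, len(x) - size + 1, size)
    ]


def dense(features: List[float], weights: Matrix, bias: List[float]) -> List[float]:
    """Fully connected layer: each output is a weighted sum of all input features."""
    return [sum(w * f for w, f in zip(row, features)) + b for row, b in zip(weights, bias)]


def softmax(logits: List[float]) -> List[float]:
    """Convert scores to class probabilities (shifted by the max for numerical stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # 6x6 toy image: dark left half, bright right half, i.e. one vertical edge.
    image = [[0.0] * 3 + [1.0] * 3 for _ in range(6)]
    vertical_edge = [[-1.0, 0.0, 1.0] for _ in range(3)]  # hand-set; a trained CNN learns its filters
    horizontal_edge = [[-1.0, -1.0, -1.0], [0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]

    print(conv2d(image, vertical_edge))    # non-zero only along the edge
    print(conv2d(image, horizontal_edge))  # all zeros: wrong orientation

    pooled = max_pool2d(relu(conv2d(image, vertical_edge)))
    features = [v for row in pooled for v in row]  # flatten before the dense layer
    # Hypothetical weights for a 2-class output layer ("edge" vs "no edge").
    probs = softmax(dense(features, [[0.5] * 4, [-0.5] * 4], [0.0, 0.0]))
    print(probs)
```

Dropping `max_pool2d` or replacing `dense` with another head changes only the later stages, which is why, as noted above, some networks omit those layers for particular tasks.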
DL applications in molecular property and activity prediction
DL has been employed in numerous studies for property and activity prediction. Many of these studies compared DL with other ML techniques and found that DL achieves comparable or better performance for a range of properties, including biological activity, ADMET properties and physicochemical parameters. One of the first applications of DL in drug discovery dates to 2012, when a competition on the prediction of drug properties and activities organized by the pharmaceutical company Merck was won by a multitask deep feed-forward network developed in academia, with a relative accuracy improvement of about 15% even over Merck’s proprietary systems. The report disclosed that the performance of the deep neural network (DNN) varied with the activation function and the network architecture (the number of hidden layers and the number of neurons in each layer). Mayr et al. developed DeepTox, an ensemble-based pipeline for predicting the toxic effects of chemical compounds, which won the Tox21 toxicology prediction challenge in 2014 on a dataset of 12,000 drugs and environmental chemicals with up to 12 different toxicity endpoints. Of particular significance, the model outperformed other ML approaches on 9 of the 12 toxicity endpoints. Following these works, many groups showed that massively multitask DL architectures outperform single-task and random forest (RF) models in property prediction. Others compared several ML approaches on different ChEMBL data sets, using random-split and temporal cross-validation, and found DL to be superior.
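To make the multitask idea concrete, the sketch below shows a feed-forward network whose hidden layers are shared by all tasks, with one sigmoid output per endpoint, together with a binary cross-entropy loss that skips missing labels, since in sparse multitask data sets many compounds are not measured on every endpoint. This is an assumption-laden illustration, not the architecture of the Merck challenge winners or of DeepTox: the class `MultitaskNet`, the helpers `init_layer` and `masked_bce`, the layer sizes, the initialization and the toy fingerprint are all hypothetical, and training by backpropagation is omitted.

```python
import math
import random
from typing import List, Optional, Tuple

Vector = List[float]
Matrix = List[List[float]]
Layer = Tuple[Matrix, Vector]  # (weights[n_out][n_in], bias[n_out])


def linear(x: Vector, layer: Layer) -> Vector:
    weights, bias = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    scale = 1.0 / math.sqrt(n_in)  # simple fan-in scaling (illustrative choice)
    return [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)], [0.0] * n_out


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class MultitaskNet:
    """Hidden layers shared by all tasks, plus one sigmoid output per task.

    Weights are randomly initialised; training by backpropagation is omitted.
    """

    def __init__(self, n_features: int, hidden: List[int], n_tasks: int, seed: int = 0):
        rng = random.Random(seed)
        sizes = [n_features] + hidden
        self.shared = [init_layer(a, b, rng) for a, b in zip(sizes[:-1], sizes[1:])]
        self.head = init_layer(sizes[-1], n_tasks, rng)  # one output unit per task

    def predict(self, x: Vector) -> Vector:
        h = x
        for layer in self.shared:
            h = [max(0.0, v) for v in linear(h, layer)]  # ReLU; same representation for every task
        return [sigmoid(z) for z in linear(h, self.head)]


def masked_bce(probs: Vector, labels: List[Optional[int]], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy over tasks that have a label; None marks 'not measured'."""
    terms = [
        -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))
        for p, y in zip(probs, labels)
        if y is not None
    ]
    return sum(terms) / len(terms) if terms else 0.0


if __name__ == "__main__":
    net = MultitaskNet(n_features=8, hidden=[16, 16], n_tasks=12)
    fingerprint = [1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0]  # toy 8-bit fingerprint
    probs = net.predict(fingerprint)
    labels = [1, 0, None, None, 0, 1, None, 0, None, None, None, 1]  # sparse endpoint labels
    print(len(probs), round(masked_bce(probs, labels), 4))
```

Because every task's gradient would flow through `self.shared`, endpoints with few labels can benefit from representations learned on data-rich endpoints; that sharing is the usual motivation for multitask over single-task models.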
CNN applications in predicting drug-target interactions
Drug-target scoring is a key step in a structure-based drug design pipeline. Choosing a suitable binding pose and predicting the binding affinity of a drug-target complex increase the chances of a successful virtual screening campaign. In particular, CNN scoring functions have shown considerable skill in pose and affinity prediction and in active/inactive discrimination for drug-target complexes, performing exceptionally well in comparison with several well-performing scoring functions developed with both linear and nonlinear methods. For example, Ragoza et al. demonstrated that multilayer CNN models trained on 3D drug-target structures can learn to distinguish correct from incorrect binding poses. To train and test the model, and to compare its performance with that of conventional docking software, the authors used the Directory of Useful Decoys, Enhanced (DUD-E), which contains a large number of experimentally verified actives and property-matched decoys. They found that the CNN-based scoring function performed significantly better than AutoDock Vina in predicting both binding poses and affinities. Wallach et al. built AtomNet, a deep CNN for predicting the bioactivity of small molecules in drug discovery applications, and evaluated its accuracy on the DUD-E benchmark.
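CNN scoring functions of this kind commonly receive the protein-ligand complex as a 3D grid in which each channel holds the density of one atom type. The following sketch voxelizes a complex with Gaussian atom densities under simplifying assumptions: the channel map `CHANNELS` (three elements each for protein and ligand), the function `voxelize`, the 8 Å box, the 1 Å resolution and the Gaussian width are illustrative choices, not the settings used by Ragoza et al. or by AtomNet, whose atom typing and density functions are richer. The resulting `grid[channel][i][j][k]` array is the kind of input a 3D convolutional network would consume.

```python
import math
from typing import Dict, List, Sequence, Tuple

# (molecule, element, (x, y, z) in angstroms); molecule is "protein" or "ligand".
Atom = Tuple[str, str, Tuple[float, float, float]]

# Hypothetical channel assignment: separate channels for protein and ligand atoms.
CHANNELS: Dict[Tuple[str, str], int] = {
    ("protein", "C"): 0, ("protein", "N"): 1, ("protein", "O"): 2,
    ("ligand", "C"): 3, ("ligand", "N"): 4, ("ligand", "O"): 5,
}


def voxelize(
    atoms: Sequence[Atom],
    center: Tuple[float, float, float],
    channels: Dict[Tuple[str, str], int] = CHANNELS,
    box_size: float = 8.0,
    resolution: float = 1.0,
    sigma: float = 0.5,
) -> List[List[List[List[float]]]]:
    """Return grid[channel][i][j][k] of summed Gaussian atom densities in a cubic box."""
    n = int(round(box_size / resolution))
    grid = [[[[0.0] * n for _ in range(n)] for _ in range(n)] for _ in range(len(channels))]
    origin = [c - box_size / 2 for c in center]  # corner of the box
    for molecule, element, (x, y, z) in atoms:
        ch = channels.get((molecule, element))
        if ch is None:
            continue  # atom types without a channel are ignored in this sketch
        for i in range(n):
            vx = origin[0] + (i + 0.5) * resolution  # voxel centre along x
            for j in range(n):
                vy = origin[1] + (j + 0.5) * resolution
                for k in range(n):
                    vz = origin[2] + (k + 0.5) * resolution
                    d2 = (vx - x) ** 2 + (vy - y) ** 2 + (vz - z) ** 2
                    grid[ch][i][j][k] += math.exp(-d2 / (2 * sigma ** 2))
    return grid


if __name__ == "__main__":
    # Toy complex: one ligand carbon near the box centre and a nearby protein oxygen.
    complex_atoms = [
        ("ligand", "C", (0.5, 0.5, 0.5)),
        ("protein", "O", (2.5, 0.5, 0.5)),
    ]
    grid = voxelize(complex_atoms, center=(0.0, 0.0, 0.0))
    print(len(grid), len(grid[0]))     # channels, voxels per side
    print(round(grid[3][4][4][4], 3))  # ligand-carbon density at its own voxel
```

Keeping protein and ligand atoms in separate channels lets the convolution filters learn interaction patterns between the two, such as a ligand donor sitting next to a protein acceptor.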
Author: Antonio Lavecchia