The automation of physical experiments through robotics to effectively perform hundreds of thousands or millions of experiments in a short time has opened the door to a large-scale brute-force approach to drug discovery. This approach is generally called High Throughput Screening (HTS). The motivation behind this approach is to reduce, and possibly eliminate, time-consuming and costly manual interventions by physically synthesizing and testing a very large number of compounds. This HTS brute-force ideal can, perhaps, be realized when a few million compounds need to be tested; however, two factors will likely interfere with the HTS ideal:
These two factors strongly suggest that "Brute Force HTS" will have to become "Smart HTS" rather quickly. In other words, to reduce the total number of experiments, an experiment/analysis cycle will have to be developed in which, for example, the results of an HTS run on 100,000 compounds are analyzed and used to determine the next 100,000 compounds to be tested.
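The experiment/analysis cycle described above can be sketched as a simple loop. This is only an illustration under stated assumptions: the names `smart_hts_cycle`, `toy_assay`, and `nearest_active_score` are hypothetical, compounds are represented as plain integers, and the "analysis" is a stand-in ranking rule rather than any particular QSAR method.

```python
import random

def smart_hts_cycle(library, assay, score, batch_size, rounds, seed=0):
    """Alternate screening and analysis: each round's results pick the next batch."""
    rng = random.Random(seed)
    untested = list(library)
    rng.shuffle(untested)          # the first batch is effectively a random sample
    results = {}                   # compound -> True (active) / False (inactive)
    for _ in range(rounds):
        if not untested:
            break
        # Analysis step: rank untested compounds using everything measured so far.
        if results:
            untested.sort(key=lambda c: score(c, results), reverse=True)
        batch, untested = untested[:batch_size], untested[batch_size:]
        # Experiment step: screen the selected batch.
        for compound in batch:
            results[compound] = assay(compound)
    return results

def toy_assay(compound):
    """Hypothetical assay: compounds 400..449 are 'active' (illustrative only)."""
    return 400 <= compound < 450

def nearest_active_score(compound, results):
    """Hypothetical analysis rule: prefer compounds close to a known active."""
    actives = [c for c, hit in results.items() if hit]
    if not actives:
        return 0.0
    return -min(abs(compound - a) for a in actives)

results = smart_hts_cycle(range(1000), toy_assay, nearest_active_score,
                          batch_size=100, rounds=3)
print(len(results), "compounds screened,", sum(results.values()), "actives found")
```

In a real setting the ranking rule would be replaced by a predictive model trained on the accumulated screening results; the loop structure is the point of the sketch.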
It is generally accepted that the structure, composition, or physical properties of a ligand directly affect its biological activity against a target. The attempt to transform this qualitative belief into a quantitative method of activity assessment is known as the determination of Quantitative Structure Activity Relationships (QSAR). Determining a QSAR generally proceeds as follows:
Express the ligand in some quantitative manner; that is, select a collection of numbers that characterize the ligand. These numbers are called molecular descriptors or, simply, descriptors.
QSAR methods are used to generalize experimental data in order to design or optimize new biologically active compounds that are more potent, less toxic, more selective, or satisfy other relevant criteria. Currently, QSAR techniques are applied to relatively small data sets consisting of several tens, or perhaps several hundreds, of molecules for which activity measurements are available. These activity measurements are performed manually in the laboratory and produce relatively accurate measurements (e.g., IC50 numbers: the concentration of ligand required to attain 50% inhibition). The most widely used method of determining the functional relationship is the statistical technique of regression or least squares.
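As a minimal illustration of the regression step, the following sketch fits a straight line relating a single descriptor to a measured activity by ordinary least squares. The function name `fit_least_squares` is hypothetical, and the numbers are made-up toy values chosen to lie exactly on a line; they are not experimental measurements. Practical QSAR models use many descriptors, but the principle is the same.

```python
def fit_least_squares(x, y):
    """Return (slope, intercept) minimizing the sum of squared residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Toy data (illustrative only): one descriptor value and one activity per molecule.
descriptor = [1.0, 2.0, 3.0, 4.0]
activity = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_least_squares(descriptor, activity)
print(slope, intercept)  # 2.0 1.0

# Predict the activity of an untested molecule from its descriptor value.
predicted = slope * 5.0 + intercept
```

A fit of this kind interpolates well only when the activity measurements are accurate and continuous, which is the situation described above for small, manually measured data sets.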
It is natural and tempting to assume that all one needs to do is apply current QSAR methodology to the large scale data sets of HTS and provide the necessary analysis portion of the proposed HTS experiment/analysis cycle. Unfortunately, two critical factors render the current QSAR technology practically useless for HTS:
The problems do not lie with the concepts of QSAR itself but with the underlying mathematical techniques used to determine the functional relationship between structure and biological activity. Indeed, the fundamentals of QSAR are a promising avenue for HTS data analysis.
Chemical Computing Group Inc. has recently developed (and has sought patent protection for) a new technology, called QuaSAR-Binary™, designed to analyze the binary results of HTS and make predictions regarding the biological activity of untested compounds. This new "QSAR for HTS" methodology successfully uses error-prone binary activity measurements as input. The new technology has several important and immediate applications:
These applications need not run sequentially; in fact, a parallel implementation would exploit the HTS data in many diverse ways:
QuaSAR-Binary is fast enough to keep pace with the HTS experiments themselves. This timely production of HTS analyses means that QuaSAR-Binary will not be the bottleneck in the HTS experiment/analysis cycle.
QuaSAR-Binary is a fundamental departure from the empirically fitted functional relationship methods of traditional QSAR methodology. Rather than fitting the parameters of a model to experimental data, QuaSAR-Binary builds predictive binary models through the use of large-scale probabilistic and statistical inference. Because data fitting is not used, the predictive capacity of QuaSAR-Binary is not interpolative, but based on generalizations substantiated by the experimental data. Arguably, QuaSAR-Binary analyzes data and makes predictions in much the way a scientist would: by examining past experience, weighing the alternatives, and making a recommendation regarding what to do next.
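To make the idea of a probabilistic binary model concrete, here is a small sketch of one generic technique, a naive Bayes classifier over binary descriptor features with Laplace smoothing. It is not the QuaSAR-Binary algorithm, whose details are not given here; the function names and the toy data are assumptions made purely for illustration.

```python
import math

def train_binary_model(descriptors, labels):
    """Estimate P(class) and P(feature = 1 | class) with Laplace smoothing."""
    n_features = len(descriptors[0])
    counts = {True: [0] * n_features, False: [0] * n_features}
    totals = {True: 0, False: 0}
    for row, label in zip(descriptors, labels):
        totals[label] += 1
        for j, bit in enumerate(row):
            counts[label][j] += bit
    n = len(labels)
    model = {"prior": {}, "cond": {}}
    for cls in (True, False):
        model["prior"][cls] = (totals[cls] + 1) / (n + 2)
        model["cond"][cls] = [(c + 1) / (totals[cls] + 2) for c in counts[cls]]
    return model

def probability_active(model, row):
    """Posterior probability that a compound is active, given its binary descriptors."""
    log_score = {}
    for cls in (True, False):
        s = math.log(model["prior"][cls])
        for p, bit in zip(model["cond"][cls], row):
            s += math.log(p if bit else 1.0 - p)
        log_score[cls] = s
    top = max(log_score.values())  # subtract the maximum for numerical stability
    weight = {cls: math.exp(v - top) for cls, v in log_score.items()}
    return weight[True] / (weight[True] + weight[False])

# Toy screening results (illustrative only): two binary descriptors per compound,
# label True = active, False = inactive.
descriptors = [[1, 0], [1, 1], [0, 0], [0, 1]]
labels = [True, True, False, False]

model = train_binary_model(descriptors, labels)
p_hit = probability_active(model, [1, 0])
print(round(p_hit, 3))  # 0.75
```

Because the output is a probability rather than a fitted activity value, occasional mislabeled compounds in the training data shift the estimates only slightly, which is one reason probabilistic models suit error-prone binary measurements.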
Chemical Computing Group plans to apply this new technology to other binary criteria relevant to the pharmaceutical industry. Any True/False or Pass/Fail criterion is subject to QuaSAR-Binary analysis provided that there is sufficient experimental data available. Possible applications include "is drug-like" classification, toxicity, and bioavailability. The generality of this new technology and its technical foundations will allow it to be applied to a wide variety of data analysis problems. QuaSAR-Binary will have a profound impact not only on accelerated drug discovery but also on the analysis of complex biological systems.