Binary QSAR: A New Technology for HTS and UHTS Data Analysis


P. Labute
Chemical Computing Group Inc.


Robotic automation of physical experiments, capable of performing hundreds of thousands or even millions of experiments in a short time, has opened the door to a large-scale, brute-force approach to drug discovery. This approach is generally called High Throughput Screening (HTS). Its motivation is to reduce, and possibly eliminate, time-consuming and costly manual intervention by physically synthesizing and testing a very large number of compounds. This brute-force HTS ideal can perhaps be realized when a few million compounds need to be tested; however, two factors will likely interfere with it:

These two factors strongly suggest that "Brute Force HTS" will have to become "Smart HTS" rather quickly. In other words, to reduce the total number of experiments, an experiment/analysis cycle will have to be developed in which, for example, the results of an HTS run on 100,000 compounds are analyzed and used to select the next 100,000 compounds to be tested.
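The text does not say how the analysis should choose the next batch. As a minimal sketch, assuming some model has already assigned each untested compound a score (higher meaning more likely active), the simplest greedy policy just takes the top-scoring compounds. The function name `select_next_batch`, its parameters, and the compound IDs are hypothetical and for illustration only.

```python
import heapq

def select_next_batch(predicted_scores, already_tested, batch_size):
    """Greedy selection step of a hypothetical experiment/analysis cycle.

    predicted_scores: dict mapping compound ID -> model score (higher means
                      more likely active), produced by analyzing the last run
    already_tested:   set of compound IDs screened in earlier runs
    batch_size:       number of compounds to send to the next HTS run
    """
    untested = ((cid, s) for cid, s in predicted_scores.items()
                if cid not in already_tested)
    # Keep the batch_size highest-scoring untested compounds, best first.
    best = heapq.nlargest(batch_size, untested, key=lambda item: item[1])
    return [cid for cid, _ in best]

# Toy example with made-up IDs and scores.
scores = {"C1": 0.90, "C2": 0.20, "C3": 0.75, "C4": 0.60}
print(select_next_batch(scores, already_tested={"C1"}, batch_size=2))  # ['C3', 'C4']
```

Pure top-k selection is only one possible design; a real cycle might also reserve part of each batch for diverse or uncertain compounds so the model keeps learning about unexplored regions.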

It is generally accepted that the structure, composition, or physical properties of a ligand directly affect its biological activity against a target. The attempt to turn this qualitative belief into a quantitative method of assessing activity is known as the determination of Quantitative Structure-Activity Relationships (QSAR). Determining a QSAR generally proceeds as follows:

  1. Define a quantitative measure of activity (e.g., the amount of ligand needed to interfere with the functioning of the target).

  2. Express the ligand quantitatively; that is, select a collection of numbers that characterize the ligand. These numbers are called molecular descriptors, or simply descriptors.

  3. Determine a functional relationship between activity and the selected descriptors; that is, search for a mathematical function, f, such that "activity = f(descriptors)" holds to a suitably high level of accuracy.

  4. Use the activity measure, the molecular descriptors, and the fitted functional relationship to predict the activity of new candidate ligands.

QSAR methods are used to generalize experimental data in order to design or optimize new biologically active compounds that are more potent, less toxic, more selective, or satisfy other relevant criteria. Currently, QSAR techniques are applied to relatively small data sets consisting of several tens, or perhaps several hundreds, of molecules for which activity measurements are available. These measurements are performed manually in the laboratory and are relatively accurate (e.g., IC50 values: the concentration of ligand required to attain 50% inhibition). The most widely used method of determining the functional relationship is the statistical technique of least-squares regression.
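The four traditional QSAR steps, with least-squares regression as the fitting method, can be sketched as follows. This is a minimal illustration assuming a linear function f and NumPy's `np.linalg.lstsq`; the function names `fit_qsar` and `predict_qsar`, the two descriptors, and all numbers are hypothetical, and the activity values are synthetic rather than real IC50 data.

```python
import numpy as np

def fit_qsar(descriptors, activity):
    """Step 3: fit activity = w . descriptors + b by ordinary least squares.

    descriptors: (n_compounds, n_descriptors) array of descriptor values
    activity:    (n_compounds,) array of continuous activity values
    """
    n = descriptors.shape[0]
    # Append a column of ones so the fitted function has an intercept b.
    X = np.hstack([descriptors, np.ones((n, 1))])
    coef, *_ = np.linalg.lstsq(X, activity, rcond=None)
    return coef  # [w_1, ..., w_k, b]

def predict_qsar(coef, descriptors):
    """Step 4: apply the fitted linear function to new candidate ligands."""
    n = descriptors.shape[0]
    X = np.hstack([descriptors, np.ones((n, 1))])
    return X @ coef

# Synthetic example: two hypothetical descriptors per compound (step 2) and a
# made-up continuous activity that is exactly linear in them (step 1).
desc = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [0.5, 0.5]])
activity = 2.0 * desc[:, 0] - 0.5 * desc[:, 1] + 1.0

coef = fit_qsar(desc, activity)
print(coef)                                        # approximately [2.0, -0.5, 1.0]
print(predict_qsar(coef, np.array([[2.5, 2.0]])))  # approximately [5.0]
```

Note that this fit needs a continuous activity value for every compound; that requirement is exactly what the binary, error-prone output of HTS fails to provide.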

It is natural and tempting to assume that all one needs to do is apply current QSAR methodology to the large data sets produced by HTS, thereby supplying the analysis half of the proposed HTS experiment/analysis cycle. Unfortunately, two critical factors render current QSAR technology practically useless for HTS:

  1. Precision Loss. HTS has given rise to a trade-off: higher throughput reduces the precision of the activity measurement. Many HTS technologies report a binary condition: a candidate ligand is either "active" or "inactive." Some report a discrete measure, e.g., activity on a scale from 1 to 10. In either case, the result falls short of what current QSAR technology requires: a continuous activity measurement, e.g., one accurate to 2 or 3 decimal places.

  2. Significant Error Rate. Many HTS techniques have the unfortunate property that the activity measurement is error-prone. The error rate is high enough to warrant special attention, since current QSAR technology is very sensitive to outliers and errors; a significant error rate will neutralize its predictive capabilities.

The problems do not lie with the concepts of QSAR itself but with the underlying mathematical techniques used to determine the functional relationship between structure and biological activity. Indeed, the fundamentals of QSAR are a promising avenue for HTS data analysis.

Chemical Computing Group Inc. has recently developed (and has sought patent protection for) a new technology, called QuaSAR-Binary™, designed to analyze the binary results of HTS and make predictions regarding the biological activity of untested compounds. This new "QSAR for HTS" methodology successfully uses error-prone binary activity measurements as input. The new technology has several important and immediate applications:

These applications need not run sequentially; in fact, a parallel implementation would exploit the HTS data in many diverse ways:

QuaSAR-Binary is fast enough to keep pace with the HTS experiments themselves. This timely production of HTS analyses means that QuaSAR-Binary will not be the bottleneck in the HTS experiment/analysis cycle.

QuaSAR-Binary is a fundamental departure from the empirically fitted functional relationships of traditional QSAR methodology. Rather than fitting the parameters of a model to experimental data, QuaSAR-Binary builds predictive binary models through large-scale probabilistic and statistical inference. Because data fitting is not used, the predictive capacity of QuaSAR-Binary is not interpolative but rests on generalizations substantiated by the experimental data. Arguably, QuaSAR-Binary analyzes data and makes predictions much as a scientist would: by examining past experience, weighing the alternatives, and making a recommendation about what to do next.
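The text does not describe the internals of QuaSAR-Binary, so the following is not its algorithm. It is a generic stand-in, a minimal sketch assuming a naive Bayes classifier, that shows how probabilistic inference can turn binary "active"/"inactive" labels directly into a probability of activity for an untested compound, with no continuous activity values required. It applies Bayes' theorem with per-descriptor histogram densities and assumes descriptors are independent given the class. The class name `HistogramNaiveBayes`, the bin count, and the toy data are hypothetical, and both classes must be present in the training labels.

```python
import numpy as np

class HistogramNaiveBayes:
    """Illustrative binary model: P(active | x) via Bayes' theorem, assuming
    descriptors are independent given the class, with each per-descriptor
    density estimated by a smoothed histogram."""

    def __init__(self, n_bins=10):
        self.n_bins = n_bins

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=bool)  # True = active, False = inactive
        # Shared bin edges per descriptor, spanning the observed range.
        self.edges = [np.linspace(X[:, j].min(), X[:, j].max(), self.n_bins + 1)
                      for j in range(X.shape[1])]
        self.prior_active = y.mean()  # fraction of actives in the training set
        self.log_probs = {}
        for label in (True, False):
            Xc = X[y == label]
            per_desc = []
            for edges in self.edges:
                j = len(per_desc)
                counts, _ = np.histogram(Xc[:, j], bins=edges)
                # Add-one (Laplace) smoothing so no bin has zero probability.
                probs = (counts + 1) / (counts.sum() + self.n_bins)
                per_desc.append(np.log(probs))
            self.log_probs[label] = per_desc
        return self

    def _bin(self, j, values):
        # Map values to bin indices; out-of-range values go to the end bins.
        idx = np.searchsorted(self.edges[j], values, side="right") - 1
        return np.clip(idx, 0, self.n_bins - 1)

    def predict_proba(self, X):
        X = np.asarray(X, dtype=float)
        log_a = np.full(X.shape[0], np.log(self.prior_active))
        log_i = np.full(X.shape[0], np.log(1.0 - self.prior_active))
        for j in range(X.shape[1]):
            b = self._bin(j, X[:, j])
            log_a += self.log_probs[True][j][b]
            log_i += self.log_probs[False][j][b]
        # P(active | x) = e^a / (e^a + e^i), written in a numerically stable form.
        return 1.0 / (1.0 + np.exp(log_i - log_a))

# Synthetic toy data: descriptor 0 separates actives (1) from inactives (0).
X = [[0.1, 0.5], [0.2, 0.4], [0.3, 0.6], [0.8, 0.5], [0.9, 0.4], [0.7, 0.6]]
y = [0, 0, 0, 1, 1, 1]
model = HistogramNaiveBayes(n_bins=2).fit(X, y)
p = model.predict_proba([[0.85, 0.5], [0.15, 0.5]])
print(p)  # approximately [0.8, 0.2]
```

Because the model counts labels rather than fitting values, a few mislabeled compounds shift the histogram counts slightly instead of dominating a least-squares residual; whether QuaSAR-Binary achieves its robustness this way is not stated in the text.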

Chemical Computing Group plans to apply this new technology to other binary criteria relevant to the pharmaceutical industry. Any True/False or Pass/Fail criterion is amenable to QuaSAR-Binary analysis, provided that sufficient experimental data are available. Possible applications include "is drug-like", toxicity, and bioavailability. The generality of this new technology and its technical foundations will allow it to be applied to a wide variety of data analysis problems. QuaSAR-Binary will have a profound impact not only on accelerated drug discovery but also on the analysis of complex biological systems.