The purpose of QuaSAR-Descriptor is to calculate properties of molecules that serve as numerical descriptions or characterizations of molecules in other calculations such as QSAR, diversity analysis or combinatorial library design. In principle, because any molecular property may be used as a molecular descriptor, there is no single calculation procedure for QuaSAR-Descriptor. Rather, QuaSAR-Descriptor is a forum for the calculation of many descriptors.
A QuaSAR-Descriptor calculation proceeds as follows. Given a molecular database with a molecule field, a set of numerical properties will be calculated for each molecule and stored in the database. Every descriptor is given a unique name, or code, which identifies the descriptor. These codes are used as database field names. QuaSAR-Descriptor will overwrite fields with names identical to descriptor codes. When QuaSAR-Descriptor is invoked, the following panel appears:
This panel allows for selecting the list of descriptors to calculate. A keyword search facility can be used to restrict the list to particular descriptor families.
Descriptors are partitioned into classes. Each class indicates what is assumed by the descriptor calculators about the molecule presented:
2D molecular descriptors are defined to be numerical properties that can be calculated from the connection table representation of a molecule (e.g., elements, formal charges and bonds, but not atomic coordinates). 2D descriptors are, therefore, not dependent on the conformation of a molecule and are most suitable for large database studies.
Many descriptors make use of several fundamental quantities that can be computed from a chemical structure. This section will define these fundamental quantities. For purposes of illustration, the following chemical structure will be used:

The fundamental quantities of a chemical structure depend solely on the structure as drawn, i.e., no modifications to the structure are implied with the exception of the addition or subtraction of hydrogen atoms to full valence.
Z denotes the atomic number of an atom; lone pair pseudo-atoms (LP) are given an atomic number of 0. Heavy atoms are atoms that have an atomic number strictly greater than 1 (not H nor LP). A trivial atom is an LP pseudo-atom or a hydrogen with exactly one heavy neighbor. In the reference structure, H1, LP1 and LP2 are trivial.
The hydrogen count, h, of an atom is the number of hydrogens to which it is (or should be) attached. This count includes all hydrogen atoms that are necessary to fill valence. In the reference structure, F has h = 0, N has h = 1 and O1 has h = 1.
The heavy degree, d, of an atom is the number of heavy atoms to which it is bonded. That is, d is the number of bonded neighbors of the atom in the hydrogen suppressed graph. In the reference structure, F has d = 1, C6 has d = 3 and N has d = 2.
The following physical properties can be calculated from the connection table (with no dependence on conformation) of a molecule:
| Code | Description |
| apol | Sum of the atomic polarizabilities (including implicit hydrogens) with polarizabilities taken from [CRC 1994]. |
| bpol | Sum of the absolute value of the difference between atomic polarizabilities of all bonded atoms in the molecule (including implicit hydrogens) with polarizabilities taken from [CRC 1994]. |
| FCharge | Total charge of the molecule (sum of formal charges). |
| mr | Molecular refractivity (including implicit hydrogens). This property is calculated from an 11 descriptor linear model [MREF 1998] with r^2 = 0.997, RMSE = 0.168 on 1,947 small molecules. |
| SMR | Molecular refractivity (including implicit hydrogens). This property is an atomic contribution model [Crippen 1999] that assumes the correct protonation state (washed structures). The model was trained on ~7000 structures and results may vary from the mr descriptor. |
| Weight | Molecular weight (including implicit hydrogens) with atomic weights taken from [CRC 1994]. |
| logP(o/w) | Log of the octanol/water partition coefficient (including implicit hydrogens). This property is calculated from a linear atom type model [LOGP 1998] with r^2 = 0.931, RMSE = 0.393 on 1,847 molecules. |
| SlogP | Log of the octanol/water partition coefficient (including implicit hydrogens). This property is an atomic contribution model [Crippen 1999] that calculates logP from the given structure; i.e., the correct protonation state (washed structures). Results may vary from the logP(o/w) descriptor. The training set for SlogP was ~7000 structures. |
| vdw_vol | van der Waals volume calculated using a connection table approximation. |
| density | Molecular mass density: Weight divided by vdw_vol. |
| vdw_area | Area of van der Waals surface calculated using a connection table approximation. |
The Subdivided Surface Areas are descriptors based on an approximate accessible van der Waals surface area calculation for each atom, vi along with some other atomic property, pi. The vi is calculated using a connection table approximation. Each descriptor in a series is defined to be the sum of the vi over all atoms, i such that pi is in a specified range (a,b].
In the descriptions to follow, Li denotes the contribution to logP(o/w) for atom i as calculated in the SlogP descriptor [Crippen 1999]. Ri denotes the contribution to Molar Refractivity for atom i as calculated in the SMR descriptor [Crippen 1999]. The ranges were determined by percentile subdivision over a large collection of compounds.
| Code | Description |
| SlogP_VSA0 | Sum of vi such that Li <= -0.4. |
| SlogP_VSA1 | Sum of vi such that Li is in (-0.4,-0.2]. |
| SlogP_VSA2 | Sum of vi such that Li is in (-0.2,0]. |
| SlogP_VSA3 | Sum of vi such that Li is in (0,0.1]. |
| SlogP_VSA4 | Sum of vi such that Li is in (0.1,0.15]. |
| SlogP_VSA5 | Sum of vi such that Li is in (0.15,0.20]. |
| SlogP_VSA6 | Sum of vi such that Li is in (0.20,0.25]. |
| SlogP_VSA7 | Sum of vi such that Li is in (0.25,0.30]. |
| SlogP_VSA8 | Sum of vi such that Li is in (0.30,0.40]. |
| SlogP_VSA9 | Sum of vi such that Li > 0.40. |
| SMR_VSA0 | Sum of vi such that Ri is in [0,0.11]. |
| SMR_VSA1 | Sum of vi such that Ri is in (0.11,0.26]. |
| SMR_VSA2 | Sum of vi such that Ri is in (0.26,0.35]. |
| SMR_VSA3 | Sum of vi such that Ri is in (0.35,0.39]. |
| SMR_VSA4 | Sum of vi such that Ri is in (0.39,0.44]. |
| SMR_VSA5 | Sum of vi such that Ri is in (0.44,0.485]. |
| SMR_VSA6 | Sum of vi such that Ri is in (0.485,0.56]. |
| SMR_VSA7 | Sum of vi such that Ri > 0.56. |
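The subdivision described above amounts to a histogram of per-atom surface area keyed on an atomic property. The following Python sketch illustrates this for the SlogP_VSA series, using the (a,b] boundaries from the table. The function name and the sample per-atom values are hypothetical; real vi and Li values would come from the connection table approximation and the SlogP model, neither of which is reproduced here.

```python
from bisect import bisect_left

# Upper bounds of the ranges (a, b] for SlogP_VSA0 .. SlogP_VSA8;
# SlogP_VSA9 collects everything above the last bound.
SLOGP_BOUNDS = [-0.4, -0.2, 0.0, 0.1, 0.15, 0.20, 0.25, 0.30, 0.40]

def subdivided_vsa(areas, props, bounds):
    """Sum per-atom areas into bins defined by (a, b] ranges of props."""
    bins = [0.0] * (len(bounds) + 1)
    for v, p in zip(areas, props):
        # bisect_left places p == b into the bin whose upper bound is b
        bins[bisect_left(bounds, p)] += v
    return bins

# Hypothetical per-atom values (not taken from any real molecule)
areas = [1.0, 2.0, 3.0, 4.0]
contribs = [-0.5, -0.4, 0.05, 0.45]
slogp_vsa = subdivided_vsa(areas, contribs, SLOGP_BOUNDS)
```

Here the first two atoms fall into SlogP_VSA0, the third into SlogP_VSA3 and the fourth into SlogP_VSA9. The same function could be applied to the SMR_VSA series with its own list of bounds.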
The atom count and bond count descriptors are functions of the counts of atoms and bonds (subdivided according to various criteria).
| Code | Description |
| a_aro | Number of aromatic atoms. |
| a_count | Number of atoms (including implicit hydrogens). This is calculated as the sum of (1 + hi) over all non-trivial atoms i. |
| a_heavy | Number of heavy atoms #{Zi | Zi > 1} |
| a_ICM | Atom information content (mean). This is the entropy of the element distribution in the molecule (including implicit hydrogens but not lone pair pseudo-atoms). Let ni be the number of occurrences of atomic number i in the molecule. Let pi = ni / n where n is the sum of the ni. The value of a_ICM is the negative of the sum over all i of pi log pi. |
| a_IC | Atom information content (total). This is a_ICM times n (as defined in the definition of a_ICM). |
| a_nH | Number of hydrogen atoms (including implicit hydrogens). This is calculated as the sum of hi over all non-trivial atoms i plus the number of non-trivial hydrogen atoms. |
| a_nB | Number of boron atoms: #{Zi | Zi = 5} |
| a_nC | Number of carbon atoms: #{Zi | Zi = 6} |
| a_nN | Number of nitrogen atoms: #{Zi | Zi = 7} |
| a_nO | Number of oxygen atoms: #{Zi | Zi = 8} |
| a_nF | Number of fluorine atoms: #{Zi | Zi = 9} |
| a_nP | Number of phosphorus atoms: #{Zi | Zi = 15} |
| a_nS | Number of sulfur atoms: #{Zi | Zi = 16} |
| a_nCl | Number of chlorine atoms: #{Zi | Zi = 17} |
| a_nBr | Number of bromine atoms: #{Zi | Zi = 35} |
| a_nI | Number of iodine atoms: #{Zi | Zi = 53} |
| b_1rotN | Number of rotatable single bonds. A bond is rotatable if it is not in a ring, and neither atom of the bond is such that (di+hi) < 2. |
| b_1rotR | Fraction of rotatable single bonds: b_1rotN divided by b_count. |
| b_ar | Number of aromatic bonds. |
| b_count | Number of bonds (including implicit hydrogens). This is calculated as the sum of (di/2 + hi) over all non-trivial atoms i. |
| b_double | Number of double bonds. Aromatic bonds are not considered to be double bonds. |
| b_heavy | Number of bonds between heavy atoms |
| b_rotN | Number of rotatable bonds. A bond is rotatable if it is not in a ring, and neither atom of the bond is such that (di+hi) < 2. |
| b_rotR | Fraction of rotatable bonds: b_rotN divided by b_count. |
| b_single | Number of single bonds (including implicit hydrogens). Aromatic bonds are not considered to be single bonds. |
| b_triple | Number of triple bonds. Aromatic bonds are not considered to be triple bonds. |
| VAdjMa | Vertex adjacency information (magnitude): 1 + log2 m where m is the number of heavy-heavy bonds. If m is zero, then zero is returned. |
| VAdjEq | Vertex adjacency information (equality): -(1-f)log2(1-f) - f log2 f where f = (n^2 - m) / n^2, n is the number of heavy atoms and m is the number of heavy-heavy bonds. If f is not in the open interval (0,1), then 0 is returned. |
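As an illustration of the information content and vertex adjacency definitions in the table, the following Python sketch evaluates a_ICM, a_IC, VAdjMa and VAdjEq for acetaldehyde (CH3CH=O). It is a sketch under stated assumptions rather than the MOE implementation: the function names are hypothetical, the logarithm in a_ICM is assumed to be base 2 (the text does not state the base), and f is taken to be (n squared minus m) divided by n squared.

```python
from collections import Counter
from math import log2

def atom_information_content(atomic_numbers):
    """Return (a_ICM, a_IC) for a list of atomic numbers, hydrogens included."""
    n = len(atomic_numbers)
    counts = Counter(atomic_numbers)
    # Entropy of the element distribution (base 2 is an assumption)
    icm = -sum((c / n) * log2(c / n) for c in counts.values())
    return icm, icm * n

def vertex_adjacency(n_heavy, m_heavy_bonds):
    """Return (VAdjMa, VAdjEq) from heavy atom and heavy-heavy bond counts."""
    n, m = n_heavy, m_heavy_bonds
    vadj_ma = 1 + log2(m) if m > 0 else 0.0
    f = (n * n - m) / (n * n) if n > 0 else 0.0
    # Zero is returned when f lies outside the open interval (0, 1)
    vadj_eq = -(1 - f) * log2(1 - f) - f * log2(f) if 0 < f < 1 else 0.0
    return vadj_ma, vadj_eq

# Acetaldehyde: 2 C, 4 H, 1 O; 3 heavy atoms and 2 heavy-heavy bonds
a_icm, a_ic = atom_information_content([6, 1, 1, 1, 6, 1, 8])
vadj_ma, vadj_eq = vertex_adjacency(3, 2)
```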
For a heavy atom i let vi = (pi - hi ) / (Zi - pi - 1) where pi is the number of s and p valence electrons of atom i. The Kier and Hall chi connectivity indices are calculated from the di and vi values. The Kier and Hall kappa molecular shape indices [Hall 1991] compare the molecular graph with minimal and maximal molecular graphs, and are intended to capture different aspects of molecular shape. In the following description, n denotes the number of atoms in the hydrogen suppressed graph, m is the number of bonds in the hydrogen suppressed graph and a is the sum of (ri/rc - 1) where ri is the covalent radius of atom i and rc is the covalent radius of a carbon atom. Also, let p2 denote the number of paths of length 2 and let p3 denote the number of paths of length 3.
| Code | Description |
| chi0 | Atomic connectivity index (order 0) from [Hall 1991] and [Hall 1997]. This is calculated as the sum of 1/sqrt(di) over all heavy atoms i with di > 0. |
| chi0_C | Carbon connectivity index (order 0). This is calculated as the sum of 1/sqrt(di) over all carbon atoms i with di > 0. |
| chi1 | Atomic connectivity index (order 1) from [Hall 1991] and [Hall 1997]. This is calculated as the sum of 1/sqrt(didj) over all bonds between heavy atoms i and j where i < j. |
| chi1_C | Carbon connectivity index (order 1). This is calculated as the sum of 1/sqrt(didj) over all bonds between carbon atoms i and j where i < j. |
| chi0v | Atomic valence connectivity index (order 0) from [Hall 1991] and [Hall 1997]. This is calculated as the sum of 1/sqrt(vi) over all heavy atoms i with vi > 0. |
| chi0v_C | Carbon valence connectivity index (order 0). This is calculated as the sum of 1/sqrt(vi) over all carbon atoms i with vi > 0. |
| chi1v | Atomic valence connectivity index (order 1) from [Hall 1991] and [Hall 1997]. This is calculated as the sum of 1/sqrt(vivj) over all bonds between heavy atoms i and j where i < j. |
| chi1v_C | Carbon valence connectivity index (order 1). This is calculated as the sum of 1/sqrt(vivj) over all bonds between carbon atoms i and j where i < j. |
| Kier1 | First kappa shape index: n (n-1)^2 / m^2 [Hall 1991] |
| Kier2 | Second kappa shape index: (n-1) (n-2)^2 / p2^2 [Hall 1991] |
| Kier3 | Third kappa shape index: (n-1) (n-3)^2 / p3^2 for odd n, and (n-3) (n-2)^2 / p3^2 for even n [Hall 1991] |
| KierA1 | First alpha modified shape index: s (s-1)^2 / m^2 where s = n + a [Hall 1991] |
| KierA2 | Second alpha modified shape index: (s-1) (s-2)^2 / p2^2 where s = n + a [Hall 1991] |
| KierA3 | Third alpha modified shape index: (s-1) (s-3)^2 / p3^2 for odd n, and (s-3) (s-2)^2 / p3^2 for even n where s = n + a [Hall 1991] |
| KierFlex | Kier molecular flexibility index: (KierA1) (KierA2) / n [Hall 1991] |
| zagreb | Zagreb index: the sum of di^2 over all heavy atoms i. |
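The simple connectivity indices depend only on the heavy degrees di, so they can be sketched in a few lines. The following Python example computes chi0, chi1 and zagreb for the hydrogen suppressed graph of acetaldehyde (C-C=O). The function name and the bond list representation are hypothetical, and zagreb is read as the sum of the squared heavy degrees; the valence variants are omitted because they need the per-atom vi values.

```python
from math import sqrt

def connectivity_indices(bonds, n_atoms):
    """chi0, chi1 and zagreb from heavy-atom bonds given as (i, j) index pairs."""
    d = [0] * n_atoms
    for i, j in bonds:
        d[i] += 1
        d[j] += 1
    # Order 0: one term per heavy atom with at least one heavy neighbor
    chi0 = sum(1 / sqrt(di) for di in d if di > 0)
    # Order 1: one term per heavy-heavy bond
    chi1 = sum(1 / sqrt(d[i] * d[j]) for i, j in bonds)
    zagreb = sum(di * di for di in d)
    return chi0, chi1, zagreb

# Hydrogen suppressed graph of CH3CH=O: atoms C(0), C(1), O(2)
chi0, chi1, zagreb = connectivity_indices([(0, 1), (1, 2)], 3)
```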
The adjacency matrix, M, of a chemical structure is defined by the elements [Mij] where Mij is 1 if atoms i and j are bonded and zero otherwise. The distance matrix, D, of a chemical structure is defined by the elements [Dij] where Dij is the length of the shortest path from atoms i to j; zero is used if atoms i and j are not part of the same connected component. The adjacency matrix of CH3CH=O is displayed on the left and its distance matrix is displayed on the right (below):
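| Atom | Adjacency matrix row (C1 H2 H3 H4 C5 H6 O7) | Distance matrix row (C1 H2 H3 H4 C5 H6 O7) |
| C1 | 0 1 1 1 1 0 0 | 0 1 1 1 1 2 2 |
| H2 | 1 0 0 0 0 0 0 | 1 0 2 2 2 3 3 |
| H3 | 1 0 0 0 0 0 0 | 1 2 0 2 2 3 3 |
| H4 | 1 0 0 0 0 0 0 | 1 2 2 0 2 3 3 |
| C5 | 1 0 0 0 0 1 1 | 1 2 2 2 0 1 1 |
| H6 | 0 0 0 0 1 0 0 | 2 3 3 3 1 0 2 |
| O7 | 0 0 0 0 1 0 0 | 2 3 3 3 1 2 0 |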
The following descriptors are calculated from the distance and adjacency matrices of the heavy atoms:
| Code | Description |
| balabanJ | Balaban's connectivity topological index [Balaban 1982]. |
| diameter | Largest value in the distance matrix [Petitjean 1992]. |
| petitjean | Value of (diameter - radius) / diameter as defined in [Petitjean 1992]. |
| radius | If ri is the largest matrix entry in row i of the distance matrix D, then the radius is defined as the smallest of the ri [Petitjean 1992]. |
| VDistEq | If m is the sum of the distance matrix entries then VDistEq is defined to be the sum of log2 m - pi log2 pi / m where pi is the number of distance matrix entries equal to i. |
| VDistMa | If m is the sum of the distance matrix entries then VDistMa is defined to be the sum of log2 m - Dij log2 Dij / m over all i and j. |
| weinerPath | Wiener path number: half the sum of all the distance matrix entries as defined in [Balaban 1979] and [Wiener 1947]. |
| weinerPol | Wiener polarity number: half the sum of all the distance matrix entries with a value of 3 as defined in [Balaban 1979]. |
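The relationship between the two matrices, and the simpler descriptors derived from them, can be sketched as follows. This Python example rebuilds the distance matrix of CH3CH=O from its adjacency matrix by breadth-first search, then evaluates diameter, radius, petitjean and weinerPath on the heavy-atom subgraph. The function names are hypothetical, and returning 0 for petitjean when the diameter is 0 is an assumption not stated in the text.

```python
from collections import deque

def distance_matrix(adj):
    """Shortest path lengths by breadth-first search; 0 for unreachable pairs."""
    n = len(adj)
    dist = [[0] * n for _ in range(n)]
    for s in range(n):
        seen = {s}
        queue = deque([(s, 0)])
        while queue:
            u, du = queue.popleft()
            dist[s][u] = du
            for w in range(n):
                if adj[u][w] and w not in seen:
                    seen.add(w)
                    queue.append((w, du + 1))
    return dist

def distance_descriptors(dist):
    """diameter, radius, petitjean and weinerPath from a distance matrix."""
    largest_per_row = [max(row) for row in dist]
    diameter, radius = max(largest_per_row), min(largest_per_row)
    # Guard against a single-atom graph (assumed convention: return 0)
    petitjean = (diameter - radius) / diameter if diameter else 0.0
    wiener_path = sum(map(sum, dist)) / 2
    return diameter, radius, petitjean, wiener_path

# Adjacency matrix of CH3CH=O, atoms ordered C1 H2 H3 H4 C5 H6 O7
ADJ = [
    [0, 1, 1, 1, 1, 0, 0],
    [1, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 1, 0, 0],
]
full = distance_matrix(ADJ)

# Heavy-atom subgraph (C1, C5, O7), as used by the descriptors in the table
heavy = [0, 4, 6]
heavy_adj = [[ADJ[i][j] for j in heavy] for i in heavy]
diameter, radius, petitjean, wiener_path = distance_descriptors(
    distance_matrix(heavy_adj))
```

The rows of `full` agree with the distance matrix shown for CH3CH=O, which is a convenient check on the breadth-first search.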
The Pharmacophore Atom Type descriptors consider only the heavy atoms of a molecule and assign a type to each atom (using a rule-based system). That is, hydrogens are suppressed during the calculation. The feature set is Donor, Acceptor, Polar (both Donor and Acceptor), Positive (base), Negative (acid), Hydrophobe and Other. Assignments may take into account implied protonation, deprotonation, keto/enol considerations and tautomerism at a biologically relevant pH. For example, -COOH will be typed in its deprotonated form regardless of how the structure is stored.
| Code | Description |
| a_acc | Number of hydrogen bond acceptor atoms (not counting acidic atoms but counting atoms that are both hydrogen bond donors and acceptors such as -OH). |
| a_acid | Number of acidic atoms. |
| a_base | Number of basic atoms. |
| a_don | Number of hydrogen bond donor atoms (not counting basic atoms but counting atoms that are both hydrogen bond donors and acceptors such as -OH). |
| a_hyd | Number of hydrophobic atoms. |
| vsa_acc | Approximation to the sum of VDW surface areas of pure hydrogen bond acceptors (not counting acidic atoms and atoms that are both hydrogen bond donors and acceptors such as -OH). |
| vsa_acid | Approximation to the sum of VDW surface areas of acidic atoms. |
| vsa_base | Approximation to the sum of VDW surface areas of basic atoms. |
| vsa_don | Approximation to the sum of VDW surface areas of pure hydrogen bond donors (not counting basic atoms and atoms that are both hydrogen bond donors and acceptors such as -OH). |
| vsa_hyd | Approximation to the sum of VDW surface areas of hydrophobic atoms. |
| vsa_other | Approximation to the sum of VDW surface areas of atoms typed as "other". |
| vsa_pol | Approximation to the sum of VDW surface areas of polar (both hydrogen bond donors and acceptors) atoms (such as -OH). |
Descriptors that depend on the partial charge of each atom of a chemical structure require calculation of those partial charges. An unfortunate complication is the fact that there are numerous methods of calculating partial charges. Rather than enforce a particular method, MOE provides several versions of most of the charge-dependent descriptors. The only difference between these variants is the source of the partial charges. The following variants are supported: PEOE, Q (described below).
PEOE. The Partial Equalization of Orbital Electronegativities (PEOE) method of calculating atomic partial charges [Gasteiger 1980] is a method in which charge is transferred between bonded atoms until equilibrium. To guarantee convergence, the amount of charge transferred at each iteration is damped with an exponentially decreasing scale factor. The amount of charge transferred, dqij, between atoms i and j when Xi > Xj is
dqij = (1/2^k) (Xi - Xj) / Xj+
where Xj+ is the electronegativity of the positive ion of atom j; Xi is the electronegativity of atom i (quadratically dependent on partial charge); and k is the iteration number of the algorithm. The PEOE charges depend only on the connectivity of the input structures: elements, formal charges and bond orders. Descriptors using the PEOE charges are prefixed with PEOE_.
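The damped transfer scheme can be sketched in Python as shown below. This is only an illustration of the iteration described above, not the MOE or the published implementation: the function name is hypothetical, the electronegativity coefficients are made-up placeholders rather than the parameters of [Gasteiger 1980], the electronegativity of the positive ion is assumed to be the quadratic evaluated at a charge of +1, and the damping factor is read as 1/2^k.

```python
def peoe_sketch(bonds, coeffs, n_iter=6):
    """Iterate damped charge transfer over bonds given as (i, j) index pairs.

    coeffs[i] = (a, b, c) defines X_i(q) = a + b*q + c*q*q for atom i.
    """
    q = [0.0] * len(coeffs)
    for k in range(1, n_iter + 1):
        # Electronegativities at the current partial charges
        x = [a + b * qi + c * qi * qi for (a, b, c), qi in zip(coeffs, q)]
        dq = [0.0] * len(coeffs)
        for i, j in bonds:
            # hi is the more electronegative atom of the bond
            hi, lo = (i, j) if x[i] > x[j] else (j, i)
            a, b, c = coeffs[lo]
            x_plus = a + b + c  # assumed: X of the positive ion is X(q = +1)
            transfer = (0.5 ** k) * (x[hi] - x[lo]) / x_plus
            dq[hi] -= transfer  # electron density flows to the hi atom
            dq[lo] += transfer
        q = [qi + d for qi, d in zip(q, dq)]
    return q

# Two bonded atoms with placeholder coefficients (purely illustrative)
charges = peoe_sketch([(0, 1)], [(10.0, 9.0, 1.0), (7.0, 6.0, 0.5)])
```

Because every transfer is subtracted from one atom and added to its bonded partner, the total charge of the structure is preserved at each iteration.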
Q. Descriptors prefixed with Q_ use the partial charges stored with each structure in the database. In other words, no partial charge calculation is made and it is assumed that some external program has been used to calculate the atomic partial charges. This dependence can be a subtle source of error if, for example, the wrong charges are stored when descriptors are recalculated (e.g., when evaluating QSAR models on novel structures).
Let qi denote the partial charge of atom i as defined above. Let vi be the van der Waals surface area of atom i (as calculated by a connection table approximation). The following descriptors are calculated:
| Code | Description |
| Q_PC+ PEOE_PC+ | Total positive partial charge: the sum of the positive qi. Q_PC+ is identical to PC+ which has been retained for compatibility. |
| Q_PC- PEOE_PC- | Total negative partial charge: the sum of the negative qi. Q_PC- is identical to PC- which has been retained for compatibility. |
| Q_RPC+ PEOE_RPC+ | Relative positive partial charge: the largest positive qi divided by the sum of the positive qi. Q_RPC+ is identical to RPC+ which has been retained for compatibility. |
| Q_RPC- PEOE_RPC- | Relative negative partial charge: the smallest negative qi divided by the sum of the negative qi. Q_RPC- is identical to RPC- which has been retained for compatibility. |
| Q_VSA_POS PEOE_VSA_POS | Total positive van der Waals surface area. This is the sum of the vi such that qi is non-negative. The vi are calculated using a connection table approximation. |
| Q_VSA_NEG PEOE_VSA_NEG | Total negative van der Waals surface area. This is the sum of the vi such that qi is negative. The vi are calculated using a connection table approximation. |
| Q_VSA_PPOS PEOE_VSA_PPOS | Total positive polar van der Waals surface area. This is the sum of the vi such that qi is greater than 0.2. The vi are calculated using a connection table approximation. |
| Q_VSA_PNEG PEOE_VSA_PNEG | Total negative polar van der Waals surface area. This is the sum of the vi such that qi is less than -0.2. The vi are calculated using a connection table approximation. |
| Q_VSA_HYD PEOE_VSA_HYD | Total hydrophobic van der Waals surface area. This is the sum of the vi such that |qi| is less than or equal to 0.2. The vi are calculated using a connection table approximation. |
| Q_VSA_POL PEOE_VSA_POL | Total polar van der Waals surface area. This is the sum of the vi such that |qi| is greater than 0.2. The vi are calculated using a connection table approximation. |
| Q_VSA_FPOS PEOE_VSA_FPOS | Fractional positive van der Waals surface area. This is the sum of the vi such that qi is non-negative divided by the total surface area. The vi are calculated using a connection table approximation. |
| Q_VSA_FNEG PEOE_VSA_FNEG | Fractional negative van der Waals surface area. This is the sum of the vi such that qi is negative divided by the total surface area. The vi are calculated using a connection table approximation. |
| Q_VSA_FPPOS PEOE_VSA_FPPOS | Fractional positive polar van der Waals surface area. This is the sum of the vi such that qi is greater than 0.2 divided by the total surface area. The vi are calculated using a connection table approximation. |
| Q_VSA_FPNEG PEOE_VSA_FPNEG | Fractional negative polar van der Waals surface area. This is the sum of the vi such that qi is less than -0.2 divided by the total surface area. The vi are calculated using a connection table approximation. |
| Q_VSA_FHYD PEOE_VSA_FHYD | Fractional hydrophobic van der Waals surface area. This is the sum of the vi such that |qi| is less than or equal to 0.2 divided by the total surface area. The vi are calculated using a connection table approximation. |
| Q_VSA_FPOL PEOE_VSA_FPOL | Fractional polar van der Waals surface area. This is the sum of the vi such that |qi| is greater than 0.2 divided by the total surface area. The vi are calculated using a connection table approximation. |
| PEOE_VSA+6 | Sum of vi where qi is greater than 0.3. |
| PEOE_VSA+5 | Sum of vi where qi is in the range [0.25,0.30). |
| PEOE_VSA+4 | Sum of vi where qi is in the range [0.20,0.25). |
| PEOE_VSA+3 | Sum of vi where qi is in the range [0.15,0.20). |
| PEOE_VSA+2 | Sum of vi where qi is in the range [0.10,0.15). |
| PEOE_VSA+1 | Sum of vi where qi is in the range [0.05,0.10). |
| PEOE_VSA+0 | Sum of vi where qi is in the range [0.00,0.05). |
| PEOE_VSA-0 | Sum of vi where qi is in the range [-0.05,0.00). |
| PEOE_VSA-1 | Sum of vi where qi is in the range [-0.10,-0.05). |
| PEOE_VSA-2 | Sum of vi where qi is in the range [-0.15,-0.10). |
| PEOE_VSA-3 | Sum of vi where qi is in the range [-0.20,-0.15). |
| PEOE_VSA-4 | Sum of vi where qi is in the range [-0.25,-0.20). |
| PEOE_VSA-5 | Sum of vi where qi is in the range [-0.30,-0.25). |
| PEOE_VSA-6 | Sum of vi where qi is less than -0.30. |
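Given per-atom partial charges qi and surface areas vi, most of the descriptors in the table reduce to filtered sums. The following Python sketch shows a few of them. The function name, the dictionary keys (which omit the Q_ or PEOE_ prefix) and the sample values are hypothetical, and returning 0 when a denominator would be zero is an assumption not stated in the text.

```python
def charge_descriptors(q, v, cutoff=0.2):
    """Selected charge descriptors from partial charges q and areas v."""
    pos = [qi for qi in q if qi > 0]
    neg = [qi for qi in q if qi < 0]
    pairs = list(zip(q, v))
    total_area = sum(v)
    out = {
        'PC+': sum(pos),
        'PC-': sum(neg),
        # Assumed convention: 0 when there are no charges of that sign
        'RPC+': max(pos) / sum(pos) if pos else 0.0,
        'RPC-': min(neg) / sum(neg) if neg else 0.0,
        'VSA_POS': sum(vi for qi, vi in pairs if qi >= 0),
        'VSA_NEG': sum(vi for qi, vi in pairs if qi < 0),
        'VSA_PPOS': sum(vi for qi, vi in pairs if qi > cutoff),
        'VSA_PNEG': sum(vi for qi, vi in pairs if qi < -cutoff),
        'VSA_HYD': sum(vi for qi, vi in pairs if abs(qi) <= cutoff),
        'VSA_POL': sum(vi for qi, vi in pairs if abs(qi) > cutoff),
    }
    # Fractional variants divide by the total surface area
    for key in ('POS', 'NEG', 'PPOS', 'PNEG', 'HYD', 'POL'):
        area = out['VSA_' + key]
        out['VSA_F' + key] = area / total_area if total_area else 0.0
    return out

# Hypothetical charges and areas for a four-atom structure
result = charge_descriptors([0.3, -0.4, 0.1, 0.0], [10.0, 12.0, 8.0, 5.0])
```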
There are two types of 3D molecular descriptors: those that depend on internal coordinates only and those that depend on absolute orientation. 3D molecular descriptors are classified as "i3D" for internal coordinate dependent 3D and "x3D" for external coordinate dependent. A good example is the dipole moment: the magnitude of the dipole moment does not depend on absolute orientation in space; however, the x component of the dipole moment does depend on absolute orientation.
The energy descriptors use the MOE potential energy model to calculate energetic quantities from stored 3D conformations. Most of the energy descriptors belong to the i3D class; that is, they depend on internal coordinates alone and not on an external reference frame. Descriptors that rely on an external reference frame are clearly indicated in the list below.
| Code | Description |
| E | Value of the potential energy. The state of all term enable flags will be honored (in addition to the term weights). This means that the current potential setup accurately reflects what will be calculated. |
| E_ang | Angle bend potential energy. In the Potential Setup panel, the term enable flag is ignored, but the term weight is applied. |
| E_ele | Electrostatic component of the potential energy. In the Potential Setup panel, the term enable flag is ignored, but the term weight is applied. |
| E_nb | Value of the potential energy with all bonded terms disabled. Thus, the state of the non-bonded term enable flags will be honored (in addition to the term weights). |
| E_oop | Out-of-plane potential energy. In the Potential Setup panel, the term enable flag is ignored, but the term weight is applied. |
| E_sol | Solvation energy. In the Potential Setup panel, the term enable flag is ignored, but the term weight is applied. |
| E_stb | Bond stretch-bend cross-term potential energy. In the Potential Setup panel, the term enable flag is ignored, but the term weight is applied. |
| E_str | Bond stretch potential energy. In the Potential Setup panel, the term enable flag is ignored, but the term weight is applied. |
| E_strain | Local strain energy: the current energy minus the value of the energy at a near local minimum. The current energy is calculated as for the E descriptor. The local minimum energy is the value of the E descriptor after first performing an energy minimization. Current chirality is preserved and charges are left undisturbed during minimization. The structure in the database is not modified (results of the minimization are discarded). |
| E_tor | Torsion (proper and improper) potential energy. In the Potential Setup panel, the term enable flag is ignored, but the term weight is applied. |
| E_vdw | van der Waals component of the potential energy. In the Potential Setup panel, the term enable flag is ignored, but the term weight is applied. |
| E_rele | Electrostatic interaction energy (external reference frame: x3D) between the stored molecule and the atoms currently loaded. In the Potential Setup panel, the term enable flag is ignored, but the term weight is applied. Partial charges are assumed to be correct in the database molecule as well as the currently loaded atoms. |
| E_rsol | Solvation free energy difference (external reference frame: x3D). Let L be the free energy of solvation of the stored molecule (ligand), R be the free energy of solvation of the atoms currently loaded (receptor), and G be the free energy of solvation of the RL complex. The returned value is G - L - R. In the Potential Setup panel, the term enable flag is ignored, but the term weight is applied. Partial charges are assumed to be correct in the database molecule as well as the currently loaded atoms. |
| E_rvdw | van der Waals interaction energy (external reference frame: x3D) between the stored molecule and the atoms currently loaded. In the Potential Setup panel, the term enable flag is ignored, but the term weight is applied. |
The following descriptors depend on the structure connectivity and conformation:
| Code | Description |
| ASA | Water accessible surface area calculated using a radius of 1.4 Å for the water molecule. A polyhedral representation is used for each atom in calculating the surface area. |
| dens | Mass density: molecular weight divided by van der Waals volume as calculated in the vol descriptor. |
| glob | Globularity, or inverse condition number (smallest eigenvalue divided by the largest eigenvalue) of the covariance matrix of atomic coordinates. A value of 1 indicates a perfect sphere while a value of 0 indicates a two- or one-dimensional object. |
| pmi | Principal moment of inertia. |
| pmiX | x component of the principal moment of inertia (external coordinates). |
| pmiY | y component of the principal moment of inertia (external coordinates). |
| pmiZ | z component of the principal moment of inertia (external coordinates). |
| rgyr | Radius of gyration. |
| std_dim1 | Standard dimension 1: the square root of the largest eigenvalue of the covariance matrix of the atomic coordinates. A standard dimension is equivalent to the standard deviation along a principal component axis. |
| std_dim2 | Standard dimension 2: the square root of the second largest eigenvalue of the covariance matrix of the atomic coordinates. A standard dimension is equivalent to the standard deviation along a principal component axis. |
| std_dim3 | Standard dimension 3: the square root of the third largest eigenvalue of the covariance matrix of the atomic coordinates. A standard dimension is equivalent to the standard deviation along a principal component axis. |
| vol | van der Waals volume calculated using a grid approximation (spacing 0.75 Å). |
| VSA | van der Waals surface area. A polyhedral representation is used for each atom in calculating the surface area. |
The following descriptors depend upon the stored partial charges of the molecules and their conformations. Accessible surface area refers to the water accessible surface area using a probe radius of 1.4 Angstroms. Let qi denote the partial charge of atom i.
| Code | Description |
| ASA+ | Water accessible surface area of all atoms with positive partial charge (strictly greater than 0). |
| ASA- | Water accessible surface area of all atoms with negative partial charge (strictly less than 0). |
| ASA_H | Water accessible surface area of all hydrophobic (|qi|<0.2) atoms. |
| ASA_P | Water accessible surface area of all polar (|qi|>=0.2) atoms. |
| DASA | Absolute value of the difference between ASA+ and ASA-. |
| CASA+ | Positive charge weighted surface area, ASA+ times max { qi > 0 } [Stanton 1990]. |
| CASA- | Negative charge weighted surface area, ASA- times max { qi < 0 } [Stanton 1990]. |
| DCASA | Absolute value of the difference between CASA+ and CASA- [Stanton 1990]. |
| dipole | Dipole moment calculated from the partial charges of the molecule. |
| dipoleX | The x component of the dipole moment (external coordinates). |
| dipoleY | The y component of the dipole moment (external coordinates). |
| dipoleZ | The z component of the dipole moment (external coordinates). |
| FASA+ | Fractional ASA+ calculated as ASA+ / ASA. |
| FASA- | Fractional ASA- calculated as ASA- / ASA. |
| FCASA+ | Fractional CASA+ calculated as CASA+ / ASA. |
| FCASA- | Fractional CASA- calculated as CASA- / ASA. |
| FASA_H | Fractional ASA_H calculated as ASA_H / ASA. |
| FASA_P | Fractional ASA_P calculated as ASA_P / ASA. |
Descriptor calculation is handled by a module that searches the MOE system for SVL functions satisfying a specific naming convention. Each such function is responsible for calculating a descriptor or family of related descriptors. Typically, such functions are located in their own SVL source code file which must be loaded in the system prior to running the QuaSAR applications. Adding a descriptor involves writing a file containing SVL functions for registering and calculating the descriptor value, and then loading that file into the system:
Here is an example of a descriptor file (explanations follow):
    //
    // mydesc.svl    sample new descriptors
    //

    #set title 'My Descriptors'    // title of module
    #set class 'QuaSAR'            // module class of descriptors

    function QuaSAR_list_MyDescriptors [] = tr [
        [ 'Caro', 'Number of aromatic C', '2D', [] ],
        [ 'C=O', 'Number of carbonyl C', '2D', [] ]
    ];

    function QuaSAR_calc_MyDescriptors [db_mol, codes, parms]
        local desc = zero codes;

        // load the database molecule into MOE as objects
        local [chains, molecule_name] = db_CreateMolecule db_mol;
        local atoms = cat cAtoms chains;

        // calculate the individual descriptors and assign
        // them to the corresponding positions in the return vector
        (desc | codes == 'C=O' ) = add sm_Match ['C=O', atoms];
        (desc | codes == 'Caro') = add sm_Match ['c', atoms];

        oDestroy chains;    // destroy created objects
        return desc;
    endfunction
The header of the module is typical of SVL program files: a comment header followed by SVL compiler directives:
    //
    // mydesc.svl    sample new descriptors
    //

    #set title 'My Descriptors'    // title of module
    #set class 'QuaSAR'            // module class of descriptors
The #set title directive assigns a title to the SVL module which will appear in the Modules and Tasks window and give some indication as to the contents of the source code file. The #set class directive assigns a class (group of related SVL files) to the module. Descriptor modules are usually put in the QuaSAR class. This ensures that all descriptor modules are listed together in the Modules and Tasks window.
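The descriptor file must contain two global functions that, together, a) define the descriptor to the rest of the system; and b) calculate the descriptor when given a molecule. A naming convention is used to identify these functions (the SVL file can define other functions if needed): the list function is named QuaSAR_list_name and the calculation function is named QuaSAR_calc_name, where name is a suffix chosen by the author.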
The suffix of the list and calculate functions must be the same. Any set of characters can be used, but the two functions must be unique with respect to all other global symbols (i.e., choose descriptive names). In the example file (mydesc.svl above), the list function is QuaSAR_list_MyDescriptors:
function QuaSAR_list_MyDescriptors [] = tr [
    [ 'Caro', 'Number of aromatic C', '2D', [] ],
    [ 'C=O',  'Number of carbonyl C', '2D', [] ]
];
List functions must return a table of data detailing which descriptors the calculate function can calculate. This table is a vector of lists of the form:
[code, description, class, parm]
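Each of the elements of this vector must have the same length. The elements are interpreted as follows: code holds the descriptor codes (which are also used as database field names), description holds a short text description of each descriptor, class holds the descriptor class (e.g., '2D'), and parm holds a per-descriptor entry, which is the empty vector [] in the example.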
In the example, the list function is written so that each descriptor calculated by the calculation function is described on one line of the form:
[ 'Caro', 'Number of aromatic C', '2D', [] ],
and the tr operator is used to convert this transposed representation into the correct form for the QuaSAR system.
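To make the role of tr concrete, the following sketch writes the same table directly in the [code, description, class, parm] form, without tr. It is an illustration only, under the assumption that tr simply transposes the one-row-per-descriptor nesting as described above; the one-line-per-descriptor form with tr remains the more readable choice.

```svl
// Illustration only: the table of the example written in column form.
// Assumes tr transposes the row-per-descriptor nesting exactly.
function QuaSAR_list_MyDescriptors [] = [
    [ 'Caro',                 'C=O'                  ],    // code
    [ 'Number of aromatic C', 'Number of carbonyl C' ],    // description
    [ '2D',                   '2D'                   ],    // class
    [ [],                     []                     ]     // parm
];
```

In this form every element of the returned vector has length two, one entry per descriptor, which is the equal-length requirement stated above.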
The calculation function associated with the list function is QuaSAR_calc_MyDescriptors. The association is created because of the common suffix MyDescriptors. A calculation function must be declared as:
function QuaSAR_calc_name [db_mol, codes, parms]
// .... body of function ....
endfunction
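where db_mol is the molecule data taken from the database (the example passes it to db_CreateMolecule), codes is the vector of descriptor codes to calculate, and parms is a third argument that the example does not use.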
The calculation function must return a vector desc equal in length to codes such that desc(i) is the value of the descriptor specified by codes(i). The calculation function must be designed to accept more than one code at a time, and the same code may appear more than once in codes. In the example, the calculation function handles two descriptors. To handle multiple occurrences of descriptor codes, the following logic is typically used:
function QuaSAR_calc_MyDescriptors [db_mol, codes, parms]
    local desc = zero codes;

    // .... create molecule ....

    (desc | codes == 'C=O' ) = add sm_Match ['C=O', atoms];
    (desc | codes == 'Caro') = add sm_Match ['c', atoms];

    // .... destroy molecule ....

    return desc;
endfunction
The initialization of desc creates a zero vector of equal length to codes. Once the descriptors have been calculated, they are assigned to the correct locations with code of the form:
(desc | codes == 'mydesc') = value_of_mydesc;
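The effect of this masked assignment when a code is requested more than once can be traced with a small sketch. The function name demo_masked_assign and the constants 2 and 6 are hypothetical; they stand in for values that a real calculation function would compute from the molecule.

```svl
// Illustration only: hypothetical function, placeholder values.
function demo_masked_assign []
    local codes = ['C=O', 'Caro', 'C=O'];    // 'C=O' is requested twice
    local desc  = zero codes;                // one zero per requested code

    (desc | codes == 'C=O' ) = 2;            // fills both 'C=O' positions
    (desc | codes == 'Caro') = 6;            // fills the 'Caro' position

    return desc;                             // expected: [2, 6, 2]
endfunction
```

Because the mask codes == 'C=O' selects every matching position, no explicit loop over codes is needed, and codes that the function does not recognize are left at zero.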
The remainder of the calculation function handles the creation and destruction of the molecular objects in MOE. However, if the descriptor can be calculated solely from the db_mol argument, then there is no need to create molecular objects. The a_heavy descriptor (number of heavy atoms) is a good example of a descriptor that can be calculated directly from the db_mol parameter.
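As a further sketch, a second descriptor module can be written by following the same pattern. Everything specific to it is an assumption made for illustration: the file name moredesc.svl, the suffix MoreDescriptors, the codes 'Naro' and 'C#N', and the SMARTS patterns 'n' and 'C#N'. Only functions that already appear in mydesc.svl are used.

```svl
//
//    moredesc.svl    hypothetical second descriptor module
//
#set title    'More Descriptors'    // title of module
#set class    'QuaSAR'              // module class of descriptors

// list function: one line per descriptor, transposed with tr
function QuaSAR_list_MoreDescriptors [] = tr [
    [ 'Naro', 'Number of aromatic N', '2D', [] ],
    [ 'C#N',  'Number of nitrile C',  '2D', [] ]
];

// calculate function: shares the suffix MoreDescriptors
function QuaSAR_calc_MoreDescriptors [db_mol, codes, parms]
    local desc = zero codes;

    // load the database molecule into MOE as objects
    local [chains, molecule_name] = db_CreateMolecule db_mol;
    local atoms = cat cAtoms chains;

    // assumed SMARTS patterns; count the atoms matched by each
    (desc | codes == 'Naro') = add sm_Match ['n', atoms];
    (desc | codes == 'C#N' ) = add sm_Match ['C#N', atoms];

    oDestroy chains;    // destroy created objects
    return desc;
endfunction
```

Because descriptor codes are used as database field names and existing fields with the same names are overwritten, the codes chosen for a new module should not collide with those of other descriptors.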
| [Balaban 1979] | Balaban, A.T.; Five New Topological Indices for the Branching of Tree-Like Graphs. Theoretica Chimica Acta. 53, 355-375, (1979). |
| [Balaban 1982] | Balaban, A.T.; Highly Discriminating Distance-Based Topological Index. Chemical Physics Letters. 89(5), 399-404, (1982). |
| [CRC 1994] | CRC Handbook of Chemistry and Physics. CRC Press (1994). |
| [Crippen 1999] | Wildman, S.A., Crippen, G.M.; Prediction of Physicochemical Parameters by Atomic Contributions. J. Chem. Inf. Comput. Sci. 39(5), 868-873, (1999). |
| [Gasteiger 1980] | Gasteiger, J., Marsili, M.; Iterative Partial Equalization of Orbital Electronegativity - A Rapid Access to Atomic Charges. Tetrahedron. 36, 3219, (1980). |
| [Hall 1991] | Hall, L.H., Kier, L.B.; The Molecular Connectivity Chi Indices and Kappa Shape Indices in Structure-Property Modeling. Reviews of Computational Chemistry. Vol. 2, (1991). |
| [Hall 1997] | Hall, L.H., Kier, L.B.; The Nature of Structure-Activity Relationships and Their Relation to Molecular Connectivity. Eur. J. Med. Chem. - Chimica Therapeutica. 4, 307-312, (1997). |
| [LOGP 1998] | Labute, P.; MOE LogP(Octanol/Water) Model. Unpublished. Source code in $MOE/lib/svl/quasar.svl/q_logp.svl, (1998). |
| [MREF 1998] | Labute, P.; MOE Molar Refractivity Model. Unpublished. Source code in $MOE/lib/svl/quasar.svl/q_mref.svl, (1998). |
| [Petijean 1992] | Petitjean, M.; Applications of the Radius-Diameter Diagram to the Classification of Topological and Geometrical Shapes of Chemical Compounds. J. Chem. Inf. Comput. Sci. 32, 331-337, (1992). |
| [Stanton 1990] | Stanton, D., Jurs, P.; Anal. Chem. 62, 2323, (1990). |
| [Wiener 1947] | Wiener, H.; Structural Determination of Paraffin Boiling Points. Journal of the American Chemical Society. 69, 17-20, (1947). |