This article covers the general uses of molecular databases and how MOE and its graphical Database Viewer provide quick and easy access to large amounts of text, numerical or structural data. It first looks at how MOE manages data, then describes the Database Viewer and the types of applications it includes.
By definition, a database is a large collection of interrelated data. In more visual terms, a database is essentially a table where each row, or "entry," contains different kinds of information on one specific item and each column, or "field," contains the same kind of information across the many items in the table. Databases have two primary uses in MOE:
Data sets can be customized using MOE's import, merge or database calculator facilities, and results can be managed with its sort, selection and statistical analysis tools. To ensure consistency, the database services provide a structured and unified file format, referred to as MDB -- Molecular Database File.
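The row-and-column picture described above, together with the sort and selection operations, can be sketched in a few lines. This is a plain Python illustration and not MOE's SVL API; the field names, values and helper functions are all made up for the example.

```python
# Illustrative only: a database as a table of entries (rows) and fields (columns).
# Field names and values are hypothetical.
entries = [
    {"name": "mol_a", "weight": 180.2, "active": True},
    {"name": "mol_b", "weight": 46.1, "active": False},
    {"name": "mol_c", "weight": 151.2, "active": True},
]


def column(entries, field):
    """Return one field across all entries (a column of the table)."""
    return [entry[field] for entry in entries]


def sort_by(entries, field):
    """Return a new list of entries ordered by the given field."""
    return sorted(entries, key=lambda entry: entry[field])


def select(entries, predicate):
    """Return the entries for which predicate(entry) is true."""
    return [entry for entry in entries if predicate(entry)]


weights = column(entries, "weight")
lightest_first = [e["name"] for e in sort_by(entries, "weight")]
actives = [e["name"] for e in select(entries, lambda e: e["active"])]
```

Each dictionary plays the role of an entry, and each key plays the role of a field that holds the same kind of information for every entry.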
MOE includes an array of applications that exploit the full potential of molecular databases:
| Application | Use of the MDB file |
|---|---|
| Conformational Analysis | The MDB file is used to contain the various molecular conformations generated by way of systematic conformation search, RIPS or Hybrid Monte Carlo Trajectory Generation. |
| Molecular Dynamics | The MDB file is used to store the sampled trajectory as well as instantaneous thermodynamic measurements. If desired, molecules can subsequently be loaded and displayed in the 3D rendering window using the Database Browser or Molecular Dynamics Animator. |
| Homology Modeling | The MDB file is used to house the loop dictionary, rotamer libraries and output models of homology modeling programs. |
| QuaSAR Suite of Applications | The MDB file is used to store the conformations, activity data and calculated molecular descriptors. |
Examples of MOE high throughput applications using molecular databases:
One of the tools used to analyze statistics is the correlation plot (shown below), which is part of MOE's graphical Database Viewer. Outliers were selected using the mouse.

MOE includes some of the most sought after database features in the industry:
| Feature | Description |
|---|---|
| Platform Independence | Because MOE is platform independent, its molecular databases are also cross-platform: they can be read, saved, copied or edited on any type of machine, from Sun, DEC Alpha and SGI to PC. |
| Easy File Import | Users can import various database formats such as delimited ASCII, SD files, RG files and Tripos files, and can export data from MOE into different formats. |
| Friendly GUI | A graphical user interface, the "Database Viewer," simplifies data input and extraction and provides visual, i.e., structural, information on molecules. For intensive computations that take a long time but need not be visualized, MOE can be run in batch mode, which lets users launch database operations from the command line without the graphical interface. |
| Large Data Sets | MOE can access, store and manage large quantities of data. Each cell in the database can store up to 2 gigabytes of information. Sophisticated data compression techniques are used to store 3D conformations of molecules. Topological and conformational molecular data is stored using an average of 7 bytes per atom for small molecules and 8 bytes per atom for biopolymers. For example, a database of 65,000 small molecules can be displayed with ease even on a PC. Finally, there are no preset limits on the number of molecules in a single database. |
| Customization | |
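To give a feel for the per-atom storage figures quoted in the table above, here is a back-of-the-envelope estimate in Python. The average of 40 atoms per molecule is a hypothetical value chosen for illustration; the article does not state one, and the estimate ignores any other fields stored alongside the molecules.

```python
# Rough storage estimate based on the per-atom averages quoted in the article.
BYTES_PER_ATOM_SMALL = 7       # quoted average for small molecules
BYTES_PER_ATOM_BIOPOLYMER = 8  # quoted average for biopolymers


def estimated_bytes(n_molecules, atoms_per_molecule, bytes_per_atom):
    """Estimate bytes needed for the molecular data only."""
    return n_molecules * atoms_per_molecule * bytes_per_atom


# ASSUMPTION: 40 atoms per molecule is a made-up average, not a figure from the article.
small_db_bytes = estimated_bytes(65_000, 40, BYTES_PER_ATOM_SMALL)
small_db_mb = small_db_bytes / 1_000_000
```

Under that assumption, the molecular data for 65,000 small molecules comes to roughly 18 megabytes, which is consistent with the claim that such a database can be handled comfortably on a PC.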
The MOE Database Viewer is first and foremost a container for molecular conformations and related data. One of its distinguishing characteristics is that it is a direct window onto the database file on disk, i.e., it continually reflects the actual contents of the disk. For example, the MOE Molecular Dynamics simulation uses a database as its output file. A Database Viewer opened onto that database is automatically updated each time a conformation is written to the database. Sophisticated caching techniques are used to deliver real-time response even though the bulk of database data lies on disk. This enables the display of very large databases without consuming an inordinate amount of memory. For users, this means quick and easy access to data.
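The article does not describe how MOE's caching works. As a generic illustration of the idea (keep a small number of recently used rows in memory while the bulk of the data stays on disk), the sketch below uses a least-recently-used cache in Python. The class name, the `load_row` callback and the capacity are all hypothetical, and LRU is a stand-in technique, not a claim about MOE's implementation.

```python
from collections import OrderedDict


class CachedTable:
    """Fetch rows on demand and keep only the most recently used ones in memory."""

    def __init__(self, load_row, capacity):
        self._load_row = load_row    # function that reads one row from the backing store
        self._capacity = capacity    # maximum number of rows held in memory
        self._cache = OrderedDict()  # row index -> row data, oldest first
        self.loads = 0               # number of reads that went to the backing store

    def row(self, index):
        if index in self._cache:
            # Cache hit: mark the row as most recently used.
            self._cache.move_to_end(index)
            return self._cache[index]
        # Cache miss: read from the backing store and remember the result.
        value = self._load_row(index)
        self.loads += 1
        self._cache[index] = value
        if len(self._cache) > self._capacity:
            # Evict the least recently used row.
            self._cache.popitem(last=False)
        return value


# A stand-in for a disk read; a real reader would seek into a file instead.
table = CachedTable(load_row=lambda i: f"row {i}", capacity=2)
table.row(0)
table.row(1)
table.row(0)  # served from memory
table.row(2)  # evicts row 1, the least recently used
```

Memory use is bounded by the cache capacity and not by the size of the table, which is the property the paragraph above attributes to the Database Viewer.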
The Database Viewer accesses, manages and displays the three most common types of data:
Examples of molecular operations performed in the Database Viewer:
Furthermore, the various conformations of the database molecules can be animated in the 3D rendering window using the MD Animator called from the Database Viewer.
Various operations can be performed on numeric data and character strings in the Database Viewer:
In the following example, the Database Calculator is used to calculate the negative log of all IC50 values in a database as a basis for a QuaSAR model. Using the Calculator is simply a matter of selecting the appropriate operator buttons in the panel and the field or fields to include in the equation. Results are then written to the database. In the present case, the destination field is named -log IC50.
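The calculation itself is simple enough to sketch outside MOE. The Python snippet below assumes a base-10 logarithm (the usual convention for this kind of activity value, although the article does not state the base) and uses made-up IC50 values; it is not the Database Calculator's implementation.

```python
import math


def neg_log10(value):
    """Return -log10(value); the input must be positive."""
    if value <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(value)


# Hypothetical IC50 values, assumed here to be in mol/L.
ic50_values = [1e-6, 5e-8, 2.5e-9]

# Write the result into a new field next to the source field, one row per entry.
rows = [{"IC50": v, "-log IC50": neg_log10(v)} for v in ic50_values]
```

More potent compounds (smaller IC50) get larger values in the destination field, which makes the transformed field more convenient as a modeling target.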


As an example of the Database Viewer and MOE's molecular database format (MDB), let us now look at how MOE builds its database of the Protein Data Bank (PDB):
Shown here is a MOE molecular database containing the complete July 1999 edition (Release # 89) of the Protein Data Bank. Although the four CD-ROMs of the PDB contain over 10,000 compressed files requiring approximately 2,000 megabytes, the MOE MDB format requires only 440 megabytes thanks to its efficient use of disk space. As depicted in the snapshot, the database displays 3D renderings of the molecules, which can be rotated and zoomed. Related data such as codes, titles and chains is also provided. The size of data cells can be adjusted using the mouse.
If so desired, the molecule selected in the Database Viewer (in the picture, phosphotransferase) can be loaded into MOE's 3D rendering window using the Copy to MOE command in the Cell popup menu.
Chemical Computing Group uses the MDB format to build the extensive protein database included with each MOE release. Using SVL, MOE examines each entry for breaks and missing atoms and extracts chain data. (For more information on this topic, please see Exhaustive and Iterative Clustering of the Protein Databank.) Chains are then written into a database and further refined using MOE sort and selection utilities.
The picture below shows chain and sequence data in the Database Viewer and MOE's sort data panel.

Should you need to write an SVL application, it is more than likely that you will be using a database. This has the advantage that the format for methodology output is unified and can be manipulated with a common set of tools. In this case, when writing an SVL program, the first step is to open the database. Suppose, for example, that you want to open a database named confdb.mdb for reading and writing purposes. To do so, you would type the following at the command line:
local mdb_key = db_Open ['confdb.mdb', 'read-write'];
Here, db_Open returns the "key" of the database. A key is a number that serves to identify the database in subsequent database operations. The database key is temporary and destroyed when the database is closed. This key is necessary, for instance, to obtain the list of field names and field types in confdb.mdb as shown:
local [field_name, field_type] = db_Fields mdb_key;
The following example demonstrates a typical use of a database: this piece of code defines a function that minimizes all small molecules in the field named mol of a given database, using MMFF94 forcefield parameters. (Note: The numbers in the trailing comments are given for explanatory reasons. Please refer to the text below for explanations.)
function MM;                                                     // (1) declare the minimizer
function Min_MMFF mdb_name
    local mdb_key = db_Open [mdb_name, 'read-write'];            // (2) open the database
    local entry_key = 0;                                         // (3) start before the first entry
    local mol_data;                                              // (4)
    pot_Load '$MOE/lib/mmff94.ff';                               // (5) load the MMFF94 parameters
    while entry_key = db_NextEntry [mdb_key, entry_key] loop     // (6) step through the entries
        mol_data = first db_ReadFields [mdb_key, entry_key, 'mol'];  // (7) read the mol field
        local chains = first db_CreateMolecule mol_data;         // (8) create the molecule
        MM [ gtest: 0.01 ];                                      // (9) run the minimization
        mol_data = db_ExtractMolecule chains;                    // (10) extract the result
        db_Write [mdb_key, entry_key, ['mol' : mol_data]];       // (11) write it back to the entry
        oDestroy chains;                                         // (12) destroy the molecule
    endloop
    db_Close mdb_key;                                            // (13) close the database
endfunction
Like the database file key returned by db_Open, entry keys are used to reference each of the entries in the database. Think of the entry key as the entry's "social insurance number."
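The loop in the function above walks the database by repeatedly asking for the entry that follows the current key, starting from 0, and it appears to rely on a zero key signalling the end. That cursor pattern can be sketched in Python with an in-memory stand-in. `ToyDatabase` and its methods are hypothetical names for illustration and do not correspond to MOE's SVL functions.

```python
class ToyDatabase:
    """In-memory stand-in for a database whose entries are addressed by integer keys."""

    def __init__(self):
        self._entries = {}
        self._next_key = 1

    def append(self, fields):
        """Add an entry and return its key."""
        key = self._next_key
        self._entries[key] = dict(fields)
        self._next_key += 1
        return key

    def next_entry(self, entry_key):
        """Return the key following entry_key (0 means 'before the first'); 0 at the end."""
        later = [k for k in sorted(self._entries) if k > entry_key]
        return later[0] if later else 0

    def read(self, entry_key, field):
        return self._entries[entry_key][field]

    def write(self, entry_key, fields):
        self._entries[entry_key].update(fields)


db = ToyDatabase()
for energy in (12.5, 8.0, 15.25):
    db.append({"energy": energy})

# Visit every entry: read a field, modify it, write it back.
entry_key = 0
visited = []
while (entry_key := db.next_entry(entry_key)) != 0:
    value = db.read(entry_key, "energy")
    db.write(entry_key, {"energy": value - 1.0})
    visited.append(entry_key)
```

Because each step only needs the current key, the caller never has to hold the full list of entries in memory, which suits databases that live mostly on disk.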
When managing molecular databases, MOE combines the strong features of a database with the functionality of a spreadsheet. It is able to work with very large databases without consuming an inordinate amount of memory due to efficient usage of disk space. This combination makes for quick and easy access to large quantities of data. One can import, store and manage substantial molecular, numeric and character data and perform intensive calculations such as calculating 2D or 3D molecular descriptors. All operations can be performed using the graphical Database Viewer, which contains molecular conformations and related data, or from a command line using MOE in batch mode.
For more information on MOE's molecular database format and the Database Viewer, please contact Chemical Computing Group.