Molecular Databases and MOE
by J. Demers and S. Murray
This article focuses on the general uses of molecular databases and on how MOE and its graphical Database Viewer provide a quick and easy way to access, manage and display large amounts of data, whether textual, numerical or structural. It first looks at how MOE manages data, then describes the Database Viewer and the types of applications it includes.
A Few Words on Databases
By definition, a database is a large collection of interrelated data. In more visual terms, a database is essentially a table where each row, or "entry," contains different kinds of information on one specific item and each column, or "field," contains the same kind of information across the many items in the table. Databases have two primary uses in MOE:
Data sets can be customized using MOE's import, merge or database calculator facilities, and results can be managed with its sort, selection and statistical analysis tools. To ensure consistency, the database services provide a structured and unified file format, referred to as MDB -- Molecular Database File.
MOE includes an array of applications that exploit the full potential of molecular databases:
Enter the MOE Molecular Database
In a sentence, a MOE molecular database is a binary file that stores a list of data records. What distinguishes it from other products on the market is that it combines the strength of a database with the working environment of a spreadsheet. Scientists can use MOE not only to import, store and manage large quantities of data but also to perform calculations on that data, for instance computing 2D or 3D molecular descriptors with QuaSAR.
Examples of MOE high throughput applications using molecular databases:
One of the tools used to analyze statistics is the correlation plot (shown below), which is part of MOE's graphical Database Viewer. Outliers were selected using the mouse.
More MOE Database Features
MOE includes some of the most sought-after database features in the industry:
Graphical Database Viewer
The MOE Database Viewer is first and foremost a container for molecular conformations and related data. One of its distinguishing characteristics is that it is a direct window onto the database file on disk, i.e., it continually reflects the actual contents of the disk. For example, the MOE Molecular Dynamics simulation uses a database as its output file. A Database Viewer opened onto that database is automatically updated each time a conformation is written to the database. Sophisticated caching techniques are used to deliver real-time response even though the bulk of database data lies on disk. This enables the display of very large databases without consuming an inordinate amount of memory. For users, this means quick and easy access to data.
The Database Viewer accesses, manages and displays the three most common types of data:
Handling 3D Molecules
The Database Viewer renders molecules as 3D structures which can be rotated and zoomed in on using the mouse. Display options such as showing hydrogens, element symbols, bond orders, etc., are based on user preferences. To take a closer look at the 3D structure of a molecule or examine a protein's sequence of residues, one can copy database molecules to and from the 3D rendering window.
Examples of molecular operations performed in the Database Viewer:
Organizing and Analyzing Data
Various operations can be performed on numeric data and character strings in the Database Viewer:
As an example of the Database Viewer and MOE's molecular data format (.MDB), let us now look at how MOE builds the PDB:
Shown here is a MOE molecular database containing the complete July 1999 edition (Release # 89) of the Protein Data Bank. Although the four CD-ROMs of the PDB contain over 10 000 compressed files requiring approximately 2 000 megabytes, the MOE MDB format requires only 440 megabytes on account of its efficient usage of disk space. As depicted in the snapshot, the database displays 3D renderings of the molecules, which can be rotated and zoomed in on. Related data, such as codes, titles and chains, is also provided. The size of data cells can be adjusted using the mouse.
If so desired, the molecule selected in the Database Viewer (in the picture, phosphotransferase) can be loaded into MOE's 3D rendering window using the Copy to MOE command in the Cell popup menu.
Chemical Computing Group uses the MDB format to build the extensive protein database included with each MOE release. Using SVL, MOE examines each entry for breaks and missing atoms and extracts chain data. (For more information on this topic, please see Exhaustive and Iterative Clustering of the Protein Databank.) Chains are then written into a database and further refined using MOE sort and selection utilities.
The picture below shows chain and sequence data in the Database Viewer and MOE's sort data panel.
Should you need to write an SVL application, it is more than likely that you will be using a database. This has the advantage that the format for methodology output is unified and can be manipulated with a common set of tools. In this case, when writing an SVL program, the first step is to open the database. Suppose, for example, that you want to open a database named confdb.mdb for reading and writing purposes. To do so, you would type the following at the command line:
local mdb_key = db_Open ['confdb.mdb', 'read-write'];
Here, db_Open returns the "key" of the database. A key is a number that serves to identify the database in subsequent database operations. The database key is temporary and destroyed when the database is closed. This key is necessary, for instance, to obtain the list of field names and field types in confdb.mdb as shown:
local [field_names, field_types] = db_Fields mdb_key;
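As an illustration, the two vectors returned by db_Fields can be walked in parallel to print a summary of the database layout. The following is a minimal sketch rather than code from the MOE distribution: the function name ListFields and the output format are hypothetical, and the 'read' open mode is assumed to be available alongside the 'read-write' mode used above.

```
// Hypothetical helper: print every field name together with its type.
function ListFields mdb_name
    // Assumption: 'read' opens the database without write access.
    local mdb_key = db_Open [mdb_name, 'read'];

    // db_Fields returns the field names and the field types.
    local [field_names, field_types] = db_Fields mdb_key;

    local i;
    for i = 1, length field_names loop
        write ['{} : {}\n', field_names(i), field_types(i)];
    endloop

    // The key is no longer valid once the database is closed.
    db_Close mdb_key;
endfunction
```

Once loaded, such a function could be called from the command line as ListFields 'confdb.mdb';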
The following example demonstrates a typical use of a database: this piece of code defines a function which minimizes all small molecules in the field named mol of a given database, based on MMFF94 forcefield parameters. (Note: The numbers in parentheses at the beginning of each line are for reference only and are not to be included in the code; the comments explain each step.)

(1)  function MM;                                                  // MM is defined elsewhere in MOE

     function Min_MMFF mdb_name
(2)      local mdb_key = db_Open [mdb_name, 'read-write'];         // open the database, keep its key
(3)      local entry_key = 0;                                      // 0 means "before the first entry"
(4)      local mol_data;
(5)      pot_Load '$MOE/lib/mmff94.ff';                            // load the MMFF94 forcefield
(6)      while entry_key = db_NextEntry [mdb_key, entry_key] loop  // visit each entry in turn
(7)          mol_data = first db_ReadFields [mdb_key, entry_key, 'mol'];  // read the mol field
(8)          local chains = first db_CreateMolecule mol_data;      // create the molecule in MOE
(9)          MM [ gtest: 0.01 ];                                   // minimize; gtest is the gradient test
(10)         mol_data = db_ExtractMolecule chains;                 // collect the minimized structure
(11)         db_Write [mdb_key, entry_key, ['mol' : mol_data]];    // write it back to the same entry
(12)         oDestroy chains;                                      // remove the molecule from MOE
         endloop
(13)     db_Close mdb_key;                                         // close the database
     endfunction
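The traversal idiom used in the function above (start with an entry key of 0 and call db_NextEntry until it returns 0) is not specific to minimization. As a minimal sketch under the same assumptions, the hypothetical function below reuses it to count the entries of a database without modifying them; the name CountEntries and the 'read' open mode are assumptions, not part of the original example.

```
// Hypothetical helper: count the entries of a database by walking it
// with the same db_NextEntry idiom as Min_MMFF.
function CountEntries mdb_name
    // Assumption: 'read' opens the database without write access.
    local mdb_key = db_Open [mdb_name, 'read'];
    local entry_key = 0;    // 0 means "before the first entry"
    local n = 0;

    // The loop ends when db_NextEntry returns 0 (no more entries).
    while entry_key = db_NextEntry [mdb_key, entry_key] loop
        n = n + 1;
    endloop

    db_Close mdb_key;
    return n;
endfunction

// Possible usage, assuming both functions have been loaded into MOE:
//     Min_MMFF 'confdb.mdb';
//     print CountEntries 'confdb.mdb';
```

Opening the database read-only in helpers that only inspect data keeps them from interfering with functions such as Min_MMFF that write entries back.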
When managing molecular databases, MOE combines the strong features of a database with the functionality of a spreadsheet. Because the bulk of the data stays on disk in a compact format, MOE can work with very large databases without consuming an inordinate amount of memory. This combination makes for quick and easy access to large quantities of data. One can import, store and manage substantial molecular, numeric and character data and perform intensive calculations such as computing 2D or 3D molecular descriptors. All operations can be performed using the graphical Database Viewer, which contains molecular conformations and related data, or from a command line using MOE in batch mode.
For more information on MOE's molecular database format and the Database Viewer, please contact .