Relational Database Access in MOEScott Burlington
Chemical Computing Group Inc.
Mar 26, 2004
Data acquisition in modern scientific computing is challenged by the large variety of data sources available. Scientific data often resides on magnetic disks and networks in many different file formats and is also commonly stored in computer databases. Different database applications have different languages and interfaces to manipulate their contents. These differences create difficulty for agents that would like to access information across various types of databases.
Chemical Computing Group's Molecular Operating Environment (MOE) addresses the challenge of accessing information in part by supporting a large variety of common file formats and supplying its own internal database application. In addition to this, MOE has the ability to directly access relational databases, greatly increasing its ability to share scientific data.
Directly supporting specific database applications is problematic for several reasons. Firstly, there is the risk that future versions of the database will change or fail to support current behavior, thus creating a maintenance problem. Next, due to differences in the various database applications, creating a consistent interface with a fixed behavior may not be possible; the result would be a cluttered system that would be difficult to learn. Finally, there are simply too many different database applications available; at best a small subset could be supported directly. An ideal solution would provide access to all database servers through a single simple and consistent interface, regardless of the database. The database access mechanism JDBC addresses all of these issues.
The Java Database Connectivity (JDBC) software package provides a framework for accessing and manipulating data stored in a database. It is a standard for programs written in Java that has been through several generations of development and is widely supported by most popular database applications. A major advantage of JDBC is that it provides a stable interface to supported database servers.
JDBC is a Java Application Programming Interface that allows a Java program to access a database server via a JDBC driver. Each supported database server has a JDBC driver which allows the underlying Java program to communicate with that database. The driver is a separate piece of software that must be installed on the system which, in conjunction with a Java Virtual Machine (JVM), and the client Java program, combine to provide access to stored data, regardless of the database in which it is contained.
JDBC has several properties that make it a superior solution for database access. The number of different databases which offer JDBC support make it the most generic mechanism for database support. Furthermore, there are two modes of interacting with the databases once a connection is established. The first mode is an application programming interface, which allows a stable interface for software to reliably interact with the database. The second uses SQL. Full support has been provided in MOE for the Structured Query Language ( SQL), the most prevalent language for users to control and interface with a database. These two modes of interaction provide equivalent capabilities so that both flexibility from a software perspective and familiarity from a user perspective are maintained simultaneously.
The JDBC model offers a practical solution for providing generic relational database access to MOE's built-in programming language SVL (the Scientific Vector Language). With JDBC integrated into the system, it is possible to use SVL calls to access and manipulate data from the most popular database applications available. Moreover, the SVL relational database access functions are designed to provide an almost completely uniform interface to all supported databases.
The JDBC package is used to insulate MOE from the details of interfacing with each different database server. Logically, this is a four-tier software scheme, as shown in the figure below, though functionally the inner two layers are invisible to the user. Provided the necessary software has been installed properly, there is no need to even know that Java or JDBC is actually in use.
For any of the relational database functions to succeed there must, of course, be a database server running and ready to accept connections. There are many relational database applications available, both freely and from vendors, such as DB2 from IBM, the Oracle 9i database from Oracle, Microsoft SQL Server 2000 and MySQL, as well as a host of others that support both SQL and JDBC. The server may or may not be installed on the same machine that is running MOE. Typically, the database server will be installed on another computer and accessed over a network.
Databases organize and store data on computer systems and provide users access to that data. There are many ways to provide this access: from a command line, using a graphical user interface, from a browser or from SVL. Whatever the method, there must be some manner in which to specify and select what data is viewed from the database. One manner of specifying how the database should present data to the user is through the use of SQL.
The Structured Query Language (SQL) is the standard language for communicating with databases. There are other database interface languages, but SQL is the most universally accepted and supported language for accessing and manipulating database data. SQL queries are executed on the database and the results of the query are sent back to the client.
The Structured Query Language can specify operations such as creating database tables, adding and deleting records or changing the value of fields in existing records. Tasks such as indexing a table, creating views, joining tables or acquiring information about the contents of a database table can all be accomplished using SQL.
Examples of SQL query statements for a table called "MolRef" might be:
Dropping a table
A existing table can be removed from a database using the DROP command in SQL.
Creating a table
Tables are newly created in a database by specifying the name of the table along with the names and data types of each of the columns that this table will hold.
Values may be inserted into a database table row by row by specifying the table name and the value of the data to be added.
Data can be selectively read from a table for display. The data that is read from the table can be specified according to certain relations in the values of the data.
The relational database access enhancements to SVL provide a complete set of specialized functions with which to manipulate data on a database. These functions require no knowledge of SQL. A general function exists for executing explicit SQL commands on the database. The actions of the specialized functions can always be duplicated by opening a connection to a database and executing the appropriate SQL commands explicitly. SQL is a powerful language with many features to specify data manipulation in a database, but a basic understanding of even the simple commands shown here is enough to accomplish many tasks related to exchanging data between MOE and a database.
A typical session will involve the following steps:
To illustrate such an exchange in the form of an SVL program, we will consider each step together with some actual SVL code. Initially, a connection to the database server must be established. Then an SQL query is made to the database to create a result set. This result set contains some of the data in the MolRef table which is then displayed as the table is read, row by row.
To help emphasize the new functions, all of which are prefixed with
The various JDBC drivers each have a slightly different set of supported functionalities and behaviors. The documentation available for each driver and experience with the database application will serve to define the complete set of capabilities available to the database functions.
In order to increase MOE's ability to acquire and manipulate data, SVL has been enhanced with functions to directly access a wide variety of relational databases. The Java based JDBC programming interface is used to connect to remote database servers.
To provide standardized access to databases, several layers of software have been employed. A Java Virtual Machine and the appropriate JDBC driver must be installed on the computer in order to enable the new functions. The advantage of this approach is the versatility to connect to a wide range of database servers using the same simple set of functions in SVL. An added benefit of the underlying use of JDBC is its support of SQL. Database users are likely already familiar with SQL as a database interface language so there is a very natural extension of functionality.
With these capabilities, data can be transferred directly from SVL to an external database. Data may be read from a remote database and stored locally in SVL for processing with a few simple commands. After processing, the results could then be committed directly to a new table in the remote database for storage.
The relational database access system provides a powerful mechanism to acquire data in MOE. Accessing data from common database servers provides better flexibility to MOE users. A small set of useful functions have been introduced to provide this functionality while keeping the SVL user interface simple and the database access familiar. These functions enhance MOE's capabilities as a powerful scientific computing platform.