SVL: The Scientific Vector Language


M. Santavy, P. Labute
Chemical Computing Group Inc.


Contents

Introduction | The Data Parallel Trend | Existing Languages
The Flavor of SVL | Integration in MOE
An Example SVL Program


1. Introduction

SVL is a new high-performance data-parallel programming language built into the MOE Molecular Operating Environment. SVL is an embedded language; that is, its compiler and run-time environment are an integral part of MOE. SVL serves as the command, macro, scripting, and high-performance computing language of MOE.

The tight integration of SVL and the molecular computing services of MOE result in a unique, high-performance and flexible molecular operating environment that minimizes the time taken to convert ideas into results.

SVL was designed

SVL was designed to be the vehicle for the exploitation of both sequential and parallel computing platforms. The future of industrial scientific computing lies in the efficient use of the entire spectrum of computer platforms from low end Pentium Pro platforms through shared memory multiprocessors to high-end vector pipeline and massively parallel architectures.

In this document, we will outline the ideas behind SVL and data-parallelism as a programming paradigm. We present some of the reasoning behind the decision to design and implement a new programming language. Finally, we attempt to give the flavor of SVL (although a thorough description is beyond the scope of this document).


2. The Data-Parallel Trend

A language is called collection-oriented if aggregate data structures (e.g., sets, sequences, arrays, vectors and lists) as well operations for manipulating them "as a whole" (summing, sorting, permuting and applying a function to each member) are primitives in the language.

Collection-oriented languages discourage explicit inner loops. For this reason they are, in most cases, ideally suited for implementation on massively parallel machines. The parallelism inherent in the primitive operations removes the need for sophisticated compiler analysis to uncover available parallelism.

Indeed, the designers of high-level languages for massively parallel machines have turned to collection-oriented languages because of the difficulty in automatically exploiting parallelism. Languages such as C*, *LISP, CM-LISP (Connection Machine), AL and Apply (Warp), Parallel Pascal (MPP) and the array extensions to Fortran and High Performance Fortran (Cray) all exploit collection-oriented primitives and constructs.

SVL is a collection oriented language; more precisely, it is a data-parallel language. A data-parallel language has the following important characteristics:

The trend towards data-parallelism is due to a number of key observations and wide-spread experiences including

The result has been a wealth of research (e.g., at CMU and MIT) into data parallel languages. Many language groups have shown that the paradigm is extremely successful in that it allows for efficient implementations on a wide variety of computer architectures, even sequential processors.

Data parallel languages in general, and SVL in particular, have five important strengths:


3. Existing Languages

The decision to design a new programming language was not taken lightly. There are, perhaps, a dozen credible programming languages that, on the surface, appear to satisfy the requirements. However, after careful study, the conclusion was reached that a new language was necessary. In this section, some of the reasoning leading to this conclusion will be described.

Existing candidate languages for scientific computing can be divided into two broad classes. (Please note, however, that this classification is not strict in the sense that most languages contain features from many classes.)

Imperative languages are, loosely speaking, the traditional languages and the more recent object oriented languages. Examples of languages in this class are C, Fortran, Pascal, Modula, C++, Ada, SML, and APL.

Imperative languages like C and Fortran can deliver high performance and can be used for large scale problems. Unfortunately, they are not interactive, parallelize poorly, and are not concise. The goals of High Performance Fortran are a step in the right direction; however, the result is Fortran with a few vector constructs and not a fundamental change. Object oriented languages like C++ and SML can at times be concise; however, they still suffer from the drawbacks of the other imperative languages.

It was not the imperative nature of these languages that made them poor candidates for embedding into MOE. Rather, it was the fact that they were low level languages; that is, paradigms in which the programmer must be concerned with things like memory allocation and management, as well as all looping and iteration control. Furthermore, this low level focus makes programs written in these languages difficult to parallelize (e.g., the efforts toward automatic vectorization of Fortran programs have led to the conclusion that a new language, High Performance Fortran, was required).

Functional languages are modern languages (in some cases, still experimental) that attempt to provide powerful new features and constructs by sacrificing the imperative notion of state. In functional languages, side effects are banned (e.g. global data structures or COMMON blocks). A program becomes, in effect, one large mathematical expression or function. A consequence of this is that there are no traditional looping constructs and recursion must be used (although compilers can convert some recursive algorithms into loops). In exchange for these sacrifices, functional languages offer powerful notational syntax, elegant and concise programs and a strong mathematical flavor. Examples of languages in this class are Haskell, Lisp, NESL, SETL, and ML.

A recent survey (Zorn 1995) of the use of nonprocedural languages for numerical computing revealed that functional programs can be elegant, concise and very expressive for mathematical algorithms; however, existing functional language implementations have a number of severe problems that make them (currently) unsuitable for industrial application:

The latter two problems are inherent to the functional paradigm; however, the first is not. Indeed, languages such as NESL attempt to solve the first problem; however, they are still in the experimental stage and are therefore more suited to language research than industrial application.


4. The Flavor of SVL

SVL is both a compiled and interactive language. In this way SVL can serve as a command language and as a forum for the development of complex scientific applications. The interactive component of the language consists of the evaluation of expressions that make up the core of applications programs. Flow control constructs and function definitions form the remainder of the language.

The unit of data in SVL is the vector. Vectors are manipulated with functions that operate on a vector and return a result vector. Syntactic shorthands are provided for common functions such as addition and multiplication. A statement of data manipulation is an expression. Expressions and flow constructs can be grouped into functions.

SVL is a very rich language. A full description is beyond the scope of this document; however, the flavor of SVL will be described below.

An SVL vector is a quantity very much like the array that is found in many programming languages. The general ordered vector x has the form

[x(1),...,x(n)]

Each vector x(i) is called an element of x and n is the length of x. Vectors can nest to arbitrary depths since each of the elements of x is, itself, a vector. If n=0 then x is a null vector while if n=1 then then x is called a unit.

All vectors are constructed from the set of atomic unit vectors. This set includes all

Atomic unit vectors have the unique property that they are equal in value to their first element. It is this property that eliminates the need to distinguish between scalars and vectors.

SVL supports the traditional infix syntax for mathematical operators; for example,

are all legal SVL expressions. Operators proceed element-wise, pairing an element from one operand with the corresponding element in the other operand. In the case where a unit vector operates with another vector, the unit vector automatically extends to the length. This principle of unit extension provides scalar-vector multiplication, for example. Unit extension is not limited to atomic units. To form an outer product one uses unit extension as follows:

The vector on the right is a unit; it consists of a single element, a 3-vector. It is extended to each element of the vector on the left. The resulting vector is

SVL provides a rich set of built-in functions such as trigonometric functions, random number generators, sampling, sorting and permutation functions as well as mathematical reduction functions. For example, the add function sums all of the elements in a vector. The dot product of two vectors x and y is computed with add(x*y). while the 2-norm of a vector is computed with

which almost reads as "the square root of the sum of the squares". The power of the unit extension principle and the natural syntax are extremely expressive. For example, if u = [x,y,z] is a vector containing three length n-vectors of coordinates, then the norm of each of the n 3-vectors [x(i),y(i),z(i)] can be computed in parallel with the code fragment sqrt add sqr u. Note the similarity with the code fragment for the 2-norm of a simple n-vector.

The familiar Velocity Verlet integration step (used in molecular dynamics simulations) can be coded with

which operates on all atoms at the same time. Note that this is practically the theoretical description of the algorithm. The above fragment of SVL code, together with a few lines of code for initialization and flow control form a complete dynamics simulation program.


5. Integration in MOE

SVL is an integral part of MOE. It does not consist of a separate compiler and execution model but rather a tightly coupled command and control paradigm.

Specific SVL functions are provided to modify and examine the molecular data structures used for visualization and calculation. SVL can control such things as how molecules are rendered, the geometry and properties of atoms, and configuration of the potential energy model.

SVL is the key to the programmability of the MOE interface in that menu commands trigger the execution of SVL expressions. Any program written in SVL can be run by making an attachment to the MOE menus. Interaction and dialog with users is supported with an automatic graphical interface constructor. Thus any SVL program can have a graphical user interface.

SVL is interactive and, indeed, is the command language of MOE. All functions available in programs are available at the MOE command line. New commands can be tested interactively and, when ready, introduced to the menus. The high level of integration of SVL and the molecular modeling services of MOE result in an extremely flexible and efficient environment for computational chemistry.

With traditional programming languages and script-based tool control, many chemists are discouraged from implementing new techniques. In many cases, a simple idea can only be tested after days or months of programming. With MOE and SVL, new ideas and methods can be tried out in a matter of hours.


6. An Example SVL Program

MOE is built on the embedded language architecture; that is, MOE contains a compiler, linker and run-time execution environment for SVL, the Scientific Vector Language. With SVL, programs can be developed and run from within MOE.

We now present an example of what an SVL program looks like. The context of this example is Conformational Search; more specifically, a modified version of the Random Incremental Pulse Search (RIPS) as described in

Given that a molecule has been loaded into the MOE system, the RIPS methodology proceeds as follows. (Each step will be described below.)

  1. Perturb conformation.
  2. Minimize Energy.
  3. Check for duplicate conformation.
  4. If new conformation then write to database.
  5. If maximum iterations not exceeded go to Step 1; else stop.

The program will be developed in the following sections and a program listing will appear at the bottom of the page. From the outline described above the methodology requires the services of the internal molecular objects, the forcefield, energy minimization and structural comparison and molecular database services. We structure the program as follows.

The #set line is optional and sets the title of the program module. This title will appear the graphical module management window of MOE. The global keyword indicates that the function to be defined will be visible at the MOE command line. (The local keyword indicates that the function to be defined is private to the program module.)

The semantics of the function definition is as follows. RIPS accepts one argument, called options. This argument may take on many values depending on what the caller passes in; however, the intent is that the caller will use a calling sequence of the form

to invoke the search. In this hypothetical calling sequence, the caller wants to perform 20 iterations and store unique conformers to the database file my_database.mdb. Also, we have assumed that the parameter names gtest and file are meaningful to RIPS.


Perturb the Conformation

Each attempt to generate a new conformation consists of a conformation perturbation followed by energy minimization. The described methodology uses a uniform random Cartesian coordinate perturbation.

Notes:


Minimize Energy

Energy minimization is performed with a call to the Molecular Mechanics services of (a piece of SVL code defined in another module). In MOE, these services are accessed with the MOE function (that must be declared as external).

Like the RIPS function that is being written, MM accepts a vector of tuning parameters. In this case, we are specifying an RMS gradient test of 0.001.

We will now fill in some of the RIPS function body (although we will modify it shortly). The first version will perform a specific number of perturbations and energy minimizations.

Notes:


Check for Duplicate Conformation

In order to produce unique conformations two things will have to be done. Firstly, all of the unique conformations generated will have to be kept somewhere. Secondly, as each new conformation is generated it will have to be compared to the known ones.

We modify the RIPS function to keep all of the unique conformers in a single vector as follows.

Notes: The new_conf function can be coded as follows: Notes: As given, the new_conf function looks very traditional: an explicit loop is performed over each element in the list vector. A more advanced version of the same function would look like the following. Notes:


Write Conformation to Database

Saving the conformations is accomplished by creating a MOE molecular database file (which can be visualized with the Database Viewer). We encapsulate the writing of the conformations in the following function.

Notes: We have deviated slightly from the original program outline in that we are writing the conformers to a database just before termination. This was done for clarity; however, it would be a simple matter to write each new conformation out as it is generated.


Program Listing

Putting all of the code fragments together, we end up with the following SVL program (roughly 100 lines).