Abstract. A method is presented for the automatic assignment of higher order bonds to a molecular graph.
The automatic assignment of bond orders to molecular graphs is of fundamental importance when "cleaning up" large databases of molecules. In this article, we present an algorithm for the assignment of bond orders that uses element, ionization and hybridization information as input. The rationale for this choice of input is that it can be inferred easily from many file formats even in the presence of input errors.
Let G=(V,E) be a graph. As a shorthand, we will use kj to denote the edge {k,j}. Also, if A and B are two sets, then we use A\B to denote the symmetric difference (A-B) + (A-B).
A matching is a subset, M, of edges in E such that if e, f are in M implies that e and f are disjoint (i.e., do not share a vertex). A node x that does not belong to an edge in M is exposed with respect to M. A matching M of largest cardinality is a maximum matching. A perfect matching is a matching in which every node of V is not exposed in M. (A graph has a valid Kekule structure if and only if it has a perfect matching.)
An augmenting path with respect to a matching M is a simple path abc...yz of nodes connected by edges in E such that nodes a and z are exposed, and edges bc, de, ... xy are in M and the other edges of the path are not in M.
The significance of alternating edges to maximum matching is due to the facts (C.H.Papadimitriou, K.Steiglitz, Combinatorial Optimization: Algorithms and Complexity, Prentice-Hall, New Jersey, 1982):
These facts lead to the following greedy algorithm for finding a maximum matching:
Let G=(V,E) be a molecular graph. In G, the nodes represent atoms and the edges represent bonds; that is, if k and j are in V then atoms k and j are bonded iff kj is in E.
The objective is to assign, to each edge, a bond order in the set {1,2,3} under the assumption of full valence for each atom. Initially, each bond is given an order of 0 to indicate that no assignment has been made.
Step 1. This phase of the algorithm assigns bond orders to the obvious cases or the bond orders that should be assigned even for an illegal structure (to localize errors). This is the only phase that emphasizes chemical feasibility; that is, the remaining methods are mathematical and chemically unaware.
Assign a bond order of 1 to all the remaining unassigned bonds of sp2 atoms that have been assigned a bond order of 2 in this step. Assign a bond order of 2 to all the remaining unassigned bonds of sp atoms that have been assigned a bond order of 1 in this step.
Remove from the graph all edges that have been assigned bond orders in this step. What remains in the graph are:
Step 2. At this point in the algorithm, what remains are atoms for which no immediate or forced bond assignments can be made. Since we have no further forced information, we use the maximum matching algorithm described above to determine a maximum matching M in G.
Step 3. Since there are no odd-length sp chains, assign any sp-sp matched bonds a bond order of 3 and other matched bonds a bond order of 2. All remaining bonds are assigned a bond order of 1.