Earlier work succinctly represented protein ribbon models as cubic B-spline curves fit to peptide planes (6), with the peptide bond forming a natural underlying basis. Raster versions and an extension of the method to nucleic acids were later developed (7). There it was suggested that bicubic surface patches might be appropriate for this modeling. Bicubic ribbons, as well as ``shrink-wrapped'' surfaces based on patches, were developed on and shown with the AT&T Pixel Machine, one of the first devices with such patches provided as built-in primitives (Carson, unpublished results, SIGGRAPH, 1988).
A key feature of the programs described above was the use of the spline curves and surfaces supported by the hardware (e.g., Evans and Sutherland, Silicon Graphics) and software libraries (e.g., PEX, OpenGL) of the current generation of graphics workstations commonly used in molecular modeling. These techniques have become standard in the field of computer graphics, and are described in general texts (8) and specialized monographs (9). A very brief explanation is given below.
A non-uniform rational B-spline (NURBS) curve can be formulated in general as follows:
C(t) = ( x(t)/w(t), y(t)/w(t), z(t)/w(t) )
where the curve, C(t), is given by ratios of polynomials in a dimensionless parameter, t, with w(t) as the common denominator. These parametric curves usually have 0.0 <= t <= 1.0. They are specified in modern graphics libraries by a sequence of ``knot'' values and a set of ``control point'' coordinates (8,9). Simple examples are shown in Figure 1.
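As a small illustration of the ratio form of C(t), the following sketch evaluates a rational Bezier curve (a special case of a NURBS curve) by working in homogeneous coordinates and dividing by w(t) at the end. The function name, the quarter-circle control points, and the weights are illustrative choices and are not taken from the programs described here.

```python
import math

def rational_bezier(ctrl, weights, t):
    """Evaluate a rational Bezier curve at parameter t (0 <= t <= 1)."""
    # Lift each weighted control point to homogeneous form (w*x, w*y, w*z, w).
    pts = [(w * x, w * y, w * z, w) for (x, y, z), w in zip(ctrl, weights)]
    # de Casteljau: repeated linear interpolation of neighbouring points.
    while len(pts) > 1:
        pts = [tuple((1.0 - t) * a + t * b for a, b in zip(p, q))
               for p, q in zip(pts, pts[1:])]
    xh, yh, zh, w = pts[0]
    # Project back: C(t) = ( x(t)/w(t), y(t)/w(t), z(t)/w(t) ).
    return (xh / w, yh / w, zh / w)

# Illustrative data: a quarter of the unit circle as a rational quadratic.
QUARTER_CTRL = [(1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
QUARTER_WEIGHTS = [1.0, math.sqrt(0.5), 1.0]
```

With these weights every evaluated point lies on the unit circle, which an ordinary (non-rational) polynomial curve cannot achieve exactly; this is the practical reason for carrying the denominator w(t).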
A cubic polynomial, i.e.,
C(t) = a0 + a1 * t^1 + a2 * t^2 + a3 * t^3
is the simplest type of curve that can be joined piecewise in segments while ensuring continuity in position, tangent, and curvature. Each such curve is specified by giving 4 Cartesian control points and a non-decreasing set of 8 knot values, which determine exactly how the control points influence the shape of the curve. Figure 1 shows the two types of cubic curves used in this paper: the Bezier and the B-spline forms.
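The role of the 8 knot values can be made concrete with the standard Cox-de Boor recursion. In the sketch below, the same 4 control points give a Bezier curve or a uniform B-spline segment depending only on the knot sequence. The two knot vectors are the conventional textbook choices, and the control points are arbitrary; neither is taken from Figure 1.

```python
def basis(i, k, t, knots):
    """Cox-de Boor recursion: B-spline basis function i of degree k at t."""
    if k == 0:
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    value = 0.0
    left = knots[i + k] - knots[i]
    if left > 0.0:  # skip terms with a zero-width knot span
        value += (t - knots[i]) / left * basis(i, k - 1, t, knots)
    right = knots[i + k + 1] - knots[i + 1]
    if right > 0.0:
        value += (knots[i + k + 1] - t) / right * basis(i + 1, k - 1, t, knots)
    return value

def cubic_point(ctrl, knots, t):
    """Point on the cubic curve given by 4 control points and 8 knots.

    Because of the half-open intervals above, use 0 <= t < 1.
    """
    weights = [basis(i, 3, t, knots) for i in range(4)]
    return tuple(sum(w * p[d] for w, p in zip(weights, ctrl))
                 for d in range(len(ctrl[0])))

BEZIER_KNOTS = [0, 0, 0, 0, 1, 1, 1, 1]      # curve touches first/last point
BSPLINE_KNOTS = [-3, -2, -1, 0, 1, 2, 3, 4]  # uniform; curve floats inside
CTRL = [(0.0, 0.0), (1.0, 2.0), (3.0, 2.0), (4.0, 0.0)]  # arbitrary example
```

At t = 0 the Bezier form returns the first control point, while the B-spline form returns the blend (P0 + 4 P1 + P2)/6, which is why B-spline control points generally lie off the curve.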
A NURBS surface is defined by two such curves, Ca(u) and Cb(v), which form the basis functions of a surface, S(u,v). This is topologically a square, with a 2-D array of 4x4 (i.e., 16) control points required to specify the surface in terms of the parameters u and v. The curves Ca and Cb need not be of the same type. Figure 2 illustrates such a surface constructed from the cubic curves shown in Figure 1. Such a surface is known as a bicubic patch. The user only needs to establish the positions of the control points.
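A bicubic patch can be evaluated as a tensor product of the two sets of cubic blending weights. The sketch below uses the closed-form weights for a Bezier curve in u and a uniform B-spline segment in v, to show that the two directions need not be of the same type. All function names and the flat test grid are illustrative, and unit weights (the non-rational case) are assumed.

```python
def bezier_weights(t):
    """Cubic Bernstein weights for 0 <= t <= 1."""
    s = 1.0 - t
    return (s ** 3, 3 * s * s * t, 3 * s * t * t, t ** 3)

def bspline_weights(t):
    """Uniform cubic B-spline segment weights for 0 <= t <= 1."""
    s = 1.0 - t
    return (s ** 3 / 6.0,
            (3 * t ** 3 - 6 * t * t + 4) / 6.0,
            (-3 * t ** 3 + 3 * t * t + 3 * t + 1) / 6.0,
            t ** 3 / 6.0)

def patch_point(ctrl, u, v, u_weights=bezier_weights, v_weights=bspline_weights):
    """S(u,v) for a 4x4 grid ctrl[i][j] of (x, y, z) control points."""
    wu, wv = u_weights(u), v_weights(v)
    return tuple(sum(wu[i] * wv[j] * ctrl[i][j][d]
                     for i in range(4) for j in range(4))
                 for d in range(3))

# Illustrative flat grid: control point (i, j) sits at x = i, y = j, z = 0.
GRID = [[(float(i), float(j), 0.0) for j in range(4)] for i in range(4)]
```

On this grid the Bezier direction spans x from 0 to 3, while the B-spline direction spans y only from 1 to 2, reflecting that a B-spline segment covers just the middle of its control points.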
The classical Watson-Crick structure, B-DNA, provides the canonical model. There are four standard base pairs encountered traversing a single strand starting at the phosphate and proceeding in the 5'--3' direction: GC, CG, AT, and TA. The standard orientation used herein (Figure 3) is defined by looking down the pseudo-dyad of the strands perpendicular to a single base pair with the phosphate at 12 o'clock (positive Y). Circling clockwise from this phosphate in the plane around the molecular surface are the sugar ring, the minor groove of the purine/pyrimidine pair, the sugar/phosphate of the second strand, and the major groove, then back to the phosphate of the first strand.
Figure 3 depicts the data structure employed. Each of the four base pairs has an ordered path of about 250 surface points with their surface normal vectors. The closest atom to each surface point is also maintained. The path, B(u), around the base pair is accessed by the parameter u = 0.00-1.00. Each path is subdivided into 4 sections: u = 0.00 and u = 0.50 are set to the center of the surface points arising from the outermost oxygen atoms of the first and second phosphate groups; u = 0.25 and u = 0.75 are the points at which the bases ``touch'' in the minor and major grooves, respectively. The four quarter sections in the parameter u are of roughly equal length (Table 1).
Table 1. Path Lengths around Base Pairs.
The curve B(u), u=0.00-1.00, circles the surface in the plane of the given base pair as shown in Figure 3. The path lengths (in Angstroms, A) traced around the surface are given for the four subdivisions of the path as described in the text.
pair   0.00-0.25   0.25-0.50   0.50-0.75   0.75-1.00   total
----   ---------   ---------   ---------   ---------   -----
AT       16.33       14.58       14.35       15.38     60.64
TA       14.61       16.05       15.71       14.28     60.65
GC       16.89       14.04       14.23       15.48     60.65
CG       13.68       17.26       15.37       14.37     60.67
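One way to picture the path data structure is as an ordered list of surface points, each carrying its normal and closest atom, plus a u-value assigned so that the four landmarks fall at u = 0.00, 0.25, 0.50 and 0.75. The sketch below is an assumption about how such a parameterization could be built (arc length within each quarter); the record layout, names, and landmark handling are hypothetical and are not the Ribbons++ implementation.

```python
import math
from bisect import bisect_right
from dataclasses import dataclass

@dataclass
class SurfacePoint:          # hypothetical record layout
    xyz: tuple               # position on the molecular surface
    normal: tuple            # surface normal at that position
    atom: str                # name of the closest atom

def parameterize(points, landmarks):
    """Assign u in [0, 1) to each point of a closed, ordered path.

    landmarks holds four increasing indices, starting with 0, for the
    points that should receive u = 0.00, 0.25, 0.50 and 0.75.
    """
    n = len(points)
    bounds = list(landmarks) + [n]        # index n wraps around to point 0
    u = [0.0] * n
    for q in range(4):
        lo, hi = bounds[q], bounds[q + 1]
        s = [0.0]                         # cumulative length in this quarter
        for i in range(lo, hi):
            s.append(s[-1] + math.dist(points[i].xyz, points[(i + 1) % n].xyz))
        for k, i in enumerate(range(lo, hi)):
            u[i] = 0.25 * (q + s[k] / s[-1])
    return u

def index_at(u_values, u):
    """Index of the last path point whose u-value does not exceed u."""
    return bisect_right(u_values, u % 1.0) - 1
```

Because each quarter is scaled separately, the landmarks land exactly on the quarter values even though the four path lengths in Table 1 differ slightly.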
The data encapsulated in Figure 3 were created in an automatic fashion by a series of programs and filtering scripts. An ideal B-DNA structure with the helix axis aligned along Z served as the starting point. Each representative base pair was extracted, with the molecular surface (10) computed at a high dot density ( d = 50/A^2 ). Only dots with surface normals nearly perpendicular to Z were kept (|n_z| < 0.1). The dots were then culled, keeping only those approximately 0.2 A apart. Finally, the paths were smoothed by replacing dot_i with the average of dot_(i-1) and dot_(i+1).
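The three filtering steps can be sketched as follows. This is a minimal illustration that assumes the surface dots have already been computed and ordered along the path (the surface calculation and the ordering are not shown); the function names and the greedy culling rule are assumptions, not the original scripts.

```python
import math

def keep_in_plane(dots, max_nz=0.1):
    """Keep (xyz, normal) pairs whose normal is nearly perpendicular to Z."""
    return [(p, n) for p, n in dots if abs(n[2]) < max_nz]

def cull(points, spacing=0.2):
    """Greedy thinning of an ordered path: keep a point only if it lies at
    least `spacing` from the previously kept point (assumed rule)."""
    kept = [points[0]]
    for p in points[1:]:
        if math.dist(p, kept[-1]) >= spacing:
            kept.append(p)
    return kept

def smooth(points):
    """Replace each point of a closed path by the average of its two
    neighbours, using the unsmoothed positions throughout."""
    n = len(points)
    return [tuple(0.5 * (a + b) for a, b in zip(points[i - 1], points[(i + 1) % n]))
            for i in range(n)]
```

Note that `smooth` treats the path as closed, so the first point is averaged from the last and second points.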
Figure 4 gives examples of approximating the base pair surface path, B(u), with a small number (N) of Bezier curves. First, N points are chosen on B(u). Adjacent points provide the first and last (4th) control points for the Bezier curve segment. The second and third control points are taken along the tangent (perpendicular to the normal) at the end points on B(u). This ensures the next curve segment will join smoothly with the same tangent (i.e., C-1 continuity). The scale of the tangential line may be adjusted to control the curvature of the segments.
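The construction of one Bezier segment from two path points and their normals can be sketched in the plane of the base pair as follows. The direction of travel (which way the normal is rotated to obtain the tangent) and the single shared scale are assumptions made for the illustration; the names are hypothetical.

```python
import math

def tangent(normal, ccw=True):
    """In-plane tangent: the normal rotated by 90 degrees.

    ccw=True suits a counter-clockwise path with outward normals;
    pass ccw=False for a clockwise path (assumed convention).
    """
    nx, ny = normal
    return (-ny, nx) if ccw else (ny, -nx)

def bezier_from_ends(p0, n0, p3, n3, scale, ccw=True):
    """Four control points: the end points plus two points placed along
    the end tangents, `scale` away from each end."""
    t0, t3 = tangent(n0, ccw), tangent(n3, ccw)
    p1 = (p0[0] + scale * t0[0], p0[1] + scale * t0[1])
    p2 = (p3[0] - scale * t3[0], p3[1] - scale * t3[1])
    return [p0, p1, p2, p3]

def bezier_point(ctrl, t):
    """Evaluate the cubic Bezier segment at 0 <= t <= 1."""
    s = 1.0 - t
    w = (s ** 3, 3 * s * s * t, 3 * s * t * t, t ** 3)
    return tuple(sum(wi * p[d] for wi, p in zip(w, ctrl)) for d in range(2))
```

As a check on the effect of the scale, a quarter of the unit circle with scale 4/3 tan(pi/8), about 0.552, gives a segment whose midpoint lies on the circle; a smaller scale flattens the segment and a larger one bulges it.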
Figure 5 gives an example of approximating B(u) with N cubic B-spline curve segments. Here a non-uniform scale factor (S_x,S_y,S_z = 1.25,1.5,1) is applied about the center of B(u) to enlarge the loop in the xy plane. (The control points must lie outside the actual base pair surface, as B-spline curves do not in general pass through their control points. A similar hack is used with a 1.5 scale factor for the helices in B-spline protein ribbons (6).) Next, N u-values are chosen. Every 4 consecutive points, taken cyclically, define a B-spline curve segment. (If 6 control points, u_0...u_5, are chosen, the six curves are formed from (u_0, u_1, u_2, u_3), (u_1, u_2, u_3, u_4), (u_2, u_3, u_4, u_5), (u_3, u_4, u_5, u_0), (u_4, u_5, u_0, u_1), and (u_5, u_0, u_1, u_2).) Each segment joins smoothly not only in tangent, but also in curvature (C-2 continuity).
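The cyclic grouping of control points and the scaling step can be sketched as follows. Taking the center of B(u) to be the centroid of the chosen points is an assumption, as are the function names.

```python
def cyclic_windows(n, width=4):
    """Index groups for a closed B-spline: every `width` consecutive
    control points, wrapping around the end of the loop."""
    return [tuple((i + k) % n for k in range(width)) for i in range(n)]

def scale_about_center(points, factors=(1.25, 1.5, 1.0)):
    """Scale (x, y, z) points about their centroid by per-axis factors."""
    n = len(points)
    center = [sum(p[d] for p in points) / n for d in range(3)]
    return [tuple(center[d] + factors[d] * (p[d] - center[d]) for d in range(3))
            for p in points]
```

For N = 6, `cyclic_windows(6)` yields the six index groups (0,1,2,3), (1,2,3,4), (2,3,4,5), (3,4,5,0), (4,5,0,1) and (5,0,1,2), so a closed loop of N control points produces N curve segments.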
The approximated base path spline curves specified by the same N u-values and scale factor for each of the four base pair types are similar. Figure 5 illustrates this for the B-spline case, and the same is true for the Bezier case. Thus only the N u-values and a scale factor are required to define the DNurbs geometry. The program currently sets a limit of N_max = 20, since twenty atoms at most contribute to the surface.
The desired result is a surface patch roughly centered on the atomic surface for each base pair. Thus control points are required between the planes of the neighbouring stacked base pairs. (Recall that an array of 4x4 control points is needed to specify a bicubic patch.)
For B-spline DNurbs, the points at each of the N corresponding u-values on adjacent base pairs are averaged, forming a ``plane'' of points between the base pairs that is roughly perpendicular to the helix axis. Every four successive planes provide the control points in the ``v'' direction running along the length of the helix. For a particular base pair, i, control points are thus derived from the five pairs i-2 through i+2. The ends of the molecule therefore require special attention: two extra copies of the points of the first and last base pairs are added.
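One reading of this averaging scheme is sketched below: each base pair contributes a ring of N points, the list of rings is padded with two extra copies of the first and last ring, neighbouring rings are averaged point by point, and base pair i takes four successive averaged planes. The indexing is an interpretation of the description, and the names are hypothetical.

```python
def between_planes(rings, pad=2):
    """Average corresponding points of neighbouring rings.

    rings[i] is the list of N (x, y, z) points for base pair i. The first
    and last rings are repeated `pad` extra times, so M rings give M + 3
    averaged planes when pad = 2.
    """
    padded = [rings[0]] * pad + list(rings) + [rings[-1]] * pad
    return [[tuple(0.5 * (a + b) for a, b in zip(p, q))
             for p, q in zip(lower, upper)]
            for lower, upper in zip(padded, padded[1:])]

def patch_rows(planes, i):
    """The four planes giving the v-direction control points of base pair i;
    together they draw on base pairs i-2 through i+2."""
    return planes[i:i + 4]
```

For three stacked pairs at heights 0, 1 and 2, the averaged planes sit at heights 0, 0, 0.5, 1.5, 2, 2, and the middle pair uses 0, 0.5, 1.5, 2, so its B-spline patch spans roughly 0.58 to 1.42 and is centered on the pair itself.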
The B-spline curves in the v-direction are themselves helices. An additional scale factor to expand the control points is needed (see the previous discussion concerning Figure 5). For the u,v = B-spline,B-spline case, the overall non-uniform x,y scale for the B(u) curves is set to 1.5,1.8. For the Bezier,B-spline case, a scale of 1.25,1.4 is used.
The prototype version of the software was developed with IRIS Inventor, a 3-D object-oriented graphics toolkit supplied by Silicon Graphics. This C++ programming environment provides a wide variety of geometric primitives and methods for rendering them, including such advanced features as true transparency and texture mapping. Figures 1-5 were created by auxiliary programs using Inventor with its PostScript output.
The coding was done in C++ on a low-end Indigo machine. The style is basically ``C'' with small portions done in true object-oriented fashion (e.g., B(u) is implemented as a class). This is part of the Ribbons++ software, an extension of the Ribbons 2.0 program (11) currently under development. The prototype was shown in a poster session at the 1993 Molecular Graphics Society annual meeting.
The auxiliary programs are run to determine optimal placement of control points and scale factors. The main program reads two PDB files, each containing a complementary strand of DNA. It then reads N, the number of patches per base pair, a key specifying whether Bezier or B-splines are to be used, and a scale factor to control the placement of control points. Next the N u-values which position the control points are read, as well as integer keys to the color and texture to be applied to each patch of the four types of base pairs. The base pairs are processed, with the ideal B-DNA coordinates of the appropriate B(u) fit in a least-squares sense to the actual PDB coordinates. The resultant geometry can be exchanged with other SGI Inventor and Explorer applications. Work on a more integrated control point, color, and texture editor is in progress.