Wavelets
Wavelets were discovered as a mathematical tool in approximation theory for the hierarchical decomposition of functions [3] and first found application in signal processing [4]. Like the fast Fourier transform, the discrete wavelet transform is a fast linear operation. Whereas the Fourier transform is described in terms of sines and cosines, the basis functions of the wavelet transform are 'wavelets.' Like sines, individual wavelet functions have a characteristic frequency; unlike sines, they are localized in space. The discrete wavelet transform operates on a data vector whose length is an integer power of two and returns a numerically different vector of the same length [5].
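To make the "power of two in, same length out" property concrete, here is a minimal sketch using the Haar basis, the simplest wavelet basis. It is not the B-spline wavelet basis used in the rest of this work; the names `haar_dwt` and `haar_idwt` are hypothetical, and the unnormalized averaging/differencing convention is one choice among several.

```python
def haar_dwt(signal):
    """Unnormalized Haar DWT by repeated pairwise averaging and differencing.

    Returns [c0, d0, d1, ..., d_(J-1)]: one overall average followed by the
    detail coefficients from coarsest to finest, same length as the input.
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be an integer power of two")
    coarse = [float(x) for x in signal]
    details = []
    while len(coarse) > 1:
        half = len(coarse) // 2
        avg = [(coarse[2 * i] + coarse[2 * i + 1]) / 2 for i in range(half)]
        dif = [(coarse[2 * i] - coarse[2 * i + 1]) / 2 for i in range(half)]
        details = dif + details          # coarser details end up first
        coarse = avg
    return coarse + details


def haar_idwt(coeffs):
    """Invert haar_dwt: rebuild each finer level as (average + d, average - d)."""
    coarse = [float(coeffs[0])]
    pos = 1
    while pos < len(coeffs):
        dif = coeffs[pos:pos + len(coarse)]
        coarse = [v for c, d in zip(coarse, dif) for v in (c + d, c - d)]
        pos += len(dif)
    return coarse
```

For example, `haar_dwt([9, 7, 3, 5])` yields `[6.0, 2.0, 1.0, -1.0]`, and `haar_idwt` of that result recovers `[9.0, 7.0, 3.0, 5.0]` exactly.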
B-splines and Wavelets
B-splines are a standard curve-fitting primitive provided by most computer graphics systems. Given coordinates of n+3 'control points' and the default cubic polynomial basis functions, a curve with n smoothly joined segments is drawn. Only the 4 control points bracketing each segment have any influence on its shape. The curve generally does not pass through the control points, except by special construction at the end points.
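The segment structure described above can be sketched for uniform cubic B-splines, where segment i blends control points i through i+3 with the standard cubic B-spline weights. This is an illustrative sketch assuming uniform knots; it does not implement the special end-point construction, and `bspline_point` and `bspline_curve` are hypothetical names.

```python
def bspline_point(p0, p1, p2, p3, t):
    """Point at parameter t (0 <= t <= 1) on one uniform cubic B-spline segment."""
    b0 = (1 - t) ** 3 / 6
    b1 = (3 * t**3 - 6 * t**2 + 4) / 6
    b2 = (-3 * t**3 + 3 * t**2 + 3 * t + 1) / 6
    b3 = t**3 / 6
    return tuple(b0 * a + b1 * b + b2 * c + b3 * d
                 for a, b, c, d in zip(p0, p1, p2, p3))


def bspline_curve(control, samples_per_segment=8):
    """Sample a curve whose n + 3 control points define n segments.

    Only the 4 control points bracketing a segment influence it.
    Returns (number_of_segments, list_of_sampled_points).
    """
    n_segments = len(control) - 3
    if n_segments < 1:
        raise ValueError("need at least 4 control points")
    points = []
    for i in range(n_segments):
        for k in range(samples_per_segment):
            t = k / samples_per_segment
            points.append(bspline_point(*control[i:i + 4], t))
    points.append(bspline_point(*control[-4:], 1.0))   # close the last segment
    return n_segments, points
```

With seven control points the sketch reports four segments, matching the n + 3 to n relationship. Equally spaced collinear control points produce a straight line, because the four weights sum to one and reproduce linear functions.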
Following Finkelstein and Salesin [1], our discrete signal is the set of control points $C_j$ (for example, the set of n C-alpha coordinates) defining the B-spline curve. A downsampling filter that creates a lower-resolution version $C_{j-1}$ with n' control points is an n' by n matrix $A_j$: $C_{j-1} = A_j C_j$.
The coordinate detail lost in this downsampling can be captured by another filter, the (n - n') by n matrix $B_j$: $D_{j-1} = B_j C_j$.
Suitably chosen $A_j$ and $B_j$ matrices are called analysis filters; splitting the input $C_j$ into the lower-resolution $C_{j-1}$ and the detail $D_{j-1}$ is called decomposition. The original input can be recovered with another pair of matrices, the synthesis filters $P_j$ and $Q_j$, through reconstruction: $C_j = P_j C_{j-1} + Q_j D_{j-1}$. Applying the decomposition recursively yields the filter bank (Figure 1):
Figure 1. The Filter Bank.
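The analysis and synthesis relations can be checked numerically with a concrete filter pair. The sketch below uses unnormalized Haar filters as a stand-in because their matrices are easy to write down; the B-spline A, B, P and Q matrices of [1] differ and are not reproduced here. Control points are stored one per row, so an n x 3 array holds n three-dimensional points. The name `haar_filters` is hypothetical.

```python
import numpy as np


def haar_filters(n):
    """One level of unnormalized Haar analysis (A, B) and synthesis (P, Q) filters.

    n is the number of points in C_j (must be even). A and B are (n/2) x n,
    so C_(j-1) = A @ C_j and D_(j-1) = B @ C_j; P and Q are n x (n/2).
    """
    if n < 2 or n % 2:
        raise ValueError("n must be a positive even number")
    m = n // 2
    A = np.zeros((m, n))
    B = np.zeros((m, n))
    P = np.zeros((n, m))
    Q = np.zeros((n, m))
    for i in range(m):
        A[i, 2 * i] = A[i, 2 * i + 1] = 0.5           # pairwise average
        B[i, 2 * i], B[i, 2 * i + 1] = 0.5, -0.5       # pairwise half-difference
        P[2 * i, i] = P[2 * i + 1, i] = 1.0            # copy the average back
        Q[2 * i, i], Q[2 * i + 1, i] = 1.0, -1.0       # add or subtract the detail
    return A, B, P, Q


A, B, P, Q = haar_filters(8)
C = np.arange(24, dtype=float).reshape(8, 3)   # 8 control points in 3-D
C_low, D_low = A @ C, B @ C                    # decomposition
C_rec = P @ C_low + Q @ D_low                  # reconstruction
print(np.allclose(C_rec, C))                   # True
```

Here n' = n/2, so A and B are both n' x n (n - n' = n' for this filter), and P A + Q B equals the identity, which is exactly the condition for perfect reconstruction.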
The original signal $C_j$ can then be reconstructed from the sequence $C_0, D_0, D_1, \ldots, D_{j-1}$. This sequence has the same size as the original signal and is known as its wavelet transform. The B-spline curves are generated from the control points and the scaling functions (also called blending or basis functions): $f_j(u) = \Phi_j(u) C_j$.
The basis functions $\Phi_j$ are cubic polynomials that blend the control points into the parametric curve $f_j(u)$. The number of control points, n, and the integer resolution level, j, of $C_j$ are required to satisfy $n = 3 + 2^j$.
The basis functions $\Psi_j$ that are orthogonal to every $\Phi_j$ under the chosen inner product are called wavelets. This orthogonality implies that the coefficients of the wavelet transform carry no redundant information. The linear algebra and the derivation of these wavelets for B-splines have been given, and the B-spline scaling functions and wavelets have been plotted [1]. The subject is also presented in tutorial form [6,7]. The $P_j$ and $Q_j$ matrices given in [1] provide sufficient information to implement the method.
Ribbon Models and the Protein Fold
The ribbon diagram is an excellent way to represent a protein fold: it approximates the path of the backbone and reveals hydrogen-bonding patterns. The protein backbone may be modeled as a B-spline ribbon defined by one point per peptide plane [8]. A 'ribbon space curve' spline is the basis of a variety of styles of ribbon drawings [9]. Two additional points at each terminus start and stop the curve. Since N peptide planes imply N+1 residues, the number of control points is N + 4, i.e., n = 3 + Nres, where Nres is the number of residues. This conveniently sets the number of curve segments equal to the number of residues. Complete recipes, including relevant information gleaned from the Evans & Sutherland and Silicon Graphics manuals, may be found in the description of ribbons 2.0 [10].
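The control-point count can be sketched as follows. Representing each peptide plane by the midpoint of its two Ca atoms, and each terminus by two copies of the terminal Ca, are simplifying assumptions for illustration only; the actual recipe is documented with ribbons [10]. `peptide_control_points` is a hypothetical name.

```python
def peptide_control_points(ca):
    """Control points for a peptide plane based ribbon (simplified sketch).

    ca: list of (x, y, z) Ca coordinates, one per residue.
    Assumptions (not the ribbons recipe): each peptide plane is represented by
    the midpoint of its two Ca atoms, and each terminus contributes two copies
    of its terminal Ca.
    """
    if len(ca) < 2:
        raise ValueError("need at least two residues")
    mids = [tuple((a + b) / 2 for a, b in zip(p, q)) for p, q in zip(ca, ca[1:])]
    first, last = tuple(ca[0]), tuple(ca[-1])
    return [first, first] + mids + [last, last]


ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.0, 3.6, 0.0), (8.8, 3.6, 0.0), (10.0, 7.2, 0.0)]
control = peptide_control_points(ca)
print(len(ca), len(control), len(control) - 3)   # residues, control points, segments
```

Five residues give four peptide-plane points plus four terminal points, i.e. eight control points and five curve segments, one per residue.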
Which curve best specifies a backbone is an open question. At issue are the path to be traced and the underlying order and basis of the defining polynomial. I contend that the cubic B-spline formulation is the minimum needed for computer graphics to capture the feel of the drawings in Richardson's monograph [11]. There are many possible formulations (see the references in the Methods in Enzymology preprint at URL: http://sgce.cbse.uab.edu/ribbons).
Figure 2. Ribbon Model Construction. Peptide plane ribbon (left) vs. Ca ribbon (right). The former uses one control point per peptide plane, the latter one control point per alpha carbon. For each, two additional points are required at each terminus to start and end the B-spline curve. Each four consecutive points define one segment of the curve. A sheet-turn-helix section of the protein ubiquitin [47] is shown. The top portion shows the Ca trace as white balls-and-sticks, the B-spline control points as smaller dark gray balls-and-sticks, and the B-spline curve as a light gray tube. The lower portion shows the Ca trace along with the ribbon drawing colored by residue type. In the peptide plane based curve, each B-spline segment corresponds to one residue; in that style, each segment is split in half, with adjacent halves flanking the Ca.
MRC Analysis of Protein Backbones
Ribbon construction is shown in Figure 2. In standard ribbon drawings the spline curve rarely passes through the C-alpha (Ca) positions: sheets are not pleated and coils are smoothed out as the spline is fit to the peptide planes. The spline can easily be fit through the Ca positions, but the results are not as visually appealing. Cr, the center of the curve segment corresponding to a residue, is the same point as Ca for the Ca-based ribbons of Figure 2. For the peptide plane based ribbons of Figure 2, Cr is systematically displaced, as explained in the figure caption. The Ca-Cr distance is used as a measure of fit during multiresolution analysis.
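For the peptide plane based ribbon, the measure of fit can be sketched by taking Cr to be the point at parameter t = 0.5 of a residue's segment. For a uniform cubic B-spline segment with control points P0..P3, that point is (P0 + 23 P1 + 23 P2 + P3)/48, obtained by evaluating the cubic B-spline weights at t = 0.5. The uniform-knot assumption, the segment-to-residue indexing, and the function names are illustrative assumptions, not the ribbons implementation.

```python
import math


def segment_center(p0, p1, p2, p3):
    """Point at t = 0.5 on a uniform cubic B-spline segment.

    The cubic B-spline weights at t = 0.5 are 1/48, 23/48, 23/48, 1/48.
    """
    return tuple((a + 23 * b + 23 * c + d) / 48 for a, b, c, d in zip(p0, p1, p2, p3))


def ca_cr_distances(ca, control):
    """Ca-Cr distance per residue.

    Assumption: segment i, built from control[i:i + 4], belongs to residue i,
    so there are exactly len(ca) + 3 control points.
    """
    if len(control) != len(ca) + 3:
        raise ValueError("expected len(control) == len(ca) + 3")
    return [math.dist(a, segment_center(*control[i:i + 4])) for i, a in enumerate(ca)]
```

`math.dist` requires Python 3.8 or later. Monitoring these distances while the control points are replaced by lower-resolution versions gives the fit measure as a function of level.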
The multiresolution curve analysis consists of taking the control points defining the ribbon curve and successively calculating and displaying lower-resolution versions. The number of control points, n, and the integer resolution level, j, of $C_j$ are related by $n = 3 + 2^j$, and the curve has $2^j$ segments. The number of segments equals the number of residues in the peptide plane based ribbons, but the number of residues in a protein is generally not an integer power of 2. Experimentation showed that padding the signal with multiple copies of the chain termini had no effect on the shape of the ribbon curve. For example, j = 7 for a 128-residue protein, as $2^7 = 128$. For a 123-residue protein, 5 redundant points must be added to pad the signal to integer level 7. The maximum fractional level is then taken as $j = \log_2 N_{res} = \log_2 123 \approx 6.94$.
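The level bookkeeping in this paragraph can be written down directly. In the sketch below, `mrc_level` (a hypothetical name) returns the smallest integer level j with 2^j >= Nres, the number of padding points, the control-point count n = 3 + 2^j, and the maximum fractional level log2 Nres.

```python
import math


def mrc_level(n_residues):
    """Return (j, padding, n_control, max_fractional_level) for a chain.

    j is the smallest integer with 2**j >= n_residues; the signal is padded
    with (2**j - n_residues) redundant terminal points to reach that level.
    """
    if n_residues < 1:
        raise ValueError("need at least one residue")
    j = (n_residues - 1).bit_length()          # ceil(log2(n_residues)) for n >= 1
    padding = 2 ** j - n_residues
    n_control = 3 + 2 ** j
    return j, padding, n_control, math.log2(n_residues)


j, pad, n_ctrl, frac = mrc_level(123)
print(j, pad, n_ctrl, round(frac, 2))          # 7 5 131 6.94
```

Using `bit_length` for the integer level avoids floating-point rounding at exact powers of two.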
Coworkers were asked to display their favorite protein as a tube ribbon and to interactively adjust the fractional level, stopping at the lowest level that still looked like the structure with which they were familiar. The ribbon was displayed with additional structural cues, e.g., coloring by secondary structure or making the width of the tube a function of secondary structure. For a less subjective test, a representative sample of 230 proteins [12] was analyzed numerically: the distances between Ca and Cr on the original ribbon and on its low-resolution versions were monitored as a function of wavelet resolution level and secondary structure type.
Topological Comparison
Utilitarian uses of the protein backbone are found in the comparison, classification, and construction of proteins. Functional binding domains can be described, aligned, and compared [13]. Proteins can be grouped according to the topology of their folds [11]. The optimal superposition of protein backbones is critical in homology modeling [14]. Complete backbones may be piecewise fit to electron density maps using a database of unrelated backbone conformations [15].
The wavelet formalism is useful for objective comparisons at an adjustable scale, allowing one to focus on supersecondary structures, motifs, domains, or complete proteins as needed. Visual comparisons were made, and manipulation involved only rigid-body superposition of structures. Root-mean-square (rms) deviations were calculated after a least-squares superposition of two coordinate sets; superpositions were based on various sets of Ca and Cr at different resolutions.
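A least-squares superposition followed by an rms deviation can be sketched with the Kabsch method (SVD of the 3 x 3 covariance matrix), a standard technique; the text does not say which superposition algorithm was used, so this choice is an assumption. Inputs are two equally sized lists of matched points, for example Ca or Cr positions at a given resolution. `superposed_rmsd` is a hypothetical name.

```python
import numpy as np


def superposed_rmsd(a, b):
    """RMS deviation after optimally rotating and translating b onto a (Kabsch)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape or a.ndim != 2 or a.shape[1] != 3:
        raise ValueError("expected two matching N x 3 coordinate arrays")
    a0 = a - a.mean(axis=0)                    # center both sets on their centroids
    b0 = b - b.mean(axis=0)
    h = b0.T @ a0                              # 3 x 3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))     # guard against an improper rotation
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T    # optimal proper rotation for b0
    diff = b0 @ r.T - a0
    return float(np.sqrt((diff ** 2).sum() / len(a)))
```

A rigidly moved copy of a structure superposes with an rms deviation of essentially zero, while any internal change leaves a residual.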
Multiresolution Editing
Crystallographers have created software packages that greatly facilitate the interpretation of crystallographic maps (O [16]; XtalView [17]). Commercial packages have integrated the needs of the crystallographer within a more general molecular modeling environment (e.g., InsightII, Biosym Technologies; Quanta, Molecular Simulations, Inc.; Sybyl, Tripos Associates). One significant feature missing from current modeling programs is the ability to easily make very large changes to the protein backbone. The Sculpt [18] prototype tries to address this. The decomposition/reconstruction via wavelets makes this possible by allowing interactive selection of the level of approximation.
The matrix equation for reconstructing the maximum-resolution version of a curve defined by $C_m$ after editing the control points of a lower-resolution version $C_j$ is derived in [1] as: $C_m^{\mathrm{new}} = C_m^{\mathrm{old}} + P_m P_{m-1} \cdots P_{j+1} \, \Delta C_j$.
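Propagating a coarse-level edit to the finest level is a chain of matrix products. The sketch below applies the synthesis matrices in turn to the change in the coarse control points; it assumes the B-spline matrices $P_{j+1}, \ldots, P_m$ from [1] are supplied by the caller (they are not reproduced here), with control points stored one per row. `propagate_edit` is a hypothetical name.

```python
import numpy as np


def propagate_edit(c_fine_old, delta_coarse, p_chain):
    """Return C_m,new = C_m,old + P_m P_(m-1) ... P_(j+1) delta C_j.

    p_chain lists the synthesis matrices from coarse to fine,
    [P_(j+1), ..., P_m]; each P_k maps level k-1 to level k.
    """
    delta = np.asarray(delta_coarse, dtype=float)
    for p in p_chain:                 # apply P_(j+1) first and P_m last
        delta = p @ delta
    c_old = np.asarray(c_fine_old, dtype=float)
    if delta.shape != c_old.shape:
        raise ValueError("P chain does not reach the fine level")
    return c_old + delta
```

Only the edit itself is pushed up through the levels, so the fine-level detail already present in the old control points is preserved.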
Editing is possible at fractional levels, and changes can be made by direct manipulation, with the user tugging on the curve itself instead of on a control point. The mapping from the protein's atomic structure to the ribbon curve is well defined; an inverse mapping is needed to go from an edited curve, as described above, back to the atomic structure. Each residue is treated as a rigid body. The tangent, normal, binormal (TNB) coordinate frame of the original ribbon curve is saved at each Cr. The new frame, TNB', centered at the new Cr, is used to determine the rotation and translation that transform that residue's coordinates. No other constraints are currently applied.
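The per-residue rigid-body update can be sketched by building a rotation from the saved and new frames: if F and F' are 3 x 3 matrices whose columns are the unit T, N, B vectors, then R = F' F^T carries the old frame onto the new one, and each atom x of the residue moves to x' = Cr' + R (x - Cr). How T and N are obtained from the curve (for example, from its derivatives) is not shown, and the function names are hypothetical.

```python
import numpy as np


def tnb_frame(t, n):
    """3 x 3 matrix with orthonormal columns T, N, B built from a tangent and a normal."""
    t = np.asarray(t, dtype=float)
    t = t / np.linalg.norm(t)
    n = np.asarray(n, dtype=float)
    n = n - (n @ t) * t               # make N exactly perpendicular to T
    n = n / np.linalg.norm(n)
    b = np.cross(t, n)                # binormal completes a right-handed frame
    return np.column_stack([t, n, b])


def move_residue(coords, cr_old, frame_old, cr_new, frame_new):
    """Rigidly move one residue's atoms from the old TNB frame to the new one."""
    r = frame_new @ frame_old.T       # rotation taking frame_old onto frame_new
    x = np.asarray(coords, dtype=float) - np.asarray(cr_old, dtype=float)
    return x @ r.T + np.asarray(cr_new, dtype=float)
```

Because every atom of a residue receives the same rotation and translation, bond lengths and angles within the residue are unchanged; only the geometry between residues is altered.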
Molecular Surfaces
The accessible surface of a macromolecule is a significant determinant of its action. The definition [19] and computer implementation [20] of such surfaces have had a profound effect on subsequent calculations and visualizations of molecular form and function. Usually such surfaces are displayed as dots or triangles and may be color-coded to highlight chemical properties [21]. Various types of splines [22,23,24] and spherical harmonics [25] have also been used to model molecular surfaces.
Wavelet analysis of B-splines can be extended to tensor product surfaces [1]. NURBS (Non-Uniform Rational B-Splines) are now common in graphics libraries. A parametric surface, S(u,v), is defined by B-spline functions in the u and v directions. Modeling of the surface of DNA with textured NURBS and the implementation of the program DNurbs have been described [24]. We also discuss how a 'globe' can be collapsed onto a globular molecule to create a set of NURBS (topologically a sphere, with singularities at the poles) that crudely approximates the molecular surface. For exact molecular surfaces, Connolly's Molecular Surface Program (MSP) [26] is used to create a triangulation for display with ribbons.
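A tensor product surface evaluates the same cubic B-spline weights in both parameters: on a 4 x 4 patch of control points P_ik, S(u, v) is the sum over i and k of B_i(u) B_k(v) P_ik. The sketch below assumes uniform knots and unit weights, so it is a polynomial B-spline patch rather than a full NURBS evaluator; `cubic_weights` and `surface_point` are hypothetical names.

```python
def cubic_weights(t):
    """Uniform cubic B-spline blending weights at parameter t in [0, 1]."""
    return ((1 - t) ** 3 / 6,
            (3 * t**3 - 6 * t**2 + 4) / 6,
            (-3 * t**3 + 3 * t**2 + 3 * t + 1) / 6,
            t**3 / 6)


def surface_point(patch, u, v):
    """S(u, v) for a 4 x 4 grid of control points, patch[i][k] = (x, y, z).

    Assumes uniform knots and all rational weights equal to 1.
    """
    wu, wv = cubic_weights(u), cubic_weights(v)
    return tuple(
        sum(wu[i] * wv[k] * patch[i][k][c] for i in range(4) for k in range(4))
        for c in range(3)
    )
```

A planar, evenly spaced control grid reproduces the plane, which is a quick sanity check on the weights.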
Texture maps for the DNurbs surfaces were created by a 64 x 64 sampling of the MSP triangulated surface, colored by either electrostatics or curvature. Texture maps for the globe surfaces were created by a 64 x 64 sampling of the set of atomic spheres of the protein.
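The globe sampling can be sketched as ray casting on a 64 x 64 latitude-longitude grid: for each texel, a ray is cast from outside the molecule toward its centroid, and the index of the first atomic sphere hit is recorded (or -1 for a miss), so colors could then be assigned per atom. This is an assumption about how such a texture could be built, not the Inventor-based procedure used here; `globe_texture` and `first_hit` are hypothetical names.

```python
import math


def first_hit(origin, direction, atoms):
    """Index of the nearest sphere hit by a ray (direction must be unit length), or -1.

    atoms: list of (x, y, z, radius).
    """
    best, best_t = -1, math.inf
    for idx, (x, y, z, r) in enumerate(atoms):
        oc = (origin[0] - x, origin[1] - y, origin[2] - z)
        b = sum(o * d for o, d in zip(oc, direction))
        c = sum(o * o for o in oc) - r * r
        disc = b * b - c                  # discriminant of t^2 + 2bt + c = 0
        if disc < 0:
            continue
        t = -b - math.sqrt(disc)          # nearer of the two intersections
        if 0 <= t < best_t:
            best, best_t = idx, t
    return best


def globe_texture(atoms, size=64):
    """size x size grid of atom indices; rows run from the north to the south pole."""
    cx = sum(a[0] for a in atoms) / len(atoms)
    cy = sum(a[1] for a in atoms) / len(atoms)
    cz = sum(a[2] for a in atoms) / len(atoms)
    # start every ray outside all spheres
    far = max(math.dist((cx, cy, cz), a[:3]) + a[3] for a in atoms) + 1.0
    texture = []
    for row in range(size):
        theta = math.pi * (row + 0.5) / size          # polar angle from +z
        line = []
        for col in range(size):
            phi = 2 * math.pi * (col + 0.5) / size    # longitude
            d = (math.sin(theta) * math.cos(phi),
                 math.sin(theta) * math.sin(phi),
                 math.cos(theta))
            origin = (cx + far * d[0], cy + far * d[1], cz + far * d[2])
            line.append(first_hit(origin, (-d[0], -d[1], -d[2]), atoms))
        texture.append(line)
    return texture
```

Sampling at texel centers (the half-texel offsets) keeps rays away from the exact poles, where the globe parameterization is singular.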
The multiresolution analyses of the spline surfaces above require special surface topologies. A recent paper describes multiresolution analysis of arbitrary meshes [27]. Orthogonal wavelet transforms are proposed [28] for obtaining a unique hierarchical shape description of volumetric data in the field of medical imaging.
Software
All code was developed in C++ under UNIX on an SGI Indy. A set of C++ classes implemented the multiresolution curve (MRC) analysis of B-splines, using the published formulas [1]. These MRC classes were used with the ribbons 3.0 software to create the example images. (More information on the ribbons package can be found through the URL: http://sgce.cbse.uab.edu/ribbons)
The 'ribbon dimension control panel' has a set of Motif widgets for the MRC analysis. The user turns on the analysis to initialize, then uses a slider to adjust the MRC level. A toggle is used to restrict the MRC level to integer values. This all takes place in real time for a small protein on a low-end SGI Indy. The user can also smoothly interpolate to any fractional level in real time.
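Smooth interpolation to a fractional level can be sketched under the assumption, following our reading of the fractional-level curves in [1], that a level j + t curve is a linear interpolation between consecutive integer levels, both expressed at the finer level: C(j+t) = (1 - t) P(j+1) C(j) + t C(j+1) for 0 <= t <= 1. The B-spline P(j+1) must be supplied by the caller; `fractional_level` is a hypothetical name.

```python
import numpy as np


def fractional_level(c_j, c_j1, p_j1, t):
    """Control points of the level j + t curve, expressed at level j + 1.

    c_j:  coarse control points (level j), one point per row
    c_j1: fine control points (level j + 1)
    p_j1: synthesis matrix P_(j+1) mapping level j to level j + 1
    Assumes linear interpolation between the two integer levels.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    coarse_up = p_j1 @ np.asarray(c_j, dtype=float)   # level j curve at level j + 1
    return (1.0 - t) * coarse_up + t * np.asarray(c_j1, dtype=float)
```

Under this scheme a slider value such as 6.5 would correspond to j = 6 and t = 0.5.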
Several utility and prototype programs were developed with Inventor, SGI's object-oriented 3D graphics toolkit. bs-edit uses the direct-manipulation features of the toolkit to edit the curves/protein structure in a multiresolution analysis. mr-nurbs uses the texture-mapping and NURBS features to create the multiresolution parametric surfaces. Texture maps were created by a 64 x 64 sampling of surfaces using the ray-casting objects in Inventor.