The molecular replacement models had reasonable R factors and geometries. However, about 10% of the mainchain was grossly in error and almost all of these errors were located in the active site and substrate binding regions of Factor D. These are precisely the residues that must be known accurately to understand the structure/function relationship in this enzyme. A tool to assess the quality of each residue is needed.
A protocol is presented that locates almost all of these grossly misfit residues, while falsely identifying a minimal number of well-fit residues for our test case. Variants of this protocol have been successfully used in this laboratory for numerous crystal refinements. Here we attempt to formalize the protocol, allowing the crystallographer to assess the confidence in an atomic model on a per residue basis with a single error criterion.
Our previous work on identifying and illustrating features and potential problems in protein structure (Carson and Bugg, 1988) used essentially the same types of criteria as described here. Ribbon drawings were color-coded by residue B-factors, degree of fit to density, or a mapping to the Ramachandran plot. Sidechains not matching the rotamer library and color-coding of the atomic structure by potential energy were shown. These points are discussed further in the description of the graphics program Ribbons 2.0 (Carson, 1992). A new feature is the assignment of a single pseudo-energy value to represent the phi/psi or the chi1/chi2 dihedral angle pair, based on the observed distribution in the PDB.
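One way to picture the single pseudo-energy value is as the negative logarithm of the observed frequency of a dihedral pair. The sketch below is a minimal illustration under stated assumptions, not the Ribbons implementation: the function name `pseudo_energy`, the 10-degree bins, the pseudocount, and the example counts are all hypothetical, and the source does not state which functional form is actually used.

```python
import math

def pseudo_energy(counts, phi, psi, bin_size=10.0, pseudocount=1.0):
    """Return -ln(p) for the phi/psi bin containing (phi, psi).

    counts: dict mapping (i, j) bin indices to the number of residues
            observed in that bin (hypothetical data, e.g. tallied from the PDB).
    Angles are in degrees in the range [-180, 180].
    The pseudocount keeps empty bins from producing log(0).
    """
    n_bins = int(360 / bin_size)
    i = int((phi + 180.0) // bin_size) % n_bins
    j = int((psi + 180.0) // bin_size) % n_bins
    total = sum(counts.values()) + pseudocount * n_bins * n_bins
    p = (counts.get((i, j), 0) + pseudocount) / total
    return -math.log(p)

# Hypothetical counts: one heavily populated bin and one sparse bin.
counts = {(12, 13): 900, (11, 31): 100}
e_common = pseudo_energy(counts, -60.0, -45.0)   # falls in bin (12, 13)
e_rare = pseudo_energy(counts, 60.0, -120.0)     # falls in an empty bin
```

A frequently observed conformation receives a low value and a rarely observed one a high value, so the same convention (higher is worse) can be shared with the other criteria. The same scheme would apply to chi1/chi2 pairs with a separate table of counts.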
These methods were recently used in the refinement of the crystal structure of peroxynitrite-modified superoxide dismutase (SOD) by Smith et al (1992). Here the PDB structure ``2SOD'' (Tainer et al, 1982) was used for a trivial molecular replacement solution. The standard SA protocol refined this model to an R-factor of 0.216 to 2.5 A. Our criteria identified 43 potential problems in the 151 residues; 34 residues were manually adjusted to the maps. The subsequent SA step reduced the R-factor to 0.190 and led to improvement in the criteria. The rms atomic shift from the original PDB structure was 1.22 A for all atoms and 0.73 A for C-alpha atoms.
The protocol detects errors in partially refined crystallographic structures. A variety of standard tools and methods developed largely by others in the field are employed. These tools, i.e., X-PLOR and FRODO, are accessible to the majority of macromolecular crystallographers. Assorted utilities of the Ribbons package are also required (freely available through ftp).
The protocol requires coordinates of the latest refined structural model (with individual B-factors) in PDB format, as well as the same structure from the previous round of model building or refinement. Suitable electron density maps in the FRODO format must be present (these may be made with X-PLOR). X-PLOR scripts are used to determine average B-factors, average shifts between the two models, and average geometric strain energy for the mainchain and sidechain atoms on a per residue basis. Ribbons utilities compute the real-space fit residual and dihedral angle probabilities for each residue. The five criteria are then converted to standard deviations and averaged, giving a single goodness of fit value for the mainchain and sidechain atoms in each residue.
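The final step of combining the criteria can be sketched as follows. This is a minimal illustration, not the actual X-PLOR scripts or Ribbons utilities: the function names and the numerical values are hypothetical, and it assumes each criterion has already been expressed so that a larger value indicates a worse residue (for example, the dihedral probability converted to a pseudo-energy).

```python
from statistics import mean, pstdev

def z_scores(values):
    """Express each value as standard deviations from the mean of its list."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0.0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

def combined_error(criteria):
    """Average the z-scores of all criteria for each residue.

    criteria: dict mapping a criterion name to a list of per-residue values,
              all lists in the same residue order (higher = worse).
    """
    columns = [z_scores(vals) for vals in criteria.values()]
    return [mean(per_residue) for per_residue in zip(*columns)]

# Hypothetical values for four residues; the third is badly fit.
criteria = {
    "b_factor": [12.0, 15.0, 40.0, 14.0],
    "shift":    [0.1, 0.2, 1.5, 0.1],
    "strain":   [1.0, 1.2, 6.0, 0.9],
    "rsr":      [0.10, 0.12, 0.45, 0.11],
    "dihedral": [0.5, 0.4, 3.0, 0.6],
}
scores = combined_error(criteria)
worst = max(range(len(scores)), key=scores.__getitem__)
```

In practice the same calculation would be run twice per residue, once over mainchain values and once over sidechain values. Residues whose averaged score exceeds a chosen threshold would be flagged for inspection.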
The recent implementation of the real-space fit as part of the Ribbons package has produced some interesting results. Here, the user is expected to produce both a 2Fo-Fc and an Fc map in units of electrons/A**3. The summation is then performed over all grid points within 2.2 A of any atom of interest. The implementation in O inputs only the first map, and calculates the model map on the fly by adding atomic gaussian densities (with a uniform temperature factor) to the grid.
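The summation over grid points can be sketched as below. This is an illustration only and not the Ribbons code: it assumes the commonly used form of the real-space residual, the sum of |rho_obs - rho_calc| divided by the sum of |rho_obs + rho_calc|, and the function name, the flat lists of grid points, and the example densities are hypothetical.

```python
def real_space_residual(grid_points, rho_obs, rho_calc, atoms, radius=2.2):
    """Real-space residual over grid points within `radius` (A) of any atom.

    grid_points: list of (x, y, z) positions in Angstroms
    rho_obs:     2Fo-Fc map values at those points (electrons/A**3)
    rho_calc:    Fc map values at those points (electrons/A**3)
    atoms:       list of (x, y, z) positions of the atoms of interest
    """
    r2 = radius * radius
    numerator = 0.0
    denominator = 0.0
    for point, obs, calc in zip(grid_points, rho_obs, rho_calc):
        near = any(
            sum((p - a) ** 2 for p, a in zip(point, atom)) <= r2
            for atom in atoms
        )
        if near:
            numerator += abs(obs - calc)
            denominator += abs(obs + calc)
    return numerator / denominator if denominator > 0.0 else float("nan")

# Hypothetical example: the third grid point lies outside the 2.2 A cutoff.
grid = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
atoms = [(0.0, 0.0, 0.0)]
perfect = real_space_residual(grid, [1.0, 0.5, 9.0], [1.0, 0.5, 0.0], atoms)
partial = real_space_residual(grid, [1.0, 0.5, 9.0], [0.5, 0.5, 0.0], atoms)
```

Both maps must be on the same grid and the same absolute scale for the residual to be meaningful, which is why the maps are requested in units of electrons/A**3.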
The correlations of rsr with rms error now approach 0.7 for both mainchain and sidechain with the new implementation, significantly higher than before. However, the correlation of rsr to B-factor has also increased significantly, as might be expected given the inclusion of individual B-factors in the map calculations. This may influence the independence of these two variables in the final error function, but these two are the most correlated with error. The new rsr implementation in effect gives B-factor a higher weight. One could always calculate the Fc map with an overall temperature factor to avoid this bias.
Sevcik et al (1993) have proposed the ``discriminator'' for each atom as its temperature factor divided by its electron density in the final 2Fo-Fc map, or B(A**2)/e(1/A**3). They monitor this function on a per residue basis to assess quality. Their method is consistent with the results presented here. A referee offered the gut opinion that it all comes down to B-factors and difference maps. This is basically true. The rsr described above is the best single criterion, but it is not as powerful in identifying errors as is the full protocol.
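For comparison, the B-factor to density ratio of Sevcik et al can be sketched on a per residue basis as follows. This is a hedged illustration rather than their implementation: the source does not say how atoms are combined within a residue, so a simple mean is assumed here, atoms with non-positive density are skipped as a further assumption, and the function name and data are hypothetical.

```python
from collections import defaultdict

def residue_discriminator(atoms):
    """Mean of B / density over the atoms of each residue.

    atoms: iterable of (residue_id, b_factor, density), with the B-factor in
           A**2 and the 2Fo-Fc density at the atom centre in electrons/A**3.
    Atoms with non-positive density are skipped (an assumption of this sketch).
    """
    per_residue = defaultdict(list)
    for residue_id, b_factor, density in atoms:
        if density > 0.0:
            per_residue[residue_id].append(b_factor / density)
    return {res: sum(vals) / len(vals) for res, vals in per_residue.items()}

# Hypothetical atoms: residue 2 has a high B-factor in weak density.
example = [(1, 10.0, 2.0), (1, 20.0, 2.0), (2, 40.0, 0.5)]
ratios = residue_discriminator(example)
```

A large ratio marks a residue with high temperature factors sitting in weak density, which is the same situation the B-factor and rsr criteria penalize.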
We plan to accumulate partially refined models from a variety of crystal structures, analyze them by the methods presented, and monitor the deviations from the final result. This will provide the data for a neural network program to ``learn'' to recognize errors in macromolecular crystal structure.
We have shown here that the incorrect residues can in large part be identified. Figure~9 of the accompanying paper shows the FDIBX molecular replacement model with the FDB crystal structure and the computed OMITMAPS based only on the model and the native diffraction data. It shows weak disconnected density for the current model, and a parallel stretch of similar density several angstroms away where the atoms should be. Knowing that the model is grossly in error at that point would provide the impetus to make major changes in these residues. The unanswered question is whether the correct structure could have been attained without resorting to MIR methods and many rounds of manual refitting on graphics. The Ribbons++ program under development will seek to provide answers.
The Ribbons utilities are freely available via anonymous ftp to xtal.cmc.uab.edu. The code is written in the C language and produces ASCII and PostScript output. UNIX versions for Silicon Graphics and Evans & Sutherland workstations as well as a VAX/VMS version are available. A UNIX version of Bhat's OMITMAP program is also available (see Carson, 1991). The graphics display program Ribbons 2.5 is available at nominal charge to academics (contact carson@luna.cmc.uab.edu).
We gratefully acknowledge NASA grant NAGW-813 and Public Health Service grant AI32949 for support.