Methods

Homology Modeling

Homology modeling was used to create a full-atom model of Factor D based on previously determined protein structures. The Brookhaven Protein Data Bank (PDB) (Bernstein et al, 1977) provided the data.

Ad Hoc Modeling

This methodology was inspired by a talk given by Greer (1988).

Sequence Homology

Sequence homology searches were performed with the program package IDEAS (Integrated Database and Extended Analysis System for Nucleic Acids and Proteins) running on the Alabama Supercomputer Network CRAY X-MP, employing the FASTA algorithm (Pearson and Lipman, 1988). A custom database was constructed with all X-ray structures in the January 1990 release of the PDB. The default parameters recommended for alignment were used.

Model Building

Structures were superimposed manually employing the program PSFRODO (Pflugrath et al, 1984). Serine proteases are predominantly beta-sheet structures, with two small alpha-helices and four conserved disulphide bonds. One C-alpha tracing served as the reference; the others were fitted visually in stereo to best align the highly conserved secondary structural features.

The model was created from a related sequence, which was changed with appropriate insertions and deletions using options in FRODO (Jones, 1978). Missing atoms were added with the REFINE option (Hermans and McQueen, 1974) and a dictionary modified to use the sidechain rotamer conformation occurring most frequently in highly refined protein structures (Ponder and Richards, 1987).

This crude model was modified on a Silicon Graphics IRIS workstation using the program Atom. Atom (Alabama TOM) is a local variant of TOM (Cambillau and Horjales, 1987), itself a variant of FRODO. A customized user interface with pop-up menus allows selection of the most probable sidechain and mainchain conformations. Pointing and clicking on a phi/psi plot invokes refinement to a particular mainchain conformation.

Sidechain conformations were not modified in the case of an exact match with the template structure (about 35% of the residues). The remaining sidechain conformations were selected to best mimic the conformation in one of the overlaid structures. For example, if a Phe of the template became a Leu in Factor D, the Leu rotamer which best overlapped its CD atoms with those in the Phe ring was selected. In ambiguous cases, selections were based on well-known principles of protein structure: form hydrogen bonds if possible, place polar groups outside and hydrophobic groups inside, and best fill space.
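The selection described above was made interactively on the graphics display. As a rough numerical stand-in, the idea of choosing the rotamer that best overlaps a template sidechain can be sketched as follows. The scoring function (mean distance from each rotamer atom to its nearest template atom) and the names `overlap_score` and `best_rotamer` are assumptions made for illustration; they are not part of FRODO, TOM, or Atom.

```python
import math


def overlap_score(rotamer_atoms, template_atoms):
    """Mean distance from each rotamer atom to its nearest template atom.

    Both arguments are lists of (x, y, z) tuples. Lower is better.
    This score is an assumed proxy for "best overlap".
    """
    if not rotamer_atoms or not template_atoms:
        raise ValueError("both atom lists must be non-empty")
    total = 0.0
    for atom in rotamer_atoms:
        total += min(math.dist(atom, t) for t in template_atoms)
    return total / len(rotamer_atoms)


def best_rotamer(rotamers, template_atoms):
    """Return the name of the candidate rotamer with the lowest score.

    rotamers maps a hypothetical rotamer label to the coordinates of the
    atoms to be matched (for example, the two CD atoms of a Leu).
    """
    return min(rotamers, key=lambda name: overlap_score(rotamers[name], template_atoms))
```

For a Phe-to-Leu replacement, `template_atoms` would hold the Phe ring atoms and each entry of `rotamers` the CD1 and CD2 positions of one candidate Leu conformation. Ambiguous cases would still need the manual criteria listed above.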

Geometric regularization was done with the REFINE options of FRODO. The model was checked by calculating its potential energy with X-PLOR (Brunger et al, 1987).

Automated Modeling

A commercial molecular modeling package became available for trial at a later date. A model of Factor D was created to test this system.

Software Features

The Protein Design module of Quanta (Polygen Molecular Simulations, Inc., 200 Fifth Avenue, Waltham, MA 02254) is designed to create homology models as briefly described below. Secondary structural assignments are made based on the structure of known proteins (Kabsch and Sander, 1983), sequence alignments are made with the FASTA program (Pearson and Lipman, 1988), searches of files in PDB format are allowed, and superposition of coordinates may be done in a simple fashion by combining sequence match and secondary structure match information. The homology model is built by copying conformations and sequences between proteins, while performing insertions, deletions, and mutations of residues as needed. Automatic change of side chain conformations is carried out to remove or minimize bad contacts. Energy minimization with constraints using CHARMm (Brooks et al, 1983) completes the process.

The Protein Design User's Guide, a tutorial, describes the steps required to model the human renin protein. The renin sequence is known, and has been modeled based on available structures of homologous proteins (Sibanda et al, 1984). This tutorial was followed to create a model of human Factor D in an automated fashion using all the recommended default values and procedures.

Crystallographic Analysis

X-PLOR Version 2.1 was employed for all aspects of the initial crystal structure solution, refinement, and analysis. X-PLOR Version 3.0 was used in the later stages.

Pairwise Comparison

Two Factor D models to be compared are first superimposed with X-PLOR by a least-squares fit of all the C-alpha atoms. An X-PLOR script computes root-mean-square (rms) differences in atomic coordinates between the two models on a per-residue basis, taking into account the symmetry of Asp, Glu, Tyr, and Phe residues. The difference between two structures may also be expressed in terms of dihedral angles. The differences in phi/psi or chi1/chi2 pairs are computed as Euclidean distances in radians (i.e., for the mainchain, $\sqrt{(\Delta\phi)^2 + (\Delta\psi)^2}$). This is called the dihedral difference. Plots are made on a per-residue basis, considering mainchain and sidechain atoms separately. The C-alpha atom is included in both the mainchain and sidechain rms computations. Residue numbering is based on consecutive integers from 1 to 228. (See Figure 1 for comparison with the chymotrypsinogen convention.)
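The two per-residue measures can be illustrated with a short Python sketch. It is not the X-PLOR script used in this work. The atom names in `SYMMETRIC_PAIRS` follow PDB conventions and are assumed, since the text names only the residue types. Wrapping angle differences into [-pi, pi) is likewise an assumption about how periodicity would be handled. The models are taken to be already superimposed.

```python
import math

# Sidechain atoms exchanged by a 180-degree flip of the terminal group.
# Assumed PDB atom names; the text specifies only the residue types.
SYMMETRIC_PAIRS = {
    "ASP": [("OD1", "OD2")],
    "GLU": [("OE1", "OE2")],
    "PHE": [("CD1", "CD2"), ("CE1", "CE2")],
    "TYR": [("CD1", "CD2"), ("CE1", "CE2")],
}


def rms(atoms_a, atoms_b):
    """RMS distance over atom names common to both dicts (name -> (x, y, z))."""
    names = sorted(set(atoms_a) & set(atoms_b))
    if not names:
        raise ValueError("no atoms in common")
    total = sum(math.dist(atoms_a[n], atoms_b[n]) ** 2 for n in names)
    return math.sqrt(total / len(names))


def swapped(atoms, pairs):
    """Copy of atoms with the coordinates of each symmetric pair exchanged."""
    out = dict(atoms)
    for p, q in pairs:
        if p in atoms and q in atoms:
            out[p], out[q] = atoms[q], atoms[p]
    return out


def residue_rms(resname, atoms_a, atoms_b):
    """Per-residue rms, taking the smaller value over symmetric relabelings."""
    direct = rms(atoms_a, atoms_b)
    pairs = SYMMETRIC_PAIRS.get(resname.upper())
    if not pairs:
        return direct
    return min(direct, rms(atoms_a, swapped(atoms_b, pairs)))


def wrap(delta):
    """Map an angle difference in radians into [-pi, pi)."""
    return (delta + math.pi) % (2.0 * math.pi) - math.pi


def dihedral_difference(pair_a, pair_b):
    """Euclidean distance in radians between two (phi, psi) or (chi1, chi2) pairs."""
    d1 = wrap(pair_a[0] - pair_b[0])
    d2 = wrap(pair_a[1] - pair_b[1])
    return math.hypot(d1, d2)
```

With this treatment, an Asp whose OD1 and OD2 labels are exchanged between two otherwise identical models gives an rms of zero, and phi values of 179 and -179 degrees are counted as 2 degrees apart.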

The structure of Sarcoplasmic Calcium-binding Protein (SCP) (Vijay-Kumar and Cook, 1992) was analyzed for comparison. SCP is a helical protein solved in this laboratory by similar methods at the same resolution to a similar R-factor. Two monomers of 174 residues are in the asymmetric unit. The error suggested by Luzzati (1952) plots is 0.23 Å. In this case, 40% of the mainchain residues after superposition were within 0.23 Å and 82% were within 0.46 Å. The values were 20% and 55% for the sidechains. A total of 8 mainchain and 49 sidechain residues differ by more than 1.0 Å. A standard to monitor the agreement between two structures is required for the analysis that follows. We adopt the criterion that a model agrees if the rms deviation is within twice the error suggested by the Luzzati plots.
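The criterion can be applied to a list of per-residue rms deviations as in the following minimal sketch. The function names and the returned dictionary keys are hypothetical. Treating "within" as less than or equal to the cutoff is also an assumption.

```python
def fraction_within(deviations, cutoff):
    """Fraction of per-residue rms deviations at or below the cutoff."""
    if not deviations:
        raise ValueError("no deviations supplied")
    return sum(1 for d in deviations if d <= cutoff) / len(deviations)


def agreement_summary(deviations, luzzati_error, factor=2.0):
    """Summarize agreement against a Luzzati-based limit.

    deviations: per-residue rms values in Angstroms, ordered by residue
    number starting at 1. A residue agrees if its deviation is within
    factor * luzzati_error (factor 2 follows the criterion in the text).
    """
    limit = factor * luzzati_error
    return {
        "within_error": fraction_within(deviations, luzzati_error),
        "within_limit": fraction_within(deviations, limit),
        "disagreeing": [i for i, d in enumerate(deviations, start=1) if d > limit],
    }
```

For SCP, `luzzati_error` would be 0.23, giving a limit of 0.46 Å. The `disagreeing` list holds the residue numbers that fail the criterion.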