Updated 4/26/06 HLS
Structure Validation Tutorial
Brian Kelly
November 12, 2004
Now that you have solved a structure, how do you know that it is correct, that is, that it
explains your data as well as possible, without bias? You need to validate your work. This
can seem overwhelming at times, but it is absolutely necessary. You need only a few files to
validate your structure: a PDB file, your SCALEPACK file (.hkl or .sca), and an MTZ file. I
will use examples from structures solved by different people in the lab to help illustrate
how structures are validated.
You will be required to run a program called ADIT at the PDB before depositing your
data (both the PDB file and the structure factor file). ADIT's validation step essentially runs
PROCHECK (along with NUCheck and, if you supply structure factors in mmCIF format, SFCHECK).
You can do a trial run of ADIT before actually depositing your structural information.
Many tools are available. You can access most of them from our lab links webpage, but
here is a list as well:
- Biotech validation server (EBI): performs any or all of the quality checks provided by
the programs WHAT IF and PROCHECK
- Structure validation central (JCSG): runs PROCHECK, SFCHECK, PROVE, ERRAT, WASP,
DDQ, WHATCHECK and PSQS
- ADIT server (RCSB): runs PROCHECK, NUCheck and, if structure factors are provided in
mmCIF format, SFCHECK
- MolProbity server (Duke): various checks
- STAN (STructure ANalysis) server (Uppsala): produces Ramachandran and
"CA-Ramachandran" plots for proteins and Duarte-Pyle plots for nucleic acids. It also runs
the program WASP to check whether any water molecules could instead be small cations
(Na+, Li+, Mg2+, Ca2+), and the program CISPEP to check whether any non-proline peptide
bond that has been modelled as trans could in reality be cis.
- Verify3D server (UCLA): checks the fold (3D-1D profile)
- ERRAT server (UCLA): checks nonbonded contacts
- Ramachandran server (Bangalore): slow, possibly not worth using
- RamPage server (Cambridge): Ramachandran plots
- VADAR server (Alberta, Canada)
- PARVATI server (Washington): anisotropic temperature factors
- ANOLEA server (Namur, Belgium): Atomic Non-Local Environment Assessment
- TB Consortium Bias Removal Server (Texas A&M University): generates maps that are as
unbiased as possible and calculates real-space fit values
- Electron Density Server (Uppsala University): a very useful site with electron density
data for PDB entries that have deposited structure factors; a good place to start when
working with a molecular replacement model
- SAVS (UCLA): runs many different programs, including ERRAT and Verify3D; a good place
to check many features of a model at once
Going through examples of each of the above programs would take many lengthy pages, so
I will cover the two suites I prefer, because they are comprehensive and relatively easy
to use:
1. Structure Analysis and Verification Server
The first suite is called the Structure Analysis and Verification Server (SAVS). It is run
by a research group at UCLA, and the web address is:
http://shannon.mbi.ucla.edu/DOE/Services/SV/
This website runs the programs ERRAT, Verify3D, PROVE, PROCHECK, WHAT_CHECK (from
the WHAT IF package), and SFCHECK.
You will need your PDB file and your MTZ file. On the home page, upload both (you can use
gun4.pdb and gun4.mtz from the refinement tutorial), then click Upload files.
You will then have to choose which columns in the MTZ file correspond to F (the observed
structure-factor amplitudes), SigF (the estimated standard deviation of F), and Free (the
free R flag). SAVS will attempt to highlight these for you, based on the column labels it
reads from the mtzdump output. Select the correct radio buttons and click “run all
programs”. The programs take a while to run (roughly 4 minutes for a smaller structure,
and up to 30 minutes for a larger one).
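The column highlighting that SAVS does can be approximated with a simple rule based on column labels and types. The Python sketch below is hypothetical and is not SAVS's actual logic. It assumes you already have the (label, type) pairs from an mtzdump listing, and it relies on the CCP4 column-type convention that "F" marks an amplitude, "Q" a standard deviation, and "I" an integer column such as a free R flag. Always confirm the choice by eye before running the programs.

```python
from typing import List, Optional, Tuple

# Hypothetical helper: guess which MTZ columns hold F, SIGF and the free R flag.
# Input: (label, CCP4 column type) pairs, e.g. transcribed from an mtzdump listing.
# Types used: "F" = amplitude, "Q" = standard deviation, "I" = integer.


def guess_columns(
    columns: List[Tuple[str, str]],
) -> Tuple[Optional[str], Optional[str], Optional[str]]:
    # First amplitude column is taken as F.
    amplitude = next((lab for lab, typ in columns if typ == "F"), None)

    sigma = None
    if amplitude is not None:
        # Prefer a sigma column whose label ends with the F label (FP -> SIGFP).
        sigma = next(
            (lab for lab, typ in columns
             if typ == "Q" and lab.upper().endswith(amplitude.upper())),
            None,
        )
    if sigma is None:
        # Otherwise fall back to the first standard-deviation column.
        sigma = next((lab for lab, typ in columns if typ == "Q"), None)

    # Free R flag: an integer column whose label mentions "free".
    free_flag = next(
        (lab for lab, typ in columns if typ == "I" and "FREE" in lab.upper()),
        None,
    )
    return amplitude, sigma, free_flag


# Example with typical (made-up) labels:
labels = [("H", "H"), ("K", "H"), ("L", "H"),
          ("FP", "F"), ("SIGFP", "Q"), ("FreeR_flag", "I")]
print(guess_columns(labels))  # ('FP', 'SIGFP', 'FreeR_flag')
```

A label-based guess fails on unusual naming schemes, which is exactly why SAVS asks you to confirm the radio buttons.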
Look at the output files. The results are color-coded, which makes them easier to
interpret. The following table describes how each program's results are rated.
SFCHECK (Vaguine, Richelle et al. 1999)
Amino acid positions in the structure get a red flag if:
  1) the backbone shift is greater than 1.5 Å
  2) the backbone density correlation is less than 0.90
  3) the backbone density index is less than 0.5
  4) the backbone B-factor is greater than 60 Å²
  5) the connectivity is less than 65%
For each test:
  > 95% of positions pass      GOOD: SATISFACTORY (green)
  90-95% of positions pass     WARNING: INSPECTION SUGGESTED (yellow)
  < 90% of positions pass      ERROR: INSPECTION RECOMMENDED (red)

PROCHECK (Laskowski, Moss et al. 1993)
One of the many files produced by PROCHECK is the summary file, file.sum. The first
column of each line in this file indicates one of three things (as described in the file
itself):
  (blank)    GOOD: SATISFACTORY
  +          WARNING: INSPECTION SUGGESTED
  *          ERROR: INSPECTION RECOMMENDED

WHAT_CHECK (Hooft, Vriend et al. 1996)
The output file produced by WHAT_CHECK is divided into sections. Each section is
identified by one of three words:
  NOTE       GOOD: SATISFACTORY
  WARNING    WARNING: INSPECTION SUGGESTED
  ERROR      ERROR: INSPECTION RECOMMENDED

ERRAT (Colovos and Yeates 1993)
The data plot produced by ERRAT gives, for each chain in the input structure, the
percentage of the sequence that falls below the 95% limit. This percentage is reported,
but SAVS does not assign it a color rating. A large ERRAT score is a good thing.

VERIFY_3D (Luthy, Bowie et al. 1992)
Using the averaged data points produced for each amino acid in the sequence, the number
of times the value is greater than 0.2 is converted into the percentage of the sequence
that has positions with values > 0.2. The rating is:
  < 20%      GOOD: SATISFACTORY
  20-45%     WARNING: INSPECTION SUGGESTED
  >= 45%     ERROR: INSPECTION RECOMMENDED

PROVE (Pontius, Richelle et al. 1996)
In the section marked "**OUTLIERS", the percentage of buried outliers is given:
  < 1.0%     GOOD: SATISFACTORY
  1-5%       WARNING: INSPECTION SUGGESTED
  > 5%       ERROR: INSPECTION RECOMMENDED
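Several of these rating rules are mechanical enough to write down as code. The Python sketch below is a hypothetical helper and not part of SAVS. It encodes only the SFCHECK, PROCHECK, WHAT_CHECK, and PROVE rules exactly as the table states them, and it assumes you have already pulled the relevant percentage, summary-file line, or section keyword out of each program's output yourself.

```python
# Hypothetical helpers mirroring the SAVS rating rules in the table above.
GOOD = "GOOD: SATISFACTORY"
WARNING = "WARNING: INSPECTION SUGGESTED"
ERROR = "ERROR: INSPECTION RECOMMENDED"


def rate_sfcheck(percent_passing: float) -> str:
    """SFCHECK: rating for one test, from the % of positions that pass it."""
    if percent_passing > 95.0:
        return GOOD
    if percent_passing >= 90.0:
        return WARNING
    return ERROR


def rate_prove(percent_buried_outliers: float) -> str:
    """PROVE: rating from the % of buried outliers in the **OUTLIERS section."""
    if percent_buried_outliers < 1.0:
        return GOOD
    if percent_buried_outliers <= 5.0:
        return WARNING
    return ERROR


def rate_procheck_line(line: str) -> str:
    """PROCHECK file.sum: the first column is blank, '+' or '*'."""
    flag = line[:1]
    if flag == "*":
        return ERROR
    if flag == "+":
        return WARNING
    return GOOD


def rate_whatcheck_section(keyword: str) -> str:
    """WHAT_CHECK: each report section is labelled NOTE, WARNING or ERROR."""
    return {"NOTE": GOOD, "WARNING": WARNING, "ERROR": ERROR}[keyword.upper()]


print(rate_sfcheck(93.0))  # WARNING: INSPECTION SUGGESTED
```

The boundaries follow the table literally: exactly 95% of positions passing an SFCHECK test is a warning, and exactly 5% buried outliers in PROVE is still a warning rather than an error.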
Each section will teach you something useful. The take-home message is that if you see
red in your output, click on it, read the error message, and look at the corresponding
region of your structure. Make sure that you either have a valid reason for the model
features that triggered the message, or fix the region that needs fixing.
For example, Verify3D provides a graph of the 3D-1D profile of your structure. This
program analyzes the compatibility of an atomic model (3D) with its own amino acid
sequence (1D). Each residue is assigned a structural class based on its location and
environment (alpha, beta, loop, polar, apolar, etc.). A database generated from vetted,
high-quality structures is then used to obtain a score for each of the 20 amino acids in
that structural class. For each residue, the scores over a sliding 21-residue window
(positions -10 to +10) are averaged and plotted. The resulting 3D-1D profile should stay
above 0.2 for most of the structure and should never dip below 0. Verify3D has been useful
in identifying register errors (sequence shifts) in models.
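The smoothing step described above can be illustrated with a short Python sketch. It assumes you already have one raw 3D-1D score per residue; the scoring table from Luthy et al. is not reproduced here, and the scores in the usage lines are made up. As a further assumption, the window is simply truncated at the chain termini; Verify3D's own end handling may differ.

```python
from typing import List


def window_average(scores: List[float], half_width: int = 10) -> List[float]:
    """Average each residue's score over residues i-half_width .. i+half_width.

    half_width=10 gives the 21-residue window. Near the chain ends the window
    is truncated (an assumption of this sketch).
    """
    n = len(scores)
    averaged = []
    for i in range(n):
        lo = max(0, i - half_width)
        hi = min(n, i + half_width + 1)
        window = scores[lo:hi]
        averaged.append(sum(window) / len(window))
    return averaged


def percent_above(averaged: List[float], threshold: float = 0.2) -> float:
    """Percentage of residues whose averaged score exceeds the threshold."""
    if not averaged:
        return 0.0
    return 100.0 * sum(1 for s in averaged if s > threshold) / len(averaged)


def residues_below_zero(averaged: List[float]) -> List[int]:
    """1-based residue numbers where the profile dips below 0."""
    return [i + 1 for i, s in enumerate(averaged) if s < 0.0]


# Made-up per-residue scores with a poorly fitting stretch in the middle.
raw_scores = [0.4] * 40 + [-0.3] * 15 + [0.4] * 40
profile = window_average(raw_scores)
print(f"{percent_above(profile):.1f}% of residues above 0.2")
print("profile dips below 0 at residues:", residues_below_zero(profile))
```

Averaging over 21 residues is what makes a register shift visible: a single misplaced residue barely moves the profile, but a stretch of residues sitting in the wrong environments pulls the average down over a whole region.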
ERRAT gives you an “overall quality factor”, and you should see values in the high 90s
(percent) if your structure is good. Below the plot is a short text discussion of your
results and how they compare with other models solved at similar resolutions.
Conversely, if particular residues have high error values, you should look at those
residues and make sure nothing is wrong, such as a buried charged residue.
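The overall quality factor is the percentage of the protein whose ERRAT error value falls below the 95% limit, as the ERRAT entry in the table above describes. The minimal Python sketch below assumes you have already read one error value per residue out of ERRAT's output. The limit is passed in rather than hard-coded, and the residue numbers, values, and limit in the example are made up; take the real limit from your own ERRAT plot.

```python
from typing import Dict, List


def quality_factor(errors: Dict[int, float], limit: float) -> float:
    """Percentage of scored residues whose error value is below the limit."""
    if not errors:
        return 0.0
    below = sum(1 for value in errors.values() if value < limit)
    return 100.0 * below / len(errors)


def residues_to_inspect(errors: Dict[int, float], limit: float) -> List[int]:
    """Residue numbers at or above the limit, sorted for easy lookup."""
    return sorted(res for res, value in errors.items() if value >= limit)


errors = {10: 3.0, 11: 12.0, 12: 5.0, 13: 20.0}  # made-up values
print(quality_factor(errors, limit=11.5))        # 50.0
print(residues_to_inspect(errors, limit=11.5))   # [11, 13]
```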
2. MolProbity
The second suite is located on the 3D Macromolecule Analysis and Kinemage Home Page.
The specific program you will want to use is called MolProbity. It is run by a research
group at Duke University, and its current web address is:
http://molprobity.biochem.duke.edu/
This site offers a few useful features, including the ability to “flip” Asn, Gln, and His
side chains to improve hydrogen bonding throughout your structure. Making these
decisions from the density alone can be challenging, but the Richardsons have created an
algorithm that considers neighboring residues as well as solvent, and scores the
alternatives to find the arrangement that gives the most hydrogen bonding. Additionally,
after running the tools available on the site, you can visualize clashes between atoms.
MolProbity also offers Ramachandran plots, rotamer analysis and Cβ deviations.
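A Ramachandran plot places each residue by its two backbone dihedral angles, φ (C(i-1)-N-CA-C) and ψ (N-CA-C-N(i+1)). To show where those numbers come from, here is a small Python sketch of the standard dihedral-angle calculation from four atom positions. It is an illustration only, not MolProbity's code, and the coordinates in the example are made up.

```python
import math
from typing import Sequence, Tuple

Vec = Sequence[float]


def _sub(a: Vec, b: Vec) -> Tuple[float, float, float]:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Tuple[float, float, float]:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Dihedral angle p0-p1-p2-p3 in degrees (between -180 and 180)."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project the two outer bonds onto the plane perpendicular to p1-p2.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Made-up coordinates for a trans arrangement of four atoms:
print(dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0),
               (0.0, 0.0, 1.0), (-1.0, 0.0, 1.0)))  # 180.0
```

For φ you would pass the previous residue's C, then N, CA and C of the residue; for ψ, N, CA and C of the residue followed by the next residue's N.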
There is a tutorial for MolProbity if you would like to run through a trial before using
your own data (it is also located in this directory as a PDF, MolPobity_tutorial.pdf).
MolProbity can also load electron density maps, currently in O format, X-PLOR ASCII
format, and CCP4 format (mode 2 only). This is a nice feature, because you can view the
results together with your data and decide whether the program has given you a
reasonable error message. The only caveat is that the viewer is a Java program that runs
in your web browser, so it can be a little slow.
Interpreting the results of the all-atom contact test can be a little challenging. I have
taken the following text directly from the paper describing MolProbity:
"With all hydrogens present, all-atom contacts are then calculated by PROBE, which uses
traditional van der Waals radii for most atoms and 1.0 Å for polar H, in a rolling-probe
algorithm that leaves a dot when the 0.25 Å-radius probe intersects another
not-covalently-bonded atom. The results include a clustered list of disallowed atom pair
overlaps ≥0.4 Å, an overall clash score (number of bad overlaps per 1000 atoms) and two
kinemage graphics displays of contacts for the whole structure. Van der Waals contacts
are shown as back-to-back patches of green or blue dots on the surfaces of non-covalent
atom pairs within 0.5 Å of touching; hydrogen bonds are shown as lenses of pale green
dots outlining the interpenetrating surfaces of a donor and acceptor. Steric clash overlaps
are emphasized with spikes rather than dots, progressing from yellow to hot pink as the
clash becomes truly serious beyond 0.4 Å overlap."
The reference for this paper is (Davis, Murray et al. 2004).
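To make the clash score definition concrete, here is a deliberately simplified Python sketch. It is not PROBE: it skips the rolling probe and the dot surfaces, does not exempt hydrogen-bonded donor/acceptor pairs (which PROBE treats as hydrogen bonds rather than clashes), and takes the van der Waals radii and the list of covalently bonded pairs as given. The atom names, coordinates, and radii in the example are made up. It counts non-bonded atom pairs whose van der Waals spheres overlap by at least 0.4 Å and scales that count to clashes per 1000 atoms.

```python
import math
from itertools import combinations
from typing import FrozenSet, List, Set, Tuple

# (unique atom name, (x, y, z) in Å, van der Waals radius in Å)
Atom = Tuple[str, Tuple[float, float, float], float]


def overlap(a: Atom, b: Atom) -> float:
    """How far two vdW spheres interpenetrate, in Å (positive = overlap)."""
    return a[2] + b[2] - math.dist(a[1], b[1])


def clashscore(atoms: List[Atom], bonded: Set[FrozenSet[str]],
               cutoff: float = 0.4) -> float:
    """Bad overlaps (>= cutoff Å) between non-bonded pairs per 1000 atoms."""
    if not atoms:
        return 0.0
    bad = 0
    # Checks every pair (O(n^2)); fine for a sketch, too slow for big models.
    for a, b in combinations(atoms, 2):
        if frozenset((a[0], b[0])) in bonded:
            continue  # covalently bonded atoms are expected to overlap
        if overlap(a, b) >= cutoff:
            bad += 1
    return 1000.0 * bad / len(atoms)


atoms = [
    ("A", (0.0, 0.0, 0.0), 1.7),
    ("B", (2.0, 0.0, 0.0), 1.2),
    ("C", (10.0, 0.0, 0.0), 1.5),
]
print(round(clashscore(atoms, bonded=set()), 1))                # 333.3
print(clashscore(atoms, bonded={frozenset(("A", "B"))}))        # 0.0
```

Normalizing per 1000 atoms is what lets a clash score be compared between structures of different sizes.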
After running these two suites of programs, investigating any errors or suggestions with
a critical eye, and fixing what is necessary, you can be reasonably confident that your
structure is generally good.
References
Colovos, C. and T. O. Yeates (1993). "Verification of protein structures: patterns of
nonbonded atomic interactions." Protein Sci 2(9): 1511-9.
Davis, I. W., L. W. Murray, et al. (2004). "MOLPROBITY: structure validation and
all-atom contact analysis for nucleic acids and their complexes." Nucleic Acids Res
32(suppl_2): W615-619.
Hooft, R. W., G. Vriend, et al. (1996). "Errors in protein structures." Nature 381(6580):
272.
Laskowski, R. A., D. S. Moss, et al. (1993). "Main-chain bond lengths and bond angles in
protein structures." J Mol Biol 231(4): 1049-67.
Luthy, R., J. U. Bowie, et al. (1992). "Assessment of protein models with
three-dimensional profiles." Nature 356(6364): 83-5.
Pontius, J., J. Richelle, et al. (1996). "Deviations from standard atomic volumes as a
quality measure for protein crystal structures." J Mol Biol 264(1): 121-36.
Vaguine, A. A., J. Richelle, et al. (1999). "SFCHECK: a unified set of procedures for
evaluating the quality of macromolecular structure-factor data and their agreement with
the atomic model." Acta Crystallogr D Biol Crystallogr 55(Pt 1): 191-205.