D R A F T - An Introduction to Computational Science
Molecular Dynamics Module
Last updated: Friday, 30-Jul-2004 12:58:12 EST
About the Course
This molecular dynamics module is one of about eight similar modules in a
semester-long lab science course titled "An Introduction to Computational
Science". This particular module is relatively advanced, so it is
placed fairly late in the semester schedule.
Prerequisites for the course are one semester of calculus and a
declared major in one of the natural or mathematical sciences. The
course meets for three 1-hour lectures and one 3-hour lab session per
week.
This module may also be appropriate for an undergraduate/liberal
arts faculty workshop in computational science. This is particularly true
since 1) the bulk of the PhD students produced in the United States come
from liberal arts environments and 2) liberal arts colleges are well
suited to support the interdisciplinary nature of computational science.
The modules that make up the course cover the following topics:
- Using existing models, modifying existing models, building
new models
- Real world -> physical model -> computational model
- Validating and verifying models
- Numerical methods - error analysis, finite representation,
interpolation
- Estimation
- Analytical solutions, numerical solutions
- Scaling, speed-up, and asymptotics
- Pseudorandom numbers, probability, statistics
- Qualitative analysis, quantitative analysis
- Global modeling (top down)
- Agent/local modeling (bottom up)
- Continuous and discrete models
- Static and dynamic models
- Interfacing to the physical world
- Social and ethical issues related to these uses of technology
Most of these are done in the context of some "real world"
application from physics, chemistry, mathematics, biology, or
environmental science. In some cases a topic is covered "in review",
e.g. statistics, where most if not all of the students have seen the
material before.
The focus of this course is not so much on how to use technology
as a better typewriter but rather on how technology can enable scientists
to ask fundamentally different questions than they could in the past. The
methods that make this possible include modeling/simulation, data mining,
combinations of the two, and other research methodologies.
One of the underlying themes in this course is that we are trending
away from the current state of specialization in science education and
research towards a more general approach. Most of the interesting
cutting-edge research is being done by interdisciplinary teams whose broad
disciplinary representation allows them to ask very high-level questions.
One way to place computational science in the canon is to consider
the relationship between physics and, say, civil engineering: civil
engineering can be described as applied physics. Computational science
and computer science have the same sort of relationship.
For more information about the course see http://cs.earlham.edu/xxx
Overview of the Molecular Dynamics Module
- This module employs molecular dynamics software and Beowulf
clusters to simulate the self-assembly of amino acid necklaces (long-chain
molecules) into proteins, a.k.a. protein folding. Proteins are the basis
of how biology gets things done; examples include enzymes, structural
components, and antibodies.
- Definitions:
- Molecular modeling - the study of molecular structure and function
through model building and computational methods.
- Molecular dynamics - a computational method that solves Newton's
equations of motion for all of the atoms in a molecule. At each point
in time six quantities are known for each atom: position (x_p, y_p, z_p)
and force (x_f, y_f, z_f).
- Distributed computing - harnessing computational power from a large
number of (potentially geographically distributed) computer systems.
This technique is a direct outgrowth of parallel/cluster computing.
- Each simulated timestep involves computing the forces on each atom
and integrating the equations of motion to update the atoms' positions.
The forces come from bonds and from electrostatic interactions between
atoms within a cut-off distance. The desired properties are usually
obtained as statistical mechanical averages of the atom trajectories over
many runs. The averages tend to converge slowly with the length of the
simulation or the size of the molecular system.
- MD simulations are very computationally intensive; until recently
it hasn't been practical to simulate more than a couple of nanoseconds for
relatively few atoms. Recent developments in cluster and distributed
computing algorithms now make it possible to (relatively) efficiently
harness thousands of processors as part of a single ensemble simulation.
- Protein folding is a good example of consilience between biology,
chemistry, physics, mathematics, and computer science. Sequencing the
human genome gave us blueprints for all the amino acid necklaces, which
in turn fold into proteins that have functions within the body. There
are connections to physics and mathematics at multiple levels.
- Why study protein folding? The process is integral to all of
biology yet it remains largely a mystery. Also, when proteins misfold
they can be responsible for diseases such as Alzheimer's, Mad Cow (BSE),
CJD, ALS, and Parkinson's disease.
- Why study protein folding computationally? In vitro work is time
consuming and expensive compared to in silico work. Due to the time scales
involved, in vitro methods only allow you to see the final protein
conformation, not any of the intermediate conformations.
- What makes this possible? Moore's law, examine history of
molecular dynamics simulations (number of atoms, length of time). Beowulf
clusters, large scale distributed computing.
- Scaling and speed-up. Which of the problem's parameters dominate
the asymptotics? In what ways can we do the parallel decomposition?
- Lab
- Using the PDB
- Preparing the data for a run
- Energy minimization
- Parallel decomposition
- Choice of method for cutoffs
- Choice of method for long-range electrostatics
- Running the simulation once
- Examining the results
- Running an ensemble and coalescing the partial results
- Examining the results, comparison to experimental and simulated
results obtained by others
- Write-up
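The timestep described in the overview (compute the forces from bonds and
cut-off electrostatics, then integrate to update positions) can be sketched
in a few lines of Python. This is a minimal illustration under stated
assumptions, not the method used by the lab software: the integrator is
velocity Verlet (the module does not name one), bonds are harmonic springs,
units are reduced (Coulomb constant set to 1), and the names compute_forces
and velocity_verlet_step are hypothetical. The pair loop is the simple
all-pairs O(N^2) version, which is exactly the cost the cut-off and parallel
decomposition topics in the lab aim to reduce.

```python
import math


def _separation(pos, i, j):
    """Vector from atom i to atom j and its length."""
    d = [pos[j][a] - pos[i][a] for a in range(3)]
    return d, math.sqrt(sum(x * x for x in d))


def compute_forces(pos, charges, bonds, k_bond, r0, cutoff):
    """Per-atom forces from harmonic bonds plus Coulomb pairs inside a cutoff.

    Assumption: reduced units, and bonded pairs are not excluded from the
    electrostatic sum as a production force field would do.
    """
    forces = [[0.0, 0.0, 0.0] for _ in pos]

    def push_apart(i, j, d, r, magnitude):
        # magnitude > 0 pushes atoms i and j apart, < 0 pulls them together.
        for a in range(3):
            f = magnitude * d[a] / r
            forces[i][a] -= f
            forces[j][a] += f

    # Bonded term: harmonic spring, U = 0.5 * k_bond * (r - r0)^2
    for i, j in bonds:
        d, r = _separation(pos, i, j)
        push_apart(i, j, d, r, -k_bond * (r - r0))

    # Non-bonded term: Coulomb, U = q_i * q_j / r, only inside the cutoff
    for i in range(len(pos)):
        for j in range(i + 1, len(pos)):
            d, r = _separation(pos, i, j)
            if r < cutoff:
                push_apart(i, j, d, r, charges[i] * charges[j] / (r * r))
    return forces


def velocity_verlet_step(pos, vel, forces, masses, dt, force_fn):
    """Advance positions and velocities in place by one timestep dt."""
    for i in range(len(pos)):
        for a in range(3):
            vel[i][a] += 0.5 * dt * forces[i][a] / masses[i]  # half kick
            pos[i][a] += dt * vel[i][a]                       # drift
    new_forces = force_fn(pos)
    for i in range(len(pos)):
        for a in range(3):
            vel[i][a] += 0.5 * dt * new_forces[i][a] / masses[i]  # half kick
    return new_forces


# Example: a stretched two-atom bond plus one nearby charged atom.
pos = [[0.0, 0.0, 0.0], [1.2, 0.0, 0.0], [1.2, 1.5, 0.0]]
vel = [[0.0, 0.0, 0.0] for _ in pos]
masses = [1.0, 1.0, 1.0]
charges = [0.5, -0.5, 0.3]


def force_fn(p):
    return compute_forces(p, charges, [(0, 1)], k_bond=100.0, r0=1.0, cutoff=10.0)


forces = force_fn(pos)
for _ in range(1000):
    forces = velocity_verlet_step(pos, vel, forces, masses, 0.001, force_fn)
print("bond length after 1000 steps:", _separation(pos, 0, 1)[1])
```

An ensemble run, as in the lab outline, would repeat a loop like this many
times from different starting velocities and average the quantities of
interest over the resulting trajectories.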
References
Molecular Modelling, Principles and Applications, 2nd Edition -
Andrew Leach, Prentice Hall, 2001.
Introduction to Computational Chemistry - Frank Jensen, John
Wiley & Sons, 1999.
Molecular Modeling and Simulation, An Interdisciplinary Guide - Tamar
Schlick, Springer, 2002.
Scaling Molecular Dynamics to 3000 Processors with Projections: A
Performance Analysis Case Study. Kale, Kumar, Zheng. 2002.
Load balancing of molecular dynamics simulation with NWChem. Straatsma,
McCammon (IBM). IBM Systems Journal, Volume 40, Number 2, 2001.
Folding@Home FAQ.
Absolute comparison of simulated and experimental protein-folding
dynamics. Snow, Nguyen, Pande. Nature, 2002.