NCSI - Parallel/Cluster Workshop - HPC Overview
Last updated: Thursday, 24-Jun-2004 07:44:43 EST
What is High Performance Computing?
- Fastest technology available at any given point in time.
- A (rough) hardware taxonomy:
  - SMP and CC-NUMA machines (MIMD)
  - Beowulf clusters built with COTS components (MIMD)
  - Grid resources (MIMDish)
  - Others - vector machines (SIMD), clusters of SMP machines,
    massively parallel processors
What's in a Computer?
- CPU
- RAM
- Disk
- Network interface
- Relative performance and quantity of each
- Ratio of memory to CPU is changing, forcing a change in algorithms
What's in a CPU?
- Microcode - hardware is just petrified software
- Fetch, decode, execute
- Floating Point Unit
- Integer Unit
- Memory Management Unit
- CISC and RISC
- Scalar
- Superscalar
- AltiVec and SSE - small scale SIMD
- Out-of-order execution - how reordering by the compiler and scheduler
can change your results
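
A minimal sketch of why reordering can change results, assuming IEEE 754
double-precision arithmetic (which Python floats use on common
platforms). Floating-point addition is not associative, so regrouping
the same three operands can give a different sum. Hardware out-of-order
execution is designed to preserve program semantics, so in practice this
kind of difference typically comes from a compiler regrouping the
arithmetic. The function names below are illustrative only.

```python
def sum_left_to_right(a, b, c):
    """Add in source order: (a + b) + c."""
    return (a + b) + c


def sum_reordered(a, b, c):
    """Add with the grouping changed: a + (b + c)."""
    return a + (b + c)


if __name__ == "__main__":
    # Near 1e16 adjacent doubles are 2 apart, so adding 1.0 to -1e16
    # rounds straight back to -1e16 and the 1.0 is lost.
    a, b, c = 1.0e16, -1.0e16, 1.0
    print(sum_left_to_right(a, b, c))  # 1.0
    print(sum_reordered(a, b, c))      # 0.0
```

Both functions compute the same mathematical expression; only the
grouping differs.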
Memory Hierarchy
- Registers
- Cache - L1, L2, split data and instruction
- RAM
- Disk
- Data locality
- Cache thrashing
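
A rough sketch of data locality, assuming a matrix stored row by row in
one flat array. Visiting it row by row touches neighbouring elements
(stride 1), while visiting it column by column jumps a full row length
between accesses, which is the pattern that tends to defeat caches. The
helper names are hypothetical. In pure Python the interpreter overhead
hides most of the timing difference; the same loops in C or Fortran
usually show it far more clearly.

```python
import time


def row_major_order(rows, cols):
    """Yield flat indices row by row (consecutive accesses, stride 1)."""
    for r in range(rows):
        for c in range(cols):
            yield r * cols + c


def column_major_order(rows, cols):
    """Yield flat indices column by column (stride of cols elements)."""
    for c in range(cols):
        for r in range(rows):
            yield r * cols + c


def total(data, order):
    """Sum data by visiting its elements in the given index order."""
    s = 0
    for i in order:
        s += data[i]
    return s


if __name__ == "__main__":
    rows, cols = 500, 500
    data = list(range(rows * cols))
    for name, order_fn in [("row-major", row_major_order),
                           ("column-major", column_major_order)]:
        start = time.perf_counter()
        result = total(data, order_fn(rows, cols))
        elapsed = time.perf_counter() - start
        print(f"{name}: sum={result}, {elapsed:.3f}s")
```

Both traversals compute the same sum; only the order of memory accesses
changes.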
Cluster Specific Topics
- Network fabric and communications protocols
  - Latency and bandwidth review
  - Ethernet, Fast Ethernet, Gigabit Ethernet, channel bonding
  - Switches - un-managed, managed, backplane capacity
  - Myrinet and other high performance interconnects
  - VIA (Virtual Interface Architecture)
- Clusters have a very good memory-to-CPU ratio, which suits problems
that are memory intensive.
- Distributed shared memory is one way to port code to a cluster
environment.
- Reliability of the cluster vs an individual machine.
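
Two of the topics above lend themselves to back-of-the-envelope
arithmetic. The sketch below assumes a simple message cost model (a
fixed latency plus message size divided by bandwidth) and assumes node
failures are independent. Both are simplifications, and the function
names and example figures are hypothetical rather than measurements.

```python
def transfer_time(message_bytes, latency_s, bandwidth_bytes_per_s):
    """Estimate the time to send one message: latency + size / bandwidth."""
    return latency_s + message_bytes / bandwidth_bytes_per_s


def all_nodes_up_probability(node_availability, node_count):
    """Probability that every node is up, assuming independent failures."""
    return node_availability ** node_count


if __name__ == "__main__":
    # Assumed figures: 50 microseconds of latency and 125e6 bytes/s,
    # the theoretical peak of Gigabit Ethernet (1e9 bits/s).
    latency, bandwidth = 50e-6, 125e6
    for size in (64, 64 * 1024, 64 * 1024 * 1024):
        t = transfer_time(size, latency, bandwidth)
        print(f"{size:>10} bytes: {t:.6f} s")

    # Assumed figure: each node is available 99% of the time.
    for nodes in (1, 16, 64):
        p = all_nodes_up_probability(0.99, nodes)
        print(f"{nodes:>3} nodes all up: {p:.3f}")
```

Small messages are dominated by latency and large ones by bandwidth,
and the chance that every node is up at once falls as the node count
grows.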
Benchmarking and Tuning
- There are many benchmarks available. Generally the best one to use is
your own application or a benchmarking kernel derived from your own
application.
- Use the 80/20 rule to identify where the heavy lifting is within
your application. A benchmarking kernel can be developed from that code.
- Profiling: statement-level and statistical
- Tuning is best done from the outside in, that is:
  - the application
  - the compiler
  - the operating system
  - the network
- When purchasing new hardware it is often possible to obtain loaner
gear from vendors for evaluation. This is the perfect time to have a
benchmarking kernel available, since a kernel tends to be much easier
to port and run than the full application.
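
One way to find the heavy lifting is to run the application under a
profiler and look at where the time accumulates. The sketch below uses
cProfile and pstats from the Python standard library. cProfile is a
deterministic, function-level profiler, not a statistical one. The
functions heavy_kernel, light_setup and application are hypothetical
stand-ins for a real code.

```python
import cProfile
import io
import pstats


def heavy_kernel(n):
    """Hypothetical hot spot: a nested loop that dominates run time."""
    total = 0
    for i in range(n):
        for j in range(n):
            total += i * j
    return total


def light_setup(n):
    """Hypothetical cheap step that does little work."""
    return list(range(n))


def application(n=300):
    """Stand-in for a whole application."""
    light_setup(n)
    return heavy_kernel(n)


def profile_report(func, top=5):
    """Run func under cProfile; return a report sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()


if __name__ == "__main__":
    print(profile_report(application))
```

The function that dominates the report is the natural candidate to
extract into a benchmarking kernel.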
Other
- Monitoring service availability: Nagios, Big Sister
- Measuring and recording resource consumption: Ganglia, Cricket
- Lots of data to manage: PVFS