- •IBM Research
- •Blue Gene/L
- •IBM Research Simulation
Effect of Network Progress
(Projections timeline of a 1024-node run without aggressive network progress)
Network progress not aggressive enough: communication gaps eat up utilization
© 2005 IBM Corporation
Effect of Network Progress (2)
(Projections timeline of a 1024-node run with aggressive network progress)
More frequent advance closes gaps
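The contrast between the two timelines can be illustrated with a toy model. This is only a sketch, not the Blue Gene/L message layer: it assumes a processor notices an arrived message only at its next progress ("advance") call, so the handling delay is bounded by the interval between calls. The function name and all numbers are hypothetical.

```python
def mean_handling_delay(arrival_times, poll_interval):
    """Average wait between a message arriving and the next progress call.

    Toy model: progress calls happen at t = poll_interval, 2 * poll_interval, ...
    and a message is handled at the first call at or after its arrival.
    """
    total = 0.0
    for t in arrival_times:
        polls_needed = -(-t // poll_interval)  # ceiling division on integers
        total += polls_needed * poll_interval - t
    return total / len(arrival_times)


# Hypothetical message arrival times, in microseconds.
arrivals = [3, 17, 42, 58, 91]

coarse = mean_handling_delay(arrivals, poll_interval=50)  # infrequent advance
fine = mean_handling_delay(arrivals, poll_interval=5)     # frequent advance

print(f"mean delay, advance every 50 us: {coarse:.1f} us")
print(f"mean delay, advance every 5 us:  {fine:.1f} us")
```

In this model the frequent-advance case handles messages much sooner, which is the gap-closing effect the second timeline shows. The model ignores the cost of each advance call, which a real run would have to trade off against the shorter delays.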
Virtual Node Mode
(Plot: ApoA1 step time with PME; step time in ms versus number of processors)
Spring vs Now
(Plot: ApoA1 step time with PME; step time in ms versus number of processors)
Summary
Demonstrated good scaling to 4k processors for ApoA1, with a speedup of 2100
– Still working on 8k results
ATPase scales well to 8k processors with a speedup of 4000+
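As a rough reading of these numbers, parallel efficiency is speedup divided by processor count. The sketch below assumes "4k" and "8k" mean 4096 and 8192 processors and takes "4000+" as a lower bound of 4000. Both are assumptions, not figures stated on the slide.

```python
def parallel_efficiency(speedup, processors):
    """Fraction of ideal (linear) speedup actually achieved."""
    return speedup / processors


# Assumption: "4k" = 4096 and "8k" = 8192 processors.
apoa1_eff = parallel_efficiency(2100, 4096)
# Assumption: "4000+" treated as exactly 4000, so this is a lower bound.
atpase_eff = parallel_efficiency(4000, 8192)

print(f"ApoA1  at 4096 processors: {apoa1_eff:.0%}")
print(f"ATPase at 8192 processors: at least {atpase_eff:.0%}")
```

Under these assumptions both systems run at roughly half of ideal linear speedup at their largest reported processor counts.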
Lessons Learnt
Eager messages lead to contention
Rendezvous protocol doesn't perform well for mid-size messages
Topology optimizations are a big winner
Overlap of computation and communication is possible
– Overlap however makes compute load less predictable
Absence of operating system daemons (no OS noise) helps scaling to very large processor counts
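One way to see why the rendezvous protocol hurts mid-size messages is a simple cost model. This is a textbook-style sketch, not a measurement from this work: it assumes an eager send costs one network latency plus transfer time, while a rendezvous send adds a request-to-send and clear-to-send handshake (two extra latencies) before the data moves. The latency and bandwidth values are hypothetical.

```python
def eager_time(size_bytes, latency_us, bandwidth_bytes_per_us):
    """Eager: data is sent immediately, one latency plus transfer time."""
    return latency_us + size_bytes / bandwidth_bytes_per_us


def rendezvous_time(size_bytes, latency_us, bandwidth_bytes_per_us):
    """Rendezvous: request-to-send, clear-to-send, then the data."""
    return 3 * latency_us + size_bytes / bandwidth_bytes_per_us


def handshake_overhead(size_bytes, latency_us=5.0, bandwidth_bytes_per_us=150.0):
    """Relative extra cost of rendezvous over eager in this toy model."""
    e = eager_time(size_bytes, latency_us, bandwidth_bytes_per_us)
    r = rendezvous_time(size_bytes, latency_us, bandwidth_bytes_per_us)
    return (r - e) / e


# Hypothetical message sizes in bytes: small-to-mid, mid, and large.
for size in (2_000, 20_000, 200_000):
    print(f"{size:>7} bytes: rendezvous is {handshake_overhead(size):.1%} slower")
```

In this model the handshake cost is fixed, so its relative penalty shrinks as messages grow. Messages just above the eager threshold pay the largest share. The model leaves out the contention that eager sends cause, which is the other side of the trade-off listed above.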
Future Plans
Experiment with new communication protocols
– Remote memory access
– Adaptive eager
– Fast asynchronous collectives
Improve load-balancing
– Newer distributed strategies
– Heavy processors dynamically unload to neighbors
Pencil decomposition for PME
Use the double hummer (dual floating-point unit)
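The appeal of pencil decomposition for PME can be sketched by counting how many processors a 3D FFT grid can keep busy. The example assumes a slab decomposition hands out whole planes along one axis, while a pencil decomposition hands out 1D columns. The grid size is hypothetical and is not the PME grid used in these runs.

```python
def max_fft_parallelism(grid, scheme, slab_axis=2):
    """Upper bound on processors that can each own at least one piece.

    grid: (nx, ny, nz) FFT grid dimensions.
    scheme: "slab" (whole planes along slab_axis) or "pencil" (1D columns).
    """
    nx, ny, nz = grid
    if scheme == "slab":
        # One plane per processor at most.
        return grid[slab_axis]
    if scheme == "pencil":
        # Pencils are re-oriented along each axis in turn; take the smallest
        # count so every phase has at least one pencil per processor.
        return min(nx * ny, ny * nz, nx * nz)
    raise ValueError(f"unknown scheme: {scheme}")


grid = (128, 128, 128)  # hypothetical PME grid
slab_limit = max_fft_parallelism(grid, "slab")
pencil_limit = max_fft_parallelism(grid, "pencil")

print(f"slab decomposition:   up to {slab_limit} processors")
print(f"pencil decomposition: up to {pencil_limit} processors")
```

For this hypothetical grid, slabs cap FFT parallelism at 128 processors while pencils allow 16384, which suggests why a pencil scheme would matter at the processor counts discussed in this talk.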