Design & Reuse

Tasking Lives Up To Its Promises

community.arm.com, Sept. 13, 2019 – 

One of the main issues in High-Performance Computing (HPC) systems is the underutilization of resources. Parallel applications partition and distribute compute and data across the processors in the system, which work together to solve a given problem. In doing so, processors synchronize and communicate, which may leave some of them idle while they wait for other processors to complete their part. Idle processors mean wasted time and power. This can happen at a number of points in a program, the main ones being:

  • serial sections: parts of the program where only one processor has work to do
  • load imbalance: some processors finish their work earlier than others and have to wait
  • synchronization: processors wait on one another at synchronization points

These issues are common in bulk synchronous parallel applications, especially those that statically assign work to processors. Tasking, on the other hand, promises to improve load balancing and system utilization, and has the potential for better locality. However, the adoption of tasking for programming parallel and distributed systems has been slow. That so few codes have been implemented in or ported to a tasking model is due, in part, to the limited number of success stories. Recent collaborative work between Arm Research and Barcelona Supercomputing Center reports on the experience of porting an adaptive mesh refinement code from the US Exascale Computing Project Proxy App Suite. The lessons learned from this porting exercise are reported in the paper "On the Benefits of Tasking with OpenMP" at the International Workshop on OpenMP 2019, which aims to serve as a reference on how to taskify applications and to encourage tasking adoption to improve application performance at scale.
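
The contrast between static work assignment and tasking can be illustrated with a toy model. The sketch below is not OpenMP and is not taken from the paper: it is a small Python simulation, and the task costs, the worker count and the function names (`static_makespan`, `dynamic_makespan`, `utilization`) are all hypothetical, chosen only to make the load imbalance visible. It assumes independent tasks and ignores scheduling overhead, communication and locality.

```python
import heapq


def static_makespan(costs, workers):
    """Static assignment: each worker gets one contiguous block, fixed up front."""
    chunk = -(-len(costs) // workers)  # ceiling division
    loads = [sum(costs[i:i + chunk]) for i in range(0, len(costs), chunk)]
    return max(loads)  # everyone waits for the slowest worker


def dynamic_makespan(costs, workers):
    """Task-like assignment: the next task goes to whichever worker is idle first."""
    finish_times = [0] * workers
    heapq.heapify(finish_times)
    for cost in costs:
        earliest = heapq.heappop(finish_times)  # first worker to become idle
        heapq.heappush(finish_times, earliest + cost)
    return max(finish_times)


def utilization(costs, workers, makespan):
    """Fraction of total processor time spent doing useful work."""
    return sum(costs) / (workers * makespan)


# Hypothetical, deliberately uneven task costs (arbitrary time units).
costs = [8, 7, 6, 5, 1, 1, 1, 1, 1, 1, 1, 1]
workers = 4

static_time = static_makespan(costs, workers)    # 21
dynamic_time = dynamic_makespan(costs, workers)  # 9

print(static_time, round(utilization(costs, workers, static_time), 2))    # 21 0.4
print(dynamic_time, round(utilization(costs, workers, dynamic_time), 2))  # 9 0.94
```

With these made-up costs, the static split leaves three of the four workers idle for most of the run, while handing out tasks on demand keeps all of them busy almost until the end. Real tasking runtimes add scheduling overhead and dependency tracking that this model leaves out, so the numbers only show the direction of the effect.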
