Sun 14 Jun 2015 10:30 - 11:00 at B112 - Programming in X10

The X10 Global Matrix Library (GML) is designed to simplify the development of scalable linear algebra applications. Because GML hides the details of communication and parallelism, its programs are written in a sequential style that non-expert programmers can easily use and understand.

Resilience is becoming a major challenge for HPC applications as the number of components in a typical system continues to increase. To address this challenge, we improved GML’s adaptability to process failure and provided a mechanism for automatic data recovery. As iterative algorithms are commonly used in linear algebra applications, we also created a checkpoint/restore framework for developing resilient iterative applications using GML.
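
The checkpoint/restore idea for iterative algorithms can be illustrated with a minimal sketch. It is written in Python rather than X10, it is not GML's API, and every name in it (`run_resilient`, `SimulatedFailure`, `checkpoint_interval`, `fail_at`) is hypothetical. It assumes the iteration state can be deep-copied and that a failure surfaces as an exception before the state is modified.

```python
import copy


class SimulatedFailure(Exception):
    """Stands in for a process failure (hypothetical, not an X10 exception)."""


def run_resilient(step, state, iterations, checkpoint_interval, fail_at=()):
    """Apply `step` `iterations` times, rolling back to the last checkpoint on failure.

    `fail_at` lists iteration indices at which a failure is injected once.
    Returns (final_state, number_of_restores).
    """
    pending_failures = set(fail_at)
    checkpoint = (0, copy.deepcopy(state))  # (iteration index, saved state)
    restores = 0
    i = 0
    while i < iterations:
        try:
            if i in pending_failures:
                pending_failures.discard(i)  # each injected failure fires once
                raise SimulatedFailure(i)
            state = step(state)
            i += 1
            if i % checkpoint_interval == 0:
                # Save a consistent snapshot every `checkpoint_interval` iterations.
                checkpoint = (i, copy.deepcopy(state))
        except SimulatedFailure:
            # Restore: rewind both the iteration counter and the state.
            i, saved = checkpoint
            state = copy.deepcopy(saved)
            restores += 1
    return state, restores


if __name__ == "__main__":
    final, restores = run_resilient(lambda s: s + 1, 0, 10, 3, fail_at=(5,))
    print(final, restores)
```

In this sketch the application supplies only the per-iteration `step` function, while the loop owns checkpointing and rollback. A less frequent checkpoint lowers the overhead of failure-free runs but increases the work repeated after a failure.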

Using three example machine learning applications, we demonstrate that this framework supports resilient application development with minimal additional code compared to a non-resilient implementation. Performance measurements in a typical cluster environment show that the major cost of resilient execution is due to resilient X10 itself, and that the additional cost due to our framework is acceptable.

Sun 14 Jun

Displayed time zone: Tijuana, Baja California

09:00 - 11:00
Programming in X10 at B112
Day opening
Opening and Welcome
Jose Nelson Amaral University of Alberta, Olivier Tardieu IBM Research
Introduction to X10
Olivier Tardieu IBM Research
The X10 Global Matrix Library: A Resilient Framework for Linear Algebra Applications
Sara S. Hamouda Australian National University, Josh Milthorpe IBM Research, Peter Strazdins Australian National University, Vijay Saraswat IBM TJ Watson Research Center