Compiling for Spatial Computation Architectures

Context

With the rise of the Internet of Things, much more data needs to be processed on edge devices (e.g., sensors, cameras, routers) to avoid large data transfers to central servers. However, these devices typically have limited computational power, and replacing them with high-end devices (e.g., multi-core gigahertz processors, GPUs) is not trivial due to power consumption limits.

One type of computation machine is the spatial array, also known as a coarse-grained reconfigurable array (CGRA) or systolic array. These machines are matrices of units that perform operations in parallel. A 4x4 array has 16 units, meaning up to 16 operations in parallel in the best case. In theory, these devices promise higher performance at lower power.

However, many issues related to abstraction, programming, and compilation exist, since there is no established language, compiler, or model (Podobas2020, Lin2019, Zhao2020).

While CPUs have established compilers and languages like C, where a sequence of operations is executed one at a time in the processor, the spatial nature of the arrays introduces several challenges. Which language should be used? If some code executes on a main processor and some on the array, how should the two devices interface? Via an explicit API? Which compiler should be used?

Objectives

This work targets the following model: a CPU has a spatial array attached as a peripheral (i.e., like a GPU), and the application is written in standard C.

This work aims to devise a compilation flow for this model where some functions in the C code are compiled for the array. To do this, source-to-source tools developed in-house (Bispo2020) will be used to automatically replace specific functions with calls to the array.
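As a rough illustration of the kind of replacement intended, the sketch below shows a small kernel and a version that a source-to-source tool could emit in its place. Everything here is an assumption made for illustration: the `spatial offload` marker, the `sa_job` type, and the `sa_run_vec_mac` function are hypothetical names, and the array is emulated in software on the CPU. None of this reflects the actual interface of the in-house tools or of any specific array.

```c
#include <assert.h>
#include <stddef.h>

/* Original kernel, as written by the programmer.
 * A marker such as "#pragma spatial offload" (hypothetical) could tag it
 * for compilation to the array; it is kept in a comment here so the file
 * compiles without unknown-pragma warnings. */
void vec_mac(const int *a, const int *b, int *out, size_t n) {
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] * b[i] + out[i];
}

/* Hypothetical host-side interface to the array, emulated in software. */
enum { SA_UNITS = 16 };          /* a 4x4 array has 16 units */

typedef struct {
    const int *a;
    const int *b;
    int *out;
    size_t n;
} sa_job;

/* Emulates the array: each step handles up to SA_UNITS elements, standing
 * in for one operation per unit in parallel. Returns the number of steps. */
size_t sa_run_vec_mac(const sa_job *job) {
    assert(job && job->a && job->b && job->out);
    size_t steps = 0;
    for (size_t base = 0; base < job->n; base += SA_UNITS) {
        size_t end = base + SA_UNITS < job->n ? base + SA_UNITS : job->n;
        for (size_t i = base; i < end; i++)   /* parallel on real hardware */
            job->out[i] = job->a[i] * job->b[i] + job->out[i];
        steps++;
    }
    return steps;
}

/* What the tool could emit instead of the original loop: the same
 * signature, with the body replaced by a call to the array. */
void vec_mac_offloaded(const int *a, const int *b, int *out, size_t n) {
    sa_job job = { a, b, out, n };
    (void)sa_run_vec_mac(&job);
}
```

Keeping the signature unchanged means that callers of `vec_mac` would not need to be modified. Only the function body is swapped, which is one plausible way to keep the rest of the application as standard C.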

Novel Aspects

This type of computation machine has been studied at length, but there is no established compilation flow.

This is due to the variety of array layouts and architectures, which influence compilation; to the lack of an established interface between arrays and the host system; and to the fact that no existing language has been widely adopted for this type of machine.

Providing a high-level approach by compiling standard C code and proposing an abstraction layer addresses some of these issues.

Proposed Work Plan

  • Familiarization with existing tools for source-to-source transformations, and techniques for operation mapping/scheduling (during the first semester)
  • Definition of the set of pragmas to aid in source conversion
  • Parsing C code into an intermediate representation (using existing tools in the lab), and scheduling the function onto a target array architecture
  • Replacing the original C code with calls to the architecture, based on the generated schedule
  • Evaluating the resulting speedups and scheduling time based on the architecture specification (i.e., number of units)
  • Writing the dissertation
  • Writing a scientific publication
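
To give a feel for the scheduling and evaluation steps in the plan above, the following is a minimal sketch of greedy list scheduling of a dataflow graph onto an array with a given number of units. It makes several simplifying assumptions that the real work would have to revisit: every operation takes one cycle, any unit can run any operation, operations are given in topological order, and placement and routing between units are ignored. The names `op_t` and `schedule` are hypothetical.

```c
#include <assert.h>

enum { MAX_OPS = 64, MAX_DEPS = 2 };

/* One operation of the dataflow graph. Producers must appear earlier
 * in the array than their consumers (topological order). */
typedef struct {
    int ndeps;
    int deps[MAX_DEPS];   /* indices of the operations this one depends on */
} op_t;

/* Greedy list scheduling with unit latency.
 * Places each operation in the earliest cycle that is after all of its
 * producers and still has a free unit. step[i] receives the cycle chosen
 * for operation i. Returns the schedule length in cycles. */
int schedule(const op_t *ops, int nops, int units, int *step) {
    assert(nops >= 0 && nops <= MAX_OPS && units > 0);
    int used[MAX_OPS] = {0};   /* units already occupied in each cycle */
    int length = 0;

    for (int i = 0; i < nops; i++) {
        int earliest = 0;
        for (int d = 0; d < ops[i].ndeps; d++) {
            int p = ops[i].deps[d];
            assert(p >= 0 && p < i);          /* topological order */
            if (step[p] + 1 > earliest)
                earliest = step[p] + 1;
        }
        int c = earliest;
        while (used[c] >= units)              /* skip cycles that are full */
            c++;
        used[c]++;
        step[i] = c;
        if (c + 1 > length)
            length = c + 1;
    }
    return length;
}
```

Calling `schedule` on the same graph with different values of `units` gives a first-order estimate of how the schedule length, and therefore the achievable speedup, changes with the array size. For example, for a graph of eight independent multiplications feeding a binary tree of seven additions, this sketch yields 15 cycles on one unit, 5 cycles on four units, and 4 cycles on sixteen units, where the last figure is bounded by the dependency chain rather than by the number of units.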

Details

  • Status: Open!
  • Student: None (yet).

References
