FPGA 101 for Software Engineers

Nuno Paulino INESC TEC nuno.m.paulino@inesctec.pt



## 01 Introduction

What are FPGAs, Early FPGAs, FPGA Architecture, FPGA Growth

## 02 Learning Curve

Where to Learn?, Hardware "Programming" Languages, Compilation

## 03 FPGA Spaces

Embedded, Hobby, Edge & AI (?)

## 04 High-Level Synthesis

Re-Targeting Old Languages, A Device to Rival GPUs?

## 05 Witness Testimonies

Pedro Silva, Tiago Santos





## 01. Introduction







## What are they?

- A type of integrated circuit (IC)
  - Reconfigurable functionality by changing connections between logic blocks
  - Like a microscopic breadboard capable of firmware updates
  - You can build anything!
- CPUs/GPUs are programmable too!
  - Yes, but you're stuck with their respective **models** (e.g., von Neumann) -> They are **ASICs**
  - They're also expensive to make (tradeoff at high volume), and "impossible" to bugfix (re-spin)



## FPGA vs. ASIC

- Non-Recurring Engineering (NRE)
  - Initial masking and fabrication cost of ASICs (high)
- FPGAs made/make sense versus ASICs depending on volume, and on NRE
  - Despite long "compile" times, they're still orders of magnitude ahead of ASICs on "bug fixes"
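
The volume tradeoff can be sketched with a simple cost model: total cost is the one-time NRE plus the per-unit cost times the volume. Every figure below is a hypothetical placeholder, chosen only to show the shape of the tradeoff, not a real price. The function names are also made up for this sketch.

```python
import math

def total_cost(nre, unit_cost, volume):
    """Total cost of a production run: one-time NRE plus per-unit cost."""
    return nre + unit_cost * volume

def break_even_volume(nre_fpga, unit_fpga, nre_asic, unit_asic):
    """Smallest volume at which the ASIC becomes strictly cheaper.

    Assumes the ASIC has the higher NRE and the lower unit cost.
    """
    if unit_fpga <= unit_asic:
        raise ValueError("ASIC never catches up: its unit cost is not lower")
    exact = (nre_asic - nre_fpga) / (unit_fpga - unit_asic)
    return math.floor(exact) + 1

# Hypothetical numbers, for illustration only.
FPGA = {"nre": 50_000, "unit": 40.0}
ASIC = {"nre": 2_000_000, "unit": 4.0}

volume = break_even_volume(FPGA["nre"], FPGA["unit"], ASIC["nre"], ASIC["unit"])
print(volume)
```

Below that volume the FPGA wins on cost alone, and that is before counting the price of a re-spin when a bug shows up.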



Steve Trimberger, "Three Ages of FPGAs: A Retrospective on the First Thirty Years of FPGA Technology", Proceedings of the IEEE, 2015



## The first FPGA and its Father (circa 1984)

- Ross Freeman (1948-1989)
  - Peace Corps Volunteer
  - Inventor of the "FPGA"
  - Founder of Xilinx Inc.
- XC2000 Family
  - Up to 100 4-Input LUTs!
  - Up to 100 MHz! @ 1 µm





Xilinx XC2000: first family of SRAM reconfigurable devices

Ross Freeman, founder of Xilinx Inc. (colorized)

More on Ross Freeman: https://www.autodesk.com/products/eagle/blog/ross-freeman/



## Looking inside...

#### Configurable Blocks

Re-programmable with arbitrary logic functions, + data storage (registers)
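
In software terms, such a block can be pictured as a small lookup table (LUT) followed by a register. The sketch below is only a behavioural model under that assumption: `LogicBlock` and `configure` are made-up names, and real blocks have more options than this.

```python
class LogicBlock:
    """Toy model of a configurable block: a 4-input LUT plus one register."""

    def __init__(self, truth_table):
        # The "configuration" is just 16 bits, one output bit per input pattern.
        if len(truth_table) != 16 or any(bit not in (0, 1) for bit in truth_table):
            raise ValueError("a 4-input LUT needs exactly 16 bits")
        self.truth_table = list(truth_table)
        self.register = 0  # data storage, updated on each clock edge

    def lut(self, a, b, c, d):
        """Combinational output: the inputs form an index into the table."""
        index = (d << 3) | (c << 2) | (b << 1) | a
        return self.truth_table[index]

    def clock(self, a, b, c, d):
        """Rising clock edge: capture the LUT output in the register."""
        self.register = self.lut(a, b, c, d)
        return self.register


def configure(function):
    """Build the 16-bit configuration for any 4-input boolean function."""
    bits = []
    for index in range(16):
        a, b, c, d = index & 1, (index >> 1) & 1, (index >> 2) & 1, (index >> 3) & 1
        bits.append(int(bool(function(a, b, c, d))))
    return LogicBlock(bits)


# The same block becomes a different circuit just by loading different bits.
and4 = configure(lambda a, b, c, d: a and b and c and d)
xor4 = configure(lambda a, b, c, d: a ^ b ^ c ^ d)
```

"Reprogramming" here means nothing more than writing a new 16-bit table; the block itself never changes.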

#### Interconnections

 Short and long connections between blocks, + connections to the outside

#### Programmed with XACT

- MS-DOS "GUI"
- For only **\$12,000 in 1984!**



Simplified 3x3 diagram, from the XC2000 patent (US4870302A)



## XC2000 Under the microscope

- Tiles in 8x8 arrangement
  - Includes the CLBs and the interconnects
- By today's standards, this is:
  - Small in resources
  - Huge in required size
- Where are we now?



More on the XC2000: http://www.righto.com/2020/09/reverse-engineering-first-fpga-chip.html



#### 50 Years of CPU Evolution

- Average for top 30 devices per year
- Stagnation after 2005
  - Start of multi-core era
- Breakdown of Dennard Scaling and Moore's Law



N. Paulino, J. Bispo, J. C. Ferreira and J. Cardoso, "A Binary Translation Framework for Automated Hardware Generation," in IEEE Micro



### 30 Years of FPGA Evolution

- Since ~1990
  - Capacity x10000
  - Performance x100
  - From 1µm to 14nm
  - Many dedicated components (e.g., DSPs)
- After 2012
  - The SoC FPGA Era
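
To put those factors in perspective, they can be turned into yearly growth rates. This is back-of-the-envelope arithmetic that assumes constant compound growth over the full 30 years, which is a simplification; the helper names are made up.

```python
import math

def annual_growth(total_factor, years):
    """Constant yearly growth factor that compounds to total_factor."""
    return total_factor ** (1.0 / years)

def doubling_time(total_factor, years):
    """Years needed to double at that constant growth rate."""
    return years * math.log(2) / math.log(total_factor)

capacity_rate = annual_growth(10_000, 30)      # about 1.36, i.e. ~36% per year
capacity_doubling = doubling_time(10_000, 30)  # about 2.3 years
performance_rate = annual_growth(100, 30)      # about 1.17, i.e. ~17% per year
```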



N. Paulino, J. Bispo, J. C. Ferreira and J. Cardoso, "A Binary Translation Framework for Automated Hardware Generation," in IEEE Micro



## Are they really that relevant? Who's involved?

#### Intel Corporation

Purchases Altera for \$16.7 billion in 2015

#### Advanced Micro Devices (AMD)

Purchases Xilinx Inc. in a deal announced at \$35 billion in 2020 (sale just became final)

#### Some users: Amazon, Microsoft, Google, Alibaba

You may have heard about FPGAs associated with Machine Learning, Deep Learning, AI, Computer Vision, Data centers, etc.





## 02. Learning Curve





## Where to learn?

I already know how to program!

Googling "FPGA code Hello World example" won't get you far...

What might you need to start?

- What languages?
- How do I compile?
- What *can* I compile?
- Where do I run my code?



### Where can I learn?

- Books?
  - Big
  - Some expensive
- There are C books too...
  - But honestly, I learned from the Internet
  - Ctrl-C, Ctrl-V, compile, modify and try!





## Maybe online?

- ~2.2 million hits for languages like JavaScript, Java, etc.
- ~1000 hits for FPGAs and related languages...
- There isn't much of a community... yet!

Figure: questions tagged with [X] on *Stack Overflow* and *Stack Exchange*, plus Reddit community size (legend: stackoverflow.com, electronics.stackexchange.com, reddit.com)



## Electrical Engineers must learn plenty about FPGAs (?)

- Let's look...
  - MIEEC@FEUP
    - ~90 subjects... 2 or 3 on FPGAs?
  - MIEEC@Nova
    - ~130 subjects... 1 on HDLs? (not sure)
- How much time to be a good digital circuit design engineer on FPGA?
  - Opinions range from 2 to 5 years, full time.
  - But let's try...



We learn on this (Xilinx Spartan-3 Development Kit)

Once the LEDs blink, it's a great success!



## Hello World?



```
#include <stdio.h>
int main() {
    printf("Hello, World!");
    return 0;
}
```

~\$ gcc hello.c -o hello

~\$ ./hello

Hello, World!

> I can edit, compile, and run in seconds (!), and debug with printfs (!!!)

#### On FPGAs, let's Google it...

```
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity hello_world is
    port(
        clk : in std_logic;
        led : out std_logic);
end hello_world;

architecture rtl of hello_world is
    constant CFREQ : integer := 2000000; -- clock frequency, in Hz
    -- blink frequency, in Hz (assumed value; it must be well below CFREQ,
    -- otherwise the integer division below makes CMAX negative)
    constant BFREQ : integer := 1;
    -- clock cycles between LED toggles (half a blink period), minus one
    constant CMAX  : integer := CFREQ/BFREQ/2-1;

    signal cnt   : unsigned(24 downto 0) := to_unsigned(0, 25);
    signal blink : std_logic := '0';
begin
    process(clk)
    begin
        if rising_edge(clk) then
            if cnt = CMAX then
                cnt   <= (others => '0');
                blink <= not blink;
            else
                cnt <= cnt + 1;
            end if;
        end if;
    end process;

    led <= blink;
end rtl;
```

...where's the output?
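
There is no console, so one way to "see" the output is to step through the design in software. The Python sketch below is a hand-written model of the counter above, not something a tool generates: `simulate_blink` is a made-up name, and the frequencies are tiny hypothetical values so that the trace stays readable.

```python
def simulate_blink(clock_hz, blink_hz, cycles):
    """Cycle-by-cycle model of a counter-based LED blinker.

    Returns the LED value seen after each rising clock edge.
    """
    cmax = clock_hz // blink_hz // 2 - 1  # same integer arithmetic as the HDL constant
    if cmax < 0:
        raise ValueError("blink frequency too high for this clock")
    cnt, blink = 0, 0
    trace = []
    for _ in range(cycles):
        # Everything inside the loop body happens on one rising edge.
        if cnt == cmax:
            cnt = 0
            blink ^= 1
        else:
            cnt += 1
        trace.append(blink)
    return trace

# Hypothetical, tiny frequencies so that the trace is short enough to read.
trace = simulate_blink(clock_hz=20, blink_hz=2, cycles=20)
print("".join(str(bit) for bit in trace))
```

With these values the LED toggles every five clock edges. On a real board the only "printf" is the LED itself, or a waveform from a simulator.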



## Languages

How to design hardware? With Hardware Description Languages (HDLs)

#### **Fundamentally**

- Statements are concurrent
- Scopes express modules (blocks)
- There are no functions, stack, heap, memory, stdio, etc.

 HDLs simultaneously express structure (space) and control (time)
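
What "concurrent" means in practice can be shown with the classic register swap. In a clocked process, two assignments such as `a <= b; b <= a;` exchange the two registers, because both right-hand sides are read before either register updates. The Python sketch below contrasts that with ordinary sequential code; both function names are made up for illustration.

```python
def software_swap_attempt(a, b):
    """Sequential semantics: the second statement sees the first one's result."""
    a = b
    b = a
    return a, b

def hardware_clock_edge(a, b):
    """Concurrent semantics: both registers sample their inputs on the same edge.

    All right-hand sides are read first, then all registers update together.
    """
    next_a = b
    next_b = a
    return next_a, next_b
```

Starting from `(1, 2)`, the sequential version ends with both values equal to 2, while the clock-edge version returns `(2, 1)`.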



### The usual suspects...

- Verilog (since 1984)
  - Weak typing
  - Less verbose (than VHDL)
- VHDL (since ~1980)
  - Strong typing
  - More verbose (than anything else)
- Mixed Design (both!)



Steve Golson, Leah Clark, "Language Wars in the 21st Century: Verilog versus VHDL–Revisited", 2016, Synopsys Users Group (SNUG)

"Europe used to be a huge VHDL supporter, but this is a legacy issue now and there is very little new VHDL being written." - Steve Golson and Leah Clark



## Emerging Object Oriented Languages (and IRs...)

- Chisel3 (since 2012) and SpinalHDL (since 2014), others (DFiant, Gemini, ...)
  - Both based on Scala (i.e., embedded DSLs)
  - Generate HDL from OO design (inheritance, overloading)
    - A lot of boilerplate is removed (e.g., clock declarations, process blocks, enables, resets)
  - Online Jupyter bootcamps available!





- https://github.com/chipsalliance/chisel3
- https://fires.im/micro19-slides-pdf/02_chipyard_basics.pdf
- https://github.com/SpinalHDL
- https://spinalhdl.github.io/SpinalDoc-RTD/




#### Chisel3

```
import chisel3._

// 8-bit combinational adder: y = a + b
class Add extends Module {
  val io = IO(new Bundle {
    val a = Input(UInt(8.W))
    val b = Input(UInt(8.W))
    val y = Output(UInt(8.W))
  })
  io.y := io.a + io.b
}
```
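
For software folks, the adder above behaves roughly like the Python function below. This is a sketch assuming the 8-bit widths declared in the bundle, and `add8` is a made-up name. The point is that the sum is truncated to the signal width instead of growing like a Python integer.

```python
WIDTH = 8
MASK = (1 << WIDTH) - 1  # 0xFF for 8-bit signals

def add8(a, b):
    """Model of an 8-bit adder whose result is also 8 bits wide."""
    if not (0 <= a <= MASK and 0 <= b <= MASK):
        raise ValueError("inputs must fit in 8 unsigned bits")
    return (a + b) & MASK  # the carry out of bit 7 is dropped
```

So `add8(200, 100)` gives 44, not 300. In hardware the width of every signal is a design decision, not something the runtime handles for you.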

#### SpinalHDL

```
import spinal.core._

// Purely combinational logic: result = (a AND b) OR (NOT c)
class MyComponent extends Component {
  val io = new Bundle {
    val a = in Bool()
    val b = in Bool()
    val c = in Bool()
    val result = out Bool()
  }
  io.result := (io.a & io.b) | (!io.c)
}
```

Both are very similar, and allow for functional programming for hardware! E.g., in Chisel3: `val delayFilter = Module(new FirFilter(8, Seq(0.U, 1.U)))` (functional module declaration)




#### More on Chisel3

- Developed at UC Berkeley
- Uses FIRRTL intermediate representation (LLVM of hardware?)
- Integral part of Berkeley's Chipyard
  - BOOM (Berkeley Out-of-Order Machine), Rocket Chip (In-Order Core), etc.
- Used in **SiFive**!
  - "At SiFive, all RTL development is done in Chisel (...)" Krste Asanović, RISC-V Foundation
- Some already teach it (e.g., Technical University of Denmark)





## Tools

#### Compilation Flow and IDEs

- Compilation -> Synthesis
- Some giants
  - Xilinx Vitis / Vivado Suites
  - Intel Quartus
  - Synopsys Design Compiler
- Some free tools/projects exist, like Verilator, SymbiFlow, Yosys, RapidSmith



## From HDLs to Circuits

- When you compile C code, you generate control for your architecture
- But here, the code is the architecture
- Place & Route is one of the major outstanding issues in FPGA design
  - Design dependent, but can take up to dozens of hours






### XACT

Locked away somewhere in FEUP, this software remains...







## Xilinx Vitis + Vivado

- Successor to many other tools...
  - Xilinx ISE (defunct)
  - Xilinx EDK (defunct)
  - Xilinx SDx (defunct?)
- Vitis
  - Software perspective (host code + HLS)
- Vivado
  - Hardware perspective (HDL, block designs, IP blocks)

Screenshot: Vivado project view, with the Flow Navigator (Project Manager, IP Integrator, RTL Analysis, Synthesis, Implementation, Program and Debug) and the hello world VHDL source open.
                                         |                                                                                                                                                                                                                                                                           |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          
                                                                                                                                                                                                                                                                                                                                                                         | an Constanting Constanting Constanting |          |         |        |                     |                   |     |

Our "Hello" in Vivado



## Ok. Back to our Hello World. Where's the output?

#### Elaborated Design



I guess I'll click Run Simulation?

| Name      | Value    | Waveform (0 ns to 1,000 ns) |
|-----------|----------|-----------------------------|
| clk       | U        |                             |
| led       | U        |                             |
| cnt[24:0] | 000000   | UUUUUUU                     |
| blink     | U        |                             |
| CFREQ     | 20000000 | 20000000                    |
| BFREQ     | 1        | 1                           |
| CMAX      | 9999999  | 9999999                     |



## I can't compile and run?... No, you need a testbench!



- Verification is most of the job
  - For very large designs, it's not trivial
  - Worse if you have components external to the FPGA (e.g., DDR)
  - But it's 100% deterministic!
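
To make the testbench idea concrete in terms familiar to software engineers, the sketch below models a blinker in C++ and drives it from a loop that plays the role of the clock. It is not the HDL design from the slides: the `Blinker` struct, `tick()`, and `count_toggles()` are hypothetical names, and the toggle-at-`CMAX` behavior is an assumption inferred from the signal names in the simulation waveform (`cnt`, `blink`, `CMAX`).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical software model of the blinker (not the original HDL).
// Field names mirror the signals seen in the simulation waveform.
struct Blinker {
    std::uint32_t cmax;     // counter limit (CMAX)
    std::uint32_t cnt = 0;  // free-running counter (cnt[24:0])
    bool blink = false;     // internal toggle that drives the LED

    explicit Blinker(std::uint32_t cmax_) : cmax(cmax_) {}

    // Models one rising clock edge.
    void tick() {
        assert(cnt <= cmax);  // invariant of the counter
        if (cnt == cmax) {
            cnt = 0;
            blink = !blink;   // assumed behavior: toggle when the limit is hit
        } else {
            ++cnt;
        }
    }

    bool led() const { return blink; }
};

// "Testbench": generate `cycles` clock edges and count LED transitions.
inline std::uint64_t count_toggles(Blinker& dut, std::uint64_t cycles) {
    std::uint64_t toggles = 0;
    bool prev = dut.led();
    for (std::uint64_t c = 0; c < cycles; ++c) {
        dut.tick();
        if (dut.led() != prev) {
            ++toggles;
            prev = dut.led();
        }
    }
    return toggles;
}
```

With `CMAX = 9999999`, as in the waveform, `count_toggles` reports two LED toggles over 20,000,000 clock edges, which corresponds to a 1 Hz blink if `CFREQ = 20000000` is read as a 20 MHz clock. Without a loop (or HDL process) generating those edges, nothing ever changes, which is why the simulation without a testbench shows only `U`.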





### On the chip! (Synthesizing for a Xilinx Virtex-7)





#### That's the (ugly) core of it!

#### Do I have to create everything from scratch? No.

- Many common components are embedded into the FPGA
- Lots of pre-made "soft-core" designs (e.g., RISC-V soft-cores for your FPGA)
- Progress has been significant towards higher abstractions (away from HDL)

- Libraries, abstractions, and form factors place the FPGA in many **spaces**





# 03. FPGA Spaces







## The FPGA and where you put it

- The FPGA is only the IC
- Where to put CPUs and GPUs
  - CPU --> Motherboard socket
  - GPU --> Motherboard PCI-e Slot
- FPGAs can be placed on
  - PCI-e boards --> Server Racks --> Server Space
  - Custom PCBs
    - Edge/Embedded Space
    - Hobby/Educational Space
  - Development Kits --> May cover all of the above



Stratix 10 SX2800K SoC (High-end, 14nm)

I cost about \$24,800 (!)



Kintex UltraScale XCKU115 SoC (Mid-range, 20nm)

I cost about \$7,800



## Server Space

#### Powering big data workloads

#### Applications

- Cloud Computing (search engines)
- Cutting-edge AI Applications
- SmartNICs
- Accessibility (for education)
- Hardware updates can lead to server performance improvements at low cost





AMD XILINX.

- "30,000 Images/Second"
  - Two AMD EPYC 7551 CPUs
  - Eight Alveo U250 PCI-e Cards
    - ~\$7,500 (each!)
  - AI Inference Record (?)

GoogLeNet CNN

AI Inference Record: https://www.enterpriseai.news/2018/10/03/30000-images-secondxilinx-and-amd-claim-ai-inferencing-record/





- Intel Programmable Acceleration Cards (PAC)
  - ~\$7,000
  - Based on Stratix 10 FPGAs (Altera)
  - Example:
    - Intel OpenVINO Toolkit
    - ~20x over GPU-based solutions

OpenVINO™ Toolkit and FPGAs https://techdecoded.intel.io/resources/openvino-toolkit-and-fpgas/





"NPU Peak performance of the Brainwave DPU across three generations of Intel FPGAs. The use of ms-fp8 narrow precision improves performance by 3.2X-7.8X over a conventional 16-bit fixed point."



- Models used in Bing and Azure

"(...) the world's largest cloud investment in FPGAs"

On an Intel Stratix 10 (280k) FPGA

39 TFLOPS @ 300MHz, 125W

(RTX 2080: 89 TFLOPS, 250W)

Project Brainwave: https://www.microsoft.com/en-us/research/project/project-brainwave/ https://ieeexplore.ieee.org/document/8344479





- Versal AI VCK190 Kit
  - Only **\$12,000**!
  - The first "ACAP" type device
    - Dedicated AI engines + CPU + FPGA
  - "(...) x100 greater compute efficiency over server-grade CPUs (...)"
  - "(...) x20 over other FPGAs (...)"

Versal ACAP White Paper: https://www.xilinx.com/support/documentation/white\_papers/wp505-versal-acap.pdf



## Embedded Space

### Gaining more traction with the FPGA-based MPSoC

Sub-Spaces

- Consumer Electronics
- Telecommunications
- Automotive
- Medical
- Space & Defence







- Hard ARM cores
- Common function cores built-in
- Fast interface between PS and PL
- Operating Systems (!)
- Single-board projects away from barebones gate level design





- Xilinx Zedboard
  - \$450 (!)
  - Zynq-7000 SoC FPGA
  - Lots of peripheral interfaces
  - Single board computer
  - (Successor to the Spartan-3 for EE classes?...)



### Some Products!

- Waymo (Google's self-driving car)
- HTC Vive (VR)
- NVIDIA G-Sync
- Smartphones
  - Good for fixing "bugs" after product launch
- Apple Mac Pro "Afterburner"
  - PCI-e card for video codecs / streaming / editing



Intel News @intelnews

What #Intel parts are in the #Waymo vans? Xeon processors, Arria FPGAs, and Gigabit Ethernet and XMM modems. newsroom.intel.com/editorials/way...

Examples from the VHDLwhiz presentation "An Introduction to FPGAs & Programmable Logic": https://www.youtube.com/watch?v=lmvdPQQAehQ



## Hobby Space!

### Dude, where's my weekend project?

Where's the Arduino or Raspberry Pi of FPGAs?

- Used to be difficult due to the complexity of the tools (>20 GB), required licenses, cost of boards, learning curve, HDLs, etc.
  - Hard to create a community "backbone"
- Now there are a few alternatives!





#### Xilinx PYNQ-Z1

- 8 cm x 12 cm
- \$199 (+ accessories...)
- Xilinx Zynq®-7020 SoC
- Python + Zynq = PYNQ
- With Operating System
- https://github.com/Xilinx/PYNQ\_Workshop



## Xilinx PYNQ Abstraction Stack

- Run an OS
- Use familiar languages to integrate SW + HW
- Need custom modules?
  - Still need to design the hardware





## Example on the PYNQ-Z1 (PYNQ-HelloWorld)

### https://github.com/Xilinx/PYNQ-HelloWorld





### (Some) Other Boards

- Terasic DE0-Nano
  - 20cm x 13cm
  - Altera Cyclone IV
  - \$93



- 15cm x 7cm
- Xilinx Zynq-7010
- \$99

### TinyFPGA

- 3.0cm x 1.7cm
- Lattice FPGA
- As low as **\$12**!









### But does it run DOOM?

### Yes.

- Using a RISC-V main processor
- On a Lattice iCE40 FPGA



https://www.youtube.com/watch?v=3ZBAZ5QoCAk https://hackaday.com/2021/02/07/ice40-runs-doom/

### Twice.

- Using entirely custom logic (no instructions)
- On an Intel Cyclone V FPGA

Sylvain Lefebvre @sylefeb

The DooM-chip! It will run E1M1 till the end of times (or till power runs out, whichever comes first). Algorithm is burned into wires, LUTs and flip-flops on an #FPGA: no CPU, no opcodes, no instruction counter. Running on Altera CycloneV + SDRAM. (1/n)



https://twitter.com/sylefeb/status/1258808333265514497 https://www.engadget.com/doom-chip-fpga-173503758.html



## Edge

### The best device for power efficiency at the edge?

### The domain of excellence?

- Adaptive
- Low NRE
- Updates Over-the-Air
- Hardware accelerated radio for Internet-of-Things
- Good performance to energy tradeoff (for battery devices)



### Examples of recent platforms for Edge Applications

- FPGA Based
  - Xilinx Kria Family (and others)
  - ~\$250 for this module



- A challenger appears! GPUs (?)
  - NVIDIA Jetson Family
  - ~\$479 for the standalone module





# 04. High-Level Synthesis





## C/C++ or OpenCL into RTL

### Xilinx Vitis HLS (Vivado HLS)

"Vitis™ HLS is a high-level synthesis tool that allows C, C++, and OpenCL functions to become hard wired onto the device logic fabric and RAM/DSP blocks."

### Intel Quartus HLS

"(...) is a high-level synthesis tool that takes in untimed C++ as input and generates production-quality register transfer level (RTL) code (...)"

### Example

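```
#define BUFFER_SIZE 256  // assumed value; the slide omits this definition
#define DATA_SIZE 4096   // assumed value; the slide omits this definition

// Trip-count hints for the HLS latency report (no effect on the hardware)
const unsigned int c_len = DATA_SIZE / BUFFER_SIZE;
const unsigned int c_size = BUFFER_SIZE;

extern "C" {
void krnl_vadd(const unsigned int* in1, const unsigned int* in2,
               unsigned int* out_r, int size) {
    unsigned int v1_buffer[BUFFER_SIZE];  // local (on-chip) buffer
    // Process the input in chunks of BUFFER_SIZE elements
    for (int i = 0; i < size; i += BUFFER_SIZE) {
#pragma HLS LOOP_TRIPCOUNT min = c_len max = c_len
        int chunk_size = BUFFER_SIZE;
        if ((i + BUFFER_SIZE) > size) chunk_size = size - i;  // last, partial chunk
        // Burst-read a chunk of in1 into the local buffer
        for (int j = 0; j < chunk_size; j++) {
#pragma HLS LOOP_TRIPCOUNT min = c_size max = c_size
            v1_buffer[j] = in1[i + j];
        }
        // Add in2 and write the result back
        for (int j = 0; j < chunk_size; j++) {
#pragma HLS LOOP_TRIPCOUNT min = c_size max = c_size
            out_r[i + j] = v1_buffer[j] + in2[i + j];
        }
    }
}
}
```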

https://github.com/Xilinx/Vitis\_Accel\_Examples
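
The outer loop of `krnl_vadd` walks the input in fixed-size chunks and shortens the last one when `size` is not a multiple of the buffer size. As a minimal sketch of just that chunking arithmetic in plain C++ (the helper `chunk_sizes` is a hypothetical name, not part of the Vitis examples):

```cpp
#include <cassert>
#include <vector>

// Returns the sequence of chunk lengths that the kernel's outer loop
// would process for `size` elements and a local buffer of `buffer_size`.
// Hypothetical helper for illustration only.
inline std::vector<int> chunk_sizes(int size, int buffer_size) {
    assert(buffer_size > 0);
    std::vector<int> chunks;
    for (int i = 0; i < size; i += buffer_size) {
        int chunk_size = buffer_size;
        if (i + buffer_size > size) chunk_size = size - i;  // partial last chunk
        chunks.push_back(chunk_size);
    }
    return chunks;
}
```

For example, 10 elements with a buffer of 4 are processed as chunks of 4, 4, and 2. Each chunk maps to one read phase into the local buffer and one compute/write phase.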



## Example: OpenCL (1/3)

- Accelerating k-means via HLS
  - Alpha Data PCI-e FPGA Card (Kintex)
  - Task kernels (single-thread)
    - Loop pipelining
  - NDRange kernels
    - OpenCL model of workgroups



Nuno Paulino, J. C. Ferreira and J. M. P. Cardoso, "Optimizing OpenCL Code for Performance on FPGA: k-Means Case Study With Integer Data Sets," in IEEE Access, vol. 8, 2020





Baseline OpenCL version of k-means clustering



## Example: OpenCL (2/3)

### Changes?

- No outermost loop
- Some work moved to the host (scope A)
- OpenCL vector types
  - uint16
- Scopes E1 to E4
  - Burst reads/writes



#### (...continued)

| Code | Scope |
|------|-------|
| <pre>if(ptctr == TMPPTS) {<br/>    ptctr = 0;<br/>    for(int j = 0; j &lt; TMPPTS/2; j++) {<br/>        int idx = ((offset + h)/2) + j;<br/>        uint16 tmpread = data[idx];<br/>        tmppts[(j*2)+0] = tmpread.lo;<br/>        tmppts[(j*2)+1] = tmpread.hi;<br/>    }<br/>}</pre> | E3 |
| (...) // for every centroid | C |
| // adapt D segment in kmeansv2/v3<br>// to resort to "tmppts" and<br>// "tmpcntr" to compute distances | D |
| (...) // compare dist with mindist | |
| ptctr++; | |
| <pre>int i = 0, idx = (offset/16);<br/>for(i = 0, j = 0; i &lt; npoints; i += 16, j++)<br/>    labels[idx + j] = *(uint16 *) &amp;(tmplabels[i]);<br/>for(i = 0, j = 0; i &lt; npoints; i += 16, j++)<br/>    min_dist[idx + j] = *(uint16 *) &amp;(tmpdist[i]);</pre> | E |

Version with vectorization, local partitioned memories, and burst accesses
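
For readers who have not used OpenCL vector types: `uint16` is a vector of 16 unsigned integers, and `.lo` / `.hi` select its lower and upper halves (each a `uint8`). The sketch below emulates the unpacking step of the listing in plain C++. The aliases `Vec16` and `Vec8` and the function `burst_unpack` are hypothetical stand-ins, not code from the paper.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Vec16 = std::array<std::uint32_t, 16>;  // stand-in for OpenCL uint16
using Vec8  = std::array<std::uint32_t, 8>;   // stand-in for OpenCL uint8

// Lower half of a 16-wide vector (OpenCL's .lo).
inline Vec8 lo(const Vec16& v) {
    Vec8 r{};
    for (std::size_t k = 0; k < 8; ++k) r[k] = v[k];
    return r;
}

// Upper half of a 16-wide vector (OpenCL's .hi).
inline Vec8 hi(const Vec16& v) {
    Vec8 r{};
    for (std::size_t k = 0; k < 8; ++k) r[k] = v[k + 8];
    return r;
}

// Reads `count` consecutive wide words starting at `first` (one burst)
// and splits each into two 8-wide entries of a local buffer.
inline std::vector<Vec8> burst_unpack(const std::vector<Vec16>& data,
                                      std::size_t first, std::size_t count) {
    assert(first + count <= data.size());
    std::vector<Vec8> tmppts(count * 2);
    for (std::size_t j = 0; j < count; ++j) {
        const Vec16& tmpread = data[first + j];  // one wide memory read
        tmppts[j * 2]     = lo(tmpread);
        tmppts[j * 2 + 1] = hi(tmpread);
    }
    return tmppts;
}
```

The point of the wide type is that each global memory access moves 16 values at once, and consecutive indices let the tool infer a burst transfer instead of many narrow reads.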



## Example: OpenCL (3/3)

- 725x over the baseline
  - By combining loop pipelining, burst memory accesses, and vectorization
  - But the code **needed a lot of work**
  - And features outside the OpenCL standard are needed... (e.g., partitioning attributes)
- But more cost-effective!
  - CPU: \$450 on release (2014)
  - FPGA: \$2,700... but **1.5x faster and 4.8x less power**
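
The speed and power ratios can be combined into a single energy figure. As a rough sketch, assuming both ratios refer to the same workload and to average power over the whole run (an assumption, not stated on the slide), energy is power times execution time:

$$
\frac{E_{\text{CPU}}}{E_{\text{FPGA}}}
= \frac{P_{\text{CPU}}}{P_{\text{FPGA}}} \cdot \frac{t_{\text{CPU}}}{t_{\text{FPGA}}}
\approx 4.8 \times 1.5 = 7.2
$$

Under that assumption, the FPGA costs 6x as much (\$2,700 vs. \$450) but uses roughly 7.2x less energy per run.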



Speedup for v1b and v5b8 vs. the respective versions without burst optimization





# 05. Witness Testimonies





## Pedro Silva

### FEUP, MIEIC: FPGAs as Accelerator for Graph Analysis Algorithms

- Impressions from a user perspective
  - "Lots of boilerplate"
  - "Leaky abstractions"
  - "Thinking outside the Software Engineering box"
  - "Slow compilation times"
  - "Breaking through the C abstraction (when it doesn't fall apart on its own, see above)"





Source: Wikimedia Commons

- Accelerating Graph Centrality Algorithms on FPGAs via HLS
  - Relatively unexplored on FPGAs
  - Can they be easily expressed through HLS abstractions?
  - How central is each node to the graph?
  - Uses algorithms like *Shortest Path*
    - Up to 100x slower than GPU, using out-of-the-box Xilinx libraries



## Tiago Santos

### FEUP, MIEIC: Automatic Insertion of High-Level Synthesis Directives

- Impressions from a developer perspective
  - "Instability between versions"
  - "Scattered documentation"
  - "Compile times"
  - "Which directives to choose?"
  - "How to configure directives?"
  - Can the process be **automated**?



```
#define N 2000
// Original loop, before inserting HLS directives
void computeGrad(float grad[N],
                 float feature[N], int scale) {
    for (int i = 0; i < N; i++)
        grad[i] = scale * feature[i];
}
```



```
#define N 2000
// Same loop, annotated with HLS directives
void computeGrad(float grad[N],
                 float feature[N], int scale) {
#pragma HLS array_partition variable=grad cyclic factor=32
#pragma HLS array_partition variable=feature cyclic factor=32
    for (int i = 0; i < N; i++) {
#pragma HLS unroll factor=32
#pragma HLS pipeline
        grad[i] = scale * feature[i];
    }
}
```

- Do acceleration candidates need improvement for better HLS?
  - Iterations assumed sequential
  - Parallelism needs to be exposed
  - Automatic annotation with HLS directives
    - Using Source-to-Source tool Clava
  - 3x to 58x latency improvements
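
To illustrate what "exposing parallelism" means for a loop like `computeGrad`, the sketch below unrolls it by hand in plain C++. It uses a factor of 4 instead of 32 for brevity, and the function names are hypothetical. An `unroll` directive asks the HLS tool to perform a transformation of this kind automatically, so that the independent statements in the loop body can become parallel hardware.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Baseline: one multiply per iteration, as in the slide's computeGrad.
inline void compute_grad(std::vector<float>& grad,
                         const std::vector<float>& feature, int scale) {
    assert(grad.size() == feature.size());
    for (std::size_t i = 0; i < feature.size(); ++i)
        grad[i] = scale * feature[i];
}

// Hand-unrolled by 4: the four statements in the body do not depend on
// each other, so hardware can execute them side by side.
inline void compute_grad_unrolled4(std::vector<float>& grad,
                                   const std::vector<float>& feature,
                                   int scale) {
    assert(grad.size() == feature.size());
    const std::size_t n = feature.size();
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        grad[i]     = scale * feature[i];
        grad[i + 1] = scale * feature[i + 1];
        grad[i + 2] = scale * feature[i + 2];
        grad[i + 3] = scale * feature[i + 3];
    }
    // Remainder iterations when n is not a multiple of 4.
    for (; i < n; ++i)
        grad[i] = scale * feature[i];
}
```

Both versions compute the same result. On a CPU the unrolled form mostly saves loop overhead, while on an FPGA it only pays off if the arrays can also be read in parallel, which is the role of the `array_partition` directives in the annotated listing.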







## Thank you!

### Stay tuned for REC'2021!

Nuno Paulino INESC TEC nuno.m.paulino@inesctec.pt