ECE 8893 – Parallel Programming for FPGAs
Instructor: Cong (Callie) Hao
Time: Tue. & Thu., 3:30 PM – 4:20 PM
Course Information
Instructor Office Hour: Tuesday 4:30 PM – 5:30 PM, Klaus 2306
TA: Akshay Kamath
TA Office Hour: Friday 3:00 PM – 5:00 PM, Klaus 2304
Course materials and labs: ECE8893 GitHub
Useful Links and Interesting Projects
- Xilinx resources (the most helpful resource for this class)
- A floating-point matrix multiplication using HLS: https://github.com/twaclaw/matmult
- A related course (CS 3220) from the CS department taught by Professor Hyesoon Kim: https://gt-cs3220.github.io/
- Vitis HLS examples and tutorial
- Introductory examples: https://github.com/Xilinx/Vitis-HLS-Introductory-Examples
- E.g., Pipeline, interface, dataflow.
- A more comprehensive tutorial: https://github.com/Xilinx/Vitis-Tutorials
Course Overview
FPGAs have served as important computing resources for decades, offering significant benefits for low-power, high-throughput, and low-latency applications. In the current era of AI, the demand for massive computation and big-data processing has grown dramatically. Deep neural networks (DNNs), for example, have demonstrated extremely promising results in many areas but come with high computational demands on both cloud and IoT platforms. Other examples include large-scale graph processing, scientific computing such as physics simulation, and bioinformatics and chemistry applications such as molecular modeling.
With the growing need for low-power, high-performance computing in various domains, FPGAs have attracted ever-increasing interest as specialized hardware accelerators in both industry and academia. Examples include Amazon AWS F1 FPGA instances, the Microsoft Azure FPGA cloud, and Xilinx FPGA adaptive compute clusters at multiple universities.
This course will present recent advances toward efficient, high-performance FPGA parallel programming using High-Level Synthesis (HLS) for computation-intensive applications. Specifically, it will provide an overview of FPGA architectures, discuss their underlying structures, and highlight key technologies for achieving high-performance parallel programming on FPGAs.
To improve FPGA development productivity, this course adopts a behavioral-level programming language, C/C++, via HLS tools for agile development. It will discuss recent achievements in FPGA acceleration across multiple domains, including but not limited to DNNs, to broaden students' view. Multiple design examples will be provided so that learners can quickly get started on basic designs, with ample room for improvement left for exploration. Optionally, this course will cover accelerator/algorithm co-design, an extremely important and promising research topic.
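To give a flavor of the behavioral-level style used throughout the course, below is a minimal, hypothetical sketch of an HLS-style C++ kernel (a simple vector addition). The `#pragma HLS` directive is honored by Vitis HLS but treated as an unknown pragma and ignored by ordinary C++ compilers, so the same source can be compiled and tested on a CPU first:

```cpp
#include <cassert>

// Hypothetical HLS-style kernel: element-wise vector addition.
// In Vitis HLS, the pragma below pipelines the loop so that one result
// is produced per clock cycle (initiation interval II=1); a plain C++
// compiler simply ignores the unknown pragma, so this same source also
// serves as a C-simulation testbench.
void vadd(const int a[], const int b[], int out[], int n) {
    for (int i = 0; i < n; ++i) {
#pragma HLS PIPELINE II=1
        out[i] = a[i] + b[i];
    }
}
```

A typical C-simulation flow, like the one exercised in Lab 1, calls such a kernel on known inputs and compares the result against a software reference before synthesizing it.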
Learning Outcomes
As part of this course, students will:
- Understand the basic architecture of traditional and modern FPGAs and System-on-Chips (SoCs);
- Understand key techniques for improving design performance;
- Understand tradeoffs between various hardware architectures and platforms;
- Learn HLS programming using C/C++;
- Learn about micro-architectural knobs such as precision, data reuse, memory optimization, and parallelism to architect FPGA accelerators, given target area-power-performance metrics;
- Evaluate the implemented design on FPGA boards and iterate on improving the design quality; and
- Understand future trends and opportunities for FPGAs for a diverse range of applications such as GNNs, scientific computing, medical electronics, cybersecurity systems, and wireless communications.
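To make the "precision" knob above concrete: quantizing data to fixed-point is often the first optimization in an FPGA accelerator. Vitis HLS provides arbitrary-precision types such as `ap_fixed<W,I>` for this; the sketch below is a hypothetical stand-in that emulates 16-bit fixed-point with plain integer arithmetic so it runs on any C++ compiler:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Quantize a float to signed fixed-point with `frac` fractional bits,
// emulated with a 16-bit integer range (similar in spirit to a Vitis
// HLS ap_fixed<16, 16 - frac>). Returns the dequantized float, so the
// caller can measure the quantization error directly.
float quantize(float x, int frac) {
    const float scale = std::ldexp(1.0f, frac);            // 2^frac
    int32_t q = static_cast<int32_t>(std::lround(x * scale));
    // Saturate to the signed 16-bit range, as HLS fixed-point types can.
    if (q > INT16_MAX) q = INT16_MAX;
    if (q < INT16_MIN) q = INT16_MIN;
    return static_cast<float>(q) / scale;
}
```

With 8 fractional bits, any value is represented to within 1/256, and out-of-range values saturate instead of wrapping; trading these widths against accuracy is exactly the precision/area tradeoff explored in the labs.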
Course Structure
The course will involve a mix of lectures interspersed with heavy paper reading and discussions. A semester-long programming-heavy project will focus on developing an FPGA accelerator using HLS for DNN, GNN, or other computation-intensive algorithms.
Course Text
The material for this course will be derived from the following texts:
- Kastner, Ryan, Janarbek Matai, and Stephen Neuendorffer. “Parallel programming for FPGAs.” arXiv preprint arXiv:1805.03648 (2018).
- Papers from recent FPGA and computer architecture conferences: ISCA, MICRO, HPCA, ASPLOS, DATE, DAC, ICCAD, ICCD.
- Papers from ML conferences: ICML, NeurIPS, ICLR, CVPR
Syllabus and Outline
- Overview of FPGA
- FPGA architecture
- FPGA programming tutorial (Vivado)
- Overview of High-Level Synthesis (HLS)
- HLS introduction
- Vitis HLS tutorial
- Overview of Machine Learning
- Deep Neural Networks
- Graph Neural Networks
- FPGA Design Techniques (I)
- Data precision and model quantization
- Loop optimizations and array partitioning
- Data reuse and model tiling
- FPGA Design Techniques (II)
- Storage and memory access
- Data streaming
- C/RTL Co-simulation
- Software/Hardware Co-design
- Hardware-aware algorithm design
- Neural architecture search
- Future Trends
- Spatial-temporal GNNs
- Scientific computing
- Analog Accelerators
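Several outline topics (loop optimization, tiling, data reuse) follow a common pattern that is worth previewing. As a minimal, hypothetical sketch, tiled matrix multiplication restructures the loop nest so that a small tile of each matrix can be held in fast on-chip memory (BRAM) and reused many times before going back to slow off-chip memory:

```cpp
#include <cassert>

constexpr int N = 8;  // matrix dimension (kept small here for clarity)
constexpr int T = 4;  // tile size; on an FPGA, a T x T tile fits in BRAM

// Tiled matrix multiply C = A * B. The three inner loops work on one
// tile at a time, so each loaded element is reused T times; in HLS,
// the tile buffers would be partitioned arrays and the innermost loop
// would be pipelined.
void matmul_tiled(const int A[N][N], const int B[N][N], int C[N][N]) {
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            C[i][j] = 0;
    for (int ii = 0; ii < N; ii += T)
        for (int jj = 0; jj < N; jj += T)
            for (int kk = 0; kk < N; kk += T)
                for (int i = ii; i < ii + T; ++i)
                    for (int j = jj; j < jj + T; ++j)
                        for (int k = kk; k < kk + T; ++k)
                            C[i][j] += A[i][k] * B[k][j];
}
```

The same loop restructuring idea recurs in the convolution, systolic-array, and data-streaming lectures.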
Course Project
Device: Xilinx Ultra96V2 or Pynq-Z2 FPGA Board
Tool: Vitis HLS, Vivado
Project Organization:
Lab1 (Individual): Basic usage and practice of Vitis HLS
Lab2 (Individual): Advanced techniques of Vitis HLS
Lab3 (Individual): Deployment on FPGA
Final Project (Group project, each group up to 3 people): A DNN Accelerator (object detection or tracking), GNN Accelerator (from Open Graph Benchmark), or any other computation-intensive algorithms that students are interested in.
Course Grading
- Lab Assignments: 30% (3 Labs – 10% each)
- Paper Presentation: 10%
- Final Project: 60%
- Project Proposal: 10%
- Mid-term Report: 5%
- Source Code: 15%
- Presentation: 10%
- On-board Demo: 5%
- Final Report: 15%
Course Schedule and Slides Download
Week | Date | Topic | Labs | Material |
---|---|---|---|---|
1 | Jan. 10 | Course Introduction | | [Slides] |
1 | Jan. 12 | Overview of Domain-Specific Accelerators | | [Slides] |
2 | Jan. 17 | Introduction to FPGA | | [Slides] |
2 | Jan. 19 | Vitis HLS Tutorial (bring your laptop to class!) | Lab 1 release on Jan. 21 [Code] | [Tutorial] |
3 | Jan. 24 | Introduction to Verilog | | [Slides] |
3 | Jan. 26 | HLS Overview | | [Slides] |
4 | Jan. 31 | Loop Optimizations | | [Slides] |
4 | Feb. 2 | Machine Learning 101 | Lab 1 due on Feb. 4 | [Slides] |
5 | Feb. 7 | Fixed-Point and Data Quantization | Lab 2A release on Feb. 8 [Code] | [Slides] |
5 | Feb. 9 | Data Movement and Streaming | | [Slides] |
6 | Feb. 14 | Convolution and Optimization | Lab 2BC(D) release on Feb. 14 [Code] | [Slides] |
6 | Feb. 16 | Project Topics | Lab 2A due on Feb. 18 | Slides shared on Canvas! |
7 | Feb. 21 | Systolic Array and Winograd | | [Slides] |
7 | Feb. 23 | Advanced DSP Techniques and C/RTL Co-Simulation | | [Slides] |
8 | Feb. 28 | FPGA Accelerators for GNNs | | |
8 | Mar. 2 | Class Canceled! | | |
9 | Mar. 7 | Project Proposal Presentation (Part 1) | Lab 2BC(D) due on Mar. 7 | Slides on Canvas! |
9 | Mar. 9 | Project Proposal Presentation (Part 2) | Lab 3A release on Mar. 10 [Code] | Slides on Canvas! |
10 | Mar. 14 | Project Proposal Presentation (Part 3) | | Slides on Canvas! |
10 | Mar. 16 | Project Proposal Presentation (Part 4) | | Slides on Canvas! |
11 | Mar. 21 | Spring Break | | |
11 | Mar. 23 | Spring Break | | |
12 | Mar. 28 | Class Canceled! | | |
12 | Mar. 30 | Paper Presentation (Part 1) | Lab 3A due on Apr. 1 | Slides on Canvas! |
13 | Apr. 4 | Paper Presentation (Part 2) | | Slides on Canvas! |
13 | Apr. 6 | Paper Presentation (Part 3) | Lab 3BC release on Apr. 8 | Slides on Canvas! |
14 | Apr. 11 | Paper Presentation (Part 4) | | Slides on Canvas! |
14 | Apr. 13 | Paper Presentation (Part 5) | | Slides on Canvas! |
15 | Apr. 18 | Project time | | |
15 | Apr. 20 | Project time | Mid-term report due on Apr. 20 | |
16 | Apr. 25 | Project time | | |
16 | Apr. 27 | Project time | Lab 3B due on May 1 | |
17 | May 2 | Project time | | |
17 | May 4 | Final Project Presentations (2:30 – 5:30 PM) | Final Report and Code due on May 7 | |
Course Policies
Attendance and Absence. Students are expected to attend all lectures and exams. If you have a documented emergency or a university-mandated reason, contact the instructor beforehand (preferred) or, at the latest, on the day of the exam.
Learning Accommodations. If needed, we will make classroom accommodations for students with disabilities. These accommodations should be arranged in advance and in accordance with the Office of Disability Services (http://www.adapts.gatech.edu).
Honor Code. Students are expected to abide by the Georgia Tech Academic Honor Code (http://www.policylibrary.gatech.edu/student-affairs/academic-honor-code). Honest and ethical behavior is always expected. All incidents of suspected dishonesty will be reported to and handled by the Office of Student Affairs. Students must complete all assignments individually unless explicitly told otherwise. Students may discuss with classmates but may not copy any solution (or any part of a solution).