#### **Flavor Physics Workshop Neckarzimmern**

## Moving from HEP to Industry with FPGAs

Christian Färber
Neckarzimmern - Germany
15 March 2023

Copyright © 2023 Intel Corporation.

This document is intended for personal use only.

Unauthorized distribution, modification, public performance, public display, or copying of this material via any medium is strictly prohibited.



Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.

## Agenda

## Moving from HEP to Industry with FPGAs

- Career Path
  - Heidelberg University
  - Vector Informatik
  - CERN
  - Thales
  - Intel
- What I found useful
  - Interviews
  - General Thoughts
  - Paths of FPGA Colleagues
- Summary
- FPGA Introduction
- FPGA compute acceleration

#### Field Programmable Gate Array









Source:

https://www.intel.de/content/www/de/de/products/details/fpga/development-kits/stratix/10-dx.html

## Section: Career Path

- Heidelberg University
- Vector Informatik
- CERN
- Thales
- Intel

#### 2006 - 2009



## Heidelberg University

## Internship, Diploma

- Investigating Outer Tracker ageing
- Development of an automatic ageing scanner
- Adding oxygen to the counting gas and investigating ozone creation to reduce ageing

#### Skills:

- Undertaking irradiation tests in the lab
- Implementing a lab readout chain as well as the lab experiment control
- Analyzing data with C++ and ROOT









https://www.physi.uni-heidelberg.de/Publications/Diplomarbeit.pdf



## Heidelberg University

#### PhD

- Prototyped an FPGA-based readout board for a potential OT upgrade
  - FPGA-based time-to-digital converter ($t_{res} < 1$ ns)
  - FPGA irradiation tests at MPIK and HIT
  - Collaboration with engineers from the PI workshop
- Support of Outer Tracker commissioning
- Outer Tracker gas monitoring
- Skills:
  - Electronics design with FPGAs, optical high-speed transceivers, VHDL, C++, analysis of experiments, presenting at conferences, ...
  - Developed the stamina to work my way into complex new fields, a high frustration tolerance, and the mindset to get things really working
  - HEP is great preparation for joining both academia and industry!





https://www.physi.uni-heidelberg.de/Publications/Dissertation Christian Faerber.pdf



## Section: Career Path

- Heidelberg University
- Vector Informatik
- CERN
- Thales
- Intel

## Vector Informatik

## Hardware Development Engineer

- Vector develops software tools and components for networking electronic systems based on serial bus systems in the automotive sector
- Project: FPGA development of a high-speed interface and a memory synchronization block for the VX1000 hardware, used for vehicle ECU calibration
- Team of 5 FW engineers
  - Alongside the software and PCB teams, 20 people in total
- Programming guide for VHDL & C++
- Industry timelines
- Working time
- Training necessary for CAN, ...













Serial / HSSLx



Source: https://www.vector.com/us/en/

### Vector Informatik

## How I got this job

- Job portals / social networks
  - Stepstone
  - For large companies, check their career portals!
  - LinkedIn (in the past also Xing)
- Job fair (Job-Messe) at Heidelberg University
- Created a professional resume: 2 pages only!
- Applied for several positions at several companies
- Shared and discussed my experience with my FPGA network
  - Dry-run interviews with people from the field







## Section: Career Path

- Heidelberg University
- Vector Informatik
- CERN
- Thales
- Intel



## CERN - Openlab

### Senior Applied Fellow

- HTCC CERN Openlab collaboration with Intel
- Feasibility study on using FPGA-based compute acceleration in the HPC sector of High-Energy Physics
  - RICH photon reconstruction
  - Calorimeter raw data encoding
  - CNN inference study with MNIST
- Highlight: prototypes of the Intel® Xeon®+FPGA hybrid server processor (HARP: Intel Hardware Accelerator Research Program)
- Got to know the Intel development team in Hillsboro very well







# CERN

## CERN - Openlab

#### RICH PID

- Calculate the Cherenkov angle $\Theta_c$ for each track $t$ and detection point $D$: not a typical FPGA algorithm
- RICH PID is not run for every event because the processing time is too long!
- 748-clock-cycle-long pipeline written in Verilog
  - Additional blocks developed: cube root, complex square root, rotation matrix, cross/scalar product, ...
- A lengthy task in Verilog, including all test benches
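The last geometric step of this reconstruction can be sketched in a few lines of C++. This is a simplified illustration, not the Verilog pipeline from the talk: it assumes the photon's emission direction has already been reconstructed (in the real algorithm, solving for the reflection point on the spherical mirror is where blocks such as the cube root and complex square root come in) and only evaluates $\cos\Theta_c = \hat{p}\cdot\hat{t}$. The names `Vec3` and `cherenkov_angle` are hypothetical.

```cpp
#include <cmath>

// Hypothetical minimal 3-vector type for this sketch.
struct Vec3 {
    double x, y, z;
};

double dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

double norm(const Vec3& a) { return std::sqrt(dot(a, a)); }

// Angle between the track direction and the (already reconstructed) photon
// direction: cos(theta_c) = (p . t) / (|p| |t|). Both vectors must be non-zero.
double cherenkov_angle(const Vec3& track_dir, const Vec3& photon_dir) {
    double c = dot(track_dir, photon_dir) / (norm(track_dir) * norm(photon_dir));
    // Clamp rounding errors so acos stays defined.
    if (c > 1.0) c = 1.0;
    if (c < -1.0) c = -1.0;
    return std::acos(c);
}
```

On an FPGA, each of these operations (dot products, square root, arccosine) becomes a stage of a deep pipeline rather than a sequential function call.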



Reference: LHCb Note LHCb-98-040



# CERN

## CERN - Openlab

#### RICH PID

- Speedup of up to a factor of 35 with Intel® Xeon®+FPGA
  - Theoretical limit of the photon pipeline: a factor of 64 with respect to a single Intel® Xeon® thread; for Arria™ 10, a factor of ~300
  - Bottleneck: data transfer bandwidth to the FPGA; caching can improve this, tests ongoing
- Energy efficiency improvement
- Later implemented with OpenCL on standard PCIe acceleration cards
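Why the link bandwidth, and not the 748-cycle pipeline depth, limits the speedup can be seen with a back-of-the-envelope model. Assuming a fully pipelined design that accepts one photon per clock cycle (an initiation interval of 1, an assumption of this sketch), `n` photons take `depth + n - 1` cycles, so the depth only adds latency. The sustained rate is then capped by the slower of the clock and the host link. The function names and all numbers in the usage note are illustrative placeholders, not measured values from the study.

```cpp
#include <algorithm>
#include <cstdint>

// Cycles for a fully pipelined design (one new input per cycle) to process
// n items: the first result appears after `depth` cycles, then one per cycle.
std::uint64_t pipeline_cycles(std::uint64_t depth, std::uint64_t n) {
    return n == 0 ? 0 : depth + n - 1;
}

// Sustained items per second: limited either by the clock (one item per
// cycle) or by how fast the link can deliver each item's input bytes.
double sustained_rate(double clock_hz, double link_bytes_per_s, double bytes_per_item) {
    return std::min(clock_hz, link_bytes_per_s / bytes_per_item);
}

// True when the link, not the pipeline, is the bottleneck.
bool is_bandwidth_bound(double clock_hz, double link_bytes_per_s, double bytes_per_item) {
    return link_bytes_per_s / bytes_per_item < clock_hz;
}
```

For example, with a hypothetical 200 MHz clock and 32 bytes of input per photon, the link would have to deliver 6.4 GB/s just to keep the pipeline busy; anything less leaves the pipeline idle part of the time.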

## Runtime comparison for Cherenkov angle reconstruction: Intel® Xeon® CPU vs. Intel® Xeon®+FPGA



https://indico.cern.ch/event/669298/attachments/1551772/2433879/CERN Computing Seminar Faerber 20171031.pdf



## Section: Career Path

- Heidelberg University
- Vector Informatik
- CERN
- Thales
- Intel

## Thales - MLS

## Senior FPGA Design Engineer

- Thales Group is a French multinational company that designs, develops and manufactures electrical systems for the aerospace, defence, transportation and security sectors.
- Algorithm development and sensor optimization for applications in the rail industry
  - Received training in **CENELEC EN 50126, ...**
- Fiber Bragg Grating (FBG) based axle counting system
  - Simulation of sensor geometries
  - Rail field tests
  - Regular presentation of project status to Thales MLS management
  - Collaboration with the product group and business unit, supporting early customers
- Worked in a team of 6 FPGA design engineers in a department of 100
- Reported to a direct manager and a project manager







https://en.wikipedia.org/wiki/Fiber Bragg grating

## Thales - MLS

## **Axle Counting**

- Why axle counting?
- Conventional axle counting used magnetic fields
- New idea: measure the bending of the rail with FBGs
- Replace expensive copper with cheaper fiber
- No power needed at the sensor location, which can be O(km) away
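The sensing principle can be summarized with the standard fiber Bragg grating relations (textbook physics, not the specific sensor model of the Thales system). The grating reflects a narrow band around the Bragg wavelength $\lambda_B$, which shifts under axial strain $\varepsilon$ or a temperature change $\Delta T$. Here $n_\mathrm{eff}$ is the effective refractive index of the fiber core, $\Lambda$ the grating period, $p_e$ the effective photo-elastic coefficient (about 0.22 for silica fiber), $\alpha$ the thermal expansion coefficient, and $\xi$ the thermo-optic coefficient.

$$
\lambda_B = 2\, n_\mathrm{eff}\, \Lambda,
\qquad
\frac{\Delta\lambda_B}{\lambda_B} \approx (1 - p_e)\,\varepsilon + (\alpha + \xi)\,\Delta T
$$

At $\lambda_B \approx 1550$ nm this gives roughly 1.2 pm per microstrain, so the small bending of the rail under a passing axle shows up as a measurable wavelength shift. Because temperature also shifts $\lambda_B$, the thermal term has to be separated from the strain signal.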



https://www.thalesgroup.com/en/countries/europe/germany/transportation/rail-field-equipment

## Thales - MLS

## **Axle Counting**



FiCoS wavelength shift with axle position: currentGeometry



Working at Thales is a great opportunity to influence modern transportation!



Great to see your algorithm working on the FPGA in the readout electronics during field tests with real trains
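Turning such a wavelength-shift trace into axle counts can be sketched as threshold detection with hysteresis. An axle is counted when the shift rises above an upper threshold, and the detector re-arms only after the shift falls below a lower threshold, so noise around a single threshold is not double-counted. This is a hypothetical illustration: the function name `count_axles`, the thresholds, and the units are assumptions, not the algorithm deployed at Thales.

```cpp
#include <vector>

// Counts axles in a sampled wavelength-shift trace (e.g. in picometres).
// Hypothetical sketch: an axle is counted on each rising crossing of
// `upper`; the detector re-arms only after the signal falls below `lower`.
int count_axles(const std::vector<double>& shift, double upper, double lower) {
    int axles = 0;
    bool armed = true;  // true while waiting for the next axle
    for (double s : shift) {
        if (armed && s > upper) {
            ++axles;        // rising edge of a new axle
            armed = false;
        } else if (!armed && s < lower) {
            armed = true;   // signal has settled, ready for the next axle
        }
    }
    return axles;
}
```

With `upper == lower` the hysteresis disappears, and a noisy dip in the middle of one wheel passage is counted as an extra axle; on an FPGA the same state machine costs only a comparator pair and one register.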

## Section: Career Path

- Heidelberg University
- Vector Informatik
- CERN
- Thales
- Intel



## Intel - PSG Senior Field Application Engineer

- General: technical customer support, with the freedom to choose which topics I work on, with the goal of reaching a design win
- How I got this job: through my network
- No onboarding training; on-site at a customer on my second day
- Porting algorithms to FPGAs with the goal of increasing performance and reducing the energy usage as well as the cost of future data centers
- Algorithms for large data center customers
  - Data compression
  - File parsing at line speed
  - Data analytics
- Not typical but possible:
  - Initiated Intel's participation in <u>EU H2020 Daphne</u>
  - Writing research papers and supervising students
  - Presenting research results at conferences, e.g. <u>HiPEAC2023</u>



#### Source:

https://www.intel.de/content/www/de/de/products/details/fpga/platforms/pac/d5005.html





https://dl.acm.org/doi/pdf/10.1145/3533737.3535094



https://www.cidrdb.org/cidr2022/papers/p4-damme.pdf



## Intel - PSG

## Senior Field Application Engineer

- The FAE is responsible for the technical success of customer projects: customer-first mentality
  - On-site customer training in RTL and Intel® oneAPI
  - Realizing proofs of concept with customers
  - Support of productization
- New feature development with Intel engineering
- Marketing support
- Finding solutions for many different customers from different fields
- Working in a team of ~10 FAEs



https://salesvideos.intel.com/detail/video/6234855095001/intel%C2%AE-fpga-acceleration-of-data-compression



https://www.intel.com.br/content/dam/www/central-libraries/us/en/documents/2022-09/sap-open-fpga-stack-white-paper.pdf

## Section: What I found useful

- Interviews
- General Thoughts
- Paths of FPGA Colleagues

## Interviews I/II

## What to expect

- Headhunter
  - High level; normally wants a CV and answers to 3 questions:
    - Are you in contact with other companies already?
    - What salary do you expect? (check portals and talk with others)
    - When can you start?
- HR
  - Be confident
  - Rare: some will challenge you, e.g. "Why should we hire you and not the others?"
  - Often: some want to understand what they need to do to get you to join
- First meeting with the hiring manager: job and team description, check of your skills and expectations





## Interviews II/II

## What to expect

- Coding interview
  - Live programming; prepare, e.g. with <u>coderbyte</u>
  - Short project, e.g. 10 days to finish
- Technical interview, often with future team members (in the past face-to-face)
  - Prepare to talk about successful projects and to answer small technical questions; no test
  - Be proud of what you did
- Final clarification with the hiring manager
- HR for the final negotiation



Ask questions!

## Section: What I found useful

- Interviews
- General Thoughts
- Paths of FPGA Colleagues

## General Thoughts

## During my career path so far

- Get a mentor, and mentor someone yourself
- Create a large network in and around your field
- Create an elevator pitch
- Be proactive, take responsibility and speak up: you own your career!
- Changing jobs and companies is normal
- In IT, remote work is even more remote than you may think



https://www.td.org/talent-development-glossary-terms/what-is-mentoring

## Section: What I found useful

- Interviews
- General Thoughts
- Paths of FPGA Colleagues

## Paths of FPGA Colleagues

## Many interesting paths

- Colleagues who did their PhD with me and also worked with FPGAs are now:
  - Group leader in engineering
    - Aerospace, developing FPGA-based control systems
  - Freelancer
    - Transportation
  - ASIC designer
    - Commercial sensor chips
- They had similar experiences with the job market and applications



## Summary

### Moving from HEP to Industry with FPGAs

- Working in HEP built a large set of skills for being successful in industry
- With hardware & FPGA know-how you can select the job you want: great future prospects!
- FPGAs are used in a large variety of electronics, from edge to cloud
  - The FPGA field is growing fast
- Intel® oneAPI enables software engineers to use FPGAs without writing RTL
- FPGAs are also used as compute accelerators and infrastructure processing units (IPUs) in data centers and the cloud





Source: https://www.bittware.com/intel



## Further questions?

Contact:

christian.faerber@intel.com

Or

https://www.linkedin.com/in/dr-christian-faerber-b7211590/


## Section: FPGA Introduction

- FPGA Architecture
- Use Cases

### **FPGA Architecture**

## **Base Components**

- ALMs (Adaptive Logic Modules, up to 4M)
  - Configurable building block that combines look-up tables and flip-flops to implement logic functions and register-based storage
- DSPs (up to 12k)
  - Specialized hardware blocks with high-speed arithmetic and signal processing capabilities, used to implement complex digital signal processing algorithms
- RAM blocks (up to 30 MB), I/O, PLLs
- Programming: HDL (Hardware Description Language)
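The ALM description above can be made concrete with a small behavioural model: a k-input LUT is simply a 2^k-entry truth table indexed by its inputs, and the flip-flop captures the LUT output on each clock edge. This C++ sketch is a hypothetical simplification (a real Intel ALM contains a fracturable LUT, adders, and several registers), and the class name `LutFF` is illustrative.

```cpp
#include <cstdint>
#include <vector>

// Behavioural model of a k-input look-up table feeding a D flip-flop.
class LutFF {
public:
    // Bit i of truth_table holds the LUT output for input pattern i (k <= 6).
    LutFF(unsigned k, std::uint64_t truth_table) : k_(k), table_(truth_table) {}

    // Combinational output; inputs[0] is the least significant select bit.
    // `inputs` must contain at least k values.
    bool lut(const std::vector<bool>& inputs) const {
        unsigned index = 0;
        for (unsigned i = 0; i < k_; ++i) {
            if (inputs[i]) index |= 1u << i;
        }
        return (table_ >> index) & 1u;
    }

    // Rising clock edge: the register captures the current LUT output.
    void clock(const std::vector<bool>& inputs) { q_ = lut(inputs); }

    // Registered output, stable between clock edges.
    bool q() const { return q_; }

private:
    unsigned k_;
    std::uint64_t table_;
    bool q_ = false;
};
```

For instance, `LutFF x(2, 0x6)` implements a registered 2-input XOR: "programming" the FPGA amounts to loading truth tables like `0x6` into thousands of such cells and wiring them together.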



### FPGA Architecture

## **Advanced Components**

- HPS (Hard Processor System)
- Floating Point and ML DSPs
- Hyperflex registers
- Hardened IPs (e.g. DDR4, PCIe)
- High speed Transceivers
- High Bandwidth Memory
- NoC (Network on Chip)
- Chiplet tiles (PCIe 5.0, CXL, network, ...)



https://www.intel.de/content/www/de/de/products/docs/programmable/agilex-7-fpga-m-series-memory-bandwidth-wp.html

## Section: FPGA Introduction

- Architecture
- Use Cases

## **Use Cases**

#### FPGA base use cases

- Digital Signal Processing, Embedded Systems and Prototyping
- Aerospace, Medical, and Automotive
- Networking and Communications
- Industrial Automation and Control
- High-Performance Computing
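As an example of the digital signal processing use case, a finite impulse response (FIR) filter reduces to a chain of multiply-accumulate (MAC) operations, the arithmetic that FPGA DSP blocks are built for. In hardware, each tap would typically map to one DSP block with all taps working in parallel on every clock cycle. The C++ below is only a sequential reference model of that computation, the kind often used to check an HDL or oneAPI implementation; the function name is illustrative.

```cpp
#include <cstddef>
#include <vector>

// Direct-form FIR filter: y[n] = sum_k h[k] * x[n - k], with x[m] = 0 for m < 0.
// Each inner-loop step is one multiply-accumulate (MAC).
std::vector<double> fir(const std::vector<double>& x, const std::vector<double>& h) {
    std::vector<double> y(x.size(), 0.0);
    for (std::size_t n = 0; n < x.size(); ++n) {
        double acc = 0.0;
        for (std::size_t k = 0; k < h.size() && k <= n; ++k) {
            acc += h[k] * x[n - k];  // one MAC per tap
        }
        y[n] = acc;
    }
    return y;
}
```

Feeding a unit impulse returns the coefficients `h` themselves, which is a quick sanity check for any hardware version of the filter.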





















## New Demands Driving Customization Needs

- Segments: Embedded/Edge, Communications Infrastructure, Cloud/Enterprise
- Demands: Real-Time Actionable Intelligence; High-Bandwidth Aggregation & Processing; Managing, Organizing & Processing the Explosion of Data
- Customization needs: **Customized Connectivity**, **Low Latency Compute**, Maximum Data Throughput with Network Acceleration, Customized Acceleration of Diverse Workloads

## Section: FPGA compute acceleration

- Hardware
- Intel® oneAPI
- Intel® DevCloud
- FPGAs in HPC
- FPGAaaS

## FPGA-Supported Technologies

#### **FPGA Advantages**

- UPI and now CXL
  - Cache-coherent, high-bandwidth, low-latency interface
  - Memory extension with DDR4/5
- High Bandwidth Memory
  - In-package 32 GB HBM2e, up to 1 TB/s
- IKL: FPGA-to-FPGA connection
- Network interfaces
- Special AI DSP blocks in the Stratix 10 NX (S10NX)
  - https://www.intel.com/content/www/us/en/products/programmable/stratix-10-nx-fpga-vs-gpu-aiconference-paper.html

#### Intel® S10DX



https://www.intel.de/content/www/de/de/products/details/fpga/development-kits/stratix/10-dx.html

#### BittWare: 520NX



# Intel® Agilex™ FPGA Cards with Intel® oneAPI

BittWare: IA-840f

- PCIe full-size, dual-width card with Intel® Agilex™ FPGA: AGF027
- PCIe Gen4 x16
- 128GB DDR4 (4x banks)
- 3x QSFP-DD (200G); 2x MCIO x8 expansion
- BittWare BMC + SDK

BittWare: IA-860m

- PCIe dual-width card with Intel® Agilex™ FPGA: AGM039
- PCIe Gen5 x16 + CXL
- 32GB HBM2e
- 32GB DDR5 (2x banks)
- 3x QSFP-DD (400G)
- BittWare BMC + SDK



# Section: FPGA compute acceleration

- Hardware
- Intel® oneAPI
- Intel® DevCloud
- FPGAs in HPC
- FPGAaaS

# Programming Challenges

#### Multiple Architectures

- Separate programming models and toolchains for each architecture.
  - Required **training** and **licensing** for compiler, IDE, debugger, analytics/monitoring tools, deployment tools, etc. per architecture.
  - Debugging, monitoring, and maintaining cross-architecture source code is challenging.
  - Difficult integration across proprietary IPs and architectures, and no code reuse.
- Software development complexity limits freedom of architectural choice.
  - Isolated investments are required to build the technical expertise to overcome the barrier to entry.



#### Intel® oneAPI Product





**Available Now** 

## DPC++: Three Scopes

- DPC++ programs consist of 3 scopes:
  - Application scope: normal host code
  - Command group scope: submitting data and commands for the accelerator
  - Kernel scope: code executed on the accelerator
- The full capabilities of C++ are available at application and command group scope
- At kernel scope, there are limitations on the accepted C++
  - Most importantly: no recursive code
  - See the SYCL specification for the complete list

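```cpp
#include <sycl/sycl.hpp>
#include <sycl/ext/intel/fpga_extensions.hpp>
using namespace sycl;

constexpr int kSize = 1024;  // vector length N
class VectorAdd;             // kernel name

void dpcpp_code(int* a, int* b, int* c) {
  // ---- Application scope ----
  // Set up an FPGA device selector and a DPC++ device queue
  // (older oneAPI releases: INTEL::fpga_selector selector; queue q(selector);)
  queue q(ext::intel::fpga_selector_v);
  {
    // Set up buffers for input and output vectors
    buffer buf_a(a, range<1>(kSize));
    buffer buf_b(b, range<1>(kSize));
    buffer buf_c(c, range<1>(kSize));

    // Submit command group function object to the queue
    q.submit([&](handler &h) {
      // ---- Command group scope ----
      // Create device accessors to buffers
      accessor a(buf_a, h, read_only);
      accessor b(buf_b, h, read_only);
      accessor c(buf_c, h, write_only);
      // Dispatch the kernel
      h.single_task<VectorAdd>([=]() {
        // ---- Kernel scope ----
        for (int i = 0; i < kSize; i++) {
          c[i] = a[i] + b[i];
        }
      });
    });
  }  // buffer destruction waits for the kernel and copies results back to c
}
```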

## Getting Started with oneAPI on an FPGA



**Note**: Developers using custom platforms should write their own BSP with IOFS or obtain a BSP from their 3<sup>rd</sup>-party platform vendor.

# Section: FPGA compute acceleration

- Hardware
- Intel® oneAPI
- Intel® DevCloud
- FPGAs in HPC
- FPGAaaS

#### Intel® DevCloud

- Sign up here:
  - https://software.intel.com/devcloud
  - Accounts are valid for 120 days
  - Nodes with cards installed are in the group fpga\_runtime
  - Nodes with extra memory for full FPGA compiles are in the group fpga\_compile
  - Intel® oneAPI environment already set up



### Use the Intel® DevCloud

- SSH to the gateway
- Or Jupyter Notebooks via the browser
- Job queue for compile and run
  - Job output into a log file
- Access to a single node is also possible
- Session time limit: default 6 h, max 24 h

Many samples and tutorials are available in the Git repo:

https://github.com/oneapi-src/oneAPI-samples/tree/master/DirectProgramming/DPC%2B%2BFPGA





#### Intel® DevCloud – Available FPGA Hardware

What are you trying to use the DevCloud for?





1. Arria™ 10 PAC: RTL AFU, OpenCL
2. Arria™ 10: oneAPI, OpenVINO
3. Stratix™ 10: RTL AFU, OpenCL
4. Stratix™ 10: oneAPI
5. Emulation
6. Compilation (bitstream creation)

Soon: Intel® Agilex™ cards with Intel® oneAPI







# Section: FPGA compute acceleration

- Hardware
- Intel® oneAPI
- Intel® DevCloud
- FPGAs in HPC
- FPGAaaS

#### FPGAs in HPC

#### Examples

- Reasons: Acceleration, Perf/W, Perf/V
- Use cases:
  - Scientific calculations
    - Genomics
    - HEP trigger
  - Data compression
  - Image processing
  - Database acceleration
  - File parsing
  - Financial
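To give a feel for the data compression use case, here is binary (bit) packing, a lightweight integer compression scheme common in column stores: every value in a block is stored with just enough bits for the block's largest value. This plain C++ sketch is for illustration only, not the FPGA design from any of the referenced works, and the function names are hypothetical. Its fixed-width shifting and masking is what lets such schemes map well onto a streaming FPGA pipeline.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Bits needed for the largest value in the block (at least 1).
unsigned bit_width_for(const std::vector<std::uint32_t>& values) {
    std::uint32_t max_v = 0;
    for (auto v : values) max_v |= v;  // OR has the same highest set bit as the max
    unsigned bits = 1;
    while (bits < 32 && (max_v >> bits) != 0) ++bits;
    return bits;
}

// Packs each value into `bits` bits (1..32), back to back, in 64-bit words.
std::vector<std::uint64_t> pack(const std::vector<std::uint32_t>& values, unsigned bits) {
    std::vector<std::uint64_t> out((values.size() * bits + 63) / 64, 0);
    for (std::size_t i = 0; i < values.size(); ++i) {
        std::size_t pos = i * bits, word = pos / 64, offset = pos % 64;
        out[word] |= static_cast<std::uint64_t>(values[i]) << offset;
        if (offset + bits > 64) {  // value straddles two words
            out[word + 1] |= static_cast<std::uint64_t>(values[i]) >> (64 - offset);
        }
    }
    return out;
}

// Inverse of pack(): extracts `count` values of `bits` bits each.
std::vector<std::uint32_t> unpack(const std::vector<std::uint64_t>& packed, unsigned bits,
                                  std::size_t count) {
    std::vector<std::uint32_t> out(count);
    const std::uint64_t mask = (1ULL << bits) - 1;
    for (std::size_t i = 0; i < count; ++i) {
        std::size_t pos = i * bits, word = pos / 64, offset = pos % 64;
        std::uint64_t v = packed[word] >> offset;
        if (offset + bits > 64) {
            v |= packed[word + 1] << (64 - offset);
        }
        out[i] = static_cast<std::uint32_t>(v & mask);
    }
    return out;
}
```

Eleven values below 8 need only 3 bits each, so they fit into a single 64-bit word instead of 44 bytes of 32-bit integers.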

#### Publication examples (first pages shown on the slide)

- *Particle Identification on an FPGA Accelerated Compute Platform for the LHCb Upgrade*
- *Accelerating Re-Pair Compression using FPGAs* (Robert Lasch et al.)
- *FPGA-Accelerated Compression of Integer Vectors* (Mahmoud Mohsen et al., 2020, DOI 10.1145/3399666.3399932)
- *PipeJSON: Parsing JSON at Line Speed on FPGAs* (Jonas Dann et al., 2022)
- *Resource-Efficient Database Query Processing on FPGAs* (Mehdi Moghaddamfar et al., DaMoN'21, DOI 10.1145/3465998.3466006)

# Section: FPGA compute acceleration

- Hardware
- Intel® oneAPI
- Intel® DevCloud
- FPGAs in HPC
- FPGAaaS

#### **FPGAaaS**

#### Datacenter and cloud

- FPGA accelerator functionality can be offered as a microservice, enabling application developers to easily leverage many microservice characteristics:
  - Auto-deployment
  - Scalability, dynamic configuration
- Data compression as an example where FPGAs are competitive with CPUs
- Sharing an FPGA among multiple instances is important for the business case and enables even more use cases later
- IOFS enables virtualization; Docker containers are available for deployment and easy scaling; an OS with the DFL driver is necessary, e.g. Garden Linux
- Availability of Intel® FPGAs

#### **CIDR 2023**



https://www.cidrdb.org/cidr2023/papers/p6may.pdf



https://www.intel.com.br/content/dam/www/central-libraries/us/en/documents/2022-09/sap-open-fpga-stack-white-paper.pdf

# Section: Use cases

## Sub-Topics:

FPGAs and Edge Computing

# FPGAs and Edge Computing

#### Examples

- Reduce data transfer by computing and deciding at the edge
- Benefits of FPGAs:
  - Good fit for customized, high-performance edge systems
  - Easily scalable performance, low latency, low power, thermal stability
- Applications:
  - Internet of Things (IoT)
  - Mobile and 5G
  - Industrial automation
  - Video analytics
  - Autonomous vehicles
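The "compute and decide at the edge" idea can be illustrated by reducing a window of raw sensor samples to a compact summary and forwarding raw data only when something interesting happens. This C++ sketch is hypothetical: the `WindowSummary` layout and the alarm threshold are assumptions, and on an FPGA the same reduction would run as a streaming pipeline next to the sensor interface.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Compact per-window summary sent upstream instead of every raw sample.
struct WindowSummary {
    float min, max, mean;
    bool alarm;  // true if the raw samples should also be forwarded
};

// Summarizes one non-empty window; raises an alarm if any sample exceeds `limit`.
WindowSummary summarize(const std::vector<float>& window, float limit) {
    float lo = window[0], hi = window[0];
    double sum = 0.0;
    for (float s : window) {
        lo = std::min(lo, s);
        hi = std::max(hi, s);
        sum += s;
    }
    float mean = static_cast<float>(sum / window.size());
    return {lo, hi, mean, hi > limit};
}
```

For a window of 1024 four-byte samples, sending one summary (typically 16 bytes on common 64-bit ABIs) instead of 4096 bytes of raw data cuts the upstream traffic by more than two orders of magnitude, unless the alarm fires.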



https://www.usenix.org/system/files/conference/hotedge18/hotedge18-papers-biookaghazadeh.pdf



https://www.intel.com/content/www/us/en/products/docs/programmable/cloud-connectivity-solution-brief.html

# Section: FPGA Hardware

- Intel® Open FPGA Stack
- CXL Interface
- FPGA based IPUs

# Intel® OFS – Enabling Scale and Deployment

Intel OFS is a **software and hardware infrastructure** providing an efficient approach to develop a custom FPGA-based platform or workload using an Intel, 3<sup>rd</sup> party, or custom board.

- Scalable, source-accessible hardware and software framework delivered through **Git repositories**
- Reduce development time with modular and composable source code used as-is or easily customized
- Upstreamed Linux kernel drivers are being adopted by leading OS and orchestration vendors
- Growing ecosystem of Intel OFS-enabled boards, workloads, and OS distributions



## Intel® OFS Deliverables

#### Hardware

- Acceleration Functional Unit (AFU) Region for Workload Development with Sample AFUs
- FPGA Interface Manager (FIM)
- Board Management Controller (BMC)
- HLD enablement

#### Software

- Upstreamed, open-source kernel drivers
- OPAE libraries, tools and APIs
- Example Applications

#### **Verification Environment**

- UVM verification environment provided through Git repositories




# Section: FPGA Hardware

- Intel® Open FPGA Stack
- CXL Interface
- FPGA based IPUs

## CXL Configuration Types - Type 1, 2, 3





## CXL-BASED ACCELERATORS – ELEVATED TO 1<sup>ST</sup> CLASS PRIORITY

Intel Public


## Intel® Agilex™ FPGAs – The 1st FPGA with CXL Hard IP

- Industry-leading FPGA interconnect performance
  - Hard IP for CXL and PCIe 5.0 (x16 lanes)
- Supports CXL across multiple CPU / chipset suppliers
  - CXL v1.1 (now) and v2.0 (future IP release, same silicon)
  - Support for Type 1 and Type 2 Accelerators
  - Support for Type 3 memory expansion
    - Additional algorithm acceleration options that may not be available from 3<sup>rd</sup>-party ASSPs/ASICs
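For reference, the three CXL device types differ in which of the three CXL protocols they use. All of them use CXL.io; Type 1 accelerators (without host-managed device memory) add CXL.cache, Type 2 accelerators with their own memory add both CXL.cache and CXL.mem, and Type 3 memory expanders add CXL.mem only. The C++ snippet encodes that mapping as a small lookup; the enum and function names are illustrative and not part of any Intel or CXL software API.

```cpp
#include <string>

enum class CxlType { Type1, Type2, Type3 };

struct CxlProtocols {
    bool io, cache, mem;
};

// Protocols used by each CXL device type, following the CXL specification.
constexpr CxlProtocols protocols(CxlType t) {
    switch (t) {
        case CxlType::Type1: return {true, true, false};  // caching accelerator, e.g. SmartNIC
        case CxlType::Type2: return {true, true, true};   // accelerator with device memory
        case CxlType::Type3: return {true, false, true};  // memory expansion
    }
    return {false, false, false};  // unreachable for valid enum values
}

// Human-readable protocol list, e.g. "CXL.io + CXL.mem".
std::string describe(CxlType t) {
    CxlProtocols p = protocols(t);
    std::string s = "CXL.io";
    if (p.cache) s += " + CXL.cache";
    if (p.mem) s += " + CXL.mem";
    return s;
}
```

The Type 3 row is the memory-expansion case listed above; Type 1 and Type 2 are the accelerator configurations.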




<sup>1.</sup> Intel estimates based on Intel Agilex FPGA with CXL hard/soft IP bandwidth per port vs. competitors FPGAs using 3<sup>rd</sup> party soft IP CXL controller.

<sup>2.</sup> Based on PCI-SIG integrator list results for PCIe 5.0 compliance performance. Intel Agilex R-Tile Gen 5 x16 @ 32 GT/s vs. Xilinx Versal Premium ACAP CPM5 Gen 5 x16 @ 16 GT/s.

# Section: FPGA Hardware

- Intel® Open FPGA Stack
- CXL Interface
- FPGA based IPUs

#### FPGA based IPUs

| Platform                      | Features                                                                                                                                               | Target Acceleration Workloads                                                                |
|-------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------|
| Intel® IPU Platform C5000X-PL | <ul> <li>2 x 25 GbE connectivity</li> <li>Intel® Stratix™ 10 DX FPGA</li> <li>Intel® Xeon® D-1612 Processor</li> <li>32GB DRAM</li> <li>RTL</li> </ul> | <ul> <li>Packet processing</li> <li>Open vSwitch</li> </ul>                                  |
| Intel® IPU Platform F2000X-PL | <ul> <li>2 x 100 GbE connectivity</li> <li>Intel® Agilex™-F FPGA</li> <li>Intel® Xeon® D-1736 Processor</li> <li>32GB DRAM</li> <li>RTL/P4</li> </ul>  | <ul><li>Packet processing</li><li>Open vSwitch</li><li>NVMe-oF/RoCEv2</li><li>RDMA</li></ul> |

#### Intel® IPU Platform C5000X-PL



#### Intel® IPU Platform F2000X-PL



# A coach talks to you, a mentor talks with you, and a sponsor talks about you

(roles may overlap)







|                              | Coach                                                                                                              | Mentor                                                                                                 | Sponsor                                                                                                                  |
|------------------------------|--------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------|
| Role                         | Provides guidance for your development, often focused on soft skills (e.g., active listening) rather than technical skills | Informally or formally helps you navigate your career, providing guidance for career choices and decisions | A senior leader or other person who uses strong influence to help you obtain high-visibility assignments, promotions, or jobs |
| Who drives the relationship? | You and your coach: you can reach out to your coach when you need help, but your coach can also reach out to you   | You drive the relationship; your mentor is reactive and responsive to your needs                       | The sponsor drives the relationship, advocating for you in many settings, including behind closed doors                 |
| Actions                      | Provide development feedback outside the formal performance evaluation process                                     | Help you determine possible career paths to meet specific career goals                                 | Advocate for your advancement and champion your work and potential with other senior leaders                             |

Based on catalyst.org guidance

@addyosmani