State-of-the-art AI At Alibaba Is Powered By Xilinx

At XDF China in December 2019, Mr. Jiansong Zhang, Staff Engineer at Alibaba, gave a great talk about AI Platforms and Heterogeneous Computing.

Alibaba developed a deep learning stack on top of Xilinx FPGAs, including IP, shell, runtime, driver, compiler, and models, to enable various AI workloads.

This solution features:

- Optimized and customized hardware design
  - On-chip streaming structure
  - 3D systolic-array convolution engine that makes full use of the DSP carry chain and a supertile design at 600 MHz
  - Configurable dimensions of parallelism
- Resource allocation and task scheduling handled by a runtime
- Software-hardware co-optimization in the compiler
- Model parsers for ONNX and TensorFlow

Mr. Zhang also introduced four key use cases to demonstrate their results.

Case 1: OCR (Optical Character Recognition) in Public Cloud Services

Case 2: Edge solution for Smart Retail

Case 3: Private Cloud Service

Roughly 7x TCO savings achieved by replacing CPU servers

Case 4: Speech Synthesis

Speech synthesis is an iterative task in which 16,000 iterations are needed to generate one second of audio. NN-based TTS (text-to-speech) can be indistinguishable from human speech. Alibaba developed a Xilinx FPGA-based solution for real-time WaveNet, a state-of-the-art NN model for TTS. With a customized autoregressive low-latency IP in hardware and a customized on-chip loop implemented in the compiler, they achieved a 150x speed-up compared to a GPU implementation!
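
To see why latency matters so much here, the short Python sketch below works out the per-iteration time budget implied by the 16,000-iterations-per-second figure and shows the shape of an autoregressive loop. It is only an illustration, not Alibaba's implementation: it assumes one iteration produces one audio sample, and the names `generate`, `toy_model`, and `receptive_field` are hypothetical stand-ins for the real network and its history window.

```python
SAMPLE_RATE_HZ = 16_000  # from the talk: 16,000 iterations per second of audio


def per_sample_budget_us(sample_rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Time available per iteration for real-time synthesis, in microseconds."""
    return 1_000_000 / sample_rate_hz


def generate(num_samples, predict_next, receptive_field=4):
    """Toy autoregressive loop: every new sample depends on earlier outputs,
    so the iterations cannot be run in parallel across time."""
    samples = [0.0] * receptive_field  # assumption: zero-padded initial history
    for _ in range(num_samples):
        context = samples[-receptive_field:]
        samples.append(predict_next(context))
    return samples[receptive_field:]


def toy_model(context):
    # Hypothetical placeholder for the neural network's forward pass.
    return 0.5 * sum(context) / len(context) + 0.1


if __name__ == "__main__":
    print(f"budget per iteration: {per_sample_budget_us()} microseconds")
    audio = generate(SAMPLE_RATE_HZ, toy_model)
    print(f"generated {len(audio)} samples (one second of audio)")
```

Under these assumptions each iteration has to finish in 62.5 microseconds to keep up with real time. Because every step waits on the previous output, throughput-oriented batching helps little, which is consistent with the talk's emphasis on a low-latency IP and keeping the loop on chip.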