ML Suite v1.4 has been released. This update brings a number of upgrades and new features, and it begins the Cloud-to-Edge unification process: ML Suite now uses Decent_q quantization and deprecates support for the xfDNN quantizer. Version 1.4 also brings back Docker support, facilitating easier proof-of-concept evaluations in the data center, and improves the runtime APIs.
ML Suite v1.4 also deprecates support for xDNNv2 and now supports only xDNNv3 overlays.
Platform Support
ML Suite v1.4 initially supports the following platforms:
- AWS
- Nimbix
- Alveo U200
- Alveo U250
- VCU1525
Coming soon:
- Alveo U280
Introducing Decent_q Support
ML Suite previously used the xfDNN quantizer. This Python-based quantizer used a recalibration quantization strategy; while it was fast at quantizing a 32-bit model to int8, it in some cases produced int8 models with more than 1-2% accuracy loss relative to the base model.
Decent_q has primarily been used for the DPU (part of the Edge AI Platform) and embedded use cases, but it is now available within ML Suite, targeting the xDNN overlay. It delivers lower accuracy loss relative to 32-bit models. The table below shows some common models and the accuracy difference between the original model and the quantized int8 model.
| Networks | Float32 Top1 | Float32 Top5 | Int8 Top1 | ΔTop1 | Int8 Top5 | ΔTop5 |
| --- | --- | --- | --- | --- | --- | --- |
| Inception_v1 | 66.90% | 87.68% | 66.62% | -0.28% | 87.58% | -0.10% |
| Inception_v2 | 72.78% | 91.04% | 72.40% | -0.38% | 90.82% | -0.23% |
| Inception_v3 | 77.01% | 93.29% | 76.56% | -0.45% | 93.00% | -0.29% |
| Inception_v4 | 79.74% | 94.80% | 79.42% | -0.32% | 94.64% | -0.16% |
| ResNet-50 | 74.76% | 92.09% | 74.59% | -0.17% | 91.95% | -0.14% |
| VGG16 | 70.97% | 89.85% | 70.77% | -0.20% | 89.76% | -0.09% |
| Inception-ResNet-v2 | 79.95% | 95.13% | 79.45% | -0.51% | 94.97% | -0.16% |
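Decent_q's internal calibration strategy is not detailed in this post, but the core idea behind int8 quantization can be sketched in a few lines. This is a minimal, hypothetical example using symmetric max-value calibration, not the actual Decent_q implementation:

```python
import numpy as np

def quantize_int8(weights):
    """Symmetric post-training quantization: map float32 weights to int8
    using a single scale derived from the maximum absolute value."""
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float32 values from the int8 representation."""
    return q.astype(np.float32) * scale

# A toy float32 tensor stands in for a real convolution kernel.
w = np.random.randn(64, 3, 3, 3).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

# Rounding error is bounded by half a quantization step per element.
max_err = np.abs(w - w_hat).max()
print(max_err <= scale / 2 + 1e-6)  # True
```

Real quantizers refine the scale using activation statistics gathered from calibration data rather than a single maximum, which is one reason the accuracy deltas in the table above remain well under one percent.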
ML Suite and xfDNN have been updated to natively support Decent_q and do not require extra steps to quantize and deploy a model. ML Suite v1.4 supports Decent_q only for Caffe models; TensorFlow support will come in the next release.
Enhanced Model Support
ML Suite traditionally included only basic reference models to help users run through a few examples and Jupyter notebook tutorials. Version 1.4 adds a new set of application-specific models that bring new AI/ML users closer to their end applications and showcase AI/ML capabilities in those applications. The newly enabled models are shown below.
| Application | Function | Algorithm |
| --- | --- | --- |
| Face | Face detection | SSD, DenseBox |
| | Landmark Localization | Coordinates Regression |
| | Face recognition | ResNet + Triplet / A-softmax Loss |
| | Face attributes recognition | Classification and regression |
| Pedestrian | Pedestrian Detection (Crowd Volume) | SSD |
| | Pose Estimation | Coordinates Regression |
| | Person Re-identification | ResNet + Loss Fusion |
| Video Analytics | Object detection | SSD, RefineDet |
| | Pedestrian Attributes Recognition | GoogleNet |
| | Car Attributes Recognition | GoogleNet |
| | Car Logo Detection | DenseBox |
| | Car Logo Recognition | GoogleNet + Loss Fusion |
| | License Plate Detection | Modified DenseBox |
| | License Plate Recognition | GoogleNet + Multi-task Learning |
Along with these new application models, reference models for ResNet-50, Inception v1/v3/v4, SSD, and YOLOv2 are also included.
xfDNN Runtime Enhancements – Support for pycaffe
The xfDNN runtime now supports pycaffe. This addition makes it easier to deploy custom models containing layers that are not fully supported in xDNN. With the addition of xfDNN subgraph support, xfDNN now automatically splits the graph, separating the xDNN and CPU partitions; the layers that run on the CPU are executed through pycaffe's APIs. When the compiler is run, its outputs are fed into the xfDNN subgraph tool, which parses the network and creates subgraphs wherever unsupported layers are encountered so that those layers can be run through pycaffe on the CPU. This new feature makes it easier to deploy more networks and avoids manually executing CPU layers such as Softmax or FC.
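The partitioning step can be illustrated with a short sketch. The layer names and the supported-layer set below are made up for illustration; this is not the actual xfDNN subgraph tool API:

```python
# Hypothetical set of layer types that the FPGA overlay can execute.
XDNN_SUPPORTED = {"Convolution", "ReLU", "Pooling", "Eltwise", "Concat"}

def partition(layers):
    """Greedily group consecutive layers into subgraphs by execution
    target: 'xdnn' for supported layers, 'cpu' for everything else."""
    subgraphs = []
    for name, layer_type in layers:
        target = "xdnn" if layer_type in XDNN_SUPPORTED else "cpu"
        if subgraphs and subgraphs[-1][0] == target:
            subgraphs[-1][1].append(name)   # extend the current subgraph
        else:
            subgraphs.append((target, [name]))  # start a new subgraph
    return subgraphs

# Toy network: conv/relu/pool run on the FPGA, FC and Softmax fall to CPU.
net = [("conv1", "Convolution"), ("relu1", "ReLU"),
       ("pool1", "Pooling"), ("fc1000", "InnerProduct"),
       ("prob", "Softmax")]
print(partition(net))
# [('xdnn', ['conv1', 'relu1', 'pool1']), ('cpu', ['fc1000', 'prob'])]
```

The real tool works on the compiler's output rather than a flat layer list, but the effect is the same: contiguous runs of supported layers stay on the FPGA, and everything else is handed to pycaffe.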
In the code, this flow looks as simple as:

```
Quantize(prototxt, caffemodel)
Compile()
Cut(prototxt)
Infer()
```
This flow also includes flags to run only on the CPU, allowing FPGA/CPU accuracy and speed comparisons for benchmarking and evaluation, and it supports both single-image and streaming-image inference.
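As an illustration of that kind of FPGA/CPU comparison, here is a hedged sketch of a benchmarking harness. The two models below are NumPy stand-ins, not real xfDNN calls, and the harness itself is hypothetical:

```python
import time
import numpy as np

# Stand-in "models": a CPU baseline and an accelerated path whose
# outputs differ by a tiny numerical perturbation.
W = np.random.randn(224, 10)

def cpu_model(batch):
    return batch @ W

def fpga_model(batch):
    return cpu_model(batch) + 1e-6 * np.random.randn(len(batch), 10)

def compare(model_a, model_b, batches):
    """Run both models on the same batches; report top-1 agreement
    and accumulated wall-clock time per model."""
    agree, total = 0, 0
    times = {"a": 0.0, "b": 0.0}
    for batch in batches:
        t0 = time.perf_counter()
        pa = model_a(batch)
        t1 = time.perf_counter()
        pb = model_b(batch)
        t2 = time.perf_counter()
        times["a"] += t1 - t0
        times["b"] += t2 - t1
        agree += int(np.sum(pa.argmax(axis=1) == pb.argmax(axis=1)))
        total += len(batch)
    return agree / total, times

batches = [np.random.randn(8, 224) for _ in range(4)]
top1_agreement, times = compare(cpu_model, fpga_model, batches)
```

Swapping the stand-ins for the real CPU-only and FPGA inference paths would give the accuracy/speed comparison the CPU-only flags are meant to enable.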
Get started with ML Suite 1.4 at the GitHub Page
Source: https://forums.xilinx.com/t5/AI-and-Machine-Learning-Blog/Ml-Suite-v1-4-Released/ba-p/978398