

Unlock near 3x performance gains with XGBoost and Amazon SageMaker Neo



When a model gets deployed to a production environment, inference speed matters. Models with fast inference speeds require fewer resources to run, which translates to cost savings, and applications that consume the models’ predictions benefit from the improved performance.

For example, let’s say your website uses a regression model to predict mortgage rates for aspiring home buyers, based on inputs they provide such as the size of the down payment, their loan term, and the county in which they’re looking to buy. A model that sends a prediction back in 10 milliseconds versus 200 milliseconds every time an input is updated makes a massive difference in the website’s responsiveness and user experience.

Amazon SageMaker Neo allows you to unlock such performance improvements and cost savings in a matter of minutes. It does this by compiling models into optimized executables through various open-source libraries, which can then be hosted on supported devices at the edge or on Amazon SageMaker endpoints. Neo is compatible with eight different machine learning (ML) frameworks, and in the context of gradient boosted tree algorithms such as XGBoost, Neo uses Treelite to optimize model artifacts. Because of XGBoost’s popularity and its unique categorization as a more classical ML framework, we use it as our framework of choice throughout this post. We demonstrate a near-3x speedup for the optimized XGBoost model compared to the unoptimized one, using a model trained on the Abalone dataset from UCI. Feel free to use your own model and dataset, however, and let us know in the comments what type of acceleration you achieve.

This post will take a deeper dive into compiling XGBoost model artifacts using Neo and will show you how to accurately measure and test the performance gains of these Neo-optimized models in general. By the end of this walkthrough, you’ll have your own framework for quickly training, deploying, and benchmarking XGBoost models. In turn, this can help you make data-driven decisions on what type of instance configurations best fit your unique cost profile and inference performance needs.
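To make the idea of benchmarking inference latency concrete before we get to the real measurements, here is a minimal local latency-benchmark harness in Python. The `predict` function is a hypothetical stand-in for an endpoint invocation; the actual numbers later in this post come from the CloudWatch ModelLatency metric of deployed endpoints, not from a snippet like this.

```python
import statistics
import time

def predict(payload):
    # hypothetical stand-in for invoking a model endpoint
    return sum(payload) / len(payload)

payload = [2, 0.675, 0.55, 0.175, 1.689, 0.694, 0.371, 0.474]
latencies = []
for _ in range(1000):
    start = time.perf_counter()
    predict(payload)
    latencies.append((time.perf_counter() - start) * 1e6)  # microseconds

# report median and 99th-percentile latency across the 1,000 calls
print(f'p50: {statistics.median(latencies):.1f} µs, '
      f'p99: {statistics.quantiles(latencies, n=100)[98]:.1f} µs')
```

The same percentile-style summary is what you want to look at when comparing endpoints, because tail latency (p99) often degrades before the average does.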

Solution overview

The following diagram visualizes the services we use for this solution and how they interact with one another.

The steps to implement the solution are as follows:

  1. Download and process the popular Abalone dataset with a Jupyter notebook, and then run an XGBoost SageMaker training job on the processed data. We use a local mode SageMaker training job to produce the unoptimized XGBoost model, which can be faster and easier to prototype compared to a remote one.
  2. Deploy the unoptimized XGBoost model artifact to a SageMaker endpoint.
  3. Take the unoptimized artifact and optimize it with a Neo compilation job.
  4. Deploy the Neo-optimized XGBoost artifact to a SageMaker endpoint.
  5. Create an Amazon CloudWatch Dashboard from the SageMaker notebook to monitor inference speed and performance under heavy load of both endpoints.
  6. Deploy Serverless Artillery from the SageMaker notebook, which we use as our load testing tool. We set up Serverless Artillery entirely from the SageMaker notebook, and it invokes the SageMaker endpoints directly from the internet through manually signed AWS Signature Version 4 requests, with no need for Amazon API Gateway as an intermediary.
  7. Perform load tests against both endpoints.
  8. Analyze the performance of both endpoints under load in the CloudWatch dashboard, and look at how the optimized endpoint outperforms the unoptimized one.


Before getting started, you must have administrator access to an AWS account, and complete the following steps:

  1. Create an AWS Identity and Access Management (IAM) role for SageMaker that has the AmazonSageMakerFullAccess managed policy attached along with an inline policy that contains additional required permissions.

The following screenshot is an example of a properly configured role called NeoBlog.

The AdditionalRequiredPermissionsForSageMaker inline policy contains the following JSON:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "cloudwatch:PutDashboard",
      "Resource": "arn:aws:cloudwatch::*:dashboard/NeoDemo"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:CreateBucket", "s3:GetBucketLocation", "s3:GetObject", "s3:ListBucket",
        "s3:PutObject", "s3:DeleteBucket", "s3:DeleteObject", "s3:DeleteObjectVersion",
        "s3:PutLifeCycleConfiguration", "s3:GetEncryptionConfiguration",
        "s3:PutEncryptionConfiguration", "s3:PutBucketPolicy", "s3:DeleteBucketPolicy",
        "s3:GetBucketPolicy", "s3:GetBucketPolicyStatus"
      ],
      "Resource": "arn:aws:s3:::serverless-artillery-*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "cloudformation:CreateStack", "cloudformation:UpdateStack",
        "cloudformation:DeleteStack", "cloudformation:DescribeStacks",
        "cloudformation:DescribeStackEvents", "cloudformation:DescribeStackResource",
        "cloudformation:DescribeStackResources", "cloudformation:ListStackResources"
      ],
      "Resource": "arn:aws:cloudformation:*:*:stack/serverless-artillery-*"
    },
    {
      "Effect": "Allow",
      "Action": ["cloudformation:ValidateTemplate"],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "iam:GetRole", "iam:CreateRole", "iam:DeleteRolePolicy",
        "iam:PutRolePolicy", "iam:DeleteRole", "iam:PassRole"
      ],
      "Resource": "arn:aws:iam::*:role/serverless-artillery-*"
    },
    {
      "Effect": "Allow",
      "Action": ["sns:CreateTopic", "sns:DeleteTopic", "sns:GetTopicAttributes"],
      "Resource": "arn:aws:sns:*:*:serverless-artillery-*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "lambda:UpdateFunctionCode", "lambda:ListVersionsByFunction",
        "lambda:PublishVersion", "lambda:InvokeFunction", "lambda:GetFunction",
        "lambda:CreateFunction", "lambda:DeleteFunction",
        "lambda:GetFunctionConfiguration", "lambda:AddPermission"
      ],
      "Resource": "arn:aws:lambda:*:*:function:serverless-artillery-*"
    },
    {
      "Effect": "Allow",
      "Action": ["logs:DescribeLogGroups", "logs:CreateLogGroup"],
      "Resource": "arn:aws:logs:*:*:log-group:serverless-artillery-*"
    },
    {
      "Effect": "Allow",
      "Action": ["logs:DeleteLogGroup", "lambda:RemovePermission"],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "events:DescribeRule", "events:PutRule", "events:DeleteRule",
        "events:PutTargets", "events:RemoveTargets"
      ],
      "Resource": "arn:aws:events:*:*:rule/serverless-artillery-*"
    }
  ]
}

Our next step is to create a SageMaker notebook instance.

  1. On the SageMaker console, under Notebooks, choose Notebook instances.
  2. Choose Create notebook instance.
  3. For Notebook instance name, enter NeoBlog.
  4. For Notebook instance type, choose your instance (for this post, the default ml.t2.medium should be enough).
  5. For IAM role, choose the NeoBlog role that you created.
  6. In the Git repositories section, select Clone a public Git repository to this notebook instance only.
  7. For Git repository URL, enter
  8. Choose Create notebook instance.
  9. After the notebook has reached a Running status, choose Open Jupyter to connect to your notebook instance.
  10. Navigate to the neo-blog repository in Jupyter and choose the NeoBlog.ipynb notebook to start it.

You’re now ready to walk through the remainder of this post and run the notebook’s contents.

Notebook walkthrough

The code snippets in this post match the code in the NeoBlog notebook. This post contains the most relevant commentary, and the notebook provides additional detail. When extra information is provided in the notebook, it’s called out accordingly. Let’s get started!

First, we must retrieve the Abalone dataset and split it into training and validation sets. We store the data in libsvm format.

  1. Run the following two cells in the Jupyter notebook:
from pathlib import Path
import boto3

for p in ['raw_data', 'training_data', 'validation_data']:
    Path(p).mkdir(exist_ok=True)

s3 = boto3.client('s3')
s3.download_file('sagemaker-sample-files', 'datasets/tabular/uci_abalone/abalone.libsvm', 'raw_data/abalone')

from sklearn.datasets import load_svmlight_file, dump_svmlight_file
from sklearn.model_selection import train_test_split

X, y = load_svmlight_file('raw_data/abalone')
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=1984, shuffle=True)

dump_svmlight_file(x_train, y_train, 'training_data/abalone.train')
dump_svmlight_file(x_test, y_test, 'validation_data/abalone.test')

Now that we have our data shuffled and prepared, we can train an unoptimized XGBoost model. Refer to the commentary in the Jupyter notebook for details related to the container framework version, hyperparameters, and training mode being used.

  1. Train the model by running the following code cell:
import sagemaker
from sagemaker.xgboost.estimator import XGBoost
from sagemaker.session import Session
from sagemaker.inputs import TrainingInput

bucket = Session().default_bucket()
role = sagemaker.get_execution_role()

# initialize hyperparameters
hyperparameters = {
    "max_depth": "5",
    "eta": "0.2",
    "gamma": "4",
    "min_child_weight": "6",
    "subsample": "0.7",
    "verbosity": "1",
    "objective": "reg:squarederror",
    "num_round": "10000"
}

# construct a SageMaker XGBoost estimator
# specify the entry_point to your xgboost training script
estimator = XGBoost(
    entry_point="",
    framework_version='1.2-1',  # 1.x MUST be used
    hyperparameters=hyperparameters,
    role=role,
    instance_count=1,
    instance_type='local',
    output_path=f's3://{bucket}/neo-demo'  # gets saved in bucket/neo-demo/job_name/model.tar.gz
)

# define the data type and paths to the training and validation datasets
content_type = "libsvm"
train_input = TrainingInput('file://training_data', content_type=content_type)
validation_input = TrainingInput('file://validation_data', content_type=content_type)

# execute the XGBoost training job{'train': train_input, 'validation': validation_input}, logs=['Training'])

When the local training job finishes running (it should only take a few minutes), the next step is to deploy the XGBoost model artifact to a SageMaker endpoint. The Jupyter notebook contains additional information related to why we use the c5 instance family, along with how the model artifact is saved in Amazon Simple Storage Service (Amazon S3).

  1. Deploy the model artifact by running the following cell:
from sagemaker.xgboost.model import XGBoostModel

# grab the model artifact that was written out by the local training job
s3_model_artifact = estimator.latest_training_job.describe()['ModelArtifacts']['S3ModelArtifacts']

# we have to switch from local mode to remote mode
xgboost_model = XGBoostModel(
    model_data=s3_model_artifact,
    role=role,
    entry_point="",
    framework_version='1.2-1',
)

unoptimized_endpoint_name = 'unoptimized-c5'
xgboost_model.deploy(
    initial_instance_count=1,
    instance_type='ml.c5.large',
    endpoint_name=unoptimized_endpoint_name
)

After the unoptimized model is deployed (the cell has stopped running), we run a Neo compilation job to optimize the model artifact. In the following code, we use the c5 instance type family, choose the XGBoost framework, and include an input shape vector. The input shape is unused by Neo, but the compilation job throws an error if no value is provided. The compilation job also uses the 1.2.1 version of XGBoost by default, which again is why we specified the 1.2-1 framework version during model training.

  1. Run the Neo compilation job with the following code:
job_name = s3_model_artifact.split("/")[-2]
neo_model = xgboost_model.compile(
    target_instance_family="ml_c5",
    role=role,
    input_shape=f'{{"data": [1, {X.shape[1]}]}}',
    output_path=f's3://{bucket}/neo-demo/{job_name}',  # gets saved in bucket/neo-demo/model-ml_c5.tar.gz
    framework="xgboost",
    job_name=job_name  # what it shows up as in the console
)

  1. When the cell stops running and the compilation job is complete, we deploy the Neo-optimized model to its own separate SageMaker endpoint:
optimized_endpoint_name = 'neo-optimized-c5'
neo_model.deploy(
    initial_instance_count=1,
    instance_type='ml.c5.large',
    endpoint_name=optimized_endpoint_name
)

  1. Next, we validate that the endpoints are functioning as expected. When you run the following code block, you should see numerical predictions returned from both endpoints.
import boto3

smr = boto3.client('sagemaker-runtime')

resp = smr.invoke_endpoint(
    EndpointName='neo-optimized-c5',
    Body=b'2,0.675,0.55,0.175,1.689,0.694,0.371,0.474',
    ContentType='text/csv'
)
print('neo-optimized model response: ', resp['Body'].read())

resp = smr.invoke_endpoint(
    EndpointName='unoptimized-c5',
    Body=b'2,0.675,0.55,0.175,1.689,0.694,0.371,0.474',
    ContentType='text/csv'
)
print('unoptimized model response: ', resp['Body'].read())

With both endpoints up and running, we can create the CloudWatch dashboard that we use to analyze endpoint performance. For this post, we monitor the metrics CPUUtilization, ModelLatency (which measures how long it takes for a model to return a prediction), and Invocations (which helps us monitor the progress of the load test against the endpoints).

  1. Run the following cell to create the dashboard:
import json

cw = boto3.client('cloudwatch')
dashboard_name = 'NeoDemo'
region = Session().boto_region_name  # get region we're currently in

body = {
    "widgets": [
        {
            "type": "metric",
            "x": 0, "y": 0, "width": 24, "height": 12,
            "properties": {
                "metrics": [
                    ["AWS/SageMaker", "Invocations", "EndpointName", optimized_endpoint_name, "VariantName", "AllTraffic", {"stat": "Sum", "yAxis": "left"}],
                    ["...", unoptimized_endpoint_name, ".", ".", {"stat": "Sum", "yAxis": "left"}],
                    [".", "ModelLatency", ".", ".", ".", "."],
                    ["...", optimized_endpoint_name, ".", "."],
                    ["/aws/sagemaker/Endpoints", "CPUUtilization", ".", ".", ".", ".", {"yAxis": "right"}],
                    ["...", unoptimized_endpoint_name, ".", ".", {"yAxis": "right"}]
                ],
                "view": "timeSeries",
                "stacked": False,
                "region": region,
                "stat": "Average",
                "period": 60,
                "title": "Performance Metrics",
                "start": "-PT1H",
                "end": "P0D"
            }
        }
    ]
}

cw.put_dashboard(DashboardName=dashboard_name, DashboardBody=json.dumps(body))
print('link to dashboard:')

After you run the cell, you can choose the output link to go to the dashboard, but you won’t see any meaningful data plotted just yet.

Now that the dashboard is created, we can proceed with setting up the Serverless Artillery CLI. To do this, we install Node.js, the Serverless Framework, and Serverless Artillery on our SageMaker notebook instance. The cell that installs Node.js can take a long time to run, which is normal.

  1. Run the following cells to install Node.js, the Serverless Framework, and Serverless Artillery:
%conda install -c conda-forge nodejs 

!npm install -g serverless@1.80.0 serverless-artillery@0.4.9

Next, we deploy Serverless Artillery. The code first changes directories into the directory that contains the code for our load generating AWS Lambda function. Then it installs the function’s dependencies and uses the Serverless Artillery CLI to package and deploy the load generating function into our account via the Serverless Framework. For more information on what Serverless Artillery is doing under the hood, refer to the Jupyter notebook.

We set up Serverless Artillery to hit our SageMaker endpoints directly with requests manually signed using the AWS Signature Version 4 algorithm. The benefit of this approach is that the load test measures the performance of the endpoints exclusively. If we front our endpoints with intermediary services like a Lambda-backed API Gateway, the load test results capture the performance characteristics of all three services together rather than just the SageMaker resources.

  1. Deploy Serverless Artillery with the following code:
!cd serverless_artillery && npm install && slsart deploy --stage dev

After running these cells, you should have Node.js version 12.4.0 or higher, Serverless Framework version 1.80.0, and Serverless Artillery version 0.4.9.

The next task is to create the load test definition, which we do by running two cells. The first cell defines a custom magic command, and the second cell creates the load test definition and saves it into script.yaml.

The test definition has six phases, each of which runs for 2 minutes. The first phase begins with an arrival rate of 20 users per second, meaning that approximately 10 requests are generated and sent to each endpoint every second for 2 minutes. The next three phases each scale up by an additional 20 users per second, and the last two phases scale up by 40. Each request contains 125 rows for inference. The documentation for Artillery (the tool that Serverless Artillery is based on) is a good resource for learning about the structure and additional features of load test definitions.
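The per-endpoint numbers in the test definition's comments can be sanity checked with a few lines of arithmetic. This assumes, as in the definition below, that each arriving virtual user runs one of the two scenarios, so traffic splits roughly evenly between the endpoints.

```python
# arrival rates (users per second) for the six 2-minute phases
arrival_rates = [20, 40, 60, 80, 120, 160]

for rate in arrival_rates:
    total_per_minute = rate * 60        # each user sends one request
    per_endpoint = total_per_minute // 2  # two scenarios, evenly weighted
    print(f'{rate:>3} users/s -> {total_per_minute} invocations/min total, '
          f'{per_endpoint} per endpoint')
```

For example, the first phase works out to 1,200 invocations per minute in total, or 600 per endpoint, matching the comments in the YAML.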

  1. Create the load test definition with the following code:
from IPython.core.magic import register_line_cell_magic

@register_line_cell_magic
def writefilewithvariables(line, cell):
    with open(line, 'w') as f:
        f.write(cell.format(**globals()))

# Get region that we're currently in
region = Session().boto_region_name

%%writefilewithvariables script.yaml
config:
  variables:
    unoptimizedEndpointName: {unoptimized_endpoint_name}  # the xgboost model has 10000 trees
    optimizedEndpointName: {optimized_endpoint_name}  # the xgboost model has 10000 trees
    numRowsInRequest: 125  # each request to the endpoint contains 125 rows
  target: 'https://runtime.sagemaker.{region}'
  phases:
    - duration: 120
      arrivalRate: 20  # 1200 total invocations per minute (600 per endpoint)
    - duration: 120
      arrivalRate: 40  # 2400 total invocations per minute (1200 per endpoint)
    - duration: 120
      arrivalRate: 60  # 3600 total invocations per minute (1800 per endpoint)
    - duration: 120
      arrivalRate: 80  # 4800 invocations per minute (2400 per endpoint... this is the max of the unoptimized endpoint)
    - duration: 120
      arrivalRate: 120  # only the neo endpoint can handle this load...
    - duration: 120
      arrivalRate: 160
  processor: './processor.js'
scenarios:
  - flow:
      - post:
          url: '/endpoints/{{{{ unoptimizedEndpointName }}}}/invocations'
          beforeRequest: 'setRequest'
  - flow:
      - post:
          url: '/endpoints/{{{{ optimizedEndpointName }}}}/invocations'
          beforeRequest: 'setRequest'

With the load test defined, we’re now ready to start it! Because there are six phases of 2 minutes each, the test runs for a total of 12 minutes. You can monitor the progression of the load test by choosing the link generated by running the second cell. The link redirects you to the CloudWatch dashboard that you created earlier.

  1. Perform the load test with the following code:
!slsart invoke --stage dev --path script.yaml

print("Here's the link to the dashboard again:")

Review the CloudWatch metrics

After 12 minutes have passed, refresh the dashboard and look at the metrics that have been captured.

The plotted data should look similar to the following screenshot, which has several interesting observations to unpack.

First of all, even at the very beginning of the load test, when both endpoints were handling only about 10 requests per second (RPS) each, the model latency of the Neo-optimized SageMaker endpoint was almost three times lower than that of the unoptimized endpoint. This shows the power of Neo: with one quick compilation job, we unlocked a nearly threefold performance improvement in our XGBoost model hosted on SageMaker!

Secondly, by the end of the load test, the ModelLatency metric of the unoptimized model spiked to almost 1.5 seconds per request. The unoptimized model’s CPUUtilization metric also reached 181%, close to the endpoint’s theoretical maximum of 200% given that the ml.c5.large instance type has 2 vCPUs. On the other hand, the optimized endpoint’s ModelLatency metric never crossed 10,000 microseconds, and its CPUUtilization metric stayed well below capacity at under 50%. This indicates that the Neo-optimized endpoint could handle considerably more load than the load test’s maximum of 80 requests per second per endpoint.

Looking at the following graph, we can also see that the unoptimized endpoint’s performance begins to drop off drastically around the 21:27 timestamp. To get a better idea of what’s going on, deselect the ModelLatency metric for the unoptimized endpoint (the green line) to get the graph in the subsequent image. The Invocations metrics confirm the story: up until the 21:27 mark, both endpoints were handling almost exactly the same number of requests from the load test (indicated by the blue and orange lines). Past the 21:27 mark, when the number of requests per second rises above 40, the unoptimized endpoint begins to struggle to keep up. This indicates that the maximum load the unoptimized endpoint can sustain is around 40 RPS.

The load test report generated by Serverless Artillery is also available by navigating to CloudWatch in the console, choosing Log groups under Logs, and searching for the log group that has serverless-artillery in its name. If you choose the log group and then choose the most recent log stream, you can see that the last entries comprise a report that looks similar to the following image. This report’s metrics aggregate the performance of both SageMaker endpoints, so in this case it’s not very useful on its own. One interesting detail, however, is that under the heavier arrival rates, the unoptimized endpoint started to return 400 status response codes, a sign that it was overwhelmed.

Clean up

With the load test completed and the results analyzed, all that’s left to do is to clean up the deployed resources by running the following two cells. The first cell deletes the two SageMaker endpoints (and their endpoint configurations) that were deployed, and the second cell destroys the Serverless Artillery resources.

# delete endpoints and endpoint configurations
sm = boto3.client('sagemaker')
for name in [unoptimized_endpoint_name, optimized_endpoint_name]:
    sm.delete_endpoint(EndpointName=name)
    sm.delete_endpoint_config(EndpointConfigName=name)

!slsart remove --stage dev

After you run the preceding cells, exit this notebook and stop or delete the notebook instance. To stop the notebook instance, on the SageMaker console, choose Notebook instances, select the NeoBlog notebook, and on the Actions menu, choose Stop.


Congratulations! You have successfully finished walking through this post. We were able to accomplish the following:

  • Optimize an XGBoost model artifact generated through a local training job with a Neo compilation job
  • Deploy both versions of the artifact to SageMaker endpoints
  • Deploy Serverless Artillery from our Jupyter notebook and configure the tool so that it directly invokes our SageMaker endpoints
  • Perform load tests against both endpoints with Serverless Artillery
  • Analyze our load test results and view how the Neo-optimized model outperforms the unoptimized model

The performance improvements gained through Neo can translate to significant cost savings. As a next step, you should look at your existing portfolio of models to evaluate them as potential candidates for optimization jobs. Creating Neo-optimized artifact versions allows you to achieve equivalent (if not better) performance metrics with less powerful resources, and it’s one of the easiest ways to save money on SageMaker endpoints.

Additionally, you can apply the load testing approach demonstrated in this post to any SageMaker endpoint. When used in tandem, Serverless Artillery and CloudWatch combine into a powerful framework for profiling the performance characteristics of your endpoints, which can then help you make data-driven decisions on what resource configurations best fit your needs. Simply deploy your models, update your load test definition, and start testing!

For more information about Neo, see Compile and Deploy Models with Neo. For other topics and services related to SageMaker, check out the AWS Machine Learning Blog.

About the Author

Adam Kozdrowicz is a Data and Machine Learning Engineer for AWS Professional Services. He specializes in bringing ML proof of concepts into production and automating the entire ML lifecycle. This includes data collection, data processing, model development and training, model deployments, and model monitoring. He also enjoys working with frameworks such as AWS Amplify, AWS SAM, and AWS CDK. During his free time, Adam likes to surf, travel, practice photography, and build machine learning models.



What Waabi’s launch means for the self-driving car industry




It is not the best of times for self-driving car startups. The past year has seen large tech companies acquire startups that were running out of cash and ride-hailing companies shutter costly self-driving car projects with no prospect of becoming production-ready anytime soon.

Yet, in the midst of this downturn, Waabi, a Toronto-based self-driving car startup, has just come out of stealth with a staggering $83.5 million in a Series A funding round led by Khosla Ventures, with additional participation from Uber, 8VC, Radical Ventures, OMERS Ventures, BDC, and Aurora Innovation. The company’s financial backers also include Geoffrey Hinton, Fei-Fei Li, Pieter Abbeel, and Sanja Fidler, artificial intelligence scientists with great influence in academia and the applied AI community.

What makes Waabi qualified for such support? According to the company’s press release, Waabi aims to solve the “scale” challenge of self-driving car research and “bring commercially viable self-driving technology to society.” Those are two key challenges of the self-driving car industry and are mentioned numerous times in the release.

What Waabi describes as its “next generation of self-driving technology” has yet to pass the test of time. But its execution plan provides hints at what directions the self-driving car industry could be headed.

Better machine learning algorithms and simulations

According to Waabi’s press release: “The traditional approach to engineering self-driving vehicles results in a software stack that does not take full advantage of the power of AI, and that requires complex and time-consuming manual tuning. This makes scaling costly and technically challenging, especially when it comes to solving for less frequent and more unpredictable driving scenarios.”

Leading self-driving car companies have driven their cars on real roads for millions of miles to train their deep learning models. Real-road training is costly both in terms of logistics and human resources. It is also fraught with legal challenges as the laws surrounding self-driving car tests vary in different jurisdictions. Yet despite all the training, self-driving car technology struggles to handle corner cases, rare situations that are not included in the training data. These mounting challenges speak to the limits of current self-driving car technology.

Here’s how Waabi claims to solve these challenges (emphasis mine): “The company’s breakthrough, AI-first approach, developed by a team of world leading technologists, leverages deep learning, probabilistic inference and complex optimization to create software that is end-to-end trainable, interpretable and capable of very complex reasoning. This, together with a revolutionary closed loop simulator that has an unprecedented level of fidelity, enables testing at scale of both common driving scenarios and safety-critical edge cases. This approach significantly reduces the need to drive testing miles in the real world and results in a safer, more affordable, solution.”

There’s a lot of jargon in there (a lot of which is probably marketing lingo) that needs to be clarified. I reached out to Waabi for more details and will update this post if I hear back from them.

By “AI-first approach,” I suppose they mean that they will put more emphasis on creating better machine learning models and less on complementary technology such as lidars, radars, and mapping data. The benefit of having a software-heavy stack is the very low costs of updating the technology. And there will be a lot of updating in the coming years as scientists continue to find ways to circumvent the limits of self-driving AI.

The combination of “deep learning, probabilistic reasoning, and complex optimization” is interesting, albeit not a breakthrough. Most deep learning systems use non-probabilistic inference. They provide an output, say a category or a predicted value, without giving the level of uncertainty on the result. Probabilistic deep learning, on the other hand, also provides the reliability of its inferences, which can be very useful in critical applications such as driving.
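As a toy illustration of this distinction (and not a description of Waabi's actual system), a simple way to attach uncertainty to a prediction is a deep ensemble: the mean across ensemble members gives the point estimate a non-probabilistic model would report, while the spread across members approximates the model's confidence. The numbers below are made up.

```python
import statistics

# hypothetical predictions from 10 ensemble members for the same input
ensemble_preds = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 10.4, 9.7, 10.0, 10.1]

point_estimate = statistics.mean(ensemble_preds)   # a point-estimate model stops here
uncertainty = statistics.stdev(ensemble_preds)     # the probabilistic view adds this

print(f'prediction: {point_estimate:.2f} ± {uncertainty:.2f}')
```

In a safety-critical setting such as driving, a wide spread can be used as a signal to fall back to a conservative behavior instead of trusting a single number.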

“End-to-end trainable” machine learning models require no manual-engineered features. This means once you have developed the architecture and determined the loss and optimization functions, all you need to do is provide the machine learning model with training examples. Most deep learning models are end-to-end trainable. Some of the more complicated architectures require a combination of hand-engineered features and knowledge along with trainable components.

Finally, “interpretability” and “reasoning” are two of the key challenges of deep learning. Deep neural networks are composed of millions and billions of parameters. This makes it hard to troubleshoot them when something goes wrong (or find problems before something bad happens), which can be a real challenge in critical scenarios such as driving cars. On the other hand, the lack of reasoning power and causal understanding makes it very difficult for deep learning models to handle situations they haven’t seen before.

According to TechCrunch’s coverage of Waabi’s launch, Raquel Urtasun, the company’s CEO, described the AI system the company uses as a “family of algorithms.”

“When combined, the developer can trace back the decision process of the AI system and incorporate prior knowledge so they don’t have to teach the AI system everything from scratch,” TechCrunch wrote.


Above: Simulation is an important component of training deep learning models for self-driving cars. (credit: CARLA)


The closed-loop simulation environment is a replacement for sending real cars on real roads. In an interview with The Verge, Urtasun said that Waabi can “test the entire system” in simulation. “We can train an entire system to learn in simulation, and we can produce the simulations with an incredible level of fidelity, such that we can really correlate what happens in simulation with what is happening in the real world.”

I’m a bit on the fence about the simulation component. Most self-driving car companies use simulations as part of the training regime of their deep learning models. But creating simulation environments that are exact replicas of the real world is virtually impossible, which is why self-driving car companies continue to rely on heavy road testing.

Waymo has at least 20 billion miles of simulated driving to go with its 20 million miles of real-road testing, which is a record in the industry. And I’m not sure how a startup with $83.5 million in funding can outmatch the talent, data, compute, and financial resources of a self-driving company with more than a decade of history and the backing of Alphabet, one of the wealthiest companies in the world.

More hints of the system can be found in the work that Urtasun, who is also a professor in the Department of Computer Science at the University of Toronto, does in academic research. Urtasun’s name appears on many papers about autonomous driving. But one in particular, uploaded on the arXiv preprint server in January, is interesting.

Titled “MP3: A Unified Model to Map, Perceive, Predict and Plan,” the paper discusses an approach to self-driving that is very close to the description in Waabi’s launch press release.

Above: MP3 is a deep learning model that uses probabilistic inference to create scene representations and perform motion planning for self-driving cars.

The researchers describe MP3 as “an end-to-end approach to mapless driving that is interpretable, does not incur any information loss, and reasons about uncertainty in the intermediate representations.” In the paper, the researchers also discuss the use of “probabilistic spatial layers to model the static and dynamic parts of the environment.”

MP3 is end-to-end trainable and uses lidar input to create scene representations, predict future states, and plan trajectories. The machine learning model obviates the need for finely detailed mapping data that companies like Waymo use in their self-driving vehicles.

Urtasun posted a video on her YouTube channel that provides a brief explanation of how MP3 works. It’s fascinating work, though many researchers will point out that it is not so much a breakthrough as a clever combination of existing techniques.

There’s also a sizeable gap between academic AI research and applied AI. It remains to be seen if MP3 or a variation of it is the model that Waabi is using and how it will perform in practical settings.

A more conservative approach to commercialization

Waabi’s first application will not be passenger cars that you can order with your Lyft or Uber app.

“The team will initially focus on deploying Waabi’s software in logistics, specifically long-haul trucking, an industry where self-driving technology stands to make the biggest and swiftest impact due to a chronic driver shortage and pervasive safety issues,” Waabi’s press release states.

What the release doesn’t mention, however, is that highway settings are an easier problem to solve because they are much more predictable than urban areas. This makes them less prone to edge cases (such as a pedestrian running in front of the car) and easier to simulate. Self-driving trucks can transport cargo between cities, while human drivers take care of delivery inside cities.

With Lyft and Uber failing to launch their own robo-taxi services, and with Waymo still far from turning Waymo One, its fully driverless ride-hailing service, into a scalable and profitable business, Waabi’s approach seems well thought out.

With more complex applications still being beyond reach, we can expect self-driving technology to make inroads into more specialized settings such as trucking and industrial complexes and factories.

Waabi also doesn’t make any mention of a timeline in the press release. This also seems to reflect the failures of the self-driving car industry in the past few years. Top executives of automotive and self-driving car companies have constantly made bold statements and given deadlines about the delivery of fully driverless technology. None of those deadlines have been met.

Whether Waabi becomes independently successful or ends up joining the acquisition portfolio of one of the tech giants, its plan seems to be a reality check on the self-driving car industry. The industry needs companies that can develop and test new technologies without much fanfare, embrace change as they learn from their mistakes, make incremental improvements, and save their cash for a long race.

Ben Dickson is a software engineer and the founder of TechTalks. He writes about technology, business, and politics.

This story originally appeared on TechTalks. Copyright 2021


VentureBeat’s mission is to be a digital town square for technical decision-makers to gain knowledge about transformative technology and transact. Our site delivers essential information on data technologies and strategies to guide you as you lead your organizations. We invite you to become a member of our community, to access:

  • up-to-date information on the subjects of interest to you
  • our newsletters
  • gated thought-leader content and discounted access to our prized events, such as Transform 2021: Learn More
  • networking features, and more

Become a member



10 steps to educate your company on AI fairness



Elevate your enterprise data technology and strategy at Transform 2021.

As companies increasingly apply artificial intelligence, they must address concerns about trust.

Here are 10 practical interventions for companies to employ to ensure AI fairness. They include creating an AI fairness charter and implementing training and testing.

Data-driven technologies and artificial intelligence (AI) are powering our world today, from predicting where the next COVID-19 variant will arise to helping us travel on the most efficient route. In many domains, the general public has a high degree of trust that the algorithms powering these experiences are being developed fairly.

However, this trust can be easily broken. For example, consider recruiting software that, due to unrepresentative training data, penalizes applications that contain the word “women,” or a credit-scoring system that misses real-world evidence of creditworthiness, with the result that certain groups get lower credit limits or are denied loans.

The reality is that the technology is moving faster than education and training on AI fairness. The people who train, develop, implement, and market these data-driven experiences are often unaware of the second- or third-order implications of their hard work.

As part of the World Economic Forum’s Global Future Council on Artificial Intelligence for Humanity, a collective of AI practitioners, researchers and corporate advisors, we propose 10 practical interventions for companies to employ to ensure AI fairness.

1. Assign responsibility for AI education

Assign a chief AI ethics officer (CAIO) who, along with a cross-functional ethics board (including representatives from data science, regulatory, public relations, communications, and HR), should be responsible for designing and implementing AI education activities. The CAIO should also be the “ombudsman” for staff to reach out to in case of fairness concerns, as well as a spokesperson to non-technical staff. Ideally, this role should report directly to the CEO for visibility and implementation.

2. Define fairness for your organization

Develop an AI fairness charter template and then ask all departments that are actively using AI to complete it in their context. This is particularly relevant for business line managers and product and service owners.

3. Ensure AI fairness along the supply chain

Require suppliers whose procured products and services have AI built in (for instance, a recruiting agency that might use AI for candidate screening) to also complete an AI fairness charter and to adhere to company policies on AI fairness. This is particularly relevant for the procurement function and for suppliers.

4. Educate staff and stakeholders through training and a “learn by doing” approach

Require mandatory training and certification for all employees on AI fairness principles – similar to how staff are required to sign up to codes of business conduct. For technical staff, provide training on how to build models that do not violate fairness principles. All trainings should leverage the insights from the AI fairness charters to directly address issues facing the company. Ensure the course content is regularly reviewed by the ethics board.

5. Create an HR AI fairness people plan

An HR AI fairness plan should include a yearly review by HR to assess the diversity of the team working on data-driven technologies and AI, and an explicit review and upgrade of the competencies and skills that are currently advertised for key AI-relevant product development roles (such as product owner, data scientist and data engineer) to ensure awareness of fairness is part of the job description.

6. Test AI fairness before any tech launches

Require departments and suppliers to run and internally publish fairness outcomes tests before any AI algorithm is allowed to go live. Once you know what groups may be unfairly treated due to data bias, simulate users from that group and monitor the results. This can be used by product teams to iterate and improve their product or service before it goes live. Open source tools, such as Microsoft Fairlearn, can help provide the analysis for a fairness outcome test.
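A fairness outcomes test of the kind described above often starts with a simple question: does the model select (approve, shortlist, recommend) members of different groups at very different rates? Fairlearn exposes this as `demographic_parity_difference`; the hand-rolled sketch below illustrates the same idea on hypothetical simulated users, as a rough illustration rather than a complete audit.

```python
# Minimal sketch of one fairness outcomes metric: compare the rate of
# positive predictions (e.g. "approve loan") across groups. Fairlearn's
# demographic_parity_difference computes the same quantity; this plain
# Python version just illustrates the idea. All data is hypothetical.

def selection_rate(predictions):
    """Fraction of positive (1) predictions."""
    return sum(predictions) / len(predictions)

def demographic_parity_difference(predictions, groups):
    """Largest gap in selection rate between any two groups."""
    by_group = {}
    for pred, group in zip(predictions, groups):
        by_group.setdefault(group, []).append(pred)
    rates = [selection_rate(p) for p in by_group.values()]
    return max(rates) - min(rates)

# Simulated users from two groups, as step 6 suggests.
preds  = [1, 1, 0, 1, 0, 1, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

gap = demographic_parity_difference(preds, groups)
print(f"demographic parity difference: {gap:.2f}")  # 0.75 - 0.25 = 0.50
```

A large gap does not by itself prove unfairness, but it flags exactly the kind of result a product team should investigate and iterate on before going live.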

7. Communicate your approach to AI fairness

Set up fairness outcomes learning sessions with customer- and public-facing staff to go through the fairness outcomes tests for any new or updated product or service. This is particularly relevant for marketing and external communications, as well as customer service teams.

8. Dedicate a standing item in board meetings to AI fairness processes

This discussion should include reporting on progress and adherence, themes raised by the chief AI ethics officer and ethics board, and the results of high-priority fairness outcomes tests.

9. Make sure the education sticks

Regularly track and report participation and completion of the AI fairness activities, along with the demonstrated impact of managing fairness in terms of real business value. Provide these updates to department and line managers to communicate to staff to reinforce that by making AI platforms and software more fair, the organization is more effective and productive.

10. Document everything

Document your approach to AI fairness and communicate it in staff and supplier trainings and high-profile events, including for customers and investors.

[This story originally appeared on the World Economic Forum. Copyright 2021.]

Nadjia Yousif is Managing Director and Partner at Boston Consulting Group and co-leads the Financial Institutions practice for the UK, the Netherlands, and Belgium.

Mark Minevich is Chair for Artificial Intelligence Policy at the International Research Centre on Artificial Intelligence under the auspices of UNESCO, Jozef Stefan Institute.




The rise of robotaxis in China



AutoX, Momenta and WeRide took the stage at TC Sessions: Mobility 2021 to discuss the state of robotaxi startups in China and their relationships with local governments in the country.

They also talked about overseas expansion — a common trajectory for China’s top autonomous vehicle startups — and shed light on the challenges and opportunities for foreign AV companies eyeing the massive Chinese market.

Enterprising governments

Worldwide, regulations play a major role in the development of autonomous vehicles. In China, policymaking for autonomous driving is driven from the bottom up rather than by a top-down effort from the central government, observed executives from the three Chinese robotaxi startups.

Huan Sun, Europe general manager at Momenta, which is backed by the government of Suzhou, a city near Shanghai, said her company had a “very good experience” working with the municipal governments across multiple cities.

“In China, each local government is incentivized to really act like entrepreneurs like us. They are very progressive in developing the local economy… What we feel is that autonomous driving technology can greatly improve and upgrade the [local governments’] economic structure.” (Time stamp: 02:56)

Shenzhen, a special economic zone with considerable lawmaking autonomy, is just as progressive in propelling autonomous driving forward, said Jewel Li, chief operating officer at AutoX, which is based in the southern city.



Can we afford AI?




Of all the concerns surrounding artificial intelligence these days — and no, I don’t mean evil robot overlords, but more mundane things like job replacement and security — perhaps none is more overlooked than cost.

This is understandable, considering AI has the potential to lower the cost of doing business in so many ways. But AI is not only expensive to acquire and deploy, it also requires a substantial amount of compute power, storage, and energy to produce worthwhile returns.

Back in 2019, AI pioneer Elliot Turner estimated that training the XLNet natural language system could cost upwards of $245,000, the equivalent of roughly 512 TPUs running at full capacity for 60 straight hours. And there is no guarantee it will produce usable results. Even a seemingly simple task like training an intelligent machine to solve a Rubik’s Cube could consume about 2.8 gigawatt-hours of energy, roughly the hourly output of three nuclear power plants. These are serious, although still debatable, numbers, considering that some estimates claim technology processes will draw more than half of our global energy output by 2030.
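The cited $245,000 figure is a straightforward back-of-the-envelope calculation. The sketch below reproduces it; the roughly $8 per TPU-hour rate is an assumption chosen to match the quoted total, since real cloud pricing varies by hardware generation, region, and commitment level.

```python
# Back-of-the-envelope reproduction of the XLNet training-cost estimate:
# devices x hours x hourly rate. The per-TPU-hour rate is assumed, not
# an official cloud price.

tpus = 512                 # TPUs running at full capacity
hours = 60                 # straight hours of training
rate_per_tpu_hour = 8.00   # assumed USD per TPU-hour

tpu_hours = tpus * hours
cost = tpu_hours * rate_per_tpu_hour
print(f"{tpu_hours:,} TPU-hours, estimated cost: ${cost:,.0f}")  # 30,720 TPU-hours, $245,760
```

The point of the arithmetic is less the exact total than the scale: a single training run consumes tens of thousands of accelerator-hours, which is why efficiency work like the chips and models discussed below attracts so much investment.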

Silicon solution

Perhaps no one understands this better than IBM, which has been at the forefront of the AI evolution, with varying degrees of success, thanks to platforms like Watson and Project Debater. The company’s Albany, New York-based research lab has an AI Hardware Center that might be on the verge of unveiling some intriguing results in the drive to reduce the computational demands of training AI and guiding its decision-making processes, according to Tirias Research analyst Kevin Krewell.

A key development is a quad-core test chip recently unveiled at the International Solid-State Circuits Conference (ISSCC). The chip features a hybrid 8-bit floating-point format for training functions and both 2- and 4-bit integer formats for inference, Krewell wrote in a Forbes piece. This would be a significant improvement over the 32-bit floating-point solutions that power current AI solutions, but only if the right software can be developed to produce the same or better results under these lower logic and memory footprints. So far, IBM has been silent on how it intends to do this, although the company has announced that its DEEPTOOLS compiler, which supports AI model development and training, is compatible with the 7nm die.

Qualcomm is also interested in driving greater efficiency in AI models, with a particular focus on Neural Architecture Search (NAS), the means by which intelligent machines map the most efficient network topologies to accomplish a given task. But since Qualcomm’s chips generally have a low power footprint to begin with, its focus is on developing new, more efficient models that work comfortably within existing architectures, even at scale.

All for one

To that end, the company says it has adopted a holistic approach to modeling that stresses the need to shrink multiple axes, such as quantization, compression, and compilation, in a coordinated fashion. Since all of these techniques complement one another, researchers must address the efficiency challenge from each angle without letting a change in one area disrupt gains in another.

When applied to NAS, the key challenges are reducing high compute costs, improving scalability, and delivering more accurate hardware performance metrics. Qualcomm’s solution, called DONNA (Distilling Optimal Neural Network Architectures), provides a highly scalable means to define network architectures around accuracy, latency, and other requirements and then deploy them in real-world environments. The company is already reporting a 20% speed boost over MobileNetV2 in locating highly accurate architectures on a Samsung S21 smartphone.

Facebook also has a strong interest in fostering greater efficiency in AI. The company recently unveiled a new algorithm called Seer (SElf-supERvised) that reduces the amount of labeling required to make effective use of datasets. The process allows AI to draw accurate conclusions using a smaller set of comparative data. In this way, it can identify, say, a picture of a cat without having to comb through thousands of existing pictures that have already been labeled as cats. This reduces the number of human hours required in training, as well as the overall data footprint required for identification, all of which speeds up the process and lowers overall costs.

Speed, efficiency, and reduced resource consumption have been driving factors in IT for decades, so it’s no surprise that these goals are starting to drive AI development as well. What is surprising is the speed at which this is happening. Traditionally, new technologies are deployed first, leaving things like costs and efficiency as afterthoughts.

It’s a sign of the times that AI is already adopting streamlined architectures and operations as core capabilities before it hits a critical level of scale. Even the most well-heeled companies recognize that the computational requirements of AI are likely to be far greater than anything they’ve encountered before.

