

Reducing training time with Apache MXNet and Horovod on Amazon SageMaker




Amazon SageMaker is a fully managed service that provides every developer and data scientist with the ability to build, train, and deploy machine learning (ML) models quickly. Amazon SageMaker removes the heavy lifting from each step of the ML process to make it easier to develop high-quality models. As datasets continue to increase in size, additional compute is required to reduce the amount of time it takes to train. One method to scale horizontally and add these additional resources on Amazon SageMaker is through the use of Horovod and Apache MXNet. In this post, we show how you can reduce training time with MXNet and Horovod on Amazon SageMaker. We also demonstrate how to further improve performance with advanced sections on Horovod autotuning, Horovod Timeline, Tensor Fusion, and MXNet optimization.

Distributed training

Distributed training of neural networks for computer vision (CV) and natural language processing (NLP) applications has become ubiquitous. With Apache MXNet, you only need to modify a few lines of code to enable distributed training.

Distributed training allows you to reduce training time by scaling horizontally. The goal is to split training tasks into independent subtasks and run these across multiple devices. There are primarily two approaches for training in parallel:

  • Data parallelism – You distribute the data and share the model across multiple compute resources.
  • Model parallelism – You distribute the model and share transformed data across multiple compute resources.

In this post, we focus on data parallelism. Specifically, we discuss how Horovod and MXNet allow you to train efficiently on Amazon SageMaker.
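The following minimal pure-Python sketch illustrates why data parallelism works. It is a toy, not MXNet or Horovod code: the one-parameter linear model and the helper names `grad_mse` and `data_parallel_grad` are hypothetical. When the loss is a mean over the batch and every worker gets an equally sized shard, averaging the workers' local gradients reproduces the full-batch gradient exactly.

```python
# Toy model: prediction = w * x, loss = mean((w * x - y) ** 2)
def grad_mse(w, xs, ys):
    """Gradient of the mean squared error with respect to w on one batch."""
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def data_parallel_grad(w, xs, ys, num_workers):
    """Give each simulated worker an equal shard, then average the local gradients.

    The final averaging step is what an allreduce performs across real workers.
    """
    assert len(xs) % num_workers == 0, "this sketch assumes equal-sized shards"
    shard = len(xs) // num_workers
    local_grads = [
        grad_mse(w, xs[i * shard:(i + 1) * shard], ys[i * shard:(i + 1) * shard])
        for i in range(num_workers)
    ]
    return sum(local_grads) / num_workers


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w = 0.5
print(grad_mse(w, xs, ys))                           # full batch on one device
print(data_parallel_grad(w, xs, ys, num_workers=2))  # two workers, averaged
```

Both calls print -22.5. With unequal shards, you would need to weight each local gradient by its shard size to keep this equivalence.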

Horovod overview

Horovod is an open-source distributed deep learning framework. It uses efficient inter-GPU and inter-node communication methods such as the NVIDIA Collective Communications Library (NCCL) and Message Passing Interface (MPI) to distribute and aggregate model parameters between workers. Horovod makes distributed deep learning fast and easy: you take a single-GPU training script and scale it across many GPUs in parallel. It’s built on top of the ring-allreduce communication algorithm, in which the training processes (for example, one process per GPU device) are arranged in a logical ring. At each step, every process exchanges a subset (chunk) of the gradients with its neighbors and sums what it receives (called reduction), so that at the end every process holds the averaged gradients. The following diagram illustrates how ring-allreduce works.

Fig. 1: The ring-allreduce algorithm allows worker nodes to average gradients and disperse them to all nodes without the need for a parameter server (source)
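To make the diagram concrete, here is a small pure-Python simulation of the textbook ring-allreduce algorithm (a scatter-reduce phase followed by an allgather phase). It is an illustrative sketch, not Horovod's implementation, which runs on NCCL or MPI. The function name `ring_allreduce` is hypothetical, and the gradients are plain lists whose length must be divisible by the number of workers.

```python
def ring_allreduce(grads):
    """Simulate ring-allreduce over len(grads) workers; return each worker's averaged gradient.

    Each worker's gradient is split into n equal chunks. In n - 1 scatter-reduce
    steps, every worker sends one chunk to its right-hand neighbor, which adds it
    to its own copy; afterwards each worker owns one fully summed chunk. In n - 1
    allgather steps, the summed chunks travel around the ring until every worker
    has all of them.
    """
    n = len(grads)
    length = len(grads[0])
    assert all(len(g) == length for g in grads) and length % n == 0
    size = length // n
    # bufs[worker][chunk] is that worker's current copy of the chunk.
    bufs = [[list(g[c * size:(c + 1) * size]) for c in range(n)] for g in grads]

    # Phase 1: scatter-reduce. Collect all messages first, because sends are simultaneous.
    for step in range(n - 1):
        messages = [((i + 1) % n, (i - step) % n, list(bufs[i][(i - step) % n]))
                    for i in range(n)]
        for dst, chunk, data in messages:
            bufs[dst][chunk] = [a + b for a, b in zip(bufs[dst][chunk], data)]

    # Worker i now holds the complete sum of chunk (i + 1) % n.
    # Phase 2: allgather. Each worker forwards its most recently completed chunk.
    for step in range(n - 1):
        messages = [((i + 1) % n, (i + 1 - step) % n, list(bufs[i][(i + 1 - step) % n]))
                    for i in range(n)]
        for dst, chunk, data in messages:
            bufs[dst][chunk] = data

    # Divide the sums by the number of workers to get the average.
    return [[value / n for chunk in buf for value in chunk] for buf in bufs]


workers = [[1, 2, 3, 4, 5, 6], [2, 3, 4, 5, 6, 7], [3, 4, 5, 6, 7, 8]]
print(ring_allreduce(workers))
```

Each of the three workers ends up with [2.0, 3.0, 4.0, 5.0, 6.0, 7.0], the element-wise average. Because every worker sends 2(n - 1) messages that are each 1/n of the gradient, the data each worker transmits stays below twice the gradient size no matter how many workers join, which is why the ring approach scales without a parameter server.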

Apache MXNet is integrated with Horovod through the distributed training APIs defined in Horovod, and you can convert a non-distributed training script by following the high-level code skeleton that we show later in this post.

Although this greatly simplifies the process of using Horovod, you must consider other complexities. For example, you may need to install additional software and libraries, and resolve incompatibilities between them, to make distributed training work. Horovod requires a compatible version of Open MPI, and if you want high-performance training on NVIDIA GPUs, you need to install the NCCL libraries. These complexities are amplified when you scale across multiple devices, because you need to make sure all the software and libraries on the new nodes are properly installed and configured. Amazon SageMaker includes all the required libraries to run distributed training with MXNet and Horovod. Prebuilt Amazon SageMaker Docker images come with popular open-source deep learning frameworks and preconfigured CUDA, cuDNN, MPI, and NCCL libraries. Amazon SageMaker manages the difficult process of properly installing and configuring your cluster. Amazon SageMaker and MXNet simplify training with Horovod by managing the complexities to support distributed training at scale.

Test problem and dataset

To benchmark the efficiencies realized by Horovod, we trained two notoriously resource-intensive model architectures: Mask-RCNN and Faster-RCNN. These model architectures were first introduced in 2017 and 2015, respectively, and are currently considered the baseline model architectures for two popular CV tasks: instance segmentation (Mask-RCNN) and object detection (Faster-RCNN). Mask-RCNN builds upon Faster-RCNN by adding a branch that predicts a segmentation mask for each detected object. Apache MXNet provides pre-built Mask-RCNN and Faster-RCNN models as part of the GluonCV model zoo, simplifying the process of training these models.

To train our object detection and instance segmentation models, we used the popular COCO2017 dataset. This dataset provides more than 200,000 images and their corresponding labels. The COCO2017 dataset is considered an industry standard for benchmarking CV models.

GluonCV is a CV toolkit built on top of MXNet. It provides out-of-the-box support for various CV tasks, including data loading and preprocessing for many common algorithms available within its model zoo. It also provides a tutorial on getting the COCO2017 dataset.

To make this process replicable for Amazon SageMaker users, we show an entire end-to-end process for training Mask-RCNN and Faster-RCNN with Horovod and MXNet. To begin, open the Jupyter environment on your Amazon SageMaker notebook instance and select the conda_mxnet_p36 kernel. Next, install the required Python packages:

! pip install gluoncv
! pip install pycocotools

We use the GluonCV toolkit to download the COCO2017 dataset onto our Amazon SageMaker notebook:

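import gluoncv as gcv
# Download GluonCV's COCO2017 preparation script (see the GluonCV tutorial) to the current directory.
gcv.utils.download('<INSERT SCRIPT URL>', path='./')
# Now download and prepare the dataset. Warning: this may take a while.
! python <INSERT SCRIPT NAME> --download-dir data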

We upload COCO2017 to the specified Amazon Simple Storage Service (Amazon S3) bucket using the following command:

! aws s3 cp ./data/ s3://<INSERT BUCKET NAME>/data/ --recursive --quiet

Training script with Horovod support

To use Horovod in your training script, you only need to make a few modifications. For code samples and instructions, see Horovod with MXNet. In addition, many GluonCV models in the model zoo have scripts that already support Horovod out of the box. In this section, we review the key changes required for Horovod to correctly work on Amazon SageMaker with Apache MXNet. The following code follows directly from the Horovod documentation:

import mxnet as mx
import horovod.mxnet as hvd
from mxnet import autograd
# Initialize Horovod. This has to be done first, because it activates Horovod.
hvd.init()
ctx = [mx.gpu(hvd.local_rank())]  # local_rank() is this process's GPU index on its instance.
num_gpus = hvd.size()  # Total number of GPUs (one process each) across all instances.
# Typically, your data loader shards the dataset so each worker sees a different part, for example:
# train_sampler = gcv.nn.sampler.SplitSortedBucketSampler(..., num_parts=hvd.size() if args.horovod else 1, part_index=hvd.rank() if args.horovod else 0)
val_loader = ...  # Build your data loaders as usual, passing len(ctx) among your other arguments.
model = ...  # Build and initialize your model as usual.
# Fetch the parameters and broadcast them from rank 0 so that all workers start identically.
params = model.collect_params()
if params is not None:
    hvd.broadcast_parameters(params, root_rank=0)
# Create DistributedTrainer, a subclass of gluon.Trainer.
opt = ...  # Your optimizer, for example mx.optimizer.create('sgd', learning_rate=0.01).
trainer = hvd.DistributedTrainer(params, opt)
# Create the loss function and train your model as usual.
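The sharding comment in the skeleton relies on two Horovod values: hvd.size() (the number of parts) and hvd.rank() (this worker's part). The following pure-Python sketch shows one simple way to turn those two numbers into disjoint, contiguous index ranges. The helper `shard_indices` is hypothetical; GluonCV's SplitSortedBucketSampler applies its own splitting logic, so treat this only as an illustration of the idea.

```python
def shard_indices(num_samples, num_parts, part_index):
    """Return the contiguous range of sample indices assigned to one worker.

    num_parts plays the role of hvd.size() and part_index the role of hvd.rank().
    The first (num_samples % num_parts) workers get one extra sample, so every
    sample is assigned to exactly one worker.
    """
    base, extra = divmod(num_samples, num_parts)
    start = part_index * base + min(part_index, extra)
    length = base + (1 if part_index < extra else 0)
    return range(start, start + length)


# Example: 10 samples shared by 4 workers.
for rank in range(4):
    print(rank, list(shard_indices(10, 4, rank)))
```

With 10 samples and 4 workers, ranks 0 to 3 receive [0, 1, 2], [3, 4, 5], [6, 7], and [8, 9], so every sample is used exactly once per epoch.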

Training job configuration

The Amazon SageMaker MXNet estimator class supports Horovod via the distributions parameter. We need to add a predefined mpi parameter with the enabled flag, and define the following additional parameters:

  • processes_per_host (int) – Number of processes MPI should launch on each host. This parameter is usually equal to the number of GPU devices available on any given instance.
  • custom_mpi_options (str) – Any custom mpirun flags passed in this field are added to the mpirun command and run by Amazon SageMaker for Horovod training.

The following example code initializes the distributions parameters:

distributions = {
    'mpi': {
        'enabled': True,
        'processes_per_host': 8,  # Each instance has 8 GPUs.
        'custom_mpi_options': '-verbose --NCCL_DEBUG=INFO',
    }
}

Next, we need to configure other parameters of our training job, such as hyperparameters, and the input and output Amazon S3 locations. To do this, we use the MXNet estimator class from the Amazon SageMaker Python SDK:

# Define the basic configuration of your Horovod-enabled SageMaker training cluster.
# role, hyperparameters, and bucket_name are assumed to be defined earlier in your notebook.
from sagemaker.mxnet import MXNet

num_instances = 2  # How many nodes you want to use.
instance_family = 'ml.p3dn.24xlarge'  # Which instance type you want to use.
estimator = MXNet(
    entry_point='<source_name>.py',  # Script entry point.
    source_dir='./source',  # Script location.
    role=role,
    train_instance_type=instance_family,
    train_instance_count=num_instances,
    framework_version='1.6.0',  # MXNet version.
    train_volume_size=100,  # Storage volume size in GB; must be large enough for the dataset.
    py_version='py3',  # Python version.
    hyperparameters=hyperparameters,
    distributions=distributions,  # For use with Horovod.
)

We’re now ready to start our first Horovod-powered training job with the following command:

estimator.fit({'data': 's3://' + bucket_name + '/data'})


We performed these benchmarks on two similar GPU instance types: the p3.16xlarge and the more powerful p3dn.24xlarge. Although both have 8 NVIDIA V100 GPUs, the latter instance is designed with distributed training in mind. In addition to a high-throughput network interface amenable to the inter-node data transfers inherent in distributed training, the p3dn.24xlarge boasts more compute and additional memory over the p3.16xlarge.

We ran benchmarks in three different use cases. In the first use case, we trained the models on a single instance using all 8 local GPUs without Horovod; in the second, we used Horovod on the same single-instance setup, to demonstrate the efficiencies gained by using Horovod to manage local training across multiple GPUs. In the third use case, we used Horovod for distributed training across multiple instances, each with 8 local GPUs, to demonstrate the additional efficiency gained by scaling horizontally.

The following table summarizes the time and accuracy for each training scenario.

Model | Instance type | 1 instance, 8 GPUs, w/o Horovod (time; accuracy) | 1 instance, 8 GPUs, with Horovod (time; accuracy) | 3 instances, 8 GPUs each, with Horovod (time; accuracy)
Faster RCNN | p3.16xlarge | 35 h 47 m; 37.6 | 8 h 26 m; 37.5 | 4 h 58 m; 37.4
Faster RCNN | p3dn.24xlarge | 32 h 24 m; 37.5 | 7 h 27 m; 37.5 | 3 h 37 m; 37.3
Mask RCNN | p3.16xlarge | 45 h 28 m; 38.5 (bbox), 34.8 (segm) | 10 h 28 m; 34.4 (bbox), 31.3 (segm) | 5 h 34 m; 36.8 (bbox), 33.5 (segm)
Mask RCNN | p3dn.24xlarge | 40 h 49 m; 38.3 (bbox), 34.8 (segm) | 8 h 41 m; 34.6 (bbox), 31.5 (segm) | 4 h 2 m; 37.0 (bbox), 33.4 (segm)

Table 1: Training time and accuracy are shown for three different training scenarios.
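To express Table 1 as relative gains, the following Python snippet converts the reported training times to minutes and divides the baseline (1 instance without Horovod) by each Horovod training time. It uses only the numbers from the table; the variable names are our own.

```python
def to_minutes(hours, minutes):
    """Convert an (hours, minutes) training time to minutes."""
    return hours * 60 + minutes


# Training times from Table 1 as (hours, minutes):
# [1 instance w/o Horovod, 1 instance with Horovod, 3 instances with Horovod]
times = {
    ("Faster RCNN", "p3.16xlarge"): [(35, 47), (8, 26), (4, 58)],
    ("Faster RCNN", "p3dn.24xlarge"): [(32, 24), (7, 27), (3, 37)],
    ("Mask RCNN", "p3.16xlarge"): [(45, 28), (10, 28), (5, 34)],
    ("Mask RCNN", "p3dn.24xlarge"): [(40, 49), (8, 41), (4, 2)],
}

results = {}
for (model, instance), runs in times.items():
    baseline, horovod_1, horovod_3 = (to_minutes(h, m) for h, m in runs)
    results[(model, instance)] = (baseline / horovod_1, baseline / horovod_3)
    print(f"{model} on {instance}: {baseline / horovod_1:.1f}x with Horovod on 1 instance, "
          f"{baseline / horovod_3:.1f}x on 3 instances")
```

For example, Faster RCNN on p3.16xlarge goes from 2,147 minutes to 506 minutes with Horovod on a single instance (about 4.2x) and to 298 minutes on three instances (about 7.2x).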

As expected, when using Horovod to distribute training across multiple instances, the time to convergence is significantly reduced. Additionally, even when training on a single instance, Horovod substantially increases training efficiency when using multiple local GPUs, compared to the default parameter-server approach. Horovod’s simplified APIs and abstractions enable you to unlock efficiency gains when training across multiple GPUs, whether on a single machine or many. For more information about scaling batch size and learning rate with this approach, see Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour.
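The linear scaling rule from that paper says that when you multiply the minibatch size by k, you also multiply the learning rate by k, and it recommends a gradual warmup from the original learning rate at the start of training. The following Python sketch shows the arithmetic. The helper names and all numbers (a base learning rate of 0.01 for a batch of 8, 24 Horovod workers with 1 image each, 4 warmup steps) are hypothetical and are not the hyperparameters used in these benchmarks.

```python
def scaled_lr(base_lr, base_batch_size, global_batch_size):
    """Linear scaling rule: the learning rate grows in proportion to the global batch size."""
    return base_lr * global_batch_size / base_batch_size


def warmup_lr(target_lr, base_lr, step, warmup_steps):
    """Gradual warmup: ramp linearly from base_lr to target_lr over warmup_steps steps."""
    if step >= warmup_steps:
        return target_lr
    return base_lr + (target_lr - base_lr) * step / warmup_steps


# Hypothetical setup: 3 instances x 8 GPUs = 24 workers (hvd.size()), 1 image per GPU.
base_lr, base_batch = 0.01, 8
global_batch = 3 * 8 * 1
target = scaled_lr(base_lr, base_batch, global_batch)
schedule = [round(warmup_lr(target, base_lr, step, warmup_steps=4), 4) for step in range(6)]
print(round(target, 4), schedule)
```

Tripling the batch triples the learning rate to 0.03, and the warmup reaches it in equal increments (0.01, 0.015, 0.02, 0.025, 0.03) instead of jumping there at step 0.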

With the improvement in training time enabled by Horovod and Amazon SageMaker, you can focus more on improving your algorithms instead of waiting for jobs to finish training. You can train in parallel across multiple instances with only a marginal impact on mean Average Precision (mAP).

Optimizing Horovod training

Horovod provides several additional utilities that allow you to analyze and optimize training performance.

Horovod autotuning

Finding the optimal combination of Horovod parameters for a given model and cluster size may require several iterations of trial and error.

The autotune feature automates this trial-and-error process within a single training job, using Bayesian optimization to search the parameter space for the best-performing combination of parameters. Horovod searches for the best combination during the first cycles of a training job. When it finds the best combination, Horovod writes it to the autotune log and uses it for the remainder of the training job. For more information, see Autotune: Automated Performance Tuning.

To enable autotuning and capture the search log, pass the following parameters in your MPI configuration:

{'mpi': {'enabled': True, 'custom_mpi_options': '-x HOROVOD_AUTOTUNE=1 -x HOROVOD_AUTOTUNE_LOG=/opt/ml/output/autotune_log.csv'}}

Horovod Timeline

Horovod Timeline is a report available after training completion that captures all activities in the Horovod ring. This is useful to understand which operations are taking the longest and identify optimization opportunities. For more information, see Analyze Performance.

To generate a timeline file, add the following parameters in your MPI command:

{'mpi': {'enabled': True, 'custom_mpi_options': '-x HOROVOD_TIMELINE=/opt/ml/output/timeline.json'}}

The /opt/ml/output directory serves a specific purpose: after the training job completes, Amazon SageMaker automatically archives the files in this directory and uploads them to the Amazon S3 location that you define through the Amazon SageMaker Python SDK.

Tensor Fusion

The Tensor Fusion feature combines small allreduce operations into fewer, larger ones at training time, which typically results in better overall performance. For more information, see Tensor Fusion. By default, Tensor Fusion is enabled and has a buffer size of 64 MB. You can modify the buffer size using a custom MPI flag as follows (for our use case, we override the default 64 MB buffer value with 32 MB, which is 33554432 bytes):

{'mpi': {'enabled': True, 'custom_mpi_options': '-x HOROVOD_FUSION_THRESHOLD=33554432'}}
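The following pure-Python sketch models the idea behind Tensor Fusion: tensors that are ready to be reduced are packed, in order, into buffers no larger than the fusion threshold, and each buffer is then reduced with one allreduce. It also confirms that 32 MB, as used above, is 32 × 1024 × 1024 = 33,554,432 bytes. This is a simplified illustration under our own assumptions (greedy packing, sizes in bytes, a tensor larger than the threshold sent alone), not Horovod's actual fusion logic; the function `fuse` is hypothetical.

```python
MiB = 1024 * 1024
FUSION_THRESHOLD = 32 * MiB  # the value passed as HOROVOD_FUSION_THRESHOLD above


def fuse(tensor_sizes, threshold=FUSION_THRESHOLD):
    """Greedily pack ready tensors (sizes in bytes, in order) into fusion buffers.

    Each returned buffer stands for one allreduce call, so fewer, larger messages
    replace many small ones. A tensor larger than the threshold goes alone.
    """
    buffers, current, current_bytes = [], [], 0
    for size in tensor_sizes:
        if current and current_bytes + size > threshold:
            buffers.append(current)  # flush the full buffer as one allreduce
            current, current_bytes = [], 0
        current.append(size)
        current_bytes += size
    if current:
        buffers.append(current)
    return buffers


sizes = [4 * MiB, 10 * MiB, 20 * MiB, 1 * MiB, 40 * MiB, 2 * MiB]
print(FUSION_THRESHOLD)
print([[s // MiB for s in buf] for buf in fuse(sizes)])  # sizes in MiB per buffer
```

With these example sizes, six tensors are reduced in four allreduce calls instead of six; raising the threshold packs more tensors into each call, at the cost of a larger buffer.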

You can also adjust the fusion cycle time using the HOROVOD_CYCLE_TIME parameter, which is specified in milliseconds. See the following code:

{'mpi': {'enabled': True, 'custom_mpi_options': '-x HOROVOD_CYCLE_TIME=5'}}
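Because each of these settings goes through the same custom_mpi_options string, you may want to combine several of them in one job. The following Python sketch shows one way to do that. `build_mpi_distribution` is a hypothetical helper; it only emits the `-x KEY=VALUE` flags and the distributions keys already shown in this post.

```python
def build_mpi_distribution(processes_per_host, horovod_env, extra_flags=""):
    """Build a SageMaker distributions dict for Horovod.

    Each entry in horovod_env is exported to every worker with mpirun's
    -x KEY=VALUE flag; extra_flags (for example '-verbose') is prepended as-is.
    """
    env_flags = " ".join(f"-x {key}={value}" for key, value in horovod_env.items())
    options = " ".join(part for part in (extra_flags, env_flags) if part)
    return {
        "mpi": {
            "enabled": True,
            "processes_per_host": processes_per_host,
            "custom_mpi_options": options,
        }
    }


distributions = build_mpi_distribution(
    processes_per_host=8,
    horovod_env={
        "HOROVOD_FUSION_THRESHOLD": 32 * 1024 * 1024,
        "HOROVOD_CYCLE_TIME": 5,
        "HOROVOD_TIMELINE": "/opt/ml/output/timeline.json",
    },
    extra_flags="-verbose",
)
print(distributions["mpi"]["custom_mpi_options"])
```

You would then pass the resulting dictionary as distributions=distributions to the MXNet estimator, as in the training job configuration earlier in this post.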

Optimizing MXNet models

Another optimization technique is to tune how MXNet runs the model itself. We recommend running the code with os.environ['MXNET_CUDNN_AUTOTUNE_DEFAULT'] = '1', which lets MXNet select the fastest cuDNN convolution algorithms. You can then reuse the best-performing environment variable settings in future training runs. In our testing, the following settings gave the best results:

import os

# To be safe, set these before importing mxnet so the runtime picks them up.
os.environ['MXNET_GPU_MEM_POOL_TYPE'] = 'Round'
os.environ['MXNET_GPU_COPY_NTHREADS'] = '1'


In this post, we demonstrated how to reduce training time with Horovod and Apache MXNet on Amazon SageMaker. You can train your model out of the box without worrying about any additional complexities.

For more information about deep learning and MXNet, see the MXNet crash course and Dive into Deep Learning book. You can also get started on the MXNet website and MXNet GitHub examples directory. If you’re new to distributed training and want to dive deeper, we highly recommend reading the paper Horovod: fast and easy distributed deep learning in TensorFlow. If you use the AWS Deep Learning Containers and AWS Deep Learning AMIs, you can learn how to set up this workflow in that environment in our recent post How to run distributed training using Horovod and MXNet on AWS DL containers and AWS Deep Learning AMIs.

About the Authors

Vadim Dabravolski is an AI/ML Solutions Architect with the FinServe team. He is focused on Computer Vision and NLP technologies and how to apply them to business use cases. After hours, Vadim enjoys jogging in NYC boroughs, reading non-fiction (business, history, culture, politics, you name it), and rarely just doing nothing.

Corey Barrett is a Data Scientist in the Amazon ML Solutions Lab. As a member of the ML Solutions Lab, he leverages Machine Learning and Deep Learning to solve critical business problems for AWS customers. Outside of work, you can find him enjoying the outdoors, sipping on scotch, and spending time with his family.

Chaitanya Bapat is a Software Engineer with the AWS Deep Learning team. He works on Apache MXNet and integrating the framework with Amazon SageMaker, DLC and DLAMI. In his spare time, he loves watching sports and enjoys reading books and learning Spanish.

Karan Jariwala is a Software Development Engineer on the AWS Deep Learning team. His work focuses on training deep neural networks. Outside of work, he enjoys hiking, swimming, and playing tennis.



Executive Interview: Steve Bennett, Director Global Government Practice, SAS 




Steve Bennett of SAS seeks to use AI and analytics to help drive government decision-making, resulting in better outcomes for citizens.   

Using AI and analytics to optimize delivery of government service to citizens  

Steve Bennett is Director of the Global Government Practice at SAS, and is the former director of the US National Biosurveillance Integration Center (NBIC) in the Department of Homeland Security, where he worked for 12 years. The mission of the NBIC was to provide early warning and situational awareness of health threats to the nation. He led a team of over 30 scientists, epidemiologists, public health, and analytics experts. With a PhD in computational biochemistry from Stanford University, and an undergraduate degree in chemistry and biology from Caltech, Bennett has a strong passion for using analytics in government to help make better public decisions. He recently spent a few minutes with AI Trends Editor John P. Desmond to provide an update on his work.

AI Trends: How does AI help you facilitate the role of analytics in the government?  


Steve Bennett: Well, artificial intelligence is something we’ve been hearing a lot about everywhere, even in government, which can often be a bit slower to adopt or implement new technologies. Yet even in government, AI is a pretty big deal. We talk about analytics and government use of data to drive better government decision-making, better outcomes for citizens. That’s been true for a long time.   

A lot of government data exists in forms that are not easily analyzed using traditional statistical methods or traditional analytics. So AI presents the opportunity to get the sorts of insights from government data that may not be possible using other methods. Many folks in the community are excited about the promise of AI being able to help government unlock the value of government data for its missions.  

Are there any examples you would say that exemplify the work? 

AI is well-suited to certain sorts of problems, like finding anomalies or things that stick out in data, needles in a haystack, if you will. AI can be very good at that. AI can be good at finding patterns in very complex datasets. It can be hard for a human to sift through that data on their own, to spot the things that might require action. AI can help detect those automatically.  

For example, we’ve been partnering with the US Food and Drug Administration to support efforts to keep the food supply safe in the United States. As the supply chain has become increasingly global, one of the FDA’s challenges is detecting food contamination. The FDA often has to be reactive. They have to wait for something to happen or wait for something to get pretty far down the line before they can identify it and take action. We worked with FDA to help them implement AI and apply it to that process, so they can more effectively predict where they might see an increased likelihood of contamination in the supply chain and act proactively instead of reactively. So that’s an example of how AI can be used to help support safer food for Americans.

In another example, AI is helping with predictive maintenance for government fleets and vehicles. We work quite closely with Lockheed Martin to support predictive maintenance with AI for some of the most advanced airframes in the world, like the C-130 [transport] and the F-35 [combat aircraft]. AI helps to identify problems in very complex machines before those problems cause catastrophic failure. The ability for a machine to tell you before it breaks is something AI can do.   

Another example was around unemployment. We have worked with several cities globally to help them figure out how best to put unemployed people back to work. That is top of mind now as we see increased unemployment because of Covid. For one city in Europe, the goal is to get people back to work in 13 weeks or less. They compiled racial and demographic data on the unemployed, such as education, previous work experience, whether they have children, and where they live. Lots of data.

They matched that to data about government programs, such as job training requested by specific employers, reskilling, and other programs. We built an AI system using machine learning to optimally match people, based on what we knew about them, to the best mix of government programs that would get them back to work the fastest. We are using the technology to optimize government benefits. The results were good at the outset. They did a pilot prior to the Covid outbreak and saw promising results.

Another example is around juvenile justice. We worked with a particular US state to help them figure out the best way to combat recidivism among juvenile offenders. They had data on 19,000 cases over many years, all about young people who came into juvenile corrections, served their time there, got out and then came back. They wanted to know how they could lower the recidivism rate. We found we could use machine learning to look at aspects of each of these kids, and figure out which of them might benefit from certain special programs after they leave juvenile corrections, to get skills that reduce the likelihood we would see them back in the system again.  

To be clear, this was not profiling, putting a stigma or mark on these kids. It was trying to figure out how to match limited government programs to the kids who would best benefit from those.   

What are key AI technologies that are being employed in your work today? 

Much of what we talk about having a near-term impact falls into the family of what we call machine learning. Machine learning has this great property of being able to take a lot of training data and being able to learn which parts of that data are important for making predictions or identifying patterns. Based on what we learn from that training data, we can apply that to new data coming in.  

A specialized form of machine learning is deep learning, which is good at automatically detecting things in video streams, such as a car or a person. We have worked in healthcare to help radiologists do a better job detecting cancer from health scans. Police and defense applications in many cases rely on real-time video. The ability to make sense of that video very quickly is greatly enhanced by machine learning and deep learning.

Another area to mention is real-time interaction systems, such as AI chatbots. We’re seeing governments increasingly turn to chatbots to help them connect with citizens. If a benefits agency or a tax agency is able to build a system that can automatically interact with citizens, it makes government more responsive to citizens. It’s better than waiting on hold on the phone.

How far along would you say the government sector is in its use of AI and how does it compare to two years ago? 

The government is certainly further along than it was two years ago. In the data we have looked at, 70% of government managers have expressed interest in using AI to enhance their mission. That signal is stronger than what we saw two years ago. But I would say that we don’t see a lot of enterprise-wide applications of AI in the government. Often AI is used for particular projects or specific applications within an agency to help fulfill its mission. So as AI continues to mature, we would expect it to have more of an enterprise-wide use for large scale agency missions.  

What would you say are the challenges using AI to deliver on analytics in government?  

We see a range of challenges in several categories. One is around data quality and execution. One of the first things an agency needs to figure out is whether they have a problem that is well-suited for AI. Would it show patterns or signals in the data? If so, would the project deliver value for the government?  

A big challenge is data quality. For machine learning to work well, you need a lot of examples, a lot of data. It’s a very data-hungry sort of technology. If you don’t have that data or you don’t have access to it, even if you’ve got a great problem that would normally be very well-suited for AI, you’re not going to be able to use AI.

Another problem that we see quite often in governments is that the data exists, but it’s not very well organized. It might exist on spreadsheets on a bunch of individual computers all over the agency. It’s not in a place where it can be all brought together and analyzed in an AI way. So the ability for the data to be brought to bear is really important.   

There’s another important challenge. Even if you have all of your data in the right place, and you have a problem very well-suited for AI, it could be that culturally, the agency just isn’t ready to make use of the recommendations coming from an AI system in its day-to-day mission. This might be called a cultural challenge. The people in the agency might not have a lot of trust in AI systems and what they can do. Or it might be an operational mission where there always needs to be a human in the loop. Either way, sometimes culturally there might be limitations in what an agency is ready to use. And we would advise not bothering with AI if you haven’t thought about whether you can actually use it for something when you’re done. That’s how you get a lot of science projects in government.

We always advise people to think about what they will get at the end of the AI project, and make sure they are ready to drive the results into the decision-making process. Otherwise, we don’t want to waste time and government resources. You might do something different that you are comfortable using in your decision processes. That’s really important to us.  As an example of what not to do, when I worked in government, I made the mistake of spending two years building an outstanding analytics project, using high-performance modeling and simulation, working in Homeland Security. But we didn’t do a good job working on the cultural side, getting those key stakeholders and senior leaders ready to use it. And so we delivered a great technical solution, but we had a bunch of senior leaders that weren’t ready to use it. We learned the hard way that the cultural piece really does matter. 

We also have challenges around data privacy. Government, more than many industries, touches very sensitive data. And as I mentioned, these methods are very data-hungry, so we often need a lot of data. Government has to make doubly sure that it’s following its own privacy protection laws and regulations, and making sure that we are very careful with citizen data and following all the privacy laws in place in the US. And most countries have privacy regulations in place to protect personal data.  

The second component is a challenge around what government is trying to get the systems to do. AI in retail is used to make recommendations, based on what you have been looking at and what you have bought. An AI algorithm is running in the background. The shopper might not like the recommendation, but the negative consequences of that are pretty mild.   

But in government, you might be using AI or analytics to make decisions with bigger impacts—determining whether somebody gets a tax refund, or whether a benefits claim is approved or denied. The outcomes of these decisions have potentially serious impacts. The stakes are much higher when the algorithms get things wrong. Our advice to government is that for key decisions, there always should be that human-in-the-loop. We would never recommend that a system automatically drives some of these key decisions, particularly if they have potential adverse actions for citizens.   

Finally, the last challenge that comes to mind is the challenge of where the research is going. This idea of “could you” versus “should you.” Artificial intelligence unlocks a whole set of areas that you can use such as facial recognition. Maybe in a Western society with liberal, democratic values, we might decide we shouldn’t use it, even though we could. Places like China in many cities are tracking people in real time using advanced facial recognition. In the US, that’s not in keeping with our values, so we choose not to do that.   

That means any government agency thinking about doing an AI project needs to think about values up front. You want to make sure that those values are explicitly encoded in how the AI project is set up. That way we don’t get results on the other end that are not in keeping with our values or where we want to go.  

You mentioned data bias. Are you doing anything in particular to try to protect against bias in the data? 

Good question. Bias is a real area of concern in any kind of AI or machine learning work. A machine learning system is going to perform in concert with the way it was trained on the training data. So developers need to be careful in the selection of training data, and the team needs systems in place to review the training data so that it’s not biased. We’ve all heard and read the stories in the news about the facial recognition company in China: they make this great facial recognition system, but they only train it on Asian faces. And so guess what? It’s good at detecting Asian faces, but it’s terrible at detecting faces that are darker in color or that are lighter in color, or that have different facial features.

We have heard many stories like that. You want to make sure you don’t have racial bias, gender bias, or any other kind of bias we want to avoid in the data training set. Encode those explicitly up front when you’re planning your project; that can go a long way towards helping to limit bias. But even if you’ve done that, you want to make sure you’re checking for bias in a system’s performance. We have many great technologies built into our machine learning tools to help you automatically look for those biases and detect if they are present. You also want to be checking for bias after the system has been deployed, to make sure if something pops up, you see it and can take care of it.  

From your background in bioscience, how well would you say the federal government has done in responding to the COVID-19 virus? 

There really are two industries that bore the brunt, at least initially, of the COVID-19 spread: government and health care. In most places in the world, health care is part of government. So it has been a big public sector effort to try to deal with COVID. It’s been hit and miss, with many challenges. No other entity can marshal financial resources like the government, so getting economic support out to those that need it is really important. Analytics plays a role in that.

So one of the things that we did in supporting government using what we’re good at—data and analytics in AI—was to look at how we could help use the data to do a better job responding to COVID. We did a lot of work on the simple side of taking what government data they had and putting it into a simple dashboard that displayed where resources were. That way they could quickly identify if they had to move a supply such as masks to a different location. We worked on a more complex AI system to optimize the use of intensive care beds for a government in Europe that wanted to plan use of its medical resources. 

Contact tracing, the ability to very quickly identify people who have been exposed and then identify who they’ve been around so that we can isolate those people, is something analytics can greatly support and enhance. We’ve done a lot of work on how to take contact tracing, which has been used for centuries, and make it fit for supporting COVID-19 work. The government can do a lot with its data, with analytics, and with AI in the fight against COVID-19.

Do you have any advice for young people, either in school now or early in their careers, for what they should study if they are interested in pursuing work in AI, and especially if they’re interested in working in the government? 

If you are interested in getting into AI, I would suggest two things to focus on. One would be the technical side. If you have a solid understanding of how to implement and use AI, and you’ve built experience doing it as part of your coursework or part of your research work in school, you are highly valuable to government. Many people know a little about AI; they may have taken some business courses on it. But if you have the technical chops to be able to implement it, and you have a passion for doing that inside of government, you will be highly valuable. There would not be a lot of people like you. 

Just as important as the AI side and the data science technical piece, I would highly advise students to work on storytelling. AI can be highly technical when you get into the details. If you’re going to talk to a government or agency leader or an elected official, you will lose them if you can’t quickly tie the value of artificial intelligence to their mission. At SAS we call them “unicorns”: people who have high technical ability and a detailed understanding of how these models can help government, and who can also tell good stories and draw the line to the “so what?” How can a senior agency official in government use it? How is it helpful to them?

Working on good presentation skills and practicing them is just as important as the technical side. If you’ve got a good balance of those skills, you will find yourself very influential and able to make a difference. That’s my view.

I would also say, in terms of where you specialize technically, that being able to converse in SAS was recently ranked as one of the most highly valued job skills. Specific technical skills like that can be very, very marketable for you inside and outside of government.

Learn more about Steve Bennett on the SAS Blog. 




Getting AI to Learn Like a Baby is Goal of Self-Supervised Learning 




Scientists are studying how to create AI systems that learn from self-supervision, akin to how babies learn from observing their environment. (Credit: Getty Images) 

By AI Trends Staff  

Scientists are working on creating better AI that learns through self-supervision, with the pinnacle being AI that could learn like a baby, based on observation of its environment and interaction with people.  

This would be an important advance because AI today is limited by the volume of data required to train machine learning algorithms and by the brittleness of those algorithms when circumstances change.

Yann LeCun, chief AI scientist at Facebook

“This is the single most important problem to solve in AI today,” stated Yann LeCun, chief AI scientist at Facebook, in an account in the Wall Street Journal. Some early success with self-supervised learning has been seen in the natural language processing used in mobile phones, smart speakers, and customer service bots.   

Training AI today is time-consuming and expensive. The promise of self-supervised learning is for AI to train itself without the need for external labels attached to the data. Dr. LeCun is now focused on applying self-supervised learning to computer vision, a more complex problem in which computers interpret images such as a person’s face.  

The next phase, which he thinks is possible in the next decade or two, is to create a machine that can “learn how the world works by watching video, listening to audio, and reading text,” he stated. 

More than one approach is being tried to help AI learn by itself. One is the neuro-symbolic approach, which combines deep learning with symbolic AI, a technique that represents human knowledge explicitly as facts and rules. IBM is experimenting with this approach in developing a bot that works alongside human engineers, reading computer logs to look for system failures, understand why a system crashed, and offer a remedy. According to Dario Gil, director of IBM Research, such a system could increase the pace of scientific discovery through its ability to spot patterns that are not otherwise evident. “It would help us address huge problems, such as climate change and developing vaccines,” he stated.

Child Psychologists Working with Computer Scientists on MESS  

DARPA is working with the University of California at Berkeley on a research project, Machine Common Sense, funding collaborations between child psychologists and computer scientists. The system is called MESS, for Model-Building, Exploratory, Social Learning System.   

Alison Gopnik, Professor of Psychology, University of California, Berkeley and the author of “The Philosophical Baby”

“Human babies are the best learners in the universe. How do they do it? And could we get an AI to do the same?” asked Alison Gopnik, a professor of psychology at Berkeley and the author of “The Philosophical Baby” and “The Scientist in the Crib,” among other books, in a recent article she wrote for the Wall Street Journal.

“Even with a lot of supervised data, AIs can’t make the same kinds of generalizations that human children can,” Gopnik said. “Their knowledge is much narrower and more limited, and they are easily fooled. Current AIs are like children with super-helicopter-tiger moms—programs that hover over the learner dictating whether it is right or wrong at every step. The helicoptered AI children can be very good at learning to do specific things well, but they fall apart when it comes to resilience and creativity. A small change in the learning problem means that they have to start all over again.” 

The scientists are also experimenting with AI that is motivated by curiosity. This leads to a more resilient learning style, called “active learning,” which is a frontier in AI research.

The challenge of the DARPA Machine Common Sense program is to design an AI that understands the basic features of the world as well as an 18-month-old. “Some computer scientists are trying to build common sense models into the AIs, though this isn’t easy. But it is even harder to design an AI that can actually learn those models the way that children do,” Dr. Gopnik wrote. “Hybrid systems that combine models with machine learning are one of the most exciting developments at the cutting edge of current AI.” 

Training AI models on labeled datasets is likely to play a diminished role as self-supervised learning comes into wider use, LeCun said during a session at the virtual International Conference on Learning Representations (ICLR) 2020, which also included Turing Award winner and Canadian computer scientist Yoshua Bengio.

Self-supervised learning algorithms have an advantage here: they generate labels from the data itself, by exposing relationships between the data’s parts.
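The idea of generating labels from the data itself can be made concrete with a small sketch. The Python below shows one common way to do it, the masked-word objective popularized by language models such as BERT: hide one part of an unlabeled sentence and treat the hidden word as the label. The function name and the `[MASK]` token are illustrative assumptions, not taken from any system mentioned in this article.

```python
def make_masked_pairs(sentence, mask_token="[MASK]"):
    """Turn one unlabeled sentence into (input, label) training pairs.

    Hypothetical helper for illustration: each word is hidden in turn,
    the hidden word becomes the label, and the rest of the sentence is
    the input. No human annotation is involved.
    """
    words = sentence.split()
    pairs = []
    for i, word in enumerate(words):
        # Replace only position i with the mask token; keep the context.
        masked = words[:i] + [mask_token] + words[i + 1:]
        pairs.append((" ".join(masked), word))
    return pairs


if __name__ == "__main__":
    for inp, label in make_masked_pairs("babies learn by watching"):
        print(f"{inp!r} -> {label!r}")
```

A model trained to fill in the blank on pairs like these must learn how the parts of a sentence relate to one another, which is the sense in which the data supervises itself.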

“Most of what we learn as humans and most of what animals learn is in a self-supervised mode, not a reinforcement mode. It’s basically observing the world and interacting with it a little bit, mostly by observation in a test-independent way,” stated LeCun, in an account from VentureBeat. “This is the type of [learning] that we don’t know how to reproduce with machines.”

Bengio was optimistic about the potential for AI to gain from the field of neuroscience, in particular for its explorations of consciousness and conscious processing. Bengio predicted that new studies will clarify the way high-level semantic variables connect with how the brain processes information, including visual information. These variables that humans communicate using language could lead to an entirely new generation of deep learning models, he suggested. 

“There’s a lot of progress that could be achieved by bringing together things like grounded language learning, where we’re jointly trying to understand a model of the world and how high-level concepts are related to each other,” said Bengio. “Human conscious processing is exploiting assumptions about how the world might change, which can be conveniently implemented as a high-level representation.”

Bengio Delivered NeurIPS 2019 Talk on System 2 Self-Supervised Models 

At the 2019 Conference on Neural Information Processing Systems (NeurIPS 2019), Bengio spoke on this topic in a keynote speech entitled “From System 1 Deep Learning to System 2 Deep Learning,” with System 2 referring to self-supervised models.

“We want to have machines that understand the world, that build good world models, that understand cause and effect, and can act in the world to acquire knowledge,” he said in an account in TechTalks.  

Such intelligent systems should be able to generalize to different distributions in data, just as children learn to adapt as the environment changes around them. “We need systems that can handle those changes and do continual learning, lifelong learning, and so on,” Bengio stated. “This is a long-standing goal for machine learning, but we haven’t yet built a solution to this.”

Read the source articles in the Wall Street Journal, by Alison Gopnik in the Wall Street Journal, in VentureBeat and in TechTalks.




Support for Remote Workers Providing Extra Boost for Conversational AI




Since the coronavirus hit in mid-March and the number of remote workers skyrocketed, conversational AI is being employed in a support role. (Credit: Getty Images) 

By AI Trends Staff 

Conversational AI refers to the use of chatbots, messaging apps, and voice-based assistants to automate customer communications with a brand.   

Software that combines these features to carry on a human-like conversation might be called a “bot.” The term “chatbot” might refer to text-only bots. Amazon Alexa or Google Home virtual assistants use conversational AI; they learn about the customer and the customer learns about them. With deep learning underlying the interaction, the conversation experience should improve over time.  

The advantages of conversational AI in marketing include an instant response, which leads to higher conversion rates of queries to sales.  

Shane Barker, digital marketing consultant, cofounder of Attrock

The adoption of conversational AI is being fueled by the rise in use of messaging apps and voice-based assistants, according to an account from the site of Shane Barker, a digital marketing consultant and cofounder of Attrock, a digital marketing agency.  

The most popular messaging app, according to Statista, is WhatsApp, from a US startup now owned by Facebook, with over 1.6 billion users. That is followed by: Facebook Messenger with 1.3 billion users; WeChat, developed by Tencent of China, with 1.1 billion users; QQ Mobile, also from Tencent, with 800 million users; Snapchat from Snap, Inc. of the US, with 314 million users; and Telegram from Telegram Messenger, founded in Russia in 2013 on macOS and released on Android in May of this year, with 200 million users.

“If you are not using conversational AI platforms yet, you should start now,” advised Barker. 

The conversations could be text-based or audio-based, and can be done on any messaging or voice-based communication platform. While conversational AI is the technology behind chatbots and voice-based assistants, it is not synonymous with either. You can use a messaging service, a website chatbot or a voice-based assistant, and use conversational AI to automate conversations on it, Barker advises. 

How Conversational AI Can Help Your Business 

Some conversational AI technologies are advanced enough to understand the context and personalize the conversations. User-friendly chatbots can generate leads and help drive sales. The first and most common use of conversational AI is to provide around-the-clock customer service. The bot can answer commonly asked customer questions, resolve problems and point to solutions. The user company can build a customized database of information that can feed the conversational AI platform to make it more accurate.

A website chatbot can interact with users and direct them to the right pages, products, or services — basically leading them down the sales funnel. The bot can also drive conversions by cross-selling or up-selling products. The bot can be trained to suggest complementary or higher-value products. The platform can also deliver offers and promotions to customers.  

As far as lead generation is concerned, conversational AI-based chatbots can schedule appointments and collect email addresses during non-working hours. You can then pass that information on to your sales team, who can then nurture those leads.  

Among the conversational AI platforms recommended by Barker are:   

  • LivePerson, from LivePerson of New York City, a company founded in 1998 that released its AI offering in 2018;
  • SAP Conversational AI from SAP, the German multinational software company;
  • KAI from Kasisto of New York City, founded in 2013;
  • MindMeld, founded in 2011 and acquired by Cisco Systems in 2017;
  • Mindsay from Mindsay, headquartered in Paris and founded in 2016.

iAdvize Taps Network of Freelance Experts for Customer Service  

Another player is iAdvize, founded in France in 2010, offering a chat tool focused on customer service. Today iAdvize is a leading conversational platform in Europe and is now expanding in the US. The company says the tool is currently being used by over 2,000 e-commerce websites worldwide including Samsung, Disney and Lowe’s. 

The platform uses AI to identify each customer’s needs and connects them to a mix of in-store associates, in-house agents, chatbots and on-demand product experts from ibbu. Founded by iAdvize in 2016, ibbu today uses over 20,000 knowledgeable product experts from around the world who chat with customers and are paid for the advice.   

The freelancers are vetted to be experts in electronics, home improvement, sporting goods, hobbies, and other product segments. They get paid a percentage of the sales they generate. Ibbu experts have conducted over 1 million conversations with iAdvize’s e-commerce customers, the company says.

Customers using iAdvize have seen an increase in online sales of 5% to 15%, according to the company. iAdvize was co-founded by Julien Hervouet, now the CEO. He stated in a press release on the announcement of ibbu in the UK in 2016, “We believe the future of marketing is conversational commerce, where brands use genuine fans to improve the customer’s experience of the brand.” 

How Adobe Used an AI Chatbot to Support 22,000 Remote Workers  

Cynthia Stoddard, Senior VP and CIO at Adobe

When the COVID-19 virus hit the US in March, Adobe, like many companies, sent its workers home and shifted to remote work over a single weekend. “Not surprisingly, our existing processes and workflows weren’t equipped for this abrupt change,” stated Cynthia Stoddard, Senior VP and CIO at Adobe, in a written account published in VentureBeat. “Customers, employees, and partners (many also working at home) couldn’t wait days to receive answers to urgent questions.”

The first step was to launch an organization-wide channel using Slack, a business communications platform from Slack Technologies, launched in 2013 in San Francisco. The 24×7 global IT help desk would support the channel, with the rest of IT available for rapid event escalation. 

The same questions and issues came up frequently. “We decided to optimize our support for frequently asked questions and issues,” Stoddard stated. The team combined AI, machine learning and natural language processing to build a chatbot. Its answers could be as simple as directing employees to an existing knowledge base or FAQ, or walking them through the steps to solve a problem. The team focused on the eight most frequently reported topics, then continued to add capabilities based on what delivers the biggest benefits.

“The results have been remarkable,” she wrote. Since going live on April 14, the system has responded to more than 3,000 queries, and the team has seen improvement on some critical issues. For example, more employees are seeking IT support through email, so it was important to speed up the turnaround time on those queries.

“With the help of a deep learning and NLP-based routing mechanism, 38% of email tickets are now automatically routed to the correct support queue within six minutes,” she stated. “The AI routing bot uses a neural network-based classification technique to sort email tickets into classes, or support queues. Based on the predicted classification, the ticket is automatically assigned to the correct support queue.”

The average time required to dispatch and route email tickets has been reduced by the AI chatbot from about 10 hours to less than 20 minutes. Continuous supervised training on the bot has helped Adobe achieve 97% accuracy, nearly on a par with a human expert. Call volumes for internal support have dropped by 35% as a result.  

The neural network model is retrained every two weeks by adding new data from resolved tickets to the training set. They leveraged the work done for a company chatbot for finance. Adobe continues to look at robotic process automation, to explore business improvements through the combination of autonomous software robots and AI.   

Keeping employees in the loop about the AI and chatbot technology being employed is critical. “When introducing a new/unknown technology tool, it’s critical to keep employee experience at the core of the training and integration process – to ensure they feel comfortable and confident with the change,” Stoddard wrote. 

Read the source articles from the sites of Shane Barker and Statista, from the website of iAdvize and in VentureBeat.

