How to speed up a Deep Learning Language model by almost 50X at half the cost

Tags: AWS, Deep Learning, Distributed Computing, Hugging Face, NLP

In this blog post, we show how to accelerate fine-tuning the ALBERT language model while also reducing costs by using Determined’s built-in support for distributed training with AWS spot instances.

Sponsored Post.

By Armand McQueen

One of the big headaches in deep learning is that models take forever to train. As an ML engineer, waiting hours or days for training to complete makes iteratively improving your model a slow and frustrating process. You can speed up model training by using more GPUs, but this raises two challenges:

Distributed training is a hassle because it requires changing your model code and dealing with DevOps headaches like server management, cluster scheduling, networking, etc.
Using many GPUs at once can quickly cause your training costs to skyrocket, especially when using on-demand cloud GPUs.

In this blog post, we show how to accelerate fine-tuning the ALBERT language model while also reducing costs by using Determined’s built-in support for distributed training with AWS spot instances. Originally, ALBERT took over 36 hours to train on a single V100 GPU and cost $112 on AWS. With distributed training and spot instances, training the model on 64 V100 GPUs took only 48 minutes and cost only $47! That’s both a 46x speedup in training time and a 58% reduction in cost!
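To make the arithmetic behind those headline numbers explicit, here is a small sketch that recomputes them from the figures quoted above. The function names are ours, and the baseline is taken as exactly 36 hours, which is an assumption: the post says "over 36 hours", and the reported 46x corresponds to a baseline of roughly 36.8 hours. The scaling-efficiency and per-GPU-hour figures are derived quantities, not numbers from the original experiment.

```python
# Figures quoted in the post (baseline duration assumed to be exactly 36 hours).
BASELINE_HOURS = 36.0      # single V100, on-demand
BASELINE_COST_USD = 112.0
DIST_MINUTES = 48.0        # 64 V100s, spot instances
DIST_COST_USD = 47.0
DIST_GPUS = 64


def speedup(baseline_hours: float, new_minutes: float) -> float:
    """Wall-clock speedup: old training time divided by new training time."""
    return (baseline_hours * 60.0) / new_minutes


def cost_reduction(old_cost: float, new_cost: float) -> float:
    """Fraction of the original cost that was saved."""
    return (old_cost - new_cost) / old_cost


def scaling_efficiency(observed_speedup: float, n_gpus: int) -> float:
    """Observed speedup relative to ideal linear scaling across n_gpus."""
    return observed_speedup / n_gpus


s = speedup(BASELINE_HOURS, DIST_MINUTES)
saving = cost_reduction(BASELINE_COST_USD, DIST_COST_USD)
efficiency = scaling_efficiency(s, DIST_GPUS)

# Effective price per GPU-hour in each setup (derived, not quoted in the post).
baseline_rate = BASELINE_COST_USD / BASELINE_HOURS
dist_rate = DIST_COST_USD / (DIST_GPUS * DIST_MINUTES / 60.0)

print(f"speedup:            {s:.1f}x")
print(f"cost reduction:     {saving:.0%}")
print(f"scaling efficiency: {efficiency:.0%}")
print(f"$/GPU-hour:         {baseline_rate:.2f} -> {dist_rate:.2f}")
```

The cost drops even though the 64-GPU run consumes more total GPU-hours (about 51 versus 36), because the effective price per GPU-hour on spot instances is far lower than the on-demand price.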

Best of all, realizing these performance gains and cost reductions required nothing more than changing a few configuration settings. As we detail in the blog post, switching to distributed training and leveraging spot instances in Determined can be done without changing your model code, without needing to understand the details of using spot instances, and with no manual server wrangling required.
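As a rough illustration of what "changing a few configuration settings" can look like, the sketch below models an experiment configuration as a Python dict and shows that moving from one GPU to 64 touches a single field. This assumes Determined's `resources.slots_per_trial` experiment setting; the experiment name and the `changed_keys` helper are hypothetical, real Determined configs are YAML files, and the configuration used in the full article may change other settings as well. Spot instances are enabled at the cluster level and are not shown here.

```python
import copy


def changed_keys(before: dict, after: dict, prefix: str = "") -> list:
    """Return dotted paths of keys whose values differ between two nested dicts."""
    diffs = []
    for key in sorted(set(before) | set(after)):
        path = f"{prefix}{key}"
        old, new = before.get(key), after.get(key)
        if isinstance(old, dict) and isinstance(new, dict):
            diffs.extend(changed_keys(old, new, prefix=path + "."))
        elif old != new:
            diffs.append(path)
    return diffs


# Hypothetical single-GPU experiment config (illustrative subset of fields).
single_gpu_config = {
    "name": "albert_squad",                # hypothetical experiment name
    "resources": {"slots_per_trial": 1},   # number of GPUs used by the trial
}

# Distributed version: same config, only the GPU count changes.
distributed_config = copy.deepcopy(single_gpu_config)
distributed_config["resources"]["slots_per_trial"] = 64

print(changed_keys(single_gpu_config, distributed_config))
```

The point of the diff helper is only to show that the model code is not part of the change: the difference between the two runs lives entirely in configuration.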

In the full article, we show you how we fine-tuned ALBERT on the SQuAD 2.0 dataset (using the Hugging Face implementation), and how to save money by training with Determined on spot instances. You can read the full article, “ALBERT on Determined: Distributed Training with Spot Instances”, on our blog, and see the experiment in the Determined repository.

To learn more about Determined and how it can help make your training easier, faster and cheaper, check out our GitHub repo or hop on our community Slack.

Source: https://www.kdnuggets.com/2021/06/determined-ai-speed-up-deep-learning-language-model.html
