XGBoost is a popular and efficient machine learning (ML) algorithm for regression and classification tasks on tabular datasets. It implements a technique known as gradient boosting on trees and performs remarkably well in ML competitions.
Since its launch, Amazon SageMaker has supported XGBoost as a built-in managed algorithm. For more information, see Simplify machine learning with XGBoost and Amazon SageMaker. As of this writing, you can take advantage of the open-source Amazon SageMaker XGBoost container, which has improved flexibility, scalability, extensibility, and Managed Spot Training. For more information, see the Amazon SageMaker sample notebooks and sagemaker-xgboost-container on GitHub, or see XGBoost Algorithm.
This post introduces the benefits of the open-source XGBoost algorithm container and presents three use cases.
Benefits of the open-source SageMaker XGBoost container
The new XGBoost container has the following benefits:
The open-source XGBoost container supports the latest XGBoost 1.0 release and all improvements, including better performance scaling on multi-core instances and improved stability for distributed training.
With the new script mode, you can now customize or use your own training script. This functionality, which is also available for TensorFlow, MXNet, PyTorch, and Chainer users, allows you to add in custom pre- or post-processing logic, run additional steps after the training process, or take advantage of the full range of XGBoost functions (such as cross-validation support). You can still use the no-script algorithm mode (like other Amazon SageMaker built-in algorithms), which only requires you to specify a data location and hyperparameters.
The open-source container has a more efficient implementation of distributed training, which allows it to scale out to more instances and reduces out-of-memory errors.
Because the container is open source, you can extend, fork, or modify the algorithm to suit your needs, beyond using the script mode. This includes installing additional libraries and changing the underlying version of XGBoost.
Managed Spot Training
You can save up to 90% on your Amazon SageMaker XGBoost training jobs with Managed Spot Training support. This fully managed option lets you take advantage of unused compute capacity in the AWS Cloud. Amazon SageMaker manages the Spot Instances on your behalf so you don’t have to worry about polling for capacity. The new version of XGBoost automatically manages checkpoints for you to make sure your job finishes reliably. For more information, see Managed Spot Training in Amazon SageMaker and Use Checkpoints in Amazon SageMaker.
Additional input formats
XGBoost now includes support for Parquet and Recordio-protobuf input formats. Parquet is a standardized, open-source, self-describing columnar storage format for use in data analysis systems. Recordio-protobuf is a common binary data format used across Amazon SageMaker for various algorithms, which XGBoost now supports for training and inference. For more information, see Common Data Formats for Training. Additionally, this container supports Pipe mode training for these data formats. For more information, see Using Pipe input mode for Amazon SageMaker algorithms.
Using the latest XGBoost container as a built-in algorithm
As an existing Amazon SageMaker XGBoost user, you can take advantage of the new features and improved performance by specifying the version when you create your training jobs. For more information about getting started with XGBoost or using the latest version, see the GitHub repo.
You can upgrade to the new container by specifying the framework version (1.0-1). This version specifies the upstream XGBoost framework version (1.0) and an additional Amazon SageMaker version (1). If you have an existing XGBoost workflow based on the legacy 0.72 container, this is the only change necessary to get the same workflow working with the new container. The container also supports XGBoost 0.90; to use it, specify the version as 0.90-1. See the following code:
Using managed Spot Instances
You can also take advantage of managed Spot Instance support by enabling the train_use_spot_instances flag on your Estimator. For more information, see the GitHub repo.
When you are training with managed Spot Instances, the training job may be interrupted, which causes it to take longer to start or finish. If a training job is interrupted, you can use a checkpointed snapshot to resume from a previously saved point, which can save training time (and cost). You can also use checkpoint_s3_uri, which is where your training job stores snapshots, to seamlessly resume when a Spot Instance is interrupted. See the following code:
Towards the end of the job, you should see the following two lines of output:
- Training seconds: X – The actual compute time your training job consumed
- Billable seconds: Y – The time you are billed for after Spot discounting is applied
If you enabled train_use_spot_instances, you should see a notable difference between X and Y, which signifies the cost savings from using Managed Spot Training. This is reflected in the following output:
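With illustrative numbers, the end of the job log for a Spot-backed training job might look like the following (a 70% saving in this example):

```
Training seconds: 100
Billable seconds: 30
```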
Using script mode
Script mode is a new feature with the open-source Amazon SageMaker XGBoost container. You can use your own training or hosting script to fully customize the XGBoost training or inference workflow. The following code example is a walkthrough of using a customized training script in script mode. For more information, see the GitHub repo.
Preparing the entry-point script
A typical training script loads data from the input channels, configures training with hyperparameters, trains a model, and saves the model to model_dir so it can be hosted later. Hyperparameters are passed to your script as arguments and can be retrieved with an argparse.ArgumentParser instance.
Starting with the main guard, use a parser to read the hyperparameters passed to your Amazon SageMaker estimator when creating the training job. These hyperparameters are made available as arguments to your input script. You also parse several Amazon SageMaker-specific environment variables to get information about the training environment, such as the location of input data and where to save the model. See the following code:
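A sketch of that part of the entry-point script; the hyperparameter names are examples, and the SM_* environment variables and default paths follow the SageMaker training container conventions. The training step itself is left as a comment:

```python
import argparse
import os


def build_parser():
    parser = argparse.ArgumentParser()
    # Hyperparameters set on the estimator arrive as command-line arguments.
    parser.add_argument("--max_depth", type=int, default=5)
    parser.add_argument("--eta", type=float, default=0.2)
    parser.add_argument("--objective", type=str, default="reg:squarederror")
    parser.add_argument("--num_round", type=int, default=10)
    # SageMaker sets these environment variables inside the training container.
    parser.add_argument("--model_dir", type=str,
                        default=os.environ.get("SM_MODEL_DIR", "/opt/ml/model"))
    parser.add_argument("--train", type=str,
                        default=os.environ.get("SM_CHANNEL_TRAIN", "/opt/ml/input/data/train"))
    parser.add_argument("--validation", type=str,
                        default=os.environ.get("SM_CHANNEL_VALIDATION", "/opt/ml/input/data/validation"))
    return parser


if __name__ == "__main__":
    args, _ = build_parser().parse_known_args()
    # Load the training and validation data from args.train and args.validation,
    # call xgboost.train() with the parsed hyperparameters, and save the booster
    # to os.path.join(args.model_dir, "xgboost-model") so it can be hosted later.
```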
Inside the entry-point script, you can optionally customize the inference experience when you use Amazon SageMaker hosting or batch transform. You can customize the following:
- input_fn() – How the input is handled
- predict_fn() – How the XGBoost model is invoked
- output_fn() – How the response is returned
The defaults work for this use case, so you don’t need to define them.
Training with the Amazon SageMaker XGBoost estimator
After you prepare your training data and script, the XGBoost estimator class in the Amazon SageMaker Python SDK allows you to run that script as a training job on the Amazon SageMaker managed training infrastructure. You also pass the estimator your IAM role, the type of instance you want to use, and a dictionary of the hyperparameters that you want to pass to your script. See the following code:
Deploying the custom XGBoost model
After you train the model, you can use the estimator to create an Amazon SageMaker endpoint—a hosted and managed prediction service that you can use to perform inference. See the following code:
Training with Parquet input
You can now train the latest XGBoost algorithm with Parquet-formatted files or streams directly by using the Amazon SageMaker supported open-sourced ML-IO library. ML-IO is a high-performance data access library for ML frameworks with support for multiple data formats, and is installed by default on the latest XGBoost container. For more information about importing a Parquet file and training with it, see the GitHub repo.
The open-source XGBoost container for Amazon SageMaker provides a fully managed experience and additional benefits that save you money in training and allow for more flexibility.
About the Authors
Rahul Iyer is a Software Development Manager at AWS AI. He leads the Framework Algorithms team, building and optimizing machine learning frameworks like XGBoost and Scikit-learn. Outside work, he enjoys nature photography and cherishes time with his family.
Rocky Zhang is a Senior Product Manager at AWS SageMaker. He builds products that help customers solve real world business problems with Machine Learning. Outside of work he spends most of his time watching, playing, and coaching soccer.
Eric Kim is an engineer in the Algorithms & Platforms Group of Amazon AI. He helps support the AWS service SageMaker, and has experience in machine learning research, development, and application. Outside of work, he is a music lover and a fan of all dogs.
Laurence Rouesnel is a Senior Manager in Amazon AI. He leads teams of engineers and scientists working on deep learning and machine learning research and products, like SageMaker AutoPilot and Algorithms. In his spare time he’s an avid fan of traveling, table-top RPGs, and running.
Using embedded analytics in software applications can drive your business forward
Analytics in your tools can help users gain insights that can help move your clients and the organization to the next level.
More than two years ago, Edsby, which provides a learning management system for educational institutions, began embedding analytics into its software that enabled teachers and administrators to detect student learning trends, assess test scores across student populations, and more, all in the spirit of improving education results.
The Edsby example is not an isolated event. Increasingly, commercial and company in-house software developers are being asked to deliver more value with their applications. In other words, don’t just write applications that process transactions; tell us about the trends and insights transactions reveal by embedding analytics as part of the application.
“Software teams are responsible for building applications with embedded analytics that help their end users make better decisions,” said Steve Schneider, CEO of Logi Analytics, which provides embedded analytics tools for software developers. “This is the idea of providing high-level analytics in the context of an application that people use every day.”
Schneider said what users want is transactional apps with built-in analytics capabilities that can provide insights to a variety of users with different interests and skill sets. “These are highly sophisticated analytics that must be accessible right from the application,” he said.
With the help of pick-and-click tools, transaction application developers are spared the time of having to learn how to embed analytics from the ground up in their apps. Instead, they can choose to embed an analytics dashboard into their application, or they can quickly orchestrate an API call to another application without a need to custom develop all of the code.
“You can just click on the Embed command, and the tool will give you a JavaScript snippet,” Schneider said. “In some cases, you have to do a little configuration for security, but it makes it much easier to get analytics-enriched apps to your user market faster.”
Getting apps to market faster
Here’s how an embedded analytics tool can speed apps to market.
A marketing person is tasked with buying ads and organizing campaigns. He or she gathers information and feeds it to IT, which periodically issues reports that show the results of ad placements and campaigns.
Now with an application that contains embedded analytics, the marketing person can directly drill down into the reporting information embedded in the app without having to contact IT. This can be done through a self-service interface in real time.
“In one case, a manufacturer was trying to improve operational performance through the use of an application and set of stated metrics,” Schneider said. “Everyone had to log in to the application to record their metrics, but the overall goal of improving performance remained elusive. The manufacturer decided to augment the original application with an embedded analytics dashboard that displayed the key metrics and each team’s performance. This provided visibility to everyone. This quickly evolved into a friendly competition between different groups of employees to see who could achieve the best scores, and the overall corporate metrics performance improved.”
For most developers, embedding analytics in applications is still in early stages—but embedded analytics in apps is an area that is poised to expand, and that at some point will be able to incorporate both structured and unstructured data in in-app visualizations.
Best practices for embedded analytics
Companies and commercial enterprises interested in using embedded analytics in transactional applications should consider these two best practices:
1. Think about the users of your application and the problems that they’re trying to solve
This begins with asking users what information they need in order to be successful. “Application developers can also benefit if they think more like product managers,” Schneider said. In other words, what can I do with embedded analytics in my application to truly delight my customer—even if it is the user next door in accounting who I see every day?
2. Start simple
If you haven’t used embedded analytics in applications before, choose a relatively easy-to-achieve objective for your first app and work with a cooperative user. By building a series of successful and highly usable apps from the start, you instill confidence in this new style of application. At the same time, you can define and standardize your embedded app development methodology in IT.
China and AI: What the World Can Learn and What It Should Be Wary of
China announced in 2017 its ambition to become the world leader in artificial intelligence (AI) by 2030. While the US still leads in absolute terms, China appears to be making more rapid progress than either the US or the EU, and central and local government spending on AI in China is estimated to be in the tens of billions of dollars.
The move has led—at least in the West—to warnings of a global AI arms race and concerns about the growing reach of China’s authoritarian surveillance state. But treating China as a “villain” in this way is both overly simplistic and potentially costly. While there are undoubtedly aspects of the Chinese government’s approach to AI that are highly concerning and rightly should be condemned, it’s important that this does not cloud all analysis of China’s AI innovation.
The world needs to engage seriously with China’s AI development and take a closer look at what’s really going on. The story is complex and it’s important to highlight where China is making promising advances in useful AI applications and to challenge common misconceptions, as well as to caution against problematic uses.
China’s approach to AI development and implementation is fast-paced and pragmatic, oriented towards finding applications which can help solve real-world problems. Rapid progress is being made in the field of healthcare, for example, as China grapples with providing easy access to affordable and high-quality services for its aging population.
Applications include “AI doctor” chatbots, which help to connect communities in remote areas with experienced consultants via telemedicine; machine learning to speed up pharmaceutical research; and the use of deep learning for medical image processing, which can help with the early detection of cancer and other diseases.
Since the outbreak of Covid-19, medical AI applications have surged as Chinese researchers and tech companies have rushed to try and combat the virus by speeding up screening, diagnosis, and new drug development. AI tools used in Wuhan, China, to tackle Covid-19 by helping accelerate CT scan diagnosis are now being used in Italy and have been also offered to the NHS in the UK.
But there are also elements of China’s use of AI that are seriously concerning. Positive advances in practical AI applications that are benefiting citizens and society don’t detract from the fact that China’s authoritarian government is also using AI and citizens’ data in ways that violate privacy and civil liberties.
Most disturbingly, reports and leaked documents have revealed the government’s use of facial recognition technologies to enable the surveillance and detention of Muslim ethnic minorities in China’s Xinjiang province.
The emergence of opaque social governance systems that lack accountability mechanisms is also a cause for concern.
In Shanghai’s “smart court” system, for example, AI-generated assessments are used to help with sentencing decisions. But it is difficult for defendants to assess the tool’s potential biases, the quality of the data, and the soundness of the algorithm, making it hard for them to challenge the decisions made.
China’s experience reminds us of the need for transparency and accountability when it comes to AI in public services. Systems must be designed and implemented in ways that are inclusive and protect citizens’ digital rights.
But a closer look at the dynamics of China’s AI development reveals the importance of local government in implementing innovation policy. Municipal and provincial governments across China are establishing cross-sector partnerships with research institutions and tech companies to create local AI innovation ecosystems and drive rapid research and development.
Beyond the thriving major cities of Beijing, Shanghai, and Shenzhen, efforts to develop successful innovation hubs are also underway in other regions. A promising example is the city of Hangzhou, in Zhejiang Province, which has established an “AI Town,” clustering together the tech company Alibaba, Zhejiang University, and local businesses to work collaboratively on AI development. China’s local ecosystem approach could offer interesting insights to policymakers in the UK aiming to boost research and innovation outside the capital and tackle longstanding regional economic imbalances.
China’s accelerating AI innovation deserves the world’s full attention, but it is unhelpful to reduce all the many developments into a simplistic narrative about China as a threat or a villain. Observers outside China need to engage seriously with the debate and make more of an effort to understand—and learn from—the nuances of what’s really happening.
Building a Discord Bot for ChatOps , Pentesting or Server Automation (Part 5)
Coding and debugging with Visual Studio Code
Open Visual Studio Code and press Ctrl+Shift+P to open the command palette. Type “ssh” and select “Remote-SSH: Add New SSH Host…” to add our server. It will ask you for the IP address and the user of your DigitalOcean server.
The app will show a success message, allowing us to connect directly.
Once again, press Ctrl+Shift+P, enter “Remote-SSH: Connect to Host…”, and select the connection.
Now we will use the knowledge from the previous steps. Create the “.env” file with your secret constants, the “requirements.txt” file with the dependencies, and the “bot.py” file with your existing bot’s code.
To test it quickly, we need a “.env” file with the “DISCORD_TOKEN” constant:
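For example (the value is a placeholder for your real bot token):

```
DISCORD_TOKEN=your-bot-token-goes-here
```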
A “requirements.txt” file like this one:
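A minimal version, assuming the bot uses discord.py and python-dotenv as in the earlier parts of this series:

```
discord.py
python-dotenv
```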
And for the simplest bot code write this in the “bot.py” file
Go back to the terminal, or use the integrated terminal in Visual Studio Code, and install the requirements with the following command:
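Assuming pip for Python 3 is available on the server:

```
pip3 install -r requirements.txt
```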
To test the bot, run the following command:
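```
python3 bot.py
```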
You should see the “<your bot’s name and ID> is connected” message in the terminal, and in Discord the bot’s status should appear as online.
If you would like to debug in Visual Studio Code to fix bugs or to understand the logic, press the F5 key in the IDE and select “Python File”.
The IDE will enter debug mode, allowing you to set breakpoints in the code and inspect the contents of the variables.
We are all set for this step.
If you encounter typos or something no longer works, write me a comment and I will keep this guide updated. Last updated: June 28, 2020.