

Introducing the open-source Amazon SageMaker XGBoost algorithm container




XGBoost is a popular and efficient machine learning (ML) algorithm for regression and classification tasks on tabular datasets. It implements a technique known as gradient boosting on trees and performs remarkably well in ML competitions.

Since its launch, Amazon SageMaker has supported XGBoost as a built-in managed algorithm. For more information, see Simplify machine learning with XGBoost and Amazon SageMaker. As of this writing, you can take advantage of the open-source Amazon SageMaker XGBoost container, which has improved flexibility, scalability, extensibility, and Managed Spot Training. For more information, see the Amazon SageMaker sample notebooks and sagemaker-xgboost-container on GitHub, or see XGBoost Algorithm.

This post introduces the benefits of the open-source XGBoost algorithm container and presents three use cases.

Benefits of the open-source SageMaker XGBoost container

The new XGBoost container has the following benefits:

Latest version

The open-source XGBoost container supports the latest XGBoost 1.0 release and all improvements, including better performance scaling on multi-core instances and improved stability for distributed training.

Script mode

With the new script mode, you can now customize or use your own training script. This functionality, which is also available for TensorFlow, MXNet, PyTorch, and Chainer users, allows you to add in custom pre- or post-processing logic, run additional steps after the training process, or take advantage of the full range of XGBoost functions (such as cross-validation support). You can still use the no-script algorithm mode (like other Amazon SageMaker built-in algorithms), which only requires you to specify a data location and hyperparameters.

Distributed training

The open-source container has a more efficient implementation of distributed training, which allows it to scale out to more instances and reduces out-of-memory errors.

Open source

Because the container is open source, you can extend, fork, or modify the algorithm to suit your needs, beyond using the script mode. This includes installing additional libraries and changing the underlying version of XGBoost.

Managed Spot Training

You can save up to 90% on your Amazon SageMaker XGBoost training jobs with Managed Spot Training support. This fully managed option lets you take advantage of unused compute capacity in the AWS Cloud. Amazon SageMaker manages the Spot Instances on your behalf so you don’t have to worry about polling for capacity. The new version of XGBoost automatically manages checkpoints for you to make sure your job finishes reliably. For more information, see Managed Spot Training in Amazon SageMaker and Use Checkpoints in Amazon SageMaker.

Additional input formats

XGBoost now includes support for Parquet and Recordio-protobuf input formats. Parquet is a standardized, open-source, self-describing columnar storage format for use in data analysis systems. Recordio-protobuf is a common binary data format used across Amazon SageMaker for various algorithms, which XGBoost now supports for training and inference. For more information, see Common Data Formats for Training. Additionally, this container supports Pipe mode training for these data formats. For more information, see Using Pipe input mode for Amazon SageMaker algorithms.
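As a hedged sketch of how these formats are selected (the dictionary below collects the content-type identifiers the SageMaker documentation uses; the `s3_input` call is commented out because it needs an AWS session, and the bucket path is hypothetical):

```python
# Content types for the XGBoost container's supported input formats.
XGBOOST_CONTENT_TYPES = {
    "csv": "text/csv",
    "libsvm": "text/libsvm",
    "parquet": "application/x-parquet",
    "recordio-protobuf": "application/x-recordio-protobuf",
}

# With the v1 Python SDK, a Parquet training channel might then look like:
# from sagemaker.session import s3_input
# train_input = s3_input("s3://my-bucket/train/",
#                        content_type=XGBOOST_CONTENT_TYPES["parquet"])
print(XGBOOST_CONTENT_TYPES["parquet"])
```

The channel's `content_type` is what tells the container which parser to apply to the training data.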

Using the latest XGBoost container as a built-in algorithm

As an existing Amazon SageMaker XGBoost user, you can take advantage of the new features and improved performance by specifying the version when you create your training jobs. For more information about getting started with XGBoost or using the latest version, see the GitHub repo.

You can upgrade to the new container by specifying the framework version (1.0-1). This version specifies the upstream XGBoost framework version (1.0) and an additional Amazon SageMaker version (1). If you have an existing XGBoost workflow based on the legacy 0.72 container, this is the only change necessary to get the same workflow working with the new container. The container also supports XGBoost 0.90; to use it, specify the version as 0.90-1.

See the following code:

from sagemaker.amazon.amazon_estimator import get_image_uri

container = get_image_uri(region, 'xgboost', '1.0-1')
estimator = sagemaker.estimator.Estimator(container,
                                          role,
                                          hyperparameters=hyperparameters,
                                          train_instance_count=1,
                                          train_instance_type='ml.m5.2xlarge')

Using managed Spot Instances

You can also take advantage of managed Spot Instance support by enabling the train_use_spot_instances flag on your Estimator. For more information, see the GitHub repo.

When you are training with managed Spot Instances, the training job may be interrupted, which causes it to take longer to start or finish. If a training job is interrupted, you can use a checkpointed snapshot to resume from a previously saved point, which can save training time (and cost). You can also use the checkpoint_s3_uri, which is where your training job stores snapshots, to seamlessly resume when a Spot Instance is interrupted. See the following code:

estimator = sagemaker.estimator.Estimator(container,
                                          role,
                                          hyperparameters=hyperparameters,
                                          train_instance_count=1,
                                          train_instance_type='ml.m5.2xlarge',
                                          output_path=output_path,
                                          sagemaker_session=sagemaker.Session(),
                                          train_use_spot_instances=train_use_spot_instances,
                                          train_max_run=train_max_run,
                                          train_max_wait=train_max_wait,
                                          checkpoint_s3_uri=checkpoint_s3_uri)

estimator.fit({'train': train_input})

Towards the end of the job, you should see the following two lines of output:

  • Training seconds: X – The actual compute time your training job consumed
  • Billable seconds: Y – The time you are billed for after Spot discounting is applied

If you enabled train_use_spot_instances, you should see a notable difference between X and Y, which signifies the cost savings from using Managed Spot Training. This is reflected in the following output:

Managed Spot Training savings: (1-Y/X)*100 %
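The savings formula above can be checked with a couple of lines of Python (the helper name is ours, not part of the SDK; the numbers are made up for illustration):

```python
# Managed Spot Training savings: (1 - Y/X) * 100 %, where X is the
# "Training seconds" value and Y is the "Billable seconds" value.
def spot_savings_percent(training_seconds, billable_seconds):
    return (1 - billable_seconds / training_seconds) * 100

# e.g. 1000 compute seconds billed as 250 seconds -> 75.0% saved
print(spot_savings_percent(1000, 250))  # -> 75.0
```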

Using script mode

Script mode is a new feature with the open-source Amazon SageMaker XGBoost container. You can use your own training or hosting script to fully customize the XGBoost training or inference workflow. The following code example is a walkthrough of using a customized training script in script mode. For more information, see the GitHub repo.

Preparing the entry-point script

A typical training script loads data from the input channels, configures training with hyperparameters, trains a model, and saves a model to model_dir so it can be hosted later. Hyperparameters are passed to your script as arguments and can be retrieved with an argparse.ArgumentParser instance.

Starting with the main guard, use a parser to read the hyperparameters passed to your Amazon SageMaker estimator when creating the training job. These hyperparameters are made available as arguments to your input script. You also parse several Amazon SageMaker-specific environment variables to get information about the training environment, such as the location of input data and where to save the model. See the following code:

import argparse
import logging
import os
import pickle as pkl

import xgboost as xgb

if __name__ == '__main__':
    parser = argparse.ArgumentParser()

    # Hyperparameters are described here
    parser.add_argument('--num_round', type=int)
    parser.add_argument('--max_depth', type=int, default=5)
    parser.add_argument('--eta', type=float, default=0.2)
    parser.add_argument('--gamma', type=float, default=4)
    parser.add_argument('--min_child_weight', type=float, default=6)
    parser.add_argument('--subsample', type=float, default=0.8)
    parser.add_argument('--silent', type=int, default=0)
    parser.add_argument('--objective', type=str, default='reg:squarederror')

    # SageMaker-specific arguments. Defaults are set in the environment variables.
    parser.add_argument('--model_dir', type=str, default=os.environ['SM_MODEL_DIR'])
    parser.add_argument('--train', type=str, default=os.environ['SM_CHANNEL_TRAIN'])
    parser.add_argument('--validation', type=str, default=os.environ['SM_CHANNEL_VALIDATION'])

    args = parser.parse_args()

    train_hp = {
        'max_depth': args.max_depth,
        'eta': args.eta,
        'gamma': args.gamma,
        'min_child_weight': args.min_child_weight,
        'subsample': args.subsample,
        'silent': args.silent,
        'objective': args.objective
    }

    dtrain = xgb.DMatrix(args.train)
    dval = xgb.DMatrix(args.validation)
    watchlist = [(dtrain, 'train'), (dval, 'validation')] if dval is not None else [(dtrain, 'train')]

    callbacks = []
    # add_checkpointing is a helper defined elsewhere in the full sample script
    prev_checkpoint, n_iterations_prev_run = add_checkpointing(callbacks)

    # If a checkpoint is found, reduce num_boost_round by the number of
    # iterations already run
    bst = xgb.train(
        params=train_hp,
        dtrain=dtrain,
        evals=watchlist,
        num_boost_round=(args.num_round - n_iterations_prev_run),
        xgb_model=prev_checkpoint,
        callbacks=callbacks
    )

    model_location = args.model_dir + '/xgboost-model'
    pkl.dump(bst, open(model_location, 'wb'))
    logging.info("Stored trained model at {}".format(model_location))

Inside the entry-point script, you can optionally customize the inference experience when you use Amazon SageMaker hosting or batch transform. You can customize the following:

  • input_fn() – How the input is handled
  • predict_fn() – How the XGBoost model is invoked
  • output_fn() – How the response is returned

The defaults work for this use case, so you don’t need to define them.
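If you did want to override them, a minimal sketch of the three hooks might look like the following. A stub stands in for the trained XGBoost booster so the flow can be followed end to end; in a real entry-point script you would unpickle the trained model and call its predict method instead, and the parsing logic here is illustrative rather than the container's actual defaults:

```python
import csv
import io
import json

class StubBooster:
    """Hypothetical stand-in for a trained xgboost.Booster."""
    def predict(self, rows):
        # Pretend the model scores each row by summing its features.
        return [sum(row) for row in rows]

def input_fn(request_body, content_type):
    # How the input is handled: parse CSV text into rows of floats.
    if content_type == "text/csv":
        reader = csv.reader(io.StringIO(request_body))
        return [[float(value) for value in row] for row in reader]
    raise ValueError("Unsupported content type: " + content_type)

def predict_fn(data, model):
    # How the XGBoost model is invoked.
    return model.predict(data)

def output_fn(predictions, accept):
    # How the response is returned: serialized as JSON here.
    return json.dumps(list(predictions))

rows = input_fn("1.0,2.0\n3.0,4.0", "text/csv")
print(output_fn(predict_fn(rows, StubBooster()), "application/json"))
```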

Training with the Amazon SageMaker XGBoost estimator

After you prepare your training data and script, the XGBoost estimator class in the Amazon SageMaker Python SDK allows you to run that script as a training job on the Amazon SageMaker managed training infrastructure. You also pass the estimator your IAM role, the type of instance you want to use, and a dictionary of the hyperparameters that you want to pass to your script. See the following code:

from sagemaker.session import s3_input
from sagemaker.xgboost.estimator import XGBoost

xgb_script_mode_estimator = XGBoost(
    entry_point="",  # path to your training script
    hyperparameters=hyperparameters,
    image_name=container,
    role=role,
    train_instance_count=1,
    train_instance_type="ml.m5.2xlarge",
    framework_version="1.0-1",
    output_path="s3://{}/{}/{}/output".format(bucket, prefix, "xgboost-script-mode"),
    train_use_spot_instances=train_use_spot_instances,
    train_max_run=train_max_run,
    train_max_wait=train_max_wait,
    checkpoint_s3_uri=checkpoint_s3_uri
)

xgb_script_mode_estimator.fit({"train": train_input})

Deploying the custom XGBoost model

After you train the model, you can use the estimator to create an Amazon SageMaker endpoint—a hosted and managed prediction service that you can use to perform inference. See the following code:

predictor = xgb_script_mode_estimator.deploy(initial_instance_count=1, instance_type="ml.m5.xlarge")
test_data = xgboost.DMatrix('/path/to/data')

Training with Parquet input

You can now train the latest XGBoost algorithm with Parquet-formatted files or streams directly by using the open-source ML-IO library, which Amazon SageMaker supports. ML-IO is a high-performance data access library for ML frameworks with support for multiple data formats, and it is installed by default on the latest XGBoost container. For more information about importing a Parquet file and training with it, see the GitHub repo.

Conclusion

The open-source XGBoost container for Amazon SageMaker provides a fully managed experience and additional benefits that save you money in training and allow for more flexibility.

About the Authors

Rahul Iyer is a Software Development Manager at AWS AI. He leads the Framework Algorithms team, building and optimizing machine learning frameworks like XGBoost and Scikit-learn. Outside work, he enjoys nature photography and cherishes time with his family.

Rocky Zhang is a Senior Product Manager at AWS SageMaker. He builds products that help customers solve real world business problems with Machine Learning. Outside of work he spends most of his time watching, playing, and coaching soccer.

Eric Kim is an engineer in the Algorithms & Platforms Group of Amazon AI. He helps support the AWS service SageMaker, and has experience in machine learning research, development, and application. Outside of work, he is a music lover and a fan of all dogs.

Laurence Rouesnel is a Senior Manager in Amazon AI. He leads teams of engineers and scientists working on deep learning and machine learning research and products, like SageMaker AutoPilot and Algorithms. In his spare time he’s an avid fan of traveling, table-top RPGs, and running.



Using embedded analytics in software applications can drive your business forward




Analytics in your tools can help users gain insights that can help move your clients and the organization to the next level.


More than two years ago, Edsby, which provides a learning management system for educational institutions, began embedding analytics into its software that enabled teachers and administrators to detect student learning trends, assess test scores across student populations, and more, all in the spirit of improving education results. 

The Edsby example is not an isolated event. Increasingly, commercial and company in-house software developers are being asked to deliver more value with their applications. In other words, don’t just write applications that process transactions; tell us about the trends and insights transactions reveal by embedding analytics as part of the application.

“Software teams are responsible for building applications with embedded analytics that help their end users make better decisions,” said Steve Schneider, CEO of Logi Analytics, which provides embedded analytics tools for software developers. “This is the idea of providing high-level analytics in the context of an application that people use every day.”

SEE: Microservices: A cheat sheet (free PDF) (TechRepublic)

Schneider said what users want is transactional apps with built-in analytics capabilities that can provide insights to a variety of users with different interests and skill sets. “These are highly sophisticated analytics that must be accessible right from the application,” he said. 

With the help of pick-and-click tools, transaction application developers are spared the time of having to learn how to embed analytics from the ground up in their apps. Instead, they can choose to embed an analytics dashboard into their application, or they can quickly orchestrate an API call to another application without needing to custom-develop all of the code.

“You can just click on the Embed command, and the tool will give you a JavaScript snippet,” Schneider said. “In some cases, you have to do a little configuration for security, but it makes it much easier to get analytics-enriched apps to your user market faster.”

Getting apps to market faster

Here’s how an embedded analytics tool can speed apps to market.

A marketing person is tasked with buying ads and organizing campaigns. He or she gathers information and feeds it to IT, which periodically issues reports that show the results of ad placements and campaigns.

SEE: How to overcome business continuity challenges (free PDF) (TechRepublic)

Now with an application that contains embedded analytics, the marketing person can directly drill down into the reporting information embedded in the app without having to contact IT. This can be done through a self-service interface in real time.

“In one case, a manufacturer was trying to improve operational performance through the use of an application and set of stated metrics,” Schneider said. “Everyone had to log in to the application to record their metrics, but the overall goal of improving performance remained elusive. The manufacturer decided to augment the original application with an embedded analytics dashboard that displayed the key metrics and each team’s performance. This provided visibility to everyone. This quickly evolved into a friendly competition between different groups of employees to see who could achieve the best scores, and the overall corporate metrics performance improved.” 

For most developers, embedding analytics in applications is still in its early stages—but embedded analytics in apps is an area that is poised to expand, and that at some point will be able to incorporate both structured and unstructured data in in-app visualizations.

Best practices for embedded analytics

Companies and commercial enterprises interested in using embedded analytics in transactional applications should consider these two best practices:

  1. Think about the users of your application and the problems that they’re trying to solve

This begins with asking users what information they need in order to be successful. “Application developers can also benefit if they think more like product managers,” Schneider said. In other words, what can I do with embedded analytics in my application to truly delight my customer—even if it is the user next door in accounting who I see every day?

  2. Start simple

If you haven’t used embedded analytics in applications before, choose a relatively easy-to-achieve objective for your first app and work with a cooperative user. By building a series of successful and highly usable apps from the start, you instill confidence in this new style of application. At the same time, you can define and standardize your embedded app development methodology in IT.



China and AI: What the World Can Learn and What It Should Be Wary of




China announced in 2017 its ambition to become the world leader in artificial intelligence (AI) by 2030. While the US still leads in absolute terms, China appears to be making more rapid progress than either the US or the EU, and central and local government spending on AI in China is estimated to be in the tens of billions of dollars.

The move has led, at least in the West, to warnings of a global AI arms race and concerns about the growing reach of China’s authoritarian surveillance state. But treating China as a “villain” in this way is both overly simplistic and potentially costly. While there are undoubtedly aspects of the Chinese government’s approach to AI that are highly concerning and rightly should be condemned, it’s important that this does not cloud all analysis of China’s AI innovation.

The world needs to engage seriously with China’s AI development and take a closer look at what’s really going on. The story is complex and it’s important to highlight where China is making promising advances in useful AI applications and to challenge common misconceptions, as well as to caution against problematic uses.

Nesta has explored the broad spectrum of AI activity in China: the good, the bad, and the unexpected.

The Good

China’s approach to AI development and implementation is fast-paced and pragmatic, oriented towards finding applications which can help solve real-world problems. Rapid progress is being made in the field of healthcare, for example, as China grapples with providing easy access to affordable and high-quality services for its aging population.

Applications include “AI doctor” chatbots, which help to connect communities in remote areas with experienced consultants via telemedicine; machine learning to speed up pharmaceutical research; and the use of deep learning for medical image processing, which can help with the early detection of cancer and other diseases.

Since the outbreak of Covid-19, medical AI applications have surged as Chinese researchers and tech companies have rushed to try to combat the virus by speeding up screening, diagnosis, and new drug development. AI tools used in Wuhan, China, to tackle Covid-19 by helping accelerate CT scan diagnosis are now being used in Italy and have also been offered to the NHS in the UK.

The Bad

But there are also elements of China’s use of AI that are seriously concerning. Positive advances in practical AI applications that are benefiting citizens and society don’t detract from the fact that China’s authoritarian government is also using AI and citizens’ data in ways that violate privacy and civil liberties.

Most disturbingly, reports and leaked documents have revealed the government’s use of facial recognition technologies to enable the surveillance and detention of Muslim ethnic minorities in China’s Xinjiang province.

The emergence of opaque social governance systems that lack accountability mechanisms is also a cause for concern.

In Shanghai’s “smart court” system, for example, AI-generated assessments are used to help with sentencing decisions. But it is difficult for defendants to assess the tool’s potential biases, the quality of the data, and the soundness of the algorithm, making it hard for them to challenge the decisions made.

China’s experience reminds us of the need for transparency and accountability when it comes to AI in public services. Systems must be designed and implemented in ways that are inclusive and protect citizens’ digital rights.

The Unexpected

Commentators have often interpreted the State Council’s 2017 Artificial Intelligence Development Plan as an indication that China’s AI mobilization is a top-down, centrally planned strategy.

But a closer look at the dynamics of China’s AI development reveals the importance of local government in implementing innovation policy. Municipal and provincial governments across China are establishing cross-sector partnerships with research institutions and tech companies to create local AI innovation ecosystems and drive rapid research and development.

Beyond the thriving major cities of Beijing, Shanghai, and Shenzhen, efforts to develop successful innovation hubs are also underway in other regions. A promising example is the city of Hangzhou, in Zhejiang Province, which has established an “AI Town,” clustering together the tech company Alibaba, Zhejiang University, and local businesses to work collaboratively on AI development. China’s local ecosystem approach could offer interesting insights to policymakers in the UK aiming to boost research and innovation outside the capital and tackle longstanding regional economic imbalances.

China’s accelerating AI innovation deserves the world’s full attention, but it is unhelpful to reduce all the many developments into a simplistic narrative about China as a threat or a villain. Observers outside China need to engage seriously with the debate and make more of an effort to understand, and learn from, the nuances of what’s really happening.

This article is republished from The Conversation under a Creative Commons license. Read the original article.





Building a Discord Bot for ChatOps, Pentesting, or Server Automation (Part 5)




Coding and debugging with Visual Studio Code

Open Visual Studio Code and press CTRL+Shift+P to enter the input window. Type “ssh” and select “Remote-SSH: Add New SSH Host…” to add our server. It will ask you for the IP address and the user of our DigitalOcean server.

The app will show us a success message, allowing us to connect directly.

Once again press CTRL+Shift+P and enter “Remote-SSH: Connect to Host…” and select the connection

Now we will use the knowledge from the previous steps. Create the “.env” file with your secret constants, the “requirements.txt” file with the dependencies, and the “” file with your existing bot’s code.

To test it quickly we need a “.env” file with the “DISCORD_TOKEN” constant
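For example, a minimal “.env” file might look like the following (the token value is a placeholder for your own bot token):

```
DISCORD_TOKEN=your-bot-token-here
```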

A “requirements.txt” file like this one
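A plausible “requirements.txt” for this setup, assuming the bot is built on discord.py and loads its secrets with python-dotenv as in the earlier parts of this series:

```
discord.py
python-dotenv
```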

And for the simplest bot code write this in the “” file
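A minimal bot along these lines might look like the sketch below. This assumes discord.py 1.x and python-dotenv (as suggested by the rest of the series), and it cannot run without a real DISCORD_TOKEN, so treat it as illustrative only:

```python
# Minimal discord.py bot sketch (assumes discord.py 1.x and python-dotenv).
import os

import discord
from dotenv import load_dotenv

load_dotenv()  # reads DISCORD_TOKEN from the .env file
TOKEN = os.getenv("DISCORD_TOKEN")

client = discord.Client()

@client.event
async def on_ready():
    # Matches the "<...> is connected" message described below.
    print(f"{client.user} is connected")

if __name__ == "__main__":
    client.run(TOKEN)
```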

In summary

Go back to the terminal, or use the integrated terminal in Visual Studio Code, and install the requirements with the command `pip install -r requirements.txt`.

To test the bot, run it with Python (`python` followed by the name of your bot file).

You should see the “<Your bot’s name and id> is connected” message in the terminal, and in Discord you should see the bot’s status as online.

If you would like to debug in Visual Studio Code to fix some bugs or to understand the logic, press the F5 key in the IDE and select “Python File”.

The IDE will enter debug mode, allowing you to set breakpoints in the code and inspect the contents of variables.

We are all set for this step.

If you encounter typos or something no longer works, write me a comment and I will keep this guide updated. Last updated June 28, 2020.

