

China Wants to Be the World’s AI Superpower. Does It Have What It Takes?




China’s star has been steadily rising for decades. Besides slashing extreme poverty rates from 88 percent to under 2 percent in just 30 years, the country has become a global powerhouse in manufacturing and technology. Its pace of growth may slow due to an aging population, but China is nonetheless one of the world’s biggest players in multiple cutting-edge tech fields.

One of these fields, and perhaps the most significant, is artificial intelligence. The Chinese government announced a plan in 2017 to become the world leader in AI by 2030, and has since poured billions of dollars into AI projects and research across academia, government, and private industry. The government’s venture capital fund is investing over $30 billion in AI; the northeastern city of Tianjin budgeted $16 billion for advancing AI; and a $2 billion AI research park is being built in Beijing.

On top of these huge investments, the government and private companies in China have access to an unprecedented quantity of data, on everything from citizens’ health to their smartphone use. WeChat, a multi-functional app where people can chat, date, send payments, hail rides, read news, and more, gives the CCP full access to user data upon request; as one BBC journalist put it, WeChat “was ahead of the game on the global stage and it has found its way into all corners of people’s existence. It could deliver to the Communist Party a life map of pretty much everybody in this country, citizens and foreigners alike.” And that’s just one (albeit big) source of data.

Many believe these factors are giving China a serious leg up in AI development, even providing enough of a boost that its progress will surpass that of the US.

But there’s more to AI than data, and there’s more to progress than investing billions of dollars. Analyzing China’s potential to become a world leader in AI—or in any technology that requires consistent innovation—from multiple angles provides a more nuanced picture of its strengths and limitations. In a June 2020 article in Foreign Affairs, Oxford fellows Carl Benedikt Frey and Michael Osborne argued that China’s big advantages may not actually be that advantageous in the long run—and its limitations may be very limiting.

Moving the AI Needle

To get an idea of who’s likely to take the lead in AI, it could help to first consider how the technology will advance beyond its current state.

To put it plainly, AI is somewhat stuck at the moment. Algorithms and neural networks continue to achieve new and impressive feats—like DeepMind’s AlphaFold accurately predicting protein structures or OpenAI’s GPT-3 writing convincing articles based on short prompts—but for the most part these systems’ capabilities are still defined as narrow intelligence: completing a specific task for which the system was painstakingly trained on loads of data.

(It’s worth noting here that some have speculated OpenAI’s GPT-3 may be an exception, the first example of machine intelligence that, while not “general,” has surpassed the definition of “narrow”; the algorithm was trained to write text, but ended up being able to translate between languages, write code, autocomplete images, do math, and perform other language-related tasks it wasn’t specifically trained for. However, all of GPT-3’s capabilities are limited to skills it learned in the language domain, whether spoken, written, or programming language.)

Both AlphaFold’s and GPT-3’s success was due largely to the massive datasets they were trained on; no revolutionary new training methods or architectures were involved. If all it was going to take to advance AI was a continuation or scaling-up of this paradigm—more input data yields increased capability—China could well have an advantage.

But one of the biggest hurdles AI needs to clear to advance in leaps and bounds rather than baby steps is precisely this reliance on extensive, task-specific data. Other significant challenges include the technology’s fast approach to the limits of current computing power and its immense energy consumption.

Thus, while China’s trove of data may give it an advantage now, it may not be much of a long-term foothold on the climb to AI dominance. It’s useful for building products that incorporate or rely on today’s AI, but not for pushing the needle on how artificially intelligent systems learn. WeChat data on users’ spending habits, for example, would be valuable in building an AI that helps people save money or suggests items they might want to purchase. It will enable (and already has enabled) highly tailored products that will earn their creators and the companies that use them a lot of money.

But data quantity isn’t what’s going to advance AI. As Frey and Osborne put it, “Data efficiency is the holy grail of further progress in artificial intelligence.”

To that end, research teams in academia and private industry are working on ways to make AI less data-hungry. New training methods like one-shot learning and less-than-one-shot learning have begun to emerge, along with myriad efforts to make AI that learns more like the human brain.

While not insignificant, these advancements still fall into the “baby steps” category. No one knows how AI is going to progress beyond these small steps—and that uncertainty, in Frey and Osborne’s opinion, is a major speed bump on China’s fast-track to AI dominance.

How Innovation Happens

A lot of great inventions have happened by accident, and some of the world’s most successful companies started in garages, dorm rooms, or similarly low-budget, nondescript circumstances (including Google, Facebook, Amazon, and Apple, to name a few). Innovation, the authors point out, often happens “through serendipity and recombination, as inventors and entrepreneurs interact and exchange ideas.”

Frey and Osborne argue that although China has great reserves of talent and a history of building on technologies conceived elsewhere, it doesn’t yet have a glowing track record in terms of innovation. They note that of the 100 most-cited patents from 2003 to present, none came from China. Giants Tencent, Alibaba, and Baidu are all wildly successful in the Chinese market, but they’re rooted in technologies or business models that came out of the US and were tweaked for the Chinese population.

“The most innovative societies have always been those that allowed people to pursue controversial ideas,” Frey and Osborne write. China’s heavy censorship of the internet and surveillance of citizens don’t quite encourage the pursuit of controversial ideas. The country’s social credit system rewards people who follow the rules and punishes those who step out of line. Frey adds that top-down execution of problem-solving is effective when the problem at hand is clearly defined—and the next big leaps in AI are not.

It’s debatable how strongly a culture of social conformism can impact technological innovation, and of course there can be exceptions. But a relevant historical example is the Soviet Union, which, despite heavy investment in science and technology that briefly rivaled the US in fields like nuclear energy and space exploration, ended up lagging far behind primarily due to political and cultural factors.

Similarly, China’s focus on computer science in its education system could give it an edge—but, as Frey told me in an email, “The best students are not necessarily the best researchers. Being a good researcher also requires coming up with new ideas.”

Winner Take All?

Beyond the question of whether China will achieve AI dominance is the issue of how it will use the powerful technology. Several of the ways China has already implemented AI could be considered morally questionable, from facial recognition systems used aggressively against ethnic minorities to smart glasses for policemen that can pull up information about whoever the wearer looks at.

This isn’t to say the US would use AI for purely ethical purposes. The military’s Project Maven, for example, used artificially intelligent algorithms to identify insurgent targets in Iraq and Syria, and American law enforcement agencies are also using (mostly unregulated) facial recognition systems.

It’s conceivable that “dominance” in AI won’t go to one country; each nation could meet milestones in different ways, or meet different milestones. Researchers from both countries, at least in the academic sphere, could (and likely will) continue to collaborate and share their work, as they’ve done on many projects to date.

If one country does take the lead, it will certainly see some major advantages as a result. Brookings Institution fellow Indermit Gill goes so far as to say that whoever leads in AI in 2030 will “rule the world” until 2100. But Gill points out that in addition to considering each country’s strengths, we should consider how willing they are to improve upon their weaknesses.

While China leads in investment and the US in innovation, both nations are grappling with huge economic inequalities that could negatively impact technological uptake. “Attitudes toward the social change that accompanies new technologies matter as much as the technologies, pointing to the need for complementary policies that shape the economy and society,” Gill writes.

Will China’s leadership be willing to relax its grip to foster innovation? Will the US business environment be enough to compete with China’s data, investment, and education advantages? And can both countries find a way to distribute technology’s economic benefits more equitably?

Time will tell, but it seems we’ve got our work cut out for us—and China does too.

Image Credit: Adam Birkett on Unsplash



Why machine learning strategies fail





Most companies are still trying to figure out how to make AI work. A recent survey looks at some of the barriers to machine learning.



Oraichain Review: The AI Powered Oracle System




Blockchain technology and artificial intelligence are now being integrated into every industry and nearly every aspect of our economy. Both of these technologies are concerned with the usage and storage of data, which has become critical as data in the modern world is immense and growing.

One thing that hasn’t been accomplished though is the combination of blockchain and artificial intelligence. Combining the two can provide even greater value as adding artificial intelligence to blockchain will allow for analyzing data, which will generate more insight about that data.

Oraichain Overview

Overview of Oraichain’s AI Powered Oracles

One area where this could be particularly useful is in smart contracts. These are protocols or programs created to execute automatically, either documenting or controlling relevant actions and events. They do this based on their programming and the terms of the contract that has been specified.

Smart contracts are increasingly used on blockchains because they offer a number of benefits, particularly in the increasingly popular decentralized finance space. However, they remain under one limiting constraint: they must follow strict rules, which prevents the use of an artificial intelligence model in any smart contract.

The solution to this problem is being developed by Oraichain. This is a data oracle platform and it is designed to connect artificial intelligence APIs with smart contracts or other applications.

With Oraichain a smart contract can be enhanced to securely access external artificial intelligence APIs. The focus of blockchains currently is the use of price oracles, but with Oraichain smart contracts will have access to reliable AI data, providing new and useful functionality to blockchains.

What is Oraichain?

As a data oracle platform Oraichain is concerned with the aggregation and connection of smart contracts and AI APIs. It is the very first AI-powered data oracle in the world. Currently there are six major areas or features that Oraichain is bringing to the table.

Oraichain Mainnet

Oraichain Announcing its Recent Mainnet. Via Blog

AI Oracle

As we’ve already mentioned, Oraichain is designed to enhance the utility of smart contracts by allowing them access to external APIs which are AI driven. Current oracle blockchains are primarily focused on price oracles, but Oraichain plans on changing all that.

With Oraichain dApps gain new and useful functionality by being able to use reliable external AI data. This is accomplished by sending requests to validators who acquire and test data from various external AI APIs. Once confirmed the data is stored on-chain, ensuring its reliability and allowing it to be used as proof in the future.

AI Marketplace

The AI Marketplace on Oraichain is where AI providers are able to sell their AI services. This brings AI to Oraichain and rewards the providers with ORAI tokens. There are a number of services that are provided, including price prediction, face authentication, yield farming, and much more.

The AI providers benefit from hosting their models directly on Oraichain without the need for third-parties. Using this mechanism allows small companies or even individuals to compete with larger entities in having their work featured in the AI Marketplace. Developers and users get to choose the AI services they require and pay for them with ORAI tokens.

AI Ecosystem

The AI Marketplace is not the only piece of the AI ecosystem of Oraichain. There is additional AI infrastructure to support the AI model developers. The ecosystem includes a fully developed and functional web GUI to assist in publishing AI services more rapidly and with less trouble.

Yield Farming

Yield farming is just one potential use case for Oraichain. Image via Oraichain Docs.

The ecosystem also allows AI providers to follow the flow of any requests for their services from start to finish. This is included as a means to increase the transparency of the system. With this level of transparency users can easily see which validators are best at execution, and if there are any malicious providers.

Staking & Earning

Validators stake their tokens and receive rewards for securing the network. Other users are also able to delegate their tokens to existing validators and share in those rewards proportionally. It’s important that delegators do understand that this is not passive income.

Delegators need to actively monitor the validators to ensure they continue to perform well within the ecosystem. If they are delegating to a malicious validator they risk having their delegated tokens slashed. So, delegators are equally responsible for ensuring the Oraichain ecosystem remains secure and of high quality.

Test Cases

Test cases are provided to Oraichain to verify the integrity and correctness of any AI services on the blockchain network. Third parties can become test case providers and then examine specific AI models to determine whether they are qualified to operate on Oraichain and charge fees. Users can provide expected outputs and see if the AI model results are similar. These test case providers encourage the AI providers to continue providing the best quality services.

Orai DAO

Governance on Oraichain is done by the community in a DAO model. Anyone owning ORAI tokens is able to participate in the governance of the network. They can also participate in the ongoing development and the future plans for the Oraichain ecosystem. While the project development team was responsible for creating the foundation for governance, it has now been automated and will forever remain in the hands of the community.

What Prevents Blockchain using AI Models?

Smart contracts in the way they are developed currently are unable to run AI models, and developers have found it to be nearly impossible to integrate an AI model into a smart contract. AI models are typically very complex constructions based on neural networks, SVM, clustering and other approaches. Smart contracts include three characteristics that prevent the inclusion of AI models:

Oraichain Oracle

Three things keep blockchains from using AI models, but Oraichain will fix that. Image via

Strictness: Smart contracts are developed in such a way that they must always follow the strict rules put in place for them. All input for the smart contract must be 100% accurate if an output is expected. However AI models don’t necessarily provide 100% accurate inputs. Oraichain will overcome some of the aspects of smart contract strictness, giving a better user experience and enhanced smart contract functionality.

Environment: Typically smart contracts are created using high-level programming languages, such as Solidity and Rust. This provides better security and syntax for the smart contracts. By contrast most AI models are written in Java or Python.

Data size: Due to the gas costs of running smart contracts on most networks they are usually created with very small storage allowances. Comparatively the AI models are quite large and use a lot of storage space.

Blockchain Based Oracle AI

Oraichain is being developed as a way to create smart contracts that are able to use AI models. On the surface the mechanism being used by Oraichain seems similar to those used by Chainlink or the Band Protocol, but Oraichain is more heavily focused on AI APIs and the quality of the AI models.

Each user request includes attached test cases, and in order to receive payment the provider’s API must pass a specified number of test cases. Validators manage the test case features and the quality of the AI models, which sets Oraichain apart from other solutions.

Oraichain System Overview

The Oraichain public blockchain allows for a number of user-generated data requests. In addition to users requesting data the blockchain also allows smart contracts to request data securely from artificial intelligence APIs that are external to the blockchain. The blockchain has been built using the Cosmos SDK and utilizes Tendermint’s Byzantine Fault Tolerance (BFT) as a consensus mechanism to ensure transaction confirmations are handled rapidly.

In terms of consensus mechanism the Oraichain protocol is similar to Delegated Proof-of-Stake (DPoS). The network is constructed of validators, each of which owns and stakes ORAI tokens, while other users who hold ORAI tokens are able to delegate them to the nominated validators. In this way both validators and delegators receive rewards proportional to their stake with each newly created block.
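The proportional split described above can be illustrated with a small sketch. The function name, the block reward figure, and the stake amounts are all hypothetical, and validator commission is ignored; the review does not describe the actual Oraichain reward logic in that much detail.

```python
def split_block_reward(block_reward, stakes):
    """Split a block reward among stakers in proportion to their stake.

    `stakes` maps a (hypothetical) account name to the number of ORAI
    tokens it has staked or delegated. Commission is not modeled here.
    """
    total = sum(stakes.values())
    if total <= 0:
        raise ValueError("total stake must be positive")
    return {
        account: block_reward * stake / total
        for account, stake in stakes.items()
    }


# Illustrative numbers only: one validator and two delegators.
example_stakes = {"validator": 600, "delegator_a": 300, "delegator_b": 100}
example_rewards = split_block_reward(10, example_stakes)
```

With these made-up numbers, the account holding 60% of the stake receives 60% of the block reward, and so on down the list.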

Validators have the task of collecting data from AI providers and validating the data before it is stored on the blockchain. In order to validate the AI API, each validator is required to do testing based on the test cases provided by users, test providers, or the smart contracts. Any time a user is unsure which test case might be good, they are able to request additional test cases from the test providers. Thus the validity of the AI APIs can always be verified.


A representation of the inner workings behind Oraichain. Image via Oraichain Docs

You can see above how the flow of requesting an AI API works in the Oraichain system. When performing a request the smart contracts or users are required to call an oracle script which is available from the AI Marketplace or from the Oraichain gateway. These oracle scripts include external AI data sources, provided by the AI providers, along with test cases and optional test sources. There is also a transaction fee required to complete each request.

Whenever a request is submitted a random validator is chosen to complete the request. This validator then retrieves the necessary data from one or more AI providers and executes test scenarios to verify the validity of the data. If the tests pass the data can be passed along, but if the tests fail the request is cancelled.

When a request is successful the results of the test are written to the Oraichain blockchain. This result can be fetched from smart contracts or regular applications and serves as the proof of execution. A successful request is also required to pay the necessary transaction fees, which are used to reward validators and delegators.

There is an overhead of reading results from Oraichain’s transactions, but it helps ensure that the AI API quality is good and there is no data tampering during the process of fetching data from AI providers.
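The request flow described above (pick a random validator, query the AI providers, run the test cases, then store or cancel) can be sketched in a few lines of Python. This is a toy simulation under stated assumptions, not Oraichain code: `run_request`, the `min_passed` threshold, and the stand-in provider functions are hypothetical, and exact equality is used where the real system may accept "similar" results.

```python
import random


def run_request(validators, providers, test_cases, min_passed, rng=random):
    """Simulate one oracle request.

    validators: list of validator names
    providers:  dict mapping provider name -> callable standing in for an AI API
    test_cases: list of (input, expected_output) pairs
    min_passed: number of test cases a provider must pass to be accepted
    """
    validator = rng.choice(validators)  # a random validator handles the request
    accepted = {}
    for name, api in providers.items():
        passed = sum(1 for given, expected in test_cases if api(given) == expected)
        if passed >= min_passed:
            accepted[name] = passed
    # Assumption: the request is cancelled when no provider passes the tests.
    status = "stored" if accepted else "cancelled"
    return {"validator": validator, "status": status, "providers": accepted}


# Hypothetical providers: one behaves correctly, one does not.
example_providers = {"good": lambda x: x * 2, "bad": lambda x: x + 1}
example_tests = [(1, 2), (2, 4), (3, 6)]
example_result = run_request(
    ["val_a", "val_b"], example_providers, example_tests,
    min_passed=2, rng=random.Random(0),
)
```

In this example only the `good` provider clears the threshold, so the request is marked as stored; if every provider failed, the sketch would return a cancelled request instead.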

If we compare this testing with Chainlink and Band Protocol, we see that API testing using test cases is unique to Oraichain. Because Oraichain is focused on AI APIs, it is crucial that testing is included to control the quality of the AI providers in the ecosystem. In addition, users and test providers can submit new and suitable test cases to properly verify any AI API. These test cases incentivize the AI providers to improve the quality and accuracy of their AI models.

Oraichain Validation

Validating test cases to complete a request. Image via Oraichain Docs

Another unique feature added to the Oraichain model is the ability of the community to rate the reputation of each validator in regard to the improvement of the quality of the AI APIs. In this way validators can be slashed if they are found to have low availability, slow response times, failure to validate AI providers, failure to perform test cases properly, or any other bad behavior.

One warning is that a large number of validators are needed to prevent the system from becoming centralized. A greater number of validators serves to increase the availability of the network, while also improving on scalability and successful request performance.

At the same time block reward and transaction fees need to be sufficient to incentivize a large number of validators to join and participate in the Oraichain ecosystem. Otherwise the network could become centralized, and will certainly slow to the point of being unusable.

The Oraichain Team

Oraichain recently made some changes to its leadership, moving the former CTO of Oraichain into the position of CEO for Oraichain Vietnam and welcoming Mr. Tu Pham as the CTO of Oraichain.

Oraichain Team

The impressive leadership team at Oraichain. Image via

Chung Dao continues as the CEO of Oraichain. As one of the co-founders of the project he has been instrumental in its growth since the very beginning. He is also the co-founder of Rikkeisoft and has achieved a PhD in Computer Science from The University of Tokyo.

The AI Lead and another co-founder of the project is Diep Nguyen, a lecturer at VNU in Hanoi and holder of a PhD from Keio University.

In addition, Oraichain’s total workforce has now been expanded to 25 people including the core team, AI and blockchain specialists, data scientists and developers.

Oraichain & Binance Chain Integration

Around the same time as making changes to the leadership at Oraichain, the team also completed an integration with Binance Chain. This integration creates a bridge from Ethereum to Binance Chain for the ERC-20 ORAI tokens. Oraichain has committed to providing the necessary liquidity for trading the BNB/ORAI pairing on PancakeSwap.

Oraichain Bridge

Swap easily from ERC-20 to BEP-20 tokens and vice versa. Image via Oraichain Blog.

Anyone who wishes to swap between the ERC-20 ORAI token and the BEP-20 ORAI token can do so at

Further information regarding the new BEP-20 token and instructions on swapping can be found here.

ORAI Token Economics

Anytime an AI request is sent to the Oraichain network there is an associated transaction cost that needs to be paid in ORAI tokens. In fact, the token serves as the transaction fee paid to request-executing validators, AI API providers, test case providers, and block-creating validators.

The transaction fee is not set, but varies based on the fee requirement of the validators who execute the requests, the AI API providers, and the test case providers. This means that anytime there is a request made the validators can choose whether or not to execute the request based on the transaction fee offered.

Once validators have decided whether or not to be included in the pool of willing participants the system randomly chooses one of the validators that have expressed a willingness to execute the request. The validator is also responsible for clarifying the fee paid to AI-API providers, test case providers, and block creating validators in the MsgResultReport.

It is possible for more than one validator to be included in a request, in which case the transaction fee is divided equally among the validators that participated in the request. Again, the validators must decide if they are willing to accept such a transaction fee.
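A minimal sketch of the fee logic described above, assuming each validator publishes a minimum fee it will accept. The names `willing_validators` and `pick_and_split` and the fee values are hypothetical, and the sketch only splits the validators' portion; payments to AI API providers and test case providers are left out.

```python
import random


def willing_validators(min_fees, offered_fee):
    """Return the validators whose minimum fee is met by the offered fee."""
    return sorted(name for name, fee in min_fees.items() if fee <= offered_fee)


def pick_and_split(min_fees, offered_fee, count=1, rng=random):
    """Randomly pick `count` willing validators and split the fee equally.

    Returns an empty dict when too few validators accept the offered fee.
    """
    pool = willing_validators(min_fees, offered_fee)
    if len(pool) < count:
        return {}
    chosen = rng.sample(pool, count)
    share = offered_fee / count
    return {name: share for name in chosen}


# Illustrative minimum fees, in ORAI.
example_min_fees = {"val_a": 1.0, "val_b": 2.0, "val_c": 5.0}
example_split = pick_and_split(example_min_fees, 3.0, count=2, rng=random.Random(1))
```

Here `val_c` declines because the offered fee is below its minimum, so the two remaining validators each receive half of the fee.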

ORAI tokens are awarded with each newly created block, so to preserve the value of their holdings, holders must stake their tokens on the Oraichain network. The reward is divided according to the number of tokens each holder has staked with a validator. There is also a mechanism to punish bad validator behavior with respect to AI API quality, response time, and availability.

Oraichain Tokenomics

The new tokenomics supports the growth of ORAI tokens. Image via Oraichain Blog

The team also changed the tokenomics by burning 73% of the total supply of ORAI tokens in December 2020. They also extended the emission schedule to 2027, flattening the release curve and protecting against sudden supply shocks. This also helps to minimize inflation in the early years of the project.

The ORAI Token

There was a seed sale conducted in October 2020 with ORAI tokens being sold for $0.081 each. There was a goal of $70,000 for the sale, however no data regarding the total funds raised was released. In November 2020 there was a private sale scheduled, but it was never held. Finally, there was a public sale scheduled for February 2021, but after the team changed the tokenomics and burned 73% of the circulating supply the public sale was cancelled.

The price of the ORAI token has surged in 2021, reaching an all-time high of $107.48 on February 20, 2021. That contrasts with the all-time low of $2.83 just four months earlier on October 29, 2020.

Oraichain Chart

The ORAI token has soared higher in just 4 months. Image via

As of February 23, 2021 the price has retreated substantially from its all-time high, trading at $65.06. There are very few exchanges handling the token, with the majority of transactions occurring on Uniswap. There is also a small amount of activity on KuCoin,, and Bithumb Global.
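To put the quoted prices in perspective, the short calculation below uses only the three figures given above. It works out to a rise of roughly 38 times from the all-time low to the all-time high, followed by a pullback of about 39 percent by February 23.

```python
# Prices quoted in the review, in US dollars.
all_time_low = 2.83      # October 29, 2020
all_time_high = 107.48   # February 20, 2021
price_feb_23 = 65.06     # February 23, 2021

# How many times higher the peak was than the low.
growth_multiple = all_time_high / all_time_low

# Percentage decline from the peak to the February 23 price.
drawdown_pct = (all_time_high - price_feb_23) / all_time_high * 100
```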

Oraichain Use Cases

There are already a number of use cases that have generated interest in Oraichain.

Yield Farming with AI

The yield farming based on Oraichain was inspired by the development of (YFI). Like yEarn, the Oraichain system helps reduce the complexity of yield trading. Where yEarn uses crowdsourcing knowledge, Oraichain provides AI-based price prediction APIs as inputs to smart contracts. The yield farming use case has two functionalities:

Earn: Get price prediction from Oraichain and automatically decide BUY/SELL tokens. Users choose the best performing AI APIs.

Vaults: Apply automated trading oracle scripts on Oraichain. Deposit tokens and the assigned oracle script will find the best AI input to maximize yield.
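As a rough illustration of the Earn idea, the sketch below turns a price prediction into a trading decision. Everything in it is an assumption made for illustration: the function name, the 2% threshold, and the extra HOLD outcome are not taken from Oraichain, and a real strategy would also need to account for fees and prediction confidence.

```python
def decide_action(current_price, predicted_price, threshold=0.02):
    """Map a predicted price to BUY, SELL, or HOLD.

    The prediction would come from an AI price-prediction API; here it is
    simply passed in. `threshold` is the minimum relative change (2% by
    default, a made-up value) needed before acting.
    """
    if current_price <= 0:
        raise ValueError("current_price must be positive")
    change = (predicted_price - current_price) / current_price
    if change > threshold:
        return "BUY"
    if change < -threshold:
        return "SELL"
    return "HOLD"


example_action = decide_action(current_price=100.0, predicted_price=110.0)
```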

yAI Finance

AI powered DeFi platform. Image via

Compared to (crowdsource-based strategies), AI-based trading performance could be less efficient, but risk management could be better, since every buying or selling decision is made by AI models (that is, by machine) rather than driven by human psychology.

Flexible smart contracts & face authentication

There are several scenarios in which face authentication is very useful:

  • using your face to get your balance instead of using a private key,
  • withdrawing tokens to registered wallets using your face,
  • using your face to reset your private/public key pair,
  • using both your private key and your face to execute a smart contract.

Using face authentication might be riskier than using a private key, but it helps improve the user experience. For checking a balance and withdrawing tokens to registered wallets, face authentication is considered safe and convenient.

Fake news Detection

This use case focuses more on a regular application that wants to check if the news can be trusted. Oraichain provides a marketplace in a decentralized manner in which combining results from different providers is possible. If the providers want to receive payments, their APIs must pass the test cases just as any other API provider.

More potential use cases

  • Smart contracts help check if a product is fake in the supply chain
  • Smart contracts deciding a loan based on users’ credit score
  • Smart contracts automatically pricing game items based on their characteristics and DNA
  • Marketplace of automated diagnostics for X-ray images, spam classification, handwriting detection using OCR, and citizen ID card detection using OCR.

Oraichain Roadmap


An impressive 2021 roadmap. Image via


Just like other projects that have been built in the data oracle sector the demand for Oraichain should only increase as the DeFi economy continues to expand. Starting with the yAI DeFi product Oraichain is showing it is more than capable of competing in the space.

In addition, this platform fills a niche not served by crowdsourced projects like yEarn Finance. It’s also taking a unique approach that sets it apart from industry leader Chainlink.

The mainnet launch for the project is on February 24, which will be an exciting time to see how much demand there is for the project and its unique take on oracle protocols and DeFi. It could also reinvigorate the ORAI token, which has seen impressive growth over the past four months.

Oraichain is a young project, but it’s already made very impressive strides. The roadmap for the project is ambitious, and the team is strong as well. That could lead to unprecedented growth for Oraichain in 2021 and beyond.

As the lone project taking on the problem of adding AI to smart contracts Oraichain could be setting itself up as a leader in the blockchain space for some time to come.

Disclaimer: These are the writer’s opinions and should not be considered investment advice. Readers should do their own research.

The post Oraichain Review: The AI Powered Oracle System appeared first on Coin Bureau.




Biden should double down on Trump’s policy of promoting AI within government





The current administration should not only maintain the policy of promoting government use of AI, it should make it a priority.



Setting up Amazon Personalize with AWS Glue




Data can be used in a variety of ways to satisfy the needs of different business units, such as marketing, sales, or product. In this post, we focus on using data to create personalized recommendations to improve end-user engagement. Most ecommerce applications consume a huge amount of customer data that can be used to provide personalized recommendations; however, that data may not be cleaned or in the right format to provide those valuable insights.

The goal of this post is to demonstrate how to use AWS Glue to extract, transform, and load your JSON data into a cleaned CSV format. We then show you how to run a recommendation engine powered by Amazon Personalize on your user interaction data to provide a tailored experience for your customers. The resulting output from Amazon Personalize is recommendations you can generate from an API.

A common use case is an ecommerce platform that collects user-item interaction data and suggests similar products or products that a customer may like. By the end of this post, you will be able to take your uncleaned JSON data and generate personalized recommendations based on the products each user has interacted with, creating a better experience for your end-users. For the purposes of this post, refer to this user-item-interaction dataset to build this solution.

The resources of this solution may incur a cost on your AWS account. For pricing information, see AWS Glue Pricing and Amazon Personalize Pricing.

The following diagram illustrates our solution architecture.


For this post, you need the following:

For instructions on creating a bucket, see Step 1: Create your first S3 bucket. Make sure to attach the Amazon Personalize access policy.

These are very permissive policies; in practice it’s best to use least privilege and only give access where it’s needed. For instructions on creating a role, see Step 2: Create an IAM Role for AWS Glue.

Crawling your data with AWS Glue

We use AWS Glue to crawl through the JSON file to determine the schema of your data and create a metadata table in your AWS Glue Data Catalog. The Data Catalog contains references to data that is used as sources and targets of your ETL jobs in AWS Glue. AWS Glue is a serverless data preparation service that makes it easy to extract, clean, enrich, normalize, and load data. It helps prepare your data for analysis or machine learning (ML). In this section, we go through how to get your JSON data ready for Amazon Personalize, which requires a CSV file.

Your data can have different columns that you may not necessarily want or need to run through Amazon Personalize. In this post, we use the user-item-interaction.json file and clean that data using AWS Glue to only include the columns user_id, item_id, and timestamp, while also transforming it into CSV format. You can use a crawler to access your data store, extract metadata, and create table definitions in the Data Catalog. It automatically discovers new data and extracts schema definitions. This can help you gain a better understanding of your data and what you want to include while training your model.

The user-item-interaction JSON data is an array of records. The crawler treats the data as one object: just an array. We create a custom classifier to create a schema that is based on each record in the JSON array. You can skip this step if your data isn’t an array of records.

  1. On the AWS Glue console, under Crawlers, choose Classifiers.
  2. Choose Add classifier.
  3. For Classifier name, enter json_classifier.
  4. For Classifier type, select JSON.
  5. For JSON path, enter $[*].
  6. Choose Create.
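
To make the effect of the JSON path $[*] concrete, here is a minimal local sketch in Python. It is not how AWS Glue implements classifiers; it only illustrates the idea of treating each element of a top-level array as its own record and deriving column names from those records. The helper names and the sample values are hypothetical; the column names are the ones used in this post.

```python
import json


def split_records(json_text):
    """Return the individual records of a top-level JSON array.

    This mirrors the intent of the JSON path $[*]: every array element
    becomes one record instead of the whole array being one object.
    """
    data = json.loads(json_text)
    if not isinstance(data, list):
        # Not an array of records: treat the document as a single record.
        return [data]
    return data


def infer_columns(records):
    """Collect the union of keys across dict records, in first-seen order."""
    columns = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)
    return columns


# Hypothetical sample shaped like the user-item-interaction data.
sample = """[
  {"user_id": "1", "item_id": "59", "timestamp": 1614556800,
   "user_login": "alice", "location": "Seattle"},
  {"user_id": "2", "item_id": "12", "timestamp": 1614556900,
   "user_login": "bob", "location": "Austin"}
]"""

records = split_records(sample)
print(len(records), infer_columns(records))
```

Without the per-record split, the only thing a schema inference step could see is a single array-valued object, which is why the custom classifier is needed for this dataset.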

  1. On the Crawlers page, choose Add crawler.
  2. For Crawler name, enter json_crawler.
  3. For Custom classifiers, add the classifier you created.

  1. Choose Next.
  2. For Crawler source type, choose Data stores.
  3. Leave everything else as default and choose Next.
  4. For Choose a data store, enter the Amazon S3 path to your JSON data file.
  5. Choose Next.

  1. Skip the section Add another data store.
  2. In the Choose an IAM role section, select Choose an existing IAM role.
  3. For IAM role, choose the role that you created earlier (AWSGlueServiceRole-xxx).
  4. Choose Next.

  1. Leave the frequency as Run on Demand.
  2. On the Output page, choose Add database.
  3. For Database name, enter json_data.
  4. Choose Finish.
  5. Choose Run it now. 

You can also run your crawler by going to the Crawlers page, selecting your crawler, and choosing Run crawler.

Using AWS Glue to convert your files from JSON to CSV

After your crawler finishes running, go to the Tables page on the AWS Glue console. Navigate to the table your crawler created. Here you can see the schema of your data. Make note of the fields you want to use with your Amazon Personalize data. For this post, we want to keep the user_id, item_id, and timestamp columns for Amazon Personalize.

At this point, you have set up your database. Amazon Personalize requires CSV files, so you have to transform the data from JSON format into cleaned CSV files that include only the data you need in Amazon Personalize. The following table shows the three types of CSV files you can include in Amazon Personalize. Note that interactions data is required, whereas user and item metadata is optional.

Dataset Type | Required Fields | Reserved Keywords
Users | USER_ID (string), 1 metadata field |
Items | ITEM_ID (string), 1 metadata field |
Interactions | USER_ID (string), ITEM_ID (string), TIMESTAMP (long) | EVENT_TYPE (string), EVENT_VALUE (float, null)

It’s also important to make sure that you have at least 1,000 unique combined historical and event interactions in order to train the model. For more information about quotas, see Quotas in Amazon Personalize.
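
Before importing, it can be useful to check the interaction count locally. The following Python sketch counts distinct rows in an interactions CSV and compares the count with the 1,000 threshold mentioned above. Treating a distinct (user_id, item_id, timestamp) triple as one unique interaction is an assumption of this sketch, not the service's definition, and the function names are hypothetical.

```python
import csv
import io

MIN_INTERACTIONS = 1000  # threshold stated in this post


def count_unique_interactions(csv_file):
    """Count distinct (user_id, item_id, timestamp) rows in a CSV file object.

    Assumption: one distinct triple equals one unique interaction.
    """
    reader = csv.DictReader(csv_file)
    seen = {(row["user_id"], row["item_id"], row["timestamp"]) for row in reader}
    return len(seen)


def has_enough_interactions(csv_file, minimum=MIN_INTERACTIONS):
    """Return True when the file holds at least `minimum` unique interactions."""
    return count_unique_interactions(csv_file) >= minimum


# Hypothetical three-row sample; the last row repeats the first.
sample_csv = "user_id,item_id,timestamp\n1,59,1614556800\n2,12,1614556900\n1,59,1614556800\n"
print(count_unique_interactions(io.StringIO(sample_csv)))
```

For a real file, pass an open file handle (for example, `open("interactions.csv", newline="")`) instead of the in-memory sample.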

To save the data as a CSV, you need to run an AWS Glue job on the data. A job is the business logic that performs the ETL work in AWS Glue. The job changes the format from JSON into CSV. For more information about data formatting, see Formatting Your Input Data.

  1. On the AWS Glue Dashboard, choose AWS Glue Studio.

AWS Glue Studio is an easy-to-use graphical interface for creating, running, and monitoring AWS Glue ETL jobs.

  1. Choose Create and manage jobs.
  2. Select Source and target added to the graph.
  3. For Source, choose S3.
  4. For Target, choose S3.
  5. Choose Create.

  1. Choose the data source S3 bucket.
  2. On the Data source properties – S3 tab, add the database and table we created earlier.

  1. On the Transform tab, select the boxes to drop user_login and location.

In this post, we don’t use any additional metadata to run our personalization algorithm.

  1. Choose the data target S3 bucket.
  2. On the Data target properties – S3 tab, for Format, choose CSV.
  3. For S3 Target location, enter the S3 path for your target. 

For this post, we use the same bucket we used for the JSON file.

  1. On the Job details page, for Name, enter a name for your job (for this post, json_to_csv).
  2. For IAM Role, choose the role you created earlier.

You should also have included the AmazonS3FullAccess policy earlier.

  1. Leave the rest of the fields at their default settings.

  1. Choose Save.
  2. Choose Run.

It may take a few minutes for the job to run.

In your Amazon S3 bucket, you should now see the CSV file that you use in the next section.
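
For a quick local check of what the job should produce, the same transformation can be sketched in plain Python: read the array of JSON records, keep only user_id, item_id, and timestamp, and write CSV. This is an illustrative stand-in for the AWS Glue job, not the script that AWS Glue Studio generates; the function name and sample values are hypothetical.

```python
import csv
import io
import json

# Columns kept for Amazon Personalize in this post.
KEEP_COLUMNS = ["user_id", "item_id", "timestamp"]


def json_records_to_csv(json_text, columns=KEEP_COLUMNS):
    """Convert a JSON array of records to CSV text, keeping only `columns`.

    Extra keys such as user_login and location are dropped; a missing
    key is written as an empty cell.
    """
    records = json.loads(json_text)
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=columns, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for record in records:
        writer.writerow(record)
    return out.getvalue()


# Hypothetical one-record sample.
sample_json = (
    '[{"user_id": "1", "item_id": "59", "timestamp": 1614556800, '
    '"user_login": "alice", "location": "Seattle"}]'
)
print(json_records_to_csv(sample_json))
```

Comparing a few rows of this output with the file the job wrote to Amazon S3 is a simple way to confirm that the right columns were dropped.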

Setting up Amazon Personalize

At this point, you have your data formatted in a file type that Amazon Personalize can use. Amazon Personalize is a fully managed service that uses ML and over 20 years of recommendation experience at Amazon.com to help you improve end-user engagement by powering real-time personalized product and content recommendations and targeted marketing promotions. In this section, we go through how to create an Amazon Personalize solution that uses your data to create personalized experiences.

  1. On the Amazon Personalize console, under New dataset groups, choose Get started.
  2. Enter the name for your dataset group.

A dataset group contains the datasets, solutions, and event ingestion API.

  1. Enter a dataset name, and enter in the schema details based on your data.

For this dataset, we use the following schema. You can change the schema according to the values in your dataset.

{
  "type": "record",
  "name": "Interactions",
  "namespace": "com.amazonaws.personalize.schema",
  "fields": [
    { "name": "USER_ID", "type": "string" },
    { "name": "ITEM_ID", "type": "string" },
    { "name": "TIMESTAMP", "type": "long" }
  ],
  "version": "1.0"
}
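
As a sanity check before the import, you could compare the CSV header with the field names in the schema. The Python sketch below does this locally. The helper names are hypothetical, and the case-insensitive comparison is an assumption of this sketch; consult the Amazon Personalize documentation for the exact header requirements.

```python
import json

# The interactions schema used in this post.
SCHEMA_JSON = """
{
  "type": "record",
  "name": "Interactions",
  "namespace": "com.amazonaws.personalize.schema",
  "fields": [
    {"name": "USER_ID", "type": "string"},
    {"name": "ITEM_ID", "type": "string"},
    {"name": "TIMESTAMP", "type": "long"}
  ],
  "version": "1.0"
}
"""


def schema_field_names(schema_json):
    """Return the field names declared in the schema, in order."""
    return [field["name"] for field in json.loads(schema_json)["fields"]]


def missing_columns(csv_header_line, schema_json=SCHEMA_JSON):
    """List schema fields that do not appear in the CSV header line.

    Assumption: names are compared case-insensitively.
    """
    header = {name.strip().upper() for name in csv_header_line.split(",")}
    return [
        name for name in schema_field_names(schema_json)
        if name.upper() not in header
    ]


print(missing_columns("user_id,item_id,timestamp"))
print(missing_columns("user_id,item_id"))
```

An empty list means every schema field has a matching column; any names returned point to columns that were dropped or misnamed during the transformation.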

  1. Choose Next.
  2. Enter your dataset import job name to import data from Amazon S3.

Make sure that your IAM service role has access to Amazon S3 and Amazon Personalize, and that your bucket has the correct bucket policy.

  1. Enter the path to your data (the Amazon S3 bucket from the previous section).
  2. On the Dashboard page for your dataset groups, under Upload datasets, import the user-item-interactions data (user data and item data are optional but can enhance the solution).

We include an example item.csv file in the GitHub repo. The following screenshot shows an example of the item data.

  1. Under Create solutions, for Solutions training, choose Start.

A solution is a trained model of the data you provided with the algorithm, or recipe, that you select.

  1. For Solution name, enter aws-user-personalization.
  2. Choose Next.
  3. Review and choose Finish.
  4. On the dashboard, under Launch campaigns, for Campaign creation, choose Start.

A campaign allows your application to get recommendations from your solution version.

  1. For Campaign name, enter a name.
  2. Choose the solution you created.
  3. Choose Create campaign.

You have now successfully used the data from your data lake and created a recommendation model that can be used to get various recommendations. With this dataset, you can get personalized recommendations for houseware products based off the user’s interactions with other products in the dataset.

Using Amazon Personalize to get your recommendations

To test your solution, go to the campaign you created. In the Test campaign results section, under User ID, enter the ID of the user you want recommendations for. A list of item IDs shows up, each with a relative score. The item IDs correspond to the specific products recommended.

The following screenshot shows a search for user ID 1. This user has been recommended item ID 59, which corresponds to a wooden picture frame. The score listed next to each item gives you the predicted relevance of that item to your user.

To learn more about Amazon Personalize scores, see Introducing recommendation scores in Amazon Personalize.

To generate recommendations, you can call the GetRecommendations or GetPersonalizedRanking API using the AWS Command Line Interface (AWS CLI) or a language-specific SDK. With Amazon Personalize, your recommendations can change as the user clicks on the items for more real-time use cases. For more information, see Getting Real-Time Recommendations.
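
As a rough illustration of the SDK route, the following Python sketch wraps a GetRecommendations call. It assumes a boto3 `personalize-runtime` client is passed in; the wrapper name `top_recommendations` and the placeholder campaign ARN are hypothetical, and you would substitute the ARN of the campaign you created.

```python
def top_recommendations(client, campaign_arn, user_id, num_results=5):
    """Return (item_id, score) pairs for a user from a Personalize campaign.

    `client` is expected to behave like boto3.client("personalize-runtime").
    The score is None when the response does not include one.
    """
    response = client.get_recommendations(
        campaignArn=campaign_arn,
        userId=user_id,
        numResults=num_results,
    )
    return [(item["itemId"], item.get("score")) for item in response["itemList"]]


# Usage sketch (requires boto3 and AWS credentials; not executed here):
#
#   import boto3
#   runtime = boto3.client("personalize-runtime")
#   campaign_arn = "<your campaign ARN>"  # placeholder, copy from the console
#   for item_id, score in top_recommendations(runtime, campaign_arn, "1"):
#       print(item_id, score)
```

Passing the client in as an argument keeps the function easy to test with a stub and lets your application reuse one client across requests.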


AWS offers a wide range of AI/ML and analytics services that you can use to gain insights and guide better business decisions. In this post, you used a JSON dataset that included additional columns of data, and cleaned and transformed that data using AWS Glue. In addition, you built a custom model using Amazon Personalize to provide recommendations for your customers.

To learn more about Amazon Personalize, see the developer guide. Try this solution out and let us know if you have any questions in the comments.

About the Authors

Zoish Pithawala is a Startup Solutions Architect at Amazon Web Services based out of San Francisco. She primarily works with startup customers to help them build secure and scalable solutions on AWS.




Sam Tran is a Startup Solutions Architect at Amazon Web Services based out of Seattle. He focuses on helping his customers create well-architected solutions on AWS.

