

AI Has Track Record in Fraud Prevention for Credit Card Issuers




Credit service providers Visa and Experian have a track record in using AI for fraud detection. (GETTY IMAGES)

By John P. Desmond, AI Trends Editor

The financial services industry has compiled a track record in the use of AI for fraud detection, with AI applications at Visa and Experian being two notable examples.

The multinational Visa reports saving an estimated $25 billion annually from use of AI applications for fraud detection, according to Melissa McSherry, a senior VP and global head of data for Visa, in an account in VentureBeat. The path to AI that Visa chose may hold lessons for other companies thinking about how to launch their automation projects.

“We have definitely taken a use case approach to AI,” McSherry stated. “We don’t deploy AI for the sake of AI. We deploy it because it’s the most effective way to solve a problem.”

Melissa McSherry, Senior VP and Global Head of Data, Visa

The Visa Advanced Authorization (VAA) platform scores every transaction that goes across the network, rating each one based on the likelihood it is fraudulent. This allows more transactions to be approved quickly. “With 3.5 billion cards and 210 billion transactions a year, it is really worth it to everyone to make those cards work better and for more transactions to go through,” McSherry stated.

First deployed in 1993, the VAA has since evolved to use recurrent neural networks along with gradient boosted trees. Having the defined use case of fraud detection has helped Visa focus on how AI and machine learning can improve its services.

“It helps that we started with the first use case a long time ago,” McSherry said. “There’s no substitute for experience, and I think we have a fair amount of experience at this point on how to build and deploy these models. And so the first lesson is just at a certain point, you have to pick a use case and you just have to start.”

Visa has seen a 20-30% improvement in model performance when advanced AI techniques are applied versus more traditional machine learning techniques such as gradient boosted trees, she noted. In some cases, the improvement has been more than 100%, which speeds the development of new product services. “We are able to put better products in front of consumers faster,” she stated.

Steve Platt of Experian, the global information and credit services provider, also has experience with AI and fraud detection across more than one generation of systems. Now the head of Global Software at Experian, Platt was first exposed to AI and fraud detection in January 2001, when he joined the Hecht-Nielsen Neurocomputer Corp. (HNC) in San Diego to help commercialize software that founder Robert Hecht-Nielsen had originated.

Steve Platt, Group President, Global Business Information Services, Experian

Hecht-Nielsen was a neuroscientist and entrepreneur who had been teaching at the University of California, according to an account in Forbes. He had worked with a small group of academics and researchers on neural networks, a blend of statistics and AI. They developed a fraud detection product called Falcon and had acquired some customers, who liked the product and were looking for improvements – better predictions and more value.

As manager of the product, Platt emphasized building the fraud detection into the approval process for credit approval transactions. The more advanced card issuers could then deliver an authorization in real time; this was before cloud computing.

Platt also concentrated on getting a high volume of good-quality, well-structured data from credit card transactions to help the Falcon machine learning application learn. He also stayed close to early adopter customers to understand their problems, integrate with their transaction environment and analyze their data. Leading issuers including MBNA, Banc One and First Data were approached to partner on a design solution that would work for them.

HNC was sold in 2002 to Fair Isaac Corp., now called FICO, a data analytics company focused on credit scoring services. Now called the FICO Falcon Platform, the product is still in extensive use. Platt worked for the acquiring company for several years, then founded a fraud prevention company called BasePoint Analytics, and then moved to Experian. He’s been there 10 years and is now the company’s Group President, Global Business Information Services.

The lessons of HNC have served him well. Experian has a number of AI/machine learning-based products on the market, including its core credit score offering, fraud prevention and collections. “We’re in the business of data and data-driven insights,” he stated.

Experian has developed a way to blend software development practices of yesteryear with today’s AI software development. An internal DataLabs organization pursues projects with business units it deems innovative, exploring new data sources, new algorithms and new use cases. The lab sets up a common method for AI-based product development that employs agile methods and rapid development. The developers work closely with selected customers to build a proof of concept or prototype. They iterate that into a product, then help the customer put it into operation in one or more regions.

The structured methodology enables Experian to monitor where it is in the process and quickly adjust if the business case changes. One product developed in this framework is Experian Boost, which allows consumers to “boost” their credit scores by providing mobile phone and utilities payments data not captured in the traditional credit scoring process. Though still undergoing testing, it was brought to the market in nine months.

Bigger Banks Using Fraud Detection; Kount Attracts Investment

Financial institutions with over $100 billion in assets are the most likely to have adopted AI and of those, 73% are currently using AI for payment fraud detection, according to a recent survey, AI Innovation Playbook, published by PYMNTS and reported in Forbes. The study was based on interviews with 200 financial executives from commercial banks, community banks and credit unions across the US.

Fraud detection has proved an attractive target market for startups such as Kount, which launched in Boise, Idaho in 2007. Today the company holds 29 patents, and it was funded with an $80 million investment from CVC Capital Partners in 2016.

Kount’s Identity Trust Global Network delivers real-time fraud prevention and account protection, and enables personalized customer experiences, for more than 6,500 leading brands and payment providers.

The closing of many businesses during the coronavirus lockdown has led to soaring e-commerce volume, which has presented opportunities for fraud detection players, Kount founder and CEO Brad Wiskirchen recently wrote in PYMNTS.

Brad Wiskirchen, Founder and CEO, Kount

“As the pandemic accelerated in March and April, we saw digital transaction volumes skyrocket for many industries, like vitamins, wellness, electronics, pet supplies and others,” he stated. “In April, purchase volumes for crafts and online wine were up more than 600% from the average February week. Handling that volume while maintaining exceptional customer experiences and preventing fraud requires an adaptable solution that can make accurate decisions in real time.”

The increase in e-commerce means more businesses are offering new digital experiences, including memberships, accounts and loyalty points. “Each represents a unique area of the customer experience that should be protected,” states Wiskirchen.

Read the source articles in VentureBeat, Forbes and PYMNTS.



Increasing the relevance of your Amazon Personalize recommendations by leveraging contextual information




Getting relevant recommendations in front of your users at the right time is a crucial step for the success of your personalization strategy. However, your customer’s decision-making process shifts depending on the context at the time when they’re interacting with your recommendations. In this post, I show you how to set up and query a context-aware Amazon Personalize deployment.

Amazon Personalize allows you to easily add sophisticated personalization capabilities to your applications by using the same machine learning (ML) technology used on Amazon.com for over 20 years. No ML experience is required. Amazon Personalize supports the automatic adjustment of recommendations based on contextual information about your user, such as device type, location, time of day, or other information you provide.

The Harvard study How Context Affects Choice defines context as factors that can influence the choice outcome by altering the process by which a decision is made. As a business owner, you can identify this context by analyzing how your customers shop differently when accessing your catalog from a phone vs. a computer, or seeing the shift in your customer’s content consumption on rainy vs. sunny days.

Leveraging your user’s context allows you to provide a more personalized experience for existing users and helps decrease the cold-start phase for new or unidentified users. The cold-start phase refers to the period when your recommendation engine provides non-personalized recommendations due to the lack of historical information regarding that user.

Adding context to Amazon Personalize

You can set up and use context in Amazon Personalize in four simple steps:

  1. Include your user’s context in the historical user-item interactions dataset.
  2. Train a context-aware solution with a User Personalization or Personalized Ranking recipe. A recipe refers to the algorithm your recommender is trained on using the behavioral data specified in your interactions dataset plus any user or item metadata.
  3. Specify the user’s context when querying for real-time recommendations using the GetRecommendations or GetPersonalizedRanking API.
  4. Include your user’s context when recording events using the event tracker.

The following diagram illustrates the architecture of these steps.

You want to be explicit about the context to consider when constructing datasets. A common example of context customers actively use is device type, such as a phone, tablet, or desktop. The study The Effect of Device Type on Buying Behavior in Ecommerce: An Exploratory Study from the University of Twente in the Netherlands has proven that device type has an influence on buying behavior and people might postpone a buying decision if they’re online with the wrong device type. Embedding device type context in your datasets allows Amazon Personalize to learn this pattern and, at inference time, recommend the most appropriate content with awareness of the user’s context.

Recommendations use case

For this use case, a travel enthusiast is our potential customer. They look at a few things when deciding which airline to travel with to their given destination. For example, is it a short or a long flight? Will the trip be booked with cash or with miles? Are they traveling alone? Where are they departing from and returning to? After they answer these initial questions, the next big decision is picking the cabin type to fly in. If our travel enthusiast is flying in a high-end cabin type, we can assume they’re looking at which airline provides the best experience possible. Now that we have a good idea of what our user is looking for, it’s shopping time!

Consider some of the variables that go into the decision-making process of this use case. We can’t control many of these factors, but we can use some to tailor our recommendations. First, identify common denominators that might affect a user’s behavior. In this case, flight duration and cabin type are good candidates to use as context, and traveler type and traveler residence are good candidates for user metadata when building our recommendation datasets. Metadata is information you know about your users and items that stays somewhat constant over a period of time, whereas context is environmental information that can shift rapidly across time, influencing your customer’s perception and behavior.

Selecting the most relevant metadata fields in your training datasets and enriching your interactions datasets with context is important for generating relevant user recommendations. In this post, we build an Amazon Personalize deployment that returns a list of airline recommendations for a customer. We add cabin type as the context and traveler residence as the metadata field and observe how recommendations shift based on context and metadata.


We first need to set up the following Amazon Personalize resources. For full instructions, see Getting Started (Console) to complete the following steps:

  1. Create a dataset group. In this post, we name it airlines-blog-example.
  2. Create an Interactions dataset using the following schema and import data using the interactions_dataset.csv file:
    {
      "type": "record",
      "name": "Interactions",
      "namespace": "com.amazonaws.personalize.schema",
      "fields": [
        { "name": "ITEM_ID", "type": "string" },
        { "name": "USER_ID", "type": "string" },
        { "name": "TIMESTAMP", "type": "long" },
        { "name": "CABIN_TYPE", "type": "string", "categorical": true },
        { "name": "EVENT_TYPE", "type": "string" },
        { "name": "EVENT_VALUE", "type": "float" }
      ],
      "version": "1.0"
    }

  3. Create a Users dataset using the following schema and import data using the users_dataset.csv file:
    {
      "type": "record",
      "name": "Users",
      "namespace": "com.amazonaws.personalize.schema",
      "fields": [
        { "name": "USER_ID", "type": "string" },
        { "name": "USER_RESIDENCE", "type": "string", "categorical": true }
      ],
      "version": "1.0"
    }

  4. Create a solution. In this post, we use the default solution configurations, except for the following:
    1. Recipe – aws-hrnn-metadata
    2. Event type – RATING
    3. Perform HPO – True

Hyperparameter optimization (HPO) is recommended if you want Amazon Personalize to run parallel trainings and experiments to identify the most performant hyperparameters. For more information, see Hyperparameters and HPO.

  5. Create a campaign.

You can set up the preceding resources on the Amazon Personalize console or by following the Jupyter notebook personalize_hrnn_metadata_contextual_example.ipynb example on the GitHub repo.
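As a rough illustration of steps 4 and 5, the console choices can be expressed as request arguments for the Boto3 `personalize` client. This is a minimal sketch, not the notebook's code: the solution and campaign names are hypothetical, the client is passed in by the caller, and waiting for the solution version to become active before creating the campaign is left out.

```python
# Built-in recipe chosen in step 4, written as a recipe ARN.
RECIPE_ARN = "arn:aws:personalize:::recipe/aws-hrnn-metadata"


def solution_request(dataset_group_arn, name="airlines-blog-solution"):
    """Arguments for create_solution that mirror the console settings in step 4."""
    return {
        "name": name,  # hypothetical name, chosen for this sketch
        "datasetGroupArn": dataset_group_arn,
        "recipeArn": RECIPE_ARN,
        "eventType": "RATING",
        "performHPO": True,
    }


def campaign_request(solution_version_arn, name="airlines-blog-campaign"):
    """Arguments for create_campaign (step 5); the name is hypothetical."""
    return {
        "name": name,
        "solutionVersionArn": solution_version_arn,
        "minProvisionedTPS": 1,
    }


def start_training(personalize, dataset_group_arn):
    """Create the solution and kick off training of a solution version."""
    solution = personalize.create_solution(**solution_request(dataset_group_arn))
    version = personalize.create_solution_version(solutionArn=solution["solutionArn"])
    return version["solutionVersionArn"]


# Usage (requires AWS credentials and an existing dataset group):
#   import boto3
#   personalize = boto3.client("personalize")
#   version_arn = start_training(personalize, "<your dataset group ARN>")
#   ...wait until the solution version is ACTIVE, then:
#   personalize.create_campaign(**campaign_request(version_arn))
```

Passing the client in as an argument keeps the request-building logic easy to inspect without touching an AWS account.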

Exploring your Amazon Personalize resources

We have now created several Amazon Personalize resources, including a dataset group called airlines-blog-example. The dataset group contains two datasets: interactions and users, which contain the data used to train your Amazon Personalize model (also known as a solution). We also created a campaign to provide real-time recommendations.

We can now explore how the interactions and users dataset schemas help our model learn from the context and metadata embedded in the datasets.

Interactions dataset

We provide Amazon Personalize an interactions dataset with a numeric rating (combination of EVENT_TYPE + EVENT_VALUE) that a user (USER_ID) has given an airline (ITEM_ID) when flying in a certain cabin type (CABIN_TYPE) at a given time (TIMESTAMP). By providing this information to Amazon Personalize in the dataset and schema, we can add CABIN_TYPE as the context when querying the recommendations for a user and recording new interactions through the event tracker. At training time, the model automatically identifies important features from this data (for our use case, the highest rated airlines across cabin types).
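To make the event tracker part concrete, here is a hedged sketch of recording a new rating together with its cabin type through the `put_events` call of the Boto3 `personalize-events` client. The camel-case property key `cabinType` is an assumption based on the naming convention the PutEvents documentation describes for schema fields, so verify it against your own schema; the tracking ID, session ID, and sample values are hypothetical.

```python
import json
import time


def build_rating_event(item_id, rating, cabin_type, sent_at=None):
    """Build one event carrying the rating plus the CABIN_TYPE context."""
    return {
        "eventType": "RATING",
        "eventValue": float(rating),
        "itemId": item_id,
        # Context travels in the JSON-encoded properties string.
        # Assumption: camel-case key corresponding to the CABIN_TYPE schema field.
        "properties": json.dumps({"cabinType": cabin_type}),
        "sentAt": sent_at if sent_at is not None else int(time.time()),
    }


def record_rating(events_client, tracking_id, user_id, session_id, event):
    """Send a single event to the event tracker identified by tracking_id."""
    return events_client.put_events(
        trackingId=tracking_id,
        userId=user_id,
        sessionId=session_id,
        eventList=[event],
    )


# Usage (requires AWS credentials and an event tracker):
#   import boto3
#   events = boto3.client("personalize-events")
#   event = build_rating_event("Alaska Airlines", 9, "First Class")
#   record_rating(events, "<tracking ID>", "JDowns", "<session ID>", event)
```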

The following screenshot showcases a small portion of the interactions_dataset.csv file.

User dataset

We also provide Amazon Personalize a user dataset with the users (USER_ID) who provided the ratings in the interactions dataset, assuming that they gave the rating from their country of residence (USER_RESIDENCE). In this use case, USER_RESIDENCE is the metadata we picked for these users. By providing USER_RESIDENCE as user metadata, the model can learn which airlines are interacted with the most by users across countries and regions, so when we query for recommendations, it takes USER_RESIDENCE into consideration. For example, users in Asia see different airline options compared to users in South America or Europe.

The following screenshot shows a small portion of the users_dataset.csv file.

The raw dataset of user airlines ratings from Skytrax contains 20 columns with over 40,000 records. In this post, we use a modified version of this dataset and split the most relevant columns of the raw dataset into two datasets (users and interactions). For more information about splitting the data in a Jupyter notebook, see personalize_hrnn_metadata_contextual_example.ipynb on the GitHub repo.

The next section shows how context and metadata influence the real-time recommendations provided by your Amazon Personalize campaign.

Applying context to your Amazon Personalize real-time recommendations queries

During this test, we observe the effect that context has on the recommendations provided to users. In our use case, we have an interactions dataset of numerical airline ratings from multiple users. In our schemas, the cabin type is included as a categorical value for the interactions dataset and the user residence as a metadata field in the users dataset. Our theory is that by adding the cabin type as context, the airline recommendations will shift to account for it.

  1. On your Amazon Personalize dataset group dashboard, choose View campaigns.
  2. Choose your newly created campaign.
  3. For User ID, enter JDowns.
  4. Choose Get recommendations.

You should see a Test campaign results page similar to the following screenshot.

We initially queried a list of airlines for our user without any context. We now focus on the top 10 recommendations and verify that they shift based on the context. We can add the context via the console by providing a key and value pair. In our use case, the key is CABIN_TYPE and the value can be one of the following:

  • Economy
  • Premium Economy
  • Business Class
  • First Class

The following two screenshots show our results for querying recommendations for the same user with Economy and First Class as values for the CABIN_TYPE context. The economy context doesn’t shift the top 10 list, but the first class context does have an effect—bumping Alaska Airlines to first place on the list.
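The same comparison can be sketched outside the console with the `get_recommendations` call of the Boto3 `personalize-runtime` client, which accepts a `context` map. This is an illustrative sketch only: the campaign ARN is a placeholder, and the client is passed in by the caller.

```python
def recommendation_request(campaign_arn, user_id, cabin_type=None, num_results=10):
    """Arguments for get_recommendations, with CABIN_TYPE context when given."""
    request = {
        "campaignArn": campaign_arn,
        "userId": user_id,
        "numResults": num_results,
    }
    if cabin_type is not None:
        # The context key matches the categorical field in the Interactions schema.
        request["context"] = {"CABIN_TYPE": cabin_type}
    return request


def top_airlines(runtime_client, campaign_arn, user_id, cabin_type=None):
    """Return the recommended item IDs (airlines) in ranked order."""
    response = runtime_client.get_recommendations(
        **recommendation_request(campaign_arn, user_id, cabin_type)
    )
    return [item["itemId"] for item in response["itemList"]]


# Usage (requires AWS credentials and an active campaign):
#   import boto3
#   runtime = boto3.client("personalize-runtime")
#   no_context = top_airlines(runtime, "<campaign ARN>", "JDowns")
#   first_class = top_airlines(runtime, "<campaign ARN>", "JDowns", "First Class")
```

Comparing the two returned lists side by side is one way to check whether a given context value changes the ranking for a user.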

You can explore your users_dataset.csv file for additional users to test your recommendations API and observe a very similar shift of recommendations based on the context you include in the API call. You can also find that the airlines list shifts based on the USER_RESIDENCE metadata field. For example, the following screenshots show the top 10 recommendations for our JDowns user, who has United States as the value for USER_RESIDENCE, compared to the PhillipHarris user, who has France as the value for USER_RESIDENCE.


As shown in this post, adding context to your recommendation strategy is a powerful and easy-to-implement exercise when using Amazon Personalize. Enriching your recommendations with context can increase user engagement, which eventually leads to an increase in the revenue influenced by your recommendations.

This post showed you how to create an Amazon Personalize context-aware deployment and an end-to-end test of getting real-time recommendations applying context via the Amazon Personalize console. For instructions on using a Jupyter environment to set up the Amazon Personalize infrastructure and get recommendations using the Boto3 Python SDK, see personalize_hrnn_metadata_contextual_example.ipynb on the GitHub repo.

There’s even more that you can do with Amazon Personalize. For more information about core use cases and automation examples, see the GitHub repo.

If this post helps you or inspires you to solve a problem, share your thoughts and questions in the comments.

About the Author

Luis Lopez Soria is an AI/ML specialist solutions architect working with the AWS machine learning team. He works with AWS customers to help them adopt machine learning on a large scale. He enjoys playing sports, traveling around the world, and exploring new foods and cultures.




Amazon Forecast can now use Convolutional Neural Networks (CNNs) to train forecasting models up to 2X faster with up to 30% higher accuracy




We’re excited to announce that Amazon Forecast can now use Convolutional Neural Networks (CNNs) to train forecasting models up to 2X faster with up to 30% higher accuracy. CNN algorithms are a class of neural network-based machine learning (ML) algorithms that play a vital role in Amazon.com’s demand forecasting system and enable Amazon.com to predict demand for over 400 million products every day. For more information about Amazon.com’s journey building demand forecasting technology using CNN models, watch the re:MARS 2019 keynote video. Forecast brings the same technology used at Amazon.com into the hands of everyday developers as a fully managed service. Anyone can start using Forecast, without any prior ML experience, by using the Forecast console or the API.

Forecasting is the science of predicting the future. By examining historical trends, businesses can make a call on what might happen and when, and build that into their future plans for everything from product demand to inventory to staffing. Given the consequences of forecasting, accuracy matters. If a forecast is too high, businesses over-invest in products and staff, which ends up as wasted investment. If the forecast is too low, they under-invest, which leads to a shortfall in inventory and a poor customer experience. Today, businesses try to use everything from simple spreadsheets to complex financial planning software to generate forecasts, but high accuracy remains elusive for two reasons:

  • Traditional forecasts struggle to incorporate very large volumes of historical data, missing out on important signals from the past that are lost in the noise.
  • Traditional forecasts rarely incorporate related but independent data, which can offer important context (such as sales, holidays, locations, and marketing promotions). Without the full history and the broader context, most forecasts fail to predict the future accurately.

At Amazon, we have learned over the years that no one algorithm delivers the most accurate forecast for all types of data. Traditional statistical models have been useful in predicting demand for products that have regular demand patterns, such as sunscreen lotions in the summer and woolen clothes in the winter. However, statistical models can’t deliver accurate forecasts for more complex scenarios, such as frequent price changes, differences between regional versus national demand, products with different selling velocities, and the addition of new products. Sophisticated deep learning models can provide higher accuracy in these use cases. Forecast automatically examines your data and selects the best algorithm across a set of statistical and deep learning algorithms to train the most accurate forecasting model for your data. With the addition of the CNN-based deep learning algorithm, Forecast can now further improve accuracy by up to 30% and train models up to 2X faster compared to the currently supported algorithms. This new algorithm can more accurately detect leading indicators of demand, such as pre-order information, product page visits, price changes, and promotional spikes, to build more accurate forecasts.

More Retail, a market leader in the fresh food and grocery category in India, participated in a beta test of the new CNN algorithm, with the help of Ganit, an analytics partner. Supratim Banerjee, Chief Transformation Officer at More Retail Limited, says, “At More, we rapidly innovate to sustain our business and beat competition. We have been looking for opportunities to reduce wastage due to over stocking, while continuing to meet customer demand. In our experiments for the fresh produce category, we found the new CNN algorithm in Amazon Forecast to be 1.7X more accurate compared to our existing forecasting system. This translates into massive cost savings for our business.”

Training a CNN predictor and creating forecasts

You can start using CNNs in Forecast through the CreatePredictor API or on the Forecast console. In this section, we walk through a series of steps required to train a CNN predictor and create forecasts within Forecast.

  1. On the Forecast console, create a dataset group.
  2. Upload your dataset.
  3. Choose Predictors from the navigation pane.
  4. Choose Train predictor.
  5. For Algorithm selection, select Manual.
  6. For Algorithm, choose CNN-QR.

To manually select CNN-QR through the CreatePredictor API, use arn:aws:forecast:::algorithm/CNN-QR for the AlgorithmArn.

When you choose CNN-QR from the drop-down menu, the Advanced Configuration section auto-expands.

  1. To let Forecast train the most optimized and accurate CNN model for your data, select Perform hyperparameter optimization (HPO).
  2. After you enter all your details on the Predictors page, choose Train predictor.
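The console steps above correspond roughly to a single CreatePredictor call. The sketch below builds the arguments for the Boto3 `forecast` client with CNN-QR selected manually and HPO enabled. The predictor name, forecast horizon, and daily frequency are hypothetical values chosen for illustration, and the dataset group ARN is a placeholder.

```python
# Algorithm ARN given in this post for manually selecting CNN-QR.
CNN_QR_ARN = "arn:aws:forecast:::algorithm/CNN-QR"


def cnn_qr_predictor_request(name, dataset_group_arn, horizon, frequency="D"):
    """Arguments for create_predictor with CNN-QR chosen manually and HPO on."""
    return {
        "PredictorName": name,
        "AlgorithmArn": CNN_QR_ARN,
        "ForecastHorizon": horizon,  # number of time steps to predict
        "PerformHPO": True,
        "InputDataConfig": {"DatasetGroupArn": dataset_group_arn},
        "FeaturizationConfig": {"ForecastFrequency": frequency},  # "D" = daily
    }


# Usage (requires AWS credentials and an imported dataset group):
#   import boto3
#   forecast = boto3.client("forecast")
#   request = cnn_qr_predictor_request("demand_cnn_qr", "<dataset group ARN>", 14)
#   predictor_arn = forecast.create_predictor(**request)["PredictorArn"]
```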

After your predictor is trained, you can view its details by choosing your predictor on the Predictors page. On the predictor’s details page, you can view the accuracy metrics and optimized hyperparameters for your model.

  1. Now that your model is trained, choose Forecasts from the navigation pane.
  2. Choose Create a forecast.
  3. Create a forecast using your trained predictor.

You can generate forecasts at any quantile to balance your under-forecasting and over-forecasting costs.
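One common way to pick a quantile, borrowed from the classic newsvendor model rather than from this post, is the ratio of the under-forecasting cost to the sum of both costs. The sketch below computes that ratio and builds the arguments for a `create_forecast` call with the chosen quantiles passed as `ForecastTypes`. The cost figures and forecast name are hypothetical.

```python
def critical_quantile(under_cost, over_cost):
    """Newsvendor critical ratio: the quantile that balances the two unit costs."""
    if under_cost <= 0 or over_cost <= 0:
        raise ValueError("costs must be positive")
    return under_cost / (under_cost + over_cost)


def forecast_request(name, predictor_arn, quantiles):
    """Arguments for create_forecast; quantiles are sent as two-decimal strings."""
    forecast_types = []
    for q in quantiles:
        # Keep quantiles within 0.01..0.99 after rounding to two decimals.
        clamped = min(max(round(q, 2), 0.01), 0.99)
        forecast_types.append(f"{clamped:.2f}")
    return {
        "ForecastName": name,
        "PredictorArn": predictor_arn,
        "ForecastTypes": forecast_types,
    }


# Usage: if a missed sale costs 3 units and an unsold unit costs 1,
# the balancing quantile is 3 / (3 + 1) = 0.75.
#   import boto3
#   forecast = boto3.client("forecast")
#   q = critical_quantile(3, 1)
#   forecast.create_forecast(**forecast_request("demand_p75", "<predictor ARN>", [0.5, q]))
```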

Choosing the most accurate model with Forecast

With this launch, Forecast now supports one proprietary CNN model, one proprietary RNN model, and four other statistical models: Prophet, NPTS (Amazon proprietary), ARIMA, and ETS. The new CNN model is part of AutoML. We recommend always starting your experimentation with AutoML, in which Forecast finds the most optimized and accurate model for your dataset.

  1. On the Train predictor page, for Algorithm selection, select Automatic (AutoML).
  2. After your predictor is trained using AutoML, choose the predictor to see more details on the chosen algorithm.
  3. On the predictor’s details page, in the Algorithm metrics section, choose different algorithms from the drop-down menu to view their accuracy for comparison.
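The comparison in the Algorithm metrics section can also be approximated in code from the response of the `get_accuracy_metrics` call of the Boto3 `forecast` client. The sketch below ranks algorithms by their mean weighted quantile loss. The response shape it assumes (`PredictorEvaluationResults`, `TestWindows`, `Metrics`, `WeightedQuantileLosses`) reflects my reading of the API reference and should be checked against a real response; averaging across quantiles is a simplification chosen for this example.

```python
def mean_weighted_quantile_loss(test_window):
    """Average the wQuantileLoss values reported for one test window."""
    losses = test_window["Metrics"]["WeightedQuantileLosses"]
    return sum(entry["LossValue"] for entry in losses) / len(losses)


def rank_algorithms(metrics_response):
    """Return (mean loss, algorithm ARN) pairs sorted from best to worst."""
    ranking = []
    for result in metrics_response["PredictorEvaluationResults"]:
        windows = result["TestWindows"]
        # Prefer the summary window when present; otherwise use the first one.
        summary = next(
            (w for w in windows if w.get("EvaluationType") == "SUMMARY"),
            windows[0],
        )
        ranking.append((mean_weighted_quantile_loss(summary), result["AlgorithmArn"]))
    return sorted(ranking)


# Usage (requires AWS credentials and a predictor trained with AutoML):
#   import boto3
#   forecast = boto3.client("forecast")
#   response = forecast.get_accuracy_metrics(PredictorArn="<predictor ARN>")
#   for loss, algorithm_arn in rank_algorithms(response):
#       print(algorithm_arn, loss)
```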

Tips and best practices

As you begin to experiment with CNNs and build your demand planning solutions on top of Forecast, consider the following tips and best practices:

  • For experimentation, start by identifying the item IDs most important to your business for which you want to improve forecasting accuracy. Measure the accuracy of your existing forecasting methodology as a baseline.
  • Use Forecast with only your target time series and assess the wQuantileLoss accuracy metric. We recommend selecting AutoML in Forecast to find the most optimized and accurate model for your data. For more information, see Evaluating Predictor Accuracy.
  • AutoML optimizes for accuracy and not training time, so AutoML may take longer to optimize your model. If training time is a concern for you, we recommend manually selecting CNN-QR and assessing its accuracy and training time. A slight degradation in accuracy may be an acceptable trade-off for considerable gains in training time.
  • After you see an increase in accuracy over your baseline, we recommend experimenting to find the right forecasting quantile that balances your under-forecasting and over-forecasting costs to your business.
  • We recommend deploying your model as a continuous workload within your systems to start reaping the benefits of more accurate forecasts. You can continue to experiment by adding related time series and item metadata to further improve the accuracy.
  • Incrementally add related time series or item metadata to train your model to assess whether additional information improves accuracy. Different combinations of related time series and item metadata can give you different results.
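To build intuition for the wQuantileLoss metric mentioned in the tips above, and to score an existing baseline on the same footing, the following sketch computes a weighted quantile loss by hand. The formula used here, twice the summed pinball loss divided by the summed absolute actuals, is my understanding of the definition in the Forecast documentation; the sample numbers are made up.

```python
def weighted_quantile_loss(actuals, predictions, tau):
    """wQL at quantile tau: 2 * sum(pinball loss) / sum(|actual|)."""
    if len(actuals) != len(predictions):
        raise ValueError("actuals and predictions must have the same length")
    total_demand = sum(abs(y) for y in actuals)
    if total_demand == 0:
        raise ValueError("sum of absolute actuals must be non-zero")
    loss = 0.0
    for y, q in zip(actuals, predictions):
        # Under-forecasts are weighted by tau, over-forecasts by (1 - tau).
        loss += tau * max(y - q, 0) + (1 - tau) * max(q - y, 0)
    return 2 * loss / total_demand


# At tau = 0.9, under-forecasting by 2 units costs far more than
# over-forecasting by the same amount.
under = weighted_quantile_loss([10], [8], 0.9)
over = weighted_quantile_loss([10], [12], 0.9)
print(under > over)
```

A lower value is better, so a candidate model improves on the baseline at a given quantile when its loss is smaller on the same held-out data.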


The new CNN algorithm is available in all Regions where Forecast is publicly available. For more information about Region availability, see Region Table. For more information about the CNN algorithm, see CNN-QR algorithm documentation.

About the authors

Namita Das is a Sr. Product Manager for Amazon Forecast. Her current focus is to democratize machine learning by building no-code/low-code ML services. She frequently advises startups and has started dabbling in baking.

Danielle Robinson is an Applied Scientist on the Amazon Forecast team. Her research is in time series forecasting and in particular how we can apply new neural network-based algorithms within Amazon Forecast. Her thesis research was focused on developing new, robust, and physically accurate numerical models for computational fluid dynamics. Her hobbies include cooking, swimming, and hiking.

Aaron Spieler is a working student in the Amazon Forecast team. He is starting his master’s degree at the University of Tuebingen, and studied Data Engineering at Hasso Plattner Institute after obtaining a BS in Computer Science from University of Potsdam. His research interests span time series forecasting (especially using neural network models), machine learning, and computational neuroscience.

Gunjan Garg is a Sr. Software Development Engineer in the AWS Vertical AI team. In her current role at Amazon Forecast, she focuses on engineering problems and enjoys building scalable systems that provide the most value to end-users. In her free time, she enjoys playing Sudoku and Minesweeper.

Chinmay Bapat is a Software Development Engineer in the Amazon Forecast team. His interests lie in the applications of machine learning and building scalable distributed systems. Outside of work, he enjoys playing board games and cooking.




Securing Amazon Comprehend API calls with AWS PrivateLink




Amazon Comprehend now supports Amazon Virtual Private Cloud (Amazon VPC) endpoints via AWS PrivateLink so you can securely initiate API calls to Amazon Comprehend from within your VPC and avoid using the public internet.

Amazon Comprehend is a fully managed natural language processing (NLP) service that uses machine learning (ML) to find meaning and insights in text. You can use Amazon Comprehend to analyze text documents and identify insights such as sentiment, people, brands, places, and topics in text. No ML expertise required.

Using AWS PrivateLink, you can access Amazon Comprehend easily and securely by keeping your network traffic within the AWS network, while significantly simplifying your internal network architecture. It enables you to privately access Amazon Comprehend APIs from your VPC in a scalable manner by using interface VPC endpoints. A VPC endpoint is an elastic network interface in your subnet with a private IP address that serves as the entry point for all Amazon Comprehend API calls.

In this post, we show you how to set up a VPC endpoint and enforce the use of this private connectivity for all requests to Amazon Comprehend using AWS Identity and Access Management (IAM) policies.


For this example, you should have an AWS account and sufficient access to create resources in the following services:

Solution overview

The walkthrough includes the following high-level steps:

  1. Deploy your resources.
  2. Create VPC endpoints.
  3. Enforce private connectivity with IAM.
  4. Use Amazon Comprehend via AWS PrivateLink.

Deploying your resources

For your convenience, we have supplied an AWS CloudFormation template to automate the creation of all prerequisite AWS resources. We use the us-east-2 Region in this post, so the console and URLs may differ depending on the Region you select. To use this template, complete the following steps:

  1. Choose Launch Stack:
  2. Confirm the following parameters, which you can leave at the default values:
    1. SubnetCidrBlock1 – The primary IPv4 CIDR block assigned to the first subnet. The default value is
    2. SubnetCidrBlock2 – The primary IPv4 CIDR block assigned to the second subnet. The default value is
  3. Acknowledge that AWS CloudFormation may create additional IAM resources.
  4. Choose Create stack.

The creation process should take roughly 10 minutes to complete.

The CloudFormation template creates the following resources on your behalf:

  • A VPC with two private subnets in separate Availability Zones
  • VPC endpoints for private Amazon S3 and Amazon Comprehend API access
  • IAM roles for use by Lambda and Amazon Comprehend
  • An IAM policy to enforce the use of VPC endpoints to interact with Amazon Comprehend
  • An IAM policy for Amazon Comprehend to access data in Amazon S3
  • An S3 bucket for storing open-source data

The next two sections detail how to manually create a VPC endpoint for Amazon Comprehend and enforce usage with an IAM policy. If you deployed the CloudFormation template and prefer to skip to testing the API calls, you can advance to the Using Amazon Comprehend via AWS PrivateLink section.

Creating VPC endpoints

To create a VPC endpoint, complete the following steps:

  1. On the Amazon VPC console, choose Endpoints.
  2. Choose Create Endpoint.
  3. For Service category, select AWS services.
  4. For Service Name, choose
  5. For VPC, enter the VPC you want to use.
  6. For Availability Zone, select your preferred Availability Zones.
  7. For Enable DNS name, select Enable for this endpoint.

This creates a private hosted zone that enables you to access the resources in your VPC using custom DNS domain names, instead of using private IPv4 addresses or private DNS hostnames provided by AWS. The Amazon Comprehend DNS hostname that the AWS Command Line Interface (AWS CLI) and Amazon Comprehend SDKs use by default then resolves to your VPC endpoint.

  8. For Security group, choose the security group to associate with the endpoint network interface.

If you don’t specify a security group, the default security group for your VPC is associated.

  9. Choose Create Endpoint.

When the Status changes to available, your VPC endpoint is ready for use.

  10. Choose the Policy tab to apply more restrictive access control to the VPC endpoint.

The following example policy limits VPC endpoint access to an IAM role used by a Lambda function in our deployment. You should apply the principle of least privilege when defining your own policy. For more information, see Controlling access to services with VPC endpoints.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "comprehend:DetectEntities",
        "comprehend:CreateDocumentClassifier"
      ],
      "Resource": [
        "*"
      ],
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "arn:aws:iam::#########:role/ComprehendPrivateLink-LambdaExecutionRole"
        ]
      }
    }
  ]
}
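The console steps above can also be scripted. The following Python sketch is a minimal illustration, assuming boto3 is installed and AWS credentials are configured. It builds the same style of endpoint policy and the keyword arguments for the EC2 CreateVpcEndpoint call. The function names are hypothetical, and the service name is left as a parameter because it depends on your Region; use the value shown for Amazon Comprehend in the Service Name list.

```python
import json


def build_endpoint_policy(role_arn, actions):
    """Endpoint policy that allows only the given actions for one IAM role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(actions),
                "Resource": ["*"],
                "Principal": {"AWS": [role_arn]},
            }
        ],
    }


def build_endpoint_request(vpc_id, service_name, subnet_ids,
                           security_group_ids, policy):
    """Keyword arguments for an interface endpoint with private DNS enabled."""
    return {
        "VpcEndpointType": "Interface",
        "VpcId": vpc_id,
        "ServiceName": service_name,
        "SubnetIds": list(subnet_ids),
        "SecurityGroupIds": list(security_group_ids),
        "PrivateDnsEnabled": True,
        # The API expects the policy as a JSON string, not a dict.
        "PolicyDocument": json.dumps(policy),
    }


def create_endpoint(request, region_name="us-east-2"):
    """Send the request with boto3 (imported lazily) and return the new ID."""
    import boto3

    ec2 = boto3.client("ec2", region_name=region_name)
    response = ec2.create_vpc_endpoint(**request)
    return response["VpcEndpoint"]["VpcEndpointId"]
```

Keeping the request construction separate from the API call makes it possible to review or log the exact policy before anything is created.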

Enforcing private connectivity with IAM

To allow or deny access to Amazon Comprehend based on the use of a VPC endpoint, we include an aws:sourceVpce condition in the IAM policy. The following example policy provides access specifically to the DetectEntities and CreateDocumentClassifier APIs only when the request utilizes your VPC endpoint. You can include additional Amazon Comprehend APIs in the “Action” section of the policy or use “comprehend:*” to include them all. You can attach this policy to an IAM role to enable compute resources hosted within your VPC to interact with Amazon Comprehend.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ComprehendEnforceVpce",
      "Effect": "Allow",
      "Action": [
        "comprehend:CreateDocumentClassifier",
        "comprehend:DetectEntities"
      ],
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "aws:SourceVpce": "vpce-xxxxxxxx"
        }
      }
    },
    {
      "Sid": "PassRole",
      "Effect": "Allow",
      "Action": "iam:PassRole",
      "Resource": "arn:aws:iam::#########:role/ComprehendDataAccessRole"
    }
  ]
}

Replace the VPC endpoint ID with the ID of the endpoint you created earlier. Permission to invoke the PassRole API is required for asynchronous operations in Amazon Comprehend such as CreateDocumentClassifier, and should be scoped to your specific data access role.
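To make the effect of the aws:SourceVpce condition concrete, the following Python sketch generates the policy for a given endpoint ID and runs it through a deliberately simplified evaluator. This is a toy model written for illustration, not how IAM evaluates policies: it handles only Allow statements, exact action names or a service-wide wildcard, and a StringEquals test on aws:SourceVpce, and it ignores resources and explicit denies. Both function names are hypothetical.

```python
def build_enforcement_policy(vpce_id, data_access_role_arn, actions):
    """Identity policy that allows the actions only through one VPC endpoint."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ComprehendEnforceVpce",
                "Effect": "Allow",
                "Action": sorted(actions),
                "Resource": "*",
                "Condition": {"StringEquals": {"aws:SourceVpce": vpce_id}},
            },
            {
                "Sid": "PassRole",
                "Effect": "Allow",
                "Action": "iam:PassRole",
                "Resource": data_access_role_arn,
            },
        ],
    }


def is_allowed(policy, action, source_vpce=None):
    """Toy check: does any Allow statement match the action and endpoint?"""
    service = action.split(":")[0]
    for statement in policy["Statement"]:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement["Action"]
        if isinstance(actions, str):
            actions = [actions]
        if action not in actions and f"{service}:*" not in actions:
            continue
        condition = statement.get("Condition", {}).get("StringEquals", {})
        required_vpce = condition.get("aws:SourceVpce")
        # No condition means the statement applies regardless of the network path.
        if required_vpce is None or required_vpce == source_vpce:
            return True
    return False
```

In this model, a DetectEntities request is allowed only when source_vpce matches the endpoint ID in the policy; the same request arriving over the public internet carries no endpoint ID and falls through to an implicit deny.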

Using Amazon Comprehend via AWS PrivateLink

To start using Amazon Comprehend with AWS PrivateLink, you perform the following high-level steps:

  1. Review the Lambda function for API testing.
  2. Create the DetectEntities test event.
  3. Train a custom classifier.

Reviewing the Lambda function

To review your Lambda function, on the Lambda console, choose the Lambda function that contains ComprehendPrivateLink in its name.

The VPC section of the Lambda console provides links to the various networking components automatically created for you during the CloudFormation deployment.

The function code includes a sample program that takes user input to invoke the specific Amazon Comprehend APIs supported by our example IAM policy.
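The deployed function's source is not reproduced in this post. As a rough idea of what such a dispatcher could look like, the following Python sketch routes an event to one of the two Amazon Comprehend APIs based on its comprehend_api field. The event keys match the test events used later in this walkthrough; the function names and the BUCKET_NAME and DATA_ACCESS_ROLE_ARN environment variables are assumptions made for illustration.

```python
def handle_event(event, client, bucket=None, data_access_role_arn=None):
    """Route a test event to the matching Amazon Comprehend API call."""
    api = event["comprehend_api"]
    if api == "DetectEntities":
        return client.detect_entities(
            Text=event["text"],
            LanguageCode=event["language_code"],
        )
    if api == "CreateDocumentClassifier":
        key = event["training_data_s3_key"]
        return client.create_document_classifier(
            DocumentClassifierName=event["custom_classifier_name"],
            DataAccessRoleArn=data_access_role_arn,
            InputDataConfig={"S3Uri": f"s3://{bucket}/{key}"},
            LanguageCode=event["language_code"],
        )
    raise ValueError(f"Unsupported comprehend_api: {api!r}")


def lambda_handler(event, context):
    """Lambda entry point; boto3 is available in the Lambda Python runtime."""
    import os

    import boto3

    client = boto3.client("comprehend")
    return handle_event(
        event,
        client,
        bucket=os.environ.get("BUCKET_NAME"),  # hypothetical variable name
        data_access_role_arn=os.environ.get("DATA_ACCESS_ROLE_ARN"),  # hypothetical
    )
```

Passing the client in as a parameter keeps the routing logic testable without network access.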

Creating a test event

In this section, we create an event to detect entities within sample text using a pretrained model.

  1. From the Test drop-down menu, choose Create new test event.
  2. For Event name, enter a name (for example, DetectEntities).
  3. Replace the event JSON with the following code:
    {
      "comprehend_api": "DetectEntities",
      "language_code": "en",
      "text": "Amazon.com, Inc. is located in Seattle, WA and was founded July 5th, 1994 by Jeff Bezos, allowing customers to buy everything from books to blenders."
    }

  4. Choose Save to store the test event.
  5. Choose Save to update the Lambda function.
  6. Choose Test to invoke the DetectEntities API.

The response should include results similar to the following code:

{
  "Entities": [
    {
      "Score": 0.9266431927680969,
      "Type": "ORGANIZATION",
      "Text": "Amazon.com, Inc.",
      "BeginOffset": 0,
      "EndOffset": 16
    },
    {
      "Score": 0.9952651262283325,
      "Type": "LOCATION",
      "Text": "Seattle, WA",
      "BeginOffset": 31,
      "EndOffset": 42
    },
    {
      "Score": 0.9998188018798828,
      "Type": "DATE",
      "Text": "July 5th, 1994",
      "BeginOffset": 59,
      "EndOffset": 73
    },
    {
      "Score": 0.9999810457229614,
      "Type": "PERSON",
      "Text": "Jeff Bezos",
      "BeginOffset": 77,
      "EndOffset": 87
    }
  ]
}

You can update the test event to identify entities from your own text.
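Each entity in the response carries a BeginOffset and EndOffset that index into the submitted text, with the end offset exclusive. The following Python sketch shows how to recover each entity's text from those offsets. It assumes the sample sentence begins with "Amazon.com, Inc.", which is what the reported offsets of 0 and 16 imply; the entity_spans helper is a hypothetical name.

```python
SAMPLE_TEXT = (
    "Amazon.com, Inc. is located in Seattle, WA and was founded "
    "July 5th, 1994 by Jeff Bezos, allowing customers to buy "
    "everything from books to blenders."
)

# Offsets copied from the example response; scores are omitted for brevity.
ENTITIES = [
    {"Type": "ORGANIZATION", "BeginOffset": 0, "EndOffset": 16},
    {"Type": "LOCATION", "BeginOffset": 31, "EndOffset": 42},
    {"Type": "DATE", "BeginOffset": 59, "EndOffset": 73},
    {"Type": "PERSON", "BeginOffset": 77, "EndOffset": 87},
]


def entity_spans(text, entities):
    """Return (type, substring) pairs by slicing text with each entity's offsets."""
    return [
        (entity["Type"], text[entity["BeginOffset"]:entity["EndOffset"]])
        for entity in entities
    ]


for entity_type, span in entity_spans(SAMPLE_TEXT, ENTITIES):
    print(f"{entity_type}: {span}")
```

Slicing with the offsets is useful when you need to highlight or redact entities in the original document rather than work with the extracted strings alone.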

Training a custom classifier

We now demonstrate how to build a custom classifier. For training data, we use a version of the Yahoo answers corpus that is preprocessed into the format expected by Amazon Comprehend. This corpus, available on the AWS Open Data Registry, is cited in the paper Text Understanding from Scratch by Xiang Zhang and Yann LeCun. It is also used in the post Building a custom classifier using Amazon Comprehend.

  1. Retrieve the training data from Amazon S3.
  2. On the Amazon S3 console, choose the example S3 bucket created for you.
  3. Choose Upload and add the file you retrieved.
  4. Choose the uploaded object and note the Key.
  5. Return to the test function on the Lambda console.
  6. From the Test drop-down menu, choose Create new test event.
  7. For Event name, enter a name (for example, TrainCustomClassifier).
  8. Replace the event input with the following code:
    {
      "comprehend_api": "CreateDocumentClassifier",
      "custom_classifier_name": "custom-classifier-example",
      "language_code": "en",
      "training_data_s3_key": "comprehend-train.csv"
    }

  9. If you changed the default file name, update the training_data_s3_key to match.
  10. Choose Save to store the test event.
  11. Choose Save to update the Lambda function.
  12. Choose Test to invoke the CreateDocumentClassifier API.

The response should include results similar to the following code:

{
  "DocumentClassifierArn": "arn:aws:comprehend:us-east-2:0123456789:document-classifier/custom-classifier-example"
}

  13. On the Amazon Comprehend console, choose Custom classification to check the status of the document classifier training.

After approximately 20 minutes, the document classifier is trained and available for use.
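Instead of watching the console, you can poll the training status programmatically. The following Python sketch is a minimal example, assuming a boto3 Amazon Comprehend client is passed in and that TRAINED, IN_ERROR, and STOPPED are the statuses at which polling should stop. The function name, polling interval, and retry limit are illustrative choices, not values from the original post.

```python
import time

# Statuses treated as final in this sketch; other values mean "keep waiting".
TERMINAL_STATUSES = {"TRAINED", "IN_ERROR", "STOPPED"}


def wait_for_classifier(client, classifier_arn, poll_seconds=60,
                        max_polls=60, sleep=time.sleep):
    """Poll DescribeDocumentClassifier until training reaches a final status."""
    for _ in range(max_polls):
        response = client.describe_document_classifier(
            DocumentClassifierArn=classifier_arn
        )
        status = response["DocumentClassifierProperties"]["Status"]
        if status in TERMINAL_STATUSES:
            return status
        sleep(poll_seconds)
    raise TimeoutError(
        f"Classifier {classifier_arn} did not finish after {max_polls} polls"
    )
```

When run from a resource inside the VPC with a role that permits comprehend:DescribeDocumentClassifier, these polling requests also travel through the VPC endpoint. Note that the example IAM policy in this post does not include that action, so you would need to add it.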

Cleaning up

To avoid incurring future charges, delete the resources you created during this walkthrough after concluding your testing.

  1. On the Amazon Comprehend console, delete the custom classifier.
  2. On the Amazon S3 console, empty the bucket created for you.
  3. If you launched the automated deployment, on the AWS CloudFormation console, delete the appropriate stack.

The deletion process takes approximately 10 minutes.


You have now successfully invoked Amazon Comprehend APIs using AWS PrivateLink. The use of IAM policies prevents requests from leaving your VPC and further improves your security posture. You can extend this solution to securely test additional features like Amazon Comprehend custom entity recognition real-time endpoints.

All Amazon Comprehend API calls are now supported via AWS PrivateLink. This feature exists in all commercial Regions where AWS PrivateLink and Amazon Comprehend are available. To learn more about securing Amazon Comprehend, see Security in Amazon Comprehend.

About the Authors

Dave Williams is a Cloud Consultant for AWS Professional Services. He works with public sector customers to securely adopt AI/ML services. In his free time, he enjoys spending time with his family, traveling, and watching college football.

Adarsha Subick is a Cloud Consultant for AWS Professional Services based out of Virginia. He works with public sector customers to help solve their AI/ML-focused business problems. In his free time, he enjoys archery and hobby electronics.

Saman Zarandioon is a Sr. Software Development Engineer for Amazon Comprehend. He earned a PhD in Computer Science from Rutgers University.

