

Facial recognition tech: risks, regulations and future startup opportunities in the EU




Facial recognition differs from conventional camera surveillance: rather than passively recording, it identifies an individual by comparing newly captured images with images saved in a database.

The status in Europe

Although facial recognition is not yet specifically regulated in Europe, it is covered by the General Data Protection Regulation (GDPR) as a means of collecting and processing personal biometric data, including facial data and fingerprints. Therefore, facial recognition is only lawful under the criteria of the GDPR.

Biometric data provides a high level of accuracy when identifying an individual, owing to the uniqueness of identifiers such as the facial image or fingerprint, and it has great potential to improve business security.

The processing of biometric data, which is considered sensitive data, is in principle prohibited, with some exceptions: for reasons of substantial public interest, to protect the vital interests of the data subject or another person, or where the data subject has given explicit consent, to name a few.

Moreover, factors such as proportionality and power imbalance are considered when determining whether an exception is valid. For instance, facial recognition can be considered disproportionate for tracking attendance in a school, since less intrusive options are available. Even when the data subject has explicitly consented to the processing of biometric data, consideration should be given to potential imbalances of power between the individual and the institution processing the data. In a student-and-school scenario, for example, there could be doubts as to whether the consent of a student's parents to the use of facial recognition techniques is freely given in the manner intended by the GDPR, and therefore whether it constitutes a valid exception to the prohibition on processing.

One of the challenges in this field is that the underlying technology used for facial recognition, such as AI, can present serious risks of bias and discrimination, affecting many people without the social control mechanisms that govern human behaviour. Bias and discrimination are inherent risks of any societal or economic activity, and human decision-making is not immune to mistakes and biases; however, the same bias, when embedded in AI, can have a much larger effect.

Authentication vs. identification

Biometrics for authentication (described as a security mechanism) is not the same as remote biometric identification (used, for instance, in airports or public spaces to identify multiple persons at a distance and in a continuous manner by checking them against data stored in a database).
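The distinction can be illustrated with a small, purely illustrative Python sketch (toy three-dimensional "embeddings" and a made-up threshold, not any production face recognition API): authentication is a 1:1 comparison against one enrolled template, while identification is a 1:N search over a database.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def authenticate(probe, enrolled_template, threshold=0.8):
    """1:1 check: does the probe match the single enrolled template?"""
    return cosine_similarity(probe, enrolled_template) >= threshold

def identify(probe, database, threshold=0.8):
    """1:N search: find the best-matching identity in a database, if any."""
    best_id, best_score = None, threshold
    for identity, template in database.items():
        score = cosine_similarity(probe, template)
        if score >= best_score:
            best_id, best_score = identity, score
    return best_id

# Toy templates; real systems use embeddings with hundreds of dimensions.
db = {"alice": [0.9, 0.1, 0.0], "bob": [0.0, 0.9, 0.4]}
probe = [0.88, 0.12, 0.02]
print(authenticate(probe, db["alice"]))  # True: 1:1 verification
print(identify(probe, db))               # alice: 1:N identification
```

The remote-identification case is the 1:N search running continuously over faces captured at a distance, which is why regulators treat it as far more intrusive than a 1:1 unlock check.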

The collection and use of biometric information for facial recognition and identification in public spaces carries specific risks for fundamental rights. In fact, the European Commission (EC) has warned that remote biometric identification is the most intrusive form of facial recognition and is in principle prohibited in Europe.

So where is all this going?

What should prevail: the protection of fundamental rights, or the advancement that comes with invasive and overpowering new technologies?

New technologies, like AI, bring some benefits, such as technological advancement and more efficiency and economic growth, but at what cost?

Using a risk-based approach the EC has considered the use of AI for remote biometric identification and other intrusive surveillance technologies to be high-risk, since it could compromise fundamental rights such as human dignity, non-discrimination and privacy protection.

The EU Commission is currently investigating whether additional safeguards are needed or whether facial recognition should not be allowed in certain cases, or certain areas, opening the door for a debate regarding the scenarios that could justify the use of facial recognition for remote biometric identification.

Artificial intelligence entails great benefits but also several potential risks, such as opaque decision-making, gender-based or other kinds of discrimination, intrusion in our private lives or being used for criminal purposes.

To address these challenges, the Commission, in its white paper on AI issued in February this year, has proposed a new regulatory framework for high-risk AI and a prior conformity assessment, including testing and certification of high-risk AI facial recognition systems, to ensure that they abide by EU standards and requirements.

The regulatory framework will include additional mandatory legal requirements related to training data, record-keeping, transparency, accuracy, oversight and application-based use, and specific requirements for some AI applications, specifically those designed for remote biometric facial recognition.

We should therefore expect new regulation, aimed at an AI system framework that complies with current legislation and does not compromise fundamental rights.

Opportunities for startups?

Facial recognition technologies are here to stay, so if you are thinking about changing your hair colour, watch out: your phone might not recognize you! At the speed facial recognition is growing, we should not have to wait too long for new forms of ‘selfie payment’.

Facial recognition is already being used quite successfully in several areas, among them:

  1. Health: Where, thanks to face analysis, it is already possible to track patients’ use of medication more accurately;
  2. Marketing and retail: Where facial recognition promises the most, as ‘knowing your customer’ is a hot topic. This means placing cameras in retail outlets to analyze shopper behavior and improve the customer experience, subject of course to the corresponding privacy checks; and,
  3. Security and law enforcement: That is, to find missing children, identify and track criminals or accelerate investigations.

With lots of choices on the horizon for facial recognition, it remains to be seen whether European startups will lead new innovations in this area.



Trump pardons former Google self-driving car engineer Anthony Levandowski




(Reuters) — U.S. President Donald Trump said on Wednesday he had given a full pardon to a former Google engineer sentenced for stealing a trade secret on self-driving cars months before he briefly headed Uber’s rival unit.

Anthony Levandowski, 40, was sentenced in August to 18 months in prison after pleading guilty in March. He was not in custody but a judge had said he could enter custody once the COVID-19 pandemic subsided.

The White House said Levandowski had “paid a significant price for his actions and plans to devote his talents to advance the public good.”

Alphabet’s Waymo, a self-driving auto technology unit spun out of Google, declined to comment. The company previously described Levandowski’s crime as “a betrayal” and his sentence as “a win for trade secret laws.”

The pardon was backed by several leaders in the technology industry who have supported Trump, including investors Peter Thiel and Blake Masters and entrepreneur Palmer Luckey, according to the White House.

Levandowski transferred more than 14,000 Google files, including development schedules and product designs, to his personal laptop before he left, and while negotiating a new role with Uber.




Building, automating, managing, and scaling ML workflows using Amazon SageMaker Pipelines




We recently announced Amazon SageMaker Pipelines, the first purpose-built, easy-to-use continuous integration and continuous delivery (CI/CD) service for machine learning (ML). SageMaker Pipelines is a native workflow orchestration tool for building ML pipelines that take advantage of direct Amazon SageMaker integration. Three components improve the operational resilience and reproducibility of your ML workflows: pipelines, model registry, and projects. These workflow automation components enable you to easily scale your ability to build, train, test, and deploy hundreds of models in production, iterate faster, reduce errors due to manual orchestration, and build repeatable mechanisms.

SageMaker projects introduce MLOps templates that automatically provision the underlying resources needed to enable CI/CD capabilities for your ML development lifecycle. You can use a number of built-in templates or create your own custom template. You can use SageMaker Pipelines independently to create automated workflows; however, when used in combination with SageMaker projects, the additional CI/CD capabilities are provided automatically. The following screenshot shows how the three components of SageMaker Pipelines can work together in an example SageMaker project.

This post focuses on using an MLOps template to bootstrap your ML project and establish a CI/CD pattern from sample code. We show how to use the built-in build, train, and deploy project template as a base for a customer churn classification example. This base template enables CI/CD for training ML models, registering model artifacts to the model registry, and automating model deployment with manual approval and automated testing.

MLOps template for building, training, and deploying models

We start by taking a detailed look at what AWS services are launched when this build, train, and deploy MLOps template is launched. Later, we discuss how to modify the skeleton for a custom use case.

To get started with SageMaker projects, you must first enable them on the Amazon SageMaker Studio console. This can be done for existing users or while creating new ones. For more information, see SageMaker Studio Permissions Required to Use Projects.

In SageMaker Studio, you can now choose the Projects menu in the Components and registries menu.

On the projects page, you can launch a preconfigured SageMaker MLOps template. For this post, we choose MLOps template for model building, training, and deployment.

Launching this template starts a model building pipeline by default, and while there is no cost for using SageMaker Pipelines itself, you will be charged for the services launched. Cost varies by Region. A single run of the model build pipeline in us-east-1 is estimated to cost less than $0.50. Models approved for deployment incur the cost of the SageMaker endpoints (test and production) for the Region using an ml.m5.large instance.

After the project is created from the MLOps template, the following architecture is deployed.

Included in the architecture are the following AWS services and resources:

  • The MLOps templates that are made available through SageMaker projects are provided via an AWS Service Catalog portfolio that automatically gets imported when a user enables projects on the Studio domain.
  • Two repositories are added to AWS CodeCommit:
    • The first repository provides scaffolding code to create a multi-step model building pipeline including the following steps: data processing, model training, model evaluation, and conditional model registration based on accuracy. As you can see in the file, this pipeline trains a linear regression model using the XGBoost algorithm on the well-known UCI Abalone dataset. This repository also includes a build specification file, used by AWS CodePipeline and AWS CodeBuild to run the pipeline automatically.
    • The second repository contains code and configuration files for model deployment, as well as test scripts required to pass the quality gate. This repo also uses CodePipeline and CodeBuild, which run an AWS CloudFormation template to create model endpoints for staging and production.
  • Two CodePipeline pipelines:
    • The ModelBuild pipeline automatically triggers and runs the pipeline from end to end whenever a new commit is made to the ModelBuild CodeCommit repository.
    • The ModelDeploy pipeline automatically triggers whenever a new model version is added to the model registry and the status is marked as Approved. Models that are registered with Pending or Rejected statuses aren’t deployed.
  • An Amazon Simple Storage Service (Amazon S3) bucket is created for output model artifacts generated from the pipeline.
  • SageMaker Pipelines uses the following resources:
    • This workflow contains the directed acyclic graph (DAG) that trains and evaluates our model. Each step in the pipeline keeps track of the lineage and intermediate steps can be cached for quickly re-running the pipeline. Outside of templates, you can also create pipelines using the SDK.
    • Within SageMaker Pipelines, the SageMaker model registry tracks the model versions and respective artifacts, including the lineage and metadata for how they were created. Different model versions are grouped together under a model group, and new models registered to the registry are automatically versioned. The model registry also provides an approval workflow for model versions and supports deployment of models in different accounts. You can also use the model registry through the boto3 package.
  • Two SageMaker endpoints:
    • After a model is approved in the registry, the artifact is automatically deployed to a staging endpoint followed by a manual approval step.
    • If approved, it’s deployed to a production endpoint in the same AWS account.

All SageMaker resources, such as training jobs, pipelines, models, and endpoints, as well as AWS resources listed in this post, are automatically tagged with the project name and a unique project ID tag.
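The DAG-with-caching idea behind SageMaker Pipelines, described in the list above, can be illustrated with a small plain-Python sketch. This is conceptual only, not the SageMaker Pipelines API: named steps are executed in dependency order, and any step whose result is already in the cache is skipped on a re-run.

```python
# Conceptual sketch only: a DAG of named steps executed in dependency
# order, with results cached so unchanged steps are skipped on a re-run.

def run_pipeline(steps, dependencies, cache=None):
    """steps: {name: callable}; dependencies: {name: [upstream names]}."""
    cache = cache if cache is not None else {}
    order, visited = [], set()

    def visit(name):
        # Depth-first topological sort: visit upstream steps first.
        if name in visited:
            return
        visited.add(name)
        for dep in dependencies.get(name, []):
            visit(dep)
        order.append(name)

    for name in steps:
        visit(name)

    for name in order:
        if name in cache:  # cached step: skip re-execution
            continue
        upstream = {d: cache[d] for d in dependencies.get(name, [])}
        cache[name] = steps[name](upstream)
    return cache

steps = {
    "process": lambda up: "features",
    "train": lambda up: f"model({up['process']})",
    "evaluate": lambda up: f"metrics({up['train']})",
}
deps = {"train": ["process"], "evaluate": ["train"]}
results = run_pipeline(steps, deps)
print(results["evaluate"])  # metrics(model(features))
```

Passing the returned cache back into a second `run_pipeline` call mimics how unchanged pipeline steps are reused instead of recomputed.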

Modifying the sample code for a custom use case

After your project has been created, the architecture described earlier is deployed and the visualization of the pipeline is available on the Pipelines drop-down menu within SageMaker Studio.

To modify the sample code from this launched template, we first need to clone the CodeCommit repositories to our local SageMaker Studio instance. From the list of projects, choose the one that was just created. On the Repositories tab, you can select the hyperlinks to locally clone the CodeCommit repos.

ModelBuild repo

The ModelBuild repository contains the code for preprocessing, training, and evaluating the model. The sample code trains and evaluates a model on the UCI Abalone dataset. We can modify these files to solve our own customer churn use case. See the following code:

|-- codebuild-buildspec.yml
|-- pipelines
| |-- abalone
| | |--
| | |--
| | |--
| | |--
| |--
| |--
| |--
| |--
| |--
|-- sagemaker-pipelines-project.ipynb
|-- setup.cfg
|-- tests
| --
|-- tox.ini

We now need a dataset accessible to the project.

  1. Open a new SageMaker notebook inside Studio and run the following cells:
    !unzip -o
    !mv "Data sets" Datasets

    import os
    import boto3
    import sagemaker

    prefix = 'sagemaker/DEMO-xgboost-churn'
    region = boto3.Session().region_name
    default_bucket = sagemaker.session.Session().default_bucket()
    role = sagemaker.get_execution_role()

    RawData = boto3.Session().resource('s3') \
        .Bucket(default_bucket) \
        .Object(os.path.join(prefix, 'data/RawData.csv')) \
        .upload_file('./Datasets/churn.txt')

    print(os.path.join("s3://", default_bucket, prefix, 'data/RawData.csv'))

  2. Rename the abalone directory to customer_churn. This requires us to modify the path inside codebuild-buildspec.yml as shown in the sample repository. See the following code:
    run-pipeline --module-name pipelines.customer_churn.pipeline 

  3. Replace the code with the customer churn preprocessing script found in the sample repository.
  4. Replace the code with the customer churn pipeline script found in the sample repository.
    1. Be sure to replace the “InputDataUrl” default parameter with the Amazon S3 URL obtained in step 1:
      input_data = ParameterString(
          name="InputDataUrl",
          default_value="s3://YOUR_BUCKET/RawData.csv",
      )

    2. Update the conditional step to evaluate the classification model:
      # Conditional step for evaluating model quality and branching execution
      cond_lte = ConditionGreaterThanOrEqualTo(
          left=JsonGet(
              step=step_eval,
              property_file=evaluation_report,
              json_path="binary_classification_metrics.accuracy.value",
          ),
          right=0.8,
      )

    One last thing to note is that the default ModelApprovalStatus is set to PendingManualApproval. If our model has greater than 80% accuracy, it’s added to the model registry, but not deployed until manual approval is complete.

  5. Replace the code with the customer churn evaluation script found in the sample repository. One piece of the code we’d like to point out is that, because we’re evaluating a classification model, we need to update the metrics we’re evaluating and associating with trained models:
    report_dict = {
        "binary_classification_metrics": {
            "accuracy": {"value": acc, "standard_deviation": "NaN"},
            "auc": {"value": roc_auc, "standard_deviation": "NaN"},
        },
    }

    evaluation_output_path = '/opt/ml/processing/evaluation/evaluation.json'
    with open(evaluation_output_path, 'w') as f:
        f.write(json.dumps(report_dict))

The JSON structure of these metrics is required to match the format of sagemaker.model_metrics for complete integration with the model registry. 

ModelDeploy repo

The ModelDeploy repository contains the AWS CloudFormation buildspec for the deployment pipeline. We don’t make any modifications to this code because it’s sufficient for our customer churn use case. It’s worth noting that model tests can be added to this repo to gate model deployment. See the following code:

├── buildspec.yml
├── endpoint-config-template.yml
├── prod-config.json
├── staging-config.json
└── test
    ├── buildspec.yml
    └──

Triggering a pipeline run

Committing these changes to the CodeCommit repository (easily done on the Studio source control tab) triggers a new pipeline run, because an Amazon EventBridge event monitors for commits. After a few moments, we can monitor the run by choosing the pipeline inside the SageMaker project.

 The following screenshot shows our pipeline details. Choosing the pipeline run displays the steps of the pipeline, which you can monitor.

When the pipeline is complete, you can go to the Model groups tab inside the SageMaker project and inspect the metadata attached to the model artifacts.

If everything looks good, we can manually approve the model.

This approval triggers the ModelDeploy pipeline and exposes an endpoint for real-time inference.
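The register-then-approve-then-deploy flow just described can be sketched in plain Python. This is illustrative only, not the SageMaker SDK; the function and field names are hypothetical, while the 80% accuracy gate and the PendingManualApproval/Approved statuses come from the steps above.

```python
# Plain-Python sketch of the approval gate described above (hypothetical
# names, not the SageMaker SDK).

def register_model(accuracy, registry, threshold=0.8):
    """A model enters the registry as PendingManualApproval if it clears the gate."""
    if accuracy < threshold:
        return None  # below the accuracy gate: never registered
    entry = {"accuracy": accuracy, "status": "PendingManualApproval"}
    registry.append(entry)
    return entry

def approve(entry, deployed):
    """Manual approval flips the status and triggers deployment."""
    entry["status"] = "Approved"
    deployed.append(entry)  # stands in for the ModelDeploy pipeline run
    return entry

registry, deployed = [], []
register_model(0.74, registry)          # rejected by the 80% gate
model = register_model(0.86, registry)  # registered, pending approval
approve(model, deployed)
print(model["status"], len(deployed))   # Approved 1
```

Models that stay Pending or are Rejected never reach the deploy list, which matches the ModelDeploy trigger condition described earlier.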



SageMaker Pipelines enables teams to leverage best practice CI/CD methods within their ML workflows. In this post, we showed how a data scientist can modify a preconfigured MLOps template for their own modeling use case. Among the many benefits is that the changes to the source code can be tracked, associated metadata can be tied to trained models for deployment approval, and repeated pipeline steps can be cached for reuse. To learn more about SageMaker Pipelines, check out the website and the documentation. Try SageMaker Pipelines in your own workflows today.

About the Authors

Sean Morgan is an AI/ML Solutions Architect at AWS. He previously worked in the semiconductor industry, using computer vision to improve product yield. He later transitioned to a DoD research lab where he specialized in adversarial ML defense and network security. In his free time, Sean is an active open-source contributor and maintainer, and is the special interest group lead for TensorFlow Addons.

Hallie Weishahn is an AI/ML Specialist Solutions Architect at AWS, focused on leading global standards for MLOps. She previously worked as an ML Specialist at Google Cloud Platform. She works with product, engineering, and key customers to build repeatable architectures and drive product roadmaps. She provides guidance and hands-on work to advance and scale machine learning use cases and technologies. Troubleshooting top issues and evaluating existing architectures to enable integrations from PoC to a full deployment is her strong suit.

Shelbee Eigenbrode is an AI/ML Specialist Solutions Architect at AWS. Her current areas of depth include DevOps combined with ML/AI. She has been in technology for 23 years, spanning multiple roles and technologies. With over 35 patents granted across various technology domains, her passion for continuous innovation combined with a love of all things data turned her focus to the field of data science. Combining her backgrounds in data, DevOps, and machine learning, her current passion is helping customers to not only embrace data science but also ensure all models have a path to production by adopting MLOps practices. In her spare time, she enjoys reading and spending time with family, including her fur family (aka dogs), as well as friends.




Labeling mixed-source, industrial datasets with Amazon SageMaker Ground Truth




Prior to using any kind of supervised machine learning (ML) algorithm, data has to be labeled. Amazon SageMaker Ground Truth simplifies and accelerates this task. Ground Truth uses pre-defined templates to assign labels that classify the content of images or videos or verify existing labels. Ground Truth allows you to define workflows for labeling various kinds of data, such as text, video, or images, without writing any code. Although these templates are applicable to a wide range of use cases in which the data to be labeled is in a single format or from a single source, industrial workloads often require labeling data from different sources and in different formats. This post explores the use case of industrial welding data consisting of sensor readings and images to show how to implement customized, complex, mixed-source labeling workflows using Ground Truth.

For this post, you deploy an AWS CloudFormation template in your AWS account to provision the foundational resources to get started with implementing this labeling workflow. This provides you with hands-on experience for the following topics:

  • Creating a private labeling workforce in Ground Truth
  • Creating a custom labeling job using the Ground Truth framework with the following components:
    • Designing a pre-labeling AWS Lambda function that pulls data from different sources and runs a format conversion where necessary
    • Implementing a customized labeling user interface in Ground Truth using crowd templates that dynamically loads the data generated by the pre-labeling Lambda function
    • Consolidating labels from multiple workers using a customized post-labeling Lambda function
  • Configuring a custom labeling job using Ground Truth with a customized interface for displaying multiple pieces of data that have to be labeled as a single item

Prior to diving deep into the implementation, I provide an introduction to the use case and show how the Ground Truth custom labeling framework eases the implementation of highly complex labeling workflows. To make full use of this post, you need an AWS account on which you can deploy CloudFormation templates. The total cost incurred on your account for following this post is under $1.
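The two Lambda hooks listed above can be sketched in plain Python. This is a hedged sketch: the event shapes are simplified. A real Ground Truth pre-labeling event carries one manifest line under "dataObject" and must return a "taskInput" dict for the UI template, and a real post-labeling event points to an S3 payload of worker annotations rather than passing them inline as done here.

```python
from collections import Counter

def pre_labeling_handler(event, context=None):
    """Map one manifest entry to the taskInput consumed by the UI template."""
    data_object = event["dataObject"]
    return {
        "taskInput": {
            "image": {
                "title": data_object.get("title", ""),
                "file": data_object["source-ref"],
            },
        }
    }

def post_labeling_handler(annotations, context=None):
    """Consolidate labels from multiple workers by majority vote."""
    votes = Counter(a["label"] for a in annotations)
    label, _ = votes.most_common(1)[0]
    return {"consolidatedLabel": label}

# Hypothetical manifest entry for one welding dataset item.
event = {"dataObject": {"source-ref": "s3://bucket/images/weld_001.png",
                        "title": "Welding sample 1"}}
print(pre_labeling_handler(event)["taskInput"]["image"]["file"])
print(post_labeling_handler([{"label": "Good Weld"},
                             {"label": "Good Weld"},
                             {"label": "Contamination"}]))
```

The pre-labeling function is also the natural place to pull in data from multiple sources and convert formats, which is exactly what the welding workflow below needs.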

Labeling complex datasets for industrial welding quality control

Although the mechanisms discussed in this post are generally applicable to any labeling workflow with different data formats, I use data from a welding quality control use case. In this use case, the manufacturing company running the welding process wants to predict whether the welding result will be OK or whether a number of anomalies occurred during the process. To implement this using a supervised ML model, you need labeled data with which to train the model, such as datasets representing welding processes that need to be labeled to indicate whether the process was normal or not. We implement this labeling process (not the ML or modeling process) using Ground Truth, which allows welding experts to make assessments about the result of a welding process and assign this result to a dataset consisting of images and sensor data.

The CloudFormation template creates an Amazon Simple Storage Service (Amazon S3) bucket in your AWS account that contains images (prefix images) and CSV files (prefix sensor_data). The images contain pictures taken during an industrial welding process similar to the following, where a welding beam is applied onto a metal surface (for image source, see TIG Stainless Steel 304):

The CSV files contain sensor data representing current, electrode position, and voltage measured by sensors on the welding machine. For the full dataset, see the GitHub repo. A raw sample of this CSV data is as follows:


The first column of the data is a timestamp normalized to the start of the welding process. Each row consists of various sensor values associated with that timestamp: the first value after the timestamp is the electrode position, the second is the current, and the third is the voltage (the other values are irrelevant here). For instance, the row with timestamp 1, recorded 100 milliseconds after the start of the welding process, has an electrode position of 94.79, a current of 1464, and a voltage of 428.
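Parsing such rows is straightforward; the sketch below assumes the column layout just described (timestamp, electrode position, current, voltage, with one timestamp step per 100 ms). The sample rows other than the one quoted in the text are made up for illustration.

```python
import csv
import io

# Hypothetical sample in the described layout; only the timestamp-1 row
# matches values quoted in the text.
sample = """0,94.31,1420,415
1,94.79,1464,428
2,95.02,1501,430"""

readings = []
for row in csv.reader(io.StringIO(sample)):
    timestamp, electrode, current, voltage = row[:4]
    readings.append({
        "t_ms": int(timestamp) * 100,  # one timestamp step is 100 ms
        "electrode_position": float(electrode),
        "current": float(current),
        "voltage": float(voltage),
    })

print(readings[1])
# {'t_ms': 100, 'electrode_position': 94.79, 'current': 1464.0, 'voltage': 430.0}
# (voltage is 428.0; the timestamp-1 row carries the values quoted above)
```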

Because it’s difficult for humans to make assessments using the raw CSV data, I also show how to preprocess such data on the fly for labeling and turn it into more easily readable plots. This way, the welding experts can view the images and the plots to make their assessment about the welding process.

Deploying the CloudFormation template

To simplify the setup and configurations needed in the following, I created a CloudFormation template that deploys several foundations into your AWS account. To start this process, complete the following steps:

  1. Sign in to your AWS account.
  2. Choose one of the following links, depending on which AWS Region you’re using:
  3. Keep all the parameters as they are and select I acknowledge that AWS CloudFormation might create IAM resources with custom names and I acknowledge that AWS CloudFormation might require the following capability: CAPABILITY_AUTO_EXPAND.
  4. Choose Create stack to start the deployment.

The deployment takes about 3–5 minutes, during which time a bucket with data to label, some AWS Lambda functions, and an AWS Identity and Access Management (IAM) role are deployed. The process is complete when the status of the deployment switches to CREATE_COMPLETE.

The Outputs tab has additional information, such as the Amazon S3 path to the manifest file, which you use throughout this post. Therefore, it’s recommended to keep this browser tab open and follow the rest of the post in another tab.

Creating a Ground Truth labeling workforce

Ground Truth offers three options for defining workforces that complete the labeling: Amazon Mechanical Turk, vendor-specific workforces, and private workforces. In this section, we configure a private workforce because we want to complete the labeling ourselves. Create a private workforce with the following steps:

  1. On the Amazon SageMaker console, under Ground Truth, choose Labeling workforces.
  2. On the Private tab, choose Create private team.

  3. Enter a name for the labeling workforce. For our use case, I enter welding-experts.
  4. Select Invite new workers by email.
  5. Enter your e-mail address, an organization name, and a contact e-mail (which may be the same as the one you just entered).
  6. Choose Create private team.

The console confirms the creation of the labeling workforce at the top of the screen. When you refresh the page, the new workforce shows on the Private tab, under Private teams.

You also receive an e-mail with login instructions, including a temporary password and a link to open the login page.

  1. Choose the link and use your e-mail and temporary password to authenticate and change the password for the login.

It’s recommended to keep this browser tab open so you don’t have to log in again. This concludes all necessary steps to create your workforce.

Configuring a custom labeling job

In this section, we create a labeling job and use this job to explain the details and data flow of a custom labeling job.

  1. On the Amazon SageMaker console, under Ground Truth, choose Labeling jobs.
  2. Choose Create labeling job.

  3. Enter a name for your labeling job, such as WeldingLabelJob1.
  4. Choose Manual data setup.
  5. For Input dataset location, enter the ManifestS3Path value from the CloudFormation stack Outputs tab.
  6. For Output dataset location, enter the ProposedOutputPath value from the CloudFormation stack Outputs tab.
  7. For IAM role, choose Enter a custom IAM role ARN.
  8. Enter the SagemakerServiceRoleArn value from the CloudFormation stack Outputs tab.
  9. For the task type, choose Custom.
  10. Choose Next.

The IAM role is a customized role created by the CloudFormation template that allows Ground Truth to invoke Lambda functions and access Amazon S3.

  11. Choose to use a private labeling workforce.
  12. From the drop-down menu, choose the workforce welding-experts.
  13. For task timeout and task expiration time, 1 hour is sufficient.
  14. Leave the number of workers per dataset object at 1.
  15. In the Lambda functions section, for Pre-labeling task Lambda function, choose the function that starts with PreLabelingLambda-.
  16. For Post-labeling task Lambda function, choose the function that starts with PostLabelingLambda-.
  17. Enter the following code into the templates section. This HTML code specifies the interface that workers in the private labeling workforce see when labeling items. For our use case, the template displays four images, and the categories for classifying welding results are as follows:
    <script src=""></script>
    <crowd-form>
      <crowd-classifier
        name="WeldingClassification"
        categories="['Good Weld', 'Burn Through', 'Contamination', 'Lack of Fusion', 'Lack of Shielding Gas', 'High Travel Speed', 'Not sure']"
        header="Please classify the welding process."
      >
        <classification-target>
          <div>
            <h3>Welding Image</h3>
            <p><strong>Welding Camera Image </strong>{{ task.input.image.title }}</p>
            <p><a href="{{ task.input.image.file | grant_read_access }}" target="_blank">Download Image</a></p>
            <p>
              <img style="height: 30vh; margin-bottom: 10px" src="{{ task.input.image.file | grant_read_access }}"/>
            </p>
          </div>
          <hr/>
          <div>
            <h3>Current Graph</h3>
            <p><strong>Current Graph </strong>{{ task.input.current.title }}</p>
            <p><a href="{{ task.input.current.file | grant_read_access }}" target="_blank">Download Current Plot</a></p>
            <p>
              <img style="height: 30vh; margin-bottom: 10px" src="{{ task.input.current.file | grant_read_access }}"/>
            </p>
          </div>
          <hr/>
          <div>
            <h3>Electrode Position Graph</h3>
            <p><strong>Electrode Position Graph </strong>{{ task.input.electrode.title }}</p>
            <p><a href="{{ task.input.electrode.file | grant_read_access }}" target="_blank">Download Electrode Position Plot</a></p>
            <p>
              <img style="height: 30vh; margin-bottom: 10px" src="{{ task.input.electrode.file | grant_read_access }}"/>
            </p>
          </div>
          <hr/>
          <div>
            <h3>Voltage Graph</h3>
            <p><strong>Voltage Graph </strong>{{ task.input.voltage.title }}</p>
            <p><a href="{{ task.input.voltage.file | grant_read_access }}" target="_blank">Download Voltage Plot</a></p>
            <p>
              <img style="height: 30vh; margin-bottom: 10px" src="{{ task.input.voltage.file | grant_read_access }}"/>
            </p>
          </div>
        </classification-target>
        <full-instructions header="Classification Instructions">
          <p>Read the task carefully and inspect the image as well as the plots.</p>
          <p>
            The image is a picture taken during the welding process. The plots show the corresponding sensor data for the electrode position, the voltage, and the current measured during the welding process.
          </p>
        </full-instructions>
        <short-instructions>
          <p>Read the task carefully and inspect the image as well as the plots.</p>
        </short-instructions>
      </crowd-classifier>
    </crowd-form>

The wizard for creating the labeling job has a preview function in the section Custom labeling task setup, which you can use to check if all configurations work properly.

  1. To preview the interface, choose Preview.

This opens a new browser tab and shows a test version of the labeling interface, similar to the following screenshot.

  2. To create the labeling job, choose Create.

Ground Truth sets up the labeling job as specified, and the dashboard shows its status.

Assigning labels

To finalize the labeling job that you configured, you log in to the worker portal and assign labels to different data items consisting of images and data plots. The details on how the different components of the labeling job work together are explained in the next section.

  1. On the Amazon SageMaker console, under Ground Truth, choose Labeling workforces.
  2. On the Private tab, choose the link for Labeling portal sign-in URL.

When Ground Truth is finished preparing the labeling job, you can see it listed in the Jobs section. If it’s not showing up, wait a few minutes and refresh the tab.

  3. Choose Start working.

This launches the labeling UI, which allows you to assign labels to mixed datasets consisting of welding images and plots for current, electrode position, and voltage.

For this use case, you can assign seven different labels to a single dataset. These different classes and labels are defined in the HTML of the UI, but you can also insert them dynamically using the pre-labeling Lambda function (discussed in the next section). Because we don’t actually use the labeled data for ML purposes, you can assign the labels randomly to the five items that are displayed by Ground Truth for this labeling job.

After labeling all the items, the UI switches back to the list with available jobs. This concludes the section about configuring and launching the labeling job. In the next section, I explain the mechanics of a custom labeling job in detail and also dive deep into the different elements of the HTML interface.

Custom labeling deep dive

A custom labeling job combines the data to be labeled with three components to create a workflow that allows workers from the labeling workforce to assign labels to each item in the dataset:

  • Pre-labeling Lambda function – Generates the content to be displayed on the labeling interface using the manifest file specified during the configuration of the labeling job. For this use case, the function also converts the CSV files into human-readable plots and stores these plots as images in the S3 bucket under the prefix plots.
  • Labeling interface – Uses the output of the pre-labeling function to generate a user interface. For this use case, the interface displays four images (the picture taken during the welding process and the three graphs for current, electrode position, and voltage) and a form that allows workers to classify the welding process.
  • Label consolidation Lambda function – Allows you to implement custom strategies to consolidate classifications of one or several workers into a single response. For our workforce, this is very simple because there is only a single worker whose labels are consolidated into a file, which is stored by Ground Truth into Amazon S3.

Before we analyze these three components, I provide insights into the structure of the manifest file, which describes the data sources for the labeling job.

Manifest and dataset files

The manifest file conforms to the JSON Lines format, in which each line represents one item to label. Ground Truth expects either a key source or source-ref in each line of the file. For this use case, I use source, and the mapped value must be a string representing an Amazon S3 path. For this post, we only label five items, so the manifest consists of five JSON lines similar to the following code:

{"source": "s3://iiot-custom-label-blog-bucket-unn4d0l4j0/dataset/dataset-1.json"}
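A manifest like this can be generated with a few lines of code. The following minimal sketch builds the five JSON lines used in this post (the bucket and file names are taken from the example above and are illustrative):

```python
import json

def build_manifest(dataset_s3_paths):
    """Build a JSON Lines manifest: one {"source": ...} object per line."""
    return "\n".join(json.dumps({"source": path}) for path in dataset_s3_paths)

paths = [
    f"s3://iiot-custom-label-blog-bucket-unn4d0l4j0/dataset/dataset-{i}.json"
    for i in range(1, 6)
]
manifest = build_manifest(paths)
```

The resulting string can be uploaded to Amazon S3 and referenced when you configure the labeling job.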

For our use case with multiple input formats and files, each line in the manifest points to a dataset file that is also stored on Amazon S3. Each dataset file is a JSON document that contains references to the welding image and the CSV file with the sensor data:

{
  "sensor_data": {"s3Path": "s3://iiot-custom-label-blog-bucket-unn4d0l4j0/sensor_data/weld.1.csv"},
  "image": {"s3Path": "s3://iiot-custom-label-blog-bucket-unn4d0l4j0/images/weld.1.png"}
}

Ground Truth takes each line of the manifest file and triggers the pre-labeling Lambda function, which we discuss next.

Pre-labeling Lambda function

A pre-labeling Lambda function creates a JSON object that is used to populate the item-specific portions of the labeling interface. For more information, see Processing with AWS Lambda.

Before Ground Truth displays an item for labeling to a worker, it runs the pre-labeling function and forwards the information in the manifest’s JSON line to the function. For our use case, the event passed to the function is as follows:

{
  "version": "2018-10-06",
  "labelingJobArn": "arn:aws:sagemaker:eu-west-1:XXX:labeling-job/weldinglabeljob1",
  "dataObject": {
    "source": "s3://iiot-custom-label-blog-bucket-unn4d0l4j0/dataset/dataset-1.json"
  }
}

Although I omit the implementation details here (for those interested, the code is deployed with the CloudFormation template for review), the function for our labeling job uses this input to complete the following steps:

  1. Download the dataset file referenced in the source field of the input (see the preceding code).
  2. Download the CSV file containing the sensor data, which is referenced in the dataset file.
  3. Generate plots for current, electrode position, and voltage from the contents of the CSV file.
  4. Upload the plot files to Amazon S3.
  5. Generate a JSON object containing the references to the aforementioned plot files and the welding image referenced in the dataset file.

When these steps are complete, the function returns a JSON object with two parts:

  • taskInput – Fully customizable JSON object that contains information to be displayed on the labeling UI.
  • isHumanAnnotationRequired – A string representing a Boolean value ("true" or "false"), which you can use to exclude objects from being labeled by humans. I don’t use this flag for this use case because we want to label all the provided data items.
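Assembling this return value is plain dictionary construction. The following minimal sketch covers the last step and the two-part return structure; it assumes the plot images were already generated and uploaded in the earlier steps, and build_pre_label_response is an illustrative helper name, not part of the Ground Truth API:

```python
def build_pre_label_response(image_s3_path, plot_s3_paths):
    """Build the JSON object Ground Truth expects from a pre-labeling function.

    plot_s3_paths maps a plot name (e.g. 'current') to the S3 path of the
    rendered plot image.
    """
    task_input = {
        "image": {
            "file": image_s3_path,
            "title": f" from image at {image_s3_path}",
        },
    }
    for name, path in plot_s3_paths.items():
        task_input[name] = {"file": path, "title": f" from file at {path}"}
    return {
        "taskInput": task_input,
        # A string, not a Boolean: Ground Truth expects "true" or "false".
        "isHumanAnnotationRequired": "true",
    }

response = build_pre_label_response(
    "s3://iiot-custom-label-blog-bucket-unn4d0l4j0/images/weld.1.png",
    {
        "current": "s3://iiot-custom-label-blog-bucket-unn4d0l4j0/plots/weld.1.csv-current.png",
        "electrode": "s3://iiot-custom-label-blog-bucket-unn4d0l4j0/plots/weld.1.csv-electrode_pos.png",
        "voltage": "s3://iiot-custom-label-blog-bucket-unn4d0l4j0/plots/weld.1.csv-voltage.png",
    },
)
```

In a real Lambda function, this dictionary would be returned from the handler after the download, plotting, and upload steps have run.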

For more information, see Processing with AWS Lambda.

Because I want to show the welding images and the three graphs for current, electrode position, and voltage, the result of the Lambda function is as follows for the first dataset:

{
  "taskInput": {
    "image": {
      "file": "s3://iiot-custom-label-blog-bucket-unn4d0l4j0/images/weld.1.png",
      "title": " from image at s3://iiot-custom-label-blog-bucket-unn4d0l4j0/images/weld.1.png"
    },
    "voltage": {
      "file": "s3://iiot-custom-label-blog-bucket-unn4d0l4j0/plots/weld.1.csv-current.png",
      "title": " from file at plots/weld.1.csv-current.png"
    },
    "electrode": {
      "file": "s3://iiot-custom-label-blog-bucket-unn4d0l4j0/plots/weld.1.csv-electrode_pos.png",
      "title": " from file at plots/weld.1.csv-electrode_pos.png"
    },
    "current": {
      "file": "s3://iiot-custom-label-blog-bucket-unn4d0l4j0/plots/weld.1.csv-voltage.png",
      "title": " from file at plots/weld.1.csv-voltage.png"
    }
  },
  "isHumanAnnotationRequired": "true"
}

In the preceding code, the taskInput is fully customizable; the function returns the Amazon S3 paths to the images to display, and also a title, which has some non-functional text. Next, I show how to access these different parts of the taskInput JSON object when building the customized labeling UI displayed to workers by Ground Truth.

Labeling UI: Accessing taskInput content

Ground Truth uses the output of the Lambda function to fill in content into the HTML skeleton that is provided at the creation of the labeling job. In general, the contents of the taskInput output object are accessed using task.input in the HTML code.

For instance, to retrieve the Amazon S3 path where the welding image is stored from the output, you need to access the path taskInput/image/file. Because the taskInput object from the function output is mapped to task.input in the HTML, the corresponding reference to the welding image file is task.input.image.file. This reference is directly integrated into the HTML code of the labeling UI to display the welding image:

<img style="height: 30vh; margin-bottom: 10px" src="{{ task.input.image.file | grant_read_access }}"/>

The grant_read_access filter is needed for files in S3 buckets that aren’t publicly accessible. This makes sure that the URL passed to the browser contains a short-lived access token for the image and thereby avoids having to make resources publicly accessible for labeling jobs. This is often mandatory because the data to be labeled, such as machine data, is confidential. Because the pre-labeling function has also converted the CSV files into plots and images, their integration into the UI is analogous.

Label consolidation Lambda function

The second Lambda function configured for the custom labeling job runs when all workers have labeled an item or when the time limit of the labeling job is reached. The key task of this function is to derive a single label from the responses of the workers. Additionally, the function can be used for any kind of further processing of the labeled data, such as storing it on Amazon S3 in a format ideally suited for the ML pipeline that you use.

Although there are different possible strategies to consolidate labels, I focus on the cornerstones of the implementation for such a function and show how they translate to our use case. The consolidation function is triggered by an event similar to the following JSON code:

{
  "version": "2018-10-06",
  "labelingJobArn": "arn:aws:sagemaker:eu-west-1:261679111194:labeling-job/weldinglabeljob1",
  "payload": {
    "s3Uri": "s3://iiot-custom-label-blog-bucket-unn4d0l4j0/output/WeldingLabelJob1/annotations/consolidated-annotation/consolidation-request/iteration-1/2020-09-15_16:16:11.json"
  },
  "labelAttributeName": "WeldingLabelJob1",
  "roleArn": "arn:aws:iam::261679111194:role/AmazonSageMaker-Service-role-unn4d0l4j0",
  "outputConfig": "s3://iiot-custom-label-blog-bucket-unn4d0l4j0/output/WeldingLabelJob1/annotations",
  "maxHumanWorkersPerDataObject": 1
}

The key item in this event is the payload, which contains an s3Uri pointing to a file stored on Amazon S3. This payload file contains the list of datasets that have been labeled and the labels assigned to them by workers. The following code is an example of such a list entry:

{
  "datasetObjectId": "4",
  "dataObject": {
    "s3Uri": "s3://iiot-custom-label-blog-bucket-unn4d0l4j0/dataset/dataset-5.json"
  },
  "annotations": [
    {
      "workerId": "",
      "annotationData": {
        "content": "{\"WeldingClassification\":{\"label\":\"Not sure\"}}"
      }
    }
  ]
}

Each entry lists, for every dataset item, the labels that have been assigned, along with an identifier you could use to determine which worker labeled the item. If multiple workers labeled the item, there are multiple entries in annotations. Because I created a single worker that labeled all the items for this post, there is only a single entry. The file dataset-5.json has been labeled Not sure for the classifier WeldingClassification.

The label consolidation function has to iterate over all list entries and determine for each dataset a label to use as the ground truth for supervised ML training. Ground Truth expects the function to return a list containing an entry for each dataset item with the following structure:

{
  "datasetObjectId": "4",
  "consolidatedAnnotation": {
    "content": {
      "WeldingLabelJob1": {
        "WeldingClassification": "Not sure"
      }
    }
  }
}

Each entry of the returned list must contain the datasetObjectId for the corresponding entry in the payload file and a JSON object consolidatedAnnotation, which contains an object content. Ground Truth expects content to contain a key that equals the name of the labeling job (for our use case, WeldingLabelJob1). For more information, see Processing with AWS Lambda.
You can change this behavior when you create the labeling job by selecting I want to specify a label attribute name different from the labeling job name and entering a label attribute name.

The content inside this key equaling the name of the labeling job is freely configurable and can be arbitrarily complex. For our use case, it’s enough to return the assigned label Not sure. If any of these formatting requirements are not met, Ground Truth assumes the labeling job didn’t run properly and marks it as failed.
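The consolidation logic itself is mostly JSON plumbing. The following is a minimal sketch that derives one label per dataset item via a majority vote over the worker annotations (with a single worker this reduces to a pass-through). Downloading the payload file from Amazon S3 via boto3 is omitted, and consolidate is an illustrative helper name:

```python
import json
from collections import Counter

def consolidate(payload_entries, label_attribute_name,
                classifier="WeldingClassification"):
    """Derive one label per dataset item from the worker annotations.

    payload_entries is the parsed list from the payload file on S3. Note that
    the annotationData content is itself a JSON-encoded string and must be
    parsed a second time.
    """
    consolidated = []
    for entry in payload_entries:
        labels = []
        for annotation in entry["annotations"]:
            content = json.loads(annotation["annotationData"]["content"])
            labels.append(content[classifier]["label"])
        # Majority vote; with a single worker this is just that worker's label.
        winner, _ = Counter(labels).most_common(1)[0]
        consolidated.append({
            "datasetObjectId": entry["datasetObjectId"],
            "consolidatedAnnotation": {
                "content": {label_attribute_name: {classifier: winner}}
            },
        })
    return consolidated

# Sample payload entry mirroring the structure shown above.
payload = [{
    "datasetObjectId": "4",
    "dataObject": {"s3Uri": "s3://iiot-custom-label-blog-bucket-unn4d0l4j0/dataset/dataset-5.json"},
    "annotations": [{
        "workerId": "",
        "annotationData": {"content": "{\"WeldingClassification\":{\"label\":\"Not sure\"}}"},
    }],
}]
result = consolidate(payload, "WeldingLabelJob1")
```

A real consolidation function would return this list from the Lambda handler so that Ground Truth can write it to the configured output location.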

Because I specified output as the desired prefix during the creation of the labeling job, and because the formatting requirements are met, Ground Truth uploads the consolidated labels as a list of JSON entries to the S3 bucket under that output prefix.

You can use such files for training ML algorithms in Amazon SageMaker or for further processing.

Cleaning up

To avoid incurring future charges, delete all resources created for this post.

  1. On the AWS CloudFormation console, choose Stacks.
  2. Select the stack iiot-custom-label-blog.
  3. Choose Delete.

This step removes all files and the S3 bucket from your account. The process takes about 3–5 minutes.


Supervised ML requires labeled data, and Ground Truth provides a platform for creating labeling workflows. This post showed how to build a complex industrial IoT labeling workflow in which data from multiple sources needs to be considered when labeling items. The post explained how to create a custom labeling job and detailed the mechanisms Ground Truth requires to implement such a workflow. To get started with writing your own custom labeling job, refer to the custom labeling documentation page for Ground Truth, and consider re-deploying the CloudFormation template of this post to get a sample for the pre-labeling and consolidation Lambda functions. The blog post “Creating custom labeling jobs with AWS Lambda and Amazon SageMaker Ground Truth” provides further insights into building custom labeling jobs.

About the Author

As a Principal Prototyping Engagement Manager, Dr. Markus Bestehorn is responsible for building business-critical prototypes with AWS customers, and is a specialist for IoT and machine learning. His “career” started as a 7-year-old when he got his hands on a computer with two 5.25” floppy disks, no hard disk, and no mouse, on which he started writing BASIC, and later C as well as C++ programs. He holds a PhD in computer science and all currently available AWS certifications. When he’s not on the computer, he runs or climbs mountains.


Facebook and Instagram’s AI-generated image captions now offer far more details




Every picture posted to Facebook and Instagram gets a caption generated by an image analysis AI, and that AI just got a lot smarter. The improved system should be a treat for visually impaired users, and may help you find your photos faster in the future.

Alt text is a field in an image’s metadata that describes its contents: “A person standing in a field with a horse,” or “a dog on a boat.” This lets the image be understood by people who can’t see it.

These descriptions are often added manually by a photographer or publication, but people uploading photos to social media generally don’t bother, if they even have the option. So the relatively recent ability to automatically generate one — the technology has only just gotten good enough in the last couple years — has been extremely helpful in making social media more accessible in general.

Facebook created its Automatic Alt Text system in 2016, which is eons ago in the field of machine learning. The team has since cooked up many improvements to it, making it faster and more detailed, and the latest update adds an option to generate a more detailed description on demand.

The improved system recognizes 10 times more items and concepts than it did at the start, now around 1,200. And the descriptions include more detail. What was once “Two people by a building” may now be “A selfie of two people by the Eiffel Tower.” (The actual descriptions hedge with “may be…” and will avoid including wild guesses.)

But there’s more detail than that, even if it’s not always relevant. For instance, in this image the AI notes the relative positions of the people and objects:

Image Credits: Facebook

Obviously the people are above the drums, and the hats are above the people, none of which really needs to be said for someone to get the gist. But consider an image described as “A house and some trees and a mountain.” Is the house on the mountain or in front of it? Are the trees in front of or behind the house, or maybe on the mountain in the distance?

In order to adequately describe the image, these details should be filled in, even if the general idea can be gotten across with fewer words. If a sighted person wants more detail they can look closer or click the image for a bigger version — someone who can’t do that now has a similar option with this “generate detailed image description” command. (Activate it with a long press in the Android app or a custom action in iOS.)

Perhaps the new description would be something like “A house and some trees in front of a mountain with snow on it.” That paints a better picture, right? (To be clear, these examples are made up, but it’s the sort of improvement that’s expected.)

The new detailed description feature will come to Facebook first for testing, though the improved vocabulary will appear on Instagram soon. The descriptions are also kept simple so they can be easily translated to other languages already supported by the apps, though the feature may not roll out in other countries simultaneously.

