AI

Process documents containing handwritten tabular content using Amazon Textract and Amazon A2I

Even in this digital age, where more and more companies are moving to the cloud and using machine learning (ML) and other technologies to improve business processes, we still see a vast number of companies ask about processing documents, especially documents with handwriting. We see employment forms, time cards, and financial applications with tables and forms that contain handwriting in addition to printed information. To complicate things, each document can come in various formats, and each institution within any given industry may have several different formats. Organizations are looking for a simple solution that can process complex documents with varying formats, including tables, forms, and tabular data.

Extracting data from these documents, especially when you have a combination of printed and handwritten text, is error-prone, time-consuming, expensive, and not scalable. Text embedded in tables and forms adds to the extraction and processing complexity. Amazon Textract is an AWS AI service that automatically extracts printed text, handwriting, and other data from scanned documents that goes beyond simple optical character recognition (OCR) to identify, understand, and extract data from forms and tables.

After the data is extracted, the postprocessing step in a document management workflow involves reviewing the entries and making changes as required by downstream processing applications. Amazon Augmented AI (Amazon A2I) makes it easy to configure a human review step in your ML workflow. This allows you to automatically route predictions to human reviewers when results fall below a specified confidence threshold, set up review and auditing workflows, and modify prediction results as needed.

In this post, we show how you can use the Amazon Textract Handwritten feature to extract tabular data from documents and have a human review loop using the Amazon A2I custom task type to make sure that the predictions are highly accurate. We store the results in Amazon DynamoDB, which is a key-value and document database that delivers single-digit millisecond performance at any scale, making the data available for downstream processing.

We walk you through the following steps using a Jupyter notebook:

  1. Use Amazon Textract to retrieve tabular data from the document and inspect the response.
  2. Set up an Amazon A2I human loop to review and modify the Amazon Textract response.
  3. Evaluate the Amazon A2I response and store it in DynamoDB for downstream processing.

Prerequisites

Before getting started, let’s configure the walkthrough Jupyter notebook using an AWS CloudFormation template and then create an Amazon A2I private workforce, which is needed in the notebook to set up the custom Amazon A2I workflow.

Setting up the Jupyter notebook

We deploy a CloudFormation template that performs much of the initial setup work for you, such as creating an AWS Identity and Access Management (IAM) role for Amazon SageMaker, creating a SageMaker notebook instance, and cloning the GitHub repo into the notebook instance.

  1. Choose Launch Stack to configure the notebook in the US East (N. Virginia) Region:

  2. Don’t make any changes to the stack name or parameters.
  3. In the Capabilities section, select I acknowledge that AWS CloudFormation might create IAM resources.
  4. Choose Create stack.

The following screenshot of the stack details page shows the status of the stack as CREATE_IN_PROGRESS. It can take up to 20 minutes for the status to change to CREATE_COMPLETE.

  1. On the SageMaker console, choose Notebook Instances.
  2. Choose Open Jupyter for the TextractA2INotebook notebook you created.
  3. Open textract-hand-written-a2i-forms.ipynb and follow along there.

Setting up an Amazon A2I private workforce

For this post, you create a private work team and add only one user (you) to it. For instructions, see Create a Private Workforce (Amazon SageMaker Console). When the user (you) accepts the invitation, you have to add yourself to the workforce. For instructions, see the Add a Worker to a Work Team section in Manage a Workforce (Amazon SageMaker Console).

After you create a labeling workforce, copy the workforce ARN and enter it in the notebook cell to set up a private review workforce:

WORKTEAM_ARN= "<your workteam ARN>"

In the following sections, we walk you through the steps to use this notebook.

Retrieving tabular data from the document and inspecting the response

In this section, we go through the following steps using the walkthrough notebook:

  1. Review the sample data, which has both printed and handwritten content.
  2. Set up the helper functions to parse the Amazon Textract response.
  3. Inspect and analyze the Amazon Textract response.

Reviewing the sample data

Review the sample data by running the following notebook cell:

# Document
from IPython.display import Image, display

documentName = "test_handwritten_document.png"
display(Image(filename=documentName))

We use the following sample document, which has both printed and handwritten content in tables.


Using the Amazon Textract Response Parser library to process the response

We now import the Amazon Textract Response Parser library to parse and extract what we need from Amazon Textract’s response. There are two main tasks here: first, we extract the form data (key-value pairs) from the header section of the document; second, we parse the tables and cells to create a CSV file containing the tabular data. In this notebook, we use Amazon Textract’s synchronous API for document extraction, AnalyzeDocument, which accepts image files (PNG or JPEG) as input.

import boto3

client = boto3.client(
    service_name='textract',
    region_name='us-east-1',
    endpoint_url='https://textract.us-east-1.amazonaws.com',
)

# Read the document image as bytes
with open(documentName, 'rb') as file:
    img_test = file.read()
    bytes_test = bytearray(img_test)
    print('Image loaded', documentName)

# Process using image bytes
response = client.analyze_document(Document={'Bytes': bytes_test}, FeatureTypes=['TABLES', 'FORMS'])

You can use the Amazon Textract Response Parser library to easily parse the JSON returned by Amazon Textract. The library parses the JSON and provides programming-language-specific constructs to work with different parts of the document. For more details, see the Amazon Textract Response Parser library.

from trp import Document

# Parse the JSON response from Amazon Textract
doc = Document(response)

# Iterate over elements in the document
for page in doc.pages:
    # Print lines and words
    for line in page.lines:
        print("Line: {}".format(line.text))
        for word in line.words:
            print("Word: {}".format(word.text))
    # Print tables
    for table in page.tables:
        for r, row in enumerate(table.rows):
            for c, cell in enumerate(row.cells):
                print("Table[{}][{}] = {}".format(r, c, cell.text))
    # Print fields
    for field in page.form.fields:
        print("Field: Key: {}, Value: {}".format(field.key.text, field.value.text))

Now that we have the contents we need from the document image, let’s create CSV files to store them; we also use these files to set up the Amazon A2I human loop for review and modification as needed.

import csv

# Let's get the form data into a CSV file
with open('test_handwritten_form.csv', 'w', newline='') as csvfile:
    formwriter = csv.writer(csvfile, delimiter=',', quoting=csv.QUOTE_MINIMAL)
    for field in page.form.fields:
        formwriter.writerow([field.key.text + " " + field.value.text])

# Let's get the table data into a CSV file
with open('test_handwritten_tab.csv', 'w', newline='') as csvfile:
    tabwriter = csv.writer(csvfile, delimiter=',')
    for r, row in enumerate(table.rows):
        csvrow = []
        for c, cell in enumerate(row.cells):
            if cell.text:
                csvrow.append(cell.text.rstrip())
        tabwriter.writerow(csvrow)

Alternatively, if you would like to modify this notebook to use a PDF file or for batch processing of documents, use the StartDocumentAnalysis API. StartDocumentAnalysis returns a job identifier (JobId) that you use to get the results of the operation. When text analysis is finished, Amazon Textract publishes a completion status to the Amazon Simple Notification Service (Amazon SNS) topic that you specify in NotificationChannel. To get the results of the text analysis operation, first check that the status value published to the Amazon SNS topic is SUCCEEDED. If so, call GetDocumentAnalysis, and pass the job identifier (JobId) from the initial call to StartDocumentAnalysis.
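If you go the asynchronous route, the flow might look like the following minimal sketch. For brevity it polls GetDocumentAnalysis directly instead of consuming the SNS notification, and the bucket, document, topic, and role values are placeholders to replace with your own:

import time
import boto3

textract = boto3.client('textract', region_name='us-east-1')

# Placeholder S3 location and SNS/role ARNs; substitute your own values
job = textract.start_document_analysis(
    DocumentLocation={'S3Object': {'Bucket': '<your-bucket>', 'Name': '<your-document.pdf>'}},
    FeatureTypes=['TABLES', 'FORMS'],
    NotificationChannel={'SNSTopicArn': '<your-sns-topic-arn>', 'RoleArn': '<your-textract-role-arn>'}
)
job_id = job['JobId']

# For brevity we poll; in production, react to the SNS completion message instead
while True:
    result = textract.get_document_analysis(JobId=job_id)
    if result['JobStatus'] in ('SUCCEEDED', 'FAILED'):
        break
    time.sleep(5)

if result['JobStatus'] == 'SUCCEEDED':
    blocks = result['Blocks']  # use NextToken to page through large responses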

Inspecting and analyzing the Amazon Textract response

We now load the form line items into a Pandas DataFrame and clean it up to ensure we have the relevant columns and rows that downstream applications need. We then send it to Amazon A2I for human review.

Run the following notebook cell to inspect and analyze the key-value data from the Amazon Textract response:

import pandas as pd

# Load the CSV file contents into a DataFrame, using comma as the delimiter
df_form = pd.read_csv('test_handwritten_form.csv', header=None, quoting=csv.QUOTE_MINIMAL, sep=',')

# Rename the column
df_form = df_form.rename(columns={df_form.columns[0]: 'FormHeader'})

# Display the DataFrame
df_form

The following screenshot shows our output.

Run the following notebook cell to inspect and analyze the tabular data from the Amazon Textract response:

# Load the CSV file contents into a DataFrame, using row 1 as the header and comma as the delimiter
df_tab = pd.read_csv('test_handwritten_tab.csv', header=1, quoting=csv.QUOTE_MINIMAL, sep=',')

# Display the first few rows of the DataFrame
df_tab.head()

The following screenshot shows our output.


We can see that Amazon Textract detected both printed and handwritten content from the tabular data.
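The notebook sends every extracted row to review, but each block in the Amazon Textract response also carries a confidence score, which you can use to decide whether a human loop is warranted at all. The following is a minimal sketch over the AnalyzeDocument response from earlier; the 90% threshold is an arbitrary assumption to tune for your documents:

# Find detected words whose confidence falls below a review threshold
CONFIDENCE_THRESHOLD = 90.0  # arbitrary; tune for your use case

low_confidence_words = [
    (block['Text'], block['Confidence'])
    for block in response['Blocks']
    if block['BlockType'] == 'WORD' and block['Confidence'] < CONFIDENCE_THRESHOLD
]

# Start the human loop only when something scored low
needs_human_review = len(low_confidence_words) > 0
print(needs_human_review, low_confidence_words)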

Setting up an Amazon A2I human loop

Amazon A2I supports two built-in task types, Amazon Textract key-value pair extraction and Amazon Rekognition image moderation, as well as a custom task type that you can use to integrate a human review loop into any ML workflow. You can use a custom task type to integrate Amazon A2I with other AWS services like Amazon Comprehend, Amazon Transcribe, and Amazon Translate, as well as with your own custom ML workflows. To learn more, see Use Cases and Examples using Amazon A2I.

In this section, we use the walkthrough notebook to show how to integrate the Amazon A2I custom task type with Amazon Textract tables and key-value pairs, so that low-confidence detections in the Amazon Textract response get a human review. It includes the following steps:

  1. Create a human task UI.
  2. Create a workflow definition.
  3. Send predictions to Amazon A2I human loops.
  4. Sign in to the worker portal and annotate or verify the Amazon Textract results.

Creating a human task UI

You can create a task UI for your workers by creating a worker task template. A worker task template is an HTML file that you use to display your input data and instructions to help workers complete your task. If you’re creating a human review workflow for a custom task type, you must create a custom worker task template using HTML code. For more information, see Create Custom Worker Task Template.

For this post, we created a custom UI HTML template to render Amazon Textract tables and key-value pairs in the notebook. You can find the template tables-keyvalue-sample.liquid.html in our GitHub repo and customize it for your specific document use case.

This template is used whenever a human loop is required. We have over 70 pre-built UIs available on GitHub.

After you create this custom template using HTML, you must use it to generate an Amazon A2I human task UI Amazon Resource Name (ARN). This ARN has the following format: arn:aws:sagemaker:<aws-region>:<aws-account-number>:human-task-ui/<template-name>. It is associated with a worker task template resource that you can use in one or more human review workflows (flow definitions). Generate the human task UI ARN from your worker task template by calling the CreateHumanTaskUi API operation in the following notebook cell:

def create_task_ui():
    '''
    Creates a Human Task UI resource.
    Returns:
        struct: HumanTaskUiArn
    '''
    response = sagemaker_client.create_human_task_ui(
        HumanTaskUiName=taskUIName,
        UiTemplate={'Content': template})
    return response

# Create the task UI
humanTaskUiResponse = create_task_ui()
humanTaskUiArn = humanTaskUiResponse['HumanTaskUiArn']
print(humanTaskUiArn)

The preceding code gives you an ARN as output, which we use in setting up flow definitions in the next step:

arn:aws:sagemaker:us-east-1:<aws-account-nr>:human-task-ui/ui-hw-invoice-2021-02-10-16-27-23

Creating the workflow definition

In this section, we create a flow definition. Flow definitions allow us to specify the following:

  • The workforce that your tasks are sent to
  • The instructions that your workforce receives (worker task template)
  • Where your output data is stored

For this post, we use the API in the following code:

create_workflow_definition_response = sagemaker_client.create_flow_definition(
    FlowDefinitionName=flowDefinitionName,
    RoleArn=role,
    HumanLoopConfig={
        "WorkteamArn": WORKTEAM_ARN,
        "HumanTaskUiArn": humanTaskUiArn,
        "TaskCount": 1,
        "TaskDescription": "Review the table contents and correct values as indicated",
        "TaskTitle": "Employment History Review"
    },
    OutputConfig={
        "S3OutputPath": OUTPUT_PATH
    }
)
# Save this ARN for future use
flowDefinitionArn = create_workflow_definition_response['FlowDefinitionArn']

Optionally, you can create this workflow definition on the Amazon A2I console. For instructions, see Create a Human Review Workflow.
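A newly created flow definition reports an Initializing status for a short time. If you want the notebook to wait until it’s usable, the following sketch polls the DescribeFlowDefinition API; the retry count and sleep interval are arbitrary:

import time

# Wait for the flow definition to become Active before starting human loops
for _ in range(30):
    describe_response = sagemaker_client.describe_flow_definition(
        FlowDefinitionName=flowDefinitionName)
    status = describe_response['FlowDefinitionStatus']
    print('Flow definition status:', status)
    if status == 'Active':
        break
    time.sleep(2)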

Sending predictions to Amazon A2I human loops

We create an item list from the Pandas DataFrame where we have the Amazon Textract output saved. Run the following notebook cell to create a list of items to be sent for review:

NUM_TO_REVIEW = len(df_tab) # number of line items to review
dfstart = df_tab['Start Date'].to_list()
dfend = df_tab['End Date'].to_list()
dfemp = df_tab['Employer Name'].to_list()
dfpos = df_tab['Position Held'].to_list()
dfres = df_tab['Reason for leaving'].to_list()
item_list = [
    {'row': "{}".format(x),
     'startdate': dfstart[x],
     'enddate': dfend[x],
     'empname': dfemp[x],
     'posheld': dfpos[x],
     'resleave': dfres[x]}
    for x in range(NUM_TO_REVIEW)
]
item_list

You get an output of all the rows and columns received from Amazon Textract:

[{'row': '0', 'startdate': '1/15/2009 ', 'enddate': '6/30/2011 ', 'empname': 'Any Company ', 'posheld': 'Assistant baker ', 'resleave': 'relocated '},
 {'row': '1', 'startdate': '7/1/2011 ', 'enddate': '8/10/2013 ', 'empname': 'Example Corp. ', 'posheld': 'Baker ', 'resleave': 'better opp. '},
 {'row': '2', 'startdate': '8/15/2013 ', 'enddate': 'Present ', 'empname': 'AnyCompany ', 'posheld': 'head baker ', 'resleave': 'N/A current '}]

Run the following notebook cell to get a list of key-value pairs:

dforighdr = df_form['FormHeader'].to_list()
hdr_list = [{'hdrrow': "{}".format(x), 'orighdr': dforighdr[x]} for x in range(len(df_form))]
hdr_list

Run the following code to create a JSON response for the Amazon A2I loop by combining the key-value and table list from the preceding cells:

ip_content = {"Header": hdr_list, 'Pairs': item_list, 'image1': s3_img_url }

Start the human loop by running the following notebook cell:

# Activate a human loop
import json
import uuid

humanLoopName = str(uuid.uuid4())
start_loop_response = a2i.start_human_loop(
    HumanLoopName=humanLoopName,
    FlowDefinitionArn=flowDefinitionArn,
    HumanLoopInput={
        "InputContent": json.dumps(ip_content)
    }
)

Check the status of the human loop with the following code:

completed_human_loops = []
resp = a2i.describe_human_loop(HumanLoopName=humanLoopName)
print(f'HumanLoop Name: {humanLoopName}')
print(f'HumanLoop Status: {resp["HumanLoopStatus"]}')
print(f'HumanLoop Output Destination: {resp["HumanLoopOutput"]}')
print('\n')
if resp["HumanLoopStatus"] == "Completed":
    completed_human_loops.append(resp)

You get the following output, which shows the status of the human loop and the output destination S3 bucket:

HumanLoop Name: f69bb14e-3acd-4301-81c0-e272b3c77df0
HumanLoop Status: InProgress
HumanLoop Output Destination: {'OutputS3Uri': 's3://sagemaker-us-east-1-<aws-account-nr>/textract-a2i-handwritten/a2i-results/fd-hw-forms-2021-01-11-16-54-31/2021/01/11/16/58/13/f69bb14e-3acd-4301-81c0-e272b3c77df0/output.json'}
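The status stays InProgress until you submit your review in the worker portal (see the next section). Instead of re-running the status cell by hand, you could poll until the loop reaches a terminal state, as in this minimal sketch:

import time

# Poll until the human loop completes, fails, or is stopped
while True:
    resp = a2i.describe_human_loop(HumanLoopName=humanLoopName)
    status = resp['HumanLoopStatus']
    print('HumanLoop Status:', status)
    if status in ('Completed', 'Failed', 'Stopped'):
        break
    time.sleep(30)

if status == 'Completed':
    completed_human_loops.append(resp)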

Annotating the results via the worker portal

Run the following steps in the notebook to check the status of the human loop and complete the review.

  1. Run the following notebook cell to get a login link to navigate to the private workforce portal:
    workteamName = WORKTEAM_ARN[WORKTEAM_ARN.rfind('/') + 1:]
    print("Navigate to the private worker portal and do the tasks. Make sure you've invited yourself to your workteam!")
    print('https://' + sagemaker_client.describe_workteam(WorkteamName=workteamName)['Workteam']['SubDomain'])

  2. Choose the login link to the private worker portal.
  3. Select the human review job.
  4. Choose Start working.


You’re redirected to the Amazon A2I console, where you find the original document displayed alongside the key-value pairs and text responses detected by Amazon Textract, and your table’s responses.


Scroll down to find the correction form for key-value pairs and text, where you can verify the results and compare the Amazon Textract response to the original document. You will also find the UI to modify the tabular handwritten and printed content.

You can modify each cell based on the original image, reenter the correct values, and submit your response; the labeling workflow is complete when you submit.

Evaluating the results

When the labeling work is complete, your results should be available in the S3 output path specified in the human review workflow definition. The human answers are returned and saved in the JSON file. Run the notebook cell to get the results from Amazon S3:

import re
import pprint

pp = pprint.PrettyPrinter(indent=4)
# s3 is the boto3 S3 client created earlier in the notebook
for resp in completed_human_loops:
    splitted_string = re.split('s3://' + 'a2i-experiments' + '/', resp['HumanLoopOutput']['OutputS3Uri'])
    output_bucket_key = splitted_string[1]
    response = s3.get_object(Bucket='a2i-experiments', Key=output_bucket_key)
    content = response["Body"].read()
    json_output = json.loads(content)
    pp.pprint(json_output)
    print('\n')

The following code shows a snippet of the Amazon A2I annotation output JSON file:

{
    'flowDefinitionArn': 'arn:aws:sagemaker:us-east-1:<aws-account-nr>:flow-definition/fd-hw-invoice-2021-02-22-23-07-53',
    'humanAnswers': [
        {
            'acceptanceTime': '2021-02-22T23:08:38.875Z',
            'answerContent': {
                'TrueHdr3': 'Full Name: Jane Smith',
                'predicted1': 'relocated',
                'predicted2': 'better opp.',
                'predicted3': 'N/A, current',
                'predictedhdr1': 'Phone Number: 555-0100',
                'predictedhdr2': 'Mailing Address: same as above',
                'predictedhdr3': 'Full Name: Jane Doe',
                'predictedhdr4': 'Home Address: 123 Any Street, Any Town. USA',
                'rating1': {'agree': True, 'disagree': False},
                'rating2': {'agree': True, 'disagree': False},
                'rating3': {'agree': False, 'disagree': True},
                'rating4': {'agree': True, 'disagree': False},
                'ratingline1': {'agree': True, 'disagree': False},
                'ratingline2': {'agree': True, 'disagree': False},
                'ratingline3': {'agree': True, 'disagree': False}}

Storing the Amazon A2I annotated results in DynamoDB

We now store the form with the updated contents in a DynamoDB table so downstream applications can use it. To automate the process, simply set up an AWS Lambda trigger with DynamoDB to automatically extract and send information to your API endpoints or applications. For more information, see DynamoDB Streams and AWS Lambda Triggers.
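Such a Lambda function is outside the scope of this notebook, but a minimal sketch of a stream handler might look like the following. It assumes the table's stream uses the NEW_IMAGE view type, and the forwarding step (here just a print) is a placeholder for your downstream call:

import json

def lambda_handler(event, context):
    """Forward rows newly inserted into the DynamoDB table."""
    for record in event['Records']:
        if record['eventName'] != 'INSERT':
            continue
        # NEW_IMAGE holds the item in DynamoDB's typed attribute-value format
        new_image = record['dynamodb']['NewImage']
        line_nr = new_image['line_nr']['N']
        employer = new_image.get('true_emp_name', {}).get('S')
        # Placeholder: replace with a call to your API or application
        print(json.dumps({'line_nr': line_nr, 'employer': employer}))
    return {'statusCode': 200}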

To store your results, complete the following steps:

  1. Get the human answers for the key-values and text into a DataFrame by running the following notebook cell:
    # Update the array values to be strings for the DataFrame assignment
    for i in json_output['humanAnswers']:
        x = i['answerContent']
        for j in range(0, len(df_form)):
            df_form.at[j, 'TrueHeader'] = str(x.get('TrueHdr' + str(j+1)))
            df_form.at[j, 'Comments'] = str(x.get('Comments' + str(j+1)))
    df_form = df_form.where(df_form.notnull(), None)

  2. Get the human-reviewed answers for tabular data into a DataFrame by running the following cell:
    # Update the array values to be strings for the DataFrame assignment
    for i in json_output['humanAnswers']:
        x = i['answerContent']
        for j in range(0, len(df_tab)):
            df_tab.at[j, 'TrueStartDate'] = str(x.get('TrueStartDate' + str(j+1)))
            df_tab.at[j, 'TrueEndDate'] = str(x.get('TrueEndDate' + str(j+1)))
            df_tab.at[j, 'TrueEmpName'] = str(x.get('TrueEmpName' + str(j+1)))
            df_tab.at[j, 'TruePosHeld'] = str(x.get('TruePosHeld' + str(j+1)))
            df_tab.at[j, 'TrueResLeave'] = str(x.get('TrueResLeave' + str(j+1)))
            df_tab.at[j, 'ChangeComments'] = str(x.get('Change Reason' + str(j+1)))
    df_tab = df_tab.where(df_tab.notnull(), None)

The following screenshot shows our output.

  3. Combine the DataFrames into one DataFrame to save in the DynamoDB table:
    # Join both the dataframes to prep for insert into DynamoDB
    df_doc = df_form.join(df_tab, how='outer')
    df_doc = df_doc.where(df_doc.notnull(), None)
    df_doc

Creating the DynamoDB table

Create your DynamoDB table with the following code:

# Get the service resource
dynamodb = boto3.resource('dynamodb')
tablename = "emp_history-" + str(uuid.uuid4())

# Create the DynamoDB table
table = dynamodb.create_table(
    TableName=tablename,
    KeySchema=[
        {'AttributeName': 'line_nr', 'KeyType': 'HASH'}
    ],
    AttributeDefinitions=[
        {'AttributeName': 'line_nr', 'AttributeType': 'N'}
    ],
    ProvisionedThroughput={'ReadCapacityUnits': 5, 'WriteCapacityUnits': 5}
)

# Wait until the table exists
table.meta.client.get_waiter('table_exists').wait(TableName=tablename)

# Print out some data about the table
print("Table successfully created. Item count is: " + str(table.item_count))

You get the following output:

Table successfully created. Item count is: 0

Uploading the contents of the DataFrame to a DynamoDB table

Upload the contents of your DataFrame to your DynamoDB table with the following code:

Note: When adding contents from multiple documents to your DynamoDB table, make sure you add a document number as an attribute to differentiate between documents. In the following example, we just use the index as the line_nr because we’re working with a single document.

for idx, row in df_doc.iterrows():
    table.put_item(
        Item={
            'line_nr': idx,
            'orig_hdr': str(row['FormHeader']),
            'true_hdr': str(row['TrueHeader']),
            'comments': str(row['Comments']),
            'start_date': str(row['Start Date']),
            'end_date': str(row['End Date']),
            'emp_name': str(row['Employer Name']),
            'position_held': str(row['Position Held']),
            'reason_for_leaving': str(row['Reason for leaving']),
            'true_start_date': str(row['TrueStartDate']),
            'true_end_date': str(row['TrueEndDate']),
            'true_emp_name': str(row['TrueEmpName']),
            'true_position_held': str(row['TruePosHeld']),
            'true_reason_for_leaving': str(row['TrueResLeave']),
            'change_comments': str(row['ChangeComments'])
        }
    )
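To implement the note above for multiple documents, one option is a composite primary key: a hypothetical doc_id partition key plus line_nr as the sort key. This variant is a sketch, not part of the notebook; the post keeps the single hash key because only one document is processed:

# Hypothetical multi-document variant of the table definition
# (use a fresh table name; shown only to illustrate the key schema)
table = dynamodb.create_table(
    TableName=tablename,
    KeySchema=[
        {'AttributeName': 'doc_id', 'KeyType': 'HASH'},   # one partition per document
        {'AttributeName': 'line_nr', 'KeyType': 'RANGE'}  # rows sorted within a document
    ],
    AttributeDefinitions=[
        {'AttributeName': 'doc_id', 'AttributeType': 'S'},
        {'AttributeName': 'line_nr', 'AttributeType': 'N'}
    ],
    ProvisionedThroughput={'ReadCapacityUnits': 5, 'WriteCapacityUnits': 5}
)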

To check if the items were updated, run the following code to retrieve the DynamoDB table value:

response = table.get_item(
    Key={'line_nr': 2}
)
item = response['Item']
print(item)

Alternatively, you can check the table on the DynamoDB console, as in the following screenshot.
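To verify all rows programmatically instead, a quick scan works for a small demo table like this one (a scan reads the entire table, so avoid it at scale):

# List every item in the table; acceptable for a small demo table
scan_response = table.scan()
for item in scan_response['Items']:
    print(item['line_nr'], item.get('emp_name'), item.get('true_emp_name'))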

Conclusion

This post demonstrated how easy it is to use services in the AI layer of the AWS AI/ML stack, such as Amazon Textract and Amazon A2I, to read and process tabular data from handwritten forms, and store them in a DynamoDB table for downstream applications to use. You can also send the augmented form data from Amazon A2I to an S3 bucket to be consumed by your AWS analytics applications.

For video presentations, sample Jupyter notebooks, or more information about use cases like document processing, content moderation, sentiment analysis, text translation, and more, see Amazon Augmented AI Resources. If this post helps you or inspires you to solve a problem, we would love to hear about it! The code for this solution is available on the GitHub repo for you to use and extend. Contributions are always welcome!


About the Authors

Prem Ranga is an Enterprise Solutions Architect based out of Atlanta, GA. He is part of the Machine Learning Technical Field Community and loves working with customers on their ML and AI journey. Prem is passionate about robotics, is an autonomous vehicles researcher, and also built the Alexa-controlled Beer Pours in Houston and other locations.

Mona Mona is an AI/ML Specialist Solutions Architect based out of Arlington, VA. She works with the World Wide Public Sector team and helps customers adopt machine learning on a large scale. She is passionate about NLP and ML explainability areas in AI/ML.

Sriharsha M S is an AI/ML specialist solution architect in the Strategic Specialist team at Amazon Web Services. He works with strategic AWS customers who are taking advantage of AI/ML to solve complex business problems. He provides technical guidance and design advice to implement AI/ML applications at scale. His expertise spans application architecture, big data, analytics, and machine learning.

Source: https://aws.amazon.com/blogs/machine-learning/process-documents-containing-handwritten-tabular-content-using-amazon-textract-and-amazon-a2i/

Artificial Intelligence

Deep Learning vs Machine Learning: How an Emerging Field Influences Traditional Computer Programming


When two different concepts are greatly intertwined, it can be difficult to separate them as distinct academic topics. That might explain why it’s so difficult to separate deep learning from machine learning as a whole. Considering the current push for both automation as well as instant gratification, a great deal of renewed focus has been heaped on the topic.

Everything from automated manufacturing workflows to personalized digital medicine could potentially grow to rely on deep learning technology. Defining the exact aspects of this technical discipline that will revolutionize these industries is, however, admittedly much more difficult. Perhaps it’s best to consider deep learning in the context of a greater movement in computer science.

Defining Deep Learning as a Subset of Machine Learning

Machine learning and deep learning are essentially two sides of the same coin. Deep learning techniques form a specific discipline that belongs to a much larger field, one that includes a wide variety of trained artificially intelligent agents that can predict the correct response in an equally wide array of situations. What makes deep learning independent of all these other techniques, however, is that it focuses almost exclusively on teaching agents to accomplish a specific goal by learning the best possible action in a number of virtual environments.

Traditional machine learning algorithms usually teach artificial nodes how to respond to stimuli by rote memorization. This is somewhat similar to human teaching techniques that consist of simple repetition, and therefore might be thought of as the computerized equivalent of a student running through times tables until they can recite them. While this is effective in a way, artificially intelligent agents educated in such a manner may not be able to respond to any stimulus outside the realm of their original design specifications.

That’s why deep learning specialists have developed alternative algorithms that are considered somewhat superior to this method, though they are admittedly far more hardware-intensive in many ways. Subroutines used by deep learning agents may be based around generative adversarial networks, convolutional neural node structures, or a practical form of restricted Boltzmann machine. These stand in sharp contrast to the binary trees and linked lists used by conventional machine learning firmware as well as a majority of modern file systems.

Self-organizing maps have also been widely used in deep learning, though their applications in other AI research fields have typically been much less promising. When it comes to the deep learning vs. machine learning debate, however, it’s highly likely that technicians will be looking more for practical applications than for theoretical academic discussion in the coming months. Suffice it to say that machine learning encompasses everything from the simplest AI to the most sophisticated predictive algorithms, while deep learning constitutes a more selective subset of these techniques.

Practical Applications of Deep Learning Technology

Depending on how a particular program is authored, deep learning techniques could be deployed along supervised or semi-supervised neural networks. Theoretically, it’d also be possible to do so via a completely unsupervised node layout, and it’s this technique that has quickly become the most promising. Unsupervised networks may be useful for medical image analysis, since this application often presents unique pieces of graphical information to a computer program that have to be tested against known inputs.

Traditional binary tree or blockchain-based learning systems have struggled to identify the same patterns in dramatically different scenarios, because the information remains hidden in a structure that would have otherwise been designed to present data effectively. It’s essentially a natural form of steganography, and it has confounded computer algorithms in the healthcare industry. However, this new type of unsupervised learning node could virtually educate itself on how to match these patterns even in a data structure that isn’t organized along the normal lines that a computer would expect it to be.

Others have proposed implementing semi-supervised artificially intelligent marketing agents that could eliminate much of the concern over ethics regarding existing deal-closing software. Instead of trying to reach as large a customer base as possible, these tools would calculate the odds of any given individual needing a product at a given time. In order to do so, it would need certain types of information provided by the organization that it works on behalf of, but it would eventually be able to predict all further actions on its own.

While some companies are currently relying on tools that utilize traditional machine learning technology to achieve the same goals, these are often wrought with privacy and ethical concerns. The advent of deep structured learning algorithms have enabled software engineers to come up with new systems that don’t suffer from these drawbacks.

Developing a Private Automated Learning Environment

Conventional machine learning programs often run into serious privacy concerns because of the fact that they need a huge amount of input in order to draw any usable conclusions. Deep learning image recognition software works by processing a smaller subset of inputs, thus ensuring that it doesn’t need as much information to do its job. This is of particular importance for those who are concerned about the possibility of consumer data leaks.

Considering new regulatory stances on many of these issues, it’s also quickly become something that’s become important from a compliance standpoint as well. As toxicology labs begin using bioactivity-focused deep structured learning packages, it’s likely that regulators will express additional concerns in regards to the amount of information needed to perform any given task with this kind of sensitive data. Computer scientists have had to scale back what some have called a veritable fire hose of bytes that tell more of a story than most would be comfortable with.

In a way, these developments hearken back to an earlier time when it was believed that each process in a system should only have the amount of privileges necessary to complete its job. As machine learning engineers embrace this paradigm, it’s highly likely that future developments will be considerably more secure simply because they don’t require the massive amount of data mining necessary to power today’s existing operations.

Image Credit: toptal.io

Source: https://datafloq.com/read/deep-learning-vs-machine-learning-how-emerging-field-influences-traditional-computer-programming/13652


Artificial Intelligence

Extra Crunch roundup: Tonal EC-1, Deliveroo’s rocky IPO, is Substack really worth $650M?


For this morning’s column, Alex Wilhelm looked back on the last few months, “a busy season for technology exits” that followed a hot Q4 2020.

We’re seeing signs of an IPO market that may be cooling, but even so, “there are sufficient SPACs to take the entire recent Y Combinator class public,” he notes.

Once we factor in private equity firms with pockets full of money, it’s evident that late-stage companies have three solid choices for leveling up.

Seeking more insight into these liquidity options, Alex interviewed:

  • DigitalOcean CEO Yancey Spruill, whose company went public via IPO;
  • Latch CFO Garth Mitchell, who discussed his startup’s merger with real estate SPAC $TSIA;
  • Brian Cruver, founder and CEO of AlertMedia, which recently sold to a private equity firm.

After recapping their deals, each executive explains how their company determined which flashing red “EXIT” sign to follow. As Alex observed, “choosing which option is best from a buffet’s worth of possibilities is an interesting task.”

Thanks very much for reading Extra Crunch! Have a great weekend.

Walter Thompson
Senior Editor, TechCrunch
@yourprotagonist


Full Extra Crunch articles are only available to members
Use discount code ECFriday to save 20% off a one- or two-year subscription


The Tonal EC-1

Image Credits: Nigel Sussman

On Tuesday, we published a four-part series on Tonal, a home fitness startup that has raised $200 million since it launched in 2018. The company’s patented hardware combines digital weights, coaching and AI in a wall-mounted system that sells for $2,995.

By any measure, it is poised for success — sales increased 800% between December 2019 and 2020, and by the end of this year, the company will have 60 retail locations. On Wednesday, Tonal reported a $250 million Series E that valued the company at $1.6 billion.

Our deep dive examines Tonal’s origins, product development timeline, its go-to-market strategy and other aspects that combined to spark investor interest and customer delight.

We call this format the “EC-1,” since these stories are as comprehensive and illuminating as the S-1 forms startups must file with the SEC before going public.

Here’s how the Tonal EC-1 breaks down:

We have more EC-1s in the works about other late-stage startups that are doing big things well and making news in the process.

What to make of Deliveroo’s rough IPO debut

Why did Deliveroo struggle when it began to trade? Is it suffering from cultural dissonance between its high-growth model and more conservative European investors?

Let’s peek at the numbers and find out.

Kaltura puts debut on hold. Is the tech IPO window closing?

The Exchange doubts many folks expected the IPO climate to get so chilly without warning. But we could be in for a Q2 pause in the formerly scorching climate for tech debuts.

Is Substack really worth $650M?

A $65 million Series B is remarkable, even by 2021 standards. But the fact that a16z is pouring more capital into the alt-media space is not a surprise.

Substack is a place where publications have bled some well-known talent, shifting the center of gravity in media. Let’s take a look at Substack’s historical growth.

RPA market surges as investors, vendors capitalize on pandemic-driven tech shift

Image Credits: Visual Generation / Getty Images

Robotic process automation came to the fore during the pandemic as companies took steps to digitally transform. When employees couldn’t be in the same office together, it became crucial to cobble together more automated workflows that required fewer people in the loop.

RPA has enabled executives to provide a level of automation that essentially buys them time to update systems to more modern approaches while reducing the large number of mundane manual tasks that are part of every industry’s workflow.

E-commerce roll-ups are the next wave of disruption in consumer packaged goods

Image Credits: Javier Zayas Photography / Getty Images

This year is all about the roll-ups, the aggregation of smaller companies into larger firms, creating a potentially compelling path for equity value. The interest in creating value through e-commerce brands is particularly striking.

Just a year ago, digitally native brands had fallen out of favor with venture capitalists after so many failed to create venture-scale returns. So what’s the roll-up hype about?

Hack takes: A CISO and a hacker detail how they’d respond to the Exchange breach

Image Credits: TarikVision / Getty Images

The cyber world has entered a new era in which attacks are becoming more frequent and happening on a larger scale than ever before. Massive hacks affecting thousands of high-level American companies and agencies have dominated the news recently. Chief among these are the December SolarWinds/FireEye breach and the more recent Microsoft Exchange server breach.

Everyone wants to know: If you’ve been hit with the Exchange breach, what should you do?

5 machine learning essentials nontechnical leaders need to understand

Image Credits: David Malan / Getty Images

Machine learning has become the foundation of business and growth acceleration because of the incredible pace of change and development in this space.

But for engineering and team leaders without an ML background, this can also feel overwhelming and intimidating.

Here are best practices and must-know components broken down into five practical and easily applicable lessons.

Embedded procurement will make every company its own marketplace

Image Credits: Busakorn Pongparnit / Getty Images

Embedded procurement is the natural evolution of embedded fintech.

In this next wave, businesses will buy things they need through vertical B2B apps, rather than through sales reps, distributors or an individual merchant’s website.

Knowing when your startup should go all-in on business development

Image Credits: twomeows / Getty Images

There’s a persistent fallacy swirling around that any startup growing pain or scaling problem can be solved with business development.

That’s frankly not true.

Dear Sophie: What should I know about prenups and getting a green card through marriage?

Image Credits: Bryce Durbin / TechCrunch

Dear Sophie:

I’m a founder of a startup on an E-2 investor visa and just got engaged! My soon-to-be spouse will sponsor me for a green card.

Are there any minimum salary requirements for her to sponsor me? Is there anything I should keep in mind before starting the green card process?

— Betrothed in Belmont

Startups must curb bureaucracy to ensure agile data governance

Image Credits: RichVintage / Getty Images

Many organizations perceive data management as being akin to data governance, where responsibilities are centered around establishing controls and audit procedures, and things are viewed from a defensive lens.

That defensiveness is admittedly justified, particularly given the potential financial and reputational damages caused by data mismanagement and leakage.

Nonetheless, there’s an element of myopia here, and being excessively cautious can prevent organizations from realizing the benefits of data-driven collaboration, particularly when it comes to software and product development.

Bring CISOs into the C-suite to bake cybersecurity into company culture

Image Credits: Jetta Productions Inc / Getty Images

Cyber strategy and company strategy are inextricably linked. Consequently, chief information security officers in the C-Suite will be just as common and influential as CFOs in maximizing shareholder value.

How is edtech spending its extra capital?

Image Credits: Tetra Images / Getty Images

Edtech unicorns have boatloads of cash to spend following the capital boost to the sector in 2020. As a result, edtech M&A activity has continued to swell.

The idea of a well-capitalized startup buying competitors to complement its core business is nothing new, but exits in this sector are notable because the money used to buy startups can be seen as an effect of the pandemic’s impact on remote education.

But in the past week, the consolidation environment made a clear statement: Pandemic-proven startups are scooping up talent — and fast.

Tech in Mexico: A confluence of Latin America, the US and Asia

Image Credits: Orbon Alija / Getty Images

Knowledge transfer is not the only trend flowing in the U.S.-Asia-LatAm nexus. Competition is afoot as well.

Because of similar market conditions, Asian tech giants are directly expanding into Mexico and other LatAm countries.

How we improved net retention by 30+ points in 2 quarters

Image Credits: Steven Puetzer / Getty Images

There’s certainly no shortage of SaaS performance metrics leaders focus on, but NRR (net revenue retention) is without question the most underrated metric out there.

NRR is simply total revenue minus any revenue churn plus any revenue expansion from upgrades, cross-sells or upsells. The greater the NRR, the quicker companies can scale.

5 mistakes creators make building new games on Roblox

Image Credits: SOPA Images / Getty Images

Even the most experienced and talented game designers from the mobile F2P business usually fail to understand what features matter to Robloxians.

For those just starting their journey in Roblox game development, these are the most common mistakes gaming professionals make on Roblox.

CEO Manish Chandra, investor Navin Chaddha explain why Poshmark’s Series A deck sings

“Lead with love, and the money comes.” It’s one of the cornerstone values at Poshmark. On the latest episode of Extra Crunch Live, Chandra and Chaddha sat down with us and walked us through their original Series A pitch deck.

Will the pandemic spur a smart rebirth for cities?

Image Credits: hopsalka / Getty Images

Cities are bustling hubs where people live, work and play. When the pandemic hit, some people fled major metropolitan markets for smaller towns — raising questions about the future validity of cities.

But those who predicted that COVID-19 would destroy major urban communities might want to stop shorting the resilience of these municipalities and start going long on what the post-pandemic future looks like.

The NFT craze will be a boon for lawyers

Image Credits: Gearstd / Getty Images

There’s plenty of uncertainty surrounding copyright issues, fraud and adult content, and legal implications are the crux of the NFT trend.

Whether a court would protect the receipt-holder’s ownership over a given file depends on a variety of factors. All of these concerns mean artists may need to lawyer up.

Viewing Cazoo’s proposed SPAC debut through Carvana’s windshield

It’s a reasonable question: Why would anyone pay that much for Cazoo today if Carvana is more profitable and whatnot? Well, growth. That’s the argument anyway.

Source: https://techcrunch.com/2021/04/02/extra-crunch-roundup-tonal-ec-1-deliveroos-rocky-ipo-is-substack-really-worth-650m/


AI

What did COVID do to all our models?


An interview with Dean Abbott and John Elder about change management, complexity, interpretability, and the risk of AI taking over humanity.


By Heather Fyson, KNIME

After the KNIME Fall Summit, the dinosaurs went back home… well, switched off their laptops. Dean Abbott and John Elder, longstanding data science experts, were invited to the Fall Summit by Michael to join him in a discussion of The Future of Data Science: A Fireside Chat with Industry Dinosaurs. The result was a sparkling conversation about data science challenges and new trends. Since switching off the studio lights, Rosaria has distilled and expanded some of the highlights about change management, complexity, interpretability, and more in the data science world. Let’s see where it brought us.

What is your experience with change management in AI, when reality changes and models have to be updated? What did COVID do to all our models?

 
[Dean] Machine Learning (ML) algorithms assume consistency between past and future. When things change, the models fail. COVID has changed our habits, and therefore our data. Pre-COVID models struggle to deal with the new situation.

[John] A simple example would be the Traffic layer on Google Maps. After lockdowns hit country after country in 2020, Google Maps traffic estimates were very inaccurate for a while. It had been built on fairly stable training data but now that system was thrown completely out of whack.

How do you figure out when the world has changed and the models don’t work anymore?

 
[Dean] Here’s a little trick I use: I partition my data by time and label records as “before” and “after”. I then build a classification model to discriminate the “after” vs. the “before” from the same inputs the model uses. If the discrimination is possible, then the “after” is different from the “before”, the world has changed, the data has changed, and the models must be retrained.
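Dean's trick amounts to a classifier two-sample test. Here is a minimal sketch in Python with scikit-learn, assuming X holds the model's inputs as an array, timestamps holds one timestamp per record, and cutoff marks the suspected change point (all three names are illustrative):

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Label each record by era: 0 = "before" the change point, 1 = "after"
drift_labels = (timestamps >= cutoff).astype(int)

# Try to discriminate "after" from "before" using the model's own inputs
clf = RandomForestClassifier(n_estimators=200, random_state=0)
auc = cross_val_score(clf, X, drift_labels, cv=5, scoring='roc_auc').mean()

# AUC near 0.5: the eras look alike; well above 0.5: the data has changed
# and the production models likely need retraining
print(f'Drift-detection AUC: {auc:.3f}')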

How complicated is it to retrain models in projects, especially after years of customization?

 
[John] Training models is usually the easiest step of all! The vast majority of otherwise successful projects die in the implementation phase. The greatest time is spent in the data cleansing and preparation phase. And the most problems are missed or made in the business understanding / project definition phase. So if you understand what the flaw is and can obtain new data and have the implementation framework in place, creating a new model is, by comparison, very straightforward.

Based on your decades-long experience, how complex is it to put together a really functioning Data Science application?

 
[John] It can vary of course, by complexity. Most of our projects get functioning prototypes at least in a few months. But for all, I cannot stress enough the importance of feedback: You have to talk to people much more often than you want to. And listen! We learn new things about the business problem, the data, or constraints, each time. Not all us quantitative people are skilled at speaking with humans, so it often takes a team. But the whole team of stakeholders has to learn to speak the same language.

[Dean] It is important to talk to our business counterparts. People fear change and don’t want to alter the status quo. One key problem really is psychological. The analysts are often seen as an annoyance. So, we have to build trust between the business counterpart and the analytics geeks. The start of a project should always include the following step: sync up domain experts / project managers, the analysts, and the IT and infrastructure (DevOps) team so everyone is clear on the objectives of the project and how it will be executed. Analysts are number 11 on the top 10 list of people they have to see every day! Let’s avoid embodying data scientist arrogance: “The business can’t understand us/our techniques, but we know what works best.” What we fail to appreciate is that the domain experts are actually experts in the domain we are working in! Translating data science assumptions and approaches into language the domain experts understand is key!

The latest trend now is deep learning, apparently it can solve everything. I got a question from a student lately, asking “why do we need to learn other ML algorithms if deep learning is the state of the art to solve data science problems”?

 
[Dean] Deep learning sucked a lot of the oxygen out of the room. It feels so much like the early 1990s when neural networks ascended with similar optimism! Deep Learning is a set of powerful techniques for sure, but they are hard to implement and optimize. XGBoost, Ensembles of trees, are also powerful but currently more mainstream. The vast majority of problems we need to solve using advanced analytics really don’t require complex solutions, so start simple; deep learning is overkill in these situations. It is best to use the Occam’s razor principle: if two models perform the same, adopt the simplest.

About complexity. The other trend, opposite to deep learning, is ML interpretability. Here, you greatly (excessively?) simplify the model in order to be able to explain it. Is interpretability that important?

 
[John] I often find myself fighting interpretability. It is nice, sure, but often comes at too high a cost of the most important model property: reliable accuracy. But many stakeholders believe interpretability is essential, so it becomes a barrier for acceptance. Thus, it is essential to discover what kind of interpretability is needed. Perhaps it is just knowing what the most important variables are? That’s doable with many nonlinear models. Maybe, as with explaining to credit applicants why they were turned down, one just needs to interpret outputs for one case at a time? We can build a linear approximation for a given point. Or, we can generate data from our black box model and build an “interpretable” model of any complexity to fit that data.
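That last idea, fitting an interpretable model to data generated from the black box, is often called a global surrogate. A minimal sketch, assuming a fitted regression model black_box and its training inputs X_train (both placeholder names):

import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Sample synthetic points across the training range and label them with the black box
rng = np.random.default_rng(0)
X_synth = rng.uniform(X_train.min(axis=0), X_train.max(axis=0),
                      size=(10000, X_train.shape[1]))
y_synth = black_box.predict(X_synth)

# Fit a shallow, human-readable tree that mimics the black box
surrogate = DecisionTreeRegressor(max_depth=3).fit(X_synth, y_synth)

# Fidelity: how closely the surrogate reproduces the black box's predictions
print('Surrogate fidelity (R^2):', surrogate.score(X_synth, y_synth))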

Lastly, research has shown that if users have the chance to play with a model – that is, to poke it with trial values of inputs and see its outputs, and perhaps visualize it – they get the same warm feelings of interpretability. Overall, trust – in the people and technology behind the model – is necessary for acceptance, and this is enhanced by regular communication and by including the eventual users of the model in the build phases and decisions of the modeling process.

[Dean] By the way KNIME Analytics Platform has a great feature to quantify the importance of the input variables in a Random Forest! The Random Forest Learner node outputs the statistics of candidate and splitting variables. Remember that, when you use the Random Forest Learner node.

There is an increase in requests for explanations of what a model does. For example, for some security classes, the European Union is demanding verification that the model doesn’t do what it’s not supposed to do. If we have to explain it all, then maybe Machine Learning is not the way to go. No more Machine Learning?

 
[Dean] Maybe full explainability is too hard to obtain, but we can achieve progress by performing a grid search on model inputs to create something like a score card describing what the model does. This is something like regression testing in hardware and software QA. If a formal proof of what models are doing is not possible, then let’s test and test and test! Input Shuffling and Target Shuffling can help to achieve a rough representation of the model behavior.
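Target Shuffling, which Dean mentions, refits the model on randomly permuted targets to measure how much apparent skill arises by chance. A minimal sketch with scikit-learn, where model and the train/test splits are placeholder names:

import numpy as np
from sklearn.base import clone

# Score the real model once
real_score = model.fit(X_train, y_train).score(X_test, y_test)

# Build a null distribution by retraining on shuffled targets
rng = np.random.default_rng(0)
null_scores = [
    clone(model).fit(X_train, rng.permutation(y_train)).score(X_test, y_test)
    for _ in range(100)
]

# If the real score doesn't clearly beat the shuffled runs,
# the model's apparent skill may be an artifact
p_value = np.mean([s >= real_score for s in null_scores])
print('Real score:', real_score, 'empirical p-value:', p_value)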

[John] Talking about understanding what a model does, I would like to raise the problem of reproducibility in science. A huge proportion of journal articles in all fields — 65 to 90% — is believed to be unreplicable. This is a true crisis in science. Medical papers try to tell you how to reproduce their results. ML papers don’t yet seem to care about reproducibility. A recent study showed that only 15% of AI papers share their code.

Let’s talk about Machine Learning Bias. Is it possible to build models that don’t discriminate?

 
[John] (To be a nerd for a second, that word is unfortunately overloaded. To “discriminate” in the ML world is your very goal: to make a distinction between two classes.) But to your real question, it depends on the data (and on whether the analyst is clever enough to adjust for weaknesses in the data): the models will pull out of the data the information reflected therein. The computer knows nothing about the world except for what’s in the data in front of it. So the analyst has to curate the data and take responsibility for those cases reflecting reality. If certain types of people, for example, are under-represented, then the model will pay less attention to them and won’t be as accurate on them going forward. I ask, “What did the data have to go through to get here?” (to get into this dataset) to think of how other cases might have dropped out along the way through the process (that is survivor bias). A skilled data scientist can look for such problems and think of ways to adjust/correct for them.

[Dean] The bias is not in the algorithms. The bias is in the data. If the data is biased, we’re working with a biased view of the world. Math is just math, it is not biased.

Will AI take over humanity?!

 
[John] I believe AI is just good engineering. Will AI exceed human intelligence? In my experience anyone under 40 believes yes, this is inevitable, and most over 40 (like me, obviously): no! AI models are fast, loyal, and obedient. Like a good German Shepherd dog, an AI model will go and get that ball, but it knows nothing about the world other than the data it has been shown. It has no common sense. It is a great assistant for specific tasks, but actually quite dimwitted.

[Dean] On that note, I would like to report two quotes made by Marvin Minsky in 1961 and 1970, from the dawn of AI, that I think describe well the future of AI.

“Within our lifetime some machines may surpass us in general intelligence” (1961)

“In three to eight years we’ll have a machine with the intelligence of a human being” (1970)

These ideas have been around for a long time. Here is one reason why AI will not solve all the problems: We’re judging its behavior based on one number, one number only! (Model error.) For example, predictions of stock prices over the next five years, predicted by building models using root mean square error as the error metric, cannot possibly paint the full picture of what the data are actually doing and severely hampers the model and its ability to flexibly uncover the patterns. We all know that RMSE is too coarse of a measure. Deep Learning algorithms will continue to get better, but we also need to get better at judging how good a model really is. So, no! I do not think that AI will take over humanity.

We have reached the end of this interview. We would like to thank Dean and John for their time and their pills of knowledge. Let’s hope we meet again soon!

About Dean Abbott and John Elder

Dean Abbott is Co-Founder and Chief Data Scientist at SmarterHQ. He is an internationally recognized expert and innovator in data science and predictive analytics, with three decades of experience solving problems in omnichannel customer analytics, fraud detection, risk modeling, text mining, and survey analysis. Frequently included in lists of pioneering data scientists, he is a popular keynote speaker and workshop instructor at conferences worldwide, and serves on the advisory boards of the UC/Irvine Predictive Analytics and UCSD Data Science certificate programs. He is the author of Applied Predictive Analytics (Wiley, 2014) and co-author of The IBM SPSS Modeler Cookbook (Packt Publishing, 2013).


John Elder founded Elder Research, America’s largest and most experienced data science consultancy, in 1995. With offices in Charlottesville, VA; Baltimore, MD; Raleigh, NC; Washington, DC; and London, the firm has solved hundreds of challenges for commercial and government clients by extracting actionable knowledge from all types of data. Dr. Elder co-authored three books (on practical data mining, ensembles, and text mining), two of which won “book of the year” awards. John has created data mining tools, was a discoverer of ensemble methods, chairs international conferences, and is a popular workshop and keynote speaker.


Bio: Heather Fyson is the blog editor at KNIME. Initially on the Event Team, her background is in translation and proofreading, so by moving to the blog in 2019 she returned to her real passion: working with texts. P.S. She is always interested to hear your ideas for new articles.

Original. Reposted with permission.

Source: https://www.kdnuggets.com/2021/04/covid-do-all-our-models.html


AI

The AI Trends Reshaping Health Care


By Ben Lorica.

Applications of AI in health care present a number of challenges and considerations that differ substantially from those in other industries. Despite this, health care has also been one of the leading industries in putting AI to work, taking advantage of cutting-edge technology to improve care. The numbers speak for themselves: the global AI in health care market is expected to grow from $4.9 billion in 2020 to $45.2 billion by 2026. Major factors driving this growth are the sheer volume of health care data and the growing complexity of datasets, the need to reduce mounting health care costs, and evolving patient needs.

Deep learning, for example, has made considerable inroads into the clinical environment over the last few years. Computer vision, in particular, has proven its value in medical imaging to assist in screening and diagnosis. Natural language processing (NLP) has provided significant value in addressing both contractual and regulatory concerns with text mining and data sharing. Increasing adoption of AI technology by pharmaceutical and biotechnology companies to expedite initiatives like vaccine and drug development, as seen in the wake of COVID-19, only exemplifies AI’s massive potential.

We’re already seeing amazing strides in health care AI, but it’s still the early days, and to truly unlock its value, there’s a lot of work to be done in understanding the challenges, tools, and intended users shaping the industry. New research from John Snow Labs and Gradient Flow, 2021 AI in Healthcare Survey Report, sheds light on just this: where we are, where we’re going, and how to get there. The global survey explores the important considerations for health care organizations in varying stages of AI adoption, geographies, and technical prowess to provide an extensive look into the state of AI in health care today.               

One of the most significant findings concerns which technologies are top of mind for AI implementation. When asked what technologies they plan to have in place by the end of 2021, almost half of respondents cited data integration. About one-third cited natural language processing (NLP) and business intelligence (BI) among the technologies they are currently using or plan to use by the end of the year. Half of those considered technical leaders are using, or soon will be using, technologies for data integration, NLP, business intelligence, and data warehousing. This is unsurprising, considering these tools have the power to help make sense of huge amounts of data while also keeping regulatory and responsible AI practices in mind.

When asked about the intended users of AI tools and technologies, over half of respondents identified clinicians among their target users. This indicates that AI is being used by the people tasked with delivering health care services, not just technologists and data scientists, as in years past. That number climbs even higher among mature organizations, those that have had AI models in production for more than two years. Interestingly, nearly 60% of respondents from mature organizations also indicated that patients are users of their AI technologies. With the advent of chatbots and telehealth, it will be interesting to see how AI proliferates for both patients and providers over the next few years.

In considering software for building AI solutions, open-source software (53%) had a slight edge over public cloud providers (42%). Looking ahead one to two years, respondents indicated openness to also using commercial software and commercial SaaS. Open-source software gives users a level of autonomy over their data that cloud providers can’t match, so it’s no surprise that a highly regulated industry like health care would be wary of data sharing. Similarly, the majority of companies with experience deploying AI models to production choose to validate models using their own data and monitoring tools rather than relying on evaluation by third parties or software vendors. While earlier-stage companies are more receptive to exploring third-party partners, more mature organizations tend to take a more conservative approach.

Generally, attitudes remained consistent when respondents were asked about the key criteria used to evaluate AI solutions, software libraries or SaaS solutions, and consulting companies. Although the answers varied slightly for each category, technical leaders ranked no data sharing with software vendors or consulting companies, the ability to train their own models, and state-of-the-art accuracy as top priorities. Health care-specific models and expertise in health care data engineering, integration, and compliance topped the list when respondents were asked about solutions and potential partners. Privacy, accuracy, and health care experience are the forces driving AI adoption.

It’s clear that AI is poised for even more growth as data continues to grow and technology and security measures improve. Health care, which can sometimes be seen as a laggard in adopting new technology, is taking to AI and already seeing its significant impact. While its approach, top tools and technologies, and applications of AI may differ from other industries, it will be exciting to see what’s in store for next year’s survey results.

Source: https://www.dataversity.net/the-ai-trends-reshaping-health-care/
