This post is co-written with Liam Pearson, a Data Scientist at Genworth Mortgage Insurance Australia Limited.
Genworth Mortgage Insurance Australia Limited is a leading provider of lenders mortgage insurance (LMI) in Australia; its shares trade on the Australian Securities Exchange under the ticker ASX: GMA.
Genworth is a lenders mortgage insurer with over 50 years of experience and large volumes of collected data, including data on dependencies between mortgage repayment patterns and insurance claims. Genworth wanted to use this historical information to train Predictive Analytics for Loss Mitigation (PALM) machine learning (ML) models. With the ML models, Genworth could analyze recent repayment patterns for each of the insurance policies to prioritize them in descending order of likelihood (chance of a claim) and impact (amount insured). Genworth wanted to run batch inference on ML models in parallel and on schedule while keeping the effort to build and operate the solution to a minimum. Therefore, Genworth and AWS chose Amazon SageMaker batch transform jobs and serverless building blocks to ingest and transform data, perform ML inference, and process and publish the results of the analysis.
Genworth’s Advanced Analytics team engaged in an AWS Data Lab program led by Data Lab engineers and solutions architects. In a pre-lab phase, they created a solution architecture to fit Genworth’s specific requirements, especially around security controls, given the nature of the financial services industry. After the architecture was approved and all AWS building blocks identified, training needs were determined. AWS Solutions Architects conducted a series of hands-on workshops to provide the builders at Genworth with the skills required to build the new solution. In a 4-day intensive collaboration, called a build phase, the Genworth Advanced Analytics team used the architecture and learnings to build an ML pipeline that fits their functional requirements. The pipeline is fully automated and serverless, meaning there are no maintenance tasks, scaling issues, or downtime. Post-lab activities were focused on productizing the pipeline and adopting it as a blueprint for other ML use cases.
In this post, we (the joint team of Genworth and AWS Architects) explain how we approached the design and implementation of the solution, the best practices we followed, the AWS services we used, and the key components of the solution architecture.
We followed the modern ML pipeline pattern to implement a PALM solution for Genworth. The pattern ingests data from various sources, then transforms, enriches, and cleans the data, runs ML prediction steps, and finishes with the results made available for consumption, optionally after additional data wrangling of the output.
In short, the solution implemented has three components:
- Data ingestion and preparation
- ML batch inference using three custom-developed ML models
- Data postprocessing and publishing for consumption
The following is the architecture diagram of the implemented solution.
Let’s discuss the three components in more detail.
Component 1: Data ingestion and preparation
Genworth source data is published weekly into a staging table in their Oracle on-premises database. The ML pipeline starts with an AWS Glue job (Step 1, Data Ingestion, in the diagram) connecting to the Oracle database over an AWS Direct Connect connection secured with VPN to ingest raw data and store it in an encrypted Amazon Simple Storage Service (Amazon S3) bucket. Then a Python shell job runs using AWS Glue (Step 2, Data Preparation) to select, clean, and transform the features used later in the ML inference steps. The results are stored in another encrypted S3 bucket used for curated datasets that are ready for ML consumption.
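The data-preparation step can be sketched as a small AWS Glue Python shell job. This is a minimal illustration, not Genworth's actual code: the bucket paths, column names, and feature-engineering rules below are all hypothetical assumptions.

```python
# Sketch of the Step 2 data-preparation logic as it might run in an
# AWS Glue Python shell job. Bucket paths, column names, and the
# cleaning rules are illustrative assumptions, not Genworth's.
import pandas as pd

RAW_PATH = "s3://example-raw-bucket/policies/latest.csv"          # hypothetical
CURATED_PATH = "s3://example-curated-bucket/features/latest.csv"  # hypothetical

FEATURES = ["policy_id", "months_in_arrears", "amount_insured"]

def prepare_features(raw: pd.DataFrame) -> pd.DataFrame:
    """Select, clean, and transform the features used for ML inference."""
    df = raw[FEATURES].copy()
    # Drop rows with no policy identifier; they cannot be scored.
    df = df.dropna(subset=["policy_id"])
    # Treat a missing arrears history as zero months in arrears.
    df["months_in_arrears"] = df["months_in_arrears"].fillna(0).astype(int)
    # Express the insured amount in thousands for the models.
    df["amount_insured_k"] = df["amount_insured"] / 1000.0
    return df.drop(columns=["amount_insured"])

if __name__ == "__main__":
    # Glue Python shell jobs can read and write s3:// paths directly
    # (with s3fs available in the job environment).
    raw = pd.read_csv(RAW_PATH)
    prepare_features(raw).to_csv(CURATED_PATH, index=False)
```

The transformation logic lives in a pure function over a DataFrame, which keeps it easy to unit test outside of Glue.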
Component 2: ML batch inference
Genworth’s Advanced Analytics team has already been using ML on premises. They wanted to reuse pretrained model artifacts to implement a fully automated ML inference pipeline on AWS. Furthermore, the team wanted to establish an architectural pattern for future ML experiments and implementations, allowing them to iterate and test ideas quickly in a controlled environment.
The three existing ML artifacts forming the PALM model were implemented as a hierarchical TensorFlow neural network model using Keras. The models seek to predict the probability of an insurance policy submitting a claim, the estimated probability of a claim being paid, and the magnitude of that possible claim.
Because each ML model is trained on different data, the input data needs to be standardized accordingly. Individual AWS Glue Python shell jobs perform this data standardization specific to each model. Three ML models are invoked in parallel using SageMaker batch transform jobs (Step 3, ML Batch Prediction) to perform the ML inference and store the prediction results in the model outputs S3 bucket. SageMaker batch transform manages the compute resources, installs the ML model, handles data transfer between Amazon S3 and the ML model, and easily scales out to perform inference on the entire dataset.
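Launching the three batch transform jobs in parallel can be sketched with boto3. The model names, S3 paths, and instance type below are illustrative assumptions, not Genworth's settings.

```python
# A hedged sketch of launching the three SageMaker batch transform jobs
# (Step 3) in parallel with boto3. Model names, bucket paths, and the
# instance type are illustrative assumptions.
MODELS = ["palm-claim-likelihood", "palm-claim-paid", "palm-claim-magnitude"]

def transform_job_request(model_name: str) -> dict:
    """Build the create_transform_job request for one PALM model."""
    return {
        "TransformJobName": f"{model_name}-weekly",
        "ModelName": model_name,
        "TransformInput": {
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": f"s3://example-curated-bucket/{model_name}/input/",
            }},
            "ContentType": "text/csv",
            "SplitType": "Line",
        },
        "TransformOutput": {"S3OutputPath": f"s3://example-model-outputs/{model_name}/"},
        "TransformResources": {"InstanceType": "ml.m5.xlarge", "InstanceCount": 1},
    }

def start_all_transforms() -> None:
    import boto3  # deferred import so the request builder stays testable offline
    client = boto3.client("sagemaker")
    # create_transform_job returns as soon as the job is accepted, so one
    # call per model starts the three inferences in parallel.
    for model in MODELS:
        client.create_transform_job(**transform_job_request(model))
```

Because `create_transform_job` is asynchronous, no explicit threading is needed to fan out the three models; SageMaker provisions and tears down the compute for each job.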
Component 3: Data postprocessing and publishing
Before the prediction results from the three ML models are ready for use, they require a series of postprocessing steps, which were performed using AWS Glue Python shell jobs. The results are aggregated and scored (Step 4, PALM Scoring), business rules applied (Step 5, Business Rules), the files generated (Step 6, User Files Generation), and data in the files validated (Step 7, Validation) before publishing the output of these steps back to a table in the on-premises Oracle database (Step 8, Delivering the Results). The solution uses Amazon Simple Notification Service (Amazon SNS) and Amazon CloudWatch Events to notify users via email when the new data becomes available or any issues occur (Step 10, Alerts & Notifications).
All of the steps in the ML pipeline are decoupled and orchestrated using AWS Step Functions, giving Genworth the ease of implementation, the ability to focus on the business logic instead of the scaffolding, and the flexibility they need for future experiments and other ML use cases. The following diagram shows the ML pipeline orchestration using a Step Functions state machine.
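The orchestration described above can be sketched in Amazon States Language, here expressed as a Python dict. This is an illustrative skeleton, not Genworth's actual state machine: the job names and the three-branch Parallel fan-out are assumptions based on the steps in the diagram.

```python
# An illustrative Step Functions state machine for the PALM pipeline,
# expressed in Amazon States Language as a Python dict. State names,
# Glue job names, and transform job names are placeholders.
import json

state_machine = {
    "Comment": "PALM ML pipeline (illustrative)",
    "StartAt": "DataIngestion",
    "States": {
        "DataIngestion": {
            "Type": "Task",
            "Resource": "arn:aws:states:::glue:startJobRun.sync",
            "Parameters": {"JobName": "palm-data-ingestion"},
            "Next": "DataPreparation",
        },
        "DataPreparation": {
            "Type": "Task",
            "Resource": "arn:aws:states:::glue:startJobRun.sync",
            "Parameters": {"JobName": "palm-data-preparation"},
            "Next": "MLBatchPrediction",
        },
        # A Parallel state fans out the three SageMaker batch transform jobs.
        "MLBatchPrediction": {
            "Type": "Parallel",
            "Branches": [
                {
                    "StartAt": f"Transform-{name}",
                    "States": {
                        f"Transform-{name}": {
                            "Type": "Task",
                            "Resource": "arn:aws:states:::sagemaker:createTransformJob.sync",
                            "Parameters": {"TransformJobName": f"palm-{name}-weekly"},
                            "End": True,
                        }
                    },
                }
                for name in ("likelihood", "claim-paid", "magnitude")
            ],
            "Next": "PostProcessing",
        },
        "PostProcessing": {
            "Type": "Task",
            "Resource": "arn:aws:states:::glue:startJobRun.sync",
            "Parameters": {"JobName": "palm-post-processing"},
            "End": True,
        },
    },
}

# json.dumps(state_machine) produces the definition you would pass to
# stepfunctions.create_state_machine(definition=...).
```

The `.sync` service integration patterns make Step Functions wait for each Glue job or transform job to finish before moving on, so the state machine itself carries all the sequencing logic.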
Business benefit and what’s next
By building a modern ML platform, Genworth was able to automate an end-to-end ML inference process, which ingests data from an Oracle database on premises, performs ML operations, and helps the business make data-driven decisions. Machine learning helps Genworth simplify high-value manual work performed by the Loss Mitigation team.
This Data Lab engagement has demonstrated the importance of making modern ML and analytics tools available to teams within an organization. It has been a remarkable experience witnessing how quickly an idea can be piloted and, if successful, productionized.
In this post, we showed you how easy it is to build a serverless ML pipeline at scale with AWS Data Analytics and ML services. As we discussed, you can use AWS Glue for a serverless, managed ETL processing job and SageMaker for all your ML needs. All the best on your build!
Genworth, Genworth Financial, and the Genworth logo are registered service marks of Genworth Financial, Inc. and used pursuant to license.
About the Authors
Liam Pearson is a Data Scientist at Genworth Mortgage Insurance Australia Limited who builds and deploys ML models for various teams within the business. In his spare time, Liam enjoys seeing live music, swimming and—like a true millennial—enjoying some smashed avocado.
Maria Sokolova is a Solutions Architect at Amazon Web Services. She helps enterprise customers modernize legacy systems and accelerates critical projects by providing technical expertise and transformations guidance where they’re needed most.
Vamshi Krishna Enabothala is a Data Lab Solutions Architect at AWS. Vamshi works with customers on their use cases, architects a solution to solve their business problems, and helps them build a scalable prototype. Outside of work, Vamshi is an RC enthusiast, building and playing with RC equipment (cars, boats, and drones), and also enjoys gardening.
OceanDAO Launches 7th Round of Grants, Valued at $224K, for Data Science, Developer, and AI Research Projects
OceanDAO, a distributed autonomous organization supporting the Ocean Protocol, reveals that the 7th round is now open for submissions. More than $200,000 is being offered for Data Science, Developer, and AI Research projects according to a release shared with Crowdfund Insider.
During its first six months, OceanDAO has “made 49 grants to community projects,” the announcement noted while adding that more than 15M OCEAN tokens were used to vote in the funding initiative, “painting a promising picture of an autonomous future for the Ocean Protocol community.”
The announcement also mentioned that OceanDAO presents opportunities for public financing that’s open to data science and AI practitioners “interested in building and creating streams to sell and curate data.”
The release also noted:
“OceanDAO’s seventh round is now open for submissions with 400,000 OCEAN (valued at $224K USD) available and up to 32,000 OCEAN per project. Proposals are due by July 6th. The community voting period begins on July 8th. Interested parties can pitch project ideas and form teams on the OceanDAO Discord. More information on the submission process can be found on OceanDAO’s website. OceanDAO is the community funding initiative of Ocean Protocol, the data exchange protocol.”
The update pointed out that OceanDAO’s funding reached nearly half a million OCEAN tokens over the first six rounds since its launch. The grants DAO, which funds Ocean Protocol community-curated initiatives, has reportedly made 49 allocations since December of last year, with its 7th round now taking submissions.
OceanDAO intends to expand the fast-evolving Ocean ecosystem, as “a key component in the Ocean’s near-term growth and long-term sustainability,” the release noted while adding that OceanDAO remains focused on making strategic investments in certain areas that can assist with expanding the Ocean Protocol ecosystem including: “building and improving applications or integrations to Ocean, community outreach, making data available on an Ocean-powered marketplace, building and improving Ocean core software, and improvements to the OceanDAO.”
Alex Napheys, OceanDAO Community & Growth Lead, stated:
“Our main goal is to support the long-term growth of the Ocean Protocol. The OceanDAO community is evolving monthly including some of the brightest and enthusiastic builders in the new data economy sector. The DAO aims to continually grow the [number] of projects it supports by onboarding the next wave to the OceanDAO community.”
As mentioned in the release, the community behind OceanDAO includes talented data scientists, engineers, builders, educators, and more. OceanDAO holds monthly rounds, during which teams are invited to apply for grants.
The OceanDAO community regularly casts its votes for initiatives that aim to provide the best chance for growth and sustainability “based on the following criteria: return on investment towards growth and alignment with Ocean’s mission.”
Town Hall meetings are “held every week and are open to the public to discuss the status of projects and the future of the DAO,” the announcement confirmed.
OceanDAO backs initiatives across “all aforementioned categories with financial resources to meet their objectives.”
OceanDAO investments reportedly include:
- DataUnion.app, a project that “creates a two-sided market and economy for crowdsourced data to enable long and short-term benefits of AI for everyone.”
- Rugpullindex.com, helping data scientists “to make better decisions when buying data online.”
- Opsci Bay, an open science bay “for self-sovereign data flows from Lab to Market that is GDPR-compliant.”
- Data Whale, a user-friendly “one-stop” solution that “helps data economy participants to understand the ecosystem and make smart staking decisions.”
- ResilientML, which will bring a vast collection of datasets “curated by experts in NLP for utilization directly in machine learning methods and sentiment models running in the Ocean environment and available through the Ocean marketplace.”
As noted in the release:
“As the projects drive traction in the Ocean ecosystem, it grows network fees and improves fundamentals for OCEAN, which in turn increases funds to OceanDAO available for future investments. This “snowball effect” is a core mechanism of the Web3 Sustainability Loop developed by Ocean Protocol Founder Trent McConaghy, in which both Network Revenue and Network Rewards are directed to work that is used for growth.”
Network Rewards help “to kickstart the project and to ensure funding. Network Revenue can help to push growth further once the Web3 project achieves traction at scale,” the announcement noted.
You may access the list of initiatives supported since OceanDAO’s launch here. OceanDAO has reportedly seen more than 60 proposals since December of last year, and all project proposals are publicly available to view online.
As previously reported, Ocean Protocol’s mission is to support a new Data Economy that “reaches the world, giving power back to data owners and enabling people to capture value from data to better our world.”
According to Ocean Protocol developers, data is like “a new asset class; Ocean Protocol unlocks its value.” Data owners and consumers use the Ocean Market app “to publish, discover, and consume data assets in a secure, privacy-preserving fashion.”
Ocean datatokens “turn data into data assets” and this enables data wallets, data exchanges, and data co-ops by “leveraging crypto wallets, exchanges, and other DeFi tools.” Projects use Ocean libraries and OCEAN in their own apps “to help drive the new Data Economy.”
The OCEAN token is used “to stake on data, govern Ocean Protocol’s community funding, and buy & sell data,” the announcement explained while confirming that its supply is “disbursed over time to drive near-term growth and long-term sustainability.” OCEAN has been designed “to increase with a rise in usage volume.”
nSure.ai, an AI Fraud Protection Firm Servicing Digital Goods, Raises $6.8 Million Seed Round
Israel-based nSure.ai has raised a $6.8 million Seed round led by DisruptiveAI, Phoenix Insurance, Kamet (an AXA-backed VC), Moneta Seeds, and other individual investors.
nSure.ai is a “predictive AI fraud protection company” that services digital goods such as gift cards, prepaid debit cards, software and game keys, digital wallet transfers, international money transfers, tickets, and more. The company explains that sellers of physical goods have processing times that allow them to double-check charges and can withhold a shipment if needed. Digital sellers lack this buffer, so even if fraud is detected minutes later, the assailant may be untraceable. nSure.ai is bringing anti-fraud technological and chargeback guarantees to the digital goods sector.
“We are thrilled that our investors have placed their trust in our leadership and confidence in nSure.ai,” says Alex Zeltcer, co-founder and CEO. “This investment enables us to register thousands of new merchants, who can feel confident selling higher-risk digital goods, without accepting fraud as a part of business.”
The founders of nSure.ai, Zeltcer and Ziv Isaiah, say they experienced first-hand the unique challenges faced by retailers of digital assets. During the first week of operating their online gift card business, 40% of sales were fraudulent, resulting in chargebacks. nSure.ai’s 98% approval rate offers a more accurate fraud-detection strategy, allowing retailers to recapture nearly $100 billion a year in revenue lost by declining legitimate customers, according to Zeltcer.
Gadi Tirosh, Venture Partner at Disruptive AI, says they believe fraud, especially in the field of digital goods, can only be fought with top-of-the-line AI technologies.
“nSure.ai has both the technology and industry understanding to win this market.”
The funding is expected to be used to further develop nSure.ai’s predictive AI and machine learning algorithms. The nSure.ai solution currently monitors and manages millions of transactions every month and has approved close to $1B in volume since going live.
AI-enhanced Insurtech Tractable Secures $60M via Series D Round Led by Insight Partners, Georgian
Tractable, the AI firm assisting insurers with accident and disaster recovery, recently revealed that it has raised $60 million through a Series D round led by Insight Partners and Georgian.
Tractable’s latest investment round has more than doubled the total raised by the firm, from $55 million to $115 million, and values the business at $1 billion – making it “the world’s first computer vision ‘unicorn’ for financial services,” according to a release.
As explained in the announcement:
“When drivers get into an accident, they (or their repairer) can submit photos of the damage to their insurer, which Tractable’s AI analyzes in real time to accelerate decisions that can otherwise take days, such as predicting whether the car is repairable, or assessing what repairs should take place. Over 20 of the global top 100 auto insurers use Tractable today to help their customers get back to normal faster after an accident.”
The proceeds from the investment round will serve to “double down” on accident recovery, which is the firm’s primary business. It will also fund new artificial intelligence solutions for accurately assessing the condition of an automobile, enabling users to understand car damage down to individual parts – “to enable transparent sale and purchase decisions.”
The release further noted that LKQ North America, the provider of alternative automobile parts and the largest automotive recycler in the world, currently uses Tractable’s AI “to optimize the recycling of end-of-life vehicles in North America.” The update also mentioned that automotive firms and auto leasing financial institutions will now be able “to benefit from the technology.”
Additionally, the round will be funding the application of Tractable’s tech to assess different homes and properties. As stated in the release, working cooperatively with a global insurer based in Japan, Tractable will help homeowners “recover faster from a typhoon by allowing them to submit photos and obtain an AI-accelerated claim payout.”
Tractable reports more than 600% revenue growth during the last 2 years, “in part through attracting new customers such as GEICO, the second-largest auto insurer in the US.” Other clients include Tokio Marine Nichido, Mitsui Sumitomo, Aioi Nissay Dowa and Sompo Japan, the four largest P&C insurers in Japan; Covéa, the largest auto insurer in France; Admiral Seguros, the Spanish entity of UK leader Admiral Group; and Ageas, a top UK insurer.
Alex Dalyac, CEO and founder of Tractable, stated:
“Six years ago we founded Tractable to bring the AI breakthrough in image classification to the real world. We cracked how to assess cars, helping over a million people recover from accidents, and helping recycle cars that couldn’t be repaired. We’ve turned $55M raised until now into $1B of valuation. And yet, there are other image recognition tasks out there, and more AI breakthroughs to come. Next up for us is homes.”
Lonne Jaffe, MD at Insight Partners and Tractable Board member, remarked:
“Tractable’s accelerating growth at scale is a testament to the power and differentiation of their applied machine learning system, which continues to improve as more businesses adopt it. We’re excited to double down on our partnership with Tractable as they work to help the world recover faster from accidents and disasters that affect hundreds of millions of lives.”
Emily Walsh, Partner at Georgian Partners, added:
“Tractable’s industry-leading computer vision capabilities are continuing to fuel incredible customer ROI and growth for the firm. We’re excited to continue to partner with Tractable as they apply their artificial intelligence capabilities to new, multi-billion dollar market opportunities in the used vehicle and natural disaster recovery industries.”
Each of These Microscopic Glass Beads Stores an Image Encoded on a Strand of DNA
Increasingly, civilization’s information is stored digitally, and that storage is abundant and growing. We don’t bother deleting those seven high-definition videos of the ceiling or 20 blurry photos of a table corner taken by our kid. There’s plenty of room on a smartphone or in the cloud, and we count on both increasing every year.
As we fluidly copy information from device to device, this situation seems durable. But that’s not necessarily true.
The amount of data we create is increasing rapidly. And if we (apocalyptically) lost the ability to produce digital storage devices—hard drives or magnetic tape, for example—our civilization’s collective digital record would begin to sprout holes within years. In decades, it’d become all but unreadable. Digital storage isn’t like books or stone tablets. It has a shorter expiration date. And, although we take storage for granted, it’s still expensive and energy hungry.
Which is why researchers are looking for new ways to archive information. And DNA, life’s very own “hard drive,” may be one solution. DNA offers incredibly dense data storage, and under the right conditions, it can keep information intact for millennia.
In recent years, scientists have advanced DNA data storage. They’ve shown how we can encode individual books, photographs, and even GIFs in DNA and then retrieve them. But there hasn’t been a scalable way to organize and retrieve large collections of DNA files. Until now, that is.
In a new Nature Materials paper, a team from MIT and Harvard’s Broad Institute describe a DNA-based storage system that allows them to search for and pull individual files—in this case, images encoded in DNA. It’s a bit like thumbing through your file cabinet, reading the paper tabs to identify a folder, and then pulling the deed to your car from it. Only, obviously, the details are a bit more complicated.
“We need new solutions for storing these massive amounts of data that the world is accumulating, especially the archival data,” said Mark Bathe, an MIT professor of biological engineering and senior author of the paper. “DNA is a thousandfold denser than even flash memory, and another property that’s interesting is that once you make the DNA polymer, it doesn’t consume any energy. You can write the DNA and then store it forever.”
How to Organize a DNA Storage System
How does one encode an image in a strand of DNA, anyway? It’s a fairly simple matter of translation.
Each pixel of a digital image is encoded in bits. These bits are represented by 1s and 0s. To convert it into DNA, scientists assign each of these bits to the DNA’s four base molecules, or nucleotides, adenine, cytosine, guanine, and thymine—usually referred to in shorthand by the letters A, C, G, and T. The DNA bases A and G, for example, could represent 1, and C and T could represent 0.
Next, researchers string together (or synthesize) a chain of DNA bases representing each and every bit of information in the original file. To retrieve the image, researchers reverse the process, reading the sequence of DNA bases (or sequencing it) and translating the data back into bits.
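The translation scheme above can be shown as a toy round trip in code. This sketch fixes one base per bit value (1 → A, 0 → C) for simplicity; real encodings are more careful, for example avoiding long runs of identical bases, but decoding still works with the text's example mapping of A/G → 1 and C/T → 0.

```python
# Toy bit-to-base translation using the example mapping from the text:
# 1 -> A or G, 0 -> C or T. Encoding here fixes 1 -> A and 0 -> C.
ENCODE = {"1": "A", "0": "C"}
DECODE = {"A": "1", "G": "1", "C": "0", "T": "0"}

def bits_to_dna(bits: str) -> str:
    """'Synthesize' (on paper) a DNA strand for a bit string."""
    return "".join(ENCODE[b] for b in bits)

def dna_to_bits(strand: str) -> str:
    """'Sequence' the strand back into bits."""
    return "".join(DECODE[base] for base in strand)

pixel = "10110001"           # one example byte of image data
strand = bits_to_dna(pixel)  # -> "ACAACCCA"
assert dna_to_bits(strand) == pixel
```

Since each base carries one bit here, a strand of n bases recovers exactly the original n bits; denser real-world schemes pack two bits per base by using all four letters.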
The standard retrieval process has a few drawbacks, however.
Researchers use a technique called a polymerase chain reaction (PCR) to pull files. Each strand of DNA includes an identifying sequence that matches a short sequence of nucleotides called a PCR primer. When the primer is added to the DNA solution, it bonds with matching DNA strands—the ones we want to read—and only those sequences are amplified (that is, copied for sequencing). The problem? Primers can interact with off-target sequences. Worse, the process uses enzymes that chew up all the DNA.
“You’re kind of burning the haystack to find the needle, because all the other DNA is not getting amplified and you’re basically throwing it away,” said Bathe.
To get around this, the Broad Institute team encapsulated the DNA strands in microscopic (6-micron) glass beads. They affixed short, single-stranded DNA labels to the surface of each bead. Like file names, the labels describe the bead’s contents. A tiger image might be labeled “orange,” “cat,” “wild.” A house cat might be labeled “orange,” “cat,” “domestic.” With just four labels per bead, you could uniquely label 10^20 DNA files.
The team can retrieve specific files by adding complementary nucleotide sequences, or primers, corresponding to an individual file’s label. The primers contain fluorescent molecules, and when they link up with a complementary strand—that is, the searched-for label—they form a double helix and glow. Machines separate out the glowing beads, which are opened and the DNA inside sequenced. The rest of the DNA files remain untouched, left in peace to guard their information.
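The label-based lookup can be modeled as simple set matching: a bead "glows" (is selected) when its labels contain every primer in the query. The file names and labels below are illustrative, echoing the tiger/house-cat example from the text.

```python
# A small simulation of label-based file retrieval: beads carry short
# DNA "file name" labels, and a query of fluorescent primers selects
# every bead whose labels include all the queried labels.
beads = {
    "tiger.png": {"orange", "cat", "wild"},
    "house_cat.png": {"orange", "cat", "domestic"},
    "pumpkin.png": {"orange", "plant", "domestic"},
}

def retrieve(query: set[str]) -> list[str]:
    """Return the files whose bead labels match all query primers."""
    return sorted(name for name, labels in beads.items() if query <= labels)

# Querying "orange" + "cat" tags both cat images; adding "wild"
# narrows the selection to the tiger alone.
assert retrieve({"orange", "cat"}) == ["house_cat.png", "tiger.png"]
assert retrieve({"orange", "cat", "wild"}) == ["tiger.png"]
```

Unselected beads are simply left in the pool, mirroring how the physical method leaves non-matching DNA files untouched for later searches.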
The best part of the method is its scalability. You could, in theory, have a huge DNA library stored in a test tube—Bathe notes a coffee mug of DNA could store all the world’s data—but without an easy way to search and retrieve the exact file you’re looking for, it’s worthless. With this method, everything can be retrieved.
George Church, a Harvard professor of genetics and well-known figure in the field of synthetic biology, called it a “giant leap” for the field.
“The rapid progress in writing, copying, reading, and low-energy archival data storage in DNA form has left poorly explored opportunities for precise retrieval of data files from huge…databases,” he said. “The new study spectacularly addresses this using a completely independent outer layer of DNA and leveraging different properties of DNA (hybridization rather than sequencing), and moreover, using existing instruments and chemistries.”
This Isn’t Coming For Your Computer
To be clear, all DNA data storage, including the work outlined in this study, remains firmly in the research phase. Don’t expect DNA hard drives for your laptop anytime soon.
Synthesizing DNA is still extremely expensive. It’d cost something like $1 trillion to write a petabyte of data in DNA. To match magnetic tape, a common method of archival data storage, Bathe estimates synthesis costs would have to fall six orders of magnitude. Also, this isn’t the speediest technique (to put it mildly).
The cost of DNA synthesis will fall—the technology is being advanced in other areas as well—and with more work, the speed will improve. But the latter may be beside the point. That is, if we’re mainly concerned with backing up essential data for the long term with minimal energy requirements and no need to regularly access it, then speed is less important than fidelity, data density, and durability.
DNA already stores the living world’s information. Now, it seems, it can do the same for all things digital too.
Image Credit: Courtesy of the researchers (via MIT News).