Customer behaviors changed radically during the pandemic lockdowns, and the payments industry changed with them.
To prevent fraud, the financial industry has always maintained checks and balances designed to identify suspicious activity and track changes in customer behavior. These controls, built on logic and trained pattern recognition, worked well for the most part, until a pandemic based on anything but logic and patterns had customers who had never used online banking or purchased goods and services through a digital wallet suddenly doing so repeatedly. And more customers online meant more fraud.
“The patterns and models that were always used to determine fraud are, unfortunately, no longer relevant,” James Heinzman, EVP of financial service solutions for ThetaRay, said in an interview with ATM Marketplace. “Banks are now generating 400 to 600 times the volume of alerts they were before the pandemic, and the models are identifying the new behaviors as suspicious instead of recognizing that the fundamental behavior of the market has changed.”
Heinzman explained that because the industry had never seen such radical changes before, the models used pre-pandemic have to be rewritten, retrained and recalibrated to be in any way meaningful going forward.
“This is a very expensive and resource-consuming effort. Does it really make sense to rewrite legacy technology programs? We believe that unsupervised machine learning is a better solution to the problem. Advanced artificial intelligence solutions solve this issue because they are data-driven and automatically tune to the new normal,” Heinzman said. “It accurately distinguishes between wholesale market changes and real suspicious behaviors. Because there are no predefined rules or models, it can make connections and identify patterns even if there’s been no prior example.”
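The unsupervised approach Heinzman describes can be illustrated with a minimal sketch. This is a generic anomaly-detection example using scikit-learn's IsolationForest, not ThetaRay's actual technology; the features and thresholds are illustrative assumptions:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
# Simulated transactions: [amount, hour of day]. No labels, no predefined rules.
normal = np.column_stack([rng.normal(50, 10, 500), rng.normal(14, 3, 500)])
outliers = np.array([[900.0, 3.0], [1200.0, 4.0]])  # large amounts at odd hours
transactions = np.vstack([normal, outliers])

# The model learns the data's structure on its own and scores each point
# by how easily it can be isolated from the rest.
model = IsolationForest(contamination=0.01, random_state=0).fit(transactions)
flags = model.predict(transactions)  # -1 = anomalous, 1 = normal

print(flags[-2:])  # the two injected outliers are flagged
```

Because the model is fit to the data rather than to hand-written rules, refitting it on post-shift data is what "tuning to the new normal" amounts to in this sketch.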
Q: If anti-money laundering and anti-fraud detection models have been rendered nearly useless by COVID-19, what should a bank do?
A: The first thing [a bank] needs to do is take a fresh look at its AML programs: the people, processes and technology. Banks need to acknowledge that the world has changed forever; the virus isn’t going away. It’s time for banks to truly adopt new technologies and modernize their infrastructure. A lot of banking was based on relationships and human interaction. That’s changed dramatically. The banking of the future will be conducted primarily through web and digital channels, and AML programs need to adapt to meet this change and modernize the technology.
Q: How are banks leaving themselves open to an increase in cybercrime?
A: The bank workforce is now working from home, security systems weren’t designed for that, and new vulnerabilities have arisen. Remote work also puts sensitive information on employees’ personal devices, exposing that data. Banks need to adopt more modern approaches to cybercrime and look not just at perimeter defenses but at business activity as well, in order to identify and mitigate breaches that would otherwise go undetected.
Q: What are the top cybercrimes that financial institutions have to watch out for?
A: There are a lot of attacks we’ve never seen before hitting these banks, and they’re only getting more sophisticated. And now, thanks to the Financial Crimes Enforcement Network breach, attackers are better informed. Phishing and spear-phishing techniques are becoming more effective thanks to the remote workforce; the chances of clicking on a suspicious link increase when employees aren’t in the office, and supervising them proves difficult. Additionally, the fear of job displacement causes employees to do things they normally wouldn’t, which puts them at higher risk of falling for this type of attack. The greatest threats in the post-COVID era may well come from within the banks themselves.
Q: Can you explain the new intuitive type of AI?
A: Artificial intuition enables computers to identify threats and opportunities without following a predefined model based on human experience or past events, and without being told what to look for, just as human intuition allows us to make decisions without being specifically instructed on how to do so. Essentially, artificial intuition evaluates all the data points and how they are connected. It can create a dynamic view of what is happening and how everything is connected to everything else. In this way, it can mimic how the human brain processes information and makes decisions about what is unusual.
Consider what happens when you see a person approaching you. Most people don’t take out a ruler and measure the distance between the person’s ears, how far apart their eyes are or the length of their chin. Instead, they very quickly take in all of the information about the person, all the connections between data points, and compare it to memory. Is this someone I have seen before? A friend? A threat? In the same way, artificial intuition can make connections between data points that separately seem normal but, when looked at together, arouse suspicion.
Q: Why is it referred to as artificial intuition and not intelligence?
A: When people hear the term “artificial intelligence,” they tend to picture extremely logic-based systems that follow rule-based programming. Yes, it’s true that most real-world AI applications aren’t HAL 9000, but they still follow a set of rules they were trained on: “if X happens, then Y will happen.” Artificial intuition moves away from this rules-based approach and allows the system to act on its own, using sophisticated algorithms to make connections between data points. It doesn’t need to be taught every possible scenario, and, let’s face it, that would be impossible, since it would require human beings to be aware of and understand an infinite number of possible situations. It can go beyond human intelligence and automatically make inferences about big data that mimic a human’s intuition, if a human were able to process that much data.
Q: What are the benefits of this technology?
A: Financial institutions have increasingly begun to adopt artificial intuition to detect new and sophisticated financial cybercrime schemes across correspondent banking, cross-border transactions and trade finance, including money laundering, fraud and ATM hacking. These crimes are usually concealed among thousands upon thousands of transactions, each with its own set of connected parameters. By combining sophisticated, patented mathematical algorithms with user-friendly interfaces, artificial intuition automatically and accurately identifies real suspicious activity and presents it to analysts in an easily understandable format with full transparency and explainability. It also provides all the tools and forensic data in a single interface to investigate and resolve the issues identified. It increases efficiency and reduces risk.
Q: What are the challenges?
A: The main challenges are not really with the technology per se; they are more related to the paradigm shift the technology represents. This is new, and it’s different. It represents change that can be scary for some and hard to embrace for others. It can also be perceived as a threat by internal stakeholders who prefer to build and create technology solutions in house. These challenges can be dangerous for a bank and can inhibit its ability to find the best solutions to the current crisis.
Other challenges are not unique to this technology, but they do merit a mention here. Big data is still a big issue for banks. They have gotten better at having a cohesive data strategy and bringing in big data technologies, but they still have a way to go. Data acquisition, data quality, and the ETL process are still challenges for banks that inhibit their ability to rapidly deploy new technology.
Q: How does AI “sense” changes?
A: A system using artificial intuition senses changes by applying a qualitative model to the data it’s analyzing, rather than the traditional quantitative model that rules-based AI solutions use. From there, artificial intuition analyzes the dataset and develops a contextual language that represents the overall configuration of the data. It’s able to understand the “big picture” of what lies in front of it, rather than just the individual data points.
For instance, even if X, Y and Z data points look completely normal on their own, an artificial intuition-based model would identify that, when analyzed together, something just doesn’t add up, and the system would flag that. It also automatically identifies “drift” in the data. As business operations and market conditions change, so does the data being presented to the system. By automatically identifying these drifts, the system can tune itself to the “new normal.” In this way, it can continuously identify only those cases that are truly suspicious, future-proofing the bank’s defenses and providing effective and efficient detection.
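The drift detection described here can be sketched with a simple two-sample statistical test. This is a generic illustration, assuming transaction amounts as the monitored feature; the vendor's actual mechanism is not public:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
baseline = rng.normal(50, 10, 1000)  # amounts the model was calibrated on
current = rng.normal(80, 10, 1000)   # post-shift behavior: the "new normal"

# A Kolmogorov-Smirnov test compares the two distributions as wholes,
# rather than flagging each individually plausible transaction.
stat, p_value = ks_2samp(baseline, current)
drift_detected = p_value < 0.01

if drift_detected:
    # Recalibrate to the new normal instead of raising thousands of alerts.
    baseline = current
```

The design point is that a wholesale market shift moves the entire distribution, which a distribution-level test recognizes as one event rather than a flood of individual anomalies.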
Q: Is it able to decrease alerts and separate the threats from just basic behavioral shifts?
A: Yes. In fact, the pandemic is a great example of this. As a result of COVID-19, consumers’ banking habits have changed dramatically. In-person banking is not happening on the scale it did before COVID, cash payments have dropped dramatically (no one wants to touch cash for fear of catching the virus), credit and debit card usage has spiked, and online and mobile banking have exploded.
These macro changes in consumer behavior are automatically identified by the system as “drift” in the data, and the technology relearns the “new normal” across the entire dataset. By understanding these macro changes and accounting for them, the system can continue to produce low volumes of high-quality alerts. Banks report that they have decreased the total number of alerts by 30-40%, with 95% being “investigation worthy,” while simultaneously reducing the overall investigation time per alert by as much as 50%. Compare this with the spike of 400-600 times more alerts and 90+ percent false positives that banks are experiencing with their current systems, and it speaks volumes.
Q: How does your company’s technology work to combat fraud?
A: ThetaRay technology is designed to deliver end-to-end solutions, from data integration through alert disposition and reporting. With a combination of market-leading big data technologies, microservices, API-based architecture and cloud-ready capabilities, the technology can be deployed on-premises or in the cloud. With full-stack detection capabilities, the system can bring together different detection capabilities in a single solution. The user interface is designed by investigation experts, for investigation experts, to provide a high-quality, efficient user experience.
Q: Are there limitations?
A: The core technology is modeled on the human brain and really is limited only by the imagination of those who use it. The technology runs on standard hardware and big data infrastructure, so it can scale up and scale out as necessary. It is designed to integrate seamlessly with existing systems and can immediately enhance and augment them, bringing effectiveness through better identification of real results and efficiency through reduced alert volumes and false positives. Over time, the system could either be integrated into a bank’s technology ecosystem or transitioned in as a replacement for a legacy system.
Q: What does the future look like for financial institutions in your opinion post-pandemic?
A: The pandemic has accelerated a lot of curves. By that I mean the adoption of digital banking, the movement toward a checkless and cashless society, changes in how consumers interact with their financial services providers, and the real-time, self-directed approach to banking that younger generations are demanding will all continue to fundamentally change the banking industry.
Fintechs that provide bank-like services are gaining market share and pressuring banks to innovate. I think this is a difficult time for banks as they struggle to adjust and compete in the post-pandemic world. This change was in process well before the pandemic, but it has dramatically accelerated because of it. I think it will be a bit rough for banks over the next few years. With interest rates near zero (in some countries they are negative), profit centers and revenue models will be reevaluated, innovative new products will be developed, new services and delivery methods will be created, and in the end, the financial sector will emerge better, stronger and more resilient than ever before. I see this period a bit like growing pains. It is going to be hard, even painful, for a period of time, but the end result will be worth it.
Q: What is the most important thing a financial institution can do to protect itself against fraud?
A: Forget about the past. Whatever was working before the pandemic will not work anymore. Embrace new ideas, technologies, and methods. Don’t be afraid to innovate. Try new things and “fail fast.” The longer you wait, the worse it will be. Don’t be paralyzed by analysis and inaction. Take a fresh look at problems and be creative in the approach.
The post Fighting fraud in the future: What banks need to do appeared first on Fintech News.
Deep Learning vs Machine Learning: How an Emerging Field Influences Traditional Computer Programming
When two different concepts are greatly intertwined, it can be difficult to separate them as distinct academic topics. That might explain why it’s so difficult to separate deep learning from machine learning as a whole. Considering the current push for both automation as well as instant gratification, a great deal of renewed focus has been heaped on the topic.
Everything from automated manufacturing workflows to personalized digital medicine could potentially grow to rely on deep learning technology. Defining the exact aspects of this technical discipline that will revolutionize these industries is, however, admittedly much more difficult. Perhaps it’s best to consider deep learning in the context of a greater movement in computer science.
Defining Deep Learning as a Subset of Machine Learning
Machine learning and deep learning are essentially two sides of the same coin. Deep learning is a specific discipline within a much larger field that includes a wide variety of trained artificially intelligent agents, each able to predict the correct response in an equally wide array of situations. What distinguishes deep learning from these other techniques, however, is its use of many-layered neural networks that learn useful representations of a problem directly from the data, rather than relying on hand-crafted features.
Traditional machine learning algorithms usually teach artificial nodes how to respond to stimuli through something close to rote memorization. This is somewhat similar to human teaching techniques built on simple repetition, and might therefore be thought of as the computerized equivalent of a student running through times tables until they can recite them. While this is effective in its way, artificially intelligent agents educated in such a manner may not be able to respond to any stimulus outside the realm of their original design specifications.
That’s why deep learning specialists have developed alternative algorithms considered somewhat superior to this method, though they are admittedly far more hardware-intensive in many ways. Subroutines used by deep learning agents may be based around generative adversarial networks, convolutional neural network structures or a practical form of restricted Boltzmann machine. These stand in sharp contrast to the binary trees and linked lists used by conventional machine learning software as well as a majority of modern file systems.
Self-organizing maps have also been widely used in deep learning, though their applications in other AI research fields have typically been much less promising. When it comes to the deep learning vs. machine learning debate, however, it’s highly likely that technicians will be looking more for practical applications than for theoretical academic discussion in the coming months. Suffice it to say that machine learning encompasses everything from the simplest AI to the most sophisticated predictive algorithms, while deep learning constitutes a more selective subset of these techniques.
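The generalization gap between rote, rule-like fitting and layered networks can be seen in a toy sketch (an illustrative scikit-learn example, not drawn from the article): a linear model cannot capture an XOR-style interaction between features, while even a tiny neural network can.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

# XOR-style data: the class depends on the *interaction* of the two features,
# so no single linear rule can separate it.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]] * 50, dtype=float)
y = np.array([0, 1, 1, 0] * 50)

linear = LogisticRegression().fit(X, y)
net = MLPClassifier(hidden_layer_sizes=(8,), activation="tanh",
                    solver="lbfgs", max_iter=1000, random_state=0).fit(X, y)

print(linear.score(X, y))  # at best 0.75: a line can't encode XOR
print(net.score(X, y))     # the hidden layer learns the interaction
```

A single hidden layer is already enough here; "deep" networks stack many such layers so that each learns features built from the layer below.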
Practical Applications of Deep Learning Technology
Depending on how a particular program is authored, deep learning techniques could be deployed along supervised or semi-supervised neural networks. Theoretically, it’d also be possible to do so via a completely unsupervised node layout, and it’s this technique that has quickly become the most promising. Unsupervised networks may be useful for medical image analysis, since this application often presents unique pieces of graphical information to a computer program that have to be tested against known inputs.
Traditional tree-based learning systems have struggled to identify the same patterns in dramatically different scenarios, because the information remains hidden in a structure that was otherwise designed to present data effectively. It’s essentially a natural form of steganography, and it has confounded computer algorithms in the healthcare industry. However, this new type of unsupervised learning node can virtually educate itself on how to match these patterns even in a data structure that isn’t organized along the lines a computer would normally expect.
Others have proposed implementing semi-supervised artificially intelligent marketing agents that could eliminate much of the concern over ethics regarding existing deal-closing software. Instead of trying to reach as large a customer base as possible, these tools would calculate the odds of any given individual needing a product at a given time. In order to do so, it would need certain types of information provided by the organization that it works on behalf of, but it would eventually be able to predict all further actions on its own.
While some companies currently rely on tools that use traditional machine learning technology to achieve the same goals, these are often fraught with privacy and ethical concerns. The advent of deep structured learning algorithms has enabled software engineers to come up with new systems that don’t suffer from these drawbacks.
Developing a Private Automated Learning Environment
Conventional machine learning programs often run into serious privacy concerns because of the fact that they need a huge amount of input in order to draw any usable conclusions. Deep learning image recognition software works by processing a smaller subset of inputs, thus ensuring that it doesn’t need as much information to do its job. This is of particular importance for those who are concerned about the possibility of consumer data leaks.
Considering new regulatory stances on many of these issues, it has also quickly become important from a compliance standpoint. As toxicology labs begin using bioactivity-focused deep structured learning packages, it’s likely that regulators will express additional concerns about the amount of information needed to perform any given task with this kind of sensitive data. Computer scientists have had to scale back what some have called a veritable fire hose of bytes that tells more of a story than most would be comfortable with.
In a way, these developments hearken back to an earlier time when it was believed that each process in a system should only have the amount of privileges necessary to complete its job. As machine learning engineers embrace this paradigm, it’s highly likely that future developments will be considerably more secure simply because they don’t require the massive amount of data mining necessary to power today’s existing operations.
Image Credit: toptal.io
Extra Crunch roundup: Tonal EC-1, Deliveroo’s rocky IPO, is Substack really worth $650M?
For this morning’s column, Alex Wilhelm looked back on the last few months, “a busy season for technology exits” that followed a hot Q4 2020.
We’re seeing signs of an IPO market that may be cooling, but even so, “there are sufficient SPACs to take the entire recent Y Combinator class public,” he notes.
Once we factor in private equity firms with pockets full of money, it’s evident that late-stage companies have three solid choices for leveling up.
Seeking more insight into these liquidity options, Alex interviewed:
- DigitalOcean CEO Yancey Spruill, whose company went public via IPO;
- Latch CFO Garth Mitchell, who discussed his startup’s merger with real estate SPAC $TSIA;
- Brian Cruver, founder and CEO of AlertMedia, which recently sold to a private equity firm.
After recapping their deals, each executive explains how their company determined which flashing red “EXIT” sign to follow. As Alex observed, “choosing which option is best from a buffet’s worth of possibilities is an interesting task.”
Thanks very much for reading Extra Crunch! Have a great weekend.
Senior Editor, TechCrunch
Full Extra Crunch articles are only available to members
Use discount code ECFriday to save 20% off a one- or two-year subscription
The Tonal EC-1
On Tuesday, we published a four-part series on Tonal, a home fitness startup that has raised $200 million since it launched in 2018. The company’s patented hardware combines digital weights, coaching and AI in a wall-mounted system that sells for $2,995.
By any measure, it is poised for success — sales increased 800% between December 2019 and 2020, and by the end of this year, the company will have 60 retail locations. On Wednesday, Tonal reported a $250 million Series E that valued the company at $1.6 billion.
Our deep dive examines Tonal’s origins, product development timeline, its go-to-market strategy and other aspects that combined to spark investor interest and customer delight.
We call this format the “EC-1,” since these stories are as comprehensive and illuminating as the S-1 forms startups must file with the SEC before going public.
Here’s how the Tonal EC-1 breaks down:
We have more EC-1s in the works about other late-stage startups that are doing big things well and making news in the process.
What to make of Deliveroo’s rough IPO debut
Why did Deliveroo struggle when it began to trade? Is it suffering from cultural dissonance between its high-growth model and more conservative European investors?
Let’s peek at the numbers and find out.
Kaltura puts debut on hold. Is the tech IPO window closing?
The Exchange doubts many folks expected the IPO climate to get so chilly without warning. But we could be in for a Q2 pause in the formerly scorching climate for tech debuts.
Is Substack really worth $650M?
A $65 million Series B is remarkable, even by 2021 standards. But the fact that a16z is pouring more capital into the alt-media space is not a surprise.
Substack is a place where publications have bled some well-known talent, shifting the center of gravity in media. Let’s take a look at Substack’s historical growth.
RPA market surges as investors, vendors capitalize on pandemic-driven tech shift
Robotic process automation came to the fore during the pandemic as companies took steps to digitally transform. When employees couldn’t be in the same office together, it became crucial to cobble together more automated workflows that required fewer people in the loop.
RPA has enabled executives to provide a level of automation that essentially buys them time to update systems to more modern approaches while reducing the large number of mundane manual tasks that are part of every industry’s workflow.
E-commerce roll-ups are the next wave of disruption in consumer packaged goods
This year is all about the roll-ups, the aggregation of smaller companies into larger firms, creating a potentially compelling path for equity value. The interest in creating value through e-commerce brands is particularly striking.
Just a year ago, digitally native brands had fallen out of favor with venture capitalists after so many failed to create venture-scale returns. So what’s the roll-up hype about?
Hack takes: A CISO and a hacker detail how they’d respond to the Exchange breach
The cyber world has entered a new era in which attacks are becoming more frequent and happening on a larger scale than ever before. Massive hacks affecting thousands of high-level American companies and agencies have dominated the news recently. Chief among these are the December SolarWinds/FireEye breach and the more recent Microsoft Exchange server breach.
Everyone wants to know: If you’ve been hit with the Exchange breach, what should you do?
5 machine learning essentials nontechnical leaders need to understand
Machine learning has become the foundation of business and growth acceleration because of the incredible pace of change and development in this space.
But for engineering and team leaders without an ML background, this can also feel overwhelming and intimidating.
Here are best practices and must-know components broken down into five practical and easily applicable lessons.
Embedded procurement will make every company its own marketplace
Embedded procurement is the natural evolution of embedded fintech.
In this next wave, businesses will buy things they need through vertical B2B apps, rather than through sales reps, distributors or an individual merchant’s website.
Knowing when your startup should go all-in on business development
There’s a persistent fallacy swirling around that any startup growing pain or scaling problem can be solved with business development.
That’s frankly not true.
Dear Sophie: What should I know about prenups and getting a green card through marriage?
I’m a founder of a startup on an E-2 investor visa and just got engaged! My soon-to-be spouse will sponsor me for a green card.
Are there any minimum salary requirements for her to sponsor me? Is there anything I should keep in mind before starting the green card process?
— Betrothed in Belmont
Startups must curb bureaucracy to ensure agile data governance
Many organizations perceive data management as being akin to data governance, where responsibilities are centered around establishing controls and audit procedures, and things are viewed from a defensive lens.
That defensiveness is admittedly justified, particularly given the potential financial and reputational damages caused by data mismanagement and leakage.
Nonetheless, there’s an element of myopia here, and being excessively cautious can prevent organizations from realizing the benefits of data-driven collaboration, particularly when it comes to software and product development.
Bring CISOs into the C-suite to bake cybersecurity into company culture
Cyber strategy and company strategy are inextricably linked. Consequently, chief information security officers in the C-Suite will be just as common and influential as CFOs in maximizing shareholder value.
How is edtech spending its extra capital?
Edtech unicorns have boatloads of cash to spend following the capital boost to the sector in 2020. As a result, edtech M&A activity has continued to swell.
The idea of a well-capitalized startup buying competitors to complement its core business is nothing new, but exits in this sector are notable because the money used to buy startups can be seen as an effect of the pandemic’s impact on remote education.
But in the past week, the consolidation environment made a clear statement: Pandemic-proven startups are scooping up talent — and fast.
Tech in Mexico: A confluence of Latin America, the US and Asia
Knowledge transfer is not the only trend flowing in the U.S.-Asia-LatAm nexus. Competition is afoot as well.
Because of similar market conditions, Asian tech giants are directly expanding into Mexico and other LatAm countries.
How we improved net retention by 30+ points in 2 quarters
There’s certainly no shortage of SaaS performance metrics leaders focus on, but NRR (net revenue retention) is without question the most underrated metric out there.
NRR is simply total revenue minus any revenue churn plus any revenue expansion from upgrades, cross-sells or upsells, expressed as a percentage of the revenue you started the period with. The greater the NRR, the quicker companies can scale.
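NRR is usually expressed as a rate, dividing by the revenue you started the period with; a quick sketch of the arithmetic (the dollar figures are invented for illustration):

```python
def net_revenue_retention(starting_revenue, churn, expansion):
    """NRR = (starting revenue - churned revenue + expansion revenue),
    as a fraction of starting revenue."""
    return (starting_revenue - churn + expansion) / starting_revenue

# $100k starting MRR, $5k churned, $15k expansion from upsells:
nrr = net_revenue_retention(100_000, 5_000, 15_000)
print(f"{nrr:.0%}")  # 110%
```

Anything above 100% means the existing customer base grows revenue on its own, before any new sales.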
5 mistakes creators make building new games on Roblox
Even the most experienced and talented game designers from the mobile F2P business usually fail to understand what features matter to Robloxians.
For those just starting their journey in Roblox game development, these are the most common mistakes gaming professionals make on Roblox.
CEO Manish Chandra, investor Navin Chaddha explain why Poshmark’s Series A deck sings
“Lead with love, and the money comes.” It’s one of the cornerstone values at Poshmark. On the latest episode of Extra Crunch Live, Chandra and Chaddha sat down with us and walked us through their original Series A pitch deck.
Will the pandemic spur a smart rebirth for cities?
Cities are bustling hubs where people live, work and play. When the pandemic hit, some people fled major metropolitan markets for smaller towns — raising questions about the future validity of cities.
But those who predicted that COVID-19 would destroy major urban communities might want to stop shorting the resilience of these municipalities and start going long on what the post-pandemic future looks like.
The NFT craze will be a boon for lawyers
There’s plenty of uncertainty surrounding copyright issues, fraud and adult content, and legal implications are the crux of the NFT trend.
Whether a court would protect the receipt-holder’s ownership over a given file depends on a variety of factors. All of these concerns mean artists may need to lawyer up.
Viewing Cazoo’s proposed SPAC debut through Carvana’s windshield
It’s a reasonable question: Why would anyone pay that much for Cazoo today if Carvana is more profitable and whatnot? Well, growth. That’s the argument anyway.
The AI Trends Reshaping Health Care
By Ben Lorica.
Applications of AI in health care present a number of challenges and considerations that differ substantially from other industries. Despite this, it has also been one of the leaders in putting AI to work, taking advantage of the cutting-edge technology to improve care. The numbers speak for themselves: The global AI in health care market size is expected to grow from $4.9 billion in 2020 to $45.2 billion by 2026. Some major factors driving this growth are the sheer volume of health care data and growing complexities of datasets, the need to reduce mounting health care costs, and evolving patient needs.
Deep learning, for example, has made considerable inroads into the clinical environment over the last few years. Computer vision, in particular, has proven its value in medical imaging to assist in screening and diagnosis. Natural language processing (NLP) has provided significant value in addressing both contractual and regulatory concerns with text mining and data sharing. Increasing adoption of AI technology by pharmaceutical and biotechnology companies to expedite initiatives like vaccine and drug development, as seen in the wake of COVID-19, only exemplifies AI’s massive potential.
We’re already seeing amazing strides in health care AI, but it’s still the early days, and to truly unlock its value, there’s a lot of work to be done in understanding the challenges, tools, and intended users shaping the industry. New research from John Snow Labs and Gradient Flow, 2021 AI in Healthcare Survey Report, sheds light on just this: where we are, where we’re going, and how to get there. The global survey explores the important considerations for health care organizations in varying stages of AI adoption, geographies, and technical prowess to provide an extensive look into the state of AI in health care today.
One of the most significant findings is around which technologies are top of mind when it comes to AI implementation. When asked what technologies they plan to have in place by the end of 2021, almost half of respondents cited data integration. About one-third cited NLP and business intelligence (BI) among the technologies they are currently using or plan to use by the end of the year. Half of those considered technical leaders are using – or soon will be using – technologies for data integration, NLP, business intelligence, and data warehousing. This makes sense, considering these tools have the power to help make sense of huge amounts of data, while also keeping regulatory and responsible AI practices in mind.
When asked about intended users for AI tools and technologies, over half of respondents identified clinicians among their target users. This indicates that AI is being used by people tasked with delivering health care services – not just technologists and data scientists, as in years past. That number climbs even higher when evaluating mature organizations, or those that have had AI models in production for more than two years. Interestingly, nearly 60% of respondents from mature organizations indicated that patients are also users of their AI technologies. With the advent of chatbots and telehealth, it will be interesting to see how AI proliferates for both patients and providers over the next few years.
In considering software for building AI solutions, open-source software (53%) had a slight edge over public cloud providers (42%). Looking ahead one to two years, respondents indicated openness to also using both commercial software and commercial SaaS. Open-source software gives users a level of autonomy over their data that cloud providers can’t, so it’s not a big surprise that a highly regulated industry like health care would be wary of data sharing. Similarly, the majority of companies with experience deploying AI models to production choose to validate models using their own data and monitoring tools, rather than evaluation from third parties or software vendors. While earlier-stage companies are more receptive to exploring third-party partners, more mature organizations are tending to take a more conservative approach.
Generally, attitudes remained the same when asked about the key criteria used to evaluate AI solutions, software libraries or SaaS solutions, and consulting companies to work with. Although the answers varied slightly for each category, technical leaders considered no data sharing with software vendors or consulting companies, the ability to train their own models, and state-of-the-art accuracy as top priorities. Health care-specific models and expertise in health care data engineering, integration, and compliance topped the list when asked about solutions and potential partners. Privacy, accuracy, and health care experience are the forces driving AI adoption.

It's clear that AI is poised for even more growth as data continues to grow and technology and security measures improve. Health care, which can sometimes be seen as a laggard in quick adoption, is taking to AI and already seeing its significant impact. While its approach, top tools and technologies, and applications of AI may differ from other industries, it will be exciting to see what's in store for next year's survey results.
Turns out humans are leading AI systems astray because we can’t agree on labeling
Top datasets used to train AI models and benchmark how the technology has progressed over time are riddled with labeling errors, a study shows.
Data is a vital resource in teaching machines how to complete specific tasks, whether that’s identifying different species of plants or automatically generating captions. Most neural networks are spoon-fed lots and lots of annotated samples before they can learn common patterns in data.
But these labels aren’t always correct; training machines using error-prone datasets can decrease their performance or accuracy. In the aforementioned study, led by MIT, analysts combed through ten popular datasets that have been cited more than 100,000 times in academic papers and found that on average 3.4 per cent of the samples are wrongly labelled.
The datasets they looked at range from photographs in ImageNet and sounds in AudioSet to reviews scraped from Amazon and sketches in QuickDraw. Examples of the mistakes compiled by the researchers show that some are clear blunders, such as a drawing of a light bulb tagged as a crocodile. Others are not so obvious: should a picture of a bucket of baseballs be labeled as ‘baseballs’ or ‘bucket’?
Annotating each sample is laborious work, often outsourced to services like Amazon Mechanical Turk, where workers are paid the square root of sod all to sift through the data piece by piece, labeling images and audio to feed into AI systems. This process amplifies biases and errors, as Vice has documented.
Workers are pressured to agree with the status quo if they want to get paid: if most of them label a bucket of baseballs as a ‘bucket’ and you decide it’s ‘baseballs’, you may not be paid at all if the platform figures you’re wrong or deliberately trying to mess up the labeling. So it’s in workers’ interest to pick the most popular label rather than stick out like a sore thumb – which means errors, or worse, racial biases and suchlike, snowball in these datasets.
The error rates vary across the datasets. In ImageNet, the most popular dataset used to train models for object recognition, the rate creeps up to six per cent. Considering it contains about 15 million photos, that means hundreds of thousands of labels are wrong. Some classes of images are more affected than others, for example, ‘chameleon’ is often mistaken for ‘green lizard’ and vice versa.
There are other knock-on effects: neural nets may learn to incorrectly associate features within data with certain labels. If, say, many images of the sea seem to contain boats and they keep getting tagged as ‘sea’, a machine might get confused and be more likely to incorrectly recognize boats as seas.
Problems don’t just arise when trying to compare the performance of models using these noisy datasets. The risks are higher if these systems are deployed in the real world, Curtis Northcutt – co-lead author of the study, a PhD student at MIT, and cofounder and CTO of ChipBrain, a machine-learning hardware startup – explained to The Register.
“Imagine a self-driving car that uses an AI model to make steering decisions at intersections,” he said. “What would happen if a self-driving car is trained on a dataset with frequent label errors that mislabel a three-way intersection as a four-way intersection? The answer: it might learn to drive off the road when it encounters three-way intersections.
“Maybe one of your AI self-driving models is actually more robust to training noise, so that it doesn’t drive off the road as much. You’ll never know this if your test set is too noisy because your test set labels won’t match reality. This means you can’t properly gauge which of your auto-pilot AI models drives best – at least not until you deploy the car out in the real-world, where it might drive off the road.”
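Northcutt's benchmarking point can be made concrete with a toy simulation (hypothetical numbers, not figures from the study): when a test set's labels are noisy, a worse model whose mistakes happen to track the bad labels can outscore a genuinely better one.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
truth = rng.integers(0, 2, n)  # ground-truth binary labels

# Simulated annotation noise: 10% of the test labels are flipped.
noisy = np.where(rng.random(n) < 0.10, 1 - truth, truth)

# Model A is genuinely better: it matches the truth ~92% of the time.
model_a = np.where(rng.random(n) < 0.08, 1 - truth, truth)

# Model B is worse against the truth (~82%), but it echoes the noisy
# labels' mistakes, matching the *noisy* test set ~90% of the time.
model_b = np.where(rng.random(n) < 0.10, 1 - noisy, noisy)

def acc(pred, ref):
    """Fraction of predictions agreeing with the reference labels."""
    return (pred == ref).mean()

# Against the truth, A wins; against the noisy test set, B looks better.
print(acc(model_a, truth), acc(model_b, truth))
print(acc(model_a, noisy), acc(model_b, noisy))
```

With these rates, model A scores roughly 0.84 on the noisy test set while model B scores roughly 0.90 – the leaderboard order flips, which is exactly why a noisy benchmark can't tell you which model "drives best."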
When the team working on the study trained some convolutional neural networks on portions of ImageNet that have been cleared of errors, their performance improved. The boffins believe that developers should think twice about training large models on datasets that have high error rates, and advise them to sort through the samples first. Cleanlab, the software the team developed and used to identify incorrect and inconsistent labels, can be found on GitHub.
“Cleanlab is an open-source python package for machine learning with noisy labels,” said Northcutt. “Cleanlab works by implementing all of the theory and algorithms in the sub-field of machine learning called confident learning, invented at MIT. I built cleanlab to allow other researchers to use confident learning – usually with just a few lines of code – but more importantly, to advance the progress of science in machine learning with noisy labels and to provide a framework for new researchers to get started easily.”
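The core heuristic behind confident learning can be sketched in a few lines of numpy (a minimal illustration, not the cleanlab API): a sample is a likely label issue when the model's predicted probability for its given label falls below that class's average self-confidence, while some other class clears its own threshold.

```python
import numpy as np

def flag_label_issues(labels, pred_probs):
    """Flag likely mislabeled samples, confident-learning style.

    labels     : (n,) int array of given (possibly noisy) class labels
    pred_probs : (n, k) array of model-predicted class probabilities,
                 ideally computed out-of-sample (e.g. cross-validation)
    Returns indices of suspect samples.
    """
    labels = np.asarray(labels)
    pred_probs = np.asarray(pred_probs)
    n, k = pred_probs.shape

    # Per-class threshold: average predicted probability of class c
    # over the samples actually labeled c (its "self-confidence").
    thresholds = np.array(
        [pred_probs[labels == c, c].mean() for c in range(k)]
    )

    # Confidence the model assigns to each sample's given label.
    self_conf = pred_probs[np.arange(n), labels]

    # A sample is suspect if another class clears its threshold
    # while the given label falls below its own.
    above = pred_probs >= thresholds
    above[np.arange(n), labels] = False
    return np.where(above.any(axis=1) & (self_conf < thresholds[labels]))[0]
```

For example, a sample labeled class 0 but predicted as class 1 with high probability gets flagged for human review. The real package does considerably more – calibrated joint estimation, ranking, and pruning – so this sketch is only the intuition.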
And be aware that if a dataset’s labels are particularly shoddy, training large complex neural networks may not always be so advantageous. Larger models tend to overfit to data more than smaller ones.
“Sometimes using smaller models will work for very noisy datasets. However, instead of always defaulting to using smaller models for very noisy datasets, I think the main takeaway is that machine learning engineers should clean and correct their test sets before they benchmark their models,” Northcutt concluded. ®