Artificial Intelligence

Deep Science: AI adventures in arts and letters

There’s more AI news out there than anyone can possibly keep up with. But you can stay tolerably up to date on the most interesting developments with this column, which collects AI and machine learning advancements from around the world and explains why they might be important to tech, startups or civilization.

To begin on a lighthearted note: The ways researchers find to apply machine learning to the arts are always interesting — though not always practical. A team from the University of Washington wanted to see if a computer vision system could learn to tell what is being played on a piano just from an overhead view of the keys and the player’s hands.

Audeo, the system trained by Eli Shlizerman, Kun Su and Xiulong Liu, watches video of piano playing and first extracts a piano-roll-like simple sequence of key presses. Then it adds expression in the form of length and strength of the presses, and lastly polishes it up for input into a MIDI synthesizer for output. The results are a little loose but definitely recognizable.

Diagram showing how video of a piano player's hands on the keys is turned into MIDI sequences.

Image Credits: Shlizerman et al.
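To make the first stage of that pipeline concrete, here is a minimal sketch of how a piano-roll-like sequence of key presses might be collapsed into note events with an onset and a length. It is not Audeo's code: the frame layout (one list of 0/1 flags per video frame) and the name `roll_to_notes` are assumptions made purely for illustration.

```python
from typing import List, Tuple

# Assumed layout: one entry per video frame, each a list of 0/1 flags (one per key).
PianoRoll = List[List[int]]
NoteEvent = Tuple[int, int, int]  # (key index, onset frame, duration in frames)


def roll_to_notes(roll: PianoRoll) -> List[NoteEvent]:
    """Collapse runs of consecutive 'pressed' frames into note events."""
    if not roll:
        return []
    num_keys = len(roll[0])
    notes: List[NoteEvent] = []
    for key in range(num_keys):
        onset = None  # frame where the current press started, if any
        for t, frame in enumerate(roll):
            if frame[key] and onset is None:
                onset = t  # key goes down
            elif not frame[key] and onset is not None:
                notes.append((key, onset, t - onset))  # key comes up
                onset = None
        if onset is not None:
            # Key is still held when the clip ends.
            notes.append((key, onset, len(roll) - onset))
    # Order by onset time, then by key, as a synthesizer would expect.
    return sorted(notes, key=lambda n: (n[1], n[0]))


if __name__ == "__main__":
    demo = [[1, 0], [1, 1], [0, 1], [0, 0], [1, 0]]
    print(roll_to_notes(demo))  # [(0, 0, 2), (1, 1, 2), (0, 4, 1)]
```

A real system would also have to estimate how hard each key was struck, which this sketch leaves out entirely.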

“To create music that sounds like it could be played in a musical performance was previously believed to be impossible,” said Shlizerman. “An algorithm needs to figure out the cues, or ‘features,’ in the video frames that are related to generating music, and it needs to ‘imagine’ the sound that’s happening in between the video frames. It requires a system that is both precise and imaginative. The fact that we achieved music that sounded pretty good was a surprise.”

Another item from the field of arts and letters is this fascinating research into the computational unfolding of ancient letters too delicate to handle. The MIT team was looking at “locked” letters from the 17th century that are so intricately folded and sealed that removing and flattening them might permanently damage them. Their approach was to X-ray the letters and set a new, advanced algorithm to work deciphering the resulting imagery.

Diagram showing X-ray views of a letter and how it is analyzed to virtually unfold it. Image Credits: MIT

“The algorithm ends up doing an impressive job at separating the layers of paper, despite their extreme thinness and tiny gaps between them, sometimes less than the resolution of the scan,” MIT’s Erik Demaine said. “We weren’t sure it would be possible.” The work may be applicable to many kinds of documents that are difficult for simple X-ray techniques to unravel. It’s a bit of a stretch to categorize this as “machine learning,” but it was too interesting not to include. Read the full paper at Nature Communications.

Diagram showing reviews of electric car charge points are analyzed and turned into useful data.

Image Credits: Asensio et al.

You arrive at a charge point for your electric car and find it to be out of service. You might even leave a bad review online. In fact, thousands of such reviews exist and constitute a potentially very useful map for municipalities looking to expand electric vehicle infrastructure.

Georgia Tech’s Omar Asensio trained a natural language processing model on such reviews and it soon became an expert at parsing them by the thousands and squeezing out insights like where outages were common, comparative cost and other factors.

Source: https://techcrunch.com/2021/03/05/deep-science-ai-adventures-in-arts-and-letters/

Artificial Intelligence

Deep Learning vs Machine Learning: How an Emerging Field Influences Traditional Computer Programming

When two concepts are closely intertwined, it can be difficult to separate them as distinct academic topics. That might explain why it’s so difficult to separate deep learning from machine learning as a whole. Given the current push for both automation and instant gratification, the topic has attracted a great deal of renewed attention.

Everything from automated manufacturing workflows to personalized digital medicine could potentially grow to rely on deep learning technology. Defining the exact aspects of this technical discipline that will revolutionize these industries is, however, admittedly much more difficult. Perhaps it’s best to consider deep learning in the context of a greater movement in computer science.

Defining Deep Learning as a Subset of Machine Learning

Machine learning and deep learning are essentially two sides of the same coin. Deep learning is a specific discipline within a much larger field that includes a wide variety of trained artificially intelligent agents that can predict the correct response in an equally wide array of situations. What sets deep learning apart from these other techniques, however, is that it focuses almost exclusively on teaching agents to accomplish a specific goal by learning the best possible action in a number of virtual environments.

Traditional machine learning algorithms usually teach artificial nodes how to respond to stimuli by rote memorization. This is somewhat similar to human teaching techniques that consist of simple repetition, and therefore might be thought of as the computerized equivalent of a student running through times tables until they can recite them. While this is effective in a way, artificially intelligent agents educated in such a manner may not be able to respond to any stimulus outside of the realm of their original design specifications.

That’s why deep learning specialists have developed alternative algorithms that are considered to be somewhat superior to this method, though they are admittedly far more hardware intensive in many ways. Subroutines used by deep learning agents may be based around generative adversarial networks, convolutional neural node structures or a practical form of restricted Boltzmann machine. These stand in sharp contrast to the binary trees and linked lists used by conventional machine learning firmware as well as a majority of modern file systems.

Self-organizing maps have also been widely used in deep learning, though their applications in other AI research fields have typically been much less promising. When it comes to defining the deep learning vs machine learning debate, however, it’s highly likely that technicians will be looking more for practical applications than for theoretical academic discussion in the coming months. Suffice it to say that machine learning encompasses everything from the simplest AI to the most sophisticated predictive algorithms while deep learning constitutes a more selective subset of these techniques.

Practical Applications of Deep Learning Technology

Depending on how a particular program is authored, deep learning techniques could be deployed along supervised or semi-supervised neural networks. Theoretically, it’d also be possible to do so via a completely unsupervised node layout, and it’s this technique that has quickly become the most promising. Unsupervised networks may be useful for medical image analysis, since this application often presents unique pieces of graphical information to a computer program that have to be tested against known inputs.

Traditional binary tree or blockchain-based learning systems have struggled to identify the same patterns in dramatically different scenarios, because the information remains hidden in a structure that would have otherwise been designed to present data effectively. It’s essentially a natural form of steganography, and it has confounded computer algorithms in the healthcare industry. However, this new type of unsupervised learning node could virtually educate itself on how to match these patterns even in a data structure that isn’t organized along the normal lines that a computer would expect it to be.

Others have proposed implementing semi-supervised artificially intelligent marketing agents that could eliminate much of the ethical concern surrounding existing deal-closing software. Instead of trying to reach as large a customer base as possible, these tools would calculate the odds of any given individual needing a product at a given time. To do so, such a tool would need certain types of information from the organization it works on behalf of, but it would eventually be able to predict all further actions on its own.

While some companies currently rely on tools that use traditional machine learning technology to achieve the same goals, these are often fraught with privacy and ethical concerns. The advent of deep structured learning algorithms has enabled software engineers to come up with new systems that don’t suffer from these drawbacks.

Developing a Private Automated Learning Environment

Conventional machine learning programs often run into serious privacy concerns because of the fact that they need a huge amount of input in order to draw any usable conclusions. Deep learning image recognition software works by processing a smaller subset of inputs, thus ensuring that it doesn’t need as much information to do its job. This is of particular importance for those who are concerned about the possibility of consumer data leaks.

Considering new regulatory stances on many of these issues, it has also quickly become important from a compliance standpoint. As toxicology labs begin using bioactivity-focused deep structured learning packages, it’s likely that regulators will express additional concerns with regard to the amount of information needed to perform any given task with this kind of sensitive data. Computer scientists have had to scale back what some have called a veritable fire hose of bytes that tell more of a story than most would be comfortable with.

In a way, these developments hearken back to an earlier time when it was believed that each process in a system should only have the amount of privileges necessary to complete its job. As machine learning engineers embrace this paradigm, it’s highly likely that future developments will be considerably more secure simply because they don’t require the massive amount of data mining necessary to power today’s existing operations.

Image Credit: toptal.io

Source: https://datafloq.com/read/deep-learning-vs-machine-learning-how-emerging-field-influences-traditional-computer-programming/13652

Artificial Intelligence

Extra Crunch roundup: Tonal EC-1, Deliveroo’s rocky IPO, is Substack really worth $650M?

For this morning’s column, Alex Wilhelm looked back on the last few months, “a busy season for technology exits” that followed a hot Q4 2020.

We’re seeing signs of an IPO market that may be cooling, but even so, “there are sufficient SPACs to take the entire recent Y Combinator class public,” he notes.

Once we factor in private equity firms with pockets full of money, it’s evident that late-stage companies have three solid choices for leveling up.

Seeking more insight into these liquidity options, Alex interviewed:

  • DigitalOcean CEO Yancey Spruill, whose company went public via IPO;
  • Latch CFO Garth Mitchell, who discussed his startup’s merger with real estate SPAC $TSIA;
  • Brian Cruver, founder and CEO of AlertMedia, which recently sold to a private equity firm.

After recapping their deals, each executive explains how their company determined which flashing red “EXIT” sign to follow. As Alex observed, “choosing which option is best from a buffet’s worth of possibilities is an interesting task.”

Thanks very much for reading Extra Crunch! Have a great weekend.

Walter Thompson
Senior Editor, TechCrunch
@yourprotagonist


Full Extra Crunch articles are only available to members
Use discount code ECFriday to save 20% off a one- or two-year subscription


The Tonal EC-1

Image Credits: Nigel Sussman

On Tuesday, we published a four-part series on Tonal, a home fitness startup that has raised $200 million since it launched in 2018. The company’s patented hardware combines digital weights, coaching and AI in a wall-mounted system that sells for $2,995.

By any measure, it is poised for success — sales increased 800% between December 2019 and 2020, and by the end of this year, the company will have 60 retail locations. On Wednesday, Tonal reported a $250 million Series E that valued the company at $1.6 billion.

Our deep dive examines Tonal’s origins, product development timeline, its go-to-market strategy and other aspects that combined to spark investor interest and customer delight.

We call this format the “EC-1,” since these stories are as comprehensive and illuminating as the S-1 forms startups must file with the SEC before going public.

Here’s how the Tonal EC-1 breaks down:

We have more EC-1s in the works about other late-stage startups that are doing big things well and making news in the process.

What to make of Deliveroo’s rough IPO debut

Why did Deliveroo struggle when it began to trade? Is it suffering from cultural dissonance between its high-growth model and more conservative European investors?

Let’s peek at the numbers and find out.

Kaltura puts debut on hold. Is the tech IPO window closing?

The Exchange doubts many folks expected the IPO climate to get so chilly without warning. But we could be in for a Q2 pause in the formerly scorching climate for tech debuts.

Is Substack really worth $650M?

A $65 million Series B is remarkable, even by 2021 standards. But the fact that a16z is pouring more capital into the alt-media space is not a surprise.

Substack is a place where publications have bled some well-known talent, shifting the center of gravity in media. Let’s take a look at Substack’s historical growth.

RPA market surges as investors, vendors capitalize on pandemic-driven tech shift

Business process organization and analytics. Business process visualization and representation, automated workflow system concept. Vector concept creative illustration

Image Credits: Visual Generation / Getty Images

Robotic process automation came to the fore during the pandemic as companies took steps to digitally transform. When employees couldn’t be in the same office together, it became crucial to cobble together more automated workflows that required fewer people in the loop.

RPA has enabled executives to provide a level of automation that essentially buys them time to update systems to more modern approaches while reducing the large number of mundane manual tasks that are part of every industry’s workflow.

E-commerce roll-ups are the next wave of disruption in consumer packaged goods

Elevated view of many toilet rolls on blue background

Image Credits: Javier Zayas Photography / Getty Images

This year is all about the roll-ups, the aggregation of smaller companies into larger firms, creating a potentially compelling path for equity value. The interest in creating value through e-commerce brands is particularly striking.

Just a year ago, digitally native brands had fallen out of favor with venture capitalists after so many failed to create venture-scale returns. So what’s the roll-up hype about?

Hack takes: A CISO and a hacker detail how they’d respond to the Exchange breach

3d Flat isometric vector concept of data breach, confidential data stealing, cyber attack.

Image Credits: TarikVision / Getty Images

The cyber world has entered a new era in which attacks are becoming more frequent and happening on a larger scale than ever before. Massive hacks affecting thousands of high-level American companies and agencies have dominated the news recently. Chief among these are the December SolarWinds/FireEye breach and the more recent Microsoft Exchange server breach.

Everyone wants to know: If you’ve been hit with the Exchange breach, what should you do?

5 machine learning essentials nontechnical leaders need to understand

Jumble of multicoloured wires untangling into straight lines over a white background. Cape Town, South Africa. Feb 2019.

Image Credits: David Malan / Getty Images

Machine learning has become the foundation of business and growth acceleration because of the incredible pace of change and development in this space.

But for engineering and team leaders without an ML background, this can also feel overwhelming and intimidating.

Here are best practices and must-know components broken down into five practical and easily applicable lessons.

Embedded procurement will make every company its own marketplace

Businesswomen using mobile phone analyzing data and economic growth graph chart. Technology digital marketing and network connection.

Image Credits: Busakorn Pongparnit / Getty Images

Embedded procurement is the natural evolution of embedded fintech.

In this next wave, businesses will buy things they need through vertical B2B apps, rather than through sales reps, distributors or an individual merchant’s website.

Knowing when your startup should go all-in on business development

One red line with arrow head breaking out from a business or finance growth chart canvas.

Image Credits: twomeows / Getty Images

There’s a persistent fallacy swirling around that any startup growing pain or scaling problem can be solved with business development.

That’s frankly not true.

Dear Sophie: What should I know about prenups and getting a green card through marriage?

lone figure at entrance to maze hedge that has an American flag at the center

Image Credits: Bryce Durbin/TechCrunch

Dear Sophie:

I’m a founder of a startup on an E-2 investor visa and just got engaged! My soon-to-be spouse will sponsor me for a green card.

Are there any minimum salary requirements for her to sponsor me? Is there anything I should keep in mind before starting the green card process?

— Betrothed in Belmont

Startups must curb bureaucracy to ensure agile data governance

Image of a computer, phone and clock on a desk tied in red tape.

Image Credits: RichVintage / Getty Images

Many organizations perceive data management as being akin to data governance, where responsibilities are centered around establishing controls and audit procedures, and things are viewed from a defensive lens.

That defensiveness is admittedly justified, particularly given the potential financial and reputational damages caused by data mismanagement and leakage.

Nonetheless, there’s an element of myopia here, and being excessively cautious can prevent organizations from realizing the benefits of data-driven collaboration, particularly when it comes to software and product development.

Bring CISOs into the C-suite to bake cybersecurity into company culture

Mixed race businesswoman using tablet computer in server room

Image Credits: Jetta Productions Inc / Getty Images

Cyber strategy and company strategy are inextricably linked. Consequently, chief information security officers in the C-Suite will be just as common and influential as CFOs in maximizing shareholder value.

How is edtech spending its extra capital?

Money tree: an adult hand reaches for dollar bills growing on a leafless tree

Image Credits: Tetra Images / Getty Images

Edtech unicorns have boatloads of cash to spend following the capital boost to the sector in 2020. As a result, edtech M&A activity has continued to swell.

The idea of a well-capitalized startup buying competitors to complement its core business is nothing new, but exits in this sector are notable because the money used to buy startups can be seen as an effect of the pandemic’s impact on remote education.

But in the past week, the consolidation environment made a clear statement: Pandemic-proven startups are scooping up talent — and fast.

Tech in Mexico: A confluence of Latin America, the US and Asia

Aerial view of crowd connected by lines

Image Credits: Orbon Alija / Getty Images

Knowledge transfer is not the only trend flowing in the U.S.-Asia-LatAm nexus. Competition is afoot as well.

Because of similar market conditions, Asian tech giants are directly expanding into Mexico and other LatAm countries.

How we improved net retention by 30+ points in 2 quarters

Sparks coming off US dollar bill attached to jumper cables

Image Credits: Steven Puetzer / Getty Images

There’s certainly no shortage of SaaS performance metrics leaders focus on, but NRR (net revenue retention) is without question the most underrated metric out there.

NRR is simply total revenue minus any revenue churn plus any revenue expansion from upgrades, cross-sells or upsells. The greater the NRR, the quicker companies can scale.
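As a rough worked example of that arithmetic, the sketch below computes NRR for a single period. Dividing by the starting revenue so that the result reads as a ratio (1.15 meaning 115%) is a common convention but an assumption here, since the excerpt does not spell it out; the function name and the figures are hypothetical.

```python
def net_revenue_retention(starting_revenue: float, churned: float, expansion: float) -> float:
    """Return NRR as a ratio of the revenue the period started with.

    starting_revenue: recurring revenue from existing customers at period start
    churned: revenue lost to cancellations and downgrades during the period
    expansion: revenue gained from upgrades, cross-sells or upsells
    """
    if starting_revenue <= 0:
        raise ValueError("starting_revenue must be positive")
    return (starting_revenue - churned + expansion) / starting_revenue


if __name__ == "__main__":
    # Hypothetical figures: $100k at the start, $10k churned, $25k expansion.
    nrr = net_revenue_retention(100_000, 10_000, 25_000)
    print(f"NRR: {nrr:.0%}")  # NRR: 115%
```

Anything above 100% means existing customers alone are growing revenue, before any new sales are counted.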

5 mistakes creators make building new games on Roblox

BRAZIL - 2021/03/24: In this photo illustration a Roblox logo seen displayed on a smartphone. (Photo Illustration by Rafael Henrique/SOPA Images/LightRocket via Getty Images)

Image Credits: SOPA Images / Getty Images

Even the most experienced and talented game designers from the mobile F2P business usually fail to understand what features matter to Robloxians.

For those just starting their journey in Roblox game development, these are the most common mistakes gaming professionals make on Roblox.

CEO Manish Chandra, investor Navin Chaddha explain why Poshmark’s Series A deck sings

“Lead with love, and the money comes.” It’s one of the cornerstone values at Poshmark. On the latest episode of Extra Crunch Live, Chandra and Chaddha sat down with us and walked us through their original Series A pitch deck.

Will the pandemic spur a smart rebirth for cities?

New versus old - an old brick building reflected in windows of modern new facade

Image Credits: hopsalka / Getty Images

Cities are bustling hubs where people live, work and play. When the pandemic hit, some people fled major metropolitan markets for smaller towns — raising questions about the future validity of cities.

But those who predicted that COVID-19 would destroy major urban communities might want to stop shorting the resilience of these municipalities and start going long on what the post-pandemic future looks like.

The NFT craze will be a boon for lawyers

3d rendering of pink piggy bank standing on sounding block with gavel lying beside on light-blue background with copy space. Money matters. Lawsuit for money. Auction bids.

Image Credits: Gearstd / Getty Images

There’s plenty of uncertainty surrounding copyright issues, fraud and adult content, and legal implications are the crux of the NFT trend.

Whether a court would protect the receipt-holder’s ownership over a given file depends on a variety of factors. All of these concerns mean artists may need to lawyer up.

Viewing Cazoo’s proposed SPAC debut through Carvana’s windshield

It’s a reasonable question: Why would anyone pay that much for Cazoo today if Carvana is more profitable and whatnot? Well, growth. That’s the argument anyway.

Source: https://techcrunch.com/2021/04/02/extra-crunch-roundup-tonal-ec-1-deliveroos-rocky-ipo-is-substack-really-worth-650m/

AI

The AI Trends Reshaping Health Care

Click to learn more about author Ben Lorica.

Applications of AI in health care present a number of challenges and considerations that differ substantially from other industries. Despite this, it has also been one of the leaders in putting AI to work, taking advantage of the cutting-edge technology to improve care. The numbers speak for themselves: The global AI in health care market size is expected to grow from $4.9 billion in 2020 to $45.2 billion by 2026. Some major factors driving this growth are the sheer volume of health care data and growing complexities of datasets, the need to reduce mounting health care costs, and evolving patient needs.

Deep learning, for example, has made considerable inroads into the clinical environment over the last few years. Computer vision, in particular, has proven its value in medical imaging to assist in screening and diagnosis. Natural language processing (NLP) has provided significant value in addressing both contractual and regulatory concerns with text mining and data sharing. Increasing adoption of AI technology by pharmaceutical and biotechnology companies to expedite initiatives like vaccine and drug development, as seen in the wake of COVID-19, only exemplifies AI’s massive potential.

We’re already seeing amazing strides in health care AI, but it’s still early days, and to truly unlock its value, there’s a lot of work to be done in understanding the challenges, tools, and intended users shaping the industry. New research from John Snow Labs and Gradient Flow, the 2021 AI in Healthcare Survey Report, sheds light on just this: where we are, where we’re going, and how to get there. The global survey explores the important considerations for health care organizations in varying stages of AI adoption, geographies, and technical prowess to provide an extensive look into the state of AI in health care today.

One of the most significant findings is around which technologies are top of mind when it comes to AI implementation. When asked what technologies they plan to have in place by the end of 2021, almost half of respondents cited data integration. About one-third cited natural language processing (NLP) and business intelligence (BI) among the technologies they are currently using or plan to use by the end of the year. Half of those considered technical leaders are using – or soon will be using – technologies for data integration, NLP, business intelligence, and data warehousing. This makes sense, considering these tools have the power to help make sense of huge amounts of data, while also keeping regulatory and responsible AI practices in mind.

When asked about intended users for AI tools and technologies, over half of respondents identified clinicians among their target users. This indicates that AI is being used by people tasked with delivering health care services – not just technologists and data scientists, as in years past. That number climbs even higher when evaluating mature organizations, or those that have had AI models in production for more than two years. Interestingly, nearly 60% of respondents from mature organizations indicated that patients are also users of their AI technologies. With the advent of chatbots and telehealth, it will be interesting to see how AI proliferates for both patients and providers over the next few years.

In considering software for building AI solutions, open-source software (53%) had a slight edge over public cloud providers (42%). Looking ahead one to two years, respondents indicated openness to also using both commercial software and commercial SaaS. Open-source software gives users a level of autonomy over their data that cloud providers can’t, so it’s not a big surprise that a highly regulated industry like health care would be wary of data sharing. Similarly, the majority of companies with experience deploying AI models to production choose to validate models using their own data and monitoring tools, rather than evaluation from third parties or software vendors. While earlier-stage companies are more receptive to exploring third-party partners, more mature organizations are tending to take a more conservative approach.

Generally, attitudes remained the same when asked about key criteria used to evaluate AI solutions, software libraries or SaaS solutions, and consulting companies to work with. Although the answers varied slightly for each category, technical leaders considered no data sharing with software vendors or consulting companies, the ability to train their own models, and state-of-the-art accuracy as top priorities. Health care-specific models and expertise in health care data engineering, integration, and compliance topped the list when asked about solutions and potential partners.

Privacy, accuracy, and health care experience are the forces driving AI adoption. It’s clear that AI is poised for even more growth, as data continues to grow and technology and security measures improve. Health care, which can sometimes be seen as a laggard for quick adoption, is taking to AI and already seeing its significant impact. While its approach, the top tools and technologies, and applications of AI may differ from other industries, it will be exciting to see what’s in store for next year’s survey results.

Source: https://www.dataversity.net/the-ai-trends-reshaping-health-care/

AI

Turns out humans are leading AI systems astray because we can’t agree on labeling

Top datasets used to train AI models and benchmark how the technology has progressed over time are riddled with labeling errors, a study shows.

Data is a vital resource in teaching machines how to complete specific tasks, whether that’s identifying different species of plants or automatically generating captions. Most neural networks are spoon-fed lots and lots of annotated samples before they can learn common patterns in data.

But these labels aren’t always correct; training machines using error-prone datasets can decrease their performance or accuracy. In the aforementioned study, led by MIT, analysts combed through ten popular datasets that have been cited more than 100,000 times in academic papers and found that on average 3.4 per cent of the samples are wrongly labelled.

The datasets they looked at range from photographs in ImageNet, to sounds in AudioSet, reviews scraped from Amazon, to sketches in QuickDraw. Examples of the mistakes compiled by the researchers show that some are clear blunders, such as a drawing of a light bulb tagged as a crocodile; in others, however, the correct label is not obvious. Should a picture of a bucket of baseballs be labeled as ‘baseballs’ or ‘bucket’?

Annotating each sample is laborious work. It is often outsourced to services like Amazon Mechanical Turk, where workers are paid the square root of sod all to sift through the data piece by piece, labeling images and audio to feed into AI systems. This process amplifies biases and errors, as Vice has documented.

Workers are pressured to agree with the status quo if they want to get paid: if a lot of them label a bucket of baseballs as a ‘bucket’, and you decide it’s ‘baseballs’, you may not be paid at all if the platform figures you’re wrong or deliberately trying to mess up the labeling. That means workers will choose the most popular label to avoid looking like they’ve made a mistake. It’s in their interest to stick to the narrative and avoid sticking out like a sore thumb. That means errors, or worse, racial biases and suchlike, snowball in these datasets.
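The dynamic is easier to see with a toy version of majority-vote labeling. The sketch below is hypothetical: it does not reproduce any platform's actual payment rules, and the worker IDs and the `consensus` helper are invented for illustration. It simply takes the most common label as "correct" and lists whoever disagreed.

```python
from collections import Counter
from typing import Dict, List, Tuple


def consensus(votes: Dict[str, str]) -> Tuple[str, List[str]]:
    """Return the most common label and the workers who voted against it.

    votes maps a (hypothetical) worker ID to the label that worker chose.
    Ties go to whichever label was seen first, which is arbitrary.
    """
    counts = Counter(votes.values())
    top_label, _ = counts.most_common(1)[0]
    dissenters = sorted(w for w, label in votes.items() if label != top_label)
    return top_label, dissenters


if __name__ == "__main__":
    votes = {"worker_1": "bucket", "worker_2": "bucket", "worker_3": "baseballs"}
    label, dissenters = consensus(votes)
    print(label, dissenters)  # bucket ['worker_3']
```

If dissenters risk going unpaid, the lone 'baseballs' voter has every reason to switch, whether or not 'bucket' is the better label.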

The error rates vary across the datasets. In ImageNet, the most popular dataset used to train models for object recognition, the rate creeps up to six per cent. Considering it contains about 15 million photos, that means hundreds of thousands of labels are wrong. Some classes of images are more affected than others; for example, ‘chameleon’ is often mistaken for ‘green lizard’ and vice versa.

There are other knock-on effects: neural nets may learn to incorrectly associate features within data with certain labels. If, say, many images of the sea seem to contain boats and they keep getting tagged as ‘sea’, a machine might get confused and be more likely to incorrectly recognize boats as seas.

Problems don’t just arise when trying to compare the performance of models using these noisy datasets. The risks are higher if these systems are deployed in the real world, Curtis Northcutt, co-lead author of the study, a PhD student at MIT, and cofounder and CTO of ChipBrain, a machine-learning hardware startup, explained to The Register.

“Imagine a self-driving car that uses an AI model to make steering decisions at intersections,” he said. “What would happen if a self-driving car is trained on a dataset with frequent label errors that mislabel a three-way intersection as a four-way intersection? The answer: it might learn to drive off the road when it encounters three-way intersections.

“Maybe one of your AI self-driving models is actually more robust to training noise, so that it doesn’t drive off the road as much. You’ll never know this if your test set is too noisy because your test set labels won’t match reality. This means you can’t properly gauge which of your auto-pilot AI models drives best – at least not until you deploy the car out in the real-world, where it might drive off the road.”
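Northcutt’s point about noisy test sets can be illustrated with a toy benchmark. The sketch below is not from the study: the ten "intersections", the two models and their predictions are all made up for illustration. It simply shows how the same pair of models can swap places depending on whether they are scored against mislabeled or corrected test labels.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the given labels."""
    matches = sum(p == y for p, y in zip(predictions, labels))
    return matches / len(labels)


# Hypothetical ground truth for ten test intersections.
true_labels = ["3-way"] * 5 + ["4-way"] * 5
# The same test set, with two 3-way intersections mislabeled as 4-way.
noisy_labels = ["3-way"] * 3 + ["4-way"] * 7

# Model A has learned the labeling mistake and reproduces it exactly.
model_a = ["3-way"] * 3 + ["4-way"] * 7
# Model B gets every 3-way intersection right but misses one 4-way.
model_b = ["3-way"] * 5 + ["4-way"] * 4 + ["3-way"]

noisy_scores = {
    "A": accuracy(model_a, noisy_labels),
    "B": accuracy(model_b, noisy_labels),
}
clean_scores = {
    "A": accuracy(model_a, true_labels),
    "B": accuracy(model_b, true_labels),
}

print("scored on noisy labels:", noisy_scores)
print("scored on clean labels:", clean_scores)
```

On the noisy labels model A looks like the winner, because it is being rewarded for repeating the annotation error. Against the corrected labels model B comes out ahead.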

When the team working on the study trained some convolutional neural networks on portions of ImageNet that have been cleared of errors, their performance improved. The boffins believe that developers should think twice about training large models on datasets that have high error rates, and advise them to sort through the samples first. Cleanlab, the software the team developed and used to identify incorrect and inconsistent labels, can be found on GitHub.

“Cleanlab is an open-source python package for machine learning with noisy labels,” said Northcutt. “Cleanlab works by implementing all of the theory and algorithms in the sub-field of machine learning called confident learning, invented at MIT. I built cleanlab to allow other researchers to use confident learning – usually with just a few lines of code – but more importantly, to advance the progress of science in machine learning with noisy labels and to provide a framework for new researchers to get started easily.”
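For a rough feel of how confident learning flags suspect labels, here is a heavily simplified sketch in plain Python. It does not use cleanlab’s API, and the function name, the toy labels and the probabilities are all hypothetical. The assumption is that a trained model has already produced per-class probabilities for each example. A per-class threshold is set to the model’s average confidence on examples carrying that label, and an example is flagged when the model confidently prefers a different class.

```python
def find_likely_label_errors(labels, pred_probs):
    """Return indices of examples whose given label looks suspicious.

    labels: list of int class ids (the possibly noisy given labels).
    pred_probs: list of per-class probability lists from a trained model.
    """
    num_classes = len(pred_probs[0])

    # Per-class threshold: the model's average confidence in class k
    # over the examples that were labeled k.
    thresholds = []
    for k in range(num_classes):
        probs_k = [p[k] for p, y in zip(pred_probs, labels) if y == k]
        thresholds.append(sum(probs_k) / len(probs_k) if probs_k else 1.0)

    suspects = []
    for i, (p, y) in enumerate(zip(pred_probs, labels)):
        # Classes whose probability clears that class's threshold.
        confident = [k for k in range(num_classes) if p[k] >= thresholds[k]]
        if confident:
            best = max(confident, key=lambda k: p[k])
            if best != y:
                suspects.append(i)
    return suspects


# Toy data: class 0 = "bucket", class 1 = "baseballs".
labels = [0, 0, 0, 1, 1, 1]
pred_probs = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.9],  # labeled "bucket", but the model strongly prefers "baseballs"
    [0.2, 0.8],
    [0.1, 0.9],
    [0.3, 0.7],
]

suspects = find_likely_label_errors(labels, pred_probs)
print(suspects)
```

Here only the third example is flagged. In practice the probabilities would need to come from a model that did not see the example during training, and a person would still want to review the flagged samples before changing any labels.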

And be aware that if a dataset’s labels are particularly shoddy, training large complex neural networks may not always be so advantageous. Larger models tend to overfit to data more than smaller ones.

“Sometimes using smaller models will work for very noisy datasets. However, instead of always defaulting to using smaller models for very noisy datasets, I think the main takeaway is that machine learning engineers should clean and correct their test sets before they benchmark their models,” Northcutt concluded. ®

Source: https://go.theregister.com/feed/www.theregister.com/2021/04/01/mit_ai_accuracy/
