

AI/ML remains the most in-demand tech skill post COVID


By Sejuti Das

The year 2020 witnessed some of the decade's most massive transformations, with countries going into lockdown and millions of professionals losing their jobs overnight. After seven months of isolation, as things begin to return to normal, businesses have started looking at how to survive the post-COVID era. With tech professionals becoming a critical asset amid this crisis, companies have begun committing resources to recruiting the right tech talent to remain relevant. As a result, there has been a gradual rise in demand for professionals with advanced technical skills.

While tech skills have always been sought after, some are expected to gain more traction than others in the post-COVID world. In fact, various reports and studies conducted at the beginning of the year highlighted a rise in domains like artificial intelligence, deep learning, data analytics and machine learning. Despite the lockdown, these domains continued to develop at a steady pace.

According to a recent study by Analytics India Magazine, open jobs for analytics professionals account for the highest proportion at 33.7%, followed by machine learning at 20.4% and cybersecurity at 15.4%. This could be attributed to the immense amount of data generated in every transaction across B2B and B2C segments. The data also show that despite the recession, there is still a requirement for data analysis and automation to reduce redundancies and safeguard critical data.

AI/ML Can Help In Adapting To The Change 

To remain relevant in the era beyond COVID, businesses are critically looking for tech talent and skills that can help them adapt to the new change. And for that, artificial intelligence and machine learning have turned out to be the critical skills creating a buzz in the industry. Not only do these technologies have vast use cases across the industry, but they are also enabling significant innovations for the post-pandemic world.

With businesses expected to automate their operations and primary manual roles, the requirement for AI and ML experts has been on the rise. In fact, LinkedIn’s jobs report for this year noted that hiring for AI specialists has grown 74% annually over the past few years. Meanwhile, according to the job portal Indeed, a machine learning engineer earns an average base salary of $145,539 per year in the US and a median wage of ₹13.6 lakh in India.

The data show that in spite of the economic downturn, the requirement for AI and ML talent has remained steady, without any decrease in salaries. The report further noted that skills like TensorFlow, Python and natural language processing are the highest in demand and will therefore be critical for professionals looking to land AI/ML jobs.

Agreeing with this, Ammar Jagirdar, Product Head at Qure.ai, said that while the demand for skilled data scientists and engineers has always been strong, the post-COVID era has highlighted the importance of adaptability. “Our world has changed rapidly and may transform further, but we’ll be ready for these challenges,” said Jagirdar. “At Qure, we are looking for candidates with experience in deep learning R&D and product deployment, specifically for healthcare. We are also presently hiring for data science positions as well as front-end and back-end engineering roles.”

Lakshya Sivaramakrishnan, Program Lead at Google, likewise believes that machine learning and data science will be the tech skills that gain the most traction in the post-COVID era. “This is because more and more companies will be looking for more adaptability and flexibility post the pandemic and would adapt to the required change to stay relevant.” However, she also believes that businesses cannot keep building models without professionals who bring an engineering standpoint, and that’s where MEAN-stack developers come into play.

Indeed also shared a recent report on the top ten most in-demand tech jobs, which highlighted 30% growth in tech job listings and showed that developers, systems integration engineers and SAS programmers are the most coveted. It stated that the average annual salary of a software engineering manager has increased to $144,793, and that an SAP ABAP developer’s highest salary is $139,920. These numbers highlight the increased demand for these lucrative skills in the industry.

That said, no one can deny the importance of AI and ML in almost every industry, including healthcare, finance, retail, manufacturing and education. When asked, Abhinav Tushar, Head of AI at Vernacular.ai, stated that with machine learning gaining traction, there is more demand for work involving automated analysis of text, audio and video content. “Therefore, I think ML for analysis on speech/text has overall been exceptionally high in demand in recent times.”

The possibilities are thus endless, and machine learning skills can be applied to almost every requirement, from developing chatbots for better customer service to improving workplace communication and enhancing cybersecurity. Professionals delving into the fields of AI and ML are therefore set to be among the most in demand in the post-COVID world.

Conclusion

With such evidence in hand, it can be established that the post-COVID world will be far more advanced than today’s, with enterprises adopting emerging technologies like artificial intelligence and machine learning. Since AI and ML will be the norm rather than the exception, it is critical for companies and professionals alike to revisit their strategies and find ways to delve into these evolving fields to stay relevant post-COVID.


Source: https://www.fintechnews.org/ai-ml-remains-the-most-in-demand-tech-skill-post-covid/


Deep Learning vs Machine Learning: How an Emerging Field Influences Traditional Computer Programming


When two different concepts are greatly intertwined, it can be difficult to separate them as distinct academic topics. That might explain why it’s so difficult to separate deep learning from machine learning as a whole. Considering the current push for both automation as well as instant gratification, a great deal of renewed focus has been heaped on the topic.

Everything from automated manufacturing workflows to personalized digital medicine could potentially grow to rely on deep learning technology. Defining the exact aspects of this technical discipline that will revolutionize these industries is, however, admittedly much more difficult. Perhaps it’s best to consider deep learning in the context of a greater movement in computer science.

Defining Deep Learning as a Subset of Machine Learning

Machine learning and deep learning are essentially two sides of the same coin. Deep learning techniques are a specific discipline belonging to a much larger field that includes a wide variety of trained artificially intelligent agents that can predict the correct response in an equally wide array of situations. What makes deep learning independent of all these other techniques, however, is that it focuses almost exclusively on teaching agents to accomplish a specific goal by learning the best possible action in a number of virtual environments.

Traditional machine learning algorithms usually teach artificial nodes how to respond to stimuli by rote memorization. This is somewhat similar to human teaching techniques that consist of simple repetition, and therefore might be thought of as the computerized equivalent of a student running through times tables until they can recite them. While this is effective in a way, artificially intelligent agents educated in such a manner may not be able to respond to any stimulus outside the realm of their original design specifications.

That’s why deep learning specialists have developed alternative algorithms considered somewhat superior to this method, though they are admittedly far more hardware-intensive in many ways. Subroutines used by deep learning agents may be based around generative adversarial networks, convolutional neural node structures or a practical form of restricted Boltzmann machine. These stand in sharp contrast to the binary trees and linked lists used by conventional machine learning software as well as a majority of modern file systems.

Self-organizing maps have also been widely used in deep learning, though their applications in other AI research fields have typically been much less promising. When it comes to the deep learning vs machine learning debate, however, it’s highly likely that technicians will be looking more for practical applications than for theoretical academic discussion in the coming months. Suffice it to say that machine learning encompasses everything from the simplest AI to the most sophisticated predictive algorithms, while deep learning constitutes a more selective subset of these techniques.

Practical Applications of Deep Learning Technology

Depending on how a particular program is authored, deep learning techniques could be deployed using supervised or semi-supervised neural networks. Theoretically, it would also be possible to do so via a completely unsupervised node layout, and it’s this technique that has quickly become the most promising. Unsupervised networks may be useful for medical image analysis, since this application often presents unique pieces of graphical information to a computer program that have to be tested against known inputs.

Traditional binary tree or blockchain-based learning systems have struggled to identify the same patterns in dramatically different scenarios, because the information remains hidden in a structure that would have otherwise been designed to present data effectively. It’s essentially a natural form of steganography, and it has confounded computer algorithms in the healthcare industry. However, this new type of unsupervised learning node could virtually educate itself on how to match these patterns even in a data structure that isn’t organized along the normal lines that a computer would expect it to be.

Others have proposed implementing semi-supervised artificially intelligent marketing agents that could eliminate much of the concern over ethics regarding existing deal-closing software. Instead of trying to reach as large a customer base as possible, these tools would calculate the odds of any given individual needing a product at a given time. In order to do so, it would need certain types of information provided by the organization that it works on behalf of, but it would eventually be able to predict all further actions on its own.

While some companies currently rely on tools that utilize traditional machine learning technology to achieve the same goals, these are often fraught with privacy and ethical concerns. The advent of deep structured learning algorithms has enabled software engineers to come up with new systems that don’t suffer from these drawbacks.

Developing a Private Automated Learning Environment

Conventional machine learning programs often run into serious privacy concerns because of the fact that they need a huge amount of input in order to draw any usable conclusions. Deep learning image recognition software works by processing a smaller subset of inputs, thus ensuring that it doesn’t need as much information to do its job. This is of particular importance for those who are concerned about the possibility of consumer data leaks.

Considering new regulatory stances on many of these issues, this has also quickly become important from a compliance standpoint. As toxicology labs begin using bioactivity-focused deep structured learning packages, it’s likely that regulators will express additional concerns about the amount of information needed to perform any given task with this kind of sensitive data. Computer scientists have had to scale back what some have called a veritable fire hose of bytes that tell more of a story than most would be comfortable with.

In a way, these developments hearken back to an earlier time when it was believed that each process in a system should only have the amount of privileges necessary to complete its job. As machine learning engineers embrace this paradigm, it’s highly likely that future developments will be considerably more secure simply because they don’t require the massive amount of data mining necessary to power today’s existing operations.

Image Credit: toptal.io

Source: https://datafloq.com/read/deep-learning-vs-machine-learning-how-emerging-field-influences-traditional-computer-programming/13652



Extra Crunch roundup: Tonal EC-1, Deliveroo’s rocky IPO, is Substack really worth $650M?


For this morning’s column, Alex Wilhelm looked back on the last few months, “a busy season for technology exits” that followed a hot Q4 2020.

We’re seeing signs of an IPO market that may be cooling, but even so, “there are sufficient SPACs to take the entire recent Y Combinator class public,” he notes.

Once we factor in private equity firms with pockets full of money, it’s evident that late-stage companies have three solid choices for leveling up.

Seeking more insight into these liquidity options, Alex interviewed:

  • DigitalOcean CEO Yancey Spruill, whose company went public via IPO;
  • Latch CFO Garth Mitchell, who discussed his startup’s merger with real estate SPAC $TSIA;
  • Brian Cruver, founder and CEO of AlertMedia, which recently sold to a private equity firm.

After recapping their deals, each executive explains how their company determined which flashing red “EXIT” sign to follow. As Alex observed, “choosing which option is best from a buffet’s worth of possibilities is an interesting task.”

Thanks very much for reading Extra Crunch! Have a great weekend.

Walter Thompson
Senior Editor, TechCrunch
@yourprotagonist


Full Extra Crunch articles are only available to members
Use discount code ECFriday to save 20% off a one- or two-year subscription


The Tonal EC-1

Image Credits: Nigel Sussman

On Tuesday, we published a four-part series on Tonal, a home fitness startup that has raised $200 million since it launched in 2018. The company’s patented hardware combines digital weights, coaching and AI in a wall-mounted system that sells for $2,995.

By any measure, it is poised for success — sales increased 800% between December 2019 and 2020, and by the end of this year, the company will have 60 retail locations. On Wednesday, Tonal reported a $250 million Series E that valued the company at $1.6 billion.

Our deep dive examines Tonal’s origins, product development timeline, its go-to-market strategy and other aspects that combined to spark investor interest and customer delight.

We call this format the “EC-1,” since these stories are as comprehensive and illuminating as the S-1 forms startups must file with the SEC before going public.

Here’s how the Tonal EC-1 breaks down:

We have more EC-1s in the works about other late-stage startups that are doing big things well and making news in the process.

What to make of Deliveroo’s rough IPO debut

Why did Deliveroo struggle when it began to trade? Is it suffering from cultural dissonance between its high-growth model and more conservative European investors?

Let’s peek at the numbers and find out.

Kaltura puts debut on hold. Is the tech IPO window closing?

The Exchange doubts many folks expected the IPO climate to get so chilly without warning. But we could be in for a Q2 pause in the formerly scorching climate for tech debuts.

Is Substack really worth $650M?

A $65 million Series B is remarkable, even by 2021 standards. But the fact that a16z is pouring more capital into the alt-media space is not a surprise.

Substack is a place where publications have bled some well-known talent, shifting the center of gravity in media. Let’s take a look at Substack’s historical growth.

RPA market surges as investors, vendors capitalize on pandemic-driven tech shift


Image Credits: Visual Generation / Getty Images

Robotic process automation came to the fore during the pandemic as companies took steps to digitally transform. When employees couldn’t be in the same office together, it became crucial to cobble together more automated workflows that required fewer people in the loop.

RPA has enabled executives to provide a level of automation that essentially buys them time to update systems to more modern approaches while reducing the large number of mundane manual tasks that are part of every industry’s workflow.

E-commerce roll-ups are the next wave of disruption in consumer packaged goods


Image Credits: Javier Zayas Photography (opens in a new window) / Getty Images

This year is all about the roll-ups, the aggregation of smaller companies into larger firms, creating a potentially compelling path for equity value. The interest in creating value through e-commerce brands is particularly striking.

Just a year ago, digitally native brands had fallen out of favor with venture capitalists after so many failed to create venture-scale returns. So what’s the roll-up hype about?

Hack takes: A CISO and a hacker detail how they’d respond to the Exchange breach


Image Credits: TarikVision (opens in a new window) / Getty Images

The cyber world has entered a new era in which attacks are becoming more frequent and happening on a larger scale than ever before. Massive hacks affecting thousands of high-level American companies and agencies have dominated the news recently. Chief among these are the December SolarWinds/FireEye breach and the more recent Microsoft Exchange server breach.

Everyone wants to know: If you’ve been hit with the Exchange breach, what should you do?

5 machine learning essentials nontechnical leaders need to understand


Image Credits: David Malan (opens in a new window) / Getty Images

Machine learning has become the foundation of business and growth acceleration because of the incredible pace of change and development in this space.

But for engineering and team leaders without an ML background, this can also feel overwhelming and intimidating.

Here are best practices and must-know components broken down into five practical and easily applicable lessons.

Embedded procurement will make every company its own marketplace


Image Credits: Busakorn Pongparnit / Getty Images

Embedded procurement is the natural evolution of embedded fintech.

In this next wave, businesses will buy things they need through vertical B2B apps, rather than through sales reps, distributors or an individual merchant’s website.

Knowing when your startup should go all-in on business development


Image Credits: twomeows / Getty Images

There’s a persistent fallacy swirling around that any startup growing pain or scaling problem can be solved with business development.

That’s frankly not true.

Dear Sophie: What should I know about prenups and getting a green card through marriage?


Image Credits: Bryce Durbin/TechCrunch

Dear Sophie:

I’m a founder of a startup on an E-2 investor visa and just got engaged! My soon-to-be spouse will sponsor me for a green card.

Are there any minimum salary requirements for her to sponsor me? Is there anything I should keep in mind before starting the green card process?

— Betrothed in Belmont

Startups must curb bureaucracy to ensure agile data governance


Image Credits: RichVintage / Getty Images

Many organizations perceive data management as being akin to data governance, where responsibilities are centered around establishing controls and audit procedures, and things are viewed from a defensive lens.

That defensiveness is admittedly justified, particularly given the potential financial and reputational damages caused by data mismanagement and leakage.

Nonetheless, there’s an element of myopia here, and being excessively cautious can prevent organizations from realizing the benefits of data-driven collaboration, particularly when it comes to software and product development.

Bring CISOs into the C-suite to bake cybersecurity into company culture


Image Credits: Jetta Productions Inc (opens in a new window) / Getty Images

Cyber strategy and company strategy are inextricably linked. Consequently, chief information security officers in the C-Suite will be just as common and influential as CFOs in maximizing shareholder value.

How is edtech spending its extra capital?


Image Credits: Tetra Images (opens in a new window) / Getty Images

Edtech unicorns have boatloads of cash to spend following the capital boost to the sector in 2020. As a result, edtech M&A activity has continued to swell.

The idea of a well-capitalized startup buying competitors to complement its core business is nothing new, but exits in this sector are notable because the money used to buy startups can be seen as an effect of the pandemic’s impact on remote education.

But in the past week, the consolidation environment made a clear statement: Pandemic-proven startups are scooping up talent — and fast.

Tech in Mexico: A confluence of Latin America, the US and Asia


Image Credits: Orbon Alija (opens in a new window)/ Getty Images

Knowledge transfer is not the only trend flowing in the U.S.-Asia-LatAm nexus. Competition is afoot as well.

Because of similar market conditions, Asian tech giants are directly expanding into Mexico and other LatAm countries.

How we improved net retention by 30+ points in 2 quarters


Image Credits: Steven Puetzer (opens in a new window) / Getty Images

There’s certainly no shortage of SaaS performance metrics leaders focus on, but NRR (net revenue retention) is without question the most underrated metric out there.

NRR is simply total revenue minus any revenue churn plus any revenue expansion from upgrades, cross-sells or upsells. The greater the NRR, the quicker companies can scale.
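NRR is usually expressed against starting recurring revenue, so the definition above can be sketched as follows (the function name and figures are illustrative, not from the article):

```python
def net_revenue_retention(starting_revenue, expansion, churn):
    """NRR per the definition above: revenue kept from the existing
    customer base, minus any revenue churn, plus any revenue expansion
    from upgrades, cross-sells or upsells, as a share of starting revenue."""
    return (starting_revenue - churn + expansion) / starting_revenue

# Hypothetical quarter: $100k starting ARR, $20k expansion, $5k churned.
nrr = net_revenue_retention(100_000, 20_000, 5_000)
print(f"{nrr:.0%}")  # 115% -- above 100%, so the existing base is compounding
```

An NRR above 100% means the company grows even before adding a single new customer, which is why the metric is so closely watched.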

5 mistakes creators make building new games on Roblox


Image Credits: SOPA Images (opens in a new window) / Getty Images

Even the most experienced and talented game designers from the mobile F2P business usually fail to understand what features matter to Robloxians.

For those just starting their journey in Roblox game development, these are the most common mistakes gaming professionals make on Roblox.

CEO Manish Chandra, investor Navin Chaddha explain why Poshmark’s Series A deck sings


“Lead with love, and the money comes.” It’s one of the cornerstone values at Poshmark. On the latest episode of Extra Crunch Live, Chandra and Chaddha sat down with us and walked us through their original Series A pitch deck.

Will the pandemic spur a smart rebirth for cities?


Image Credits: hopsalka (opens in a new window) / Getty Images

Cities are bustling hubs where people live, work and play. When the pandemic hit, some people fled major metropolitan markets for smaller towns — raising questions about the future validity of cities.

But those who predicted that COVID-19 would destroy major urban communities might want to stop shorting the resilience of these municipalities and start going long on what the post-pandemic future looks like.

The NFT craze will be a boon for lawyers


Image Credits: Gearstd (opens in a new window) / Getty Images

There’s plenty of uncertainty surrounding copyright issues, fraud and adult content, and legal implications are the crux of the NFT trend.

Whether a court would protect the receipt-holder’s ownership over a given file depends on a variety of factors. All of these concerns mean artists may need to lawyer up.

Viewing Cazoo’s proposed SPAC debut through Carvana’s windshield

It’s a reasonable question: Why would anyone pay that much for Cazoo today if Carvana is more profitable and whatnot? Well, growth. That’s the argument anyway.

Source: https://techcrunch.com/2021/04/02/extra-crunch-roundup-tonal-ec-1-deliveroos-rocky-ipo-is-substack-really-worth-650m/



The AI Trends Reshaping Health Care


By Ben Lorica.

Applications of AI in health care present a number of challenges and considerations that differ substantially from other industries. Despite this, it has also been one of the leaders in putting AI to work, taking advantage of the cutting-edge technology to improve care. The numbers speak for themselves: The global AI in health care market size is expected to grow from $4.9 billion in 2020 to $45.2 billion by 2026. Some major factors driving this growth are the sheer volume of health care data and growing complexities of datasets, the need to reduce mounting health care costs, and evolving patient needs.

Deep learning, for example, has made considerable inroads into the clinical environment over the last few years. Computer vision, in particular, has proven its value in medical imaging to assist in screening and diagnosis. Natural language processing (NLP) has provided significant value in addressing both contractual and regulatory concerns with text mining and data sharing. Increasing adoption of AI technology by pharmaceutical and biotechnology companies to expedite initiatives like vaccine and drug development, as seen in the wake of COVID-19, only exemplifies AI’s massive potential.

We’re already seeing amazing strides in health care AI, but it’s still the early days, and to truly unlock its value, there’s a lot of work to be done in understanding the challenges, tools, and intended users shaping the industry. New research from John Snow Labs and Gradient Flow, 2021 AI in Healthcare Survey Report, sheds light on just this: where we are, where we’re going, and how to get there. The global survey explores the important considerations for health care organizations in varying stages of AI adoption, geographies, and technical prowess to provide an extensive look into the state of AI in health care today.               

One of the most significant findings is around which technologies are top of mind when it comes to AI implementation. When asked what technologies they plan to have in place by the end of 2021, almost half of respondents cited data integration. About one-third cited natural language processing (NLP) and business intelligence (BI) among the technologies they are currently using or plan to use by the end of the year. Half of those considered technical leaders are using – or soon will be using – technologies for data integration, NLP, business intelligence, and data warehousing. This makes sense, considering these tools have the power to help make sense of huge amounts of data, while also keeping regulatory and responsible AI practices in mind.

When asked about intended users for AI tools and technologies, over half of respondents identified clinicians among their target users. This indicates that AI is being used by people tasked with delivering health care services – not just technologists and data scientists, as in years past. That number climbs even higher when evaluating mature organizations, or those that have had AI models in production for more than two years. Interestingly, nearly 60% of respondents from mature organizations indicated that patients are also users of their AI technologies. With the advent of chatbots and telehealth, it will be interesting to see how AI proliferates for both patients and providers over the next few years.

In considering software for building AI solutions, open-source software (53%) had a slight edge over public cloud providers (42%). Looking ahead one to two years, respondents indicated openness to also using both commercial software and commercial SaaS. Open-source software gives users a level of autonomy over their data that cloud providers can’t, so it’s not a big surprise that a highly regulated industry like health care would be wary of data sharing. Similarly, the majority of companies with experience deploying AI models to production choose to validate models using their own data and monitoring tools, rather than evaluation from third parties or software vendors. While earlier-stage companies are more receptive to exploring third-party partners, more mature organizations tend to take a more conservative approach.

Generally, attitudes remained the same when asked about key criteria used to evaluate AI solutions, software libraries or SaaS solutions, and consulting companies to work with. Although the answers varied slightly for each category, technical leaders considered no data sharing with software vendors or consulting companies, the ability to train their own models, and state-of-the-art accuracy as top priorities. Health care-specific models and expertise in health care data engineering, integration, and compliance topped the list when asked about solutions and potential partners. Privacy, accuracy, and health care experience are the forces driving AI adoption.

It’s clear that AI is poised for even more growth, as data continues to grow and technology and security measures improve. Health care, which can sometimes be seen as a laggard in adoption, is taking to AI and already seeing its significant impact. While its approach, the top tools and technologies, and applications of AI may differ from other industries, it will be exciting to see what’s in store for next year’s survey results.

Source: https://www.dataversity.net/the-ai-trends-reshaping-health-care/



Turns out humans are leading AI systems astray because we can’t agree on labeling


Top datasets used to train AI models and benchmark how the technology has progressed over time are riddled with labeling errors, a study shows.

Data is a vital resource in teaching machines how to complete specific tasks, whether that’s identifying different species of plants or automatically generating captions. Most neural networks are spoon-fed lots and lots of annotated samples before they can learn common patterns in data.

But these labels aren’t always correct; training machines using error-prone datasets can decrease their performance or accuracy. In the aforementioned study, led by MIT, analysts combed through ten popular datasets that have been cited more than 100,000 times in academic papers and found that on average 3.4 per cent of the samples are wrongly labelled.
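Measuring that sort of error rate is conceptually simple: relabel a sample, then count disagreements between the given and corrected labels. Here's a minimal sketch of that bookkeeping (the datasets and numbers below are made up for illustration, not the study's actual figures):

```python
# Estimate label-error rates by comparing given labels against
# corrected labels for a reviewed sample of each dataset.
# All names and numbers here are illustrative, not the MIT study's data.

def error_rate(given, corrected):
    """Fraction of samples whose given label differs from the corrected one."""
    assert len(given) == len(corrected)
    wrong = sum(g != c for g, c in zip(given, corrected))
    return wrong / len(given)

# Toy reviewed samples: (given labels, corrected labels) per dataset.
audit = {
    "imagenet-sample":  (["cat", "dog", "dog", "fox"], ["cat", "dog", "wolf", "fox"]),
    "quickdraw-sample": (["bulb", "croc", "bulb"],     ["bulb", "bulb", "bulb"]),
}

rates = {name: error_rate(g, c) for name, (g, c) in audit.items()}
average = sum(rates.values()) / len(rates)  # the "average across datasets" figure
```

The study's headline 3.4 per cent is exactly this kind of average, computed over the ten datasets the analysts reviewed.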

The datasets they looked at range from photographs in ImageNet and sounds in AudioSet to reviews scraped from Amazon and sketches in QuickDraw. Examples of some of the mistakes compiled by the researchers show that in some cases it's a clear blunder, such as a drawing of a light bulb tagged as a crocodile; in others, it's not so obvious. Should a picture of a bucket of baseballs be labeled as 'baseballs' or 'bucket'?


Annotating each sample is laborious work. It is often outsourced to services like Amazon Mechanical Turk, where workers are paid the square root of sod all to sift through the data piece by piece, labeling images and audio to feed into AI systems. This process amplifies biases and errors, as Vice has documented.

Workers are pressured to agree with the status quo if they want to get paid: if a lot of them label a bucket of baseballs as a 'bucket' and you decide it's 'baseballs', you may not be paid at all if the platform figures you're wrong or deliberately trying to mess up the labeling. So workers choose the most popular label rather than risk looking like they've made a mistake; it's in their interest to stick to the narrative and avoid sticking out like a sore thumb. That means errors, or worse, racial biases and suchlike, snowball in these datasets.

The error rates vary across the datasets. In ImageNet, the most popular dataset used to train models for object recognition, the rate creeps up to six per cent. Considering it contains about 15 million photos, that means hundreds of thousands of labels are wrong. Some classes of images are more affected than others, for example, ‘chameleon’ is often mistaken for ‘green lizard’ and vice versa.

There are other knock-on effects: neural nets may learn to incorrectly associate features within data with certain labels. If, say, many images of the sea seem to contain boats and they keep getting tagged as ‘sea’, a machine might get confused and be more likely to incorrectly recognize boats as seas.
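To see how that kind of spurious association takes hold, here is a deliberately simple sketch: a toy count-based classifier (not a neural net) trained on examples where boats almost always co-occur with the label 'sea', including one mislabeled boat-on-land example. The feature sets and labels are invented for illustration.

```python
from collections import Counter, defaultdict

# Toy training set: each example is a set of visible objects plus the
# (sometimes wrong) label a human gave it. Boats mostly appear at sea,
# and annotators keep tagging those images 'sea'.
train = [
    ({"boat", "water"},   "sea"),
    ({"boat", "water"},   "sea"),
    ({"boat", "water"},   "sea"),
    ({"water"},           "sea"),
    ({"boat", "trailer"}, "sea"),   # label error: a boat on land tagged 'sea'
    ({"trailer", "road"}, "road"),
]

# "Training": count how often each feature co-occurs with each label.
counts = defaultdict(Counter)
for features, label in train:
    for f in features:
        counts[f][label] += 1

def predict(features):
    """Vote with per-feature label counts: a crude stand-in for a learned model."""
    votes = Counter()
    for f in features:
        votes.update(counts[f])
    return votes.most_common(1)[0][0]

# A boat on a trailer, nowhere near water, still comes out as 'sea',
# because 'boat' has only ever been seen alongside the label 'sea'.
prediction = predict({"boat", "trailer"})  # -> "sea"
```

The same dynamic plays out, far less visibly, inside a neural network: the feature and the wrong label become statistically welded together.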

Problems don’t just arise when trying to compare the performance of models using these noisy datasets. The risks are higher if these systems are deployed in the real world, Curtis Northcutt, co-lead author of the study, a PhD student at MIT, and cofounder and CTO of machine-learning hardware startup ChipBrain, explained to The Register.

“Imagine a self-driving car that uses an AI model to make steering decisions at intersections,” he said. “What would happen if a self-driving car is trained on a dataset with frequent label errors that mislabel a three-way intersection as a four-way intersection? The answer: it might learn to drive off the road when it encounters three-way intersections.

“Maybe one of your AI self-driving models is actually more robust to training noise, so that it doesn’t drive off the road as much. You’ll never know this if your test set is too noisy because your test set labels won’t match reality. This means you can’t properly gauge which of your auto-pilot AI models drives best – at least not until you deploy the car out in the real-world, where it might drive off the road.”

When the team working on the study trained some convolutional neural networks on portions of ImageNet that have been cleared of errors, their performance improved. The boffins believe that developers should think twice about training large models on datasets that have high error rates, and advise them to sort through the samples first. Cleanlab, the software the team developed and used to identify incorrect and inconsistent labels, can be found on GitHub.

“Cleanlab is an open-source python package for machine learning with noisy labels,” said Northcutt. “Cleanlab works by implementing all of the theory and algorithms in the sub-field of machine learning called confident learning, invented at MIT. I built cleanlab to allow other researchers to use confident learning – usually with just a few lines of code – but more importantly, to advance the progress of science in machine learning with noisy labels and to provide a framework for new researchers to get started easily.”
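The core trick behind confident learning can be sketched in a few lines of NumPy. This is my own stripped-down illustration of the idea, not cleanlab's actual implementation: compute a per-class confidence threshold from out-of-sample predicted probabilities, then flag examples where some other class clears its threshold while the given label looks implausible.

```python
import numpy as np

def find_suspect_labels(labels, pred_probs):
    """Flag likely label errors, loosely following the confident-learning idea.

    labels:     (n,) array of given integer labels
    pred_probs: (n, k) out-of-sample predicted class probabilities
    Returns indices of examples whose most confident class disagrees
    with the given label.
    """
    n, k = pred_probs.shape
    # Per-class threshold: mean predicted probability of class j,
    # averaged over the examples actually labeled j.
    thresholds = np.array([
        pred_probs[labels == j, j].mean() if np.any(labels == j) else 1.0
        for j in range(k)
    ])
    suspects = []
    for i in range(n):
        above = pred_probs[i] >= thresholds   # classes this example confidently fits
        if above.any():
            best = int(np.argmax(np.where(above, pred_probs[i], -1.0)))
            if best != labels[i]:
                suspects.append(i)
    return suspects

# Five examples over two classes; example 2 is labeled 0 but the model
# is confident it belongs to class 1, so it gets flagged.
labels = np.array([0, 0, 0, 1, 1])
pred_probs = np.array([
    [0.9, 0.1],
    [0.8, 0.2],
    [0.2, 0.8],   # labeled 0, confidently class 1 -> suspect
    [0.1, 0.9],
    [0.3, 0.7],
])
suspect = find_suspect_labels(labels, pred_probs)  # -> [2]
```

The real cleanlab package does considerably more than this (joint noise-matrix estimation, ranking, pruning); the GitHub repository mentioned above is the place to look for the actual API.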

And be aware that if a dataset’s labels are particularly shoddy, training large complex neural networks may not always be so advantageous. Larger models tend to overfit to data more than smaller ones.

“Sometimes using smaller models will work for very noisy datasets. However, instead of always defaulting to using smaller models for very noisy datasets, I think the main takeaway is that machine learning engineers should clean and correct their test sets before they benchmark their models,” Northcutt concluded. ®

Source: https://go.theregister.com/feed/www.theregister.com/2021/04/01/mit_ai_accuracy/
