Big Data industry predictions for 2021

By Daniel Gutierrez

2020 has been a year for the ages, with so many domestic and global challenges. But the big data industry has significant inertia moving into 2021. In order to give our valued readers a pulse on important new trends leading into next year, we here at insideBIGDATA heard from all our friends across the vendor ecosystem to get their insights, reflections and predictions for what may be coming. We were very encouraged to hear such exciting perspectives. Even if only half actually come true, Big Data in the next year is destined to be quite an exciting ride. Enjoy!

Daniel D. Gutierrez – Editor-in-Chief & Resident Data Scientist

Analytics

The “analytic divide” is going to get worse. Like the much-publicized “digital divide,” we’re also seeing the emergence of an “analytic divide.” Many companies were driven to invest in analytics due to the pandemic, while others have been forced to cut anything they didn’t view as critical to keep the lights on – and for these organizations, a proper investment in analytics was on the chopping block. This means that the analytic divide will further widen in 2021, and this trend will continue for many years to come. Without a doubt, winners and losers in every industry will continue to be defined by those that are leveraging analytics and those that are not. – Alan Jacobson, Chief Data and Analytics Officer at Alteryx

Likely gone are the days of piecemeal analytics and reporting solutions that fulfill only niche business use cases. This is unsustainable. Companies cannot have highly departmentalized analytics implementations that have the effect of localized problem solving and the larger business not seeing the full benefit. This current situation will change into one where analytics will be done on all data that the company has access to, with these analytics implemented in a collaborative manner by a variety of interest groups with different skill sets (e.g., data science, lines of business leaders) and with a full-on focus towards operationalizing analytics insights in near real time. In other words, no more piecemeal and no more just science experimentation. – Sri Raghavan, Director, Data Science and Advanced Analytics Product Marketing at Teradata

Prescriptive analytics will be a key component for digital transformation success: Advanced analytics are becoming mainstreamed as businesses increasingly collect and analyze data across their organizations, with 35% of U.S. manufacturers deploying advanced analytics in the past three years. For AI to have a significant impact across the value chain, prescriptive analytics will be the catalyst to optimize performance. Prescriptive analytics will become an essential piece for scaling AI within organizations, by leveraging product and customer data to advise AI models on how to improve processes, adjust production and increase efficiency. Prescriptive analytics enables constant improvement with an AI model by continuously monitoring and adjusting based on evolving conditions. Prescriptive models can then enable decision automation, where the models can take the best course of action based on prescriptions. Going beyond predictive analytics to prescriptive analytics will ultimately enable digital transformation success for manufacturers in 2021. – George Young, Global Managing Director of Kalypso

Augmented analytics and self-service will become more widely in demand given the distributed workforce and hunger for information. In response, traditional analytics will become increasingly disrupted by AI. The increase in a distributed workforce is going to create a greater demand for augmented analytics where the individual user is guided through the process of creating queries to get immediate answers to their data questions. We are seeing a convergence of analytics and AI in two areas – at the infrastructure level and at the analyst level.

People are beginning to realize that they have different data pipelines that are providing data for an analytics engine and they are building a different stack for ML. Instead of two completely separate stacks, we see a convergence of these into an infrastructure that is easier to maintain while ensuring that the same data is being used to supply both engines. A second convergence will happen regarding a ‘hunger’ for information and bridging a gap to answer questions using data. Traditional analytics will start to get more disrupted by AI. Platforms (such as Tableau, Power BI, etc.) will start to get displaced by bots and virtual assistants that will be conversational in nature. We see this as a push to speed through a pull for self-service. We also anticipate NLP becoming more widely used in 2021. – Scott Schlesinger, Global Data, Analytics & AI Practice Leader at Ness

The lines between IT and other departments when it comes to data and analytics in particular will continue to blur. Data and analytics have the potential to drive extremely positive and meaningful business outcomes, and when it happens, there is often also powerful collaboration across different functional areas as each one has a level of accountability for the success of the analytics approach. Areas like data governance, data literacy, open data platforms, integration and utilization of data in different parts of the enterprise will enable business users to perform tasks traditionally reserved for IT teams and the data that business units generate will feed into platforms that IT manages. This — coupled with a shortage of data scientists and analytics professionals — also means that data platforms will become more seamless and easy to deploy so that all parts of an organization will be able to leverage it. – Frances Zelazny, CMO of Signals Analytics

In the 2000s, putting Microsoft Office on your resume could make you a good candidate for a job, but a decade later it was a skill that was taken for granted. Nowadays, SQL proficiency can make you stand out, but what will happen in the years ahead?

As data literacy rises, analytics skills will become the norm for all business professionals and start to disappear from candidates’ resumes. Just as you’re unlikely to see ‘Office proficiency’ today, you’re unlikely to see ‘data proficiency’ by the end of the decade. We’ve entered a third wave of analytics, and with it the expectation that business users can interact with data without the help of an expert. Very soon, if you’re unable to marry hard data with business context to define and execute a strategy, you’re going to struggle in the workplace. The ideal candidate for businesses in 2021 and beyond will be a person who can both understand and speak data — because in a few short years, data literacy will be something employers demand and expect. Those who want to get ahead are acquiring these talents now. – ThoughtSpot CEO Sudheesh Nair

As companies shift their data infrastructure to a federated (one engine queries different sources), disaggregated (compute is separate from storage is separate from the data lake) stack, we’ll see traditional data warehousing and tightly coupled database architectures relegated to legacy workloads. But one thing will remain the same when it comes to this shift – SQL will continue to be the lingua franca for analytics. Data analysts, data engineers, data scientists and product managers along with their database admins will use SQL for analytics. – Dave Simmen, Co-founder and Chief Technology Officer (CTO), Ahana
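The idea of one SQL engine querying separate sources can be illustrated with a toy sketch. It assumes SQLite's `ATTACH DATABASE` as a stand-in for a real federated engine, and the `orders` and `customers` tables and their contents are hypothetical; it is not how any particular vendor's product works.

```python
import sqlite3

# One connection plays the role of the single query engine.
conn = sqlite3.connect(":memory:")

# Attach a second, independent in-memory database as a separate "source".
conn.execute("ATTACH DATABASE ':memory:' AS crm")

# Hypothetical tables living in two different databases.
conn.execute("CREATE TABLE main.orders (customer_id INTEGER, amount REAL)")
conn.execute("CREATE TABLE crm.customers (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO main.orders VALUES (?, ?)",
    [(1, 20.0), (1, 5.0), (2, 12.5)],
)
conn.executemany(
    "INSERT INTO crm.customers VALUES (?, ?)",
    [(1, "Ada"), (2, "Grace")],
)

# A single SQL statement joins data across both sources.
rows = conn.execute(
    """
    SELECT c.name, SUM(o.amount) AS total
    FROM main.orders AS o
    JOIN crm.customers AS c ON c.id = o.customer_id
    GROUP BY c.name
    ORDER BY c.name
    """
).fetchall()

print(rows)
```

The analyst writes ordinary SQL and the engine resolves where each table lives, which is the property the quote expects to survive the architectural shift.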

Organizations everywhere are escalating their use of analytics systems but are challenged by the need for event-data platforms that can perform real-time data wrangling. In 2021 organizations will demand intelligent data platforms that can consume static and streaming data from a variety of sources in any format, size or velocity; wrangle the data (enrich and map) on-the-fly; and deliver the data to systems, devices and applications securely and in real time. – Sean Bowen, CEO of Push Technology

One single SQL query for all data workloads. The way forward is based not only on automation, but also on how quickly and widely you can make your analytics accessible and shareable. Analytics gives you a clear direction of what your next steps should be to keep customers and employees happy, and even save lives. Managing your data is no longer a luxury, but a necessity–and determines how successful you or your company will be. If you can remove complexity or cost of managing data, you’ll be very effective. Ultimately, the winner of the space will take the complexity and cost out of data management, and workloads will be unified so you can write one single SQL query to manage and access all workloads across multiple data residencies. – Raj Verma, CEO of SingleStore

AI and analytics capabilities were provided by different platforms and teams in the past. Over the years, we have seen the platforms converge: the AI team is more focused on the algorithmic side, while the AI and analytics platform teams have merged to provide the software infrastructure for both analytics and AI use cases. – Haoyuan Li, Founder and CEO, Alluxio

As data professionals, we have a responsibility to the broader public. I think that within the next year we will see progress toward a code of ethics within the data analytics space, led by conscious companies who recognize the seriousness of potential abuses. Perhaps the US government will intervene and pass some version of its own GDPR, but I believe that technology companies will lead this charge. What Facebook has done with engagement data is not illegal, but we’ve seen that it can have deleterious effects on child development and on our personal habits. In the coming years, we will look back on the way companies used personal data in the 2010s and cringe in the way we do when we see people smoking on a plane in films from the 1960s. – Jeremy Levy, CEO of Indicative

Emotion is a key factor affecting customer behavior and has a strong influence on brand loyalty. Therefore, it is increasingly useful for companies to find a way to measure emotions of customers during their decision-making processes. Emotional analytics focuses on studying and recognizing the full gamut of human emotions that includes mood, attitude and personality. It employs predictive models and AI/ML to analyze human movements, word choices, voice tones, and facial expressions. Emotional analytics can help companies build a more holistic customer profile, understand how to influence emotions and develop customized product and services tailored to individuals. Sentiment analysis about products and services, across geographies, social networks, and review web sites enables companies to better understand and improve their customer satisfaction level. Using emotional analytics, companies can better understand how their marketing and services influence emotion in order to provide more positively engaging customer experiences. – Paul Moxon, SVP, Data Architecture at Denodo

Getting product analytics right is hard. Every interaction results in mounds of data, and digging through that to find that ‘needle in the haystack’ insight requires a lot of effort, discipline, and time to make it work. These barriers to entry mean data analysis is often limited to companies who have the resources, bandwidth, and the knowledge to do it right. But it’s also a discipline that’s growing in importance — even before the pandemic, consumer interactions with brands were generally happening on digital platforms, and now they are there almost exclusively. There are countless amounts of information out there that can explain the ROI of each interaction, and without a doubt, some of that is potentially game changing. But, frankly, we’re human, and if we have to work hard to get value out of something, we’re going to be less likely to do it consistently. That’s why in 2021, analytics will move from being a reactive game — gathering data that analysts then have to sift through to find those insights — to a proactive one, connecting teams directly to those “a-ha!” moments that inspire immediate and informed action. – Matin Movassate, CEO and Founder at Heap

Artificial Intelligence

As businesses look toward goals to reopen and recoup sufficient revenue streams, they’ll need to leverage smart technologies to gather key insights in real-time that allow them to do so. Adopting artificial intelligence (AI) technologies can help guide companies to understand if their strategies to keep customers and employees safe are working, while continuing to foster growth. As companies recognize the unique abilities of AI to help ease corporate policy management and compliance, ensure safety and evolve customer experience, we’ll see boosted rates of AI adoption across industries. – Hillary Ashton, EVP and Chief Product Officer at Teradata

In 2021 we will see AI, machine learning and IoT define and shape our lives and behaviors, a phenomenon that will continue for many years to come. These advancements impact how we work, how we buy, how we spend, how we do every little thing in our lives. But I think the real star that companies will turn to will be the enabling technologies such as cloud and edge computing, which will continue to dominate due to their ability to process and manage all the necessary data that fuels AI, ML, and IoT, as well as enabling technologies like iPaaS, APIM and RPA. These technologies will continue to lead the digital transformation charge for businesses as they move from manual or paper-driven business to digital businesses that can finally tap the power of AI and IoT. – Manoj Choudhary, CTO at Jitterbit

Artificial Intelligence becomes less artificial in 2021: Even with a vaccine for COVID-19 on the horizon, how people work and interact has fundamentally shifted. In the new year, remote work will continue, social distancing requirements will remain, and supply chains will continue to face disruption. This new way of life demands a new way for companies to continue operations effectively across the value chain – from the product to the plant to the end user. The use of artificial intelligence (AI) will be the standard for addressing these challenges. However, without considering how humans will interact with and leverage these new autonomous systems, AI will fail.

In 2021, enterprises will take a human-centered approach to AI initiatives, understanding user needs and values, then adapting AI designs and models accordingly, which will, in turn, improve adoption. Enterprises must put the same focus on people and culture as the technology itself for AI to be successful. Organizational change management (OCM) teams will be critical for driving digital transformation and AI forward by bringing people along for the change journey and setting the organization up for measurable results. Proper change management is the most important – yet overlooked – aspect of any digital transformation initiative. – George Young, Global Managing Director at Kalypso

In 2021, enterprises will move away from quick wins by relying on AI systems, to focus on lasting and meaningful business value. This change will drive deeper data literacy initiatives across organizations. It will require people to learn new skills and behave in new ways. – Sundeep Reddy Mallu, Head of Analytics at Gramener 

Most consumers will continue to be skeptical of AI. With several big consumer brands in the hot seat around questionable AI ethics, most people still don’t trust AI. For many, it’s because they don’t understand it or even realize they’re using it daily. Consumers are getting so many AI-powered services for free – Facebook, Google, TikTok, etc. – that they don’t understand what they’re personally giving up in return, namely their personal data. As long as the general public continues to be naïve, they won’t be able to anticipate the dangers AI can introduce or how to protect themselves, unless the market better educates customers or implements regulations to protect them. Despite this, there’s some evidence that we’re turning the corner on AI’s trustworthiness. Eighty-one percent of business leader respondents to Pega’s upcoming survey said they’re optimistic that AI bias will be sufficiently mitigated in five years. Businesses had better hope this turns out to be true – because as more of the public wakes up to how AI impacts their lives, and in some cases plays favorites, they will continue to ask harder questions that further erode trust in AI, forcing businesses to answer to them. – Vince Jeffs, Senior Director – Product Strategy, Marketing AI and Decisioning, Pega

AI powered digital workers will help businesses stay strategic in the long-term. Few disagree with the notion that AI and automation are essential to companies’ survival going forward. However, research has indicated that most companies have not fully realized the benefit of their AI and automation investments. By linking powerful AI capabilities to business processes through the digital workforce, we’ll increasingly see organizations implement AI driven automation at scale. AI infused automation will increasingly be linked to core strategic initiatives such as improved customer focus, revenue growth, capital allocation, supply chain management, risk management, cost and operational efficiency and more. AI powered digital workers will be leveraged as primary tools for executing on corporate strategy and managing enterprise scale risks. Rapid and effective adoption of automation will increasingly be seen as an essential component to remaining competitive in markets. – Eric Tyree, Head of AI and Research at Blue Prism

AI experimentation will become more strategic. Experimentation takes place throughout the entire model development process – usually every important decision or assumption comes with at least some experiment or previous research to justify those decisions. Experimentation can take many shapes, from building full-fledged predictive ML models to doing statistical tests or charting data. Trying all combinations of every possible hyperparameter, feature handling, etc., quickly becomes untraceable. Therefore, we’ll begin to see organizations define a time and/or computation budget for experiments as well as an acceptability threshold for usefulness of the model. – Florian Douetteau, CEO and co-founder of Dataiku
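The budget-and-threshold idea above can be sketched as a search loop that stops when time runs out, when the trial count is exhausted, or when a model is good enough. This is a minimal illustration only: `run_experiments`, `sample_config`, `evaluate`, and the toy scoring function are hypothetical names and stand-ins, not any vendor's method.

```python
import random
import time


def run_experiments(evaluate, sample_config, time_budget_s, max_trials, accept_score):
    """Random search bounded by a time budget, a trial budget, and an acceptability threshold."""
    deadline = time.monotonic() + time_budget_s
    best_config, best_score = None, float("-inf")
    trials = 0
    while trials < max_trials and time.monotonic() < deadline:
        config = sample_config()
        score = evaluate(config)
        trials += 1
        if score > best_score:
            best_config, best_score = config, score
        if best_score >= accept_score:
            break  # acceptable model found: stop spending budget
    return best_config, best_score, trials


# Hypothetical stand-ins for a real training and validation run.
_rng = random.Random(0)  # fixed seed so runs are repeatable


def sample_config():
    return {"learning_rate": _rng.uniform(0.001, 0.1), "depth": _rng.randint(2, 10)}


def evaluate(config):
    # Toy score that peaks at learning_rate=0.05 and depth=6; not a real model.
    return 1.0 - abs(config["learning_rate"] - 0.05) * 5 - abs(config["depth"] - 6) * 0.05


best_config, best_score, trials = run_experiments(
    evaluate, sample_config, time_budget_s=2.0, max_trials=200, accept_score=0.9
)
print(trials, round(best_score, 3), best_config)
```

Stating the three limits up front makes the cost of an experiment explicit before it starts, rather than letting a hyperparameter sweep run until someone notices.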

In 2021, we will finally see AI go mainstream. As a result of COVID-19, businesses were forced to digitally transform in order to survive in the new normal. According to our research, digital acceleration shows no sign of stopping in the new year, with 86% of companies currently reaping the benefits of better customer experience through AI, a trend that is likely to continue. The pandemic has also changed business priorities for AI investment. For instance, we’ve seen companies shift from simpler tasks like automation to focus on workforce planning and simulation modeling. As organizations continue to see benefits from their digital investments in complex processes, AI will only become more widespread and widely used over the next year. – Anand Rao, Global Artificial Intelligence Lead at PwC

Convergence of AI & BI will boost data insights. AI has been part of every corporate discussion over the past 5 years. And yet, challenges persist in democratizing advanced AI insights across large sections of employees. As new AI-powered BI products emerge, silos will be broken and every user will be able to leverage data analytics and find insights easily. Simple interfaces, personalized insights, and engaging data experiences will become the hallmarks of data analytics in 2021 and beyond. – Dhiren Patel, MachEye’s Chief Product Officer & Head of Customer Success

Racial bias in many AI-driven facial recognition algorithms has been a big topic of conversation over the past year and came to a head due to the social unrest of 2020. Research has found widespread evidence that racial minorities were far more likely than whites to be misidentified. In 2021, we will see the correction of AI bias become a major topic for any company that leverages AI or facial recognition technology. By using government-issued documents, you can quickly and easily prove ID ownership by analyzing the face on the document and comparing it to the face trying to access your system. 2021 will be the year that AI bias comes to light and companies will begin implementing radical change to eliminate racial bias in their software – some of which can be done by putting a deliberate focus on fairness and training of the company’s ML system to reduce racial facial recognition errors. – Mohan Mahadevan, VP of Research, Onfido

2021 will be the year that teams go from casually dating AI to being in a committed relationship. AI isn’t just for R&D projects anymore. It’s time to commit to adapting these solutions instead of just flirting with them. We have to automate now. – David Karandish, Founder and CEO of Capacity 

With the confluence of computational power, internet-scale data and modern machine learning algorithms, we have broken remarkable new ground with AI over the past few years. In the coming years, we will enter an expansionary era, where a long tail of commercial use-cases will be prototyped, packaged and productionized – either to enhance existing products and services or to create entirely new ones. – Dave Costenaro, Chief Data Officer at Capacity 

AI Success Moves From General Purpose to Niche Focuses. While AI investment continues to grow in the enterprise, businesses are reevaluating their tech stacks to accommodate niche AI, rather than “general purpose” black boxes that claim to do everything. Niche, perfected use-cases that solve specific problems are going to take budget priority, rather than automation which promises to do everything. – Viral Bajaria, CTO at 6sense

Rise of Artificial Narrow Intelligence: Not long ago, AI was what we now know as artificial general intelligence, like self-driving cars or image recognition. However, today there is a new category of artificial narrow intelligence which is trying to replicate a human decision-making process. From a supply chain perspective, this new AI can help to inform better decision making around every aspect of a supply chain, from “How do I fill a truck?” to “How do I get products on time?” In 2021, I’m envisioning an increase in these narrow solutions to replace tactical and smaller scale decisions. – Andy Fox, Director of Global Impact with LLamasoft

At the fringes, we will begin to see “Counter-AI” start to materialize. As governments try to track people and businesses try to manipulate them or gain deep insights into behavior, I predict a backlash of methods to foil tracking and customer 360’s. Not unlike the work various groups have done on anti-facial recognition tools, we will begin to see high and low-tech methods for boggling the AI’s used to monitor and understand us. – Head of Architecture for Atos North America’s AI Lab in partnership with Google Cloud, Jonas Bull

As more agencies begin to adopt these AI- and ML-based solutions, there’s an onus on law enforcement to abide by ethical policies and to remove bias in such tools. As such, departments will begin to establish their own policies and work with governing bodies on responsible and ethical AI usage, including proper training for the relevant teams and business functions, as well as creating an environment with an ethos of data-driven and responsible decision-making. Going a step further, law enforcement organizations will continue to ensure AI systems are vetted to be bias-free and corrected as needed. And they will open a line of communication with the public to promote transparency regarding the use of these tools. – Heather Mahalik, Senior Director of Digital Intelligence, Cellebrite

We’ll see more data-driven companies leverage open source for analytics and AI in 2021. Open source analytics technologies like Presto and Apache Spark power AI platforms and are much more flexible and cost effective than their traditional enterprise data warehouse counterparts that rely on consolidating data in one place–a time-consuming and costly endeavor that usually requires vendor lock-in. Next year will see a rise in usage of analytic engines like Presto for AI applications because of its open nature – open source license, open format, open interfaces, and open cloud. – Dipti Borkar, Co-founder and Chief Product Officer (CPO), Ahana

The industry will shift away from generic horizontal AI platforms, such as IBM Watson and Amazon Lex, towards domain specific AI powered products and managed service models. Generic platforms are not solutions. They start cold, without any training data or data model structure; building this, then optimizing it in production, is an expert and resource intensive task that is beyond most companies’ capability. The move from the early innovator market into mass market adoption will be driven in 2021 by the adoption of domain specific AI powered products that are pre-trained for a specific industry and are proven to work. – Jake Tyler, co-founder & CEO, Finn AI

In 2021, AI won’t be mapped on the human spectrum of competence. We can have algorithms that crush any human at chess but are unable to make a cup of tea and computer programs that can perform mathematics millions of times faster than humans but, if asked who might win the next World Cup, they wouldn’t even understand the question. Their capabilities are not universal. We’ve reached a point with AI where we simultaneously overestimate and underestimate the power of algorithms. When we overestimate them, we see human judgment relegated to an afterthought – a dangerous place to be. The use of a “mutant algorithm” in grading A-level results is the scandal du jour in the UK, despite the algorithm producing many results that simply violate common sense. When we underestimate algorithms, we see entire industries crumble because they didn’t see change on the horizon. How can the traditional taxi business compete when Uber’s algorithm can get you a ride in less than 3 minutes? In 2021, expect engineers to avoid AI and algorithmic blunders by not trying to map algorithms onto the human spectrum of competence. Using AI technologies – such as any-context speech recognition – to enhance what humans can do and finding the right balance between AI automation and human knowledge for real world use cases – such as customer experience and web conferencing – will begin to shape the effective use of AI for the future. – Ian Firth, VP at Speechmatics

Responsible AI / ML will become the hottest topic in the cloud ML industry. Given society’s increased emphasis on combatting unfairness and bias and the overall interest in better interpretability and explainability of machine learning models, cloud providers will invest and enhance their ML offerings to offer a full suite of responsible ML / AI capabilities that will aim to satisfy and reassure regulators, modelers, management and the market on the fair use of ML. Meanwhile, AI / ML will continue to see explosive growth and usage across the whole industry, with significant enhancements in ease-of-use and UX combining within a responsible AI / ML framework to drive the next growth spurt of this sector. – Yiannis Antoniou, analyst, Gigaom

AIOps for networking will become mainstream: Next year, AIOps will go from theory to practice for many organizations. With the increase of remote workers and the home becoming the new micro branch, AI will become table stakes for delivering a great client to cloud user experience while controlling IT support costs for remote employees. IT teams will need to embrace AIOps to scale and automate their operations. AIOps cloud SaaS will turn the customer support paradigm upside down. Instead of users submitting tickets to IT, AI will proactively identify users with connectivity or experience issues and will either resolve (the self-driving network) or will open a ticket with suggested remediation actions for IT. – Bob Friday, CTO of Mist Systems, a Juniper Networks company

Artificial intelligence and machine learning will play a much more integral role in supply chain strategy than in previous years. The need for more real-time insights throughout the supply chain will continue to grow in 2021, especially as supply chain organizations re-evaluate their operations as a result of sudden changes in buying behaviors during the COVID-19 pandemic.

To address this need, supply chain organizations will need to look to artificial intelligence (AI) and machine learning (ML) enabled technology to upgrade from current, descriptive and prescriptive analytics, and leverage predictive analytics — which provide recommended actions before an incident occurs based on previous actions. Oftentimes, companies experience a mess of silos and fragmentation due to being acquired by large companies that have different systems. In 2021, supply chain stakeholders will look to deploy digital twins across all modules as an extra layer of visibility and to ensure synchronization between a company’s existing systems and new technology, such as sensors and nano sensors, which are coming to market in increasingly larger volumes. – Mahesh Veerina, CEO of Cloudleaf

Bias in AI causes harm at great scale – from impacting the recruitment process by reinforcing gender stereotypes to racial discrimination in credit scoring and lending. Organizations know that hiring a diverse workforce can provide a level of truth for AI models, and they know that training data needs to be constantly monitored for bias, as it impacts the quality and accuracy of algorithms. They also know that there is no current benchmark for ethics-based measurements to truly mitigate bias in AI, and that there needs to be. In 2021, we’ll see organizations moving past just acknowledging and “worrying” about bias in AI and start to make more significant moves to solve for it – because it will be required. Specific teams and/or initiatives will be formed to combat all the concerns that fall under the umbrella of responsible AI, including everything from inherent bias in data to treating data trainers fairly. Establishing responsible AI initiatives will not only become a board-level mandate for some, but the partners and customers of companies leading AI efforts will demand it. – Appen CTO Wilson Pang

AIOps Will Heat Up to Enhance the Customer Experience and Deliver on Application Assurance and Optimization. With a year of unpredictability behind us, enterprises will have to expect the unexpected when it comes to making technology stacks infallible and proactive. We’ll see demand for AIOps continue to grow, as it can address and anticipate these unexpected scenarios using AI, ML, and predictive analytics. The increasing complexity of digital enterprise applications spanning hybrid on-premise and cloud infrastructures coupled with the adoption of modern application architectures such as containerization will result in an unprecedented increase in both the volume and complexity of data. While data overload from modern digital environments can delay repair and overwhelm IT Ops teams, noisy datasets will be a barrier of the past as smarter strategies and centralized AIOps systems help organizations improve the customer experience, deliver on modern application assurance and optimization, tie it to intelligent automation, and thrive as autonomous digital enterprises. In fact, conventional IT Operations approaches may no longer be feasible – making the adoption of AIOps inevitable to be able to scale resources and effectively manage modern environments. – Ali Siddiqui, Chief Product Officer, BMC Software

The stark reality is that 2021 will be the year when those actually doing AI will start achieving value at scale, while those spending months training brittle models and failing to catch up will be at an increasing, exponential, disadvantage. Last mile challenges won’t get any easier – but a fundamental shift in thinking and approach will be critical to overcoming complexity obstacles. – Dr. Josh Sullivan, Head of Modzy

Elegant risk assessment: As the AIOps space continues to mature, we see an opportunity for vendors to refine their risk assessment capabilities to enable customers to fix issues with near-certainty, without breaking anything else in the system. In 2021, one area where we will see increased focus from vendors and more adoption among users will be more elegant dependency mapping, so engineers can accurately assess risk as part of the remediation process or the build-deploy cycle for software changes and ensure that a change in one part of an environment won’t break the system elsewhere. – Michael Olson, Director, Product Marketing at New Relic
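
To make the idea of dependency mapping concrete, here is a minimal sketch of how a dependency map can answer "what might break if I change this?". The service names and the DEPENDENTS map are hypothetical, and real AIOps products build and query such maps in their own ways; this only illustrates the underlying graph traversal.

```python
from collections import deque

# Hypothetical service map: each key lists the services that depend on it.
DEPENDENTS = {
    "database": ["orders-api", "billing-api"],
    "orders-api": ["web-frontend"],
    "billing-api": ["web-frontend", "reporting"],
    "web-frontend": [],
    "reporting": [],
}


def impacted_services(changed, dependents):
    """Return every service downstream of `changed`, found breadth-first."""
    seen = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for service in dependents.get(current, []):
            if service not in seen:
                seen.add(service)
                queue.append(service)
    return sorted(seen)


# Services that could be affected by a change to the billing API.
print(impacted_services("billing-api", DEPENDENTS))
```

The size of the returned list is a rough proxy for the risk of a change: a change to "database" in this toy map touches every other service, while a change to "reporting" touches none.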

In 2021, AI Won’t Be Mapped on the Human Spectrum of Competence: We can have algorithms that crush any human at chess but are unable to make a cup of tea and computer programs that can perform mathematics millions of times faster than humans but, if asked who might win the next World Cup, they wouldn’t even understand the question. Their capabilities are not universal. We’ve reached a point with AI where we simultaneously overestimate and underestimate the power of algorithms.

When we overestimate them, we see human judgment relegated to an afterthought – a dangerous place to be. The use of a “mutant algorithm” in grading A-level results is the scandal du jour in the UK, despite the algorithm producing many results that simply violate common sense. When we underestimate algorithms, we see entire industries crumble because they didn’t see change on the horizon. How can the traditional taxi business compete when Uber’s algorithm can get you a ride in less than 3 minutes? In 2021, expect engineers to avoid AI and algorithmic blunders by not trying to map algorithms onto the human spectrum of competence. Using AI technologies – such as any-context speech recognition – to enhance what humans can do and finding the right balance between AI automation and human knowledge for real world use cases – such as customer experience and web conferencing – will begin to shape the effective use of AI for the future. – Ian Firth, VP at Speechmatics

ML on the edge is going to be one of the major focuses of the AI/ML industry in 2021. Demand for intelligent edge applications is rising rapidly in the automotive, smart factory, and smart home industries. With efficient edge ML development tools widely available and semiconductor companies launching new MCUs with ML features, adoption of edge ML applications will become the major trend. – Sang Won Lee, CEO of Qeexo

The clinical community will increase their use of federated learning approaches to build robust AI models across various institutions, geographies, patient demographics and medical scanners. The sensitivity and selectivity of these models are outperforming AI models built at a single institution, even when there is copious data to train with. As an added bonus, researchers can collaborate on AI model creation without sharing confidential patient information. Federated learning is also beneficial for building AI models for areas where data is scarce, such as for pediatrics and rare diseases. – Kimberly Powell, Vice President & General Manager, NVIDIA Healthcare
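
As a rough illustration of how federated learning lets institutions collaborate without sharing patient records, the sketch below runs a toy version of federated averaging: each site takes a training step on its own data, and only the resulting model weight is sent back and averaged. The site names, the data, and the one-parameter model y = w * x are invented for illustration; production systems use far richer models and add privacy protections not shown here.

```python
def local_update(weight, data, lr=0.1):
    """One gradient-descent step for the model y = w * x on a site's private data."""
    grad = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
    return weight - lr * grad


def federated_average(site_weights, site_sizes):
    """Combine site weights, weighting each site by its number of samples."""
    total = sum(site_sizes)
    return sum(w * n for w, n in zip(site_weights, site_sizes)) / total


# Hypothetical private datasets; the raw (x, y) pairs never leave their site.
sites = {
    "hospital_a": [(1.0, 2.0), (2.0, 4.0)],
    "hospital_b": [(1.0, 2.0), (3.0, 6.0), (4.0, 8.0)],
}

global_w = 0.0
for _ in range(50):
    # Each site starts from the shared weight and trains locally.
    updates = [local_update(global_w, data) for data in sites.values()]
    sizes = [len(data) for data in sites.values()]
    # The coordinator only ever sees the updated weights.
    global_w = federated_average(updates, sizes)

print(round(global_w, 4))
```

Both toy datasets follow y = 2x, so the shared weight settles near 2 even though neither site ever sees the other's data.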

AI Center of Excellence: Companies have scrambled over the past 10 years to snap up highly paid data scientists, yet their productivity has been lower than expected because of a lack of supportive infrastructure. More organizations will speed the investment return on AI by building centralized, shared infrastructure at supercomputing scale. This will facilitate the grooming and scaling of data science talent and the sharing of best practices, and accelerate the solving of complex AI problems. – Charlie Boyle, Vice President & General Manager, NVIDIA DGX Systems

AI Expression Will Narrow in on Seamless User Experiences: As we look at the history of AI, algorithms were king and user experience came second. But as we head into 2021, AI-enabled applications will be increasingly focused on usability as a priority. The best expressions of AI are seamless for the user and work unobtrusively in the background. Platforms supported by AI/ML will find new ways to lead users to better conclusions and solutions.

This happens by interrogating huge volumes of data, looking for anomalies, insights and trends, then presenting results in the appropriate business context. Truly frictionless AI/ML should be the end-goal for all business platforms. I hope to see more sophisticated applications of AI that will identify what each user is trying to accomplish and automatically surface insights that can be leveraged for quick action. This ease-of-use will be incredibly valuable for the broad base of users, both technical and non-technical. – Sanjay Vyas, CTO of Planful

Ethical AI will take a key role in product development in 2021, but it is a difficult problem to solve: Companies are using data and AI to create solutions, but they may be bypassing human rights in terms of discrimination, surveillance, transparency, privacy, security, freedom of expression, the right to work, and access to public services.

To avoid increasing reputational, regulatory and legal risks, ethical AI is imperative and will eventually give way to AI policy. AI policy will ensure a high standard of transparency and protective measures for people. In the data sphere, CEOs and CTOs will need to find ways to eliminate bias in algorithms through careful analysis, vetting and programming. – Krishna Tammana, CTO of Talend

Next year, we will see companies focus on, adopt and develop AI solutions that actually deliver ROI as opposed to gimmicks or building technology for technology’s sake. Organizations will be focused on demonstrable progress and measurable outcomes and will therefore invest in solutions that solve specific problems. The companies that have a deep understanding of the complexities and challenges their customers are looking to solve and are willing to invest their R&D dollars in the solutions will find success. – Joe Petro, CTO at Nuance Communications, Inc.

The AI skills gap will persist, and organizations will think of new ways to adapt. It’s been difficult for organizations to hire the talent needed to deploy AI and reap all the benefits, with half of industry insiders reporting this challenge. What’s more, many organizations have accelerated digital transformation initiatives by a matter of months or years – but there is a discrepancy in available talent and training opportunities to support these initiatives. Due to increased demand, we predict that companies will offer more upskilling initiatives and incentives for employees to learn new skills, as well as work to build data and AI literacy across all levels of the organization.

The pandemic has presented an opportunity for organizations to prioritize these actions and help employees develop new skills in their rapid transition to remote work. Looking ahead, 2021 will be about education – both operating in a new normal and catching up to the expedited digital initiatives. – Traci Gusher, Principal, Data & Analytics, KPMG

Addressing bias in AI algorithms will be a top priority, causing guidelines to be rolled out for machine learning support of ethnicity in facial recognition. Enterprises are becoming increasingly concerned about demographic bias in AI algorithms (race, age, gender), its effect on their brand and its potential to raise legal issues. Evaluating how vendors address demographic bias will become a top priority when selecting identity proofing solutions in 2021. According to Gartner, more than 95% of RFPs for document-centric identity proofing (comparing a government-issued ID to a selfie) will contain clear requirements regarding minimizing demographic bias by 2022, an increase from fewer than 15% today. Vendors will increasingly need to have clear answers for organizations that want to know how a vendor’s AI “black box” was built, where the data originated and how representative the training data is of the broader population being served.

As organizations continue to adopt biometric-based facial recognition technology for identity verification, the industry must address the inherent bias in systems. The topic of AI, data and ethnicity is not new, but it must come to a head in 2021. According to researchers at MIT who analyzed imagery datasets used to develop facial recognition technologies, 77% of images were male and 83% were white, pointing to one of the main reasons why systematic bias exists in facial recognition technology. In 2021, guidelines will be introduced to offset this systematic bias. Until that happens, organizations using facial recognition technology should be asking their technology providers how their algorithms are trained and ensuring that their vendor is not training algorithms on purchased data sets. – Robert Prigge, CEO of Jumio
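
One simple way to ask how representative a training set is would be to compare group shares in the data against the shares expected in the population being served. The sketch below is an assumption-laden illustration: the function name, the 5% tolerance, the toy labels (which merely echo the 77% figure cited above) and the 50/50 reference shares are all made up, and a real audit would look at many more attributes and at model error rates per group.

```python
from collections import Counter


def representation_gaps(labels, reference_shares, tolerance=0.05):
    """Return {group: (dataset_share, reference_share)} for groups whose share
    in the dataset differs from the reference by more than `tolerance`."""
    counts = Counter(labels)
    total = len(labels)
    gaps = {}
    for group, expected in reference_shares.items():
        actual = counts.get(group, 0) / total
        if abs(actual - expected) > tolerance:
            gaps[group] = (round(actual, 2), expected)
    return gaps


# Toy labels only; not real dataset contents.
sample = ["male"] * 77 + ["female"] * 23
print(representation_gaps(sample, {"male": 0.5, "female": 0.5}))
```

An empty result means every listed group falls within the tolerance; anything else is a prompt to ask the vendor where the training data came from.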

Big Data

In 2021, open and free data collection will fuel future innovations. A recent survey from Frost & Sullivan found that 54% of IT decision-makers expressed a need for large-scale data collection to keep pace with their businesses’ growth and online competition. However, for businesses to utilize online data effectively, it first needs to be accessible – not blocked. Today, businesses often prohibit public data collection attempts despite collecting it themselves. This situation is caused by two major factors: the continuous need to block malicious or fraudulent online activity as part of security precautions, and the notion that this public data contributes to a company’s competitive edge.

I believe that during 2021 and onwards companies will realize that public data collection is part of the general and necessary ongoing business conduct. They will also realize that data isn’t everything when it comes to a business’s competitive edge. Areas such as inventory, prices, product quality and service quality play a big role as well. Once that realization settles in, blocking data will serve only to protect against abusive online activities. To secure ethical data collection, I hope we all promote an open exchange of information in central data hubs. Sites will continue to block abusers; this will not change. However, they may permit ethical data collectors. Ultimately, the future of online data collection is up to those who control it. At the rapid rate that data is being produced, future data collection efforts will need to evolve and grow. Companies will need automated data collection to keep up with their competitors and be able to gather data at a faster rate. After all, the speed at which companies can collect fresh data will determine their relevancy and success. – Ron Kol, CTO at Luminati Networks

Data will become truly operational on an enterprise scale: The amount of data businesses have is growing exponentially – there are more sources, types and amounts than ever before, plus increasing amounts of data are being delivered in near-real time. But to truly understand, access and take action on data, enterprises will need to change how they consume it — starting by cutting out the middleman. By finding ways to automate the data cataloguing and profiling processes, employees – including those with less of a technical background – will be able to get the data they need to effectively and efficiently make good business decisions. – Eric Raab, SVP, Engineering and Product, Information Builders

It’s essential to capture and synthesize “alternative” data: How early could we have detected COVID-19? Studies of “alternative” data – in this case, traffic data outside hospitals in Wuhan and keyword searches by Internet users in that area – indicate that the virus may have been circulating in late 2019. The investment community has been a pioneer in using alternative data, including audio, aerial photos, water quality, and sentiment. This is the front line for data-driven innovation, and getting an edge here can result in huge gains. But in the wake of 2020, alternative data will become mainstream, with the goal of spotting anomalies much earlier.

From that, we can get derivative data, which comes from combinations, associations and syntheses with data from systems of record. As IDC says: “As more data gets captured and becomes available from external sources, the ability to use more of it becomes a differentiating factor. That includes taking lessons from industries other than your own.” This trend, similar to what Gartner calls “X analytics,” isn’t new but is finally becoming an important foundation of modern data and analytics, thanks to cheaper processing and more mature AI techniques – including knowledge graphs, data fabrics, natural language processing (NLP), explainable AI and analytics on all types of content. This trend is completely dependent on ML and AI, as the human eye can’t catch it all. – Dan Sommer, Senior Director, Global Market Intelligence Lead at Qlik

In the industry we often talk about breaking down data silos, but we should acknowledge that some silos will always be there. In large organizations you will always have local departments or regions that have their own tools or databases, and that will continue. If you have data sovereignty, that local office in your organization will have a silo. That’s why the best approach is to look at how you can have a better understanding of the data you have. A data intelligence platform can serve as your index and your map, showing you the silos you have and how they are connected by providing a 360-degree view of data assets. – Stijn “Stan” Christiaens, co-founder and CTO of Collibra

OpenTelemetry will create data overload. In 2021, the use of OpenTelemetry will become the new industry norm. Yes, it will make data collection easier by creating consistency across sources — but it will also create a data firehose for companies, making it even harder to find the small portion of data containing actionable insights. The constant stream of data will overwhelm companies if they don’t have a system in place to quickly find the 5% that is truly actionable. Because of this, IT teams will shift their focus from acquiring data to building a framework to take action from data. As teams do so, it will be imperative to implement tools that can immediately start surfacing actionable data in the time it takes to make a cappuccino. – Phil Tee, CEO of Moogsoft

A digital twin is a virtualized model of a process, product or service. The pairing of the virtual and physical worlds allows data analysis and system monitoring to help identify problems before they even occur. This helps prevent downtime, develop new opportunities and even plan for the future by using simulations. This generation of digital twins allows businesses to not only model and visualize a business asset, but also to make predictions, take actions in real-time and use current technologies such as AI and ML to augment and act on data in clever ways. – Anil Kaul, CEO at Absolutdata

Digital transformation will – at last – start to become transformational. At this point, “digital transformation” has become a buzzword that all enterprises have learned to recognize, yet the vast majority (80% according to IDC) of these efforts are still too tactical in nature. Robotic process automation (RPA), for example, may be considered a transformational tool, but on its own it’s not. In order for organizations to see true transformation in 2021, they’ll need to leverage more advanced platforms that combine core automation and AI features—such as text analytics, document understanding, and process mining. It’s also critical that these platforms have low-code capabilities that enable citizen developers to build and deploy enterprise grade automations that drive value back to their organizations. Without that, it will continue to be challenging for companies to deliver enterprise-wide digital transformation—which is fueled by the ability to easily deploy automation, even to the most complex processes. – Guy Kirkwood, Chief Evangelist at UiPath

Business Intelligence

Proliferation of low-code/no-code ML. The increase of low-code and no-code ML systems, designed to make AI more accessible to companies, will help improve adoption of AI. However, eventually companies will reach a ceiling and outgrow the one-size-fits-all approach, seeking more advanced use cases for AI that require deeper expertise. Ultimately, the need for customization will increase the need for qualified data scientists, rather than low-code systems replacing them. We aren’t going to automate away the need for data scientists any time soon. — Kevin Goldsmith, CTO, Anaconda

Business Intelligence is shifting to a new paradigm of advanced data analytics with the integration of Natural Language, Natural Search, AI/ML, Augmented Analytics, Automated Data Preparation, and Automated Data Catalogs. This will transform business decision-making processes with higher-quality real-time insights. – Ramesh Panuganty, CEO of BI company MachEye

BI and AI will deepen their liaison. Whether scoring BI data sets against ML models and visualizing the predictions, or leveraging natural language processing for generating visualizations, insights, and summaries, AI and BI will increase their synergies. And as conventional BI capabilities continue to commoditize, vendors will need BI+AI as a new front in the innovation wars. – Andrew Brust, analyst, Gigaom

Chatbots

Employee to Enterprise – Conversational AI adoption will be natural and often the first contact. Conversational AI is normalized and here to stay. Interfaces that guide consumers through the online marketplace, employees through training courses and users through search engines and websites saw great returns on investment when outfitted with advanced Conversational AI technology. – Shiva Ramani, CEO of iOPEX

AI will not displace human beings any time soon. When you look at the use of AI in consumer-facing operations today, it’s mainly used in AI-supported chatbots and customer personalization features. If we look at how consumers have taken advantage of AI-supported features during the pandemic, we can see that they’re actually using them to resolve issues faster through human agents. Companies like Bank of America, which has a consumer-facing AI-powered chatbot named Erica, saw consumers using Erica to find the best course of engaging customer support teams. Rather than asking Erica questions to fix any issues directly, customers simply asked Erica how they should go about reaching out to the customer service team to rapidly resolve their problem with the appropriate human agent. – James Isaacs, President and CEO of Cyara

Today, we interact with bots more than ever before, whether it’s customer service chatbots or the AI on our devices, like Siri and Alexa. These bots are used for real-time decision making to automate processes that were previously done by humans. For example, bots have automated the retail return processes for companies like Amazon. However, it becomes more complicated for enterprises to manage the identities of automated bots, especially when they are interacting with other bots at machine speed. The identities of bots must be managed and protected by the enterprise, similar to employee and customer identity, so that data isn’t compromised. This is important for CIOs and security leaders to keep in mind, because using bots for automation purposes will open new attack vectors if those bots’ APIs are hacked. – Jasen Meece, CEO of Cloudentity

NLP (natural language processing) changes the conversation on data analysis: Just as we are using Google Home and Alexa in our everyday lives, conversational analytics through NLP will be the golden ticket for enterprises in extracting valuable big data insights from their business operations. This includes unearthing trends that may have gone unnoticed and allowing experts from within the enterprise to engage with data in a meaningful way. – Sam Mahalingam, CTO, Altair

Conversational AI, first and foremost, needs a ubiquitous messaging channel to converse on. The rise of business messaging on IP-based channels such as WhatsApp, GIP and others is driving a resurgence in the use of Conversational AI. Companies across industries such as banking, e-commerce, retail and travel are now enabling conversational AI for virtually every customer touchpoint including marketing, sales and support. Powered by recent advances in natural language processing (NLP), conversational AI is poised to transform how consumers interact with businesses. – Beerud Sheth, CEO of Gupshup

Cloud

I think we will begin to see a more thoughtful, balanced approach to multi- and hybrid cloud adoption, particularly for hybrid cloud. We are getting past the public versus private cloud conversations, and businesses are accepting the reality that cloud is not an “either, or” decision. Historically, we have seen “public cloud” being associated with cutting-edge innovation and “private cloud” being associated with slow, legacy businesses that are resistant to change. This sentiment is changing, as businesses are beginning to better understand the value they can get from a hybrid cloud architecture that enables them to deploy agile, modern applications on the platform that best balances their specific cost, performance, security, compliance and governance needs.

With this comes an increase in hybrid-enabling technologies such as containers and hybrid integration platforms. Another consideration is tethered compute, which is a hyperscale cloud provider solution running in your own data center. Examples are AWS Outposts, Google Anthos, and Microsoft Azure Stack. Although adoption of these has been slow to date, we could start to see the beginning of growth here as customers see the value of private/public cloud, coupled with the consistency of hyperscale cloud service consumption. – Kim King, Director of Product Marketing – Cloud Management at Snow Software

COVID-19 Accelerates Cloud Spending: With the increase of remote working due to the COVID-19 pandemic, companies are investing a larger portion of IT budgets on cloud-based technologies, moving away from paper-based processes. Enterprises’ average cloud spending is up 59% from 2018 to $73.8M in 2020. That trend will continue into 2021 as companies are forced to adopt strategies to work remotely and recognize the benefits of maintaining those modes of operating even as they begin to transition employees back to physical locations. A prime example will be contracting where COVID drove digital transformation of the contract request, approval, execution, and post-award management systems and has laid the groundwork for even more advancements in contract lifecycle management. – Harshad Oak, General Manager, Customer Adoption & Value, at Icertis

Once considered the “layover” on the way to the cloud, hybrid is now the destination: A hybrid cloud approach used to be considered the stepping stone to a cloud-first implementation. Now, customers are seeing that a hybrid approach makes the most sense, both strategically for their business needs and economically. According to IDC, 70% of customers’ apps and data remain outside the public cloud. With that in mind, in 2021, we’ll see even more customers embrace a hybrid approach. Due to data latency, application entanglement and security and compliance reasons we see more and more organizations across industries wanting to keep their data on-premises. At the same time, partially due to pandemic economics, data egress charges and vendor lock-in with public cloud providers, the reality is CIOs and IT orgs are embracing hybrid as the outcome and not a means to an end. – Keith White, General Manager, GreenLake Cloud Services

Cloud agility is fantastic, but it can easily lead to runaway costs. Similarly, shared on-premise big data clusters often waste resources. Both of these result in missed SLAs. If they want to eliminate chronic overspend, companies need to institute a method to monitor and manage their cloud spend. The most effective way to do this is through observability and auto-tuning. – Ash Munshi, CEO, Pepperdata

Database/Data Warehouse/Data Lake

The solutions that companies use to store their data will continue to evolve rapidly in the next year. We are seeing increased migrations into open source relational database solutions, non-relational database solutions, PaaS-based database solutions, and a combination thereof. The primary focus of these initiatives can be grouped under the heading of reducing operating costs, whether they are being undertaken to reduce hefty support contracts from vendors like Oracle and Microsoft (both the open source and non-relational database migrations fall into this category), reduce headcount expense (migrations to PaaS services fall into this category), or gain performance efficiencies by migrating to a more purpose-built database solution.

Data migration is happening right now and at a large scale, so there are many considerations that need to be made when transitioning to these new database solutions, including the capabilities of the future state solution versus the current state, the impact to licensing and support contracts, and a method to ensure that the correct solutions are deployed. While PaaS solutions provide some great benefits, DBAs are still required to monitor and manage those systems and work with application teams to drive efficiencies in performance, availability, and security. – Marc Caruso, Chief Architect, Syntax

360. That’s the number of database systems out in the wild. And while choice is good and finding the right tool for the job is smart, it also adds major complexity. As companies move to modernize in the cloud, they will seek simplification, which will lead to massive consolidation in the database market. Database vendors that offer multi-functional capabilities will win, rather than a multitude of niche databases that need to be stitched together and require different ways of accessing data. – Franz Aman, CMO of relational database company MariaDB

The database market will grow to $1 trillion by 2025. For the last two decades, there’s been an iron grip on the database market with IBM, Oracle and SAP HANA leading the charge. Now we are seeing a changing of the guard, which gives customers the option of deciding what is best for their business. Forrester even points out that the public cloud infrastructure market will grow 35% to $120 billion in 2021. I predict that the database market cap will grow to $1 trillion by 2025 and that seven to 10 really strong database companies will grow significantly in the next decade. – Raj Verma, CEO of SingleStore

The Data Lake Can Do What Data Warehouses Do and Much More: While the separation of compute and data provides advantages for data lakes over data warehouses, data warehouses have historically had other advantages over data lakes. But that’s now changing with the latest open source innovations in the data tier. For example, Apache Iceberg is a new table format that provides key data warehouse functionality in the data lake such as transactional consistency, rollbacks and time travel while introducing new capabilities that enable multiple applications to work together on the same data in a transactionally consistent manner. Another new open source project, Project Nessie, builds on the capabilities of Iceberg as well as Delta Lake by providing Git-like semantics for data lakes. Nessie also makes loosely-coupled transactions a reality, enabling a single transaction spanning operations from multiple users and engines including Spark, Dremio, Kafka and Hive. – Tomer Shiran, co-founder of Dremio
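
To show what "rollbacks and time travel" mean in practice, here is a toy, in-memory table that keeps every committed snapshot. It is only a conceptual sketch: the SnapshotTable class and its methods are hypothetical and are not Apache Iceberg's API or file format, which track snapshots through metadata files rather than Python tuples.

```python
class SnapshotTable:
    """Toy append-only table that retains every committed snapshot."""

    def __init__(self):
        self._snapshots = [()]  # snapshot 0 is the empty table
        self._current = 0

    def commit(self, new_rows):
        """Append rows to the current state and record a new snapshot."""
        rows = self._snapshots[self._current] + tuple(new_rows)
        self._snapshots.append(rows)
        self._current = len(self._snapshots) - 1
        return self._current

    def read(self, snapshot_id=None):
        """Read the current state, or an older snapshot (time travel)."""
        if snapshot_id is None:
            snapshot_id = self._current
        return list(self._snapshots[snapshot_id])

    def rollback(self, snapshot_id):
        """Make an earlier snapshot the current table state."""
        if not 0 <= snapshot_id < len(self._snapshots):
            raise ValueError("unknown snapshot")
        self._current = snapshot_id


table = SnapshotTable()
first = table.commit([{"id": 1}])
second = table.commit([{"id": 2}])

print(table.read())       # current state: both rows
print(table.read(first))  # time travel: the table as of the first commit
table.rollback(first)     # undo the second commit
print(table.read())
```

Because old snapshots are never modified, readers can keep querying a consistent view while writers commit new ones, which is the property the data lake table formats above rely on.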

Three major trends will emerge in 2021: the return of the metadata layer, embedded AI and automated analytics, and new simplified query interfaces designed specifically for business users. The return of metadata layers, as key foundational components of analytic solutions, is needed to support improved governance and extensibility of data assets. With smart metadata layers, new simplified user interfaces will emerge that allow business users to interact with the data in a more guided way, reducing the time to insight with minimal analytical skills. AI and automated analytics will shift from the enterprise domain towards software vendors, who will embed these capabilities and enable mass adoption via their customer base. – Glen Rabie, CEO at Yellowfin

Data Engineering

Companies will reinvest in the data engineer and data pipelines. One impact of 2020 was that a lot of companies shifted to a survival-first approach, which resulted in a “grab-and-go” mentality to their data integration. As businesses’ bottom lines are stabilizing and we’re seeing more predictability at the macroeconomic level, our prediction is that 2021 is the year of the data engineer, and that companies are going to get back to a “built to last” approach for data pipelines. “Built to last” for the water in your pipes at home means that water is always on, clean and at the right temperature. “Built to last” for data means that you build smart data pipelines to ensure timeliness and confidence in your data analytics. – StreamSets CEO Girish Pancha

Companies will realize the need to put more effort into DevOps: There is still so much work that needs to be done with DevOps pipelines, including securing and testing the delivery process. The software developer community knows where it needs to go, but the work and obstacles in the way are always bigger than expected. Because of this, I am skeptical we’ll see big changes in 2021 in terms of tooling or CI/CD patterns. Rather, we’ll see more people realize they need to put more effort into their DevOps pipeline, processes, and validation. They will double down to accelerate and improve their CI/CD automation. Only when these processes are mature can organizations have confidence in their delivery practices and tooling. – Fred Simon, co-founder and Chief Data Scientist, JFrog

Data Governance

IT will infuse access governance with intelligence to protect workforce cybersecurity in 2021. Accelerating changes in enterprise technologies, cyberthreats and the user landscape are increasing pressure on traditional identity governance and administration (IGA) solutions and, in turn, on security and compliance teams. On top of growing compliance risks, enterprise IT environments become more complex every year, increasing the number of applications and systems to which companies provide user access. These challenges are driving organizations to seek out AI-driven solutions that simplify and automate the access request, access approval, certification and role modeling processes. In 2021, we will see AI increasingly employed to enable an autonomous identity approach.

AI-infused authentication and authorization solutions will be layered on top of, or integrated with, existing IGA solutions, providing contextual, enterprise-wide visibility by collecting and analyzing all identity data, and enabling insight into different risk levels of user access at scale. The use of AI will allow systems to identify and alert security and compliance teams about high-risk access or policy violations. Over time we will see these AI systems produce explainable results while increasing automation of some of the most difficult cybersecurity challenges inside the enterprise. – Eve Maler, CTO at ForgeRock

We saw the global implementation of AI governance frameworks take off in 2020, with enterprises asking for details on the outcomes of AI applications. Ensuring an appropriate level of explainability of AI applications is key, as are using good-quality data, ensuring auditability, being ethical, fair and transparent, complying with data protection requirements, and implementing effective cybersecurity measures. AI governance frameworks are currently implemented mostly in finance and banking, but in 2021 we’ll see this become more widespread.

Other verticals such as healthcare, e-commerce and mobility services will begin to use it as a competitive differentiator. For instance, healthcare providers are beginning to be more transparent with how data is used, and how they are ethical and fair in protecting that data. If businesses want to stay ahead of the curve, they should start developing ethical AI frameworks now in order to position themselves as a leader in this global movement. – Mohan Mahadevan, VP of Research, Onfido

AI Will Gain Momentum in Cloud Security and Governance. In 2021, AI will go far beyond simply detecting anomalies and thereby flagging potential threats to security teams. Cloud governance is an increasingly complex task and is quickly reaching a point where it’s impossible for humans to manage alone. AI will increasingly be relied on in the coming year to maintain cloud hygiene by streamlining workflows, managing changes and archiving. Once proper cloud hygiene is established and maintained with AI, it will also be used as a strategic predictive knowledge tool. By predicting and addressing threats and vulnerabilities, AI will help enterprises create the best possible outcome for their cloud environments. Leveraging AI as a strategic asset will empower CIOs to make informed decisions about their cloud environments, such as evaluating costs and compliance risks. – Keith Neilson, Technical Evangelist for CloudSphere

As we look to 2021, we will see the conversation of ethical AI and data governance be applied to multiple different areas, such as contact tracing (fighting COVID-19), connected vehicles and smart devices (who owns the data?), and personal cyber profiles (increased cyber footprint leading to privacy questions). – Cindy Maike, VP of Industry Solutions, Cloudera

Data governance for a multi-environment reality. Long gone are the times where organizations simply housed all their own data on-premise or even just within one cloud provider. Now organizations have data on-premise and are partnered with several cloud providers based on their specific needs. This reality has created a “rethink” of how data governance needs to be approached. Organizations must determine how their current data governance will be impacted and what needs to be adjusted, how to monitor data quality in the cloud, and how to manage data movement in and out of the cloud (and the massive expense that comes with that). – Todd Wright, Head of Data Management and Data Privacy Solutions at SAS

Data Science

2020 was brutal for some firms, rewarding for others, and challenging for all. As we enter 2021, laggards have an existential imperative to reinvent themselves digitally, while leading firms struggle to keep pace with demands. All of these enterprises need to capitalize on 100% data integration with predictable costs, reliable performance and real-time visibility. – Bonnie Holub, Practice Lead, Data Science, Americas at Teradata

Data democratization will become the new norm. It’s the job of the CDO to ensure expansion of growth across the entire business. This can be achieved by providing structured data that people can actually use. A successful CDO should democratize data so that it’s accessible and understandable by people. A good CTO will complement the CDO by creating the necessary tooling to find the required data. This means giving users a set of visualization tools and reporting tools that allow them to get after the data to run insights. As we move into 2021, we’ll continue to see further and tighter collaboration between these two roles, driven by necessity. If you have tools with bad data, you’re exacerbating the data challenge. If you have limited tools, only a small subset can do anything with the data. – Derek Knudsen, Chief Technology Officer at Alteryx

Citizen analysts will increasingly up-skill to become data scientists. The growing complexity of most industries and companies also means that once we see self-reliance in terms of developing IT processes or using analytics, there will quickly be a huge push to expand that skill-set further. With the market erratically changing from month to month there will be a much greater emphasis placed on data science than ever before. This, in turn, will drive more citizen analysts to up-skill to become data scientists. – Sharmila Mulligan, Chief Strategy and Marketing Officer at Alteryx

Python data visualization libraries will sync. We’re finally starting to see Python data visualization libraries work together, and this work will continue in 2021. Python has had some really great visualization libraries for years, but there has been a lot of variety and confusion that make it difficult for users to choose appropriate tools. Developers at many different organizations have been working to integrate Anaconda-developed capabilities like Datashader’s server-side big data rendering and HoloViews’ linked brushing into a wide variety of plotting libraries, making more power available to a wider user base and reducing duplication of efforts. Ongoing work will further aid this synchronization in 2021 and beyond. — James A. Bednar, Sr. Manager, Technical Consulting, Anaconda

Business skills will become more critical than ever for data scientists. Data scientists will need to speak the language of business in order to translate data insight and predictive modeling into actionable insight for business impact. Technology owners will also have to simplify access to the technology, so that technical and business owners can work together. The emphasis for data scientists will be not just on how quickly they can build things, but on how well they can collaborate with the rest of the business. – Florian Douetteau, CEO and co-founder of Dataiku

Self-service has evolved to self-sufficiency: In a virtual world, self-service needs to evolve. When there are no instruction manuals and no one there to hold a user’s hand, a fast, intuitive ramp-up becomes a hygiene factor for adoption, and compelling user interfaces will no longer be a nice-to-have. But we’ve also seen that users often don’t want to self-serve; they increasingly expect insights to come to them. As a result, we’ll see more micro-insights and stories for the augmented consumer. In addition, data is too often overlooked. Empowering users to access data, insights and business logic earlier and more intuitively will enable the move from visualization self-service to data self-sufficiency. AI will play a major role here, surfacing micro-insights and helping us move from scripted and people-oriented processes to more automated, low-code and no code data preparation and analytics. If more people can be self-sufficient with data earlier in the value chain, anomalies can be detected earlier and problems solved sooner. – Dan Sommer, Senior Director, Global Market Intelligence Lead at Qlik

Historically, companies put a lot of value on people who were “Data Scientists”. Going forward, there will be a need to hire people who are experts in data collection. For AI models to work, vast amounts of data are required; moreover, critical data still resides in silos in many organizations. Hence, individuals with skills in data collection will be in high demand. – Clara Angotti, President of Next Pathway

Data scientists will play a critical role in the development of a COVID-19 vaccine. From development of a vaccine to analysis of trials and deployment, data will be the key to knowing if we have found a preventative solution. Data scientists will be as important as traditionally trained scientists in producing the first viable vaccine. To accelerate the development of vaccines, people must be able to manage, make decisions and trust that data. Knowing that speed is critical, data agility is required and new automated systems will enable new innovations, ultimately leading to a vaccine. Accelerating the delivery of the vaccine will require a great deal of agility and automation in managing data. – Infoworks CEO Buno Pati.

While data continues to rule the world, organizations still find themselves struggling to leverage that data for a true competitive advantage. The Citizen Data Science Movement has emerged to widely promote the ability to manipulate and interpret data. But is there a better way? Wouldn’t it be smarter (and easier) to simply bring business meaning to the data and repair the data rather than fix the people, given that raw, uninterpreted data located somewhere in a system isn’t very helpful? – Kendall Clark, founder & CEO of Enterprise Knowledge Graph Platform developer, Stardog

We’ll See an Uptick of Architecting for Data Science: Mastering data management will be top of mind for many IT groups as they look to improve business intelligence and agility. For this reason, data science (the umbrella under which artificial intelligence, machine learning, automation, data lakes and others thrive) will see huge growth in 2021. From analyzing data-driven behaviors to transform grocery shopping to leveraging powerful computing in the cloud to improve media production models, data science will take the lead for many to stay competitive. Because these capabilities are too expensive to provision on their own, many of these companies will outsource their data science projects to third parties under a subscription model. – Dustin Milberg, Field CTO Cloud Services at InterVision

Automate Your Pipelines to Unleash the Full Potential of Data Scientists: Data scientists are too often busy with tasks like data preparation, feature engineering and modeling. As these tasks become augmented with tools that help automate these steps, we’ll see data scientists trade in routine tasks for time spent on deeper, strategic approaches that will make them invaluable resources. We expect to see more systematic implementations of business AI solutions to make ad-hoc analyses more efficiently repeatable. – Justin Silver, Ph.D., AI Strategist at PROS

Deep Learning

The adoption of Deep Learning based enterprise solutions in startups and enterprises will see a gradual uptick. The key hindrance will continue to be the costs of procuring GPU instances and high-cost human resources. – Sundeep Reddy Mallu, Head of Analytics at Gramener

As we have all witnessed in recent years, research and development in Natural Language Processing has progressed rapidly through breakthroughs in Transformer language models such as BERT and GPT-3. While they achieve state-of-the-art performance, they require large datasets and large amounts of computational resources for training and inference, with a significant carbon footprint. We will see more research into new model architectures and training techniques that address concerns about carbon emissions and very long training times, producing space- and compute-efficient models that make these breakthroughs more accessible; recent models like Performers with Fast Attention will serve as catalysts in this direction. – Kavan Shukla, Data Scientist, Finn AI

Hardware

Hardware and software converge with the rise of AI-specific hardware. As Apple’s announcement of the M1 chip showed, purpose-built hardware is becoming more mainstream, meaning that people will begin to think more about the actual hardware that they are working on than they did previously—including data scientists. The rise in ML-specific hardware will likely lead to performance improvements, but also provides another variable in model deployment. It’ll be particularly impactful in cloud and mobile environments. This will further break down the wall that has traditionally existed between hardware and software, with AI use cases leading the way. — Kevin Goldsmith, CTO, Anaconda

Since 2012, AI compute power has grown at 5X the rate of Moore’s Law, doubling approximately every 3.5 months. Given the growing number of applications built on top of AI engines impacting our everyday lives – some even critical to humanity as a whole (e.g., modeling and solving for climate change) – finding a solution to this performance scaling mismatch is high on every serious fabless and chip manufacturing company’s priority list. The need for shifts in how Moore’s Law is perceived will become more apparent in 2021. The latest trend has been to talk about writing more efficient software to yield year-over-year performance improvements. This is a risky bet, since the development of fundamentally new algorithms cannot happen on a schedule and is therefore not compatible with the traditional semiconductor tick-tock advancement schedule. Underlying compute technologies must also improve. We will continue to see shifts and improvements in the coming year. – Nick Harris, CEO and co-founder of Lightmatter
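
To make the arithmetic behind that comparison concrete, here is a small sketch in Python. The 3.5-month doubling period comes from the quote above; the Moore's Law doubling periods (18 and 24 months) are assumptions, since the quote does not say which version it has in mind, and the function names are illustrative only.

```python
def doubling_rate_ratio(fast_months: float, slow_months: float) -> float:
    """How many times more often the fast process doubles than the slow one."""
    return slow_months / fast_months


def growth_over(months: float, doubling_months: float) -> float:
    """Multiplicative growth after `months` for a given doubling period."""
    return 2.0 ** (months / doubling_months)


AI_DOUBLING_MONTHS = 3.5  # figure taken from the quote

# Assumed Moore's Law doubling periods; both versions are commonly cited.
for moore_months in (18.0, 24.0):
    ratio = doubling_rate_ratio(AI_DOUBLING_MONTHS, moore_months)
    print(f"Moore's Law at {moore_months:.0f} months: "
          f"AI compute doubles {ratio:.1f}x as often")

# Compounded growth over one year under each doubling period.
print(f"AI compute over 12 months: {growth_over(12, AI_DOUBLING_MONTHS):.1f}x")
print(f"Moore's Law (24-month) over 12 months: {growth_over(12, 24.0):.2f}x")
```

Under the 18-month assumption the ratio of doubling rates is about 5.1, which lines up with the quoted 5X figure; under a 24-month assumption it is closer to 6.9. Either way, the gap compounds quickly, which is the mismatch the prediction describes.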

In-memory Computing

In 2021, accelerated by COVID-19 and stricter regulations, enterprises will continue to drive their data transformation initiatives to thrive in the burgeoning online-digital economy. Extreme speed, cloud agility, and operational analytics will be adopted by enterprises to optimize data-driven operations and to rapidly introduce new services and applications.

Technology solutions based on a cloud-native data fabric, also known as a Digital Integration Hub, will allow organizations to offload and decouple from legacy systems of record and databases to meet their digital and analytical requirements, and to migrate to the cloud without the need to completely divest from their existing mission-critical systems. The introduction of in-memory speed and scale for analytics and BI will fuel real-time reporting and visualization of fresh data and enable ML models to utilize more accurate real-time data for online services such as loan approvals, fraud analysis and customer 360 capabilities. AIOps will also be a focus and will be deployed to automate and streamline complex data and analytics operations, reduce time-to-market and lower costs while minimizing human errors. – Adi Paz, CEO, GigaSpaces

In 2020, the COVID-19 pandemic drove many businesses, especially those in food delivery, ecommerce, logistics, and remote access and collaboration services, to dramatically scale out and upgrade infrastructure to maintain high application performance in the face of surges in website visitors, delivery requests, sales transactions, video streaming and more. Many of these businesses found that the fastest approach to maintaining or improving performance while simultaneously increasing application throughput was to deploy a distributed in-memory data grid (IMDG) – built using an in-memory computing platform such as Apache Ignite – that can be inserted between an existing application and disk-based database without major modifications to either. The IMDG improves performance by caching application data in RAM and applying massively parallel processing (MPP) across a distributed cluster of server nodes. It also provides a simple path to scale out capacity because the distributed architecture allows the compute power and RAM of the cluster to be increased simply by adding new nodes.

In 2021, IMC platforms will become easier to use and the number of knowledgeable IMC practitioners will continue to grow rapidly. This will enable IMC adoption to spread across more industries and to a wider pool of companies. As a result, more businesses will be better positioned to take advantage of IMC for rapid application acceleration, not just for response to the demands of COVID, but also to meet new strategic and competitive demands as the pandemic threat abates. – Nikita Ivanov, CTO and founder of GridGain Systems
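
The in-memory data grid pattern described above can be sketched as a read-through, write-through cache placed between application code and a slower store. This is a minimal single-process illustration in Python, not Apache Ignite's API: `SlowStore`, `ReadThroughCache`, and the hash-based partitioning across in-process "nodes" are hypothetical names and simplifications. A real IMDG spreads partitions across servers and also handles replication, eviction and parallel processing.

```python
from typing import Dict, List, Optional


class SlowStore:
    """Stand-in for a disk-based database (hypothetical); counts reads."""

    def __init__(self) -> None:
        self._rows: Dict[str, str] = {}
        self.reads = 0

    def get(self, key: str) -> Optional[str]:
        self.reads += 1
        return self._rows.get(key)

    def put(self, key: str, value: str) -> None:
        self._rows[key] = value


class ReadThroughCache:
    """Keeps hot data in RAM, partitioned across in-process 'nodes'."""

    def __init__(self, store: SlowStore, node_count: int = 2) -> None:
        self._store = store
        self._nodes: List[Dict[str, str]] = [{} for _ in range(node_count)]

    def _node_for(self, key: str) -> Dict[str, str]:
        # Simple hash partitioning: each key lives on exactly one node.
        return self._nodes[hash(key) % len(self._nodes)]

    def get(self, key: str) -> Optional[str]:
        node = self._node_for(key)
        if key in node:
            return node[key]          # served from memory
        value = self._store.get(key)  # cache miss: fall back to the store
        if value is not None:
            node[key] = value
        return value

    def put(self, key: str, value: str) -> None:
        # Write-through: update the store and the in-memory copy together.
        self._store.put(key, value)
        self._node_for(key)[key] = value

    def add_node(self) -> None:
        # Scale out by adding a node; this sketch re-partitions everything.
        cached = {k: v for node in self._nodes for k, v in node.items()}
        self._nodes.append({})
        for node in self._nodes:
            node.clear()
        for k, v in cached.items():
            self._node_for(k)[k] = v


store = SlowStore()
store.put("order:1", "pending")
cache = ReadThroughCache(store)
cache.get("order:1")  # first read goes to the store
cache.get("order:1")  # second read is served from memory
print(store.reads)    # prints 1
```

The application keeps calling `get` and `put` as before, which is why this kind of layer can be inserted without major changes to either side. Re-partitioning every key in `add_node` is the crude part of the sketch; production grids typically move only a fraction of the data when a node joins.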

IoT

IoT adoption in the enterprise will heat up more than ever: In light of the pandemic’s impacts on business, enterprises will be looking for new or additional ways to increase the speed to decision making in 2021. IoT can play a role in this. From a BI standpoint, the challenge is to recognize that IoT has different data models that need to be accommodated, like performance over time. Reducing the lag time between data production and operations will be key. The smartest organizations will realize that they can’t simply spend money on this, but instead need to be strategic to create new data models that share thoughtful insights. – Eric Raab, SVP, Engineering and Product, Information Builders

The pandemic has greatly accelerated the need for companies to complete their Industry 4.0 transformations with solutions that allow them to have more flexibility, visibility and efficiency in their operations. We’ll see an acceleration of adoption of solutions that help address that need, including AI technologies such as machine learning, machine vision and advanced analytics. As the economy bounces back, we’ll continue to see investment in the foundational OT infrastructure with more IT capabilities to allow the broad ecosystem of players to deploy these solutions, and we will see Industry 4.0 adoption significantly ramp up in 2021. – Christine Boles, VP, IoT Group and GM, Industrial Solutions Division, Intel

Explosion of edge computing: We will continue to see an increase in edge computing throughout the data center industry due to increased computing and speed demands from consumers and companies. A low latency network is critical in environments that strive to maximize compute throughput and reduce server idle time. – Timothy Vang, Ph.D., vice president of marketing & applications for Semtech’s Signal Integrity Products Group

Edge is the new cloud: For companies scaling smart factory initiatives in 2021, real-time availability of mission-critical workloads will be necessary to ensure business outcomes. Edge computing will complement existing cloud infrastructure by enabling real-time data processing where the work takes place (e.g., motors, pumps, generators, or other sensors). Implementing integrated analytics from the edge to the cloud will help these enterprises maximize the value of investments in digital systems.

The industry will continue to move toward more decentralized compute environments, and the edge will add significant value to digital transformation initiatives. By integrating edge functionalities with existing cloud infrastructure, organizations will worry less about logistical IT considerations and, instead, focus on rethinking what’s possible in a smart machine: What questions can it answer faster? What new problems can it solve? How can it protect operations better? Analysts note that by 2022, 99% of industrial enterprises will utilize edge computing for this reason. – Keith Higgins, VP of Digital Transformation for Rockwell Automation

Creative minds push IoT forward: IoT and smart product development will hinge on creative designs and thoughtful solutions as technical improvements in microprocessors slow. Engineers are running up against the limits of what is physically possible as chipmakers near the theoretical limit for how thin these devices can be. Post-Moore’s Law product development will rely on the ingenuity of engineers and designers to create imaginative solutions to business and societal problems and to improve everyday consumer processes, instead of simply relying on the next generation of powerful chipsets. – Sam Mahalingam, CTO, Altair

Machine Learning

Investment dollars in IT Operations will shift from vanilla workflow automation to native AI/ML solutions with a drive to become digital operations. Workflow operations and their respective automation will naturally evolve to include AI/ML solutions as the technology becomes more powerful. AI and ML are advancing and in turn improving workflow automation as companies collect more data as well as shift organization and administrative operations. – Shiva Ramani, CEO of iOPEX

Enterprises will find new applications for machine learning technologies that automate manual processes and enhance monitoring capabilities. Companies will look for products that deliver deeper monitoring, more automation and value-added information across their IT spend. For example, availability solutions that provide application-aware monitoring and automation of configuration and management tasks would be prioritized over traditional failover solutions. New innovations in HA will emerge to handle the increasing complexity of failures and disasters brought on by IoT devices and their dependencies. – Cassius Rhue, VP, Customer Experience, SIOS Technology

Historically, algorithms were more about machine learning and neural networks. We are now seeing more and more machines that are self-contained and can teach and train themselves in a way that is remarkably similar to the subconscious part of the human brain. In other words, algorithms used to mimic the analytic part of the brain; now they are mimicking the largest, most powerful, and most intriguing part of the human brain, which we call common sense, gut feelings, and intuition. Instead of relying on human beings to train and teach them, today’s unsupervised machine algorithms are able to gather massive amounts of data, create pictures of the world, and make deductions that are very similar to ones that would be made by human beings. We’re coming into a world where computers can train themselves. – Mark Gazit, CEO of ThetaRay

Reducing bias: this year, there have been many necessary conversations around bias and mitigation in AI algorithms and around how to address the societal impacts of algorithm-based personalization. However, we need to continue development of tools that provide insight into the results of ML systems, reveal bias, and check drift in deployed models over time. This becomes ever more critical as more of these systems are put into production, to ensure that we’re not perpetuating or creating sources of harmful bias. — Kevin Goldsmith, CTO, Anaconda
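
One way to check drift in a deployed model, as the prediction above calls for, is to compare the distribution of a feature or score at training time with what the model sees in production. The sketch below uses the Population Stability Index (PSI), a common choice that the quote itself does not name; the function names, bin edges and sample data are illustrative assumptions.

```python
import math
from typing import List, Sequence


def bin_fractions(values: Sequence[float], edges: Sequence[float]) -> List[float]:
    """Fraction of values in each bin defined by sorted interior edges.

    Assumes `values` is non-empty.
    """
    counts = [0] * (len(edges) + 1)
    for v in values:
        idx = sum(1 for e in edges if v >= e)  # index of the bin containing v
        counts[idx] += 1
    total = len(values)
    return [c / total for c in counts]


def population_stability_index(
    baseline: Sequence[float],
    current: Sequence[float],
    edges: Sequence[float],
    eps: float = 1e-6,
) -> float:
    """PSI between two samples; 0.0 means identical binned distributions."""
    expected = bin_fractions(baseline, edges)
    actual = bin_fractions(current, edges)
    psi = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)  # avoid log(0) for empty bins
        psi += (a - e) * math.log(a / e)
    return psi


# Illustrative data: scores at training time vs. scores seen in production.
training_scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
production_scores = [0.7, 0.8, 0.8, 0.9, 0.9, 0.9, 0.95, 0.99]
edges = [0.25, 0.5, 0.75]

psi = population_stability_index(training_scores, production_scores, edges)
print(f"PSI = {psi:.3f}")
```

A PSI near zero suggests the two distributions match; larger values suggest the input population has moved. The alert threshold is a judgment call that should be tuned per model, and a distribution shift on its own does not show that a model has become biased, only that it deserves a closer look.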

Organizations whose early successes in machine learning have spurred them to expand their programs are finding that a fast-moving production line of high-quality datasets is the fuel that will drive that expansion. This will elevate Data as a Service to a high priority for data engineering teams. – Luke Han, co-founder and CEO, Kyligence

The Ability to Trust and Operationalize ML will be 2021’s Litmus Test For Survival: On top of a pandemic and a recession, we’re continuing to grapple with exponentially growing amounts of data and the ever-increasing complexities of new technologies. If businesses want to be successful in making sense of their large volumes of data and technical complexities, they must leverage and operationalize machine learning models in explainable and easy-to-understand ways. It is no longer enough to focus on getting models into production; the focus must now be on getting models into the hands of business users and decision-makers. But to operationalize, businesses must be able to trust in, derive understanding from, and communicate about a model’s ability to meaningfully impact business potential. In 2021, a business’s ability to trust its model, to the extent that it is able to produce action from AI-derived insight, will determine its ability to survive. – Santiago Giraldo, Senior Product Marketing Manager of Machine Learning, Cloudera

Companies of all sizes and at all stages are moving aggressively towards operationalizing machine learning efforts. There are several popular frameworks for model training, with TensorFlow and PyTorch leading the game. Just as Apache Spark is considered a leader for data transformation jobs and Presto is emerging as the leading tech for interactive querying, 2021 will be the year we’ll see a frontrunner dominate the broader model training space, with PyTorch and TensorFlow as the leading contenders. – Haoyuan Li, Founder and CEO, Alluxio

SaaS change data as the missing piece for ML/AI: Organizations with a focus on artificial intelligence and machine learning will continue to hunger for meaningful training datasets that can be fed into their ML algorithms to spot cause-and-effect change patterns over time. To do this, they will turn to their ever-changing datasets in 3rd party cloud/SaaS applications as inputs into these algorithms. This will create pressure for them to capture and ingest every single change in that data over time into their DataOps ecosystem. – Joe Gaska, CEO of GRAX
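
To illustrate what capturing changes over time can look like, here is a minimal sketch in Python that compares two snapshots of records and emits insert, update and delete events. The record shape, the field names and the `diff_snapshots` helper are hypothetical and not tied to any particular SaaS application or vendor API.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def diff_snapshots(
    before: Dict[str, Record], after: Dict[str, Record], at: str
) -> List[Dict[str, Any]]:
    """Return change events between two snapshots keyed by record id."""
    events: List[Dict[str, Any]] = []
    for key in sorted(before.keys() | after.keys()):
        old, new = before.get(key), after.get(key)
        if old is None:
            events.append({"at": at, "id": key, "op": "insert", "old": None, "new": new})
        elif new is None:
            events.append({"at": at, "id": key, "op": "delete", "old": old, "new": None})
        elif old != new:
            events.append({"at": at, "id": key, "op": "update", "old": old, "new": new})
    return events


# Illustrative snapshots of the same dataset taken on two different days.
monday = {"acct-1": {"stage": "lead"}, "acct-2": {"stage": "trial"}}
tuesday = {"acct-1": {"stage": "customer"}, "acct-3": {"stage": "lead"}}

change_log = diff_snapshots(monday, tuesday, at="2021-01-05")
for event in change_log:
    print(event["op"], event["id"])
```

Appending these events to a log keeps the history that a plain overwrite would lose, which is what lets a model learn from how records evolve. Snapshot diffing only sees the net change between two points in time, so capturing every single change, as the prediction describes, would need more frequent snapshots or a change feed from the application itself.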

Role played by AI and ML will expand as identity intelligence comes to the forefront. As we reach a tipping point in the future of authentication, users are increasingly security-aware when it comes to protecting their digital identities online. Identity verification will become increasingly contextual, and AI will play an expanding role to determine the dynamic risk of access which a rule-based system simply cannot provide. Supervised and unsupervised deep learning, reinforcement learning, and genetic algorithms will not just apply pre-defined inference models but will also allow security solutions to adapt to changing enterprise behavior and learn from other companies as they encounter and mitigate threats. Combating deep fakes with in-built algorithms, deriving value from big data and driving decision-making through powerful analytics will play a key role in identity intelligence. – Rajesh Ganesan, Vice President, ManageEngine (division of Zoho Corp.)

Robotics

With the need to keep people out of close quarters perpetuating into the new year, we will naturally see significant investment in automation. However, for maybe the first time, robotics will be taking on the mundane, simple human tasks as opposed to the more difficult and strategic. We have seen robots assist humans in many complicated applications, such as robots trained to perform the most precise microsurgeries. Robots will now start to take on tasks that let essential workers who previously needed to be in person work remotely. With more investments in augmented and virtual reality, for example, we will see robot security guards controlled by remote workers roaming office and factory floors, and remote workers will be able to control drones to pick and pack boxes in a warehouse. In 2021, the revolution will be roboticized. – Ahson Ahmad, Chief Product and Customer Officer, Ripcord

Security

Deepfakes will become a significant threat to business integrity. COVID-19 has forced in-person communication to go virtual, which means businesses are relying on video conferencing to conduct meetings more than ever before. While the notion of deepfakes may not be new, they are getting increasingly sophisticated and are becoming remarkably easy to generate. Take ThisPersonDoesNotExist.com, for example, which leverages AI to create completely believable images of people that don’t exist in real life. If this process can be conducted with relatively little information, then certainly hackers can leverage work profiles used for video conferencing technology — which have employees’ names and pictures automatically associated with them — to create convincing fakes. – James Carder, Chief Security Officer for LogRhythm

Prediction: As Fraud Detection Becomes Harder, ML Fraud Models Will Strengthen But Use More Recent Datasets: To determine fraud risk, companies typically use a data set of past transactions that they believe will be representative of the future to train their machine learning (ML) models. However, the huge impact of COVID-19 on consumer data and behavior has created a disconnect because past data is no longer representative of the future. This has led many organizations to either use underfit models that perform well but don’t catch new fraud patterns, or overfit models that create a lot of surprises such as flooded manual review queues or more chargebacks and fraud. Many companies have also shifted from using ML to rules-based models and manual reviews that rely more on human intuition. In 2021, companies will be able to leverage their understanding of these new behavioral patterns to start building stronger ML models again. However, to be successful, they will need to use more recent data, take things as they come when building models, and assess their progress as they go. – Arjun Kakkar, Vice President of Strategy & Operations at Ekata

Artificial intelligence has created new security threats, the greatest of which may be deepfakes. Deepfakes are fake audio, video, or images that rely on artificial intelligence technology to mimic reality. Deepfakes can have serious consequences in the wrong hands, such as deepfake fraud. While we haven’t seen many of these attacks yet, in 2019 fraudsters used deepfake audio to steal over $200,000 from a UK-based energy company. And with remote work environments giving fraudsters more ammunition to carry out their attacks, 2021 will be the year that technology unleashes real-time audio transcription, and businesses will have to remain vigilant to ensure they don’t get scammed. Businesses should be wary of any suspicious phone calls, and never send money or share sensitive information without verifying that a caller is who they claim to be.

Additionally, setting up basic cybersecurity tools and protocols can prevent fraudsters from gaining access to the sensitive information they need to create deepfake images and audio in the first place. Cybersecurity researchers are working on tools to detect deepfake content, but until those tools are available, companies will need to rely on their intuition and existing cybersecurity tools to make sure they don’t get duped. – Terry Nelms, PhD, Sr. Director of Research, Pindrop

Fueled by the influx of data breaches and the perceived exploitation of personal data by Big Tech, consumer data privacy will continue to be a huge focus in 2021 and beyond, and we can expect to see more legislation introduced that protects consumer rights and fines businesses for the irresponsible usage of data. To cultivate trust and improve the customer experience in an increasingly competitive business landscape, more organizations will give consumers ownership and control of their personal data in the coming years. By combining ethical, compliant and privacy-preserving principles with technology infrastructure built to scale for the future, society will move towards a system where the value of data will benefit both individuals and enterprises alike. – James Kingston, VP of Research and Innovation Partnerships at Dataswift, AI researcher, and Director of the HAT-LAB.

Data security governance is a required and critical building block to threat mitigation. Until recently, most data governance programs have focused on data flows and analytics without thinking much about security. New data privacy laws and regulations have forced data stakeholders such as CDO, CFO, CISO, and DPO to make data security one of the necessary building blocks of their data governance efforts. But data security governance is complex as no single vendor product can implement all required data security governance controls. In 2021, as businesses continue to collect and process more and more data, they will have to figure out how to quickly unify their information, so their entire organization is drawing information from the same, trusted and secure well. Next, businesses need to implement and manage their data source through a data protection system with necessary privacy controls in place, so data threats are mitigated. These steps will ensure future business and financial risks are minimized. – Anne Hardy, CISO of Talend

AI will be Key to Bolstering Security in a Remote World. Security is top-of-mind for any organization’s C-suite that has embarked on a digital transformation journey, but its importance has only been accelerated by the pandemic. With so many endpoints scattered across the world as employees have the flexibility to work remotely from wherever they choose, vulnerabilities multiply. A major trend we will see in 2021 and beyond is the application of AI to security measures, because humans alone cannot monitor, control and check each endpoint to adequately or efficiently protect a modern enterprise. If security leaders (especially those at Fortune 500 companies) don’t make the time and financial investment to enhance security with AI now, they can expect to be targeted by hackers in the future and scramble to protect their data. – Scott Boettcher, VP, Enterprise Information Management, NTT DATA Services

Storage

Legacy NAS is Dead for AI. With the introduction of PCIe Gen4, I/O rates have now completely broken away from CPU core evolutions. Legacy NFS providers are stuck with single-stream TCP that is rate-limited by the capability of a single CPU core on the application server. PCIe Gen4 will double the peak I/O performance of applications in 2021, while a CPU core will no longer be able to equally double single-core I/O performance. There is no greater concentration of single-host IO than in the AI market – for applications such as machine learning and deep learning. To resolve this, customers will seek solutions that support multi-threading, RDMA, and the ability to bypass CPUs altogether – as is the case with NVIDIA’s GPUDirect Storage. The demands to keep GPUs and AI Processors fed and efficient will dramatically outstrip the I/O capabilities of legacy TCP-based NAS, leading customers to walk away from legacy NAS altogether in 2021. – Renen Hallak, Founder and CEO of VAST Data

Object storage shatters the myth that it’s only used for archive. Although object storage is best known as a backup and archive storage solution, three trends will expand that perception in 2021. First, flash-based object storage will gain favor in data analytics workloads that also have high capacity requirements. Second, S3-compatible storage will simplify Kubernetes deployments, making it a logical choice for modern applications. Third, cloud-native applications will increasingly be deployed on prem, driving the need for on-prem S3-compatible storage to enhance application portability. As a result, more organizations will use object storage to support compute-heavy use cases, such as AI, ML and data analytics, shattering the “cheap and deep” myth once and for all. – Jon Toor, CMO for Cloudian

Organizations are now collecting massive amounts of machine learning and IoT data. If your company depends on collecting and analyzing data to operate and succeed, what happens if that data is not fully backed up and easily recoverable? Most companies are thinking mainly about data analysis and much less about data backup or security. But as data increasingly moves from analysis to production environments, that’s when protection becomes critical. Cutting-edge storage tools increasingly rely on AI and machine learning to automate the data backup process. Given the exploding size of enterprise data, these intelligent tools will become vital for maintaining an efficient backup process that can quickly and effortlessly react to changing requirements while saving untold hours on manual backups. – Shridar Subramanian, CMO of StorageCraft

Verticals

The potential for AI to improve supply chain processes has been an area of focus for companies for at least 5 years, but after the disruptions caused by COVID-19, many supply chain analysts and enterprises have turned their attention to AI as a possible solution to their woes. 67% of enterprises invested in some technology solution to help them weather the pandemic, and 60% of industrial enterprises are looking to AI specifically. However, AI models are fueled by data. The accuracy, scope, and capabilities of an AI model depend entirely on the training data behind it, and that data must be organized and labeled in a machine-readable format before an AI program can digest it. Before they embrace AI, enterprises must leverage modern integration technology to automatically compile data from interactions with their ecosystem of suppliers, partners, traders and customers in a format that is structured to fuel AI models.

Source: https://www.fintechnews.org/big-data-industry-predictions-for-2021/

Artificial Intelligence

Deep Learning vs Machine Learning: How an Emerging Field Influences Traditional Computer Programming

When two different concepts are greatly intertwined, it can be difficult to separate them as distinct academic topics. That might explain why it’s so difficult to separate deep learning from machine learning as a whole. Considering the current push for both automation as well as instant gratification, a great deal of renewed focus has been heaped on the topic.

Everything from automated manufacturing workflows to personalized digital medicine could potentially grow to rely on deep learning technology. Defining the exact aspects of this technical discipline that will revolutionize these industries is, however, admittedly much more difficult. Perhaps it’s best to consider deep learning in the context of a greater movement in computer science.

Defining Deep Learning as a Subset of Machine Learning

Machine learning and deep learning are essentially two sides of the same coin. Deep learning techniques are a specific discipline that belongs to a much larger field, which includes a large variety of trained artificially intelligent agents that can predict the correct response in an equally wide array of situations. What sets deep learning apart from these other techniques, however, is that it relies on neural networks with many layers, which learn useful representations directly from raw data instead of depending on hand-engineered features.

Traditional machine learning algorithms usually teach artificial nodes how to respond to stimuli by rote memorization. This is somewhat similar to human teaching techniques that consist of simple repetition, and therefore might be thought of as the computerized equivalent of a student running through times tables until they can recite them. While this is effective in a way, artificially intelligent agents educated in such a manner may not be able to respond to any stimulus outside of the realm of their original design specifications.

That’s why deep learning specialists have developed alternative algorithms that are considered to be somewhat superior to this method, though they are admittedly far more hardware intensive in many ways. Subroutines used by deep learning agents may be based around generative adversarial networks, convolutional neural networks or a practical form of restricted Boltzmann machine. These stand in sharp contrast to the decision trees and linear models used by much conventional machine learning software.

Self-organizing maps have also been widely used in deep learning, though their applications in other AI research fields have typically been much less promising. When it comes to defining the deep learning vs machine learning debate, however, it’s highly likely that technicians will be looking more for practical applications than for theoretical academic discussion in the coming months. Suffice it to say that machine learning encompasses everything from the simplest AI to the most sophisticated predictive algorithms while deep learning constitutes a more selective subset of these techniques.

Practical Applications of Deep Learning Technology

Depending on how a particular program is authored, deep learning techniques could be deployed along supervised or semi-supervised neural networks. Theoretically, it’d also be possible to do so via a completely unsupervised node layout, and it’s this technique that has quickly become the most promising. Unsupervised networks may be useful for medical image analysis, since this application often presents unique pieces of graphical information to a computer program that have to be tested against known inputs.

Traditional binary tree or blockchain-based learning systems have struggled to identify the same patterns in dramatically different scenarios, because the information remains hidden in a structure that would have otherwise been designed to present data effectively. It’s essentially a natural form of steganography, and it has confounded computer algorithms in the healthcare industry. However, this new type of unsupervised learning node could virtually educate itself on how to match these patterns even in a data structure that isn’t organized along the normal lines that a computer would expect it to be.

Others have proposed implementing semi-supervised artificially intelligent marketing agents that could eliminate much of the concern over ethics regarding existing deal-closing software. Instead of trying to reach as large a customer base as possible, these tools would calculate the odds of any given individual needing a product at a given time. In order to do so, it would need certain types of information provided by the organization that it works on behalf of, but it would eventually be able to predict all further actions on its own.

While some companies are currently relying on tools that utilize traditional machine learning technology to achieve the same goals, these are often fraught with privacy and ethical concerns. The advent of deep structured learning algorithms has enabled software engineers to come up with new systems that don’t suffer from these drawbacks.

Developing a Private Automated Learning Environment

Conventional machine learning programs often run into serious privacy concerns because of the fact that they need a huge amount of input in order to draw any usable conclusions. Deep learning image recognition software works by processing a smaller subset of inputs, thus ensuring that it doesn’t need as much information to do its job. This is of particular importance for those who are concerned about the possibility of consumer data leaks.

Considering new regulatory stances on many of these issues, it has also quickly become important from a compliance standpoint. As toxicology labs begin using bioactivity-focused deep structured learning packages, it’s likely that regulators will express additional concerns with regard to the amount of information needed to perform any given task with this kind of sensitive data. Computer scientists have had to scale back what some have called a veritable fire hose of bytes that tell more of a story than most would be comfortable with.

In a way, these developments hearken back to an earlier time when it was believed that each process in a system should only have the amount of privileges necessary to complete its job. As machine learning engineers embrace this paradigm, it’s highly likely that future developments will be considerably more secure simply because they don’t require the massive amount of data mining necessary to power today’s existing operations.

Image Credit: toptal.io

Source: https://datafloq.com/read/deep-learning-vs-machine-learning-how-emerging-field-influences-traditional-computer-programming/13652

Artificial Intelligence

Extra Crunch roundup: Tonal EC-1, Deliveroo’s rocky IPO, is Substack really worth $650M?

For this morning’s column, Alex Wilhelm looked back on the last few months, “a busy season for technology exits” that followed a hot Q4 2020.

We’re seeing signs of an IPO market that may be cooling, but even so, “there are sufficient SPACs to take the entire recent Y Combinator class public,” he notes.

Once we factor in private equity firms with pockets full of money, it’s evident that late-stage companies have three solid choices for leveling up.

Seeking more insight into these liquidity options, Alex interviewed:

  • DigitalOcean CEO Yancey Spruill, whose company went public via IPO;
  • Latch CFO Garth Mitchell, who discussed his startup’s merger with real estate SPAC $TSIA;
  • Brian Cruver, founder and CEO of AlertMedia, which recently sold to a private equity firm.

After recapping their deals, each executive explains how their company determined which flashing red “EXIT” sign to follow. As Alex observed, “choosing which option is best from a buffet’s worth of possibilities is an interesting task.”

Thanks very much for reading Extra Crunch! Have a great weekend.

Walter Thompson
Senior Editor, TechCrunch
@yourprotagonist


The Tonal EC-1

On Tuesday, we published a four-part series on Tonal, a home fitness startup that has raised $200 million since it launched in 2018. The company’s patented hardware combines digital weights, coaching and AI in a wall-mounted system that sells for $2,995.

By any measure, it is poised for success — sales increased 800% between December 2019 and 2020, and by the end of this year, the company will have 60 retail locations. On Wednesday, Tonal reported a $250 million Series E that valued the company at $1.6 billion.

Our deep dive examines Tonal’s origins, product development timeline, its go-to-market strategy and other aspects that combined to spark investor interest and customer delight.

We call this format the “EC-1,” since these stories are as comprehensive and illuminating as the S-1 forms startups must file with the SEC before going public.

Here’s how the Tonal EC-1 breaks down:

We have more EC-1s in the works about other late-stage startups that are doing big things well and making news in the process.

What to make of Deliveroo’s rough IPO debut

Why did Deliveroo struggle when it began to trade? Is it suffering from cultural dissonance between its high-growth model and more conservative European investors?

Let’s peek at the numbers and find out.

Kaltura puts debut on hold. Is the tech IPO window closing?

The Exchange doubts many folks expected the IPO climate to get so chilly without warning. But we could be in for a Q2 pause in the formerly scorching climate for tech debuts.

Is Substack really worth $650M?

A $65 million Series B is remarkable, even by 2021 standards. But the fact that a16z is pouring more capital into the alt-media space is not a surprise.

Substack is a place where publications have bled some well-known talent, shifting the center of gravity in media. Let’s take a look at Substack’s historical growth.

RPA market surges as investors, vendors capitalize on pandemic-driven tech shift

Robotic process automation came to the fore during the pandemic as companies took steps to digitally transform. When employees couldn’t be in the same office together, it became crucial to cobble together more automated workflows that required fewer people in the loop.

RPA has enabled executives to provide a level of automation that essentially buys them time to update systems to more modern approaches while reducing the large number of mundane manual tasks that are part of every industry’s workflow.

E-commerce roll-ups are the next wave of disruption in consumer packaged goods

This year is all about the roll-ups, the aggregation of smaller companies into larger firms, creating a potentially compelling path for equity value. The interest in creating value through e-commerce brands is particularly striking.

Just a year ago, digitally native brands had fallen out of favor with venture capitalists after so many failed to create venture-scale returns. So what’s the roll-up hype about?

Hack takes: A CISO and a hacker detail how they’d respond to the Exchange breach

The cyber world has entered a new era in which attacks are becoming more frequent and happening on a larger scale than ever before. Massive hacks affecting thousands of high-level American companies and agencies have dominated the news recently. Chief among these are the December SolarWinds/FireEye breach and the more recent Microsoft Exchange server breach.

Everyone wants to know: If you’ve been hit with the Exchange breach, what should you do?

5 machine learning essentials nontechnical leaders need to understand

Machine learning has become the foundation of business and growth acceleration because of the incredible pace of change and development in this space.

But for engineering and team leaders without an ML background, this can also feel overwhelming and intimidating.

Here are best practices and must-know components broken down into five practical and easily applicable lessons.

Embedded procurement will make every company its own marketplace

Embedded procurement is the natural evolution of embedded fintech.

In this next wave, businesses will buy things they need through vertical B2B apps, rather than through sales reps, distributors or an individual merchant’s website.

Knowing when your startup should go all-in on business development

There’s a persistent fallacy swirling around that any startup growing pain or scaling problem can be solved with business development.

That’s frankly not true.

Dear Sophie: What should I know about prenups and getting a green card through marriage?

Dear Sophie:

I’m a founder of a startup on an E-2 investor visa and just got engaged! My soon-to-be spouse will sponsor me for a green card.

Are there any minimum salary requirements for her to sponsor me? Is there anything I should keep in mind before starting the green card process?

— Betrothed in Belmont

Startups must curb bureaucracy to ensure agile data governance

Many organizations perceive data management as being akin to data governance, where responsibilities are centered around establishing controls and audit procedures, and things are viewed from a defensive lens.

That defensiveness is admittedly justified, particularly given the potential financial and reputational damages caused by data mismanagement and leakage.

Nonetheless, there’s an element of myopia here, and being excessively cautious can prevent organizations from realizing the benefits of data-driven collaboration, particularly when it comes to software and product development.

Bring CISOs into the C-suite to bake cybersecurity into company culture

Cyber strategy and company strategy are inextricably linked. Consequently, chief information security officers in the C-Suite will be just as common and influential as CFOs in maximizing shareholder value.

How is edtech spending its extra capital?

Edtech unicorns have boatloads of cash to spend following the capital boost to the sector in 2020. As a result, edtech M&A activity has continued to swell.

The idea of a well-capitalized startup buying competitors to complement its core business is nothing new, but exits in this sector are notable because the money used to buy startups can be seen as an effect of the pandemic’s impact on remote education.

But in the past week, the consolidation environment made a clear statement: Pandemic-proven startups are scooping up talent — and fast.

Tech in Mexico: A confluence of Latin America, the US and Asia

Knowledge transfer is not the only trend flowing in the U.S.-Asia-LatAm nexus. Competition is afoot as well.

Because of similar market conditions, Asian tech giants are directly expanding into Mexico and other LatAm countries.

How we improved net retention by 30+ points in 2 quarters

There’s certainly no shortage of SaaS performance metrics leaders focus on, but NRR (net revenue retention) is without question the most underrated metric out there.

NRR is simply total revenue minus any revenue churn plus any revenue expansion from upgrades, cross-sells or upsells. The greater the NRR, the quicker companies can scale.
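As a rough illustration of the definition above, NRR is commonly reported as a ratio against the revenue the same customers generated at the start of the period. The sketch below assumes that convention; the function name and the sample figures are hypothetical and are not taken from the article.

```python
def net_revenue_retention(starting_revenue, churned, expansion):
    """Return NRR as a ratio (1.0 means 100%).

    starting_revenue: recurring revenue from existing customers at period start
    churned:          revenue lost from those customers (cancellations, downgrades)
    expansion:        revenue gained from those customers (upgrades, cross-sells, upsells)
    """
    if starting_revenue <= 0:
        raise ValueError("starting_revenue must be positive")
    return (starting_revenue - churned + expansion) / starting_revenue

# Hypothetical cohort: $100,000 starting revenue, $8,000 churned, $20,000 expanded.
nrr = net_revenue_retention(100_000, 8_000, 20_000)
```

A result above 1.0 means the existing customer base is growing on its own, before any new customers are counted.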

5 mistakes creators make building new games on Roblox

Even the most experienced and talented game designers from the mobile F2P business usually fail to understand what features matter to Robloxians.

For those just starting their journey in Roblox game development, these are the most common mistakes gaming professionals make on Roblox.

CEO Manish Chandra, investor Navin Chaddha explain why Poshmark’s Series A deck sings

“Lead with love, and the money comes.” It’s one of the cornerstone values at Poshmark. On the latest episode of Extra Crunch Live, Chandra and Chaddha sat down with us and walked us through their original Series A pitch deck.

Will the pandemic spur a smart rebirth for cities?

Cities are bustling hubs where people live, work and play. When the pandemic hit, some people fled major metropolitan markets for smaller towns — raising questions about the future validity of cities.

But those who predicted that COVID-19 would destroy major urban communities might want to stop shorting the resilience of these municipalities and start going long on what the post-pandemic future looks like.

The NFT craze will be a boon for lawyers

There’s plenty of uncertainty surrounding copyright issues, fraud and adult content, and legal implications are the crux of the NFT trend.

Whether a court would protect the receipt-holder’s ownership over a given file depends on a variety of factors. All of these concerns mean artists may need to lawyer up.

Viewing Cazoo’s proposed SPAC debut through Carvana’s windshield

It’s a reasonable question: Why would anyone pay that much for Cazoo today if Carvana is more profitable and whatnot? Well, growth. That’s the argument anyway.

Source: https://techcrunch.com/2021/04/02/extra-crunch-roundup-tonal-ec-1-deliveroos-rocky-ipo-is-substack-really-worth-650m/

AI

The AI Trends Reshaping Health Care

Click to learn more about author Ben Lorica.

Applications of AI in health care present a number of challenges and considerations that differ substantially from other industries. Despite this, it has also been one of the leaders in putting AI to work, taking advantage of the cutting-edge technology to improve care. The numbers speak for themselves: The global AI in health care market size is expected to grow from $4.9 billion in 2020 to $45.2 billion by 2026. Some major factors driving this growth are the sheer volume of health care data and growing complexities of datasets, the need to reduce mounting health care costs, and evolving patient needs.
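To put the market forecast in perspective, the implied compound annual growth rate can be worked out from the two figures quoted above. This is a back-of-the-envelope sketch, assuming six annual compounding periods between 2020 and 2026. The helper name `cagr` is our own.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate implied by a start value, end value and period."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1

# Figures from the paragraph above, in billions of dollars.
rate = cagr(4.9, 45.2, 2026 - 2020)
print(f"Implied annual growth: {rate:.1%}")
```

Under that assumption the forecast works out to growth of roughly 45% per year.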

Deep learning, for example, has made considerable inroads into the clinical environment over the last few years. Computer vision, in particular, has proven its value in medical imaging to assist in screening and diagnosis. Natural language processing (NLP) has provided significant value in addressing both contractual and regulatory concerns with text mining and data sharing. Increasing adoption of AI technology by pharmaceutical and biotechnology companies to expedite initiatives like vaccine and drug development, as seen in the wake of COVID-19, only exemplifies AI’s massive potential.

We’re already seeing amazing strides in health care AI, but it’s still the early days, and to truly unlock its value, there’s a lot of work to be done in understanding the challenges, tools, and intended users shaping the industry. New research from John Snow Labs and Gradient Flow, 2021 AI in Healthcare Survey Report, sheds light on just this: where we are, where we’re going, and how to get there. The global survey explores the important considerations for health care organizations in varying stages of AI adoption, geographies, and technical prowess to provide an extensive look into the state of AI in health care today.               

One of the most significant findings is around which technologies are top of mind when it comes to AI implementation. When asked what technologies they plan to have in place by the end of 2021, almost half of respondents cited data integration. About one-third cited natural language processing (NLP) and business intelligence (BI) among the technologies they are currently using or plan to use by the end of the year. Half of those considered technical leaders are using – or soon will be using – technologies for data integration, NLP, business intelligence, and data warehousing. This makes sense, considering these tools have the power to help make sense of huge amounts of data, while also keeping regulatory and responsible AI practices in mind.

When asked about intended users for AI tools and technologies, over half of respondents identified clinicians among their target users. This indicates that AI is being used by people tasked with delivering health care services – not just technologists and data scientists, as in years past. That number climbs even higher when evaluating mature organizations, or those that have had AI models in production for more than two years. Interestingly, nearly 60% of respondents from mature organizations indicated that patients are also users of their AI technologies. With the advent of chatbots and telehealth, it will be interesting to see how AI proliferates for both patients and providers over the next few years.

In considering software for building AI solutions, open-source software (53%) had a slight edge over public cloud providers (42%). Looking ahead one to two years, respondents indicated openness to also using both commercial software and commercial SaaS. Open-source software gives users a level of autonomy over their data that cloud providers can’t, so it’s not a big surprise that a highly regulated industry like health care would be wary of data sharing. Similarly, the majority of companies with experience deploying AI models to production choose to validate models using their own data and monitoring tools, rather than evaluation from third parties or software vendors. While earlier-stage companies are more receptive to exploring third-party partners, more mature organizations are tending to take a more conservative approach.                      

Generally, attitudes remained the same when asked about key criteria used to evaluate AI solutions, software libraries or SaaS solutions, and consulting companies to work with. Although the answers varied slightly for each category, technical leaders considered no data sharing with software vendors or consulting companies, the ability to train their own models, and state-of-the-art accuracy as top priorities. Health care-specific models and expertise in health care data engineering, integration, and compliance topped the list when asked about solutions and potential partners. Privacy, accuracy, and health care experience are the forces driving AI adoption. It’s clear that AI is poised for even more growth, as data continues to grow and technology and security measures improve. Health care, which can sometimes be seen as a laggard for quick adoption, is taking to AI and already seeing its significant impact. While its approach, the top tools and technologies, and applications of AI may differ from other industries, it will be exciting to see what’s in store for next year’s survey results.

Source: https://www.dataversity.net/the-ai-trends-reshaping-health-care/

AI

Turns out humans are leading AI systems astray because we can’t agree on labeling

Top datasets used to train AI models and benchmark how the technology has progressed over time are riddled with labeling errors, a study shows.

Data is a vital resource in teaching machines how to complete specific tasks, whether that’s identifying different species of plants or automatically generating captions. Most neural networks are spoon-fed lots and lots of annotated samples before they can learn common patterns in data.

But these labels aren’t always correct; training machines using error-prone datasets can decrease their performance or accuracy. In the aforementioned study, led by MIT, analysts combed through ten popular datasets that have been cited more than 100,000 times in academic papers and found that on average 3.4 per cent of the samples are wrongly labelled.

The datasets they looked at range from photographs in ImageNet, to sounds in AudioSet, reviews scraped from Amazon, and sketches in QuickDraw. Examples of the mistakes compiled by the researchers show that some are clear blunders, such as a drawing of a light bulb tagged as a crocodile. In others, however, the right answer is not obvious. Should a picture of a bucket of baseballs be labeled as ‘baseballs’ or ‘bucket’?

Annotating each sample is laborious work. It is often outsourced to services like Amazon Mechanical Turk, where workers are paid the square root of sod all to sift through the data piece by piece, labeling images and audio to feed into AI systems. This process amplifies biases and errors, as Vice has documented.

Workers are pressured to agree with the status quo if they want to get paid: if a lot of them label a bucket of baseballs as a ‘bucket’, and you decide it’s ‘baseballs’, you may not be paid at all if the platform figures you’re wrong or deliberately trying to mess up the labeling. That means workers will choose the most popular label to avoid looking like they’ve made a mistake. It’s in their interest to stick to the narrative and avoid sticking out like a sore thumb. That means errors, or worse, racial biases and suchlike, snowball in these datasets.

The error rates vary across the datasets. In ImageNet, the most popular dataset used to train models for object recognition, the rate creeps up to six per cent. Considering it contains about 15 million photos, that means hundreds of thousands of labels are wrong. Some classes of images are more affected than others: for example, ‘chameleon’ is often mistaken for ‘green lizard’ and vice versa.

There are other knock-on effects: neural nets may learn to incorrectly associate features within data with certain labels. If, say, many images of the sea seem to contain boats and they keep getting tagged as ‘sea’, a machine might get confused and be more likely to incorrectly recognize boats as seas.

Problems don’t just arise when trying to compare the performance of models using these noisy datasets. The risks are higher if these systems are deployed in the real world, Curtis Northcutt, co-lead author of the study, a PhD student at MIT, and cofounder and CTO of ChipBrain, a machine-learning hardware startup, explained to The Register.

“Imagine a self-driving car that uses an AI model to make steering decisions at intersections,” he said. “What would happen if a self-driving car is trained on a dataset with frequent label errors that mislabel a three-way intersection as a four-way intersection? The answer: it might learn to drive off the road when it encounters three-way intersections.

“Maybe one of your AI self-driving models is actually more robust to training noise, so that it doesn’t drive off the road as much. You’ll never know this if your test set is too noisy because your test set labels won’t match reality. This means you can’t properly gauge which of your auto-pilot AI models drives best – at least not until you deploy the car out in the real-world, where it might drive off the road.”
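Northcutt's point about noisy test sets can be illustrated with a toy benchmark. The sketch below is only an illustration under stated assumptions: the ten labels, the two sets of predictions, and the names `robust_model` and `noise_fitting_model` are all hypothetical and are not taken from the study.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the given labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Hypothetical ground truth for ten test samples (binary classes).
true_labels = [0, 1, 1, 0, 1, 0, 1, 1, 0, 0]

# The same test set as actually published: samples 2 and 7 are mislabelled.
noisy_test_labels = [0, 1, 0, 0, 1, 0, 1, 0, 0, 0]

# Assumed model that ignored the noise: one genuine mistake (sample 9).
robust_model = [0, 1, 1, 0, 1, 0, 1, 1, 0, 1]

# Assumed model that reproduces the labelling mistakes exactly.
noise_fitting_model = [0, 1, 0, 0, 1, 0, 1, 0, 0, 0]

for name, preds in [("robust", robust_model), ("noise-fitting", noise_fitting_model)]:
    print(name,
          "noisy benchmark:", accuracy(preds, noisy_test_labels),
          "reality:", accuracy(preds, true_labels))
```

On the noisy benchmark the noise-fitting model scores 1.0 against 0.7 for the robust one, yet against the true labels the order flips to 0.8 against 0.9. With only the noisy labels to hand, the wrong model would be picked.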

When the team working on the study trained some convolutional neural networks on portions of ImageNet that have been cleared of errors, their performance improved. The boffins believe that developers should think twice about training large models on datasets that have high error rates, and advise them to sort through the samples first. Cleanlab, the software the team developed and used to identify incorrect and inconsistent labels, can be found on GitHub.

“Cleanlab is an open-source python package for machine learning with noisy labels,” said Northcutt. “Cleanlab works by implementing all of the theory and algorithms in the sub-field of machine learning called confident learning, invented at MIT. I built cleanlab to allow other researchers to use confident learning – usually with just a few lines of code – but more importantly, to advance the progress of science in machine learning with noisy labels and to provide a framework for new researchers to get started easily.”
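To give a flavour of the idea behind confident learning, here is a heavily simplified sketch in plain Python. It is not Cleanlab's API or the team's implementation: the function name `find_likely_label_errors`, the toy labels, and the predicted probabilities are assumptions made for illustration. The sketch sets a per-class confidence threshold (the average predicted probability of a class among samples given that label) and flags samples where a different class confidently clears its own threshold.

```python
def find_likely_label_errors(labels, pred_probs):
    """Return indices of samples whose given label looks suspicious.

    labels: list of integer class ids, one per sample.
    pred_probs: list of per-class probability lists from some trained model.
    """
    num_classes = len(pred_probs[0])

    # Per-class threshold: mean confidence in class k among samples labelled k.
    thresholds = []
    for k in range(num_classes):
        confs = [p[k] for y, p in zip(labels, pred_probs) if y == k]
        thresholds.append(sum(confs) / len(confs) if confs else 1.0)

    suspects = []
    for i, (y, p) in enumerate(zip(labels, pred_probs)):
        # Classes the model is "confident" about for this sample.
        confident = [k for k in range(num_classes) if p[k] >= thresholds[k]]
        if not confident:
            continue  # Model is unsure; do not flag.
        likely_class = max(confident, key=lambda k: p[k])
        if likely_class != y:
            suspects.append(i)
    return suspects


# Hypothetical data: sample 2 is labelled 0 but the model strongly predicts 1.
labels = [0, 0, 0, 1, 1, 1]
pred_probs = [
    [0.90, 0.10],
    [0.80, 0.20],
    [0.10, 0.90],
    [0.15, 0.85],
    [0.10, 0.90],
    [0.30, 0.70],
]
print(find_likely_label_errors(labels, pred_probs))  # prints [2]
```

In practice the probabilities should come from a model that did not train on the samples it is scoring (for example, via cross-validation); otherwise it may simply have memorised the wrong labels.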

And be aware that if a dataset’s labels are particularly shoddy, training large complex neural networks may not always be so advantageous. Larger models tend to overfit to data more than smaller ones.

“Sometimes using smaller models will work for very noisy datasets. However, instead of always defaulting to using smaller models for very noisy datasets, I think the main takeaway is that machine learning engineers should clean and correct their test sets before they benchmark their models,” Northcutt concluded. ®

Source: https://go.theregister.com/feed/www.theregister.com/2021/04/01/mit_ai_accuracy/
