Huawei trained the Chinese-language equivalent of GPT-3


For the better part of a year, OpenAI’s GPT-3 has remained among the largest AI language models ever created, if not the largest of its kind. Via an API, people have used it to automatically write emails and articles, summarize text, compose poetry and recipes, create website layouts, and generate code for deep learning in Python. But GPT-3 has key limitations, chief among them that it’s only available in English. The 45-terabyte dataset the model was trained on drew exclusively from English-language sources.

This week, a research team at Chinese company Huawei quietly detailed what might be the Chinese-language equivalent of GPT-3. Called PanGu-Alpha (stylized PanGu-α), the 750-gigabyte model contains up to 200 billion parameters — 25 billion more than GPT-3’s 175 billion — and was trained on 1.1 terabytes of Chinese-language ebooks, encyclopedias, news, social media, and web pages.

The team claims that the model achieves “superior” performance in Chinese-language tasks spanning text summarization, question answering, and dialogue generation. Huawei says it’s seeking a way to let nonprofit research institutes and companies gain access to pretrained PanGu-α models, either by releasing the code, model, and dataset or via APIs.

Familiar architecture

In machine learning, parameters are the part of the model that’s learned from historical training data. Generally speaking, in the language domain, the correlation between the number of parameters and sophistication has held up remarkably well.
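
To give a rough sense of scale, every learned weight counts as a parameter. The toy PyTorch snippet below is purely illustrative and has nothing to do with PanGu-α's actual code; it just shows that a single fully connected layer already carries over a million parameters.

```python
# Illustrative only: count the learned parameters in one fully connected layer.
import torch.nn as nn

layer = nn.Linear(1024, 1024)          # 1,024 inputs -> 1,024 outputs
num_params = sum(p.numel() for p in layer.parameters())
print(num_params)                       # 1024*1024 weights + 1024 biases = 1,049,600
```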

Large language models like OpenAI’s GPT-3 learn to write humanlike text by internalizing billions of examples from the public web. Drawing on sources like ebooks, Wikipedia, and social media platforms like Reddit, they make inferences to complete sentences and even whole paragraphs.


Above: PanGu-α generating dialog for a video game.

Akin to GPT-3, PanGu-α is what’s called a generative pretrained transformer (GPT), a language model that is first pretrained on unlabeled text and then fine-tuned for tasks. Using Huawei’s MindSpore framework for development and testing, the researchers trained the model on a cluster of 2,048 Huawei Ascend 910 AI processors, each delivering 256 teraflops of computing power.
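
As a rough sketch of that "pretrain, then generate" workflow, the snippet below loads a small, publicly available causal language model with the Hugging Face transformers library and completes a prompt. The model name is only a stand-in; PanGu-α itself was built and trained with Huawei's MindSpore, which is not shown here.

```python
# Minimal sketch of the GPT workflow: load a causal language model pretrained on
# unlabeled text, then use it to complete a prompt. "gpt2" is a stand-in model.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Large language models learn to write by"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```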

To build the training dataset for PanGu-α, the Huawei team collected nearly 80 terabytes of raw data from public datasets, including the popular Common Crawl dataset, as well as the open web. They then filtered the data, removing documents that contained less than 60% Chinese characters, fewer than 150 characters, or only titles, advertisements, or navigation bars. Chinese text was converted into simplified Chinese, and 724 potentially offensive words, along with spam and “low-quality” samples, were filtered out.
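
Those thresholds translate into a simple filtering rule. Below is a hedged sketch of what such a filter might look like; the helper functions and placeholder corpus are invented for illustration and are not Huawei's actual pipeline.

```python
# Illustrative document filter: keep only pages with at least 150 characters
# and at least 60% Chinese (CJK) characters, per the thresholds described above.
def chinese_ratio(text: str) -> float:
    if not text:
        return 0.0
    han = sum(1 for ch in text if "\u4e00" <= ch <= "\u9fff")
    return han / len(text)

def keep_document(text: str) -> bool:
    return len(text) >= 150 and chinese_ratio(text) >= 0.6

raw_docs = ["...crawled page 1...", "...crawled page 2..."]  # placeholder corpus
filtered = [doc for doc in raw_docs if keep_document(doc)]
```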

One crucial difference between GPT-3 and PanGu-α is the number of tokens on which the models trained. Tokens, a way of separating pieces of text into smaller units in natural language, can be either words, characters, or parts of words. While GPT-3 trained on 499 billion tokens, PanGu-α trained on only 40 billion, suggesting it’s comparatively undertrained.
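
To make the idea of tokens concrete, the snippet below splits the same sentence three ways: into words, characters, and subwords. It uses GPT-2's byte-pair-encoding tokenizer purely as an example; PanGu-α uses its own Chinese vocabulary, which is not shown here.

```python
# Three illustrative tokenizations of the same sentence.
from transformers import AutoTokenizer

text = "Language models train on tokens."
word_tokens = text.split()                       # word-level
char_tokens = list(text)                         # character-level
bpe = AutoTokenizer.from_pretrained("gpt2")
subword_tokens = bpe.tokenize(text)              # subword (byte-pair encoding)
print(word_tokens)
print(len(char_tokens), "characters")
print(subword_tokens)
```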


Above: PanGu-α writing fiction.

Image Credit: Huawei

In experiments, the researchers say that PanGu-α was particularly adept at writing poetry, fiction, and dialog as well as summarizing text. Absent fine-tuning on examples, PanGu-α could generate poems in the Chinese forms of gushi and duilian. And given a brief conversation as prompt, the model could brainstorm rounds of “plausible” follow-up dialog.

This isn’t to suggest that PanGu-α solves all of the problems plaguing language models of its size. A focus group tasked with evaluating the model’s outputs found 10% of them to be “unacceptable” in terms of quality. And the researchers observed that some of PanGu-α’s creations contained irrelevant, repetitive, or illogical sentences.


Above: PanGu-α summarizing text from news articles.

The PanGu-α team also didn’t address some of the longstanding challenges in natural language generation, including the tendency of models to contradict themselves. Like GPT-3, PanGu-α can’t remember earlier conversations, and it lacks the ability to learn concepts through further conversation and to ground entities and actions to experiences in the real world.

“The main point of excitement is the extension of these large models to Chinese,” Maria Antoniak, a natural language processing researcher and data scientist at Cornell University, told VentureBeat via email. “In other ways, it’s similar to GPT-3 in both its benefits and risks. Like GPT-3, it’s a huge model and can generate plausible outputs in a variety of scenarios, and so it’s exciting that we can extend this to non-English scenarios … By constructing this huge dataset, [Huawei is] able to train a model in Chinese at a similar scale to English models like GPT-3. So in sum, I’d point to the dataset and the Chinese domain as the most interesting factors, rather than the model architecture, though training a big model like this is always an engineering feat.”

Skepticism

Indeed, many experts believe that while PanGu-α and similarly large models are impressive with respect to their performance, they don’t move the ball forward on the research side of the equation. Rather, they’re prestige projects that demonstrate the scalability of existing techniques or serve as a showcase for a company’s products.

“I think the best analogy is with some oil-rich country being able to build a very tall skyscraper,” Guy Van den Broeck, an assistant professor of computer science at UCLA, said in a previous interview with VentureBeat. “Sure, a lot of money and engineering effort goes into building these things. And you do get the ‘state of the art’ in building tall buildings. But there is no scientific advancement per se … I’m sure academics and other companies will be happy to use these large language models in downstream tasks, but I don’t think they fundamentally change progress in AI.”


Above: PanGu-α writing articles.

Even OpenAI’s GPT-3 paper hinted at the limitations of merely throwing more compute at problems in natural language. While GPT-3 completes tasks from generating sentences to translating between languages with ease, it fails to perform much better than chance on a test — adversarial natural language inference — that tasks it with discovering relationships between sentences.

The PanGu-α team makes no claim that the model overcomes other blockers in natural language, like answering math problems correctly or responding to questions without paraphrasing training data. More problematically, their experiments didn’t probe PanGu-α for the types of bias and toxicity found to exist in models like GPT-3. OpenAI itself notes that GPT-3 places words like “naughty” or “sucked” near female pronouns and “Islam” near terms like “terrorism.” A separate paper by Stanford University Ph.D. candidate and Gradio founder Abubakar Abid details the inequitable tendencies of text generated by GPT-3, like associating the word “Jews” with “money.”

Carbon impact

Among others, leading AI researcher Timnit Gebru has questioned the wisdom of building large language models, examining who benefits from them and who’s disadvantaged. A paper coauthored by Gebru earlier this year spotlights the impact of large language models’ carbon footprint on minority communities and such models’ tendency to perpetuate abusive language, hate speech, microaggressions, stereotypes, and other dehumanizing language aimed at specific groups of people.

In particular, the effects of AI and machine learning model training on the environment have been brought into relief. In June 2019, researchers at the University of Massachusetts at Amherst released a report estimating that the power required for training and architecture search on a single large model can involve emissions of roughly 626,000 pounds of carbon dioxide, equivalent to nearly 5 times the lifetime emissions of the average U.S. car.


Above: PanGu-α creating poetry.

While the environmental impact of training PanGu-α is unclear, it’s likely that the model’s footprint is substantial — at least compared with language models a fraction of its size. As the coauthors of a recent MIT paper wrote, evidence suggests that deep learning is approaching computational limits. “We do not anticipate that the computational requirements implied by the targets … The hardware, environmental, and monetary costs would be prohibitive,” the researchers said. “Hitting this in an economical way will require more efficient hardware, more efficient algorithms, or other improvements such that the net impact is this large a gain.”

Antoniak says that it’s an open question as to whether larger models are the right approach in natural language. While the best performance scores on tasks currently come from large datasets and models, whether the pattern of dumping enormous amounts of data into models will pay off is uncertain. “The current structure of the field is task-focused, where the community gathers together to try to solve specific problems on specific datasets,” she said. “These tasks are usually very structured and can have their own weaknesses, so while they help our field move forward in some ways, they can also constrain us. Large models perform well on these tasks, but whether these tasks can ultimately lead us to any true language understanding is up for debate.”

Future directions

The PanGu-α team’s choices aside, they might not have long to set standards that address the language model’s potential impact on society. A paper published by researchers from OpenAI and Stanford University found that large language model developers like Huawei, OpenAI, and others may only have a six- to nine-month advantage until others can reproduce their work. EleutherAI, a community of machine learning researchers and data scientists, expects to release an open source implementation of GPT-3 in August.

The coauthors of the OpenAI and Stanford paper suggest ways to address the negative consequences of large language models, such as enacting laws that require companies to acknowledge when text is generated by AI — perhaps along the lines of California’s bot law. Other recommendations include:

  • Training a separate model that acts as a filter for content generated by a language model
  • Deploying a suite of bias tests to run models through before allowing people to use the model
  • Avoiding some specific use cases

The consequences of failing to take any of these steps could be catastrophic over the long term. In recent research, the Middlebury Institute of International Studies’ Center on Terrorism, Extremism, and Counterterrorism claims that GPT-3 could reliably generate “informational” and “influential” text that might radicalize people into violent far-right extremist ideologies and behaviors. And toxic language models deployed into production might struggle to understand aspects of minority languages and dialects. This could force people using the models to switch to “white-aligned English,” for example, to ensure that the models work better for them, which could discourage minority speakers from engaging with the models to begin with.

Given Huawei’s ties with the Chinese government, there’s also a concern that models like PanGu-α could be used to discriminate against marginalized peoples including Uyghurs living in China. A Washington Post report revealed that Huawei tested facial recognition software that could send automated “Uighur alarms” to government authorities when its camera systems identified members of the minority group.

We’ve reached out to Huawei for comment and will update this article once we hear back.

“With PanGu-α, like with GPT-3, there are risks of memorization, biases, and toxicity in the outputs,” Antoniak said. “This suggests that perhaps we should try to better model how humans learn language.”


Source: https://venturebeat.com/2021/04/29/huawei-trained-the-chinese-language-equivalent-of-gpt-3/


Father and son duo take on global logistics with Optimal Dynamics’ sequential decision AI platform

Like “innovation,” machine learning and artificial intelligence are commonplace terms that provide very little context for what they actually signify. AI/ML spans dozens of fields of research, covering all kinds of problems and many alternative, often incompatible, ways to solve them.

One robust area of research here that has antecedents going back to the mid-20th century is what is known as stochastic optimization — decision-making under uncertainty where an entity wants to optimize for a particular objective. A classic problem is how to optimize an airline’s schedule to maximize profit. Airlines need to commit to schedules months in advance without knowing what the weather will be like or what the specific demand for a route will be (or, whether a pandemic will wipe out travel demand entirely). It’s a vibrant field, and these days, basically runs most of modern life.

Warren B. Powell has been exploring this problem for decades as a researcher at Princeton, where he has operated the Castle Lab. He has researched how to bring disparate areas of stochastic optimization together under one framework that he has dubbed “sequential decision analytics” to optimize problems where each decision in a series places constraints on future decisions. Such problems are common in areas like logistics, scheduling and other key areas of business.

The Castle Lab has long had industry partners, and it has raised tens of millions of dollars in grants from industry over its history. But after decades of research, Powell teamed up with his son, Daniel Powell, to spin out his collective body of research and productize it into a startup called Optimal Dynamics. Father Powell has now retired full-time from Princeton to become Chief Analytics Officer, while son Powell became CEO.

The company raised $18.4 million in new funding last week from Bessemer, in a round led by Mike Droesch, who was promoted to partner earlier this year alongside the firm’s newest $3.3 billion fundraise. The company now has 25 employees and is based in New York City.

So what does Optimal Dynamics actually do? CEO Powell said that it’s been a long road since the company’s founding in mid-2017 when it first raised a $450,000 pre-seed round. We were “drunkenly walking in finding product-market fit,” Powell said. This is “not an easy technology to get right.”

What the company ultimately zoomed in on was the trucking industry, which has precisely the kind of sequential decision-making that father Powell had been working on his entire career. “Within truckload, you have a whole series of uncertain variables,” CEO Powell described. “We are the first company that can learn and plan for an uncertain future.”

There’s been a lot of investment in logistics and trucking from VCs in recent years as more and more investors see the potential to completely disrupt the massive and fragmented market. Yet, rather than building a whole new trucking marketplace or approaching it as a vertically-integrated solution, Optimal Dynamics decided to go with the much simpler enterprise SaaS route to offer better optimization to existing companies.

One early customer, which owned 120 power units, saved $4 million using the company’s software, according to Powell. That was a result of better utilization of equipment and more efficient operations. They “sold off about 20 vehicles that they didn’t need anymore due to the underlying efficiency,” he said. In addition, the company was able to shrink the team of ten that used to manage trucking logistics down to one person, and “they are just managing exceptions” to the normal course of business. As an example of an exception, Powell said that “a guy drove half way and then decided he wanted to quit,” leaving a load stranded. “Trying to train a computer on weird edge events [like that] is hard,” he said.

Better efficiency for equipment usage and then saving money on employee costs by automating their work are the two main ways Optimal Dynamics saves money for customers. Powell says most of the savings come in the former rather than the latter, since utilization is often where the most impact can be felt.

On the technical front, the key improvement the company has devised is how to rapidly solve the ultra-complex optimization problems that logistics companies face. The company does that through value function approximation, a technique in which, instead of actually computing the full range of stochastic optimization solutions, the program approximates the outcomes of decisions to reduce compute time. We “take in this extraordinary amount of detail while handling it in a computationally efficient way,” Powell said. That’s where we have really “wedged ourselves as a company.”
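
In rough terms, value function approximation means scoring each candidate decision by its immediate payoff plus an inexpensive estimate of the value of the state it leads to, rather than simulating every possible future. The toy dispatch example below only illustrates the general idea; the features, weights, and numbers are invented and have nothing to do with Optimal Dynamics' actual system.

```python
# Toy value function approximation: pick the load whose immediate revenue plus
# an approximate estimate of future value is highest. All numbers are made up.
import numpy as np

weights = np.array([0.5, -0.2])  # hypothetically learned from simulated histories

def approx_value(state_features: np.ndarray) -> float:
    # cheap linear stand-in for "how good is the state this decision leads to"
    return float(weights @ state_features)

def choose_load(candidate_loads):
    return max(
        candidate_loads,
        key=lambda load: load["revenue"] + approx_value(load["next_state"]),
    )

loads = [
    {"id": "A", "revenue": 900.0, "next_state": np.array([1.0, 3.0])},
    {"id": "B", "revenue": 1100.0, "next_state": np.array([0.2, 5.0])},
]
print(choose_load(loads)["id"])
```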

Early signs of success with customers led to a $4 million seed round led by Homan Yuen of Fusion Fund, which invests in technically sophisticated startups (i.e. the kind of startups that take decades of optimization research at Princeton to get going). Powell said that raising the round was tough, coming together during the first weeks of the pandemic last year. One corporate fund pulled out at the last minute, and it was “chaos ensuing with everyone,” he said. The Series A process, meanwhile, was the opposite. “This round was totally different — closed it in 17 days from round kickoff to closure,” he said.

With new capital in the bank, the company is looking to expand from 25 employees to 75 this year, who will be trickling back to the company’s office in the Flatiron neighborhood of Manhattan in the coming months. Optimal Dynamics targets customers with 75 trucks or more, either fleets for rent or private fleets owned by companies like Walmart that handle their own logistics.

Source: https://techcrunch.com/2021/05/18/optimal-dynamics-series-a/



IBM will buy Salesforce partner Waeg to boost hybrid cloud, AI strategy



(Reuters) — IBM said on Tuesday it would buy Waeg, a consulting partner for Salesforce, in a deal that will extend its range of services and support its hybrid cloud and artificial intelligence strategy.

The deal to acquire Waeg, which is based in Brussels and serves clients across Europe, complements IBM’s acquisition in January of 7Summits, a U.S. consultancy that specialises in Salesforce’s customer management software.

“Waeg’s strength in Salesforce consulting services will be key to creating intelligent workflows that allow our clients to keep pace with changing customer and employee needs and expectations,” Mark Foster, senior vice president of IBM Services and Global Business Services, said.

Waeg employs a team of 130 ‘Waegers’ in locations that include Belgium, Denmark, France, Ireland, Poland, Portugal and the Netherlands.

The terms were not disclosed for the deal, which is subject to customary closing conditions and is expected to be completed this quarter.


Source: https://venturebeat.com/2021/05/18/ibm-will-buy-salesforce-partner-waeg-to-boost-hybrid-cloud-ai-strategy/



For companies that use ML, labeled data is the key differentiator


AI is driving the software industry’s paradigm shift from writing logical statements to data-centric programming. Data is now oxygen: the more training data a company gathers, the brighter its AI-powered products will burn.

Why is Tesla so far ahead with advanced driver assistance systems (ADAS)? Because no one else has collected as much information — it has data on more than ten billion driven miles, helping it pull ahead of competition like Waymo, which has only about 20 million miles. But any company that is considering using machine learning (ML) cannot overlook one technical choice: supervised or unsupervised learning.

There is a fundamental difference between the two. For unsupervised learning, the process is fairly straightforward: The acquired data is fed directly to the models, and if all goes well, they will identify patterns.

Elon Musk compares unsupervised learning to the human brain, which gets raw data from the six senses and makes sense of it. He recently shared that making unsupervised learning work for ADAS is a major challenge that hasn’t been solved yet.

Supervised learning is currently the most practical approach for most ML challenges. O’Reilly’s 2021 report on AI Adoption in the Enterprise found that 82% of surveyed companies use supervised learning, while only 58% use unsupervised learning. Gartner predicts that through 2022, supervised learning will remain favored by enterprises, arguing that “most of the current economic value gained from ML is based on supervised learning use cases”.
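
For readers who want the distinction in code rather than prose, the minimal scikit-learn sketch below fits a supervised classifier on labeled points and an unsupervised clustering model on the same points without labels. The dataset and model choices are illustrative only.

```python
# Supervised vs. unsupervised, in miniature: one model needs labels, one does not.
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X, y = make_blobs(n_samples=200, centers=2, random_state=0)

supervised = LogisticRegression().fit(X, y)            # learns from labels y
unsupervised = KMeans(n_clusters=2, n_init=10).fit(X)  # sees only the raw data X

print(supervised.predict(X[:3]))
print(unsupervised.labels_[:3])
```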

Source: https://techcrunch.com/2021/05/18/for-companies-that-use-ml-labeled-data-is-the-key-differentiator/



My Experience Building a WhatsApp Chat Bot for a Nigerian Company



Short introduction: I’m Jerry Udensi, CTO of a Nigerian-Malaysian tech company, Lyshnia Limited. Prior to working full-time with Lyshnia (a company I founded in 2013 with my elder brother), I worked in the AI industry in Malaysia and Singapore. I have built natural language AI systems for large corporations such as Allianz SE, and insurance technology for companies like Malaysia’s Insuradar Sdn.

The reason for my short introduction is to show you my background in building AI-powered systems. Natural language processing is a field I’ve been active in for over 3 years now, so you’d think building a transactional chatbot that sells only 10 products shouldn’t be an issue for me, right? Well, you’d be right if the customers were people who read.

In the paragraphs to follow, I will highlight what I’ve learnt building and maintaining Jane B (Just Another Non-Existent Bot), which attends to approximately 1,000 customers every day.

There’s this old saying that goes, “if you want to hide something from a Black Man, put it in a book”. Unfortunately, this is the case with over 70% of the customers who used the bot.

When you first message the bot, it greets you, lets you know that you’re chatting with a bot, then gives you 4 options to choose from.

The first 3 messages you receive after chatting the first time.

Five out of 10 people ignore the initial message and go ahead to write what they want; 2 out of 10 read it but don’t understand and therefore reply confusedly, like in the image below:

A customer’s reply

For the 5 who initially ignored the menu message, we automatically resend it; 4 out of 5 then go on to reply appropriately, while 1 in 5 complains about how stressful the process is and probably never chats again.


Why? 🤦🏽‍♂️

Yes, we get it. You live in France, but do you want it Delivered or will you Pick it up? (Some customers send people in to do a pickup for them.)

Jane has been simplified to understand even incorrect English and to give customers hints on how to reply, yet a lot of those who chat with her simply ignore the instructions and would rather type a thousand words than the one word Jane would understand.

Ok

You would think it’d be easier and less stressful for customers to simply reply “1” rather than type out “I want to make an order”, but no. Chat after chat, you will realise a lot of people say unnecessary things before or after their actual intention. For chatbot providers, this can be a nightmare, because the chatbot has asked a question and is listening for a natural language answer, and it’s very hard to predict whether the user’s response will line up with the answer you expect.
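
One way to cope with that mismatch is to accept the bare menu number but also map free-form replies onto the same intents with simple keyword spotting. The sketch below is only an illustration of that fallback pattern; the menu, keywords, and intent names are invented and are not Jane's actual code.

```python
# Illustrative menu matcher: take "1" directly, otherwise fall back to keywords.
MENU = {"1": "place_order", "2": "track_order", "3": "speak_to_human"}
KEYWORDS = {
    "place_order": ["order", "buy", "purchase"],
    "track_order": ["track", "where is", "delivery"],
}

def parse_reply(text):
    cleaned = text.strip().lower()
    if cleaned in MENU:                       # user replied with the option number
        return MENU[cleaned]
    for intent, words in KEYWORDS.items():    # keyword fallback for free-form text
        if any(w in cleaned for w in words):
            return intent
    return None                               # nothing matched: re-send the menu

print(parse_reply("1"))
print(parse_reply("I want to make an order please"))
```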

Even for a human, it is hard to understand another human’s intentions when they are expressed out of context.

In the chat above, the bot asked the user to confirm the items she wanted to buy, but the user instead replied with where they live. Totally out of context.

Getting instant replies is a drug people are addicted to. Customers are told that this is a chatbot which only takes and tracks orders, then given another number to chat with a human for consultancy. Yet they keep coming back just minutes later to complain to the bot that they’re not getting responses there.

Something else I noticed while analysing the chat response times is that customers get so hooked on the instant replies that if at any point the chatbot delays its response by even just 1 minute, they start asking why they’re not getting any response.

On the good side, customers who read and follow the short and simple instructions are able to place their orders in less than 2 minutes from a platform they’re comfortable with (WhatsApp), while feeling like they’re chatting with a human.

We as Humans need to do better. Thank you.

Source: https://chatbotslife.com/my-experience-building-a-whatsapp-chat-bot-for-a-nigerian-company-b19c02c7d68?source=rss—-a49517e4c30b—4
