IBM’s new Watson Large Speech Model brings generative AI to the phone  – IBM Blog


Almost everyone has heard of large language models, or LLMs, since generative AI entered our daily lexicon through its amazing text- and image-generating capabilities and its promise to revolutionize how enterprises handle core business functions. Now, more than ever, talking to AI through a chat interface, or having it perform specific tasks for you, is a tangible reality. Enormous strides are being made to adopt this technology in ways that positively affect our daily experiences as individuals and consumers.

But what about the world of voice? So much attention has been given to LLMs as a catalyst for better generative AI chat capabilities that few people are talking about how the same technology can be applied to voice-based conversational experiences. The modern contact center is currently dominated by rigid conversational experiences (yes, Interactive Voice Response, or IVR, is still the norm). Enter the world of Large Speech Models, or LSMs. LLMs have a more vocal cousin, one that offers the benefits and possibilities you expect from generative AI, except that this time customers interact with the assistant over the phone.

Over the past few months, IBM watsonx development teams and IBM Research have been hard at work developing a new, state-of-the-art Large Speech Model (LSM). Based on transformer technology, LSMs use vast amounts of training data and large numbers of model parameters to deliver accurate speech recognition. Purpose-built for customer care use cases like self-service phone assistants and real-time call transcription, our LSM delivers highly accurate transcriptions out of the box to create a seamless customer experience.

We are very excited to announce the deployment of new LSMs in English and Japanese, now available exclusively in closed beta to Watson Speech to Text and watsonx Assistant phone customers.

We could go on and on about how great these models are, but what really matters is performance. Based on internal benchmarking, the new LSM is our most accurate speech model yet, outperforming OpenAI's Whisper model on short-form English use cases. We compared the out-of-the-box performance of our English LSM with that of OpenAI's Whisper model across five real customer phone use cases and found the Word Error Rate (WER) of the IBM LSM to be 42% lower than that of the Whisper model (see footnote (1) for the evaluation methodology).
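A figure like "42% lower" is most naturally read as a relative reduction: the gap between the two error rates divided by the baseline's error rate, not a difference in percentage points. Assuming that reading, the sketch below shows the arithmetic. The function name is hypothetical, and the WER values are made-up placeholders chosen only to illustrate the calculation; they are not the benchmark's measured results.

```python
def relative_wer_reduction(wer_baseline: float, wer_candidate: float) -> float:
    """Return how much lower wer_candidate is than wer_baseline,
    expressed as a fraction of the baseline WER."""
    if wer_baseline <= 0:
        raise ValueError("baseline WER must be positive")
    return (wer_baseline - wer_candidate) / wer_baseline


# Hypothetical placeholder values, NOT IBM's or OpenAI's measured WERs.
baseline_wer = 0.20    # 20 word errors per 100 reference words
candidate_wer = 0.116  # 11.6 word errors per 100 reference words

print(f"Relative WER reduction: {relative_wer_reduction(baseline_wer, candidate_wer):.0%}")
```

With these placeholder values, the candidate makes 8.4 fewer errors per 100 words, which is 42% of the baseline's 20.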

IBM's LSM is also 5x smaller than the Whisper model (5x fewer parameters), meaning it processes audio 10x faster when run on the same hardware. The LSM also streams: it finishes processing when the audio finishes. Whisper, on the other hand, processes audio in block mode (for example, in 30-second intervals). Consider an audio file shorter than 30 seconds, say 12 seconds: Whisper pads it with silence and still processes the full 30-second window, whereas the IBM LSM finishes processing as soon as the 12 seconds of audio are complete.
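The padding overhead in the 12-second example can be sketched as simple arithmetic. This is a minimal model under stated assumptions: the 30-second window comes from the example above, the function names are hypothetical, and block mode is simplified to non-overlapping, silence-padded windows (real long-form decoders may slide or overlap their windows). It counts seconds of audio processed, not wall-clock time.

```python
import math

# Window length taken from the example in the text. Assumes non-overlapping
# windows, which simplifies how real long-form decoders move through audio.
BLOCK_WINDOW_S = 30.0


def block_mode_audio_processed(duration_s: float, window_s: float = BLOCK_WINDOW_S) -> float:
    """Seconds of (silence-padded) audio a fixed-window model processes."""
    if duration_s <= 0:
        return 0.0
    # Round the clip length up to a whole number of windows.
    return math.ceil(duration_s / window_s) * window_s


def streaming_audio_processed(duration_s: float) -> float:
    """A streaming recognizer processes only the audio that actually exists."""
    return max(duration_s, 0.0)


for clip_s in (12.0, 30.0, 45.0):
    padded_s = block_mode_audio_processed(clip_s)
    print(
        f"{clip_s:.0f}s clip: block mode processes {padded_s:.0f}s "
        f"({padded_s - clip_s:.0f}s of padding), "
        f"streaming processes {streaming_audio_processed(clip_s):.0f}s"
    )
```

For the 12-second clip from the example, the sketch reports 30 seconds of processed audio in block mode (18 seconds of which is padding) versus 12 seconds when streaming. Actual speed also depends on model size and hardware, which this sketch does not model.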

These tests indicate that our LSM is highly accurate on short-form audio. But there's more: the LSM also showed accuracy comparable to Whisper's on long-form use cases (like call analytics and call summarization), as shown in the chart below.

How can you get started with these models?

Apply for our closed beta user program, and our Product Management team will reach out to you to schedule a call. As the IBM LSM is in closed beta, some features and functionalities are still in development (2).

Sign up today to explore LSMs


(1) Methodology for benchmarking:

  • Whisper model for comparison: medium.en
  • Language assessed: US-English
  • Metric used for comparison: Word Error Rate, commonly known as WER, is defined as the number of edit errors (substitutions, deletions, and insertions) divided by the number of words in the reference/human transcript.
  • Prior to scoring, all machine transcripts were normalized using the whisper-normalizer to eliminate any formatting differences that might cause WER discrepancies.
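The WER definition above corresponds to a word-level edit distance divided by the length of the reference. The following is a minimal sketch of that computation, assuming whitespace tokenization. `simple_normalize` is a hypothetical, much cruder stand-in for whisper-normalizer (it only lowercases, strips punctuation, and collapses whitespace), so scores from this sketch will not match the benchmark's.

```python
import re


def simple_normalize(text: str) -> str:
    """Hypothetical stand-in for whisper-normalizer: lowercase,
    replace punctuation (except apostrophes) with spaces, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^\w\s']", " ", text)
    return " ".join(text.split())


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = simple_normalize(reference).split()
    hyp = simple_normalize(hypothesis).split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev is the previous DP row: prev[j] is the minimum number of edits that
    # turns the reference words processed so far into the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            )
        prev = curr
    return prev[-1] / len(ref)


reference_text = "I'd like to check my account balance."
hypothesis_text = "I like to check my count balance"
print(f"WER = {word_error_rate(reference_text, hypothesis_text):.2%}")
```

Here the hypothesis has two substitutions ("i'd" to "i", "account" to "count") against a seven-word reference, giving a WER of 2/7.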

(2) IBM's statements regarding its plans, direction, and intent are subject to change or withdrawal without notice at IBM's sole discretion. The information mentioned regarding potential future product is not a commitment, promise, or legal obligation to deliver any material, code or functionality. The development, release, and timing of any future features or functionality remains at IBM's sole discretion.

