Zephyrnet Logo

What Happens When AI Performance Asymptotes? by @ttunguz

Date:

In the past, the bigger the AI model, the better the performance. Across OpenAI’s models for example, parameters have grown by 1000x+ & performance has nearly tripled.

OpenAI Model Release Date Parameters, B MMLU
GPT2 2/14/19 1.5 0.324
GPT3 6/11/20 175 0.539
GPT3.5 3/15/22 175 0.7
GPT4 3/14/23 1760 0.864

But model performance will soon asymptote – at least on this metric.

image

This is a chart of many recent AI models’ performance according to a broadly accepted benchmark called MMLU. 1 MMLU measures the performance of an AI model compared to a high school student.

I’ve categorized the models this way :

  • Large : > 100 billion parameters
  • Medium : 15 to 100b parameters
  • Small : < 15b parameters

Over time, the performance is converging rapidly both across model sizes & across the model vendors.

What happens when Facebook’s open-source model & Google’s closed-source model that powers Google.com & OpenAI’s models that power ChatGPT all work equally well?

Computer scientists have been challenged distinguishing the relative performance of these models with many different tests. Users will be hard-pressed to do better.

At that point, the value in the model layer should collapse. If a freely available open-source model is just as good as a paid one, why not use the free one? And if a smaller, less expensive to operate open-source model is nearly as good, why not use that one?

The rapid growth of AI has fueled a surge of interest in the models themselves. But pretty quickly, the infrastructure layer should commoditize, just as it did in the cloud where three vendors command 65% market share : Amazon Web Services, Azure, & Google Cloud Platform.

The applications & the developer tooling around the massive AI commodity brokers is the next phase of development – where product differentiation & distribution differentiate rather than brilliant, raw technical advances.2


1 MMLU measures 57 different tasks including math, history, computer science & other topics. It’s one measure of many & it’s not perfect – like any benchmark. There are others including the Elo system. Here’s an overview of the differences.. Each benchmark grades the model on a different spectrum : bias,
mathematical reasoning are two other examples.

spot_img

Latest Intelligence

spot_img