
OpenAI faces criticism after CTO’s interview on Sora


OpenAI, the influential artificial intelligence research lab behind groundbreaking tools like ChatGPT and Sora, has found itself in hot water following a recent interview with its Chief Technology Officer, Mira Murati.

The interview, conducted by Wall Street Journal reporter Joanna Stern, focused on OpenAI’s latest video generation system, Sora.

Concerns center around the potential misuse of copyrighted work to train AI models and the lack of transparency from OpenAI regarding its data practices.

Sora’s training data is in question

At the heart of the controversy lies the issue of training data, the massive datasets used to train AI models.

When asked about the sources of data utilized for Sora, Murati provided the standard response: the model had been trained on “publicly available and licensed data”.

However, further probing revealed hesitation and uncertainty on Murati’s part about the specific details of this dataset.

This response has raised red flags among artists, photographers, and intellectual property experts. AI image generation systems depend heavily on ingesting vast quantities of images, many of which may be protected by copyright. The lack of clarity around Sora’s training data raises questions about whether OpenAI has adequately safeguarded the rights of content creators.

Sora’s training database has not been published on any official platform (Image credit)

Shutterstock usage admitted later on

Adding fuel to the fire was Murati’s initial refusal to address whether Shutterstock images were a component of Sora’s training dataset. Only after the interview, in a footnote added by the Wall Street Journal, did Murati confirm the use of Shutterstock’s image library.

This belated confirmation sits uneasily with OpenAI’s public-facing line of “publicly available and licensed data” and suggests a reluctance to disclose potentially problematic sourcing practices.

Shutterstock and OpenAI formed a partnership granting OpenAI rights to use Shutterstock’s image library in training image generation models like DALL-E 2 and potentially Sora.

In return, Shutterstock contributors (the photographers and artists whose images are on the platform) receive compensation when their work is used in the development of these AI models.

A PR nightmare unfolds

It’s safe to say that most public relations folks would not consider this interview to be a PR masterpiece.

Murati’s lack of clarity comes at a sensitive time for OpenAI, already facing major copyright lawsuits, including a significant one filed by the New York Times.

The public is scrutinizing practices like OpenAI’s alleged secret use of YouTube videos for model training, as previously reported by The Information. With stakeholders ranging from artists to politicians demanding accountability, Murati’s avoidance only fuels the fire.

OpenAI’s opaque approach is backfiring spectacularly, transforming the Sora interview into a PR disaster.

Transparency is a hot topic for a reason

This incident underscores a critical truth: transparency is paramount in the world of AI. OpenAI’s stumbling responses have undermined public trust and intensified questions about its ethical practices. The Sora controversy amplifies the growing chorus demanding greater accountability within the AI industry.

Murati’s reluctance to disclose the specifics of Sora’s training data breeds mistrust and sets a dangerous precedent.

Without the clarity artists, creators, and the public are demanding, ethical debates and the potential for legal action will only intensify.

There are no angels in this land

While much of the current scrutiny falls squarely on OpenAI, it’s crucial to remember they’re not the only player in the game.

Meta’s LLaMA model and Google’s Gemini have also faced allegations of problematic training data sources.

The issue of transparency in AI development has been on the agenda for a long time (Image credit)

This isn’t surprising, as Business Insider reports that Meta has already admitted to using Instagram and Facebook posts to train its AI models. Additionally, Google’s control over vast swaths of the internet gives them unparalleled access to potential training data, raising similar ethical concerns about consent and copyright.

The situation with OpenAI’s Sora is just one piece of a larger puzzle. The entire AI development field is facing scrutiny regarding its data practices and the potential ethical implications.


Featured image credit: Freepik.
