Zephyrnet Logo

Microsoft Unveils Tools to Tackle AI Hallucinations

Date:

Microsoft has unveiled a set of capabilities in the Azure AI Studio to tackle a plethora of issues including AI hallucinations, poisoning and prompt injection.

The new tools are meant to make generative AI safer and reliable for users, after being plagued by untruths by chatbots, making up stuff, in what is now widely known as AI hallucinations.

Going back to drawing boards

The surfacing of AI models offering incorrect or harmful responses has made developers go back to the drawing boards, but with more funding required. The tech industry, according to The Register is trying to tame the wild models, instead of coming up with much safer and ethical AI tools.

Microsoft, has had to acknowledge the AI tech comes with risks and addressing some of them cannot be overemphasized. Sarah Bird, chief product officer of responsible AI at Microsoft said the new safety features will be easy to use for azure customers “who are hiring groups of red teamers to test the AI services built.”

The tools, she said, can detect potential threats and monitor for hallucinations. They can also block any malicious prompts in real time from Azure AI customers.

“We know that customers don’t all have deep expertise in prompt injection attacks or hateful content, so the evaluation system generates the prompts needed to simulate these types of attacks,” she told The Verge in an interview.

“Customers can then get a score and see the outcomes.”

The tools

According to the tech firm, three features – prompt shields, safety evaluations as well as risk and safety monitoring are now available in preview on Azure AI and OpenAI services. Prompt Shields, according to the company blocks malicious prompts from external documents, which instruct models to disregard their training.

Risk and safety monitoring helps “to understand what model inputs, outputs, and end users are triggering content filters to inform mitigations.”

Safety evaluation assesses model vulnerability to jailbreak attacks and generate content risk.

Microsoft is not stopping on these ones alone. The company revealed that two more features will be released soon. These are meant to direct models towards safe output as well as tracking prompts “to flag potentially problematic users.”

“With these additions, Azure AI continues to provide our customers with innovative technologies to safeguard their applications across the generative AI lifecycle,” said Bird in a blogpost.

According to Bird, groundedness detection is a feature, which was designed to identify text-based hallucinations. It gives customers options when a false claim is seen, including “sending the message back to be revised before it can be displayed.”

Safety system messages to users’ models directing them towards safe and responsible outputs, according the firm.

Also read: AI Tokens AGIX, FET and OCEAN Soar On Merger Talks

Risk management versus innovation

Bird further explained in a blogpost how business executives are attempting a balance between innovation and risk management. They want to use generative AI “without being bitten by it.”

“Prompt injection attacks have emerged as a significant challenge, where malicious actors try to manipulate an AI system into doing something outside its intended purpose, such as producing harmful content or exfiltrating confidential data,” explained Bird.

She added that apart from mitigating the risks, companies were also concerned about quality and reliability.

“They want to ensure that their AI systems are not generating errors or adding information that isn’t substantiated in the application’s data sources, which can erode user trust,” she said.

Market concerns

Bird admitted that there are fears Microsoft and other AI firms want to detect to people what should be deemed appropriate and what is not.

However, her team, she said, added a way for Azure customers to “toggle the filtering of hate speech or violence that the model sees and blocks.”

As for Google Gemini, which made noise recently because of its outrageous images, filters that were meant to reduce bias resulted in unintended effects.

spot_img

Latest Intelligence

spot_img