Writing AI Lab each week means I occasionally encounter AI models that behave badly and bizarrely. Usually, there’s nothing to be done about it, save for sharing those tales with you. But that could soon change.
A group of AI researchers has set up a crowdsourced website, Flaw Reporting for AI (FLARE-AI), for reporting and tracking AI harms. If, for example, a chatbot generates malware or a bomb-making recipe, leaks personal information, or triggers delusional thinking in users, FLARE-AI could be used to sound the alarm. The open source code behind the system allows others to verify an issue and route reports to model makers, as well as organizations like MITRE, a nonprofit that tracks problems with technical systems. It’s a bit like Downdetector, which compiles real-time user reports for global service outages affecting things like apps and websites.
The website is another step in the group’s ongoing work with AI reporting, which I first wrote about last year. Members of the group also consulted on a congressional bill announced in June, which would see the US government take a key role in tracking this kind of AI misbehavior.
“Right now, there is no centralized, accountable way to report flaws in AI systems,” says Avijit Ghosh, an artificial intelligence policy researcher at HuggingFace who co-led development of FLARE-AI with computer scientists Elaine Zhu and Shayne Longpre.
The alarm system was developed in collaboration with 49 AI experts from 32 different organizations. In a paper outlining the work, the researchers argue that their initiative could be important as AI is adopted more widely and as agentic systems gain greater power. The lack of a consistent way to report AI flaws is a significant problem, they believe.
“I think it’s a really good initiative,” says Jessica Ji, a researcher at the think tank Center for Security and Emerging Technology. Ji says the researchers are right to note that existing reporting mechanisms are fragmented and that AI models are black boxes. “I’m in support of anything that makes AI more transparent,” she says.
Though bugs and cybersecurity problems get a lot of attention, especially of late, Ghosh tells me that problems with AI systems span topics like mental harm, discrimination or bias, and misinformation. He adds that different companies have different standards around such issues, which means some problems go unrecognized. “In the absence of a coordinated disclosure system, there are no external mechanisms to enforce transparency,” Ghosh says.
A spate of recent incidents involving popular AI tools shows how easily the technology can go bad.
This week, a company called LayerX disclosed a way to dupe AI-infused web browsers, including OpenAI’s Atlas and Perplexity’s Comet, into vaulting their guardrails. Convincing the AI model behind the browser that it was playing a game, for example, could lead to the browser going rogue and trying to hack a website. (The companies responsible for the affected browsers have fixed the issue, LayerX says.) And this April, Johann Rehberger, a security researcher, discovered a way to trick Claude into divulging personal information using images generated by ChatGPT.
AI introduces bizarre new kinds of problems, too. Last year, OpenAI was forced to update its models after it discovered that they were overly sycophantic, which sometimes appeared to encourage delusional thinking.
Rumman Chowdhury, the CEO and founder of Humane Intelligence PBC, says FLARE-AI could be a useful way for many AI developers to implement ways of reporting issues with their tools. But she adds that such initiatives often come with serious challenges.








