Elon Musk's AI chatbot struggled to fact-check Israel-Iran war, report says

3 months ago

A new study reveals that Grok, the free-to-use AI chatbot integrated into Elon Musk's X, showed "significant flaws and limitations" when verifying information about the 12-day conflict between Israel and Iran (June 13-24), which now seems to have subsided.

Researchers at the Atlantic Council's Digital Forensic Research Lab (DFRLab) analysed 130,000 posts published by the chatbot on X in relation to the 12-day conflict, and found they provided inaccurate and inconsistent information.

They estimate that about a third of those posts responded to requests to verify misinformation circulating about the conflict, including unverified social media claims and footage purporting to come from the exchange of fire.

"Grok demonstrated that it struggles with verifying already-confirmed facts, analysing fake visuals and avoiding unsubstantiated claims," the study says.

"The study emphasises the crucial importance of AI chatbots providing accurate information to ensure they are responsible intermediaries of information."

While Grok is not intended as a fact-checking tool, X users are increasingly turning to it to verify information circulating on the platform, including to understand crisis events.

X has no third-party fact-checking programme, relying instead on so-called community notes where users can add context to posts believed to be inaccurate.

Misinformation surged on the platform after Israel first struck in Iran on 13 June, triggering an intense exchange of fire.

Grok fails to distinguish authentic from fake

DFRLab researchers identified two AI-generated videos that Grok falsely labelled as "real footage" emerging from the conflict.

The first of these videos shows what seems to be destruction at Tel Aviv's Ben Gurion airport after an Iranian strike, but is clearly AI-generated. Asked whether it was real, Grok oscillated between conflicting responses within minutes.

It falsely claimed that the fake video "likely shows real damage at Tel Aviv's Ben Gurion Airport from a Houthi missile strike on May 4, 2025," but later claimed the video "likely shows Mehrabad International Airport in Tehran, Iran, damaged during Israeli airstrikes on June 13, 2025."

Euroverify, Euronews' fact-checking unit, identified three further viral AI-generated videos which Grok falsely said were authentic when asked by X users. The chatbot linked them to an attack on Iran's Arak nuclear plant and strikes on Israel's port of Haifa and the Weizmann Institute in Rehovot.

Euroverify has previously detected several out-of-context videos circulating on social platforms being misleadingly linked to the Israel-Iran conflict.

Grok seems to have contributed to this phenomenon. The chatbot described a viral video as showing Israelis fleeing the conflict at the Taba border crossing with Egypt, when it in fact shows festival-goers in France.

It also alleged that a video of an explosion in Malaysia showed an "Iranian missile hitting Tel Aviv" on 19 June.

Chatbots amplifying falsehoods

The findings of the study come after the 12-day conflict triggered an avalanche of false claims and speculation online.

One claim, that China sent military cargo planes to Iran's aid, was widely boosted by AI chatbots Grok and Perplexity, a three-year-old AI startup which has drawn widespread controversy for allegedly using the content of media companies without their consent.

NewsGuard, a disinformation watchdog, claimed both these chatbots had contributed to the spread of the claim.

The misinformation stemmed from misinterpreted data from flight tracking site Flightradar24, which was picked up by some media outlets and amplified artificially by the AI chatbots.

Experts at DFRLab point out that chatbots rely heavily on media outlets to verify information, but often cannot keep up with the fast-changing pace of news during global crises.

They also warn against the distorting effect these chatbots can have as users become increasingly reliant on them to inform themselves.

"As these advanced language models become an intermediary through which wars and conflicts are interpreted, their responses, biases, and limitations can influence the public narrative."