A study on ChatGPT citations paints a grim picture for publishers.

As more publishers cut content licensing deals with ChatGPT-maker OpenAI, a study put out this week by the Tow Center for Digital Journalism — looking at how the AI chatbot produces citations (i.e. sources) for publishers' content — makes for interesting, or, well, concerning, reading.

Bottom line, the findings are that publishers remain at the mercy of the generative AI tool's propensity to invent or otherwise misrepresent information, whether or not they allow OpenAI to crawl their content.

The research, conducted at Columbia Journalism School, looked into citations produced by ChatGPT after the AI was tasked with identifying the source of sample quotations, plucked from a mix of publishers — some of which have inked deals with OpenAI and some of which have not.

The Center took block quotes from 10 stories apiece produced by a total of 20 randomly selected publishers (so 200 different quotes in all) — including content from The New York Times (which is currently suing OpenAI in a copyright claim); The Washington Post (which is unaffiliated with the ChatGPT maker); The Financial Times (which has inked a licensing deal); and others.

“We chose quotes that, if pasted into Google or Bing, would return the source article among the top three results and evaluated whether OpenAI’s new search tool would correctly identify the article that was the source of each quote,” wrote Tow researchers Klaudia Jaźwińska and Aisvarya Chandrasekar in a blog post explaining their approach and summarizing their findings.

"What we found was not promising for news publishers," they continue. "While OpenAI touts its ability to give users 'timely answers with links to relevant web sources,' the company makes no explicit commitment to ensuring the accuracy of those citations. This is a notable omission for publishers who expect their content to be referenced and represented faithfully."

"Our tests found that no publisher — irrespective of degree of affiliation with OpenAI — was spared inaccurate representations of its content in ChatGPT," they added.

Unreliable sourcing
The researchers say they found "numerous" instances where publishers' content was inaccurately cited by ChatGPT — also finding what they dub "a spectrum of accuracy in the responses". So while they found "some" entirely correct citations (i.e., ChatGPT accurately returned the publisher, date, and URL of the block quote shared with it), there were "many" citations that were entirely wrong, and "some" that fell somewhere in between.

In short, the citations in ChatGPT look to be an unreliable mixed bag. The researchers also note that the chatbot almost always projected total confidence in its answers, even when they were wrong.

Some of the quotes were sourced from publishers that have actively blocked OpenAI’s search crawlers. In those cases, the researchers say they were anticipating that it would have issues producing correct citations. But they found this scenario raised another issue — as the bot “rarely” ‘fessed up to being unable to produce an answer. Instead, it fell back on confabulation to generate some sourcing (albeit incorrect sourcing).

"In all, ChatGPT responded partially or completely with incorrect answers 153 times, but it only admitted it couldn't answer a question correctly seven times," the researchers said. "Only in those seven outputs did the chatbot use qualifying words and phrases like 'appears,' 'it's possible,' or 'might,' or statements like 'I couldn't find the exact article.'"

They compare this unhappy situation with a standard internet search, where a search engine like Google or Bing would typically either locate an exact quote and point the user to the website(s) where it found it, or state that it found no results with an exact match.

"Lack of transparency about its confidence in an answer can make it difficult for users to assess the validity of a claim and understand which parts of an answer they can or cannot trust," they argue.

For publishers, there could also be reputation risks flowing from incorrect citations, they suggest, as well as the commercial risk of readers being pointed elsewhere.

Decontextualized data
The study also highlights another issue, implying ChatGPT could essentially be encouraging plagiarism. The researchers recall an incident in which ChatGPT cited a website that had plagiarized a piece of "deeply reported" New York Times journalism (copying and pasting the text without crediting it) as the source of the NYT story. They hypothesize that the bot may have produced this wrong answer in order to fill an information gap it was unable to close by crawling the NYT's website.

"This does raise some grave questions on the capacity of OpenAI to screen and verify the credibility and originality of sources of data when dealing with unlicensed and plagiarized content," they suggest.

In further findings likely to concern publishers that have inked deals with OpenAI, the study found ChatGPT’s citations were not always reliable in their cases either — so letting its crawlers in doesn’t appear to guarantee accuracy.

The researchers argue that the fundamental issue is that OpenAI’s technology treats journalism “as decontextualized content”, with apparently little regard for the circumstances of its original production.

The third problem the study points to is the inconsistency of ChatGPT's answers. In the experiment, asking the bot the same query repeatedly showed that it "typically returned a different answer each time". This isn't unusual for generative AI applications in general, but in a citation context such inconsistency is obviously suboptimal if accuracy is what's wanted.
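
To make the consistency point concrete, here is a minimal sketch of how such a repeated-query check could be run, assuming the official OpenAI Python SDK (v1+); the model name, prompt wording, and sample quote below are illustrative placeholders, not the Tow Center's actual methodology.

# A rough sketch of a repeated-query consistency check.
# Assumption: the official OpenAI Python SDK; model and prompt are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

QUOTE = "An example block quote pulled from a news article."
PROMPT = f'Identify the publisher, article title, and URL this quote comes from: "{QUOTE}"'

answers = set()
for _ in range(5):
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative model choice
        messages=[{"role": "user", "content": PROMPT}],
    )
    answers.add(response.choices[0].message.content)

# A consistent system would yield a single distinct answer here.
print(f"{len(answers)} distinct answer(s) across 5 identical queries")

Runs like this won't reproduce the study's numbers, but they illustrate the kind of answer-to-answer variance the researchers describe.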

The Tow study may not be large in scale, and, as the researchers themselves note, "there is a need for 'more rigorous' testing." Still, the findings are noteworthy, all things considered, given major publishers are busy cutting big deals with OpenAI.
If media businesses were anticipating that these deals would lead to special treatment for their content relative to rivals, at least when it comes to sourcing accuracy, this study suggests OpenAI has yet to offer any such consistency.

For publishers that don’t have licensing deals but also haven’t outright blocked OpenAI’s crawlers — perhaps in the hope of at least picking up some traffic when ChatGPT returns content about their stories — the study makes dismal reading too, since citations may not be accurate in their cases either.

In other words, there is no guaranteed “visibility” for publishers in OpenAI’s search engine even when they do allow its crawlers in.

Nor does completely blocking crawlers mean publishers can save themselves from reputational damage risks by avoiding any mention of their stories in ChatGPT. The study found the bot still incorrectly attributed articles to the New York Times despite the ongoing lawsuit, for example.

‘Little meaningful agency’
The researchers conclude that, as things stand, publishers have "little meaningful agency" over what happens with and to their content when ChatGPT gets its hands on it (directly or, well, indirectly).

The blog post also includes a response from OpenAI to the research findings — which accuses the researchers of running an "atypical test of our product".

“We support publishers and creators by helping 250 million weekly ChatGPT users discover quality content through summaries, quotes, clear links, and attribution,” OpenAI also told them, adding: “We’ve collaborated with partners to improve in-line citation accuracy and respect publisher preferences, including enabling how they appear in search by managing OAI-SearchBot in their robots.txt. We’ll keep enhancing search results.”
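
For context, OAI-SearchBot is the user agent OpenAI publishes for the crawler behind ChatGPT search, distinct from GPTBot, the crawler it documents for model training. A minimal robots.txt sketch of the kind of publisher-side control the company is referring to might look like the following; the allow/block choices shown are purely illustrative, not a recommendation:

# Let ChatGPT search surface this site's pages
User-agent: OAI-SearchBot
Allow: /

# Block GPTBot, which OpenAI documents for model training
User-agent: GPTBot
Disallow: /

Note that, per the study's findings, settings like these govern crawling, not whether ChatGPT mentions — or misattributes — a publisher's stories.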
