LLMs can be Fooled into Labelling a Document as Relevant

11 Apr, 2025

Abstract

This talk presents experiments to study the labelling of short texts (i.e., passages) for relevance, using multiple open-source and proprietary LLMs. While the overall agreement of some LLMs with human judgements is comparable to human-to-human agreement measured in previous research, LLMs are more likely to label passages as relevant compared to human judges, indicating that LLM labels denoting non-relevance are more reliable than those indicating relevance.

This observation prompts us to further examine cases where human judges and LLMs disagree, particularly when the human judge labels the passage as non-relevant and the LLM labels it as relevant. Results show a tendency for many LLMs to label passages that include the original query terms as relevant. We, therefore, conduct experiments to inject query words into random and irrelevant passages. The results demonstrate that LLMs are highly influenced by the presence of query words in the passages under assessment, even if the wider passage has no relevance to the query. This tendency of LLMs to be fooled by the mere presence of query words demonstrates a weakness in our current measures of LLM labelling: relying on overall agreement misses important patterns of failures. There is a real risk of bias in LLM-generated relevance labels and, therefore, a risk of bias in rankers trained on those labels.

Additionally, we investigate the effects of deliberately manipulating LLMs by instructing them to label passages as relevant. We find that such manipulation influences the performance of some LLMs, highlighting the critical need to consider potential vulnerabilities when deploying LLMs in real-world applications.

Speaker

Marwah Alaofi

Date

11 Apr, 2025 3:30 PM — 4:30 PM

Event

TIGER Talk: LLMs can be Fooled into Labelling a Document as Relevant

Location

B080.09.012 at RMIT & MS Teams

Building 80/435-457 Swanston St, Melbourne, VIC 3000

Congratulations! 🎉

🏆 The paper won the Best Paper Honorable Mention award at SIGIR-AP'24.

Marwah is showing the Best Paper Honorable Mention award at SIGIR-AP'24 — The paper won the Best Paper Honorable Mention award at SIGIR-AP'24.
Source: Prof. Mark Sanderson’s LinkedIn Post.

Talk Recording (RMIT Account Required)

Getting There

Talk IR LLM SIGIR-AP Relevance Labelling

LLMs can be Fooled into Labelling a Document as Relevant

Abstract

Talk Recording (RMIT Account Required)

Getting There

Marwah Alaofi

PhD Candidate at RMIT University

Chenglong Ma

Research Fellow