LLM generated responses to mitigate the impact of hate speech
Status
VoR
Alternative title
Authors
Podolak, Jakub
Łukasik, Szymon
Balawender, Paweł
Ossowski, Jan
Piotrowski, Jan
Bąkowicz, Katarzyna
Sankowski, Piotr
Monograph
Findings of the Association for Computational Linguistics: EMNLP 2024
Monograph (alternative title)
Editor
Al-Onaizan, Yaser
Bansal, Mohit
Chen, Yun-Nung
Date
2024
Place of publication
Publisher
Association for Computational Linguistics
Journal title
Findings of the Association for Computational Linguistics: EMNLP 2024
Volume
Pages
15860–15876
ISSN
ISBN
979-8-89176-168-1
eISBN
Series
Series number
ISSN of series
Access date
2024-12-05
Remarks
Abstract PL
Abstract EN
In this study, we explore the use of Large Language Models (LLMs) to counteract hate speech. We conducted the first real-life A/B test assessing the effectiveness of LLM-generated counter-speech. During the experiment, we posted 753 automatically generated responses aimed at reducing user engagement under tweets that contained hate speech toward Ukrainian refugees in Poland. Our work shows that interventions with LLM-generated responses significantly decrease user engagement, particularly for original tweets with at least ten views, reducing it by over 20%. This paper outlines the design of our automatic moderation system, proposes a simple metric for measuring user engagement, and details the methodology of conducting such an experiment. We discuss the ethical considerations and challenges in deploying generative AI for discourse moderation.
Abstract other
Keywords PL
Keywords EN
Keywords other
Conference edition name
The 2024 Conference on Empirical Methods in Natural Language Processing
Conference place
Miami
Start date
2024-11-12
Finish date
2024-11-16
Exhibition title
Place of exhibition (institution)
Exhibition curator
Organisational Unit
Instytut Nauk Społecznych
Type
Version
Version of Record
License type
This item is shared on a different license
Funder
Time range from
Time range to
Contact person name
Related publication
Grant/project name
Collections