LLM generated responses to mitigate the impact of hate speech

Podolak, Jakub; Łukasik, Szymon; Balawender, Paweł; Ossowski, Jan; Piotrowski, Jan; Bąkowicz, Katarzyna; Sankowski, Piotr

doi:10.18653/v1/2024.findings-emnlp.931

Status: VoR

dc.abstract.en	In this study, we explore the use of Large Language Models (LLMs) to counteract hate speech. We conducted the first real-life A/B test assessing the effectiveness of LLM-generated counter-speech. During the experiment, we posted 753 automatically generated responses aimed at reducing user engagement under tweets that contained hate speech toward Ukrainian refugees in Poland. Our work shows that interventions with LLM-generated responses significantly decrease user engagement, particularly for original tweets with at least ten views, reducing it by over 20%. This paper outlines the design of our automatic moderation system, proposes a simple metric for measuring user engagement, and details the methodology of conducting such an experiment. We discuss the ethical considerations and challenges in deploying generative AI for discourse moderation.
dc.affiliation	Instytut Nauk Społecznych
dc.conference	The 2024 Conference on Empirical Methods in Natural Language Processing
dc.conference.country	United States
dc.conference.datefinish	2024-11-16
dc.conference.datestart	2024-11-12
dc.conference.place	Miami
dc.conference.series	Empirical Methods in Natural Language Processing
dc.conference.seriesshortcut	EMNLP
dc.conference.seriesweblink	https://sigdat.org/
dc.conference.shortcut	EMNLP 2024
dc.conference.weblink	https://2024.emnlp.org/
dc.contributor.author	Podolak, Jakub
dc.contributor.author	Łukasik, Szymon
dc.contributor.author	Balawender, Paweł
dc.contributor.author	Ossowski, Jan
dc.contributor.author	Piotrowski, Jan
dc.contributor.author	Bąkowicz, Katarzyna
dc.contributor.author	Sankowski, Piotr
dc.contributor.editor	Al-Onaizan, Yaser
dc.contributor.editor	Bansal, Mohit
dc.contributor.editor	Chen, Yun-Nung
dc.date.access	2024-12-05
dc.date.accessioned	2024-12-05T09:07:26Z
dc.date.available	2024-12-05T09:07:26Z
dc.date.created	2024
dc.date.issued	2024
dc.description.accesstime	at_publication
dc.description.physical	15860–15876
dc.description.version	final_published
dc.identifier.doi	10.18653/v1/2024.findings-emnlp.931
dc.identifier.isbn	979-8-89176-168-1
dc.identifier.uri	https://share.swps.edu.pl/handle/swps/1141
dc.identifier.weblink	https://aclanthology.org/2024.findings-emnlp.931/
dc.language	en
dc.pbn.affiliation	social communication and media sciences
dc.publisher	Association for Computational Linguistics
dc.publisher.ministerial	Association for Computational Linguistics
dc.relation.book	Findings of the Association for Computational Linguistics: EMNLP 2024
dc.relation.pages	17203
dc.rights	Other
dc.rights.question	Yes_rights
dc.share.mono	OTHER
dc.swps.sciencecloud	send
dc.title	LLM generated responses to mitigate the impact of hate speech
dc.title.journal	Findings of the Association for Computational Linguistics: EMNLP 2024
dc.type	MonographyChapterConference
dspace.entity.type	Book
