LLM generated responses to mitigate the impact of hate speech

Status: VoR
dc.abstract.en: In this study, we explore the use of Large Language Models (LLMs) to counteract hate speech. We conducted the first real-life A/B test assessing the effectiveness of LLM-generated counter-speech. During the experiment, we posted 753 automatically generated responses aimed at reducing user engagement under tweets that contained hate speech toward Ukrainian refugees in Poland. Our work shows that interventions with LLM-generated responses significantly decrease user engagement, particularly for original tweets with at least ten views, reducing it by over 20%. This paper outlines the design of our automatic moderation system, proposes a simple metric for measuring user engagement and details the methodology of conducting such an experiment. We discuss the ethical considerations and challenges in deploying generative AI for discourse moderation.
dc.affiliation: Instytut Nauk Społecznych
dc.conference: The 2024 Conference on Empirical Methods in Natural Language Processing
dc.conference.country: United States
dc.conference.datefinish: 2024-11-16
dc.conference.datestart: 2024-11-12
dc.conference.place: Miami
dc.conference.series: Empirical Methods in Natural Language Processing
dc.conference.seriesshortcut: EMNLP
dc.conference.seriesweblink: https://sigdat.org/
dc.conference.shortcut: EMNLP 2024
dc.conference.weblink: https://2024.emnlp.org/
dc.contributor.author: Podolak, Jakub
dc.contributor.author: Łukasik, Szymon
dc.contributor.author: Balawender, Paweł
dc.contributor.author: Ossowski, Jan
dc.contributor.author: Piotrowski, Jan
dc.contributor.author: Bąkowicz, Katarzyna
dc.contributor.author: Sankowski, Piotr
dc.contributor.editor: Al-Onaizan, Yaser
dc.contributor.editor: Bansal, Mohit
dc.contributor.editor: Chen, Yun-Nung
dc.date.access: 2024-12-05
dc.date.accessioned: 2024-12-05T09:07:26Z
dc.date.available: 2024-12-05T09:07:26Z
dc.date.created: 2024
dc.date.issued: 2024
dc.description.accesstime: at_publication
dc.description.physical: 15860–15876
dc.description.version: final_published
dc.identifier.doi: 10.18653/v1/2024.findings-emnlp.931
dc.identifier.isbn: 979-8-89176-168-1
dc.identifier.uri: https://share.swps.edu.pl/handle/swps/1141
dc.identifier.weblink: https://aclanthology.org/2024.findings-emnlp.931/
dc.language: en
dc.pbn.affiliation: social communication and media sciences
dc.publisher: Association for Computational Linguistics
dc.publisher.ministerial: Association for Computational Linguistics
dc.relation.book: Findings of the Association for Computational Linguistics: EMNLP 2024
dc.relation.pages: 17203
dc.rights: Other
dc.rights.question: Yes_rights
dc.share.mono: OTHER
dc.swps.sciencecloud: send
dc.title: LLM generated responses to mitigate the impact of hate speech
dc.title.journal: Findings of the Association for Computational Linguistics: EMNLP 2024
dc.type: MonographyChapterConference
dspace.entity.type: Book