1. When Crowd Meets Persona: Creating a Large-
Scale Open-Domain Persona Dialogue Corpus
Nov. 2022. @HCOMP (WiP)
Won Ik Cho¹*, Yoon Kyung Lee¹*, Seoyeon Bae¹, Jihwan Kim¹,
Sangah Park², Moosung Kim³, Sowon Hahn¹ and Nam Soo Kim¹
Seoul National University¹, DeepNatural AI², Smilegate AI³
2. Motivation
• Creating a dialogue dataset
Multiple participants
High degree of freedom
• Difficulties of crowdsourcing
Researchers, moderators, and crowdworkers
Careful scheduling and conflict resolution required
• Persona dialogue
Challenging and time-consuming project
What should task managers keep in mind?
3. Our study
• Setting
Persona participants (actors) talk with user participants (workers)
Actors are hired, while workers are crowdsourced
The user initiates the conversation, but the persona actor leads it
• Collection
Recruiting workers from crowdsourcing platform
Chat interface developed by the platform
5. Discussion
• Overview
RQ1: What should be considered in accommodating the construction
of a successful dialogue dataset?
• The organizer should acknowledge that it differs greatly from ordinary conversation,
and that handling unexpected and unwanted situations is crucial
RQ2: What is the role of the moderator in large-scale dialogue dataset
construction?
• Resolve conflicts after constructing a rapport with participants
• Be aware of the points where participants feel uncomfortable, empathizing with and
understanding their struggles
• Manage recruitment and financial support, which affect the project atmosphere
RQ3: Will such considerations help reach an intended goal of
construction?
• Shown indirectly using survey results, textual analysis, and generative-model-based
experiments (to be further investigated)
Hi, we are a joint team from Seoul National University, DeepNatural AI, and Smilegate AI in South Korea. Today we are going to present our work-in-progress project on persona dialogue creation with hired persona actors and crowdsourced users.
Our work starts from an innate difficulty of building a dialogue corpus: two or more participants are necessarily involved in the construction process, and the process has such a high degree of freedom that quality control of the output may not be feasible. Also, many corpus creation projects these days cooperate with crowdsourcing companies and their moderators, who recruit the workers and manage their overall workload and compensation. That is, the roles of researchers, moderators, and crowdworkers all differ slightly in goal and scale, which requires careful scheduling and conflict resolution. In this light, we came to the question of how persona dialogue corpus generation should be managed in practice.
In our study, we let persona participants, namely the actors, talk with user participants, the workers. The actors are hired, while the workers are crowdsourced. For every dialogue, the user initiates the conversation, but the persona actor leads it as they talk. Collection proceeds by recruiting workers from the community of a crowdsourcing platform, using a chat interface developed by the platform so that the progress of each conversation can be checked and managed. Freedom of conversation was guaranteed as much as possible, but users who made the actors feel uneasy or uncomfortable were reported and set aside from the project. After collection finished, we analyzed the surveys and interviews conducted with the participants and the moderator, and furthermore analyzed the constructed data.
We demonstrate the overall project flow. First, researchers create guidelines for the conversation, and the platform and moderator recruit actors and workers based on those guidelines. Each actor plays the persona they initially decided on, and the user initiates a conversation with that persona based on the profile they are shown, but only if they pass a test prepared for user participants. Each conversation lasts over 15 turns and is terminated by the actor or worker if they feel fatigued or bored. Both sides complete a survey after each conversation, and rewards are given afterward according to the amount of dialogue.
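The session protocol just described (user-initiated conversations of over 15 turns, termination by either side, rewards proportional to the amount of dialogue) can be sketched as a minimal data model. Note that `Session`, `MIN_TURNS`, and `REWARD_PER_TURN` are our own illustrative names and values, not the project's actual implementation.

```python
# Illustrative sketch of the collection protocol; names and the reward
# unit are hypothetical assumptions, not the project's real code.
from dataclasses import dataclass, field

MIN_TURNS = 15          # conversations are expected to last over 15 turns
REWARD_PER_TURN = 100   # assumed reward unit, proportional to dialogue amount


@dataclass
class Session:
    persona_profile: str              # profile the user sees before initiating
    turns: list = field(default_factory=list)
    terminated: bool = False

    def add_turn(self, speaker: str, text: str) -> None:
        # The user initiates, but the persona actor leads the dialogue.
        if self.terminated:
            raise RuntimeError("session already terminated")
        self.turns.append((speaker, text))

    def terminate(self) -> None:
        # Either the actor or the worker may end the session
        # when they feel fatigued or bored.
        self.terminated = True

    def meets_length_requirement(self) -> bool:
        # Check the 15-turn expectation before releasing rewards.
        return len(self.turns) > MIN_TURNS

    def reward(self) -> int:
        # Reward is given afterward according to the amount of dialogue.
        return len(self.turns) * REWARD_PER_TURN
```

A moderator-side tool built on such a model could flag sessions that end below the turn threshold for review rather than automatic payment.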
After the whole collection phase, we briefly answered our research questions. First, in accommodating the construction of a successful persona dialogue dataset, the organizer should acknowledge that it differs greatly from ordinary conversation, and that handling unexpected and unwanted situations is crucial; this can be mediated by an expert moderator. More specifically, the moderator should resolve conflicts after building rapport with the participants, so that they can report whatever makes them uncomfortable, while empathizing with and understanding their struggles. Recruiting participants and managing finances is also a crucial role, since such conditions can deter or boost the atmosphere of the project. We also found that the whole process led to the generation of a high-quality persona dialogue dataset, which we recently disclosed online; our work is to be further investigated with more thorough experimental criteria and presented as a more mature work afterwards.
Our work is currently disclosed on the GitHub of our funding agency, Smilegate AI. We also thank DeepNatural AI for building the chat interface, recruiting participants from its worker pool, and moderating the whole process. Finally, we thank all our crowdworkers, including the actors and users, who made up the dialogues and went through the surveys and interviews. Since our work is in progress, we will soon disclose the full analysis results in our full paper.