Projects | Daniela Occhipinti

This page brings together the three research threads that currently define my work. They are connected by one broader question: how can dialogue systems model people and social interaction in ways that are both computationally useful and conversationally meaningful?

PRODIGy

Research question: what kinds of speaker information actually help a dialogue model represent a person?

With PRODIGy, I worked on a dataset designed to move beyond flat persona descriptions by aligning dialogues with multiple kinds of speaker information, including biography, personality, communication style, and gender. The goal was to create a richer resource for studying how profile representations shape generation.

The main contribution of this project is a stronger foundation for persona-based dialogue research: instead of treating profile information as a single text field, it opens the door to studying which aspects of identity and style matter most, when, and why.

Takeaway: better dialogue personalization starts with better ways of representing people.

Paper · Code and data · News

HED-IT

Research question: how much does data quality matter when fine-tuning dialogue models?

HED-IT focuses on the role of human post-editing in dialogue data creation. I studied how machine-generated and human-edited dialogues differ, and how those differences affect downstream model behavior during fine-tuning.

This project connects model performance to the quality of the data behind it. Rather than assuming that more data is always better, it asks whether better-curated conversational data changes what models learn and how users perceive their outputs.

Takeaway: conversational quality is shaped not only by model size, but also by the care invested in the training data.

Paper · Code and data · News

Interlocutor-Aware Persona Dialogue

Research question: what changes when a dialogue model must adapt to both a speaker and their interlocutor?

My ACL 2025 work studies how dialogue generation changes when conversational systems are asked to model not only a target speaker's profile, but also the person that speaker is talking to. This shifts the framing from isolated personas to interaction, familiarity, and relational context.

The project examines whether models generalize across topics, how they behave with familiar versus unfamiliar interlocutors, and when persona consistency reflects genuine adaptation rather than superficial copying.

Takeaway: believable dialogue depends on social context, not just a single speaker profile.

Paper · Acceptance news · Poster presentation