This project examines whether a computational model of personhood can support reflectively stable alignment in agentive AI systems. Current alignment strategies risk becoming unstable once an advanced system begins to reflect rationally on its own goals. Drawing on Kantian and post-Kantian theories of autonomy, together with recent developments in computational social modeling, the project will develop and test the Narrative Prior model. This model operationalizes personhood as a structured narrative representation of the social world, including latent character roles, story-type expectations, and an inductive bias toward understanding oneself as a coherent protagonist. Crucially, the model incorporates a novel account of the organizational unity that underwrites personhood. This account will shed light on the question of digital sentience in two distinct ways. First, on Kantian accounts, the organizational unity that underwrites personhood depends on the unity of consciousness (i.e., sentience); accordingly, our model may yield insights into the functional role of the unity of consciousness. Second, our model will identify a source of goals, preferences, and motivating reasons for agentive systems that is independent of externally encoded, reward-style training signals. Codifying these will enrich our understanding of the forms of sentience that digital minds might possess.
University of Montreal, Canada
Toan Nguyen is a PhD researcher in the Human-AI Interaction Group at TU Dortmund University, Germany. His research focuses on Human-AI Interaction, Brain–Computer Interfaces (BCIs), and multimodal machine learning. He is particularly interested in developing adaptive and user-centred intelligent systems that leverage physiological signals and behavioural data to better understand and support human interaction with AI technologies.