About me

I am a third-year Ph.D. student at the University of Washington, advised by Yulia Tsvetkov. I am also a visiting researcher at Meta FAIR, working with Asli Celikyilmaz and Luke Zettlemoyer.

My research is in NLP and cognitive modeling, with a focus on personalization and proactive learning (specifically, how AI should ask questions). I'm interested in how humans and models reason, communicate uncertainty, and make decisions, with applications in healthcare AI and education. My broader goal is building more capable systems that are cognitively and socially aligned for safer, more equitable care.

Research interests: Personalization, Proactive Learning, AI for Health, Safety & Reliability, and more!

Before grad school, I received my B.S. and M.S.E. from Johns Hopkins with majors in Cognitive Science (linguistics focus), Computer Science, and Applied Mathematics (statistics focus). I worked as a research assistant at JHU CLSP, advised by Philipp Koehn and Kenton Murray.

Please contact me at stelli [at] cs.washington.edu if you are interested in my work!

  • Click here to view my CV (updated July 25)

     
  • I'm thinking about...

    • Proactive Reasoning

      How can we identify and proactively seek information using LLMs to improve model safety & reliability with statistical guarantees? How can we make LLMs ask good questions? How do we model "intuition" in expert domains like medicine?

    • Socially-Intelligent Personalization

      Modeling how different social groups express health concerns and interpret medical advice. Aiming to personalize AI systems for more equitable, culturally-aware health communication.

    News

    1. 2026-06

      Invited talk at MSR AI Frontiers on "EvoLM: Self-Evolving Language Models." [Slides].

    2. 2026-05

      Check out our new paper "EvoLM: Self-Evolving Language Models through Co-Evolved Discriminative Rubrics" that uses rubrics to surface the model's latent evaluative knowledge so that it can self-improve.

    3. 2026-04

      Check out our new paper "HorizonBench: Long-Horizon Personalization with Evolving Preferences" that builds an infinite data generator for long-horizon (2-6 months) user-AI interactions and a preference-following benchmark.

    4. 2026-03

      Guest lecture at UBC NLP: "Proactive Question Asking for Reliable and Personalized LLMs." [Slides].

    5. 2026-02

      Check out our new paper "Cold-Start Personalization via Training-Free Priors from Structured World Models" that learns priors from population preferences for interactive personalization.

    6. 2025-11

      Check out our new paper "Cognitive Foundations for Reasoning and Their Manifestation in LLMs" that extracts and analyzes patterns in LLM and human reasoning.

    7. 2025-11

      Guest lecture at UT Austin Computational Discourse and NLG class on PrefPalette [Slides].

    8. 2025-08

      "PrefPalette: Personalized Preference Modeling with Latent Attributes" won a Spotlight at COLM 2025🏆!

    9. 2025-06

      Invited talk at Cohere Labs on Spurious Rewards [YouTube] [Slides].

    Publications

    Below is a list of projects in which I was heavily involved (lead/co-lead/contributed significantly). For a more comprehensive list of papers, check out my Google Scholar page. I also try to record the time that I spent on each project in case anyone finds it helpful!

    • EvoLM: Self-Evolving Language Models through Co-Evolved Discriminative Rubrics

      RL & Post-Training

      Preprint

      Shuyue Stella Li, Rui Xin, Teng Xiao, Yike Wang, Rulin Shao, Zoey Hao, Melanie Sclar, Sewoong Oh, Faeze Brahman, Pang Wei Koh, Yulia Tsvetkov

      We define a simple "discriminative utility" reward to train a rubric generator, which shares parameters with a policy that is trained with the rubric reward to self-improve.

    • HorizonBench: Long-Horizon Personalization with Evolving Preferences

      Personalization

      Preprint

      Shuyue Stella Li, Bhargavi Paranjape, Kerem Oktar, Zhongyao Ma, Gelin Zhou, Lin Guan, Na Zhang, Sem Park, Lin Chen, Diyi Yang, Yulia Tsvetkov, Asli Celikyilmaz

      We build a state-first data generator for 2-6 months of human-AI interaction and a preference-tracking benchmark to show that frontier models struggle with state tracking.

    • Cold-Start Personalization via Training-Free Priors from Structured World Models

      Personalization

      ICML 2026

      Avinandan Bose*, Shuyue Stella Li*, Faeze Brahman, Pang Wei Koh, Simon Shaolei Du, Yulia Tsvetkov, Maryam Fazel, Lin Xiao, Asli Celikyilmaz

      When no user-specific data is available, we propose learning priors from population preferences for interactive personalization.

    • Cognitive Foundations for Reasoning and Their Manifestation in LLMs

      Cognitive Reasoning

      Preprint

      Priyanka Kargupta*, Shuyue Stella Li*, Haocheng Wang, Jinu Lee, Shan Chen, Orevaoghene Ahia, Dean Light, Thomas L. Griffiths, Max Kleiman-Weiner, Jiawei Han, Asli Celikyilmaz, Yulia Tsvetkov

      What is reasoning? We introduce a taxonomy for cognitive elements used in human reasoning and analyze how LLMs exhibit these elements in their reasoning processes.

    • 🪩PrefDisco: Benchmarking Proactive Personalized Reasoning

      Personalization

      ICLR 2026

      Shuyue Stella Li*, Avinandan Bose*, Faeze Brahman, Simon Shaolei Du, Pang Wei Koh, Maryam Fazel, Yulia Tsvetkov

      We propose PrefDisco, a benchmark for proactive personalized reasoning in which models need to ask the user questions to learn their preferences, then adapt their reasoning and responses accordingly.

    • PrefPalette: Personalized Preference Modeling with Latent Attributes

      Personalization

      COLM 2025 Spotlight 🏆

      Shuyue Stella Li, Melanie Sclar, Hunter Lang, Ansong Ni, Jacqueline He, Puxin Xu, Andrew Cohen, Chan Young Park, Yulia Tsvetkov, Asli Celikyilmaz

      Grounded in multi-attribute decision making from cognitive science, we propose PrefPalette, a framework for learning preference models with additional signals from latent social attributes (e.g., humor, cultural values).

    • Spurious Rewards: Rethinking Training Signals in RLVR

      RL & Post-Training

      ICML 2026

      Rulin Shao*, Shuyue Stella Li*, Rui Xin*, Scott Geng*, Yiping Wang, Sewoong Oh, Simon Shaolei Du, Nathan Lambert, Sewon Min, Ranjay Krishna, Yulia Tsvetkov, Hannaneh Hajishirzi, Pang Wei Koh, Luke Zettlemoyer

      We show that RLVR on incorrect and even random rewards can boost the performance of some models but not others, and investigate why.

    • A False Sense of Privacy: Evaluating Textual Data Sanitization Beyond Surface-level Privacy Leakage

      Privacy & Safety

      SaTML 2026

      Rui Xin, Niloofar Mireshghallah, Shuyue Stella Li, Michael Duan, Hyunwoo Kim, Yejin Choi, Yulia Tsvetkov, Sewoong Oh, Pang Wei Koh

      We evaluate the effectiveness of sanitization methods in removing sensitive information from text data and show previously undetected semantic leakage.

    • ALFA: attribute-guided alignment for question-asking

      Clinical Reasoning, Post-Training

      ALFA: Aligning LLMs to Ask Good Questions - A Case Study in Clinical Reasoning

      COLM 2025

      Shuyue Stella Li*, Jimin Mun*, Faeze Brahman, Jonathan S. Ilgen, Yulia Tsvetkov, Maarten Sap

      Guided by attributes from clinical communications and psychology, we generate synthetic paired data to align LLMs to ask good questions.

    • ValueScope: social norms and values detector

      Social Reasoning

      ValueScope: Unveiling Implicit Norms and Values via Return Potential Model of Social Interactions

      EMNLP 2024

      Chan Young Park*, Shuyue Stella Li*, Hayoung Jung*, Svitlana Volkova, Tanushree Mitra, David Jurgens, Yulia Tsvetkov

      We developed a computational framework to model and discover implicit social norms and values in online communities at scale.

    • MediQ: interactive medical consultation framework

      Clinical Reasoning

      MediQ: Question-Asking LLMs and a Benchmark for Reliable Interactive Clinical Reasoning

      NeurIPS 2024

      Shuyue Stella Li, Vidhisha Balachandran, Shangbin Feng, Jonathan Ilgen, Emma Pierson, Pang Wei Koh, Yulia Tsvetkov

      We establish a novel framework for interactive information seeking to enhance reliable medical reasoning abilities in LLMs.

    Photography