Yin Yuan

Welcome!

I am a Postdoctoral Researcher at the China Data Lab at the 21st Century China Center, University of California, San Diego, where I also received my Ph.D. in Political Science in 2024. My research focuses on propaganda, digital politics, and political communication, with a primary emphasis on China and authoritarian regimes.

I study the formal and linguistic aspects of propaganda, empirically tracing how regimes use institutional mechanisms, often with deep historical roots, to replicate their messages across media platforms in contemporary China, including commercial newspapers, the internet, and even large language models (LLMs). My recent article (co-authored with Hannah Waight, Margaret E. (Molly) Roberts, and Brandon Stewart) in the Proceedings of the National Academy of Sciences applies a text reuse approach to detect what we call “scripted propaganda” by tracing the percentage of articles copied verbatim across more than 700 publications. Another collaborative project, published in Nature, explores how the state’s capacity to shape its media environment has downstream effects on LLM outputs through training data.

The other body of my research explores the role of the formal and linguistic aspects of propaganda in elite politics in China. My dissertation examines how the seemingly rigid and archaic official discourse of the Chinese Communist Party (CCP), often called “hard propaganda,” serves as an intra-elite communication system, with political catchphrases (tifa, 提法) as its fundamental units. Using a corpus of 2.5 million ideological texts (1921–2019), I developed a method to extract over 25,000 catchphrases. My findings show that subtle wording changes in these catchphrases can send important political and policy signals, significantly shifting their ideological and policy positions, as quantified by word embedding-based methods. Survey evidence further indicates that these nuances are primarily recognized by political elites while remaining obscure to the general public. This allows elites to communicate sensitive political information “in plain sight” while ensuring that information about ongoing elite conflicts is released gradually and in a controlled way to broader circles of political power.

Methodologically, I specialize in Natural Language Processing and Machine Learning. I also have strong expertise in interactive data visualization and in developing interactive web applications for experimental purposes using tools such as Shiny and Gradio. I am proficient in Python and R for data analysis, modeling, and visualization, experienced in SQL for data management, and skilled in command-line and version control tools, including Git and Bash.