Keynote Speakers
We are delighted to announce that the esteemed speakers listed below have graciously accepted our invitation to deliver keynote speeches at *SEM 2025:
Keynote 1: Yue Zhang

On Reasoning and Generalization of LLMs
Abstract: LLM and human reasoning differ in many ways. In this talk, I will share some insights on these differences from the perspective of causality in reasoning. I will start with results on the linguistic reasoning of different language models, showing that their statistics across datasets differ from those of humans. I will then discuss evidence that the chain-of-thought (CoT) reasoning of LLMs lacks causality, which is a fundamental cause of this difference. Next, I will turn to the consequence, namely the lack of out-of-distribution generalization in current LMs, with evidence from different NLU benchmarks as well as from in-context learning (ICL) and CoT in LLMs. I will share insights on how to address these issues through data curation and reinforcement learning. Finally, I will show how to leverage this difference to automatically detect LLM-generated text.
Bio: Yue Zhang is a tenured Professor at Westlake University (https://frcchang.github.io). His research interests include fundamental NLP and its machine learning algorithms, and his recent research focuses on LLM reasoning and AI scientists. His major contributions to the field include machine learning algorithms for structured prediction (e.g., parsing and IE), neural NLP models (e.g., lattice and graph LSTMs), and generalization for NLP/LMs (e.g., OOD and logical reasoning). He co-authored the Cambridge University Press book “Natural Language Processing: A Machine Learning Perspective”. Yue Zhang served as a PC co-chair for CCL 2020, EMNLP 2022, and LMG 2025, as a tutorial co-chair for ACL 2020, and as a test-of-time award committee co-chair for ACL 2024 and 2025. He currently serves as editor-in-chief of the Large Language Model journal, action editor for TACL, and associate editor for TASLP, TALLIP, TBD, and CSL.
Keynote 2: Thamar Solorio

Every Language Is a World: Beyond Data Scarcity in AI
Abstract: What does it truly mean for a language to be classified as “low-resource,” and how does this classification influence the technologies we develop? This talk presents a comprehensive research journey that challenges conventional approaches to AI for African languages through three interconnected studies. I begin with a large-scale review of 150 NLP papers, which reveals that “low-resource” encompasses four interrelated dimensions: socio-political factors, human and digital resources, artifacts, and community agency. This work demonstrates that languages with similar data volumes can have different technological needs, and that understanding these nuances is key to meaningful progress. Building on this foundation, I will present recent work showing that state-of-the-art language models achieve near-random performance on cultural knowledge that native speakers consider fundamental. This failure is not simply a result of data scarcity; it reveals significant limitations in how models process figurative language, cultural context, and linguistic features that are unique to African languages. Finally, I will present the first large-scale cultural question-answering dataset for 15 African languages, incorporating both text and speech modalities. Our evaluation shows striking disparities: text-based capabilities do not transfer to speech understanding, with some models exhibiting a 91% degradation in performance on audio. Together, this research demonstrates that addressing African language representation in AI requires moving beyond simplistic “low-resource” labels. We need to understand the complex interplay of cultural knowledge, multimodal capabilities, and community needs. By reframing the challenge from one of data scarcity to one of systematic exclusion and misalignment, we can chart more effective paths toward AI systems that genuinely serve Africa’s linguistically and culturally diverse populations.
Bio: Thamar Solorio is a professor of Natural Language Processing (NLP) at Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), where she also serves as Senior Director of Graduate Student Education and Postdoctoral Affairs. Before joining MBZUAI, she was a professor of Computer Science at the University of Houston. She is the founder and director of the RiTUAL lab. Her research interests include NLP for low-resource settings and multilingual data, including code-switching data. More recently, she has been exploring language and vision problems, focusing on developing inclusive NLP, for example in the context of artificial social intelligence. She served two terms as an elected board member of the North American Chapter of the Association for Computational Linguistics (NAACL), was PC co-chair for NAACL 2019, and recently stepped down as co-Editor-in-Chief of the ACL Rolling Review (ARR) initiative. She was the general chair of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP).
Keynote 3: Yuki Arase

Grounding Text Complexity Control in Defined Linguistic Difficulty
Abstract: As large language models excel in generating fluent text, a fundamental question arises: how can we make such language understandable and appropriate for readers with diverse proficiency levels? While “text simplification” has long been studied in NLP, most approaches rely on vague or task-specific notions of difficulty. This talk introduces a framework that grounds text complexity control in a defined scale of linguistic difficulty, enabling models to adapt syntax, vocabulary, and meaning to reader proficiency in a principled and measurable way. I will present computational methods for sentence difficulty prediction, contextual lexical simplification, and proficiency-aligned rewriting, which optimize readability and meaning preservation. By moving from ad-hoc heuristics to standardized definitions, we establish text complexity control as a principled framework for studying how linguistic form, meaning, and proficiency interact in language understanding.
Bio: Yuki Arase is a professor at the School of Computing, Institute of Science Tokyo (formerly Tokyo Institute of Technology), Japan. After obtaining her PhD in Information Science from Osaka University in 2010, she worked at Microsoft Research Asia, where she started the NLP research that continues to captivate her to this day. Her research interests focus on paraphrasing and NLP technology for language education and healthcare.