| Date | Speaker | Title |
|---|---|---|
| October 3 | Brihi Joshi | Towards Richer User Signals for Personalization |
| October 10 | Jacob Andreas | Just Asking Questions |
| October 17 | Aviral Kumar | The Importance of Exploration for Test-Time Scaling |
| October 24 | Parisa Kordjamshidi | Reasoning under Uncertainty with Large Multimodal Language Models |
| October 31 | Rose Yu | Towards AI Co-Scientists: Agentic Foundation Models for Physical Universe |
| November 14 | Arman Cohan | Evaluating and Understanding LLMs: From Scientific Reasoning to Alignment as Judges |
| November 21 | Sherry Yang | Learning World Models and Agents for High-Cost Environments |
🚀 Upcoming Talks
Evaluating and Understanding LLMs: From Scientific Reasoning to Alignment as Judges
November 14, 2025, 2:00 PM PT
https://ucla.zoom.us/meeting/register/ApWOwuXdTl6nzsKfHOMW7w
Speaker Bio: Arman Cohan is an Assistant Professor of Computer Science at Yale University and a faculty researcher at the Allen Institute for AI (Ai2). His research focuses on advancing large language models and LLM-based systems, with particular interest in specialized domains including science.
Abstract: We present our recent work on evaluating and understanding large language models in scientific contexts, and on the relationship between their generation and evaluation capabilities. First, we'll introduce SciArena, an open evaluation platform for literature-grounded scientific tasks that uses expert preferences to rank models on long-form responses. The platform currently supports a broad set of open and proprietary models and has already accumulated a large pool of high-quality preferences. Using these data, we release SciArena-Eval, a meta-evaluation benchmark for training and stress-testing automated judges on science tasks. We will then turn to scientific problem solving. We discuss a holistic suite of scientific reasoning tasks and a new framework for studying the role of knowledge in scientific problem solving and its interaction with reasoning. Our analysis shows that retrieving task-relevant knowledge from model parameters is the primary bottleneck for science reasoning; in-context external knowledge systematically helps even strong reasoning models; and improved verbalized reasoning increases a model’s ability to surface the right knowledge. Finally, time permitting, we will present work on generation–evaluation consistency and show that models that judge well also tend to generate outputs that align with human preferences. This enables alignment benchmarking that evaluates models in their role as judges without scoring their generations directly.
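For readers unfamiliar with preference-based leaderboards, the sketch below shows one generic way pairwise expert preferences can be turned into a model ranking via a Bradley–Terry fit. It is an illustrative toy with made-up win counts, not the actual SciArena pipeline; the function name and data are assumptions for this example.

```python
# Minimal Bradley-Terry ranking sketch from pairwise preference counts.
# Illustrative only; not the SciArena implementation.
import numpy as np

def bradley_terry(wins: np.ndarray, iters: int = 200) -> np.ndarray:
    """wins[i, j] = number of times model i was preferred over model j."""
    n = wins.shape[0]
    strengths = np.ones(n)              # start every model at equal strength
    comparisons = wins + wins.T         # total head-to-head counts per pair
    total_wins = wins.sum(axis=1)
    for _ in range(iters):
        denom = np.array([
            sum(comparisons[i, j] / (strengths[i] + strengths[j])
                for j in range(n) if j != i)
            for i in range(n)
        ])
        strengths = total_wins / denom  # Zermelo / minorize-maximize update
        strengths /= strengths.sum()    # normalize for numerical stability
    return strengths

# Hypothetical win counts among three models (row model beats column model).
wins = np.array([[0, 8, 6],
                 [2, 0, 5],
                 [4, 5, 0]], dtype=float)
ranking = np.argsort(-bradley_terry(wins))
print("models ranked strongest to weakest:", ranking)
```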
Learning World Models and Agents for High-Cost Environments
November 21, 2025, 2:00 PM PT
Speaker Bio: Sherry Yang is an Assistant Professor of Computer Science at NYU Courant and a Staff Research Scientist at Google DeepMind. Her research is in machine learning, with a focus on reinforcement learning and generative modeling. Her current interests include learning world models and agents, and their applications in robotics and AI for science. Her research has been recognized with a Best Paper Award at ICLR and covered by media outlets such as VentureBeat and TWIML. She has organized tutorials and workshops and served as an Area Chair at major conferences (NeurIPS, ICLR, ICML, CVPR). Prior to her current role, she was a postdoctoral researcher at Stanford working with Percy Liang. She received her Ph.D. from UC Berkeley, advised by Pieter Abbeel, and her Master's and Bachelor's degrees from MIT.
Abstract: While neural networks have achieved superhuman performance in domains with low-cost simulations—from AlphaGo to LLMs—their application to the physical world is bottlenecked by a fundamental challenge: high-cost interactions. In fields like robotics, ML engineering, and the natural sciences, every action or experiment is expensive and time-consuming. This talk outlines strategies for building intelligent agents that learn efficiently despite these real-world constraints. We first address the physical world by showing how learned world models can serve as high-fidelity simulators for robotics, enabling extensive policy refinement before deployment on costly hardware. We then turn to complex engineering domains, where actions like running an ML program incur significant time delays, and discuss adaptations that make reinforcement learning robust in these long-running-action settings. Finally, we show how compositional generative models can navigate the vast hypothesis spaces in science, intelligently proposing experiments to accelerate the pace of discovery.
Past Talks
Towards AI Co-Scientists: Agentic Foundation Models for Physical Universe
October 31, 2025, 2:00 PM
https://ucla.zoom.us/meeting/register/Fz5U2GKTRNGWfHTJXrCLKg
Speaker Bio: Dr. Rose Yu is an Associate Professor in the Department of Computer Science and Engineering at the University of California San Diego. She earned her Ph.D. in Computer Science at USC in 2017 and was subsequently a Postdoctoral Fellow at Caltech. Her research focuses on advancing machine learning techniques for large-scale spatiotemporal data analysis, with applications to sustainability, health, and the physical sciences. A particular emphasis of her research is physics-guided AI, which aims to integrate first principles with data-driven models. She is a recipient of the Presidential Early Career Award (PECASE), the highest honor given by the White House to early-career scientists, as well as the DARPA Young Faculty Award, the Army ECASE Award, the NSF CAREER Award, a Hellman Fellowship, Faculty Research Awards from JP Morgan, Facebook, Google, Amazon, and Adobe, several Best Paper Awards, and the Best Dissertation Award at USC. She was named an MIT Technology Review Innovator Under 35 in AI and Samsung AI Researcher of the Year in 2025.
Abstract: Despite the huge success of foundation models across fields, they still suffer from hallucinations and can produce physically inconsistent outputs. Toward the grand goal of building an AI co-scientist, it is critical to integrate physical laws, scientific simulations, and formal methods into the learning and reasoning of these models. In this talk, I will discuss our ongoing effort to develop agentic foundation models for the physical sciences. I will demonstrate use cases of our models in a variety of applications, including climate science, epidemiological modeling, and mathematical theorem proving.
Reasoning under Uncertainty with Large Multimodal Language Models
October 24, 2025, 2:00 PM
289, Engineering VI
Speaker Bio: Parisa Kordjamshidi is an Associate Professor of Computer Science and Engineering at Michigan State University. Her research focuses on Natural Language Processing, multimodal reasoning across vision and language, and neuro-symbolic learning. She received her Ph.D. from KU Leuven and conducted postdoctoral research at the University of Illinois Urbana-Champaign. She is a recipient of the NSF CAREER, Amazon Faculty Research, and Fulbright Scholar Awards, and her research team received the NAACL 2025 Outstanding Research Paper Award. Dr. Kordjamshidi serves as an Associate Editor of JAIR, Co-Editor-in-Chief of ARR (2026), and Action Editor for TACL, and has held organizing-committee roles at major conferences including ACL, NAACL, EACL, EMNLP, ECML-PKDD, and AAAI. She is currently a visiting Associate Professor at UCLA, spending part of her sabbatical there.
Abstract: Uncertainty in intelligent models has multiple facets. One aspect concerns a model’s own uncertainty or confidence in its generated outputs. Another pertains to factual knowledge about uncertainty within specific concepts. For example, statements such as “10–20% of lifelong smokers will develop lung cancer” express factual uncertainty derived from statistical data analyses and represented in text. A key research question is whether language models can form and convey such factual uncertainties—integrating information, drawing on their internal knowledge, and aligning this with their confidence when expressing opinions. While addressing this question is highly challenging, I will present our research that explores related directions and the following research questions: 1) How do language models understand uncertainty expressions in natural language and perform probabilistic inference over them? 2) How can models be trained to follow the principles of probabilistic reasoning when handling uncertainty in text? 3) How can today’s large models reason over uncertain text, in particular by mapping language into formal probabilistic logic programs? And finally, in the context of grounding natural language in the visual modality, 4) How can uncertainty in perception be explicitly represented in reasoning, in particular via mappings to differentiable probabilistic programs?
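As a small illustration (not from the talk) of the kind of probabilistic inference over textual uncertainty these questions target, the sketch below reads off the interval in the smoking example above and combines it with an assumed prevalence; the 15% figure and all variable names are illustrative assumptions, not facts from the abstract.

```python
# Toy sketch: turning a textual uncertainty statement into an interval computation.
# Illustrative only; the talk's actual approaches map language to formal
# probabilistic logic programs and differentiable probabilistic programs.

p_cancer_given_smoker = (0.10, 0.20)  # "10-20% of lifelong smokers will develop lung cancer"
p_smoker = 0.15                       # assumed population prevalence (made up for this sketch)

# Interval bound on the joint probability P(cancer, smoker) = P(cancer | smoker) * P(smoker)
joint_low = p_cancer_given_smoker[0] * p_smoker
joint_high = p_cancer_given_smoker[1] * p_smoker
print(f"P(lung cancer and lifelong smoker) lies in [{joint_low:.3f}, {joint_high:.3f}]")
```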
The Importance of Exploration for Test-Time Scaling
October 17, 2025, 2:00 PM
https://ucla.zoom.us/meeting/register/1LfTUChHRWOA1zApfUcAlA
Speaker Bio: Aviral Kumar is an Assistant Professor of Computer Science and Machine Learning at Carnegie Mellon University, where he started in September 2024. He completed his Ph.D. at UC Berkeley in 2023. His research focuses on reinforcement learning (RL), spanning fundamental advances in offline RL and scaling up RL, and more recently the use of RL to train large language models (LLMs) and optimize test-time compute. He is a recipient of the Samsung AI Researcher of the Year Award (2024), the Schmidt Sciences AI2050 Early Career Fellowship (2024), and multiple best paper awards across workshops in RL, LLMs, and robotics at ICLR and ICML.
Abstract: RL has enabled language models to optimize long chains of thought (CoTs), yet the field still lacks clarity on what makes these approaches succeed. Conflicting empirical results across papers often stem from differences in setting rather than principle. In this talk, I will share our perspective: effective test-time scaling hinges on in-context exploration, the ability of a model to internally experiment and infer generalizable algorithmic procedures using additional compute at inference. I will describe two RL-based approaches for training models to perform such exploration. First, I will present e3, a curriculum-based recipe that teaches models to chain together existing skills in the base model, yielding the state-of-the-art <2B language model for math reasoning. Second, I will discuss cases where chaining alone is insufficient. There, we guide exploration by conditioning the model’s CoT on concise, self-generated natural language abstractions: short procedural summaries produced before launching into long reasoning traces. These abstractions help steer test-time search more effectively. Across tasks, conditioning RL on abstractions significantly improves in-context exploration and yields sustained performance gains even when conventional pass@k scaling plateaus. I will also talk briefly about some ongoing work that builds on these ideas to improve exploration for test-time scaling.
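For readers who want a concrete handle on the pass@k metric mentioned above, the sketch below implements the standard unbiased pass@k estimator commonly used in reasoning and code evaluations. It is background material, not code from the talk, and the sample counts in the example are made up.

```python
# Standard unbiased pass@k estimator: probability that at least one of k
# sampled solutions is correct, estimated from n samples with c successes.
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """n = samples drawn per problem, c = correct samples among them, k = budget."""
    if n - c < k:
        # Too few incorrect samples to fill a size-k draw, so every draw succeeds.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Made-up numbers: 100 sampled solutions to one problem, 7 of them correct.
print(pass_at_k(n=100, c=7, k=1))    # ~0.07
print(pass_at_k(n=100, c=7, k=10))   # noticeably higher with a larger sampling budget
```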
Just Asking Questions
October 10, 2025, 2:00 PM
https://ucla.zoom.us/meeting/register/1LfTUChHRWOA1zApfUcAlA
Speaker Bio: Jacob Andreas is an Associate Professor at MIT in the Department of Electrical Engineering and Computer Science and the Computer Science and Artificial Intelligence Laboratory. His research aims to understand the computational foundations of language learning and to build intelligent systems that can learn from human guidance. Jacob earned his Ph.D. from UC Berkeley, his M.Phil. from Cambridge (where he studied as a Churchill Scholar), and his B.S. from Columbia. He has received a Sloan Fellowship, an NSF CAREER Award, MIT's Junior Bose and Kolokotrones teaching awards, and paper awards at ACL, ICML, and NAACL.
Abstract: In the age of deep networks, "learning" almost invariably means "learning from examples". We train language models with human-generated text and labeled preference pairs, image classifiers with large datasets of images, and robot policies with rollouts or demonstrations. When human learners acquire new concepts and skills, we often do so with richer supervision, especially in the form of language---we learn new concepts from examples accompanied by descriptions or definitions, and new skills from demonstrations accompanied by instructions. Current language models (LMs) support a limited form of language-based teaching via prompting, but it remains challenging to use natural language supervision to apply global, persistent changes to learned models. This talk will focus on two recent projects aimed at more effectively supervising LMs using language: first, on *eliciting* new information (by asking questions to human users of LMs); second, on *updating* language models to incorporate new information (by using LMs to automatically ask and answer questions about information implied by, but not explicitly stated in, training data). If time permits, I'll also discuss some applications of these techniques to educational settings (where we can optimize questions for human, rather than machine, learning). This is joint work with Belinda Li, Alex Tamkin, Noah Goodman, Feyza Akyürek, Ekin Akyürek, Leshem Choshen, Derry Wijaya, and Alexis Ross.
Towards Richer User Signals for Personalization
October 3, 2025, 2:00 PM
289, Engineering VI
Speaker Bio: Brihi Joshi is a final-year PhD student in Computer Science at the University of Southern California, advised by Xiang Ren and Swabha Swayamdipta. Her research focuses on human-AI interaction, with an emphasis on personalization, where she designs and evaluates interactive systems that adapt to users in meaningful and useful ways. Her work has been supported by fellowships from Apple and Amazon.
Abstract: Personalization is gaining attention across domains, with different works exploring signals ranging from user demographics to interaction history. The talk will begin by showing that common signals such as prompts and instructions are underspecified for truly useful personalization, leading only to surface-level changes; for example, failing to adapt to learners with different educational backgrounds. We will then present how LLMs can be used to synthesize richer signals, such as user explanations, that drive more meaningful personalization. Finally, we will share ongoing work on training systems to actively elicit useful user signals, and touch upon open problems on how we can obtain and use these user signals.