Kai-Wei Chang's Lab

UCLA NLP Seminar Series - Archive

Past talks from our weekly seminar series.

Past Talks from Spring 2025

JUN
6

On Interacting and Writing with LLMs

Philippe Laban

June 6, 2025, 2:00 PM

289, Engineering VI (Virtual Speaker)

Speaker Bio: Philippe Laban is a Research Scientist at Microsoft Research, based in New York. Philippe works at the intersection of NLP (~70%) and HCI (~30%), and is passionate about studying and building the future of reading and writing interfaces.

Abstract: In this two-part talk, we will cover topics at the intersection of NLP and HCI. In the first part, we'll cover recent work on multi-turn evaluation of LLMs, with findings indicating that LLMs tend to get "lost in conversation" when user instructions are underspecified and require interactivity from the LLM. In the second part, we will turn to writing interfaces, first introducing an interface (InkSync) that facilitates human-AI interaction for writing (and ensures factual correctness), and then turning to a high-level question: what does it mean for LLMs to produce creative writing, and how does AI writing compare to expert-level writing?

MAY
30

MrT5: Dynamic Token Merging for Efficient Byte-level Language Models

Julie Kallini

May 30, 2025, 2:00 PM

289, Engineering VI

Speaker Bio: Julie Kallini is a second-year Ph.D. student in Computer Science at Stanford University, advised by Christopher Potts and Dan Jurafsky. Her research focuses on natural language processing (NLP), with an emphasis on computational linguistics/cognitive science, tokenization, and model architecture. Her paper, "Mission: Impossible Language Models," won Best Paper Award at ACL 2024. Her work is supported by the NSF Graduate Research Fellowship, the Stanford School of Engineering Graduate Fellowship, and the Stanford EDGE Fellowship. Before starting her Ph.D., Julie was a software engineer at Meta, where she worked on machine learning for advertisements. Julie graduated summa cum laude from Princeton University with a B.S.E. in Computer Science and a minor in Linguistics.

Abstract: Models that rely on subword tokenization have significant drawbacks, such as sensitivity to character-level noise like spelling errors and inconsistent compression rates across different languages and scripts. While character- or byte-level models like ByT5 attempt to address these concerns, they have not gained widespread adoption—processing raw byte streams without tokenization results in significantly longer sequence lengths, making training and inference inefficient. This work introduces MrT5 (MergeT5), a more efficient variant of ByT5 that integrates a token deletion mechanism in its encoder to dynamically shorten the input sequence length. MrT5 achieves up to 75% sequence length reduction with minimal performance loss, offering faster inference and competitive accuracy on multilingual and character-level tasks. Our approach presents a solution to the practical limitations of existing byte-level models.
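To make the mechanism concrete, here is a minimal PyTorch sketch of the underlying idea: a learned gate scores each byte position and drops low-scoring ones so that later encoder layers see a shorter sequence. The module name, the hard threshold, and the training relaxation noted in the comments are illustrative assumptions, not MrT5's released implementation.

import torch
import torch.nn as nn

class DeleteGate(nn.Module):
    """Toy deletion gate: score each byte position and drop low-scoring ones."""
    def __init__(self, d_model: int, keep_threshold: float = 0.5):
        super().__init__()
        self.scorer = nn.Linear(d_model, 1)   # per-position "keep" logit
        self.keep_threshold = keep_threshold

    def forward(self, hidden, attention_mask):
        # hidden: (batch, seq_len, d_model); attention_mask: (batch, seq_len)
        keep_prob = torch.sigmoid(self.scorer(hidden)).squeeze(-1)
        keep = (keep_prob > self.keep_threshold) & attention_mask.bool()
        # During training one would use a soft, differentiable relaxation plus a
        # sparsity penalty; at inference the surviving positions are gathered so
        # that subsequent encoder layers process a shorter sequence.
        return hidden, keep.long(), keep_prob

# usage sketch on random activations
gate = DeleteGate(d_model=512)
h = torch.randn(2, 1024, 512)
mask = torch.ones(2, 1024, dtype=torch.long)
_, new_mask, _ = gate(h, mask)
print("positions kept per example:", new_mask.sum(dim=1))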

MAY
13

Beyond Accuracy: Rethinking LLM Evaluation for Real-World, Interactive, and Culturally Inclusive Scenarios

Alice Oh

May 13, 2025, 4:15 PM

3400 Boelter Hall (Co-located with CS 201)

Speaker Bio: Alice Oh is a Professor in the School of Computing at KAIST. Her major research area is at the intersection of natural language processing (NLP) and computational social science, with a recent focus on multilingual and multicultural aspects of LLMs. She collaborates with scholars in humanities and social sciences such as political science, education, and history. She has served as Program Chair for ICLR 2021 and NeurIPS 2022, General Chair for ACM FAccT 2022 and NeurIPS 2023, and DEI Chair for COLM 2024. She is the current President of SIGDAT which oversees EMNLP.

Abstract: Traditional evaluation methods for large language models (LLMs)—often centered on accuracy in static multiple-choice or short-answer questions—fail to capture the complexities of real-world use. As we envision LLMs serving users in dynamic, multicultural, and interactive scenarios, we must rethink what meaningful evaluation looks like. This talk presents our recent research to advance LLM evaluation through culturally aware, socially grounded, and interaction-driven benchmarks. We assess factual consistency across languages and regions, explore everyday knowledge in underrepresented cultures, and examine cultural inclusivity. We highlight that while LLMs may not appear to be socially biased in simple question-answering, they reveal their biases in generation tasks, which are more closely aligned with actual LLM usage. We further introduce dynamic and interactive evaluation paradigms: LLM-as-an-Interviewer, which mimics real-time user interaction, and Flex-TravelPlanner, which evaluates planning adaptability under evolving and prioritized constraints. Together, these papers reveal that accuracy alone is insufficient; LLM evaluation must consider culture, context, interactivity, and adaptation. This talk calls for a broader evaluation agenda and presents these ten papers as starting points for more robust, inclusive, and realistic assessments.
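As a rough illustration of the interactive-evaluation paradigm, the sketch below runs a short interviewer-candidate loop and grades the whole exchange rather than a single static answer; the function names and the rubric are placeholders, not the LLM-as-an-Interviewer benchmark's actual interface.

def interview_eval(ask_interviewer, ask_candidate, grade, task, num_turns=3):
    """Run a short interview: the interviewer adapts its follow-up questions to
    the candidate's previous answers, and the final grade reflects the whole
    exchange rather than one static response."""
    transcript = []
    question = ask_interviewer(task, transcript)          # opening question
    for _ in range(num_turns):
        answer = ask_candidate(question, transcript)
        transcript.append({"question": question, "answer": answer})
        question = ask_interviewer(task, transcript)      # adaptive follow-up
    return grade(task, transcript)

# toy stand-ins so the loop runs end to end
ask_interviewer = lambda task, t: f"Turn {len(t) + 1}: clarify '{task}'"
ask_candidate = lambda q, t: f"answer to ({q})"
grade = lambda task, t: {"turns": len(t), "score": 0.5}
print(interview_eval(ask_interviewer, ask_candidate, grade, "plan a trip"))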

MAY
16

Using New Data to Answer Old Questions

Emma Pierson

May 16, 2025, 2:00 PM

289, Engineering VI (Virtual Speaker)

Speaker Bio: Emma Pierson is an assistant professor of computer science at UC Berkeley and core faculty in the Computational Precision Health program. She develops data science and machine learning methods to study inequality and healthcare. Her work has been recognized by best paper, poster, and talk awards, an NSF CAREER award, a Rhodes Scholarship, Hertz Fellowship, Rising Star in EECS, MIT Technology Review 35 Innovators Under 35, Forbes 30 Under 30 in Science, AI2050 Early Career Fellowship, and Samsung AI Researcher of the Year. Her research has been published in venues including Nature, JAMA, The New England Journal of Medicine, PNAS, Nature Medicine, ICML and ICLR, and she has also written for The New York Times, FiveThirtyEight, Wired, and various other publications.

Abstract: The explosion of new data sources has created new opportunities, and necessitated new machine learning methods, to answer old questions in the health and social sciences. This talk discusses three stories under this theme: first, using image data to quantify inequality in policing; second, using text data to interpretably predict target variables and characterize disparities; and third, using address data to infer fine-grained migration patterns.

MAY
9

Matryoshka Principles for Adaptive Intelligence

Aditya Kusupati

May 9, 2025, 2:00 PM

289, Engineering VI

Speaker Bio: Aditya Kusupati is a Staff Research Scientist at Google DeepMind. He received his PhD from the University of Washington and his B.Tech from IIT Bombay. Between his B.Tech and PhD, he was a Research Fellow at Microsoft Research. His research focuses broadly on next-generation machine learning models geared towards adaptive intelligence.

Abstract: The increasing scale of deep learning models presents significant challenges for deployment across diverse computational environments, each with unique constraints on latency, memory, and energy. Traditional approaches often necessitate training and maintaining separate models for each desired operating point, leading to substantial overhead. This talk explores the "Matryoshka" principle, a promising paradigm for achieving computational adaptivity within a single trained artifact. Inspired by Russian nesting dolls, Matryoshka methods embed coarser, computationally cheaper structures within finer, more powerful ones, enabling dynamic adjustment of resource usage at inference time. The technique generalizes across fundamental components of machine learning, including embeddings, Transformers, and even the integer data type used for quantization. The community has extended it well beyond these components, and Matryoshka-based methods have seen a wide array of deployments across industry and open source, serving over a billion users daily. Collectively, these works demonstrate how the Matryoshka principle facilitates unified training of highly flexible models that can seamlessly adapt their computational footprint post-training, significantly simplifying deployment and enhancing efficiency across heterogeneous hardware.
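A minimal sketch of the nested-prefix idea behind Matryoshka representation learning: the same objective is applied to truncated prefixes of a single embedding, so coarse low-dimensional representations live inside the full vector and can be used on their own at inference time. The granularities, the per-prefix linear heads, and the toy encoder below are assumptions for illustration, not the released implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MatryoshkaClassifier(nn.Module):
    """Apply the same classification loss to nested prefixes of one embedding."""
    def __init__(self, encoder, num_classes, nesting_dims=(64, 128, 256, 512)):
        super().__init__()
        self.encoder = encoder
        self.nesting_dims = nesting_dims
        # one linear head per granularity (a common simplification)
        self.heads = nn.ModuleList([nn.Linear(d, num_classes) for d in nesting_dims])

    def forward(self, x, labels):
        z = self.encoder(x)                                   # (batch, d_model)
        losses = [F.cross_entropy(head(z[:, :d]), labels)     # loss on each prefix
                  for d, head in zip(self.nesting_dims, self.heads)]
        return torch.stack(losses).mean()

# usage sketch with a toy encoder
enc = nn.Sequential(nn.Linear(32, 512), nn.ReLU(), nn.Linear(512, 512))
model = MatryoshkaClassifier(enc, num_classes=10)
x = torch.randn(8, 32)
y = torch.randint(0, 10, (8,))
print(model(x, y))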

APR
18

Measuring Representation and Linguistic Variation in Hollywood

Prof. David Bamman, University of California, Berkeley

April 18, 2025, 2:00 PM

289, Engineering VI

Speaker Bio: David Bamman is an associate professor in the School of Information at UC Berkeley, where he works in the areas of natural language processing and cultural analytics, applying NLP and machine learning to empirical questions in the humanities and social sciences. His research focuses on improving the performance of NLP for underserved domains like literature (including LitBank and BookNLP) and exploring the affordances of empirical methods for the study of literature and culture. Before Berkeley, he received his PhD in the School of Computer Science at Carnegie Mellon University and was a senior researcher at the Perseus Project of Tufts University. Bamman's work is supported by the National Endowment for the Humanities, National Science Foundation, an Amazon Research Award, and an NSF CAREER award.

Abstract: Movies are a massively popular and influential form of media, but their computational study at scale has largely been off-limits to researchers in the United States due to the Digital Millennium Copyright Act. In this talk, I'll discuss recent regulatory changes at the U.S. Copyright Office that allow for large-scale text and data mining of film, and describe our efforts to build a collection of 2,307 films representing the top 50 movies by U.S. box office over the period 1980 to 2022, along with award nominees. Building this collection allows us to carry out several large-scale computational studies of film; I'll discuss our work measuring changing patterns in the representation of gender and race/ethnicity over the past 43 years (where we see an increase in diversity over the past decade) and leveraging the collection to model variation in emotional performances and choice of adverbial intensifiers over both narrative and historical time. This work illustrates a new frontier in the data-driven analysis of film at scale.

APR
11

How to Build Your Multimodal LLMs: From Pre-training to Post-training and Agents

Zhe Gan, Apple

April 11, 2025, 2:00 PM

289, Engineering VI (Virtual Speaker)

Speaker Bio: Dr. Zhe Gan is a Research Scientist and Manager at Apple AI/ML, primarily working on building large-scale vision and multimodal foundation models. Before joining Apple, he was a Principal Researcher at Microsoft. He received his Ph.D. degree from Duke University in 2018. He has served as an Area Chair for top-tier AI conferences, and is a recipient of Best Student Paper Honorable Mention Awards at CVPR 2021 and WACV 2021.

Abstract: Multimodal Large Language Models (LLMs) have become an increasingly hot research topic. In this talk, I will present our recent work on how to build performant multimodal LLMs along several fronts: (1) Pre-training, with a focus on pre-training data choices, multimodal LLM pre-training, and visual encoder pre-training; (2) Post-training, with a focus on text-rich image understanding, visual referring and grounding, UI understanding, and reasoning; and (3) Generalist Agents, with a focus on how to adapt multimodal LLMs into generalist embodied agents.

APR
4

Optimizing for Long-Term Vision in a Fast-Paced Research World

Prof. Yulia Tsvetkov, University of Washington

April 4, 2025, 2:00 PM

289, Engineering VI

Speaker Bio: Yulia Tsvetkov is an associate professor at the Paul G. Allen School of Computer Science & Engineering at the University of Washington. Her research group works on fundamental advancements to large language models, multilingual NLP, and AI ethics/safety. This research is motivated by a unified goal: to extend the capabilities of human language technology beyond individual populations and across language boundaries, thereby making NLP tools available to all users. Prior to joining UW, Yulia was an assistant professor at Carnegie Mellon University and, before that, a postdoc at Stanford. Yulia is a recipient of an NSF CAREER award, a Sloan Fellowship, an Okawa Research Award, and multiple paper awards and runner-up awards at NLP, ML, and CSS conferences.

Abstract: The fast-paced race for larger language models—and the promise of financial gains for the winners—incentivizes heavier engineering with incremental ideas, often at the expense of long-term vision. While this approach advances industry products used by millions, it is not necessarily the right approach for academic research. In this talk, I will present novel task formulations and evaluation benchmarks that question mainstream assumptions about LLM architectures, training/alignment algorithms, and evaluation approaches. While the proposed ideas contradict common practice, they expose blind spots in LLMs' reasoning abilities and large performance and fairness gaps in the best commercial LLMs, highlighting directions for future research.

Past Talks from Winter 2025

JAN
24

Reasoning with Inference-Time Compute

Sean Welleck

January 24, 2025, 2:00 PM

Virtual Talk

Speaker Bio: Sean Welleck is an Assistant Professor at Carnegie Mellon University, where he leads the Machine Learning, Language, and Logic (L3) Lab. His areas of focus include generative models, algorithms for large language models, and AI for code, science, and mathematics. Sean received a Ph.D. from New York University. He was a postdoctoral scholar at the University of Washington and the Allen Institute for Artificial Intelligence. He is a recipient of a NeurIPS 2021 Outstanding Paper Award, and two NVIDIA AI Pioneering Research Awards.

Abstract: One of the most striking findings in modern research on large language models (LLMs) is that scaling up compute at training time leads to better final results. However, there is another, lesser-mentioned scaling phenomenon: adopting more sophisticated methods and/or scaling compute at inference time can result in significantly better outputs from LLMs. In this talk, I will discuss our lab's recent work on using inference-time strategies to enable better reasoning. This includes training models to think prior to steps of formal mathematical proving, leveraging strong evaluation models to enable easy-to-hard generalization, and inference scaling laws that optimally balance cost and performance. Together, these advances point to a new paradigm of scaling compute at inference time.
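One generic way to spend extra compute at inference time is best-of-N sampling with an evaluation model reranking the candidates. The sketch below is that generic recipe with toy stand-ins for the sampler and the verifier; it illustrates the paradigm rather than the specific methods covered in the talk.

import random

def best_of_n(generate, score, prompt, n=16, temperature=0.8):
    """Draw n candidate solutions and keep the one the evaluation model scores
    highest. `generate` and `score` are placeholders for a sampler and a
    verifier/reward model."""
    candidates = [generate(prompt, temperature=temperature) for _ in range(n)]
    scored = [(score(prompt, c), c) for c in candidates]
    return max(scored, key=lambda pair: pair[0])[1]

# toy stand-ins so the sketch runs end to end
def toy_generate(prompt, temperature):
    return f"{prompt} -> guess {random.random():.3f}"

def toy_score(prompt, candidate):
    return -abs(float(candidate.split()[-1]) - 0.5)   # prefers guesses near 0.5

print(best_of_n(toy_generate, toy_score, "solve: x", n=8))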

JAN
31

Understanding the role of data, scale and capacity in recent breakthroughs

Sara Hooker

January 31, 2025, 2:00 PM

289, Engineering VI

Speaker Bio: Sara Hooker leads Cohere For AI, the dedicated research arm of Cohere. Cohere For AI seeks to solve complex machine learning problems and supports fundamental research that explores the unknown. With a long track-record of impactful research at Google Brain, Sara brings a wealth of knowledge from across machine learning. Her work has focused on model efficiency training techniques and optimizing for models that fulfill multiple desired criteria -- interpretable, efficient, fair and robust. Sara leads a team of researchers and engineers working on making large language models more efficient, safe and grounded. Sara is currently on Kaggle's ML Advisory Research Board and serves on the World Economic Forum council on the Future of Artificial Intelligence.

FEB
7

Social Reinforcement Learning for pluralistic alignment and human-AI interaction

Natasha Jaques

February 7, 2025, 2:00 PM

Virtual Talk

Speaker Bio: Natasha Jaques is an Assistant Professor of Computer Science and Engineering at the University of Washington, and a Senior Research Scientist at Google DeepMind. Her research focuses on Social Reinforcement Learning in multi-agent and human-AI interactions. During her PhD at MIT, she developed techniques for learning from human feedback signals to train language models which were later built on by OpenAI’s series of work on Reinforcement Learning from Human Feedback (RLHF). In the multi-agent space, she has developed techniques for improving coordination through social influence, and unsupervised environment design. Natasha’s work has received various awards, including Best Demo at NeurIPS, an honourable mention for Best Paper at ICML, and the Outstanding PhD Dissertation Award from the Association for the Advancement of Affective Computing. Her work has been featured in Science Magazine, MIT Technology Review, Quartz, IEEE Spectrum, Boston Magazine, and on CBC radio, among others. Natasha earned her Masters degree from the University of British Columbia, undergraduate degrees in Computer Science and Psychology from the University of Regina, and was a postdoctoral fellow at UC Berkeley.

Abstract: If AGI is right around the corner, why are AI agents still so bad at so many tasks? AI still fails to coordinate effectively with other agents, follow natural language instructions to complete embodied tasks, and generalize to circumstances not encountered during training. Even in pure language settings like dialog, AI still fails to adapt to the needs of individual users, instead aligning to a single set of values that may ignore the needs of minority groups. In this talk, I will argue that Social Learning is a key facet of intelligence that enables both humans and animals to easily adapt to new circumstances, coordinate with different people, and acquire complex behaviors. By improving the social intelligence of AI agents, we can get a step closer to adaptive, flexible, generalist agents which better align to diverse human values. This talk will overview recent work in the Social Reinforcement Learning lab, describing how to enable pluralistic alignment of large language models using human feedback, smooth coordination with diverse human partners, and improve social reasoning for understanding natural language commands.

FEB
14

Planning in Creative Contexts

Alexander Spangher

February 14, 2025, 2:00 PM

289, Engineering VI

Speaker Bio: Alexander Spangher is pursuing his PhD in computer science at the University of Southern California; he was formerly a writer and data scientist at The New York Times. He focuses on computational journalism and is advised by Jonathan May, Emilio Ferrara, and Nanyun Peng. His research is broad, and he has pursued several side directions: he has worked at Microsoft Research under the mentorship of Eric Horvitz to detect misinformation, collaborated with EleutherAI to build state-of-the-art symbolic music models, and collaborated with the MIT Plasma Science and Fusion Center (PSFC) to model disruptions in nuclear fusion reactions. His work has received numerous awards: two Outstanding Paper Awards at EMNLP 2024, a Spotlight Award at ICML 2024, and an Outstanding Paper Award at NAACL 2022. He is fortunate to be supported by a four-year Bloomberg PhD Fellowship.

Abstract: Recent modeling innovations incorporate planning — or reasoning about actions (exhibited by models like GPT-o1 and Deepseek's R1) — and have demonstrated impressive performance gains in areas like mathematical problem-solving and computer coding. However, such domains are characterized by well-defined goals (or rewards). For many human-centered tasks in creative contexts, rewards are not as clearly defined and it is thus not clear how to make similar progress in these domains. In this talk, I will outline a research agenda that can enable us to make progress in these fundamentally human processes. I focus on tasks related to journalism, where there is a pressing need for technical innovation. Specifically, in this talk I will focus on the task of retrieving sources relevant to news stories: I will show how (1) we can make inferences about human actions based on environmental state-observations (a process known to cognitive psychologists as "end-state" or "ghost conditions", but as yet unexplored in machine learning); and, (2) how these inferences can help us learn human values and rewards.

FEB
21

Weak to Strong Generalization

Pavel Izmailov

February 21, 2025, 2:00 PM

289, Engineering VI

Speaker Bio: I am a Researcher at Anthropic. I am primarily interested in reasoning, AI for science, and AI alignment. Previously, I worked on reasoning and problem solving in language models at OpenAI. I contributed to the recent OpenAI o1 models, a new state-of-the-art in LLM reasoning. I have also worked on weak-to-strong generalization on the superalignment team under Jeff Wu, Jan Leike, and Ilya Sutskever. I also had a short stint at xAI, where I reported to Elon Musk. Starting in Fall 2025, I will be joining NYU as an Assistant Professor in the Tandon CSE department and, by courtesy, the Courant CS department. I am also a member of the NYU CILVR Group. I defended my PhD in Computer Science at NYU in 2023.

Abstract: Widely used alignment techniques, such as reinforcement learning from human feedback (RLHF), rely on the ability of humans to supervise model behavior—for example, to evaluate whether a model faithfully followed instructions or generated safe outputs. However, future superhuman models will behave in complex ways too difficult for humans to reliably evaluate; humans will only be able to weakly supervise superhuman models. We study an analogy to this problem: can weak model supervision elicit the full capabilities of a much stronger model? We test this using a range of pretrained language models in the GPT-4 family on natural language processing (NLP), chess, and reward modeling tasks. We find that when we naively finetune strong pretrained models on labels generated by a weak model, they consistently perform better than their weak supervisors, a phenomenon we call weak-to-strong generalization. However, we are still far from recovering the full capabilities of strong models with naive finetuning alone, suggesting that techniques like RLHF may scale poorly to superhuman models without further work. We find that simple methods can often significantly improve weak-to-strong generalization: for example, when finetuning GPT-4 with a GPT-2-level supervisor and an auxiliary confidence loss, we can recover close to GPT-3.5-level performance on NLP tasks. Our results suggest that it is feasible to make empirical progress today on a fundamental challenge of aligning superhuman models.
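For intuition, here is a rough sketch of an auxiliary-confidence objective in the binary case: the strong model is pulled partly toward the weak supervisor's labels and partly toward its own hardened predictions, letting it override weak-label noise when it is confident. The exact functional form, constants, and tensor shapes are assumptions for illustration, not the paper's released code.

import torch
import torch.nn.functional as F

def weak_to_strong_loss(strong_logits, weak_probs, alpha=0.5, threshold=0.5):
    """Blend supervision from the weak model with the strong model's own
    hardened predictions. strong_logits: (batch,) raw scores from the strong
    model; weak_probs: (batch,) probabilities produced by the weak supervisor."""
    strong_probs = torch.sigmoid(strong_logits)
    hard_self = (strong_probs.detach() > threshold).float()  # strong model's own hard labels
    loss_weak = F.binary_cross_entropy(strong_probs, weak_probs)
    loss_self = F.binary_cross_entropy(strong_probs, hard_self)
    return (1 - alpha) * loss_weak + alpha * loss_self

# toy usage
logits = torch.randn(4, requires_grad=True)
weak = torch.tensor([0.9, 0.2, 0.6, 0.4])
print(weak_to_strong_loss(logits, weak))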

FEB
28

Contextual AI Integrity: Balancing Compliance and Reliability

Faeze Brahman

February 28, 2025, 2:00 PM

Virtual Talk

Speaker Bio: Faeze Brahman is a Research Scientist at the Allen Institute for AI (Ai2). Prior to that, she was a postdoctoral researcher at Ai2 and the University of Washington, and she received her PhD from UCSC. Her research focuses on constrained reasoning and generation, understanding LLMs' capabilities and limitations, and bridging the capability gap between humans and models beyond scaling by developing resource-efficient algorithms. She is also interested in designing human-centered AI systems that are reliable and safe for real-world applications.

Abstract: As AI assistants grow increasingly capable, their responsible deployment relies not just on what they can do, but on knowing when not to comply. Moving beyond the traditional safety-focused view of AI noncompliance, I will talk about two projects that tackle this challenge. First, I introduce a taxonomy of contextual noncompliance that identifies when and how models should handle misleading, out-of-scope, or underspecified requests, revealing significant gaps in current systems' ability to do so. Second, I present a selective evaluation framework that enables models to abstain from making unsound judgments when they lack confidence, achieving stronger alignment with human evaluators while remaining cost-effective. Together, these works help create AI systems that are more reliable and safe to use across diverse real-world use cases.
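The abstention idea can be sketched in a few lines: a judge returns a verdict plus a confidence estimate, and the system declines to issue a judgment below a threshold. The judge call, the agreement-based confidence estimate, and the threshold below are placeholders, not the framework's actual interface.

import random

def selective_judge(score_with_confidence, item, confidence_threshold=0.8):
    """Selective evaluation: abstain (defer to a human or a stronger evaluator)
    whenever the judge's confidence falls below the threshold."""
    verdict, confidence = score_with_confidence(item)
    if confidence < confidence_threshold:
        return {"verdict": None, "abstained": True, "confidence": confidence}
    return {"verdict": verdict, "abstained": False, "confidence": confidence}

# toy judge: confidence taken as the agreement rate over repeated samples
def toy_judge(item):
    votes = [random.choice(["pass", "fail"]) for _ in range(5)]
    top = max(set(votes), key=votes.count)
    return top, votes.count(top) / len(votes)

print(selective_judge(toy_judge, "some model response"))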

Past Talks from Fall 2024

NOV
5

Auditing, Understanding, and Leveraging Large Language Models

Robin Jia

November 5, 2024, 4:15 PM

3400 Boelter Hall

Co-located with CS 201 Seminar

Speaker Bio: Robin Jia is an Assistant Professor of Computer Science at the University of Southern California. He received his Ph.D. in Computer Science from Stanford University, where he was advised by Percy Liang. He has also spent time as a visiting researcher at Facebook AI Research, working with Luke Zettlemoyer and Douwe Kiela. He is interested broadly in natural language processing and machine learning, with a focus on scientifically understanding NLP models in order to improve their reliability. Robin’s work has received best paper awards at ACL and EMNLP.

Abstract: The rise of large language models offers opportunities to both scientifically study these complex systems and apply them in novel ways. In this talk, I will describe my group’s recent work along these lines. First, I will discuss data watermarks, a statistically rigorous technique for auditing a language model’s training data based only on black-box model queries. Then, we will investigate how language models memorize training data: based on results from two complementary benchmarks, I will demonstrate the viability of localizing memorized data to a sparse subset of neurons. Next, I will provide a mechanistic account of how pre-trained language models use Fourier features to solve arithmetic problems, and how pre-training plays a critical role in these mechanisms. Finally, I will show how to leverage the complementary strengths of large language models and symbolic solvers to handle complex planning tasks.
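As a rough sketch of the black-box auditing idea, one can compare the model's log-likelihood of a random string that was planted in the training data against many never-published strings of the same form and report a rank-based p-value. The scoring call and the null-string construction below are illustrative assumptions, not the paper's protocol.

import random

def watermark_audit(loglik, inserted_watermark, num_null=999, length=20, seed=0):
    """Rank-based test: how often does a never-published null string score at
    least as high as the planted watermark? `loglik` is a placeholder for a
    black-box scoring call against the audited model."""
    rng = random.Random(seed)
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    nulls = ["".join(rng.choice(alphabet) for _ in range(length)) for _ in range(num_null)]
    target = loglik(inserted_watermark)
    null_scores = [loglik(s) for s in nulls]
    rank = sum(1 for s in null_scores if s >= target)
    return (rank + 1) / (num_null + 1)   # small p-value suggests the string was trained on

# toy model that "memorized" the planted string
planted = "qwxzkvjqpltmrnsbdhgy"
toy_loglik = lambda s: (5.0 if s == planted else 0.0) + random.gauss(0, 1)
print(watermark_audit(toy_loglik, planted, num_null=200))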

NOV
1

Building Accountable NLP Models for Social Good

Jieyu Zhao

November 1, 2024, 2:00 PM

289, Engineering VI

Speaker Bio: Jieyu Zhao is an assistant professor in the Computer Science Department at the University of Southern California, where she leads the LIME lab. Prior to that, she was an NSF Computing Innovation Fellow at the University of Maryland, College Park. Jieyu received her Ph.D. from the Computer Science Department at UCLA. Her research interest lies in the fairness of ML/NLP models. Her research has been covered by news media such as Wired and The Daily Mail. She was invited by UN Women Beijing to a panel discussion about gender equality and social responsibility.

Abstract: The rapid advancement of large language models (LLMs) has unlocked a myriad of possibilities for positive societal impact, ranging from enhancing accessibility and communication to supporting disaster response and public health initiatives. However, the deployment of these technologies also raises critical concerns regarding accountability, fairness, transparency, and ethical use. In this talk, I will discuss our efforts for auditing NLP models, detecting and mitigating biases, and understanding how LLMs make decisions. We hope to open the conversation to foster a community-wide effort towards more accountable and inclusive NLP practices.
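A minimal example of one common auditing pattern: compare a model's scores on prompts that differ only in a demographic term and report the gap. The template, the terms, and the scoring function below are illustrative assumptions, not the lab's released audit suite.

def counterfactual_gap(score, template, terms):
    """Return per-term scores and the max-min gap for one prompt template.
    `score` is a placeholder for a call mapping a prompt to a scalar
    (e.g., probability of a positive continuation)."""
    scores = {term: score(template.format(term=term)) for term in terms}
    gap = max(scores.values()) - min(scores.values())
    return scores, gap

# toy usage with a fake scoring function
toy_score = lambda prompt: 0.7 if "he" in prompt.split() else 0.6
scores, gap = counterfactual_gap(
    toy_score, "The doctor said {term} would arrive soon.", ["he", "she", "they"]
)
print(scores, "gap:", round(gap, 3))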

OCT
25

Translating images into words: From truthful to useful

Elisa Kreiss

October 25, 2024, 2:00 PM

MAXWELL Room 57-124, Engineering IV

Speaker Bio: Elisa Kreiss is an Assistant Professor of Communication at UCLA and the lab director of the Coalas (Computation and Language for Society) Lab. Previously, she completed a PhD in Linguistics at Stanford, where she was a member of Stanford’s NLP group and the Stanford Data Science Center for Open and REproducible Science (CORES). Elisa investigates how we produce and understand language situated in the visual world. Her work combines tools from natural language processing, psycholinguistics, and human-computer interaction to advance our understanding of how communicative context shapes language use. Her research has direct applications to image accessibility – the challenge of (automatically) generating image descriptions for blind and low-vision users. Elisa’s work has been supported by several Google Research Awards, the National Science Foundation, Stanford’s Human-centered AI initiative, and Stanford’s Accelerator for Learning.

Abstract: Developing Vision-Language Models (VLMs) that can easily translate between the linguistic and visual modality in human-like ways has many useful applications, including making visual content accessible to blind and low vision individuals, detecting misinformation, and combating visual illiteracy. While the current generation of VLMs has quickly risen to show human-level performance on many existing benchmarks, there remains a remarkable gap between these scores and how useful the models are found to be in practice. In this talk, I will present recent and ongoing work which suggests that in order to develop and understand the merit of Vision-Language Models for downstream application, we need to define tasks and evaluation metrics that assess the communicative usefulness of the generated texts. Specifically, I will focus on the challenge of generating image descriptions and argue for moving the goal post from what can be said about an image to the fundamentally pragmatic question of what should be said about it. Based on a variety of experiments with sighted and blind and low-vision participants, I will show that the pragmatic notion of contextual relevance is a core pillar of generating human-like image descriptions, provide evidence that our current tasks and evaluation tools in NLP remain unhelpful in uncovering these context effects, and present work that starts addressing this gap. Taken together, this work provides fundamental insights into how people communicate about the visual world, and shows how we can use those insights to advance VLMs for social impact, such as non-visual accessibility.