Eunsu Kim

I am a master's student advised by Professor Alice Oh at the School of Computing, KAIST. Previously, I was a visiting scholar at Carnegie Mellon University, working with Professor Sherry Tongshuang Wu.

My research aims to develop AI systems and agents that serve as meaningful bridges — connecting individuals, societies, and humans with intelligent agents. Currently, I focus on two questions: (1) How effectively can LLMs assist humans in real-world contexts? (2) How well do they understand and represent diverse multicultural and multilingual societies? Feel free to reach out if you'd like to collaborate!

My happiest moment so far!
📍 Ponte Luís I, Porto

Affiliations

Carnegie Mellon University

(Incoming) Ph.D. in the School of Computer Science, Language Technologies Institute Sep 2026 – ???

Carnegie Mellon University

Visiting Scholar in HCII, Host: Sherry Wu Sep 2025 – Mar 2026

KAIST

M.S. in Computer Science, Advisor: Alice Oh Sep 2023 – Present
B.S. in Electrical Engineering Mar 2019 – Aug 2023
GPA 4.02 / 4.3 · Major 4.15 / 4.3 · Summa Cum Laude

Latest News

Apr 2026
🏆 Three papers accepted to the ACL 2026 Main, Findings, and Industry tracks. See you in San Diego!
Apr 2026
🏆 Are they Lovers or Friends? accepted to ACL 2026 Main!
Feb 2026
🏆 Culture-Mixing paper to be presented at CVPR 2026!
Dec 2025
🏆 Two papers at NeurIPS 2025 Workshops: BenchHub @ Efficient Reasoning & ML-IAM @ Climate Change with ML
Sep 2025
🇺🇸 Starting as a visiting scholar at Carnegie Mellon University
Aug 2025
🏆 Two papers accepted to EMNLP 2025 Findings: Uncovering Factor Level Preferences & MUG-Eval
May 2025
🏆 Three papers at ACL 2025 — two Findings + one Main (Oral)!
Mar 2025
🏆 "When Tom Eats Kimchi" won Outstanding Paper Award at NAACL C3NLP!

Selected Publications See all →

* denotes equal contributions

Are they Lovers or Friends? Evaluating LLMs' Social Reasoning in English and Korean Dialogues
Eunsu Kim, Junyeong Park, Juhyun Oh, Kiwoong Park, Seyoung Song, A. Seza Doğruöz, Alice Oh, Najoung Kim
ACL 2026
World in a Frame: Understanding Culture Mixing as a New Challenge for Vision-Language Models
Eunsu Kim*, Junyeong Park*, Na Min An*, Junseong Kim, Hitesh Laxmichand Patel, Jiho Jin, Julia Kruk, Amit Agarwal, Srikant Panda, Fenal Ashokbhai Ilasariya, Hyunjung Shim, Alice Oh
CVPR 2026
BenchHub: A Unified Benchmark Suite for Holistic and Customizable LLM Evaluation
Eunsu Kim*, Haneul Yoo*, Guijin Son, Hitesh Patel, Amit Agarwal, Alice Oh
Preprint, Under Review
MUG-Eval: A Proxy Evaluation Framework for Multilingual Generation Capabilities in Any Language
Seyoung Song*, Seogyeong Jeong*, Eunsu Kim, Jiho Jin, Dongkwan Kim, Jay Shin, Alice Oh
EMNLP 2025 (Findings)
Diffusion Models Through a Global Lens: Are They Culturally Inclusive?
Zahra Bayramli*, Ayhan Suleymanzade*, Na Min An, Huzama Ahmad, Eunsu Kim, Junyeong Park, James Thorne, Alice Oh
ACL 2025 Oral, NAACL 2025 C3NLP Workshop
Text-to-image diffusion models can create compelling images from prompts, but their ability to represent cultural nuances remains limited. This work introduces the CultDiff benchmark, evaluating models on generating culturally specific images across ten countries, revealing significant disparities in cultural relevance, especially for underrepresented regions.
LLM-as-an-Interviewer: Beyond Static Testing Through Dynamic LLM Evaluation
Eunsu Kim, Juyoung Suk, Seungone Kim, Niklas Muennighoff, Dongkwan Kim, Alice Oh
ACL 2025 (Findings)
An evaluation framework that assesses LLM capabilities through an interview-style process. The interviewer LLM evaluates other LLMs by providing feedback and asking follow-up questions, enabling more comprehensive capability assessment beyond static benchmarks.
Uncovering Factor Level Preferences to Improve Human-Model Alignment
Juhyun Oh*, Eunsu Kim*, Jiseon Kim, Wenda Xu, William Yang Wang, Alice Oh
EMNLP 2025 (Findings)
BLEnD: A Benchmark for LLMs on Everyday Knowledge in Diverse Cultures and Languages
Junho Myung, Nayeon Lee, Yi Zhou, Jiho Jin, Rifki Afina Putri, Dimosthenis Antypas, Hsuvas Borkakoty, Eunsu Kim, et al.
NeurIPS D&B 2025
A hand-crafted benchmark evaluating LLMs' everyday cultural knowledge across 16 countries/regions in 13 languages, including low-resource ones. Results show LLMs perform significantly better for cultures highly represented online, with up to a 57% gap in GPT-4 performance.
CLIcK: Evaluation of Cultural and Linguistic Intelligence in Korean
Eunsu Kim, Juyoung Suk, Philhoon Oh, Haneul Yoo, James Thorne, Alice Oh
LREC-COLING 2024
A culturally-aware evaluation benchmark with 1,995 instances across 11 categories of Korean culture, spanning everyday life to specialized subjects, as well as Korean grammar and linguistics.

Beyond Research

I love bread 🥯🥐🥨, table tennis 🏓, and learning new sports. I recently started tennis and yoga!