Junhyeok Kim
Hello! 👋 I'm a Ph.D. student in the MICV Lab at Yonsei University, advised by Prof. Seong Jae Hwang.
I'm currently interested in mechanistic interpretability, vision-language models, and a little bit of medical imaging!
- Recently, my first first-author paper, which interprets ViT features and uncovers circuits to explain the underlying mechanisms of vision transformers, has been accepted to NeurIPS '25!
- I am actively looking for internship opportunities. Please feel free to reach out!
Email
Scholar
LinkedIn
Github
Research
My primary research interest currently lies in Mechanistic Interpretability (MI). Through MI, we can understand which capabilities an AI model specializes in and which capabilities it still requires. I believe this deep understanding of AI models will ultimately serve as a major foundation for advancing toward Artificial General Intelligence (AGI).
Interpreting Attention Heads for Image-to-Text Information Flow in Large Vision-Language Models
Jinyeong Kim, Seil Kang, Jiwoo Park, Junhyeok Kim, Seong Jae Hwang
NeurIPS MechInterp Workshop 2025 Spotlight!
Image-to-text information transfer in LVLMs depends on specialized attention-head groups that are determined by the semantic content of the image.
Interpreting vision transformers via residual replacement model
Junhyeok Kim*, Jinyeong Kim*, Yumin Shim, Joohyeok Kim, Sunyoung Jung, Seong Jae Hwang
NeurIPS 2025
The residual replacement model explains the end-to-end decision-making process of vision transformers at a human-understandable scale.
Backbone Augmented Training for Adaptations
Jae Wan Park, Junhyeok Kim, Youngjun Jun, Hyunah Ko, Seong Jae Hwang
arXiv 2025
To address the scarcity of adaptation data, the backbone model's pre-training data can be selectively utilized to augment the adaptation dataset.
Your large vision-language model only needs a few attention heads for visual grounding
Seil Kang, Jinyeong Kim, Junhyeok Kim, Seong Jae Hwang
CVPR 2025 Highlight!
A few attention heads in frozen LVLMs demonstrate strong visual grounding capabilities. These "localization heads" immediately enable training-free detection and segmentation.
See what you are told: Visual attention sink in large multimodal models
Seil Kang*, Jinyeong Kim*, Junhyeok Kim, Seong Jae Hwang
ICLR 2025
Large multimodal models consistently attend to irrelevant parts of the input. Our work demystifies this phenomenon, dubbed the "visual attention sink," and proposes a simple mitigation strategy.
WoLF: Wide-scope Large Language Model Framework for CXR Understanding
Seil Kang, Junhyeok Kim, Donghyun Kim, Hyo Kyung Lee, Seong Jae Hwang
arXiv 2024
WoLF is a novel Large Language Model framework for chest X-ray (CXR) understanding that integrates patient Electronic Health Records (EHRs).
Miscellaneous
- Need another Junhyeok Kim? He's just one click away! (He is my mate as well as my namesake.)
- I'm the drummer at the MICCAI 2025 Gala Dinner!
- I love music (2023).
- I really love music (2024).
This website is based on Jon Barron's personal website.