Lijie Fan's Homepage

Lijie Fan

I am a research scientist at Google DeepMind.

I completed my PhD in Computer Science at MIT EECS, where I also got my master's degree. Before that, I obtained my bachelor’s degree in Computer Science from Tsinghua University.

I train large-scale autoregressive models that generate multimodal outputs. I am the tech lead / core contributor to Fluid, UniFluid, and Gemini Multimodal Generation.

Email: lijiefan[at]alum.mit.edu

Publications

*: equal contribution

Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities
Google Gemini Team
Tech Report Paper

Unified Autoregressive Visual Generation and Understanding with Continuous Tokens
Lijie Fan*, Luming Tang*, Siyang Qin*, Tianhong Li, Xuan Yang, Siyuan Qiao, Andreas Steiner, Chen Sun, Yuanzhen Li, Tao Zhu, Michael Rubinstein, Michalis Raptis, Deqing Sun, Radu Soricut
Preprint PDF / arXiv

Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens
Lijie Fan*, Tianhong Li, Siyang Qin, Yuanzhen Li, Chen Sun, Michael Rubinstein, Deqing Sun, Kaiming He, Yonglong Tian*
ICLR 2025 PDF / arXiv

Fractal Generative Models
Tianhong Li, Qinyi Sun, Lijie Fan, Kaiming He
Preprint PDF / arXiv / code

Learning Vision from Models Rivals Learning Vision from Data
Yonglong Tian*, Lijie Fan*, Kaifeng Chen, Dina Katabi, Dilip Krishnan, Phillip Isola
CVPR 2024 PDF / arXiv / code

Scaling Laws of Synthetic Images for Model Training ... for Now
Lijie Fan*, Kaifeng Chen, Dilip Krishnan, Dina Katabi, Phillip Isola, Yonglong Tian*
CVPR 2024 PDF / arXiv / code

Improving CLIP Training with Language Rewrites
Lijie Fan*, Dilip Krishnan, Phillip Isola, Dina Katabi, Yonglong Tian*
NeurIPS 2023 PDF / arXiv / code

StableRep: Synthetic Images from Text-to-Image Models Make Strong Visual Representation Learners
Yonglong Tian*, Lijie Fan*, Phillip Isola, Huiwen Chang, Dilip Krishnan
NeurIPS 2023 PDF / arXiv / code / MIT News

Reparo: Loss-Resilient Generative Codec for Video Conferencing
Tianhong Li, Vibhaalakshmi Sivaraman, Lijie Fan, Mohammad Alizadeh, Dina Katabi
Preprint PDF / arXiv

Visual Dependency Transformers: Dependency Tree Emerges from Reversed Attention
Mingyu Ding, Yikang Shen, Lijie Fan, Zhenfang Chen, Zitian Chen, Ping Luo, Joshua B Tenenbaum, Chuang Gan
CVPR 2023 PDF / arXiv / code

Making Contrastive Learning Robust to Shortcuts
Tianhong Li*, Lijie Fan*, Yuan Yuan, Hao He, Yonglong Tian, Rogerio Feris, Piotr Indyk, Dina Katabi
WACV 2023 PDF / arXiv / Talk (by Dina)

Targeted Supervised Contrastive Learning for Long-Tailed Recognition
Tianhong Li*, Peng Cao*, Yuan Yuan, Lijie Fan, Yuzhe Yang, Rogerio Feris, Piotr Indyk, Dina Katabi
CVPR 2022 PDF / arXiv / code

Unsupervised Learning for Human Sensing Using Radio Signals
Tianhong Li*, Lijie Fan*, Yuan Yuan*, Dina Katabi
WACV 2022 PDF / arXiv

When Does Contrastive Learning Preserve Adversarial Robustness from Pretraining to Finetuning?
Lijie Fan, Sijia Liu, Pin-Yu Chen, Gaoyuan Zhang, Chuang Gan
NeurIPS 2021 Project Page / PDF / arXiv / Code / TechTalks

In-Home Daily-Life Captioning Using Radio Signals
Lijie Fan*, Tianhong Li*, Yuan Yuan, Dina Katabi
ECCV 2020 Project Page / PDF / arXiv / Slides / Demo / Video / Talk / CSAIL News / BBC / TechCrunch / Engadget / VentureBeat
Oral Presentation

Learning Longterm Representations for Person Re-Identification Using Radio Signals
Lijie Fan*, Tianhong Li*, Rongyao Fang*, Rumen Hristov, Yuan Yuan, Dina Katabi
CVPR 2020 Project Page / PDF / arXiv / Video / CSAIL News / TechCrunch / Yahoo News

Making the Invisible Visible: Action Recognition Through Walls and Occlusions
Tianhong Li*, Lijie Fan*, Mingmin Zhao, Yingcheng Liu, Dina Katabi
ICCV 2019 Project Page / PDF / arXiv / Video / MIT Technology Review

Real-time Through-wall Human Activity Recognition using Radio Signals
ECCV 2020 Demo Project Page / Video

Controllable Image-to-Video Translation: A Case Study on Facial Expression Generation
Lijie Fan, Wenbing Huang, Chuang Gan, Junzhou Huang, Boqing Gong
AAAI 2019 Project Page / PDF / arXiv
Oral Presentation

End-to-End Learning of Motion Representation for Video Understanding
Lijie Fan*, Wenbing Huang*, Chuang Gan, Stefano Ermon, Boqing Gong, Junzhou Huang
CVPR 2018 Project Page / PDF / arXiv / Code / Talk
Spotlight Presentation

Towards Efficient Action Recognition: Principal Backpropagation for Training Two-Stream Networks
Wenbing Huang*, Lijie Fan*, Mehrtash Harandi, Lin Ma, Huaping Liu, Wei Liu, Chuang Gan
IEEE Transactions on Image Processing (T-IP) 2019 PDF

Adversarial Localization Network
Lijie Fan, Shengjia Zhao, Stefano Ermon
NIPS 2017 Workshop on Learning with Limited Labeled Data PDF

Efficient Optimization for Linear Dynamical Systems with Applications to Clustering and Sparse Coding
Wenbing Huang, Mehrtash Harandi, Tong Zhang, Lijie Fan, Fuchun Sun, Junzhou Huang
NIPS 2017 PDF / Code

Professional Services

Area Chair & Tutorial Chair: ICCV 2025

Conference Reviewer: NeurIPS, ICML, CVPR, ICCV, ECCV, AAAI, WACV

Journal Reviewer: TPAMI

Misc

I do landscape photography; check out my photos on Instagram.

I'm fond of snowboarding, especially carving and ground tricks.