Naji Khosravan

Computer Science Ph.D.

Senior Applied Science Manager (AI/ML)

ABOUT ME

I'm passionate about developing cutting-edge Computer Vision and Machine Learning technologies, turning research into real-world impact.

Currently, I lead part of Adobe’s Firefly VideoGen Org, advancing and scaling video generation and multimodal foundation models to power the next generation of creative tools. Previously, I was a Senior Applied Science Manager at Zillow Group's AI Org, where I led the development of 3D and multimodal CV/ML models to elevate Zillow’s product experiences.

I received my Ph.D. in Computer Science from the Center for Research in Computer Vision (CRCV) at the University of Central Florida (UCF), specializing in Computer Vision and Deep Learning. My research focused on Human-Centered AI for analyzing volumetric medical scans using smart glasses. During my Ph.D., I completed a couple of internships at Netflix Research, working on attention-based methods for video, and an internship at Sighthound, a startup where I worked on representation learning for multi-object tracking.

My research interests lie at the intersection of Generative Models, Multimodal Learning, and 3D Computer Vision. I focus on enabling AI systems to develop a deeper understanding of the physical world through generative/world modeling or general-purpose multimodal and multitask visual intelligence.

NEWS & HIGHLIGHTS

Oct 2025:

I'm organizing the Multi-Modal Reasoning for Agentic Intelligence workshop in conjunction with ICCV'25.

Feb 2025:

Our paper TFM2 was accepted for publication at WACV 2025.

Feb 2025:

Our paper ReStyle3D was accepted for publication at SIGGRAPH 2025.

Jul 2024:

Giving an invited talk at the Gradient Spaces group at Stanford University.

Jun 2024:

We released ZInD-Tell at CVPR'24, enhancing the Zillow Indoor Dataset with natural-language descriptions of room layouts.

Jun 2023:

I'm organizing the CIVILS workshop in conjunction with CVPR'23.

Jun 2023:

I'm co-organizing the FedVision workshop in conjunction with CVPR'23.

Jun 2023:

Our paper Graph-CoVis received the Best Paper Award at the OmniCV workshop at CVPR'23.

Jun 2022:

Our paper LASER, on visual localization in textureless indoor scenes, was selected for an oral presentation at CVPR'22.

Jun 2022:

We launched Zillow's AS GitHub landing page to open-source our research projects.

Jun 2021:

We released the Zillow Indoor Dataset (ZInD), the largest real dataset of 360° panoramas and floor plans (academic use only), at CVPR'21.

SELECT PROJECTS

TFM2: Training-Free Mask Matching for Open-Vocabulary Semantic Segmentation

Yaoxin Zhuo, Zachary Bessinger, Lichen Wang, Naji Khosravan, Baoxin Li, Sing Bing Kang

WACV, 2025

ReStyle3D: Scene-Level Appearance Transfer with Semantic Correspondences

Liyuan Zhu, Shengqu Cai, Shengyu Huang, Gordon Wetzstein, Naji Khosravan, Iro Armeni

SIGGRAPH, 2025

ZInD-Tell: Towards Translating Indoor Panoramas into Descriptions

Tonmoay Deb, Lichen Wang, Zachary Bessinger, Naji Khosravan, Eric Penner, Sing Bing Kang

CVPRW, 2024

iBARLE: imBalance-Aware Room Layout Estimation

Taotao Jing, Lichen Wang, Naji Khosravan, Zhiqiang Wan, Zachary Bessinger, Zhengming Ding, Sing Bing Kang

WACV, 2024

Graph-CoVis: GNN-Based Multi-View Panorama Global Pose Estimation

Negar Nejatishahidin, Will Hutchcroft, Manjunath Narayana, Ivaylo Boyadzhiev, Yuguang Li, Naji Khosravan, Jana Košecká, Sing Bing Kang

CVPRW, 2023 (OmniCV Best Paper Award)

Generating Topological Structure of Floorplans from Room Attributes

Yu Yin, Will Hutchcroft, Naji Khosravan, Ivaylo Boyadzhiev, Yun Fu, Sing Bing Kang

ICMR, 2022

LASER: Latent Space Rendering for 2D Visual Localization

Zhixiang Min, Naji Khosravan, Zachary Bessinger, Manjunath Narayana, Sing Bing Kang, Enrique Dunn, Ivaylo Boyadzhiev

CVPR, 2022 (Oral Presentation)

Zillow Indoor Dataset: Annotated Floor Plans with 360° Panoramas and 3D Room Layouts

Steve Cruz, Will Hutchcroft, Yuguang Li, Naji Khosravan, Ivaylo Boyadzhiev, Sing Bing Kang

CVPR, 2021

Deformable Capsules for Object Detection

Rodney LaLonde, Naji Khosravan, Ulas Bagci

Advanced Intelligent Systems, 2021

On Attention Modules for Audio-Visual Synchronization

Naji Khosravan, Shervin Ardeshir, Rohit Puri

CVPRW, 2019

Collaborative Artificial Intelligence Algorithms for Medical Imaging Applications

Naji Khosravan

Ph.D. Dissertation, 2019

PAN: Projective Adversarial Network for Medical Image Segmentation

Naji Khosravan, Aliasghar Mortazi, Michael Wallace, Ulas Bagci

MICCAI, 2019

A Collaborative Computer Aided Diagnosis (C-CAD) System with Eye-Tracking, Sparse Attentional Model, and Deep Learning

Naji Khosravan, Haydar Celik, Baris Turkbey, Elizabeth C Jones, Bradford Wood, Ulas Bagci

Medical Image Analysis, 2019

Semi-Supervised Multi-Task Learning for Lung Cancer Diagnosis

Naji Khosravan, Ulas Bagci

IEEE EMBC, 2018

S4ND: Single-Shot Single-Scale Lung Nodule Detection

Naji Khosravan, Ulas Bagci

MICCAI, 2018

Gaze2Segment: A Pilot Study for Integrating Eye-Tracking Technology into Medical Image Segmentation

Naji Khosravan, Haydar Celik, Baris Turkbey, Ruida Cheng, Evan McCreedy, Matthew McAuliffe, Sandra Bednarova, Elizabeth Jones, Xinjian Chen, Peter Choyke, Bradford Wood, Ulas Bagci

MICCAIW, 2017