I'm a Lead (Staff) Research Scientist at the Bosch AI Research Center in Silicon Valley, where I lead a team focused on 3D vision and spatial AI research. In addition to advancing the research frontier, my work has been translated into large-scale enterprise solutions, including assisted driving and parking systems, industrial augmented reality (AR) applications, and AI-powered home robotics.
Computer Vision, 3D Vision, Physical AI
My research focuses on enabling physical intelligence: systems that learn new skills through interaction with 3D environments and generalize across new cameras, embodiments, and scenarios. I pursue this through three core pillars:
Unified 3D Vision that generalizes across diverse robotic platforms and real-world conditions.
Scalable Neural Reconstruction with generative models for the creation of interactable digital twins.
Robotic Lifelong Learning from 3D experience (an ongoing and future interest).
News
[2025]
- Co-Chair, Robot Mapping 2 session, ICRA 2025
- 1 paper accepted to CVPR 2025
- 1 paper accepted to ICRA 2025
- 1 paper accepted to IEEE IV 2025
[2024]
- 2 papers accepted to ECCV 2024
- 2 papers accepted to CVPR 2024
- 1 paper accepted to IROS 2024
[2023]
- 1 paper accepted to NeurIPS 2023
[2022]
- 2 papers accepted to CVPR 2022 (1 Oral Presentation)
- 1 paper accepted to WACV 2022
A fully online system that effectively integrates dense CLIP features with Gaussian Splatting. High-resolution dense CLIP embedding and online compressor learning modules are introduced to serve dense language mapping in real time (40+ FPS) while retaining open-vocabulary capability for flexible query-based human-machine interaction.
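As a rough illustration of the query step, per-Gaussian language features can be scored against a CLIP text embedding by cosine similarity. This is a minimal sketch with illustrative names and random placeholder features, not the system's actual implementation:

```python
import numpy as np

def open_vocab_scores(gaussian_feats, text_emb):
    """Cosine similarity between each Gaussian's compressed language
    feature (N, D) and one CLIP text embedding (D,). Illustrative only."""
    g = gaussian_feats / np.linalg.norm(gaussian_feats, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb)
    return g @ t  # (N,) scores in [-1, 1]

# Example: 4 Gaussians with 8-D compressed features, one text query
rng = np.random.default_rng(0)
scores = open_vocab_scores(rng.standard_normal((4, 8)), rng.standard_normal(8))
```

Thresholding or softmax-normalizing such scores would then select the Gaussians relevant to the query.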
Depth Any Camera (DAC) is a training framework for metric depth estimation that enables zero-shot generalization across cameras with diverse fields of view—including fisheye and 360° images. Tired of collecting new data for every camera setup? DAC maximizes the utility of existing 3D datasets, making them applicable to a wide range of camera types without the need for retraining.
SMART augments online topology reasoning with robust map priors learned from scalable SD and satellite maps, substantially improving lane perception and topology reasoning.
A monocular object reconstruction framework effectively integrating object pose estimation and NeRF-based reconstruction. A novel camera-invariant pose estimation module is introduced to resolve depth-scale ambiguity and enhance cross-domain generalization.
An advanced Gaussian Splatting method effectively fusing LiDAR and surround-view camera inputs for autonomous driving. The method uniquely leverages an intermediate occ-tree feature volume ahead of GS, so that GS parameters can be initialized more effectively from the 3D surface generated by the feature volume.
An effective framework leveraging lightweight and scalable priors, Standard Definition (SD) maps, for the estimation of online vectorized HD map representations.
A mathematical framework proving that the dice loss yields superior noise robustness and model convergence for large objects compared to regression losses. A flexible monocular 3D detection pipeline integrated with bird's-eye-view segmentation.
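For reference, the soft dice loss analyzed in this line of work is commonly written as follows (this is the standard formulation; the paper's exact variant may differ):

```latex
\mathcal{L}_{\text{dice}} = 1 - \frac{2\sum_{i} p_i\, g_i + \epsilon}{\sum_{i} p_i + \sum_{i} g_i + \epsilon}
```

where \(p_i\) are predicted probabilities, \(g_i\) are ground-truth labels, and \(\epsilon\) is a smoothing constant. Because the loss normalizes by object size, its gradients do not grow with the number of foreground pixels, which is the intuition behind its robustness for large objects.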
A neural reconstruction method enabling the completion of occluded surfaces in large-scale 3D scene reconstruction. A milestone in automating the creation of interactable digital twins from the real world.
The first vision transformer approach to 360° monocular depth estimation that handles spherical distortion. Novel designs include tangent-image coordinate embedding and geometry-aware feature fusion.
A real-time method to predict multi-person 3D poses from a depth image. A new part-level representation enables an explicit fusion process between bottom-up part detection and global pose detection. A new 3D human posture dataset with challenging multi-person occlusions is also introduced.
A joint model of learned part-based appearance and parametric shape representation to precisely estimate the highly articulated poses of multiple laboratory animals.
One-shot learning gesture recognition on RGB-D data recorded with Microsoft Kinect. A novel bag-of-manifold-words (BoMW) feature representation on symmetric positive definite (SPD) manifolds.
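SPD-manifold representations typically map covariance descriptors into a flat tangent space via the matrix logarithm, where Euclidean tools such as bag-of-words clustering apply. A minimal sketch of the standard log-Euclidean machinery (illustrative, not the paper's code):

```python
import numpy as np

def spd_log(A):
    # Matrix logarithm of a symmetric positive definite (SPD) matrix
    # via eigendecomposition: log(A) = V diag(log(w)) V^T.
    w, V = np.linalg.eigh(A)
    return (V * np.log(w)) @ V.T

def log_euclidean_dist(A, B):
    # Log-Euclidean distance: Frobenius norm in the log-mapped (flat)
    # space, where ordinary Euclidean clustering can be applied.
    return np.linalg.norm(spd_log(A) - spd_log(B))
```

A codebook of "manifold words" can then be built by k-means on the log-mapped matrices.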
This study investigates the relative diagnosticity and the optimal combination of multiple cues (we consider luminance, color, motion and binocular disparity) for boundary detection in natural scenes. A multi-cue boundary dataset is introduced to facilitate the study.
A multi-stage approach to curve extraction where the curve fragment search space is iteratively reduced by removing unlikely candidates using geometric constraints, but without affecting recall, to a point where the application of an objective functional becomes appropriate.
Industrial Impact
Bosch Video-Only Autonomous Parking Solution demonstrated at Bosch Experience Day 2024
AR-Assisted Assembly Production Lines deployed at Bosch-Siemens Appliance Factories, 2022
Baidu Apollo Autonomous Driving Platform, the world’s first open autonomous driving platform, 2019
Selected Patents
Yuliang Guo, Xinyu Huang, Liu Ren, Systems and methods for providing product assembly step recognition using augmented reality, US Patent 11,715,300, 2023
Yuliang Guo, Xinyu Huang, Liu Ren, Semantic SLAM Framework for Improved Object Pose Estimation, US Patent App. 17/686,677, 2023
Yuliang Guo, Zhixin Yan, Yuyan Li, Xinyu Huang, Liu Ren, Method for fast domain adaptation from perspective projection image domain to omnidirectional image domain in machine perception tasks, US Patent App. 17/545,673, 2023
Yuliang Guo, Tae Eun Choe, KaWai Tsoi, Guang Chen, Weide Zhang, Determining vanishing points based on lane lines, US Patent 11,227,167, 2022
Tae Eun Choe, Yuliang Guo, Guang Chen, KaWai Tsoi, Weide Zhang, Sensor calibration system for autonomous driving vehicles, US Patent 10,891,747, 2021