Abstract: Transformer-based object detection models usually adopt an encoding-decoding architecture that mainly combines self-attention (SA) and multilayer perceptron (MLP). Although this architecture ...
Abstract: Estimating the poses of new objects is a challenging problem. Although many methods have been developed for instance-level object pose estimation, they often struggle when faced with ...
Recent advances in agentic workflows have enabled the automation of tasks such as professional document generation. However, they primarily focus on textual quality, neglecting visual structure and ...
SAM 3D Objects is a foundation model that reconstructs full 3D shape geometry, texture, and layout from a single image, excelling in real-world scenarios with occlusion and clutter by using ...