RoboTwin: Dual-Arm Robot Benchmark with Generative Digital Twins (early version)

  • Yao Mu * 1 3, Tianxing Chen * 1 3 4, Shijia Peng * 2 4, Zanxin Chen * 2 4,
    Zeyu Gao 5, Yude Zou 4, Lunkai Lin 2, Zhiqiang Xie 2, Ping Luo† 1
  • 1 The University of Hong Kong,
  • 2 AgileX Robotics,
  • 3 Shanghai AI Laboratory,
  • 4 Shenzhen University,
  • 5 Institute of Automation, Chinese Academy of Sciences

*Equal Contribution, †Corresponding Author

Abstract

Effective collaboration between dual-arm robots and their tool-use capabilities are increasingly important areas in the advancement of robotics, playing a significant role in expanding robots' ability to operate in diverse real-world environments. However, progress is impeded by the scarcity of specialized training data. This paper introduces RoboTwin, a novel benchmark dataset for dual-arm robotic scenarios that combines real-world teleoperated data with synthetic data from digital twins. Using the COBOT Magic platform, we have collected diverse data on tool usage and human-robot interaction. We present an innovative approach to creating digital twins using AI-generated content, transforming 2D images into detailed 3D models. Furthermore, we utilize large language models to generate expert-level training data and functionality-oriented, task-specific pose sequences. Our key contributions are: 1) the RoboTwin benchmark dataset, 2) an efficient real-to-simulation pipeline, and 3) the use of language models for automatic expert-level data generation. These advancements are designed to address the shortage of robotic training data, potentially accelerating the development of more capable and versatile robotic systems for a wide range of real-world applications.

RoboTwin's AIGC & Expert Data Generation Pipeline


RoboTwin's pipeline for AIGC and expert data generation begins by automatically extracting an object segmentation mask and a textual description from a single RGB photo. It then generates 3D geometry, surface normals, wireframes, and texture maps to create a high-fidelity simulation object. With the object's surface normals and pose information, we decompose and generate grasping postures. Finally, we leverage the capabilities of large models to generate expert data for tasks in a zero-shot manner.
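To make this control flow concrete, below is a minimal, self-contained Python sketch of the pipeline. Every function and class name is a hypothetical placeholder for the corresponding component (segmentation model, image-to-3D generator, grasp module, LLM planner); none of it is RoboTwin's released code.

    # Hypothetical sketch of the generation pipeline's control flow.
    # All names below are illustrative placeholders, not RoboTwin's API.
    from dataclasses import dataclass

    @dataclass
    class SimObject:
        mesh: str      # stand-in for the generated 3D geometry
        normals: str   # stand-in for per-face surface normals
        texture: str   # stand-in for the generated texture map

    def segment_and_caption(rgb_photo: str) -> tuple[str, str]:
        # Placeholder: 2D object segmentation + textual description.
        return "object_mask", "a claw hammer with a wooden handle"

    def generate_3d(mask: str, description: str) -> SimObject:
        # Placeholder: 2D-to-3D generation of geometry, normals, texture.
        return SimObject("hammer.obj", "normals.npy", "texture.png")

    def compute_grasp_poses(obj: SimObject) -> list[str]:
        # Placeholder: grasp decomposition from surface normals and the
        # object's functional axes (see the Digital Twin section).
        return ["grasp_handle"]

    def plan_with_llm(task: str, grasps: list[str]) -> list[str]:
        # Placeholder: zero-shot pose-sequence generation by a large model.
        return ["approach", *grasps, "lift", "strike_block"]

    def generate_expert_episode(rgb_photo: str, task: str) -> list[str]:
        mask, description = segment_and_caption(rgb_photo)
        obj = generate_3d(mask, description)
        return plan_with_llm(task, compute_grasp_poses(obj))

    print(generate_expert_episode("photo.png", "beat the block with the hammer"))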

Benchmark

To further research and development in this area, we introduce a comprehensive benchmark specifically designed to assess dual-arm robots across a variety of scenarios. The benchmark encompasses a diverse set of tasks, each presenting unique challenges that are critical for assessing the dexterity, coordination, and operational efficiency of robotic arms in a simulated environment. The tasks range from simple object manipulation to complex actions that require synchronized movements of both arms.

Each task below is visualized from four camera views: (a) observer, (b) left, (c) top, and (d) right.

1. Block Hammer Beat (Expert)
2. Empty Cup Place (Expert)
3. Dual-Bottles Pick (Expert)
4. Block Sweep (Expert)
5. Apple Cabinet Storage (Expert)
6. Block Handover (Expert)

Digital Twin

We assign specific coordinate axes to the functional parts of objects within the model. For a hammer, for instance, one axis is aligned with the hammerhead, identifying the functional part, while another axis indicates the approach direction. This alignment is crucial for automating the calculation of grasp poses and enables reliable robotic manipulation and tool usage. Grasp poses are computed perpendicular to the surface normal of the functional part, along the designated approach-direction axis, ensuring correct and efficient tool use with minimal manual intervention.
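As a minimal sketch of this computation (assuming a numpy implementation; the function name, the standoff parameter, and the frame conventions are our own illustrative choices, not RoboTwin's released code):

    import numpy as np

    def grasp_pose(part_center, surface_normal, approach_axis, standoff=0.08):
        # Gripper z-axis: the approach direction annotated on the object.
        z = approach_axis / np.linalg.norm(approach_axis)
        # Gripper closing (x) axis: perpendicular to the functional
        # part's surface normal, as described above.
        x = np.cross(surface_normal, z)
        if np.linalg.norm(x) < 1e-8:  # degenerate: normal parallel to approach
            helper = np.array([1.0, 0.0, 0.0]) if abs(z[0]) < 0.9 \
                else np.array([0.0, 1.0, 0.0])
            x = np.cross(helper, z)
        x = x / np.linalg.norm(x)
        y = np.cross(z, x)  # completes a right-handed frame
        T = np.eye(4)
        T[:3, :3] = np.column_stack([x, y, z])
        T[:3, 3] = part_center - standoff * z  # pre-grasp standoff along approach
        return T

    # Example: part at the origin, surface normal +y, approaching from above.
    print(grasp_pose(np.zeros(3), np.array([0.0, 1.0, 0.0]),
                     np.array([0.0, 0.0, -1.0])))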

Real-to-simulation examples, each shown from the (a) left, (b) top, and (c) right views:

  • Ping Pong Sweep (Real) and Block Sweep (Sim)
  • Empty Cup Place (Real) and Empty Cup Place (Sim)

Dataset

Hardware Platform

For the acquisition of real-world data, we employed the open-source COBOT Magic platform from AgileX Robotics (https://global.agilex.ai/products/cobot-magic), which is equipped with four AgileX Arms and four Intel RealSense D435 RGB-D cameras and is built on the Tracer chassis. The cameras are strategically positioned: one on the high part of the stand for an expansive field of view, two on the wrists of the robot's arms, and one on the low part of the stand, which is optional. The front, left, and right cameras capture data simultaneously at 30 Hz.

Data collection and alignment are facilitated by tools provided by the ARIO Data Alliance, available at our GitHub repository. Each captured frame consists of three images, one per camera, each providing an RGB and a depth image at a resolution of 640 x 480 pixels. The data also includes the joint and end-effector poses of the robotic arms for both master and slave configurations, covering both the left and right arms. For motion tasks, the linear and angular velocities of the differential-drive chassis are recorded as well. All data storage and formatting adhere to the unified standards established by the ARIO Data Alliance.
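For concreteness, here is a hypothetical sketch of one captured frame's layout, inferred from the description above; the field names and types are illustrative only and do not reflect the actual ARIO storage format.

    from dataclasses import dataclass
    import numpy as np

    @dataclass
    class CameraFrame:
        rgb: np.ndarray    # (480, 640, 3) uint8 color image
        depth: np.ndarray  # (480, 640) depth image, aligned to the RGB

    @dataclass
    class ArmState:
        joint_positions: np.ndarray    # per-joint angles
        end_effector_pose: np.ndarray  # 4x4 homogeneous pose

    @dataclass
    class Frame:               # one record, captured at 30 Hz
        front: CameraFrame     # high-mounted camera (expansive view)
        left_wrist: CameraFrame
        right_wrist: CameraFrame
        master_arms: dict      # {"left": ArmState, "right": ArmState}
        slave_arms: dict       # {"left": ArmState, "right": ArmState}
        chassis_linear_vel: float   # m/s, motion tasks only
        chassis_angular_vel: float  # rad/s, motion tasks only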

Experiment


Our experimental aim is not to delve into the design choices of different policy networks, but to verify the correctness and effectiveness of our benchmark's expert data. The experiments are intended to verify: a) the soundness of the COBOT Magic platform setup, and b) the effectiveness of the automatically generated expert data.

Bibtex

      @misc{mu2024robotwindualarmrobotbenchmark,
        title={RoboTwin: Dual-Arm Robot Benchmark with Generative Digital Twins (early version)}, 
        author={Yao Mu and Tianxing Chen and Shijia Peng and Zanxin Chen and Zeyu Gao and Yude Zou and Lunkai Lin and Zhiqiang Xie and Ping Luo},
        year={2024},
        eprint={2409.02920},
        archivePrefix={arXiv},
        primaryClass={cs.RO},
        url={https://arxiv.org/abs/2409.02920}, 
    }

Video Show of DP3 Deployment

Each task below is shown in four panels: (a) observer view of a success, (b) top view of a success, (c) observer view of a failure, and (d) top view of a failure.

  • Hammer Beat
  • Block Handover
  • Apple Cabinet Storage
  • Dual-Bottles Pick
  • Empty Cup Place
  • Block Sweep

Acknowledgements

The authors extend their gratitude to D-robotics for supplying the cloud computing resources that supported this research, and to Deeoms for providing essential model support, which was pivotal to the completion of this study.