Expanding Argoverse with Open-Source Autonomous Driving Data from Six U.S. Cities

Today, Argo AI is launching the second release of Argoverse, which includes one of the largest collections of open-source autonomous driving data and high-definition maps from six U.S. cities: Austin, Detroit, Miami, Pittsburgh, Palo Alto, and Washington, D.C. This release builds upon the initial launch of Argoverse in 2019, which was among the first data releases of its kind to include high-definition (HD) maps for machine learning and computer vision research.  

“As Argoverse enters its third year, we remain committed to identifying and filling critical gaps in the public data available to academic researchers,” said James Hays, Principal Scientist at Argo AI. “With Argoverse 2, we specifically curated and designed datasets to enable breakthroughs in long tail challenges that need to be solved for safe autonomous driving at a global scale.”

To give academic communities access to the materials they need, the company is releasing a new dataset to enable advanced research into algorithms that can detect out-of-date HD maps. The company is also sharing one of the largest lidar datasets in the autonomous driving industry, to fuel research into new methods of machine learning using unlabeled point cloud data. 

In the coming weeks, Argo AI will launch a set of global competitions that encourage participants to use the company’s open-source data to drive innovation in core areas of autonomy: 3D object detection and motion forecasting.

Enabling new ways to detect changes in mapped environments

Among the datasets in Argoverse 2, one of the most distinctive is the Map Change Dataset, which is designed specifically to help researchers create models that can detect changes in mapped environments.

High-definition (HD) maps are a key element of autonomous driving because they act as a blueprint for the many fixed objects autonomous vehicles encounter on the road, from stop signs to lane boundaries to traffic lights and beyond. Coupled with a robust sensing suite, HD maps enable autonomous vehicles to navigate safely through dense urban environments. 

But things like construction and new lane markings can change mapped environments. It is critical that as those real-world environments change, HD maps reflect those changes so that they continue to provide the most up-to-date information to the autonomous driving system.

By sharing a map dataset that labels the instances in which the map disagrees with sensor data, Argo is encouraging the development of novel methods for detecting out-of-date map regions. More efficient and accurate detection will play a critical role in scaling self-driving vehicle operations.

One of the largest lidar datasets in the autonomous driving industry

Argoverse 2 also includes the Argoverse 2 Lidar Dataset, one of the largest lidar datasets in the autonomous driving industry, with 6 million lidar frames. With it, researchers will be able to advance key research areas for safe and efficient autonomous driving, such as point cloud forecasting and self-supervised learning, a method of machine learning that uses unlabeled data.

“Argo is releasing these high framerate sequences of raw lidar data because of the crucial role that lidar technology plays in safe autonomous driving,” Hays said. “It’s the technology at the core of most autonomous sensing suites, and releasing this volume of lidar data has the potential to be game-changing for researchers and engineers in academia, who often don’t have access to these kinds of resources.” 

Sparking new research through competitions

To encourage advanced research into core areas of autonomy, Argo organizes competitions that use the Argoverse datasets. In the coming weeks, Argo will launch two new competitions that challenge participants from around the world to innovate in 3D object detection (using computer vision to identify objects on the road) and motion forecasting (predicting where those objects will travel next).

These competitions will rely on new sensor and motion forecasting data that builds upon data from the Argoverse 1 release. The new sensor dataset (“Argoverse 2 Sensor Dataset”) is much larger than its predecessor, with nearly ten times as many sequences and an expanded object taxonomy of 30 object types, drawn from autonomous driving data across six cities. The new motion forecasting dataset (“Argoverse 2 Motion Forecasting”) contains longer sequences, along with object types and richer object attributes such as heading.

Past participants have hailed from many countries outside the United States, including Canada, India, Germany, France, and China. Tapping into the global research community not only provides diverse perspectives on key autonomous driving challenges but also introduces Argo to an entirely new audience. Through Argoverse’s competitions, Argo has been able to identify talented engineers from around the world, some of whom have come to work for the company full-time.

“Argo has always believed that diverse perspectives are needed to solve the thorniest problems of autonomous driving,” said Hays. “With Argoverse 2, we’re enabling advanced research that will benefit everyone, from university students and professors to the autonomous vehicle industry at large.”

To learn more about Argoverse 2, visit www.argoverse.org and sign up for future Argoverse news and announcements.