Cutting-edge AI techniques - from Vision-Language-Action (VLA) and embodied models to end-to-end training approaches - are showing immense promise for the development of autonomous vehicles (AVs). These tools offer an innovative path to instilling more common-sense intuition into driving policies. At the same time, many of us in the AV industry have increasingly focused on solving long-tail (or edge-case) scenarios as we expand deployments and commercialization. We believe these two trends need to converge to produce a more generalizable autonomous driving capability that delivers on the promise of safe, scalable driverless vehicle deployments.
To accelerate this convergence, Motional is introducing the nuReasoning dataset, the world’s first and largest reasoning-centric, long-tail scenario-based open dataset for autonomous driving development. We’re building on our legacy of making open-source datasets such as nuScenes, nuPlan and nuImages available to the research community, to further the innovation and progress that will make our roads safer for all.
Edge cases are the atypical scenarios that we as human drivers rarely encounter but intuitively learn to handle over time. These long-tail scenarios, as understood across the industry, are tough to handle with classic robotics techniques. Devising protocols for every possible edge case is impossible – there are just far too many.
The nuReasoning dataset incorporates edge case scenes that include:
- Vulnerable Road User (VRU) Behaviors - Interactions involving pedestrians, cyclists, scooter riders, and other VRUs, especially when behavior is unexpected (e.g., walking outside crosswalks, sudden road entry or emergence from occlusions).
- Vehicle Behaviors - Aggressive cut-ins, hard braking, stalled vehicles, wrong-way driving, and interactions with emergency vehicles.
- Environmental and Scene Conditions - Adverse weather, low-visibility lighting, visual clutter (heavy signage, reflections), construction zones, and lane closures.
- Unknown, Out-of-Distribution or Generic Objects - Road debris, animals, unclassified objects, and other hard-to-specify items not typically represented in training data.
While we see great promise in generalized end-to-end architectures, many of them can struggle to safely handle complex edge cases. More critically, some lack the introspection needed to understand why they struggle and how to improve them. Without an intentional focus on training for safety in long-tail scenarios, some can lack the strict rigor and intermediate guardrails required to achieve SAE Level 4 (L4) automation and the threshold of safety that enables the removal of the human operator from behind the wheel.
To bridge this gap and help these models better understand complex scenes, infer interactions, and make safer decisions in rare but critical situations, we need to add reasoning annotations of long-tail events.
Built using a human-model-in-the-loop pipeline to mine Motional’s real-world fleet data, the nuReasoning dataset features:
- High-Quality Scenarios: 20,000 human- and model-verified long-tail scenario clips, representing 105 hours of driving selected from 10,000 hours of human driving logs.
- Rich Annotations: 247,000 diverse reasoning annotations that encompass spatial reasoning (2D-3D object understanding), decision reasoning (action logic and causal inference), and counterfactual reasoning (alternative actions and risk analysis).
- Unmatched Scale: Five times more data than its predecessors, establishing a new benchmark for enabling superior Visual Question Answering (VQA) and planning scores.
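To put the figures in the list above in proportion, here is a minimal Python sketch that derives two simple ratios from them. The constant and function names are illustrative only, and the published totals are presumably rounded, so the results should be read as approximate.

```python
# Headline figures quoted in the feature list above (likely rounded).
NUM_CLIPS = 20_000          # verified long-tail scenario clips
CURATED_HOURS = 105         # hours of driving kept in the dataset
SOURCE_HOURS = 10_000       # hours of human driving logs that were mined
NUM_ANNOTATIONS = 247_000   # reasoning annotations


def dataset_ratios(num_clips, curated_hours, source_hours, num_annotations):
    """Derive simple proportions from the headline totals."""
    return {
        # Average number of reasoning annotations attached to one clip.
        "annotations_per_clip": num_annotations / num_clips,
        # Fraction of the mined driving hours that ended up in the dataset.
        "selection_rate": curated_hours / source_hours,
    }


ratios = dataset_ratios(NUM_CLIPS, CURATED_HOURS, SOURCE_HOURS, NUM_ANNOTATIONS)
print(f"Annotations per clip: {ratios['annotations_per_clip']:.2f}")
print(f"Share of source hours kept: {ratios['selection_rate']:.2%}")
```

On these numbers, each clip carries about a dozen annotations on average, and roughly one percent of the mined driving hours made it into the dataset.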
By providing critical reasoning traces, nuReasoning aids in the integration of language and common-sense reasoning with traditional autonomous vehicle intermediate outputs. This dataset supports the ongoing evolution of artificial intelligence in autonomous driving, paving the way for more robust, explainable L4 large driving models and a safer, faster path to scalable commercial driverless operations.
Edge cases require different types of reasoning, such as spatial, decision and counterfactual reasoning. The driving model has to be able to assess and think through a situation it has never encountered before, just like a human driver would. This requires translating perception information into contextual data and common-sense reasoning about the scene, so that we can generate the right set of trajectories. Consequently, the right dataset becomes the key factor in accelerating the reasoning flywheel.
That’s where nuReasoning comes in. Built in conjunction with the Mobility Lab at UCLA, the nuReasoning dataset consists of more than 20,000 long-tail events, equivalent to more than 105 hours of edge-case driving. These were selected from millions of miles of driving data from our cars in operation in Pittsburgh, Las Vegas, Los Angeles and Singapore.
Each long-tail event consists of video clips at least 20 seconds in length that are embedded with high-quality, human-verified reasoning annotations. The annotations provide a description of the encountered scene, a breakdown of the critical components that most influence decision-making, and the appropriate decision along with the reasoning behind it. These annotations are key to taking developers beyond just perception and recognition and towards the kind of reasoning critical for truly effective end-to-end model development.
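As a rough illustration of the annotation contents described above, the Python sketch below models one annotated clip as a record and checks it for completeness. This is not the released nuReasoning schema: the class, every field name, and the example values are hypothetical, chosen only to mirror the four elements named in the text (scene description, critical components, decision, reasoning) and the 20-second minimum clip length.

```python
from dataclasses import dataclass
from typing import List

# From the text: clips are at least 20 seconds in length.
MIN_CLIP_SECONDS = 20.0


@dataclass
class ReasoningAnnotation:
    """Hypothetical record for one annotated clip (not the official schema)."""
    clip_id: str
    clip_seconds: float
    scene_description: str          # what is happening in the scene
    critical_components: List[str]  # factors that most influence the decision
    decision: str                   # the appropriate driving decision
    decision_reasoning: str         # why that decision is appropriate
    human_verified: bool = False

    def problems(self) -> List[str]:
        """Return a list of completeness issues; empty means the record looks whole."""
        issues = []
        if self.clip_seconds < MIN_CLIP_SECONDS:
            issues.append("clip shorter than 20 seconds")
        if not self.scene_description.strip():
            issues.append("missing scene description")
        if not self.critical_components:
            issues.append("no critical components listed")
        if not self.decision.strip():
            issues.append("missing decision")
        if not self.decision_reasoning.strip():
            issues.append("missing reasoning for the decision")
        if not self.human_verified:
            issues.append("not human verified")
        return issues


# Invented example values, for illustration only.
example = ReasoningAnnotation(
    clip_id="example-0001",
    clip_seconds=22.5,
    scene_description="A pedestrian emerges from behind a parked van mid-block.",
    critical_components=["occluded pedestrian", "parked van", "no crosswalk"],
    decision="Slow down and yield.",
    decision_reasoning="The pedestrian's path crosses the ego lane.",
    human_verified=True,
)
print(example.problems())
```

A record that passes every check returns an empty list, while a clip that is too short or missing its reasoning is flagged with a readable message for each gap.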
We’re releasing the first part of the dataset now. The full set will be available for download in August. We’re beginning a public challenge in September, with final winners selected in December.
We hope the research community embraces the nuReasoning dataset in much the same way they did with nuScenes. As before, we are proud to contribute to the field in order to further research and learning, and ultimately to advance the industry, so we can all benefit from the safer roads that autonomous vehicles can enable.