Data collected by vehicles through cameras, LiDAR and GPS allow researchers to capture video clips of humans in motion and then recreate them in 3D computer simulation. With that, they've created a "biomechanically inspired recurrent neural network" that catalogues human movements.
According to the researchers, they can use this to predict poses and future locations for one or several pedestrians up to about 50 yards from the vehicle.
"Prior work in this area has typically only looked at still images. It wasn't really concerned with how people move in three dimensions," said Ram Vasudevan, U-M assistant professor of mechanical engineering. "But if these vehicles are going to operate and interact in the real world, we need to make sure our predictions of where a pedestrian is going don't coincide with where the vehicle is going next."
Equipping vehicles with the necessary predictive power requires the network to dive into the minutiae of human movement: the pace of a human's gait (periodicity), the mirror symmetry of limbs, and the way in which foot placement affects stability during walking.
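The article does not publish the network's architecture, but the core idea of a recurrent network consuming a sequence of observed poses and emitting the next one can be sketched. The dimensions, weights, and function name below are all illustrative assumptions, not the researchers' actual model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: each pose is a flattened vector of 3D joint
# positions (here, an assumed 15 joints x 3 coordinates).
POSE_DIM, HIDDEN = 45, 64

# Randomly initialised weights stand in for a trained network.
Wx = rng.normal(0, 0.1, (HIDDEN, POSE_DIM))
Wh = rng.normal(0, 0.1, (HIDDEN, HIDDEN))
Wo = rng.normal(0, 0.1, (POSE_DIM, HIDDEN))

def predict_next_pose(pose_sequence):
    """Run a vanilla recurrent cell over a sequence of observed poses
    and emit an estimate of the next pose."""
    h = np.zeros(HIDDEN)
    for pose in pose_sequence:
        h = np.tanh(Wx @ pose + Wh @ h)  # recurrent state update
    return Wo @ h                        # next-pose estimate

observed = rng.normal(size=(10, POSE_DIM))  # ten observed frames
next_pose = predict_next_pose(observed)
```

A biomechanically inspired variant would add constraints (limb symmetry, gait periodicity, foot-placement stability) on top of this basic recurrence.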
Much of the machine learning used to bring autonomous technology to its current level has dealt with two-dimensional images. A computer shown several million photos of a stop sign will eventually come to recognise stop signs in the real world and in real time.
The U-M system studies the first half of each video clip to make its predictions, then verifies their accuracy against the second half.
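That evaluation scheme can be illustrated with a minimal sketch. The clip data here is synthetic and the function name is an assumption; the point is simply that the observed half feeds the model while the held-out half scores it:

```python
import numpy as np

def split_clip(poses):
    """Split a clip of pose frames into an observed half (model input)
    and a held-out half (ground truth for checking predictions)."""
    mid = len(poses) // 2
    return poses[:mid], poses[mid:]

# Toy example: 20 frames of (x, y) pedestrian positions walking
# diagonally across an intersection.
clip = np.linspace([0.0, 0.0], [19.0, 9.5], 20)
observed, future = split_clip(clip)
# A model would consume `observed`, and its predicted trajectory
# would be compared frame-by-frame against `future`.
```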
"Now, we're training the system to recognise motion and make predictions of not just one single thing, such as whether an image shows a stop sign, but where that pedestrian's body will be at the next step and the next and the next," said Matthew Johnson-Roberson, associate professor in U-M's Department of Naval Architecture and Marine Engineering.
"If a pedestrian is playing with their phone, you know they're distracted. Their pose and where they're looking is telling you a lot about their level of attentiveness. It's also telling you a lot about what they're capable of doing next," Asst Prof Vasudevan explained.
The results show that the new system improves a driverless vehicle's ability to anticipate what is most likely to happen next.
"The median translation error of our prediction was approximately 10cm after one second and less than 80cm after six seconds. All other comparison methods were up to 7 metres off," Assoc Prof Johnson-Roberson said. "We're better at figuring out where a person is going to be."
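The metric quoted here, median translation error, is simply the median Euclidean distance between predicted and actual pedestrian positions at a given prediction horizon. A minimal sketch with made-up positions (the real benchmark data is not given in the article):

```python
import numpy as np

def median_translation_error(predicted, actual):
    """Median Euclidean distance between predicted and actual
    pedestrian positions (same units as the inputs, e.g. metres)."""
    errors = np.linalg.norm(np.asarray(predicted) - np.asarray(actual), axis=1)
    return float(np.median(errors))

# Toy (x, y) positions for three pedestrians one second ahead.
pred = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
true = [[1.1, 2.0], [3.0, 4.1], [5.0, 6.0]]
err = median_translation_error(pred, true)  # 0.1, i.e. 10cm
```

Using the median rather than the mean keeps the metric from being dominated by a few badly mispredicted pedestrians.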
To create the dataset used to train U-M's neural network, researchers parked a vehicle with Level 4 autonomous features at several Ann Arbor intersections. With the car's cameras and LiDAR facing the intersection, the vehicle could record multiple days of data at a time.
Researchers bolstered that real-world, "in the wild" data with traditional pose datasets captured in a lab, with the result that the system improved the autonomous vehicle's predictive capabilities.