The method enables the detection of objects without the generation of huge datasets, a task previously thought to be essential.
A major challenge in computer vision has been improving image recognition performance under poor lighting conditions for applications such as in-vehicle cameras and surveillance systems. Until now, the approach has been a deep learning method that uses RAW image data from sensors, called “Learning to See in the Dark”. This method, however, requires a dataset of more than 200,000 images with more than 1.5 million annotations for end-to-end learning and, as a result, is both costly and time-prohibitive.
The joint research team has developed a new domain adaptation method that builds the required model from existing datasets, using machine learning techniques such as transfer learning and knowledge distillation. It addresses the challenge in four steps: building an inference model with existing datasets; extracting knowledge from that inference model; merging the models with glue layers; and, finally, building generative models by knowledge distillation. The desired image recognition model can thus be trained using only existing datasets.
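To make the distillation step more concrete, here is a minimal sketch of the general idea, not the team's implementation. It assumes a toy setting in which the "glue layer" is a single affine map (one scale and one bias) and "knowledge" is a short feature vector taken from a teacher model. The function names `glue`, `mse` and `distill_glue` are hypothetical and chosen only for illustration.

```python
def mse(a, b):
    """Mean squared error between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def glue(features, scale, bias):
    """Toy 'glue layer' (an assumption): one affine map applied to every feature."""
    return [scale * f + bias for f in features]


def distill_glue(student_feats, teacher_feats, lr=0.05, steps=1000):
    """Fit the glue layer by gradient descent so that the glued student
    features match the teacher's features (feature-matching distillation)."""
    scale, bias = 1.0, 0.0
    n = len(student_feats)
    for _ in range(steps):
        out = glue(student_feats, scale, bias)
        # Gradients of the MSE loss with respect to scale and bias.
        g_scale = sum(2 * (o - t) * s
                      for o, t, s in zip(out, teacher_feats, student_feats)) / n
        g_bias = sum(2 * (o - t) for o, t in zip(out, teacher_feats)) / n
        scale -= lr * g_scale
        bias -= lr * g_bias
    return scale, bias


# Toy data: the teacher's features are an affine function of the student's.
student = [0.0, 1.0, 2.0, 3.0]
teacher = [1.0, 3.0, 5.0, 7.0]
scale, bias = distill_glue(student, teacher)
print(round(scale, 3), round(bias, 3))
```

In a real system the features would come from intermediate layers of neural networks and the glue layers would themselves be trainable network layers. The point here is only that the student side is trained against the teacher's outputs rather than against newly annotated data.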
Using this domain adaptation method, the team has built an object detection model, "YOLO in the Dark", which applies the YOLO model to RAW images taken in extremely dark conditions. The model can be trained on RAW images using the existing datasets, without generating additional ones. Whereas the existing YOLO model fails to detect objects even after the images' brightness has been enhanced, the proposed model recognizes the RAW images directly and detects the objects. Critically, the proposed model needs about half the computing resources of the baseline model, which is a combination of previous models.
This "direct recognition of RAW images" is expected to be used for object detection in extremely dark conditions, along with many other applications. Socionext said that it plans to incorporate the new method into the company’s image signal processors to develop new SoCs, as well as new camera systems built around such SoCs, and to offer leading-edge solutions for applications that require high-performance image recognition, including automotive, security and industrial.
(The research will be presented at the European Conference on Computer Vision (ECCV) 2020, held online from August 23 through 28.)