Do you want to detect a T72 tank in your drone's vision software?
Do you want to see where the tank's most vulnerable parts are, e.g., its optics, wheels, or engine?
Object recognition in videos and images is unthinkable today without leveraging the capabilities of deep neural networks (DNNs). This development is only a few years old: it began at the end of the 2000s with the design and implementation of very large neural networks on Nvidia graphics cards. Newly developed network topologies, combined with the unprecedented computing power of GPUs, triggered a revolution whose beginning we are still witnessing. John Hopfield and Geoffrey Hinton were awarded the 2024 Nobel Prize in Physics for their contributions to these developments.
The availability of suitable training data forms the basis of well-functioning DNN-based image recognition and classification. Approximately 1000 annotated images, covering different scenarios, are required for each object class. But what if real images of these object classes are unavailable, for example because the data is classified, because the object exists only as construction sketches, or because your system is being prepared for future crisis and conflict situations?
This is where 3D data comes into play, rendered in scenarios as realistic as possible. It has a huge advantage: image annotations, i.e. the image regions that an object occupies and the associated object label, do not have to be created manually but can largely be generated automatically.
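To illustrate how annotations fall out of the rendering process for free, here is a minimal sketch. It assumes the renderer can emit a binary visibility mask per object (a common feature of 3D pipelines); the function name and the toy mask are hypothetical, chosen only to show how a YOLO-format label line (class id plus normalized center/width/height) is derived from such a mask.

```python
# Sketch: deriving a YOLO-format annotation from a rendered object mask.
# Assumption: the renderer outputs a per-object binary mask (1 = object pixel).

def mask_to_yolo_annotation(mask, class_id):
    """Compute a YOLO label line (class x_center y_center width height,
    all normalized to [0, 1]) from a binary mask given as a list of rows."""
    height, width = len(mask), len(mask[0])
    xs = [x for row in mask for x, v in enumerate(row) if v]
    ys = [y for y, row in enumerate(mask) for v in row if v]
    if not xs:
        return None  # object not visible in this frame
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    x_c = (x_min + x_max + 1) / 2 / width   # box center, normalized
    y_c = (y_min + y_max + 1) / 2 / height
    w = (x_max - x_min + 1) / width         # box size, normalized
    h = (y_max - y_min + 1) / height
    return f"{class_id} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}"

# Toy 4x8 mask: the object occupies columns 2..5 of rows 1..2.
mask = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 1, 0, 0],
    [0, 0, 1, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
print(mask_to_yolo_annotation(mask, 0))  # → 0 0.500000 0.500000 0.500000 0.500000
```

In a real pipeline this runs once per rendered frame and object, so thousands of perfectly labeled images can be produced without any manual annotation work.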
In today's world, software development that uses DNNs can build on well-functioning frameworks and tools such as TensorFlow, PyTorch, or Caffe. In our projects we have used Ultralytics' YOLO framework, which in turn is based on PyTorch. We were particularly interested in the runtime behavior of this framework on Linux-based embedded systems, such as those used in drone companion computers for object recognition. Our C++ application, which runs the YOLO DNN on the target computer using OpenCV, achieved good recognition rates on an RPi5: approx. 7 to 8 Hz for pure DNN-based recognition at 640x640 image resolution. Once the object has been recognized, a rate of more than 20 Hz can be achieved by handing over to conventional object tracking algorithms.
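The speedup from 7-8 Hz to more than 20 Hz comes from interleaving expensive detection with cheap frame-to-frame tracking. The following sketch shows only that scheduling logic; `run_detector` and `run_tracker` are hypothetical stand-ins for the YOLO inference call and a conventional tracker (e.g. one of OpenCV's tracking algorithms), and the re-detection interval is an assumed value.

```python
# Sketch of the detect-then-track interleaving described above.
# The two stand-in functions return a bounding box as (x, y, w, h).

DETECT_EVERY_N_FRAMES = 5  # assumption: refresh the track every 5th frame

def run_detector(frame):
    # stand-in for the expensive (~130 ms) YOLO inference
    return (10, 20, 50, 40)

def run_tracker(frame, box):
    # stand-in for a cheap frame-to-frame tracker update
    return box

def process_stream(frames):
    """Run full detection periodically and cheap tracking in between,
    so the effective output rate exceeds the raw detector rate."""
    box, results = None, []
    for i, frame in enumerate(frames):
        if box is None or i % DETECT_EVERY_N_FRAMES == 0:
            box = run_detector(frame)      # expensive, keeps the track honest
        else:
            box = run_tracker(frame, box)  # cheap, keeps the frame rate up
        results.append(box)
    return results

boxes = process_stream(range(10))  # one box per frame, 2 detector calls
```

The design trade-off is the re-detection interval: a longer interval raises the average frame rate but lets tracker drift accumulate before the next full detection corrects it.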
The VisDrone training dataset applied to Weekly Drone Video — October 25, 2019 - RCTC 60 Truck Lanes Project using the YOLO v8 framework
We at cantaloupe can help you create training data for your use case, for your vision system, and for exactly the objects you want to recognize. We know how to design realistic scenarios and can help you integrate training data synthesis into your development pipelines.
A T72 scene with rendered and programmatically marked parts.
Do you want to know more? Then please write to us!