I'm currently working on a vision system for a UAV I am building. The goal of the system is to find target objects, which are fairly well defined (see below), in a video stream showing a 2-D flyover view of the ground. So far I have tried training and using a cascade of Haar-like features, à la Viola-Jones, to do the detection. I am training it with 5000+ images of the targets at different angles (perspective shifts) and ranges (sizes in the frame), but only 1900 "background" images. The results are poor: I cannot find a number of cascade stages that keeps both false positives and false negatives low.
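To see why no single stage count seems to work, it helps to run the numbers. In the Viola-Jones framework the cascade's overall detection rate and false-positive rate are the products of the per-stage rates, so every extra stage lowers both at once. The sketch below assumes per-stage targets of 0.995 hit rate and 0.5 false-alarm rate (the defaults of OpenCV's opencv_traincascade `-minHitRate` and `-maxFalseAlarmRate`, if that is the trainer in use) and a hypothetical number of windows scanned per frame; the function names are illustrative only.

```python
def cascade_rates(stage_hit_rate, stage_false_alarm_rate, num_stages):
    """Overall (detection rate, false-positive rate) of a cascade.

    Assumes every stage exactly meets its per-stage targets, so the overall
    rates are the per-stage rates multiplied together over all stages."""
    detection_rate = stage_hit_rate ** num_stages
    false_positive_rate = stage_false_alarm_rate ** num_stages
    return detection_rate, false_positive_rate


def expected_false_positives(false_positive_rate, windows_per_frame):
    """Expected false detections per frame when scanning `windows_per_frame` windows."""
    return false_positive_rate * windows_per_frame


if __name__ == "__main__":
    # Hypothetical: the real count depends on frame size, scale steps and stride.
    WINDOWS_PER_FRAME = 200_000
    for n in (10, 15, 20, 25):
        d, f = cascade_rates(0.995, 0.5, n)
        fp = expected_false_positives(f, WINDOWS_PER_FRAME)
        print(f"{n:2d} stages: hit rate {d:.3f}, "
              f"false alarms/window {f:.1e}, false alarms/frame {fp:.2f}")
```

Because the per-window false-positive rate is multiplied by the number of windows, scanning fewer windows (for example, only the scales at which targets can actually appear) reduces false alarms per frame without adding stages, and so without the extra loss in hit rate.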
I am looking for advice from anyone with experience in this area on whether I should:
1) ditch the cascade in favor of something better suited to objects defined by their outline and color (which, from what I've read, the VJ cascade is not);
2) improve my training set for the cascade, e.g. by adding more positives or background frames, or by organizing/shooting them better;
3) take some other approach I haven't thought of.
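For option 1, one commonly suggested alternative for targets like these is plain color segmentation: keep only strongly saturated pixels, group them into connected regions, and use each region's bounding box as a candidate. This rests on the assumption that grass, dirt and pavement are less saturated than painted primary colors. The sketch below does this in pure Python for clarity, on an image given as rows of (r, g, b) tuples; the function names and thresholds are hypothetical and untuned. In OpenCV the same steps map onto `cv2.cvtColor` (to HSV), `cv2.inRange` and `cv2.connectedComponentsWithStats`, which are far faster.

```python
import colorsys
from collections import deque


def saturation_mask(image, s_min=0.5, v_min=0.3):
    """True where a pixel is strongly saturated and not too dark.

    `image` is a list of rows of (r, g, b) tuples with 0-255 channels.
    The thresholds are placeholder values, not tuned for real imagery."""
    mask = []
    for row in image:
        mask_row = []
        for r, g, b in row:
            _, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
            mask_row.append(s >= s_min and v >= v_min)
        mask.append(mask_row)
    return mask


def candidate_boxes(mask, min_area=4):
    """Inclusive bounding boxes (x0, y0, x1, y1) of 4-connected True regions
    containing at least `min_area` pixels."""
    height = len(mask)
    width = len(mask[0]) if mask else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or seen[y][x]:
                continue
            # Flood-fill one region, tracking its extent and pixel count.
            queue = deque([(x, y)])
            seen[y][x] = True
            x0 = x1 = x
            y0 = y1 = y
            area = 0
            while queue:
                cx, cy = queue.popleft()
                area += 1
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if 0 <= nx < width and 0 <= ny < height \
                            and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            if area >= min_area:
                boxes.append((x0, y0, x1, y1))
    return boxes


if __name__ == "__main__":
    gray, red = (120, 120, 110), (220, 30, 30)
    frame = [[gray] * 6 for _ in range(6)]
    for y in (1, 2):
        for x in (1, 2):
            frame[y][x] = red
    print(candidate_boxes(saturation_mask(frame)))  # [(1, 1, 2, 2)]
```

The `min_area` cut discards specks smaller than the smallest target could appear; its value should come from the expected target size in pixels at flying altitude.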
A description of the targets:
- Primary shapes: triangles, squares, circles, ellipses, etc.
- Distinct, solid colors that are primary or close to primary.
- Smallest dimension between two and eight feet (big enough to be seen easily from a couple hundred feet AGL).
- A single large alphanumeric character in the center of the object, in its own distinct, solid, primary or near-primary color.
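Because the targets have a known physical size range and the altitude is roughly known, their apparent size in the frame can be estimated in advance. That narrows the range of scales any detector has to search. A minimal sketch, assuming a nadir-pointing pinhole camera over flat ground; the sensor width, focal length and resolution are hypothetical placeholders for your own camera's values.

```python
FEET_TO_METERS = 0.3048  # exact by definition


def ground_sample_distance(altitude_m, sensor_width_mm, focal_length_mm, image_width_px):
    """Meters of ground covered by one pixel, for a camera pointing straight
    down at flat ground (pinhole model: footprint = altitude * sensor / focal)."""
    footprint_width_m = altitude_m * sensor_width_mm / focal_length_mm
    return footprint_width_m / image_width_px


def target_size_px(target_size_m, gsd_m_per_px):
    """Approximate size of a target in the image, in pixels."""
    return target_size_m / gsd_m_per_px


if __name__ == "__main__":
    # Hypothetical camera and altitude; substitute your own values.
    gsd = ground_sample_distance(200 * FEET_TO_METERS, sensor_width_mm=6.17,
                                 focal_length_mm=4.5, image_width_px=1920)
    for feet in (2, 8):
        size = target_size_px(feet * FEET_TO_METERS, gsd)
        print(f"{feet} ft target is roughly {size:.0f} px across")
```

The resulting pixel range can bound the detector's minimum and maximum window size, and the same ground sample distance is needed again when converting pixel offsets into ground distances.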
My goal is to use something very fast, such as the VJ cascade, to find possible objects and their bounding boxes, and then pass these on to finer processing routines that determine their properties (color of the object and of the alphanumeric (AN), value of the AN, actual shape, and GPS location). Any advice you can give me toward this goal would be much appreciated. My current source code is a little too long to post here, but it is freely available if you would like to see it for reference. Thanks in advance!
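For the GPS-location step, a first approximation is to convert the center of a candidate's bounding box into a ground offset from the point directly below the camera, then add that offset to the UAV's own fix. The sketch assumes a camera pointing straight down with the top of the image along the UAV's heading, flat ground, a known ground size per pixel (`gsd_m`), and a local equirectangular approximation of the Earth, which is adequate over offsets of tens of meters. The function name and parameters are hypothetical; a tilted camera or uneven terrain would need the full camera pose instead.

```python
import math

EARTH_RADIUS_M = 6_378_137.0  # WGS-84 equatorial radius


def pixel_to_latlon(px, py, image_w, image_h, gsd_m, uav_lat, uav_lon, heading_deg):
    """Approximate (lat, lon) of image point (px, py).

    Assumes a nadir camera whose image "up" points along the UAV heading
    (degrees clockwise from north), flat ground, and the UAV's GPS fix
    lying directly above the image center."""
    # Offsets in meters in the camera frame; image y grows downward.
    right_m = (px - image_w / 2.0) * gsd_m
    forward_m = (image_h / 2.0 - py) * gsd_m
    # Rotate the camera-frame offset into north/east components.
    psi = math.radians(heading_deg)
    north_m = forward_m * math.cos(psi) - right_m * math.sin(psi)
    east_m = forward_m * math.sin(psi) + right_m * math.cos(psi)
    # Small-offset conversion from meters to degrees.
    dlat = math.degrees(north_m / EARTH_RADIUS_M)
    dlon = math.degrees(east_m / (EARTH_RADIUS_M * math.cos(math.radians(uav_lat))))
    return uav_lat + dlat, uav_lon + dlon


if __name__ == "__main__":
    x0, y0, x1, y1 = 1000, 400, 1040, 440  # hypothetical candidate box
    cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    print(pixel_to_latlon(cx, cy, 1920, 1080, gsd_m=0.044,
                          uav_lat=30.0, uav_lon=-97.0, heading_deg=45.0))
```

In practice the UAV's attitude (pitch and roll) shifts the ground point under the image center, so this approximation is most reliable when the aircraft is flying level.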
-JB