That's what those 3 corners are for.
Convolve the image with one of those squares as the template and you get 3 local maxima, at known relative distances from each other, which together give the bounding box.
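A rough sketch of that idea in NumPy, not anyone's production decoder. It assumes a QR-style 7x7 corner square, a clean binary image, and a known module size of one pixel; `finder_pattern`, `correlate_valid`, and `top_peaks` are made-up helper names for illustration.

```python
import numpy as np

def finder_pattern():
    # Assumed 7x7 corner square: dark ring, light ring, 3x3 dark centre.
    p = np.ones((7, 7))
    p[1:6, 1:6] = 0
    p[2:5, 2:5] = 1
    return p

def correlate_valid(image, template):
    # Slide the zero-mean template over the image (no padding).
    t = template - template.mean()
    windows = np.lib.stride_tricks.sliding_window_view(image, template.shape)
    return np.einsum("ijkl,kl->ij", windows, t)

def top_peaks(score, k, min_dist):
    # Greedy peak picking: take the maximum, blank out its neighbourhood, repeat.
    score = score.copy()
    peaks = []
    for _ in range(k):
        r, c = np.unravel_index(np.argmax(score), score.shape)
        peaks.append((int(r), int(c)))
        r0, c0 = max(r - min_dist, 0), max(c - min_dist, 0)
        score[r0:r + min_dist + 1, c0:c + min_dist + 1] = -np.inf
    return peaks

# Synthetic 21x21 code region with squares in three of its corners.
image = np.zeros((40, 40))
top, left, size = 5, 9, 21
pat = finder_pattern()
for r, c in [(top, left), (top, left + size - 7), (top + size - 7, left)]:
    image[r:r + 7, c:c + 7] = pat

peaks = sorted(top_peaks(correlate_valid(image, pat), k=3, min_dist=4))
rows = [r for r, _ in peaks]
cols = [c for _, c in peaks]
# Each peak is the top-left of a 7x7 square, so extend by 7 on the far side.
bbox = (min(rows), min(cols), max(rows) + 7, max(cols) + 7)
print(peaks, bbox)
```

The missing fourth corner also tells you the orientation: the peak that sits at the right angle of the triangle is the top-left one.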
For the computer vision community
Is that scale invariant? Or would they need to do it multiple times at different scales if they can't rely on a consistent scale?
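The latter: it's not scale invariant. A fixed-size template only responds strongly when the square in the image is about the same size, so you'd repeat the matching at several scales.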
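To make the "multiple scales" part concrete, here is a small sketch under the same assumptions as before (QR-style 7x7 square, clean binary image). It uses normalised cross-correlation so scores are comparable across template sizes, then just tries a handful of module sizes; `finder_pattern` and `best_match` are illustrative names, and the list of scales is arbitrary.

```python
import numpy as np

def finder_pattern(module=1):
    # Assumed 7x7 corner square, blown up so each module is `module` pixels wide.
    p = np.ones((7, 7))
    p[1:6, 1:6] = 0
    p[2:5, 2:5] = 1
    return np.kron(p, np.ones((module, module)))

def best_match(image, template):
    # Normalised cross-correlation; returns (best score, top-left position).
    t = template - template.mean()
    t /= np.linalg.norm(t)
    windows = np.lib.stride_tricks.sliding_window_view(image, template.shape)
    w = windows - windows.mean(axis=(2, 3), keepdims=True)
    norms = np.linalg.norm(w, axis=(2, 3))
    score = np.einsum("ijkl,kl->ij", w, t) / np.maximum(norms, 1e-12)
    idx = np.unravel_index(np.argmax(score), score.shape)
    return float(score[idx]), (int(idx[0]), int(idx[1]))

# The square in this image is drawn at 3 pixels per module.
image = np.zeros((60, 60))
image[10:31, 15:36] = finder_pattern(module=3)

# Try several template scales and keep the one with the strongest response.
results = {m: best_match(image, finder_pattern(m)) for m in (1, 2, 3, 4)}
best_module = max(results, key=lambda m: results[m][0])
print(best_module, results[best_module])
```

Only the template at the right scale reaches a score of about 1; the others peak lower, which is why a single fixed-size pass isn't enough when the scale is unknown.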
Wouldn’t it be easier to generate a new one?