Thanks, I’ve seen the base yolo models as well as nudenet that someone shared earlier in this thread, and I didn’t find them stable or reliable enough, even on 2D video. They’d be great if I could use them as feature extractors; I recently saw an article describing a way to use yolov8 that way, but I haven’t gotten around to testing it.
I checked your link, saw your newer commits, and noted your yolo cock tracking onnx model. I tried it out side by side with a yolo model from a longer training run. I don’t know what your experience has been with it, but if you’re not happy with the detections it’s making, then from the looks of it the training dataset probably needs more data.
The same goes for the model I tuned: watching the previous animation as well as that side-by-side, I can see many points where it should be detecting objects but isn’t, and that’s likely down to the small dataset I used. For transparency, there are plenty of times in those videos where the model I trained utterly fails! I have set opencv to listen for keypresses, so when I see a frame that isn’t being detected correctly I can hit S and it gets dumped to a folder for later annotation. It’s not something I’m going to put a lot of time into; I’m still much more interested in approaches that do it all within the model.
I also saw your repo code using cv2.calcOpticalFlowFarneback for tracking, and I’d been meaning to look at what it would take to plug that into a model. I was surprised how easy it was to output a simple up/down vector and plot it as a detectable waveform, and it’s given me a few ideas around feeding the hsv data into a custom model instead of images, since it seemed visually informative even at low resolutions. Although I can see that if I were programmatically trying to infer the right waveforms, the correct polarity largely depends on who is moving!
And I also saw how you’re doing quadratic interpolation in your code, and it seems way easier than the way I’ve been doing it, so that was really useful as well! I had been manually writing functions to calculate the time between points and then doing sinusoidal calculations! I see interp1d is marked as legacy though, but make_interp_spline with uniform parametrization looks like a good alternative I’ll try. A few weeks ago I played around with an LSTM autoencoder, squeezing interpolated data down to 1/20th the size of the original, so having good, easy-to-generate interpolated data is really useful. I was thinking the resulting decoder might make a good replacement head on some of the other sequence models I’ve tinkered with, and potentially output smoother, more consistent predictions.
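For reference, the make_interp_spline route looks something like this; the keyframe numbers are made up, and `k=2` gives the quadratic behaviour:

```python
import numpy as np
from scipy.interpolate import make_interp_spline

# Sparse keyframes: (time, position) pairs -- illustrative values only.
t = np.array([0.0, 0.4, 1.1, 1.9, 2.5])
pos = np.array([0.0, 80.0, 10.0, 95.0, 20.0])

# make_interp_spline replaces the legacy interp1d; k=2 builds a
# quadratic interpolating spline that passes through every keyframe.
spline = make_interp_spline(t, pos, k=2)

# Resample at a uniform rate (e.g. 50 Hz) to get smooth dense output.
t_dense = np.linspace(t[0], t[-1], int((t[-1] - t[0]) * 50) + 1)
pos_dense = spline(t_dense)
```

No manual time-between-points bookkeeping needed; the spline object handles non-uniform keyframe spacing for you.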
I also tried the lstm approach with my ‘manual’ interpolation to create an ‘axis expander’ model: take one channel in and generate roll+pitch channels from the longer-term input (60s in, 60s out). Objectively it wasn’t bad, but it wasn’t necessarily better than just setting multifunplayer to random for those channels.
Back to your tracker model though, have you thought about a segmentation model instead of a detection model? There are some smart segmentation tools that make that sort of well-defined object easier to annotate than you’d think. Potentially it’s easier and more reliable to calculate the visible area of the segmentation mask?
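The visible-area idea would reduce each frame’s mask to a single scalar, which already looks like a usable signal when tracked over time. A sketch, assuming the model gives you a binary mask per frame:

```python
import numpy as np

def mask_visible_fraction(mask: np.ndarray) -> float:
    """Fraction of the frame covered by a binary segmentation mask.

    `mask` is an (H, W) array where nonzero pixels belong to the
    segmented object; the per-frame visible-area signal is just the
    nonzero pixel count normalized by the frame size.
    """
    return float(np.count_nonzero(mask)) / mask.size
```

Run that per frame and you get a waveform analogous to the optical-flow one, but driven by occlusion rather than motion direction, which might sidestep the polarity problem.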
The existing optical flow and other approaches are quite good and consistently outperform anything model-based I’ve tinkered with. I’m not sure if/where the pain points are for users at the moment, but I could see some sort of detection model helping a program decide where to focus the optical flow’s attention (such as static cropping guidance), or providing smoothing/postprocessing of the tracking output data.
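By static cropping guidance I mean something like letting a detector’s box (padded a little) pick the region the optical flow runs on. A rough sketch, where the `(x1, y1, x2, y2)` box format and the `margin` parameter are my assumptions:

```python
def crop_to_detection(frame, box, margin=0.2):
    """Crop a frame to a padded detection box before running optical
    flow, so the flow only sees the region of interest.

    `frame` is any (H, W, ...) array; `box` is (x1, y1, x2, y2) in
    pixels; `margin` pads the box by a fraction of its size on each
    side, clamped to the frame bounds.
    """
    h, w = frame.shape[:2]
    x1, y1, x2, y2 = box
    pad_x = int((x2 - x1) * margin)
    pad_y = int((y2 - y1) * margin)
    x1 = max(0, x1 - pad_x)
    y1 = max(0, y1 - pad_y)
    x2 = min(w, x2 + pad_x)
    y2 = min(h, y2 + pad_y)
    return frame[y1:y2, x1:x2]
```

The detector would only need to run occasionally (or once, for a mostly static shot) to set the crop, with the cheap per-frame flow doing the actual tracking inside it.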
I wouldn’t hold out any hope for anything useful from me though. My area of interest here is mostly something fully image-model based, which I don’t think is achievable with my limited knowledge or free time. Potentially something I’ve posted will save some time for someone smarter than me who comes along and sees it.