Hey! I was actually the author of the idea. Thankfully, I have realized that it cannot be turned into a classification problem with the data that we have (that came after evaluating it in the first place). So you are right about the number data problem.
Right now we have decided to go the CV way, and you can actually see the progress here: https://discuss.eroscripts.com/t/sexlikereal-ai-scripting-tool/43435/14
My plan is to add a classification network on top of that. We will detect hands, mouths, feet, etc. and track them across the video (either tracking by detection or tracking with a bbox tracker). That should yield some reliable signals.
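To make the tracking-by-detection option a bit more concrete, here is a rough sketch of the idea, not our actual pipeline. It assumes some detector already gives per-frame `(label, box)` pairs; the `Track` class, the `track_by_detection` function, the box format `(x1, y1, x2, y2)` and the 0.3 IoU threshold are all made-up names and values for illustration. Each detection is greedily linked to the same-label track from the previous frame that overlaps it the most.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), hypothetical format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


@dataclass
class Track:
    track_id: int
    label: str
    boxes: Dict[int, Box] = field(default_factory=dict)  # frame index -> box
    last_box: Box = (0.0, 0.0, 0.0, 0.0)


def track_by_detection(
    frames: List[List[Tuple[str, Box]]], iou_threshold: float = 0.3
) -> List[Track]:
    """Greedy frame-to-frame association of detections by label and IoU."""
    tracks: List[Track] = []
    active: List[Track] = []  # tracks that were matched in the previous frame
    for frame_idx, detections in enumerate(frames):
        unmatched = list(active)
        next_active: List[Track] = []
        for label, box in detections:
            best, best_score = None, iou_threshold
            for t in unmatched:
                if t.label != label:
                    continue
                score = iou(t.last_box, box)
                if score >= best_score:
                    best, best_score = t, score
            if best is None:
                # No good overlap with an existing track: start a new one.
                best = Track(track_id=len(tracks), label=label)
                tracks.append(best)
            else:
                unmatched.remove(best)
            best.boxes[frame_idx] = box
            best.last_box = box
            next_active.append(best)
        # Tracks that were not matched in this frame are dropped (a simplification).
        active = next_active
    return tracks
```

In this sketch a track ends as soon as its object is missed for one frame, and the vertical center of each track's boxes over time would be the raw signal to script from.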
Again, that won’t cover 100% of the variety because of occlusion, but I think we might get into territory where scripting actually doesn’t take a whole lot of work.
If you are willing to chat more, just hmu: philip at sexlikereal.com