Most of the time it's just device limitations you are up against. The harder sections are usually hard either because the device isn't fast enough, or because the scene calls for a feature that the device doesn't support (vibration, suction, multiple hands, tightness).
Those harder sections are just workarounds, which is why they take so much time to make. If you don't need them, AI can generate about 80% of the script at very high accuracy and another 15% at accuracy high enough that it generally doesn't even need a change. The last 5% are the more difficult parts, where AI is generally not bad but does require fixes.
Time-wise, you will spend about 20% of your time on the 80% part (just watching it), and the rest of your time on the remaining 20% of the script that you adjust (which takes about 4x as long in total).
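To put rough numbers on that split, here is a small sketch. The 300 minutes of total work is a made-up figure, and the 20/80 time split is just my reading of the estimate above, not a measurement.

```python
def effort_split(total_minutes, easy_time_pct=20):
    """Split total working time between the easy part and the part you adjust.

    easy_time_pct is the assumed share of time spent just watching
    the part the AI already got right.
    """
    easy = total_minutes * easy_time_pct / 100
    hard = total_minutes - easy
    return easy, hard


# Share of the script itself (from the rough estimate in the post).
easy_script_pct, hard_script_pct = 80, 20

# Hypothetical job: 300 minutes of total work on one script.
easy, hard = effort_split(300)

print(easy, hard)    # minutes spent on each part
print(hard / easy)   # total time ratio between the two parts

# Time per 1% of script: the adjusted part is far more expensive per section.
print((hard / hard_script_pct) / (easy / easy_script_pct))
```

So the adjusted part takes 4x as long in total, but per section of script it is closer to 16x, because it is also the smaller part.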
But adjusting for those isn't a must; it depends heavily on the video to begin with. If the movement is fast, suction isn't really needed for the experience on the Handy. Double action is generally the most difficult there, since you can't just add a simple module to imitate it. Vibration, tightness and suction already exist on strokers (they just aren't script-automated), and are often too heavy. That is generally an issue that will be solved over time (at a somewhat higher cost as a result, like the OSR).
Ideally you even want the AI to generate a separate script for each of those things, including hand and head movement as separate scripts, and then have a script program that lets you merge them.
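A minimal sketch of what such a merge could look like. It assumes each script is a list of points with a time in milliseconds ("at") and a position from 0 to 100 ("pos"), like the action list in a funscript file. The function names and the weighted-average blending rule are my own assumptions for illustration, not how any existing tool does it.

```python
from bisect import bisect_right


def pos_at(actions, t):
    """Position at time t (ms), linearly interpolated between neighbouring points.

    Before the first point or after the last one, the nearest point is held.
    """
    if t <= actions[0]["at"]:
        return actions[0]["pos"]
    if t >= actions[-1]["at"]:
        return actions[-1]["pos"]
    times = [a["at"] for a in actions]
    i = bisect_right(times, t)  # first point strictly after t
    a, b = actions[i - 1], actions[i]
    frac = (t - a["at"]) / (b["at"] - a["at"])
    return a["pos"] + frac * (b["pos"] - a["pos"])


def merge(layers, weights):
    """Blend several action lists into one by weighted average.

    The output has one point for every timestamp that appears in any layer.
    """
    stamps = sorted({a["at"] for layer in layers for a in layer})
    total = sum(weights)
    merged = []
    for t in stamps:
        pos = sum(w * pos_at(layer, t) for layer, w in zip(layers, weights)) / total
        merged.append({"at": t, "pos": round(min(100, max(0, pos)))})
    return merged


# Hypothetical separate scripts for hand and head movement.
hand = [{"at": 0, "pos": 0}, {"at": 1000, "pos": 100}]
head = [{"at": 0, "pos": 50}, {"at": 500, "pos": 100}, {"at": 1000, "pos": 50}]

merged = merge([hand, head], weights=[1, 1])
print(merged)
```

A weighted average is only one option. Depending on the scene you might rather pick whichever layer is moving the most at that moment, or switch layers per section, which is exactly why a merge program with a few modes would be useful.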