In my experiments with OFS I’ve had the same frustration of scrolling back and forth to find the “correct” frame (sufficiently frustrating that I would rather spend my time automating funscripting), and had the same idea that things would go much faster with more context available - i.e. if I could see the frames laid out in a timeline instead of scrubbing through video.
As for implementation, I was thinking of an app that calls FFmpeg in a background thread to extract frames, scaled down and trimmed to a range that pages as the user scrolls forwards/backwards. FFmpeg would write the images to a RAM disk (/dev/shm on civilised OSes), the scripter app would load them from there, and each image would be deleted once loaded.
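Something like this rough sketch of the paging part (Python standing in for whatever the app would actually be written in; the cache path, page naming scheme, and callback are made up for illustration, but the FFmpeg flags are real):

```python
import glob
import os
import subprocess
import threading

SHM_DIR = "/dev/shm/frame_cache"  # RAM disk on Linux; swap for tempfile.mkdtemp() elsewhere

def extract_page(video_path, start_s, duration_s, width=320):
    """Extract one 'page' of scaled-down frames into the RAM disk."""
    os.makedirs(SHM_DIR, exist_ok=True)
    pattern = os.path.join(SHM_DIR, f"page_{start_s:08.2f}_%05d.jpg")
    subprocess.run(
        ["ffmpeg", "-y",
         "-ss", str(start_s),         # seek before -i: fast keyframe-based seek
         "-t", str(duration_s),       # trim to just this page's range
         "-i", video_path,
         "-vf", f"scale={width}:-1",  # shrink frames, preserve aspect ratio
         pattern],
        check=True, capture_output=True)
    return sorted(glob.glob(os.path.join(SHM_DIR, f"page_{start_s:08.2f}_*.jpg")))

def prefetch_page(video_path, start_s, duration_s, on_ready):
    """Run extraction off the UI thread; hand the frame paths to a callback."""
    threading.Thread(
        target=lambda: on_ready(extract_page(video_path, start_s, duration_s)),
        daemon=True).start()
```

The idea would be that the scripter calls prefetch_page for the next/previous range as the user nears a page boundary, loads each image into the timeline, then unlinks it to keep the RAM disk small.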
I suspect this is sufficiently different from OFS that it should be a separate app.
The quickest test of whether this would help is to extract frames from a video and open them in an image viewer gallery. If you can quickly see where the ends of strokes should go, it would help. If not, it wouldn’t.
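If anyone wants to try that test, a one-off frame dump is only a couple of lines (the file names and the 10-second window starting at 1:00 are just examples):

```python
import os
import subprocess

os.makedirs("test_frames", exist_ok=True)
# Dump 10 seconds of frames starting at 1:00, scaled to 480px wide,
# then open test_frames/ in any image viewer with a thumbnail/gallery mode.
subprocess.run(
    ["ffmpeg", "-ss", "60", "-t", "10", "-i", "video.mp4",
     "-vf", "scale=480:-1", "test_frames/%04d.jpg"],
    check=True)
```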