I have done a bit of analysis with a corpus of ~2000 scripts, including all of the ones from here. My approach is quite simple, using some basic Python code:
- Use `json.load()` to read the file from disk.
- Extract just the actions from the resulting dict (we don’t care about the metadata) and save them as a Pandas dataframe. From this point it’s quite simple to calculate metrics for each script. This thread is quite useful for speed calculations.
- For calculating the compression ratio:
  - Convert the timestamps to time deltas: many scripts don’t start at the first frame, and the timestamps contain a lot of redundant information that could bias the compressibility.
  - Export the time delta and position columns to CSV (without an index column or header row).
  - Zip the resulting CSV file, using the highest level of LZMA compression, and take the ratio of the zipped CSV file size to the original file size.
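A minimal sketch of the steps above, not the exact code used for the analysis. It assumes the usual script layout of an `actions` list whose entries have `at` (milliseconds) and `pos` fields. The function names are hypothetical, plain lists stand in for the Pandas dataframe to keep it dependency-free, and Python's `lzma` module (a raw XZ stream) stands in for a zip archive. It also assumes that "original file size" means the size of the uncompressed CSV.

```python
import json
import lzma


def load_actions(path):
    """Read a script file and return its actions as (timestamp_ms, position) pairs."""
    with open(path, encoding="utf-8") as fh:
        script = json.load(fh)
    # Assumed layout: {"actions": [{"at": <ms>, "pos": <position>}, ...], ...metadata...}
    actions = [(int(a["at"]), int(a["pos"])) for a in script["actions"]]
    return sorted(actions)  # sort by timestamp in case the file is out of order


def to_delta_csv(actions):
    """Serialise actions as 'delta,pos' rows with no header; the first delta is 0."""
    rows = []
    previous_at = actions[0][0] if actions else 0
    for at, pos in actions:
        rows.append(f"{at - previous_at},{pos}")
        previous_at = at
    return ("\n".join(rows) + "\n").encode("ascii")


def compression_ratio(actions):
    """Compressed size divided by uncompressed CSV size (assumed denominator)."""
    csv_bytes = to_delta_csv(actions)
    # Preset 9 combined with PRESET_EXTREME is the strongest setting lzma offers.
    compressed = lzma.compress(csv_bytes, preset=9 | lzma.PRESET_EXTREME)
    return len(compressed) / len(csv_bytes)
```

Usage would be something like `compression_ratio(load_actions("some_script.funscript"))`, where the file name is a placeholder. The absolute values depend on the container format and the denominator, so they may not line up exactly with the figures quoted in this post.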
Although most of the values fall in a fairly narrow range, there is quite a bit of variation in compression ratio (capped at 0.30 in the figure here for readability):
At the lower end, there tend to be more beat-based CH-style scripts like this, with only occasional changes of stroke speeds and stroke lengths:
https://discuss.eroscripts.com/t/ch-red-light-green-light-softcore/74604
In the middle, around the 0.06 mark, there are scripts like this one, which have a bit more variety but which still contain some fairly long regular sections:
https://discuss.eroscripts.com/t/cock-hero-dream-girls/13893
Extremely short scripts (less than 1 minute) don’t compress very well due to the overhead in the compression algorithm. If we ignore those, then at the upper end of the scale are scripts like this one, which tend to be slower, more action-based, and have quite varied stroke speed and stroke length:
https://discuss.eroscripts.com/t/stop-and-go-take-two/52840
There are many other metrics you can look at, and each one favours different kinds of scripts. E.g. stroke length standard deviation has a distribution like this:
The big peak at 0 consists of all the fap hero scripts, which use full strokes throughout. At the very top end is this one, which makes extensive use of little bounces at the ends of longer strokes:
https://discuss.eroscripts.com/t/katesplayground-nirvana/138806
Turning it down a bit to about 35, you get mostly PMVs, e.g.
https://discuss.eroscripts.com/t/thots-vs-e-girls-script-request-fulfilled/153162
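For reference, the stroke length standard deviation could be computed roughly as below. This is a sketch under assumptions: a stroke length is taken to be the absolute position change between consecutive actions, pauses (no position change) are dropped, and the population standard deviation is used. The function names are hypothetical, and the definition behind the figures above may differ.

```python
from statistics import pstdev


def stroke_lengths(positions):
    """Absolute position change between consecutive actions, skipping pauses."""
    return [abs(b - a) for a, b in zip(positions, positions[1:]) if a != b]


def stroke_length_std(positions):
    """Population standard deviation of stroke lengths; 0.0 if there are no strokes."""
    lengths = stroke_lengths(positions)
    return pstdev(lengths) if lengths else 0.0
```

Under this definition a script that only uses full strokes, e.g. positions `[0, 100, 0, 100]`, gets a value of exactly 0, which is consistent with the peak at 0 described above. Mixing long strokes with small bounces pushes the value up.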
In theory, it ought to be possible to calculate lots of metrics like these and feed them into some sort of recommender system, which would make it easier for site users to discover new scripts in line with other scripts they liked.
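As a very rough illustration of that idea, and only one of many possible approaches: give each script a vector of metrics, put the metrics on a common scale, and suggest the scripts closest to one the user liked. Everything here is hypothetical (the function names, the choice of z-scoring and Euclidean distance), and a real recommender would also want to use actual user preferences.

```python
import math
from statistics import mean, pstdev


def standardise(vectors):
    """Z-score each metric across scripts so no single metric dominates the distance."""
    columns = list(zip(*vectors.values()))
    centres = [mean(c) for c in columns]
    spreads = [pstdev(c) or 1.0 for c in columns]  # avoid dividing by zero
    return {
        name: [(v - m) / s for v, m, s in zip(vec, centres, spreads)]
        for name, vec in vectors.items()
    }


def most_similar(vectors, liked, k=3):
    """Return the k script names whose standardised metrics are closest to `liked`."""
    scaled = standardise(vectors)
    target = scaled[liked]
    distances = [
        (math.dist(target, vec), name)
        for name, vec in scaled.items()
        if name != liked
    ]
    return [name for _, name in sorted(distances)[:k]]
```

Here `vectors` would map each script name to a list such as `[compression_ratio, stroke_length_std]`, with the metrics in the same order for every script.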