Automatically find free VR scripts missing for your local files

Bounce · May 20, 2023, 12:06am

A quick script I whipped up to match local video files against the (free) VR scripts on the site. It uses the free VR script list as its data source. It is very janky because the filenames follow varying patterns, but it works well enough to be useful (I found close to 30!! scripts that I had missed).

For every video file that has no local script but does have a free script on Eroscripts according to the Google Sheet, the script prints a line with the filename, the scene name, and the link to the post. You need to compare the title and the filename manually, as some false positives are very likely.

Workflow:

Install python3
Install fuzzywuzzy (Terminal command after installing python: pip install fuzzywuzzy)
Install python-Levenshtein (Terminal command after installing python: pip install python-Levenshtein)
Copy paste the code into a file (freescripts.py)
Download the free VR script list (save as csv through google drive) to the same folder as freescripts.py
Open freescripts.py and adjust the variables video_folder and funscript_folder
Adjust the regex if you think it is worth it (paste filelist in regex101.com and tweak there)
Adjust similarity_threshold if you want more or fewer results (a lower value gives more hits but also more false positives)
Run the code (python freescripts.py)
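
To get a feel for the regex step before running the whole thing, here is a small sketch that applies the same pattern used in the script to a few made-up filenames. Only the pattern comes from the script; the filenames and the extract_title helper are hypothetical. It shows which part of the name ends up being compared against the sheet:

```python
import re

# Same pattern as in freescripts.py: group 1 is everything before the first
# "technical" marker (resolution, 180, _LR, a 4-digit number, ...).
regex_pattern = r"^(.*?)([Oo]culus|3d\W|\dk|180|\d+p|_LR|[_-]\d[Kk]|A_XXX|A\sXXX|_UHD|\d{4}).*?mp4$"


def extract_title(filename):
    """Hypothetical helper: return the part used for matching, or None."""
    match = re.search(regex_pattern, filename)
    return match.group(1) if match else None


# Made-up filenames, purely for illustration
samples = [
    "StudioName - Scene Title_180_LR.mp4",
    "StudioName_Scene_Title_4k_180.mp4",
    "Studio 2049 Scene_180.mp4",  # a 4-digit number in the title cuts it short
    "holiday.mp4",                # no marker at all, so the script skips the file
]

for name in samples:
    print(repr(name), "->", repr(extract_title(name)))
```

A filename without any marker gives None and is silently skipped by the script, and a four-digit number inside a title truncates it early, so those are the cases worth checking on regex101.com.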

Code:

import os
import pandas as pd
from fuzzywuzzy import fuzz #pip install fuzzywuzzy
import re

#variables to adjust
csv_file = "free_scripts.csv"
video_folder = "G:\\vr\\8k" #video folder
funscript_folder = "G:\\vr\\funscript" #funscript folder; can be the same
similarity_threshold = 69 #nice! (sensitivity between 0 and 100)

# Load the CSV file
df = pd.read_csv(csv_file)
mp4_files = [file for file in os.listdir(video_folder) if file.endswith(".mp4")]

#used to select studioname + title and cut off all the excess such as resolution, etc
regex_pattern = r"^(.*?)([Oo]culus|3d\W|\dk|180|\d+p|_LR|[_-]\d[Kk]|A_XXX|A\sXXX|_UHD|\d{4}).*?mp4$"


# Function to check if a funscript file already exists in the target folder
def is_funscript_exists(file):
    # splitext drops only the final extension, so dots inside the name are preserved
    file = os.path.splitext(file)[0] + ".funscript"
    path = os.path.join(funscript_folder, file)
    return os.path.isfile(path) or os.path.islink(path)


# Iterate over the MP4 files and perform fuzzy matching
for mp4_file in mp4_files:
    # Skip videos that already have a local funscript
    if is_funscript_exists(mp4_file):
        continue
    match = re.search(regex_pattern, mp4_file)
    if match:
        mp4_title_match = match.group(1)
        for index, row in df.iterrows():
            try:
                similarity_ratio = fuzz.ratio(mp4_title_match, "{} {}".format(row["Studio / Scene type"], row['Name of the scene']))
                if similarity_ratio >= similarity_threshold:  # You can adjust the similarity threshold as needed
                    print([mp4_file, row['Name of the scene'], row['Link to the post']])
            except Exception:
                # Skip rows that cannot be compared (for example, malformed cells)
                continue
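
To illustrate what similarity_threshold does, here is a minimal sketch that uses difflib from the standard library as a stand-in for fuzz.ratio, so the scores may differ somewhat from what fuzzywuzzy reports. The titles and the candidates helper are made up for the example:

```python
from difflib import SequenceMatcher


def ratio(a, b):
    """Stand-in for fuzz.ratio: similarity as an integer between 0 and 100."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def candidates(file_title, sheet_titles, threshold):
    """Return the sheet titles whose score reaches the threshold."""
    return [title for title in sheet_titles if ratio(file_title, title) >= threshold]


# Made-up data: one title extracted from a filename, three rows from the sheet
file_title = "StudioA_Some_Scene_Title_"
sheet_titles = [
    "StudioA Some Scene Title",
    "StudioA Another Scene",
    "StudioB Unrelated Clip",
]

for threshold in (40, 69, 95):
    print(threshold, candidates(file_title, sheet_titles, threshold))
```

Lowering the threshold can only add rows to the output, never remove them, which is why a low value produces more hits and more false positives.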

hentaiprodigy69 · May 20, 2023, 2:08am

This is pretty cool.
I found an alternative to fuzzywuzzy that I liked for file comparisons.

I use this one for file comparisons - jellyfish.jaro_similarity
The speed is also much faster, like 10-50x.
It worked a lot better than fuzzywuzzy for me for matching up some hentai titles when I was scraping this one website for funscripts.

Usage:

import jellyfish
...
jellyfish.jaro_similarity("Energy Kyouka ep2", "Energy Kyouka!! 02 [E3B0301C]") # returns a float between 0 and 1 (higher means more similar)
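
For anyone wondering what the score means: as far as I know, jaro_similarity returns a value between 0 and 1, so multiply it by 100 if you want to compare it against a 0 to 100 threshold like similarity_threshold. Below is a pure-Python sketch of the textbook Jaro similarity, just to show what is being computed. It is not jellyfish's actual implementation, and the function name jaro is my own:

```python
def jaro(s1, s2):
    """Textbook Jaro similarity between two strings, as a float in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Characters count as matching only if they are equal and not too far apart
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        start = max(0, i - window)
        end = min(len2, i + window + 1)
        for j in range(start, end):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Matched characters that appear in a different order are transpositions
    out_of_order = 0
    k = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2

    return (matches / len1 + matches / len2 + (matches - transpositions) / matches) / 3


score = jaro("Energy Kyouka ep2", "Energy Kyouka!! 02 [E3B0301C]")
print(round(score * 100, 1))  # scaled to 0-100 for comparison with a threshold
```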

Bounce · May 20, 2023, 10:15am

Thanks! I tried it, but for me it is slightly slower than fuzz. I did install the recommended python-Levenshtein package, which speeds fuzz up greatly; without it, fuzz is indeed considerably slower. I’ve added it to the instructions for now.

hentaiprodigy69 · May 20, 2023, 12:01pm

Oh, that explains why you got fuzz faster. Never knew about it.

azrael21x · August 2, 2023, 12:45pm

Hey, thank you so much for this tool.
I am not very familiar with Python, so I wanted to ask you for assistance:
Is there a way to adjust the video_folder and funscript_folder variable so it can include multiple subfolders?
I don’t have all my scenes in one folder, I have multiple folders, sorted by the studios.
Like maybe I can add all folders as separate variables and then iterate over them as a list?

EDIT:
Got it working!
I added a dictionary that maps video folders to their corresponding funscript folders and then changed the for loops slightly.

Here is the code with a template dictionary if anyone is interested:

import os
import pandas as pd
from fuzzywuzzy import fuzz
import re

# Variables to adjust
csv_file = "free_scripts.csv"
similarity_threshold = 69 #nice! (sensitivity between 0 and 100)

# Dictionary to map video folders to their corresponding funscript folders
folder_mapping = {
    "X:\\BadoinkVR": "Y:\\BadoinkVR\\Funscripts",
    "Y:\\fuckpassVR": "Y:\\fuckpassVR\\Funscripts",
    "X:\\VRBangers": "Y:\\VRBangers\\Funscripts",
}

# Load the CSV file
df = pd.read_csv(csv_file)
# Collect (video folder, filename) pairs so each file keeps track of its source folder
mp4_files = [(folder, file) for folder in folder_mapping for file in os.listdir(folder) if file.endswith(".mp4")]

# Used to select studioname + title and cut off all the excess such as resolution, etc
regex_pattern = r"^(.*?)([Oo]culus|3d\W|\dk|180|\d+p|_LR|[_-]\d[Kk]|A_XXX|A\sXXX|_UHD|\d{4}).*?mp4$"

# Function to check if a funscript file already exists in the target folder
def is_funscript_exists(file, folder):
    # splitext drops only the final extension, so dots inside the name are preserved
    file = os.path.splitext(file)[0] + ".funscript"
    path = os.path.join(folder, file)
    return (os.path.isfile(path) or os.path.islink(path))

# Print "Searching..." message
print("Searching...\n")

# Iterate over the MP4 files and perform fuzzy matching
match_found = False
for video_folder, mp4_file in mp4_files:
    match = re.search(regex_pattern, mp4_file)
    if match:
        mp4_title_match = match.group(1)
        for index, row in df.iterrows():
            try:
                similarity_ratio = fuzz.ratio(mp4_title_match, "{} {}".format(row["Studio / Scene type"], row['Name of the scene']))
                if similarity_ratio >= similarity_threshold:
                    # Look up the funscript folder that belongs to this video's folder
                    funscript_folder = folder_mapping[video_folder]
                    if is_funscript_exists(mp4_file, funscript_folder):
                        continue
                    print([mp4_file, row['Name of the scene'], row['Link to the post']])
                    match_found = True
            except Exception as e:
                print("Error:", e)
                continue

# Print "No matches found!" if no matches were found
if not match_found:
    print("No matches found!")
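
If maintaining the dictionary by hand gets tedious, another option is to walk a root folder recursively. This is only a sketch, and it assumes each funscript sits right next to its video (same folder, same base name), which is not how the mapping above is set up. The function name find_unscripted_videos is made up for illustration:

```python
import os


def find_unscripted_videos(root_folder):
    """Return full paths of .mp4 files under root_folder with no .funscript beside them."""
    missing = []
    # os.walk visits root_folder and every subfolder below it
    for current_folder, _subfolders, files in os.walk(root_folder):
        for name in files:
            if not name.lower().endswith(".mp4"):
                continue
            funscript = os.path.splitext(name)[0] + ".funscript"
            if not os.path.isfile(os.path.join(current_folder, funscript)):
                missing.append(os.path.join(current_folder, name))
    return sorted(missing)
```

The result is a list of full paths, so pass os.path.basename(path) to the regex rather than the path itself, otherwise the folder name ends up in the title that gets matched.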