Anyone with web scraping experience want to help archive old RTS scripts?

From what I can see in my settings, there’s no attachment max. There’s a post character limit of 32000, but I have control over this. Again, not sure if we’ll run into some hidden upper limit in the software.

Yup, don’t see why not

@xrobin for every script post, there’s going to be a post title, post description + links, maybe some images, attachments. There are also post replies, and I don’t actually know if RTS supports attachments in replies. Replies sometimes include things like updated video links when the original fails.

If we chose to include all of that above information (ignoring replies for now), it seems like a single mega post will be too large to navigate. I guess you could use anchors to create a table of contents and link to different anchors.
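
To get a rough feel for how many posts an author's archive would need under the 32000-character limit mentioned above, here is a minimal Python sketch. It assumes each script entry has already been rendered to a text string; the function name `pack_into_posts` and the separator are hypothetical, and the limit is just the current setting, not a fixed property of the forum software.

```python
def pack_into_posts(entries, limit=32000, separator="\n\n"):
    """Greedily pack rendered entries, in order, into posts no longer than `limit`."""
    posts = []
    current = ""
    for entry in entries:
        if len(entry) > limit:
            # A single oversized entry cannot fit in any post; surface it early.
            raise ValueError("a single entry exceeds the post character limit")
        candidate = entry if not current else current + separator + entry
        if len(candidate) <= limit:
            current = candidate
        else:
            # Close the current post and start a new one with this entry.
            posts.append(current)
            current = entry
    if current:
        posts.append(current)
    return posts
```

If an author's entries come back as more than a handful of posts, that is probably a sign a table of contents or a split by category is worth the effort.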

I wonder if the most straightforward thing to do is scrape all the data and dump it in a public S3 bucket. Then create a super barebones front-end to explore that old data.
You do lose out on integrating old posts directly into the new site though.

EDIT: Another thing I forgot is that sometimes RTS script posts have no attachments. They may just link to 1 or more megas that contain the video/script files.

Yeah, the way Husky shared his archive is pretty compact since it’s just a list of titles with video and preview links, and then a single mega full of all the scripts. I’m not sure how we’d scrape or host the scripts but we’d need to figure out something. If possible, it’d be great to do something like what Husky did except with embedded previews, but I don’t know how do-able that is. If we wanted to include a description, indented under each title, that might be do-able too. I see what you’re saying about scraping replies for mega links to videos. Those are important but I’m not sure how we’d scrape them.

I think I found a method to scrape the entire free funscript section on RTS. It should give me a list of every post title, author, description, images and links.

It’d be cool if we could archive portfolios like Realcumber’s here, with embedded previews https://realtouchscripts.com/viewtopic.php?f=63&t=7163. It makes the post/thread really long but it’s nice to just scroll until you see something you like.

We definitely need to handle the free funscripts and free VR funscripts sections. Is it necessary to scrape the paid sections though?

I’d tend to think it’s not as necessary to scrape the paid section, since those are going to be browsable on SLR, RS, and CzechVR. I guess it might be nice if someone here is searching for their favorite actress or studio and this could be a one-stop source for searching all scripts, but I suspect many pro scripters may get around to their own portfolios at some point.

Starting with a separate category, “RTS Archive” or something like that, could be a good first step.

Maybe subcategorizing into VR/2D/JOI/Hentai/SFM like Husky did?

script-creator-portfolios is already a category for RTS Archives. Do you have a reason in mind for why we should make another one?

Ah alright, my bad, I wasn’t thinking straight.

I’m scraping 1966 RTS posts, each containing a title, description, images and files. Where would you guys like me to post them once I’ve scraped them all? I should have them all finished tomorrow.

So, what I’m thinking is, once you have all that data, divide it by author, then start with the most prolific authors and give them each a thread in script-creator-portfolios. The less prolific, who have only made a handful, might be able to share one thread titled something like “Misc RTS Archives” within the script-creator-portfolios category, and within that post we could still organize them by author, but all in the same post.
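
As a minimal sketch of that split in Python: it assumes each scraped post is a dict with an `"author"` key, and the cutoff of 5 posts for "prolific" is a made-up number we would need to agree on. The function name `plan_threads` is hypothetical too.

```python
from collections import defaultdict


def plan_threads(posts, min_posts_for_own_thread=5):
    """Split scraped posts into per-author threads and a shared misc bucket.

    Returns two dicts mapping author -> list of posts. Authors at or above
    the cutoff get their own thread; everyone else goes in the misc bucket,
    still grouped by author.
    """
    by_author = defaultdict(list)
    for post in posts:
        by_author[post["author"]].append(post)

    # Most prolific authors first; ties are broken alphabetically.
    ranked = sorted(
        by_author.items(),
        key=lambda item: (-len(item[1]), item[0].lower()),
    )

    own_threads = {}
    misc = {}
    for author, author_posts in ranked:
        if len(author_posts) >= min_posts_for_own_thread:
            own_threads[author] = author_posts
        else:
            misc[author] = author_posts
    return own_threads, misc
```

Because the dicts keep insertion order, iterating `own_threads` gives the most prolific authors first, which matches the order we would want to post them in.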

Another thing to keep in mind is that before we start posting the data, maybe we should make sure the author is okay with us doing it for them. We know jacecolm has given us permission and he’s one of the most prolific so we can start with his. If the scripter has been MIA for a long time and is not on this forum, then we can assume it’s fine to post their portfolio but we should probably ask if it’s someone like Realcumber or Evernessince for example.

Edit: Then again, if we want to post everybody’s without asking first, it’s easy enough for us to remove it later if they ask because they want to post it themselves.

@PHO3NIX-E I like xrobin’s suggestions, but let us know how much work it is. If there’s a faster method that still gets us like 90% of the way, we could potentially do that as well.

Also are you scraping any of the comments? If not, how do we want to handle that in case there are any broken links in the original post?

In #script-creator-portfolios, I can also create a pinned post explaining that past scripts have been scraped from RTS and that we will remove anyone’s content if the author requests it.

I just got an idea that might sort of help. We could scrape the RTS post url along with everything else, and then include that link in the list, so that way if someone discovered a broken link, they could very easily pop over to RTS and see if there’s a fixed link in the replies. What do you think?

Maybe you could scrape all comments, but only keep comments that have a link/attachment in them. Then tack them onto the original post’s content?
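
Roughly what I mean, as a Python sketch. It assumes each scraped reply is a dict with `"author"`, `"text"` and an optional `"attachments"` list; those field names and both function names are hypothetical, since I don't know what the scraper actually outputs.

```python
import re

# Deliberately loose: anything that looks like an http(s) URL counts as a link.
URL_PATTERN = re.compile(r"https?://\S+")


def replies_worth_keeping(replies):
    """Keep only the replies that contain a link or at least one attachment."""
    kept = []
    for reply in replies:
        has_link = bool(URL_PATTERN.search(reply.get("text", "")))
        has_attachment = bool(reply.get("attachments"))
        if has_link or has_attachment:
            kept.append(reply)
    return kept


def append_replies(original_text, replies):
    """Tack the kept replies onto the end of the original post's text."""
    kept = replies_worth_keeping(replies)
    if not kept:
        return original_text
    lines = [original_text, "", "Links and attachments from replies:"]
    for reply in kept:
        author = reply.get("author", "unknown")
        lines.append(f"- {author}: {reply.get('text', '').strip()}")
    return "\n".join(lines)
```

This would still keep replies whose only link is unrelated, so some manual cleanup is probably unavoidable.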

I’ll scrape every topic for every hyperlink. Testing them afterwards is inevitable, I guess.

If there’s a lot of manual labor involved in part of this process, we could try to enlist other users to help crowdsource the work.

If that’s possible, it would be better than my idea since my idea would end up with a lot of broken links if RTS did go down.

Good discussion here, guys. I’m curious to see how this turns out. It’s very important to have that pinned statement saying creators can request takedowns of their content if they weren’t contacted first.

A one-stop spot for paid and free is huge, actually: somewhere people can look to see if their favorite scene has been scripted (paid or free).

As long as no commercial paid scripts are included, there should be no problem. And if this could consolidate a listing of all paid scripters’ work into #script-creator-portfolios, that would be welcome as well.

This implies that the paid funscript and paid VR funscript sections would also need to be scraped, then?

If possible, I think it would be good to scrape the paid section as well in light of both the news that there’s a lot of free scripts in there, and also Realcumber’s words above encouraging us to go ahead and migrate the paid scripts too.
