User:Purple Wrench/Homestar Offsite
From Homestar Runner Wiki
Here you will find an HTML list of all files on the various official Homestar Runner websites. This allows you to download each entire website with minimal effort while retaining as much of the browsing experience as possible. For example, this would make it possible to put homestarrunner.com on a CD-ROM for offline use. This page has a similar purpose to User:Nerd42/List and is equally compatible with DownThemAll, but has a much larger scope, including all secret pages (and a few which the wiki has not yet documented).
Please read the instructions before attempting to use these lists.
Instructions
- Using Firefox, go to the DownThemAll add-on page and download it.
- Customize the Firefox toolbar to include a DownThemAll button.
- Go to one of the website links below and copy the HTML code you see inside the box. NOTE: Do NOT view the source page, as it uses templates to display the code.
- Paste the HTML code into a text editor and save it as a .html file on your computer. NOTE: If your text editor only supports files of 64KB or less, you will need a more capable one. This is unlikely to be a problem for most users.
- Open the .html file in Firefox and click on the DownThemAll button.
- In the "Save Files In:" section, choose a directory where you want to save the files.
- In the "Filters" section, uncheck everything and then check "All Files".
- Make sure nothing is selected in the "Fast Filtering" section.
- In the "Renaming Mask:" section, clear the text box and paste the following (without quotes):
  - If you have Windows, use "*subdirs*\*name*.*ext*"
  - If you have Mac or Linux, use "*subdirs*/*name*.*ext*"
- Click "Start!" and wait a few minutes.
- Eventually all files should be downloaded to the directory you specified. Files in subdirectories on the website will be in matching subfolders of that directory.
- Open the directory you specified and open "index.html" to start browsing.
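To illustrate what the renaming mask is meant to achieve, here is a rough Python sketch of the mapping from a URL to a local path. It is not DownThemAll's code, only an approximation of the result the mask should give; the function name local_path and the example URL are hypothetical.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def local_path(url: str, separator: str = "/") -> str:
    """Approximate the path that "*subdirs*/*name*.*ext*" should produce.

    Assumption: the mask keeps the URL's directory structure and file name
    and drops the host name. Pass separator="\\" for the Windows mask.
    """
    path = PurePosixPath(urlparse(url).path)
    # Directory components of the URL path, without the leading "/".
    subdirs = [part for part in path.parent.parts if part != "/"]
    return separator.join(subdirs + [path.name])


if __name__ == "__main__":
    # Hypothetical URL, used only to show the shape of the result.
    print(local_path("http://www.homestarrunner.com/dir/sub/file.swf"))
```

Because the directory structure is preserved, relative links between the downloaded pages keep working when index.html is opened from disk.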
Issues
If you find any issues when using these lists, please discuss them on the talk page. If the issue can be confirmed it will be posted here.
- These pages rely heavily on Flash and JavaScript, and browsers sometimes block Flash or JavaScript links. For example, I was not able to get the AIM icons pop-up in crazy cartoon to appear.
- The favicon does not work.
- I have "flagged" a few files which are not included in the lists below. On homestarrunner.com, they include a low-level file that is often blocked by web browsers; the regionally-visible files from April Fools' 2006; and the original version of Who Said What Now?. None of these are actively linked to on the website.
Websites Listed
- www.homestarrunner.com - list is 180KB, site is approx. 520MB, updated 25 Dec 2016
- podstar.homestarrunner.com - will be added if there is enough interest
- www.videlectrix.com - will be added if there is enough interest
- www.thoraxcorp.com - will be added if there is enough interest
- www.homestarrunnerstore.com - site closed before I had the chance; please see archive.org instead
How I Did This
It's not easy to find a surefire way to identify every file on a website. Sitemap generators are great when the site predominantly uses HTML links, but the links on homestarrunner.com are almost exclusively embedded in Flash files. As a result, most sitemap generators stop at the index page. Google's search results provide a better solution, but it's difficult to pull those results into a complete list (even though Google supposedly allows that to be done using the Spreadsheets app).
The best solution, if you're willing to use outdated results, is archive.org. I typed in "http://www.homestarrunner.com/*", which retrieves all files found in the www hostname of homestarrunner.com. The archive returned over 10,000 different URLs. I saved the source code of the page to a file (which was 3MB alone) and searched it for text enclosed by "<a href= ... >" and "</a>", which returned the actual URLs rather than their archive.org mirrors.
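The extraction step could be sketched in Python roughly as follows. This is not the exact search I ran, just a minimal illustration of pulling the link text (the original URL) out of each anchor instead of its href (the archive.org mirror). The class and function names are made up for the example, and the sample markup in the usage note is an assumption about what the saved page looks like.

```python
from html.parser import HTMLParser


class AnchorTextCollector(HTMLParser):
    """Collect the visible text of every <a>...</a> element."""

    def __init__(self):
        super().__init__()
        self._in_anchor = False
        self._parts = []
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_anchor = True
            self._parts = []

    def handle_data(self, data):
        if self._in_anchor:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_anchor:
            text = "".join(self._parts).strip()
            if text:
                self.texts.append(text)
            self._in_anchor = False


def extract_listed_urls(page_source, prefix="http://www.homestarrunner.com/"):
    """Return anchor texts that look like original site URLs."""
    collector = AnchorTextCollector()
    collector.feed(page_source)
    collector.close()
    # Keep only link text that is itself a URL on the target host,
    # which drops navigation links such as "About" or "Help".
    return [text for text in collector.texts if text.startswith(prefix)]
```

For example, given a saved page containing `<a href="/web/*/http://www.homestarrunner.com/index.html">http://www.homestarrunner.com/index.html</a>`, the function keeps the text between the tags and ignores the href.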
Of those, 4,000 were obviously wrong, so I eliminated them. I pasted the rest into an Excel spreadsheet and wrote a macro to check whether each one was a valid URL.
Under normal circumstances, an HTTP request for a URL that does not exist returns status code 404. However, homestarrunner.com serves a custom 404 page whenever an invalid URL is requested, and that custom page is itself valid, so the request returns 200 like every other page. Instead of checking the status code, the macro checks whether the text "404error.swf" appears in the response, which means the page must be invalid. Non-HTML files are treated as an exceptional occurrence, which made the macro very sluggish. That said, it revealed that about 1 in 3 of the remaining 6,000 URLs were valid.
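The same check can be sketched in Python. This is not the Excel macro itself, only a minimal illustration of the idea: a 200 response still counts as invalid when the body mentions "404error.swf". The function names, the timeout, and the decision to decode every body as text are assumptions made for the example.

```python
import urllib.error
import urllib.request

SOFT_404_MARKER = "404error.swf"


def is_valid_page(status_code: int, body: str) -> bool:
    """Treat a page as valid only if it is a 200 without the 404 marker."""
    if status_code != 200:
        return False
    return SOFT_404_MARKER not in body


def check_url(url: str, timeout: float = 10.0) -> bool:
    """Fetch a URL and apply the soft-404 check (requires network access)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            # Binary files are decoded leniently; they will not contain
            # the marker, so they pass the check if the status is 200.
            body = response.read().decode("utf-8", errors="replace")
            return is_valid_page(response.status, body)
    except urllib.error.HTTPError:
        # A real 4xx/5xx status: the URL is not valid.
        return False
    except urllib.error.URLError:
        # The host could not be reached: treat as not valid.
        return False
```

Keeping is_valid_page separate from the download makes the rule easy to test without touching the network.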
I then took all of the valid URLs and, by hand, labeled them by name (or category, in cases like Menu Previews and Fan Stuff) and extension. This allows them to be organized in a more meaningful manner, and allows new URLs to be added much more easily.
I converted the spreadsheet to a plain text document and wrote a quick program to format its contents as HTML. The program also gave each name/category its own heading, allowing multiple files with the same name to be grouped together. As I said previously, this makes it easier to add new URLs, which I quickly appreciated when I tried downloading the site and found that the 2014 Toons had no Menu Previews. Manually adding the four remaining menu previews was only as difficult as finding their names and putting them in their correct alphabetical order under "Menu Previews".
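A formatter along these lines might look like the following Python sketch. It is not the program I wrote, just an illustration of grouping URLs under one heading per name/category. The input shape (a list of name and URL pairs), the function name, and the choice of h2 and br tags are all assumptions.

```python
from collections import defaultdict
from html import escape


def build_link_list(rows):
    """Turn (name, url) pairs into an HTML page with one heading per name."""
    groups = defaultdict(list)
    for name, url in rows:
        groups[name].append(url)

    lines = ["<html>", "<body>"]
    # Sort headings and the links under them so that a new entry
    # has one obvious place to go.
    for name in sorted(groups):
        lines.append(f"<h2>{escape(name)}</h2>")
        for url in sorted(groups[name]):
            href = escape(url, quote=True)
            lines.append(f'<a href="{href}">{escape(url)}</a><br>')
    lines += ["</body>", "</html>"]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical rows; the file names are placeholders, not real site files.
    example_rows = [
        ("Menu Previews", "http://www.homestarrunner.com/example_b.swf"),
        ("Index Page", "http://www.homestarrunner.com/index.html"),
        ("Menu Previews", "http://www.homestarrunner.com/example_a.swf"),
    ]
    print(build_link_list(example_rows))
```

Since every URL is a plain link in the output, a download manager that collects all links on a page can pick up the whole list at once.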
Enjoy! -- ■■ PURPLE WRENCH ■■ 19:21, 22 February 2015 (UTC)