An updated index of the Old Story Archive


Posted
On 8/27/2020 at 5:46 PM, DexterEvanXavier said:

If it helps, I have an HTML file which I updated to work with the online version of the archive. It has all 6833 posts on a single page, ordered by story title. You can't sort it like you could an Excel file, but you can use the find tool in your web browser to search it.

You can download it here: https://www.dropbox.com/s/iz6m7vh78axhkyz/ArchiveStoryList.html?dl=0

Awesome.  Found what I was looking for in seconds.  Thanks!

  • 1 year later...
Posted
On 11/1/2018 at 3:32 AM, Scriptboy said:

I have been working on compiling an updated index of the story archive and through the use of Excel, Access, and other tools, I have been able to put together a listing of all of the 6,800+ story chapters located in the archive. This listing includes the story titles, author names, story descriptions and the links where the stories can be found in the archives. 

Some work still needs to be done, but most of the data has been entered in Excel and transferred into a database for safekeeping, which makes sorting and querying easier.

I will post an update as soon as I have something available to show you guys! This should make it easier to find a story in the archive!

 

Sadly, looks like OP disappeared long ago. Did anyone happen to have a copy of the Excel file that he uploaded (and is no longer available)? Thanks!!

Posted
On 10/4/2023 at 12:38 PM, yobdior said:

Sadly, looks like OP disappeared long ago. Did anyone happen to have a copy of the Excel file that he uploaded (and is no longer available)? Thanks!!

The story list is here: https://musclegrowth.net/archived-stories/

 

  • 1 year later...
  • 1 month later...
Posted

Meanwhile, in tangentially related news, I've spent years slowly polishing the contents of the original-original archive (the TheArchive-070326.zip file (formerly?) downloadable from O'Melissokomos' blog, way back when), including:

  1. Developing a Perl script to heuristically detect line-break issues like:
    1. Files with paragraphs separated by <p> tags, instead of wrapped in <p>...</p>. (Technically valid pre-HTML4 syntax, but frowned upon today.)
    2. Files with a <p> tag between each line of the source, breaking each real paragraph up into a bunch of arbitrarily-separated line-paragraphs with roughly the same number of characters in each.
    3. Files with NO tag breaks, where the entire story source consisted of a wall of plain text with embedded newlines (which HTML turns into spaces), where an intended paragraph break was represented by a blank source line (two newlines in a row).
    4. Files with no tag breaks and NO blank lines, where the only indicator of a paragraph separation is a shorter-than-average source line that ends in some sort of punctuation.

      ...and then running that script across the entire archive, to clean all of those issues up and turn each story into an HTML document with no forced wrapping, where each paragraph is wrapped in a single <p>...</p> tag pair.
       
  2. Doing a LOT of manual wrapping cleanup, either in addition to or instead of what resulted from running my cleanup script. (Which in a few cases made things worse instead of better, especially in stories where the source already contained accidental breaks that didn't belong there.)
    It's nearly impossible to write code that can properly detect all of the weird forms of line-wrapping issues you end up with, when your documents are sourced from forum posts that users created by doing things like copy-pasting from text editors, or word processors, or other HTML documents. Never mind being able to work out what it's supposed to look like, and repair it. (And then even when it DID correctly work out what was going on in the beginning, some files would just suddenly change formatting partway through, for no discernible reason! I didn't even TRY to make it smart enough to deal with that nonsense.)
     
  3. Re-encoding all of the files from their captured encoding -- which was declared as "iso-8859-1" in every archive file, but was very frequently something else in reality. (Again, copy-pastes from other software, particularly on Windows with all of its strange pre-Unicode code pages.) I've attempted to make them all valid, standard UTF-8, even though in a number of especially annoying cases that meant hand-editing the file and guessing that all of the occurrences of one particular garbled string were supposed to have been ellipses, while some other garbled string represented an em dash, or a curly single quote / apostrophe, or an opening or closing curly double quote.
     
  4. Replacing the <style> tag full of identical CSS in each file's <head> with <link media="all" href="../story.css" type="text/css" rel="stylesheet">, so that the entire archive can be re-styled in one place.
     
  5. Removing a couple of blank and/or duplicated story chapters.
     
  6. ADDING one missing chapter that was skipped over in a story, because the archive contained both the preceding and following chapters.
     
  7. Doing a few repairs on broken next/previous link connections between the various stories/chapters.
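
For anyone wanting to try something similar, the four line-break cases in item 1 could be told apart roughly as follows. This is only a Python sketch for illustration, not the Perl script described above: the function names, the labels it returns, and the 100-character `HARD_WRAP_WIDTH` threshold are all assumptions, and real files will defeat it in exactly the ways item 2 describes.

```python
import re
from statistics import mean

P_OPEN = re.compile(r"<p\b[^>]*>", re.IGNORECASE)
P_CLOSE = re.compile(r"</p\s*>", re.IGNORECASE)
# Sentence-ending punctuation, optionally followed by closing quotes/brackets.
END_PUNCT = re.compile(r'''[.!?]["')\]]*$''')
HARD_WRAP_WIDTH = 100  # assumed: hard-wrapped source lines stay under this


def classify_breaks(body: str) -> str:
    """Guess which paragraph-break style a story body uses."""
    opens = len(P_OPEN.findall(body))
    closes = len(P_CLOSE.findall(body))
    if opens and closes >= opens:
        return "wrapped"  # already <p>...</p> pairs
    if opens:
        chunks = [c.strip() for c in P_OPEN.split(body) if c.strip()]
        # If no chunk between <p> tags is longer than a wrapped line, the
        # tags are probably marking line ends, not real paragraphs (case 2).
        # A story made only of very short paragraphs would fool this.
        if chunks and max(len(c) for c in chunks) <= HARD_WRAP_WIDTH:
            return "p-per-line"
        return "p-separators"  # case 1
    if re.search(r"\n[ \t]*\n", body):
        return "blank-line"  # case 3
    return "short-line"  # case 4


def split_on_short_lines(text: str) -> list[str]:
    """Case 4: end a paragraph at a shorter-than-average line that ends in punctuation."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []
    avg = mean(len(ln) for ln in lines)
    paragraphs, current = [], []
    for ln in lines:
        current.append(ln)
        if len(ln) < avg and END_PUNCT.search(ln):
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs
```

Each paragraph returned by `split_on_short_lines` would then be wrapped in its own `<p>...</p>` pair. A wrapped line that happens to end a sentence and is also shorter than average will produce a spurious break, which is the kind of damage that needs reverting by hand.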

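The re-encoding in item 3 can be approximated by trying candidate encodings in order and keeping the first that decodes cleanly. The Python sketch below is an illustration under assumptions, not the process actually used: the candidate order (UTF-8, then Windows-1252, then Latin-1 as a last resort) and both function names are guesses, and it does nothing for files in other code pages such as cp1250, which still need a human to look at them.

```python
def decode_story(raw: bytes) -> tuple[str, str]:
    """Decode raw file bytes, returning (text, encoding that worked)."""
    # Strict UTF-8 first: non-UTF-8 text almost never passes by accident.
    # Python's cp1252 codec rejects the five bytes that code page leaves
    # undefined, so it can fail too; latin-1 accepts every byte.
    for enc in ("utf-8", "cp1252"):
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    return raw.decode("latin-1"), "latin-1"


def undo_double_encoding(text: str) -> str:
    """Reverse UTF-8 text that was mis-read as Windows-1252 and saved again."""
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text  # not that kind of damage; leave it alone


if __name__ == "__main__":
    text, enc = decode_story(b"\x93Hi\x94 \x85")
    print(enc, text.encode("utf-8"))
```

The garbled strings mentioned in item 3 are often the second kind of damage: an ellipsis stored as UTF-8 and then mis-read as Windows-1252 shows up as `â€¦`, which `undo_double_encoding` turns back into a single character. Anything it cannot round-trip is returned unchanged rather than guessed at.
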
I've kept the whole thing in a Git repository tracking every change, automated and manual, starting from the untouched contents of the .zip file. The full list of changes to date (in reverse order, most recent first) is:

Quote

2946: Fix paragraph breaks
2462: Remove (dupe of 4173)
1273,4,6,8: Join broken lines
164: Manual paragraph breaks
1682-6: Remove page#s, spurious breaks
209-211: Remove spurious paragraph breaks
379: Manual paragraph breaks
1297-1300,1416: Fix apostrophes and other punctuation
412: Re-fix encoding
409-413: Manual breaks, encoding fixes
3445: Add lots of manual breaks
364: Total manual reformatting
3445: Revert double-reencoding corruption
3463: P tags for ---- separators
Update common css
224-225: Removed spurious par breaks
10: Add newlines at P tags
1483, 2474: Remove spurious extra breaks
2411: Manual breaks
All files: Use external css file for styles
3217: Restore original formatting
Remove spurious closing brace in CSS
313,917: Manual breaks
4142: Auto-formatting + manual additions
180: manual breaks
1261: manual breaks
360,361: Manual breaks
3769: Manual break (just the one)
3121: Manual breaks
438: Manual breaks
976: Remove spurious brace in CSS
4243-4: Added manual breaks
Remove 1221 (blank)
442: Broke up some run-together words
2140: Ran fixup script
976: UTF-8 conversion
gitignore and gitattributes
Automatic eol conversion
1940,2965:(same text) Manual breaks, UTF-8
2123:Manual breaks, recoded to UTF-8
1949: Add manual breaks, convert UTF-8
1877,1942:UTF-8 conversion
181,188,1592,1601+3,1672-4,1786+7: UTF-8 conversion
166,1042,1173+4,1368,1373+4+7,1616+7: UTF-8 conversion
410,1059+60,1168,1288,1253+4,1367: UTF-8 conversion
374,4222:Spurious non-ASCII, fix breaks
Automatic eol conversion
Create gitattributes file
Update .gitignore
1949: Copied formatting from O's site
1396: Remove spurious orig breaks
2142: Manual breaks
Add .gitignore file
Recoded all WINDOWS-xxxx to utf-8
854,884,1393: Remove spurious breaks
851-2,854,863-5,868,870,888: Revert fixup
850: Better breaks, manually
847,849,871-2,886-7,893: Revert fixup
1447,50: Remove spurious p-breaks in orig
1458: Removed spurious p-breaks from original
1504: Revert 38b3d0261 for file.
787-791: Manual breaks
820: Manual breaks
1537: Convert to utf8, manual breaks
396: Added manual breaks
443: Add manual breaks
363: Restored pars from O\'Melissokomos version + manually
1637: Revert auto-formatting, recode to UTF-8
1540-1: Reencode from ms-ansi to UTF-8
2855: Formatted lists in story text as <ul>
2855-62: Revert auto-formatting damage
Set meta charset=utf-8 on reencoded files
1540-1, 1642-3, 2034: Manual breaks
1530,2140,2270: Changed headers to utf-8
1540-1, 1642-3, 2034: Manual breaks
2140: Recoded from ms-ansi, processed
548:Removed spurious breaks, added others
3507: Applied better paragraph breaks from 2006 repost
2543: Removed some spurious breaks
3353: Removed extraneous breaks
398: Manual breaks
2247: Added manual breaks
4364: Inserted missing chapter between 1464,1466
3370,3394-5: Recoded from ms-ansi
1624,7;1638-40;1670,6;1680;1703,7: Recoded from ms-ansi
2351-2: Recoded from ms-ansi
2364: Recoded from ms-ansi
65: Recoded from ms-ansi
4216-23: Recoded from ms-ansi
3020: Recoded from ms-ansi
1401-2: Recoded from ms-ansi
2877: Recoded from ms-ansi
1264-6,1290: Recoded from ms-ansi
639: Manually added breaks
1003: Recoded from ms-ansi
2105-12,+12 additional: Recoded from ms-ansi
3362: Recoded from ms-ansi
2104: Recoded from ms-ansi
3352: Recoded from ms-ansi
2740: Reencoded from ms-ansi, restored breaks
2038: Recoded from ms-ansi
4082-4084: Recoded from ms-ansi
4183: Recoded from ms-ansi
1677,2116,3777: Recoded from ms-ansi to utf-8
3445: Recoded from cp1250 to utf-8, manually cleaned up
900,901,1070,2909: Recoded from ms-ansi to utf8
2841: Recoded from ms-ansi to utf8
3874: Added manual breaks from original post
2653-54,2709,2749,2880,3989,4310: Jocking, manual fixes
Restore files corrupted by fixup script
963,965: Un-mangled ASCII separator lines in text
765: Processed for breaks
2830,2831: Processed 2830, manually split 2831
570: Broke up lines, balanced P tags, cleaned up
569: Processed, hand-corrected paragraph additions
611,612: Manually cleaned up spurious breaks
1295: Fix encoding damage
2824,2825: Manually split
Ran latest fixup script against some dist files
737: Import manual fixes from other tree
Import dist from TheArchive-070326.zip

I keep thinking I should eventually make the results available somewhere. At times I've been tempted to push the whole repository to GitHub, commit history and all, but I'm not sure how that'd go over with their TOS. I do too much actual software development there to risk getting my account suspended over a repo full of gay muscle-growth smut.

  • 3 weeks later...
Posted

@nypup2train 

You're doing important archival work because in a lot of ways you're preserving history. It's kind of amazing!

Do you think you'll eventually release your results?
