Butch
Posted November 11, 2021

On 8/27/2020 at 5:46 PM, DexterEvanXavier said:
If it helps, I have an HTML file which I updated to work with the online version of the archive. It has all 6833 posts on a single page, ordered by story title. You can't sort it like you could an Excel file, but you can use the find tool in your web browser to search it. You can download it here: https://www.dropbox.com/s/iz6m7vh78axhkyz/ArchiveStoryList.html?dl=0

Awesome. Found what I was looking for in seconds. Thanks!

yobdior
Posted October 4, 2023

On 11/1/2018 at 3:32 AM, Scriptboy said:
I have been working on compiling an updated index of the story archive and, through the use of Excel, Access, and other tools, I have been able to put together a listing of all of the 6,800+ story chapters located in the archive. This listing includes the story titles, author names, story descriptions, and the links where the stories can be found in the archives. Some work still needs to be done, but most of the data has been entered in Excel and transferred into a database for safekeeping, which makes sorting and querying easier. I will post an update as soon as I have something available to show you guys! This should make it easier to find a story in the archive!

Sadly, looks like OP disappeared long ago. Does anyone happen to have a copy of the Excel file that he uploaded (which is no longer available)? Thanks!!

CMiller
Posted October 9, 2023

On 10/4/2023 at 12:38 PM, yobdior said:
Sadly, looks like OP disappeared long ago. Does anyone happen to have a copy of the Excel file that he uploaded (which is no longer available)? Thanks!!

The story list is here: https://musclegrowth.net/archived-stories/

nypup2train
Posted November 13

Meanwhile, in tangentially related news, I've spent years slowly polishing the contents of the original-original archive (the TheArchive-070326.zip file (formerly?) downloadable from O'Melissokomos' blog, way back when), including:

- Developing a Perl script to heuristically detect line-break issues like:
  - Files with paragraphs separated by <p> tags, instead of wrapped in <p>...</p>. (Technically valid pre-HTML4 syntax, but frowned upon today.)
  - Files with a <p> tag between each line of the source, breaking each real paragraph up into a bunch of arbitrarily-separated line-paragraphs with roughly the same number of characters in each.
  - Files with NO tag breaks, where the entire story source consisted of a wall of plain text with embedded newlines (which HTML turns into spaces), where an intended paragraph break was represented by a blank source line (two newlines in a row).
  - Files with no tag breaks and NO blank lines, where the only indicator of a paragraph separation is a shorter-than-average source line that ends in some sort of punctuation.
  ...and then running that script across the entire archive, to clean all of those issues up and turn each story into an HTML document with no forced wrapping, where each paragraph is wrapped in a single <p>...</p> tag pair.
- Doing a LOT of manual wrapping cleanup, either in addition to or instead of what resulted from running my cleanup script. (Which in a few cases made things worse instead of better, especially in stories where the source already contained accidental breaks that didn't belong there.)
  It's nearly impossible to write code that can properly detect all of the weird forms of line-wrapping issues you end up with, when your documents are sourced from forum posts that users created by doing things like copy-pasting from text editors, or word processors, or other HTML documents. Never mind being able to work out what it's supposed to look like, and repair it.
  (And then even when it DID correctly work out what was going on in the beginning, some files would just suddenly change formatting partway through, for no discernible reason! I didn't even TRY to make it smart enough to deal with that nonsense.)
- Re-encoding all of the files from their captured encoding -- which was declared as "iso-8859-1" in every archive file, but was very frequently something else in reality. (Again, copy-pastes from other software, particularly on Windows with all of its strange pre-Unicode code pages.) I've attempted to make them all valid, standard UTF-8, even though in a number of especially annoying cases that meant hand-editing the file and guessing that all of the occurrences of one particular garbled string were supposed to have been ellipses, while some other garbled string represented an em dash, or a curly single quote / apostrophe, or an opening or closing curly double quote.
- Replacing the <style> tag full of identical CSS in each file's <head> with <link media="all" href="../story.css" type="text/css" rel="stylesheet">, so that the entire archive can be re-styled in one place.
- Removing a couple of blank and/or duplicated story chapters.
- ADDING one missing chapter that was skipped over in a story, because the archive contained both the preceding and following chapters.
- Doing a few repairs on broken next/previous link connections between the various stories/chapters.

I've kept the whole thing in a Git repository tracking every change, automated and manual, starting from the untouched contents of the .zip file.
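
For anyone attempting a similar cleanup, two of the steps above can be sketched in Python. This is only a minimal illustration under assumptions, not the Perl script described in this post: the function names decode_mislabelled and wrap_paragraphs are made up for the example, the fallback order (UTF-8, then Windows-1252, then ISO-8859-1) is a guess at a sensible default rather than the order actually used, and only the simplest paragraph case (plain text with blank lines between paragraphs) is handled.

```python
import re


def decode_mislabelled(raw: bytes) -> str:
    """Decode bytes whose declared charset (iso-8859-1) may be wrong.

    Hypothetical helper: tries the strictest plausible encoding first.
    """
    try:
        # Text in a legacy 8-bit encoding rarely happens to be valid UTF-8,
        # so a successful strict decode is a strong hint.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        pass
    try:
        # Windows-1252 puts curly quotes, dashes and ellipses in 0x80-0x9F.
        return raw.decode("cp1252")
    except UnicodeDecodeError:
        # A few bytes are undefined in cp1252; latin-1 accepts every byte,
        # so this last resort cannot fail (but may yield control characters).
        return raw.decode("latin-1")


def wrap_paragraphs(text: str) -> str:
    """Wrap blank-line-separated plain text in <p>...</p> pairs.

    Assumes the text contains no tags of its own and is already HTML-safe,
    so nothing is escaped here.
    """
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    paragraphs = []
    for chunk in re.split(r"\n\s*\n", text):
        # Single newlines inside a chunk are forced wrapping: join with spaces.
        joined = " ".join(chunk.split())
        if joined:
            paragraphs.append(f"<p>{joined}</p>")
    return "\n".join(paragraphs)
```

A file would be read in binary mode, passed through decode_mislabelled, optionally through wrap_paragraphs, and written back out with encoding="utf-8". Files that fall through to the latin-1 branch, or that were already re-encoded twice, are the ones most likely to need the kind of hand-editing described above.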

The full list of changes to date (in reverse order, most recent first) is:

2946: Fix paragraph breaks
2462: Remove (dupe of 4173)
1273,4,6,8: Join broken lines
164: Manual paragraph breaks
1682-6: Remove page#s, spurious breaks
209-211: Remove spurious paragraph breaks
379: Manual paragraph breaks
1297-1300,1416: Fix apostrophes and other punctuation
412: Re-fix encoding
409-413: Manual breaks, encoding fixes
3445: Add lots of manual breaks
364: Total manual reformatting
3445: Revert double-reencoding corruption
3463: P tags for ---- separators
Update common css
224-225: Removed spurious par breaks
10: Add newlines at P tags
1483, 2474: Remove spurious extra breaks
2411: Manual breaks
All files: Use external css file for styles
3217: Restore original formatting
Remove spurious closing brace in CSS
313,917: Manual breaks
4142: Auto-formatting + manual additions
180: manual breaks
1261: manual breaks
360,361: Manual breaks
3769: Manual break (just the one)
3121: Manual breaks
438: Manual breaks
976: Remove spurious brace in CSS
4243-4: Added manual breaks
Remove 1221 (blank)
442: Broke up some run-together words
2140: Ran fixup script
976: UTF-8 conversion
gitignore and gitattributes
Automatic eol conversion
1940,2965:(same text) Manual breaks, UTF-8
2123:Manual breaks, recoded to UTF-8
1949: Add manual breaks, convert UTF-8
1877,1942:UTF-8 conversion
181,188,1592,1601+3,1672-4,1786+7: UTF-8 conversion
166,1042,1173+4,1368,1373+4+7,1616+7: UTF-8 conversion
410,1059+60,1168,1288,1253+4,1367: UTF-8 conversion
374,4222:Spurious non-ASCII, fix breaks
Automatic eol conversion
Create gitattributes file
Update .gitignore
1949: Copied formatting from O's site
1396: Remove spurious orig breaks
2142: Manual breaks
Add .gitignore file
Recoded all WINDOWS-xxxx to utf-8
854,884,1393: Remove spurious breaks
851-2,854,863-5,868,870,888: Revert fixup
850: Better breaks, manually
847,849,871-2,886-7,893: Revert fixup
1447,50: Remove spurious p-breaks in orig
1458: Removed spurious p-breaks from original
1504: Revert 38b3d0261 for file.
787-791: Manual breaks
820: Manual breaks
1537: Convert to utf8, manual breaks
396: Added manual breaks
443: Add manual breaks
363: Restored pars from O'Melissokomos version + manually
1637: Revert auto-formatting, recode to UTF-8
1540-1: Reencode from ms-ansi to UTF-8
2855: Formatted lists in story text as <ul>
2855-62: Revert auto-formatting damage
Set meta charset=utf-8 on reencoded files
1540-1, 1642-3, 2034: Manual breaks
1530,2140,2270: Changed headers to utf-8
1540-1, 1642-3, 2034: Manual breaks
2140: Recoded from ms-ansi, processed
548:Removed spurious breaks, added others
3507: Applied better paragraph breaks from 2006 repost
2543: Removed some spurious breaks
3353: Removed extraneous breaks
398: Manual breaks
2247: Added manual breaks
4364: Inserted missing chapter between 1464,1466
3370,3394-5: Recoded from ms-ansi
1624,7;1638-40;1670,6;1680;1703,7: Recoded from ms-ansi
2351-2: Recoded from ms-ansi
2364: Recoded from ms-ansi
65: Recoded from ms-ansi
4216-23: Recoded from ms-ansi
3020: Recoded from ms-ansi
1401-2: Recoded from ms-ansi
2877: Recoded from ms-ansi
1264-6,1290: Recoded from ms-ansi
639: Manually added breaks
1003: Recoded from ms-ansi
2105-12,+12 additional: Recoded from ms-ansi
3362: Recoded from ms-ansi
2104: Recoded from ms-ansi
3352: Recoded from ms-ansi
2740: Reencoded from ms-ansi, restored breaks
2038: Recoded from ms-ansi
4082-4084: Recoded from ms-ansi
4183: Recoded from ms-ansi
1677,2116,3777: Recoded from ms-ansi to utf-8
3445: Recoded from cp1250 to utf-8, manually cleaned up
900,901,1070,2909: Recoded from ms-ansi to utf8
2841: Recoded from ms-ansi to utf8
3874: Added manual breaks from original post
2653-54,2709,2749,2880,3989,4310: Jocking, manual fixes
Restore files corrupted by fixup script
963,965: Un-mangled ASCII separator lines in text
765: Processed for breaks
2830,2831: Processed 2830, manually split 2831
570: Broke up lines, balanced P tags, cleaned up
569: Processed, hand-corrected paragraph additions
611,612: Manually cleaned up spurious breaks
1295: Fix encoding damage
2824,2825: Manually split
Ran latest fixup script against some dist files
737: Import manual fixes from other tree
Import dist from TheArchive-070326.zip

I keep thinking I should eventually make the results available somewhere. At times I've been tempted to push the whole repository to GitHub, commit history and all, but I'm not sure how that'd go over with their TOS. I do too much actual software development there to risk getting my account suspended over a repo full of gay muscle-growth smut.

whatamuscleman
Posted Saturday at 06:45 AM

@nypup2train You're doing important archival work because in a lot of ways you're preserving history. It's kind of amazing! Do you think you'll eventually release your results?