Buffistas Building a Better Board
Do you have problems, concerns or recommendations about the technical side of the Phoenix? Air them here. Compliments also welcome.
To-do list
This is a "maybe it's just me" thing, but when I do Threadsuck, it takes over my whole browser, in fact makes it very hard to do anything on my computer, until it downloads the whole thing.
And my issue is, if the thing I wanted was available via a link, rather than a form submission, then I could use the Alt-Click option -- to "download to disk" rather than the "load in browser window, then save" which is what my browser so obviously hates (reading 1.5 MB web pages!).
So, I'm all about the reverse engineering.
I can figure out from the form, that the link to download Bureau from 5007 onward would implicitly be something like [link] something ]&thread_id=25&beg_post=5007&stage=submitted&keep_unread=1
But, that doesn't work. It gives me that "Fatal Error" page.
Am I causing big problems by doing this?
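One possible reason the hand-built link fails is that the form submits by POST rather than GET; that is a guess, not something confirmed in this thread. Under that assumption, a small script could submit the same fields and stream the reply straight to disk, skipping the browser. The field names below are the ones read off the form above; the form's action URL is not shown here, so the caller has to supply it, and a logged-in session cookie may also be needed.

```python
import shutil
import urllib.parse
import urllib.request

# Field names and values as read off the form in the post above.
FORM_FIELDS = {
    "thread_id": "25",
    "beg_post": "5007",
    "stage": "submitted",
    "keep_unread": "1",
}


def build_form_body(fields):
    """Encode form fields the way a browser does for a POST submission."""
    return urllib.parse.urlencode(fields).encode("ascii")


def download_to_disk(url, fields, dest_path):
    """POST the form and stream the response to a file.

    `url` is the form's action URL, which is not given in this thread,
    so it is left as a parameter (assumption: the form uses POST).
    """
    request = urllib.request.Request(url, data=build_form_body(fields))
    with urllib.request.urlopen(request) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out)  # copies in chunks, never the whole page in memory
```

Calling download_to_disk(action_url, FORM_FIELDS, "bureau.html") would then save the page without rendering it, where action_url stands for whatever the form actually posts to.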
This is a "maybe it's just me" thing, but when I do Threadsuck, it takes over my whole browser, in fact makes it very hard to do anything on my computer, until it downloads the whole thing.
Not just you, but also not just this board. The threadsuck is a 5+MB HTML page, and Mozilla does seem to choke on it a bit for me.
I got Natter 1 threadsucked this afternoon. It took about three hours. The threadsuck itself didn't take very long, but being a perfectionist, I went through the original thread fixing all the instances of superlong URLs and superextended-hyphenated-word-sequences, so that anybody who read the threadsuck wouldn't have to side-scroll each and every one of ten thousand posts. The URLs are easy enough to find, because you can search on "http", but those frelling hyphenations are driving this archivist bonkers.
Also, I'm out of town for a week starting Saturday, so I don't know how much I'll have done by then, if that's an issue.
those frelling hyphenations are driving this archivist bonkers
What app is it you're using to search, and can it use regexes?
Because I have a regex for multiple-hyphenated phrases right here, AIFG!
Or, of course, he said coming back from the coffee machine, a regex for "any phrase over 50 chars not containing whitespace gets a linebreak at char 51" would be easy.
I was hoping you'd say something like that.
Right now I'm just doing it in a browser window. I set my preferences to download, say, a thousand posts at a time, then if my horizontal scroll bar appears, I search through that batch of posts in smaller chunks until I find the offending hyphenations. What I need to know is not only what long hyphenations there are, but also what post they're in so I can edit them.
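For the "which ones, and where" part, a rough sketch in Python (Perl would do just as well): it lists every unbroken run of 50 or more characters in a saved copy of the thread, along with the line it sits on. It reports line numbers rather than post numbers, since the threadsuck's post markup isn't shown here; mapping a line back to a post is left to the reader. The 50-character threshold is borrowed from the discussion above.

```python
import re

# Any run of 50 or more non-whitespace characters counts as "too long" here.
LONG_RUN = re.compile(r"\S{50,}")


def find_long_runs(text):
    """Return (line_number, run) pairs for every over-long unbroken run."""
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in LONG_RUN.finditer(line):
            hits.append((line_number, match.group(0)))
    return hits


if __name__ == "__main__":
    sample = "short line\n" + "really-" * 10 + "long\nanother short line\n"
    for line_number, run in find_long_runs(sample):
        # Print the line number, the run's length, and its first 40 characters.
        print(line_number, len(run), run[:40])
```

With only 15-20 hits expected, the output is short enough to work through by hand, placing each break where it makes sense.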
Or, of course, he said coming back from the coffee machine, a regex for "any phrase over 50 chars not containing whitespace gets a linebreak at char 51" would be easy.
Yeah, it's not always hyphenations, sometimes youjustgetwordsstrungtogetheryouknow?
I have Perl, not sure what other programs I have that would be suitable.
edit: And editing by hand is fine, because there aren't that many instances (15-20), and that way you can place the break where it makes sense.
I have Perl, not sure what other programs I have that would be suitable.
Obviously Perl would do it, but Homesite and Dreamweaver have regex support. Then there are things like Ultra-edit and whatever.
With Perl you can do it with a quick one-liner, using the -i flag.
Let me play just a little...
I'll try and find my copy of Homesite and see what the syntax is.
But, here's something you could try. From the commandline, doing this:
perl -p -i.bak -e 's/(\S{50})/$1\n/' longstrings.txt
will put a linebreak after the first run of fifty non-whitespace characters on any line of longstrings.txt.
Of course it won't wrap neatly at the nearest hyphen, and it only breaks the first such run on each line, so a run of more than 100 chars needs a second pass (run it twice!), but it's better than nothing.
Oh and it makes a backup file of course, called longstrings.txt.bak -- Perl, is there anything it can't do?
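For anyone who wants the gentler version, here is a sketch in Python of the same idea that breaks at the nearest hyphen when there is one and handles runs of any length in a single pass. It is an illustration, not a drop-in replacement: unlike the one-liner, it leaves a run of exactly 50 characters alone and only breaks runs longer than that, and the function names are made up for this example.

```python
import re

LIMIT = 50  # longest unbroken run allowed, matching the one-liner's threshold


def break_run(run, limit=LIMIT):
    """Split one whitespace-free run into pieces of at most `limit` chars,
    cutting just after a hyphen when the window contains one."""
    pieces = []
    while len(run) > limit:
        cut = run.rfind("-", 0, limit) + 1  # just past the last hyphen, or 0 if none
        if cut <= 0:
            cut = limit  # no hyphen available: hard break at the limit
        pieces.append(run[:cut])
        run = run[cut:]
    pieces.append(run)
    return "\n".join(pieces)


def wrap_long_runs(text, limit=LIMIT):
    """Apply break_run to every run longer than `limit` characters."""
    pattern = re.compile(r"\S{%d,}" % (limit + 1))
    return pattern.sub(lambda m: break_run(m.group(0), limit), text)
```

Words strung together with no hyphens at all still get the brutal cut at character 50, so hand-editing remains the better option where the break position matters.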
The code for "a pattern of fifty non-spaces in a row" in Homesite is the very beautiful:
([^[:space:]]{50})
so if you choose "extended Replace", check the Regular Expressions box then replace the above code with
\1
followed by a return, that will hard-wrap, in a rather brutal way, all 50-character strings, putting a line break after char 50.