Buffistas Building a Better Board
Do you have problems, concerns or recommendations about the technical side of the Phoenix? Air them here. Compliments also welcome.
To-do list
OK thinking about ways of checking for problematic HTML.
I've got two methods I've played with (in Perl).
- Method A: every time we find an opening tag, we push it onto an array; every time we find a closing tag, if it matches the last item of the array, we remove that item; a good result is an empty array, but if it's not empty, we add closing tags, in reverse order of course, to the post.
- Method B: every time we find an opening tag, we increment the count for that tag in a hash, so we have $hash{number_of_open_a_tags} = x; every time we find a closing tag we decrement that count; a good result is all tags at zero, but if they're not zero, we need that many closing tags added to the post.
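For anyone who wants to poke at the two methods, here is a rough sketch of both. It's in Python rather than the Perl mentioned above, and the names (TAG_RE, method_a, method_b) are made up for illustration. It assumes simple tags, ignores attributes, and doesn't special-case tags like br or hr that never get closed.

```python
import re
from collections import Counter

# Matches simple opening/closing tags; attributes are skipped over.
TAG_RE = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>")

def method_a(post):
    """Array approach: return the closing tags to append, innermost first."""
    stack = []
    for slash, name in TAG_RE.findall(post):
        name = name.lower()
        if not slash:
            stack.append(name)          # opening tag: push
        elif stack and stack[-1] == name:
            stack.pop()                 # closing tag matches the last item: pop
        # a closing tag that doesn't match the last item is ignored
    return ["</%s>" % name for name in reversed(stack)]

def method_b(post):
    """Hash approach: return {tag: open count} for every tag not at zero."""
    counts = Counter()
    for slash, name in TAG_RE.findall(post):
        counts[name.lower()] += -1 if slash else 1
    return {name: n for name, n in counts.items() if n != 0}
```

Calling method_a("<b>bold <i>both") gives the closers in the order they'd need appending, while method_b on the same post only says how many of each are missing.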
Problems with A: it won't work correctly with crossed tags, like this:
<b> something in bold <i> something in bold and italic </b> no longer bold, but still italic </i>
it'll get out of step, but the HTML, despite being Very Bad In Principle, won't actually be problematic for the page as a whole.
Problems with B: no maintaining of order, because it's a hash.
If you looked at the page a few minutes ago, I just resaved it while logged in as an admin. I realized that the "edit" link wasn't showing up because I hadn't written the post. Anyway, now the edit link is showing up, and it works! So I don't know why ita couldn't edit the post originally. The mystery deepens!
OK my analysis of that bad link is this:
<a href='http://story.news.yahoo.com/ news?tmpl=story2&cid=300&e=17&u=/ibsys/20021108/lo_wisc/1382225">Try this.</a></p><hr align="LEFT" size="2" width="75%" noshade><p class='normal-text'>
It's an opening A tag:
<a
then a very long, busted attribute:
href='http://story.news.yahoo.com/ news?tmpl=story2&cid=300&e=17&u=/ibsys/20021108/lo_wisc/1382225">Try this.</a></p><hr align="LEFT" size="2" width="75%" noshade><p class='
then an unknown attribute (as the browser sees it)
normal-text'
and then finally the closing bracket:
>
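To make that reading concrete, here's a small Python sketch of a quote-aware scanner. It's only an approximation of what a browser does, and first_tag is a made-up name. The assumption is that a quote only opens an attribute value when it comes right after an equals sign, and that a ">" inside a quoted value doesn't end the tag.

```python
def first_tag(html):
    """Return the first tag as a quote-aware scanner would delimit it."""
    start = html.index("<")
    quote = None   # the quote character we are currently inside, if any
    prev = ""      # last non-space character seen outside a quoted value
    for i in range(start + 1, len(html)):
        ch = html[i]
        if quote:
            if ch == quote:            # only the same kind of quote closes it
                quote = None
                prev = ch
        elif ch in "'\"" and prev == "=":
            quote = ch                 # a quote right after "=" opens a value
        elif ch == ">":
            return html[start:i + 1]   # unquoted ">" ends the tag
        elif not ch.isspace():
            prev = ch
    return html[start:]                # the tag never closed

bad = ("<a href='http://story.news.yahoo.com/ news?tmpl=story2&cid=300&e=17"
       "&u=/ibsys/20021108/lo_wisc/1382225\">Try this.</a></p>"
       "<hr align=\"LEFT\" size=\"2\" width=\"75%\" noshade>"
       "<p class='normal-text'>")
```

Under those assumptions first_tag(bad) comes back as the entire string: the single quote after href= isn't closed until the one after class=, so everything in between is swallowed into the attribute.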
(unless it's a mandatory part of the post message function)
Whatever the solution <clueless>, please don't let it be this.
OK a possible solution: check for bad HTML. Don't try to correct it. If the HTML is bad, simply refuse to post and give the user an error message. You catch it, and put the burden on the user to fix it.
This could be unclosed tags, unclosed quotes of various types, and too many quotes of the same type in an < A Href > tag.
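A check-and-refuse pass along those lines might look something like this Python sketch. html_problems and NO_CLOSE are invented names, the list of tags that never need closing is my assumption, and the quote test is crude (an apostrophe inside a double-quoted value would trip it).

```python
import re
from collections import Counter

# Assumption: tags the board would allow that never take a closing tag.
NO_CLOSE = {"br", "hr", "img"}

def html_problems(post):
    """Return a list of complaints; an empty list means the post is accepted."""
    problems = []
    counts = Counter()
    for tag in re.findall(r"<[^<>]*>", post):
        # An odd number of either quote type inside one tag is suspicious.
        for q in ("'", '"'):
            if tag.count(q) % 2:
                problems.append("unbalanced %s quote in %s" % (q, tag))
        m = re.match(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)", tag)
        if m and m.group(2).lower() not in NO_CLOSE:
            counts[m.group(2).lower()] += -1 if m.group(1) else 1
    for name, n in sorted(counts.items()):
        if n > 0:
            problems.append("%d unclosed <%s>" % (n, name))
        elif n < 0:
            problems.append("%d stray </%s>" % (-n, name))
    return problems
```

The posting code would then refuse the message whenever the list is non-empty and show the complaints to the user instead of trying to fix anything.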
For me, the fact that "post message" takes you back to the thread serves as the preview function. I narcissistically reread all my posts, and don't have broken HTML for longer than it takes me to edit.
Problems with A: it won't work correctly with crossed tags,
Problems with B: no maintaining of order, because it's a hash.
Could there be some sort of compromise between these? (Note: I'm really not so good at figuring out whether things can actually be implemented or not, or at figuring out how much of a pain it would be.) Like, instead of checking if the last tag in the array matches the closing tag it finds, check each tag in the array, starting from the last one. When it finds the right one, remove it, and shift everything below it up. Then add whatever's left in order. (Ugh. I don't explain too good.) Like, if the post contained
<strike> Crossed-out text <i> crossed-out italic text <b> crossed-out italic bold text </i> crossed-out bold text </b> crossed-out text
It would make the array as [strike, i, b], then remove the i so it's [strike, b], then remove the b so it's [strike], then get to the end of the post and add the </strike>.
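In case it helps whoever ends up implementing it, the search-from-the-end idea could be sketched like this in Python. missing_closers is a made-up name, and attributes and never-closed tags like br are ignored here.

```python
import re

TAG_RE = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>")

def missing_closers(post):
    """Return the closing tags to append, searching the array from the end."""
    open_tags = []
    for slash, name in TAG_RE.findall(post):
        name = name.lower()
        if not slash:
            open_tags.append(name)
        elif name in open_tags:
            # Find the most recent matching opener and remove just that one;
            # everything else in the array keeps its order.
            idx = len(open_tags) - 1 - open_tags[::-1].index(name)
            del open_tags[idx]
    return "".join("</%s>" % n for n in reversed(open_tags))
```

On the strike/i/b example the array goes [strike, i, b], then [strike, b], then [strike], and the function hands back the one closing strike tag.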
Could there be some sort of compromise between these?
Very sensible, how about if we do both?
Then if method B says everything's OK, but method A says it isn't, method B wins, because we know that at least every tag has a closing tag and the post, no matter how mangled, is self-containedly mangled and won't cause problems downthread.
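A rough Python sketch of running both checks in one pass. The name repair is invented, and what happens when B finds a problem isn't spelled out above, so appending method A's closing tags in that case is my assumption.

```python
import re
from collections import Counter

TAG_RE = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>")

def repair(post):
    """Run both checks; leave the post alone whenever method B is satisfied."""
    stack, counts = [], Counter()
    for slash, name in TAG_RE.findall(post):
        name = name.lower()
        counts[name] += -1 if slash else 1        # method B bookkeeping
        if not slash:
            stack.append(name)                    # method A bookkeeping
        elif stack and stack[-1] == name:
            stack.pop()
    if not any(counts.values()):
        return post                               # B says balanced: B wins
    # Assumption: otherwise fall back to A's closers, innermost first.
    return post + "".join("</%s>" % n for n in reversed(stack))
```

So a crossed-but-balanced post goes through untouched, and a post that really is missing closers gets them tacked on the end.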
The edit link *did* work, Jon. It's that the edit form was broken by the bad HTML.
I think way back I suggested a modified A. When you're popping tags off the stack, if you meet one that's unclosed (a </b> when you're expecting a </a>, for instance), add the </a>.
Which would have us err on the side of too many closing tags, instead of too few. Safe, except if we throw in an extra </table> we're in trouble, aren't we?
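One way to read the modified A, sketched in Python: when a closing tag turns up and its opener is further down the stack, close everything opened since then first. close_eagerly is a made-up name, and leaving stray closing tags in place (rather than dropping them) is an assumption on my part.

```python
import re

TAG_RE = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>")

def close_eagerly(post):
    """Rewrite the post, closing anything still open before an outer closer."""
    out, stack, pos = [], [], 0
    for m in TAG_RE.finditer(post):
        out.append(post[pos:m.start()])   # text before this tag
        pos = m.end()
        name = m.group(2).lower()
        if not m.group(1):
            stack.append(name)
        elif name in stack:
            # Add closers for every tag opened after the one being closed.
            while stack[-1] != name:
                out.append("</%s>" % stack.pop())
            stack.pop()
        out.append(m.group(0))            # the tag itself, stray or not
    out.append(post[pos:])
    # Anything still open at the end gets closed, innermost first.
    out.extend("</%s>" % n for n in reversed(stack))
    return "".join(out)
```

On a crossed post this does leave one closing tag too many, which is harmless for b or i but is exactly the worry with table: an extra one could close a table belonging to the page layout.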