Buffistas Building a Better Board
Do you have problems, concerns or recommendations about the technical side of the Phoenix? Air them here. Compliments also welcome.
To-do list
It's possible but scary, to my brain.
I've already roughed out a version in Perl which just counts opening/closing tags. There are either more openings than closings, or there aren't.
It tells me, for this page, that the A tags are right on target, there are exactly the same number opening as closing, but the P tags aren't. Which we'd expect.
If it were a troublesome tag, like B, then it would just insert the closing B x times, where x is the difference between openers and closers.
For tags like A, FONT, B, I and so on, it won't necessarily perfect that post as the writer intended, but it should at least stop the error cascading on down the page.
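For anyone following along, here is a rough sketch of that count-and-close idea. It is in Python rather than the Perl prototype or the board's PHP, and the function name and the list of "safe to close" tags are made up for illustration. It only counts, so the closers it appends may come out in the wrong nesting order.

```python
import re
from collections import Counter

# Hypothetical list of inline tags that are reasonably safe to close blindly.
INLINE_TAGS = ("a", "b", "i", "font")

def close_dangling_tags(post: str) -> str:
    """Append one closing tag for every unmatched opener of an inline tag."""
    openers = Counter()
    closers = Counter()
    # Group 1 is the optional slash, group 2 is the tag name.
    for slash, name in re.findall(r"<(/?)([a-zA-Z]+)\b[^>]*>", post):
        name = name.lower()
        if name in INLINE_TAGS:
            (closers if slash else openers)[name] += 1
    fixed = post
    for name in INLINE_TAGS:
        missing = openers[name] - closers[name]
        if missing > 0:
            # Insert the closer x times, x being openers minus closers.
            fixed += f"</{name}>" * missing
    return fixed
```

Table tags are deliberately left out of the list, since closing those blindly is the scary case.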
For other tags, for instance table tags, it's a bit more scary.
[EDIT: just realised this script forces me to write "for(@openers)". Ha ha geek ha.]
Well, you know tags can't nest, so you only have to deal with one at a time. It's just like the logic you laid out for tags, only for attributes.
Actually, it's easier than that, isn't it? It's just counting double-quotes, isn't it?
Start counting double quotes when you see a <, and if you have an odd number of them when you see > throw in an extra. Plus a pinch of salt.
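A minimal sketch of that quote count, in Python for illustration (the function name is hypothetical, and this is not the board's code). It treats everything from a < to the next > as one tag, so a > inside a quoted value would fool it.

```python
import re

def balance_double_quotes(post: str) -> str:
    """Add a closing double quote to any tag holding an odd number of them."""
    def fix_tag(match):
        tag = match.group(0)
        if tag.count('"') % 2 == 1:
            # Odd count: throw in an extra quote just before the '>'.
            return tag[:-1] + '">'
        return tag
    # A "tag" here is simply '<', anything except '<' or '>', then '>'.
    return re.sub(r"<[^<>]*>", fix_tag, post)
```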
you know tags can't nest
Er, do I? You mean you can't have:
<tag blah blah blah <nothertag> blah blah>
right?
The idea of checking the syntax of every attribute of every tag seems rather processor-intensive to me. No reason it can't be done.
But, but, but, coming home from the bottle shop (liquor store) it occurred to me that what was really scary the last time we had a major HTML snag was that it stopped the post from even being editable.
So I think what we need is not only an edit but a "safe edit" for admins, which would somehow rob such malformed posts of their power to break the browser/form/interface. Because ita had to go into the SQL DB and edit by hand, command-line style. Or do I mean "commando-style"? Anyway.
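One way a "safe edit" could work, sketched in Python under the assumption that the edit form is just a textarea: escape the stored post before dropping it into the form, so whatever markup it contains shows up as text instead of being interpreted. The function name and form field are invented for the example.

```python
import html

def safe_edit_form(post_body: str) -> str:
    """Build an edit box in which the stored post is shown as inert text."""
    # Escaping &, <, > and both quote characters means a stray </textarea>
    # or an unclosed attribute in the post cannot break out of the form.
    escaped = html.escape(post_body, quote=True)
    return f'<textarea name="body" rows="10" cols="60">{escaped}</textarea>'
```

The browser un-escapes the text when it submits the form, so the admin still edits the raw post.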
I'm 99% certain you can't nest tags like that. But it occurs to me that even if you can, you would just have to recursively parse.
In order to do the tag closing you'll have to parse the tags. Counting double quotes to make sure each tag has an even number of them shouldn't make it more processor intensive. And it should only happen on post, so it's not like it's going to make page serving more expensive.
One thing I just thought of, as I was reading the HTML spec: you'll have to count single quotes too. In fact, once you see one sort of quote, you'll need to ignore occurrences of the other. In other words, double quotes found between pairs of single quotes are normal, as are single quotes found between pairs of double quotes.
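Something like this, roughly, as a Python sketch with a made-up function name: walk the tag one character at a time and remember which kind of quote, if any, is currently open.

```python
def unterminated_quote(tag: str):
    """Return the quote character left open in `tag`, or None if balanced."""
    open_quote = None
    for ch in tag:
        if open_quote is None:
            # Not inside a quoted value: either kind of quote opens one.
            if ch in ('"', "'"):
                open_quote = ch
        elif ch == open_quote:
            # Only the same kind of quote closes it; the other kind is ignored.
            open_quote = None
    return open_quote
```

If it returns a quote character, that is the one to append before the closing >.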
I could hack together some perl that does the checking, if that would help.
I'm 99% certain you can't nest tags like that.
That's OK because I'm 100% certain you can't. But there's nothing but nesting if you just mean:
<b> blah blah <i> blah blah </i></b>
once you see one sort of quote, you'll need to ignore occurrences of the other. In other words, double quotes found between pairs of single quotes are normal, as are single quotes found between pairs of double quotes.
Well there shouldn't be that kind of nesting in regular HTML, though you'll get it in JavaScript for sure.
I could hack together some perl that does the checking, if that would help.
When I said I was working in Perl, I should have said "but of course this board uses PHP" -- have you ever done PHP? If you've worked in Perl it will be no big deal.
Betsy's post with the mismatched quotes? The one that broke the board and ita had to edit by hand? It was particularly devious. If you look at the resultant html in the page, it's not at all obvious how it ended up the way it did. ita reproduced the problem in our test environment. I took the page and copied it into a "regular" html page. You can look at it here. It's post #4 that broke things. I'd analyze it some more, but it's amazing what a few shots of tequila can do to one's analytical abilities.
I looked at the broken post, and it's not a case of too few quotes. Instead, it's someone including a URL that uses double-quotes.
If this anchor tag was handwritten by the author of the post, I don't think we can do anything about it. If instead it was created by the code that automagically wraps anchor tags around URLs, then the code needs to detect single or double quotes in the URL and use the other kind of quote.
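To make the second case concrete, here is a hedged Python sketch of an auto-linker that picks its delimiter from what the URL contains. The function name is hypothetical and this is not Phoenix's actual code. The fallback for a URL holding both kinds of quote (percent-encoding the double quotes) is an extra assumption on top of the suggestion above.

```python
def wrap_url(url: str, text: str) -> str:
    """Wrap `url` in an anchor tag, delimiting href with a quote the URL lacks."""
    if '"' not in url:
        quote = '"'
    elif "'" not in url:
        # The URL has double quotes, so delimit with single quotes instead.
        quote = "'"
    else:
        # Assumption: with both kinds present, percent-encode the double quotes.
        url = url.replace('"', "%22")
        quote = '"'
    return f"<a href={quote}{url}{quote}>{text}</a>"
```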
The quote counting is still a good idea, but it won't fix this problem.
I can't tell what's happening from that post. I don't seem to be looking at the original, it seems like it's a URL-encoded version of it.
But it's definitely because of quotes that are opened and don't close around the URL, for sure. I'm not sure what you're saying about that, Rob, but what happens is, as I said earlier to Gar, if you've opened quotes but not closed them in a link, you've got a link to the entire remaining text of the page, haven't you?
The problem is really, really broken HTML, and what the browser chooses to do about it. Lots of browsers just auto-close unclosed tags at the end of a TD or TABLE.
We could just tell people to be careful?
While reaffirming that I love this board beyond belief, this is where I'd gently like to request the ability to preview posts. Not everyone will use it (unless it's a mandatory part of the post message function) - but most infamous tag-droppers (hi - my name is Cindy) probably would. It would put some of the responsibility for tags gone bad - back on the posters. I guess it wouldn't show that certain open tags could break the board though.
The bronze beta is coded so that if any tag that is supposed to be closed is left open, the caret converts to a bracket and the tag isn't activated. For example, if I intended to put "John H" in bold, but only put the opening b tag, what I'd see, either when I previewed or posted, would be [b]John H[/b] - which tells me I didn't close the tag, and so "John H" does not appear as bold.
I have no technical knowledge, so I don't know if that's feasible for you.
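For what it's worth, that behaviour looks feasible. Below is a rough Python sketch of the general idea, not the Bronze Beta's actual code: the function name and tag list are invented, and the exact text it displays may differ from what that board shows. When a tag's openers and closers don't match up, it swaps the angle brackets for square ones so the tag is displayed rather than activated.

```python
import re

# Hypothetical list of tags that must always come in open/close pairs.
PAIRED_TAGS = ("a", "b", "i", "font")

def defuse_unclosed_tags(post: str) -> str:
    """Turn <tag> into [tag] for any paired tag whose counts don't match."""
    for name in PAIRED_TAGS:
        opener = re.compile(rf"<({name}\b[^>]*)>", re.IGNORECASE)
        closer = re.compile(rf"<(/{name})\s*>", re.IGNORECASE)
        if len(opener.findall(post)) != len(closer.findall(post)):
            # Mismatch: show every instance of this tag as plain text.
            post = opener.sub(r"[\1]", post)
            post = closer.sub(r"[\1]", post)
    return post
```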
I'm sorry I was so unclear in describing the problem. I think the post Jon links to is a problem with bad HTML generated by Phoenix. That we can fix.
Here's the bad link, made harmless:
<p>Your modem too slow? <a href="http://story.news.yahoo.com/%20news?tmpl=story2&
cid=300&e=17&
u=/ibsys/20021108/lo_wisc/1382225%22%3ETry%20this.%3C/a%3E%3C/p%3E%3Chr%20
align=%22LEFT%22%20size=%222%22%20width=%2275%%22%20
noshade%3E%3Cp%20class=" normal-text="">
</a>
Note how the link has an attribute (class=) that uses double quotes as delimiters. If a Phoenix user were to try to paste such a link into a message, it's my theory that Phoenix will wrap the entire URL with double quotes, which doesn't work.
OK, I still would like to make sure the URL converter handles links with quotes in them correctly, but I can't say for sure that's the only thing wrong with the post Jon linked to. I think the HTML there is already corrupted by the bad link.