Buffistas Building a Better Board
Do you have problems, concerns or recommendations about the technical side of the Phoenix? Air them here. Compliments also welcome.
To-do list
I like the array of tag references, with a modification that when you find a closing tag, search the list backwards for an open. If you find it, remove it. If you don't, ignore it, possibly bitching to the poster about their bad HTML.
At the end, put in a close tag for every open tag still in the array.
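For anyone who wants to see the tag-list idea spelled out, here is a rough sketch in Python. It is not the board's actual code. The names `balance_tags`, `TAG_RE` and `VOID_TAGS` are made up for illustration, the list of tags that never close is an assumption, and the regex is deliberately naive (it ends a tag at the first `>`).

```python
import re

# Naive tag matcher: <b>, </b>, <a href="x">. It stops at the first '>',
# so a '>' inside a quoted attribute value is NOT handled here.
TAG_RE = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>")

# Tags that never take a closing tag (assumed, not an exhaustive list).
VOID_TAGS = {"br", "hr", "img"}

def balance_tags(post):
    """Return (fixed_post, stray_closers) for a post's HTML."""
    open_tags = []   # names of tags opened and not yet closed, in order
    stray = []       # closing tags that had no matching open tag
    for match in TAG_RE.finditer(post):
        closing, name = match.group(1), match.group(2).lower()
        if name in VOID_TAGS:
            continue
        if not closing:
            open_tags.append(name)
            continue
        # Search the list backwards for the most recent matching open tag.
        for i in range(len(open_tags) - 1, -1, -1):
            if open_tags[i] == name:
                del open_tags[i]
                break
        else:
            # No open tag found: leave the post alone, but remember it
            # so the poster can be told about it.
            stray.append(name)
    # At the end, add a close tag for every tag still open, innermost first.
    fixed = post + "".join("</%s>" % name for name in reversed(open_tags))
    return fixed, stray
```

With this sketch, `balance_tags("<b>bold <i>both")` appends `</i></b>`, and a stray `</b>` with no opener is reported rather than touched.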
From John's analysis of the page that broke editing, it looks like the author used mismatching quotes, single at the start and double at the end. I think my suggestion of counting quotes in a tag would solve that, since it would put in a closing single quote before the >. The post would still look wrong, but it would be editable.
Hil -- with your scenario, it would only close the <c> once -- since it's only open once.
So once it closed the c, the opening one would be deleted from the table, even though it didn't get to where the poster wanted to close the c yet? So then the order would be
</c>
</a>
</b>
</a>
</c>
, I think.
Tables do seem to be what cause the worst damage, while things like italics and sizes just create annoyances. Would disabling tables and making them a quickedit thing make it better, or just more confusing?
I think a quickedit table (table, td, tr, th plus attributes of width and colour and alignment and stuff) would be more confusing. I'd lean towards disabling them completely, except no! I don't want to! I like and use them pretty often.
So I'm pretty torn.
To deal with things like the " ' mismatch, could you just keep track of how many double-quotes are within the brackets, and if there isn't an even number, treat it as if it was an invalid tag and just ignore it? Or are there times where there should be an odd number of double-quotes?
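A quick sketch of that double-quote count, in Python. The function name `tags_with_odd_double_quotes` is invented for this example, and it assumes a tag runs from a `<` to the next `>`, which is itself a simplification.

```python
def tags_with_odd_double_quotes(post):
    """Return each <...> span that contains an odd number of double quotes."""
    suspicious = []
    start = None  # index of the '<' of the tag currently being read, if any
    for i, ch in enumerate(post):
        if ch == "<" and start is None:
            start = i
        elif ch == ">" and start is not None:
            tag = post[start:i + 1]
            # An odd count means some double quote has no partner.
            if tag.count('"') % 2 == 1:
                suspicious.append(tag)
            start = None
    return suspicious
```

A tag that starts with a single quote and ends with a double quote, such as `<a href='x.html">`, gets flagged, while `<a href="x.html">` passes. Whether a flagged tag should be ignored or bounced back to the poster is a separate decision.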
ita, aren't your tables cut-and-pasted, so you could link to them? Or you could post at W/X and link. Don't know if anyone else uses tables. Tables are trouble. Remember how at Table Talk (hee) you could post pictures by making them the background of a table cell?
Could you disable tables (heh) for non-admins? Or are we getting way too complex here?
Nope. Most of my tables are assembled from other sources (either by hand, or from Excel) so there's not really any linkage.
Of course, if I'm the only one using them ... that's not a good enough reason to keep them.
Hil, and we'd also need to count the single quotes. Problem being, I think you can legally nest a single " inside ' ', and vice versa.
you can legally nest a single " inside ' ', and vice versa
Definitely, more than one. Though really, you should use the entities which are &#39; and &quot; or whatever.
The really scary thing for an HTML parser?
The fact that it's legal to have a tag like this:
<img src="a-is-bigger-than-b.gif" alt="a > b">
I think I thought of something. When you've got an opening bracket, start scanning the rest of the post. You're looking for a quote of some sort or an opening or closing bracket. If you get to an opening bracket, that's an error. If you get to a quote, then just look for another of those quotes, and ignore everything else you come across until you find it. Then start looking for any quote or bracket again. If you find a closing bracket, you're done. If you get to the end of the post while you're still looking for anything, that's an error. If there's an error, bring up an edit page that just has the text box at the bottom, no post at the top, and tell the poster to fix it before posting. Would that work? (Does that even make any sense, or do I need to find some better way of explaining it?)
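Here is one way that scan could look in Python, as a sketch only. `check_brackets` is a hypothetical name, and the error strings are placeholders. It returns `None` when every tag looks fine and a message otherwise, so the caller can decide whether to bring up the edit page.

```python
def check_brackets(post):
    """Return None if every tag is well formed, else a short error message."""
    i, n = 0, len(post)
    while i < n:
        if post[i] != "<":
            i += 1
            continue
        # Found an opening bracket: look for a quote or another bracket.
        start = i
        i += 1
        closed = False
        while i < n:
            ch = post[i]
            if ch == "<":
                return "opening bracket inside the tag that starts at %d" % start
            if ch in "\"'":
                # Skip to the next quote of the same kind, ignoring
                # everything in between (including '>' and the other quote).
                end = post.find(ch, i + 1)
                if end == -1:
                    return "unclosed quote in the tag that starts at %d" % start
                i = end + 1
                continue
            if ch == ">":
                closed = True
                i += 1
                break
            i += 1
        if not closed:
            return "tag that starts at %d is never closed" % start
    return None
```

The `alt="a > b"` image tag passes, because the `>` sits between two double quotes and is skipped. A tag that opens an attribute with a single quote and ends it with a double quote fails with the unclosed-quote message. One side effect to be aware of: a bare `<` typed in ordinary prose is also treated as the start of a tag, so it would be reported as an error unless the poster writes it as an entity.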