Do you have problems, concerns or recommendations about the technical side of the Phoenix? Air them here. Compliments also welcome.
ita, when you have a moment, what are the legal tags here?
I really want to think about the tag-closer code again, but are there any legal tags, apart from <br> which don't need closing?
I wouldn't swear to it, but poking around in the test site I found this list in the post-stripping function:
<a> <b> <i> <u> <ul> <ol> <li> <p> <br> <strike> <table> <tr> <td> <th> <font> <pre> <code>
Thanks Jon, that's like strip_tags(the list above), right?
So, only really BR, but what about the ones that don't need to be closed for it to work, but should be for the syntax, like P and LI? Hmmm. More thinking required.
that's like strip_tags(the list above), right?
Pretty much, yeah.
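For anyone following along, here is a minimal sketch of what whitelist stripping along those lines might look like. It is in Python rather than whatever the board actually runs, and the names ALLOWED_TAGS and strip_disallowed_tags are made up for illustration; the only thing taken from the thread is the tag list itself.

```python
import re

# Assumption: the allowed list is the one quoted above from the test site.
ALLOWED_TAGS = {
    "a", "b", "i", "u", "ul", "ol", "li", "p", "br", "strike",
    "table", "tr", "td", "th", "font", "pre", "code",
}

# Matches an opening or closing tag and captures its name.
TAG_RE = re.compile(r"</?([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>")


def strip_disallowed_tags(post):
    """Drop any tag whose name is not in ALLOWED_TAGS; leave the text alone."""
    def keep_or_drop(match):
        name = match.group(1).lower()
        return match.group(0) if name in ALLOWED_TAGS else ""
    return TAG_RE.sub(keep_or_drop, post)


print(strip_disallowed_tags("<b>hi</b><script>x</script><BR>"))
```

Note that this only removes the disallowed tags themselves; the text between them stays in the post.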
what about the ones that don't need to be closed for it to work, but should be for the syntax, like P and LI?
I'd say close them. I'm all for compliant html where possible.
[EDIT: although I see the problem: you want the </li> to occur before the next <li> or </ul>.]
Similarly, with the P tag.
Having thought about it for a while, it's only the tags that would create formatting problems further down the page that need to be handled. An unclosed LI tag won't cause the rest of the page to be indented, but an unclosed OL or UL will. Unclosed P tags won't cause any problems either.
I'm seeing the logic like this:
Go through the post, creating a list of all potential troublesome tags, i.e. when we encounter the first A tag, put it on a list.
When we encounter the next </A>, take that A tag off the list again.
If there's anything on the list at the end, close it.
It needs some more refinement, but that's essentially it, right?
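The list idea above could be sketched roughly like this. It is a Python illustration only, not the board's code: the TROUBLESOME set is a guess at which tags leak formatting down the page, and close_open_tags is a hypothetical name.

```python
import re

# Assumption: tags whose unclosed formatting leaks into the rest of the page.
# LI, P and BR are left out on purpose, per the reasoning above.
TROUBLESOME = {"a", "b", "i", "u", "ul", "ol", "strike", "font", "pre", "code", "table"}

# Captures the optional slash and the tag name.
TAG_RE = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>")


def close_open_tags(post):
    """Append closers for troublesome tags still open at the end of the post."""
    open_tags = []
    for match in TAG_RE.finditer(post):
        is_closer = match.group(1) == "/"
        name = match.group(2).lower()
        if name not in TROUBLESOME:
            continue
        if not is_closer:
            # Opening tag: put it on the list.
            open_tags.append(name)
        elif name in open_tags:
            # Closing tag: take the most recent matching opener off the list.
            idx = len(open_tags) - 1 - open_tags[::-1].index(name)
            del open_tags[idx]
    # Anything left on the list gets closed, most recently opened first.
    return post + "".join(f"</{name}>" for name in reversed(open_tags))


print(close_open_tags("<b>bold <i>both"))
```

Closing in reverse order of opening keeps the appended closers properly nested, which a plain count would not guarantee.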
I've just remembered that the last big HTML problems were caused by unclosed attributes, not unclosed tags. I don't have any idea how to sort that out...
Could you keep track of all open attributes and close them at the close of a tag?
It's possible but scary, to my brain.
I've already roughed out a version in Perl which just counts opening/closing tags. There are either more openings than closings, or there aren't.
It tells me, for this page, that the A tags are right on target, there are exactly the same number opening as closing, but the P tags aren't. Which we'd expect.
If it were a troublesome tag, like B, then it would just insert the closing-B x number of times, where x is the difference between openers and closers.
For tags like A, FONT, B, I and so on, it won't necessarily perfect that post as the writer intended, but it should stop the error cascading on down the page at least.
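A rough Python equivalent of that counting approach, for comparison. This is not the Perl script mentioned above, which isn't shown here; the function name and the default tag tuple are assumptions for illustration.

```python
import re
from collections import Counter

# Captures the optional slash and the tag name.
TAG_RE = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>")


def append_missing_closers(post, tags=("a", "b", "i", "u", "font", "strike")):
    """Count openers and closers per tag and append any closers that are missing."""
    openers = Counter()
    closers = Counter()
    for slash, name in TAG_RE.findall(post):
        name = name.lower()
        if slash:
            closers[name] += 1
        else:
            openers[name] += 1
    suffix = ""
    for name in tags:
        # x = difference between openers and closers; only act when positive.
        missing = openers[name] - closers[name]
        if missing > 0:
            suffix += f"</{name}>" * missing
    return post + suffix


print(append_missing_closers("<b>one <b>two</b>"))
```

As the post says, this won't necessarily restore what the writer intended, since the closers all land at the end of the post, but it should keep the formatting from spilling into whatever follows.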
For other tags, for instance table tags, it's a bit more scary.
[EDIT: just realised this script forces me to write "for(@openers)". Ha ha geek ha.]
Well, you know tags can't nest, so you only have to deal with one at a time. It's just like the logic you laid out for tags, only for attributes.
Actually, it's easier than that, isn't it? It's just counting double-quotes, isn't it?
Start counting double quotes when you see a <, and if you have an odd number of them when you see > throw in an extra. Plus a pinch of salt.
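Taken literally, the quote-counting recipe might look like the Python sketch below. The function name is made up, and it assumes a ">" never appears inside a quoted attribute value; if one does, the tag will be cut short at that point. Hence the pinch of salt.

```python
def balance_attribute_quotes(post):
    """Add a closing double quote before '>' when a tag has an odd number of them."""
    out = []
    in_tag = False
    quotes = 0
    for ch in post:
        if not in_tag:
            if ch == "<":
                # Start counting double quotes when we see a '<'.
                in_tag = True
                quotes = 0
            out.append(ch)
        else:
            if ch == '"':
                quotes += 1
            elif ch == ">":
                # Odd number of quotes at the '>': throw in an extra one.
                if quotes % 2 == 1:
                    out.append('"')
                in_tag = False
            out.append(ch)
    return "".join(out)


print(balance_attribute_quotes('<a href="x>link</a>'))
```

It makes a single pass over the post and keeps only a counter, so the cost is small compared with parsing every attribute properly.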
you know tags can't nest
Er, do I? You mean you can't have:
<tag blah blah blah <nothertag> blah blah>
right?
The idea of checking the syntax of every attribute of every tag seems rather processor-intensive to me. No reason it can't be done.
But, but but, coming home from the bottle shop (liquor store) it occurred to me that what was really scary the last time we had a major HTML snag was that it stopped the post from even being editable.
So I think what we need is not only an edit but a "safe edit" for admins, which would somehow rob such malformed posts of their power to break the browser/form/interface. Because ita had to go into the SQL DB and edit by hand, command-line style. Or do I mean "commando-style"? Anyway.
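One way a "safe edit" could work, sketched in Python: escape the stored post before putting it into the edit form, so that the browser shows the markup as text instead of interpreting it. This assumes the breakage came from the raw post being echoed straight into the form, which is a guess; safe_edit_field and the field name "post" are hypothetical.

```python
import html


def safe_edit_field(raw_post):
    """Build a textarea whose contents are escaped, so malformed markup is inert."""
    # html.escape turns &, <, > and (with quote=True) quotes into entities.
    escaped = html.escape(raw_post, quote=True)
    return '<textarea name="post">' + escaped + "</textarea>"


# A post that would otherwise close the textarea early and leave a quote open.
print(safe_edit_field('</textarea><b x="'))
```

The browser turns the entities back into the original characters inside the textarea, so the admin sees and edits the raw post exactly as it was stored.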