Xander, 'End of Days'
Do you have problems, concerns or recommendations about the technical side of the Phoenix? Air them here. Compliments also welcome.
With error text to the effect of "You have x unclosed <b> tags, y unclosed <i> tags, and 1 unopened </table> tag."
Or something.
It'll be slightly easier, the way I've coded it, to say "the following tags didn't get closed: <b>,<i>,<b>,<table>" but same diff.
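For anyone following along, here is a minimal Python sketch of that kind of report. It is not the board's actual code: the TRACKED tag set, the function names and the wording of the second message are assumptions made up for illustration.

```python
import re

# Hypothetical set of tags worth tracking; tags like <br> that never close are left out.
TRACKED = {"b", "i", "u", "a", "font", "table"}

# Captures an optional leading slash and the tag name.
TAG_RE = re.compile(r"<\s*(/?)\s*([a-zA-Z][a-zA-Z0-9]*)[^>]*>")


def check_tags(post):
    """Return (unclosed, unopened) lists of tag names, in order of appearance."""
    open_stack = []
    unopened = []
    for match in TAG_RE.finditer(post):
        closing = match.group(1) == "/"
        name = match.group(2).lower()
        if name not in TRACKED:
            continue
        if not closing:
            open_stack.append(name)
        elif name in open_stack:
            # Cancel the most recent opener with the same name.
            idx = len(open_stack) - 1 - open_stack[::-1].index(name)
            del open_stack[idx]
        else:
            unopened.append(name)
    return open_stack, unopened


def describe(post):
    """Build the error text; an empty string means nothing was found."""
    unclosed, unopened = check_tags(post)
    parts = []
    if unclosed:
        tags = ",".join(f"<{t}>" for t in unclosed)
        parts.append("the following tags didn't get closed: " + tags)
    if unopened:
        tags = ",".join(f"</{t}>" for t in unopened)
        parts.append("the following closing tags were never opened: " + tags)
    return "; ".join(parts)
```

Listing the leftover openers in order is what makes the second message format cheap: the list is simply whatever is still on the stack at the end of the post.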
OK I started thinking about the unclosed HREF problem.
Am I crazy or is it quite simple? We look for
HREF=
followed by a quote of some kind, and we check what's between that and the next quote, or what's between that and the end of the post, if no quote appears.
Like if we find
HREF="http://somewhere.com"
we grab the "http://somewhere.com" part and examine it for characters that shouldn't be there.
That way if we find we've grabbed
HREF="http://somewhere.com' >somewhere interesting</a>
... so, as I was saying...
we'll find the spaces, the brackets and so on, "inside" the HREF, which tell us that it's not a valid link.
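The check described above could be sketched roughly like this in Python. The names HREF_RE, SUSPECT and find_bad_hrefs are hypothetical, and the set of "characters that shouldn't be there" is an assumption rather than anything agreed in this thread.

```python
import re

# HREF= followed by a quote of some kind; the quote character is captured.
HREF_RE = re.compile(r"""href\s*=\s*(["'])""", re.IGNORECASE)

# Assumed list of characters that should never appear inside a link target.
SUSPECT = set(" \t\n<>\"'")


def find_bad_hrefs(post):
    """Return the grabbed HREF values that look broken."""
    bad = []
    for match in HREF_RE.finditer(post):
        opener = match.group(1)
        start = match.end()
        end = post.find(opener, start)
        closed = end != -1
        # With no matching quote, grab everything to the end of the post.
        value = post[start:end] if closed else post[start:]
        if not closed or any(ch in SUSPECT for ch in value):
            bad.append(value)
    return bad
```

A properly quoted link yields an empty list, while the mismatched example swallows the rest of the post and gets reported because of the spaces and brackets in what was grabbed.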
ita, I am still working on the CVS repository, but I've had a death in my family this week (details coming on Monday in Beep Me), so I'm woefully behind.
Many apologies; I'll get back to this as soon as I can. Thanks for your patience.
Don't apologize, Karl, and certainly not to me. I'm the biggest bottleneck out there.
Pay attention to your life and your family, and we'll be here when you get back. Take care.
John -- not just href. Someone can do the same with a target='new". In fact, a <font color='red" is a very unfriendly thing.
Hmmm... Is there any reason a poster would use single quotes embedded within double quotes (or vice versa), inside of a tag (i.e. inside of a < > pair)?
Is there any reason a poster would use single quotes embedded within double quotes (or vice versa), inside of a tag
The only reason to do that would be in a TITLE or ALT attribute,
TITLE="Commentary on 'Hush' by Joss"
which I can't see people using very much, or in JavaScript, where you need to do things like
onmouseover="alert('hello world')"
but we've specifically outlawed that by regex.
Oh, and if you're inline-stylesheeting you might want to do
style="font-family:'Times New Roman'"
for font names with spaces.
not just href
Didn't think of that. It's just unclosed attributes of all kinds, isn't it? Though the link tag causes the worst problems.
EDIT: for full HTML compatibilitycakes, in the example above you shouldn't put quotes inside quotes. You should use the entity code for quotes, so you'd put
TITLE="Commentary on &#39;Hush&#39; by Joss"
or
TITLE='Commentary on &quot;Hush&quot; by Joss'
and the use of double quotes is more correct for backward compatibility. I can't quote chapter and verse but some browsers only like double quotes.
I don't think link tags cause worse havoc than any other tag where mismatched quotes are concerned. In all cases, they cause a (potentially) large chunk of html to be ignored by the browser.
I'm no regex expert so I've no idea if this is doable, but is there a way to look inside every tag that takes parameters (like a href or font) and "close" every "open" single or double quote? It wouldn't be too different from what John H has already done with open tags, would it?
I don't think link tags cause worse havoc than any other tag where mismatched quotes are concerned.
I was thinking of that unable-to-edit, ita-had-to-get-under-the-hood thing. Has that been fixed?
That's the same havoc -- doesn't matter which tag broke it. The key difference is if you open with a ' or a " -- affects where it gets closed.
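A rough Python sketch of checking the quotes inside every tag, whichever attribute is involved. It only detects the problem and does not try to "close" anything, it assumes no ">" appears inside an attribute value, and all the names are made up for illustration.

```python
import re

# An opening tag with a name, up to the first ">" (or the end of the post).
TAG_RE = re.compile(r"<[a-zA-Z][^>]*>?")

# An "=" followed by the quote that opens an attribute value.
ATTR_OPEN_RE = re.compile(r"""=\s*(["'])""")


def quotes_balanced(tag):
    """True if every attribute value is closed by the same quote that opened it."""
    pos = 0
    while True:
        match = ATTR_OPEN_RE.search(tag, pos)
        if match is None:
            return True
        # Only the opening quote character can close the value.
        close = tag.find(match.group(1), match.end())
        if close == -1:
            return False
        pos = close + 1


def tags_with_open_quotes(post):
    """Return every tag in the post that has a mismatched attribute quote."""
    return [tag for tag in TAG_RE.findall(post) if not quotes_balanced(tag)]
```

Because only the opening quote character is searched for, a single quote nested inside a double-quoted value (the TITLE and style cases) is skipped over and not treated as a mismatch.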
Jon's coded the fix. I haven't implemented it yet.