Buffistas Building a Better Board
Do you have problems, concerns or recommendations about the technical side of the Phoenix? Air them here. Compliments also welcome.
To-do list
OK, first of all, I was supposed to set up the mailing lists, but I've been slacking. I apologize for that.
Here's the short summary about the performance:
- The good news is that as far as I can see, the performance problem is strictly a matter of fixing the code. Given our current usage patterns, there's no reason why we can't go back to a cheaper shared hosting solution at the end of the year. Nobody needs to change their behavior.
- The bad news is that because of the way the code is structured, much of it will have to be thrown out and rewritten from scratch to get the most benefit.
Basically, every time the code queries the database, it uses a "connectAndQuery" function. This gets called around 3 or 4 times per page view. The first problem is that database connects are extremely expensive. Every time we connect to the database, mysqld forks a copy of itself, which then goes away when the connection is closed. The second problem is that for every query, the entire table gets loaded into memory, and then the table gets written out as HTML. This is especially bad with threadsucks, where the size of a httpd process can grow to over 40 megabytes. A far better approach is to fetch the result one row at a time, print out a line of HTML, and then fetch the next row.
Unfortunately, the connectAndQuery idiom is used everywhere in the code.
A far better approach is to fetch the result one row at a time, print out a line of HTML, and then fetch the next row
How do you think we can avoid the totals being different at the end of the page than they are at the top? Or should we just try and leave all totals or cumulative info until the bottom?
Also -- is the forking shutting down correctly, or are we sitting on connections that never disappear (according to what Rob found in the MySQL code?).
To be perfectly honest, I'm not exactly sure why mysqld is forking. Up to now, I had assumed that mysqld was a single-instance, multithreaded process. That's the way I've seen it work in other places. Maybe it's a configuration issue; I haven't seen anything about it in the mysql manual. Maybe it's just how threads show up in a Linix ps listing.
is the forking shutting down correctly, or are we sitting on connections that never disappear (according to what Rob found in the MySQL code?).
It wasn't that the connections weren't going away, it was that an internal variable that counted the connections was wrong. I haven't looked into that. I don't see any evidence of actual connections that are sticking around.
Cereal:
How do you think we can avoid the totals being different at the end of the page than they are at the top? Or should we just try and leave all totals or cumulative info until the bottom?
I don't know. It would still be better, as a compromise, to write the HTML to a string and keep the string in memory than to keep the SQL table in memory as an array.
Sorry. Phrased poorly -- I meant is it what Rob found, or something "legit".
I'm apprehensive of the line by line redesign -- I'd thought the tradeoff would lean towards one large transaction as opposed to setting up and tearing down many small ones, but mostly it was about preserving the integrity of the pages.
If it will save much money, however (what the hell are those MySQL slobs doing! Fix the product!) it should be considered.
I'd prefer to switch DB backends, honestly, to something with stored procedures and triggers.
I'd thought the tradeoff would lean towards one large transaction as opposed to setting up and tearing down many small ones,
It's not the transactions that are a problem, it's the opening and closing of connections to the database. You can have many transactions per connection, we just don't currently do it that way, and that's what we need to fix.
Is this a serious enough issue that you'd recommend changing the flow even if we moved to Postgres? I really want to move, because the triggers and stored procedures, etc. look like they should clear up post numbering issues.
In my mind, the code is a lot more malleable than the database is. There's nothing inherently wrong with the database. In fact, none of our SQL queries will need to change, just how they're executed in the code.
There's nothing inherently wrong with the database
Well, except it's not as full featured as I'd prefer, and there's that bug Rob found.
I've wanted to change to Postgres since shortly after we went live. It's completely unrelated to the connections issue.
What I'm wondering -- if the back end change is a given, how does that affect your urgency on the page model change?
if the back end change is a given, how does that affect your urgency on the page model change?
The new-connection-for-each-transaction thing needs to be fixed, no matter what database we're using. There should be one sql connection per page view (and if those connections can be cached across page views, even better).
If the loading-the-entire-query-into-memory thing could be fixed for just threadsuck, that would probably be good enough.