It likes the first quote in the table more than it likes all the rest.
Me, I don't really care. I'll toss a seed in and see if anything changes.
'Dirty Girls'
I'm researching this in newsgroups, and you know how you said you didn't want to get all the rows and then throw them all away?
That's apparently what it's doing anyway.
"Gavin M. Roy" <gmr@justsportsusa.com> writes:
SELECT * FROM poetry ORDER BY random() LIMIT 1;
[ is slow for 35000 rows ]
Yeah. Basically this query is implemented as (a) select all 35000 rows of "poetry"; (b) compute a random() value for each row; (c) sort by the random() values; (d) take the first row, discard the rest.
The good news: this gives you a pretty-durn-random selection. The bad news: you didn't really care about choosing a random ordering of the other 34999 rows, but it computed one anyway.
This problem's been discussed before, but I've not seen any really ideal answer. SQL is not designed to offer unpredictable results ;-)
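For anyone following along, here is a rough sketch of what steps (a) to (d) in the quoted reply amount to. It is written in Python with an in-memory SQLite database purely so it can be run as-is; the thread's own stack is PHP plus a different database, and the table contents and the function names pick_by_sorting and pick_in_database are made up for illustration.

```python
import random
import sqlite3

# Hypothetical stand-in for the "poetry" table from the quoted post
# (1000 rows here rather than 35000, just to keep the sketch quick).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE poetry (id INTEGER PRIMARY KEY, line TEXT)")
conn.executemany(
    "INSERT INTO poetry (id, line) VALUES (?, ?)",
    [(i, f"line {i}") for i in range(1, 1001)],
)


def pick_by_sorting(conn):
    """Spell out steps (a)-(d) from the quoted reply by hand."""
    rows = conn.execute("SELECT id, line FROM poetry").fetchall()  # (a) read every row
    keyed = [(random.random(), row) for row in rows]  # (b) one random value per row
    keyed.sort(key=lambda pair: pair[0])  # (c) sort by those random values
    return keyed[0][1]  # (d) keep the first row, discard the rest


def pick_in_database(conn):
    """The one-line query from the quoted post, run against SQLite here."""
    return conn.execute(
        "SELECT id, line FROM poetry ORDER BY random() LIMIT 1"
    ).fetchone()


print(pick_by_sorting(conn))
print(pick_in_database(conn))
```

Either way every row gets read and ranked just to hand back one of them, which is the cost the quoted reply is describing.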
But the database is doing it, and so it's faster than us having to handle a PHP data structure. That's where I want the work to lie. You'd have to page through the record set a random number of times in PHP. I don't like that at all, not on each and every page.
the database is doing it, and so it's faster than us having to handle a PHP data structure
I hope I'm not being a pain, but I honestly don't get this.
Leaving aside the fact that the DB doesn't seem to do Random properly anyway, what are the efficiency issues that concern you?
If we get PHP to pick a random number between x and the number of quotes, then get it to select the quote that's greater than or equal to that number, that's somehow always less efficient than getting the DB to do it? Even though we know that DB does it in a very inefficient way (according to some dude on USENET anyway)?
There's something about back- and front-end efficiencies that I obviously don't get.
What do you mean "the quote that's greater than or equal to that number"?
I'm missing something. You want to pick a random number between 1 and what? And then select the quote with that ID, or something near it?
select the quote with that ID, or something near it
Well I only said that because you said the ids weren't consecutive, so I was thinking we'd select quote ID $x, and if there wasn't one, it could find the next one up -- is that feasible? Pseudocode: "select where equal to or greater than $x, limit of one"?
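The "equal to or greater than $x, limit of one" idea might look something like the sketch below. It uses Python and SQLite only so that it runs on its own; the cut-down quotes table, the sample ids and the helper names quote_at_or_after and random_quote are assumptions, not the site's real code. The ORDER BY is there because, without it, the database is free to return any matching row rather than the next one up.

```python
import random
import sqlite3

# Hypothetical cut-down "quotes" table with gaps in the ids (3, 4, 6, 7, 8 missing).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE quotes (id INTEGER PRIMARY KEY, quotation TEXT)")
conn.executemany(
    "INSERT INTO quotes (id, quotation) VALUES (?, ?)",
    [(1, "first"), (2, "second"), (5, "fifth"), (9, "ninth")],
)


def quote_at_or_after(conn, x):
    """Return the quote with id x, or the next id up if x is missing."""
    return conn.execute(
        "SELECT id, quotation FROM quotes WHERE id >= ? ORDER BY id LIMIT 1",
        (x,),
    ).fetchone()


def random_quote(conn, ceiling):
    """Pick a number from 1 to ceiling (inclusive) and look up the nearest quote."""
    x = random.randint(1, ceiling)
    return quote_at_or_after(conn, x)


print(quote_at_or_after(conn, 3))  # id 3 is missing, so this falls through to id 5
print(random_quote(conn, 9))
```

One thing to keep in mind: a quote sitting just after a gap gets picked for every missing number in that gap as well as its own, so with these sample ids quote 5 turns up for 3, 4 and 5 while quote 1 only turns up for 1.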
Or just go back and rewrite the numbers so they are consecutive.
Or just go back and rewrite the numbers so they are consecutive.
Every time something's deleted? I dunno ...
Okay -- so we pick a random number -- what's the upper ceiling, or do we hit the database for that beforehand?
Then we do select charactername, quotation, quotes.season, quotes.episode, ep_title from quotes left join episodes on episodes.show_name=quotes.show_name and episodes.episode=quotes.episode and episodes.season=quotes.season where is_approved='Y' where quotes.id >= randomnumber limit 1, you're saying?
Personally, for something as trivial as quote frequency, I *suspect* the overhead in connecting, counting, returning, tearing down, connecting, selecting, returning, tearing down is more than the ORDER BY RAND() LIMIT 1.
But I admit I have no benchmarks.
what's the upper ceiling, or do we hit the database for that beforehand?
My quick fix is to say "how many do we have now? hard-code that in and remember to change it when we put some new quotes in" because only you are going to be putting new quotes in anyway.
select charactername, quotation, quotes.season, quotes.episode, ep_title from quotes left join episodes on episodes.show_name=quotes.show_name and episodes.episode=quotes.episode and episodes.season=quotes.season where is_approved='Y' where quotes.id >= randomnumber limit 1
That seems incredibly long to me. What's all that "show_name" and "season" stuff doing in there? If we don't display it? I'm puzzled, sorry.
Corrected.
select charactername, quotation, quotes.season, quotes.episode, ep_title
from quotes
left join episodes
on episodes.show_name=quotes.show_name and episodes.episode=quotes.episode and episodes.season=quotes.season
where is_approved='Y' and quotes.id >= randomnumber
order by quotes.id
limit 1
Showname is in the join, not the fields returned -- a join on episode 4, season 2 in the episode database will return Inca Mummy Girl and Untouched. We need to know which show a quote is from too. The season is in the fields returned because that's a field in the quote data structure, as is the episode number. It's not mandatory for the display, but puts little additional drain on the system, so I didn't remove it after we started displaying titles instead of ep numbers.
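To see why show_name has to be part of the join, here is a small sketch in Python and SQLite. The column names come from the query in this thread, but the table layouts, the show names and the single placeholder quote are assumptions made up for the example.

```python
import sqlite3

# Hypothetical schema: two shows that each have a season 2, episode 4.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE episodes (show_name TEXT, season INTEGER, episode INTEGER, ep_title TEXT);
    CREATE TABLE quotes (id INTEGER PRIMARY KEY, show_name TEXT, season INTEGER,
                         episode INTEGER, charactername TEXT, quotation TEXT,
                         is_approved TEXT);
    INSERT INTO episodes VALUES ('Buffy', 2, 4, 'Inca Mummy Girl');
    INSERT INTO episodes VALUES ('Angel', 2, 4, 'Untouched');
    INSERT INTO quotes VALUES (1, 'Buffy', 2, 4, 'Character A', 'placeholder quote', 'Y');
    """
)

# Join on season and episode only: the one quote matches both shows' episodes.
without_show = [
    row[0]
    for row in conn.execute(
        """
        SELECT ep_title FROM quotes
        LEFT JOIN episodes
          ON episodes.episode = quotes.episode
         AND episodes.season = quotes.season
        WHERE quotes.id = 1
        ORDER BY ep_title
        """
    )
]

# Join on show_name as well: exactly one title comes back.
with_show = [
    row[0]
    for row in conn.execute(
        """
        SELECT ep_title FROM quotes
        LEFT JOIN episodes
          ON episodes.show_name = quotes.show_name
         AND episodes.episode = quotes.episode
         AND episodes.season = quotes.season
        WHERE quotes.id = 1
        ORDER BY ep_title
        """
    )
]

print(without_show)
print(with_show)
```

Dropping show_name from the join would hand the same quote back twice, once per show, each time with a different title.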
I have to repeat my position against hardcoding. The only reason I'm the only one massaging the table is because the code hasn't been written yet.
Admittedly I'm absolutely not bothered by having one quote come up much more than the other. I don't read them. And I really don't want to write kludgey code or double database hits for them either.
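If the worry is a second database hit just to find the ceiling, one possible way around it is to let a subquery work out the id range inside the same statement, so nothing is hard-coded and only one query is sent. The sketch below is Python with SQLite and is only an illustration under assumptions: the cut-down table, the SINGLE_HIT query text and the random_quote helper are all hypothetical, and the random fraction u is generated on the application side and passed in as a parameter.

```python
import random
import sqlite3

# Hypothetical cut-down "quotes" table with gaps in the ids.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE quotes (id INTEGER PRIMARY KEY, quotation TEXT, is_approved TEXT)"
)
conn.executemany(
    "INSERT INTO quotes (id, quotation, is_approved) VALUES (?, ?, ?)",
    [(1, "first", "Y"), (2, "second", "Y"), (5, "fifth", "Y"), (9, "ninth", "Y")],
)

# One statement: the subquery scales u (0 <= u < 1) onto the current id range
# of approved quotes, and the outer query takes the first approved id at or
# above that point.
SINGLE_HIT = """
SELECT id, quotation
FROM quotes
WHERE is_approved = 'Y'
  AND id >= (SELECT MIN(id) + CAST(:u * (MAX(id) - MIN(id) + 1) AS INTEGER)
             FROM quotes
             WHERE is_approved = 'Y')
ORDER BY id
LIMIT 1
"""


def random_quote(conn, u=None):
    """Fetch one quote with a single query; u is a random fraction in [0, 1)."""
    if u is None:
        u = random.random()
    return conn.execute(SINGLE_HIT, {"u": u}).fetchone()


print(random_quote(conn))
```

This does nothing about some quotes coming up more often than others when the ids have gaps; it only removes the separate lookup for the upper limit.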