Buffistas

River: You gave up everything you had. Simon: [Chinese] Everything I have is right here.

'Safe'

Buffistas Building a Better Board

Do you have problems, concerns or recommendations about the technical side of the Phoenix? Air them here. Compliments also welcome.

To-do list

First Previous Next Last Recent

§ ita § - Jan 01, 2003 9:55:38 pm PST #2527 of 10000

Well not canonically, no, but this is transformative fiction.

It likes the first quote in the table more than it likes all the rest.

Me, I don't really care. I'll toss a seed in and see if anything changes.

John H - Jan 01, 2003 10:00:59 pm PST #2528 of 10000

I'm researching this in newsgroups, and you know how you said you didn't want to get all the rows and then throw them all away?

That's apparently what it's doing anyway.

"Gavin M. Roy" <gmr@justsportsusa.com> writes:
 SELECT * FROM poetry ORDER BY random() LIMIT 1;
 [ is slow for 35000 rows ]


Yeah.  Basically this query is implemented as
  (a) select all 35000 rows of "poetry";
  (b) compute a random() value for each row;
  (c) sort by the random() values;
  (d) take the first row, discard the rest.


The good news: this gives you a pretty-durn-random selection.  The bad
news: you didn't really care about choosing a random ordering of the
other 34999 rows, but it computed one anyway.


This problem's been discussed before, but I've not seen any really
ideal answer.  SQL is not designed to offer unpredictable results ;-)

§ ita § - Jan 01, 2003 10:04:37 pm PST #2529 of 10000

Well not canonically, no, but this is transformative fiction.

But the database is doing it, and so it's faster than us having to handle a PHP data structure. That's where I want the work to lie. You'd have to page through the record set a random number of times in PHP. I don't like that at all, not on each and every page.

John H - Jan 01, 2003 10:24:43 pm PST #2530 of 10000

the database is doing it, and so it's faster than us having to handle a PHP data structure

I hope I'm not being a pain, but I honestly don't get this.

Leaving aside the fact that the DB doesn't seem to do Random properly anyway, what are the efficiency issues that concern you?

If we get PHP to pick a random number between x and the number of quotes, then get it to select the quote that's greater than or equal to that number, that's somehow always less efficient than getting the DB to do it? Even though we know that DB does it in a very inefficient way (according to some dude on USENET anyway)?

There's something about back- and and front-end efficiencies that I obviously don't get.

§ ita § - Jan 01, 2003 10:42:53 pm PST #2531 of 10000

Well not canonically, no, but this is transformative fiction.

What do you mean "the quote that's greater than or equal to that number"?

I'm missing something. You want to pick a random number between 1 and what? And then select the quote with that ID, or something near it?

John H - Jan 01, 2003 10:49:57 pm PST #2532 of 10000

select the quote with that ID, or something near it

Well I only said that because you said the ids weren't consecutive, so I was thinking we'd select quote ID $x, and if there wasn't one, it could find the next one up -- is that feasible? Pseudocode: "select where equal to or greater than $x, limit of one"?

Or just go back and rewrite the numbers so they are consecutive.

§ ita § - Jan 01, 2003 10:55:17 pm PST #2533 of 10000

Well not canonically, no, but this is transformative fiction.

Or just go back and rewrite the numbers so they are consecutive.

Every time something's deleted? I dunno ...

Okay -- so we pick a random number -- what's the upper ceiling, or do we hit the database for that beforehand?

Then we do select charactername, quotation, quotes.season, quotes.episode, ep_title from quotes left join episodes on episodes.show_name=quotes.show_name and episodes.episode=quotes.episode and episodes.season=quotes.season where is_approved='Y' where quotes.id >= randomnumber limit 1, you're saying?

Personally, for something as trivial as quote frequency, I *suspect* the overhead in connecting, counting, returning, tearing down, connecting, selecting, returning, tearing down is more than the ORDER BY RAND() LIMIT 1.

But I admit I have no benchmarks.

John H - Jan 01, 2003 11:02:28 pm PST #2534 of 10000

John H - Jan 01, 2003 11:02:33 pm PST #2535 of 10000

what's the upper ceiling, or do we hit the database for that beforehand?

My quick fix is to say "how many do we have now? hard-code that in and remember to change it when we put some new quotes in" because only you are going to be putting new quotes in anyway.

select charactername, quotation, quotes.season, quotes.episode, ep_title from quotes left join episodes on episodes.show_name=quotes.show_name and episodes.episode=quotes.episode and episodes.season=quotes.season where is_approved='Y' where quotes.id >= randomnumber limit 1

That seems incredibly long to me. What's all that "show_name" and "season" stuff doing in there? If we don't display it? I'm puzzled, sorry.

§ ita § - Jan 01, 2003 11:11:10 pm PST #2536 of 10000

Well not canonically, no, but this is transformative fiction.

Corrected.

select charactername, quotation, quotes.season, quotes.episode, ep_title from quotes left join episodes on episodes.show_name=quotes.show_name and episodes.episode=quotes.episode and episodes.season=quotes.season where is_approved='Y' and quotes.id >= randomnumber limit 1

Showname is in the join, not the fields returned -- a join on episode 4, season 2 in the episode database will return Inca Mummy Girl and Untouched. We need to know which show a quote is from too. The season is in the fields returned because that's a field in the quote data structure, as is the episode number. It's not mandatory for the display, but puts little additional drain on the system, so I didn't remove it after we started displaying titles instead of ep numbers.

I have to repeat my position against hardcoding. The only reason I'm the only one massaging the table is because the code hasn't been written yet.

Admittedly I'm absolutely not bothered by having one quote come up much more than the other. I don't read them. And I really don't want to write kludgey code or double database hits for them either.

First Previous Next Last Recent