Why You Should Avoid ORDER BY RAND() or NEWID()
Published on Feb 23, 2009 by Jamie Munro

Quite often we get a complaint from a client that their homepage is too slow. The first thing we do is load the homepage up and see for ourselves. Yep, it looks slow, so the next step is to see what is being done. Quite often you'll see some random data being displayed on the homepage, whether it's random news articles, photos, videos, etc. If this is not the case, you may find my article on improving slow loading web pages more useful.
If the homepage is slow because of random data, you can bet with high certainty that those 10 random articles are being pulled with an ORDER BY RAND() or NEWID(), slowing the site to a crawl. In case you are curious about the difference, RAND() is the MySQL way and NEWID() is the SQL Server way. In this article, I'll show you why you should avoid these functions, and offer several different solutions to help you overcome this issue.
The first thing we'll do is run a standard random query returning 10 rows on a large table and see how it performs.
SELECT * FROM videos ORDER BY RAND() LIMIT 10
The query above should look pretty standard. I ran it on a table with approximately 360,000 records. Check out the results; they are shocking!
Showing rows 0 - 9 (10 total, Query took 14.3622 sec)
14.36 seconds. Did your mouth just drop? Now imagine if we were running this on our homepage every time a user visited it. You know what, scratch that, I don't want to imagine it, because I'm sure our web server would come down in a matter of minutes with a lot of concurrent users.
Now that we understand how bad the problem is, what are our solutions? Well, we have several. Each solution provides something different: some will not be so dynamic, some won't be so random, but I guarantee you that each one will run in under 0.5 seconds!
Solution 1 involves doing some mock randomness. The first thing to do is run a simple query to get the count or max id of our videos:
SELECT COUNT(*) FROM videos
Using this number we can do one of two things:
1. Select 10 random numbers between 1 and the count. Once we have 10 random numbers we will do straight select queries where id = random number. I would avoid using an IN clause because it will most likely be slower than 10 simple queries.
The major CON to this solution is that if you delete data, an id might not exist and you may get fewer than 10 results. If this is the case, I would put the queries in a loop and loop until you get 10 records. Even if you have to do 15 or 20 simple select queries, it will still be faster than the 14.36-second query we did originally.
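Option 1 above can be sketched as follows. This is a minimal Python example using an in-memory SQLite database standing in for the real one; the table name, columns, and deleted ids are all assumptions for illustration. The loop keeps drawing ids until 10 rows are found, which covers the gaps left by deleted records:

```python
import random
import sqlite3

# A throwaway database with a hypothetical videos table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE videos (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO videos (id, title) VALUES (?, ?)",
    [(i, f"Video {i}") for i in range(1, 101)],
)
# Simulate deleted rows: some ids no longer exist.
conn.execute("DELETE FROM videos WHERE id IN (5, 42, 77)")

def random_videos(conn, wanted=10):
    """Draw random ids and fetch rows one at a time until `wanted` rows are found."""
    # Use MAX(id) rather than COUNT(*) so deleted rows don't shrink the id range.
    (max_id,) = conn.execute("SELECT MAX(id) FROM videos").fetchone()
    rows, tried = [], set()
    while len(rows) < wanted:
        rid = random.randint(1, max_id)
        if rid in tried:
            continue  # don't fetch the same id twice
        tried.add(rid)
        row = conn.execute(
            "SELECT id, title FROM videos WHERE id = ?", (rid,)
        ).fetchone()
        if row is not None:  # the id may have been deleted; just draw again
            rows.append(row)
    return rows

videos = random_videos(conn)
print(len(videos))  # 10
```

Each `WHERE id = ?` lookup is a primary-key seek, so even with a few retries for deleted ids the total cost stays far below a full-table ORDER BY RAND().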
2. Select 1 random number between 1 and the count - 10. With this number we do one query where id is greater than random number and id is less than or equal to random number + 10.
There are two potential CONs to this solution. The first is the same as above: if you delete data, you might not get 10 records. The second is that if your data is entered chronologically, you may not get very random results, as they are all grouped together. To solve the first CON you can use a larger range, perhaps 1 to count - 30, and update your where clause to random number + 30 instead of 10; just remember to LIMIT your results to 10. There is no fix for the second CON, so only use this solution if it is acceptable to have pseudo-random data.
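Option 2 above, with the wider range plus LIMIT to guard against deleted ids, might look like this (same hypothetical schema and sample data as before, so this is a sketch rather than production code):

```python
import random
import sqlite3

# A throwaway database with a hypothetical videos table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE videos (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO videos (id, title) VALUES (?, ?)",
    [(i, f"Video {i}") for i in range(1, 101)],
)
# Simulate deleted rows: some ids no longer exist.
conn.execute("DELETE FROM videos WHERE id IN (5, 42, 77)")

(max_id,) = conn.execute("SELECT MAX(id) FROM videos").fetchone()
# Pick one random starting point, then scan a range of 30 ids but keep
# only 10, so a few deleted rows in the range don't leave us short.
start = random.randint(1, max_id - 30)
rows = conn.execute(
    "SELECT id, title FROM videos WHERE id > ? AND id <= ? LIMIT 10",
    (start, start + 30),
).fetchall()
print(len(rows))  # 10
```

This is a single indexed range scan, so it is very fast, but as noted above the 10 rows are consecutive neighbours rather than spread across the table.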
Solution 2 involves creating a CRON job and re-creating the random data on a regular basis. This solution is a bit more involved, but a better solution all round, I find. On a high-traffic site, it will perform far fewer queries than solution 1, bringing down the entire load on the server.
This is a three-part solution.
Step 1, make a copy of your existing guilty random code and place it in a new file by itself. This script should simply query the random data using RAND() or NEWID() and save the results to an HTML file or a PHP file (whichever you choose) that you will include into your homepage in step 3.
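Step 1 might look something like this in Python. It is a sketch only: the table, titles, file name, and markup are all assumptions, and SQLite spells the function RANDOM() rather than RAND(). The point is that the expensive random query is now fine, because it runs only when the CRON job fires, not on every page view:

```python
import html
import sqlite3

# A throwaway database standing in for the real one (assumption: a
# videos table with a title column).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE videos (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO videos (title) VALUES (?)",
    [(f"Video {i}",) for i in range(1, 51)],
)

# The slow query is acceptable here: it runs once per CRON interval.
rows = conn.execute(
    "SELECT title FROM videos ORDER BY RANDOM() LIMIT 10"
).fetchall()

# Save the rendered list as a static fragment the homepage includes.
with open("random_videos.html", "w") as f:
    f.write("<ul>\n")
    for (title,) in rows:
        f.write(f"  <li>{html.escape(title)}</li>\n")
    f.write("</ul>\n")
```

The homepage then serves the pre-rendered `random_videos.html` fragment, which costs nothing at request time.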
Step 2, create a CRON job that runs as often as you like to regenerate your data. For help with this, search for creating CRON jobs in Google; there are a lot of utilities that provide exactly what you need. This CRON job should simply do a WGET on the script you created in step 1.
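On a Unix-style host, the crontab entry for step 2 could look like this (the URL is a placeholder for wherever you put your step 1 script):

```shell
# Every minute, fetch the generator script so it rewrites the static
# fragment; the page output itself is discarded.
* * * * * wget -q -O /dev/null http://www.example.com/generate_random.php
```

The five asterisks mean "every minute"; widen the schedule if once a minute is more churn than you need.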
Step 3, update your homepage: remove the guilty code and replace it with an include of the file you created in step 1 (the static data that gets saved).
The only CON to this solution is that if a user refreshes your homepage before your CRON job has run to create new random records, they will see the same data. There is no 100% solution for this, but the best option is to run your CRON job every minute. My philosophy is that I am far more content with 1 CRON job fetching random records every minute than 100s or 1000s of users fetching random records every second.
I hope my explanations are clear and you have another happy client!