Selecting random record from MySQL database table.

Prosoxi.com Webmaster May 15, 2012 3 Comments

The simplest way of selecting random rows from the MySQL database is to use "ORDER BY RAND()" clause in the query.

Solution 1 [SQL]: SELECT * FROM `table` ORDER BY RAND() LIMIT 0,1;

The problem with this method is that it is very slow. The reason for it being so slow is that MySQL creates a temporary table with all the result rows and assigns each one of them a random sorting index. The results are then sorted and returned.

There are several workarounds to speed things up.

The basic idea is to get a random number and then select a specific row using this number.

In the case that all the rows have unique ids we will just have to pick a random number between the smallest and the biggest id and then select the row with id that equals that number. To make this method work when ids are not evenly distributed we will have to use ">=" operator instead of "=" in the last query.

To get the minimum and maximum id values in the entire table we will use MAX() and MIN() aggregate functions. These functions will return minimum and maximum value in the specified group. The group in our case is all the values of `id` column in our table.

Solution 2 [PHP]: $range_result = mysql_query( " SELECT MAX(`id`) AS max_id , MIN(`id`) AS min_id FROM `table` "); $range_row = mysql_fetch_object( $range_result ); $random = mt_rand( $range_row->min_id , $range_row->max_id ); $result = mysql_query( " SELECT * FROM `table` WHERE `id` >= $random LIMIT 0,1 ");

As we mentioned this method is limited to tables with unique id for each row. What to do if it’s not the case?

The solution is to use the MySQL LIMIT clause. LIMIT accepts two arguments. The first argument specifies the offset of the first row to return, and the second specifies the maximum number of rows to return. The offset of the initial row is 0 (not 1).

To calculate the offset to the first row we will generate a random number between 0 and 1 using MySQL’s RAND() function. Then we will multiply this number by number of records in the table, which we will get using COUNT() function. Since LIMIT arguments must be integers and not float values we will round the resulting number using FLOOR() function. FLOOR() is an arithmetic function that calculates the largest integer value that is smaller than or equal to the expression. The resulting code will look like this:

Solution 3 [PHP]: $offset_result = mysql_query( " SELECT FLOOR(RAND() * COUNT(*)) AS `offset` FROM `table` "); $offset_row = mysql_fetch_object( $offset_result ); $offset = $offset_row->offset; $result = mysql_query( " SELECT * FROM `table` LIMIT $offset, 1 " );

In MySQL 4.1 and later we can combine two previous methods using subquery like so:

Solution 4 [SQL]: SELECT * FROM `table` WHERE id >= (SELECT FLOOR( MAX(id) * RAND()) FROM `table` ) ORDER BY id LIMIT 1;

This solution has the same weakness as the solution 2 e.g. it only works for tables with unique ids.

Remember the reason we started looked for alternative ways of selecting random rows? Speed! So how do these methods compare in terms of execution times. I am not going to go into specifics of hardware and software configuration or give precise numbers. The approximate results are:

The slowest method is solution 1. Let’s say that it took 100% of time to execute.
Solution 2 took 79%.
Solution 3 – 13%.
Solution 4 – 16%.

The winner is solution 3.

http://akinas.com/pages/en/blog/mysql_random_row/

Join the Conversation

3 Comments

nikmauro says:

May 15, 2012 at 08:49

http://www.greggdev.com/web/articles.php?id=6

Log in to Reply
nikmauro says:

May 15, 2012 at 08:49

As of this writing, there is no automatic way for MySQL to select a random row from a database table. Extracting a random row from a table can be useful for many reasons. It can pull up random products, random advertisments, or any other random thing that you have stored in your database table.

For most purposes on smaller database tables, the following will work fine:

$random_row = mysql_fetch_row(mysql_query(“select * from YOUR_TABLE order by rand() limit 1″));

$random_row will be an array containing the data extracted from the random row. However, when the table is large (over about 10,000 rows) this method of selecting a random row becomes increasingly slow with the size of the table and can create a great load on the server. I tested this on a table I was working that contained 2,394,968 rows. It took 717 seconds (12 minutes!) to return a random row.

I wrote the script below as a workaround for a random row with a large database table. This will return a random row in about 0.05 seconds, regardless of the size of the table. If the max_id of the table is not dynamically changing, the function can be rewritten to only execute one database query instead of two.

I am using this script for a new search engine spidering project I am working on. With this, it is useful to have an tinyint(1) column in the table that is updated as the row is selected so that it will not be chosen again. Since the function uses greater than and less than, it would not return an empty result unless all rows had been updated at which point the script could be ended. For such a script this also allows many updaters to be running at once selecting random rows that will not be selected again because the tinyint will be changed upon selection.

= ” . $random_number . ”

ORDER BY ” . $column . ” ASC

LIMIT 1″;

$random_row = mysql_fetch_row(mysql_query($random_sql));

if (!is_array($random_row)) {

$random_sql = “SELECT * FROM ” . $table . ”

WHERE ” . $column . ” < " . $random_number . "

ORDER BY " . $column . " DESC

LIMIT 1";

$random_row = mysql_fetch_row(mysql_query($random_sql));

}

return $random_row;

}

//USAGE

echo '’;

print_r(random_row(‘YOUR_TABLE’, ‘YOUR_COLUMN’));

echo ”;

?>

Log in to Reply
nikmauro says:

May 15, 2012 at 08:49

Table used
I tested this on INNODB and MyISAM versions of the same table with 170429 records. The INNODB table uses 172 MB and the MyISAM 105 MB. This is real data and not some random database table I created for the purposes of this post, although it’s not the sort of table you’d normally pull random records out from.

Using LIMIT to fetch the last record
The best way to see how slow it can potentially be is to use LIMIT to select the very last record. By doing this we’ll get the maximum time it will take to run this query if the last record is the one selected at random.

The first query gets the count from the table:

SELECT COUNT(*) FROM mytable;
This returned 170432. Now to select the last record (note the offset is 1 less than the total number of records):

SELECT * FROM mytable LIMIT 170431, 1;
I ran this a number of times and took around 1.8 seconds for INNODB and 0.5 seconds for MyISAM on the machine I ran it on. This is far too slow if it’s something that might be called frequently in a web application, although it might be a little better on some beefier hardware.

Comparing LIMIT with ORDER BY RAND()
The next thing to do was to benchmark the result from using LIMIT with ORDER BY RAND() to see if had been correct in my previous post that using LIMIT really is faster. So:

SELECT * FROM mytable ORDER BY RAND() LIMIT 1
I ran this query a number of times as well, and it took around 33 seconds for INNODB and 30 seconds for MyISAM each time. So clearly using LIMIT is much faster although it’s still not a suitable solution if random data needs to be selected frequently in an on demand application.

Another alternative
I have another alternative which is almost instant, even on large tables but it does require an adjustment to the table and some additional business logic for your application. However it does solve the issue with speed of random records on a large table if you need random data frequently. This will be posted this time next week.

Log in to Reply

Cancel reply

You must be logged in to post a comment.

Join the Conversation

Leave a comment

Cancel reply