Building A Scalable Queueing System With PHP

Published on Feb 14, 2011 by Jamie Munro

In today's article we are going to cover building a queueing system with PHP. Before we begin, let's define what a queueing system is. The best place to start is the dictionary:

"to line up or wait in a queue"

Now that we have our definition, let's look at why we would want to build a queueing system. A queueing system is an excellent tool that lets us take a specific process and perform it "offline", i.e. the jobs line up and we process them one at a time at a later point. This will probably be easier to explain with an example.

Imagine an admin area of a website that allows the administrator to send out a mass email to all of their users. The simplest way to build this functionality would be as follows:

1. Build a form that accepts a subject and a body for the email.
2. Retrieve the list of users from your database.
3. Loop through the users and send each person an individual email.

The above example works nice and fast when there are only a few hundred users. However, imagine trying to send this email to 10,000 users. The administrator would be waiting a long time for this process to finish. Not only that, if they closed the browser, it probably would not finish properly.

So, the goal of our queueing system is to stop a specific process from running "online" (in a web browser) and to run it "offline" with a scheduled task instead.


Setup


The goal of this article is to provide a theoretical example of a queueing system and to show how that system can accommodate large-scale processing. To implement our above example, I am going to assume that you have an existing web server and web site already running. You will also need a process similar to the one above that you wish to move into an offline queue.

By the end of this article, you should be able to convert your existing process into an efficiently scaled queueing process with minimal effort.

Other great examples of a queueing system are:

- third party tracking data
- RSS content creation
- RSS content reading
- etc...

Implementation


To begin, we need to create a database table. This table will be used to maintain our queue and allow for a first in, first out process. Depending on the queue you are building, your table might be slightly different. Sticking with our example, our table should look something like the following:

queue:

- id int AUTO INCREMENT PRIMARY KEY
- to varchar(255) NOT NULL
- from varchar(255) NOT NULL
- subject varchar(255) NOT NULL
- body TEXT NOT NULL
- created DATETIME NOT NULL
- sent DATETIME NULL
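
The column list above could be turned into an actual table along the following lines. This is only a sketch, assuming MySQL and an existing PDO connection; the function names queue_table_ddl and create_queue_table are hypothetical. One thing worth noting is that to and from are reserved words in MySQL, so they need backticks whenever they are named explicitly.

```php
<?php
// Hypothetical helper: returns MySQL DDL matching the column list above.
function queue_table_ddl()
{
    return <<<SQL
CREATE TABLE queue (
    id      INT AUTO_INCREMENT PRIMARY KEY,
    `to`    VARCHAR(255) NOT NULL,
    `from`  VARCHAR(255) NOT NULL,
    subject VARCHAR(255) NOT NULL,
    body    TEXT NOT NULL,
    created DATETIME NOT NULL,
    sent    DATETIME NULL
)
SQL;
}

// Hypothetical helper: creates the table on an existing PDO connection.
function create_queue_table(PDO $db)
{
    $db->exec(queue_table_ddl());
}
```

Keeping sent nullable is what lets the queue processor find pending rows with a simple "sent IS NULL" condition.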

Each time the admin user wishes to send an email to their users, we will insert one row into our queue table PER USER. The row will contain the subject, body, created date, as well as the to and from. Our id will be auto generated and most importantly, our sent date will be null (more on this later).

Now that our table is created, begin by updating your current process so that it no longer sends the email itself. Move the existing sending code into a separate file, as we'll need it later when we implement our queue processor. Replace it with a SQL query that inserts into our queue table instead (the loop through all users is still required), e.g.:


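<?php
// assuming we already have an existing database connection...
// note: the mysql_* extension was removed in PHP 7; use mysqli or PDO on current versions
// after our form post that collects subject and body...
// escape user input once, before the loop, to prevent SQL injection
$subject = mysql_real_escape_string($_POST['subject']);
$body = mysql_real_escape_string($_POST['body']);
$result = mysql_query('SELECT email FROM users');
// loop through all users
while ($row = mysql_fetch_assoc($result)) {
    // insert one row in the queue per user (values follow the table's column order)
    mysql_query('INSERT INTO queue VALUES (
        null,
        \'' . mysql_real_escape_string($row['email']) . '\',
        \'noreply@mydomain.com\',
        \'' . $subject . '\',
        \'' . $body . '\',
        \'' . date('Y-m-d H:i:s') . '\',
        null
    )');
}
?>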
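
On current PHP versions the same step could be written with PDO and a prepared statement, which handles quoting of the subject, body and addresses for us. The following is a minimal sketch, not the original author's code: enqueue_emails is a hypothetical name, and $db is assumed to be an existing PDO connection to the database holding the queue table.

```php
<?php
// Sketch: queue one row per recipient using a single prepared statement.
// $db is assumed to be an existing PDO connection; enqueue_emails is a hypothetical name.
function enqueue_emails(PDO $db, array $recipients, $from, $subject, $body)
{
    // `to` and `from` are reserved words in MySQL, hence the backticks.
    $stmt = $db->prepare(
        'INSERT INTO queue (`to`, `from`, subject, body, created, sent)
         VALUES (:recipient, :sender, :subject, :body, :created, NULL)'
    );
    $count = 0;
    foreach ($recipients as $email) {
        $stmt->execute(array(
            ':recipient' => $email,
            ':sender'    => $from,
            ':subject'   => $subject,
            ':body'      => $body,
            ':created'   => date('Y-m-d H:i:s'),
        ));
        $count++;
    }
    return $count; // number of rows queued
}
```

It could be called from the form handler as enqueue_emails($db, $emails, 'noreply@mydomain.com', $_POST['subject'], $_POST['body']), where $emails is the list of addresses read from the users table.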


Our queue table will now begin to fill up when an admin user wishes to send out emails. Now we need to create our queue processing page. Let's create a new file called queue.php:


<?php
// ensure the file doesn't stop processing
set_time_limit(0);
// assuming we already have an existing database connection...
// get all of our pending emails to be sent
$result = mysql_query('SELECT * FROM queue WHERE sent IS NULL');
// loop through each one
while ($row = mysql_fetch_assoc($result)) {
    // put existing process to send email here...
    // after processing email, mark it as sent
    mysql_query('UPDATE queue SET sent = \'' . date('Y-m-d H:i:s') . '\' WHERE id = ' . $row['id']);
}
?>


The above code is just a basic example of queue processing. It begins by retrieving a list of outstanding queue entries and processes them one by one. After each entry is processed, we set its sent date to "now" to avoid re-processing it later. We'll also need to paste in the code that we removed a few steps ago to perform the actual email sending.
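
One possible refinement of the processing loop is sketched below, under a few assumptions: $db is an existing PDO connection, process_queue is a hypothetical name, and $send is any callable wrapping your existing email code that returns true on success. Unlike the basic example, it marks a row as sent only when the sender reports success, and it orders by id to keep the first in, first out behaviour.

```php
<?php
// Sketch: process pending rows, marking each as sent only if the sender reports success.
// $send is any callable that takes the queue row and returns true on success.
function process_queue(PDO $db, callable $send)
{
    // Oldest rows first, so the queue stays first in, first out.
    $pending = $db->query('SELECT * FROM queue WHERE sent IS NULL ORDER BY id')
                  ->fetchAll(PDO::FETCH_ASSOC);
    $mark = $db->prepare('UPDATE queue SET sent = :sent WHERE id = :id');
    $done = 0;
    foreach ($pending as $row) {
        if (!$send($row)) {
            continue; // leave sent as NULL so the next run retries this row
        }
        $mark->execute(array(':sent' => date('Y-m-d H:i:s'), ':id' => $row['id']));
        $done++;
    }
    return $done; // number of rows marked as sent
}
```

Note that fetchAll loads every pending row into memory at once; for very large queues, a LIMIT on the query and several smaller runs may be a better fit.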

To finish our queueing process, we need to create a scheduled task to run this script regularly. The easiest way to run it is with PHP in command line mode (why involve Apache when we don't need to?). On Windows, the command would look something like this (in a command prompt):

C:\xampp\php\php.exe C:\xampp\htdocs\queue.php

Basically, we are invoking PHP and telling it to run our script. In the example above, be sure to edit both paths to match the locations of your own files.

Create the scheduled task now and let's set it to run once per day (or any other interval that is appropriate for your queueing needs). For help creating a scheduled task (or cron job), do a quick Google search.

That's it, our queueing process is now complete. Each time it runs, it will retrieve the list of pending emails, send them out, and update the sent time of each one, preventing it from being processed again.

Scaling


Let's shift our focus to scaling our queueing system. Our current limitation is how many emails our web server can send out per minute. Imagine we need to send out 100,000 emails and our server can send out approximately 500 emails per minute. It would take 200 minutes, or roughly 3 hours and 20 minutes, to complete this process. Now imagine 1,000,000 emails. We are now talking about 33 hours to complete! That would prevent sending daily emails because it would take more than one day to work through the entire queue.

We can solve this problem easily by scaling our web servers and adding more into the mix. This is a very typical solution: add more web servers and distribute the load evenly so that no single server is overloaded.

In our above example, adding one server should cut the time roughly in half, to about 17 hours, which would allow for a daily email to 1,000,000 users to occur. So we'll assume we now have two web servers that will be used to process our queue.
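
The arithmetic behind these estimates can be checked with a few lines of PHP. This is just a back-of-the-envelope sketch: queue_hours is a hypothetical helper, and it assumes every server sends at the same steady rate with no other overhead.

```php
<?php
// Estimated hours to drain a queue, assuming each server sends at the same steady rate.
function queue_hours($emails, $perMinutePerServer, $servers = 1)
{
    return $emails / ($perMinutePerServer * $servers) / 60;
}

printf("%.1f hours\n", queue_hours(100000, 500));     // 3.3 hours
printf("%.1f hours\n", queue_hours(1000000, 500));    // 33.3 hours
printf("%.1f hours\n", queue_hours(1000000, 500, 2)); // 16.7 hours
```

With the same assumptions, a third server would bring the 1,000,000 email run down to about 11 hours.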

The next step is to ensure the two servers don't "step on each other's toes", i.e. send out the same email twice. Most users wouldn't enjoy constantly getting two identical emails. The best way I've found to accomplish this is to alter our original queue table and add the following field:

- server varchar(50) NOT NULL

Now, when we create our email queue, we can evenly distribute the emails across the multiple web servers. With this new field we will need to update our process that inserts into the database as follows:


<?php
// assuming we already have an existing database connection...
// define our array of servers (replace with names of your servers)
$server = array('WEB1', 'WEB2');
// after our form post that collects subject and body...
// escape user input once, before the loop, to prevent SQL injection
$subject = mysql_real_escape_string($_POST['subject']);
$body = mysql_real_escape_string($_POST['body']);
$result = mysql_query('SELECT email FROM users');
// loop through all users
while ($row = mysql_fetch_assoc($result)) {
    // pick a random index between 0 and count of servers - 1
    $rand = mt_rand(0, count($server) - 1);
    // insert one row in the queue per user (server is the last column)
    mysql_query('INSERT INTO queue VALUES (
        null,
        \'' . mysql_real_escape_string($row['email']) . '\',
        \'noreply@mydomain.com\',
        \'' . $subject . '\',
        \'' . $body . '\',
        \'' . date('Y-m-d H:i:s') . '\',
        null,
        \'' . $server[$rand] . '\'
    )');
}
?>


The above code simply picks a random server to assign each queue entry to. This selection is saved into the "server" column. Over a large number of users, the randomly generated numbers should keep the load roughly balanced across all servers.
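
If an exactly even split is preferred over a statistically even one, the random pick could be swapped for a simple rotation. The sketch below is an alternative to the approach above, not part of it: assign_round_robin is a hypothetical helper, and it assumes the email addresses are unique.

```php
<?php
// Sketch: assign recipients to servers in strict rotation instead of at random.
// Returns an array mapping each email address to a server name.
function assign_round_robin(array $emails, array $servers)
{
    $servers = array_values($servers); // make sure indexes run 0..n-1
    $n = count($servers);
    $assignments = array();
    foreach (array_values($emails) as $i => $email) {
        // the modulo wraps the index back to the first server after the last one
        $assignments[$email] = $servers[$i % $n];
    }
    return $assignments;
}

print_r(assign_round_robin(array('a@example.com', 'b@example.com', 'c@example.com'), array('WEB1', 'WEB2')));
```

With this approach, the number of rows assigned to any two servers never differs by more than one.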

Now we need to update our queue processing file as follows:


<?php
// ensure the file doesn't stop processing
set_time_limit(0);
// assuming we already have an existing database connection...
// `hostname` runs the shell command; trim() removes the trailing newline it returns
$host = mysql_real_escape_string(trim(`hostname`));
// get all of our pending emails to be sent for THIS hostname
$result = mysql_query('SELECT * FROM queue WHERE sent IS NULL AND server = \'' . $host . '\'');
// loop through each one
while ($row = mysql_fetch_assoc($result)) {
    // put existing process to send email here...
    // after processing email, mark it as sent
    mysql_query('UPDATE queue SET sent = \'' . date('Y-m-d H:i:s') . '\' WHERE id = ' . $row['id']);
}
?>


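The above change is a little more subtle, can you spot it? We've updated the query so that it only fetches pending emails assigned to our current server. `hostname` in backticks is PHP's shell execution operator: it runs the operating system's hostname command and gives us the name of the machine (not a value from the Apache configuration, since the script runs from the command line). The command's output ends with a newline, so it should be passed through trim() before it is compared with the server column, and the result must match one of the names used when the rows were inserted. Once this change is done, create the scheduled task on the new web server(s) to run at the same time and interval as the previous scheduled task.

As your user base grows and more processing is required, we now just need to add more web servers: add each new name to the $server array, schedule the task on the new machine, and our code will take care of the rest!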
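
As an alternative to shelling out, PHP has a built-in gethostname() function (available since PHP 5.3). The following sketch shows how the per-server lookup might look with it; current_server_name and pending_for_server are hypothetical names, and $db is assumed to be an existing PDO connection.

```php
<?php
// Sketch: resolve this machine's name without spawning a shell.
function current_server_name()
{
    // gethostname() returns false on failure; the cast turns that into an empty string
    return trim((string) gethostname());
}

// Sketch: fetch only the pending rows assigned to the given server, oldest first.
function pending_for_server(PDO $db, $server)
{
    $stmt = $db->prepare(
        'SELECT * FROM queue WHERE sent IS NULL AND server = :server ORDER BY id'
    );
    $stmt->execute(array(':server' => $server));
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}
```

A queue run on each machine could then start with pending_for_server($db, current_server_name()), provided the machine's name is one of the values written to the server column.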

Summary


In today's article we discussed how to take an "online" process that is slowing our web site down and convert it into an "offline" queueing system. Not only that, we implemented an extremely simple solution for growing our queueing system to accommodate a large-scale queue.

Enjoy!

Resources


http://en.wikipedia.org/wiki/Load_balancing_(computing)
http://en.wikipedia.org/wiki/Cron
http://support.microsoft.com/kb/308569

