[rs_block class="bg"]
Working with cutting-edge tech at RapidSpike, our dev team often faces tricky, involved technical problems.
Our solutions to these issues might be of use to other developers, so we’ve asked the team to share their work on our blog. If you’re a developer and this post has helped you, let us know!
[/rs_block]
Recently we started investigating asynchronous and multithreaded processing with PHP, which is a touchy subject in the PHP community. There are a number of ways to achieve this, most of which are considered ‘hacky’. In this post I’ll briefly explain three commonly used methods.
Multithreading: Queues with Web Server Software
This method lets you use multi-processor servers to distribute processor-intensive tasks efficiently. The flow is as follows:
- A data ‘message’ is packaged up – this is all the data required to perform a desired task, usually encoded into a JSON string
- The message is sent to a queuing mechanism (Beanstalkd, RabbitMQ, AWS SQS, Memcache, Redis etc.)
- A daemon process consumes the queued messages – at a limited rate – picking up tasks and using an HTTP POST to forward each message to a script hosted on a web server
- The web server software (Apache, Nginx, Tomcat etc) then handles its child processes just as it would when users hit a website it’s hosting
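As a rough illustration of the first three steps above, here is a minimal sketch. SplQueue is only an in-memory stand-in for a real queue such as RabbitMQ or SQS, and packageMessage(), consume() and the build_report task are hypothetical names made up for this example. In a real daemon the $forward callback would make the HTTP POST to the web server.

```php
<?php
// Hypothetical helper: package everything a task needs into one JSON string.
function packageMessage(string $task, array $data): string
{
    return json_encode(['task' => $task, 'data' => $data]);
}

// Hypothetical consumer step: take at most $limit messages off the queue and
// hand each one to $forward. A real daemon would HTTP POST the message to the
// worker script here instead of calling a local function.
function consume(SplQueue $queue, int $limit, callable $forward): int
{
    $sent = 0;
    while ($sent < $limit && !$queue->isEmpty()) {
        $forward($queue->dequeue());
        $sent++;
    }
    return $sent;
}

// SplQueue is an in-memory stand-in for a real queuing mechanism.
$queue = new SplQueue();
foreach ([101, 102, 103] as $reportId) {
    $queue->enqueue(packageMessage('build_report', ['report_id' => $reportId]));
}

// Forward two messages per pass; the third stays queued for the next pass.
$forwarded = [];
$count = consume($queue, 2, function (string $message) use (&$forwarded) {
    $forwarded[] = json_decode($message, true);
});
```

Capping each pass at $limit messages is one simple way to get the ‘limited rate’ behaviour; a real consumer would more likely limit the number of in-flight requests.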
This methodology is what AWS use in their managed Elastic Beanstalk (EB) ‘worker’ stack. You write a script to send tasks to an AWS SQS queue; an EB worker, with their consumer daemon running, then picks up the tasks and POSTs them to a user-uploaded script hosted on an internal web server. The web server software (Apache) then manages the resulting processes.
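The receiving script can be very small. The sketch below shows one possible shape for it: handleMessage() and the build_report task are hypothetical names. The status-code convention (200 for success, anything else for failure) is an assumption for illustration, so check how your consumer daemon interprets responses.

```php
<?php
// Hypothetical task handler: decodes the POSTed message and returns the HTTP
// status code that should be sent back to the consumer daemon.
function handleMessage(string $rawBody): int
{
    $message = json_decode($rawBody, true);
    if (!is_array($message) || !isset($message['task'])) {
        return 400; // malformed message: nothing sensible to run
    }

    switch ($message['task']) {
        case 'build_report':
            // ... the processor-intensive work would happen here ...
            return 200;
        default:
            return 422; // unknown task name
    }
}

// When served by the web server, read the POSTed body and reply with a status.
if (PHP_SAPI !== 'cli') {
    http_response_code(handleMessage(file_get_contents('php://input')));
}
```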
The advantages are that it’s relatively easy to implement (despite the seemingly complicated setup) and it’s ‘accessible’: any developer who understands a generic web stack architecture can quickly understand how the process works. It’s also scalable. Are your queues getting larger or taking longer to process? No problem, add another queue-consuming worker.
The main disadvantage is that not all systems lend themselves to this architecture. If your processing requirements are on-demand, workers could sit idle between jobs, and idle server time costs money. This method is better suited to near-constant batch processing.
Multithreading: Process Forking
Process forking involves a parent process creating – or ‘forking’ – child processes off itself to perform tasks. Each child process is a copy of the parent that continues from the line where the fork occurred, so all instantiated objects and variables are copied over. This article explains PHP process forking really well, and is worth a read.
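A minimal sketch of the idea, assuming the pcntl extension is available (CLI only). The job names are placeholders, and using the exit code to report back is just one simple choice for this example. Note that the change a child makes to $jobs never reaches the parent, because each child works on its own copy.

```php
<?php
// Requires the pcntl extension. $jobs exists before the fork, so every child
// starts with its own copy of it.
$jobs = ['resize', 'encode', 'upload'];
$children = [];

foreach ($jobs as $index => $job) {
    $pid = pcntl_fork();
    if ($pid === -1) {
        fwrite(STDERR, "Could not fork\n");
        exit(1);
    }
    if ($pid === 0) {
        // Child: continues from the fork with copies of $jobs and $index.
        $jobs[] = 'only visible in this child';
        exit($index); // report back which job this was via the exit code
    }
    // Parent: remember the child so it can be waited on later.
    $children[$pid] = $job;
}

// Parent: block until every child has exited and collect the exit codes.
$exitCodes = [];
foreach (array_keys($children) as $pid) {
    pcntl_waitpid($pid, $status);
    $exitCodes[$children[$pid]] = pcntl_wexitstatus($status);
}
```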
In practice, we’ve found this method difficult to handle and manage. Child processes can be tricky to keep track of, and their runtime needs to be monitored: if they crash or don’t exit correctly, they can clog up valuable concurrency space. All things considered, however, this method can work well, although it requires a fair bit of scripting to manage properly and can quickly become troublesome if error handling isn’t done properly.
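To give a feel for the kind of management scripting involved, here is a sketch of a parent that enforces a concurrency limit, polls its children without blocking, and kills any child that overruns. It assumes the pcntl and posix extensions. The task durations, limits and result counters are all illustrative values rather than recommendations.

```php
<?php
// Requires the pcntl and posix extensions. All values are illustrative.
$tasks       = [0, 0, 10, 0];  // seconds each hypothetical task "works" for
$maxChildren = 2;              // concurrency limit
$maxRuntime  = 1;              // seconds before a child is treated as stuck
$running     = [];             // pid => start time
$results     = ['ok' => 0, 'killed' => 0];

while ($tasks || $running) {
    // Fill any free slots with new children.
    while ($tasks && count($running) < $maxChildren) {
        $seconds = array_shift($tasks);
        $pid = pcntl_fork();
        if ($pid === -1) {
            fwrite(STDERR, "Could not fork\n");
            exit(1);
        }
        if ($pid === 0) {
            sleep($seconds); // stand-in for real work
            exit(0);
        }
        $running[$pid] = time();
    }

    foreach ($running as $pid => $startedAt) {
        // WNOHANG: return immediately instead of blocking on a live child.
        $done = pcntl_waitpid($pid, $status, WNOHANG);
        if ($done === $pid) {
            // A normal exit counts as ok; a signalled child counts as killed.
            $results[pcntl_wifexited($status) ? 'ok' : 'killed']++;
            unset($running[$pid]);
        } elseif (time() - $startedAt > $maxRuntime) {
            posix_kill($pid, SIGKILL); // reaped on a later pass
        }
    }
    usleep(100000); // avoid a busy loop
}
```

Here the 10-second task exceeds the runtime limit, so it is killed and its slot is freed for the remaining work.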
Asynchronous Processing
If you have a group of tasks to perform that are independent of each other, this method allows you to process several of them at the same time without them blocking each other. The flow is as follows:
- A parent process loops over a set of tasks
- For each task, a new child ‘thread’ is created to perform the task; the parent process is not required to wait for it to finish
- Once the thread has completed, the parent process is informed and any concurrency limit can be reassessed
On paper, this method is really easy to implement, and a lot of online articles claim that PHP 7 makes it much more accessible. A PHP module called pThreads is used to access this functionality; however, you have to compile PHP with Zend Thread Safety (ZTS) enabled. To do this, you must install an experimental and unsupported version of PHP 7, which is not ideal in a multi-use production environment. Many other PHP modules and frameworks might not be compatible with ZTS-enabled PHP, so to make use of this functionality you would potentially need to build a dedicated VM to handle the required workload.
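The flow described above might look roughly like the sketch below. It assumes a ZTS build of PHP 7 with the pThreads extension loaded; without it the Thread class does not exist and the script will not run. SquareTask, the inputs and the limit of two are hypothetical, chosen only to keep the example short.

```php
<?php
// Assumes ZTS PHP 7 with the pthreads extension loaded.
class SquareTask extends Thread
{
    public $input;
    public $result;

    public function __construct(int $input)
    {
        $this->input = $input;
    }

    // Runs in the new thread once start() is called.
    public function run()
    {
        $this->result = $this->input * $this->input;
    }
}

$inputs  = [2, 3, 4, 5];
$limit   = 2;   // concurrency limit
$running = [];
$results = [];

while ($inputs || $running) {
    // Start threads while there is spare capacity; the parent does not wait
    // for the task itself to finish.
    while ($inputs && count($running) < $limit) {
        $thread = new SquareTask(array_shift($inputs));
        $thread->start();
        $running[] = $thread;
    }
    // Collect finished threads and free their slots. join() guarantees the
    // thread has completed before its result is read.
    foreach ($running as $key => $thread) {
        if (!$thread->isRunning()) {
            $thread->join();
            $results[] = $thread->result;
            unset($running[$key]);
        }
    }
    usleep(1000);
}
sort($results);
```

Polling isRunning() is a simple stand-in for the parent ‘being informed’; it keeps the example short at the cost of a small busy-wait.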
In a follow-up post I will explain how to install ZTS-enabled PHP and the pThreads module.