I'm looking for a way to manage asynchronous, potentially long-running PHP jobs (e.g. feed import and generation). One could start with a simple cron-based system, but this has a few drawbacks:
* Too much 'downtime' between runs if you set the cron interval too long
* Overlapping jobs if a job hasn't finished when the next run starts
* No ability to react to events or handle dependencies between jobs
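For the overlap problem specifically, a common workaround that stays within cron is a non-blocking file lock. The following is only a minimal sketch: the function name `runExclusive` and the lock path are hypothetical, and it does nothing for the other two drawbacks.

```php
<?php
/**
 * Run $job only if no other run currently holds the lock file.
 * Returns true if the job ran, false if a previous run is still active.
 * (Hypothetical helper for illustration; not part of any library.)
 */
function runExclusive(string $lockPath, callable $job): bool
{
    // 'c' opens the file for writing, creating it if needed, without truncating.
    $handle = fopen($lockPath, 'c');
    if ($handle === false) {
        throw new RuntimeException("Cannot open lock file: $lockPath");
    }

    // LOCK_NB makes flock() return immediately instead of waiting.
    if (!flock($handle, LOCK_EX | LOCK_NB)) {
        fclose($handle);
        return false; // a previous run is still in progress, so skip this one
    }

    try {
        $job();
        return true;
    } finally {
        // Release the lock even if the job throws.
        flock($handle, LOCK_UN);
        fclose($handle);
    }
}
```

A cron entry could then call a script that wraps the import in `runExclusive('/tmp/feed-import.lock', ...)` (path assumed). A run that finds the lock held simply exits, so a short interval no longer causes overlapping jobs.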
I've previously used AppWorx (appworx.com) in a large ETL operation. AppWorx is quite sophisticated with a distributed slave architecture, chains, forks and joins. But AppWorx is probably too complex for my needs.
In my simplest case I would need a queue manager so I could keep the queue full without overlapping jobs. I also have modest needs for distributed processing.
For my purposes, I thought about using Hudson. Although Hudson is a job manager, it's really meant to manage build jobs, so it's not the best fit for the types of jobs I'm looking to process.
Next I looked at Zend Server. It comes from Zend, the PHP company, so the PHP integration should be good. Here's an article about queues in Zend Server:
* http://www.eschrade.com/page/queue-introduction-zend-server-queue-4b8eef5c
Unfortunately, there's no queue facility in Zend's free Community Edition:
* http://www.zend.com/en/products/server/editions
Then I found Gearman. Gearman is a free job manager that seems to be well suited for PHP-based jobs:
* http://gearman.org/index.php?id=documentation
Gearman is used at large sites such as Yahoo and Digg. Here are articles on Gearman for feed processing, and Gearman for PHP applications:
* http://gearman.org/index.php?id=php_-_feed_fetching_parsing
* http://www.slideshare.net/felixdv/high-gear-php-with-gearman
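To give a rough idea of what this looks like from PHP, here is a minimal sketch of a client that queues a feed import and a worker that processes it. It assumes the PECL `gearman` extension is installed and a `gearmand` server is listening on 127.0.0.1:4730. The job name `import_feed`, the JSON payload shape and the function names are hypothetical, chosen only for illustration.

```php
<?php
/**
 * Job logic, kept separate from Gearman so it can be tested on its own.
 * A real worker would fetch and parse the feed here.
 */
function importFeed(string $workload): string
{
    $payload = json_decode($workload, true);
    if (!is_array($payload) || empty($payload['url'])) {
        return 'error: missing url';
    }
    return 'imported ' . $payload['url'];
}

/** Client side: queue a background job and return its job handle at once. */
function queueImport(string $url): string
{
    $client = new GearmanClient();
    $client->addServer('127.0.0.1', 4730); // assumed server address
    return $client->doBackground('import_feed', json_encode(['url' => $url]));
}

/** Worker side: handles one job at a time, so jobs in this process never overlap. */
function runWorker(): void
{
    $worker = new GearmanWorker();
    $worker->addServer('127.0.0.1', 4730); // assumed server address
    $worker->addFunction('import_feed', function (GearmanJob $job) {
        return importFeed($job->workload());
    });
    // work() blocks until a job arrives, runs it, then returns.
    while ($worker->work()) {
        // keep waiting for the next job
    }
}
```

Under these assumptions, `runWorker()` would run in one or more long-lived CLI processes, possibly on different machines, while a web request or cron script calls `queueImport()`. Adding workers is how the modest distributed processing would be covered.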
A more general overview of job schedulers can be found here :
There is 1 Comment
Gearman piece by Rasmus Lerdorf
Here's a nice PHP-oriented introduction to Gearman by PHP creator Rasmus Lerdorf:
http://toys.lerdorf.com/archives/51-Playing-with-Gearman.html