
Making WordPress Cloud Friendly

by Guy Moreton, Cloud Solutions Architect

WordPress has been around for a long time and it has contributed immeasurably to the life of the Internet. It has made having a good-looking and functional presence on the web easy for millions of people. It currently powers 34% of all the sites on the web, which translates to about 75 million sites.

That’s an incredible (though not unparalleled) achievement for an open-source project.

WordPress was designed back in 2003, before the Cloud emerged and changed how we think about building and running software on the Internet. As a result, its design has its roots in traditional web hosting models — a single stateful web server that can only be scaled vertically — that don’t fit comfortably into the cloud paradigm.

This presents us with a challenge: how do we run WordPress in the cloud in a way that takes advantage of cloud paradigms without losing the ease of use and features that made it so popular in the first place?

What’s the problem?

When we move web workloads to the cloud, we want to make use of horizontal scaling. This means dynamically adding servers to a pool of servers as demand increases, and removing them again as demand decreases. It’s not only a more cost-effective way to meet demand, it’s also highly scalable, limited only by our ability to pay for more server instances.

When you work with servers in this way you need to be able to automatically create a server instance, use it for as long as the demand exists, then destroy it when it is no longer needed. We call servers created and destroyed this way “ephemeral infrastructure”.
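
As a rough sketch, here is what defining such a pool can look like with an AWS Auto Scaling group (the names, sizes and subnets below are placeholders):

    # Create a pool of interchangeable web servers that scales between
    # 2 and 10 instances. Any instance may be terminated at any time,
    # so nothing unique can be allowed to live on its local disk.
    aws autoscaling create-auto-scaling-group \
        --auto-scaling-group-name wordpress-pool \
        --launch-template LaunchTemplateName=wordpress-web,Version='$Latest' \
        --min-size 2 --max-size 10 --desired-capacity 2 \
        --vpc-zone-identifier "subnet-aaaa1111,subnet-bbbb2222"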

WordPress breaks this model because it allows users to change the files on an individual server instance via the web interface.

This can happen in a couple of scenarios:

  • When the user uploads media (images etc)
  • When the user updates WordPress (core, plugins, templates etc)

When your servers are part of a pool of ephemeral infrastructure, whichever server instance you happen to be accessing when you do either of these things is the one whose file system gets updated. Any other instances that are sitting in your server pool are untouched and therefore don’t get the updates or uploads.

Now you have a problem: if the instance you just updated or uploaded to is taken out of your pool, that change is lost. New servers added to the pool won't have it either, and visitors will see different content depending on which instance happens to serve their request.

Design goals

In solving this problem, I have a couple of design goals.

To preserve the ease of use of WordPress, I don’t want to change the standard way users update it, or upload content. WordPress admins don’t want to do DevOps and we shouldn’t add complexity that just makes work for people.

To make WordPress behave like a cloud app, I want to ensure that its servers are ephemeral, that every new instance that gets created will serve the same files as every other instance in the pool, and that any instance that’s terminated won’t cause data loss. This should be done without compromising performance.

Breaking down the problem

We can break this problem down into two functional parts — “user uploads media” (images etc) and “user updates WordPress” (and its plugins, themes etc).

The problem with uploading media can be dealt with easily — we can simply install a plugin that offloads the storage of uploaded content (images and other media) to S3. This is a good solution, easy to implement and there are plenty of options for doing it. It removes the uploads from the file system altogether. Job done.
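
If you manage plugins from the command line, this can be a one-liner with WP-CLI (WP Offload Media Lite is one of several plugins that do this; pointing it at a bucket happens in its settings screen):

    # Install and activate a media-offload plugin (one of several options)
    wp plugin install amazon-s3-and-cloudfront --activate

Once configured, new uploads land in S3 instead of on the instance's local disk.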

The second problem is harder to deal with. The code that drives WordPress has to be in the file system, and updating it via the admin UI changes the code on an individual server. We have to find a solution to ensure that all instances will share the same code.

Oblique Strategies

Many ways to solve this problem have been devised. Most carry with them downsides that I feel significantly reduce the value WordPress offers to its users. Let’s look at some options.

Generate static pages

This option uses WordPress as a page generator. You have the WordPress instance set up as if it were your production site, but use tools to publish your site as a set of pages and media files to a static hosting environment, like S3.

The upside is that your static site will be very robust and cheap to run, and it won't be susceptible to PHP-based exploits. The downside is increased complexity: more systems, a formal publishing process, additional tools to install and manage, and so on. You will also be diverging from the standard way that WordPress works, which will require training for your WP admins.

Going static might work for you, but it requires too many extra steps to satisfy my “ease of use” design goal.

Filesystem sync

The idea here is that whenever an individual instance makes updates to its files, those changes get synced to all of the other server instances.

Various solutions for this exist (e.g. lsyncd, unison) but they struggle when you are using ephemeral infrastructure, because they all rely on some sort of configuration that tells them about the systems they will sync from or to.

Another option is to sync with a shared "master" filesystem, e.g. S3, but this presents the problem of two-way sync: when does an updated instance get to overwrite the files on S3? How is the overwrite triggered? And how do the other instances find out about updates they need to fetch?

Filesystem sync is too complicated, which fails my “behaving like a cloud app” design goal.

Use a Network File System

Many solutions on AWS (including their reference architecture for HA WordPress) suggest using EFS to provide a common file system across multiple instances. However, they mostly use it only to store the contents of the wp-content directory (where uploads, themes and plugins are kept). Unfortunately, this doesn't solve the problem of dealing with updates to WordPress core itself without invoking an additional, non-standard process.

The upside of using a network file system is that it’s simple and allows WordPress to work as expected. The downside is that there are a number of areas where it can go wrong in subtle ways.

One area of concern is latency: accessing files on EFS is significantly slower than reading them from local disk. Since WordPress is built entirely from PHP files, that extra latency can directly hurt the server's responsiveness.

Another issue is that EFS can be expensive and behave in ways that are not always anticipated, like unexpectedly slowing to a crawl.

If we can address those downsides, though, I think using EFS can allow me to meet both of my design goals.

WordPress on AWS with EFS

There are two issues we’re going to try to solve: Performance & Reliability.

Performance

To tackle the performance question we are going to use caching. There are five types of caching we will employ, at the OS, PHP interpreter, application, network and browser levels. All have the goal of removing load from the application server and minimising the use of the EFS filesystem.

We are going to treat the EFS as a master data source from which our instances will dynamically get what they need, and design those instances to be as self-sufficient as possible from that point onwards.

To achieve this at the OS level we will make use of FS-Cache, the Linux kernel's file caching facility, via its userspace daemon, cachefilesd. The daemon runs on each instance and caches files read from the EFS onto the local file system, so that when the OS asks for a file from the EFS again it can be served from the local cache instead.
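
On an instance, that amounts to something like this (the file system ID, region and mount point below are placeholders, and the package install assumes a yum-based distro such as Amazon Linux):

    # Install and start the FS-Cache userspace daemon
    sudo yum install -y cachefilesd
    sudo systemctl enable --now cachefilesd

    # Mount the EFS volume with the 'fsc' option, which tells the NFS
    # client to cache file contents on local disk via FS-Cache
    sudo mount -t nfs4 -o nfsvers=4.1,fsc \
        fs-12345678.efs.us-east-1.amazonaws.com:/ /var/www/html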

Since version 5.5, PHP has shipped with its own code caching system, OPcache, built in and, on most builds, enabled by default. OPcache caches the compiled bytecode for each PHP script in memory so the interpreter does not have to re-read and re-compile the source every time. This also helps our solution reduce the need to access the EFS; all we'll do here is configure it to ensure that it can hold all of the code in our WordPress site.
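
The exact values depend on the size of your site, but the tuning amounts to a small ini drop-in along these lines (the path assumes a php.d-style layout, as on Amazon Linux):

    ; /etc/php.d/10-opcache-tuning.ini
    opcache.enable=1
    ; memory for compiled bytecode, in megabytes
    opcache.memory_consumption=192
    ; WordPress plus its plugins easily runs to thousands of PHP files
    opcache.max_accelerated_files=10000
    ; how often (in seconds) to re-check scripts for changes; a higher
    ; value means fewer stat calls hitting the EFS
    opcache.revalidate_freq=60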

At the application level, in WordPress, we will install the WP Super Cache plugin. This plugin allows WordPress to cache rendered pages to a directory on the local file system for re-use. This has the additional advantage of speeding up access to those pages, as a page does not need to be re-rendered every time it is accessed. WP Super Cache is maintained by Automattic, the company behind WordPress.com, so it's a safe choice for your site.
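
Installing it is another one-liner if you script your builds with WP-CLI:

    # Install and activate the page-caching plugin
    wp plugin install wp-super-cache --activate

One detail to watch: by default the plugin writes its cache under wp-content/cache, which in this architecture would sit on the EFS, so the setup described later points it at a directory on the instance's local disk instead.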

Next, we will use a CDN, in this case CloudFront, to cache our WordPress content in Amazon’s edge nodes. This not only brings the content closer to our users, it also offloads processing of the requests from our instances.
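
Creating a distribution can start as simply as this (a minimal sketch with a placeholder origin; a real setup will customise cache behaviours and forward the cookies and headers that wp-admin and logged-in sessions need):

    # Put CloudFront in front of the load balancer
    aws cloudfront create-distribution \
        --origin-domain-name my-wordpress-alb-123456.eu-west-1.elb.amazonaws.com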

Finally, we will ensure that the correct cache headers are set so that we make best use of browser caching.
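
With Apache, for example, a small mod_expires drop-in does the job (the types and lifetimes below are illustrative, not prescriptive):

    # /etc/httpd/conf.d/cache-headers.conf
    <IfModule mod_expires.c>
        ExpiresActive On
        ExpiresByType image/jpeg "access plus 30 days"
        ExpiresByType image/png  "access plus 30 days"
        ExpiresByType text/css   "access plus 7 days"
        ExpiresByType application/javascript "access plus 7 days"
    </IfModule>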

Combining all of these strategies, we should be able to significantly reduce the load not only on the individual instances but also on the EFS, which saves us money and makes our solution faster and more robust.

These strategies should also help tackle the reliability issue, since most issues with using EFS for web workloads are due to usage patterns that use up the burst credit balance.

EFS reliability — check your burst credits

Burst credits are a capacity allowance that your infrastructure accrues over time. They build up a headroom of capacity your applications can "burst into" during unpredictable spikes in demand.

In the case of EFS, burst credits provide additional I/O throughput above the baseline level. The more you use your EFS above that baseline, the more you eat into your burst credit balance. This is why caching the EFS locally is key.

Because web workloads tend to produce spiky I/O, you can be in for a nasty shock if your EFS burst credit balance hits zero. It may appear as if your site is broken, because the web server cannot read files fast enough to respond to requests.

To avoid surprises, set up a CloudWatch alarm on the EFS BurstCreditBalance metric so you're warned if it drops unexpectedly, or below a level that can be recovered within your SLA period.
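
With the AWS CLI, that looks something like this (the file system ID, threshold and SNS topic are placeholders; the metric is reported in bytes, so the threshold below is roughly 1 TiB):

    aws cloudwatch put-metric-alarm \
        --alarm-name efs-burst-credits-low \
        --namespace AWS/EFS \
        --metric-name BurstCreditBalance \
        --dimensions Name=FileSystemId,Value=fs-12345678 \
        --statistic Average --period 300 --evaluation-periods 1 \
        --comparison-operator LessThanThreshold \
        --threshold 1099511627776 \
        --alarm-actions arn:aws:sns:eu-west-1:123456789012:ops-alerts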

To guarantee that exhausted burst credits never bring your site to a crawl, look at either ballasting the EFS with large files (baseline throughput, and therefore credit accrual, scales with the amount of data stored) or switching it to provisioned throughput mode, where you can specify (and pay for) a higher baseline.
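
Either option is quick to apply (sizes, paths and IDs below are placeholders):

    # Option 1: ballast the file system. EFS baseline throughput, and
    # therefore credit accrual, scales with the amount of data stored,
    # so a large dummy file raises both. This writes about 100 GiB;
    # keep it somewhere the web server will not serve.
    dd if=/dev/zero of=/path/to/efs/ballast bs=1M count=102400

    # Option 2: switch to provisioned throughput and pay for a fixed baseline
    aws efs update-file-system \
        --file-system-id fs-12345678 \
        --throughput-mode provisioned \
        --provisioned-throughput-in-mibps 10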

Putting it all together

There is a GitHub repo here that contains instructions and code you can use to set up your own high-availability WordPress deployment on AWS, using Elastic Beanstalk and EFS.

This solution uses Elastic Beanstalk and .ebextensions to manage the creation of instances as needed. During creation, each instance has cachefilesd installed and configured, the shared EFS volume is mounted, and cache directories for the caching plugin are created on the local disk. Finally, a check is made to see whether WordPress is already installed; if it isn't, the latest version is installed, ready to set up.
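
The cachefilesd install and the EFS mount look like the sketches earlier in this article; the remaining steps boil down to something like the following (the paths are placeholders, and the repo implements the equivalent as .ebextensions commands):

    # Keep the page cache on fast local disk; the EFS holds only the code
    mkdir -p /var/cache/wp-super-cache
    chown apache:apache /var/cache/wp-super-cache

    # Install WordPress onto the shared volume only if it isn't there yet
    if [ ! -f /var/www/html/wp-settings.php ]; then
        wp core download --path=/var/www/html --allow-root
    fi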

This is what you should end up with: