Tuesday, December 30, 2008

Drupal core caching and content caching modules

Drupal core caching

Caching stores "elements" in a cache table in the database, so the data can be retrieved by a single query, rather than constructing the page from individual elements.
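As a minimal sketch of that idea (not Drupal's actual code), the snippet below uses a plain PHP array as a stand-in for the cache table. All demo_* names are hypothetical and exist only for this illustration.

```php
<?php
// Expensive step: assemble a page from its individual elements.
function demo_build_page($path) {
  $elements = array('header', 'menu', 'content for ' . $path, 'footer');
  return implode("\n", $elements);
}

// Return the stored copy if there is one; otherwise build the page and store it.
// $cache_table stands in for the database cache table (hypothetical).
function demo_get_page($path, &$cache_table, &$build_count) {
  if (isset($cache_table[$path])) {
    return $cache_table[$path];  // single lookup, nothing is rebuilt
  }
  $build_count++;
  $page = demo_build_page($path);
  $cache_table[$path] = $page;
  return $page;
}

$cache_table = array();
$build_count = 0;
$first = demo_get_page('node/1', $cache_table, $build_count);   // miss: page is built
$second = demo_get_page('node/1', $cache_table, $build_count);  // hit: stored copy is returned
```

The second request returns the stored copy, so the page is assembled only once.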

Drupal's core cache has two parts: data that gets cached no matter what, and data that is cached only if an administrator enables it in the settings.

The always-on cache is for the menu (the hierarchy of callbacks/URLs, with access information), the variable table (all the various settings and configuration), and filters (all processed texts).

The optional cache is the page cache, and it applies only to anonymous users.

For Drupal 4.7 and earlier, the cache table was a single table storing all of the above. As of Drupal 5.x, the cache is split across several tables to decrease contention: cache (general data, including the variables), cache_menu, cache_filter and cache_page. Also new in Drupal 5.x is aggressive caching mode.

Cache contention

On a large site, if you are using MyISAM, contention occurs in the database tables when the cache is forced to clear after a node or a comment is added. With tens of thousands of filter text snippets needing to be deleted, the table will be locked for a long period, and any accesses to it will be queued pending the purge of the data in it. The same is true for the page cache as well.

This often causes a "site hang" for a minute or two. During that time new requests keep piling up, and if you do not have the MaxClients parameter in Apache set up correctly, the system can start thrashing because of excessive swapping.

Examples of such cases exist here, and here.
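As a rough illustration of what "set up correctly" can mean for MaxClients, a common rule of thumb is to divide the memory left for Apache by the size of one Apache process, so that a pile-up of requests cannot push the server into swap. The figures below are hypothetical, and demo_max_clients() is a made-up helper; measure your own server before picking a value.

```php
<?php
// Rule-of-thumb sizing: how many Apache processes fit in RAM without swapping.
// All three inputs are assumptions for this example, in megabytes.
function demo_max_clients($total_ram_mb, $reserved_mb, $per_process_mb) {
  // $reserved_mb covers MySQL, the operating system and anything else on the box.
  return (int) floor(($total_ram_mb - $reserved_mb) / $per_process_mb);
}

// 2048 MB server, 768 MB kept for MySQL and the OS, 32 MB per Apache process.
$max_clients = demo_max_clients(2048, 768, 32);
echo "MaxClients $max_clients\n";  // prints "MaxClients 40"
```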

If you change your database tables to InnoDB, you can avoid the table-level locking, since InnoDB supports row-level locking. However, you have to be careful of some InnoDB pitfalls that can cause other queries on the site to be slow, and you also have to eliminate the table locks, as described in that same article.
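The storage engine is changed one table at a time. The sketch below only prints the MySQL statements for the filter and page cache tables so they can be reviewed before being run; it assumes Drupal 5.x table names with no table prefix, and demo_innodb_statements() is a hypothetical helper, not part of Drupal.

```php
<?php
// Build one ALTER TABLE statement per table name; nothing is executed here.
function demo_innodb_statements($tables) {
  $statements = array();
  foreach ($tables as $table) {
    $statements[] = "ALTER TABLE $table ENGINE=InnoDB;";
  }
  return $statements;
}

// The two tables named above as sources of contention (assumed unprefixed).
$sql = demo_innodb_statements(array('cache_filter', 'cache_page'));
echo implode("\n", $sql), "\n";
```

Take a backup first, and run the statements at a quiet time, since converting a large table can take a while.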

Eliminating cache contention

To get rid of this contention, you have to first disable the page cache, which requires no code change. This is only possible if you have the CPU horsepower to generate the pages and filters for every page view. This means that you are on a dedicated server (which you need anyway if you have a large site), and that you have enabled one of the PHP op-code caches/accelerators.

To disable the filter cache, you have to manually edit code in Drupal 4.7 (and remember to reapply the change whenever you upgrade to a new release).

In filter.module, find the function check_markup(), then delete or comment out the following lines:

if ($cached = cache_get($id, 'cache_filter')) {
  return $cached->data;
}

and

if ($cache) {
  cache_set($id, 'cache_filter', $text, time() + (60 * 60 * 24));
}

With these lines removed/commented, you no longer have to worry about contention for the filter cache.

In Drupal 5.x, there is a nifty feature that allows you to create your own caching strategy, by replacing includes/cache.inc with a file you define.

Copy the includes/cache.inc file to sites/all/modules/cache_no_filter/cache_no_filter.inc, and then modify the file you just created like so:

In the cache_get() function, put the following at the start of the function:

if ($table == 'cache_filter') {
  return 0;
}

And in cache_set(), put this at the start of the function:

if ($table == 'cache_filter') {
  return;
}

Then, in your settings.php file, you do the following:

$conf = array(
  'cache_inc' => './sites/all/modules/cache_no_filter/cache_no_filter.inc',
);

A pre-patched version for Drupal 5.2 can be downloaded below towards the end of this article. Just rename the file to remove the .txt extension.

The beauty of this is that you do not modify Drupal core, yet hook into your custom cache.
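To see the effect of the two guards in isolation, here is a stand-alone sketch that can be run without Drupal. It uses a PHP array instead of the database, and the demo_* functions are hypothetical stand-ins, not the real cache_get() and cache_set().

```php
<?php
// Stand-in for cache_get(): $store is keyed by table name, then by cache ID.
function demo_cache_get(&$store, $key, $table = 'cache') {
  if ($table == 'cache_filter') {
    return 0;  // always report a miss, so the caller filters the text again
  }
  return isset($store[$table][$key]) ? $store[$table][$key] : 0;
}

// Stand-in for cache_set(): silently skips anything meant for cache_filter.
function demo_cache_set(&$store, $cid, $table, $data) {
  if ($table == 'cache_filter') {
    return;  // never write filter text, so there is nothing to purge later
  }
  $store[$table][$cid] = $data;
}

$store = array();
demo_cache_set($store, '1:abc', 'cache_filter', '<p>filtered text</p>');
demo_cache_set($store, 'node/1', 'cache_page', '<html>page</html>');

$filter_hit = demo_cache_get($store, '1:abc', 'cache_filter');  // miss
$page_hit = demo_cache_get($store, 'node/1', 'cache_page');     // stored page
```

Filter text is never stored, while the other cache tables keep working as before.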

Contributed caching modules

There are quite a few contributed modules that help with caching.

For example, there is a block cache module that avoids the overhead of generating blocks for every page load.

Taking it a step further, there is an API module by the name pressflow preempt that allows other modules to cache any function.

Avoiding the database altogether is the ultimate in caching: if the pages are stored in HTML static files, they can be served faster.

Taking this approach, there is the fastpath fscache module, as well as the boost module.
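The general idea of a static file cache can be sketched as follows. This is not how boost or fastpath fscache are implemented; the demo_* functions, the temporary directory and the file naming scheme are all assumptions made for the illustration.

```php
<?php
// Map a request path to a file name; hashing keeps any URL safe as a file name.
function demo_static_path($cache_dir, $request_path) {
  return $cache_dir . '/' . md5($request_path) . '.html';
}

// Serve the stored HTML file if it exists; otherwise render, store and return it.
function demo_serve($cache_dir, $request_path, $render) {
  $file = demo_static_path($cache_dir, $request_path);
  if (is_file($file)) {
    return file_get_contents($file);  // no database access at all
  }
  $html = call_user_func($render, $request_path);
  file_put_contents($file, $html);
  return $html;
}

// Stand-in for the expensive, database-backed page build.
function demo_render($path) {
  return "<html><body>Page for $path</body></html>";
}

$cache_dir = sys_get_temp_dir() . '/demo_static_cache';
if (!is_dir($cache_dir)) {
  mkdir($cache_dir, 0700, true);
}
$html = demo_serve($cache_dir, 'node/1', 'demo_render');
```

In a real setup the biggest gain comes when the web server can hand out the stored file without starting PHP at all; the sketch only shows the store-then-reuse step.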
