Simons Blog | Newest post | About / Imprint

The performance reducing database cache

Posted on: 2021-03-12

Here's another story from the same company which brought us the self deleting user table.

We wanted to add a small script to one of the sites they built. This site is visited by a reasonable amount of users and generally works quite well. Sometimes we get load spikes which are very annoying. Not so much that it crashes the server, more like pages taking a around 3 seconds instead of 1 second to load. We never really investigated those performance issues. Anyway, the script we wanted to develop doesn't integrate directly with the CMS they deployed, but might make use some information from it. My first thought was: "Maybe we can gather some useful information from the session?". So I wrote a tiny PHP file and uploaded it to the server:

<?php session_start(); header("Content-Type: text/plain"); print_r($_SESSION);

I opened the script in my browser and it just started circling the little spinner in the tab bar for a while. "Odd ...", I thought - it shouldn't take this long. Then the result finally appeared. It looked like I would have needed a one kilometer long screen to fit all the data. Checking on the server, I figured out that the session was 1.7MB in size. Typically a session should be few kilobytes max - so what the hell is going on?

Grepping through the source code I quickly found the culprit. Unfortunately without any comments, so I can only try to reconstruct the process that must have been going on in the developers mind. I'll do that using an inner monologue:

Great. I've built this nice class which maps perfectly to the records of the data table. That means instead of messing with SQL I can just run a "SELECT * FROM x" on page load and store all the data in an array by creating an instance of my marvelous class for each record. Hmm ... This looks like it's going to be time consuming to do this on every page load ... I know what! I'll put it in a cache - that will speed things up! ... But which cache? In PHP nothing is preserved across page loads ... Except ... Yes! The session! That's what's kept, I'll put it there! Oh and I use the PHP session functions instead of the CMS's database based session system to make it faster.

There are few things wrong with this however:

Just to put icing on the cake, there is a bug which reloads the data from the database on each pageload anyway, making the "caching" totally pointless. It was an attempt to periodically invalidate the cache to account for changes in the database. So here's the process the PHP interpreter has to walk through on each page load:

  1. Find the session file on disk and load it into memory
  2. Deserialize thousands of objects and load them into an array in the session
  3. Throw everything which was just loaded away
  4. Query the database and recreate the thousands of objects, store them in the session
  5. Work with the data in the session, maybe access a few objects, maybe not
  6. Serialize thousands of objects and store them in the session file on disk

ARGH!! With just a few lines of code I disabled this "caching" in our testing instance, reducing the steps to:

  1. Query the database and create the thousands of objects, store them in a global variable
  2. Work with the data in the variable, maybe access a few objects, maybe not

The proper fix would be to actually use SQL the way it was meant. But this small fix is giving us a huge performance improvement already and it wasn't causing major issues before. So for now we'll just leave it like it is, with the "caching" disabled.

Like they say: "A little knowledge is a dangerous thing". A junior programmer might naively assume that session variables just seems to magically keep existing between requests, not knowing they actually make a round-trip through the disk every time. If this code would have been properly reviewed by a senior developer this would have never made it into production.

... oh and just to finish the story - there was nothing of value stored in the PHP session which we could have used for the script we were developing. But my colleague found another way to implement it.