The things we do to read 💩 emoji

Malcolm Blaney on 18 February 2019, 3:35am

Ok, I totally stole the title from this article, which was really good and helped me fix my problems, so I wanted to link to it.

I recently upgraded all my servers from Jessie to Stretch, which was long overdue. The catalyst was that certbot from Let's Encrypt started complaining about the security I was using to fetch new certificates. (It complained in a very nice way and helped fix the problem, I should add.)

Anyway, the upgrades went fine; Debian is very good at that. The problem was that part of the upgrade was switching from MySQL 5 to MariaDB, and all my tables were using an old character set and collation type. This showed up as emoji not rendering properly in my reader. Easy fix: switch the reader items table to the utf8mb4 character set and the utf8mb4_unicode_ci collation. Problem solved.
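
For anyone wondering why emoji in particular break: they take four bytes each in UTF-8, while MySQL's older utf8 character set stores at most three bytes per character. Here is a minimal Python sketch of that check, plus one way the conversion statement could be built. It assumes the old character set was a three-byte one (the post doesn't say which it was), and both helper names are made up for illustration.

```python
def needs_utf8mb4(text: str) -> bool:
    """Return True if any character needs more than 3 bytes in UTF-8."""
    return any(len(ch.encode("utf-8")) > 3 for ch in text)


def convert_statement(table: str) -> str:
    """Build a MariaDB/MySQL statement converting a table to utf8mb4.

    Assumption: a whole-table CONVERT TO is acceptable; the post does not
    say which statement was actually used.
    """
    return (
        f"ALTER TABLE `{table}` "
        "CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;"
    )


print(needs_utf8mb4("plain text"))
print(needs_utf8mb4("read 💩 emoji"))
print(convert_statement("reader_items"))
```

The statement is only printed here, not executed; on a large table a conversion like this rewrites every row, so it is worth trying on a copy first.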

This is where the fun really started though, and where my desire to understand the inner workings of databases started fading... My reader became so slow at loading items that it was unusable. The rest of the site worked fine, so I narrowed the problem down to one database query. Due to the way I now have channels set up, this was a particularly complicated query running on the largest table in my database, reader_items, which had 400k rows and was growing. I delved deeper into the murky world of database performance, learning about sargable queries and how to keep your indexes fast.

Making sure all the tables joined in the same query had a matching collation type seemed to do the trick. My query, which had blown out to over a minute, was now running in a couple of seconds. A good reminder to look after your indexes! Since I was well and truly into database tuning now though, I decided I could get more performance improvements out of my reader.
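
A collation check like the one described above could be sketched as follows. The query uses real information_schema.COLUMNS columns, but the sample rows, the reader_channels and users table names, and the helper function are all invented for illustration; only reader_items comes from the post. The usual explanation for the slowdown is that comparing columns with different collations forces a conversion on one side, which can stop the join from using that column's index.

```python
from collections import defaultdict

# Real information_schema columns; run against your own schema to get rows.
COLLATION_QUERY = """
SELECT TABLE_NAME, COLUMN_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = DATABASE() AND COLLATION_NAME IS NOT NULL
"""


def collation_mismatches(rows, joined_tables):
    """Group text columns of the joined tables by collation.

    Returns an empty dict when they all agree, otherwise
    {collation: ["table.column", ...]} for every collation found.
    """
    groups = defaultdict(list)
    for table, column, collation in rows:
        if table in joined_tables:
            groups[collation].append(f"{table}.{column}")
    return dict(groups) if len(groups) > 1 else {}


# Made-up rows standing in for the query result.
rows = [
    ("reader_items", "guid", "utf8mb4_unicode_ci"),
    ("reader_channels", "name", "latin1_swedish_ci"),
    ("users", "name", "latin1_swedish_ci"),
]
print(collation_mismatches(rows, {"reader_items", "reader_channels"}))
```

An empty result means the joined tables agree; anything else lists which columns sit under which collation, which is a reasonable starting point for deciding what to convert.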

The problem with adding new feed items to the same table is that you generally only want to read the new stuff. It's great to have fast indexes, but most of the time I really just need smaller tables. To do that my reader needed to be able to look up items in an arbitrary number of tables, but optimised to find new items in the first table it reads from. It can now do that, and the data partitioning process is automated to keep the items tables small. It can also handle reading across tables to return the correct number of items requested.
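
The idea of reading across tables can be sketched roughly like this. It is not the reader's actual code: it uses Python with an in-memory SQLite database standing in for MariaDB, and the archive table names (reader_items_1, reader_items_0), the columns and the read_items function are all assumptions made for the example. The newest table is read first, and older tables are only touched if the requested number of items hasn't been reached.

```python
import sqlite3


def read_items(conn, tables, limit):
    """Return up to `limit` newest items, visiting tables newest first."""
    items = []
    for table in tables:
        remaining = limit - len(items)
        if remaining <= 0:
            break  # the optimised case: the first table had enough items
        # Table names come from our own trusted list, never from user input.
        rows = conn.execute(
            f"SELECT id, title FROM {table} ORDER BY id DESC LIMIT ?",
            (remaining,),
        ).fetchall()
        items.extend(rows)
    return items


# Hypothetical layout: one small "current" table plus two older partitions.
tables = ["reader_items", "reader_items_1", "reader_items_0"]
sample_ids = {"reader_items": [5, 6], "reader_items_1": [3, 4], "reader_items_0": [1, 2]}

conn = sqlite3.connect(":memory:")
for table in tables:
    conn.execute(f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, title TEXT)")
    conn.executemany(
        f"INSERT INTO {table} VALUES (?, ?)",
        [(i, f"item {i}") for i in sample_ids[table]],
    )

print(read_items(conn, tables, 3))
```

Asking for three items reads two from the current table and one from the next partition; asking for two never leaves the first table, which is the case worth optimising for.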

The problematic query now returns in less than a second for the optimised case. (I would measure it but it's not a noticeable part of using the reader any more.) Before making this change, my plan to fix the performance issues was to increase the specs of the server it's running on, which are pretty modest for all the work it's doing. I do, however, like the idea of improving the efficiency of code rather than throwing more hardware at a poorly running solution.

indieweb