Lab notes #5 Fulltext? not so fast
A few more days of playing with the fulltext search engine landed me in performance dead-end.
The test case consists of three tables. The primary table holds most of the information and decides on the tag of each record. It’s the biggest table with the largest number of records. The secondary tables hold aggregates or grouped information, and that’s where I was hoping to put the tags and search with a fulltext index. The record length is bigger, but the number of records is much smaller.
The first problem came up early in the performance tests. There’s not a lot of information on how MySQL manages fulltext queries internally, so I could only assume it’s equivalent to anything I’d write, only leaner and meaner. Maybe some overhead, but I have less code to write and maintain, so some overhead is an acceptable tradeoff. My rule is generally, if it’s up to 20% slower, but half the complexity, than I’d rather buy a bigger box than write more code.
(For the record I’m talking about complexity, not lines of code. Those are two different metrics. Adding trivial code to squeeze out more performance is a no-brainer, but increasing complexity is a bad tradeoff. You get to live with that complexity, instead of taking a free ride on the Moore train)
The problem I encountered had to do with queries that looked into the fulltext index, but also narrowed down records by other indexes, e.g. the user identifier or a date range. Mixing indexes led to extremely bad performance, my only guess is that MySQL uses one key to build a temporary table and then uses the other key to narrow down the search. Everything fits in memory, there’s no disk I/O, but performance is still too slow to be reasonable.
But even without that complexity, I couldn’t get fulltext queries to perform fast enough. It turns out that the primary table has 340x more records than the secondary one. I planned around a 1:250 difference. Given that ratio, the fulltext search ran substantially faster than any queries I could do on the primary table. At that step in the testing I was optimistic about using fulltext search.
But then I increased the secondary table size to about a third the record count of the primary table, and ended up with queries that on average were three times slower. That’s not only bad, it messes with my capacity planning. I don’t have an upper limit in mind for the primary table, but I can guestimate how well it will performance at different orders of magnitude. With fulltext, I seem to have an upper limit on the secondary table, after which I need a different strategy.
Not what I had in mind.




