Lab notes

Lab notes, MySQLJune 6, 2005 6:58 pm

A few more days of playing with the fulltext search engine landed me in performance dead-end.

The test case consists of three tables. The primary table holds most of the information and decides on the tag of each record. It’s the biggest table with the largest number of records. The secondary tables hold aggregates or grouped information, and that’s where I was hoping to put the tags and search with a fulltext index. The record length is bigger, but the number of records is much smaller.

The first problem came up early in the performance tests. There’s not a lot of information on how MySQL manages fulltext queries internally, so I could only assume it’s equivalent to anything I’d write, only leaner and meaner. Maybe some overhead, but I have less code to write and maintain, so some overhead is an acceptable tradeoff. My rule is generally, if it’s up to 20% slower, but half the complexity, than I’d rather buy a bigger box than write more code.

(For the record I’m talking about complexity, not lines of code. Those are two different metrics. Adding trivial code to squeeze out more performance is a no-brainer, but increasing complexity is a bad tradeoff. You get to live with that complexity, instead of taking a free ride on the Moore train)

The problem I encountered had to do with queries that looked into the fulltext index, but also narrowed down records by other indexes, e.g. the user identifier or a date range. Mixing indexes led to extremely bad performance, my only guess is that MySQL uses one key to build a temporary table and then uses the other key to narrow down the search. Everything fits in memory, there’s no disk I/O, but performance is still too slow to be reasonable.

But even without that complexity, I couldn’t get fulltext queries to perform fast enough. It turns out that the primary table has 340x more records than the secondary one. I planned around a 1:250 difference. Given that ratio, the fulltext search ran substantially faster than any queries I could do on the primary table. At that step in the testing I was optimistic about using fulltext search.

But then I increased the secondary table size to about a third the record count of the primary table, and ended up with queries that on average were three times slower. That’s not only bad, it messes with my capacity planning. I don’t have an upper limit in mind for the primary table, but I can guestimate how well it will performance at different orders of magnitude. With fulltext, I seem to have an upper limit on the secondary table, after which I need a different strategy.

Not what I had in mind.

Lab notes, Tags, MySQLMay 30, 2005 7:11 am

I’ve been looking at different schemas that can support tagging, which I realized that full text search is the perfect application for tagging.

A scheme where each tag (applied to an object by a user) occupies a row in the database is simple and efficient in terms of storage size and retrieval. With the right set of indexes (at least three are required), queries are extremely efficient. But the code for managing the tags is either complex or restrictive. To begin with, tagging each object requires multiple inserts, and changing the object’s tag list is a combination of inserts and deletes. Renaming a tag is just as tricky, thought MySQL’s replace statement keeps it under check.

Querying on a single tag is easy, querying on two tags slightly more complicated, but each extension increases the level of complexity, that even reasonable search capabilities become unreasonably complicated to implement. Users who need or searches, or searches that include one tag but exclude others, are out of luck.

Using fulltext search simplifies everything, and brings up new and interesting options. The scheme holds all tags applied to an object by a user in a single field, using a fulltext index on that field. Tagging an object requires a single insert, changing the object’s tags requires a single update. Renaming a tag is an update with a bit of string manipulation. But queries is where fulltext search takes us to a whole new level.

With the in boolean mode option you can query for interesting combination of tags. For example, you can query all records that have the tags foo and bar ("+foo +bar"), or all records that have the tags foo or bar ("foo bar"). You can query all records that have the tags foo or bar, but excluding any records that have the tag bar ("(foo bar) -baz"). So we have mandatory, optional and negative tags, as well as and an or relations.

The default configuration for MySQL will not index any word shorter than 4 letters, and will not include any word that appears in the stopword list (e.g. common words like “with” and “after”). Unfortunately, these two restrictions are bad for tags. A meaningful tag may fall in the stopword list, and many meaningful tags are shorted than 4 letters (e.g. "osx"). It’s possible to change the minimum word length and the stopword list, but doing so will mess with other applications of fulltext search.

There is a simple solution around this problem, which relies on the use of underscores to prefix tags. A tag prefixed with an underscore is not recognized by the stopword list, and several underscores can be used to get around the minimum word length limitation. If this sounds like a complication, consider that the scheme requires some text manipulation on all tags before they are stored. In fact, the scheme I use stores the tag lists in two separate fields. The first field holds the tags as entered by the user, e.g. “SF” and “San Francisco”. The second field holds the tags in a searchable format, and is used for fulltext searches, so it would hold "__sf" and "_sanfrancisco".

You may want to read Philipp Keller’s posts on various schemes used for tagging data, and the MySQL fulltext search documentation.

Lab notesMay 3, 2005 10:27 pm

To get started, I needed an environment with Apache, PHP and MySQL running on my development machine (a Windows XP notebook).

Installing MySQL

Installing MySQL 4.1 was just as easy. Once again I opted to run it from the command line, to be more precise, I didn’t know any better. The installer does come with a nice GUI configuration tool that sets up the MySQL file system and takes care of all the different options. I ran MySQL from the command line (mysqld) and used mysqladmin to create a database and mysql to create and populate a table.

I borrowed a test script from http://www.php.net and changed it to run a query on my database. This is when I learned that PHP 5 does not come pre-installed with MySQL support. I had to enable MySQL support in the php.ini configuration file (located in the windows directory), and copy the two MySQL support DLLs to the system32 directory. Restart Apache and everything is working as expected.

MySQL System Tray Monitor

Looking for GUI tools, I found out MySQL Administrator and MySQL Query Browser. After installing the MySQL Administrator, I discoverd that MySQL also includes a system tray monitor with a sleek GUI for setting up and controll services. And it also integrates with MySQL Administrator and MySQL Query Browser.

MySQL Administrator

Since I already had the database setup, I didn’t get to use MySQL Administrator and MySQL Query Browser much, but I was impressed with the quality of the UI, and the breadth of features offered by the UI and the underlying MySQL engine. It’s definitely on par with the expensive you-know-who database servers out there.

Lab notes 10:25 pm

To get started, I needed an environment with Apache, PHP and MySQL running on my development machine (a Windows XP notebook).

Installing PHP

PHP 5 also has several download options. I started with the installer package which was a breeze to install. The documentation states that it doesn’t include installation for Apache, which I assumed I could fix by using ApacheConf. It turns out that the installer does not include any of the libraries required to use Apache, nor a lot of other extras. So I found myself downloading the larger Zip package and extracting it on top of the installation directory.

I configured Apache to load the PHP module, use it for .php documents and support index.php as the directory index. I then setup a directory for the project and created a test page that calls phpinfo(). It worked on the first try.

PHP script

Lab notes 10:00 pm

To get started, I needed an environment with Apache, PHP and MySQL running on my development machine (a Windows XP notebook).

Installing Apache

I downloaded the latest version of Apache 2. I previously used Apache on Linux boxes, so I’m all too familiar with running it from the command line, and since this is my daily machine, I opted to not have it running as a Windows service all the time. Turns out I was wrong.

It’s easier when you don’t have many command line windows open, and Apache comes with a nifty tray utility that allows me to monitor and start/stop the service. I figured that out while trying to use ApachConf, which can only restart Apache (after making configuration updates), if Apache is installed as a service. All too often, you need to reinstall the product to change from command line to Windows service, but breezing through the documentation I learned that with Apache it only took one step:
apache -k install

Configuring Apache 2 is generally not complicated, but I did want to test some GUIs, so I also installed ApacheConf. Sadly, I couldn’t get the PHP wizard to work. I think I figured out why, but at that point it seemed like manually setting the two configuration options was my best shot. ApacheConf is a ‘better http.conf editor’, definitely useful if I was running managing machines or running multiple virtual servers.

ApachConf screen shot