Lab notes

Tags | June 29, 2005 4:21 pm

The Google Maps API lets developers embed Google Maps in their own web pages with JavaScript. You can add overlays to the map (including markers and polylines) and display shadowed “info windows” just like Google Maps.

Link

Tags, MySQL | June 21, 2005 6:37 pm

Philipp Keller tackles tags and MySQL again, this time running a benchmark against four different schemas, one of which uses fulltext search. My conclusion was slightly different, though: fulltext search is quick (and simple) on the insert, and mysterious on the select. I just couldn’t make it scale consistently, but I’m willing to give it another try.

Bonus point: he makes the benchmark source code available under the LGPL, so you can see exactly what he’s testing and how he’s testing it.

Anyway, go here to get all the deets.

Lab notes, Tags, MySQL | May 30, 2005 7:11 am

I’ve been looking at different schemas that can support tagging, and I realized that fulltext search is a perfect fit for tagging.

A scheme where each tag (applied to an object by a user) occupies a row in the database is simple and efficient in terms of storage size and retrieval. With the right set of indexes (at least three are required), queries are extremely efficient. But the code for managing the tags is either complex or restrictive. To begin with, tagging each object requires multiple inserts, and changing the object’s tag list is a combination of inserts and deletes. Renaming a tag is just as tricky, though MySQL’s REPLACE statement keeps it in check.
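To make the bookkeeping concrete, here is a minimal sketch of the row-per-tag scheme. It uses Python’s built-in sqlite3 module as a stand-in for MySQL so it runs without a server. The table name, column names and the particular choice of indexes are assumptions for illustration, not the schema I actually use.

```python
import sqlite3


def make_db():
    """Create an in-memory database with one row per (user, object, tag)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE tagging ("
        " user_id INTEGER NOT NULL,"
        " object_id INTEGER NOT NULL,"
        " tag TEXT NOT NULL,"
        " PRIMARY KEY (user_id, object_id, tag))"
    )
    # Hypothetical extra indexes for lookups by tag and by object.
    db.execute("CREATE INDEX tagging_by_tag ON tagging (tag, object_id)")
    db.execute("CREATE INDEX tagging_by_object ON tagging (object_id, tag)")
    return db


def get_tags(db, user_id, object_id):
    """Return the user's tags for an object, sorted alphabetically."""
    rows = db.execute(
        "SELECT tag FROM tagging"
        " WHERE user_id = ? AND object_id = ? ORDER BY tag",
        (user_id, object_id),
    ).fetchall()
    return [tag for (tag,) in rows]


def set_tags(db, user_id, object_id, new_tags):
    """Change a tag list: delete the dropped tags, insert the added ones."""
    new_tags = set(new_tags)
    old_tags = set(get_tags(db, user_id, object_id))
    with db:  # run all deletes and inserts in one transaction
        db.executemany(
            "DELETE FROM tagging"
            " WHERE user_id = ? AND object_id = ? AND tag = ?",
            [(user_id, object_id, t) for t in old_tags - new_tags],
        )
        db.executemany(
            "INSERT INTO tagging (user_id, object_id, tag) VALUES (?, ?, ?)",
            [(user_id, object_id, t) for t in new_tags - old_tags],
        )
```

Even in this small form, changing a tag list means reading the old rows, computing two differences and issuing a batch of deletes and a batch of inserts.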

Querying on a single tag is easy and querying on two tags is only slightly more complicated, but each extension adds complexity, to the point that even reasonable search capabilities become unreasonably complicated to implement. Users who need OR searches, or searches that include one tag but exclude others, are out of luck.
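As a rough illustration of how the queries grow, the sketch below runs an “all of these tags” search and an “any of these but none of those” search against a row-per-tag table. It again uses sqlite3 in place of MySQL, and the table, columns, function names and sample data are all made up for the example.

```python
import sqlite3


def make_sample_db():
    """Build a small row-per-tag table with made-up sample data."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE tagging (object_id INTEGER, tag TEXT)")
    db.executemany(
        "INSERT INTO tagging (object_id, tag) VALUES (?, ?)",
        [
            (1, "foo"), (1, "bar"),
            (2, "foo"),
            (3, "bar"), (3, "baz"),
            (4, "foo"), (4, "bar"), (4, "baz"),
        ],
    )
    return db


def with_all(db, tags):
    """Objects carrying every tag in `tags` (an AND search)."""
    tags = sorted(set(tags))
    marks = ", ".join("?" * len(tags))
    sql = (
        f"SELECT object_id FROM tagging WHERE tag IN ({marks})"
        " GROUP BY object_id HAVING COUNT(DISTINCT tag) = ?"
        " ORDER BY object_id"
    )
    return [oid for (oid,) in db.execute(sql, (*tags, len(tags)))]


def with_any_except(db, any_tags, excluded):
    """Objects carrying at least one of `any_tags` and none of `excluded`."""
    any_marks = ", ".join("?" * len(any_tags))
    ex_marks = ", ".join("?" * len(excluded))
    sql = (
        f"SELECT DISTINCT object_id FROM tagging WHERE tag IN ({any_marks})"
        " AND object_id NOT IN"
        f" (SELECT object_id FROM tagging WHERE tag IN ({ex_marks}))"
        " ORDER BY object_id"
    )
    return [oid for (oid,) in db.execute(sql, (*any_tags, *excluded))]
```

Each kind of search needs its own hand-built SQL shape (grouping and counting for AND, a subquery for exclusion), which is where the complexity comes from.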

Using fulltext search simplifies everything, and brings up new and interesting options. The scheme holds all tags applied to an object by a user in a single field, using a fulltext index on that field. Tagging an object requires a single insert, changing the object’s tags requires a single update. Renaming a tag is an update with a bit of string manipulation. But queries are where fulltext search takes us to a whole new level.
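The “bit of string manipulation” for a rename could look something like the sketch below. It assumes the tag field is a space-separated list and that the new value is computed in application code before the update is issued; the function name is hypothetical.

```python
def rename_tag(tag_field, old, new):
    """Return the space-separated tag field with `old` replaced by `new`."""
    renamed = []
    for tag in tag_field.split():
        if tag == old:  # compare whole tags, so "_os" never matches "_osx"
            tag = new
        if tag not in renamed:  # skip a duplicate if `new` was already present
            renamed.append(tag)
    return " ".join(renamed)
```

For instance, `rename_tag("__sf _osx", "_osx", "_macosx")` yields the new field value to write back in a single update.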

With the IN BOOLEAN MODE option you can query for interesting combinations of tags. For example, you can query all records that have the tags foo and bar ("+foo +bar"), or all records that have the tags foo or bar ("foo bar"). You can query all records that have the tags foo or bar, but excluding any records that have the tag baz ("(foo bar) -baz"). So we have mandatory, optional and negative tags, as well as AND and OR relations.
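Search strings like these are easy to assemble in code. The helper below is a sketch that builds one from lists of mandatory, optional and negative tags; the function name, the table and column names in the SQL string, and the rule for when to add parentheses are my own assumptions. It only accepts tags made of word characters, since anything else could be read as a boolean-mode operator.

```python
import re

# Hypothetical table and column names; the search string is passed as a
# query parameter (the %s placeholder style used by common MySQL drivers).
SEARCH_SQL = (
    "SELECT id FROM tagged"
    " WHERE MATCH (search_tags) AGAINST (%s IN BOOLEAN MODE)"
)


def boolean_query(required=(), optional=(), excluded=()):
    """Build a boolean-mode search string from three groups of tags."""
    for tag in (*required, *optional, *excluded):
        if not re.fullmatch(r"\w+", tag):
            raise ValueError(f"tag contains non-word characters: {tag!r}")
    parts = [f"+{tag}" for tag in required]
    if optional:
        group = " ".join(optional)
        # Parenthesize the optional tags when they sit next to other terms.
        if (required or excluded) and len(optional) > 1:
            group = f"({group})"
        parts.append(group)
    parts += [f"-{tag}" for tag in excluded]
    return " ".join(parts)
```

Calling `boolean_query(optional=["foo", "bar"], excluded=["baz"])` produces the "(foo bar) -baz" string from the example above, ready to be passed as the parameter to `SEARCH_SQL`.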

The default configuration for MySQL will not index any word shorter than 4 letters, and will not include any word that appears in the stopword list (e.g. common words like “with” and “after”). Unfortunately, these two restrictions are bad for tags. A meaningful tag may fall in the stopword list, and many meaningful tags are shorter than 4 letters (e.g. "osx"). It’s possible to change the minimum word length and the stopword list, but doing so will mess with other applications of fulltext search.

There is a simple way around this problem, which relies on the use of underscores to prefix tags. A tag prefixed with an underscore is not recognized by the stopword list, and several underscores can be used to get around the minimum word length limitation. If this sounds like a complication, consider that the scheme requires some text manipulation on all tags before they are stored anyway. In fact, the scheme I use stores the tag lists in two separate fields. The first field holds the tags as entered by the user, e.g. “SF” and “San Francisco”. The second field holds the tags in a searchable format, and is used for fulltext searches, so it would hold "__sf" and "_sanfrancisco".
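Here is a minimal sketch of that conversion. The rules it applies (lowercase, drop everything except ASCII letters and digits, then add at least one underscore and more if needed to reach four characters) are inferred from the two examples above rather than taken from my actual code, and the function names are made up.

```python
import re

MIN_WORD_LEN = 4  # MySQL's default minimum indexed word length


def searchable_tag(tag):
    """Convert a tag as entered by the user into its searchable form."""
    # Simplification: non-ASCII letters are dropped along with punctuation.
    word = re.sub(r"[^0-9a-z]", "", tag.lower())
    if not word:
        raise ValueError(f"tag has no letters or digits: {tag!r}")
    # Always one underscore (to dodge the stopword list), plus as many
    # more as it takes to reach the minimum word length.
    pad = max(1, MIN_WORD_LEN - len(word))
    return "_" * pad + word


def searchable_field(tags):
    """Build the space-separated value stored in the fulltext-indexed field."""
    return " ".join(searchable_tag(tag) for tag in tags)
```

Search terms need the same treatment before they go into a query, so a user searching for “SF” ends up matching against "__sf".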

You may want to read Philipp Keller’s posts on various schemes used for tagging data, and the MySQL fulltext search documentation.

Tags, AJAX | May 15, 2005 10:32 pm

The previous post left all of my two readers wondering what AJAX is. Jesse James Garrett of Adaptive Path sums it up well in this post, which incidentally is one of the top links on del.icio.us, at least according to populicio.us. And subtly, I’ve just linked tagging with AJAX.

Tags | May 7, 2005 6:58 am

A Consuming Experience has a well-written introduction to Technorati tags and tags in general, along with a bunch of good references.

“This is a introductory guide to “tags” on Technorati, the blogosphere search engine, which started using them in mid-January 2005. It’s a practical introduction rather than a tutorial (ending with some personal thoughts about tags)”

Link