Lab notes

Lab notes, Tags, MySQL May 30, 2005 7:11 am

I’ve been looking at different schemas that can support tagging, when I realized that tagging is a perfect application for fulltext search.

A scheme where each tag (applied to an object by a user) occupies a row in the database is simple and efficient in terms of storage size and retrieval. With the right set of indexes (at least three are required), queries are extremely efficient. But the code for managing the tags is either complex or restrictive. To begin with, tagging each object requires multiple inserts, and changing the object’s tag list is a combination of inserts and deletes. Renaming a tag is just as tricky, though MySQL’s REPLACE statement keeps it in check.

Querying on a single tag is easy and querying on two tags is slightly more complicated, but each extension increases the complexity to the point that even reasonable search capabilities become unreasonably complicated to implement. Users who need OR searches, or searches that include one tag but exclude others, are out of luck.
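To make the row-per-tag scheme concrete, here is a minimal sketch in Python. It uses the standard library's SQLite as a stand-in for MySQL, and the table, column and index names are hypothetical (the post doesn't name any). The three indexes are one plausible set, not necessarily the ones I had in mind above.

```python
import sqlite3

# SQLite stands in for MySQL; all table/column/index names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tags (
        user_id   INTEGER NOT NULL,
        object_id INTEGER NOT NULL,
        tag       TEXT    NOT NULL,
        PRIMARY KEY (user_id, object_id, tag)   -- index 1
    );
    CREATE INDEX tags_by_tag    ON tags (tag, object_id);  -- index 2
    CREATE INDEX tags_by_object ON tags (object_id, tag);  -- index 3
""")

# Tagging one object takes one insert per tag.
conn.executemany("INSERT INTO tags VALUES (?, ?, ?)", [
    (1, 10, "foo"), (1, 10, "bar"),
    (1, 11, "foo"),
    (1, 12, "foo"), (1, 12, "bar"), (1, 12, "baz"),
])

def find_objects(conn, required, excluded=()):
    """Return ids of objects carrying every required tag and no excluded tag."""
    required = sorted(set(required))
    excluded = sorted(set(excluded))
    params = list(required)
    sql = "SELECT object_id FROM tags WHERE tag IN ({})".format(
        ", ".join(["?"] * len(required)))
    if excluded:
        # Excluding tags already needs a subquery.
        sql += (" AND object_id NOT IN "
                "(SELECT object_id FROM tags WHERE tag IN ({}))".format(
                    ", ".join(["?"] * len(excluded))))
        params += excluded
    # An object qualifies only if all the required tags were matched.
    sql += " GROUP BY object_id HAVING COUNT(DISTINCT tag) = ? ORDER BY object_id"
    params.append(len(required))
    return [row[0] for row in conn.execute(sql, params)]
```

Even this small helper has to assemble SQL dynamically, and it only covers "all of these, none of those". Adding optional tags or nested combinations means more query shapes still.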

Using fulltext search simplifies everything, and brings up new and interesting options. The scheme holds all tags applied to an object by a user in a single field, using a fulltext index on that field. Tagging an object requires a single insert, and changing the object’s tags requires a single update. Renaming a tag is an update with a bit of string manipulation. But queries are where fulltext search takes us to a whole new level.
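The "bit of string manipulation" for a rename could look something like the sketch below. It assumes the tags are stored space-separated in the single field, and the function name is made up for illustration. The resulting string is what goes into the single update.

```python
def rename_tag(tag_list, old, new):
    """Return the space-separated tag_list with the tag `old` renamed to `new`."""
    renamed = []
    for tag in tag_list.split():
        # Compare whole tags, so renaming "bar" never touches "foobar".
        if tag == old:
            tag = new
        # Skip duplicates in case `new` was already on the list.
        if tag not in renamed:
            renamed.append(tag)
    return " ".join(renamed)
```

Comparing whole tags rather than doing a plain substring replace is the important part. A naive replace of "bar" would also mangle a tag like "foobar".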

With the IN BOOLEAN MODE option you can query for interesting combinations of tags. For example, you can query all records that have the tags foo and bar ("+foo +bar"), or all records that have the tags foo or bar ("foo bar"). You can query all records that have the tags foo or bar, but exclude any records that have the tag baz ("(foo bar) -baz"). So we have mandatory, optional and negative tags, as well as AND and OR relations.
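Building those search strings from lists of tags is straightforward. The following is only a sketch: the helper name, and the table and column names in the SQL text, are hypothetical, and the `%s` placeholder assumes a MySQL driver that uses that parameter style.

```python
def boolean_query(required=(), optional=(), excluded=()):
    """Build a search string for MySQL's IN BOOLEAN MODE from three tag lists."""
    parts = ["+" + tag for tag in required]
    if optional:
        group = " ".join(optional)
        # Group several optional tags when they sit next to other terms.
        if len(optional) > 1 and (required or excluded):
            group = "(" + group + ")"
        parts.append(group)
    parts += ["-" + tag for tag in excluded]
    return " ".join(parts)

# Hypothetical table and column names; the search string is passed as a parameter.
SEARCH_SQL = (
    "SELECT object_id FROM tagged "
    "WHERE MATCH (search_tags) AGAINST (%s IN BOOLEAN MODE)"
)

print(boolean_query(required=["foo", "bar"]))                    # +foo +bar
print(boolean_query(optional=["foo", "bar"]))                    # foo bar
print(boolean_query(optional=["foo", "bar"], excluded=["baz"]))  # (foo bar) -baz
```

One thing to keep in mind: when a query has mandatory tags, the optional ones no longer filter anything and only affect how the matching rows are ranked.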

The default configuration for MySQL will not index any word shorter than 4 letters, and will not index any word that appears in the stopword list (e.g. common words like “with” and “after”). Unfortunately, these two restrictions are bad for tags. A meaningful tag may fall in the stopword list, and many meaningful tags are shorter than 4 letters (e.g. "osx"). It’s possible to change the minimum word length and the stopword list, but doing so will mess with other applications of fulltext search.

There is a simple workaround for this problem, which relies on the use of underscores to prefix tags. A tag prefixed with an underscore is not recognized by the stopword list, and several underscores can be used to get around the minimum word length limitation. If this sounds like a complication, consider that the scheme requires some text manipulation on all tags before they are stored anyway. In fact, the scheme I use stores the tag lists in two separate fields. The first field holds the tags as entered by the user, e.g. “SF” and “San Francisco”. The second field holds the tags in a searchable format, and is used for fulltext searches, so it would hold "__sf" and "_sanfrancisco".
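Here is a rough sketch of that conversion, assuming the default minimum word length of 4. The function names are made up, and dropping everything except letters and digits is my assumption for this example. The post only shows lowercasing and removing the space.

```python
MIN_WORD_LEN = 4  # MySQL's default minimum indexed word length

def searchable_tag(tag, min_len=MIN_WORD_LEN):
    """Turn a tag as entered by the user into its searchable form."""
    # Assumption: keep only letters and digits, lowercased ("San Francisco" -> "sanfrancisco").
    word = "".join(ch for ch in tag.lower() if ch.isalnum())
    # One underscore dodges the stopword list; extra ones pad short tags up to min_len.
    return ("_" + word).rjust(min_len, "_")

def searchable_field(tags):
    """Build the content of the second, fulltext-indexed field."""
    return " ".join(searchable_tag(tag) for tag in tags)

print(searchable_field(["SF", "San Francisco"]))  # __sf _sanfrancisco
```

The same conversion has to be applied to the tags in a search before they go into the query, otherwise "sf" will never match "__sf".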

You may want to read Philipp Keller’s posts on various schemes used for tagging data, and the MySQL fulltext search documentation.

Uncategorized May 27, 2005 7:59 pm

Clay Shirky talks about the Semantic Web in a piece he calls The Semantic Web, Syllogism, and Worldview. In an effort to explain what the Semantic Web is all about, he sums it up better than anything else I’ve read:

The simple answer is this: The Semantic Web is a machine for creating syllogisms. A syllogism is a form of logic, first described by Aristotle, where “…certain things being stated, something other than what is stated follows of necessity from their being so.” … This is the promise of the Semantic Web — it will improve all the areas of your life where you currently use syllogisms. Which is to say, almost nowhere.

I have nothing else to add.

Link

Uncategorized May 26, 2005 10:47 pm

Smart people do defend bad ideas. Quite a lot. And being smart, they often get away with bad decisions … but smart or not, the outcome is still the same. Scott Berkun’s essay draws the distinction between smart people and wise people, explores why smart people would defend bad ideas and gives a few clues on how to handle such situations.

But one thing I did learn after years of studying advanced logic theory is that proficiency in argument can easily be used to overpower others, even when you are dead wrong. If you learn a few tricks of logic and debate, you can refute the obvious, and defend the ridiculous.

If when you say “I need the afternoon to think this over”, they say “tough. We’re deciding now”. Ask them if the decision is an important one. If they say yes, then you should be completely justified in asking for more time to think it over and ask questions.

Link

Productivity 3:43 pm

Keith’s To-Done blog has a collection of personal productivity and writing tips. I’m always on the lookout for better ways to get more stuff done and keep my sanity in check. His recent entry has great tips on how to be a more productive blogger:

  • Start with a title. Sometimes just coming up with that initial idea and writing down the title for your post will get the words flowing.
  • Adopt a conversational tone and style. This helps your words flow more freely. It might not be the “best” way to write, but it’s served me well and it saves me time.
  • Connect and motivate. I’ve found that community is a great way to keep your energy level high. Talk to people, create content centered around discussion. This leads to more energy and more ideas. As well, it’s very motivating to know that people are into the same things as you and dealing with the same issues.

Link

AJAX May 24, 2005 4:27 am

Alex Bosworth lists his top ten AJAX mistakes. I couldn’t agree more. There are too many choices in AJAX and DHTML that work great in principle, but not in practice. The big challenge, and it’s not an easy one, is how not to repeat any of them.

Uncategorized May 20, 2005 8:10 pm

I’m not going to make headline news when I blog about the New York Times content policy. The way it works, only registered users can read new articles, and after a while articles move to the archive where only NYT subscribers can access them. As a result, I very rarely link to NYT articles, and only after I’ve considered the downside of annoying my readers with registration-required content and broken links.

Last week the NYT went one step further (or backwards, as the case may be) and restricted op-ed pieces to subscribers only. It’s content that I will never get to read, never get to link to, and probably won’t even know exists. I don’t know if the NYT stands to gain, but I do know op-ed contributors stand to lose by becoming irrelevant on the Net.

The news hit the blogosphere, and soon enough a lot of bandwidth was spent opinionating about access to other people’s opinions. It was on John Battelle’s blog that I found this sentence, and it immediately caught my attention:

They are keeping most of the site free, after all, and asking that people pay for the stuff that has proven to be the most valuable - folks’ opinions.

Folks’ opinions are something I’ve been wrapping my head around during the past few weeks. It’s almost an obsession of mine, and what this project is all about. Not NYT op-ed contributors, but rather ordinary people like you and me who have interesting opinions to share, but not the forum (or time or money) to bring them to the front. Or worse, we have forums where our opinion is valued, but we don’t get to own it or take credit for it. That has to change.

Cool, News May 17, 2005 5:46 am

The news is out that NewsGator has acquired FeedDemon.

When I first got exposed to RSS I was mildly curious about this new technology, and played around with a few subscriptions and a bunch of news readers. Then I found FeedDemon. I was hooked. Not only was FeedDemon a top-rate application, it actually changed the way I consumed information. OK, RSS did have something to do with it, but what really set the ball in motion was the speed with which I could get the feeds, browse through them and focus on the important ones. I have more feeds than I can keep track of, barely enough time to read the top ones, and a great user experience to pull it through.

And then FeedDemon added searches. So now I have keyword searches covering all the feeds I don’t have time to browse.

And then FeedDemon added podcasting. Like any new technology, I tested out podcasting in the early days, but was too lazy to actually use it. When FeedDemon added iTunes synchronization, I turned into a regular listener, with never a dull moment on my daily commute.

So congratulations to Nick Bradbury and the NewsGator team. But wait, it gets even better. NewsGator will honor existing FeedDemon customers with 2 years of free service and upgrades. So now I also get to try NewsGator, and I’ve heard a lot of promising things about their service. That’s what I call commitment to customers. Congratulations to all of us.