Saturday, March 26, 2005

Why Blogs Matter, at Least for Search

It occurred to me, while I was writing my inaugural post, that for most of the links in that post, I had done a Google search to find a good link for whatever concept I wanted to reference, like the power-law distribution of blog readerships that I got from kottke.org. (I'm going to continue to help Jason out by linking to him as often as possible. And who knows? Maybe someday I'll actually start reading his blog.) The link I chose was always in the top 10 sites that Google returned. I imagine that I'm not the only blogger who does this, so including links in my posts actually refines and reinforces the placement of the sites that Google returns. I see a few different implications of this:

  • Helping search engines determine that different phrasings refer to the same thing. The text I linked in my post was actually slightly different from my search phrases (the fear of plagiarizing that is drilled in during school is still strong), so it improves the coverage of the search engine, tying slightly different phrases to the same concept.
  • Cleaning and pruning the sites returned. It so happened that Jason's site on the power law distribution was actually the third site returned, but it gave the sort of discussion that I was after. In linking to it, I'm making my own subjective statement that Jason's site is actually a little bit better than the first two sites returned. Bloggers serve as subject-matter experts on everything under the sun, constantly refining the quality of the index.
  • Rapidly indexing the new. Bloggers' penchant for commenting on current events gives Google a direct index into sites that become especially relevant as those events change. I imagine they've even thought about optimizing their setup so that "crawling" the sites of Blogspot is near real-time.
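
To make the first two points concrete, here is a toy sketch in Python of how the words bloggers use in their links could tie different phrasings to the same site. This is only an illustration under my own assumptions, not how Google actually works: the link data is made up, example.org is a placeholder, and build_anchor_index and rank are hypothetical names.

```python
from collections import Counter, defaultdict

# Hypothetical data: (link text, linked site) pairs gathered from blog posts.
links = [
    ("power-law distribution of blog readerships", "kottke.org"),
    ("power law and blogs", "kottke.org"),
    ("power law and blogs", "example.org"),
    ("blog readership power law", "kottke.org"),
]


def words(text):
    """Split a phrase into a set of lowercase words, treating hyphens as spaces."""
    return set(text.lower().replace("-", " ").split())


def build_anchor_index(links):
    """For each word used in link text, count how often it points at each site."""
    index = defaultdict(Counter)
    for anchor_text, target in links:
        for word in words(anchor_text):
            index[word][target] += 1
    return index


def rank(query, index):
    """Order sites by how many link-text words they share with the query."""
    scores = Counter()
    for word in words(query):
        scores.update(index.get(word, Counter()))
    return [target for target, _ in scores.most_common()]


index = build_anchor_index(links)
print(rank("power law readership", index))
```

Each new link with slightly different wording adds votes for the same site, so a query phrased a third way can still find it, and the site more bloggers picked floats to the top.
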
Though I know that Google has the don't-be-evil credo, and I myself am a long-time fan, I wonder if it's a good thing for a single company to have that kind of power. It seems like it would tempt Google to block other search engines (like MSN, Yahoo, etc.) from indexing Blogspot, so that it alone could use the advantage of indexing the information therein- in much the same way that AOL desperately tried (tries?) to prevent interoperability with its IM client, clinging to the one network advantage it still has left.

Along those lines, I wouldn't mind seeing a blog-only search page that was akin to Google News- a rapidly updated, completely automated site with the latest and greatest blog posts from all of Blogspot. Perhaps some Googler is spending 20% of his/her time doing that right now.

The Inaugural Post

About once every six months, I'll spend a few hours browsing the Internet of old via the Wayback Machine. Seeing how some of the really big sites have evolved gives you a really neat feeling of history happening in real time- like when you would watch those National Geographic slow-frame videos of a flower or a beehive evolving over months and years, all in a few seconds.

More than anything else, I marvel at how many of those people had no idea what they were starting. You can't blame them, obviously- there are those charts that show the power-law distribution of blog readership, for instance. (Note that I am helping to perpetuate that distribution by linking to Jason Kottke's blog. Since he's started blogging full-time, I figure it's the least I can do to help a brother earn a living.) Most things that start small, stay small. I haven't done an analysis of why this distribution arises in practice, but there are some people who have some very interesting ideas.

Anyway, I wanted to kick off this blog with some meta-thoughts on the history and practice of, well, kicking things off. Starting a blog looks (and I guess now, feels) a lot like a first date, or meeting a friend of a friend- which is to say, often awkward and unintentionally amusing. I find that these experiences are helped if you have some sort of ideology (or better yet, a life-changing experience, like visiting Budapest) to discuss and get you through the occasional lulls in conversation. This blog will fill those lulls with thoughts on the current state of software for the quantitatively inclined- particularly statistical packages like SAS and R, since I'm a statistical software developer- but including Matlab, Mathematica, and whatever other obscure packages I come across. Like many blogs, this will no doubt include several rants, since I find all of these packages lacking in some way or another, and critiquing their shortcomings is much more fun than creating something of my own.

In between the rants, you'll find my thoughts on the business of software, politics and the state of the nation, what I've been reading and listening to lately, and the fortunes of my beloved Duke Blue Devils (who took a beating at the hands of Michigan State last night that I will spend several weeks recovering from). I hope you enjoy it and that you'll come back to visit often.