AUTHOR: Steve Clancy TITLE: Statistics, Blogs, and the Long Tail DATE: 10/17/2006 10:56:00 AM ----- BODY:
Earlier this month our wonderful systems manager, Rick Simpson, began providing us with daily statistics about our site. In the past, statistics were tabulated at the end of the month and didn't give us a good idea of what our visitors were looking at on a day-to-day basis. Our statistics reports are publicly available, so if you're curious you can see what I'm talking about. I try to avoid getting too worked up over the details, because statistics can be lies with numbers. But I did want to focus on a couple of areas of interest - blogs and the long tail.

First, let's talk about blogs. I've been checking Technorati, a blog search engine, a lot to see who is linking to the Daily Collegian Online. According to Technorati, the answer is a handful of real blogs and a lot of spam blogs (blogs that just steal content and links to attract more hits). After checking out the referring URLs in our statistics, I realized that we get linked to a lot more often than I had thought. College Humor currently has Friday's Bundy story linked on its home page, as did FIRE (Foundation for Individual Rights in Education). Bundy, by the way, gathered more page views than our home page yesterday. Yesterday Fark tagged our story about a creationism/evolution lecture from Sept. 29 as "sad". Those are just some examples of the bigger sites linking to us.

This all leads into my second point - the long tail. Those two "hot" stories from yesterday's statistics are ones that did not appear in yesterday's paper. In fact, a look at our statistics reports will show that only about a third of our traffic is for that day's news. The long tail is a concept introduced in a Wired magazine article that was later expanded into a book. It suggests that the Internet has started a shift in business from selling large quantities of a few popular items to using technology to sell small quantities of many less popular items. Think of sites like Amazon.com and Netflix, whose selection is a big selling point. Julia Turner demonstrated last month how the long tail works for Slate magazine.
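For the curious, here is a rough sketch of how a figure like "a third of our traffic" could be worked out. It is not how our statistics reports are actually produced. The record layout (view date, publication date, page views) and the function name `same_day_share` are assumptions I made up for illustration, and the numbers are invented.

```python
from datetime import date


def same_day_share(page_views):
    """Return the fraction of page views that went to stories
    published on the same day they were viewed.

    page_views: iterable of (view_date, published_date, views) tuples.
    This layout is hypothetical, not the format of our real reports.
    """
    total = 0
    same_day = 0
    for view_date, published_date, views in page_views:
        total += views
        if view_date == published_date:
            same_day += views
    # Avoid dividing by zero when there is no traffic at all.
    return same_day / total if total else 0.0


# Invented numbers for illustration only, not our real statistics.
sample = [
    (date(2006, 10, 16), date(2006, 10, 16), 300),  # that day's news
    (date(2006, 10, 16), date(2006, 10, 13), 400),  # an older story
    (date(2006, 10, 16), date(2006, 9, 29), 200),   # an even older story
]

print(f"{same_day_share(sample):.0%}")
```

Everything that isn't counted as same-day traffic is the tail: older stories that keep drawing readers long after they ran in print.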

Seeing information like this shows the significance of maintaining archives and not putting them behind a pay wall. Some people may think it bad that a significant amount of our traffic goes to our archives, but from an advertiser's perspective we're still delivering them eyeballs. There may be some issues revolving around what sort of audience comes from outside our site. One way we don't capitalize on this currently is that our archives don't bring people back into the site well. Our navigation isn't consistent across the site and we don't have any "fresh" content on our archive pages. So most people who come to our site from a direct link to a story don't necessarily see what else we have going on.

The long tail is a valuable lesson for a lot of businesses, including newspapers. It's unfortunate that more news sites do not embrace this philosophy and leverage their archives better.


----- -------- AUTHOR: Steve Clancy TITLE: News on the March DATE: 10/01/2006 04:06:00 PM ----- BODY:
Blogging regularly is more difficult than you would think. I'm not sure what keeps the Kottkes, Scobles and Jarvises of the world going. People have been asking me to post more, so I thought I would outline how we currently post stories on our Web site.

Our process is designed for stories to come right off the print pages. All the pages of the print newspaper are designed in a program called QuarkXPress. It seems most people around here have a love-hate relationship with the program, which makes design easier but has a lot of annoying little quirks. I don't design the paper, so I am often indifferent. I do need it to get stories onto the Web, though, and here I have a beef with the software. Quark has a nice feature that lets you copy the formatted text in HTML format. This is both beautiful and troublesome, since its HTML is often poorly formatted and includes some bizarre characters.

Next we copy the text from Quark into the Collegian Web Generator (Da Da Dah!). The Web Generator is sophisticated, simple, and only occasionally buggy. It was created by Joseph Shimkus in 2000 with the best wisdom from that time. At its best, it parses Quark-speak into more readable HTML. It also lets us add headlines, photos, and shadow boxes to the stories and spits them out in our Web site's standard template. It also includes different formats for things like columns and editorials. After all the stories are done, it creates the section pages for news, sports, etc.
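To give a feel for what "parsing Quark-speak" means, here is a minimal sketch of that kind of clean-up. This is not the Web Generator's actual code, which I'm not reproducing here. The function name `clean_export`, the list of allowed tags, and the sample input are all assumptions made up for illustration.

```python
import re

# Hypothetical whitelist: the tags this sketch keeps. The real tool's
# rules are not shown in this post.
ALLOWED_TAGS = {"p", "b", "i", "br"}


def clean_export(raw):
    """Tidy a chunk of exported HTML (a sketch, not the real tool)."""
    # Drop control characters other than tab and newline.
    text = re.sub(r"[\x00-\x08\x0b-\x1f\x7f]", "", raw)

    def fix_tag(match):
        closing, name = match.group(1), match.group(2).lower()
        if name not in ALLOWED_TAGS:
            # Remove the tag itself but keep the text around it.
            return ""
        # Keep the tag, lowercased and stripped of attributes.
        return f"<{closing}{name}>"

    text = re.sub(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>", fix_tag, text)
    # Collapse runs of spaces and tabs into a single space.
    text = re.sub(r"[ \t]+", " ", text)
    return text.strip()


# Made-up input resembling messy exported markup.
messy = '<P class="x"><FONT face="Times">Hello\x07  world</FONT></P>'
print(clean_export(messy))
```

A regular-expression pass like this is crude and would trip over odd markup; a proper HTML parser would be the sturdier choice. It is only meant to show the general shape of the job.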

One hang-up with the Web Generator is that it spits out static files, only slightly souped-up HTML pages that aren't very different from your ePortfolio site. This means that we have to move these files around on the site and create links to a lot of things by hand. And while this may work OK for your Dane Cook fan site, it gets more complicated when you have more than 100,000 articles to maintain.

We can't really update the Web site until the last page of the paper is sent off to our printers, which is around 1 a.m. on a good night. We also have to go through most of this process whenever we do a Web update mid-day, which is a hassle. One advantage we have on the Web, as compared to print, is that we can always go back and fix our mistakes. Fixing stories requires someone to go in and edit the actual HTML, so it's not really made for the tech-queasy. All this is handled by a couple of students and our systems staff, who perform some of the more thankless tasks on the site.

If all this sounds ugly to you, you're right. We're not quite on the cutting edge yet. Still, we're different from most other newspapers, which just outsource their Web site work. We like to think that by keeping things in house we're able to give the site the extra love and care that makes it better than the rest. We are actively looking for ways to improve the site, though, so you're welcome to send me your thoughts. And I promise I'll write again sooner rather than later.


----- --------