Stackoverflow.com

There is a good article on ReadWriteWeb about the principles driving the development of stackoverflow.com, a site where programmers get help with their coding problems.

I was particularly struck by the design points where Spolsky highlights the frustration created by wrong answers and obsolete results.

I can remember when I was able to circumnavigate the entire web on the topic of the history of photography using a search engine. The web was that small. I could see everything there was to see about the history of photography online in a week, a week of drudgery wading through duplicate results page after duplicate results page, until I was sure I had seen everything on my topic. Although the results were filled with a fair amount of junk and duplicates, I was still able to find a single web page if it contained sufficiently unique keywords. Until about a year before Google emerged, I relied on AltaVista to take me back to a web page in one go, when I could not remember where I had found a code solution on some obscure personal page, for example. Then the search engines began to fail me, and single pages I had found before became nearly impossible to find again. Eventually, search engine technology improved, and with Google you could find that one blog page with the code you needed. That solved the problem of finding things.

Spolsky is right to observe that the problem now is that search fails to distinguish between correct and incorrect answers, and between current and obsolete answers to technical questions.

When I first started programming with Microsoft Visual C++ (I was just a dabbler), I had a question about how to render bitmap graphics. I turned to MSDN, the library of articles and code intended to help developers. I was happy when search quickly turned up an article on how to introduce bitmaps into your application. After an hour or two of reading, it slowly dawned on me that the author was not talking about what I was familiar with, Microsoft Foundation Class applications. I was seeing unfamiliar code and unfamiliar techniques. I glanced up at the date. The article was from the mid 1990s. It was about coding C under Windows before MFC was introduced. The first, supposedly most relevant, documents search had brought up from MSDN were completely obsolete, about coding without an application framework. I had wasted hours reading the wrong articles.

Stackoverflow.com is an example of a great site. It is well designed; the developers learned the lessons of the last fifteen years of web technology and applied them. It is a clean, beautifully presented, and well organized site. I have to admit they did right what I failed to do with phphelp.com, which started with many of the same goals. They had the courage to go ahead with "soft security," collaborative editing, and surfacing and valuing content through a user voting system. Of course, with that volume of content and edits, such tools are necessary. What two humans could watch and police such a flow of content while doing their day jobs? User contributed and curated content is the only rational answer.

(By the way, it would probably be better to describe their principles as being informed by behavioral economics, or an evolutionary branch of that field, than by anthropology or social psychology. The way people use voting systems to surface content, the way "soft" social engineering strategies are employed on wikis, and so on, seem close to the phenomena studied by behavioral economics, which is not only about financial choices.)


Where are we going?

The issue of whether people should pay for forums came up on dpreview. With the current economy, I expect that how to pay the bills will become a growing question for many web services.

The problem is that with forums there is perfect competition. Anyone can set up a forum and run it for next to nothing. If one forum decides to charge a fee, the users can flee to another. The only reason they might stay is the audience. For example, photographers pay to host their photographs on Flickr primarily because it provides a rich audience of people who love to look at still photographs. Flickr is the Life and Look magazine of our time, the revival of the great picture magazines, and not because of its technology (that only helped orient the site in the right direction to succeed; just look at the abject failure of Picasa to be social, too little too late). Flickr just happened to be where most people who like to look at pictures gathered, mostly because of its blog-like streams of ever-changing pictures and its social tools. It is easier to pay a small fee to use Flickr (perhaps even to "read" it) than it would be to overcome the "capital" costs of changing sites. Flickr users have a lot invested in Flickr, and it might just cost less to stay and pay than to move elsewhere. Besides, there is nowhere else to move. The closest thing I could see to Flickr would be for every photographer to put up their own photo blog software and then join photoblogs.org, which would become the "magazine" and "social hub." That is a distributed vision of photo sharing online. I used to wonder which vision would win. But it really was simple: Flickr did it all for you, some for free, a little more for pay, well worth it to promote your photography.

Despite the somewhat juvenile and absurd environment of Flickr with regard to art photography (you know, the dozens of people giving out "Great Photograph" awards to pedestrian, derivative and mediocre images, mostly to promote themselves or because they are too young to know what a derivative image is), it is useful to professional and art photographers because Flickr is where the eyeballs are. It attracts people who still love still photography, which in this age of video is a bit of a miracle. Photographs can make the world sit still long enough for people to pay attention, and that is a very similar experience to poetry, which at least in part is there to draw attention to things. I've heard from professional photographers that they get an order of magnitude more requests for work through Flickr than through the professional portfolio sites.

One reason, perhaps the principal one, Henri Cartier-Bresson and other great photographers became well known was that their images were published in the great picture magazines. When television came along, the picture magazines went into decline. Photojournalism began its long decline at this time, for the simple reason that people could now learn about their world visually through television, a more attention-grabbing and free medium (the barrier to entry was lower; you didn't have to be intelligent to watch it, a good example of where a low barrier to entry is destructive to society). Without the picture magazines, it was no longer possible for a photographer of acknowledged artistic merit to become known and for their images to have significance in society. The audience was gone. Flickr reestablishes this audience.

So the question still stands. Will people in the future pay for their online content? Pay to create it? Pay to consume it? What is happening now? People are already paying to create content. They pay for a Flickr account with better tools. They pay for services to create graphics, 3D art, and property in virtual communities. A few sites charge for reading content, but not many. But given human history and the recent past, when most content was paid for, in newspapers, books and magazines (television excepted), it seems reasonable to assume the free ride will be over someday.

There may be a tipping point at which a non-pay site is no longer competitive. When most good content has gone to pay sites, and the community of interest willing to pay for that content is consuming all it can (this is what happens with books and magazines today), the other sources will be driven out in a kind of perfect competition. The free sites will be left filled with garbage and what passes for content on local cable access.

The network is not the old traditional world of libraries and publishers. It will be different. Project Gutenberg. Open source projects. Collections of enthusiasts, sick and tired of the crap shoveled out by the traditional content and software businesses, have taken it upon themselves to produce quality products where the marketplace would not or could not. This is an order of magnitude different from the pre-networked world, where people could not work together, each contributing little bits of effort or expertise, to collaboratively create a cultural artifact. This is entirely new, and we don't know where it's going.

As an aside, the idea of tipping or donation comes up. Frustrated with no way to fund my original website, I considered taking a modern, high-tech variation on the PBS approach. I considered (in the 1990s) creating a content management system where each article would display a countdown timer, like a reverse donation thermometer. If you didn't contribute something to the article, it would count down; when it reached zero, the page would be pulled from the site. Of course, the ability to cache networked content presents a threat to such schemes: the Wayback Machine can regurgitate considerable missing content, and so can Google's search cache. What about caching? If Wikipedia's funding were to dry up and the site blow away today, would its content still remain available in a myriad of niches around the network? On people's computers, disks, servers here and there, in caches? Would it evolve another life in a peer-to-peer environment? Will all information become distributed over billions of cell phones and have no location at all?
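
A minimal sketch of that countdown scheme, assuming each article carries a deadline that donations push back (all the names and numbers here are my own illustration, not code I ever wrote):

    # Sketch of the reverse-donation-thermometer idea: every article has a
    # deadline; a donation winds the countdown back up, and anything past
    # its deadline is pulled from the site. Purely illustrative.
    import time

    DAY = 86400  # seconds

    articles = {
        "history-of-photography": {"expires": time.time() + 30 * DAY},
        "collodion-process": {"expires": time.time() - 1},  # already expired
    }

    def donate(slug, days_purchased):
        # Each contribution extends the article's lifetime.
        articles[slug]["expires"] += days_purchased * DAY

    def visible_articles():
        # Pages whose timer has hit zero are pulled from the site.
        now = time.time()
        return [s for s, a in articles.items() if a["expires"] > now]

    donate("collodion-process", 7)  # a reader chips in a week
    print(visible_articles())      # both articles are visible again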


Information Evolves and Other Stuff

I've learned to avoid precategorizing anything in my bookmarks.

I don't make a category unless it is necessary, unless I am using it. For example, I need to bookmark Amazon Web Services, so I create an Amazon Web Services folder, but I don't create a Web folder with a Web Services folder inside it, into which I then put the AWS folder. I don't have any other web services bookmarks yet in Google Chrome, so I leave this for later. It just creates more folder depth to dig through before it's needed.

I also try to avoid adding a bookmark just for reference. That just leads to clutter, where I can't find the bookmarks I use on a daily basis, because when you categorize information according to its classification, or how it relates to other information, you lose how it relates to you, to usage. For example, if in browsing the web I find a half dozen interesting resources on manual focus lenses, but for cameras I don't use, those bookmarks will obscure the resources I use for manual focus lenses for cameras I do use. What I do now is add bookmarks only when I use the content or need it now, not for reference or in anticipation of future use, placing them in the categorized hierarchy. The others I place in Uncategorized (what a wonderful idea, that Uncategorized anti-category!) awaiting the day they become useful and can be categorized, or I place them in a special hierarchy called Reference. I don't know if a parallel hierarchy will work, but it does keep them out of my way.

I hate digging through deep categories. Yet, for proper categorization, to find things later when you've forgotten where they are, you need deep categories. If I have five different web services providers, each one needs its own folder, and there will be clutter if I just create them all at the same level. So I need to create a Web Services folder, which then adds another annoying, slowing, confusing layer to finding what I want and to my thinking. I want Amazon Web Services when I want it, not after digging through Web, then Web Services, to get to it. What if I use it every day? I have to dig each time.

This brings up another issue: information structures evolve.

One of the problems library scientists create comes from this need to pre-create categories. They must predict every category that will be needed ahead of time. I was once told I needed to create "name authority records" for every photographer in a database I envisioned of 19th century photographers, before a database collecting names from old card photographs could be built. At that rate, the database would never be built, and besides, the whole purpose of the project was to collect the names so we could see who was doing what and look for patterns. If we had an authoritative name for each one, we wouldn't be doing the research.

Don't engineer. Evolve. Evolve. Evolve.

We don't need architects and engineers; we need some new job description with a new name, evolvineer or something, for the person who creates a framework for information evolution (maybe like the game Spore?). Perhaps databases like multivalue systems or Lotus Notes will help get us there.


Why Tag Clouds are Beating a Dead Horse

Tag clouds are dead. I don't want to mince words. I've been waiting for a long time for someone to say so, to let everyone see the elephant in the living room. What interests me is why tag clouds are dead.

About ten years ago I was working on a prototype web application. It never saw the light of day. It was called Strands and consisted of a wiki-like content management system that allowed anyone (it was based on SoftSecurity) to create pages, to post and edit content. Any author could include single keywords in the text. These would be automatically scooped up and entered into an index. You could display the posts associated with (containing) any keyword, listed on a page like search results. The idea was that content could be navigated in any number of ways according to keywords added by users. It wasn't social. It didn't know which user contributed a keyword. The idea was to destroy hierarchy and create a user-centered order to information, something close to a folksonomy (though not quite, because it didn't care who submitted a keyword). One version did not allow linking between pages, no "wikiword" links, the idea being that all navigation was by keyword links, either in content or on the "strand" pages listing all content belonging to a keyword.
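
A minimal sketch of the kind of keyword index Strands built, assuming keywords were marked inline with square brackets (the marker syntax and every name here are my illustration, not the original code):

    # Scoop marked keywords out of post text into an inverted index,
    # then render a "strand": the list of all posts carrying a keyword.
    import re
    from collections import defaultdict

    KEYWORD = re.compile(r"\[(\w+)\]")  # assumed marker syntax

    posts = {
        "post-1": "Notes on [photography] and [albumen] prints.",
        "post-2": "More [photography], this time wet plate.",
    }

    # Inverted index: keyword -> ids of posts containing it.
    index = defaultdict(set)
    for post_id, text in posts.items():
        for keyword in KEYWORD.findall(text):
            index[keyword].add(post_id)

    def strand(keyword):
        """The 'strand' page: every post associated with the keyword."""
        return sorted(index.get(keyword, ()))

    print(strand("photography"))  # ['post-1', 'post-2']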

One of the other ways of navigating considered was by popularity of keyword. The system could generate a list of keywords ranked by how many posts contained or were associated with them. You may start to find the elements of this system familiar. "Strands" are posts listed by tag. Keywords are tags. Navigating by popular keywords is a tag cloud. The ideas for this system partly developed out of work I'd seen on the web where posts were ordered by single keyword. The other reason was that I have a terrible time categorizing anything; I can't decide which category something should go in. I am incredibly bad at it and hate doing it, so I decided the wiki element would let visitors to my site categorize my junk for me.

If this were not a blog, I'd spare you all this personal history, but it does show you why I am interested in the question of why tag clouds suck.

When I visit a website with a tag cloud, I tend to pay close attention to it. I noticed that I never bothered clicking on the tags, never used them. When I thought about why, one of the things I noticed was that nearly every tag cloud consisted of a few large tags I could count on one hand, while the rest were undifferentiated in size. One solution that came to mind was sizing tags by popularity on a logarithmic scale, which could increase the difference between the less popular tags. I'm not that great at math, so I would need to leave it to someone else to work this out. But the idea is to create greater visual differentiation among the less differentiated tags.

The other problem is that there are only so many font sizes that are easily usable on the web, which worsens the differentiation problem.
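
Here is a minimal sketch of the log-scale idea, bucketing tag counts into the handful of font sizes the web can comfortably use (the tag data and size steps are invented):

    # Map tag counts onto a few discrete font sizes using a logarithmic
    # scale, so the less popular tags still get some visual differentiation
    # instead of all collapsing to the smallest size.
    import math

    FONT_SIZES = [10, 12, 14, 18, 24]  # the few sizes usable on the web

    def font_size(count, min_count, max_count):
        if max_count <= min_count:
            return FONT_SIZES[0]
        # Position of this tag on a log scale between least and most used.
        scale = (math.log(count) - math.log(min_count)) / (
            math.log(max_count) - math.log(min_count)
        )
        return FONT_SIZES[min(int(scale * len(FONT_SIZES)), len(FONT_SIZES) - 1)]

    tags = {"photography": 120, "wiki": 30, "tags": 8, "usability": 2}
    lo, hi = min(tags.values()), max(tags.values())
    for tag, count in sorted(tags.items()):
        print(tag, font_size(count, lo, hi))

On a linear scale, "wiki" and "tags" would both be dwarfed by "photography" and rendered at nearly the same size; the log scale spreads them across the middle steps.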

The other concern I had developing the keyword-based application was that chaos would ensue. People tend to prefer order. Would it help or hurt for people to be navigating by tag? Tags don't always apply to the subject. Their strength is freedom, freedom from controlled vocabularies and rigid meanings, but without those restrictions tag-chaos can reign. Wikis always had a kind of randomness to them, and so does tag-structured, tag-navigated content.

I almost never click on tags in WordPress blogs for this reason. It usually produces a result that widens, not narrows, my search. Nielsen observed that clicking on a link has a penalty, and the trouble with tags is that they carry an uncertainty penalty.

The closest thing I've ever seen to a realization of the keyword-based navigation idea is a photo gallery developed by Alex Wilson some years ago. You can see it still in operation here. It's a great idea and an excellent implementation; I don't know why I didn't go ahead with my own version instead of abandoning it (doubly so, since the eventual goal was organizing photographs). It makes the homepage a tag cloud, and each detail page with a photograph displays a vertical strip of thumbnails of photographs linked by tag, which is very similar to the way the Strands pages listed posts according to keyword (like Flickr pages with the tags next to the image). Alex recently switched to a standard gallery system for this exact reason: visitors and customers apparently found the tag-navigated album confusing.

I love tags. I use them in this blog the way I feel they were meant to be used: I just write down any significant word that comes into my head about the subject. I don't care that this creates long lists of tags, since I only use them as a memory aid. They are terrible for people navigating the site, and categories would probably serve visitors better. Tags aid memory, and they aid discovery and exploration, but I'm uncertain they are good finding aids.

I'm sure others have observed this before, and since I've kept quiet about it, I may be late to the party. Still, it's a useful discussion, to dissect why tags ultimately fail to live up to the (strange to me) hype they received. Every new web technology seems to be announced like the second coming.

So, yes, tag clouds are beating a dead horse. Even the little sets of tags next to blog posts don't do much for me, not even on my own site, and they don't seem to do much for visitors either, in my view.

The other thing that tortured me developing the keyword-based navigation was whether to allow spaces in keywords, which would prevent combining keywords like chicken+soup and would create confusion (separate keyword threads of navigation) between "farmers market" and "farmers_market." I worried a bit about misspellings, but not too much, since I didn't like controlled vocabularies.
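
To make the worry concrete, a tiny sketch of how the two spellings split into separate threads unless you normalize them (the normalization rule is one illustrative fix, not what Strands did):

    # Without normalization, "farmers market" and "farmers_market" index
    # as separate keywords and produce separate navigation threads.
    # Collapsing case, spaces, and underscores is one possible mitigation.
    def normalize(keyword):
        return "_".join(keyword.lower().replace("_", " ").split())

    variants = ["farmers market", "farmers_market", "Farmers  Market"]
    print({normalize(v) for v in variants})  # {'farmers_market'}: one thread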

References: "Tag Clouds R.I.P." and ZigTag, which is supposed to solve these problems.
