Folksonomies: How we can improve the tags

The discussion on folksonomies continues. Folksonomies, like controlled vocabularies, are here to stay, and instead of arguing about their merits relative to controlled vocabularies, I would like to focus on how we can make them better.

What I have noticed is that I suck at coming up with good tags for my links in delicious. Frequently I will just repeat words from the title. As a result, most of my tags end up applied to only that one entry, while a few tags, such as “web” and “software”, are applied to so many links that they become useless. This is a problem. After all, tagging and categorization are about divide-and-conquer. If you categorize everything in the “other” category, you have achieved nothing.

And I am not alone in this. I wrote a quick script to get an idea of how other people do compared to me, specifically Clay and Liz (apparently, Lou isn’t using delicious). Here are the results, and what they show is that we all have very uneven distributions of tags. The way to read this is that Clay has 94 different tags that he has only ever used once (first line), and 1 tag which is used for 211 different links (last line).

Clay Shirky

Times used    Number of tags
1 94
2 18
3 10
4 3
5 1
6 3
7 2
8 2
9 1
10 1
11 3
13 1
14 1
18 1
27 1
31 1
34 2
48 1
113 1
211 1

Elizabeth Lawley

Times used    Number of tags
1 115
2 54
3 24
4 18
5 13
6 10
7 9
8 12
9 3
10 2
11 3
12 4
13 3
14 2
15 4
16 1
17 2
19 3
20 1
21 1
22 2
25 1
26 1
27 2
29 3
30 1
32 1
36 1
40 1
45 1
50 1
70 2
74 1
92 1
111 1

Lars Pind

Times used    Number of tags
1 220
2 38
3 21
4 9
5 2
6 5
7 4
8 2
10 1
11 2
12 1
13 3
20 1
25 1
35 1
47 1
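
Tables like the ones above can be produced with very little code. What follows is a minimal sketch, not the script I actually ran: it assumes each person's bookmarks have already been fetched as a list of tag lists, and the function name `tag_usage_distribution` and the sample data are made up for illustration.

```python
from collections import Counter

def tag_usage_distribution(bookmarks):
    """Return sorted (times_used, number_of_tags) pairs.

    `bookmarks` is a list of tag lists, one list per saved link.
    """
    # How many links each tag is applied to (a tag counts once per link).
    tag_counts = Counter(tag for tags in bookmarks for tag in set(tags))
    # How many tags share each usage count.
    distribution = Counter(tag_counts.values())
    return sorted(distribution.items())

# Hypothetical sample data, not anyone's real bookmarks.
sample_bookmarks = [
    ["web", "software"],
    ["web", "folksonomy"],
    ["web", "software", "tagging"],
    ["delicious"],
]

for times_used, number_of_tags in tag_usage_distribution(sample_bookmarks):
    print(times_used, number_of_tags)
```

In the sample, three tags are used once, one tag twice, and one tag three times, which is the same lopsided shape as the real tables, just much smaller.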

I suspect that the expertise we lack is the kind that professionals like Lou Rosenfeld have. But instead of re-hiring the professionals to do controlled vocabularies for us, are there simple things we can do with the software to empower amateurs to be better taggers? I think so.

Here are some of the techniques used by professionals:

So here are some ideas for how we could improve folksonomy software to make us better at this, without involving any editors.

  • Suggest tags for me. A Google Suggest-style interface will help familiarize people with the universe of existing tags, so you can use an existing tag rather than invent your own when the existing tag applies equally well. It would also reduce typos and inconsistencies, like “blog” vs. “blogs”, and it might serve as inspiration to get past the obvious tags. The suggestions could be drawn from a weighted pool of my own tags, my friends’ tags, all tags, and tags other people have already used for this link.
  • Find synonyms automatically. In the browsing interface, Flickr is pretty good about showing related tags. Why not show these related tags when I am tagging a photo, making it easy for me to just add the ones that apply? They could even do a quick lookup on WordNet for more synonyms. Since the related tags in the browsing interface feed off tags used together on the same images on the input side, this would also help make strong links stronger.
  • Help me know what tags other people use. When doing both the Google Suggest and the synonyms above, show the most used tags in a larger size than less used tags. There is value in people using the same tag for the same thing, and we want to encourage that, without in any way preventing people from choosing a different tag if they want to.
  • Infer hierarchy from the tags. I have a habit of using multiword tags, so instead of saying “socialsoftware” like you’re supposed to on delicious, I say “social software”, which really makes it two separate tags. That’s not necessarily a bad thing, though. If this habit is widespread, we could look at how many links that are tagged with “social” are also tagged “software”. If the two are frequently used together, we might infer that “social” implies a special kind of software (or the other way around, that software is a special kind of social), and offer the combined tag “social software” to contain links that are tagged with both. A different example would be items tagged “volvo car”. If most of the time something is tagged “volvo” it is also tagged “car”, we might infer that volvo is a kind of car.
  • Make it easy to adjust tags on old content. If the above and other ideas work, people’s tagging skills should improve over time. So why not augment the browsing interface so that it’s very easy for me to add or remove tags from my images or links right there, e.g. from a list of suggested tags on the page? I’m sure that sometimes, someone would use it. Another incentive to retag my content: if I’m searching for a link on Buenos Aires, but the link wasn’t tagged with “buenosaires”, so I find it under “argentina”, say, it should be very easy to add the “buenosaires” tag to that item.
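
The weighted pool from the first suggestion could be sketched roughly like this. Everything here is an assumption for illustration: the source names, the weights, the prefix matching, and the function `suggest_tags` do not come from delicious or any real service. Counts are normalized within each source so that the huge "everyone" pool does not drown out my own handful of tags.

```python
from collections import Counter

# Hypothetical weights: how much a tag's popularity in each source counts.
SOURCE_WEIGHTS = {"mine": 3.0, "this_link": 2.0, "friends": 1.5, "everyone": 1.0}

def suggest_tags(prefix, pools, weights=SOURCE_WEIGHTS, limit=5):
    """Rank existing tags that start with `prefix`.

    `pools` maps a source name to a Counter of tag -> usage count.
    """
    prefix = prefix.lower()
    scores = Counter()
    for source, counts in pools.items():
        total = sum(counts.values()) or 1  # avoid dividing by zero
        for tag, uses in counts.items():
            if tag.lower().startswith(prefix):
                # Share of this source's usage, scaled by the source weight.
                scores[tag] += weights.get(source, 1.0) * uses / total
    # Highest score first; ties broken alphabetically.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [tag for tag, _ in ranked[:limit]]

# Hypothetical usage counts.
example_pools = {
    "mine": Counter({"blog": 4, "web": 6}),
    "friends": Counter({"blogs": 1, "blog": 3}),
    "everyone": Counter({"blogs": 20, "blog": 70, "blogging": 10}),
    "this_link": Counter({"blogging": 2}),
}

print(suggest_tags("blo", example_pools))
```

With these made-up numbers, “blog” ranks ahead of “blogs”, which is the nudge toward consistency the suggestion is after. The same scores could drive the font sizes in the third suggestion.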
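
The “volvo is a kind of car” inference can be tried with simple co-occurrence counts. This is a minimal sketch under stated assumptions: the 0.8 threshold, the minimum of two uses, and the function name `infer_is_a` are arbitrary choices for illustration, and real tag data would surely need tuning. The rule is that if nearly every link tagged A is also tagged B, but not the reverse, then A is guessed to be a kind of B.

```python
from collections import Counter
from itertools import permutations

def infer_is_a(bookmarks, threshold=0.8, min_uses=2):
    """Return sorted (narrow, broad) pairs guessed from tag co-occurrence."""
    tag_counts = Counter()
    pair_counts = Counter()
    for tags in bookmarks:
        unique = set(tags)
        tag_counts.update(unique)
        # Count every ordered pair of tags that share a link.
        pair_counts.update(permutations(unique, 2))

    relations = []
    for (narrow, broad), together in pair_counts.items():
        if tag_counts[narrow] < min_uses:
            continue  # too little evidence for this tag
        share_of_narrow = together / tag_counts[narrow]
        share_of_broad = together / tag_counts[broad]
        # Narrow almost always comes with broad, but not the reverse.
        if share_of_narrow >= threshold and share_of_broad < threshold:
            relations.append((narrow, broad))
    return sorted(relations)

# Hypothetical tagged links.
example_links = [
    ["volvo", "car"],
    ["volvo", "car", "sweden"],
    ["volvo", "car"],
    ["car", "toyota"],
    ["car"],
    ["car", "toyota"],
    ["sweden", "travel"],
]

print(infer_is_a(example_links))
```

On this sample it guesses that “volvo” and “toyota” are each a kind of “car” and nothing else. Pairs like “social” and “software”, which mostly appear together in both directions, would fail the asymmetry test, which is a hint that they are better treated as one combined tag than as a hierarchy.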

As always, these ideas are from the let’s-try-it-and-see-if-it-works department. Maybe some of them are too complicated, or produce bad results in practice. Surely there are many other ideas to try. But this is at least a start. And simultaneously with improving tagging, we can also do a lot to improve browsing, with collaborative filtering through a trusted network.