Kill the Word Cloud!

Word clouds are pure eye candy, devoid of the analytical value needed to share information with an audience. Unfortunately, they are popular: they appear in every social listening software system that I have seen, and in a variety of other tools too. They are meant to quickly show you the most popular terms that individuals use on a given topic over a period of time. What word clouds actually are is one of the most misleading (and overused) forms of visualization. Here are 3 reasons why we must kill the word cloud:

Word Clouds Have No Axes:

Every fourth grader knows that if you want to share and show information accurately, you must label your axes. While there is data behind a word cloud, the axes are not only unlabeled, they are not even there! In my quick example of music artists, you have no idea what the data is based on. It could be number of awards over the past 10 years, number of wardrobe malfunctions, or (as it actually is) number of news mentions from Google Trends over the past 10 years.

Word Clouds Have No Absolute Sense of Scale:

Closely related to the lack of axes, there are no numbers available to give a sense of how much more or less popular one topic is than another. Even worse, word clouds skew the actual scale based on the font sizes that are available to them. While you may guess that Beyonce had 5x the mentions of The Beatles (this is painful for me to write), it would be impossible to guess that Beyonce had 40% more mentions than Katy Perry. That is the actual difference in the Google Trends data, but the font sizes do not hold to those ratios; they simply spread the terms across the spectrum of available font sizes (Blake Shelton is at font size 15 and Beyonce is at 80, a ratio closer to what the difference should be between The Beatles and Beyonce). This kind of subjective assessment has no place in real analysis.
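
To make the distortion concrete, here is a minimal sketch of what happens when counts are stretched linearly across a fixed font range. The min-max scaling is my assumption about how a typical word cloud generator behaves, not a description of any specific tool. The term names and counts are made up for illustration; only the 15 to 80 font range comes from the example above.

```python
# Hypothetical counts (not real data): "top" has 40% more mentions than "middle".
counts = {"top": 140, "middle": 100, "bottom": 60}

# Font range taken from the example in the text; the scaling rule is an assumption.
MIN_FONT, MAX_FONT = 15, 80


def font_size(count, lo, hi):
    """Min-max scale a count into the available font range."""
    return MIN_FONT + (count - lo) / (hi - lo) * (MAX_FONT - MIN_FONT)


lo, hi = min(counts.values()), max(counts.values())
sizes = {term: font_size(c, lo, hi) for term, c in counts.items()}

# Compare the true ratio of counts with the ratio the reader sees in font sizes.
count_ratio = counts["top"] / counts["middle"]
font_ratio = sizes["top"] / sizes["middle"]

print(f"count ratio top/middle: {count_ratio:.2f}")
print(f"font ratio  top/middle: {font_ratio:.2f}")
```

Under these assumptions the smallest term always lands on the smallest font and the largest on the largest, so the font ratios depend on the font range rather than on the data alone.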

Word Clouds Ignore Time:

The treatment of time is my BIGGEST pet peeve with word clouds. Time series data is one of the most valuable things to examine: is a topic or subject increasing or decreasing in popularity with time? The answer can have a HUGE impact on recommendations or decisions. When all of the time series data is aggregated together, the result can be VERY misleading. In our music example, while it is clear from the word cloud that Beyonce is a dominating force (which she is), it does not show that Taylor Swift has dominated over the past 5 years. It is also interesting to note that Miley Cyrus (and Justin Bieber) continue to do things that spike the news cycles on a pretty consistent basis. Side note: amazing to see the staying power of Madonna through the years.
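
A small sketch of why aggregation hides the story: two series can have identical totals, and so would get the same size in a word cloud, while trending in opposite directions. The topic names and yearly counts below are invented for illustration, and the least-squares slope is just one simple way to summarize a trend.

```python
# Hypothetical yearly mention counts for two made-up topics (not real data).
yearly = {
    "rising": [10, 20, 30, 40, 50],
    "falling": [50, 40, 30, 20, 10],
}


def trend(series):
    """Least-squares slope of the counts against the year index."""
    n = len(series)
    mean_x = (n - 1) / 2
    mean_y = sum(series) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(series))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


# The total is all a word cloud has to work with; the slope is what it throws away.
totals = {name: sum(series) for name, series in yearly.items()}
slopes = {name: trend(series) for name, series in yearly.items()}

for name in yearly:
    print(f"{name}: total={totals[name]}, slope per year={slopes[name]:+.1f}")
```

Both topics total the same number of mentions, yet one is gaining ground every year and the other is losing it, which is exactly the distinction a decision would hinge on.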

Lesson from the Digital Trail:

If you need to share meaningful information with your audience, do not use the word cloud. It is strictly eye candy – devoid of analytic nutrition. Kill it!

