Yesterday marked the second day of the course I’m taking at UC Berkeley titled, simply, “Visualization.” I’ll be recording my most valuable learnings from the course here on the blog, mostly for the benefit of fellow DMIers. If you find this interesting, just follow along on the blog, or find more information on the class wiki. Yes, everything related to the course is on there — the readings, completed assignments, everything.
One of our readings for class was “The Eyes Have It: A Task by Data Type Taxonomy for Information Visualizations,” by Ben Shneiderman. In it, Shneiderman outlines a taxonomy that covers nearly all types of data:
- 1D (linear data, such as a text document or an ordered list)
- 2D (x/y, planes, maps, etc.)
- 3D (x/y/z, space, objects with depth)
- Temporal (timelines, where items have a start and an end time)
- Multi-dimensional (or nD, where items with n attributes are treated as points in n-dimensional space)
- Trees (hierarchically structured data)
- Networks (graph structures, where items are related but not hierarchical)
I recommend spending a few minutes with his article.
We also talked about Nominal, Ordinal, and Quantitative data.
- Nominal data is named, or labeled (e.g. apples, oranges, starfish, racecars). The labels have no inherent order or relationship to one another.
- Ordinal data is ordered (e.g. grade A meat, grade AA, and grade AAA). The order is known, whether from low to high or best to worst, but the relative distance between each measure is not. (For example, we know that grade AAA meat is different from grade AA, but we can’t say whether it’s 2.5 times better or 88.3 times worse than AA.)
- Quantitative data is quantifiable: the distance between any two values can be measured. (22 is 12 more than 10.)
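One way I find it helpful to think about the three types is to ask which comparisons make sense for each. Here's a minimal Python sketch of that idea. The names (`GRADE_ORDER`, `is_higher_grade`) are my own, and I'm assuming the meat grades run from A (lowest) to AAA (highest).

```python
# Nominal: labels only. The one meaningful question is "same label or not?"
fruits = ["apples", "oranges", "starfish", "racecars"]
same_label = fruits[0] == fruits[1]

# Ordinal: labels with a known order but no measurable distance.
# Assumption: grades run from A (lowest) to AAA (highest).
GRADE_ORDER = ["A", "AA", "AAA"]

def is_higher_grade(a, b):
    """Return True if grade a ranks above grade b."""
    return GRADE_ORDER.index(a) > GRADE_ORDER.index(b)

# Quantitative: differences between values are meaningful.
difference = 22 - 10

print(same_label)                    # False
print(is_higher_grade("AAA", "AA"))  # True
print(difference)                    # 12
```

Notice that the ordinal helper only answers "higher or not." It deliberately never subtracts one grade from another, because that distance isn't defined.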
Whether you treat your data as N, O, or Q depends on your goal for the visualization. Say you start with the numbers 10.5, 24.8, and -7.1. Your conceptual model tells you what these values mean: they could be distances, angles, or temperatures. Let’s use temperatures.
- If considered nominally, the values could be categorized as burned or frozen.
- If considered ordinally, the values could be categorized as cold, cool, warm, or hot.
- If considered quantitatively, the values would be measurements on a temperature scale, so the differences between them are meaningful.
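The three readings of the same numbers can be sketched in a few lines of Python. Everything beyond the three values is my own assumption: the cutoffs are made up for illustration, I'm treating the numbers as degrees Celsius, and the function names are hypothetical.

```python
temps = [10.5, 24.8, -7.1]

def as_nominal(t, cutoff=0.0):
    """Label a temperature with one of two names (hypothetical cutoff)."""
    return "frozen" if t < cutoff else "burned"

# Hypothetical upper bounds for each ordered category, lowest first.
ORDINAL_CUTOFFS = [(0.0, "cold"), (15.0, "cool"), (25.0, "warm")]

def as_ordinal(t):
    """Place a temperature into an ordered category."""
    for upper, label in ORDINAL_CUTOFFS:
        if t < upper:
            return label
    return "hot"

# Quantitative: keep the numbers and measure the distance between them.
spread = max(temps) - min(temps)

print([as_nominal(t) for t in temps])  # ['burned', 'burned', 'frozen']
print([as_ordinal(t) for t in temps])  # ['cool', 'warm', 'cold']
print(spread)
```

Each step down from quantitative to ordinal to nominal throws information away, which is why the choice should follow from what the visualization is meant to show.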
Why bother with all this? Because not all data types can be visualized in all ways. Jacques Bertin was the first to approach this subject in depth, with his 1967 book Semiology of Graphics. In it, he identified attributes of visual language (e.g. position, size, shape) that could correspond to the different types of information communicated. Nominal data is more easily represented than quantitative data, it turns out. For example, it’s difficult to show a sequence using color (since the eye does not naturally perceive blue coming after yellow or before red), although color is great for labeling nominal values (cities, suburbs, disputed territories).
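To make the matching idea concrete, here's a rough Python sketch of a lookup from visual attribute to the data types it suits. This is my own simplified table, not a transcription of Bertin's book: the names `SUITABLE_FOR` and `channels_for` are hypothetical, and the entries reflect a common reading of his visual variables rather than anything definitive.

```python
# Assumed, simplified suitability table (not taken verbatim from Bertin).
SUITABLE_FOR = {
    "position": {"nominal", "ordinal", "quantitative"},
    "size": {"ordinal", "quantitative"},
    "shape": {"nominal"},
    "color_hue": {"nominal"},  # good for labeling, poor for showing sequence
}

def channels_for(data_type):
    """Return the visual attributes that suit the given data type."""
    return sorted(ch for ch, types in SUITABLE_FOR.items() if data_type in types)

print(channels_for("nominal"))       # ['color_hue', 'position', 'shape']
print(channels_for("quantitative"))  # ['position', 'size']
```

In this sketch, nominal data has more usable attributes than quantitative data, which lines up with the point about color above.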