From the 24th to the 27th of March I visited the Broad Institute of MIT and Harvard in Boston to attend the VizBi 2015 conference. The scope of this conference is to advance knowledge in the visualization of biological data; the 2015 edition was the 6th international meeting. Here is a long-overdue recap of two talks that I thought were particularly interesting.
On Wednesday John Stasko kicked off as a keynote speaker with some very interesting notions about the different applications of visualization: a visualization serves either presentation (explanatory) or analysis (exploratory). This distinction is important because each has its own goals. When presenting results, the goals are to clarify, focus, highlight, simplify, and persuade; when analyzing data, the goals are to explore, make decisions, and use statistical descriptors.
A good quote also passed by here: “If you know what you are looking for, you probably don’t need visualizations.”
So when you do decide you need a visualization, it is most useful for analysis (exploratory). In this case it can help you when you:
- don’t know what you are looking for
- don’t have an a priori question
- want to know what questions to ask
Typically these kinds of visualizations show all variables, combine overview and detail, and facilitate comparison. A consequence of this setup is that “analysis visualizations” are difficult to understand: the underlying data is complex, so the visualization probably is as well. This is not a bad thing, but the user needs to invest time to decode the visualization.
A perfect example of an exploratory visualization is the Attribute Explorer from 1998. Here the authors used the notion of compromise to analyze a dataset. For example, when searching for a new house you might look at the price, the commuting time, and the number of bedrooms. By setting a hard limit on each of these attributes, you might miss the house that has a perfect price and number of bedrooms but a five-minute-longer commute. The paper shows that by implementing coupled histograms the user is still able to see these “compromise solutions”. The PDF of the article is available here, showing some old-school histograms.
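The core of the compromise idea can be sketched in a few lines: instead of hard-filtering items on every limit, count how many limits each item violates, so near-misses stay visible instead of disappearing. This is my own minimal illustration with made-up house data and limits, not code from the paper:

```python
# A toy sketch of the Attribute Explorer's "compromise" notion:
# rank items by how many attribute limits they fail, rather than
# discarding everything that fails any limit. Data is invented.

houses = [
    {"name": "A", "price": 250_000, "commute_min": 30, "bedrooms": 3},
    {"name": "B", "price": 240_000, "commute_min": 35, "bedrooms": 3},  # only the commute is too long
    {"name": "C", "price": 400_000, "commute_min": 60, "bedrooms": 1},
]

# One predicate per attribute: does the house satisfy this limit?
limits = {
    "price": lambda h: h["price"] <= 300_000,
    "commute_min": lambda h: h["commute_min"] <= 30,
    "bedrooms": lambda h: h["bedrooms"] >= 3,
}

def violations(house):
    """Number of attribute limits the house fails."""
    return sum(not ok(house) for ok in limits.values())

# House B fails only one limit: a hard filter would hide it, but a
# violation count keeps this "compromise solution" in view.
for h in sorted(houses, key=violations):
    print(h["name"], violations(h))  # A 0, B 1, C 3
```

In the actual Attribute Explorer this count drives the coloring of coupled histograms; the ranking above is just the simplest way to show why near-misses survive.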
The takeaway: a visualization is radically different depending on whether one presents the data or analyzes it.
An often-encountered problem with visualization is high data complexity: too high to visualize in one go. There are a few options to tackle this:
- pack all the data in one complex representation
- spread the data over multiple coordinated views (pixels are John’s friend)
- use interaction to reveal different subsets of the data
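The last two options meet in brushing & linking: a selection made in one view is propagated to every coordinated view. As a rough illustration (my own sketch, not from the talk), the shared state can be as simple as a set of record indices:

```python
# A toy sketch of brushing & linking across coordinated views:
# brushing a range in view 1 yields a set of record indices that
# view 2 highlights. Data and function names are invented.

points = [(1, 5), (2, 3), (3, 8), (4, 2), (5, 7)]  # (x, y) records

def brush(data, x_min, x_max):
    """Return the indices of records whose x falls inside the brush."""
    return {i for i, (x, _) in enumerate(data) if x_min <= x <= x_max}

# View 1: the user drags a brush over the x-range [2, 4].
selected = brush(points, 2, 4)

# View 2 (say, a histogram of y) highlights the same records.
highlighted_y = [y for i, (_, y) in enumerate(points) if i in selected]
print(sorted(highlighted_y))  # [2, 3, 8]
```

Real toolkits keep this index set in a shared model so every registered view redraws when it changes; the point here is only that the linkage itself is lightweight.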
When interacting with data, users have different intents; a 2007 InfoVis paper by Stasko and colleagues describes seven of them. However, about 95% of observed interactions are made up of just four intents: tooltip and selection (to get details), navigation, and brushing & linking. This raises a chicken-and-egg problem: why are only those four intents used so extensively, and how can one make a visualization more effective?
An example Stasko showed was the use of a tablet, where a whole wealth of new gestures becomes available, as is best illustrated in this video:
As a conclusion, Stasko gave his own formula that captures the value of visualization:
Value of Visualization = Time + Insight + Essence + Confidence
- T: Ability to minimize the total time needed to answer a wide variety of questions about the data
- I: Ability to spur and discover insights or insightful questions about the data
- E: Ability to convey an overall essence or take-away sense of the data
- C: Ability to generate confidence and trust about the data, its domain and context
On Friday Daniel Evanko (@devanko) from Nature Publishing Group spoke about the future of visualizations in publications. There is currently a big gap between all the rich data sets that people publish and the way these are incorporated into scientific articles. Evanko made some interesting points from a publisher’s perspective.
The current “rich” standards such as PDF are probably good for a dozen years to come, whereas newer formats such as D3, Java, and R can break or become unsupported at any point in the future. On the other hand, basic print formats such as paper or microfilm can be kept for a hundred years. Although this is a conservative standpoint, in my opinion it makes sense to keep the long-term perspective in mind when releasing new publication formats; who says Java will still be supported in 20 years? Still, I think that with thorough design the community should be able to come up with defined standards that have the lifetime of a microfilm.
Another argument Evanko used was that the few papers published with interactive visualizations do not generate much traffic, from which the conclusion was drawn that the audience doesn’t want these kinds of visualizations, so publishers will not offer them. Again, I feel we may be dealing with a chicken-and-egg problem here.
- Spence R, Tweedie L: The Attribute Explorer: information synthesis via exploration. Interact Comput 1998, 11:137–146.
- Yi JS, Kang YA, Stasko JT, Jacko JA: Toward a Deeper Understanding of the Role of Interaction in Information Visualization. IEEE Trans Vis Comput Graph 2007, 13:1224–1231.
- Sadana R, Stasko J: Designing and implementing an interactive scatterplot visualization for a tablet computer. Proc. Int. Working Conf. on Advanced Visual Interfaces (AVI) 2014:265–272.