Here are my notes from today’s OU seminar, presented by:
- Dr Elton Barker (Classical Studies Department, The Open University, and Alexander von Humboldt Foundation, Freie Universität Berlin)
- Ms Mia Ridge (PhD student, The Open University, and Chair of the Museum Computer Group UK)
This seminar offers a survey of major online humanities datasets and some of the tools available for visualising them. Examples will be drawn from externally-funded projects such as Hestia, Google Ancient Places and Pelagios.
Amplification is important in data visualization: a good visualization should enhance our understanding of the information presented. John Snow’s map of the 1854 cholera outbreak showed that cases clustered around particular water pumps. Similarly, Florence Nightingale produced a diagram showing causes of mortality in the Crimean War (1857), and Charles Minard (1869) produced a figurative map of French losses during the Russian campaign. In each case, a great deal of information is packed into an accessible visual representation.
Harry Beck’s original map of the London Underground (1931) moves beyond simple geographical representation, using a circuit-diagram-inspired approach with clean horizontal and vertical lines and stripping out extraneous information. It is an eminently usable representation of the Tube system.
Visualization is typically defined by the type of qualitative or quantitative data that is available. Visual representations can be static or interactive, but both should help people to find the most important information or insights. It’s worth thinking about how the different variables will be selected and shown.
Types of visualizations (ManyEyes)
- Bubble Chart
- Pie Chart
- Line/stack graph
- Word tree
- Phrase net
- Matrix chart
Another, newer technique is sentiment analysis. Data visualisation can also help to check data by highlighting outliers or unexpected information visually. In the humanities, a number of textual analysis tools are available, including named entity recognition and n-grams.
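As a minimal sketch of the n-gram idea mentioned above: an n-gram is simply a run of n consecutive tokens, and counting them reveals recurring phrases in a text. The example sentence below is hypothetical, not drawn from the seminar datasets.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return a frequency count of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

# Hypothetical example text for illustration only.
text = "the persians came to the sea and the persians sailed"
bigrams = ngrams(text.split(), 2)
print(bigrams.most_common(1))  # the most frequent bigram
```

Real projects would run this over much larger corpora and feed the counts into a visualization such as a word tree or phrase net.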
There are also many flawed visualizations, in terms of both accuracy and depth. Visualizations can obscure problems with datasets, such as inconsistencies or how the data were collected. They may also over-simplify complex information, or force categories onto data in order to make them visualizable. Imposing the binary logic of computers onto research data can likewise produce nonsensical results. It’s also worth remembering the old adage that correlation does not imply causation.
Best practice in data visualization:
- How effectively does the visualization support cognition and/or communication?
- Spatial arrangements should make sense of variables
- Be aware of the audience
- Are you telling a story or letting people explore?
- 80% of data visualization work is cleaning datasets; one needs to think about how to organise the data.
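To illustrate the cleaning step flagged in the last bullet, here is a minimal sketch of normalising variant placename spellings before visualization. The variant table and placenames are hypothetical; a real project would draw canonical forms from a gazetteer.

```python
import unicodedata

# Hypothetical variant spellings mapped to one canonical form.
CANONICAL = {"athinai": "Athens", "athenai": "Athens", "athens": "Athens"}

def clean_placename(raw):
    """Strip whitespace and accents, lowercase, then map variants to a canonical name."""
    s = unicodedata.normalize("NFKD", raw.strip())
    s = "".join(c for c in s if not unicodedata.combining(c)).lower()
    return CANONICAL.get(s, s.title())

records = [" Athenai", "athens", "Spárta"]
print([clean_placename(r) for r in records])  # ['Athens', 'Athens', 'Sparta']
```

Without a pass like this, the same place can appear as several separate dots on a map, quietly distorting the visualization.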
The Hestia project explores the cultural geography of the ancient historian Herodotus, extracting placename data (places, settlements and natural features) for visualization. This raises complications (e.g. representing a sea with a single dot), and computers generally don’t handle uncertainty or ‘fuzzy’ information all that well. Network maps were used to show the connections between territories, revealing, for example, that Greece is not the centre of the narrative, even though Herodotus was Greek.
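A network map of the kind described can be sketched by counting how often places are mentioned together and then measuring each place’s degree (number of distinct neighbours) as a crude proxy for narrative centrality. The passage data below is invented for illustration, not taken from Herodotus.

```python
from collections import Counter
from itertools import combinations

# Hypothetical co-occurrence data: places mentioned together in a passage.
passages = [
    ["Persia", "Lydia"],
    ["Persia", "Egypt", "Greece"],
    ["Persia", "Scythia"],
    ["Greece", "Lydia"],
]

# Count an edge between every pair of places that share a passage.
edges = Counter()
for places in passages:
    for a, b in combinations(sorted(set(places)), 2):
        edges[(a, b)] += 1

# Degree = number of distinct neighbours per place.
degree = Counter()
for a, b in edges:
    degree[a] += 1
    degree[b] += 1

print(degree.most_common(2))  # the most connected places
```

On real data, a layout tool would then place high-degree nodes centrally, making it visible at a glance which territories anchor the narrative.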
Data shared between project partners enables the application of a range of API tools. This allows for the exploration of relations between places through data, and conversely between data through places (e.g. heat mapping).