Today Beck Pitt and I travelled up to Birmingham in the Midlands of the UK to attend a BERA/Wiley workshop on technologies and ethics in educational research. I’m mainly here to focus on the redraft of the Ethics Manual for OER Research Hub and to give some time over to thinking about the ethical challenges that can be raised by openness. The first draft of the ethics manual was primarily intended to guide us at the start of the project, but now we need to redraft it to reflect some of the issues we have encountered in practice.
Things kicked off with an outline of what BERA does and the suggestion that awareness of new technologies in education often doesn’t filter down to practitioners. The rationale behind the seminar seems to be to raise awareness, given that these issues are especially prevalent at the moment.
We were first told that these meetings would be held under the ‘Chatham House Rule’, which means that participants are free to use the information received but without identifying speakers or their affiliations… this takes us straight into the meat of some of the issues provoked by openness: I’m in the middle of live-blogging this as the suggestion is made. (The session is being filmed, but apparently they will edit out anything ‘contentious’.)
Anyway, on to the first speaker:
Jill Jameson, Prof. of Education and Co-Chair of the University of Greenwich
‘Ethical Leadership of Educational Technologies Research: Primum non nocere’
The Latin part of the title of this presentation means ‘first, do no harm’ and is a recognised ethical principle that goes back to antiquity. Jameson wants to suggest that this is a sound principle for ethical leadership in educational technology.
After outlining a case from medical care Jameson identified a number of features of good practice for involving patients in their own therapy and feeding the whole process back into training and pedagogy.
- No harm
- Informed consent
- Data-informed consultation on treatment
- Anonymity, confidentiality
- Sensitivity re: privacy
- No coercion
- Research-linked: treatment & PG teaching
This was contrasted with a problematic case from the NHS concerning the public release of patient data. Arguably very few people have given informed consent to this procedure. But at the same time the potential benefits of aggregating data are being impeded by concerns about sharing of identifiable information and the commercial use of such information.
In educational technology the prevalence of ‘big data’ has raised new possibilities in the field of learning analytics. This raises the possibility of data-driven decision-making and evidence-based practice. It may also lead to more homogeneous forms of data collection as we seek to aggregate data sets over time.
The global expansion of web-enabled data presents many opportunities for innovation in educational technology research. But there are also concerns and threats:
- Privacy vs surveillance
- Commercialisation of research data
- Limits of big data
- Learning analytics acts as a push against anonymity in education
- Predictive modelling could become deterministic
- Transparency of performance replaces ‘learning’
- Audit culture
- Learning analytics as models, not reality
- Datasets are not the same as information, and stand in need of analysis and interpretation
Simon Buckingham-Shum has put this in terms of a utopian/dystopian vision of big data.
Leadership is thus needed in ethical research regarding the use of new technologies to develop and refine urgently needed digital research ethics principles and codes of practice. Students entrust institutions with their data and institutions need to act as caretakers.
I made the point that the principle of ‘do no harm’ is fundamentally incompatible with any leap into the unknown as far as practices are concerned. Any consistent application of the principle leads to a risk-averse application of the precautionary principle with respect to innovation. How can this be made compatible with experimental work on learning analytics and sharing of personal data? Must we reconfigure the principle of ‘do no harm’ so that it becomes ‘minimise harm’? It seems that way from this presentation… but it is worth noting that this is significantly different to the original maxim with which we were presented… different enough to undermine the basic position?
Ralf Klamma, Technical University Aachen
‘Do Mechanical Turks Dream of Big Data?’
Klamma started in earnest by showing us some slides: Einstein sticking his tongue out; stills from Dr. Strangelove; Alan Turing; a knowledge network (citation) visualization which could be interpreted as a ‘citation cartel’. The Cold War image of scientists working in isolation behind geopolitical boundaries has been superseded by building of new communities. This process can be demonstrated through data mining, networking and visualization.
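A toy sketch of the kind of citation-network analysis being described (the edges and author names below are my own invention, not Klamma’s data): counting citations turns each author into a node whose prominence can be measured.

```python
from collections import Counter

# Directed citation edges: (citing_author, cited_author). Toy data only.
citations = [
    ("A", "Einstein"), ("B", "Einstein"), ("C", "Einstein"),
    ("A", "Turing"), ("C", "Turing"),
    ("B", "A"),
]

# In-degree = how often each author is cited; a crude measure of prominence.
in_degree = Counter(cited for _, cited in citations)

# most_common() ranks nodes by citation count, descending.
ranked = in_degree.most_common()
print(ranked)  # [('Einstein', 3), ('Turing', 2), ('A', 1)]
```

Real social network analysis tools add layout, clustering and visualisation on top of exactly this kind of structure, which is how a ‘citation cartel’ becomes visible as a densely interlinked cluster.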
Historical figures of the like of Einstein and Turing are now more like nodes on a network diagram – at least, this is an increasingly natural perspective. The ‘iron curtain’ around research communities has dropped:
- Research communities have long tails
- Many research communities are under public scrutiny (e.g. climate science)
- Funding cuts may exacerbate the problem
- Open access threatens the integrity of the academy (?!)
Klamma argues that social network analysis and machine learning can support big data research in education. He highlights the US Department of Homeland Security, Science and Technology, Cyber Security Division publication The Menlo Report: Ethical Principles Guiding Information and Communication Technology Research as a useful resource for the ethical debates in computer science. In the case of learning analytics there have been many examples of data leaks:
One way to approach the issue of leaks comes from the TellNET project. By encouraging students to learn about network data and network visualisations, they can be given better control of their own (transparent) data. Other solutions used in this project:
- Protection of data platform: fragmentation prevents ‘leaks’
- Non-identification of participants at workshops
- Only teachers had access to learning analytics tools
- Acknowledgement that no systems are 100% secure
In conclusion we were introduced to the concept of ‘datability’ as the ethical use of big data:
- Clear risk assessment before data collection
- Ethical guidelines and sharing best practice
- Transparency and accountability without loss of privacy
- Academic freedom
Fiona Murphy, Earth and Environmental Science (Wiley Publishing)
‘Getting to grips with research data: a publisher perspective’
From a publisher perspective, there is much interest in the ways that research data is shared. They are moving towards a model with greater transparency. There are some services under development that will use DOIs to link datasets and archives to improve the findability of research data. For instance, the Geoscience Data Journal includes bi-directional linking to original data sets. Ethical issues from a publisher point of view include how to record citations and accreditation, manage peer review, and maintain security protocols.
Data sharing models may be open, restricted (e.g. dependent on permissions set by data owner) or linked (where the original data is not released but access can be managed centrally).
[Discussion of open licensing was conspicuously absent from this though this is perhaps to be expected from commercial publishers.]
Luciano Floridi, Prof. of Philosophy & Ethics of Information at the University of Oxford
‘Big Data, Small Patterns, and Huge Ethical Issues’
Big data can be defined by three Vs: variety, velocity, and volume. (Options for a fourth have been suggested.) Data has seen a massive explosion since 2009 and the cost of storage is consistently falling. The only limits to this process are thermodynamics, intelligence and memory.
Epistemological Problems with Big Data: ‘big data’ has been with us for a while and should generally be seen as a set of possibilities (prediction, simulation, decision-making, tailoring, deciding) rather than a problem per se. The problem is rather that data sets have become so large and complex that they are difficult to process by hand or with standard software.
Ethical Problems with Big Data: the challenge is actually to understand the small patterns that exist within data sets. This means that many data points are needed as ways into a particular data set so that meaning can become emergent. Small patterns may be insignificant so working out which patterns have significance is half the battle. Sometimes significance emerges through the combining of smaller patterns.
Thus small patterns may become significant when correlated. To further complicate things: small patterns may be significant through their absence (e.g. the curious incident of the dog in the night-time in Sherlock Holmes).
A specific ethical problem with big data: looking for these small patterns can require thorough and invasive exploration of large data sets. These procedures may not respect the sensitivity of the subjects of that data. The ethical problem with big data is sensitive patterns: this includes traditional data-related problems such as privacy, ownership and usability but now also includes the extraction and handling of these ‘patterns’. The new issues that arise include:
- Re-purposing of data and consent
- The deontological duty to treat people not merely as means, resources, types, targets, consumers, etc.
It isn’t possible for a computer to calculate every variable around the education of an individual, so we must use proxies: indicators of type and frequency which sacrifice the uniqueness of the individual in order to make sense of the data. However, this results in the following:
- The profile becomes the profiled
- The profile becomes predictable
- The predictable becomes exploitable
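The proxy problem can be made concrete with a minimal sketch (the feature names and thresholds below are invented, not from the talk): once behaviour is binned into coarse types, two quite different learners collapse into the same profile.

```python
def profile(logins: int, avg_score: float) -> tuple[str, str]:
    """Map raw learner activity onto a coarse proxy profile (invented thresholds)."""
    activity = "high" if logins >= 20 else "low"
    attainment = "pass" if avg_score >= 40 else "fail"
    return (activity, attainment)

# Two learners with very different raw behaviour...
alice = profile(logins=25, avg_score=71.0)
bob = profile(logins=52, avg_score=90.0)

# ...are indistinguishable under the proxy: the profile becomes the profiled.
print(alice, bob, alice == bob)  # ('high', 'pass') ('high', 'pass') True
```

Any system that then acts on the profile (recommendations, interventions, flags) is acting on the type, not the individual, which is exactly the slide from ‘predictable’ to ‘exploitable’ in the bullets above.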
Floridi advances the claim that the ethical value of data should not be higher than the ethical value of the entity the data describe, but should demand at most the same degree of respect.
Putting all this together: how can privacy be protected while taking advantage of the potential of ‘big data’? This is an ethical tension between competing principles or ethical demands: the duties to be reconciled are 1) safeguarding individual rights and 2) improving human welfare.
- This can be understood as a result of polarisation of a moral framework – we focus on the two duties to the individual and society and miss the privacy of groups in the middle
- Ironically, it is the ‘social group’ level that is served by technology
Five related problems:
- Can groups hold rights? (it seems so – e.g. national self-determination)
- If yes, can groups hold a right to privacy?
- When might a group qualify as a privacy holder? (corporate agency is often like this, isn’t it?)
- How does group privacy relate to individual privacy?
- Does respect for individual privacy require respect for the privacy of the group to which the individual belongs? (big data tends to address groups (‘types’) rather than individuals (‘tokens’))
The risks of releasing anonymised large data sets might need some unpacking: the example given was that during the civil war in Côte d’Ivoire (2010–2011) Orange released a large metadata set which gave away strategic information about the positions of groups involved in the conflict, even though no individuals were identifiable. There is a risk of overlooking group interests by focusing on the privacy of the individual.
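The group-privacy point can be illustrated with a toy aggregation (invented numbers, not the actual Orange data): strip every individual identifier and the aggregate still betrays where a crowd is concentrated.

```python
from collections import Counter

# Call events as (cell_tower, hour) pairs -- no user IDs anywhere. Toy data.
events = [("tower_N", 1)] * 120 + [("tower_S", 1)] * 5 + [("tower_N", 2)] * 140

# Aggregate call volume per tower.
calls_per_tower = Counter(tower for tower, _ in events)
hotspot, volume = calls_per_tower.most_common(1)[0]

# No individual is identifiable, but the group's position is exposed.
print(hotspot, volume)  # tower_N 260
```

Individual anonymity is fully preserved here, yet the release still harms the group: exactly the gap that a purely individual-centred ethics review would miss.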
There are legal or technological instruments which can be employed to mitigate the possibility of the misuse of big data, but there is no one clear solution at present. Most of the discussion centred upon collective identity and the rights that might be afforded an individual according to groups they have autonomously chosen and those within which they have been categorised. What happens, for example, if a group can take a legal action but one has to prove membership of that group in order to qualify? The risk here is that we move into terra incognita when it comes to the preservation of privacy.
Summary of Discussion
Generally speaking, it’s not enough to simply get institutional ethical approval at the start of a project. Institutional approvals typically focus on protection of individuals rather than groups and research activities can change significantly over the course of a project.
In addition to anonymising data there is a case for making it difficult to reconstruct the entire data set, so as to prevent misuse by others. Increasingly we don’t even know who learners are (e.g. in a MOOC), so it’s hard to reasonably predict the potential outcomes of an intervention.
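One concrete way to operationalise ‘making the data set hard to reconstruct’ is a k-anonymity check (my sketch; the discussion gestured at the idea rather than naming a technique): refuse to release a table unless every combination of quasi-identifying attributes occurs at least k times.

```python
from collections import Counter

def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs at least k times."""
    combos = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(count >= k for count in combos.values())

# Toy learner records (invented fields and values).
records = [
    {"age_band": "20-29", "region": "Midlands", "grade": "B"},
    {"age_band": "20-29", "region": "Midlands", "grade": "A"},
    {"age_band": "30-39", "region": "North",    "grade": "C"},
]

# The third record's (age_band, region) combination is unique, so an
# adversary with background knowledge could re-identify that learner.
print(is_k_anonymous(records, ["age_band", "region"], k=2))  # False
```

The usual remedy is to generalise the quasi-identifiers (wider age bands, coarser regions) until the check passes, trading analytic precision for protection of both individuals and the groups they fall into.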