Month: March 2014

Thinking Learning Analytics

I’m back in the Ambient Labs again, this time for a workshop on learning analytics for staff here at The Open University.

Challenges for Learning Analytics: Visualisation for Feedback

Denise Whitelock described the SaFeSEA project, which is based around trying to give students meaningful feedback on their activities.  SaFeSEA was a response to high dropout rates among new OU students: 33% don’t submit their first TMA (tutor-marked assignment).  Feedback on submitted writing prompts ‘advice for action’: a self-reflective discourse with a computer.  Visualizations of these interactions can open a discourse between tutor and student.

Students can worry a lot about the feedback they receive.  Computers can offer non-judgmental, objective feedback without any extra tuition costs.  OpenEssayist analyses the structure of an essay, identifies key words and phrases, and picks out key sentences (i.e. those that are most representative of the overall content of the piece).  This analysis can be used to generate visual feedback, some forms of which are more easily understood than others.
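As a rough illustration of what ‘picking out key sentences’ can mean, the sketch below scores each sentence by how frequent its words are across the whole essay and keeps the top scorers in their original order. This is a simple frequency heuristic chosen for illustration, not OpenEssayist’s own algorithm; `key_sentences`, `tokenize` and the tiny stop-word list are hypothetical.

```python
import re
from collections import Counter

# Illustrative stop-word list (an assumption, not taken from OpenEssayist).
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "are",
              "it", "that", "this", "for", "on", "with", "as", "be", "by"}


def tokenize(text: str) -> list[str]:
    """Lower-case word tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]


def key_sentences(text: str, n: int = 2) -> list[str]:
    """Return the n sentences whose words are most frequent in the whole text."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(tokenize(text))

    def score(sentence: str) -> float:
        words = tokenize(sentence)
        # Average whole-document frequency of the sentence's words.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    # Report the chosen sentences in the order they appear in the essay.
    return [sentences[i] for i in sorted(ranked[:n])]
```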

Bertin (1977/81) provides a model for the visualization of data.   Methods can include diagrams that show how well connected different passages are to the whole, or patterns that highlight different types of essay.  These can be integrated with social network analysis & discourse analytics.
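One way to approximate ‘how well connected a passage is to the whole’ is to compare each paragraph’s word counts with those of the complete essay using cosine similarity. This is an assumed measure chosen for illustration, not necessarily what lies behind the diagrams described here; `connectedness`, `bag_of_words` and `cosine` are hypothetical names.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Word counts for a piece of text (lower-cased)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors (0 = unrelated, 1 = same mix)."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def connectedness(paragraphs: list[str]) -> list[float]:
    """Similarity of each paragraph to the essay as a whole."""
    whole = bag_of_words(" ".join(paragraphs))
    return [cosine(bag_of_words(p), whole) for p in paragraphs]
```

Low-scoring paragraphs are candidates for passages that sit loosely against the rest of the argument; a visualization could shade or size them accordingly.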

Can students understand this kind of feedback? Might they need special training?  Are these tools that could be used primarily by educators?  Would they also need special training?  In both cases, it’s not entirely clear what kind of training this might be (information literacy?).  Can one tool be used to support writing across all disciplines or should such tools be discipline-specific?

The Wrangler’s relationship with the Science Faculty

Doug Clow then presented on ‘data wrangling’ in the science faculty at The Open University.  IET collects information on student performance and presents this back to faculties in a ‘wrangler report’ able to feed back into future course delivery / learning design.

What can faculty do with these reports?  Data is arguably better at highlighting problems or potential problems than it is at solving them.  This process can perhaps get better at identifying key data points or performance indicators, but faculty still need to decide how to act based on this information.  If we move towards the provision of more specific guidance then the role of faculty could arguably be diminished over time.

The relation between learning analytics and learning design in IET work with the faculties

Robin Goodfellow picked up these themes from a module team perspective.  Data can be understood as a way of closing the loop on learning design, creating a virtuous circle between the two.  In practice, there can be significant time delays in terms of processing the data in time for it to feed in.  But the information can still be useful to module teams in terms of thinking about course:

  • Communication
  • Experience
  • Assessment
  • Information Management
  • Productivity
  • Learning Experience

This can give rise to quite specific expectations about the balance of different activities and learning outcomes.  Different indicators can be identified and combined to standardize metrics for student engagement, communication, etc.
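Combining indicators into a standardized metric could look something like the sketch below: each indicator is converted to z-scores so that different units become comparable, then a weighted sum gives one score per student. The indicator names, the weights and the z-score approach itself are assumptions for illustration, not the metrics IET actually uses.

```python
from statistics import mean, pstdev


def z_scores(values: list[float]) -> list[float]:
    """Standardize values to mean 0 and standard deviation 1 (all 0 if no spread)."""
    mu, sigma = mean(values), pstdev(values)
    return [0.0 if sigma == 0 else (v - mu) / sigma for v in values]


def engagement_index(indicators: dict[str, list[float]],
                     weights: dict[str, float]) -> list[float]:
    """Weighted sum of standardized indicators, one score per student.

    Every indicator list must cover the same students in the same order.
    """
    standardized = {name: z_scores(vals) for name, vals in indicators.items()}
    n_students = len(next(iter(indicators.values())))
    return [sum(weights[name] * standardized[name][i] for name in indicators)
            for i in range(n_students)]
```

Standardizing first stops an indicator with large raw numbers (say, page views) from swamping one with small numbers (say, forum posts); the choice of weights is exactly where the normative assumptions discussed below creep in.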

In this way, a normative notion of what a module should be can be said to be emerging.  (This is perhaps a good thing in terms of supporting course designers but may have worrying implications in terms of promoting homogeneity.)

Another selective element arises from the fact that it’s usually only possible to collect data from a selection of indicators:  this means that we might come to place too much emphasis on data we do have instead of thinking about the significance of data that has not been collected.

The key questions:

  • Can underlying learning design models be identified in data?
  • If so, what do these patterns correlate with?
  • How can all this be bundled up to faculty as something useful?
  • Are there implications for general elements of course delivery (e.g. forums, VLE, assessment)?
  • If we only permit certain kinds of data for consideration, does this lead to a kind of psychological shift where these are the only things considered to be ‘real’ or of value?
  • Is there a special kind of interpretative skill that we need in order to make sense of learning analytics?

Learning Design at the OU

Annie Bryan drilled a little deeper into the integration of learning design into the picture.   Learning design is now a required element of course design at The Open University.  There are a number of justifications given for this:

  • Quality enhancement
  • Informed decision making
  • Sharing good practice
  • Improving cost-effectiveness
  • Speeding up decision making
  • Improving online pedagogy
  • Explicitly representing pedagogical activity
  • Effective management of student workload

A number of (beta) tools for Learning Design have been produced.  These are focused on module information; learning outcomes; activity planning, and mapping modules and resources.  These are intended to support constructive engagement over the life of the course.   Future developments will also embrace a qualification-level perspective which will map activities against qualification routes.

These tools are intended to help course teams think critically about and discuss the purpose of tools and resources chosen in the context of the course as a whole and student learning experiences.  A design perspective can also help to identify imbalances in course structure or problematic parts of a course.

Guerrilla Research #elesig

We don't need no stinking permissions....

Today I’m in the research laboratories in the Jennie Lee Building at The Institute of Educational Technology (aka work) for the ELESIG Guerrilla Research Event.  Martin Weller began the session with an outline of the kind of work that goes into preparing unsuccessful research proposals.  Using figures from the UK research councils he estimates that the ESRC alone attracts bids (which it does not fund) equivalent to 65 work years every year (2000 failed bids x 12 days per bid).   This work is not made public in any way and can be considered lost.
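The arithmetic behind the 65-year figure appears to be the following, assuming a 365-day year (the estimate as reported does not say how a ‘work year’ was defined):

```latex
2000 \text{ bids} \times 12 \text{ days per bid} = 24\,000 \text{ days},
\qquad
\frac{24\,000 \text{ days}}{365 \text{ days per year}} \approx 65.8 \text{ years}
```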

He then went on to discuss some different digital scholarship initiatives – like a meta educational technology journal based on aggregation of open articles; MOOC research by Katy Jordan; an app built at the OU; DS106 Digital Storytelling – these have elements of what is being termed ‘guerrilla research’.  These include:

  • No permissions (open access, open licensing, open data)
  • Quick set up
  • No business case required
  • Allows for interdisciplinarity unconstrained by tradition
  • Using free tools
  • Building open scholarship identity
  • Kickstarter / enterprise funding

Such initiatives can lead to more traditional forms of funding and publication, and the two can certainly co-exist.  But these kinds of activities are not always institutionally recognised, giving rise to a number of issues:

  • Intellectual property – will someone steal my work?
  • Can I get institutional recognition?
  • Do I need technical skills?
  • What is the right balance between traditional and digital scholarship?
  • Ethical concerns about the use of open data – can consent be assumed?  Even when dealing with personal or intimate information?

Tony Hirst then took the floor to speak about his understanding of ‘guerrilla research’.  He divided his talk into the means, opportunity and motive for this kind of work.

First he spoke about the use of the CommentPress WordPress theme to disaggregate the Digital Britain report so that people could comment online.  The idea came out of a tweet but within 3 months was being funded by the Cabinet Office.

In 2009 Tony produced a map of MP expense claims which was used by The Guardian.  This was produced quickly using open technologies and led to further maps and other ways of exploring data stories.  Google Ngrams is a tool that was used to check for anachronistic use of language in Downton Abbey.

In addition to pulling together recipes using open tools and open data, another approach is to use innovative coding schemes.  Mat Morrison (@mediaczar) used this approach to produce an accession plot graph of the London riots.  Tony has reused this approach, so another way of approaching ‘guerrilla research’ is to try to re-appropriate existing tools.
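An accession plot, as I understand the technique, places each tweet at its time on the x-axis and, on the y-axis, the order in which its author first joined the conversation: newcomers push the frontier upwards while returning users show up as horizontal bands. A minimal sketch of computing those coordinates, assuming the tweets are available as (timestamp, username) pairs; `accession_points` is a hypothetical helper and the plotting itself is left to any charting tool.

```python
def accession_points(tweets: list[tuple[float, str]]) -> list[tuple[float, int]]:
    """Map (timestamp, user) pairs to (timestamp, accession number) points.

    A user's accession number is the order in which they first appeared
    in the stream (1 = the first user to tweet).
    """
    accession: dict[str, int] = {}
    points = []
    for timestamp, user in sorted(tweets):  # process tweets in time order
        if user not in accession:
            accession[user] = len(accession) + 1
        points.append((timestamp, accession[user]))
    return points
```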

Another approach is to use data to drive a macroscopic understanding of data patterns, producing maps or other visualizations from very large data sets to help sensemaking and interpretation.  One important consideration here is ‘glanceability’: whether the information has been filtered and presented so that the most important data are highlighted and the visual representation conveys meaning successfully to the viewer.

Open government data is a good source: the UK government publishes large amounts of information on open licence.  Access to data sets like this can save a lot of research money, and combining different data sets can provide unexpected results.  Publishing data sets openly supports this method and also allows others to look for patterns that original researchers might have missed.

Google supports custom searches that can concentrate on results from a specific domain (or domains), which can support more targeted searches for data.  Freedom of information (FOI) requests can also be a good source of data: publicly funded bodies like universities, hospitals and local government all make data available in this way (though there will be exceptions), and FOI requests can be made online.  Google spreadsheets support quick tools for exploring data, such as sliding filters and graphs.

OpenRefine is another tool which Tony has found useful.  It can cluster open text responses in data sets using clustering algorithms and so replace some manual coding of the responses.   The tool can also be used to compare data against linked data on the web.
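OpenRefine’s clustering includes a ‘key collision’ method based on fingerprints: each value is normalised (trimmed, lower-cased, stripped of punctuation and accents, its words sorted and de-duplicated) and values sharing a fingerprint are grouped together. The sketch below is a simplified re-implementation for illustration, not OpenRefine’s own code, so its output may differ from the tool’s on edge cases; `fingerprint` and `cluster` are hypothetical names.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Normalise a free-text value into a clustering key."""
    value = value.strip().lower()
    # Decompose accented characters and drop the combining marks (é -> e).
    value = unicodedata.normalize("NFKD", value)
    value = "".join(ch for ch in value if not unicodedata.combining(ch))
    value = re.sub(r"[^\w\s]", "", value)  # remove punctuation
    # Sorting unique tokens makes word order and repetition irrelevant.
    return " ".join(sorted(set(value.split())))


def cluster(values: list[str]) -> list[list[str]]:
    """Group values whose fingerprints collide; singletons are dropped."""
    groups: dict[str, list[str]] = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return [g for g in groups.values() if len(g) > 1]
```

Each returned group is a candidate for merging into a single code, which is the step that would otherwise be done by hand.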

Tony concluded his presentation with a comparison of ‘guerrilla research’ and ‘recreational research’. Research can be more creative and playful and approaching it in this way can lead to experimental and exploratory forms of research.  However, assessing the impact of this kind of work might be problematic.  Furthermore, going through the process of trying to get funding for research like this can impede the playfulness of the endeavour.

A workflow for getting started with this kind of thing:

  • Download openly available data: use open data, hashtags, domain searches, RSS
  • DBpedia can be used to extract information from Wikipedia
  • Clean data using OpenRefine
  • Upload to Google Fusion Tables
  • From here data can be mapped, filtered and graphed
  • Use Gephi for data visualization and creating interactive widgets
  • StackOverflow can help with coding/programming

(I have a fuller list of data visualization tools on the Resources page of OER Impact Map.)

Ethical Use of New Technology in Education

Today Beck Pitt and I travelled up to Birmingham in the Midlands of the UK to attend a BERA/Wiley workshop on technologies and ethics in educational research.  I’m mainly here to focus on the redraft of the Ethics Manual for OER Research Hub and to give some time over to thinking about the ethical challenges that can be raised by openness.  The first draft of the ethics manual was primarily to guide us at the start of the project but now we need to redraft it to reflect some of the issues we have encountered in practice.

Things kicked off with an outline of what BERA does and the suggestion that consciousness about new technologies in education often doesn’t filter down to practitioners.  The rationale behind the seminar seems to be to raise awareness in light of the fact that these issues are especially prevalent at the moment.

This blog post may be in direct contravention of the Chatham convention

We were first told that these meetings would be held under the ‘Chatham House Rule’, which means that participants are free to use information received but without identifying speakers or their affiliation… which takes us straight into the meat of some of the issues provoked by openness:  I’m in the middle of live-blogging this as the suggestion is made.  (The session is being filmed but apparently they will edit out anything ‘contentious’.)

Anyway, on to the first speaker:

Jill Jameson, Prof. of Education and Co-Chair of the University of Greenwich
‘Ethical Leadership of Educational Technologies Research:  Primum non nocere’

The Latin part of the title of this presentation means ‘first, do no harm’ and is a recognised ethical principle that goes back to antiquity.  Jameson wants to suggest that this is a sound principle for ethical leadership in educational technology.

After outlining a case from medical care, Jameson identified a number of features of good practice for involving patients in their own therapy and feeding the whole process back into training and pedagogy.

  • No harm
  • Informed consent
  • Data-informed consultation on treatment
  • Anonymity, confidentiality
  • Sensitivity re: privacy
  • No coercion
  • ‘Worthwhileness’
  • Research-linked: treatment & PG teaching

This was contrasted with a problematic case from the NHS concerning the public release of patient data.  Arguably very few people have given informed consent to this procedure.  But at the same time the potential benefits of aggregating data are being impeded by concerns about sharing of identifiable information and the commercial use of such information.

In educational technology the prevalence of ‘big data’ has raised new possibilities in the field of learning analytics.  This raises the possibility of data-driven decision making and evidence-based practice.  It may also lead to more homogeneous forms of data collection as we seek to aggregate data sets over time.

The global expansion of web-enabled data presents many opportunities for innovation in educational technology research.  But there are also concerns and threats:

  • Privacy vs surveillance
  • Commercialisation of research data
  • Techno-centrism
  • Limits of big data
  • Learning analytics acts as a push against anonymity in education
  • Predictive modelling could become deterministic
  • Transparency of performance replaces ‘learning’
  • Audit culture
  • Learning analytics as models, not reality
  • Datasets are not the same as information: they stand in need of analysis and interpretation

Simon Buckingham Shum has put this in terms of utopian and dystopian visions of big data.

Leadership is thus needed in ethical research regarding the use of new technologies to develop and refine urgently needed digital research ethics principles and codes of practice.  Students entrust institutions with their data and institutions need to act as caretakers.

I made the point that the principle of ‘do no harm’ is fundamentally incompatible with any leap into the unknown as far as practices are concerned.  Any consistent application of the principle leads to a risk-averse application of the precautionary principle with respect to innovation.  How can this be made compatible with experimental work on learning analytics and sharing of personal data?  Must we reconfigure the principle of ‘do no harm’ so it becomes ‘minimise harm’?  It seems that way from this presentation… but it is worth noting that this is significantly different to the original maxim with which we were presented… different enough to undermine the basic position?

Ralf Klamma, Technical University Aachen
‘Do Mechanical Turks Dream of Big Data?’

Klamma started in earnest by showing us some slides:  Einstein sticking his tongue out; stills from Dr. Strangelove; Alan Turing; a knowledge network (citation) visualization which could be interpreted as a ‘citation cartel’.  The Cold War image of scientists working in isolation behind geopolitical boundaries has been superseded by the building of new communities.  This process can be demonstrated through data mining, networking and visualization.

Historical figures like Einstein and Turing are now more like nodes on a network diagram; at least, this is an increasingly natural perspective.  The ‘iron curtain’ around research communities has dropped:

  • Research communities have long tails
  • Many research communities are under public scrutiny (e.g. climate science)
  • Funding cuts may exacerbate the problem
  • Open access threatens the integrity of the academy (?!)

Klamma argues that social network analysis and machine learning can support big data research in education.  He highlights the US Department of Homeland Security, Science and Technology, Cyber Security Division publication The Menlo Report: Ethical Principles Guiding Information and Communication Technology Research as a useful resource for the ethical debates in computer science.  In the case of learning analytics there have been many examples of data leaks.

One way to approach the issue of leaks comes from the TellNET project.  By encouraging students to learn about network data and network visualisations they can be put in better control of their own (transparent) data.  Other solutions used in this project:

  • Protection of data platform: fragmentation prevents ‘leaks’
  • Non-identification of participants at workshops
  • Only teachers had access to learning analytics tools
  • Acknowledgement that no systems are 100% secure

In conclusion we were introduced to the concept of ‘datability’ as the ethical use of big data:

  • Clear risk assessment before data collection
  • Ethical guidelines and sharing of best practice
  • Transparency and accountability without loss of privacy
  • Academic freedom

Fiona Murphy, Earth and Environmental Science (Wiley Publishing)
‘Getting to grips with research data: a publisher perspective’

From a publisher perspective, there is much interest in the ways that research data is shared.  Publishers are moving towards a model with greater transparency.  There are some services under development that will use DOIs to link datasets and archives to improve the findability of research data.  For instance, the Geoscience Data Journal includes bidirectional linking to original data sets.  Ethical issues from a publisher point of view include how to record citations and accreditation, how to manage peer review, and how to maintain security protocols.

Data sharing models may be open, restricted (e.g. dependent on permissions set by data owner) or linked (where the original data is not released but access can be managed centrally).

[Discussion of open licensing was conspicuously absent from this though this is perhaps to be expected from commercial publishers.]

Luciano Floridi, Prof. of Philosophy & Ethics of Information at The University of Oxford
‘Big Data, Small Patterns, and Huge Ethical Issues’

Big data can be defined by three Vs: variety, velocity, and volume. (Options for a fourth have been suggested.)  Data has seen a massive explosion since 2009 and the cost of storage is consistently falling.  The only limits to this process are thermodynamics, intelligence and memory.

This process is to some extent restricted by legal and ethical issues.

Epistemological Problems with Big Data: ‘big data’ has been with us for a while and should generally be seen as a set of possibilities (prediction, simulation, decision-making, tailoring, deciding) rather than a problem per se.  The problem is rather that data sets have become so large and complex that they are difficult to process by hand or with standard software.

Ethical Problems with Big Data: the challenge is actually to understand the small patterns that exist within data sets.  This means that many data points are needed as ways into a particular data set so that meaning can become emergent.  Small patterns may be insignificant so working out which patterns have significance is half the battle.  Sometimes significance emerges through the combining of smaller patterns.

Thus small patterns may become significant when correlated.  To further complicate things:  small patterns may be significant through their absence (e.g. the curious incident of the dog in the night-time in Sherlock Holmes).

A specific ethical problem with big data: looking for these small patterns can require thorough and invasive exploration of large data sets.  These procedures may not respect the sensitivity of the subjects of that data.  The ethical problem with big data is sensitive patterns: this includes traditional data-related problems such as privacy, ownership and usability but now also includes the extraction and handling of these ‘patterns’.  The new issues that arise include:

  • Re-purposing of data and consent
  • Treating people not only as means, resources, types, targets, consumers, etc. (deontological)

It isn’t possible for a computer to calculate every variable around the education of an individual, so we must use proxies:  indicators of type and frequency which, in order to make sense of the data, sacrifice the uniqueness of the individual.  However, this results in the following:

  1. The profile becomes the profiled
  2. The profile becomes predictable
  3. The predictable becomes exploitable

Floridi advances the claim that the ethical value of data about an entity should not be higher than the ethical value of that entity itself: the data should demand at most the same degree of respect.

Putting all this together:  how can privacy be protected while taking advantage of the potential of ‘big data’?  This is an ethical tension between competing principles or ethical demands: the duties to be reconciled are 1) safeguarding individual rights and 2) improving human welfare.

  • This can be understood as a result of polarisation of a moral framework – we focus on the two duties to the individual and society and miss the privacy of groups in the middle
  • Ironically, it is the ‘social group’ level that is served by technology

Five related problems:

  • Can groups hold rights? (it seems so – e.g. national self-determination)
  • If yes, can groups hold a right to privacy?
  • When might a group qualify as a privacy holder? (corporate agency is often like this, isn’t it?)
  • How does group privacy relate to individual privacy?
  • Does respect for individual privacy require respect for the privacy of the group to which the individual belongs? (big data tends to address groups (‘types’) rather than individuals (‘tokens’))

The risks of releasing anonymised large data sets might need some unpacking:  the example given was that during the civil war in Côte d’Ivoire (2010-2011) Orange released a large metadata set which gave away strategic information about the position of groups involved in the conflict even though no individuals were identifiable.  There is a risk of overlooking group interests by focusing on the privacy of the individual.

There are legal or technological instruments which can be employed to mitigate the possibility of the misuse of big data, but there is no one clear solution at present.  Most of the discussion centred upon collective identity and the rights that might be afforded an individual according to groups they have autonomously chosen and those within which they have been categorised.  What happens, for example, if a group can take a legal action but one has to prove membership of that group in order to qualify?  The risk here is that we move into terra incognita when it comes to the preservation of privacy.

Summary of Discussion

Generally speaking, it’s not enough to simply get institutional ethical approval at the start of a project.  Institutional approvals typically focus on protection of individuals rather than groups and research activities can change significantly over the course of a project.

In addition to anonymising data, there is a case for making it difficult to reconstruct the entire data set so as to prevent misuse by others.  Increasingly we don’t even know who learners are (e.g. in MOOCs), so it’s hard to reasonably predict the potential outcomes of an intervention.

The BERA guidelines for ethical research are up for review by the sounds of it – and a working group is going to be formed to look at this ahead of a possible meeting at the BERA annual conference.