
Workshop Notes: #Ethics and #LearningAnalytics

This morning I’m attending a talk given by Sharon Slade about the ethical dimensions of learning analytics (LA), part of a larger workshop devoted to LA at The Open University’s library on the Walton Hall campus.

I was a bit late from a previous meeting, but Sharon’s slides are pretty clear, so I’m just going to crack on with trying to capture the essence of the talk. Here are the guidelines currently influencing thinking in this area (with my comments in parentheses).

  1. LA as a moral practice (I guess people need to be reminded of this!)
  2. OU has a responsibility to use data for student benefit
  3. Students are not wholly defined by their data (Ergo partially defined by data?)
  4. Purpose and boundaries should be well defined and visible (transparency)
  5. Students should have the facility to update their own data
  6. Students as active agents
  7. Modelling approaches and interventions should be free from bias (Is this possible? What kind of bias should be avoided?)
  8. Adoption of LA requires broad acceptance of its values and benefits, and the development of appropriate skills (Not sure I fully grasped this one)

Sharon was mainly outlining the results of some qualitative research done with OU staff and students. The most emotive discussion was around whether or not this use of student data was appropriate at all – many students expressed dismay that their data was being looked at, much less used to potentially determine their service provision and educational future (progress, funding, etc.). Many felt that LA itself is a rather intrusive approach which may not be justified by the benevolent intention to improve student support.

While there are clear policies in place around data protection (as at most universities), there were concerns about the use of raw data and of information derived from data patterns. There was also considerable concern about whether analysts can adequately understand the data they are looking at and treat it responsibly.

Students want a 1:1 relationship with their tutors and feel that LA can undermine this, although at the OU there are particular challenges around delivering distance education at scale.

The dominant issue was whether students can opt out of having their data collected without this affecting their future studies or how they are treated by the university. The default position is one of ‘informed consent’: students are currently expected to opt out if they wish. The policy will be explained to students at the point of registration, with case studies and guidance provided for staff and students.

Another round of consultation is expected around the issue of whether students should have an opt-out or opt-in model.

There is an underlying paternalistic attitude here – the university believes that it knows best with regard to the interests of the students – though it seems to me that this potentially runs against the idea of a student-centred approach.

Some further thoughts/comments:

  • Someone like Simon Buckingham-Shum would argue that the LA *is* the pedagogy – this is not the view being taken by the OU, but we can perhaps identify a potential ‘mission creep’
  • Can we be sure that the analyses we create through LA are reliable?  How?
  • The more data we collect and the more open it is, the more effective LA can be – and the greater the ethical complexity
  • New legislation requires that everyone will have the right to opt out, but it’s not clear that this will necessarily apply to education
  • Commercialisation of data has already taken place in some initiatives

Doug Clow then took the floor and spoke about other LA initiatives. He noted that the drivers behind interest in LA are very diverse (research, retention, support, business intelligence, etc.) and surveyed a number of notable projects.

Many projects are attempting to produce the correct kind of ‘dashboard’ for LA.  Another theme is around the extent to which LA initiatives can be scaled up to form a larger infrastructure.  There is a risk that with LA we focus only on the data we have access to and everything follows from there – Doug used the metaphor of darkness/illumination/blinding light. Doug also noted that machine learning stands to benefit greatly from LA data, and LA generally should be understood within the context of trends towards informal and blended learning as well as MOOC provision.

Overall, though, it seems that evidence for the effectiveness of LA is still pretty thin, with very few rigorous evaluations. This could reflect the youth of the field (a lot of work has yet to be published) or, alternatively, the possibility that LA isn’t really as effective as some hope. For instance, it could be that any intervention is effective regardless of whether it has some foundation in the data that has been collected (nb. the ‘Hawthorne effect’).

Thinking Learning Analytics

I’m back in the Ambient Labs again, this time for a workshop on learning analytics for staff here at The Open University.


Challenges for Learning Analytics: Visualisation for Feedback

Denise Whitelock described the SaFeSEA project, which is based around trying to give students meaningful feedback on their activities. SaFeSEA was a response to high student dropout rates: around 33% of new OU students don’t submit their first TMA. Feedback on submitted writing prompts ‘advice for action’: a self-reflective discourse with a computer. Visualizations of these interactions can open a discourse between tutor and student.

Students can worry a lot about the feedback they receive. Computers can offer non-judgmental, objective feedback without any extra tuition costs. OpenEssayist analyses the structure of an essay: it identifies key words and phrases and picks out key sentences (i.e. those that are most representative of the overall content of the piece). This analysis can be used to generate visual feedback, some forms of which are more easily understood than others.
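To make the key-sentence idea concrete, here is a minimal sketch of extractive selection: score each sentence by its similarity to the essay as a whole and keep the top few. This is only an illustration of the general technique, not OpenEssayist’s actual algorithm.

```python
# Illustrative sketch only – not OpenEssayist's algorithm.
# Score each sentence by cosine similarity to the whole essay (TF-IDF)
# and keep the n most representative sentences, in their original order.
import re
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def key_sentences(essay: str, n: int = 3) -> list[str]:
    # Naive sentence split; a real system would use a proper tokenizer.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", essay) if s.strip()]
    if len(sentences) <= n:
        return sentences
    vec = TfidfVectorizer(stop_words="english")
    sent_vecs = vec.fit_transform(sentences)   # one row per sentence
    doc_vec = vec.transform([essay])           # the essay as a whole
    scores = cosine_similarity(sent_vecs, doc_vec).ravel()
    top = sorted(scores.argsort()[-n:])        # best n, in original order
    return [sentences[i] for i in top]
```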

Bertin (1977/81) provides a model for the visualization of data. Methods can include diagrams which show how well connected different passages are to the whole, or generating different patterns that highlight different types of essay. These can be integrated with social network analysis and discourse analytics.

Can students understand this kind of feedback? Might they need special training? Are these tools that would be used primarily by educators? Would they also need special training? In both cases, it’s not entirely clear what kind of training this might be (information literacy?). Can one generic tool support writing across all disciplines, or are discipline-specific tools needed?

The Wrangler’s relationship with the Science Faculty

Doug Clow then presented on ‘data wrangling’ in the science faculty at The Open University. IET collects information on student performance and presents this back to faculties in a ‘wrangler report’ that can feed into future course delivery and learning design.

What can faculty do with these reports? Data is arguably better at highlighting problems, or potential problems, than it is at solving them. This process can perhaps get better at identifying key data points or performance indicators, but faculty still need to decide how to act on this information. If we move towards the provision of more specific guidance, then the role of faculty could arguably be diminished over time.

The relation between learning analytics and learning design in IET work with the faculties

Robin Goodfellow picked up these themes from a module team perspective. Data can be understood as a way of closing the loop on learning design, creating a virtuous circle between the two. In practice, there can be significant time delays in processing the data in time for it to feed in. But the information can still be useful to module teams in thinking about aspects of a course such as:

  • Communication
  • Experience
  • Assessment
  • Information Management
  • Productivity
  • Learning Experience

This can give rise to quite specific expectations about the balance of different activities and learning outcomes. Different indicators can be identified and combined into standardized metrics for student engagement, communication, etc.
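As a rough sketch of how such a combined metric might work (my illustration, not IET’s actual method – the indicator names and values here are hypothetical): standardize each indicator across students as a z-score, then average.

```python
# Hypothetical sketch: combine indicators into one standardized
# engagement score by z-scoring each column and averaging.
import statistics

def zscores(values: list[float]) -> list[float]:
    mu = statistics.mean(values)
    sigma = statistics.stdev(values) or 1.0  # guard against zero spread
    return [(v - mu) / sigma for v in values]

# Rows are students; each list is one hypothetical indicator.
vle_logins   = [12, 40, 7, 25]
forum_posts  = [3, 10, 0, 5]
tmas_on_time = [1.0, 1.0, 0.5, 0.75]

columns = [zscores(col) for col in (vle_logins, forum_posts, tmas_on_time)]
engagement = [sum(col[i] for col in columns) / len(columns)
              for i in range(len(vle_logins))]
print(engagement)  # one combined score per student
```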

In this way, a normative notion of what a module should be can be said to be emerging.  (This is perhaps a good thing in terms of supporting course designers but may have worrying implications in terms of promoting homogeneity.)

Another selective element arises from the fact that it’s usually only possible to collect data from a selection of indicators:  this means that we might come to place too much emphasis on data we do have instead of thinking about the significance of data that has not been collected.

The key questions:

  • Can underlying learning design models be identified in data?
  • If so, what do these patterns correlate with?
  • How can all this be bundled up for faculty as something useful?
  • Are there implications for general elements of course delivery (e.g. forums, VLE, assessment)?
  • If we only permit certain kinds of data for consideration, does this lead to a kind of psychological shift where these are the only things considered to be ‘real’ or of value?
  • Is there a special kind of interpretative skill that we need in order to make sense of learning analytics?

Learning Design at the OU

Annie Bryan drilled a little deeper into how learning design fits into the picture. Learning design is now a required element of course design at The Open University. A number of justifications are given for this:

  • Quality enhancement
  • Informed decision making
  • Sharing good practice
  • Improving cost-effectiveness
  • Speeding up decision making
  • Improving online pedagogy
  • Explicitly representing pedagogical activity
  • Effective management of student workload

A number of (beta) tools for Learning Design have been produced. These are focused on module information, learning outcomes, activity planning, and mapping modules and resources. They are intended to support constructive engagement over the life of the course. Future developments will also embrace a qualification-level perspective, mapping activities against qualification routes.

These tools are intended to help course teams think critically about, and discuss, the purpose of the tools and resources chosen in the context of the course as a whole and the student learning experience. A design perspective can also help to identify imbalances in course structure or problematic parts of a course.

Guerrilla Research #elesig

[Image: Afrikaner commandos, Wikimedia Commons]

We don't need no stinking permissions....

Today I’m in the research laboratories in the Jennie Lee Building at The Institute of Educational Technology (aka work) for the ELESIG Guerrilla Research Event. Martin Weller began the session with an outline of the amount of work that goes into preparing unsuccessful research proposals. Using figures from the UK research councils, he estimates that the ESRC alone attracts bids (which it does not fund) equivalent to roughly 65 work-years every year (2,000 failed bids × 12 days per bid ≈ 24,000 days). This work is not made public in any way and can be considered lost.

He then went on to discuss some different digital scholarship initiatives – a meta educational technology journal based on aggregation of open articles, MOOC research by Katy Jordan, an app built at the OU, DS106 Digital Storytelling – which have elements of what is being termed ‘guerrilla research’. Common elements include:

  • No permissions (open access, open licensing, open data)
  • Quick set up
  • No business case required
  • Allows for interdisciplinarity unconstrained by tradition
  • Using free tools
  • Building open scholarship identity
  • Kickstarter / enterprise funding

Such initiatives can lead to more traditional forms of funding and publication, and the two certainly co-exist. But these kinds of activities are not always institutionally recognised, giving rise to a number of issues:

  • Intellectual property – will someone steal my work?
  • Can I get institutional recognition?
  • Do I need technical skills?
  • What is the right balance between traditional and digital scholarship?
  • Ethical concerns about the use of open data – can consent be assumed?  Even when dealing with personal or intimate information?

Tony Hirst then took the floor to speak about his understanding of ‘guerrilla research’.  He divided his talk into the means, opportunity and motive for this kind of work.

First he spoke about the use of the CommentPress WordPress theme to disaggregate the Digital Britain report so that people could comment online. The idea came out of a tweet, but within 3 months it was being funded by the Cabinet Office.

In 2009 Tony produced a map of MPs’ expense claims which was used by The Guardian. This was produced quickly using open technologies and led to further maps and other ways of exploring data stories. In a similar spirit, Google Ngrams was used to check for anachronistic use of language in Downton Abbey.

In addition to pulling together recipes from open tools and open data, another option is to use innovative coding schemes. Mat Morrison (@mediaczar) used this approach to produce an accession plot graph of the London riots, and Tony has reused it – so another way of approaching ‘guerrilla research’ is to re-appropriate existing tools.

Another approach is to use data to drive a macroscopic understanding of data patterns, producing maps or other visualizations from very large data sets to help sensemaking and interpretation. One important consideration here is ‘glanceability’: whether the information has been filtered and presented so that the most important data are highlighted and the visual representation conveys meaning successfully to the viewer.

Data.gov.uk is a good source of data:  the UK government publishes large amounts of information on open licence.  Access to data sets like this can save a lot of research money, and combining different data sets can provide unexpected results.  Publishing data sets openly supports this method and also allows others to look for patterns that original researchers might have missed.

Google supports custom searches which can concentrate on results from a specific domain (or domains) and this can support more targeted searches for data.  Freedom of information requests can also be a good source of data; publicly funded bodies like universities, hospitals and local government all make data available in this way (though there will be exceptions). FOI requests can be made through whatdotheyknow.com.  Google spreadsheets support quick tools for exploring data such as sliding filters and graphs.

OpenRefine is another tool which Tony has found useful. It can cluster free-text responses in data sets algorithmically, reducing the need for manual coding, and it can also reconcile values against linked data sources on the web.
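As an illustration of the clustering idea, here is a minimal re-implementation of key-collision (‘fingerprint’) clustering, the default method OpenRefine offers; this is a sketch of the technique, not OpenRefine’s own code.

```python
# Sketch of key-collision ("fingerprint") clustering as popularized by
# OpenRefine: values that normalize to the same key are grouped together.
import string
from collections import defaultdict

def fingerprint(value: str) -> str:
    # Lowercase, strip punctuation, then sort and de-duplicate tokens.
    value = value.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(value.split())))

def cluster(values: list[str]) -> dict[str, list[str]]:
    clusters = defaultdict(list)
    for v in values:
        clusters[fingerprint(v)].append(v)
    return dict(clusters)

responses = ["The Open University", "open university, the", "Open  University (The)"]
print(cluster(responses))  # all three collide on the same fingerprint
```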

Tony concluded his presentation with a comparison of ‘guerrilla research’ and ‘recreational research’. Research can be more creative and playful and approaching it in this way can lead to experimental and exploratory forms of research.  However, assessing the impact of this kind of work might be problematic.  Furthermore, going through the process of trying to get funding for research like this can impede the playfulness of the endeavour.

A workflow for getting started with this kind of thing (a sketch of the first couple of steps follows the list):

  • Download openly available data: use open data, hashtags, domain searches, RSS
  • DBpedia can be used to extract information from Wikipedia
  • Clean data using OpenRefine
  • Upload to Google Fusion Tables
  • From here data can be mapped, filtered and graphed
  • Use Gephi for data visualization and creating interactive widgets
  • StackOverflow can help with coding/programming

(I have a fuller list of data visualization tools on the Resources page of OER Impact Map.)

My author metrics from ORO

My ORO report

I’ve just had a quick look at my author report from the ORO repository of research published by members of The Open University. I’m quite surprised to learn that I’ve accrued almost 1,300 downloads of the materials I have archived there!

An up-to-date account of my ORO analytics can be found at http://oro.open.ac.uk/cgi/stats/report/authors/31087069bed3e4363443db857ead0546/. I suppose a 50% strike rate for open access publication ain’t bad… but there is probably room for improvement…

Sociology & Big Data

Can sociological researchers make use of big data? Should they? There’s something equivocal going on between the allure of massive data sets and the temptation to try to explain everything in terms of that data…

New Sociological Approach to Big Data » Sociology Lens.