What is fair assessment? Was it fair this summer? What about the future?

by Isabel Nisbet

For more than a couple of years I have been working with Stuart Shaw on a book about assessment fairness1. Entitled “Is Assessment Fair?”, it examines what is meant by fairness in educational assessment, considers examples from this country and overseas, and makes recommendations for the future. The book has now been published, but it was sent to the publishers back in January, before the Covid-19 pandemic (with its tumultuous impact on education). Between then and now, the issue of fairness in assessment has been set ablaze by the row this summer over the awarding of grades for A levels, GCSEs and equivalent qualifications when it was not possible for students to sit actual exams. The approach initially taken in all the countries of the UK, involving grades calculated using statistical information, was howled down as ‘unfair’, and the jury is out on whether the action eventually taken – awarding “centre assessed grades” (CAGs) – was any fairer. Attention is now focussing on what would be the fairest form of assessment next summer for students, some of whom have already missed out on months of learning, and in the absence of any knowledge about the likely public health situation. We believe that our book will help provide a conceptual framework and a language for understanding and addressing such questions.

This year (summer 2020)

Were the critics right to say that it was unfair to award calculated grades, based on the infamous “algorithm”? The first point to make in answer to that question may be obvious, but is no less important for that: it depends what you mean by “fair”. In our book we distinguish several senses of “fair” that are all potentially relevant to assessment. One of these is relational fairness – treating (relevantly) like cases alike. Most (if not all) assessments involve some kind of ‘discrimination’, meaning distinguishing between levels of achievement or between candidates who perform differently in relevant respects. The discrimination is fair if it is based on relevant considerations and unfair if it is based on something else, such as the candidate’s race or gender.

Another form of relational fairness is the expectation that the standards used for marking exams will remain stable over time. This concern, which is written into Ofqual’s statutory objectives, does seem relevant to fairness, at least over a limited period of years – if my son is competing for a scarce university place with someone whose exam was marked more generously a year later, then that might be unfair.

Relational fairness can be judged at different levels. An exam in which economically disadvantaged students have poorer outcomes than those of their richer contemporaries may be technically fair in many senses – for example, it may have been scrutinised to avoid bias in the test questions – but the outcomes may be unfair at a higher level, because they reflect unjust inequalities in society.

But that is not the only relevant meaning of “fair” – another is that a fair outcome is deserved: it is fair for a student to get the grade that his or her work deserves. Where that doesn’t happen for any reason, that seems unfair. This concern lies behind many of the distressing personal accounts which we heard when students received their results this summer. Linked to the importance of desert is the sense of “fair” in which a fair outcome is what those affected can legitimately expect.

Something can be fair in some respects and unfair in others. Traditional exams are often thought of – perhaps uncritically – as a paradigm of fairness, but teachers know that some of their students shine in that form of assessment while others do not. These differences become pronounced if a terminal exam is the only source of evidence for the assessment.

Also, some aspects of fairness may be judged to be more important than others. And that judgement may differ in different circumstances. For example, it could be argued that although the maintenance of standards over time is one kind of fairness – and in normal times it helps to sustain confidence in the system – given the abnormality of this year, it was not as important as attempting to give each individual student the grades they deserved. Once the decision had been taken that the exams could not go ahead in the summer, the problem was how to enable students to get the grades they deserved in the absence of exam-based evidence about each individual, while maintaining relational fairness as much as possible.

Next year (summer 2021)

There have already been discussions about how to ensure that students have had an opportunity to learn all the content that is normally assessed in GCSEs and A levels. The Americans refer to this as “instructional validity”. So far the aim has been to allow some reductions in the mandatory content for some subjects but not to reduce the standard required. That balancing act is tricky, particularly in those (“linear”) subjects where students’ expertise gets more advanced the more they do. There has also been discussion about holding the exams a little later to allow more time for preparation. But what kind of assessment would be fair in those circumstances? Is it fair to base all preparation on the hope/belief that some kind of traditional exam will be possible? And if that proves not to be possible, how do we avoid a rerun of the contest between an “algorithm” and CAGs?

In my view the approach initially proposed this year was an attempt to use statistics to help achieve relational fairness between as many candidates as possible. Statistical information has for many years been taken into account in marking exams, but it carried greater weight this year. But no statistical model could guarantee to give each individual student the grade they deserved. That was recognised at the time, and the main means of remedying individual unfairnesses that the statistical analysis could not capture was to be the appeals process. However, that proved unmanageable and unpopular and lost its credibility. There was much criticism of the calculated grades as favouring one kind of school rather than another, but we do not know whether the same could be said of the CAGs. Looking to the future, if fairness-as-desert requires students to be judged on work they have done individually, exams or no exams, then there may have to be less emphasis on relational fairness, as the evidence produced from different teaching settings may not be strictly comparable.

There is no silver bullet to guarantee fair assessment for all candidates in normal times, let alone now. But we hope that our book will help to provide a language for the debate that is needed to understand and evaluate assessment issues relating to fairness.

1 Isabel Nisbet and Stuart Shaw, Is Assessment Fair?, SAGE Publications, September 2020.