‘I think the educational and psychological studies I mentioned are examples of what I would like to call Cargo Cult Science. In the South Seas there is a Cargo Cult of people. During the war they saw airplanes land with lots of good materials, and they want the same thing to happen now. So they’ve arranged to make things like runways, to put fires along the sides of the runways, to make a wooden hut for a man to sit in, with two wooden pieces on his head like headphones and bars of bamboo sticking out like antennas – he’s the controller – and they wait for the airplanes to land. They’re doing everything right. The form is perfect. It looks exactly the way it looked before. But it doesn’t work. No airplanes land. So I call these things Cargo Cult Science, because they follow all the apparent precepts and forms of scientific investigation, but they’re missing something essential, because the planes don’t land.’ Richard Feynman’s Caltech commencement address on Education and Cargo Cult Science (1974).
‘Let’s put behind us once and for all the old sterile debate about dumbing down. I want to end young people being told that the GCSE or A-level grades they are proud of aren’t worth what they used to be.’ Ed Balls to the Labour Party Conference, 2007.
‘It is undeniable that the last Labour government dramatically improved school standards in secondary education.’ Tristram Hunt, 26 January 2015.
‘Despite the apparently plausible and widespread belief to the contrary, the evidence that levels of attainment in schools in England have systematically improved over the last 30 years is unconvincing. Much of what is claimed as school improvement is illusory… standards have not risen; teaching has not improved… The question, therefore, is not whether there has been grade inflation, but how much…’ (Professor Robert Coe, 2013, here.)
This series of blogs will discuss: 1) what we know about standards in English schools including the effect of the introduction of the National Curriculum and GCSEs; 2) how ‘ability’ and ‘standards’ should be defined; 3) what can be learned from the 2010-15 reforms and what incentives now dominate the system; 4) what research and policy agenda is needed; 5) what materials are there for those interested in standards beyond those of the National Curriculum and state controlled exams.
The debate about ‘standards in English schools’ is obviously of great importance but it suffers from many fundamental problems. Ironically for a debate that often involves the word ‘rigour’, the debate is itself unrigorous.
The main concepts are not properly defined. Politicians, policy people, officials, and journalists speak and write daily using phrases such as ‘we must drive up standards so that [X% of schools or pupils] hit the standard of [Y]’ when Y has no objective definition. Most obviously there has been enormous debate about grades in GCSEs and A Levels but these grades themselves are arbitrarily created according to criteria that would not impress physical scientists. The ‘standards’ are circular. Exams are regulated by the DfE and Ofqual in order that there is a very high chance that at least X% ‘pass’, then people say ‘more than X% should pass’, or ‘X% is too tough’. But the X% is just based in the first place on where the system happens to be which is historically contingent – it is not based on any scientific judgement about what children of different abilities (rigorously defined) are capable of doing given certain teaching.
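The circularity described above can be made concrete with a toy simulation. This is a minimal sketch, not a description of how the DfE or Ofqual actually set grade boundaries: the cohorts, the mark scale, the 60% target, and the function names are all hypothetical. It assumes only that the pass mark is chosen after the event so that a target share of the cohort passes.

```python
import random


def simulate_cohort(mean, sd, size, seed):
    """Draw hypothetical raw marks for a cohort (illustrative numbers only)."""
    rng = random.Random(seed)
    return [rng.gauss(mean, sd) for _ in range(size)]


def pass_mark_for_target(scores, target_pass_rate):
    """Pick the pass mark so that roughly target_pass_rate of the cohort passes."""
    ranked = sorted(scores, reverse=True)
    n_pass = round(len(ranked) * target_pass_rate)
    n_pass = max(1, min(len(ranked), n_pass))
    # The mark of the last pupil allowed through becomes the 'standard'.
    return ranked[n_pass - 1]


def pass_rate(scores, pass_mark):
    """Share of the cohort at or above the pass mark."""
    return sum(s >= pass_mark for s in scores) / len(scores)


# Two hypothetical cohorts: one performs much better than the other.
weak = simulate_cohort(mean=40, sd=10, size=10_000, seed=1)
strong = simulate_cohort(mean=60, sd=10, size=10_000, seed=2)

target = 0.60  # assumed political target, not a real figure
weak_mark = pass_mark_for_target(weak, target)
strong_mark = pass_mark_for_target(strong, target)

for label, cohort, mark in [("weak", weak, weak_mark), ("strong", strong, strong_mark)]:
    print(f"{label}: pass mark {mark:.1f}, pass rate {pass_rate(cohort, mark):.3f}")
```

Under these assumptions the pass rate comes out the same for both cohorts while the pass mark moves with them, so the headline figure says nothing about what pupils can actually do.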
In the recent debate over reforming GCSEs, when we tried to drop their use in the accountability system in 2012, Nick Clegg insisted, and Cameron agreed, that the entire reform process be based on the principles that i) about 95% of the cohort should do the same exams at 16 and ii) not many more would fail to pass than now (2012). Definitions of a ‘pass’ were therefore set in order to fit with an a priori desire for a certain percentage to pass – a political desire of one party rather than an educational judgement. (The other two parties have had the same approach over the past thirty years – my point is not that the LibDems are particularly bad.)
Despite having a circular process for defining standards, it has been a central feature of education debates for politicians to set targets for what proportion of pupils every school must get to ‘pass’ – targets that have high stakes for school management and teachers. One can understand the motivation, given the bad effects for individuals of being in really bad schools, but the process as a whole does not make sense. Further, Ofqual imposes a system (‘comparable outcomes’) which is intended to combat grade inflation but which also seems to operate deliberately against the goal of significant rises in the proportion passing GCSEs. Further, Ofsted’s reports add noise, not signal, given, as Professor Coe has said, ‘its judgements have little scientific credibility’ (some argue this is too generous).
Similarly, people in the education world use the word ‘ability’ but they almost never define or have an objective measure for ‘ability’. The work of scientists on this subject has been almost entirely ignored and has had practically no effect on policy in England. Many teacher training colleges promote ‘cargo cult’ science on the subject of ‘ability’ to thousands of teachers who are therefore confident in views that are the opposite of what the science says.
As far as I am aware, there is no serious research agenda in English schools attempting to a) discover what pupils of different ability, using objective measures, are capable of achieving given certain teaching and b) use this knowledge to shape the curriculum, tests, and objective measures of school performance in an iterative feedback loop that can improve its accuracy over time.
The main point of these blogs is to help make the case for such a research programme (see below). Since first becoming involved in education debates in 2007, I have had many discussions about this. I have said to many people, including in the Royal Society, the home of British science, that we need a scientific approach to the issue of standards and ability. I wrote about it in my essay that became public in 2013. I argued for it in the DfE, with subject associations, with those responsible for teacher training (‘the most bankrupt institution I know’, said Hattie), and with many people who talk about ‘research’ and ‘evidence’.
Few have wanted to engage in this subject because it is so politically fraught. Even fewer have done so publicly and I have personal experience of severe pressure put on many academics by university administrators not to tell the truth. However, a very positive development in English education is the growth of support for thinking seriously about evidence. In the DfE, there was a long battle on this issue that ended suddenly when the new Permanent Secretary arrived and immediately agreed with the appointment of Ben Goldacre to do a review of the Department’s handling of evidence, research, and data, which was published in 2013. (I have written many critical things about officials, such as HERE, so it is worth noting that Wormald, and other officials particularly younger ones, took this enlightened view.) There is no doubt that the culture inside the Department changed as a result though there is a very long way to go in this area and it is reasonable to be doubtful about any of the three parties’ commitment to this approach and about civil service commitment. Tom Bennett’s efforts with ResearchEd have been fantastic and are one of the most hopeful things I’ve seen since 2007. There is also now a discussion about a possible College of Teachers – an institution that will only be credible if it has high standards on the subject of cargo cult research. Unsurprisingly, therefore, more people are starting to ask: what do we know about standards? (E.g. Sam Freedman recently blogged on it.)
I therefore thought I would jot down in a series of blogs various bits of evidence, history, thoughts, discussions and so on that I have accumulated since 2007.
Five broad areas
This series of blogs will consider inter alia these questions grouped in five rough areas (which may change as I go along).
A. What is the evidence concerning ‘standards in English schools’? What was the effect of the introduction of 1) GCSEs and 2) the National Curriculum with its connected testing regime? What were the cascading effects on A Levels and higher education? What do comparisons with international tests and other academic studies tell us? What do subject associations and organisations such as the Royal Society say? What do universities and subject experts say?
B. How should ‘ability’ and ‘standards’ be defined? What undermines sensible discussion about this?
C. What was Gove’s team trying to do 2010-14? How effective were reforms concerning the curriculum, exams, and accountability (including the role of Ofsted)? What lessons might be learned from the period 2010-15? What incentives dominate the system now?
D. What should come next? What can we reasonably infer from the period since 1985 about what is very unlikely to work? What should the parties not put in their manifestos? What are the main reasons why political and policy discussion of this subject has been so controversial? How does the transformation of the technological landscape since the mid-1980s change arguments? How could a focus on evidence and empiricism help improve the system?
E. What materials are there that can be used by schools that are focused more on education and learning than the official accountability system?
The goal of these blogs is not to ‘defend the Gove reforms’. When I get onto them, I will try to explain as clearly as I can why we tried to do certain things and what went wrong. GCSE reform (along with the disaster of Ofsted) is arguably the biggest failure of our team and therefore particularly needs analysis. The goal is not to affect party manifestos – it is possible but unlikely that someone reading this may be able to nudge things off a party or bureaucratic agenda. It is reasonable to assume that whatever the parties promise their plans will crumble on contact with reality. My main hope is that people outside SW1 at the coalface of education take matters into their own hands and develop their own approaches to scientific experimentation with the curriculum, exams, and training.
In my opinion, the only real hope for large improvements in learning is if 1) a critical mass of people become convinced of the need for an empirical approach and the rejection of the ‘cargo cult science’ that has dominated education, and 2) an empirical programme emerges that iteratively a) tests what children of different abilities can learn and b) uses this information to alter curricula, tests, and teacher training. We need experiments and Grand Prizes in education of the kind that have brought dramatic breakthroughs in other areas, such as DARPA’s Grand Challenge that led to breakthroughs in basic science and then to driverless cars. Imagine what well-defined Grand Challenges could bring to English schools.
Improvements in education do not need to be justified as goals with reference to other things such as economic growth. Learning and education are fundamental aspects of being human. However, it is obvious that humans will have to grapple with profound challenges over the next thirty years. The population will grow by another few billion, mainly in cities and connected to the mobile internet and ‘the internet of things’. Energy and other resource demands will put the global system under huge pressure. We face old security threats like nuclear weapons and new threats such as the use of genetic engineering techniques empowering garage bio-hackers, for good and evil. For example, the revolutionary genome ‘cut and paste’ engineering tool, CRISPR, may soon be used to ‘de-extinct’ species and eradicate diseases but the same techniques could be used destructively. Much progress in machine intelligence and robotics is being driven by research controlled by militaries and intelligence agencies but little research is done on the profound dangers.
If we are to cope with these things, we will need new technologies, new institutions, and new ideas. Improving our education system is therefore obviously central. I have proposed that it ought to become the central organising principle for the British state, as an answer to Dean Acheson’s famous quip that Britain had failed to find a post-imperial role.
Hopefully the discussion of standards in English schools will be useful regardless of whether you agree with this broader argument or not.
Please leave comments, corrections, research reports, complaints etc below. I will add things people leave as I go along and at the end try to produce something short and rigorous…
Some potentially powerful, and timely food for thought, Dominic.
Some very important points made. I’d be interested to know what sort of role you think government should have in the education system.
So what do we need to teach?
Let’s look forward 13 years to children who are entering the system today. What is the use of languages when an average level of foreign language comprehension and spoken foreign language facility is exceeded by translation software? Why learn to program when computers can do it more accurately and easily for you? Why aspire to a profession when it is being hollowed out? Why work for a large company when it is being deconstructed by nimble competitors?
What is left?
Creativity, imagination, empathy, science, entrepreneurship. People who can continue learning and use the emerging tools to deliver things that people are willing to pay for. There will be opportunities – 13 years is well into AI territory but before the most optimistic timescales for strong AI – so particularly in matters of beauty, art, design a human input will still have value.
So how do we raise a generation to take advantage of the future? Well, it is not by over-testing or constant assessment. There is a valid set of reasons to test, to make people accountable and hence raise standards, but concentration on testing takes away time from learning as teachers change their priorities. Teaching to the tests we currently have will practically never instil creativity or imagination as they are not the point of the tests. Empathy probably comes into the curriculum a little in the humanities but I think it is flushed out by graduation. Science is taught without enough genuine discovery and experimentation (which is expensive and in health and safety terms can be dangerous) and the scientific method is not always explained clearly to non-scientists. Entrepreneurship is rarely taught – a business game here or there perhaps. I have an MBA and even as part of that programme – at a top European school – Entrepreneurship teaching was poor.
Well I don’t want to ramble on, the only reason I think about this stuff is that I stare in horror at the education that awaits my 2 year old son (public or private) and wish it would prepare him for the future – but I see little evidence that it will.
There is certainly no shortage of evidence for university grade inflation. I think that to some extent this and the grade inflation seen in schools are a result of the imposition of a market approach to education: schools buy better exam results by going to more forgiving exams, and applicants for university now shop around and are more likely to apply to a course with a higher proportion of first class degrees, which they perceive, perhaps correctly, as an important positive on a CV. There is also a keep-up-with-the-Joneses attitude (“well if they are doing it we have to”) among schools and higher education institutions.
As regards schools, there are (at least) two key points: a) what is the education for? and b) what is the testing process for?
One comes easily to the rather sad conclusion that in the current system, the education is “for” passing the test, and the test is “for” building self-esteem in pupils, teachers and ministers. As well as keeping the exam boards, who I must say are in no small part responsible, in business. In this I am completely with Tim Gowers.
The better solution of course is that the education should be “for” preparing students for later life, and the test should be for i) distinguishing between pupils of different abilities to enable them to attend the most appropriate level of further education and ii) as a comparator of achievement between different schools so that quality control can be achieved and resources directed where appropriate. As such, one would want and expect grades to follow a normal distribution. Having seen data from Ofqual this is far from the case at present.
The current situation achieves neither goal. From my experience it is almost impossible to distinguish between bright and exceptional candidates for University entry based on exam results. Whatever else, it is IMO vital that our ablest students are given the opportunity to study at our best universities. I am afraid this is not currently the case. Furthermore, the compression of grades and the failure to standardise and normalise the results against previous years mean that exams are less useful as a means of internal quality control.
While I admire my son’s Maths teacher who teaches his admittedly very bright cohort of students that the C word is to be avoided at all costs (curriculum), this is a luxury that can be afforded by the top set in Prof Gowers’ alma mater, but, one suggests, is not likely to find widespread acceptance!
What is the purpose to the student of any exam at 16, now that education to 18 is compulsory? Why not abolish the GCSE altogether and replace it with a series of standardised tests for quality control with international benchmarking at 11 and 16? At 18 there are different options including a US-style SAT, or a proper Baccalaureate, which, maybe with the aid of specialist extension papers and technical elements, could give a wide range of scores and identify the best students, allowing the educators to give a broad and appropriate level of education across the board.
In an analysis like Dominic’s, errors and distortions are rarely deliberate. Most sneak in because of starting points that appear self-evident, but which may be inappropriate.
Dominic takes for granted the relevance of differences in “ability” when he states his first aim
“to a) discover what pupils of different ability, using objective measures, are capable of achieving given certain teaching”.
This assumption may seem relevant at degree level – but it is hard to know whether this is merely due to “attainment”, or whether it stems from “ability”. (It is worth reading John Mighton on “The myth of ability”.) But the idea may be irrelevant at primary school.
On the level of pub debate, few would argue that everyone is equal. But this does not automatically warrant the assumption behind Dominic’s stated goal a).
Without defining “ability”, we may tentatively agree that a maths teacher faced with Tim Gowers and Dominic in the same class could be forgiven for discerning different mathematical talents. But that does *not* imply that – at least up to some level – they could not both profitably follow the same basic curriculum. (Dominic might supplement it with a dash of rhetoric, and Tim might add some harder problems: but there may be excellent social reasons – at least up to some level – to stick to the same prescribed content for both.)
How far one can push this deserves serious attention. The lesson from other countries is absolutely clear: one may eventually have to allow for some divergence (at age 13, or age 16 or age 18); but we in England are simply *wrong* to presume differences ab initio, and to exacerbate them.
There is no need to deny the relevance of “ability” altogether: all I am suggesting is that it should not be injected up front. An evidence-based approach should start by seeing how far one can get without appealing to such ill-defined and uncomprehended concepts.
Appealing to “ability” is the lazy man’s way of “explaining” observed outcomes: it avoids the uncomfortable possibility that individual and institutional educational failure may be due to social and political neglect, or to bad teaching, bad schools, bad assessment, and bad inspection. If (as is the case) a teacher can take a bottom set and get them to outperform many in higher sets, then we should hold off distorting our starting assumptions by giving the elusive (and possibly irrelevant) notion of “ability” a dominant position ab initio.
I think your argument is confused.
1. There is a vast science of ‘ability’. It is barely known about in the education world because it is rejected ab initio on political grounds and does not feature in teacher training or CPD. But it is there. Why do scientists pay attention to it? Because it makes accurate predictions – something v rare in education. I will go through this in a future blog but you are just wrong to think that ability is an ‘ill-defined’ concept.
If you want to get a very up-to-date explanation of the cutting edge of this science, then read this by a physicist now working with Plomin et al to find the genes responsible for the distribution curve of ‘g’. (PS. As a mathematician you may also find it interesting to see a physicist applying new maths from ‘compressed sensing’ to genetic problems. Most biologists will have no knowledge of this field – never mind most people in education/sociology/economics/journalism etc who dominate these debates.)
Link: 1408.3421v2.pdf
2. Precisely because of the accuracy of predictions made on the basis of ‘ability’, it is important and would be very useful to know with some precision the differences even a brilliant system should expect to see from people -3/-2/-1/average/+1/+2/+3 SD of ability. Everyone now argues about things like ‘when should kids be able to do 12×12, or the quadratic equation, or calculus etc’ on a totally circular basis. This is not just unscientific – it is also wasteful and damaging (both because some are wrongly ‘accelerated’ and others bored out of their skulls, like poor TG would be if he had to keep doing trivial drills while the teacher explains things to me!).
3. The point of doing the research is NOT to write people off. It is obvious that you have a good point about how other countries try to get a very large proportion of pupils to master basics and do not say ‘X is not mathematical’ etc. Ironically, the science of ability – psychometrics and genetics – actually provides an interesting argument for you here. In England, there has been a strong idea that some people are ‘clever but not mathematical’. Genetic studies suggest this is a bad idea because genes have broad effects – if a kid is brilliant at English but bad at maths, the conclusion should NOT be ‘they’re not mathematical, give up’, but ‘the teaching is probably bad and that is where attention should focus’. It is not ‘wrong to presume differences ab initio’. Differences ARE demonstrably there ab initio as genetics shows beyond argument. But that does NOT mean that the ‘try to get everybody to master basics’ approach is wrong. The issue is – we do not know how to define ‘basics’. It is defining ‘basics’ – not defining ‘ability’ – that is the problem.
4. Absorbing what biology tells us does not mean avoiding uncomfortable possibilities re institutional failure, political neglect etc. This is a very common fallacy in the education world but it is illogical. One can accept clear biological evidence AND face institutional and political failures. It is a very bad idea that one has to choose between ignoring science and ignoring political / institutional failure. A rational person with a scientific perspective on the world should not ignore either but should go where the evidence leads…
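To give a sense of scale for the -3 to +3 SD bands mentioned in point 2, here is a minimal sketch of how a cohort would split across those bands. It rests on assumptions that are not in the discussion above: that the ability measure is normally distributed, that each band is one SD wide and centred on the whole number (with the two end bands open-ended), and that the cohort is a round, purely illustrative 600,000 pupils.

```python
from statistics import NormalDist


def band_shares(centres=(-3, -2, -1, 0, 1, 2, 3), cohort_size=600_000):
    """Share and head-count of a cohort in each one-SD-wide band.

    Assumes a standard normal ability measure. cohort_size is a
    hypothetical figure chosen only to make the shares tangible.
    """
    dist = NormalDist()  # mean 0, SD 1
    rows = []
    last = len(centres) - 1
    for i, centre in enumerate(centres):
        # End bands absorb the tails so that the shares sum to 1.
        lower = dist.cdf(centre - 0.5) if i > 0 else 0.0
        upper = dist.cdf(centre + 0.5) if i < last else 1.0
        share = upper - lower
        rows.append((centre, share, round(share * cohort_size)))
    return rows


rows = band_shares()
for centre, share, pupils in rows:
    print(f"{centre:+d} SD band: {share:6.2%} of cohort, about {pupils:,} pupils")
```

If the normality assumption is roughly right, most pupils sit in the middle three bands and the extreme bands are small, which is one reason research on what each band can achieve would need large samples.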
I have to ask: why are we not automatically giving students two axes of grading? We do it in good primary school reports – Jr got an A for effort and a B for attainment. Somewhere along the line this approach is thrown away, but it’s actually useful information:
1) Important for the student because an honest assessment of their diligence builds a life skill of being able to assess themselves.
2) As someone who has done a lot of work with some MBA courses about admissions and outcomes, I can tell you that issues like “grit” and “conscientiousness” have relevance for education and business outcomes. B for attainment and A for effort vs A for attainment and C for effort is actually useful information – especially when we get away from the endlessly fetishised top end. (I have a friend who lectures in Engineering at Cambridge, so I’m not unaware of how the system is failing at that end, but it’s by no means the whole issue for “UK plc.” The quality of the system for the mid-range matters too.)