On the referendum #33: High performance government, ‘cognitive technologies’, Michael Nielsen, Bret Victor, & ‘Seeing Rooms’

‘People, ideas, machines — in that order!’ Colonel Boyd.

‘The main thing that’s needed is simply the recognition of how important seeing is, and the will to do something about it.’ Bret Victor.

‘[T]he transfer of an entirely new and quite different framework for thinking about, designing, and using information systems … is immensely more difficult than transferring technology.’ Robert Taylor, one of the handful of people most responsible for the creation of the internet and personal computing, and an inspiration to Bret Victor.

‘[M]uch of our intellectual elite who think they have “the solutions” have actually cut themselves off from understanding the basis for much of the most important human progress.’ Michael Nielsen, physicist. 

Introduction

This blog looks at an intersection of decision-making, technology, high performance teams and government. It sketches some ideas of physicist Michael Nielsen about cognitive technologies, and of computer visionary Bret Victor about the creation of dynamic tools to help understand complex systems and ‘argue with evidence’, such as tools for authoring ‘dynamic documents’ and ‘Seeing Rooms’ for decision-makers, i.e. rooms designed to support decisions in complex environments. It compares normal Cabinet rooms, such as those used in summer 1914 or October 1962, with state-of-the-art Seeing Rooms. There is very powerful feedback between a) creating dynamic tools to see complex systems more deeply (to see inside, see across time, and see across possibilities), thus making it easier to work with reliable knowledge and interactive quantitative models, semi-automating error-correction and so on, and b) the potential for big improvements in the performance of political and government decision-making.

It is relevant to Brexit and anybody thinking ‘how on earth do we escape this nightmare’ but 1) these ideas are not at all dependent on whether you support or oppose Brexit, about which reasonable people disagree, and 2) they are generally applicable to how to improve decision-making — for example, they are relevant to problems like ‘how to make decisions during a fast moving nuclear crisis’ which I blogged about recently, or if you are a journalist ‘what future media could look like to help improve debate of politics’. One of the tools Nielsen discusses is a tool to make memory a choice by embedding learning in long-term memory rather than, as it is for almost all of us, an accident. I know from my days working on education reform in government that it’s almost impossible to exaggerate how little those who work on education policy think about ‘how to improve learning’.

Fields make huge progress when they move from stories (e.g Icarus) and authority (e.g ‘witch doctor’) to evidence/experiment (e.g physics, wind tunnels) and quantitative models (e.g design of modern aircraft). Political ‘debate’ and the processes of government are largely what they have always been: conflict over stories and authorities, where almost nobody even tries to keep track of the facts/arguments/models they’re supposedly arguing about, tries to learn from evidence, or tries to infer useful principles from examples of extreme success/failure. We can see much better than people could in the past how to shift the processes of government towards ‘partially rational discussion over facts and models and learning from the best examples of organisational success’. But one of the most fundamental and striking aspects of government is that practically nobody involved in it has the faintest interest in or knowledge of how to create high performance teams to make decisions amid uncertainty and complexity. This blindness is connected to another fundamental fact: critical institutions (including the senior civil service and the parties) are programmed to fight to stay dysfunctional; they fight to stay closed and avoid learning about high performance, and they fight to exclude the most able people.

I wrote about some reasons for this before the referendum (cf. The Hollow Men). The Westminster and Whitehall response was along the lines of ‘natural party of government’, ‘Rolls Royce civil service’ blah blah. But the fact that Cameron, Heywood (the most powerful civil servant) et al did not understand many basic features of how the world works is why I and a few others gambled on the referendum — we knew that the systemic dysfunction of our institutions and the influence of grotesque incompetents provided an opportunity for extreme leverage. 

Since then, after three years in which the parties, No10 and the senior civil service have imploded (after doing the opposite of what Vote Leave said should happen on every aspect of the negotiations), one thing has held steady: Insiders refuse to ask basic questions about the reasons for this implosion. For example: why did Heywood not even put together a sane regular weekly meeting schedule, and why did ministers not even notice all the tricks with agendas/minutes etc? How are decisions really made in No10? Why are so many of the people below some cognitive threshold for understanding basic concepts (cf. the current GATT A24 madness)? What does it say about Westminster that both the Adonis-Remainers and the Cash-ERGers have become more detached from reality, while a large section of the best-educated have effectively run information operations against their own brains to convince themselves of fairy stories about Facebook, Russia and Brexit?

It’s a mix of amusing and depressing — but not surprising to me — to hear Heywood explain HERE how the British state decided it couldn’t match the resources of a single multinational company or a single university in funding people to think about what the future might hold, which is linked to his failure to make serious contingency plans for losing the referendum. And of course Heywood claimed after the referendum that we didn’t need to worry about the civil service because on project management it has ‘nothing to learn’ from the best private companies. The elevation of Heywood in the pantheon of SW1 is the elevation of the courtier-fixer at the expense of the thinker and the manager — the universal praise for him recently is a beautifully eloquent signal that those in charge are the blind leading the blind and SW1 has forgotten skills of high value, the skills of public servants such as Alanbrooke or Michael Quinlan.

This blog is hopefully useful for some of those thinking about a) improving government around the world and/or b) ‘what comes after the coming collapse and reshaping of the British parties, and how to improve drastically the performance of critical institutions?’

Some old colleagues have said ‘Don’t put this stuff on the internet, we don’t want the second referendum mob looking at it.’ Don’t worry! Ideas like this have to be forced down people’s throats practically at gunpoint. Silicon Valley itself has barely absorbed Bret Victor’s ideas, so how likely is it that there will be a rush to adopt them by the world of Blair and Grieve?! These guys can’t tell the difference between courtier-fixers and people with models for truly effective action like General Groves (HERE). Not one in a thousand will read a 10,000 word blog on the intersection of management and technology, and the few who do will dismiss it as the babbling of a deluded fool; they won’t learn any more than they learned from the 2004 referendum or from Vote Leave. And if I’m wrong? Great. Things will improve fast and a second referendum based on both sides applying lessons from Bret Victor would be dynamite.

NB. Bret Victor’s project, Dynamic Land, is a non-profit. For an amount of money that a government department like the Department for Education loses weekly without any minister realising it’s lost (in the millions per week in my experience, because the quality of financial control is so bad), it could provide crucial funding for Victor and help itself. Of course, any minister who proposed such a thing would be told by officials ‘this is illegal under EU procurement law and remember minister that we must obey EU procurement law forever regardless of Brexit’. I know from experience that officials say this to ministers, whether it is legal or not, when they don’t like something. And after all, ministers meekly accepted the Kafkaesque order from Heywood to prioritise duties of goodwill to the EU under A50 over preparations to leave A50, so habituated had Cameron’s children become to obeying the real deputy prime minister…

Below are 4 sections:

  1. The value found in intersections of fields
  2. Some ideas of Bret Victor
  3. Some ideas of Michael Nielsen
  4. A summary

*

1. Extreme value is often found in the intersection of fields

The legendary Colonel Boyd (he of the ‘OODA loop’) would shout at audiences ‘People, ideas, machines — in that order.’ The fundamental political problems we face require large improvements in the quality of all three and, harder, systems to integrate all three. Such improvements require looking carefully at the intersection of roughly five entangled areas of study. Extreme value is often found at such intersections.

  • Explore what we know about the selection, education and training of people for high performance (individual/team/organisation) in different fields. We should be selecting people much deeper in the tails of the ability curve: people who are +3 (~1:1,000) or +4 (~1:30,000) standard deviations above average on intelligence, relentless effort, operational ability and so on (now practically entirely absent from the ‘50 most powerful people in Britain’). We should train them in the general art of ‘thinking rationally’ and making decisions amid uncertainty (e.g Munger/Tetlock-style checklists, exercises on the SlateStarCodex blog). We should train them in the practical reasons for normal ‘mega-project failure’ and in case studies such as the Manhattan Project (General Groves), ICBMs (Bernard Schriever), Apollo (George Mueller) and ARPA-PARC (Robert Taylor) that illustrate how the ‘unrecognised simplicities’ of high performance bring extreme success, and we should make them work on such projects before they are responsible for billions, rather than putting people like Cameron in charge (after no experience other than bluffing through PPE then PR). NB. China’s leaders have studied these episodes intensely while American and British institutions have actively ‘unlearned’ these lessons.
  • Explore the frontiers of the science of prediction across different fields from physics to weather forecasting to finance and epidemiology. For example, ideas from physics about early warning systems in physical systems have application in many fields, including questions like: to what extent is it possible to predict which news will persist over different timescales, or predict wars from news and social media? There is interesting work combining game theory, machine learning, and Red Teams to predict security threats and improve penetration testing (physical and cyber). The Tetlock/IARPA project showed dramatic performance improvements in political forecasting are possible, contra what people such as Kahneman had thought possible. A recent Nature article by Duncan Watts explained fundamental problems with the way normal social science treats prediction and suggested new approaches — which have been almost entirely ignored by mainstream economists/social scientists. There is vast scope for applying ideas and tools from the physical sciences and data science/AI — largely ignored by mainstream social science, political parties, government bureaucracies and media — to social/political/government problems (as Vote Leave showed in the referendum, though this has been almost totally obscured by all the fake news: clue — it was not ‘microtargeting’).
  • Explore technology and tools. For example, Bret Victor’s work and Michael Nielsen’s work on cognitive technologies. The edge of performance in politics/government will be defined by teams that can combine the ancient ‘unrecognised simplicities of high performance’ with edge-of-the-art technology. No10 is decades behind the pace in old technologies like TV, doesn’t understand simple tools like checklists, and is nowhere with advanced technologies.
  • Explore the frontiers of communication (e.g crisis management, applied psychology). Technology enables people to improve communication with unprecedented speed, scale and iterative testing. It also allows people to wreak chaos with high leverage. The technologies are already beyond the ability of traditional centralised government bureaucracies to cope with. They will develop rapidly, such that most centralised bureaucracies lose more and more control while a few high performance governments use the leverage they bring (cf. China’s combination of mass surveillance, AI, genetic identification, cellphone tracking etc as they desperately scramble to keep control). The better educated think that psychological manipulation is something that happens to ‘the uneducated masses’, but they are extremely deluded. In many ways people like FT pundits are much easier to manipulate; their education actually makes them more susceptible, and historically they are the ones who fall for things like Russian fake news (cf. the Guardian and New York Times on Stalin/terror/famine in the 1930s), just as now they fall for fake news about fake news. Despite the centrality of communication to politics, it is remarkable how little attention Insiders pay to what works, never mind the question ‘what could work much better?’. The fact that so much of the media believes total rubbish about social media and Brexit shows that the media is incapable of analysing the intersection of politics and technology. It is obviously bad that the media disinforms the public, but the only rational planning assumption is that this problem will continue and even get worse. The media cannot explain well either the use of TV or traditional polling, although these have been extremely important for over 70 years, and there is no trend towards improvement, so a sound planning assumption is surely that the media will do even worse with new technologies and data science.
This will provide large opportunities for good and evil. A new approach able to adapt to the environment an order of magnitude faster than now would disorient political opponents (desperately scrolling through Twitter) to such a degree — in Boyd’s terms it would ‘collapse their OODA loops’ — that it could create crucial political space for focus on the extremely hard process of rewiring government institutions which now seems impossible for Insiders to focus on given their psychological/operational immersion in the hysteria of 24 hour rolling news and the constant crises generated by dysfunctional bureaucracies.
  • Explore how to re-program political/government institutions at the apex of decision-making authority so that a) people are more incentivised to optimise things we want them to optimise, like error-correction and predictive accuracy, and less incentivised to optimise bureaucratic process, prestige, and signalling as our institutions now do; b) institutions are incentivised to build high performance teams rather than make this practically illegal at the apex of government; and c) we have ‘immune systems’ based on decentralisation and distributed control to minimise the inevitable failures of even the best people and teams.
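The ‘1:1,000’ and ‘1:30,000’ figures in the first bullet are one-tailed tail areas of a normal distribution. A minimal Python sketch reproduces the arithmetic, assuming the trait in question is normally distributed (an idealisation); the function names are illustrative:

```python
import math


def upper_tail(z: float) -> float:
    """P(Z > z) for a standard normal variable Z (assumes a normal distribution)."""
    # The complementary error function gives the tail area: P(Z > z) = erfc(z / sqrt(2)) / 2
    return 0.5 * math.erfc(z / math.sqrt(2))


def one_in(z: float) -> float:
    """On average, how many people there are per person at or above z standard deviations."""
    return 1.0 / upper_tail(z)


for z in (2, 3, 4):
    print(f"+{z} SD: about 1 in {one_in(z):,.0f}")
```

This gives roughly 1 in 740 for +3 SD and roughly 1 in 31,600 for +4 SD, so the rounded figures in the text are in the right ballpark (the +3 figure is a generous rounding).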

Example 1: Red Teams and pre-mortems can combat groupthink and normal cognitive biases but they are practically nowhere in the formal structure of governments. There is huge scope for a Parliament-mandated small and extremely elite Red Team operating next to, and in some senses above, the Cabinet Office to ensure diversity of opinions, fight groupthink and other standard biases, make sure lessons are learned and so on. Cost: a few million that it would recoup within weeks by stopping blunders.

Example 2: prediction tournaments/markets could improve policy and project management, with people able to ‘short’ official delivery timetables — imagine being able to short Grayling’s transport announcements, for example. In many areas new markets could help — e.g markets to allow shorting of house prices to dampen bubbles, as Chris Dillow and others have suggested. The way in which the IARPA/Tetlock work has been ignored in SW1 is proof that MPs and civil servants are not actually interested in — or incentivised to be interested in — who is right, who is actually an ‘expert’, and so on. There are tools available if new people do want to take these things seriously. Cost: a few million at most, possibly thousands, that it would recoup within a year by stopping blunders.
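Tetlock’s forecasting tournaments scored forecasters with the Brier score: the mean squared difference between each stated probability and what actually happened (1 or 0). A minimal sketch of the common yes/no version follows. The ‘timetable’ forecasts are invented for illustration, not real data; they show how someone who in effect ‘shorts’ an overconfident official timetable is rewarded when the timetable slips.

```python
from typing import List


def brier_score(forecasts: List[float], outcomes: List[int]) -> float:
    """Mean squared difference between probabilities and 0/1 outcomes (lower is better)."""
    if len(forecasts) != len(outcomes):
        raise ValueError("need exactly one outcome per forecast")
    if not all(0.0 <= p <= 1.0 for p in forecasts):
        raise ValueError("forecasts must be probabilities in [0, 1]")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


# Hypothetical questions: "will project X be delivered on the announced date?"
outcomes = [0, 0, 1]             # 1 = delivered on time, 0 = late (invented)
official = [0.95, 0.9, 0.95]     # an overconfident official timetable (invented)
sceptic = [0.3, 0.2, 0.6]        # a forecaster betting against the timetable (invented)

print("official:", brier_score(official, outcomes))
print("sceptic: ", brier_score(sceptic, outcomes))
```

Brier’s original definition sums over both outcomes, which for yes/no questions exactly doubles these values (range 0 to 2). Either way lower is better, and confident wrong answers are punished hardest.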

Example 3: we need to consider projects that could bootstrap new international institutions that help solve more general coordination problems such as the risk of accidental nuclear war. The most obvious example of a project like this I can think of is a manned international lunar base, which would be useful for a) basic science, b) the practical purposes of building urgently needed near-Earth infrastructure for space industrialisation, and c) forcing the creation of new practical international institutions for cooperation between Great Powers. George Mueller’s team that put man on the moon in 1969 developed a plan to do this that would have been built by now if their plans had not been tragically abandoned in the 1970s. Jeff Bezos is explicitly trying to revive the Mueller vision and Britain should be helping him do it much faster. The old institutions like the UN and EU, built on early 20th Century assumptions about the performance of centralised bureaucracies, are incapable of solving global coordination problems. It seems to me that institutions with the qualities we need are much more likely to emerge out of solving big problems than out of think tank papers about reforming existing institutions. Cost = 10s/100s of billions, return = trillions, or near infinite if shifting our industrial/psychological frontiers into space drastically reduces the chances of widespread destruction.

A) Some fields have fantastic predictive models and there is a huge amount of high quality research, though there is a lot of low-hanging fruit in bringing methods from one field to another.

B) We know a lot about high performance including ‘systems management’ for complex projects but very few organisations use this knowledge and government institutions overwhelmingly try to ignore and suppress the knowledge we have.

C) Some fields have amazing tools for prediction and visualisation, but very few organisations use these tools and almost nobody in government does (where colour photocopying is a major challenge).

D) We know a lot about successful communication but very few organisations use this knowledge and most base action on false ideas. E.g political parties spend millions on spreading ideas but almost nothing on thinking about whether the messages are psychologically compelling or whether their methods/distribution work, and TV companies spend billions on news but almost nothing on understanding what science says about how to convey complex ideas. Hence you see massively overpaid presenters like Evan Davis babbling metaphors like ‘economic takeoff’ in front of an airport while his crew films a plane ‘taking off’, or ‘the economy down the plughole’ with pictures of a plughole.

E) Many thousands worldwide are thinking about all sorts of big government issues but very few can bring them together into coherent plans that a government can deliver and there is almost no application of things like Red Teams and prediction markets. E.g it is impossible to describe the extent to which politicians in Britain do not even consider ‘the timetable and process for turning announcement X into reality’ as something to think about — for people like Cameron and Blair the announcement IS the only reality and ‘management’ is a dirty word for junior people to think about while they focus on ‘strategy’. As I have pointed out elsewhere, it is fascinating that elite business schools have been collecting billions in fees to teach their students WRONGLY that operational excellence is NOT a source of competitive advantage, so it is no surprise that politicians and bureaucrats get this wrong.

But I can see almost nobody integrating the very best knowledge we have about A+B+C+D with E and I strongly suspect there are trillion dollar bills lying on the ground that could be grabbed for trivial cost — trillion dollar bills that people with power are not thinking about and are incentivised not to think about. I might be wrong but I would remind readers that Vote Leave was itself a bet on this proposition being right and I think its success should make people update their beliefs on the competence of elite political institutions and the possibilities for improvement.

Here I want to explore one set of intersections — the ideas of Bret Victor and Michael Nielsen.

*

2. Bret Victor: Cognitive technologies, dynamic tools, interactive quantitative models, Seeing Rooms — making it as easy to insert facts, data, and models in political discussion as it is to insert emoji 

In the 1960s visionaries such as Joseph Licklider, Robert Taylor and Doug Engelbart developed a vision of networked interactive computing that provided the foundation not just for new technologies (the internet, PC etc) but for whole new industries. Licklider, Sutherland, Taylor et al provided a model (ARPA) for how science funding can work. Taylor provided a model (PARC) of how to manage a team of extremely talented people who turned a profound vision into reality. The original motivation for the vision of networked interactive computing was to help humans make good decisions in a complex world: ‘augmenting human intelligence’ and ‘man-machine symbiosis’. This story shows how to make big improvements in the world with very few resources if they are structured right: PARC involved ~25 key people and tens of millions over roughly a decade and generated trillions of dollars in value. If interested in the history and the super-productive processes behind the success of ARPA-PARC read THIS.

It’s fascinating that in many ways the original 1960s Licklider vision has still not been implemented. The Silicon Valley ecosystem developed parts of the vision but not others for complex reasons I don’t understand (cf. The Future of Programming). One of those who is trying to implement parts of the vision that have not been implemented is Bret Victor. Bret Victor is a rare thing: a genuine visionary in the computing world according to some of those ‘present at the creation’ of ARPA-PARC such as Alan Kay. His ideas lie at critical intersections between fields sketched above. Watch talks such as Inventing on Principle and Media for Thinking the Unthinkable and explore his current project, Dynamic Land in Berkeley.

Victor has described, and now demonstrates in Dynamic Land, how existing tools fail and what is possible. His core principle is that creators need an immediate connection to what they are creating. Current programming languages and tools are mostly based on very old ideas from before computers even had screens, when there was essentially no interactivity; they date from the era of punched cards. They do not allow users to interact dynamically. New dynamic tools enable us to think previously unthinkable thoughts and allow us to see and interact with complex systems: to see inside, see across time, and see across possibilities.

I strongly recommend spending a few days exploring his whole website but I will summarise below his ideas on two things:

  1. His ideas about how to build new dynamic tools for working with data and interactive models.
  2. His ideas about transforming the physical spaces in which teams work so that dynamic tools are embedded in their environment — people work inside a tool.

Applying these ideas would radically improve how people make decisions in government and how the media reports politics/government.

Language and writing were cognitive technologies created thousands of years ago which enabled us to think previously unthinkable thoughts. Mathematical notation did the same over the past 1,000 years. For example, take a mathematics problem described by the 9th Century mathematician al-Khwarizmi (who gave us the word algorithm):

[Image: the problem as al-Khwarizmi stated it, in words]

Once modern notation was invented, this could be written instead as:

$x^2 + 10x = 39$
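Al-Khwarizmi solved this kind of problem with a recipe stated in words: halve the number of ‘roots’, square it, add it to the constant, take the square root, and subtract the half. In modern notation that is completing the square: $x^2 + 10x + 25 = 64$, so $(x + 5)^2 = 64$ and $x = 3$. A minimal Python sketch of the recipe, assuming (as he did) equations of the form $x^2 + bx = c$ with positive $b$ and $c$, and returning only the positive root; the function name is illustrative:

```python
import math


def solve_square_plus_roots(b: float, c: float) -> float:
    """Positive root of x^2 + b*x = c by completing the square (assumes b, c > 0)."""
    half = b / 2             # halve the number of roots (5 in the example)
    square = half * half     # square it (25)
    total = square + c       # add the constant (25 + 39 = 64)
    root = math.sqrt(total)  # take the square root (8)
    return root - half       # subtract the half (8 - 5 = 3)


print(solve_square_plus_roots(10, 39))  # 3.0
```

The negative root, $x = -13$, is simply not part of the recipe, which reflects how the problem was posed rather than a limitation of the notation.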

Michael Nielsen uses a similar analogy. Descartes and Fermat demonstrated that equations can be represented on a diagram and a diagram can be represented as an equation. This was a new cognitive technology, a new way of seeing and thinking: analytic (coordinate) geometry. Changes to the ‘user interface’ of mathematics were critical to its evolution and allowed us to think unthinkable thoughts (Using Artificial Intelligence to Augment Human Intelligence, see below).


Similarly, in the 18th Century data graphics were created to show trade figures. Before this, people could only read huge tables. This is the first data graphic:

[Image: the first data graphic, a chart of trade figures]

The Jedi of data visualisation, Edward Tufte, describes Charles Joseph Minard’s extraordinary graphic of Napoleon’s invasion of Russia as ‘probably the best statistical graphic ever drawn’. It shows the losses of Napoleon’s army. Starting from the Polish-Russian border, the thick band shows the size of the army at each position; the path of Napoleon’s winter retreat from Moscow is shown by the dark lower band, which is tied to temperature and time scales (you can see some of the disastrous icy river crossings famously described by Tolstoy). NB. The Cabinet makes life-and-death decisions now with technology far inferior to this graphic from the 19th Century (see below).

[Image: Minard’s graphic of Napoleon’s invasion of Russia]

If we look at contemporary scientific papers they represent extremely compressed information conveyed through a very old fashioned medium, the scientific journal. Printed journals are centuries old but the ‘modern’ internet versions are usually similarly static. They do not show the behaviour of systems in a visual interactive way so we can see the connections between changing values in the models and changes in behaviour of the system. There is no immediate connection. Everything is pretty much the same as a paper and pencil version of a paper. In Media for Thinking the Unthinkable, Victor shows how dynamic tools can transform normal static representations so systems can be explored with immediate feedback. This dramatically shows how much more richly and deeply ideas can be explored. With Victor’s tools we can interact with the systems described and immediately grasp important ideas that are hidden in normal media.

Picture: the very dense writing of a famous paper, Watts and Strogatz’s 1998 Nature paper on ‘small-world’ networks (by chance the paper itself is at the intersection of politics/technology, and Watts has written excellent stuff on fake news but has been ignored because it does not fit what ‘the educated’ want to believe)

[Image: the dense text of the original paper]

Picture: the same information presented differently. Victor’s tools make the information less compressed, so there’s less work for the brain to do ‘decompressing’. They do more than provide visualisations: the little ‘sliders’ over the graphics let you drag controls and interact with the data, so you see the connection between changing data and changing model. A dynamic tool transforms a scientific paper from ‘pencil and paper’ technology to modern interactive technology.

[Image: the same paper redesigned with Victor’s interactive graphics and sliders]
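The paper pictured appears to be Watts and Strogatz’s small-world paper, which Victor redesigned in his essay on scientific communication (we are inferring this from the pictures). Its central experiment is exactly what a slider lets you play with: start from a ring lattice, rewire each edge at random with probability p, and watch clustering C(p) and average path length L(p) change, normalised by their p = 0 values. A rough, stdlib-only Python sketch of that sweep follows; the rewiring step is simplified, the sizes are illustrative, and a printed table is a poor substitute for a slider, which is rather the point.

```python
import random
from collections import deque
from typing import List, Set

Graph = List[Set[int]]  # adjacency sets, one per node


def ring_lattice(n: int, k: int) -> Graph:
    """Join each of n nodes to its k nearest neighbours (k/2 on each side)."""
    adj: Graph = [set() for _ in range(n)]
    for i in range(n):
        for j in range(1, k // 2 + 1):
            adj[i].add((i + j) % n)
            adj[(i + j) % n].add(i)
    return adj


def rewire(adj: Graph, p: float, rng: random.Random) -> Graph:
    """Simplified Watts-Strogatz step: with probability p, reconnect an edge's far end at random."""
    n = len(adj)
    adj = [set(nbrs) for nbrs in adj]  # copy, so the lattice itself is untouched
    edges = [(i, j) for i in range(n) for j in sorted(adj[i]) if i < j]
    for i, j in edges:
        if rng.random() < p:
            # Exclude self-loops and existing neighbours (no duplicate edges).
            candidates = [m for m in range(n) if m != i and m not in adj[i]]
            if candidates:
                new = rng.choice(candidates)
                adj[i].discard(j)
                adj[j].discard(i)
                adj[i].add(new)
                adj[new].add(i)
    return adj


def avg_clustering(adj: Graph) -> float:
    """Mean fraction of each node's neighbour pairs that are themselves linked."""
    total = 0.0
    for nbrs in adj:
        d = len(nbrs)
        if d < 2:
            continue  # counted as 0 for nodes with fewer than two neighbours
        links = sum(1 for u in nbrs for v in nbrs if u < v and v in adj[u])
        total += links / (d * (d - 1) / 2)
    return total / len(adj)


def avg_path_length(adj: Graph) -> float:
    """Mean shortest-path length over all reachable ordered pairs (BFS from every node)."""
    dist_sum, pairs = 0, 0
    for s in range(len(adj)):
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        dist_sum += sum(dist.values())
        pairs += len(dist) - 1
    return dist_sum / pairs


if __name__ == "__main__":
    rng = random.Random(0)          # fixed seed so runs are repeatable
    lattice = ring_lattice(200, 6)  # illustrative sizes
    c0, l0 = avg_clustering(lattice), avg_path_length(lattice)
    for p in (0.0, 0.001, 0.01, 0.1, 1.0):  # the 'slider' positions
        g = rewire(lattice, p, rng)
        print(f"p={p:<6} C(p)/C(0)={avg_clustering(g) / c0:.2f}  L(p)/L(0)={avg_path_length(g) / l0:.2f}")
```

One would expect path length to fall sharply at small p while clustering stays high, which is the paper’s headline result; exact numbers depend on the seed and sizes.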

Victor’s essay on climate change

Victor explains in detail how policy analysis and public debate of climate change could be transformed. Leave aside the subject matter. Of course it’s extremely important: anybody interested in this issue will gain from reading the whole thing, and it would be great material for a school to use for an integrated science / economics / programming / politics project. But my focus is on his ideas about tools and thinking, not the specific subject matter.

Climate change is a great example to consider because it involves a) a lot of deep scientific knowledge, b) complex computer modelling which is understood in detail by a tiny fraction of 1% (and almost none of the social science trained ‘experts’ who are largely responsible for interpreting such models for politicians/journalists, cf HERE for the science of this), c) many complex political, economic, cultural issues, d) very tricky questions about how policy is discussed in mainstream culture, and e) the problem of how governments try to think about and act on important, complex, and long-term problems. Scientific knowledge is crucial but it cannot by itself answer the question: what to do? The ideas BV describes to transform the debate on climate change apply generally to how we approach all important political issues.

In the section Languages for technical computing, BV describes his overall philosophy (if you look at the original you will see dynamic graphics to help make each point but I can’t make them play on my blog — a good example of the failure of normal tools!):

‘The goal of my own research has been tools where scientists see what they’re doing in realtime, with immediate visual feedback and interactive exploration. I deeply believe that a sea change in invention and discovery is possible, once technologists are working in environments designed around:

  • ubiquitous visualization and in-context manipulation of the system being studied;
  • actively exploring system behavior across multiple levels of abstraction in parallel;
  • visually investigating system behavior by transforming, measuring, searching, abstracting;
  • seeing the values of all system variables, all at once, in context;
  • dynamic notations that embed simulation, and show the effects of parameter changes;
  • visually improvising special-purpose dynamic visualizations as needed.’
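Two of these bullets (seeing all system variables at once, and notations that show the effects of parameter changes) can be loosely approximated in a few lines of ordinary code. The sketch below is a hypothetical illustration of the underlying idea, not Victor’s tooling: a spreadsheet-like model in which changing one input immediately recomputes and returns every dependent value. The class, method and variable names, and the toy energy model, are assumptions made for this example.

```python
from typing import Callable, Dict, Set

# A formula receives a lookup function for other names and returns a number.
Formula = Callable[[Callable[[str], float]], float]


class Model:
    """A tiny spreadsheet-like model: named inputs plus named formulas over other names."""

    def __init__(self, inputs: Dict[str, float], formulas: Dict[str, Formula]) -> None:
        self.inputs = dict(inputs)
        self.formulas = dict(formulas)

    def evaluate(self) -> Dict[str, float]:
        """Recompute every variable from the current inputs, so nothing shown is stale."""
        values: Dict[str, float] = dict(self.inputs)
        in_progress: Set[str] = set()

        def get(name: str) -> float:
            if name in values:
                return values[name]
            if name in in_progress:
                raise ValueError(f"circular dependency involving {name!r}")
            in_progress.add(name)
            values[name] = self.formulas[name](get)  # evaluates dependencies on demand
            in_progress.discard(name)
            return values[name]

        for name in self.formulas:
            get(name)
        return values

    def set_input(self, name: str, value: float) -> Dict[str, float]:
        """Change one assumption and immediately return every recomputed value."""
        if name not in self.inputs:
            raise KeyError(f"{name!r} is not an input")
        self.inputs[name] = value
        return self.evaluate()


# Hypothetical toy model, purely for illustration.
model = Model(
    inputs={"power_kw": 2.0, "hours": 5.0, "price_per_kwh": 0.25},
    formulas={
        "energy_kwh": lambda get: get("power_kw") * get("hours"),
        "cost": lambda get: get("energy_kwh") * get("price_per_kwh"),
    },
)
print(model.evaluate())                # every variable, all at once
print(model.set_input("hours", 8.0))   # 'drag the slider': all dependants update
```

A real dynamic tool would draw these values next to the notation and recompute as a slider moves; the sketch only shows the property underneath, that no derived number can go stale.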

He then describes how the community of programming language developers has failed to create appropriate languages for scientists, which I won’t go into but which is fascinating.

He then describes the problem of how someone can usefully get to grips with a complex policy area involving technological elements.

‘How can an eager technologist find their way to sub-problems within other people’s projects where they might have a relevant idea? How can they be exposed to process problems common across many projects?… She wishes she could simply click on “gas turbines”, and explore the space:

  • What are open problems in the field?
  • Who’s working on which projects?
  • What are the fringe ideas?
  • What are the process bottlenecks?
  • What dominates cost? What limits adoption?
  • Why make improvements here? How would the world benefit?

‘None of this information is at her fingertips. Most isn’t even openly available — companies boast about successes, not roadblocks. For each topic, she would have to spend weeks tracking down and meeting with industry insiders. What she’d like is a tool that lets her skim across entire fields, browsing problems and discovering where she could be most useful…

‘Suppose my friend uncovers an interesting problem in gas turbines, and comes up with an idea for an improvement. Now what?

  • Is the improvement significant?
  • Is the solution technically feasible?
  • How much would the solution cost to produce?
  • How much would it need to cost to be viable?
  • Who would use it? What are their needs?
  • What metrics are even relevant?

‘Again, none of this information is at her fingertips, or even accessible. She’d have to spend weeks doing an analysis, tracking down relevant data, getting price quotes, talking to industry insiders.

‘What she’d like are tools for quickly estimating the answers to these questions, so she can fluidly explore the space of possibilities and identify ideas that have some hope of being important, feasible, and viable.

‘Consider the Plethora on-demand manufacturing service, which shows the mechanical designer an instant price quote, directly inside the CAD software, as they design a part in real-time. In what other ways could inventors be given rapid feedback while exploring ideas?’
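The wish for ‘tools for quickly estimating the answers’ can be made concrete with one common technique: propagating rough ranges through a simple cost model by Monte Carlo sampling. The sketch below asks how often an idea's unit cost comes in below what it would need to cost to be viable. It is not Victor's tool or Plethora's pricing method. The cost structure (materials plus labour, marked up by overhead), the uniform ranges, the placeholder numbers and all names are hypothetical assumptions for illustration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    """A rough guess at an uncertain quantity: 'somewhere between low and high'."""
    low: float
    high: float

    def sample(self, rng: random.Random) -> float:
        # Assumption: every value in the range is equally plausible (uniform).
        return rng.uniform(self.low, self.high)


def probability_viable(material: Range, labour: Range, overhead: Range,
                       viable_price: float, trials: int = 10_000, seed: int = 0) -> float:
    """Share of simulated scenarios in which the unit cost falls below the viable price.

    Hypothetical cost model: (material + labour) * (1 + overhead fraction).
    """
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    below = 0
    for _ in range(trials):
        unit_cost = (material.sample(rng) + labour.sample(rng)) * (1 + overhead.sample(rng))
        if unit_cost < viable_price:
            below += 1
    return below / trials


# Placeholder ranges, not real turbine data: change them and re-run to explore.
print(probability_viable(Range(40, 80), Range(20, 60), Range(0.1, 0.3), viable_price=120))
```

Because the answer comes back instantly, an inventor can widen a range or move the price target and immediately see whether the idea still has some hope of being viable.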

Victor then describes a public debate over a policy proposal. Ideas were put forward. Everybody argued.

‘Who to believe? The real question is — why are readers and decision-makers forced to “believe” anything at all? Many claims made during the debate offered no numbers to back them up. Claims with numbers rarely provided context to interpret those numbers. And never — never! — were readers shown the calculations behind any numbers. Readers had to make up their minds on the basis of hand-waving, rhetoric, bombast.’

And there was no progress because nobody could really learn from the debate or even just be clear about exactly what was being proposed. Sound familiar?!! This is absolutely normal and Victor’s description applies to over 99% of public policy debates.

Victor then describes how you can take the policy argument he had sketched and change its nature. Instead of discussing words and stories, DISCUSS INTERACTIVE MODELS. 

Here you need to click through to the original to understand the power of what he is talking about, as he programs a simple example.

‘The reader can explore alternative scenarios, understand the tradeoffs involved, and come to an informed conclusion about whether any such proposal could be a good decision.

‘This is possible because the author is not just publishing words. The author has provided a model — a set of formulas and algorithms that calculate the consequences of a given scenario… Notice how the model’s assumptions are clearly visible, and can even be adjusted by the reader.

‘Readers are thus encouraged to examine and critique the model. If they disagree, they can modify it into a competing model with their own preferred assumptions, and use it to argue for their position. Model-driven material can be used as grounds for an informed debate about assumptions and tradeoffs.

‘Modeling leads naturally from the particular to the general. Instead of seeing an individual proposal as “right or wrong”, “bad or good”, people can see it as one point in a large space of possibilities. By exploring the model, they come to understand the landscape of that space, and are in a position to invent better ideas for all the proposals to come. Model-driven material can serve as a kind of enhanced imagination.’

Victor then looks at some standard materials from those encouraging people to take personal action on climate change and concludes:

‘These are lists of proverbs. Little action items, mostly dequantified, entirely decontextualized. How significant is it to “eat wisely” and “trim your waste”? How does it compare to other sources of harm? How does it fit into the big picture? How many people would have to participate in order for there to be appreciable impact? How do you know that these aren’t token actions to assuage guilt?

‘And why trust them? Their rhetoric is catchy, but so is the horrific “denialist” rhetoric from the Cato Institute and similar. When the discussion is at the level of “trust me, I’m a scientist” and “look at the poor polar bears”, it becomes a matter of emotional appeal and faith, a form of religion.

‘Climate change is too important for us to operate on faith. Citizens need and deserve reading material which shows context — how significant suggested actions are in the big picture — and which embeds models — formulas and algorithms which calculate that significance, for different scenarios, from primary-source data and explicit assumptions.’
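Victor's request for material that ‘embeds models’ can be illustrated with a deliberately tiny Python sketch. Every assumption is a named, adjustable field; the functions compute the share of total emissions an action avoids, how many participants a given target would need, and a sweep across participation levels, so a proposal appears as one point in a space of possibilities. The field names and the example numbers are hypothetical placeholders, not real climate data.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Assumptions:
    """Every input the conclusion depends on, stated explicitly so a reader can change it."""
    saving_per_person: float  # emissions avoided per participant per year (any consistent unit)
    total_emissions: float    # total emissions per year, same unit
    participants: int         # how many people actually take the action


def share_of_total(a: Assumptions) -> float:
    """Fraction of total emissions avoided under these assumptions."""
    return a.saving_per_person * a.participants / a.total_emissions


def participants_needed(a: Assumptions, target_share: float) -> float:
    """How many participants would be needed to avoid `target_share` of the total."""
    return target_share * a.total_emissions / a.saving_per_person


def sweep_participation(a: Assumptions, levels: list[int]) -> list[tuple[int, float]]:
    """Vary one assumption and recompute, keeping all the other assumptions fixed."""
    return [(n, share_of_total(replace(a, participants=n))) for n in levels]


# Placeholder numbers for illustration only.
mine = Assumptions(saving_per_person=2.0, total_emissions=1000.0, participants=50)
# A critic who doubts the per-person saving builds a competing model by changing one field.
theirs = replace(mine, saving_per_person=0.5)
print(share_of_total(mine), share_of_total(theirs))
print(sweep_participation(mine, [10, 100, 1000]))
```

The disagreement between `mine` and `theirs` is now about one visible number, which is exactly the kind of debate about assumptions Victor describes.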

Even the supposed ‘pros’, insiders at the top of research fields in politically relevant areas, have to scramble around typing words into search engines, crawling around government websites, and scrolling through PDFs. Reliable data takes ages to find. Reliable models are even harder to find. Vast amounts of useful data and models exist but they cannot be found and used effectively because we lack the tools.

‘Authoring tools designed for arguing from evidence’

Why don’t we conduct public debates with interactive models, as his toy example does? Why aren’t paragraphs in supposedly serious online newspapers written like this? Partly because of the culture, including the education of those who run governments and media organisations, but also because the resources for creating this sort of material don’t exist.

‘In order for model-driven material to become the norm, authors will need data, models, tools, and standards…

‘Suppose there were good access to good data and good models. How would an author write a document incorporating them? Today, even the most modern writing tools are designed around typing in words, not facts. These tools are suitable for promoting preconceived ideas, but provide no help in ensuring that words reflect reality, or any plausible model of reality. They encourage authors to fool themselves, and fool others.

‘Imagine an authoring tool designed for arguing from evidence. I don’t mean merely juxtaposing a document and reference material, but literally “autocompleting” sourced facts directly into the document. Perhaps the tool would have built-in connections to fact databases and model repositories, not unlike the built-in spelling dictionary. What if it were as easy to insert facts, data, and models as it is to insert emoji and cat photos?

‘Furthermore, the point of embedding a model is that the reader can explore scenarios within the context of the document. This requires tools for authoring “dynamic documents” — documents whose contents change as the reader explores the model. Such tools are pretty much non-existent.’
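The ‘dynamic document’ idea can be sketched in a few lines of Python: facts carry their sources with them, the reader adjusts a parameter, and the sentence is regenerated from the model each time rather than typed once. All names here (`Fact`, `render`, `covered`) and the example fact are hypothetical; a real tool would need the fact databases and model repositories Victor describes, which this sketch does not attempt.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Fact:
    value: float
    source: str  # provenance travels with the number


def render(sentence: str, facts: dict[str, Fact], params: dict[str, float],
           model: Callable[[dict[str, float]], float]) -> str:
    """Recompute the model from sourced facts plus reader-chosen parameters, then fill the text."""
    inputs = {name: fact.value for name, fact in facts.items()} | params
    text = sentence.format(result=model(inputs), **inputs)
    sources = "; ".join(f"{name}: {fact.source}" for name, fact in facts.items())
    return f"{text} [Sources: {sources}]"


def covered(inputs: dict[str, float]) -> float:
    """Hypothetical model: how many people an uptake rate reaches."""
    return inputs["population"] * inputs["uptake"]


facts = {"population": Fact(1000.0, "hypothetical census table")}
sentence = "If {uptake:.0%} of people take part, {result:.0f} people are covered."

# Each time the reader moves a slider, the document calls render again with the new value.
for uptake in (0.25, 0.5):
    print(render(sentence, facts, {"uptake": uptake}, covered))
```

The design choice that matters is that the words are an output of the model, so they cannot silently drift away from the numbers or their sources.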

These sorts of tools for authoring dynamic documents should be seen as foundational technology like the integrated circuit or the internet.

‘Foundational technology appears essential only in retrospect. Looking forward, these things have the character of “unknown unknowns” — they are rarely sought out (or funded!) as a solution to any specific problem. They appear out of the blue, initially seem niche, and eventually become relevant to everything.

‘They may be hard to predict, but they have some common characteristics. One is that they scale well. Integrated circuits and the internet both scaled their “basic idea” from a dozen elements to a billion. Another is that they are purpose-agnostic. They are “material” or “infrastructure”, not applications.’

Victor ends with a very potent comment: that much of what we observe is ‘rearranging app icons on the deck of the Titanic’. Commercial incentives drive people towards trying to create ‘the next Facebook’, not towards fixing big social problems. I will address this below.

If you are an arts graduate interested in these subjects but not expert (like me), here is an example that will be more familiar… If you look at any big historical subject, such as ‘why/how did World War I start?’, and examine leading scholarship carefully, you will see that all the leading books on such subjects provide false chronologies and mix facts with errors such that it is impossible for a careful reader to be sure about crucial things. It is routine for famous historians to write that ‘X happened because Y’ when Y happened after X. Part of the problem is culture, but tools could potentially help. A very crude example: why doesn’t Kindle make it possible for readers to log factual errors, with users’ reliability ranked by others, so authors can easily check potential errors and fix them in online versions of books? Even better, this could be part of a larger system to develop gold standard chronologies with each ‘fact’ linked to original sources and so on. This would improve the reliability of historical analysis and create an ‘anti-entropy’ ratchet. At present, entropy means that errors spread across all books on a subject and there is no mechanism to reverse this…
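The most mechanical part of this, catching ‘X happened because Y’ when Y happened after X, becomes trivial once a chronology exists with dates attached. Here is a minimal Python sketch. The function name is hypothetical, the three dates are the standard dates for these events in summer 1914, and the second claim is deliberately wrong so that the check has something to catch.

```python
from datetime import date

# A tiny 'gold standard' chronology. In a real system each entry would link to original sources.
EVENTS = {
    "Franz Ferdinand assassinated": date(1914, 6, 28),
    "Austria-Hungary declares war on Serbia": date(1914, 7, 28),
    "Britain declares war on Germany": date(1914, 8, 4),
}


def impossible_causal_claims(events: dict[str, date],
                             claims: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return each (effect, cause) claim whose cause is dated strictly after its effect.

    Same-day pairs are not flagged, since dates alone cannot order them.
    An event missing from the chronology raises KeyError: it must be dated first.
    """
    return [(effect, cause) for effect, cause in claims if events[cause] > events[effect]]


claims = [
    ("Austria-Hungary declares war on Serbia", "Franz Ferdinand assassinated"),     # consistent order
    ("Austria-Hungary declares war on Serbia", "Britain declares war on Germany"),  # invented error
]
print(impossible_causal_claims(EVENTS, claims))
```

A check like this cannot tell whether a causal claim is true, only whether it is chronologically possible, but that alone would remove a whole class of routine errors.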

 

‘Seeing Rooms’: macro-tools to help make decisions

Victor also discusses another fundamental issue: the rooms/spaces in which most modern work and thinking occurs are not well-suited to the problems being tackled and we could do much better. Victor is addressing advanced manufacturing and robotics but his argument applies just as powerfully, perhaps more powerfully, to government analysis and decision-making.

Today, ‘software based tools are trapped in tiny rectangles’. We have very sophisticated tools but they all sit on computer screens on desks, like the screen you are reading this blog on.

In contrast, ‘Real-world tools are in rooms where workers think with their bodies.’ Traditional crafts occur in spatial environments designed for that purpose. Workers walk around, use their hands, and think spatially. ‘The room becomes a macro-tool they’re embedded inside, an extension of the body.’ These rooms act like tools to help them understand their problems in detail and make good decisions.

Picture: rooms designed for the problems being tackled


The 3D printing wave has produced ‘maker rooms’ and ‘Fab Labs’ where people work with a set of tools that are too expensive for an individual. The room is itself a network of tools. This approach is revolutionising manufacturing.

Why is this useful?

‘Modern projects have complex behavior… Understanding requires seeing and the best seeing tools are rooms.’ This is obviously particularly true of politics and government.

Here is a photo of a recent NASA mission control room. The room is set up so that all relevant people can see relevant data and models at different scales and preserve a common picture of what is important. NASA pioneered thinking about such rooms and the technology and tools needed in the 1960s.


Here are pictures of two control rooms for power grids.


Here is a panoramic photo of the unified control centre for the Large Hadron Collider – the biggest of ‘big data’ projects. Notice details like how they have removed all pillars so nothing interrupts visual communication between teams.


Now contrast these rooms with rooms from politics.

Here is the Cabinet room. I have been in this room. There are effectively no tools. In the 19th century, Lord Salisbury at least used the fireplace as a tool: he would walk around the table, gather sensitive papers, and burn them at the end of meetings. The fire is now blocked. The only other tool, the clock, did not work when I was last there. Over a century, the physical space in which politicians make decisions affecting potentially billions of lives has deteriorated.

British Cabinet room practically as it was in July 1914


Here are JFK and EXCOM making decisions during the Cuban Missile Crisis, which moved much faster than July 1914, potentially compressing decisions that could lead to the destruction of global civilisation into just minutes.


Here is the only photo in the public domain of the room known as ‘COBRA’ (Cabinet Office Briefing Room) where a shifting set of characters at the apex of power in Britain meet to discuss crises.


Notice how poor it is compared to NASA, the LHC etc. There has clearly been no attempt to learn from our best examples about how to use the room as a tool. The screens at the end are a late add-on to a room that is essentially indistinguishable from the room in which Prime Minister Asquith sat in July 1914, doodling notes to his girlfriend as he got bored. I would be surprised if the video technology is as good as what is commercially available more cheaply (the justification will be ‘security’), and I would bet that many of the decisions about the operation of this room would not survive scrutiny from experts in how to construct such rooms.

I have not attended a COBRA meeting but I’ve spoken to many who have. The meetings, as you would expect looking at this room, are often normal political meetings. That is:

  • aims are unclear,
  • assumptions are not made explicit,
  • there is no use of advanced tools,
  • there is no use of quantitative models,
  • discussions are often dominated by lawyers so many actions are deemed ‘unlawful’ without proper scrutiny (and this device is routinely used by officials to stop discussion of options they dislike for non-legal reasons),
  • there is constant confusion between policy, politics and PR,
  • then the cast disperses without clarity about what was discussed and agreed.

Here is a photo of the American equivalent – the Situation Room.


It has a few more screens but the picture is essentially the same: there are no interactive tools beyond the ability to speak to and see someone at a distance, which was invented back in the 1950s/1960s in the pioneering programs of SAGE (automated air defence) and Apollo (man on the moon). Tools to help thinking in powerful ways are not taken seriously. The room is largely the same, and decisions are made in the same way, as during the Cuban Missile Crisis. In some ways the use of technology now makes management worse, as it encourages Presidents and their staff to try to micromanage things they should not be managing, often in response to or fear of the media.

Individual ministers’ offices are also hopeless. The computers are old and rubbish. Even colour printing is often a battle. Walls are for kids’ pictures. In the DfE, officials resented even giving us paper maps of where schools were and only did it when bullied by the private office. It was impossible for officials to work on interactive documents. They had no technology even for sharing documents in a way that was then (2011) normal even in low-performing organisations. Using Google Docs was ‘against the rules’. (I’m told this has slightly improved.) The whole structure of ‘submissions’ and ‘red boxes’ is hopeless. It is extremely bureaucratic and slow. It prevents serious analysis of quantitative models. It reinforces the lack of proper scientific thinking in policy analysis. It guarantees confusion as ministers scribble notes and private offices interpret rushed comments by exhausted ministers after dinner, instead of having proper face-to-face meetings that get to the heart of problems and resolve conflicts quickly. The whole approach reinforces the abject failure of the senior civil service to think about high performance project management.

Of course, most of the problems with the standards of policy and management in the civil service are low or no-tech problems — they involve the ‘unrecognised simplicities’ that are independent of, and prior to, the use of technology — but all these things negatively reinforce each other. Anybody who wants to do things much better is scuppered by Whitehall’s entangled disaster zone of personnel, training, management, incentives and tools.

*

Dynamic Land: ‘amazing’

I won’t go into this in detail. Dynamic Land is in a building in Berkeley. I visited last year. It is Victor’s attempt to turn the ideas above into a sort of living laboratory. It is a large connected set of rooms that have computing embedded in surfaces. For example, you can scribble equations on a bit of paper and cameras in the ceiling will read your scribbles automatically, turn them into code, and execute them, for example by producing graphics. You can then physically interact with models that appear on the table or wall while the cameras watch your hands and instantly turn gestures into new code and change the graphics or whatever you are doing. Victor has put these cutting edge tools into a space and made it open to the Berkeley community. This is all hard to explain/understand because you haven’t seen anything like it, even in sci-fi films (it’s telling that the media still use Minority Report, from 2002, as their sci-fi illustration for such things).

This video gives a little taste. I visited with a physicist who works on the cutting edge of data science/AI. I was amazed but I know nothing about such things — I was interested to see his reaction as he scribbled gravitational equations on paper and watched the cameras turn them into models on the table in real-time, then he changed parameters and watched the graphics change in real-time on the table (projected from the ceiling): ‘Ohmygod, this is just obviously the future, absolutely amazing.’ The thought immediately struck us: imagine the implications of having policy discussions with such tools instead of the usual terrible meetings. Imagine discussing HS2 budgets or possible post-Brexit trading arrangements with the models running like this for decision-makers to interact with.

Video of Dynamic Land: the bits of coloured paper are ‘code’, graphics are projected from the ceiling

 



*

3. Michael Nielsen and cognitive technologies

Connected to Victor’s ideas are those of the brilliant physicist Michael Nielsen. Nielsen co-wrote the standard textbook on quantum computation and wrote a great book, Reinventing Discovery, on the evolution of the scientific method. For example, instead of waiting for the coincidence of Grossmann helping out Einstein with some crucial maths, new tools could create a sort of ‘designed serendipity’ to help potential collaborators find each other.

In his essay Thought as a Technology, Nielsen describes the feedback between thought and interfaces:

‘In extreme cases, to use such an interface is to enter a new world, containing objects and actions unlike any you’ve previously seen. At first these elements seem strange. But as they become familiar, you internalize the elements of this world. Eventually, you become fluent, discovering powerful and surprising idioms, emergent patterns hidden within the interface. You begin to think with the interface, learning patterns of thought that would formerly have seemed strange, but which become second nature. The interface begins to disappear, becoming part of your consciousness. You have been, in some measure, transformed.’

He describes how normal language and computer interfaces are cognitive technologies:

‘Language is an example of a cognitive technology: an external artifact, designed by humans, which can be internalized, and used as a substrate for cognition. That technology is made up of many individual pieces – words and phrases, in the case of language – which become basic elements of cognition. These elements of cognition are things we can think with…

‘In a similar way to language, maps etc, a computer interface can be a cognitive technology. To master an interface requires internalizing the objects and operations in the interface; they become elements of cognition. A sufficiently imaginative interface designer can invent entirely new elements of cognition… In general, what makes an interface transformational is when it introduces new elements of cognition that enable new modes of thought. More concretely, such an interface makes it easy to have insights or make discoveries that were formerly difficult or impossible. At the highest level, it will enable discoveries (or other forms of creativity) that go beyond all previous human achievement.’

Nielsen describes how powerful ways of thinking among mathematicians and physicists are hidden from view and not part of textbooks and normal teaching.

‘The reason is that traditional media are poorly adapted to working with such representations… If experts often develop their own representations, why do they sometimes not share those representations? To answer that question, suppose you think hard about a subject for several years… Eventually you push up against the limits of existing representations. If you’re strongly motivated – perhaps by the desire to solve a research problem – you may begin inventing new representations, to provide insights difficult through conventional means. You are effectively acting as your own interface designer. But the new representations you develop may be held entirely in your mind, and so are not constrained by traditional static media forms. Or even if based on static media, they may break social norms about what is an “acceptable” argument. Whatever the reason, they may be difficult to communicate using traditional media. And so they remain private, or are only discussed informally with expert colleagues.’

If we can create interfaces that reify deep principles, then ‘mastering the subject begins to coincide with mastering the interface.’ He gives the example of Photoshop which builds in many deep principles of image manipulation.

‘As you master interface elements such as layers, the clone stamp, and brushes, you’re well along the way to becoming an expert in image manipulation… By contrast, the interface to Microsoft Word contains few deep principles about writing, and as a result it is possible to master Word’s interface without becoming a passable writer. This isn’t so much a criticism of Word, as it is a reflection of the fact that we have relatively few really strong and precise ideas about how to write well.’

He then describes what he calls ‘the cognitive outsourcing model’: ‘we specify a problem, send it to our device, which solves the problem, perhaps in a way we-the-user don’t understand, and sends back a solution.’ For example, we ask Google a question and Google sends us an answer.

This is how most of us think about the idea of augmenting the human intellect but it is not the best approach. ‘Rather than just solving problems expressed in terms we already understand, the goal is to change the thoughts we can think.’

‘One challenge in such work is that the outcomes are so difficult to imagine. What new elements of cognition can we invent? How will they affect the way human beings think? We cannot know until they’ve been invented.

‘As an analogy, compare today’s attempts to go to Mars with the exploration of the oceans during the great age of discovery. These appear similar, but while going to Mars is a specific, concrete goal, the seafarers of the 15th through 18th centuries didn’t know what they would find. They set out in flimsy boats, with vague plans, hoping to find something worth the risks. In that sense, it was even more difficult than today’s attempts on Mars.

‘Something similar is going on with intelligence augmentation. There are many worthwhile goals in technology, with very specific ends in mind. Things like artificial intelligence and life extension are solid, concrete goals. By contrast, new elements of cognition are harder to imagine, and seem vague by comparison. By definition, they’re ways of thinking which haven’t yet been invented. There’s no omniscient problem-solving box or life-extension pill to imagine. We cannot say a priori what new elements of cognition will look like, or what they will bring. But what we can do is ask good questions, and explore boldly.’

In another essay, Using Artificial Intelligence to Augment Human Intelligence, Nielsen points out that breakthroughs in creating powerful new cognitive technologies such as musical notation or Descartes’ invention of algebraic geometry are rare, but ‘modern computers are a meta-medium enabling the rapid invention of many new cognitive technologies’ and, further, AI will help us ‘invent new cognitive technologies which transform the way we think.’

Further, powerful new cognitive technologies such as ‘Feynman diagrams’ have historically often appeared strange at first. We should not assume that new interfaces should be ‘user friendly’. Powerful interfaces that repay mastery may require sacrifices.

‘The purpose of the best interfaces isn’t to be user-friendly in some shallow sense. It’s to be user-friendly in a much stronger sense, reifying deep principles about the world, making them the working conditions in which users live and create. At that point what once appeared strange can instead become comfortable and familiar, part of the pattern of thought…

‘Unfortunately, many in the AI community greatly underestimate the depth of interface design, often regarding it as a simple problem, mostly about making things pretty or easy-to-use. In this view, interface design is a problem to be handed off to others, while the hard work is to train some machine learning system.

‘This view is incorrect. At its deepest, interface design means developing the fundamental primitives human beings think and create with. This is a problem whose intellectual genesis goes back to the inventors of the alphabet, of cartography, and of musical notation, as well as modern giants such as Descartes, Playfair, Feynman, Engelbart, and Kay. It is one of the hardest, most important and most fundamental problems humanity grapples with.

‘As discussed earlier, in one common view of AI our computers will continue to get better at solving problems, but human beings will remain largely unchanged. In a second common view, human beings will be modified at the hardware level, perhaps directly through neural interfaces, or indirectly through whole brain emulation.

‘We’ve described a third view, in which AIs actually change humanity, helping us invent new cognitive technologies, which expand the range of human thought. Perhaps one day those cognitive technologies will, in turn, speed up the development of AI, in a virtuous feedback cycle:


‘It would not be a Singularity in machines. Rather, it would be a Singularity in humanity’s range of thought… The long-term test of success will be the development of tools which are widely used by creators. Are artists using these tools to develop remarkable new styles? Are scientists in other fields using them to develop understanding in ways not otherwise possible?’

I would add: are governments using these tools to help them think in ways we already know are more powerful and to explore new ways of making decisions and shaping the complex systems on which we rely?

Nielsen also wrote a fascinating essay, ‘Augmenting long-term memory’. It describes using a computer tool (Anki) to aid long-term memory through ‘spaced repetition’, i.e. testing yourself at expanding intervals, which has been shown to counter the normal (for most people) process of forgetting. This allows humans to turn memory into a choice: we can decide what to remember and achieve it systematically (without a ‘weird/extreme gift’, which is how memory is normally treated). (It’s fascinating that educated Greeks 2,500 years ago could build sophisticated mnemonic systems allowing them to remember vast amounts, while almost all educated people now have no idea about such techniques.)

Connected to this, Nielsen also recently wrote an essay teaching fundamentals of quantum mechanics and quantum computers — but it is an essay with a twist:

‘[It] incorporates new user interface ideas to help you remember what you read… this essay isn’t just a conventional essay, it’s also a new medium, a mnemonic medium which integrates spaced-repetition testing. The medium itself makes memory a choice. This essay will likely take you an hour or two to read. In a conventional essay, you’d forget most of what you learned over the next few weeks, perhaps retaining a handful of ideas. But with spaced-repetition testing built into the medium, a small additional commitment of time means you will remember all the core material of the essay. Doing this won’t be difficult, it will be easier than the initial read. Furthermore, you’ll be able to read other material which builds on these ideas; it will open up an entire world…

‘Mastering new subjects requires internalizing the basic terminology and ideas of the subject. The mnemonic medium should radically speed up this memory step, converting it from a challenging obstruction into a routine step. Frankly, I believe it would accelerate human progress if all the deepest ideas of our civilization were available in a form like this.’

This obviously has very important implications for education policy. It also shows how computers could be used to improve learning — something that has generally been a failure since the great hopes at PARC in the 1970s. I have used Anki since reading Nielsen’s blog and I can feel it making a big difference to my mind/thoughts — how often is this true of things you read? DOWNLOAD ANKI NOW AND USE IT!
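The scheduling idea behind spaced repetition can be shown in a short Python sketch: each successful recall doubles the gap before the next test, and a failure resets it. This is a simplified Leitner-style rule chosen for clarity, not Anki's actual scheduler, which is more elaborate; `Card`, `review` and `due_cards` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Card:
    prompt: str
    answer: str
    due: date
    interval_days: int = 1


def review(card: Card, remembered: bool, today: date) -> None:
    """Reschedule a card after testing yourself on it."""
    if remembered:
        card.interval_days *= 2  # expanding gaps: 2, 4, 8, 16, ... days
    else:
        card.interval_days = 1   # forgotten: start the spacing again
    card.due = today + timedelta(days=card.interval_days)


def due_cards(cards: list[Card], today: date) -> list[Card]:
    """Cards whose next test is today or overdue."""
    return [c for c in cards if c.due <= today]


card = Card("What is spaced repetition?", "Testing yourself at expanding intervals",
            due=date(2019, 1, 1))
day = card.due
for _ in range(4):
    review(card, remembered=True, today=day)
    print(card.due, card.interval_days)
    day = card.due
```

Because the gaps double under this rule, keeping a card for roughly N days takes only about log2(N) successful reviews, which is why the daily time cost of remembering stays small.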

We need similarly creative experiments with new mediums that are designed to improve standards of high stakes decision-making.

*

4. Summary

We could create systems for those making decisions about m/billions of lives and b/trillions of dollars, such as Downing Street or The White House, that integrate inter alia:

  • Cognitive toolkits compressing already existing useful knowledge such as checklists for rational thinking developed by the likes of Tetlock, Munger, Yudkowsky et al.
  • A Nielsen/Victor research program on ‘Seeing Rooms’, interface design, authoring tools, and cognitive technologies. Start with bunging a few million to Victor immediately in return for allowing some people to study what he is doing and apply it in Whitehall, then grow from there.
  • An alpha data science/AI operation — tapping into the world’s best minds including having someone like David Deutsch or Tim Gowers as a sort of ‘chief rationalist’ in the Cabinet (with Scott Alexander as deputy!) — to support rational decision-making where this is possible and explain when it is not possible (just as useful).
  • Tetlock/Hanson prediction tournaments could easily and cheaply be extended to consider ‘clusters’ of issues around themes like Brexit to improve policy and project management.
  • Groves/Mueller style ‘systems management’ integrated with the data science team.
  • Legally entrenched Red Teams where incentives are aligned to overcoming groupthink and error-correction of the most powerful. Warren Buffett points out that public companies considering an acquisition should employ a Red Team whose fees are dependent on the deal NOT going ahead. This is the sort of idea we need in No10.
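Prediction tournaments need a way to score forecasters, and Tetlock's Good Judgment Project used Brier scores. Below is a minimal Python sketch of the common binary form: the mean squared difference between the stated probability and what happened (1 if the event occurred, 0 if not), so lower is better. Some tournaments use a variant summed over both outcomes that runs from 0 to 2. The forecaster names, questions and forecasts are hypothetical.

```python
def brier_score(forecasts: list[float], outcomes: list[int]) -> float:
    """Mean squared error between probability forecasts and 0/1 outcomes (0 is perfect)."""
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("need one outcome per forecast, and at least one forecast")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


def leaderboard(forecasters: dict[str, list[float]], outcomes: list[int]) -> list[tuple[str, float]]:
    """Rank forecasters from best (lowest score) to worst."""
    scores = {name: brier_score(f, outcomes) for name, f in forecasters.items()}
    return sorted(scores.items(), key=lambda item: item[1])


outcomes = [1, 0, 0, 1]  # what actually happened on four hypothetical questions
forecasters = {
    "confident pundit": [0.0, 1.0, 1.0, 0.0],   # always certain, always wrong here
    "hedger": [0.5, 0.5, 0.5, 0.5],             # never commits
    "calibrated analyst": [0.8, 0.2, 0.3, 0.7],
}
print(leaderboard(forecasters, outcomes))
```

Squaring the error punishes confident mistakes heavily, which is what lets a scoring rule like this separate real judgement from bold rhetoric.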

Researchers could see the real operating environment of decision-makers at the apex of power, the sort of problems they need to solve under pressure, and the constraints of existing centralised systems. They could start with the safe level of ‘tools that we already know work really well’, i.e. things like cognitive toolkits and Red Teams, while experimenting with new tools and new ways of thinking.

Hedge funds like Bridgewater and some other interesting organisations think about such ideas though without the sophistication of Victor’s approach. The world of MPs, officials, the Institute for Government (a cheerleader for ‘carry on failing’), and pundits will not engage with these ideas if left to their own devices.

This is not the place to go into how to change this. We know that the normal approach is doomed to produce the normal results and normal results applied to things like repeated WMD crises means disaster sooner or later. As Buffett points out, ‘If there is only one chance in thirty of an event occurring in a given year, the likelihood of it occurring at least once in a century is 96.6%.’ It is not necessary to hope in order to persevere: optimism of the will, pessimism of the intellect…
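Buffett's figure follows if the event has the same independent 1-in-30 chance every year (an assumption of the arithmetic, not a claim about how real crises behave). The chance of it never happening in 100 years is then (29/30) to the power 100, so:

```latex
P(\text{at least once in 100 years}) = 1 - \left(1 - \tfrac{1}{30}\right)^{100}
  = 1 - \left(\tfrac{29}{30}\right)^{100} \approx 1 - 0.0337 = 0.9663 \approx 96.6\%
```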

*

A final thought…

A very interesting comment that I have heard from some of the most important scientists involved in the creation of advanced technologies is that ‘artists see things first’ — that is, artists glimpse possibilities before most technologists and long before most businessmen and politicians.

Pixar came from a weird combination of George Lucas getting divorced and the visionary Alan Kay suggesting to Steve Jobs that he buy a tiny special effects unit from Lucas, which Jobs did with completely wrong expectations about what would happen. For unexpected reasons this tiny unit turned into a huge success — as Jobs put it later, he was ‘sort of snookered’ into creating Pixar. Now Alan Kay says he struggles to get tech billionaires to understand the importance of Victor’s ideas.

The same story repeats: genuinely new ideas that could create huge value always seem so odd that almost all people in almost all organisations cannot see new possibilities. If this is true in Silicon Valley, how much more true is it in Whitehall or Washington… 

If one were setting up a new party in Britain, one could incorporate some of these ideas. This would of course also require recruiting very different types of people from the norm in politics. The closed nature of Westminster/Whitehall combined with first-past-the-post means it is very hard to solve the coordination problem of how to break into this system with a new way of doing things. Even those interested in principle don’t want to commit to a 10-year (?) project that might get them blasted on the front pages. Vote Leave hacked the referendum but such opportunities are much rarer than VC-funded ‘unicorns’. On the other hand, arguably what is happening now is a once in 50 or 100 year crisis, and such crises are also the waves that can be ridden to change things normally unchangeable. A second referendum in 2020 is quite possible (or two referendums under PM Corbyn, propped up by the SNP?) and might be the ideal launchpad for a completely new sort of entity, not least because if it happens the Conservative Party may well not exist in any meaningful sense (whether there is or isn’t another referendum). It’s very hard to create a wave and it’s much easier to ride one. It’s more likely that in a few years you will see some of the above ideas in novels or movies or video games than in government (their pickup in places like hedge funds and intelligence services will be discreet), but you never know…

*

Ps. While I have talked to Michael Nielsen and Bret Victor about their ideas, in no way should this blog be taken to imply their involvement in anything to do with my ideas or plans, or their agreement with anything written above. I did not show this to them or even tell them I was writing about their work. We do not work together in any way. I have just read and listened to their work over a few years and thought about how their ideas could improve government.

Further Reading

If interested in how to make things work much better, read this (lessons for government from the Apollo project) and this (lessons for government from ARPA-PARC’s creation of the internet and PC).

Links to recent reports on AI/ML.

Should we trust London police (and therefore the Mayor’s/Home Office’s) claims on crime stats?

I just read a report in the Islington newspaper about crime statistics.

Ten days ago I was sitting outside a cafe in Islington.

Thanks to a few years of working in a nightclub and a couple of years in Russia followed by episodes like the referendum, I have developed a greater than average degree of paranoia. This is almost always irritating but occasionally useful. 

Typing away on my computer, I sensed a scooter’s noise was too close.

I looked up and my eyes locked on those of someone who was about one second away from driving his scooter into my table.

Between a mask across his nose and a hat, all I could see was young black male eyes (~15-25). I think it extremely unlikely I could have identified him in a lineup.

A second later his bike hit my table and he grabbed my laptop.

I grabbed it back, we wrestled, he nearly fell off his bike and after a half second pause when I thought ‘fuckhesabouttogetoffhisbikehithimfirst’ he whizzed off down the pavement, nearly smashing into a pedestrian.

This was seen by half a dozen people. Two were calling the cops within seconds. 

I stood on the street and cursed my stupidity in fighting over a laptop (no Carole, no evidence of global conspiracies there).

The cops told both witnesses they would come.

I hung around for an hour or so. They never came despite having been told the whole scene was captured on CCTV.

A few days later, as I was typing on my laptop inside the same cafe, the same scene played out.

I saw it through the window as a guy grabbed a laptop from a girl and whizzed off. She was sitting between two parents, each with a small child. Both parents were rightly worried about the potential for such attacks to lead to a collision between an escaping bike and a toddler. My wife and I often sit there with our toddler.

She called the cops and told them it would be on CCTV (I’d told her).

The cops said they’d come.

I hung around for an hour or so.

They never came.

These two incidents have happened after a spate of knife attacks in the half a square mile around this cafe and the colonisation of Rosemary Gardens by various gangs at various times of the day.

According to the Islington Gazette today (HERE), the local cops are claiming that moped crime is 60% down.

I know from my nightclub days that when local cops need to show a fall in crime for political reasons there are all sorts of ways in which they can easily cheat numbers.

As far as I understand it, neither of the two moped attacks above would be recorded in the stats. There was no attempt to watch CCTV footage or gather evidence once they knew the people concerned were not claiming injuries.

Should I trust official statistics such as those announced by Islington police today or am I right to be sceptical? Are there any serious statistical papers estimating what sort of errors are likely in such statistics (NB. polling companies often misstate the definition of ‘margin of error’ in their own polls, so it is common for fields to have ropey ideas about error rates)?

Apart from whether there is a rigorous process for gathering such statistics locally, is there a Red Team that acts across London to review local processes?

Do Corbyn and Thornberry (local MPs) believe the official statistics? Does the Mayor? Do the Home Secretary and Prime Minister? (I imagine that the Mayor’s office has no real capacity to interrogate official figures and is more or less completely reliant on what he is told?)

Ps. Later that day I called 101 to see what would happen if I tried to report it. A recorded message said long delays, use the website. I went to the website, started to fill it in, the page crashed, and I abandoned ship. Doubtless I should have pursued it harder to ensure it was recorded, but I guess my behaviour is roughly typical, so many similar incidents involving other people are probably not being recorded, which is my main point.

Pps. At the least, saying ‘we will come’ then not coming leads local people to conclude a) you can’t rely on the police, b) they’re giving up. So if they are not going to come, it would be wiser to say so and explain why. It’s always interesting when such basic processes are wrong. E.g. the way health systems needlessly kill thousands every year because they don’t use simple checklists to avoid central line infections. People in politics tend to spend far too much time on higher profile issues affecting few people and too little time on basic processes that affect thousands or millions and which we know how to do much better… Cf. blog on expertise, which is also relevant to the new money for the NHS.

On the referendum #18: the ECJ uses the Charter of Fundamental Rights to take more power over the UK, this is just the start

The European Court of Justice (ECJ) in Luxembourg today issued a series of judgments of considerable importance for the UK and the upcoming EU referendum.

Firstly, in a case called Delvigne, the ECJ held that some prisoners must obtain the vote in elections to the European Parliament. Rejecting the UK Government’s submissions and agreeing with those of the European Commission and Parliament, it held that it had jurisdiction under article 51 of the EU’s Charter of Fundamental Rights to rule on the legality of a French law depriving some prisoners of the vote. Overruling two recent decisions of the Supreme Court in R (Chester) v Lord President of the Council and Moohan v Lord Advocate, the ECJ held that the Charter gives EU citizens the right to vote in certain elections.

Although the ECJ accepted the French law was a ‘proportionate’ limitation of the right to vote, there is little doubt that the UK’s 1983 legislation is not ‘proportionate’ and is thus contrary to the Charter. The European Court of Human Rights in Strasbourg has consistently said as much, and article 52(3) of the Charter states that the Charter must be interpreted to conform to the European Convention on Human Rights as applied in Strasbourg. The ECJ found the French legislation was ‘proportionate’ because, unlike the UK, France only removed the vote from prisoners sentenced to more than five years, and gave them a right to review their disenfranchisement. The UK has no right of review, and removes the vote from all prisoners.

It is now a virtual certainty that some prisoners will gain the right to vote in European Parliamentary elections. Either the UK’s 1983 legislation will be set aside altogether, or prisoners denied the vote will win damages under the EU law doctrine of ‘state liability’. This is a legal certainty for as long as the European Communities Act 1972 remains in force, regardless of what the Government might say. In 2011, Parliament voted overwhelmingly against allowing any convicted prisoners the vote, and Cameron has said the prospect makes him ‘physically ill’.

Secondly, in a case called Schrems, the Court invalidated the 2000 EU-US safe harbour agreement on procedural grounds, with big implications for British companies which store data outside the EU. More importantly, the Court engaged in a significant discussion about the compatibility of government surveillance programmes with the Charter of Fundamental Rights (see paragraphs [87]-[95]). In a little-noticed case in July 2015, the Divisional Court in London struck down the Data Retention and Investigatory Powers Act 2014 for inconsistency with the Charter. Nothing in today’s ruling suggests the Government’s appeal against that ruling will be successful. In fact, the reverse is true. Despite ‘security’ being the theme of the Conservative Party conference, a foreign court, rather than the British Parliament, will now decide what is necessary to protect the UK’s national security. The EU has long desired to scupper the US-UK intelligence sharing agreement that has been in place since 1945. The ECJ now has the tool it needs to start doing this.

The ECJ has been given more power over the UK by the Charter than the US Supreme Court has over the American states. Although pro-EU lawyers claim the Charter only applies when the UK ‘implements EU law’, the UK Supreme Court made clear in 2011 that that potentially limitless phrase ‘is to be interpreted broadly’. Even pro-EU bodies like the CBI admit that over half of new British laws originate from the EU. The ECJ will increasingly use the Charter to do whatever it likes without any democratic accountability. The Blair Government wrongly claimed that it had an ‘opt-out’ from the Charter, which it alleged would have the same legal status as ‘The Beano or the Sun’. Subsequent cases in the ECJ and UK Supreme Court have made clear these claims were entirely false, and that the Charter has ‘direct effect’ in UK law. David Cameron once claimed that ‘we will want a complete opt-out from the Charter of Fundamental Rights,’ a promise repeated at p. 114 of the 2010 Conservative Manifesto, yet the Charter does not appear to be part of the renegotiation.

Today’s judgments also demonstrate that the Conservatives’ 2015 manifesto promise to repeal the Human Rights Act 1998 and reform the ECHR is pointless within the EU. As Mr Justice Mostyn observed in a 2013 judgment, the Charter contains all the rights in the ECHR and this ‘much wider Charter of Rights would remain part of our domestic law even if the Human Rights Act were repealed.’

Thirdly, the Advocate General issued an opinion in a case where the Commission was suing the UK over the requirement that claimants of child tax credits and child benefit must be habitually and lawfully resident in the UK. Amazingly, the Commission argued that claimants didn’t need to be lawfully resident in the UK: an argument that the AG rejected. However, the ECJ isn’t bound by this opinion, and might still accept the Commission’s extreme arguments, as it did today over prisoner voting.

Nevertheless, the AG stated that any checks on the lawful residence of EU citizens had to comply with the principle of ‘proportionality’, and could not be ‘carried out in every single case, something which, in my view, is prohibited’. The AG stated that the UK must apply a presumption that EU citizens resident in the UK for more than three months are here lawfully. This undermines a pledge by David Cameron that EU jobseekers who haven’t found work within three months will be forced to leave the UK.

The opinion makes it crystal clear that for as long as Britain remains in the EU, the ultimate arbiter of whether migrants can stay in the UK and claim benefits will be the EU institutions like the ECJ, not the British people.

For many decades, Whitehall has deceived itself and deceived the public about the true nature of the EU project. Their ability to keep doing this is crumbling…

Standards In English Schools Part I: The introduction of the National Curriculum and GCSEs

The Introduction to this series of blogs, HERE, sets out the background and goals.

There are many different senses in which people discuss ‘standards’. Sometimes they mean an overall judgement on the performance of the system as judged by an international test like PISA. Sometimes they mean judgements based on performance in official exams such as KS2 SATs (at 11) or GCSEs. Sometimes they mean the number of schools above or below a DfE ‘floor target’. Sometimes they mean the number of schools and/or pupils in Ofsted-defined categories. Sometimes people talk about ‘the quality of teachers’. Sometimes they mean ‘the standards required of pupils when they take certain exams’. Today, the media is asking ‘have Academies raised standards?’ because of the Select Committee Report (which, after a brief flick through, seems to have ignored most of the most interesting academic studies done on a randomised/pseudo-randomised basis).

This blog in the series is concerned mainly with two questions: what has happened to the standards required of pupils taking GCSEs and A Levels as a result of changes since the mid-1980s, and how do universities and learned societies judge the preparation of pupils for further studies? Have the exams got easier? Do universities and learned societies think pupils are well-prepared for further studies?

I will give a very short potted history of the introduction of GCSEs and the National Curriculum before examining the evidence of their effects. If you are not interested in the history, please skip to Section B on the evidence. If you just want to see my conclusions, scroll to the end for a short section.

I stress that my goal is not to argue for a return to the pre-1988 system of O Levels and A Levels. While it had some advantages over the existing system, it also had profound problems. I think that an unknown fraction of the cohort could experience far larger improvements in learning than we see now if they were introduced to different materials in different ways, rather than either contemporary exams or their predecessors, but I will come to this argument, and why I have this belief, in a later blog.

I have used the word ‘Department’ to represent the DES of the 1980s, the DfE of post-2010, and its different manifestations in between.

This is just a rough first stab at collecting things I’ve shoved in boxes, emails etc over the past few years. Please leave corrections and additions in Comments.

A. A very potted history

Joseph introduces GCSEs – ‘a right old mess’

The debate over the whole of education policy, and particularly the curriculum and exams, changed a lot after Callaghan’s Ruskin speech in 1976 and the Department’s Yellow Book. Before then, the main argument was simply about providing school places and the furore over selection. After 1976 the emphasis shifted to ‘standards’ and there was growing momentum behind a National Curriculum (NC) of some sort and reforms to the exam system.

Between 1979 and 1985, the Department chivvied LAs on the curriculum but had little power and nothing significant changed. Joseph was too much of a free marketeer to support a NC, so its proponents could not make progress.

Joseph was persuaded to replace O Levels with GCSEs. He thought that the outcome would be higher standards for all, but he later complained that he had been hoodwinked by the bureaucratic process involving the Secondary Examinations Council (SEC). Looking back, he said:

‘I should have fought against flabbiness in general more than I did… I thought I did, but how do you reach into such a producer-oriented world? … “Stretching” was my favourite word; I judged that if you leant on that much else would follow. That’s what my officials encouraged me to imagine I was achieving… I said I’d only agree to unify the two examinations provided we established differentiation [which he defined as ‘you’re stretching the academic and you’re stretching the non-academic in appropriate ways’], and now I find that unconsciously I have allowed teacher assessment, to a greater extent than I assumed. My fault … my fault… it’s the job of ministers to see deeply… and therefore it’s flabby… You don’t find me defending either myself or the Conservative Party, but I reckon that we’ve all together made a right old mess of it. And it’s hurt most those who are most vulnerable.’ (Interview with Ball.)

I have not come across any other ministers or officials from this period so open about their errors.

The O Level survived under a different name as an international exam provided by Cambridge Assessment. It is still used abroad, including in Singapore, which regularly comes in the top three in all international tests. Cambridge Assessment also offers an ‘international GCSE’ that is, they say, tougher than the ‘old’ GCSE (i.e. the one in use now, before it changes in 2015) but not as tough as the O Level. This international GCSE was used in some private schools pre-2010, along with ‘international GCSEs’ from other exam boards. From 2010, state schools could use iGCSEs. In 2014, the DfE announced that it would stop this again. I blogged on this decision HERE.

Entangled interests – Baker and the National Curriculum

In 1986, Thatcher replaced Joseph with Baker hoping, she admitted, that he would make up ‘in presentational flair what ever he lacked in attention to detail’. He did not. Nigel Lawson wrote of Baker that ‘not even his greatest friends would describe him as a profound thinker or a man with mastery of detail’. Baker’s own PPS said that at the morning meeting ‘the main issue was media handling’. Jenny Bacon, the official responsible for The National Curriculum 5-16 (1987), said that Baker liked memos ‘in “ball points” … some snappy things with headings. It wasn’t glorious continuous prose…[Ulrich, a powerful DES official] was appalled but Baker said “That’s just the kind of brief I want”.’

Between 1976 and 1986, concern had grown in Whitehall about the large number of awful schools and widespread bad teaching. Various intellectual arguments, ideology, political interests (personal and party), and bureaucratic interests aligned to create a National Curriculum. Thatcherites thought it would undermine what they thought of as the ‘loony left’, then much in the news. Baker thought it would bring him glory. The Department and HMI rightly thought it would increase their power. After foolishly announcing CTCs at Party Conference, thus poisoning their brand with politics from the start, Baker announced he would create a NC and a testing system at 7, 11, and 14.

The different centres of power disagreed on what form the NC would take. HMI lobbied against subjects and wanted a NC based on ‘areas of expertise’, not traditional subjects. Thatcher wanted a very limited core curriculum based on English, maths, and science. The Department wanted a NC that stretched across the whole curriculum. Baker agreed with the Department and dismissed Thatcher’s limited option as ‘Gradgrind’.

In order to con Thatcher into agreeing his scheme, Baker worked with officials to invent a fake distinction between ‘core’ and ‘foundation’ subjects. As Baker’s Permanent Secretary Hancock said, ‘We devised the notion of the core and the foundation subjects but if you examine the Act you will see that there is no difference between the two. This was a totally cynical and deliberate manoeuvre on Kenneth Baker’s part.’

The 1988 Act established two quangos to be what Baker called ‘the twin guardians of the curriculum’: The National Curriculum Council (NCC), focused on the NC, and The Schools Examinations and Assessment Council (SEAC), focused on tests. Baker’s junior minister Rumbold said that once the Act was passed, ‘Ken went out to lunch.’ Like many ministers, Baker did not understand the importance of the policy detail and the intricate issues of implementation. He allowed officials to control appointments to the two vital committees and various curriculum working groups. Even Baker’s own spad later said that Baker was conned into appointing ‘the very ones responsible for the failures we have been trying to put right’. Baker later admitted forlornly that ‘I thought you could produce a curriculum without bloodshed. Then people marched over mathematics. Great armies were assembled’, and that he ‘never envisaged it would be as complex as it turned out to be’. Bacon, the official responsible for the NC, said that Baker ‘wasn’t interested in the nitty gritty’. Nicholas Tate (who was at the NCC and later headed the QCA) said that Baker was ‘affable but remote. He didn’t trouble his mind with attainment targets. He was resting on his laurels.’ Hancock, his Permanent Secretary, said that ‘after 1987 he became increasingly arrogant and impatient’. In 1989, Baker was moved to become Party Chairman, leaving behind chaos for his successor.

According to his colleagues, Baker was obsessed with the media, he did not try to understand (and did not have the training to understand) the policy issues in detail, and he confused the showmanship necessary to get a bill passed with serious management; he described himself as ‘a doer’ but the ‘doing’ in his mind consisted of legislation and spin. He did not even understand that there were strong disputes among teachers, subject bodies, and educationalists about the content of the NC, never mind what to do about these disputes. (Having watched the UTC programme from the DfE, I saw the same traits much in evidence thirty years later.)

Baker’s legacy 1989 – 1997: Shambles

Baker’s memoirs do not mention the report of The Task Group on Assessment and Testing (TGAT), chaired by Professor Paul Black, commissioned by Baker in 1987 to report on how the NC could be assessed. The plan was very complicated, with ten levels of attainment to be defined for each subject. Thatcher hated it and criticised Baker for accepting it. Meanwhile the Higginson Report had recommended replacing A Levels with some sort of IB-type system. Bacon said that ‘the political trade-off was Higginson got ditched … and we got TGAT. In retrospect it may have been the wrong trade off.’

MacGregor could not get a grip on the complexity. He did not even hire a specialist policy adviser because, he said, ‘I didn’t feel I needed one.’ He blamed the chaos on Baker, who, he said, ‘hadn’t spent enough time thinking about who was appointed to the bodies. He left it to officials and didn’t think through what he wanted the bodies to do. For the first year I was unable to replace anybody.’ The chairman of NCC described how they used ‘magic words to appease the right’ and get through what they wanted. The officials who controlled SEAC stopped the simplification that Thatcher wanted using the ‘legal advice’ card, claiming that the 1988 Act required testing of all attainment targets. (I had to deal with the same argument 25 years later.) MacGregor was trapped. He had an unworkable system and was under contradictory pressure from Thatcher to simplify everything and from Baker to maintain what he had promised.

Clarke bluffed and bullied his way through 18 months without solving the problems. His Permanent Secretary described the trick of getting Clarke to do what officials wanted: ‘The trick was to never box him into a corner… Show him where there was a door but never look at that door, and never let on you noticed when he walked through.’ Like MacGregor, Clarke blamed Baker for the shambles: ‘[Baker] had set up all these bloody specialist committees to guide the curriculum, he’d set up quango staff who as far as I could see had come out of the Inner London Education Authority the lot of them.’ Clarke solved none of the main problems with the tests, antagonised everybody, and replaced HMI with Ofsted.

After his surprise win, Major told the Tory Conference in 1992, ‘Yes it will mean another colossal row with the education establishment. I look forward to that.’ Patten soon imploded, the unions went for the jugular over the introduction of SATs, and by the end of 1993 Number Ten had backtracked on their bellicose spin and was in full retreat with a review by Dearing (published 1994). Suddenly, the legal advice that had supposedly prevented any simplification was rethought and officials told Dearing that the legal advice did allow simplification after all: ‘our advice is that the primary legislation allows a significant measure of flexibility’. (In my experience, one of the constants of Whitehall is that legal advice tends to shift according to what powerful officials want.) Dearing produced a classic Whitehall fudge that got everybody out of the immediate crisis but did not even try to deal with the fundamental problems, thus pushing the problems into the future.

The historian Robert Skidelsky, helping SEAC, told Patten ‘these tests will not run’ and he should change course but Patten shouted ‘That is defeatist talk.’ Skidelsky decided to work out a radically simpler model than the TGAT system with a small group in SEAC: ‘We pushed the model through committee and through the Council and sent it off to John Patten. We never received a reply. Six months after I resigned Emily Blatch approached me and said she had been looking for my paper on Assessment but no one seems to know where it is.’

Patten was finished. Gillian Shephard was put in to be friendly to the unions and quiet the chaos. Soon she and Major had also fallen out and the cycle of briefing and counter-briefing against Number Ten returned with permanent policy chaos. One of her senior officials, Clive Saville, concluded that ‘There was a great intellectual superficiality about Gillian Shephard and she was as intellectually dishonest as Shirley Williams. She was someone who wanted to be liked but wasn’t up to the job.’

A few thoughts on the process

The Government had introduced a new NC and test system and replaced O Levels with GCSEs. (They also introduced new vocational qualifications (NVQs) described by Professor Alan Smithers as a ‘disaster of epic proportions … utterly lightweight’.) The process was a disastrous bungle from start to finish.

Thatcher deserves considerable blame. She allowed Baker to go ahead with fundamental reforms without any agreed aims or a detailed roadmap. She knew, as did Lawson, that Baker could not cope with details yet appointed him on the basis of ‘presentational flair’ (media obsession is often confused with ‘presentational flair’).

The best book I have read by someone who has worked in Number Ten and seen why the Whitehall architecture is dysfunctional is John Hoskyns’ Just In Time. Extremely unusually for someone in a senior position in No10, Hoskyns both had an intellectual understanding of complex systems and was a successful manager. Inevitably, he was appalled at how the most important decisions were made and left Number Ten after failing to persuade Thatcher to tear up the civil service system. Since then, everybody in Number Ten has been struggling with the same issues. (If she had taken his advice history might have been extremely different – e.g. no ERM debacle.) His conclusion on Thatcher was:

‘The conclusion that I am coming to is that the way in which [Thatcher] herself operates, the way her fire is at present consumed, the lack of a methodical mode of working and the similar lack of orderly discussion and communication on key issues, means that our chance of implementing a carefully worked out strategy – both policy and communications – is very low indeed… Difficult problems are only solved – if they can be solved at all – by people who desperately want to solve them… I am convinced that the people and the organisation are wrong.’ (Emphasis added.)

Arguably the person who knowingly appoints someone like Baker is more to blame for the failings of Baker than Baker is himself. Major and the string of ministers that followed Baker were doomed. They were not unusually bad – they were representative examples of those at the apex of the political process. They did not know how to go about deciding aims, means, and operations. They were obsessed with media management and therefore continually botched the policy and implementation. They could not control their officials. They could not agree a plan and blamed each other. If they were the sort of people who could have got out of the mess, then they were the sort of people who would not have got into the mess in the first place.

Officials over-complicated everything and, like ministers, did not engage seriously with the core issue – what should pupils of different abilities be doing and how can we establish a process where we can collect reliable information. The process was dominated by the same attitude on all sides – how to impose a mentality already fixed.

It was also clearly affected by another element that has contemporary relevance – the constant churn of people. Just between summer 1989 and the end of 1992, there was: a new Permanent Secretary in May 1989, a new SoS in July 1989 (MacGregor), another new SoS in November 1990 (Clarke), a new PM and No10 team (Major), new heads for the NCC and SEAC in July 1991, then another new SoS in spring 1992 (Patten) and another new Permanent Secretary. Everybody blamed problems on predecessors and nobody could establish a consistent path.

Even its own Permanent Secretaries later attacked the DES. James Hamilton (1976-1983) was put into DES in June 1976 from the Cabinet Office to help with the Ruskin agenda and found a place where ‘when something was proposed someone would inevitably say, “Oh we tried that back in whenever and it didn’t work”…’. Geoffrey Holland (1992-3) admitted that, ‘It [DES] simply had no idea of how to get anything off the ground. It was lacking in any understanding or experience of actually making things happen.’

A central irony of the story shows how dysfunctional the system was. Thatcher never wanted a big NC and a complicated testing system but she got one. As some of her ideological opponents in the bureaucracy tried to simplify things when it was clear Baker’s original structure was a disaster, ministers were often fighting with them to preserve a complex system that could not work and which Thatcher had never wanted. This sums up the basic problem – a very disruptive process was embarked upon without the main players agreeing what the goal was.

Although the think tanks were much more influential in this period than they are now, Ferdinand Mount, head of Thatcher’s Policy Unit, made a telling point about their limitations: ‘Enthusiasts for reform at the IEA and the CPS were prodigal with committees and pamphlets but were much less helpful when it came to providing practical options for action. This made it difficult for the Policy Unit’s ideas to overcome the objections put forward by senior officials’. Thirty years later this remains true. Think tanks put out reports but they rarely provide a detailed roadmap that could help people navigate such reforms through the bureaucracy and few people in think tanks really understand how Whitehall works. This greatly limits their real influence. This is connected to a wider point. Few of those who comment prominently on education (or other) policy understand how Whitehall works, hence there is a huge gap between discussions of ideal policy and what is actually possible within a certain timeframe in the existing system, and commentators think that all sorts of things that happen do so because of ministers’ wishes, confusing public debate further.

I won’t go into the post-1997 story. There are various books that tell this whole story in detail. The National Curriculum remained but was altered; the test system remained but gradually narrowed from the original vision; there were some attempts at another major transformation (such as Tomlinson’s attempt to end A Levels, thwarted by Blair) but none took off; money poured into the school system and its accompanying bureaucracy at an unprecedented rate but, other than a large growth in the number and salaries of everybody, it remained unclear what if any progress was being made.

This bureaucracy spent a great deal of taxpayers’ money promoting concepts such as ‘learning styles’ and ‘multiple intelligences’ that have no proper scientific basis but which nevertheless were successfully blended with old ideas from Vygotsky and Piaget to dominate a great deal of teacher training. A lot of people in the education world got paid an awful lot of money (Hargreaves, Waters et al) but what happened to standards?

(The quotes above are taken mainly from Daniel Callaghan’s Conservative Party Education Policies 1976-1997.)

B. The cascading effects of GCSEs and the National Curriculum

Below I consider 1) the data on grade inflation in GCSEs and A Levels, 2) various studies from learned societies and others that throw light on the issue, 3) knock-on effects in universities.

1. Data on grade inflation in GCSEs and A Levels

We do not have an official benchmark against which to compare GCSE results. The picture is therefore necessarily hazy. As Coe has written, ‘we are limited by the fact that in England there has been no systematic, rigorous collection of high-quality data on attainment that could answer the question about systemic changes in standards.’ This is one of the reasons why in 2013 we, supported by Coe and others, pushed through (against considerable opposition including academics at the Institute of Education) a new ‘national reference test’ in English and maths at age 16, which I will return to in a later blog.

However, we can compare the improvement in GCSE results with a) results from international tests and b) consistent domestic tests uncontrolled by Whitehall.

The first two graphs below show the results of this comparison.

Chart 1: Comparison of English performance in international surveys versus GCSE scores 1995-2012 (Coe)

Chart 2: GCSE grades achieved by candidates with same maths & vocab scores each year 1996-2012 (Coe)

Professor Coe writes of Chart 1:

‘When GCSE was introduced in 1987 [I think he must mean 1988 as that was the first year of GCSEs or else he means ‘the year before GCSEs were first taken’], 26.4% of the cohort achieved five grade Cs or better. By 2012 the proportion had risen to 81.1%. This increase is equivalent to a standardised effect size of 1.63, or 163 points on the PISA scale… If we limit the period to 1995 – 2011 [as in Chart 1 above] the rise (from 44% to 80% 5A*-C) is equivalent to 99 points on the PISA scale [as superimposed on Chart 1]… [T]he two sets of data [international and GCSEs] tell stories that are not remotely compatible. Even half the improvement that is entailed in the rise in GCSE performance would have lifted England from being an average performing OECD country to being comfortably the best in the world. To have doubled that rise in 16 years is just not believable.

‘The question, therefore, is not whether there has been grade inflation, but how much…’ [Emphasis added.] (Professor Robert Coe, ‘Improving education: a triumph of hope over experience’, 18 June 2013, p. vi.)
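Coe does not spell out his conversion in the passage quoted. One standard way to turn a change in the share of pupils clearing a fixed bar into an effect size is the probit (normal-quantile) method. Assume underlying attainment is normally distributed and the bar does not move; the implied shift in the mean, in standard deviations, is then the difference between the normal quantiles of the two pass rates. Multiplying by 100 (the standard deviation the PISA scale was originally set to) gives PISA points. This is a sketch under those assumptions, not necessarily Coe's own method, and the function names are hypothetical.

```python
from statistics import NormalDist

# Assumption: one standard deviation on the PISA scale is taken as 100 points
# (the scale was originally set to an OECD mean of 500 and SD of 100).
PISA_SD = 100


def threshold_effect_size(p_before: float, p_after: float) -> float:
    """Mean shift (in SDs) implied by a rise in a pass rate at a fixed bar.

    Hypothetical helper. If attainment X ~ Normal(mu, 1) and the pass mark t
    is fixed, the pass rate is p = P(X > t) = Phi(mu - t), so
    mu - t = Phi^-1(p). The shift in mu between two years is therefore the
    difference of the two probits.
    """
    probit = NormalDist().inv_cdf  # standard normal quantile function
    return probit(p_after) - probit(p_before)


def pisa_points(p_before: float, p_after: float) -> float:
    """Express the implied effect size on the PISA scale (hypothetical helper)."""
    return PISA_SD * threshold_effect_size(p_before, p_after)


# Coe's 1995-2011 figures: 44% -> 80% achieving 5 A*-C.
d = threshold_effect_size(0.44, 0.80)
print(f"1995-2011: d = {d:.2f}, about {pisa_points(0.44, 0.80):.0f} PISA points")
# prints "1995-2011: d = 0.99, about 99 PISA points"
```

For 1995–2011 this reproduces Coe's figure of 99 points. For the longer 1987–2012 period (26.4% to 81.1%) the same method gives roughly 1.5 rather than his 1.63, so his calculation for that period presumably differs in some detail.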

Chart 2 plots the improving GCSE grades achieved by pupils scoring the same each year in a test of maths and vocabulary (YELLIS, Durham University’s Year 11 Information System): pupils scoring the same on YELLIS get higher and higher GCSE grades as time passes. Coe concludes that although ‘it is not straightforward to interpret the rise in grades … as grade inflation’, the YELLIS data ‘does suggest that whatever improved grades may indicate, they do not correspond with improved performance in a fixed test of maths and vocabulary’ (Coe, ibid).

This YELLIS comparison suggests that in 2012 pupils received a grade higher in maths, history, and French GCSE, and almost a grade higher in English, than students of the same ability in 1996.

It is important to note that neither Coe’s charts nor his measurements include the effects of either a) the initial switch from O Level to GCSE or b) changes to GCSEs between 1988 and 1995.

The next two charts show this earlier part of the story (both come from Education: Historical statistics, House of Commons, November 2012). NB. They have different end dates.

Chart 3: Proportion getting 5 O Levels / GCSEs at grade C or higher 1953/4 – 2008/9 

Chart 4: Proportion getting 1+ or 3+ passes at A Level 1953/4 – 1998/9

Chart 3 shows that the period 1988-95 saw an even sharper increase in GCSE scores than the period after 1995, so a GCSE/YELLIS-style comparison that included the years 1988-1995 would make the picture even more dramatic.

Chart 4 shows a dramatic increase in A Level passes after the introduction of GCSEs. One interpretation of this graph, supported by the 1997-2010 Government and teaching unions, is that this increase reflected large real improvements in school standards.

There is GCSE data that those who believe this argument could cite. In 1988, 8% of GCSE entries were awarded an ‘A’; in 2011, 23% were awarded an ‘A’ or ‘A*’. The DfE published data in 2013 showing that the number of pupils with ten or more A* grades trebled between 2002 and 2012. This implies a very large increase in the number of pupils excelling at GCSE, which would be consistent with a positive knock-on effect on A Level results.

However, we have already seen that the claims for GCSEs are ‘not believable’ in Coe’s words. It also seems prima facie very unlikely that a sudden large improvement in A Level results from 1990 could be the result of immediate improvements in learning driven by GCSEs. There is also evidence for A Levels similar to the GCSE/YELLIS comparison.

Chart 5: A level grades of candidates having the same TDA score (1988-2006)

Chart 5 plots A Level grades in different subjects against scores on the international TDA test. As with GCSEs, it shows that pupils scoring the same in a non-government test got increasingly higher A Level grades. The change in maths is particularly dramatic: the same TDA score corresponded to an ‘Unclassified’ grade in 1988 and a B/C in 2006.

What we know about GCSEs combined with this information makes it very hard to believe that the sudden dramatic increase in A Level performance since 1990 is because of real improvements and suggests another interpretation: these dramatic increases in A Level results reflected (mostly or entirely) A Levels being made significantly easier probably in order to compensate for GCSEs being much easier.

However, the data above can only tell part of the story. Logically, it is hard or impossible to distinguish between possible causes just from these sorts of comparisons. For example, perhaps someone might claim that A Level questions remained as challenging as before but grade boundaries moved – i.e. the exam papers were the same but the marking was easier. I think this is prima facie unlikely but the point is that logically the data above cannot distinguish between various possible dynamics.

Below is a collection of studies, reports, and comments from experts that I have accumulated over the past few years that throws light on which interpretation is more reasonable. Please add others in Comments.

(NB. David Spiegelhalter, a Professor of Statistics at Cambridge, has written about problems with PISA’s use of statistics. These arguments are technical. To a non-specialist like me, he seems to make important points that PISA must answer if it is to retain credibility, and the fact that it has not done so (as of the last time I spoke to DS, in summer 2014) is a blot on its copybook. However, I do not think they materially affect the discussion above. Other international tests conducted on different bases all tell roughly the same story. I will ask DS if he thinks his arguments do undermine the story above and post his reply if any.)

2. Studies 2007 – now 

NB1. Most of these studies are comparing changes over the past decade or so, not the period since the introduction of the NC and GCSEs in the 1980s.

NB2. I will reserve detailed discussion of the AS/A2/decoupling argument for a later blog as it fits better in the ‘post-2010 reforms’ section.

Learned societies. The Royal Society’s 2011 study of Science GCSEs: ‘the question types used provided insufficient opportunity for more able candidates … to demonstrate the extent of their scientific knowledge, understanding and skills. The question types restricted the range of responses that candidates could provide. There was little or no scope for them to demonstrate various aspects of the Assessment Objectives and grade descriptions… [T]he use of mathematics in science was examined in a very limited way.’ SCORE also published (2012) evidence on science GCSEs which reported ‘a wide variation in the amount of mathematics assessed across awarding organisations and confirmed that the use of mathematics within the context of science was examined in a very limited way. SCORE organisations felt that this was unacceptable.’

The 2012 SCORE report and Nuffield Report showed serious problems with the mathematical content of A Levels. SCORE was very critical:

‘For biology, chemistry and physics, it was felt there were underpinning areas of mathematics missing from the requirements and that their exclusion meant students were not adequately prepared for progression in that subject. For example, for physics many of the respondents highlighted the absence of calculus, differentiation and integration, in chemistry the absence of calculus and in biology, converting between different units… For biology, chemistry and physics, the analysis showed that the mathematical requirements that were assessed concentrated on a small number of areas (e.g. numerical manipulation) while many other areas were assessed in a limited way, or not at all… Survey respondents were asked to identify content areas from the mathematical requirements that should feature highly in assessments. In most cases, the biology, chemistry and physics respondents identified mathematical content areas that were hardly or not at all assessed by the awarding organisations.

‘[T]he inclusion of more in-depth problem solving would allow students to apply their knowledge and understanding in unstructured problems and would increase their fluency in mathematics within a science context.’

‘The current mathematical assessments in science A-levels do not accurately reflect the mathematical requirements of the sciences. The findings show that a large number of mathematical requirements listed in the biology, chemistry and physics specifications are assessed in a limited way or not at all within these papers. The mathematical requirements that are assessed are covered repeatedly and often at a lower level of difficulty than required for progression into higher education and employment. It has also highlighted a disparity between awarding organisations in their assessment of the use of mathematics within biology, chemistry and physics A-level. This is unacceptable and the examination system, regardless of the number of awarding organisations, must ensure the assessments provide an authentic representation of the subject and equip all students with the necessary skills to progress in the sciences.

‘This is likely to have an impact on the way that the subjects are taught and therefore on students’ ability to progress effectively to STEM higher education and employment.’ SCORE, 2012. Emphasis added.

The 2011 Institute of Physics report showed strong criticism from university academics of the state of physics and engineering undergraduates’ mathematical knowledge. Four-fifths of academics said that university courses had changed to deal with a lack of mathematical fluency and 92% said that a lack of mathematical fluency was a major obstacle.

‘The responses focused around mathematical content having to be diluted, or introduced more slowly, which subsequently impacts on both the depth of understanding of students, and the amount of material/topics that can be covered throughout the course…

‘Academics perceived a lack of crossover between mathematics and physics at A-level, which was felt to not only leave students unprepared for the amount of mathematics in physics, but also led to them not applying their mathematical knowledge to their learning of physics and engineering.’ IOP, 2011.

A 2011 Centre for Bioscience report criticised Biology and Chemistry A Levels and the preparation of pupils for bioscience degrees: ‘very many lack even the basics… [M]any students do not begin to attempt quantitative problems and this applies equally to those with A level maths as it does to those with C at GCSE. A lack of mathematics content in A level Biology means that students do not expect to encounter maths at undergraduate level. There needs to be a more significant mathematical component in A level biology and chemistry.’ The Royal Society of Chemistry report, The five decade challenge (2008), said there had been ‘catastrophic slippage in school science standards’ and that Government claims about improving GCSE scores were ‘an illusion’. (The Department said of the RSC report, ‘Standards in science have improved year on year thanks to 10 years of sustained investment and improvement in teaching and the education system – this is something we should celebrate, not criticise. Times have changed.’)

Ofqual, 2012. Ofqual’s Standards Review in 2012 found grade inflation in both GCSE and A-levels between 2001-03 and 2008-10: ‘Many of these reviews raise concerns about the maintenance of standards… In the GCSEs we reviewed (biology, chemistry and mathematics) we found that changes to the structure of the assessments, rather than changes to the content, reduced the demand of some qualifications.’

On A-levels, ‘In general we found that changes to the way the content was assessed had an impact on demand, in many cases reducing it. In two of the reviews (biology and chemistry) the specifications were the same for both years. We found that the demand in 2008 was lower than in 2003, usually because the structure of the assessments had changed. Often there were more short answer, structured questions’ (Ofqual, Standards Reviews – A Summary, 1 May 2012).

Chief Executive of Ofqual, Glenys Stacey, has said: ‘If you look at the history, we have seen persistent grade inflation for these key qualifications for at least a decade… The grade inflation we have seen is virtually impossible to justify and it has done more than anything, in my view, to undermine confidence in the value of those qualifications’ (Sunday Telegraph, 28 April 2012).

The OECD’s International Survey of Adult Skills (October 2013). This assessed the numeracy, literacy and computing skills of 16-24-year-olds. The tests were done over 2011/2012. England was 22nd out of 24 for literacy, 21st out of 24 for numeracy, and 16th out of 20 for ‘problem solving in a technology-rich environment’.

PISA 2012. The normal school PISA tests taken in 2012 (reported 2013) showed no significant change between 2009 and 2012. England was 21st for science, 23rd for reading, and 26th for mathematics. A 2011 OECD report concluded: ‘Official test scores and grades in England show systematically and significantly better performance than international and independent tests… [Official results] show significant increases in quality over time, while the measures based on cognitive tests not used for grading show declines or minimal improvements’ (OECD Economic Surveys: United Kingdom, 16 March 2011, pp. 88-89). This interesting chart shows that in the PISA maths test the children of English professionals perform the same as children of Singapore cleaners (Do parents’ occupations have an impact on student performance?, PISA 2014).

Chart 6: Pupil maths scores by parental occupation, UK (left) and Singapore (right) (PISA 2012)

TIMSS/PIRLS. The TIMSS/PIRLS tests (taken summer 2011, reported December 2012) told a similar story to PISA. England’s score in reading at age 10 increased by a statistically significant amount since 2006. England’s score in science at age 10 decreased by a statistically significant amount since 2007. England’s scores in science at age 14 and mathematics at ages 10 and 14 showed no statistically significant changes since 2007. (According to experts, the PISA maths test relies more on language comprehension than TIMSS does, which is supposedly why Finland scores higher in the former than the latter.)

National Numeracy (February 2012). Research showed that in 2011 only a fifth of the adult population had mathematical skills equivalent to a ‘C’ in GCSE, down a few percentage points from the last survey in 2003. About half of 16-65 year-olds have at best the mathematical skills of an 11 year-old. A fifth of adults will struggle to understand price labels on food and half ‘may not be able to check the pay and deductions on a wage slip.’

King’s College, 2009. A major study by academics from King’s College London and Durham University found that basic skills in maths have declined since the 1970s. In 2008, less than a fifth of 14 year-olds could write 11/10 as a decimal. In the early 1980s, only 22 per cent of pupils obtained a GCE O-level grade C or above in maths. In 2008, over 55 per cent gained a GCSE grade C or above in the subject (King’s College London/University of Durham, ‘Secondary students’ understanding of mathematics 30 years on’, 5 September 2009).

Chart 7: Performance on ICCAMS / CSMS Maths tests showing declines over time

Shayer et al (2007) found that performance in a test of basic scientific concepts fell significantly between 1976 and 2003. ‘[A]lthough both boys and girls have shown great drops in performance, the relative drop is greater for boys… It makes it difficult to believe in the validity of the year on year improvements reported nationally on Key Stage 3 NCTs in science and mathematics: if children are entering secondary from primary school less and less equipped with the necessary mental conditions for processing science and mathematics concepts it seems unlikely that the next 2.5 years KS3 teaching will have improved so much as more than to compensate for what students of today lack in comparison with 1976.’

Chart 8: Performance on tests of scientific concepts, 1976 – 2003 (Shayer)

Tymms (2007) reviewed assessment evidence in mathematics from children at the end of primary school between 1978 and 2004 and in reading between 1948 and 2004. The conclusion was that standards in both subjects ‘have remained fairly constant’.

Warner (2013) on physics. Professor Mark Warner (Cambridge University) produced a fascinating report (2013) on problems with GCSE and A Level Physics, comparing the papers with old O Levels, A Levels, ‘S’ Level papers, Oxbridge entry exams, international exams and so on. Having read it, I think there is no room for doubt: the standards demanded in GCSEs and A Levels have fallen very significantly.

‘[In modern papers] small steps are spelt out so that not more than one thing needs to be addressed before the candidate is set firmly on the right path again. Nearly all effort is spent injecting numbers into formulae that at most require GCSE-level rearrangements… All diagrams are provided… 1986 O-level … [is] certainly more difficult than the AS sample… 1988 A-level … [is] harder than most Cambridge entrance questions currently… 1983 Common Entrance [is] remarkably demanding for this age group, approaching the challenge of current AS… There is a staggering difference in the demands put on candidates… Exams [from the 1980s] much lower down the school system are in effect more difficult than exams given now in the penultimate years [i.e. AS].’

For example, the mechanics problems in GCSE Physics are substantially shallower than those in 1980s O Level, which examined concepts now taught at A Level. The removal of calculus from A Level Physics badly undermined it. Calculus is tested in the Mechanics I paper of A Level Maths, and Mechanics II and III test deeper material than Physics A Level does. This is one of the reasons why the Cambridge Physics department stopped requiring Physics A Level for entry and made clear that Further Maths A Level is acceptable instead (many say it is better preparation for university than Physics A Level).

Warner also makes the point that making Physics GCSE and A Level much easier did not even increase the number taking physics degrees, which has declined sharply since the mid-1980s. He concludes: ‘one could again aim for a school system to get a sizable fraction of pupils to manage exams of these [older] standards. Children are not intrinsically unable to attack such problems.’ (NB. The version of this report on the web is not the full version – I would urge those interested to email Professor Warner.)

Gowers (2012) on maths. Tim Gowers, Cambridge professor and Fields Medallist, described some problems with Maths A Level and concluded:

‘The general point here is of course that A-levels have got easier [emphasis added] and schools have a natural tendency to teach to the test. If just one of those were true, it would be far less of a problem. I would have nothing against an easy A-level if people who were clever enough were given a much deeper understanding than the exam strictly required (though as I’ve argued above, for many people teaching to the test is misguided even on its own terms, since they will do a lot better on the exam if they have not been confined to what’s on the test), and I would not be too against teaching to the test if the test was hard enough…

‘[S]ome exams, such as GCSE maths, are very very easy for some people, such as anybody who ends up reading mathematics at Cambridge (but not just those people by any means). I therefore think that the way to teach people in top sets at schools is not to work towards those exams but just to teach them maths at the pace they can manage.’

Durham University analysis gives data to quantify this conclusion. Pupils who would have received a U (unclassified) in Maths A Level in 1988 received a B/C in 2006 (see Chart 5 above; CEM Centre Durham University, Changes in standards at GCSE and A-Level: Evidence from ALIS and YELLIS, April 2007). Further Maths A Level is supposedly the toughest A Level, and probably it is, but a) it is not the same as its 1980s ancestor and b) it now introduces pupils to material, such as matrices, that used to be taught in good prep schools.

I spent a lot of time 2007-14 talking to maths dons, including heads of departments, across England. The reason I quote Gowers is that I never heard anybody dispute his conclusion but he was almost the only one who would say it publicly. I heard essentially the same litany about A Level maths from everybody I spoke to: although there were differences of emphasis, nobody disputed these basic propositions. 1) The questions became much more structured so pupils are led up a scaffolding with less requirement for independent problem-solving. 2) The emphasis moved to memorising some basic techniques the choice of which is clearly signalled in the question. 3) The modular system a) encouraged a ‘memorise, regurgitate, forget’ mentality and b) undermined learning about how different topics connect across maths, both of which are bad preparation for further studies. (There are also some advantages to a modular system that I will return to.) 4) Many undergraduates, including even those in the top 5% at such prestigious universities as Imperial, therefore now struggle in their first year as they are not well-prepared by A Level for the sort of problems they are given in undergraduate study. (The maths department at Imperial became so sick of A Level’s failings that they recently sought and got approval to buy Oxford’s entrance exam for use in their admission system.)

I will not go into arguments about vocational qualifications here, but note the conclusion of Alison Wolf, whose 2011 report on this was not disputed by any of the three main parties:

‘The staple offer for between a quarter and a third of the post-16 cohort is a diet of low-level vocational qualifications, most of which have little to no labour market value.’

3. Knock-on effects in universities

Serious lack of maths skills

There are many serious problems with maths skills. Part of the reason is that many universities do not even demand A Level Maths. The result? As of about 2010-12, about 20% of Engineering undergraduates, about 40% of Chemistry and Economics undergraduates, and about 60-70% of Biology and Computer Science undergraduates did not have A Level Maths. Fewer than 10% of undergraduate bioscience degree courses demand A Level Maths; consequently, ‘problems with basic numeracy are evident and this reflects the fact that many students have grades less than A at GCSE Maths. These students are unlikely to be able to carry out many of the basic mathematical approaches, for example unable to manipulate scientific notation with negative powers so commonly used in biology’ (2011 Biosciences report). (I think that history undergraduates should be able to manipulate scientific notation with negative powers; this is one of the many things that should be standard for reasonably able people.)

The Royal Society estimated (Mathematical Needs, 2012) that about 300,000 pupils per year need a post-GCSE Maths course but only ~100,000 take one. (This may change thanks to Core Maths starting in 2015, see later blog.) A House of Lords report (2012) on Higher Education in STEM subjects concluded: ‘We are concerned that … the level at which the subject [maths] is taught does not meet the requirements needed to study STEM subjects at undergraduate level… [W]e urge HEIs to introduce more demanding maths requirement for admissions into STEM courses as the lack, or low level, of maths requirements at entry acts as a disincentive for pupils to study maths and high level maths at A level.’ House of Lords Select Committee on Science and Technology, Higher Education in STEM subjects, 2012.

Further, though this subject is beyond the scope of this blog, it is also important that the maths PhD pipeline ‘which was already badly malfunctioning has been seriously damaged by EPSRC decisions’, including withdrawal of funding from non-statistics subjects which drew the ire of UK Fields Medallists, cf. Submission by the Council for the Mathematical Sciences to the House of Lords, 2011. The weaknesses in biology also feed into the bioscience pipeline: only six percent of bioscience academics think their graduates are well prepared for a masters in the fast-growing field of Computational Biology (p.8 of report).

Closing of language departments, decline of language skills

I have not found official stats for this but according to research done for the Guardian (with FOIs):

‘The number of universities offering degrees in the worst affected subject, German, has halved over the past 15 years. There are 40% fewer institutions where it is possible to study French on its own or with another language, while Italian is down 23% and Spanish is down 22%.’

As Katrin Kohl, professor of German at Jesus College (Oxford) has said, ‘The UK has in recent years been systematically squandering its already poor linguistic resources.’ Dawn Marley, senior lecturer in French at the University of Surrey, summarised problems across languages:

‘We regularly see high-achieving A-level students who have only a minimal knowledge of the country or countries where the language of study is spoken, or who have limited understanding of how the language works. Students often have little knowledge of key elements in a country’s history – such as the French Revolution, or the fact that France is a republic. They also continue to struggle with grammatical accuracy, and use English structures when writing in the language they are studying… The proposals for the revival of A-level are directly in line with what most, if not all, academics in language departments would see as essential.’ (Emphasis added.)

The same picture applies to classical languages. Already by 1994 the Oxford Classics department was removing texts such as Thucydides as compulsory elements in ‘Greats’ because they were deemed ‘too hard’. These changes continued and have made Classics a very different subject from what it was before 1990. At Oxford, they introduced whole new courses (Mods B then Mods C) that do not require any prior study of the ancient languages themselves. The first year of Greats now involves remedial language courses.

I quote at length from a paper by John Davie, a Lecturer in Classics at Trinity College, Oxford, as his comments summarise the views of other senior classicists in Oxbridge and elsewhere who have been reluctant to speak out (In Pursuit of Excellence, Davie, 2013). Inevitably, the problems described are damaging the pipeline for masters, PhDs, and future scholarship.

‘Classics as an academic subject has lost much of its intellectual force in recent years. This is true not only of schools but also, inevitably, of universities, which are increasingly required to adapt to the lowering of standards…

‘In modernist courses…, there is (deliberately) no systematic learning of grammar or syntax, and emphasis is laid on fast reading of a dramatic continuous story in made-up Latin which gives scope for looking at aspects of ancient life. The principle of osmosis underlying this approach, whereby children will learn linguistic forms by constant exposure to them, aroused scepticism among many teachers and has been thoroughly discredited by experts in linguistics. Grammar and syntax learned in this piecemeal fashion give pupils no sense of structure and, crucially, deny them practice in logical analysis, a fundamental skill provided by Classics…

‘[W]e have, in GCSE, an exam that insults the intelligence… Recent changes to this exam have by general consent among teachers made the papers even easier.

‘In the AS exam currently taken at the end of the first year of A-level … students study two small passages of literature, which represent barely a third of an original text. They are asked questions so straightforward as to verge on the banal and the emphasis is on following a prescribed technique of answering, as at GCSE. Imagination and independent thought are simply squeezed out of this process as teachers practise exam-answering technique in accordance with the narrow criteria imposed on examiners.

‘The level of difficulty [in AS] is not substantially higher than that of GCSE, and yet this is the exam whose grades and marks are consulted by the universities when they are trying to determine the ability of candidates… Having learned the translation of these bite-sized chunks of literature with little awareness of their context or the wider picture (as at GCSE, it is increasingly the case that pupils are incapable of working out the Latin/Greek text for themselves, and so lean heavily on a supplied translation), they approach the university interview with little or no ability to think “outside the box”. Dons at Oxford and Cambridge regularly encounter a lack of independent thought and a tendency to fall back on generalisations that betray insufficient background reading or even basic curiosity about the subject. This need not be the case and is clearly the product of setting the bar too low for these young people at school…

‘At A2 … students read less than a third of a literary text they would formerly have read in its entirety.

‘There is the added problem that young teachers entering the profession are themselves products of the modernist approach and so not wholly in command of the classical languages themselves. As a result they welcome the fact that they are not required by the present system to give their pupils a thorough grounding in the language, embracing the less rigorous approach of modern course-books with some relief.

‘In the majority of British universities Classics in its traditional form has either disappeared altogether or has been replaced by a course which presents the literature, history and philosophy mainly (or entirely) in translation, i.e. less a degree course in Classics than in Classical Civilisation.

‘This situation has been forced upon university departments of Classics by the impoverished language skills of young people coming up from schools… It is not only the classical languages but English itself which has suffered in this way in the last few decades. Every university teacher of the classical languages knows that he cannot assume familiarity with the grammar and syntax of English itself, and that he will have to teach from scratch such concepts as an indirect object, punctuation or how a participle differs from a gerund…

‘Even at Oxford cuts have been made to the number of texts students are required to read and, in those texts that remain, not as many lines are prescribed for reading in the original Latin or Greek.

‘In the last ten years of teaching for Mods [at Oxford] I have been struck by how the first-year students who come my way at the start of the summer term appear to know less about the classical languages each year, an experience I know to be shared by dons at other colleges…

‘GCSE should be replaced by a modern version of the O-level that stretches pupils… This would make the present AS exam completely unsuitable, and either a more challenging set of papers should be devised, if the universities wish to continue with pre A-level interviewing, or there should be a return to an unexamined year of wide reading before the specialisation of the last year.

‘Although the present exam, A2, has more to recommend it than AS, it also would no longer be fit for purpose and would need strengthening. As part of both final years there should be regular practice in the writing of essays, a skill that has been largely lost in recent years because of the exam system and is (rightly) much missed by dons.’

This combination of problems explains why we funded a project with Professor Pelling, Regius Professor of Greek at Oxford, to provide teacher training and language enrichment courses for schools.

I will not go into other humanities subjects. I read Ancient & Modern History and have thoughts about it, but I do not know of any good evidence comparable to the reports quoted above from bodies such as the Royal Society. I have spoken to many university teachers. Some, such as Professor Richard Evans (Cambridge), told me they think the standard of those who arrive as undergraduates is roughly the same as twenty years ago. Others at Oxbridge and elsewhere told me they think that essay writing skills have deteriorated because of changes to A Level (disputed by Evans and others) and that language skills among historians have deteriorated (undisputed by anyone I spoke to).

For example, the Cambridge Professor of Mediterranean History, David Abulafia, has contradicted Evans and, like classicists, pointed out the spread of remedial classes at Cambridge:

‘It’s a pity, then, that the director of admissions at Cambridge has proclaimed that the old system [pre-Gove reforms] is good and that AS-levels – a disaster in so many ways – are a good thing because somehow they promote access. I don’t know for whom he is speaking, but not for me as a professor in the same university…

‘[Gove] was quite right about the abolition of the time-wasting, badly devised and all too often incompetently marked AS Levels; these dreary exams have increasingly been used as the key to admissions to Cambridge, to the detriment of intellectually lively, quirky, candidates full of fizz and sparkle who actually have something to say for themselves…

‘Bogus educational theories have done so much to damage education in this country… The effects are visible even in a great university such as Cambridge, with a steady decline in standards of literacy, and with, in consequence, the provision in one college after another of ‘skills teaching’, so that students who no longer arrive knowing how to structure an essay or even read a book can receive appropriate ‘training’… Even students from top ranked schools seem to find it very difficult … to write essays coherently… In the sort of exams I am thinking of, essay writing comes much more to the fore and examiners would be making more subjective judgements about scripts. In an ideal world there would be double marking of scripts.’ Emphasis added.

Judging essay skills is a more nebulous task than judging the quality of answers to mechanics questions. Also, there is less agreement among historians about what they want to see in school exams than there is among mathematicians and physicists, who largely agree (in my experience, which I stress is limited) about the sorts of problems they want undergraduates to be able to solve and the skills they want them to have.

I will quote a Professor of English at Exeter University, Colin MacCabe, whose view of the decline of essay skills is representative of many comments I have heard, but I cannot say confidently that this view represents a consensus, despite his claim:

‘Nobody who teaches A-level or has anything to do with teaching first-year university students has any doubt that A Levels have been dumbed down… The writing of the essay has been the key intellectual form in undergraduate education for more than a century; excelling at A-level meant excelling in this form. All that went by the board when … David Blunkett, brought in AS-levels… A-levels … became two years of continuous assessment with students often taking their first module within three months of entering the sixth form. This huge increase in testing went together with a drastic change in assessment. Candidates were not now marked in relation to an overall view of their ability to mount and develop arguments, but in relation to their ability to demonstrate achievement against tightly defined assessment objectives… A-levels, once a test of general intellectual ability in relation to a particular subject, are now a tightly supervised procession through a series of targets. Assessment doesn’t come at the end of the course – it is the course… In English, students read many fewer books… Students now arrive at university without the knowledge or skills considered automatic in our day… One of the results of the changes at A-level is that the undergraduate degree is itself a much more targeted affair. Students lack of a general education mean that special subjects, dissertations etc are added to general courses which are themselves much more limited in their approach… One result of this is a grade inflation much more dramatic even than A-levels… [T]here is little place within a modern English university for students to develop the kind of intellectual independence and judgment, which has historically been the aim of the undergraduate degree.’ Observer, 22 August, 2004. (Emphasis added.)

If anybody knows of studies on history and other humanities please link in Comments below.

Oxbridge entrance

As political arguments increasingly focused on ‘participation’ and ‘access’, Oxford and Cambridge largely abandoned their own entrance exams in the 1990s. There were some oddities. Cambridge University dropped its maths test and was so worried by the results that it immediately asked for, and was given, special dispensation to reintroduce it, and it has used one ever since (now known as the STEP paper, which a few other universities also use). Other Cambridge departments that wanted to do the same were refused permission, and some of them (including the physics department) now use interviews to test material they would like to test in a written exam. Oxford changed its mind and gradually reintroduced admission tests in some subjects. (E.g. it does not use STEP in maths but uses its own test, which has more ‘applied’ maths.) Cambridge now uses AS Levels. Oxford does not (but does not like to explain why).

A Levels are largely useless for distinguishing between candidates in the top 2% of ability (i.e. more than two standard deviations above average). Oxbridge entry now involves a complex and incoherent set of procedures. Some departments use interviews to test skills that are i) wholly or largely untested by A Levels and ii) not explicitly set out anywhere. For example, if you go to an interview for physics at Cambridge, they will ask you questions like ‘how many photons hit your eye per second from Alpha Centauri?’, i.e. questions that you cannot cram for but from which tutors can learn a great deal by watching how students grapple with the problem.
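The link between ‘two standard deviations above average’ and ‘the top 2%’ is a piece of arithmetic that is easy to check. The sketch below assumes, purely for illustration, that measured ability is normally distributed (the post does not claim this), and computes the share of a normal population above a given number of standard deviations; the function name `fraction_above` is hypothetical. On that assumption +2 SD corresponds to roughly the top 2.3%, which is where the rounded ‘top 2%’ comes from.

```python
import math


def fraction_above(z: float) -> float:
    """Share of a normal distribution lying more than z standard deviations above the mean.

    Assumes ability scores are normally distributed (an illustrative assumption).
    """
    # Upper tail of the standard normal: P(Z > z) = erfc(z / sqrt(2)) / 2
    return 0.5 * math.erfc(z / math.sqrt(2))


if __name__ == "__main__":
    for z in (1.0, 2.0, 3.0):
        print(f"+{z:.0f} SD: top {100 * fraction_above(z):.1f}%")
```

The same function shows how quickly the tail thins out: each extra standard deviation shrinks the population dramatically, which is one reason exams pitched at the median say so little about the very top.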

The fact that the real skills they want to test are asked about in interviews rather than in public exams is, in my opinion, not only bad for ‘standards’ but is also unfair. Rich schools with long connections to Oxbridge colleges have teachers who understand these interviews and know how to prepare pupils for them. They still teach the material tested in old exams and other materials such as Russian textbooks created decades ago. A comprehensive in east Durham that has never sent anybody to Oxbridge is very unlikely to have the same sort of expertise and is much more likely to operate on the very mistaken assumption that getting a pupil to three As is sufficient preparation for Oxbridge selection. Testing skills in open exams that everybody can see would be fairer.

I will return to this issue in a later blog but it is important to consider the oddities of this situation. Decades ago, open public standardised tests were seen as a way to overcome prejudice. For example, Ivy League universities like Harvard infamously biased their admissions systems against Jews because a fair open process based on intellectual abilities, and ignoring things like lacrosse skills, would have put more Jews into Harvard than Harvard wanted. Similar bias is widespread now in order to keep the number of East Asians low. It is no coincidence that Caltech, whose admissions policy is unusually based on academic ability, has a far higher proportion of East Asians than the likes of Harvard.

Similar problems apply to Oxbridge. A consequence of making exams easier and removing Oxbridge admissions tests was to make the process more opaque and therefore biased against poorer families. The fascinating journey made by the intellectual Left on the issue of standardised tests is described in Steven Pinker’s recent influential essay on university admissions. I agree with him that a big part of the reason for the ‘madness’ is that the intelligentsia ‘has lost the ability to think straight about objective tests’. Half a century ago, the Left fought for standardised tests to overcome prejudice, now many on the Left oppose tests and argue for criteria that give the well-connected middle classes unfair advantages.

This combination of problems is one of the reasons why the Cambridge pure maths department and physics department worked with me to develop projects to redo 16-18 curricula, teacher training, and testing systems. Cambridge is even experimenting with a ‘correspondence Free School’ idea proposed by the mathematician Alexander Borovik (who attended one of the famous Russian maths schools). Powerful forces tried to stop these projects happening because they are, obviously, implicit condemnations of the existing system – condemnations that many would prefer had never seen the light of day. Similar projects in other departments at other universities were kiboshed for the same reason, as were other proposals for specialist maths schools as per the King’s project (which also would never have happened but for the determination of Alison Wolf and a handful of heroic officials in the DfE). I will return to this too.

C. Conclusions

Here are some tentative conclusions.

  1. The political and bureaucratic process for the introduction of the GCSE and National Curriculum was a shambles. Those involved did not go through basic processes to agree aims. Implementation was awful. All elements of the system failed children. There are important lessons for those who want to reform the current system.
  2. Given the weight of evidence above, it is hard to avoid the conclusion that GCSEs were made easier than O Levels and became easier still over time. This means that, from age 14, at least the top fifth are aimed at lower standards than they would have been previously (not that O Levels were at all optimal). Many of them spend two years on low grade material and repeating boring drills, so that the school can maximise its league table position, instead of delving deeper into subjects. Inflation seems to have stopped in the last two years, perhaps temporarily, but only through an Ofqual system known as ‘comparable outcomes’ which is barely understood by anybody in the school system or DfE.
  3. A Levels, at least in maths, sciences, and languages, were quickly made easier after 1988, and not just by enough to keep pass rates stable but by enough to produce large increases in them. Even A Level students are set mundane tasks like ‘design a poster’ that are suitable for small children, not near-adults. (As I type this I am looking at an Edexcel textbook for Further Maths A Level which, for some reason, Edexcel has chosen to decorate with the picture of a child in a ‘Robin’ masked outfit.)
  4. The old ‘S’ level papers, designed to stretch the best A Level students, were abandoned, which contributed to a decline in the standards expected of the top 5%.
  5. University degrees in some subjects therefore also had to become easier (e.g. classics) or longer (natural sciences) in order to avoid increases in failure rates. This happened in some subjects even in elite universities. Remedial courses spread, even in elite universities, to teach or improve skills that were previously expected on arrival (including Classics at Oxford and History at Cambridge). Not all of the problems are because of failures in schools or easier exams. Some are because universities themselves, for political reasons, will not make certain requirements of applicants. Even if the exam system were fixed, this would remain a big problem. On the other hand, while publicly speaking out for AS Levels, admissions officers have also, very quietly, been gradually introducing new tests for admissions purposes that are not regulated by Government or Ofqual. On this, it is more useful to watch what universities do than what they say.
  6. These problems have cascaded right through the system and now affect the pipeline into senior university research positions in maths, sciences, and languages. For example, the lack of maths skills among biologists is hampering the development of synthetic biology and computational biology. It is very common now to have (private) discussions with scientists deploring the decline in English research universities. Just in the past few weeks I have had emails from an English physicist now at Harvard and a prominent English neuroscientist giving me details of these developments and how we are falling further behind American universities. As they say, however, nobody wants to speak out.
  7. It is much easier to see what has happened at the top end of the ability curve, where effects show up in universities, than it is for median pupils. The media also focus on the top end of the ability curve: A Levels and the Russell Group.
  8. Because politicians took control of the system and used results to justify their own policies, and because they control funding, debate over standards became thoroughly dishonest, starting with the Conservative government in the 1980s and continuing to today, when academics are pressured by administrators not to speak out for fear of politicians’ responses. When governments are in control of the metrics by which they are judged, there is likely to be dishonesty. If people (including unions, teachers, and officials) claim they deserve more money on the basis of metrics that are controlled by a small group of people who operate an opaque process and control the regulator themselves, there is likely to be dishonesty.

An important caveat. It is possible that simultaneously a) 1-8 is true and b) the school system has improved in various ways. What do I mean?

This is a coherent (not necessarily right) conclusion from the story told above…

GCSEs are significantly easier than O Levels. Nevertheless, the switch to GCSEs also involved many comprehensives and secondary moderns dropping the old idea that maybe only a fifth of the cohort are ‘academic’, the idea from Plato’s Republic of gold, silver, and bronze children that influenced the 1944 Act. Instead, more schools began to focus more pupils on academic subjects. Even though the standards demanded were easier than in the pre-1988 exams, this new focus (combined with other things) led, between 1988 and now, to at least a) a reduction in the number of truly awful schools and b) more useful knowledge and skills for the bottom fifth of the cohort (in ability terms), and perhaps for more. Perhaps the education of median ability pupils stayed roughly the same (declining a bit in maths), hence the consistent picture in international tests, the King’s results comparing maths in 1978/2008, Shayer’s results and so on (above). Meanwhile the standards demanded by post-1988 A Levels clearly fell (at least in some vital subjects), as the changes in universities testify, and S Level papers vanished, so the top fifth of the cohort (and particularly the +2 standard deviation population, i.e. the top 2%) leave school in some subjects considerably worse educated than in the 1980s. (Given that most scientific and technological breakthroughs come from this top 2%, this has a big knock-on effect.) Private schools felt incentivised to outperform state schools on easier GCSEs and A Levels rather than pursue separate qualifications with all the accompanying problems. There remains no good scientific data on what children at different points on the ability curve are capable of achieving given excellent teaching, so the discussion of ‘standards’ remains circular. Easier GCSEs and A Levels are consistent with some improvements for the bottom fifth, rough stability for the median, significant decline for the top fifth, and fewer awful schools.

This is coherent. It fits the evidence sketched above.

But is it right?

In the next blog in this series I will consider issues of ‘ability’ and the circularity of the current debate on ‘standards’.

Questions?

If people accept the conclusions about GCSEs and A Levels (at least in maths, sciences, and languages, I stress again) how should this evidence be weighed against the very strong desire of many in the education system (and Parliament and Whitehall) to maintain a situation in which the vast majority of the cohort are aimed at GCSEs (or international equivalents that are not hugely different) and, for those deemed ‘academic’, A Levels?

Do the gains from this approach outweigh the losses for an unknown fraction of the ‘more able’?

Is there a way to improve gains for all points on the ability distribution?

I have been told that there is no grade inflation in music exams. Is this true? If YES, is this partly because they are not regulated by the state? Are there other factors? Has A Level Music got easier? If not why not?

What sort of approaches should be experimented with instead of the standard approaches seen in O Levels, GCSEs, and A Levels?

What can be learned from non-Government regulated tests such as Force Concepts Tests (physics), university admissions tests, STEP, IQ tests and so on?

What are the best sources on ‘S’ Level papers and what happened with Oxbridge entrance exams?

What other evidence is there? Where are analyses similar to Warner’s on physics for other subjects?

What evidence is there for university grade inflation which many tell me is now worse than GCSEs and A Levels?
Complexity, ‘fog and moonlight’, prediction, and politics IV – The birth of computational thinking

This document HERE is the fourth blog in this series on complexity and prediction – The birth of computational thinking.

This page has an Index of various blogs on this and other themes. The previous blog in this series was on von Neumann, economics, maths, and prediction.

Unable to do some formatting things on WordPress, I’ve done it as a PDF.

Please leave corrections and comments below.