#29 On the referendum & #4c on Expertise: On the ARPA/PARC ‘Dream Machine’, science funding, high performance, and UK national strategy

Post-Brexit Britain should be considering the intersection of 1) ARPA/PARC-style science research and ‘systems management’ for managing complex projects with 2) the reform of government institutions so that high performance teams — with different education/training (‘Tetlock processes’) and tools (including data science and visualisations of interactive models of complex systems) — can make ‘better decisions in a complex world’.  

This paper examines the ARPA/PARC vision for computing and the nature of the two organisations. In the 1960s visionaries such as Joseph Licklider, Robert Taylor and Doug Engelbart developed a vision of networked interactive computing that provided the foundation not just for new technologies but for whole new industries. Licklider, Sutherland, Taylor et al provided a model (ARPA) for how science funding can work. Taylor provided a model (PARC) of how to manage a team of extremely talented people who turned a profound vision into reality. The original motivation for the vision of networked interactive computing was to help humans make good decisions in a complex world.

This story suggests ideas about how to make big improvements in the world with very few resources if they are structured right. From a British perspective it also suggests ideas about what post-Brexit Britain should do to help itself and the world and how it might be possible to force some sort of ‘phase transition’ on the rotten Westminster/Whitehall system.

For the PDF of the paper click HERE. Please send corrections, with page numbers, below. I will update it after feedback.

Further Reading

The Dream Machine.

Dealers of Lightning.

‘Sketchpad: A man-machine graphical communication system’, Ivan Sutherland 1963.

Oral history interview with Sutherland, head of ARPA’s IPTO division 1963-5.

This link has these seminal papers:

  • Man-Computer Symbiosis, Licklider (1960)
  • The computer as a communications device, Licklider & Taylor (1968)

Watch Alan Kay explain how to invent the future to YCombinator classes HERE and HERE.  

HERE for Kay quotes from emails with Bret Victor.

HERE for Kay’s paper on PARC, The Power of the Context.

Kay’s Early History of Smalltalk.

HERE for a conversation between Kay and Engelbart.

Alan Kay’s tribute to Ted Nelson at “Intertwingled” Fest (an Alto using Smalltalk).

Personal Distributed Computing: The Alto and Ethernet Software, Butler Lampson.

You and Your Research, Richard Hamming.

AI nationalism, essay by Ian Hogarth. This concerns implications of AI for geopolitics.

Drones go to work, Chris Anderson (one of the pioneers of commercial drones). This explains the economics of the drone industry.

Meditations on Moloch, Scott Alexander. This is an extremely good essay in general about deep problems with our institutions.

Intelligence Explosion Microeconomics, Yudkowsky.

Autonomous technology and the greater human good. Omohundro.

Can intelligence explode? Hutter.

For the issue of IQ, genetics and the distribution of talent (and much much more), cf. Steve Hsu’s brilliant blog.

Bret Victor.

Michael Nielsen.

For some pre-history on computers, cf. The birth of computational thinking (some of the history of computing devices before the Turing/von Neumann revolution) and The crisis of mathematical paradoxes, Gödel, Turing and the basis of computing (some of the history of ideas about mathematical foundations and logic such as the famous papers by Gödel and Turing in the 1930s).

Part I of this series of blogs is HERE.

Part II on the emergence of ‘systems management’, how George Mueller used it to put man on the moon, and a checklist of how successful management of complex projects is systematically different to how Whitehall works is HERE.

On the referendum #28: Some interesting stuff on AI/ML with, hopefully, implications for post-May/Hammond decisions

Here are a few interesting recent papers I’ve read over the past few months.

Bear in mind that Shane Legg, co-founder and chief scientist of Deep Mind, said publicly a few years ago that there’s a 50% probability that we will achieve human level AI by 2028 and a 90% probability by 2050. Given all that has happened since, including at Deep Mind, it’s surely unlikely he now thinks this forecast is too optimistic. Also bear in mind that the US-China AI arms race is already underway, the UK lost its main asset before almost any MPs even knew its name, and the EU in general (outside London) is decreasingly relevant as progress at the edge of the field is driven by coastal America and coastal China, spurred by commercial and national security dynamics. This will get worse as the EU Commission and the ECJ use the Charter of Fundamental Rights to grab the power to regulate all high technology fields from AI to genomics — a legal/power dynamic still greatly under-appreciated in London’s technology world. If you think GDPR is a mess, wait for the ECJ to spend three years deciding crucial cases on autonomous drones and genetic engineering before upending research in the field…

Vote Leave argued during the referendum that a Leave victory should deliver the huge changes that the public wanted and the UK should make science and technology the focus of a profound process of national renewal. On this as on everything else, from Article 50 to how to conduct the negotiations to budget priorities to immigration policy, SW1 in general and the Conservative Party in particular did the opposite of what Vote Leave said. They have driven the country into the ditch and the only upside is they have exposed the rottenness of Westminster and Whitehall and forced many who wanted to keep the duvet over their eyes to face reality — the first step in improvement.

After the abysmal May/Hammond interlude is over, hopefully some time between October 2018 and July 2019, its replacement will need to change course on almost every front from the NHS to how SW1 pours billions into the greedy paws of corporate looters via its appallingly managed >£200 BILLION annual contracting/procurement budget — ‘there’s no money’ bleats most of SW1 as it unthinkingly shovels it at the demimonde of Carillion/BaE-like companies that prop up its MPs with donations.

May’s replacement could decide to take seriously the economic and technological forces changing the world. The UK could, with a very different vision of the future to anything now proposed in Whitehall, improve its own security and prosperity and help the world, but this will require 1) substantially changing the wiring of power in Whitehall so decisions are better (new people, training, ideas, tools, and institutions), and 2) making scientific research and technology projects important at the apex of power. We could build real assets with much greater real influence than the chimerical ‘influence’ in Brussels meeting rooms that SW1 has used as an excuse to give away power to Brussels, where thinking is much closer to the 1970s than to today’s coastal China or Silicon Valley. Brushing aside Corbyn would be child’s play for a government that focused on important questions and took project management — an undiscussable subject in SW1 — seriously.

The whole country — the whole world — can see our rotten parties have failed us. The parties ally with the civil service to keep new ideas and people excluded. SW1 has tried to resist the revolutionary implications of the referendum but this resistance has to crack: one way or the other the old ways are doomed. The country voted for profound change in 2016. The Tories didn’t understand this hence, partly, the worst campaign in modern history. This dire Cabinet, doomed to merciless judgement in the history books, is visibly falling: let’s ‘push what is falling’…

For specific proposals on improving the appalling science funding system, see below.

*

OpenAI, the non-profit co-founded by Sam Altman, made major progress with its Dota-playing AI last week: follow @gdb for updates. Deep Mind is similarly working on Starcraft. Shifting from perfect-information games like Go to imperfect-information games like Dota and Starcraft is a major advance. If AIs shortly beat the best humans at full versions of such games, then it means they can outperform at least parts of human reasoning in ways that have been assumed to be many years away. As OpenAI says, it is a major step ‘towards advanced AI systems which can handle the complexity and uncertainty of the real world.’

https://blog.openai.com/openai-five-benchmark-results/

RAND paper on how AI affects the chances of nuclear catastrophe:

https://www.rand.org/content/dam/rand/pubs/perspectives/PE200/PE296/RAND_PE296.pdf

The Malicious Use of Artificial Intelligence:

https://img1.wsimg.com/blobby/go/3d82daa4-97fe-4096-9c6b-376b92c619de/downloads/1c6q2kc4v_50335.pdf

Defense Science Board: ‘Summer Study on Autonomy’ (2016):

http://www.acq.osd.mil/dsb/reports/2010s/DSBSS15.pdf

JASON: ‘Perspectives on Research in Artificial Intelligence and Artificial General Intelligence Relevant to DoD’ (2017)

https://fas.org/irp/agency/dod/jason/ai-dod.pdf

Artificial Intelligence and National Security, Greg Allen & Taniel Chan (for IARPA):

Artificial Intelligence and National Security – Belfer Center for Science and International Affairs

Some predictions on driverless cars and other automation milestones: http://rodneybrooks.com/my-dated-predictions/

Project Maven (very relevant to politicians/procurement): https://thebulletin.org/project-maven-brings-ai-fight-against-isis11374

Chris Anderson on drones changing business sectors:

https://hbr.org/cover-story/2017/05/drones-go-to-work

On the trend in AI compute and economic sustainability (NB. I think the author is wrong on the Manhattan Project being a good upper bound for what a country will spend in an arms race; the share of US GDP spent on DoD at the height of the Cold War would be a better metric): https://aiimpacts.org/interpreting-ai-compute-trends/

Read this excellent essay on ‘AI Nationalism’ by Ian Hogarth, directly relevant to arms race arguments and UK policy.

Read ‘Intelligence Explosion Microeconomics’ by Yudkowsky.

Read ‘Autonomous technology and the greater human good’ by Omohundro — one of the best things about the dangers of AGI and ideas about safety I’ve seen by one of the most respected academics working in this field.

Existential Risk: Diplomacy and Governance (Future of Humanity Institute, 2017).

If you haven’t already, you should also read von Neumann’s 1955 essay ‘Can we survive technology?’. It is relevant beyond any specific technology. VN was regarded by the likes of Einstein and Dirac as the smartest person they’d ever met. He was involved in the Manhattan Project and in inventing computer science, game theory and much more. This essay explored the essential problem that the scale and speed of technological change suddenly blew up assumptions about political institutions’ ability to cope. Much of it reads as if it were written yesterday. ‘For progress there is no cure…’

I blogged on a paper by Judea Pearl a few months ago HERE. He is the leading scholar of causation. He argues that current ML approaches are inherently limited and that further advances require giving machines causal reasoning:

‘If we want machines to reason about interventions (“What if we ban cigarettes?”) and introspection (“What if I had finished high school?”), we must invoke causal models. Associations are not enough — and this is a mathematical fact, not opinion.’
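To make Pearl’s point concrete, here is a minimal simulation (all numbers and variable names are invented for illustration). A hidden confounder Z raises the chance of both X and Y, so the observed association between X and Y is strong even though X has no causal effect; intervening on X, Pearl’s do(X), changes nothing.

```python
import numpy as np

# Toy structural causal model; all numbers are invented for illustration.
# A hidden confounder Z raises the chance of both X and Y; X has NO
# causal effect on Y.
rng = np.random.default_rng(0)
n = 1_000_000

z = rng.random(n) < 0.5                    # confounder
x = rng.random(n) < np.where(z, 0.8, 0.2)  # Z makes X more likely
y = rng.random(n) < np.where(z, 0.8, 0.2)  # Z makes Y more likely; X unused

# Association: P(Y | X=1) vs P(Y | X=0) suggests a strong "effect" of X.
print(y[x].mean(), y[~x].mean())           # ~0.68 vs ~0.32

# Intervention do(X=1): set X for everyone. That severs the Z -> X arrow,
# and since Y never depended on X, P(Y | do(X=1)) is just P(Y) ~ 0.5.
print(y.mean())
```

An associational learner trained only on (X, Y) pairs would report the 0.68 vs 0.32 gap and wrongly imply that intervening on X moves Y; a causal model that carries the Z arrows gets the intervention right.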

I also wrote this recently on science funding, which links to a great piece by two young neuroscientists about how post-Brexit Britain should improve science, and which is also relevant to how the UK could set up an ARPA-like entity to fund AI/ML and other fields:

https://dominiccummings.com/2018/06/08/on-the-referendum-25-how-to-change-science-funding-post-brexit/

 

Effective action #4b: ‘Expertise’, prediction and noise, from the NHS killing people to Brexit

In part A I looked at extreme sports as some background to the question of true expertise and the crucial nature of fast high quality feedback.

This blog looks at studies comparing expertise in many fields over decades, including work by Tetlock and Kahneman, and problems like why people don’t learn to use even simple tools to stop children dying unnecessarily. There is a summary of some basic lessons at the end.

The reason for writing about this is that we will only improve the performance of government (at individual, team and institutional levels) if we reflect on:

  • what expertise really is and why some very successful fields cultivate it effectively while others, like government, do not;
  • how to select much higher quality people (it’s insane that people as ignorant and limited as me can have the influence we do in the way we do — us limited duffers can help in limited ways but why do we deliberately exclude ~100% of the most intelligent, talented, relentless, high performing people from fields with genuine expertise, why do we not have people like Fields Medallist Tim Gowers or Michael Nielsen as Chief Scientist sitting ex officio in Cabinet?);
  • how to train people effectively to develop true expertise in skills relevant to government: it needs different intellectual content (PPE/economics are NOT good introductory degrees) and practice in practical skills (project management, making predictions and in general ‘thinking rationally’) with lots of fast, accurate feedback;
  • how to give them effective tools: e.g the Cabinet Room is worse in this respect than it was in July 1914 — at least then the clock and fireplace worked, and Lord Salisbury in the 1890s would walk round the Cabinet table gathering papers to burn in the grate — while today No10 is decades behind the state-of-the-art in old technologies like TV, doesn’t understand simple tools like checklists, and is nowhere with advanced technologies;
  • and how to ‘program’ institutions differently so that 1) people are more incentivised to optimise things we want them to optimise, like error-correction and predictive accuracy, and less incentivised to optimise bureaucratic process, prestige, and signalling as our institutions now do to a dangerous extent, and, connected, so that 2) institutions are much better at building high performance teams rather than continuing with normal rules that make this practically illegal, and so that 3) we have ‘immune systems’ to minimise the inevitable failures of even the best people and teams.

In SW1 now, those at the apex of power practically never think in a serious way about the reasons for the endemic dysfunctional decision-making that constitutes most of their daily experience or how to change it. What looks like omnishambles to the public and high performers in technology or business is seen by Insiders, always implicitly and often explicitly, as ‘normal performance’. ‘Crises’ such as the collapse of Carillion or our farcical multi-decade multi-billion ‘aircraft carrier’ project occasionally provoke a few days of headlines but it’s very rare anything important changes in the underlying structures and there is no real reflection on system failure.

This fact is why, for example, a startup created in a few months could win a referendum that should have been unwinnable. It was the systemic and consistent dysfunction of Establishment decision-making systems over a long period, with very poor mechanisms for good accurate feedback from reality, that created the space for a guerrilla operation to exploit.

This makes it particularly ironic that even after Westminster and Whitehall have allowed their internal consensus about UK national strategy to be shattered by the referendum, there is essentially no serious reflection on this system failure. It is much more psychologically appealing for Insiders to blame ‘lies’ (Blair and Osborne really say this without blushing), devilish use of technology to twist minds and so on. Perhaps the most profound aspect of broken systems is that they cannot reflect on the reasons why they’re broken — never mind take effective action. Instead of serious thought, we have high status Insiders like Campbell reduced to bathos, whining on social media about Brexit ‘impacting mental health’. This lack of reflection is why Remain-dominated Insiders lurched from failure over the referendum to failure over negotiations. OODA loops across SW1 are broken and this is very hard to fix — if you can’t orient to reality how do you even see your problem well? (NB. It should go without saying that there is a faction of pro-Brexit MPs, ‘campaigners’ and ‘pro-Brexit economists’ who are at least as disconnected from reality as the May/Hammond bunker, often more so.)


In the commercial world, big companies mostly die within a few decades because they cannot maintain an internal system to keep them aligned to reality, while startups pop up to replace them. These two factors create learning at a system level — there is lots of micro failure but macro productivity/learning in which useful information is compressed and abstracted. In the political world, big established failing systems control the rules, suck in more and more resources rather than go bust, make it almost impossible for startups to contribute and so on. Even failures on the scale of the 2008 Crash or the 2016 referendum do not necessarily make broken systems face reality, at least quickly. Watching Parliament’s obsession with trivia in the face of the Cabinet’s and Whitehall’s contemptible failure to protect the interests of millions in the farcical Brexit negotiations is like watching the secretary to the Singapore Golf Club objecting to guns being placed on the links as the Japanese troops advanced.

Neither of the main parties has internalised the reality of these two crises. The Tories won’t face reality on things like corporate looting and the NHS; Labour won’t face reality on things like immigration and the limits of bureaucratic centralism. Neither can cope with the complexity of Brexit and both just look like I would in the ring with a professional fighter — baffled, terrified and desperate for a way to escape. There are so many simple ways to improve performance — and their own popularity! — but the system is stuck in such a closed loop it wilfully avoids seeing even the most obvious things and suppresses Insiders who want to do things differently…

But… there is a network of almost entirely younger people inside or close to the system thinking ‘we could do so much better than this’. Few senior Insiders are interested in these questions but that’s OK — few of them listened before the referendum either. It’s not the people now in power and running the parties and Whitehall who will determine whether we make Brexit a platform to contribute usefully to humanity’s biggest challenges but those who take over.

Doing better requires reflecting on what we know about real expertise…

*

How to distinguish between fields dominated by real expertise and those dominated by confident ‘experts’ who make bad predictions?

We know a lot about the distinction between fields in which there is real expertise and fields dominated by bogus expertise. Daniel Kahneman, who has published some of the most important research about expertise and prediction, summarises the two fundamental tests to ask about a field: 1) is there enough informational structure in the environment to allow good predictions, and 2) is there timely and effective feedback that enables error-correction and learning.

‘To know whether you can trust a particular intuitive judgment, there are two questions you should ask: Is the environment in which the judgment is made sufficiently regular to enable predictions from the available evidence? The answer is yes for diagnosticians, no for stock pickers. Do the professionals have an adequate opportunity to learn the cues and the regularities? The answer here depends on the professionals’ experience and on the quality and speed with which they discover their mistakes. Anesthesiologists have a better chance to develop intuitions than radiologists do. Many of the professionals we encounter easily pass both tests, and their off-the-cuff judgments deserve to be taken seriously. In general, however, you should not take assertive and confident people at their own evaluation unless you have independent reason to believe that they know what they are talking about.’ (Emphasis added.)

In fields where these two elements are present there is genuine expertise and people build new knowledge on the reliable foundations of previous knowledge. Some fields make a transition from stories (e.g Icarus) and authority (e.g ‘witch doctor’) to quantitative models (e.g modern aircraft) and evidence/experiment (e.g some parts of modern medicine/surgery). As scientists have said since Newton, they stand on the shoulders of giants.

How do we assess predictions / judgement about the future?

‘Good judgment is often gauged against two gold standards – coherence and correspondence. Judgments are coherent if they demonstrate consistency with the axioms of probability theory or propositional logic. Judgments are correspondent if they agree with ground truth. When gold standards are unavailable, silver standards such as consistency and discrimination can be used to evaluate judgment quality. Individuals are consistent if they assign similar judgments to comparable stimuli, and they discriminate if they assign different judgments to dissimilar stimuli.

‘Coherence violations range from base rate neglect and confirmation bias to overconfidence and framing effects (Gilovich, Griffin & Kahneman, 2002; Kahneman, Slovic & Tversky, 1982). Experts are not immune. Statisticians (Christensen-Szalanski & Bushyhead, 1981), doctors (Eddy, 1982), and nurses (Bennett, 1980) neglect base rates. Physicians and intelligence professionals are susceptible to framing effects and financial investors are prone to overconfidence.

‘Research on correspondence tells a similar story. Numerous studies show that human predictions are frequently inaccurate and worse than simple linear models in many domains (e.g. Meehl, 1954; Dawes, Faust & Meehl, 1989). Once again, expertise doesn’t necessarily help. Inaccurate predictions have been found in parole officers, court judges, investment managers in the US and Taiwan, and politicians. However, expert predictions are better when the forecasting environment provides regular, clear feedback and there are repeated opportunities to learn (Kahneman & Klein, 2009; Shanteau, 1992). Examples include meteorologists, professional bridge players, and bookmakers at the racetrack, all of whom are well-calibrated in their own domains.‘ (Tetlock, How generalizable is good judgment?, 2017.)

In another 2017 piece Tetlock explored the studies further. In the 1920s researchers built simple models based on expert assessments of 500 ears of corn and the price they would fetch in the market. They found that ‘to everyone’s surprise, the models that mimicked the judges’ strategies nearly always performed better than the judges themselves’ (Tetlock, cf. ‘What Is in the Corn Judge’s Mind?’, Journal of the American Society of Agronomy, 1923). Banks found the same when they introduced models for credit decisions.

‘In other fields, from predicting the performance of newly hired salespeople to the bankruptcy risks of companies to the life expectancies of terminally ill cancer patients, the experience has been essentially the same. Even though experts usually possess deep knowledge, they often do not make good predictions…

‘When humans make predictions, wisdom gets mixed with “random noise.”… Bootstrapping, which incorporates expert judgment into a decision-making model, eliminates such inconsistencies while preserving the expert’s insights. But this does not occur when human judgment is employed on its own…

‘In fields ranging from medicine to finance, scores of studies have shown that replacing experts with models of experts produces superior judgments. In most cases, the bootstrapping model performed better than experts on their own. Nonetheless, bootstrapping models tend to be rather rudimentary in that human experts are usually needed to identify the factors that matter most in making predictions. Humans are also instrumental in assigning scores to the predictor variables (such as judging the strength of recommendation letters for college applications or the overall health of patients in medical cases). What’s more, humans are good at spotting when the model is getting out of date and needs updating…

‘Human experts typically provide signal, noise, and bias in unknown proportions, which makes it difficult to disentangle these three components in field settings. Whether humans or computers have the upper hand depends on many factors, including whether the tasks being undertaken are familiar or unique. When tasks are familiar and much data is available, computers will likely beat humans by being data-driven and highly consistent from one case to the next. But when tasks are unique (where creativity may matter more) and when data overload is not a problem for humans, humans will likely have an advantage…

‘One might think that humans have an advantage over models in understanding dynamically complex domains, with feedback loops, delays, and instability. But psychologists have examined how people learn about complex relationships in simulated dynamic environments (for example, a computer game modeling an airline’s strategic decisions or those of an electronics company managing a new product). Even after receiving extensive feedback after each round of play, the human subjects improved only slowly over time and failed to beat simple computer models. This raises questions about how much human expertise is desirable when building models for complex dynamic environments. The best way to find out is to compare how well humans and models do in specific domains and perhaps develop hybrid models that integrate different approaches.’ (Tetlock)
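To make ‘bootstrapping’ concrete, here is a minimal sketch with invented data and variable names: record an expert’s past judgments, regress them on the cues the expert uses, then apply the fitted model in place of the expert. The model applies the expert’s own policy consistently, which is exactly why such models usually out-predict the experts they are built from.

```python
import numpy as np

# Hypothetical setup (names and numbers invented): an expert has scored
# 200 past loan applications 0-100, using three cues.
rng = np.random.default_rng(1)
cues = rng.normal(size=(200, 3))           # assets, liabilities, years trading

# Simulate the expert: a consistent underlying policy plus personal noise.
policy = np.array([30.0, -25.0, 10.0])
expert_scores = 50 + cues @ policy + rng.normal(scale=15, size=200)

# "Bootstrap" the expert: regress their own judgments on the cues.
A = np.column_stack([np.ones(len(cues)), cues])
weights, *_ = np.linalg.lstsq(A, expert_scores, rcond=None)

# The fitted model applies the expert's policy without the case-to-case
# noise, which is why such models usually out-predict the expert.
def model_score(case):
    return weights[0] + case @ weights[1:]

print(model_score(np.array([1.0, -0.5, 0.2])))
```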

Kahneman also recently published new work relevant to this.

‘Research has confirmed that in many tasks, experts’ decisions are highly variable: valuing stocks, appraising real estate, sentencing criminals, evaluating job performance, auditing financial statements, and more. The unavoidable conclusion is that professionals often make decisions that deviate significantly from those of their peers, from their own prior decisions, and from rules that they themselves claim to follow.’

In general organisations spend almost no effort figuring out how noisy the predictions made by senior staff are and how much this costs. Kahneman has done some ‘noise audits’ and shown companies that management make MUCH more variable predictions than people realise.

‘What prevents companies from recognizing that the judgments of their employees are noisy? The answer lies in two familiar phenomena: Experienced professionals tend to have high confidence in the accuracy of their own judgments, and they also have high regard for their colleagues’ intelligence. This combination inevitably leads to an overestimation of agreement. When asked about what their colleagues would say, professionals expect others’ judgments to be much closer to their own than they actually are. Most of the time, of course, experienced professionals are completely unconcerned with what others might think and simply assume that theirs is the best answer. One reason the problem of noise is invisible is that people do not go through life imagining plausible alternatives to every judgment they make.

‘High skill develops in chess and driving through years of practice in a predictable environment, in which actions are followed by feedback that is both immediate and clear. Unfortunately, few professionals operate in such a world. In most jobs people learn to make judgments by hearing managers and colleagues explain and criticize—a much less reliable source of knowledge than learning from one’s mistakes. Long experience on a job always increases people’s confidence in their judgments, but in the absence of rapid feedback, confidence is no guarantee of either accuracy or consensus.’
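A ‘noise audit’ of the kind Kahneman describes is simple to sketch (all numbers below are invented): give the same cases to several professionals and measure, for each pair, the gap between their judgments relative to their average judgment.

```python
import numpy as np

# Hypothetical audit (all numbers invented): 10 underwriters each price
# the same 5 cases. judgments[i, j] = premium quoted by underwriter j
# for case i, scattered around a fair value with 30% personal noise.
rng = np.random.default_rng(2)
fair = np.array([1000.0, 2500.0, 400.0, 8000.0, 1200.0])
judgments = fair[:, None] * rng.normal(1.0, 0.3, size=(5, 10))

# For every pair of underwriters and every case: |difference| divided by
# the pair's average judgment, i.e. how far two colleagues are apart.
diff = np.abs(judgments[:, :, None] - judgments[:, None, :])
avg = (judgments[:, :, None] + judgments[:, None, :]) / 2
off_diagonal = ~np.eye(10, dtype=bool)     # drop self-comparisons

print((diff / avg)[:, off_diagonal].mean())  # ~0.34 with these numbers
```

In the audits Kahneman reports, gaps of this size surprised executives, who typically expected colleagues’ judgments to differ by around ten percent or less.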

Reviewing the point that Tetlock makes about simple models beating experts in many fields, Kahneman summarises the evidence:

‘People have competed against algorithms in several hundred contests of accuracy over the past 60 years, in tasks ranging from predicting the life expectancy of cancer patients to predicting the success of graduate students. Algorithms were more accurate than human professionals in about half the studies, and approximately tied with the humans in the others. The ties should also count as victories for the algorithms, which are more cost-effective…

‘The common assumption is that algorithms require statistical analysis of large amounts of data. For example, most people we talk to believe that data on thousands of loan applications and their outcomes is needed to develop an equation that predicts commercial loan defaults. Very few know that adequate algorithms can be developed without any outcome data at all — and with input information on only a small number of cases. We call predictive formulas that are built without outcome data “reasoned rules,” because they draw on commonsense reasoning.

‘The construction of a reasoned rule starts with the selection of a few (perhaps six to eight) variables that are incontrovertibly related to the outcome being predicted. If the outcome is loan default, for example, assets and liabilities will surely be included in the list. The next step is to assign these variables equal weight in the prediction formula, setting their sign in the obvious direction (positive for assets, negative for liabilities). The rule can then be constructed by a few simple calculations.

‘The surprising result of much research is that in many contexts reasoned rules are about as accurate as statistical models built with outcome data. Standard statistical models combine a set of predictive variables, which are assigned weights based on their relationship to the predicted outcomes and to one another. In many situations, however, these weights are both statistically unstable and practically unimportant. A simple rule that assigns equal weights to the selected variables is likely to be just as valid. Algorithms that weight variables equally and don’t rely on outcome data have proved successful in personnel selection, election forecasting, predictions about football games, and other applications.

‘The bottom line here is that if you plan to use an algorithm to reduce noise, you need not wait for outcome data. You can reap most of the benefits by using common sense to select variables and the simplest possible rule to combine them…

‘Uncomfortable as people may be with the idea, studies have shown that while humans can provide useful input to formulas, algorithms do better in the role of final decision maker. If the avoidance of errors is the only criterion, managers should be strongly advised to overrule the algorithm only in exceptional circumstances.’
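The ‘reasoned rule’ recipe quoted above is mechanical enough to write out directly. A minimal sketch with invented variables and applicants: pick a few clearly relevant variables, standardise each so the units don’t matter, give each an equal weight with the obvious sign, and sum.

```python
import numpy as np

# A hypothetical loan-default screen built with NO outcome data, following
# the recipe quoted above: a few obviously relevant variables, each
# standardised and given an equal weight with the obvious sign.
SIGNS = {"assets": +1.0, "liabilities": -1.0, "years_trading": +1.0}

def reasoned_rule(applicants):
    """applicants: dict mapping variable name -> list of values, one per
    applicant. Returns a score per applicant; higher = safer."""
    total = 0.0
    for name, sign in SIGNS.items():
        x = np.asarray(applicants[name], dtype=float)
        total = total + sign * (x - x.mean()) / x.std()  # unit-free z-score
    return total

applicants = {
    "assets":        [120_000, 30_000, 75_000],
    "liabilities":   [ 90_000, 10_000, 70_000],
    "years_trading": [      8,      2,     15],
}
print(reasoned_rule(applicants))   # rank applicants by this score
```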

Jim Simons is a mathematician and founder of the world’s most successful ‘quant fund’, Renaissance Technologies. While market prices appear close to random and are therefore extremely hard to predict, they are not quite random and the right models/technology can exploit these small and fleeting opportunities. One of the lessons he learned early was: don’t turn off the model and go with your gut. At Renaissance, they trust models over instincts. The Bridgewater hedge fund led by Ray Dalio is similar. After near destruction early in his career, Dalio turned explicitly towards model building as the basis for decisions, combined with radical attempts to create an internal system that incentivises the optimisation of error-correction. It works.

*

People fail to learn from even the great examples of success and the simplest lessons

One of the most interesting meta-lessons of studying high performance, though, is that simply demonstrating extreme success does NOT lead to much learning. For example:

  • ARPA and PARC created the internet and PC. The PARC research team was an extraordinary collection of about two dozen people who were managed in a very unusual way that created super-productive processes extremely different to normal bureaucracies. XEROX, which owned PARC, had the entire future of the computer industry in its own hands, paid for by its own budgets, yet it let Bill Gates and Steve Jobs steal everything and then shut down the research team that did it. And then, as Silicon Valley grew on the back of these efforts, almost nobody, including most of the billionaires who got rich from the dynamics created by ARPA-PARC, studied the nature of the organisation and processes and copied it. Even today, those trying to do edge-of-the-art research in a similar way to PARC right at the heart of the Valley ecosystem are struggling for long-term patient funding. As Alan Kay, one of the PARC team, said, ‘The most interesting thing has been the contrast between appreciation/exploitation of the inventions/contributions [of PARC] versus the almost complete lack of curiosity and interest in the processes that produced them.’ ARPA survived being abolished in the 1970s but it was significantly changed and is no longer the freewheeling place that it was in the 1960s when it funded the internet. In many ways DARPA’s approach now is explicitly different to the old ARPA (the addition of the ‘D’ was a sign of internal bureaucratic changes).


  • ‘Systems management’ was invented in the 1950s and 1960s (partly based on wartime experience of large complex projects) to deal with the classified ICBM project and Apollo. It put man on the moon; NASA then largely abandoned the approach and reverted to being (relative to 1963-9) a normal bureaucracy. Most of Washington has ignored the lessons ever since — look for example at the collapse of ObamaCare’s rollout, after which Insiders said ‘oh, looks like it was a system failure, wonder how we deal with this’, mostly unaware that America had developed a successful approach to such projects half a century earlier. This is particularly interesting given that China also studied Mueller’s approach to systems management in Apollo and as we speak is copying it in projects across China. The EU’s bureaucracy is, like Whitehall, an anti-checklist to high level systems management — i.e both violate almost every principle of effective action.
  • Buffett and Munger are the most successful investment partnership in world history. Every year for half a century they have explained some basic principles, particularly concerning incentives, behind organisational success. Practically no public companies take their advice and all around us in Britain we see vast corporate looting and politicians of all parties failing to act — they don’t even read the Buffett/Munger lessons and think about them. Even when given these lessons to read, they won’t read them (I know this because I’ve tried).

Perhaps you’re thinking — well, learning from these brilliant examples might be intrinsically really hard, much harder than Cummings thinks. I don’t think this is quite right. Why? Partly because millions of well-educated and normally-ethical people don’t learn even from much simpler things.

I will explore this separately soon but I’ll give just one example. The world of healthcare unnecessarily kills and injures people on a vast scale. Two aspects of this are 1) a deep resistance to learning from the success of very simple tools like checklists and 2) a deep resistance to facing the fact that most medical experts do not understand statistics properly and their routine misjudgements cause vast suffering, plus warped incentives encourage widespread lies about statistics and irrational management. E.g People are constantly told things like ‘you’ve tested positive for X therefore you have X’ and they then kill themselves. We KNOW how to practically eliminate certain sorts of medical injury/death. We KNOW how to teach and communicate statistics better. (Cf. Professor Gigerenzer for details. He was the motivation for including things like conditional probabilities in the new National Curriculum.) These are MUCH simpler than building ICBMs, putting man on the moon, creating the internet and PC, or being great investors. Yet our societies don’t do them.
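The ‘tested positive therefore you have X’ fallacy is easiest to see with natural frequencies, the format Gigerenzer recommends. A worked example with illustrative numbers: a condition with 1% prevalence and a test that catches 90% of true cases but falsely flags 9% of healthy people.

```python
# Natural-frequency version of Bayes' rule, with illustrative numbers:
# of 10,000 people, 100 have condition X (1% prevalence).
have_x, healthy = 100, 9_900

true_positives  = have_x * 0.90    # 90% sensitivity  ->  90 people
false_positives = healthy * 0.09   # 9% false alarms  -> 891 people

# Of everyone who tests positive, how many actually have X?
p = true_positives / (true_positives + false_positives)
print(f"P(X | positive test) = {p:.0%}")   # 9%, not 'you have X'
```

Out of 10,000 people tested, 981 test positive but only 90 of those actually have the condition: a positive result means roughly a 9% chance of having X, nothing like certainty.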

Why?

Because we do not incentivise error-correction and predictive accuracy. People are not incentivised to consider the cost of their noisy judgements. Where incentives and culture are changed, performance magically changes. It is the nature of the systems, not (mostly) the nature of the people, that is the crucial ingredient in learning from proven simple success. In healthcare like in government generally, people are incentivised to engage in wasteful/dangerous signalling to a terrifying degree — not rigorous thinking and not solving problems.

I have experienced the problem with checklists first hand in the Department for Education when trying to get the social worker bureaucracy to think about checklists in the context of avoiding child killings like Baby P. Professionals tend to see them as undermining their status and bureaucracies fight against learning, even when some great officials try really hard (as some in the DfE did such as Pamela Dow and Victoria Woodcock). ‘Social work is not the same as an airline Dominic’. No shit. Airlines can handle millions of people without killing one of them because they align incentives with predictive accuracy and error-correction.

Some appalling killings are inevitable but the social work bureaucracy will keep allowing unnecessary killings because they will not align incentives with error-correction. Undoing flawed incentives threatens the system so they’ll keep killing children instead — and they’re not particularly bad people, they’re normal people in a normal bureaucracy. The pilot dies with the passengers. The ‘CEO’ on over £150,000 a year presiding over another unnecessary death despite constantly increasing taxpayers money pouring in? Issue a statement that ‘this must never happen again’, tell the lawyers to redact embarrassing cockups on the grounds of ‘protecting someone’s anonymity’ (the ECHR is a great tool to cover up death by incompetence), fuck off to the golf course, and wait for the media circus to move on.

Why do so many things go wrong? Because usually nobody is incentivised to work relentlessly to suppress entropy, never mind come up with something new.

*

We can see some reasonably clear conclusions from decades of study on expertise and prediction in many fields.

  • Some fields are like extreme sport or physics: genuine expertise emerges because of fast effective feedback on errors.
  • Abstracting human wisdom into models often works better than relying on human experts as models are often more consistent and less noisy.
  • Models are also often cheaper and simpler to use.
  • Models do not have to be complex to be highly effective — quite the opposite, often simpler models outperform more sophisticated and expensive ones.
  • In many fields (which I’ve explored before but won’t go into again here) low tech very simple checklists have been extremely effective: e.g flying aircraft or surgery.
  • Successful individuals like Warren Buffett and Ray Dalio also create cognitive checklists to trap and correct normal cognitive biases that degrade individual and team performance.
  • Fields make progress towards genuine expertise when they make a transition from stories (e.g Icarus) and authority (e.g ‘witch doctor’) to quantitative models (e.g modern aircraft) and evidence/experiment (e.g some parts of modern medicine/surgery).
  • In the intellectual realm, maths and physics are fields dominated by genuine expertise and provide a useful benchmark to compare others against. They are also hierarchical. Social sciences have little in common with this.
  • Even when we have great examples of learning and progress, and we can see the principles behind them are relatively simple and do not require high intelligence to understand, they are so psychologically hard and run so counter to the dynamics of normal big organisations that almost nobody learns from them. Extreme success is ‘easy to learn from’ in one sense and ‘the hardest thing in the world to learn from’ in another sense.

It is fascinating how remarkably little interest there is in the world of politics/government, and social sciences analysing politics/government, about all this evidence. This is partly because politics/government is an anti-learning and anti-expertise field, partly because the social sciences are swamped by what Feynman called ‘cargo cult science’ with very noisy predictions, little good feedback and learning, and a lot of chippiness at criticism whether it’s from statistics experts or the ‘ignorant masses’. Fields like ‘education research’ and ‘political science’ are particularly dreadful and packed with charlatans but much of economics is not much better (much pro- and anti-Brexit mainstream economics is classic ‘cargo cult’).

I have found there is overwhelmingly more interest in high technology circles than in government circles, but in high technology circles there is also a lot of incredulity and naivety about how government works — many assume politicians are trying and failing to achieve high performance and don’t realise that in fact nobody is actually trying. This illusion extends to many well-connected businessmen who just can’t internalise the reality of the apex of power. I find that uneducated people on 20k living hundreds of miles from SW1 generally have a more accurate picture of daily No10 work than extremely well-connected billionaires.

This is all sobering and is another reason to be pessimistic about the chances of changing government from ‘normal’ to ‘high performance’ — but, pessimism of the intellect, optimism of the will…

If you are in Whitehall now watching the Brexit farce or abroad looking at similar, you will see from page 26 HERE a checklist for how to manage complex government projects at world class levels (if you find this interesting then read the whole paper). I will elaborate on this. I am also thinking about a project to look at the intersection of (roughly) five fields in order to make large improvements in the quality of people, ideas, tools, and institutions that determine political/government decisions and performance:

  • the science of prediction across different fields (e.g early warning systems, the Tetlock/IARPA project showing dramatic performance improvements),
  • what we know about high performance (individual/team/organisation) in different fields (e.g China’s application of ‘systems management’ to government),
  • technology and tools (e.g Bret Victor’s work, Michael Nielsen’s work on cognitive technologies, work on human-AI ‘minotaur’ teams),
  • political/government decision making affecting millions of people and trillions of dollars (e.g WMD, health), and
  • communication (e.g crisis management, applied psychology).

Progress requires attacking the ‘system of systems’ problem at the right ‘level’. Attacking the problems directly — let’s improve policy X and Y, let’s swap ‘incompetent’ A for ‘competent’ B — cannot touch the core problems, particularly the hardest meta-problem that government systems bitterly fight improvement. Solving the explicit surface problems of politics and government is best approached by a more general focus on applying abstract principles of effective action. We need to surround relatively specific problems with a more general approach. Attack at the right level will see specific solutions automatically ‘pop out’ of the system. One of the most powerful simplicities in all conflict (almost always unrecognised) is: ‘winning without fighting is the highest form of war’. If we approach the problem of government performance at the right level of generality then we have a chance to solve specific problems ‘without fighting’ — or, rather, without fighting nearly so much and the fighting will be more fruitful.

This is not a theoretical argument. If you look carefully at ancient texts and modern case studies, you see that applying a small number of very simple, powerful, but largely unrecognised principles (that are very hard for organisations to operationalise) can produce extremely surprising results.

How to jump from the Idea to Reality? More soon…


Ps. Just as I was about to hit publish on this, the DCMS Select Committee released their report on me. The sentence about the Singapore golf club at the top comes to mind.

On the referendum #24J: Collins, grandstanding, empty threats & the plan for a rematch against the public

The DCMS Select Committee has just sent me the following letter.


Here is my official reply…

Dear Damian et al

As you know I agreed to give evidence.

In April, I told you I could not do the date you suggested. On 12 April I suggested July.

You ignored this for weeks.

On 3 May you asked again if I could do a date I’d already said I could not do.

I replied that, as I’d told you weeks earlier, I could not.

You then threatened me with a Summons.

On 10 May, Collins wrote:

Dear Dominic

We have offered you different dates, and as I said previously we are not prepared to wait until July for you to give evidence to the committee. We have also discussed this with the Electoral Commission who have no objection to you giving evidence to us.

We are asking you to give evidence to the committee following evidence we have received that relates to the work of Vote Leave. We have extended a similar invitation to Arron Banks and Andy Wigmore, to respond to evidence we have received about Leave.EU, and they have both agreed to attend.

The committee will be sending you a summons to appear and I hope that you are able to respond positively to this.

best wishes

I replied:

The EC has NOT told me this.

Sending a summons is the behaviour of people looking for PR, not people looking to get to the bottom of this affair.

A summons will have ZERO positive impact on my decision and is likely only to mean I withdraw my offer of friendly cooperation, given you will have shown greater interest in grandstanding than truth-seeking, which is one of the curses of the committee system.

I hope you reconsider and put truth-seeking first.

Best wishes

d

You replied starting this charade.

 

You talk of ‘contempt of Parliament’.

You seem unaware that most of the country feels contempt for Parliament and this contempt is growing.

  • You have failed miserably over Brexit. You have not even bothered to educate yourselves on the basics of ‘what the Single Market is’, as Ivan Rogers explained in detail yesterday.
  • We want £350 million a week for the NHS plus long-term consistent funding and learning from the best systems in the world and instead you funnel our money to appalling companies like the parasites that dominate defence procurement.
  • We want action on unskilled immigration and you give us bullshit promises of ‘tens of thousands’ that you don’t even believe yourselves plus, literally, free movement for murderers, then you wonder why we don’t trust you.
  • We want a country MORE friendly to scientists and people from around the world with skills to offer and you give us ignorant persecution that is making our country a bad joke.
  • We want you to take money away from corporate looters (who fund your party) and fund science research so we can ‘create the future’, and you give us Carillion and joke aircraft carriers.
  • We want to open government to the best people and ideas in the world and you keep it a closed dysfunctional shambles that steals our money and keeps power locked within two useless parties and a closed bureaucracy that excludes ~100% of the most talented people. We want real expertise and you don’t even think about what that means.
  • You spend your time on this sort of grandstanding instead of serving millions of people less fortunate than you and who rely on you.

If you had wanted my evidence you would have cooperated over dates.

You actually wanted to issue threats, watch me give in, then get higher audiences for your grandstanding.

I’m calling your bluff. Your threats are as empty as those from May/Hammond/DD to the EU. Say what you like, I will not come to your committee regardless of how many letters you send or whether you send characters in fancy dress to hand me papers.

If another Committee behaves reasonably and I can give evidence without compromising various legal actions then I will consider it. Once these legal actions have finished, presumably this year, it will be easy to arrange if someone else wants to do it.

Further, I’m told many of your committee support the Adonis/Mandelson/Campbell/Grieve/Goldman Sachs/FT/CBI campaign for a rematch against the country.

Do you know what Vote Leave 2 would feel like for the MPs who vote for that (and donors who fund it)?

It would feel like having Lawrence Taylor chasing you and smashing you into the ground over and over and over again.

Vote Leave 2 would not involve me — nobody will make that mistake again — but I know what it would feel like for every MP who votes for a rematch against the public.

Lawrence Taylor: relentless 

So far you guys have botched things on an epic scale but it’s hard to break into the Westminster system — you rig the rules to stop competition. Vote Leave 1 needed Cameron’s help to hack the system. If you guys want to run with Adonis and create another wave, be careful what you wish for. ‘Unda fert nec regitur’ and VL2 would ride that wave right at the gates of Westminster.

A second referendum would be bad for the country and I hope it doesn’t happen but if you force the issue, then Vote Leave 2 would try to create out of the smoking wreck in SW1 something that can deliver what the public wants. Imagine Amazon-style obsession on customer satisfaction (not competitor and media obsession which is what you guys know) with Silicon Valley technology/scaling and Mueller-style ‘systems politics’ combined with the wave upon wave of emotion you will have created. Here’s some free political advice: when someone’s inside your OODA loop, it feels to them like you are working for them. If you go for a rematch, then this is what you will be doing for people like me. 350m would just be the starter.

‘Mixed emotions, Buddy, like Larry Wildman going off a cliff — in my new Maserati.’

I will happily discuss this with your colleagues on a different committee if they are interested, after the legal issues are finished…

 

Best wishes

Dominic

Ps. If you’re running an inquiry on fake news, it would be better to stop spreading fake news yourselves and to correct your errors when made aware of them. If you’re running an inquiry on issues entangled with technologies, it would be better to provide yourself with technological expertise so you avoid spreading false memes. E.g your recent letter to Facebook asked them to explain to you the operational decision-making of Vote Leave. This is a meaningless question which it is impossible for Facebook to answer and could only be asked by people who do not understand the technology they are investigating.

Effective action #4a: ‘Expertise’ from fighting and physics to economics, politics and government

‘We learn most when we have the most to lose.’ Michael Nielsen, author of the brilliant book Reinventing Discovery.

‘There isn’t one novel thought in all of how Berkshire [Hathaway] is run. It’s all about … exploiting unrecognized simplicities… Warren [Buffett] and I aren’t prodigies. We can’t play chess blindfolded or be concert pianists. But the results are prodigious, because we have a temperamental advantage that more than compensates for a lack of IQ points.’ Charlie Munger, Warren Buffett’s partner.

I’m going to do a series of blogs on the differences between fields dominated by real expertise (like fighting and physics) and fields dominated by bogus expertise (like macroeconomic forecasting, politics/punditry, active fund management).

Fundamental to real expertise is 1) whether the informational structure of the environment is sufficiently regular that it’s possible to make good predictions and 2) whether it allows high quality feedback and therefore error-correction. Physics and fighting: Yes. Predicting recessions, forex trading and politics: not so much. I’ll look at studies comparing expert performance in different fields and the superior performance of relatively very simple models over human experts in many fields.

This is useful background to consider a question I spend a lot of time thinking about: how to integrate a) ancient insights and modern case studies about high performance with b) new technology and tools in order to improve the quality of individual, team, and institutional decision-making in politics and government.

I think that fixing the deepest problems of politics and government requires a more general and abstract approach to principles of effective action than is usually considered in political discussion and such an approach could see solutions to specific problems almost magically appear, just as you see happen in a very small number of organisations — e.g Mueller’s Apollo program (man on the moon), PARC (interactive computing), Berkshire Hathaway (most successful investors in history), all of which have delivered what seems almost magical performance because they embody a few simple, powerful, but largely unrecognised principles. There is no ‘solution’ to the fundamental human problem of decision-making amid extreme complexity and uncertainty but we know a) there are ways to do things much better and b) governments mostly ignore them, so there is extremely valuable low-hanging fruit if, but it’s a big if, we can partially overcome the huge meta-problem that governments tend to resist the institutional changes needed to become a learning system.

This blog presents some basic background ideas and examples…

*

Extreme sports: fast feedback = real expertise 

In the 1980s and early 1990s, there was an interesting case study in how useful new knowledge jumped from a tiny isolated group to the general population with big effects on performance in a community. Expertise in Brazilian jiu-jitsu was taken from Brazil to southern California by the Gracie family. There were many sceptics but they vanished rapidly because the Gracies were empiricists. They issued ‘the Gracie challenge’.

All sorts of tough guys, trained in all sorts of ways, were invited to come to their garage/academy in Los Angeles to fight one of the Gracies or their trainees. Very quickly it became obvious that the Gracie training system was revolutionary and they were real experts because they always won. There was very fast and clear feedback on predictions. Gracie jiu-jitsu quickly jumped from an LA garage to TV. At the televised UFC 1 event in 1993 Royce Gracie defeated everyone and a multi-billion dollar business was born.

People could see how training in this new skill could transform performance. Unarmed combat changed across the world. Disciplines other than jiu-jitsu have had to make a choice: either isolate themselves and not compete with jiu-jitsu, or learn from it. If interested, watch the first twenty minutes of this documentary (via professor Steve Hsu, physicist, amateur jiu-jitsu practitioner, and predictive genomics expert).

Video: Jiu Jitsu comes to Southern California

Royce Gracie, UFC 1 1993 


 

Flow, deep in the zone

Another field where there is clear expertise is extreme skiing and snowboarding. One of the leading pioneers, Jeremy Jones, describes how he rides ‘spines’ hurtling down the side of mountains:

‘The snow is so deep you need to use your arms and chest to swim, and your legs to ride. They also collapse underfoot, so you’re riding mini-avalanches and dodging slough slides. Spines have blind rollovers, so you can’t see below. Or to the side. Every time the midline is crossed, it’s a leap into the abyss. Plus, there’s no way to stop and every move is amplified by complicated forces. A tiny hop can easily become a twenty-foot ollie. It’s the absolute edge of chaos. But the easiest way to live in the moment is to put yourself in a situation where there’s no other choice. Spines demand that, they hurl you deep into the zone.’ Emphasis added.

Video: Snowboarder Jeremy Jones

What Jones calls ‘the zone’ is also known as ‘flow’ — a particular mental state, triggered by environmental cues, that brings greatly enhanced performance. It is the object of study in extreme sports and by the military and intelligence services: for example DARPA is researching whether stimulating the brain can trigger ‘flow’ in snipers.

Flow — or control on ‘the edge of chaos’ where ‘every move is amplified by complicated forces’ — comes from training in which people learn from very rapid feedback between predictions and reality. In ‘flow’, brains very rapidly and accurately process environmental signals and generate hypothetical scenarios/predictions and possible solutions based on experience and training. Jones’s performance is inseparable from developing this fingertip feeling. Similarly, an expert fireman feels the glow of heat on his face in a slightly odd way and runs out of the building just before it collapses without consciously knowing why he did it: his intuition has been trained to learn from feedback and make predictions. Experts operating in ‘flow’ do not follow what is sometimes called the ‘rational model’ of decision-making in which they sequentially interrogate different options — they pattern-match solutions extremely quickly based on experience and intuition.

The video below shows extreme expertise in a state of ‘flow’, with feedback on predictions within milliseconds. This legendary ride is so famous not because of the size of the wave but because of its odd, and dangerous, nature. If you watch carefully you will see what a true expert in ‘flow’ can do: after committing to the wave, Hamilton suddenly realises that unless he reaches back with the opposite hand to normal and drags it against the wall of water behind him, he will get sucked up the wave and might die. (This wave had killed someone a few weeks earlier.) Years of practice and feedback had honed an intuition that, faced with a very dangerous and fast-moving problem, pattern-matched an innovative solution almost instantly (a few seconds at most).

Video: surfer Laird Hamilton in one of the greatest ever rides

 

The faster the feedback cycle, the more likely you are to develop a qualitative improvement in speed that destroys an opponent’s decision-making cycle. If you can reorient yourself faster to the ever-changing environment than your opponent, then you operate inside their ‘OODA loop’ (Observe-Orient-Decide-Act) and the opponent’s performance can quickly degrade and collapse.

This lesson is vital in politics. You can read it in Sun Tzu and see it with Alexander the Great. Everybody can read such lessons and most people will nod along, but they are very hard to apply: most political/government organisations are programmed by their incentives to prioritise seniority, process and prestige over high performance, which slows and degrades decisions. Further, political organisations tend to make too slowly those decisions that should be fast and too quickly those decisions that should be slow — they are simultaneously too sluggish and too impetuous, which closes off favourable branching histories of the future.

Video: Boxer Floyd Mayweather, best fighter of his generation and one of the quickest and best defensive fighters ever

Probably the most extreme example is ‘free soloing’ — climbing mountains without ropes, where one mistake means instant death. If you want to see an example of genuine expertise and the value of fast feedback, watch Alex Honnold.

Video: Alex Honnold ‘free solos’ El Sendero Luminoso (terrifying)

Music is similar to sport. There is very fast feedback, learning, and a clear hierarchy of expertise.

Video: Glenn Gould playing the Goldberg Variations (slow version)

Our culture treats expertise/high performance in fields like sport and music very differently from maths/science education and politics/government. As Alan Kay observes, expertise in music and sport is embedded in the broader culture. Millions of children spend large amounts of time practising hard skills. Attacks on these fields as ‘elitist’ don’t get the same damaging purchase as elsewhere, and the public doesn’t mind elite selection for sports teams or orchestras.

‘Two ideas about this are that a) these [sport/music] are activities in which the basic act can be seen clearly from the first, and b) are already part of the larger culture. There are levels that can be seen to be inclusive starting with modest skills. I think a very large problem for the learning of both science and math is just how invisible are their processes, especially in schools.’ Kay 

When it comes to maths and science education, the powers-that-be (in America and Britain) try very hard, and mostly successfully, to ignore the question: where are the critical thresholds for valuable skills that develop true expertise? This is even more of a problem with the concept of ‘thinking rationally’, for which some basic logic, probability, and understanding of scientific reasoning is a foundation. Discussion of politics and government almost totally ignores the concept of training people to update their opinions in response to new evidence — i.e. to adapt to feedback. The ‘rationalist community’ — people like Scott Alexander, who wrote this fantastic essay (Moloch) about why so much goes wrong, or Eliezer Yudkowsky in his recent essays — are ignored at the apex of power. I will return to the subject of how to create new education and training programmes for elite decision-makers. It is a good time for UK universities to innovate in this field, as places like Stanford are already doing. Instead of training people like Cameron and Adonis to bluff with PPE, we need courses that combine rational thinking with practical training in managing complex projects. We need people who practise hard at making predictions in ways we know work well (cf. Tetlock) and then update in response to errors.
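As a concrete illustration of what such training could look like (a minimal sketch of my own, with invented forecasts, not anything from Tetlock’s books): forecasting tournaments score probabilistic predictions with the Brier score, a rule you cannot game, so the only way to improve your score is to make better-calibrated predictions and update when you are wrong.

```python
# Minimal sketch: scoring probabilistic forecasts with the Brier score,
# the rule used in Tetlock-style forecasting tournaments.
# All forecasts and outcomes below are invented for illustration.

def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes.
    Lower is better; always guessing 0.5 scores exactly 0.25."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

# A forecaster's probabilities for ten events, and what actually happened.
forecasts = [0.9, 0.7, 0.2, 0.6, 0.8, 0.1, 0.5, 0.3, 0.95, 0.4]
outcomes  = [1,   1,   0,   0,   1,   0,   1,   0,   1,    0]

print(f"forecaster:         {brier_score(forecasts, outcomes):.3f}")
print(f"coin-flip baseline: {brier_score([0.5] * 10, outcomes):.3f}")
```

The point is the feedback loop: a forecaster who writes down probabilities and scores them against reality gets the same fast, unforgiving feedback as a fighter or a surfer.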

*

A more general/abstract approach to reforming government

If we want to get much higher performance in government, then we need to think rigorously about: the selection of people and teams, their education and training, their tools, and the institutions (incentives and so on) that surround and shape them.

Almost all analysis of politics and government considers relatively surface phenomena. For example, the media briefly blasts headlines about Carillion’s collapse or our comical aircraft carriers but there is almost no consideration of the deep reasons for such failures and therefore nothing tends to happen — the media caravan moves on and the officials and ministers keep failing in the same ways. This is why, for example, the predicted abject failure of the traditional Westminster machinery to cope with Brexit negotiations has not led to self-examination and learning but, instead, mostly to a visible determination across both sides of the Brexit divide in SW1 to double down on long-held delusions.

Progress requires attacking the ‘system of systems’ problem at the right ‘level’. Attacking the problems directly — let’s improve policy X and Y, let’s swap ‘incompetent’ A for ‘competent’ B — cannot touch the core problems, particularly the hardest meta-problem: that government systems bitterly fight improvement. Solving the explicit surface problems of politics and government is best approached by a more general focus on applying abstract principles of effective action: we need to surround relatively specific problems with a more general approach. Attacking at the right level will see specific solutions automatically ‘pop out’ of the system. One of the most powerful simplicities in all conflict (almost always unrecognised) is: ‘winning without fighting is the highest form of war’. If we approach the problem of government performance at the right level of generality then we have a chance to solve specific problems ‘without fighting’ — or, rather, with far less fighting, and the fighting that remains will be more fruitful.

This is not a theoretical argument. If you look carefully at ancient texts and modern case studies, you see that applying a small number of very simple, powerful, but largely unrecognised principles (that are very hard for organisations to operationalise) can produce extremely surprising results.

We have no alternative to trying. Without fundamental changes to government, we will lose our hourly game of Russian roulette with technological progress.

‘The combination of physics and politics could render the surface of the earth uninhabitable… [T]he ever accelerating progress of technology and changes in the mode of human life … gives the appearance of approaching some essential singularity in the history of the race beyond which human affairs, as we know them, could not continue.’ John von Neumann

As Steve Hsu says: Pessimism of the Intellect, Optimism of the Will.


PS. There is an interesting connection between the nature of counterfactual reasoning in the fast-moving world of extreme sports and the theoretical paper I posted yesterday on state-of-the-art AI. The human ability to interrogate stored representations of the environment with counterfactual questions is fundamental to the nature of intelligence and to developing expertise in physical and mental skills. It is, for now, absent in machines.

State-of-the-art in AI #1: causality, hypotheticals, and robots with free will & capacity for evil (UPDATED)

Judea Pearl is one of the most important scholars in the field of causal reasoning. His book Causality is the leading textbook in the field.

This blog has two short parts — a paper he wrote a few months ago and an interview he gave a few days ago.

*

He recently wrote a very interesting (to the very limited extent I understand it) short paper about the limits of state-of-the-art AI systems using ‘deep learning’ neural networks — such as the AlphaGo system, which recently conquered the game of Go, and AlphaZero, which blew past centuries of human knowledge of chess in 24 hours — and how these systems could be improved.

The human ability to interrogate stored representations of their environment with counterfactual questions is fundamental and, for now, absent in machines. (All bold added by me.)

‘If we examine the information that drives machine learning today, we find that it is almost entirely statistical. In other words, learning machines improve their performance by optimizing parameters over a stream of sensory inputs received from the environment. It is a slow process, analogous in many respects to the evolutionary survival-of-the-fittest process that explains how species like eagles and snakes have developed superb vision systems over millions of years. It cannot explain however the super-evolutionary process that enabled humans to build eyeglasses and telescopes over barely one thousand years. What humans possessed that other species lacked was a mental representation, a blue-print of their environment which they could manipulate at will to imagine alternative hypothetical environments for planning and learning…

‘[T]he decisive ingredient that gave our homo sapiens ancestors the ability to achieve global dominion, about 40,000 years ago, was their ability to sketch and store a representation of their environment, interrogate that representation, distort it by mental acts of imagination and finally answer “What if?” kind of questions. Examples are interventional questions: “What if I act?” and retrospective or explanatory questions: “What if I had acted differently?” No learning machine in operation today can answer such questions about actions not taken before. Moreover, most learning machines today do not utilize a representation from which such questions can be answered.

‘We postulate that the major impediment to achieving accelerated learning speeds as well as human level performance can be overcome by removing these barriers and equipping learning machines with causal reasoning tools. This postulate would have been speculative twenty years ago, prior to the mathematization of counterfactuals. Not so today. Advances in graphical and structural models have made counterfactuals computationally manageable and thus rendered meta-statistical learning worthy of serious exploration

Figure: the ladder of causation


‘An extremely useful insight unveiled by the logic of causal reasoning is the existence of a sharp classification of causal information, in terms of the kind of questions that each class is capable of answering. The classification forms a 3-level hierarchy in the sense that questions at level i (i = 1, 2, 3) can only be answered if information from level j (j ≥ i) is available. [See figure]… Counterfactuals are placed at the top of the hierarchy because they subsume interventional and associational questions. If we have a model that can answer counterfactual queries, we can also answer questions about interventions and observations… The translation does not work in the opposite direction… No counterfactual question involving retrospection can be answered from purely interventional information, such as that acquired from controlled experiments; we cannot re-run an experiment on subjects who were treated with a drug and see how they behave had they not been given the drug. The hierarchy is therefore directional, with the top level being the most powerful one. Counterfactuals are the building blocks of scientific thinking as well as legal and moral reasoning…

‘This hierarchy, and the formal restrictions it entails, explains why statistics-based machine learning systems are prevented from reasoning about actions, experiments and explanations. It also suggests what external information need to be provided to, or assumed by, a learning system, and in what format, in order to circumvent those restrictions

[He describes his approach to giving machines the ability to reason in more advanced ways (‘intent-specific optimization’) than standard approaches and the success of some experiments on real problems.]

‘[T]he value of intent-based optimization … contains … the key by which counterfactual information can be extracted out of experiments. The key is to have agents who pause, deliberate, and then act, possibly contrary to their original intent. The ability to record the discrepancy between outcomes resulting from enacting one’s intent and those resulting from acting after a deliberative pause, provides the information that renders counterfactuals estimable. It is this information that enables us to cross the barrier between layer 2 and layer 3 of the causal hierarchy… Every child undergoes experiences where he/she pauses and thinks: Can I do better? If mental records are kept of those experiences, we have experimental semantics to counterfactual thinking in the form of regret sentences “I could have done better.” The practical implications of this new semantics is worth exploring.’

The paper is here: http://web.cs.ucla.edu/~kaoru/theoretical-impediments.pdf.
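To make the hierarchy concrete, here is a toy structural causal model of my own (not from Pearl’s paper; all numbers are invented). With a confounder present, the rung 1 question (‘how does Y look among those observed to have X?’) and the rung 2 question (‘how would Y look if we forced X?’) give different answers from the same model, and rung 3 needs strictly more machinery than either:

```python
# Toy structural causal model (SCM), my own sketch of Pearl's hierarchy:
#   Z = U_z          (confounder, e.g. 'fitness')
#   X = f(Z, U_x)    (treatment, e.g. 'exercise')
#   Y = f(X, Z, U_y) (outcome, e.g. 'health')
# All probabilities are invented for illustration.
import random

random.seed(0)

def sample(do_x=None):
    """Draw one world. If do_x is set, X is forced (rung 2: intervention),
    which cuts the arrow from the confounder Z into X."""
    z = random.random() < 0.5
    x = do_x if do_x is not None else random.random() < (0.8 if z else 0.2)
    y = random.random() < 0.3 + 0.2 * x + 0.4 * z
    return z, x, y

N = 100_000

# Rung 1 (association): P(Y=1 | X=1), estimated from passive observation.
obs = [sample() for _ in range(N)]
p_y_given_x1 = sum(y for _, x, y in obs if x) / sum(x for _, x, _ in obs)

# Rung 2 (intervention): P(Y=1 | do(X=1)), estimated by forcing X.
p_y_do_x1 = sum(sample(do_x=True)[2] for _ in range(N)) / N

print(f"P(Y=1 | X=1)     = {p_y_given_x1:.2f}  # ~0.82, inflated by Z")
print(f"P(Y=1 | do(X=1)) = {p_y_do_x1:.2f}  # ~0.70, the causal effect")

# Rung 3 (counterfactual): 'given what actually happened to this individual,
# what would Y have been had X been different?' Answering that needs the
# individual's noise terms, i.e. the structural functions themselves, not
# just interventional data -- which is why counterfactuals sit at the top.
```

If you have the structural functions you can answer questions on all three rungs; if you only have data from one rung you cannot, in general, climb upwards. That is Pearl’s point about today’s machine learning being stuck on rung 1.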

*

By chance this evening I came across this interview with Pearl in which he discusses some of the ideas above less formally, HERE.

‘The problems that emerged in the early 1980s were of a predictive or diagnostic nature. A doctor looks at a bunch of symptoms from a patient and wants to come up with the probability that the patient has malaria or some other disease. We wanted automatic systems, expert systems, to be able to replace the professional — whether a doctor, or an explorer for minerals, or some other kind of paid expert. So at that point I came up with the idea of doing it probabilistically.

‘Unfortunately, standard probability calculations required exponential space and exponential time. I came up with a scheme called Bayesian networks that required polynomial time and was also quite transparent.

‘[A]s soon as we developed tools that enabled machines to reason with uncertainty, I left the arena to pursue a more challenging task: reasoning with cause and effect.

‘All the machine-learning work that we see today is conducted in diagnostic mode — say, labeling objects as “cat” or “tiger.” They don’t care about intervention; they just want to recognize an object and to predict how it’s going to evolve in time.

‘I felt an apostate when I developed powerful tools for prediction and diagnosis knowing already that this is merely the tip of human intelligence. If we want machines to reason about interventions (“What if we ban cigarettes?”) and introspection (“What if I had finished high school?”), we must invoke causal models. Associations are not enough — and this is a mathematical fact, not opinion.

‘As much as I look into what’s being done with deep learning, I see they’re all stuck there on the level of associations. Curve fitting. That sounds like sacrilege, to say that all the impressive achievements of deep learning amount to just fitting a curve to data. From the point of view of the mathematical hierarchy, no matter how skillfully you manipulate the data and what you read into the data when you manipulate it, it’s still a curve-fitting exercise, albeit complex and nontrivial.

‘I’m very impressed, because we did not expect that so many problems could be solved by pure curve fitting. It turns out they can. But I’m asking about the future — what next? Can you have a robot scientist that would plan an experiment and find new answers to pending scientific questions? That’s the next step. We also want to conduct some communication with a machine that is meaningful, and meaningful means matching our intuition.

‘If a machine does not have a model of reality, you cannot expect the machine to behave intelligently in that reality. The first step, one that will take place in maybe 10 years, is that conceptual models of reality will be programmed by humans. The next step will be that machines will postulate such models on their own and will verify and refine them based on empirical evidence. That is what happened to science; we started with a geocentric model, with circles and epicycles, and ended up with a heliocentric model with its ellipses.

‘We’re going to have robots with free will, absolutely. We have to understand how to program them and what we gain out of it. For some reason, evolution has found this sensation of free will to be computationally desirable… Evidently, it serves some computational function.

‘I think the first evidence will be if robots start communicating with each other counterfactually, like “You should have done better.” If a team of robots playing soccer starts to communicate in this language, then we’ll know that they have a sensation of free will. “You should have passed me the ball — I was waiting for you and you didn’t!” “You should have” means you could have controlled whatever urges made you do what you did, and you didn’t.

[When will robots be evil?] When it appears that the robot follows the advice of some software components and not others, when the robot ignores the advice of other components that are maintaining norms of behavior that have been programmed into them or are expected to be there on the basis of past learning. And the robot stops following them.’
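An aside on the Bayesian networks Pearl mentions above: the trick that made inference tractable is factorising the joint distribution along a graph, so a diagnostic query (‘given the symptom, how probable is the cause?’) only touches small local probability tables. A minimal sketch of my own, using the textbook rain/sprinkler network rather than Pearl’s medical example, with invented probabilities:

```python
# Minimal Bayesian network sketch (invented numbers):
#   Rain -> Sprinkler, Rain -> WetGrass, Sprinkler -> WetGrass.

P_RAIN = 0.2
P_SPRINKLER = {True: 0.01, False: 0.4}              # P(sprinkler on | rain?)
P_WET = {(True, True): 0.99, (True, False): 0.9,    # keys: (sprinkler, rain)
         (False, True): 0.9, (False, False): 0.0}

def joint(rain, sprinkler, wet):
    """Factorised joint P(R) * P(S|R) * P(W|S,R): the graph structure means
    we never have to write down the full joint table over all variables."""
    p = P_RAIN if rain else 1 - P_RAIN
    ps = P_SPRINKLER[rain]
    p *= ps if sprinkler else 1 - ps
    pw = P_WET[(sprinkler, rain)]
    p *= pw if wet else 1 - pw
    return p

# Diagnostic query, Pearl's 'doctor' mode: P(rain | grass is wet),
# computed by summing the factorised joint over the unobserved variable.
num = sum(joint(True, s, True) for s in (True, False))
den = sum(joint(r, s, True) for r in (True, False) for s in (True, False))
print(f"P(rain | wet grass) = {num / den:.2f}")     # ~0.38 with these numbers
```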

Please leave links to significant critiques of this paper or work that has developed the ideas in it.

If interested in the pre-history of the computer age and internet, this paper explores it.

On the referendum 24I: new research on Facebook & ‘psychographic’ microtargeting

Summary: a short blog on a new paper casting doubt on claims re microtargeting using Facebook.

The audience for conspiracy theories about microtargeting, Facebook and Brexit is large and includes a big subset of SW1 and a wider group (but much smaller than it thinks it is) that wants a rematch against the public. The audience for facts, evidence and research about microtargeting, Facebook and Brexit is tiny. If you are part of this tiny audience…

I wrote a few days ago about good evidence on microtargeting in general and Cambridge Analytica’s claims on ‘psychographics’ in particular (see HERE).

Nutshell: the evidence and science re ‘microtargeting’ do not match the story you read in the media or the conspiracy theories about the referendum, and Vote Leave did not do microtargeting in any normal sense of the term.

Another interesting paper on this subject was published a few days ago.

Background…

One of the most influential researchers cited by the media since Brexit/Trump is Michal Kosinski who wrote a widely cited 2015 paper on predicting Big 5 personality traits from Facebook ‘likes’: Computer-based personality judgments are more accurate than those made by humans.

Duncan Watts, one of the leading scholars in computational sociology, pointed out:

‘All it shows is that algorithmic predictions of Big 5 traits are about as accurate as human predictions, which is to say only about 50 percent accurate. If all you had to do to change someone’s opinion was guess their openness or political attitude, then even really noisy predictions might be worrying at scale. But predicting attributes is much easier than persuading people.’

Kosinski published another paper recently: Psychological targeting as an effective approach to digital mass persuasion (November 2017). The core claim was:

‘In three field experiments that reached over 3.5 million individuals with psychologically tailored advertising, we find that matching the content of persuasive appeals to individuals’ psychological characteristics significantly altered their behavior as measured by clicks and purchases. Persuasive appeals that were matched to people’s extraversion or openness-to-experience level resulted in up to 40% more clicks and up to 50% more purchases than their mismatching or unpersonalized counterparts. Our findings suggest that the application of psychological targeting makes it possible to influence the behavior of large groups of people by tailoring persuasive appeals to the psychological needs of the target audiences.’

If this claim were true it would be a big deal in the advertising world. Further, Kosinski claimed that ‘The assumption is that the same effects can be observed in political messages.’ That would be an even bigger deal.

I was sceptical when I read the 2017 paper, mainly given the large amount of evidence in books like Hacking the Electorate that I touched on in the previous blog, but I didn’t have the time or expertise to investigate. I did read this Wired piece on that paper in which Watts commented:

‘Watts says that the 2017 paper didn’t convince him the technique could work, either. The results barely improve click-through rates, he says — a far cry from predicting political behavior. And more than that, Kosinski’s mistargeted openness ads — that is, the ads tailored for the opposite personality characteristic — far outperformed the targeted extraversion ads. Watts says that suggests other, uncontrolled factors are having unknown effects. “So again,” he says, “I would question how meaningful these effects are in practice.”’

Another leading researcher, David Lazer, commented:

‘On the psychographic stuff, I haven’t seen any science that really aligns with their [CA/Kosinski] claims.’

Another leading researcher, Alex Pentland at MIT (who also won a DARPA project to solve a geolocation intelligence problem), was also sceptical:

‘Everybody talks about Google and Facebook, but the things that people say online are not nearly as predictive as, say, what your telephone company knows about you. Or your credit card company. Fortunately telephone companies, banks, things like that are very highly regulated companies. So we have a fair amount of time. It may never happen that the data gets loose.’

I’ve just been sent this paper (preprint link): Field studies of psychologically targeted ads face threats to internal validity (2018). It is an analysis of Kosinski’s 2017 experiments. It argues that the Kosinski experiment is NOT RANDOMISED and points out statistical and other flaws that undermine Kosinski’s claims:

‘The paper [Kosinski 2017] uses Facebook’s standard ad platform to compare how different versions of ads perform. However, this process does not create a randomized experiment: users are not randomly assigned to different ads, and individuals may even receive multiple ad types (e.g., both extroverted and introverted ads). Furthermore, ad platforms like Facebook optimize campaign performance by showing ads to users whom the platform expects are more likely to fulfill the campaign’s objective… This optimization generates differences in the set of users exposed to each ad type, so that differences in responses across ads do not by themselves indicate a causal effect.’ (Emphasis added.)
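The problem is easy to see in a toy simulation (my own, with invented numbers). Suppose the two ads are, by construction, equally persuasive, but the platform routes each ad to the users it predicts are most likely to click. The naive click-through comparison then shows a large ‘effect’ that is pure selection:

```python
# Toy simulation (invented numbers) of the internal-validity problem above:
# the ad platform, not the experimenter, decides who sees which ad.
import random

random.seed(1)
N = 200_000

def run(platform_optimises):
    stats = {"A": [0, 0], "B": [0, 0]}           # ad -> [impressions, clicks]
    for _ in range(N):
        trait = random.random()                  # e.g. an extraversion score
        if platform_optimises:
            # The platform predicts high-trait users click more on ad A,
            # so assignment now depends on the user, not on a coin flip.
            ad = "A" if trait > 0.5 else "B"
        else:
            ad = random.choice("AB")             # true randomisation
        # Ground truth: click probability depends only on the user's trait;
        # BOTH ads are identical in effect.
        clicked = random.random() < 0.05 + 0.10 * trait
        stats[ad][0] += 1
        stats[ad][1] += clicked
    return {ad: round(c / n, 3) for ad, (n, c) in stats.items()}

print("randomised:", run(False))   # A and B correctly look the same (~0.10)
print("optimised: ", run(True))    # A looks far 'better', zero true effect
```

Under randomisation the two click-through rates agree, as they should; under platform optimisation ad A appears to beat ad B by a wide margin even though neither ad does anything, which is exactly the critique quoted above.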

Kosinski et al reply here. They admit that the optimisation of Facebook’s ad algorithms could affect their results, though they defend their work. (Campaigns face similar operational problems in figuring out how to run experiments on Facebook without FB’s algorithms distorting them.)

I am not remotely competent to judge the conflicting claims and haven’t yet asked anybody who is, though I have a (mostly worthless) hunch that the criticisms will stack up. I’ll add an update when this is resolved.

Big claims require good evidence and good science — not what Feynman called ‘cargo cult science’, which accounts for a lot of social science research. Most claims you read about psychological manipulation are rubbish. There are interesting possibilities for applying advanced technology, as I wrote in my last blog, but a) almost everything you read about is not in this class, and b) I am sceptical in general that the ideas in published work on using Big 5 personality traits could add more than a very small boost to political campaigns, and the approach can easily blow up in your face, as Hersh’s evidence to the Senate shows. I strongly suspect that the ‘gains’ are usually less than the fees of the consultants flogging the snake oil — i.e. a net loss for campaigns.

If you believe, like the Observer, that the US/UK military and/or intelligence services have access to technological methods of psychological manipulation that greatly exceed what is done commercially, you misunderstand their real capabilities. For example, look at how the commander of US classified special forces (JSOC), Stanley McChrystal, recruited civilians for his propaganda operations in Afghanistan because the military did not know what to do. The evidence since 9/11 is of general UK/US failure in propaganda / ‘information war’ / ‘hybrid war’ etc. Further, if you want expertise on things like Facebook and Google, the place to look is Silicon Valley, not the Pentagon. Look at how recent UK Prime Ministers have behaved: how Cameron tweeted about rushing back from Chequers in the middle of the night to deal with ISIS beheadings; how Blair, Brown and Cameron foolishly read out the names of people killed in the Commons. Of course it is impossible from the outside to know how much of this is because Downing Street mangles advice and operations and how much is failure elsewhere. I assume there are lots of good people in the system but, as elsewhere in modern Whitehall, expertise is suppressed by centralised hierarchies (as with Brexit).

On campaigns and in government, figuring out the answers to a few deep questions is much more important than practically anything you read about technology issues like microtargeting. But focus and priorities are very hard for big organisations including parties and governments, because they are mostly dominated by seniority, groupthink, signalling, distorted incentives and so on. A lack of focus means they spread intelligent effort too widely and don’t think enough about deep questions that overwhelmingly determine their fate.

Of course, it is possible to use technology to enhance campaigns, and it is possible to devise messages that have game-changing effects, but the media focus on microtargeting is almost completely misguided and the Select Committee’s inquiry into fake news has mostly spread fake news. There has been zero scrutiny, as far as I have seen, of the evidence from reputable scholars like Duncan Watts or Eitan Hersh about microtargeting and fake news in relation to Trump/Brexit. Sadly the Committee is more interested in grandstanding than truth-seeking, which is why it turned down my offer to arrange a time to give evidence and instead tried to grab headlines. I offered friendly cooperation, as the government should have done with Brexit, but the Committee went for empty threats, as per May and Hammond, and this approach will be as successful as the government’s negotiating strategy.