Computational Modelling of Public Policy: Reflections on Practice

Computational models are increasingly being used to assist in developing, implementing and evaluating public policy. This paper reports on the experience of the authors in designing and using computational models of public policy (‘policy models’, for short). The paper considers the role of computational models in policy making, and some of the challenges that need to be overcome if policy models are to make an effective contribution. It suggests that policy models can have an important place in the policy process because they could allow policy makers to experiment in a virtual world, and have many advantages compared with randomised control trials and policy pilots. The paper then summarises some general lessons that can be extracted from the authors’ experience with policy modelling. These general lessons include the observation that often the main benefit of designing and using a model is that it provides an understanding of the policy domain, rather than the numbers it generates; that care needs to be taken that models are designed at an appropriate level of abstraction; that although appropriate data for calibration and validation may sometimes be in short supply, modelling is often still valuable; that modelling collaboratively and involving a range of stakeholders from the outset increases the likelihood that the model will be used and will be fit for purpose; that attention needs to be paid to effective communication between modellers and stakeholders; and that modelling for public policy involves ethical issues that need careful consideration. The paper concludes that policy modelling will continue to grow in importance as a component of public policy making processes, but if its potential is to be fully realised, there will need to be a melding of the cultures of computational modelling and policy making.


Introduction
Computational models have been used to assist in developing, implementing and evaluating public policies for at least three decades, but their potential remains to be fully exploited (Johnston & Desouza ; Anzola et al. ; Barbrook-Johnson et al. ). In this paper, using a selection of examples of computational models used in public policy processes, we (i) consider the roles of models in policy making, (ii) explore policy making as a type of experimentation in relation to model experiments, and (iii) suggest some key lessons for the effective use of models. We also highlight some of the challenges and opportunities facing such models and their use in the future. Our aim is to support the modelling community that reads this journal in its effort to build computational models of public policy that are valuable and useful.
We believe this effort is timely given that computational models, of the type this journal regularly reports on, are now increasingly used by government, business, and civil society as well as in academic communities (Hauke et al. ). There are many guides to computational modelling produced for different communities, for example in UK government the 'Aqua Book' (reviewed for JASSS in Edmonds ( )), but these are often aimed at practitioner and government audiences, can be highly procedural and technical, generally omit discussion of failure and rarely include deeper reflections on how best to model for public policy. Our aim here is to fill these gaps.
In the remainder of this paper, Section introduces the role of policy models in policy making. Section explores the idea of policy making as a type of experimentation in relation to policy model experiments. We then discuss some examples and experiences of policy modelling (Section ) and draw out some key lessons to help make policy modelling more effective (Section ). Finally, Section concludes and discusses some key next steps and other opportunities for computational policy modellers.

The Role of Models in Policy Making
The standard, but now somewhat discredited view of policy making is that it occurs in cycles (for example, see the seminal arguments made in Lindblom and Lindblom ; and more recently official recognition in HM Treasury ). A policy problem comes to light, perhaps through the occurrence of some crisis, a media campaign, or as a response to a political event. This is the agenda setting stage and is followed by policy formulation, gathering support for the policy, implementing the policy, monitoring and evaluating the success of the policy and finally policy maintenance or termination. The cycle then starts again, as new needs or circumstances generate demands for new policies. Although the idea of a policy cycle has the merit of being a clear and straightforward way of conceptualising the development of policy, it has been criticised as being unrealistic and oversimplifying what happens, which is typically highly complex and contingent on multiple sources of pressure and information (Cairney ; Moran ), and even self-organising (Byrne & Callaghan ; Teisman & Klijn ).
The idea of a cycle does, however, still help to identify the many components that make up the design and implementation of policy. There are at least two areas where models have a clear and important role to play: in policy design and appraisal, and policy evaluation. Policy appraisal (as defined in HM Treasury ), sometimes referred to as ex-ante evaluation, consists of assessing the relative merits of alternative policy prescriptions in meeting the policy objectives. Appraisal findings are a key input into policy design decisions. Policy evaluation either takes a summative approach, examining whether a policy has actually met its objectives (i.e. ex-post), or a more formative approach to see how a policy might be working, for whom and where (HM Treasury ). In the formative role, the key goal is learning to inform future iterations of the policy, and others with similar characteristics.

Modelling to support policy design and appraisal
When used ex-ante, a policy model may be used to explore a policy option, helping to identify and specify in detail a consistent policy design (HM Treasury ), for example by locating where best a policy might intervene, or by identifying possible synergies or conflicts between the mechanisms of multiple policies. Policy models can also be used to appraise alternative policies, to see which of several possibilities can be expected to yield the best or most robust outcome. In this mode, a policy model is in essence used to 'experiment' with alternative policy options and assumptions about the system in which it is intervening, by changing the parameters or the rules in the model and observing what the outcomes are. This is valuable because it saves the time and cost associated with having to run experiments or pilots in the actual policy domain. This concept of the model as an experimental space is discussed in more detail in Section below.
The common assumption is that one builds computational models in order to make predictions. However, prediction, in the sense of predicting the future value of some measure, is in fact often impossible in policy domains. Social and economic phenomena are often complex (in the technical sense, see e.g. Sawyer ). This means that how some process evolves depends on random chance, its previous history ('path dependence') and the effect of positive and negative feedback loops. Just as with the weather, for which exact forecasting is impossible more than a few days ahead, the future course of many social processes may be literally unknowable in detail, no matter how detailed the model may be. Secondly, a model is necessarily an abstraction from reality, and since it is impossible to isolate sections of society from outside influences, there may be unexpected exogenous factors that have not been modelled and that affect the outcome.
For these reasons, making 'point predictions', i.e. forecasts of specific values at a specific time in the future, is rarely possible. More feasible is a prediction that some event will or will not take place, or qualitative statements about the type or direction of change of values. Understanding what sort of unexpected outcomes can emerge and something of the nature of how these arise also helps design policies that can be responsive to unexpected outcomes when they do arise. It can be particularly helpful in changing environments to use the model to explore what might happen under a range of possible, but different, potential futures, without any commitment about which of these may eventually transpire. Even more valuable is a finding that the model shows that certain outcomes could not be achieved given the assumptions of the model. An example of this is the use of a whole system energy model to develop scenarios that meet the decarbonisation goals set by the EU for (see, for example, RAENG ).
Rather different from using models to make predictions or generate scenarios is the use of models to formalise and clarify understanding of the processes at work in some domain. If this is done carefully, the model may be valuable as a training or communication tool, demonstrating the mechanisms at work in a policy domain and how they interact.
Modelling to support policy evaluation
To evaluate a policy ex-post, one needs to compare what happened after the policy has been implemented against what would have happened in the absence of the policy (the 'counterfactual'). To do this, one needs data about the real situation (with the policy implemented) and data about the situation if the policy had not been implemented (the so-called 'business as usual' situation). To obtain the latter, one can use a randomised control trial (RCT) or quasi-experiment (HM Treasury ), but this is often difficult, expensive and sometimes impossible to carry out because the nature of the intervention rules out creating control groups (e.g. a scheme which is accessible to all, or a policy in which local implementation decisions are impossible to control and have a strong effect).

Policy models offer some alternatives. One is to develop a computational model and run simulations with and without implementation of a policy, and then compare formally the two model outcomes with each other and with reality (with the policy implemented), using quantitative analysis. This avoids the problems of having to establish a real-world counterfactual. Once again, the policy model is being used in place of an experiment.
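The with-and-without comparison described above can be illustrated with a minimal sketch. The model below is a hypothetical toy, not one of the policy models discussed in this paper: the function names, the household adoption process and all parameter values are assumptions chosen only to show the mechanics of running paired simulations and differencing the outcomes.

```python
import random
import statistics


def simulate_uptake(policy_on, seed, n_households=500, n_steps=20):
    """Toy model: at each step, every non-adopting household may adopt.

    All rates are illustrative assumptions, not calibrated estimates.
    """
    rng = random.Random(seed)
    base_rate = 0.02                           # assumed adoption chance per step without the policy
    policy_boost = 0.03 if policy_on else 0.0  # assumed extra chance created by the policy
    adopted = 0
    for _ in range(n_steps):
        remaining = n_households - adopted
        adopted += sum(rng.random() < base_rate + policy_boost for _ in range(remaining))
    return adopted / n_households


def compare(n_runs=30):
    """Run paired simulations (same seeds) with and without the policy."""
    with_policy = [simulate_uptake(True, seed) for seed in range(n_runs)]
    without_policy = [simulate_uptake(False, seed) for seed in range(n_runs)]
    return statistics.mean(with_policy), statistics.mean(without_policy)


mean_with, mean_without = compare()
estimated_effect = mean_with - mean_without  # simulated policy effect on final uptake
print(f"with policy: {mean_with:.3f}, counterfactual: {mean_without:.3f}, "
      f"effect: {estimated_effect:.3f}")
```

In a real evaluation, the simulated outcome with the policy switched on would also be compared with observed data before any weight is placed on the simulated counterfactual.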
Another alternative is to use more qualitative System Mapping type approaches (e.g. Fuzzy Cognitive Mapping; see Uprichard & Penn ), to build qualitative models with different structures and assumptions (to represent the situation with and without the intervention), and again interrogate the different outcomes of the model analyses.
Finally, another use in ex-post evaluation is to use models to refine and test the theory of how policies might have affected an outcome of interest, i.e. to support common theory-based approaches to evaluation such as Theory of Change (see Clark & Taplin ), and Logic Mapping (see Hills ).
Interrogation of models and model results can be done quantitatively (i.e. through multiple simulations, sensitivity analysis, and 'what if' tests), but may also be done in qualitative and participatory fashion with stakeholders, with stakeholders involved in the actual analysis (as opposed to just being shown the results). The choice should be driven by the purpose of the modelling process, and the needs of stakeholders. In both ex-ante and ex-post evaluation, policy models can be powerful tools to use as a route for engaging and informing stakeholders, including the public, about policies and their implications (Voinov & Bousquet ). This may be by including stakeholders in the process, decisions, and validation of model design; or it may be later in the process, in using the results of a model to open up discussions with stakeholders, and/or even using the model 'live' to explore connections between assumptions, scenarios, and outcomes (Johnson a).
Difficulties in the use of modelling
While, in principle, policy models have all these roles and potential benefits, experience shows that it can be difficult to achieve them (see Taylor ; Kolkman et al. , and Section for some examples). The policy process has many characteristics that can make it difficult to incorporate modelling successfully, including:
• The need for acceptability and transparency: policy makers may fall back on more traditional and more widely accepted forms of evidence, especially where the risks associated with the decision are high. Models may appear to act as black boxes that only experts can understand or use, with outputs highly reliant on assumptions that are difficult to validate. Analysts and researchers in government often have little autonomy, and although they may see the value of policy models, it can be difficult for them to communicate this to the decision-makers.
• Change and uncertainty: the environment in which the policy will be implemented may be highly uncertain. This can undermine model development when beliefs or decisions shift as a result of the modelling process (although this is equally an important outcome and benefit of the modelling process), or as a result of other factors.
• Short timescales: the timescales associated with policy decision making are almost always relatively fast, and needs can be difficult to predict, meaning it can be difficult for computational modellers to provide timely support.
• Procurement processes: often departments lack the capability and sufficiently flexible processes to procure complex modelling.
• The political and pragmatic realities of decision making: individuals' values and political values can hold huge sway, even in the face of empirical evidence (let alone modelling) that may contradict their view, or point towards policy which is politically impossible.
• Stakeholders: There will be many different stakeholders involved in developing, or affected by, policies. It will not be possible to engage all of these in the policy modelling process, and indeed policy makers may be wary of closely involving them in a participatory modelling process.
These characteristics may also apply more widely to evidence and other forms of research and analysis. It is not our suggestion that these characteristics are inherently negative; they may be important and reasonable parts of the policy making process. The important thing to remember, as a modeller, is that a model can only, and should only, provide more information to the process, not a final decision for the policy process to simply implement.

Policy Experiments and Policy Models
Although the roles and uses of policy models are relatively well-described and understood, our perception is that there are still many areas where more use could be made of modelling and that a lack of familiarity with, and confidence in, policy modelling, is restricting its use. Potential users may question whether policy modelling in their domain is sufficiently scientifically established and mature to be safely applied to guiding real-world policies. The difference between applying policies to the real world and making experimental interventions in a policy model might be too big to generate any learning from the latter to inform the former.

Policy pilots
One response is to argue that actual policy implementations are themselves experimental interventions and are therefore of the same character as interventions in a policy model. Boeschen et al. ( ) propose that we live in "experimental societies" and that implementing policies is nothing but conducting "real-world experiments". Real-world experiments are "a more or less legitimate, methodically guided or carelessly adopted social practice to start something new" (Krohn , p. ; own translation). Their outcomes immediately display "success or failure of a design process" (ibid., p. ).
A real-world experiment implements one solution for the policy design problem. It does not check for other possible solutions or alternative options, but at best monitors and responds to what is emerging in real time.
Implementing policies as a real-world experiment is therefore far from ideal and far removed from the idea of reversibility in the laboratory. In laboratory experiments the experimental system is isolated from its environment in such a way that the effects of single parameters can be observed.
One approach that tries to bridge the gap between the real-world and laboratory experiments is to conduct policy pilots. The use of policy pilots (Greenberg & Shroder ; Cabinet Office ; Martin & Sanderson ) as social experiments is fairly widespread. In a policy pilot, a policy change can be assessed against a counterfactual in a limited context before rolling it out for general implementation. In this way (a small number of) different solutions can be tried out and evaluated, and learning fed back into policy design.
A dominant method for policy pilots is the Randomised Control Trial (RCT) (Greenberg & Shroder ; Boruch ), well-known from medical research, where a carefully selected treatment group is compared with a control group that is not administered the treatment under scrutiny. RCTs can thus present a halfway house between an idealised laboratory experiment and a real-world experiment. However, the claim that an RCT is capable of reproducing a laboratory situation where rigorous testing against a counterfactual is possible has also been contested (Cabinet Office , p. ). It is argued that in principle there is no possibility of social experiments due to the requirement of ceteris paribus (i.e. in the social world, it is impossible to have two experiments with everything equal but the one parameter under scrutiny); that the complex system-environment interactions that are necessary to adequately understand social systems cannot be reproduced in an RCT; and that random allocation is impossible in many domains, so that a 'neutral' counterfactual cannot be established. Moreover, it may be a risky political strategy or even unethical to administer a certain benefit in some pilot context but not to the corresponding control group. This is even more the case if the policy would put the selected recipients at a disadvantage (Cabinet Office , p. ).
There are also more practical problems to consider, among them time, staff resources and budget. There is general agreement that a good pilot is costly, time-consuming, "administratively cumbersome" and in need of well-trained managing staff (Cabinet Office , p. ). There is "a sense of pessimism and disappointment with the way policy pilots and evaluations are currently used and were used in the past (...): poorly designed studies; weak methodologies; impatient political masters; time pressures and unrealistic deadlines" (Seminar on Policy Pilots and Evaluation , p. ).

Thus, policy pilots cannot meet the claim to be a happy medium between laboratory experiments, with their isolation strategies capable of parameter variation, and Krohn's real-world experiments involving complex system-environment interactions in real time. This is where computational policy modelling comes in.
Policy models for policy experimentation
Unlike policy pilots, computational policy models are able to work with ceteris paribus rules, random control, and non-contaminated counterfactuals (see below). Using policy models, we can explore alternative solutions, simply by trying out parameter variations in the model, and experiment with context-specific models and with short, medium and long time horizons. Furthermore, policy models are ethically and politically neutral to build and run, though the use of their outcomes may not be.
Unlike real-world experiments and policy pilots, policy models allow the user to investigate the future. Initially the modellers will seek to reproduce the database describing the initial state of a real-world experiment and then extrapolate simulated structures and dynamics into the future. At first a baseline scenario can be derived: what if there were no changes in the future? This is artificial and, for methodological reasons, boring: nothing much happens but incremental evolution, no event, no surprise, no intervention. Changes can then be introduced.
As with real-world experiments, modelling experiments enable recursive learning by stakeholders. Stakeholders can achieve system competence and practical skills through interacting with the model to learn 'by doing' how to act in complex situations. With the model, it is not only possible to simulate the real-world experiment envisaged but also to test multiple scenarios for potential real-world experiments via extensive parameter variations. The whole solution space can be checked, where future states are not only accessible but tractable.
This does not imply that it is possible to obtain exact predictions for future states of complex social systems (see the discussion on prediction above). Deciding under uncertainty has to be informed differently: "Experimenting under conditions of uncertainties of this kind, it appears, will be one of the most distinctive characteristics of decision-making in future societies [...], they import and use methods of investigation and research. Among these are conceptual modelling of complex situations, computer simulation of possible futures, and - perhaps most promising - the turning of scenarios into 'real-world experiments'" (Gross & Krohn , p. ).
Regarding the continuum between the extremes of giving no consideration (e.g. with laboratory experiments) and full consideration (e.g. real-world experiments) to complex system-environment interactions, policy modelling experiments indeed sit somewhere in the (happy) middle. We would argue that, where the costs or risks associated with a policy change are high, and the context is complex, it is not only common sense to carry out policy modelling, but it would be unethical not to.

Examples of Policy Models
We have discussed the role of policy models in abstract at some length; it is now important to illustrate the use of policy models with a number of examples drawn from our own experience. These have been selected to offer a wide range of types of model and contexts of application. In the spirit of recording failure as well as success, we mention not only the ultimate outcomes, but also some of the problems and challenges encountered along the way. In the next section, we shall draw out some general lessons from these examples.
The European-funded TELL ME project focused on health communication associated with influenza epidemics. One output was a prototype agent-based model, intended to be used by health communicators to understand the potential effects of different communication plans under various influenza epidemic scenarios (Figure ).
The basic structure of the model was determined by its purpose: to compare the potential effects of different communication plans on protective personal behaviour and hence on the spread of an influenza epidemic. This requires two linked models: a behaviour model that simulates the way in which people respond to communication and make decisions about whether to vaccinate or adopt other protective behaviour, and an epidemic model that simulates the spread of influenza. The key model entities are: (i) messages, which together implement the communication plans; (ii) individuals, who receive communication and make decisions about whether to adopt protective behaviour; and (iii) regions, which hold information about the local epidemic state. The major flow of influence is the effect that communication has on attitude and hence behaviour, which affects epidemic transmission and hence incidence. Incidence contributes to perceived risk, which influences behaviour and establishes a feedback relationship (see Badham & Gilbert for the detailed specification). A fuller description of the model and discussion of its uses can be found in Barbrook-Johnson et al. ( ). A more technical paper on a novel model calibration approach using the TELL ME model as an example is presented in Badham et al. ( ).
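The feedback between communication, protective behaviour and epidemic spread can be sketched in a few lines. The code below is not the TELL ME model: it is a hypothetical, aggregate toy in which a single `message_strength` parameter stands in for a communication plan, the share of the population currently infected stands in for incidence as the driver of perceived risk, and every numerical value is an assumption made for illustration.

```python
def run_linked_model(message_strength, n_days=300):
    """Toy linked behaviour and epidemic model (all parameters are assumptions).

    Behaviour side: the share of people protecting themselves rises with
    communication and with perceived risk.
    Epidemic side: a discrete-time SIR model whose transmission rate falls as
    more people adopt protective behaviour.
    """
    s, i, r = 0.999, 0.001, 0.0   # susceptible, infected, recovered shares
    beta, gamma = 0.4, 0.2        # assumed transmission and recovery rates per day
    baseline_protection = 0.1     # assumed share protecting with no campaign
    risk_weight = 0.5             # assumed influence of perceived risk on behaviour
    protection_efficacy = 0.6     # assumed reduction in transmission when protected
    peak_infected = i
    for _ in range(n_days):
        # Feedback: the infected share (a stand-in for incidence) drives perceived risk.
        perceived_risk = min(1.0, 10.0 * i)
        protective_share = min(
            1.0, baseline_protection + message_strength + risk_weight * perceived_risk
        )
        effective_beta = beta * (1.0 - protection_efficacy * protective_share)
        new_infections = effective_beta * s * i
        recoveries = gamma * i
        s -= new_infections
        i += new_infections - recoveries
        r += recoveries
        peak_infected = max(peak_infected, i)
    return {"peak_infected": peak_infected, "attack_rate": i + r}


no_campaign = run_linked_model(message_strength=0.0)
campaign = run_linked_model(message_strength=0.3)
print("peak infected share without campaign:", round(no_campaign["peak_infected"], 4))
print("peak infected share with campaign:   ", round(campaign["peak_infected"], 4))
```

An agent-based version would replace the aggregate shares with individuals who each hold an attitude and receive messages, and with regions that each track a local epidemic state.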
Drawing on findings from stakeholder workshops and the results of the model itself, the modelling team suggest the TELL ME model can be useful: (i) as a teaching tool, (ii) to test theory, and (iii) to inform data collection (Barbrook-Johnson et al. ).

HOPES
Practice theories provide an alternative to the theory of planned behaviour and the theory of reasoned action to explore sustainability issues such as energy use, climate change, food production, water scarcity, etc. The central argument is that the routine activities (aka practices) that people carry out in the service of everyday living (e.g. ways of cooking, eating, travelling, etc.), often with some level of automaticity developed over time, should be the focus of inquiry and intervention if the goal is to transform energy- and emissions-intensive ways of living.
The Households and Practices in Energy use Scenarios (HOPES) agent-based model (Narasimhan et al. ) was developed to formalise key features of practice theories and to use the model to explore the dynamics of energy use in households. A key theoretical feature that HOPES sought to formalise is the performance of practices, enabled by the coming together of appropriate meanings (mental activities of understanding, knowing how and desiring, Reckwitz ), materials (objects, body and mind) and skills (competences). For example, a laundry practice could signify a desire for clean clothes (meaning) realised by using a washing machine (material) and knowing how to operate the washing machine (skill); performance of the practice then results in energy use.
HOPES has two types of agents: households and practices. Elements (meanings, materials and skills) are entities in the model. The model concept is that households choose different elements to perform practices depending on the socio-technical settings unique to each household. The performance of some practices results in energy use while that of others does not, e.g., using a heater to keep warm results in energy use whereas using a jumper or blanket does not. Furthermore, the repeated performance of practices across space and time causes the enabling elements to adapt (e.g. some elements are used more popularly than others), which subsequently affects the future performance of practices and thereby energy use. A rule-based system, developed based on empirical data collected from UK households, was included in HOPES to enable households to choose elements to perform practices. The rule-based approach allowed organising the complex contextual information and socio-technical insights gathered from the empirical study in a structured way to choose the most appropriate actions when faced with incomplete and/or conflicting decisions. HOPES also includes sub-models to calculate the energy use resulting from the performance of practices, e.g. a thermal model of a house is built in to consider the outdoor temperature, the type and size of heater, and the thermostat setpoint to estimate the energy used for thermal comfort practices in each household.
The model is used to test different policy and innovation scenarios to explore the impacts of the performance of practices on energy use. For example, the implementation of a time of use tariff demand response scenario shows that while some demand shifting is possible as a consequence of pricing signals, there is no significant reduction in energy use during peak periods as many households cannot put off using energy (Narasimhan et al. ). The overall motivation is that by gaining insight into the trajectories of unsustainable energy consuming practices (and underlying elements) under different scenarios, it might be possible to propose alternative pathways that allow more sustainable practices to take hold.

SWAP
The SWAP model (Johnson b,a) is an agent-based model of farmers making decisions about adopting soil and water conservation (SWC) practices on their land. Developed in NetLogo (Wilensky ), the main agents in the model are farmers, who are making decisions about whether to practice SWC or not, and extension agents, who are government and non-governmental actors who encourage farmers to adopt. Farmers can also be encouraged or discouraged to change their behaviour depending on what those nearby and in their social networks are doing. The environment is a simple model of the soil quality (Figure ). The main outcomes of interest are the temporal and spatial patterns of SWC adoption. A full description can be found in Johnson ( b) and Johnson ( a).
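The interplay of extension visits and peer influence described above can be sketched as follows. This is a hypothetical toy written in Python, not the NetLogo SWAP model: the ring of neighbours, the function name `run_adoption_sketch` and all probabilities are assumptions for illustration, and the soil quality environment is omitted.

```python
import random


def run_adoption_sketch(extension_visits_per_step, n_farmers=100, n_steps=50, seed=1):
    """Toy adoption model: farmers sit on a ring and watch their two neighbours.

    Returns the share of farmers practising conservation after each step.
    All probabilities below are illustrative assumptions.
    """
    rng = random.Random(seed)
    adopting = [False] * n_farmers
    history = []
    for _ in range(n_steps):
        # Extension agents visit a random set of farmers at each step.
        visited = set(rng.sample(range(n_farmers), extension_visits_per_step))
        previous = adopting[:]  # decisions are based on last step's behaviour
        for f in range(n_farmers):
            neighbours = [previous[(f - 1) % n_farmers], previous[(f + 1) % n_farmers]]
            peer_share = sum(neighbours) / len(neighbours)
            p_adopt = 0.01 + 0.1 * peer_share + (0.4 if f in visited else 0.0)
            p_drop = 0.1  # assumed chance that an adopter abandons the practice
            if previous[f]:
                adopting[f] = rng.random() >= p_drop
            else:
                adopting[f] = rng.random() < p_adopt
        history.append(sum(adopting) / n_farmers)
    return history


low = run_adoption_sketch(extension_visits_per_step=0)
high = run_adoption_sketch(extension_visits_per_step=30)
print("late-run adoption share, no extension visits:", sum(low[-20:]) / 20)
print("late-run adoption share, 30 visits per step: ", sum(high[-20:]) / 20)
```

Recording the adoption state of each farmer at each step, instead of only the aggregate share, would give the spatial as well as the temporal pattern of adoption.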
The SWAP model was developed: (i) as an 'interested amateur' to be used as a discussion tool to improve the quality of interaction between policy stakeholders; and (ii) as an exploration of the theory on farmer behaviour in the SWC literature.
The model's use as an 'interested amateur' was explored with stakeholders in Ethiopia. Using a model as an interested amateur is a concept inspired by Dennett ( ). Dennett suggests experts often talk past each other, make wrong assumptions about others' beliefs, and/or do not wish to look stupid by asking basic questions. These failings can often mean experts err on the side of under-explaining issues, and thus fail to come to consensus or agreeable outcomes in discussion. For Dennett, an academic philosopher, the solution is to bring undergraduate students - interested amateurs - into discussions to ask the simple questions, and generally force experts to err on the side of over-explanation.

The SWAP model was used as an interested amateur with a different set of experts, policy makers and officials in Ethiopia. This was done because policies designed to increase adoption of SWC have generally been unsuccessful due to poor calibration to farmers' needs. This is understood in the literature to be a result of poor interaction between the various stakeholders working on SWC. When used, participants recognised the value of the model and it was successful in aiding discussion. However, participants described an inability to innovate in their work, and viewed stakeholders 'lower-down' the policy spectrum as being in more need of discussion tools. A full description of this use of the model can be found in Johnson ( a).
The European Commission was expecting to spend around € billion on research and development through its Horizon programme between and . It is the successor to the previous, rather smaller programme, called Framework . When Horizon was being designed, the Commission wanted to understand how the rules for Framework could be adapted for Horizon to optimise it for current policy goals, such as increasing the involvement of small and medium enterprises (SMEs).
An agent-based model, INFSO-SKIN, was built to evaluate possible funding policies. The model was set up to reproduce the funding rules, the funded organisations and projects, and the resulting network structures of the Framework programme. This model, extrapolated into the future without any policy changes, was then used as a benchmark for further experiments. Against this baseline scenario, several policy changes that were under consideration for the design of the Horizon programme were then tested, to understand the effect of a range of policy options: changes to the thematic scope of the programme; the funding instruments; the overall amount of programme funding; and increasing SME participation (Ahrweiler et al. ). The results of these simulations ultimately informed the design of Horizon .

Silent Spread and Exodis-FMD
Following the outbreak of Foot and Mouth Disease (FMD), the UK Department of the Environment, Food and Rural Affairs (Defra) imposed a -day standstill period prohibiting any livestock movements off-farm following the arrival of an animal. The -day rule caused significant difficulties for farmers. The Lessons Learned Inquiry, which reported in July , recommended that the -day standstill remain in place pending a detailed cost-benefit analysis (CBA) of the standstill regime.
Defra commissioned the CBA in September and a report was required in early in order to inform changes to the movement regime prior to the spring movements season. The timescale was challenging, and only limited data were available to inform the cost risk benefit modelling required. A top-down model was therefore developed that captured only the essential elements of the decision, combining them in an influence diagram representation of the decision to be made. As wide a range of experts as possible was involved in model development, helping to inform the structure of the model, its parameterisation, validation and the interpretation of the results. An Agile approach was adopted, with detail added to the model in a series of development cycles guided by a steering group.
The resultant 'Silent Spread' model showed that factors such as time to detection of disease are much more important than length of standstill in determining the size of an outbreak (Risk Solutions ). The modelling was critical to the Government's decision to relax the -day movement control to days, subject to commitments from the livestock industry. The iterative, participatory development process generated an unprecedented level of 'buy-in' to the results in an area which had previously been marked by deep controversy.
Following this, Defra commissioned further modelling to inform the design of the FMD contingency plan to be followed in the event of an outbreak. For this application, a detailed 'bottom-up' model was needed that could reproduce the detailed mechanisms of disease spread and enable the impacts of different control strategies on the spread of a disease to be explored (Risk Solutions ).
The model was implemented as an agent-based model using the Exodis™ disease modelling framework (Figure ). The framework builds a heterogeneous geo-spatial representation of the UK based on farm census data, sets up the various FMD disease transmission mechanisms and integrates the effects of different control strategies and the resources required to carry out these strategies. The agents in the model are farms. For a given set of outbreak starting conditions and for a given control strategy, the model provides detailed information on how the outbreak might evolve, calculating parameters such as the number of premises infected, the duration of the outbreak, the number of animals culled and/or vaccinated, etc. It produces distributions for each of these parameters to reflect the range of potential outcomes for any outbreak. use during exercises, and to inform decisions in the event of an actual outbreak. The model was used during the emerging outbreak of FMD in and continues to be used to test proposed changes to the control regime.
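As a rough illustration of how an agent-based outbreak model of this general kind yields a distribution of outcomes rather than a single number, the following Python sketch runs a toy stochastic simulation many times. It is not the Exodis framework: the transmission rule, the function and parameter names (`simulate_outbreak`, `p_spread`, `days_to_detect`) and all values are assumptions made purely for illustration.

```python
import random
import statistics


def simulate_outbreak(n_farms, p_spread, days_to_detect, rng, max_days=200):
    """One stochastic run of a toy model; returns the number of farms ever infected.

    Assumed rule (illustrative only): each infected, not yet removed farm
    transmits to each susceptible farm with daily probability p_spread / n_farms,
    and a farm is detected and removed days_to_detect days after infection.
    """
    infected_on = {0: 0}  # farm id -> day of infection; farm 0 seeds the outbreak
    removed = set()
    for day in range(1, max_days + 1):
        active = [f for f in infected_on if f not in removed]
        if not active:
            break  # no infectious farms remain, so the outbreak is over
        # Probability that at least one active farm infects a given susceptible farm today
        p_inf = 1 - (1 - p_spread / n_farms) ** len(active)
        for farm in range(n_farms):
            if farm not in infected_on and rng.random() < p_inf:
                infected_on[farm] = day
        for farm in active:
            if day - infected_on[farm] >= days_to_detect:
                removed.add(farm)
    return len(infected_on)


def outcome_distribution(runs, seed, **params):
    """Repeat the simulation to obtain a distribution of outbreak sizes."""
    rng = random.Random(seed)
    return [simulate_outbreak(rng=rng, **params) for _ in range(runs)]


if __name__ == "__main__":
    sizes = outcome_distribution(200, seed=1, n_farms=50, p_spread=0.5, days_to_detect=5)
    print("min / median / max outbreak size:",
          min(sizes), statistics.median(sizes), max(sizes))
```

Reporting the spread of the repeated runs, rather than one run, is what allows a range of potential outcomes to be presented for each control strategy.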

The Abstractor Behaviour Model
The abstraction of water from rivers and aquifers in England is controlled by a licensing regime established in the s. The UK Government wish to reform the system to one that encourages abstractors to manage water efficiently and work together to make best use of water. Water abstraction management is a classic 'wicked' problem in that it is highly resistant to resolution. Previous attempts to reform the system have failed, partly through not engaging stakeholders in the need for, and nature of, a solution.
Assessing the costs, risks and benefits of the different ways of reforming the system is complex. It needs to take into account:
• The interactions between a complex natural system and the abstractors (including the public water supply, power producers, farmers, and industry),
• That economic, social and climate conditions will change in ways that we cannot predict, and
• The complex way that the new measures will influence individual abstractor behaviours on a day-by-day, year-by-year basis.
Agent-based modelling was ideally suited to explore how the existing and proposed reforms might operate. A multidisciplinary team worked with a wide range of experts and stakeholders to develop an agent-based economic behavioural model integrated with catchment hydrological models on a daily time-step basis (Risk Solutions ).
The agent population consists of all of the businesses that have a licence to take water from the rivers and aquifers in a particular river basin (Figure ). The river basin is modelled in detail using a hydrological model of the rivers, aquifers, and land use with a geo-spatial resolution of km by km. Each agent makes a series of strategic and operational decisions, and the decision-making evolves with time as water demand and availability changes with economic and climate change. The policy options control water levels in the modelled rivers and aquifers using different mechanisms, and allow different types of water rights trading between agents. The successful achievement of environmental standards is monitored by regulator agents, who take action to further restrict abstraction permissions if necessary. The model was used to explore in detail how the reforms .
In the water abstraction reform work (Section . ), although the modelling did generate numbers to input to the Impact Assessment Template (absorbing a significant proportion of the modelling effort), the greatest benefit of the work was the contribution to designing the policy, which was intimately informed by the more exploratory aspects of the modelling, including both: the discipline provided by the need to articulate the reforms in a way that could be represented in the model; and the understanding of the complex, emergent nature of the system uncovered through running multiple scenarios, sensitivity analyses and what-if scenarios.
In the SWAP model (Section . ), the policy value lay entirely in the process of interrogating the model, and using it as a basis for discussions, sharing assumptions and building consensus. An interesting extra dimension was that critiquing design choices generated value for stakeholders. In this role the model is a boundary object (Star & Griesemer ), and an 'interested amateur' (Johnson a) as described above. With stakeholders who do not regularly work together, and/or who do not have the capacity to take ownership and undertake continued use and maintenance of a model, this process-based value is even more likely to be the main benefit of the modelling process.

In the Tell Me model (Section . ), we find a similar message. In this example, detailed micro-validation (i.e. sense checking model rules and assumptions) and exploration of their effects on outcomes were among the main benefits to public health stakeholders involved in the project. The lack of data available to allow rigorous formal validation of the model meant that this was one of the most valuable aspects of the modelling exercise.
The HOPES model (Section . ) introduced analysts concerned with developing policies to manage household energy demand to the idea of considering social practices as an alternative to assuming household energy use is determined by individual rational actors making decisions based primarily on cost. The fact that the HOPES model could generate plausible outputs using social practice theory as its foundation was probably more significant to stakeholders than the actual values it yielded.

Models need to be at an appropriate level of abstraction
No model can fully reflect the real world: some details need to be omitted and some boundary needs to be drawn around what is to be modelled. However, it is not the case that the most detailed model is necessarily the best. On the contrary, highly-detailed models may require more data than is or could be available; can be hard to calibrate and validate; and, most importantly, can be hard to understand. Clients, modellers and stakeholders can all struggle with the idea that less can be more and get drawn into trying to model reality instead of the decision essentials. On the other hand, a model that is too simple or too abstract may be impossible to validate, because there is nothing in the model that corresponds to empirical observation, and because the behaviour of the model may bear little relationship to what happens in the world. The optimal level of abstraction will depend on the purpose of the modelling and the nature of the system being modelled (Edmonds & Moss ). One of the signs of good modelling is pitching the model at the right place in between the two extremes.
The Silent Spread model described in Section . was a simple model developed at a high level of abstraction. The modelling was required to support a single decision question: could the livestock movement standstill period be reduced or removed? At the time Defra did not routinely collect information on the movement of animals, and so data to inform the modelling was limited. The solution was therefore to develop an abstract model, capturing only those elements essential to the decision. With more time, and a much richer fund of data, it was possible to develop a much more detailed representation of disease spread for the Exodis-FMD™ model. In this case it was also necessary to capture the dynamic interaction of the various control strategies with the spread of a disease, in order to provide a basis for testing these.
HOPES (Section . ) started as an abstract model that served as a proof that it is possible to go beyond the conventional but limited approach of analysing energy demand in terms of rational and individual decision making to model energy consuming social practices in the household. Only once this proof of concept version had been demonstrated did the model get extended and refined to incorporate specific social practices (maintaining a comfortable environment, doing the laundry, etc.) that could be calibrated against the data collected from energy sensors installed in the sample households.
One motive for making the HOPES model more concrete was a desire to link it to existing models of the UK energy supply system. These modelled electricity supply from electricity power stations, wind farms, etc. and the interconnecting grid and have been used to develop scenarios for informing decisions about the optimal ways of developing the whole energy system to meet low carbon targets in . However, these supply models incorporated demand functions based on rather simple household utility maximisation assumptions. The HOPES model has been used to improve this aspect of the supply models, but not without difficulty, stemming from the overall complexity of the models, the different disciplinary approaches (the supply models are based on optimising using linear programming techniques; HOPES is an agent-based model), and the different time scales of the simulations (the supply models use time steps of days or years, while HOPES has hourly time steps). This example illustrates well the fact that one needs to think carefully about the appropriate level of abstraction of models, not only in terms of their relevance for stakeholders but also to fit them properly into what can be a whole ecology of related models.
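One small part of bridging models that run at different time scales can be sketched in Python: aggregating an hourly demand series to daily totals before passing it to a model with a daily time step. This is an assumption-laden illustration, not a description of how HOPES was actually coupled to the supply models; the function name `hourly_to_daily` and the kWh unit are hypothetical.

```python
def hourly_to_daily(hourly_kwh):
    """Aggregate hourly demand values (assumed to be kWh per hour) into daily totals.

    The input is assumed to start at the first hour of a day and to cover a
    whole number of days; anything else is rejected rather than silently padded.
    """
    if len(hourly_kwh) % 24 != 0:
        raise ValueError("expected a whole number of days of hourly values")
    return [sum(hourly_kwh[i:i + 24]) for i in range(0, len(hourly_kwh), 24)]


if __name__ == "__main__":
    two_days = [1.0] * 48           # a flat 1 kWh per hour for two days
    print(hourly_to_daily(two_days))
```

Aggregation of this kind discards within-day peaks, which is one reason why the choice of time step is itself a decision about the level of abstraction.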

Data challenges
Data is never perfect. Lack of data, or poor-quality data, frustrates the parameterisation and validation of models.
However, lack of data should never be used as an excuse not to model, or not to model an aspect of a problem that is important to the decisions to be made. Collaborative approaches, formal elicitation of expert judgement, explicit modelling of uncertainty and sensitivity analysis can all be used to address a lack of data.
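Two of the techniques listed above, sensitivity analysis and explicit modelling of uncertainty, can be sketched briefly in Python. The model below is a deliberately trivial stand-in: `toy_model`, its parameters `effect_size` and `baseline_rate`, and the ranges (imagined here as coming from expert elicitation) are all hypothetical and are not taken from any of the models described in this paper.

```python
import random


def toy_model(effect_size, baseline_rate):
    """Hypothetical model: share adopting a behaviour after an intervention."""
    return baseline_rate + effect_size * (1 - baseline_rate)


def one_at_a_time_sensitivity(model, ranges, baseline):
    """Swing in the output when each parameter alone moves across its range.

    Returns a dict ordered from the most to the least influential parameter.
    """
    swings = {}
    for name, (low, high) in ranges.items():
        out_low = model(**{**baseline, name: low})
        out_high = model(**{**baseline, name: high})
        swings[name] = abs(out_high - out_low)
    return dict(sorted(swings.items(), key=lambda item: item[1], reverse=True))


def monte_carlo(model, ranges, runs, seed=0):
    """Propagate parameter uncertainty by sampling each range uniformly."""
    rng = random.Random(seed)
    return [
        model(**{name: rng.uniform(low, high) for name, (low, high) in ranges.items()})
        for _ in range(runs)
    ]


if __name__ == "__main__":
    # Assumed ranges and central values, for illustration only
    ranges = {"effect_size": (0.0, 0.4), "baseline_rate": (0.2, 0.3)}
    baseline = {"effect_size": 0.2, "baseline_rate": 0.25}
    print(one_at_a_time_sensitivity(toy_model, ranges, baseline))
    outputs = monte_carlo(toy_model, ranges, runs=1000)
    print(min(outputs), max(outputs))
```

A ranking of this kind can indicate which missing data matter most to the decision, and therefore where effort on elicitation or future data collection is best spent.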
In the Tell Me example (Section . ), despite an initial belief by modellers and stakeholders that data was available, it became clear that there was no data that connected policy interventions with behavioural change and outcomes. Behavioural outcome data was at an aggregate level, meaning that it was impossible to understand the individual-level impacts of the intervention. Data directly connecting intervention and outcome, for each individual, is vital for choosing values for effect size parameters in the model.

In this example, the lack of data should not be seen as a reason not to model. The motivations to model remain. Rather, the lack of data made explicit by the model should be used to inform future data collection. As Barbrook-Johnson et al. ( ) state, "[data] collection must go alongside continued development of theory and models of decisionmaking. Improved theories of individual decision-making and interaction will give models a stronger footing on which to base their assumptions. As data and theory improve, so too will the (prototype) models developed using that support. This could then lead to improved data collection and theory building, creating a positive feedback between the three."
Lack of data can present particular challenges to the formal validation of models, particularly in the complex, changing environments where modelling to explore how the future might unfold can be most useful. In the Tell Me example, the data on behavioural outcomes through time either did not exist, or tended to be at a relatively low resolution. This meant that there was not enough longitudinal outcome data for the model results to be compared with.
Lack of a comprehensive dataset for validation should not be taken to imply the model cannot be validated for its particular purpose. In these circumstances, a layered approach to validation should be used: formal quality assurance processes should be applied from the outset, including the selection of the modelling approach (see for example the Aqua Book, HM Treasury , and Edmonds ), alongside formal documented verification and validation processes. Formal validation should involve subject matter experts in collaboration with model output users and modellers and should be an integral part of model development.
Validation must ensure that the model (1) makes technical or scientific sense, (2) can reproduce recorded reality, and (3) is fit for the use it is designed for. Taylor ( ) includes a useful checklist of these and other issues to address and questions to ask when using models in decision making.

The Silent Spread example (Section . ) illustrates how a model can be developed and validated in the absence of much 'hard' data, through a process of scrutiny of all stages of model development and result generation by subject matter experts, modellers, users and wider stakeholders.

Model development and use needs to be Agile and collaborative
Agile, collaborative processes ensure models remain focused on the policy need and provide for more effective peer review and scrutiny of the modelling process. This requires a high degree of trust between commissioners and modellers from the outset. Policy makers, analysts, model output users, stakeholders, and peer reviewers should be involved, not just at the problem definition, user needs stage, but throughout to ensure that the modelling approach, model structure and level of abstraction, parameterisation, analysis and interpretation of the results remain fit for purpose and focused on need.
At the scoping stage, there needs to be an honest discussion about the best modelling approach and whether existing models will meet needs. Great care needs to be taken when using models for applications they were not originally designed for to ensure that the underlying structure of the model is fit for purpose.
An Agile development approach (Abrahamsson et al. ), which iteratively adds functionality and detail to the model through cycles of development, testing and scrutiny, is a good way of managing the tendency for modellers and clients alike to drive towards too great a level of detail and more realistic representations in models than is optimal.

Finally, modellers should be involved in helping interpret the results for decision making. It is impossible to capture in a report all the nuances of the model simplifications, data weaknesses etc. in a way that policy makers can use reliably.
The Silent Spread work (Section . ) used a highly participatory approach leading to much improved understanding and cooperation between Defra and industry stakeholders. In contrast, the INFSO-SKIN model (Section . ) was developed in response to an invitation to tender that had the effect of distancing the stakeholders, that is, the relevant policy makers, from the modellers. The people from the European Commission (EC) who were the clients only met the modellers at the beginning, middle and end of the model development and were therefore not much involved in its design. Moreover, the EC personnel changed during the development and by the end there was a rather poor understanding by the clients of the purpose and capabilities of the model. A further issue was that the clients wanted the modellers to draw out specific policy recommendations from the model, while the modellers were happy to test policy options proposed by the clients but did not think it appropriate that they should be devising policies themselves. These are all symptoms of the absence of proper collaboration between the modellers and the commissioners.
The ethics of modelling
Policy modelling requires careful consideration of a wide range of ethical issues, not least because policy models have the potential to change policy and thus directly affect people's lives. In addition to the basic imperative to ensure that a model is fit for purpose, as discussed above, there is also a need to consider issues arising in connection with the data used to build and calibrate the model and the way in which the results of the model are presented.
When personal data is collected, either explicitly through, for example, a survey, or implicitly, as administrative records or as the side-effect of other activities (such as using social media or mobile phones), one needs not only to abide by data protection laws, but also to ensure that appropriate informed consent for such use of the data has been obtained (see, for example, the ethical guidelines published by the British Sociological Association (BSA ), the Association of Internet Researchers (AOIR ) and the Association for Computing Machinery (ACM )). There remains a need to bring these guidelines together and to draw out their relevance to modelling.
An important consideration is whether data is representative of the population being modelled. As artificial intelligence researchers have discovered to their cost, basing a model on biased data can lead to biased results and it can be hard to detect this after the event (Knight ). This is especially a problem with 'big data', where it is easy to assume that because one has a very large volume of data, it must be representative, although important but numerically small minorities may be absent.
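A basic check of representativeness can be sketched in Python by comparing each group's share of a dataset with its known share of the population. The sketch is illustrative only: the function name `representation_gaps`, the group labels and the rule of flagging a group whose sample share falls below half its population share are all assumptions, not an established standard.

```python
from collections import Counter


def representation_gaps(sample_groups, population_shares, tolerance=0.5):
    """Flag groups that are absent from, or badly under-represented in, a sample.

    sample_groups: one group label per record in the dataset.
    population_shares: dict mapping group label to its share of the population.
    A group is flagged if its sample share is below tolerance * population share.
    Returns a dict: group -> (sample share, population share).
    """
    counts = Counter(sample_groups)
    total = len(sample_groups)
    gaps = {}
    for group, pop_share in population_shares.items():
        sample_share = counts.get(group, 0) / total if total else 0.0
        if sample_share < tolerance * pop_share:
            gaps[group] = (sample_share, pop_share)
    return gaps


if __name__ == "__main__":
    # Hypothetical data: group C is missing entirely, group B is sparse
    sample = ["A"] * 98 + ["B"] * 2
    population = {"A": 0.90, "B": 0.08, "C": 0.02}
    print(representation_gaps(sample, population))
```

A check like this only detects gaps along dimensions for which population figures are known, so it complements rather than replaces scrutiny by domain experts.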
The results derived from models are always subject to a degree of uncertainty. However, it is easy for modellers and especially the users of models to downplay, intentionally (because they do not believe it will be well received) or unintentionally (expert bias), the degree of uncertainty present, and the implications of that uncertainty for making policy decisions. Users may also put pressure on modellers to downplay uncertainty. Modellers should be clear and confident in their communication of uncertainty but also informative. The user needs to understand what the uncertainty means in terms of the decisions or communications they need to make. This is made more problematic if the model is complex and presented to users as a 'black box' that generates results without users being able to investigate for themselves the logic and the assumptions lying behind those results. This is another reason for encouraging collaboration between users and modellers: users can follow the model development process and may then get at least a glimpse of its workings and the assumptions being made; modellers can better understand the context and ensure that the results are presented in a form that is useful.
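One way of making uncertainty informative for a decision, rather than merely acknowledged, is to report for each policy option an interval and the chance of an unacceptable outcome alongside the central estimate. The Python sketch below assumes that a model has already produced a list of simulated outcomes per option; the function name `summarise_option`, the idea of a single acceptability `threshold` and the sample numbers are hypothetical.

```python
import statistics


def summarise_option(outcomes, threshold):
    """Summarise simulated outcomes for one policy option.

    Returns the median, an approximate 5th to 95th percentile interval, and the
    'residual risk', taken here to mean the share of runs exceeding threshold.
    """
    ordered = sorted(outcomes)
    cuts = statistics.quantiles(ordered, n=20)  # 19 cut points at 5% steps
    return {
        "median": statistics.median(ordered),
        "p5": cuts[0],
        "p95": cuts[-1],
        "residual_risk": sum(x > threshold for x in ordered) / len(ordered),
    }


if __name__ == "__main__":
    # Placeholder outcomes standing in for repeated model runs of one option
    simulated = list(range(1, 101))
    print(summarise_option(simulated, threshold=90))
```

Presenting the same summary for every option under consideration lets users compare options on the risk that remains, not only on the most likely result.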
In the Silent Spread example, decision information was needed quickly, when there was little data available to inform modelling. As wide a range of stakeholders, experts and officials as possible was actively involved in designing, populating and testing the model. Working groups met regularly at every stage of the modelling process. Once results began to emerge the group helped to interrogate and interpret the results, suggesting a range of ways of refining the modelling to test new hypotheses suggested by the outputs. A variety of different ways of presenting the uncertainty in the results was used. In particular, the level of residual risk associated with each of the policy options under consideration was clearly illustrated, allowing decision makers to take this into account in reaching their decision. The process produced unparalleled acceptance of the final conclusions for policy, with the model being described by one expert as the "collective brain of the group".
Communicating the modelling process, structure and results needs careful planning
Communication is necessary to explain results and their limitations clearly, to ensure that the outputs are used appropriately, and to build confidence in the modelling process and outputs. It is the nature of model outputs, consisting of numbers and charts, to appear more certain than they are, and this can mean that the boundary between data and assumption is overlooked. Poor past experience can lead to distrust of modelling. Active collaboration builds confidence in and champions for the work, but it is not possible to involve everyone. Changes of personnel, both in the modelling team and in the policy client, can also lead to problems. In complex modelling environments, it is easy to underestimate the communication challenges.
In the Silent Spread work (Section . ) the modellers had to work hard to break down distrust of modelling brought about by issues surrounding the use of predictive models to support the pre-emptive, contiguous cull during the outbreak of disease. While at first it was hard to get stakeholders with entrenched and often opposed positions about what the 'right' answer was to talk constructively, the model gave them a neutral space to share different perspectives and test the results of these.

In the SWAP model example (Section . ), trust was less problematic. Rather, the challenge was communicating the model design, and the conclusions the model (and modeller) could draw, to stakeholders who were not familiar with formal computational modelling approaches. This led to the design of an accessible way of communicating the model, which still needed to provide the detail of the model assumptions and rules so that it could be used as the basis of discussion. To do this, a combination of pseudo code, simplified (and jargonless) Unified Modelling Language diagrams, and projector presentations of results and the model running were used. The emphasis was placed strongly on the assumptions and step by step rules of the model, rather than the results.

Models need to be maintained
If models can continue to have a role in policy monitoring, development and evaluation after their initial results, they deliver better value, but ensuring that models are properly maintained is difficult within government procurement processes and structures. Plans for maintenance of the model should be discussed at the start of the modelling project.
Open source models are attractive because communities can continue to maintain and scrutinise them, but this is not always an option for policy models which must continue to represent complex policy accurately over time, accounting for changes in policy and the policy environment. Decision makers need to have confidence that this will happen and are unlikely to be prepared to rely on voluntary efforts. Moreover, many policy models need to use confidential data, which cannot be made open source at the necessary level of disaggregation.
Of the models described in Section , only Exodis (Section . ) is currently being maintained. Securing long-term maintenance arrangements is thus a challenge that is so far rarely met properly.

Conclusions
The technology required for modelling complex domains is in place and increasingly easy to use. However, for policy modelling to achieve its full potential, there needs to be more attention paid to the processes of model development and use. As we have illustrated in this paper, there are many pitfalls along the way in making policy models effective and used. Much of this is 'craft knowledge', gained from experience and from making mistakes, which is why we have described key lessons that we have learned from our own varied experience. Nevertheless, where the costs or risks associated with a policy change are high, and the context is complex, it is not only common sense to use policy modelling to inform decision making, but it would be unethical not to.
The most important requirement in our view for successful policy modelling is to encourage communication and collaboration among those involved: the modellers themselves, the clients and stakeholders, the suppliers of data, the users of the model outputs and so on. Academia still has a tendency to work within an ivory tower, making results, and models, available to users only once they have been fully developed and after the work has been published in the research literature. While this approach may work for some formal modelling, it almost certainly will not yield useful policy models that are actually used by decision makers. Instead, as we have emphasised above, policy modelling needs to be collaborative, iterative and Agile. Such an approach has many benefits. Firstly, it provides a sense of ownership of the model and encourages commitment from users about what they may come to see as 'their' model, rather than some black box that someone else is imposing on them. Secondly, collaboration helps to prevent modellers making naïve assumptions about the target domain, which is easy to do if one is not a domain expert. Thus, through collaboration, the modellers are educated about the complexities of the world they are trying to represent, but equally, the users are educated about the capabilities and limitations of the model that they are helping to develop. Thirdly, active engagement of stakeholders can help parameterise and sense check models, even where 'hard' data is sparse. Lack of data should never be used as an excuse not to model, but the approach needs moderating: an iterative, participative approach to modelling allows data needs to be identified and ways of addressing these to be developed.
Such a collaborative style of working may be foreign to many government agencies and can involve delicate negotiations about confidentiality, privacy and access to data. However, there does seem to be an inexorable trend towards the greater use of simulation, machine learning and artificial intelligence to aid decision making in government and business, so the culture may have to change to permit and even encourage a more collaborative, Agile modelling approach. When it does, policy modelling will truly have come of age.

Notes
To help address these issues it is useful for modellers to consider and use the wealth of research on the role of research in the policy process, the science-policy interface and research utilisation, and evidence-based policy. It is not the purpose of this paper to discuss this research; readers are referred to the following sources:
• On the science-policy interface, researchers have considered how the two communities of 'policy makers' and 'researchers' interact. Historically, the divide has been seen as clear (Weiss ; Caplan et al. ; Caplan ), but more recent work explores the continuous interaction and movement between the communities (e.g. Cash et al. ; Clark & Holmes ).
• On how research is actually used, there have been many conceptualisations and overviews (e.g. Jäger ; Weible ). The most well-known is Weiss ( ) which outlines how research can be used as evidence; a problem-solving tool; one source of information among many; justification for already-made decisions; a tool to delay difficult or sensitive decisions (i.e. 'we need to do more research on this', 'kick it into the long grass'); a source of general enlightenment; and finally, one of many pursuits of society (alongside policy, art, media, law etc.) which all influence each other. A lesson from much of this work is that it is often difficult or impossible to foresee how a model may be used, and this has implications for how the model is designed and maintained, a point we return to in Section .
• On evidence-based policy, Cairney ( ) is an excellent starting point.