‘One Size Does Not Fit All’: A Roadmap of Purpose-Driven Mixed-Method Pathways for Sensitivity Analysis of Agent-Based Models

Abstract: Designing, implementing, and applying agent-based models (ABMs) requires a structured approach, part of which is a comprehensive analysis of output-to-input variability in the form of uncertainty and sensitivity analysis (SA). The objective of this paper is to assist in choosing, for a given ABM, the most appropriate methods of SA. We argue that no single SA method fits all ABMs and that different methods should be used based on the overarching purpose of the model. For example, abstract exploratory models that focus on a deeper understanding of the target system and its properties are fed with only the most critical data representing patterns or stylized facts. For them, simple SA methods may be sufficient to capture the dependencies between the output and input spaces. In contrast, applied models used in scenario and policy analysis are usually more complex and data-rich because a higher level of realism is required. Here the choice of a more sophisticated SA method may be critical to establishing the robustness of the results before the model (or its results) can be passed on to end-users. Accordingly, we present a roadmap that guides ABM developers through the process of performing the SA that best fits the purpose of their ABM. This roadmap covers a wide range of ABM applications and advocates for routines that have emerged in recent years: a) handling temporal and spatial outputs, b) using the whole output distribution of a result rather than its variance, c) looking at topological relationships between input data points rather than their values, and d) looking into the ABM black box: finding behavioral primitives and using them to study complex system characteristics like regime shifts, tipping points, and condensation versus dissipation of collective system behavior.

ABMs allow for an explicit representation of entities (including cells, organisms, humans, and even organizations) and their relations, distributed in space and characterized by complex heterogeneous behavior (Parker et al. ). They are often constrained by higher hierarchical levels of the systems (O'Sullivan & Haklay ), which, in turn, emerge from the behavior and interaction of those agents. Through individual and collective decisions, these entities affect and are affected by their environment. ABMs are both complex (e.g., differential model behavior) and complicated (e.g., difficulties in choosing the adequate model structure) (Sun et al. ). Designing, implementing, evaluating, and applying an ABM requires a structured approach, part of which is a comprehensive analysis of output-to-input variability. Of special importance is the identification of which model inputs affect the variability of model outputs, and to what extent. This element of the ABM development cycle is called sensitivity analysis (SA) (Saltelli et al. ).
Formally, we define SA as the evaluation of the influence of variable model inputs on the variability of a specific model outcome. SA serves two main purposes: (1) it informs about the robustness of findings gained with a model (an overly sensitive model would not be very useful because we will never know all inputs with absolute certainty); and (2) it informs about the relative sensitivity of inputs, and hence of the processes modulated by these inputs. Thus, SA helps distinguish the important from the less important processes and thereby facilitates detecting and understanding causality. SA is preceded by uncertainty analysis, which provides the output distributions that are the consequence of input distributions (Saltelli et al. ).
SA is not constrained to ABMs. It is an element of model evaluation that dates back to operations research and decision making (Alexander ; French ; Pannell ; Wolters & Mareschal ). Amongst other applications, it has been widely used in environmental modeling and engineering (Anderson et al. ; Baroni & Tarantola ; Branger et al. ; Cosenza et al. ; Vanuytrecht et al. ; Zhan & Zhang ), in analyzing social systems (Chattoe et al. ), in disease transmission modeling (Spicknall et al. ), and in ecological and ecosystem models (e.g., Borgonovo et al. ; Chu-Agor et al. ; Ciric et al. ; Makler-Pick et al. ; Perz et al. ).
Given the complexity of the models emulating the real world, it is important to conduct SA in a systematic way.
Saltelli & Annoni ( ) and Saltelli et al. ( ) warn against a perfunctory approach to SA, where SA is done in an ad hoc manner, without considering the specificity of a given method and its applicability to a given problem. Therefore, the recent proliferation of reviews on the different methods of SA is not surprising (Ferretti et al. ; Iooss & Lemaître ; Lilburne & Tarantola ; Norton ; Pianosi et al. ; Saltelli et al. ). Several papers reviewing SA in ABM have also been published (Lee et al. ; ten Broeke et al. ; Thiele et al. ).
The objective of this paper is to assist modelers in choosing the most suitable SA methods for a specific ABM application. To provide this assistance, we review a range of ABMs: from simple (data-poor) abstract models with a single scalar variable, through empirically-rich ABMs with a vast range of output variables, to high-complexity models that produce multidimensional outputs like maps. We advocate a mixed-method approach driven by the purpose of the model. We demonstrate that many ABMs require a comprehensive model evaluation, with multiple SA methods employed.
We start by identifying two overarching purposes of ABM development (Section ). We reflect upon the place and role of SA in ABM development (Section ), and briefly summarize the methods of SA that have been used in ABMs, focusing on their advantages and disadvantages (Section ). We then present a roadmap to purpose-driven mixed-method SA (Section ). The roadmap identifies pathways for performing SA by combining multiple methods that complement each other. Each pathway is then explained by describing the corresponding best SA practices (Section ), and substantiated with application examples (Section ). In the concluding section, we argue that none of the methods identified in the literature is a panacea for performing SA in ABM. As presented in the roadmap, the choice of the SA method should be driven by the purpose of the model and the nature of the target system.

Defining ABM Based on Its Purpose
Sensitivity analysis is just one of the many elements of model evaluation through which developers can gain or lose credibility (Grimm et al. ).
The development of a specific ABM depends on its purpose (Parker et al. ). We distinguish between applied (empirical) models that are developed to support solving practical problems in the real world, for example, the design of energy grids, and abstract models that have a more theoretical focus and aim primarily at exploring ideas and concepts, for example, opinion dynamics (Figure ). In this paper, we use a generalized version with two classes: abstract and applied.
Abstract theoretical models aim at defining new relationships and discovering the fundamental principles of phenomena. The departure point and the center of interest is model formulation, which implies that the criteria for matching data and observations are much less restrictive than for applied (empirical) models. Sometimes these models are utilized without data and are used to demonstrate a concept or general principle driving a complex system. Developers of theoretical models are less interested in the structure of the observed data than in the parsimonious construction of the model (Parker et al. ). 'The model comes first, then comes the data' (Du & Ligmann-Zielinska ). The ultimate goal is a deeper understanding of the target system and the processes operating within it (Robinson ).
Application-driven ABMs aim at explaining the specifics of the target system. Models are 'designed to closely match the details of a particular case study' (Parker et al. ). With application-driven ABMs, key questions associated with their evaluation include: why should we trust the model (Grimm et al. )? Can inferences from the model be transferred to the real world? Are the proposed policies feasible? For these types of models, data-matching becomes the center of attention (Du & Ligmann-Zielinska ). Frequently, application-driven ABMs become empirically rich and are used in scenario analysis. The ultimate outcome of the simulation study is generative: we want to derive findings that can be implemented in the form of system or policy changes (Robinson ).

The Roles of Sensitivity Analysis
To determine how ABM developers perceive SA, the authors of this manuscript were asked to formulate a definition of SA. Based on this small sample, ABM researchers perceive SA as a way of exposing model drivers ('changes', 'processes', 'dynamics', 'shift', 'behavior') (Figure ). While not fully representative, this sample of responses points to the importance of analyzing not only model output sensitivity to input stochasticity but also the processes within the ABMs. This is not surprising because ABMs' non-linear dynamics are not attributed to the implementation of a single process. Instead, modeling complexity can arise from interactions among multiple, often simple, discrete micro-processes triggered by agents. Such confounded dynamics become a black box for developers. SA provides tools for gaining insight into model dynamics at the micro-level and an improved understanding of how system properties emerge at the macro-level (Grimm et al. ), but the information extracted from the evaluation exercise depends on what tools are employed in the assessment. SA of ABM provides a means of model exploration. It addresses the question of which inputs, and to what extent, drive the change in model outputs. Here, SA is used as a form of design of experiments with the goal of improving our understanding of the dynamics of the modeled system by identifying inputs, and hence their corresponding processes, that have the biggest impact on the changes in the results (the drivers of change). Developers may focus on identifying the driving inputs, exploring low-probability high-consequence results, and quantifying model nonlinearity and input interactions.
. SA is also used to test ABM credibility. If the model is applied for scenario analysis, SA can give us an indication of the impact of our decisions if the external environment changes. If results prove to be insensitive to broad changes in the inputs, we have more confidence that the model outcomes are likely to occur. A stronger likelihood of a known policy outcome occurring as implemented in a scenario analysis can add decision-making capacity to policymakers. The ultimate goal is to find limits within which the model output makes sense, evaluate model reliability, investigate the robustness of conclusions, and improve transparency (Saltelli et al. ).

Sensitivity Analysis Methods Used in ABM
In preparation for setting up a roadmap for purpose-driven mixed-method SA, we first discuss the SA methods identified in the ABM literature. We focus on method benefits and shortcomings. We do not aim at providing alternative implementations of the methods. For further information on these topics, we direct the reader to the practical manual by Thiele et al. ( ), and references therein.
SA has been categorized into local and global (Saltelli et al. ). Local SA (LSA) probes the immediate space of sample input values and evaluates how a relatively small change in an input changes the output while holding all other inputs constant. In contrast, global SA (GSA) is a group of methods that simultaneously probe the whole input space and analyze outcome variability due to both single inputs (first-order effects) and input interactions (second- and higher-order effects).

Selected local methods

One-at-a-time (OAT) is the simplest form of SA to understand and implement, since only one input at a time is evaluated. The assumption is that, if the results change greatly with slight variations in the value of an input, more effort should be invested in obtaining more accurate estimates of that input's value. OAT has also been referred to as 'sensitivity experiments' (Railsback & Grimm ). OAT has been deemed suitable for exploring mechanistic explanations of single-input effects on outcomes (ten Broeke et al. ; Sun et al. ), and is used in situations where the sensitivity of a model to a very specific input set is explored. OAT is used when ABM developers and decision-makers need to get an idea of how much model output would change under small deviations of the tested input from its initial value, e.g., plus or minus a few percent.

Graphical methods. Visual depiction of output-input dependencies, e.g., scatterplots or contributions to the sample mean/variance. Advantages: fast and easy to interpret. Limitations: shares the limitations of other LSA methods in that each input is plotted separately against the output.

Selected global methods

GSA explores how deviations in inputs, evaluated simultaneously across their whole spread, affect the variability of the results.

Screening (Elementary Effects Method) (Ayllón et al. ; Radchuk et al. ). Rank-orders inputs based on their importance and thereby weeds out the non-influential inputs. Advantages: model-independent with low computational cost; easy to understand; provides succinct sensitivity measures per input (µ*, σ); efficient when dealing with a large number of inputs (even hundreds); identifies inputs that participate in interactions. Limitations: uses crude sampling, i.e., a relatively low number of sample points with respect to the number of inputs; the sensitivity measures are insufficient for comparing input importance in relation to each other; inappropriate when the output is not normally distributed.

Regression (Filatova et al. ; Huang et al. ; Lee et al. ; Sun et al. ). Fits a regression function of inputs to a given output; the regression coefficients become the sensitivity measures of the inputs. Advantages: regression coefficients provide information about both the magnitude and the direction of input influence; relatively easy to understand. Limitations: assumes that the ABM is additive and monotonic; handles a low number of inputs; limited when evaluating the interactions between inputs.

Metamodeling. Uses a relatively simple mathematical function to approximate the relationships between inputs and outputs obtained by running the original model; the metamodel is then used to identify the influential inputs. Advantages: lower computational cost than the original model, because the simpler model requires a smaller number of executions; model complexity is captured using simple mathematical expressions. Limitations: handles a relatively low number of inputs; sensitivity measures from emulators may be harder to interpret than indices calculated directly on ABM outputs.

Variance-based SA (Zhang et al. ). Decomposes the output variance and assigns the partial variances to individual inputs and their interactions; produces two sensitivity indices per input, first-order and total-effect, which allow for a complete evaluation of a given input in a nonlinear, non-additive model. Advantages: model-independent; produces succinct measures of input influence. Limitations: handles a relatively low number of inputs; requires a large number of ABM executions; inputs are assumed to be independent; less reliable when the output is not normally distributed; unlike regression coefficients, variance-based sensitivity measures do not provide information on the directionality of input influence.

Density-based SA (Magliocca et al. ). Uses the entire probability distribution of a given output to identify influential inputs; sensitivity is computed using differences in the entire model output distribution between model executions. Advantages: model-independent; does not assume output normality; moment-independent (the whole output distribution captures output variability better than specific moments like variance); allows for the exploration of outliers. Limitations: handles a relatively low number of inputs; still in its infancy, so the specific metrics are not well-established in the literature (Puy et al. ).

Topology-based SA (TOSA). Unlike the other methods presented here, which evaluate sensitivities based on changes in values, TOSA uses changes in the topological space (structure) of the input-output datasets to quantify sensitivities. Advantages: model-independent; does not rely on statistical tests and their assumptions (normality, input independence); useful in extreme scenario analysis; complementary to other GSA methods. Limitations: handles a relatively low number of inputs; still in its infancy, so the specific metrics are not well-established in the literature; has a high computational cost.

Time-dependent variance-based SA. Uses variance-based SA to compute sensitivity indices for every time step of model execution (sensitivity indices are calculated at discrete intervals rather than at the final snapshot). Advantages: enables the analyst to understand how a particular input affects model dynamics within model runs; otherwise the same as variance-based SA. Limitations: the same as variance-based SA.

Spatial variance-based SA. Uses variance-based SA to compute sensitivity indices for spatially-explicit outputs, producing sensitivity maps; renders patches of the influence of a given input on the areas with high output variance. Advantages: the same as variance-based SA. Limitations: the same as variance-based SA; in addition, it produces multiple sensitivity maps that must be analyzed concurrently.

Group-based behavior SA (GOBS). A statistically based variance measure of the common patterns of agent behaviors; translates agent trajectories into a state-space Markov model. Advantages: tunable, cross-scale analysis of agent behaviors; identifies changes to both low- and high-frequency behaviors; model-independent. Limitations: needs a large number of trajectories from the ABM execution for the statistics to work; limited to ABMs with agents that continuously move in the simulation space and log their trajectories.

An extension of OAT, known as one-factor-at-a-time (OFAT) (ten Broeke et al. ), uses stochastic modifications to inputs, rather than small variations, to produce a graphical result depicting output against inputs. The OFAT procedure involves uniformly sampling N times throughout input K's distribution and executing the model N times while holding all other inputs constant. The procedure is then repeated with another input while holding all others constant. OFAT SA results are presented as multiple plots, one for each variation of an input on a selected output, which, for a large number of inputs, can be difficult to compare. According to ten Broeke et al. ( ), OFAT makes it possible to identify inputs that generate tipping points, system collapse, path dependence, and nonlinear responses.
The major limitation of OFAT, which is also a limitation of OAT, is that changes in model behavior are assessed solely for individual input variables. OFAT fails to identify cases in which a combination of inputs, varied jointly, causes drastic changes in model outputs. Due in part to these limitations, OAT (and OFAT) cover a minuscule portion of the input space and should not be used to understand the all-encompassing dynamics of the target system. For example, when dealing with ten input variables, OFAT is only able to evaluate a tiny fraction of one percent of the input space (Saltelli et al. ). Therefore, SA results based only on single-input approaches, like OAT and OFAT, may generate erroneous inferences about the relative importance of model sensitivity to particular inputs.
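To make the OFAT procedure concrete, here is a minimal Python sketch; `toy_model` and all other names are hypothetical stand-ins, not methods from the paper. A real ABM execution would replace the analytic function.

```python
import random

def toy_model(a, b, c):
    # Hypothetical ABM stand-in: a nonlinear response with an interaction.
    return a * b + c ** 2

def ofat(model, baseline, varied, low, high, n=100, seed=0):
    """Sample one input uniformly over [low, high] while holding the other
    inputs at their baseline values; return (x, y) pairs for plotting."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        inputs = dict(baseline)
        inputs[varied] = rng.uniform(low, high)
        pairs.append((inputs[varied], model(**inputs)))
    return pairs

baseline = {"a": 1.0, "b": 1.0, "c": 1.0}
pairs = ofat(toy_model, baseline, "c", 0.0, 2.0, n=200)
# A steep or strongly curved trend in these pairs suggests that 'c'
# deserves a more accurate estimate; a flat cloud suggests it can be fixed.
```

Repeating the call once per input yields the one-plot-per-input panels that, as noted above, become hard to compare for many inputs.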

Screening
The screening method (aka Morris or elementary effects (Morris )) classifies inputs into (1) those that are negligible, (2) those that are additive and behave linearly, and (3) those that are non-linear or involved in input interactions (Saltelli et al. ). It also allows for probing the whole input space (albeit rudimentarily) and is ideal for models with a large number of inputs. Because it does not require a large number of runs to gain insight into input sensitivity, it is computationally cheaper than other GSA methods. The general technique is to run randomized individual OATs. Screening produces two sensitivity measures (µ*, σ). The absolute mean (µ*) is a measure of the overall impact of an input on the model output, whereas the standard deviation (σ) is a measure of the higher-order effects of the input, i.e., non-linear effects due to interactions with other inputs (Campolongo et al. ). The ultimate result is a ranking of inputs (the inputs are screened to identify the three groups mentioned above). However, the screening statistics are not sufficient to quantitatively evaluate the significance of inputs in relation to each other (Campolongo et al. ). Specifically, screening does not pinpoint how the interactions among inputs affect the model output, because the higher-order effects are lumped into the single metric σ. Additionally, the mean and standard deviation are parametric descriptors that are unreliable when the output is not normally distributed. In practice, screening can be used to identify the most sensitive inputs, which are then examined with more sophisticated GSA methods.
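A stripped-down elementary-effects computation can be sketched as follows. This is a crude repeated-OAT variant rather than Morris's full trajectory design, and the test function is hypothetical:

```python
import random
import statistics

def model(x):
    # Hypothetical test function: x[0] is linear, x[1] interacts with x[2],
    # and x[3] is inactive.
    return 4 * x[0] + x[1] * x[2] + 0 * x[3]

def elementary_effects(f, k, r=50, delta=0.1, seed=1):
    """Crude Morris-style screening on the unit hypercube [0, 1]^k.
    Returns per-input (mu_star, sigma)."""
    rng = random.Random(seed)
    ee = [[] for _ in range(k)]
    for _ in range(r):
        base = [rng.uniform(0, 1 - delta) for _ in range(k)]
        y0 = f(base)
        for i in range(k):
            pert = list(base)
            pert[i] += delta
            ee[i].append((f(pert) - y0) / delta)
    mu_star = [statistics.fmean(abs(e) for e in es) for es in ee]
    sigma = [statistics.stdev(es) for es in ee]
    return mu_star, sigma

mu_star, sigma = elementary_effects(model, k=4)
# x[3] ranks last (negligible); x[1] and x[2] show nonzero sigma because
# their effect depends on the other input (an interaction).
```

Note how the single σ value flags that an interaction exists without saying which inputs interact, which is exactly the limitation discussed above.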

Metamodeling and regression
Metamodeling (aka emulation) describes a set of methods that try to explain the variations in model output in response to changes in model inputs by fitting the outputs and inputs of that model to a (simpler) mathematical function (Kleijnen ). It is assumed that the relationship between output and input can be described in mathematical terms that are relatively easy to handle and interpret. Emulators come in both simple and complex formulations (Fonoberova et al. ), the most basic being standardized linear regression (SLR). The regression coefficients become the sensitivity measures of model inputs. They provide both the magnitude and the directionality of the influence of a given input on the dependent variable (the ABM output) (Saltelli et al. ). Two questions are addressed: to what extent does input K singly drive the variability of outcome M, and is this dependence positive or negative? Regression-based SA makes the outcome easy to understand, especially for people familiar with statistics. More complex emulators (i.e., metamodels that use complex equations, sets of equations, or artificial intelligence methods like machine learning; see, for example, Lamperti et al. ) allow for a detailed evaluation of input interactions and are more appropriate for highly nonlinear output spaces (Lamperti et al. ). However, due to their more complex formulation, metamodels can also be difficult to interpret when applied to models representing the multi-scale interactions common in complex systems (Mertens et al. ). Metamodels may fail to capture all endogenous interactions, such as simultaneous causalities and oscillations, leading to incomplete information concerning the behavior of the system (Hassan et al. ).
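A minimal SLR-based sensitivity computation might look as follows, assuming two independent, uniformly sampled inputs; the model and all names are illustrative, with the coefficients solved from the normal equations on z-scored data:

```python
import random
import statistics

def srcs(xs, ys):
    """Standardized regression coefficients for a two-input linear fit,
    solved via the 2x2 normal equations on z-scored data."""
    def z(v):
        m, s = statistics.fmean(v), statistics.stdev(v)
        return [(u - m) / s for u in v]
    z1, z2, zy = z([x[0] for x in xs]), z([x[1] for x in xs]), z(ys)
    a = sum(u * u for u in z1)
    b = sum(u * v for u, v in zip(z1, z2))
    c = sum(u * u for u in z2)
    r1 = sum(u * v for u, v in zip(z1, zy))
    r2 = sum(u * v for u, v in zip(z2, zy))
    det = a * c - b * b
    return (c * r1 - b * r2) / det, (a * r2 - b * r1) / det

rng = random.Random(2)
xs = [(rng.uniform(0, 1), rng.uniform(0, 1)) for _ in range(500)]
ys = [3 * x1 - 1 * x2 for x1, x2 in xs]   # hypothetical ABM output
b1, b2 = srcs(xs, ys)
# |b1| > |b2| ranks importance; the signs give the direction of influence.
```

The magnitude-plus-sign reading is exactly the "how much, and positive or negative?" pair of questions described above; a poor R² of the fit would warn that the additivity assumption fails.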

Variance-based methods
Variance-based SA (aka the ANOVA-based approach) decomposes the variance of model output and apportions the partial variances to inputs treated singly and in combination with an increasing level of dimensionality (Saltelli et al. ). The approach produces two measures: a first-order index (S), which computes output sensitivity to independent (decoupled) inputs, and a total-effects index (ST), which encompasses the overall input influence, both individual and in interaction with all other inputs. The most important advantage of variance-based SA over other approaches is that it is model-independent, i.e., it does not require a linear formulation. In this way, it can deal with the nonlinear relationships between the inputs and outputs of a model and, at the same time, can be used to fully consider the interactions among model inputs.
The major disadvantages of variance-based SA are its computational cost, the assumption that inputs are independent, and the assumption that the output distribution is normal. The computational cost can be reduced by employing a systematic method for sampling all possible combinations of inputs (e.g., quasi-random sampling (Sobol' )), by grouping inputs (Ligmann-Zielinska ), or by using high-performance computing (Tang et al. ). Assessing the independence of inputs and the distribution of outputs can be done using standard statistical methods (e.g., correlation analysis and the Shapiro-Wilk test, respectively).
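The (S, ST) pair can be estimated with the common A/B sample-matrix scheme. The sketch below uses the Jansen estimators on a toy function standing in for an ABM; in practice each `f(...)` call would be a full model execution, which is where the computational cost arises:

```python
import random

def sobol_indices(f, k, n=2048, seed=3):
    """Monte Carlo estimates of first-order (S) and total-effect (ST)
    indices using the A/B/A_B^i matrix scheme with Jansen estimators."""
    rng = random.Random(seed)
    A = [[rng.uniform(0, 1) for _ in range(k)] for _ in range(n)]
    B = [[rng.uniform(0, 1) for _ in range(k)] for _ in range(n)]
    fA = [f(x) for x in A]
    fB = [f(x) for x in B]
    mean = sum(fA + fB) / (2 * n)
    var = sum((y - mean) ** 2 for y in fA + fB) / (2 * n)
    S, ST = [], []
    for i in range(k):
        # A_B^i: rows of A with column i taken from B.
        fABi = [f(a[:i] + [b[i]] + a[i + 1:]) for a, b in zip(A, B)]
        S.append(1 - sum((y - z) ** 2 for y, z in zip(fB, fABi)) / (2 * n * var))
        ST.append(sum((y - z) ** 2 for y, z in zip(fA, fABi)) / (2 * n * var))
    return S, ST

# Hypothetical stand-in for an ABM output: the x[1]*x[2] interaction
# makes ST exceed S for those two inputs.
S, ST = sobol_indices(lambda x: x[0] + 2 * x[1] * x[2], k=3)
```

The scheme needs n*(k+2) model runs, which illustrates why quasi-random sampling, input grouping, or high-performance computing become attractive for expensive ABMs.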

The value of N model runs is established by trial and error. There is no cut-off value for estimating the indices. A good indication is the stabilization of outputs (Lorscheid et al. ; ten Broeke et al. ). Thus, the (S, ST) pairs should be reported with confidence intervals (ten Broeke et al. ).
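One simple way to attach a confidence interval to an index estimate is a percentile bootstrap over the Monte Carlo runs. The sketch below assumes a Jansen-style first-order estimator and synthetic outputs for a toy model y = x1 + x2 (true S1 = 0.5); all names are illustrative:

```python
import random
import statistics

def bootstrap_ci(fB, fABi, var, n_boot=500, alpha=0.05, seed=4):
    """Percentile-bootstrap interval for a first-order index estimated as
    S = 1 - mean((fB - fABi)^2) / (2*var), resampling runs with replacement."""
    rng = random.Random(seed)
    n = len(fB)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        d2 = sum((fB[j] - fABi[j]) ** 2 for j in idx)
        estimates.append(1 - d2 / (2 * n * var))
    estimates.sort()
    return (estimates[int(alpha / 2 * n_boot)],
            estimates[int((1 - alpha / 2) * n_boot) - 1])

# Synthetic runs: B-sample outputs, and outputs where input 1 comes from B
# while input 2 comes from A.
rng = random.Random(0)
n = 400
A = [(rng.random(), rng.random()) for _ in range(n)]
B = [(rng.random(), rng.random()) for _ in range(n)]
fB = [b0 + b1 for b0, b1 in B]
fAB1 = [b[0] + a[1] for a, b in zip(A, B)]
lo, hi = bootstrap_ci(fB, fAB1, statistics.variance(fB))
# Report S1 together with (lo, hi) rather than as a bare point estimate.
```

If the interval is still wide, that is the practical signal that N should be increased and the indices have not yet stabilized.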

Density-based approaches
Non-normal outcome distributions are commonly observed in complex systems due to a wide range of nonlinear processes, such as economies of scale or the path-dependence of development location due to infrastructure provision (Manson ). In such cases, assumptions about the normality of model output are violated. Density-based methods for GSA therefore use 'moment-independent' measures of output sensitivity (Borgonovo ; Borgonovo et al. ). Sensitivity is characterized by differences in the entire model output distributions between model executions, rather than by specific moments of the output distributions like variance. Pianosi & Wagener ( ) developed a density-based sensitivity index, called PAWN, which uses differences in cumulative distribution functions (CDFs) of model output to quantify model sensitivities.
Magliocca et al. ( ) applied the density-based approach in tandem with variance-based GSA to characterize sensitivities in housing and land markets to coastal storms of varying frequency. For example, in interactions between consumer preferences for coastal amenities and housing stock composition, variance-based GSA established amenity preference as a significant contributor to outcome sensitivity, but density-based GSA was able to discern that its high and low values were solely responsible for variation in housing stock composition.

An advantage of the density-based approach over the variance-based methods is that model sensitivities can be investigated within particular parts of the output distribution, including outliers that may represent low-probability, high-impact events. Also, density-based GSA can indicate the direction of model output sensitivities with input variations.
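A crude PAWN-style computation can be sketched with the two-sample Kolmogorov-Smirnov distance between conditional and unconditional output CDFs. This is a simplified reading of the approach, and the model and tuning constants are hypothetical:

```python
import bisect
import random

def ks_distance(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest vertical gap
    between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in set(a + b):
        fa = bisect.bisect_right(a, x) / len(a)
        fb = bisect.bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d

def pawn_like(f, k, i, n=300, n_cond=10, seed=5):
    """Crude PAWN-style index for input i: the median KS distance between
    the unconditional output CDF and CDFs obtained with input i fixed."""
    rng = random.Random(seed)
    uncond = [f([rng.random() for _ in range(k)]) for _ in range(n)]
    distances = []
    for _ in range(n_cond):
        xi = rng.random()                 # conditioning value for input i
        cond = []
        for _ in range(n):
            x = [rng.random() for _ in range(k)]
            x[i] = xi
            cond.append(f(x))
        distances.append(ks_distance(uncond, cond))
    distances.sort()
    return distances[len(distances) // 2]

f = lambda x: x[0] + 0.1 * x[1]   # hypothetical output: x[0] dominates
# pawn_like(f, 2, 0) is large; pawn_like(f, 2, 1) is near zero.
```

Because the comparison is between whole CDFs, the index needs no normality assumption, and restricting `uncond`/`cond` to a tail of the distribution focuses the analysis on outliers.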

Roadmap to Sensitivity Analysis Method Selection
Lorscheid et al. ( ) urge the modeling community to pay more attention to the design of computational experiments. Proper design optimizes model execution and adds transparency to ABM output synthesis. Just as there are protocols for experimentation in the physical lab, there are protocols to follow for ABM experimentation and reporting (Grimm et al. , ). SA should be a mandatory part of that experimentation.
Some authors suggest a mixed-method approach: starting from a rudimentary (quick-and-dirty) method like screening and then progressively moving to large-sample but more comprehensive methods, e.g., variance-based or density-based approaches. We argue that the purpose of the model should drive the successive selection of methods. We provide mixed-method SA in the form of paths consolidated into a roadmap (Figure ).

In the next section, we describe the paths in more detail. Following these descriptions are examples demonstrating the use of these paths.

Sensitivity Analysis Path Selection
We identified the paths by balancing cost (time, access to high-performance computing) with the extent of SA (rudimentary SA, first-order calculation, higher-order effects evaluation, etc.). As noted by Saltelli et al. ( ) and Saltelli & D'Hombres ( ), poorly performed SA does not help in model evaluation, may lead to false conclusions, and, in the end, becomes a redundant (or even misleading) exercise. We aimed to identify the minimum number of methods required to comprehensively study ABM sensitivity given the model's purpose, while at the same time addressing the strong points and deficiencies of individual methods. In the following, we describe the paths in more detail.

Abstraction
The Sea Green Path (Figure , left) pertains to an ABM built for the purpose of abstraction (exploration). The focus is on delving into the relationships between model structure and behavior and evaluating model parsimony. Developers should start with sensitivity experiments that explore the main (first-order) effects to test which inputs singly contribute to the variability of a particular output. Example methods include calculating correlation coefficients between a given input and the selected output, or visual analysis through scatterplots. The main-effect analysis is then followed by tests for the existence of input interactions and, if interactions are present, their evaluation. Developers are encouraged to use graphical tools generated from OFAT, as demonstrated in ten Broeke et al. ( ), to gain a quick-and-dirty understanding of model behavior. OFAT can be employed to study variations in inputs that were a priori identified as potentially influential, or inputs that the model developer is particularly interested in.
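The first step of this path, a main-effects screen via correlation coefficients, can be sketched as follows; the lambda is a hypothetical stand-in for an abstract model, and the approach assumes independently sampled inputs:

```python
import random

def pearson(u, v):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return cov / (su * sv)

def main_effects(f, k, n=500, seed=6):
    """First-pass main-effects screen: correlation between each input
    (sampled independently and uniformly) and the output."""
    rng = random.Random(seed)
    X = [[rng.random() for _ in range(k)] for _ in range(n)]
    Y = [f(x) for x in X]
    return [pearson([x[i] for x in X], Y) for i in range(k)]

# Hypothetical abstract-model response: x[0] dominates, x[2] is nearly inert.
r = main_effects(lambda x: 5 * x[0] + x[1] + 0.01 * x[2], k=3)
```

A near-zero correlation does not prove an input is unimportant (it may act only through interactions), which is why the path continues with interaction tests.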
It is prudent to assume that ABMs are imbued with input interactions that need to be studied. We argue that to truly understand the behavior of a given ABM (feedbacks, path dependence, emergence), the main-effects analysis needs to be followed by tests for the existence of interactions (i.e., whether considerable interactions are present in the model). The crudest method here is standardized linear regression (SLR). SLR is computationally efficient (it does not require a large number of model runs) and easy to communicate. Extending the main-effects analysis with SLR requires little additional work, and the aforementioned limitations of OFAT can be resolved.
A final step is to evaluate the nature and magnitude of the interactions. A number of GSA methods can be employed: variance-based approaches, emulators, multi-dimensional plots, and density-based approaches.

Application
In application-based approaches, the purpose of the ABM is to produce a reliable model that can be used to propose courses of action for a specific real-world problem. To build trust in a model and develop its capability to address a practical problem, it needs to be empirically rich, hence the focus on data. Given these objectives, we propose a different SA pathway (Figure , Yellow Path, center). This path involves a reduction of dimensionality (if needed), followed by a joint analysis of the individual and cumulative contributions of stochastic inputs to the variability of outputs.
The assumption here is that the ABM is of high complexity, where interactions are inherent and should be studied.
If a model contains a large number of inputs, a reduction of dimensionality can be performed to determine which inputs have no effect on output variability; those inputs may then be set to constant values. The screening method is an efficient approach for this task (Saltelli et al. ). Furthermore, models with a smaller number of stochastic inputs are easier to communicate to decision-makers and the public. A disadvantage of dimensionality reduction is that it depends on the output used to evaluate the output-input dependencies (Kang & Aldstadt ; Ligmann-Zielinska ). As Ligmann-Zielinska ( ) demonstrated, the number of inputs that can be set to constant values goes down as the number of outputs considered in SA goes up. Input k may considerably affect the variability of output A but not output B. If both outputs are equally important to the research problem, then input k should be left unchanged. The surrogate model should demonstrate roughly the same characteristics as the original model (e.g., the shape of the output distribution, summary statistics).
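The freeze-and-verify step can be sketched as follows: fix a presumed non-influential input at a constant and check that the reduced model reproduces the original output distribution. The model, the frozen value, and the acceptance threshold are all illustrative:

```python
import random
import statistics

def run(f, k, n, fixed=None, seed=7):
    """Sample the model n times; inputs listed in `fixed` are held constant."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        x = [rng.random() for _ in range(k)]
        for i, v in (fixed or {}).items():
            x[i] = v
        out.append(f(x))
    return out

f = lambda x: x[0] ** 2 + 0.001 * x[1]     # hypothetical: x[1] nearly inert
full = run(f, 2, 1000)
reduced = run(f, 2, 1000, fixed={1: 0.5})  # x[1] frozen at mid-range
diff = abs(statistics.fmean(full) - statistics.fmean(reduced))
# `diff` (plus, e.g., a comparison of standard deviations or a KS test)
# should be negligible before the reduced model replaces the full one.
```

Repeating the check for every output of interest guards against the pitfall noted above, where an input matters for output A but not output B.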
After the reduction of dimensionality, we propose moving directly to evaluating interactions using GSA approaches that yield numerical sensitivity metrics. The goal is not so much gaining insight into the model, as the SA results are often hard to interpret, but rather establishing its reliability for scenario analysis and real-world decision making. Ideally, we want to end up with an ABM that increases the modeler's (and the public's) trust in model outcomes. The direct use of global methods that quantify both single and combined sensitivities (like the (S, ST) indices) provides tools that can assist in negotiation between stakeholders with different objectives.
As previously mentioned, the 'evaluation of interactions' methods in the Yellow Path are computationally expensive. However, since modelers jump straight into the comprehensive SA, the overall time spent on investigating the SA results can be relatively short compared to the multi-level and longer Sea Green Path.

Modeling complexity
To truly embrace the capability of ABMs, we should strive to gain an understanding of the inner processes during model execution as well as model behavior across a spectrum of output variability. This pertains to both abstract and applied ABMs (Figure ). We named this purpose 'modeling complexity' and depicted it as a separate modeling purpose (Figure , Brown Path – right). Below we describe advances to current standards of SA that aim at improving our understanding and ability to quantify the inner workings of ABMs. We focus on four aspects of modeling complexity using an ABM approach: the specifics of both spatial and temporal outputs, the structural (topological) relationships between input and output variables, and the causal relationships within the ABM itself that can be traced back to individual agents.
Dealing with spatial outputs
In practical terms, the purpose of the model is reflected in its results. Therefore, the type of model output should also dictate the choice of the SA method. If a model operates across a geographic area (or a two- or three-dimensional space), we may also require spatially-explicit approaches to SA. Many ABMs produce spatially-explicit outputs like maps of land-use change (e.g., Robinson et al. ), biodiversity loss due to human activity, or population vulnerability to hunger. As with any other result of the simulation, output maps come in different realizations depending on input values. Assuming that a systematic investigation of the input space has been performed, these realizations constitute a two-dimensional distribution of the possible output space (the x and y coordinates). Unless lumped into an aggregate statistic (Ligmann-Zielinska , ), such outputs pose a major interpretative challenge. For example, a raster map of size × amounts to , (spatially auto-correlated) output variables to explain, where each cell constitutes one variable. Ligmann-Zielinska ( ) proposed variance-based SA as a method to identify the influential inputs of maps produced from an ABM of residential development. The method independently calculates sensitivity indices for every spatial unit in model output (e.g., a polygon for vector data or a cell in a grid). Example results are sensitivity maps as shown in Figure .
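The per-unit idea can be sketched as follows, using the squared input-output correlation (the squared standardized regression coefficient, exact only for linear additive responses) as a cheap stand-in for the variance-based indices used in the cited work. The raster-model interface and names are hypothetical.

```python
import random

def first_order_maps(model, k, rows, cols, n=500, rng=random.Random(0)):
    """Crude per-cell sensitivity maps: for each raster cell, square
    the correlation between each input sample and the cell value
    across n model runs, yielding one map per input."""
    X = [[rng.random() for _ in range(k)] for _ in range(n)]
    Y = [model(x) for x in X]  # each Y[j] is a rows x cols raster

    def corr2(xs, ys):
        mx, my = sum(xs) / n, sum(ys) / n
        sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
        sxx = sum((a - mx) ** 2 for a in xs)
        syy = sum((b - my) ** 2 for b in ys)
        return 0.0 if sxx == 0 or syy == 0 else sxy * sxy / (sxx * syy)

    maps = []
    for i in range(k):
        xi = [x[i] for x in X]
        maps.append([[corr2(xi, [y[r][c] for y in Y]) for c in range(cols)]
                     for r in range(rows)])
    return maps
```

On a toy 1×2 raster whose left cell equals input 0 and right cell equals input 1, the sketch recovers sensitivity near 1 for each input over "its" cell and near 0 elsewhere, mimicking the patchwise influence maps described above.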
The uniqueness of the method is that it renders patches of the influence of a given input on the areas with high output variance. A unique challenge is that it produces multiple sensitivity maps that must be analyzed concurrently (Figure ). The number of sensitivity maps depends on the number of model inputs: K inputs produce 2K+1 maps (K first-order maps, K total-effect maps, and one interactions map). Even with a relatively low number of inputs, the interpretation of spatially-dependent sensitivities may be difficult. To reduce this analytical challenge, the K sensitivity maps can be synthesized into one dominant sensitivity index map (Ligmann-Zielinska & Jankowski ), which partitions the space into regions represented by inputs that have the highest index value at a given location (i.e., an input that 'dominates' other inputs - Figure ).

Dealing with temporal outputs

Most GSA applications use 'final snapshot' analyses of model outcomes, which assume that input influence is stable or changing at consistent rates throughout model execution. If this is not the case, which is typical for emergent behaviors, the final state of model outcomes does not provide insight into the relative and varying importance of particular inputs throughout model execution (Ligmann-Zielinska & Sun ; Richiardi et al. ). Time-varying GSA, specifically variance-based GSA, shifts the focus from how variability in model inputs influences model outcome patterns to how input variability influences the transition of one variable or system state to another. This is of particular concern since ABM is a process-based simulation technique (Bone et al. ). Time-varying GSA is essentially an application of conventional GSA calculated at discrete intervals during model execution, which enables the analyst to understand how a particular input affects model dynamics within as well as across model runs.
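The 'conventional GSA at discrete intervals' idea can be sketched as below: record a scalar output at every step of every run, then compute a sensitivity index per input per step. The first-order proxy (squared input-output correlation) and the toy model interface are simplifying assumptions, not the variance-based estimators used in the cited studies.

```python
import random

def time_varying_first_order(model_run, k, steps, n=400,
                             rng=random.Random(0)):
    """Time-varying SA sketch: for each input, a time series of crude
    first-order indices (squared input-output correlation per step)."""
    X = [[rng.random() for _ in range(k)] for _ in range(n)]
    traj = [model_run(x, steps) for x in X]  # traj[j][t]: run j, step t

    def corr2(xs, ys):
        mx, my = sum(xs) / n, sum(ys) / n
        sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
        sxx = sum((a - mx) ** 2 for a in xs)
        syy = sum((b - my) ** 2 for b in ys)
        return 0.0 if sxx == 0 or syy == 0 else sxy * sxy / (sxx * syy)

    series = []
    for i in range(k):
        xi = [x[i] for x in X]
        series.append([corr2(xi, [traj[j][t] for j in range(n)])
                       for t in range(steps)])
    return series
```

A toy dynamic whose early state is driven by input 0 and late state by input 1 produces crossing sensitivity curves, exactly the kind of shifting influence a final-snapshot analysis would miss.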
Dealing with changes in data structure
Topology Oriented Sensitivity Analysis (TOSA) is a network (structure)-based SA method utilizing the geometric and spatial properties of input data. Developed by Du & Ligmann-Zielinska ( ) and Du ( ), TOSA is novel in that it quantifies the topological differences between model input data and output data as a holistic indicator of sensitivity.
Most SA approaches (Section ) quantify sensitivities based on changes in values. However, datasets with identical statistical features (e.g., variance) may be differently spaced, resulting in diverse topological structures. As a result, two inputs with identical influence on output variability, measured using variance, may have a different effect when evaluated within the topological space (Figure ). Suppose that Datasets A and B are two different outcome spaces and, when a value-based SA is applied, both datasets end up with the same variances. Consequently, the importance of each dataset in terms of driving output variability will be equal. But, within the topological space, Dataset A has a more random pattern, and Dataset B has a more uniform pattern. TOSA can expose these hidden differences by capturing the relative 'movement' of data points between the pre-model and post-model space. TOSA allows for grouping outputs based on their relative locations in the multidimensional output space, identifying, for example, which input datasets significantly affect specific clustering of outputs. TOSA defines the sensitivity of a particular input by observing what happens when this input is added to the model. When the data space demonstrates more volatility after the input is added, it suggests a high level of model sensitivity to this input. In a value-driven SA, we observe how much variance in the output is reduced by removing an input; in TOSA, we observe how much topological change in the output-input space results from removing an input. TOSA thus provides a different yet complementary approach to interpreting model sensitivity.
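TOSA's actual metrics are defined in Du's work; purely to illustrate the idea of a structure-based (rather than value-based) measure, the sketch below compares two point clouds via their mean nearest-neighbour distance, which distinguishes clustered from uniform patterns even at equal variance. This statistic is a stand-in of our own choosing, not the published method.

```python
import math

def mean_nn_distance(points):
    """Mean nearest-neighbour distance: a simple stand-in statistic
    for the 'structure' of a point cloud (uniform patterns score
    higher than clustered ones at equal variance)."""
    dists = []
    for i, p in enumerate(points):
        dists.append(min(math.dist(p, q)
                         for j, q in enumerate(points) if j != i))
    return sum(dists) / len(dists)

def topological_shift(pre_points, post_points):
    """Crude topology-oriented sensitivity: relative change in the
    structure statistic between the pre-model and post-model data
    space; larger values suggest the input reshaped the topology."""
    a = mean_nn_distance(pre_points)
    b = mean_nn_distance(post_points)
    return abs(b - a) / a
```

Two datasets with the same variance but different spacing (evenly spread vs. paired into clusters) yield a large shift here, whereas a variance-only comparison would call them identical.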

Dealing with agent behaviors
A critical reason for the development and use of ABMs is that they are able to represent the characteristics and heterogeneity of real-world actors as well as the behaviors and interactions that give rise to the system-wide, emergent, self-organizing patterns characteristic of real-world phenomena (Crooks et al. ; Filatova et al. ; Rounsevell et al. ). Because of these inherent properties, the relationship between initial conditions (i.e., input values) and system/model paths and outcomes is unknown a priori. Thus, understanding the full range of model behavior requires an investigation of agent-to-agent and agent-to-landscape behaviors across multiple scales and locales of interaction.

Cenek & Dahl ( a,b) introduced the Geometry of Behavioral Spaces (GOBS) as a framework that quantifies the nature of agent behaviors during model execution. This method analyzes the model's emergent behavior from the agents' spatio-temporal trajectories, producing a system- and simulation-wide probabilistic model of agent behaviors. GOBS is a statistically based framework that first records agents' trajectories, which are then analyzed for common patterns of stable behaviors, called behavioral primitives, exhibited by the agents during model execution. The final step is to measure the likelihood that an agent in a given behavioral primitive will either stay in the same stable behavior or transition to a different primitive. The authors illustrated the use of GOBS as an SA tool to automate the exploration of the input search space to detect system regime shifts, tipping points, and condensation versus dissipation of collective system behavior.
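The final GOBS step described above - estimating stay/transition likelihoods between behavioral primitives - can be sketched from trajectories that have already been labeled with a primitive per time step. The labels and function name are illustrative; the trajectory-recording and primitive-detection stages are omitted.

```python
from collections import Counter

def transition_probabilities(label_sequences):
    """Given each agent's per-step behavioral-primitive labels,
    estimate the probability of moving from one primitive to another
    (including staying in the same primitive)."""
    counts = Counter()   # (from_primitive, to_primitive) -> count
    totals = Counter()   # from_primitive -> count of transitions
    for seq in label_sequences:
        for a, b in zip(seq, seq[1:]):
            counts[(a, b)] += 1
            totals[a] += 1
    return {pair: c / totals[pair[0]] for pair, c in counts.items()}
```

Comparing these probability tables between two model executions gives the behavior-unit sensitivity measure discussed next: a changed input that introduces new primitives or reshapes the transition structure registers directly in the table.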

In the context of SA, GOBS analyzes how two model executions differ from each other in terms of exhibited agent behaviors. The framework measures ABM sensitivity through its ability to detect new behaviors introduced by changing input values or model drivers. The SA also quantifies how many agents were in which behavioral primitives at the end of the model execution, or at any given time during it. SA using GOBS offers a measure of uncertainty between two model executions in units of exhibited behavior.

Example Applications
Abstraction

An example of a mixed-method SA that follows the abstraction path (Figure , left) is a study by Parker and colleagues (Filatova et al. ; Huang et al. ; Sun et al. ), who developed a suite of agent-based land market models. Their primary question was whether, and to what extent, the degree of representation of land markets affects model outcomes represented as patterns of urban sprawl and land prices. They used various SA methods to analyze how model inputs and model operation rules impacted their model outcomes. Initially, they used an approach akin to OAT to explore the effects of agent heterogeneity on model outcomes with other inputs fixed. To address the deficiencies of OAT, they augmented their analyses with regression and graphical methods. They used three-dimensional plots to explore the effect of heterogeneous risk perceptions on land market outcomes in a model with fixed market, spatial, and preference elements (Filatova et al. ). Huang et al. ( ) examined how agent heterogeneity impacted spatial and social model outcomes across a range of values for agent preferences and market representation when agents' preferences and budgets were stochastic. They used a variety of two- and three-dimensional plots to visualize differences across experiments. Sun et al. ( ) focused on the effects of land market assumptions on social and spatial outcomes, assuming agent homogeneity. They used descriptive statistics, linear regression, and comprehensive plots to explore differences between model rule settings. In all these examples, Parker et al. found that each SA method revealed some aspects of the relationship between market representation and heterogeneity on spatial patterns of sprawl and land prices while obscuring others. They determined that regression analysis is not a complete - or even always a correct - tool for analyzing the sensitivity of a model representing a complex system.
At the same time, several of their graphical representation methods were quite revealing. In future research, the authors could employ one of the interaction-evaluation methods (from Figure ) to account for the limitations of regression. They are also interested in exploring regression trees as a GSA method.

Application
The utility of screening was demonstrated in a data-rich ABM developed by Radchuk et al. ( ). They used SA to understand the drivers of vole population dynamics. The authors included inputs in their analysis, and the design of the screening resulted in (just) model runs. Two groups of model outputs were examined: the first reflecting vole demography (mean and standard deviation of the vole population size) and the second focusing on the cyclicity of the population dynamics (the amplitude and period of cycles). Their SA indicated that, if the population size was used as the output, the most sensitive inputs were a set of intrinsic inputs (i.e., inputs involved in the survival and reproduction processes of the vole). However, if the model output was quantified by metrics reflecting the cyclicity of the population dynamics, the most sensitive inputs represented both intrinsic and extrinsic (predation) factors. Only one input was found to be non-influential for both groups of model outputs - this input was involved in the description of the mustelid (predator) effect on voles (prey). Additionally, the majority of the inputs that largely affected the outputs acted in an interactive or non-linear way.
As shown in Radchuk et al. ( ), when dealing with multiple outputs, an ABM may exhibit different sensitivities to its inputs. In a different study, Ligmann-Zielinska and colleagues came to the same conclusion (Ligmann-Zielinska ; Ligmann-Zielinska et al. ). The ABM they developed simulates farmers' enrollment in the Conservation Reserve Program (CRP) (USDA ). Their ABM was designed based on a public notice describing the criteria used in CRP enrollment and the procedure to select offers submitted by farmers to the farm agency. The model was built to identify how farmer enrollment and agency selection decisions affect the allocation of federal funds to reassign agricultural land from production to conservation. Their ABM aims to solve the problem of maximizing the environmental benefits from CRP while minimizing the cost per acre of the enrolled land.
Outputs include the cost of enrollment, acres enrolled, and different measures of land fragmentation. Since the SA involved the evaluation of multiple outputs, the authors assumed that some of the inputs might cause nonlinear behavior. They selected variance-based GSA as the method for evaluating model sensitivity. The first two outputs - total land enrolled and the cost of enrollment - exhibited almost linear behavior and were most sensitive to different realizations of the environmental benefits index. The land fragmentation statistics, on the other hand, were most sensitive to the interactions among inputs (less than % of the variability in every metric was explained by individual inputs). The ABM was validated using an independent data source and can be operationally used to provide recommendations for optimal land rental at specific locations.

Modeling complexity
Topology-based SA
Topology Oriented Sensitivity Analysis (TOSA) has been used in studies led by Jing Du (Du & Ligmann-Zielinska ; Du ; Du & Wang ). The focus was on comparing the outputs of value-based GSA with the outputs of structure-based TOSA to demonstrate hidden properties of ABMs that could not be identified with traditional value-based approaches.
The flagship TOSA application is reported in Du & Wang ( ). They investigated how people's daily travel behaviors affected and reshaped the long-term form of a city. TOSA was used to evaluate the importance of seven inputs, including population, traveling preferences, shopping behaviors, initial urban setup, and socioeconomic status of families. The analysis was performed to understand how to control automobile use in the long-term future. Outputs included the final number of stores, the total walkable distance, and the total driving distance. In the study, TOSA was compared with GSA. TOSA was able to reveal two inputs - the preference for stores and the preference for accessibility - that have potential influence on the future urban configuration. Those two inputs were identified as non-influential by the other (GSA) methods. Traditional value-based SA failed to capture this nuance because it was hidden in the dataset topology rather than in the mean value or variance of the output.
Time-varying GSA

Only a few ABM applications of time-varying GSA have been performed. Ligmann-Zielinska & Sun ( ) examined time-varying GSA of a land-use ABM and found that spatial outcomes, such as the size of the largest contiguously developed area, showed varying sensitivity throughout model execution due to feedbacks among landscape characteristics, such as perceived scenic beauty and land values, that varied as development patterns changed. Magliocca et al. ( ) applied a similar approach in a coastal setting. They demonstrated short-term sensitivities to storm frequency and coastal amenity values immediately after storm events, and long-term cycles of model outcome sensitivity due to interactions among economic fundamentals, such as discount rates and travel costs, as the area developed for housing increased over time.

Conclusion
We presented a review of sensitivity analysis (SA) methods used in ABMs to highlight the advantages and limitations of different approaches and to showcase novel cutting-edge methods. Using this review, the experiential knowledge of the coauthors, and a qualitative balancing of cost (e.g., implementation and computation time) against the extent of the SA (e.g., first-order or higher-order effects), we developed three paths of sensitivity analysis. Our paths promote mixed-method approaches, presented in the form of a roadmap for those seeking guidance in conducting a sensitivity analysis of ABMs or other models. A benefit of using a mixed-method approach lies in the opportunity to corroborate output-input sensitivity among different methods; where different methods overlap, greater importance may be placed on those inputs.

Our constructed roadmap is purpose-driven and synthesizes three disparate paths taken by the ABM modeling community: abstraction, application, and modeling complexity. Broadly, these three paths are similar to those taken by modelers seeking to identify a proof of concept, produce predictive outcomes or increase decision-making capacity, and conduct exploratory modeling to gain insight into the effects that the themes of complexity science (e.g., thresholds, feedbacks, path dependence) have on a system of interest. We acknowledge that our roadmap is not comprehensive and is one of many that may guide others; however, to the best of the authors' knowledge, it offers a first approach to structuring a common pathway or protocol for SA in the ABM community. We hope that others will refine and extend the presented roadmap, use it to guide future model evaluation, and inform those reviewing model outcomes about the practices for SA with ABMs.
While SA requires extra effort, it also provides important benefits. Increased use of SA in a specific ABM domain indicates a maturation of modeling, because model analysis and - related to this - model selection and parameterization become less ad hoc. The development of a certain culture of ABM analysis is slow but unavoidable. For example, early individual-based models in ecology were often ad hoc, both in design and analysis; publication without any SA was possible and was the rule rather than the exception. This was the situation in . At some point, getting published without an OAT analysis became difficult. More recently, reviewers have started criticizing OAT and calling for more comprehensive methods. Consequently, GSA approaches have become more common (Thiele et al. ). The trend of advancing model construction methodologies is prevalent in other areas as well, including protocols for ABM description (Grimm et al. ) and robustness analysis (RA) (Grimm & Berger ).
Input uncertainty is an inherent characteristic of any ABM. It stems from the unpredictability of certain events and our incomplete knowledge of systems. GSA is useful and can provide important insights but may also be too complex to comprehend, becoming a black box itself. Better algorithms and visualization methods are needed to make the global methods more accessible to a broadly defined ABM community. Stakeholder participation can also aid formal SA (Corral & Acosta ). More case studies are needed to better explain the advantages of the novel SA approaches. It is important to find the right balance between the pragmatic and theoretical realization of SA.
This paper calls for treating SA as an integral part of modeling, in which SA-obtained insights add value to the information derived from the ABM. We need to continue to pursue SA as a key tool for understanding model performance. Avoiding sensitivity analysis is gambling with uncertainty. Clarke ( ) argues that SA is an obligatory step in establishing model credibility. It remains to be seen whether SA becomes one of the key elements of an ABM code of ethics.

Acknowledgements
This paper was partially sponsored by USA National Science Foundation grant numbers BCS # and SMA # . We also thank all the participants of the ABM Model Verification and Validation Workshop at the ABM Symposium, sponsored by USA National Science Foundation grant number BCS # , for their insightful comments and input. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. Support for Derek Thomas Robinson was provided by the Natural Sciences and Engineering Research Council (NSERC) of Canada under a Discovery Grant ( -). Nick Magliocca was supported by the National Socio-Environmental Synthesis Center (SESYNC) under funding received from the National Science Foundation DBI- . The authors would also like to acknowledge the contribution of the students of the course on Geosimulation taught by Ligmann-Zielinska in spring . Their fruitful discussions about the role of sensitivity analysis in ABM contributed to many sections of this manuscript.