Augmenting Bottom-Up Metamodels with Predicates

Metamodeling refers to modeling a model. There are two metamodeling approaches for ABMs: (1) top-down and (2) bottom-up. The top-down approach enables users to decompose high-level mental models into behaviors and interactions of agents. In contrast, the bottom-up approach constructs a relatively small, simple model that approximates the structure and outcomes of a dataset gathered from the runs of an ABM. The bottom-up metamodel makes the behavior of the ABM comprehensible and exploratory analyses feasible. For most users the construction of a bottom-up metamodel entails: (1) creating an experimental design, (2) running the simulation for all cases specified by the design, (3) collecting the inputs and output in a dataset and (4) applying first-order regression analysis to find a model that effectively estimates the output. Unfortunately, the sums of input variables employed by first-order regression analysis give the impression that one can compensate for one component of the system by improving some other component even if such substitution is inadequate or invalid. As a result the metamodel can be misleading. We address these deficiencies with an approach that: (1) automatically generates Boolean conditions that highlight when substitutions and tradeoffs among variables are valid and (2) augments the bottom-up metamodel with the conditions to improve validity and accuracy. We evaluate our approach using several established agent-based simulations.


Introduction
Metamodeling refers to modeling a model. Within ABM there are two metamodeling approaches: (1) top-down and (2) bottom-up. The top-down approach enables users to decompose high-level mental models into behaviors and interactions of agents. In contrast, the bottom-up approach constructs a relatively small, simple model that approximates the structure and outcomes of a dataset typically gathered from the runs of a large and complex ABM.
While both approaches provide abstractions of an ABM, the purpose of the abstraction is different. Top-down metamodels are used in the design phase of modeling to capture requirements and organize the architecture of the ABM, while bottom-up metamodels are constructed after implementation to make the behavior of the model comprehensible and exploratory analyses feasible (Goldspink; Bigelow & Davis).
The process of developing a bottom-up metamodel proceeds as follows: (1) the user creates an experimental design for the inputs of the ABM, (2) runs the ABM according to the experimental design, (3) collects the results in a dataset and (4) uses statistical methods to generate a model that enables the user to understand why the ABM behaves the way it does (Barton; Kleijnen & Sargent; Bigelow & Davis; Friedman). The most common implementation of this process predicts the ABM output as a sum of input variables using first-order regression analysis without any bounds on the input space of the model. We refer to metamodels constructed in this manner as baseline metamodels.
Unfortunately, baseline metamodels assume that critical thresholds within a single input or relative relationships among multiple inputs of the ABM do not exist. Due to this assumption the baseline metamodel can be misleading because it gives the impression that one can compensate for a component of the system by improving some other component even if such a substitution is inadequate or invalid (Friedman & ). We propose to address this deficiency by augmenting the widely used first-order regression analysis with Boolean conditions that highlight when tradeoffs and substitutions among variables are valid. These Boolean conditions and the regions that they bound reflect critical components within ABMs. The term critical component refers to a threshold within a single input or among multiple inputs that affects the behavior of the ABM. Our approach to capturing critical components is inspired by the field of software engineering, which uses predicates to identify critical components that cause errors within computer programs (Liblit; Gore et al.).
The remainder of this paper proceeds as follows. First, we present background material needed to understand our approach and relate it to existing metamodel research. Then we demonstrate how a user employs our approach to produce an augmented metamodel. Next, we compare the efficiency and the effectiveness of our augmented metamodels to baseline metamodels for three established ABMs. Our evaluation also compares the effectiveness of our augmented metamodels to alternative metamodels constructed by applying machine learning methodologies. Finally, we discuss the validity of our analysis and the limitations of our work, summarize our contributions and provide direction for future research.

Background

Top Down Metamodeling
Recall, top-down metamodels capture requirements and system interactions to serve as a template for the implementation of an ABM. Two of these approaches, agent-oriented programming (AOP) and Gaia, provide mechanisms to decompose agent behaviors and agent roles (Wooldridge et al.). Another approach, the Unified Modeling Language (UML), captures the mental composition of agents through their perception of their environment, their ability to act within their environment, and their reasoning to interpret and solve problems (Hayes-Roth; Wagner; Jouault & Bézivin). Communication between agents can be captured using the Agent extension to UML (AUML) (Odell et al.). Agent roles or attitudes can also be represented through a set of beliefs, desires, and intentions (BDI) using informal or formal semantics (Rao & Georgeff).
Other top-down approaches take a less modular approach to describing the components of an ABM. The Agent-Object-Relationship (AOR) approach employs object-oriented methodologies, including static structure and structural relationships, dynamic interactions, and functional data flow methods (Iglesias et al.). The AOR approach conceptually integrates static, dynamic, and mental components of organizational systems (Shoham). The Kernel MetaMetaModel (KM) serves as a metamodel definition language by defining a domain definition metamodel (DDMM) of a Domain Specific Language (DSL) (Jouault & Bézivin). Cassiopeia identifies elementary agent behaviors (Wooldridge et al.) to study the effects of global or organizational behaviors (Collinot et al.).
While the process of abstraction in top-down metamodeling is related to our work, its purpose is orthogonal. Our goal is to construct bottom-up metamodels from data collected from an ABM to make trends comprehensible and exploratory analyses feasible. This process is independent of methodology related to the requirements and architectural design of the ABM. In future work, we will explore how the existence of a top-down metamodel could improve our approach to constructing bottom-up metamodels.
Bottom-up approaches focus on creating a model generated from data gathered from the ABM to make trends comprehensible and exploratory analyses feasible. A myriad of different statistical methods can be applied to generate bottom-up metamodels. However, most techniques fall into one of three broad categories: (1) regression analysis, (2) structural equation modeling or (3) machine learning.

Regression analysis
Regression analysis estimates the relationship between an output of the ABM and one or more input variables. The structure of the model is determined by minimizing the variance between the model and the data while using as few variables as possible. In this paper, we focus on creating bottom-up metamodels by employing first-order regression analysis. In these metamodels input variables are only included as linear terms (i.e., the first power) that are combined through addition and subtraction. We choose first-order regression analysis because it is the most commonly employed method to generate metamodels. Its popularity is largely due to its simplicity and accessibility. As a result, improving the accuracy and validity of these metamodels will benefit the most users in the ABM community.

Structural equation modeling
Structural equation models reflect a second generation of analysis methodology (Fornell & Larcker; Chin). The family of structural equation modeling (SEM) techniques includes many methodologies, such as confirmatory factor analysis (Harrington), causal modeling (Bentler), causal analysis (James et al.), simultaneous equation modeling (Chou & Bentler), and path analysis (Wold et al.). To begin SEM modeling an expert specifies the dependency structure among the variables in a model. This additional information allows for simultaneous analysis of all the variables in a potential model, which can provide a better fit to the collected data. This aspect of SEM is different from regression analysis, where all the input variables are assumed to be independent of one another and each potential model must be analyzed incrementally.

Machine learning
A variety of advanced analysis techniques use machine learning to overcome the deficiencies of structural equation modeling and linear regression analysis. Machine learning algorithms employ feature selection, decision trees, inductive logic programming and neural networks to construct metamodels that support non-linear tradeoffs among input variables that traditional analysis techniques cannot identify and/or represent. Recently, researchers studied the performance of ten different machine learning algorithms for identifying the structure of bottom-up metamodels for a variety of simulations. They found that combining feature selection with decision trees produced more accurate metamodels than any other tested combination of machine learning algorithms (Al Khawli et al.).
New experimental designs guided by machine learning have been proposed as well to develop accurate metamodels more efficiently. Formalisms based on state machines and feature diagrams have been employed to support the definition of a valid metamodel so it can be distinguished from an invalid metamodel (Wooldridge et al.). Leveraging this representation, inductive logic programming and genetic algorithms are used to select and deduce valid metamodels that best reflect the data produced by the simulation (Faunes et al.). In addition, Radial Basis Function Networks (RBFNs) have been used to generate accurate metamodels by iteratively adding new sampling points to approximate responses with discrete changes according to experimental designs (Bandaru et al.). Similarly, recent research has employed Latin hypercubes to choose the sampling points and support vector regression (SVR) to develop a metamodel for buckling loads of composite plates under compression (Koide et al.). Finally, other researchers have compared the efficacy of Neural Networks (NNs) and RBFNs for constructing a metamodel to estimate overheating and air pollution in buildings produced from physics simulations. NNs are shown to perform around % better than RBFNs when estimating overheating and air pollution metrics determined in physics models (Symonds et al.).

Statistical debugging
Our approach to automatically creating augmented metamodels employs predicates that are used in statistical debuggers. Predicates enable statistical debuggers to analyze relationships within and among variable values. Using predicates, the debuggers isolate the causes of software bugs to guide developers in locating and fixing faults in buggy programs (Liblit; Gore et al.).
Within statistical debuggers two different types of predicates (single variable, scalar pairs) at two different levels of specificity (static and elastic) are employed to localize bugs. The choice of type and specificity level defines a unique combination of conditions related to one or more variables that the predicate captures. Two or more predicates can also be combined by generating compound predicates to gather insight about the behavior of one or more variables at an additional level of granularity.
A single variable predicate partitions the set of possible values that can be assigned to a variable x. Single variable predicates can be created at two levels of specificity: the static level and the elastic level. The most basic single variable predicates are static. Static single variable predicates are employed to partition the values for each variable x around the number zero: (x > 0), (x ≥ 0), (x = 0), (x ≤ 0) and (x < 0). These single variable predicates are referred to as static because the decision to compare the value of x to 0 is made before execution (Liblit). In contrast, the single variable elastic predicates use summary statistics of the values given to variable x to create partitions that cluster together values which are a similar distance and direction from the mean. For the variable x with mean µx and standard deviation σx, the elastic single variable predicates created are: (x > µx + σx), (x > µx + 2σx), (x > µx + 3σx), (µx + σx > x > µx − σx), (µx + 2σx > x > µx − 2σx), (µx + 3σx > x > µx − 3σx), (x < µx − σx), (x < µx − 2σx) and (x < µx − 3σx). These predicates reflect values of variable x that are well above their normal value, within their normal range of values and well below their normal value (Gore et al.).
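The single variable predicates described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used by any statistical debugger: the function names `static_predicates` and `elastic_predicates` are hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which estimator is used.

```python
from statistics import mean, pstdev

def static_predicates(x):
    """Truth values of the five static predicates that compare x to zero."""
    return {
        "x > 0": x > 0,
        "x >= 0": x >= 0,
        "x == 0": x == 0,
        "x <= 0": x <= 0,
        "x < 0": x < 0,
    }

def elastic_predicates(x, observed):
    """Truth values of the nine elastic predicates for x.

    `observed` holds the values the variable took on during data collection.
    Assumption: the population standard deviation (pstdev) is used.
    """
    mu, sigma = mean(observed), pstdev(observed)
    result = {}
    for k in (1, 2, 3):
        result[f"x > mu + {k}*sigma"] = x > mu + k * sigma
        result[f"mu + {k}*sigma > x > mu - {k}*sigma"] = (
            mu - k * sigma < x < mu + k * sigma
        )
        result[f"x < mu - {k}*sigma"] = x < mu - k * sigma
    return result

# Hypothetical observations with mean 5 and population standard deviation 2.
observed = [2, 4, 4, 4, 5, 5, 7, 9]
print(static_predicates(-3))
print(elastic_predicates(8, observed))
```

Each call returns one row of truth values, which is the form in which predicates are later attached to the collected simulation data.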
Scalar pair predicates capture the important relationships between two variables that elude single variable predicates. The most basic scalar pair predicates are static. Static scalar pair predicates are employed to partition the difference between a pair of variables, x and y, around the number zero: (x − y > 0), (x − y ≥ 0), (x − y = 0), (x − y ≤ 0) and (x − y < 0). These scalar pairs predicates are referred to as static because the decision to compare the difference between x and y to 0 is made before execution (Liblit). In contrast, the scalar pairs elastic predicates use summary statistics of the difference between x and y to create partitions that cluster together values which are a similar distance and direction from the mean. For the pair of variables x and y with mean difference µx−y and standard deviation σx−y, the elastic scalar pairs predicates created are the analogues of the single variable elastic predicates: (x − y > µx−y + σx−y), (x − y > µx−y + 2σx−y), (x − y > µx−y + 3σx−y), (µx−y + σx−y > x − y > µx−y − σx−y), (µx−y + 2σx−y > x − y > µx−y − 2σx−y), (µx−y + 3σx−y > x − y > µx−y − 3σx−y), (x − y < µx−y − σx−y), (x − y < µx−y − 2σx−y) and (x − y < µx−y − 3σx−y). These predicates reflect differences between the values of x and y that are well above the normal value, within the normal range of values and well below the normal value (Gore et al.).
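Scalar pair predicates apply the same static and elastic partitions to the difference between two variables. The sketch below assumes the elastic thresholds sit 1, 2 and 3 standard deviations around the mean difference, mirroring the single variable case; the function name `scalar_pair_predicates` and the choice of the population standard deviation are illustrative assumptions.

```python
from statistics import mean, pstdev

def scalar_pair_predicates(x, y, observed_pairs):
    """Static and elastic scalar pair predicates for one (x, y) observation.

    `observed_pairs` holds the (x, y) values seen during data collection.
    Assumption: population standard deviation of the differences.
    """
    diffs = [a - b for a, b in observed_pairs]
    mu, sigma = mean(diffs), pstdev(diffs)
    d = x - y
    result = {
        "x - y > 0": d > 0,
        "x - y >= 0": d >= 0,
        "x - y == 0": d == 0,
        "x - y <= 0": d <= 0,
        "x - y < 0": d < 0,
    }
    for k in (1, 2, 3):
        result[f"x - y > mu + {k}*sigma"] = d > mu + k * sigma
        result[f"mu + {k}*sigma > x - y > mu - {k}*sigma"] = (
            mu - k * sigma < d < mu + k * sigma
        )
        result[f"x - y < mu - {k}*sigma"] = d < mu - k * sigma
    return result

# Hypothetical observations whose differences have mean 5 and std. dev. 2.
pairs = [(3, 1), (5, 1), (5, 1), (5, 1), (6, 1), (6, 1), (8, 1), (10, 1)]
print(scalar_pair_predicates(1, 2, pairs))
```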
Compound predicates reflect any combination of single variable and scalar pair predicates that can be composed using the logical operators ∧ (and) and ∨ (or). For any two predicates P and Q, two compound predicates are tested: (1) the conjunction of the predicates (P && Q) and (2) the disjunction of the predicates (P || Q). Once created, a compound predicate can be combined with another compound predicate (Arumuga Nainar et al.).
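Pairwise composition of predicates can be sketched as follows. The helper name `compound_predicates` is hypothetical, and the input is assumed to be a mapping from predicate names to the truth values recorded for a single run.

```python
from itertools import combinations

def compound_predicates(truth):
    """Build the conjunction and disjunction of every pair of predicates.

    `truth` maps a predicate name to its truth value for one run.
    """
    out = {}
    for (p_name, p), (q_name, q) in combinations(truth.items(), 2):
        out[f"({p_name}) && ({q_name})"] = p and q
        out[f"({p_name}) || ({q_name})"] = p or q
    return out

base = {"x > 0": True, "x - y > 0": False, "y > 0": True}
first_level = compound_predicates(base)           # 3 pairs -> 6 compounds
second_level = compound_predicates(first_level)   # compounds of compounds
print(len(first_level), len(second_level))
```

Because every pair yields two new predicates, repeated composition grows quadratically at each level, so any practical use needs a cap on how many conditions a compound predicate may contain.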

Automatically Creating Enlightened Metamodels
Despite differences in purpose, we hypothesize that the same Boolean conditions predicate-level statistical debuggers employ to localize bugs in software are capable of bounding regions in the input space where tradeoffs and substitutions among variables are valid. Here, we provide a small example to demonstrate the process and applicability of our approach. Then we address the implementation of our approach in more detail.

Approach
Recall, the most common process users follow to create a baseline metamodel begins with running the ABM for many trials where the inputs are varied according to an experimental design. The resulting output data from the simulation is collected and a statistical model is computed from the data using first-order regression analysis. Based on the analysis, inputs that appear insignificant are discarded while others that seem redundant are combined. Ultimately, the bottom-up metamodel of the simulation is finalized. Recall, we refer to a metamodel created in this manner as a baseline metamodel. The baseline metamodel is significantly simpler than the simulation and produces answers in a fraction of the time (Barton; Kleijnen & Sargent; Bigelow & Davis; Friedman).
Using this method, the output of the simulation is treated as a sum of terms, where each term is composed of one or more input variables. This process assumes that critical thresholds within a single input or relative relationships among multiple inputs of the larger simulation do not exist. Due to this assumption the resulting baseline metamodel can be misleading because it gives the impression that one can compensate for one input of the simulation with another input even if such a substitution or tradeoff is inadequate or invalid.
To remedy this shortcoming we augment the terms within a baseline metamodel with predicates used by statistical debuggers. Augmenting the terms in a baseline metamodel encodes the set of valid tradeoffs and substitutions from the ABM into the metamodel. The result is improved accuracy and validity. To elucidate this process and the utility of our contribution, we demonstrate in a small example how our augmentation approach can produce an improved metamodel.

Simple example
An example helps elucidate our approach to employing predicates to improve the validity and accuracy of baseline metamodels. We suppose a user has constructed a simulation called MED which takes three integers (x, y, and z) as input and outputs the median value.
We construct a metamodel for MED by specifying the experimental design. For this example, we specify a full factorial design for inputs x, y and z over the range of input values [1, 3]. Next, we run the simulation for each of the inputs. The inputs and output of each run are recorded in a data set. Using the data set we create the single variable and scalar pair predicates at the static and elastic specificity levels for the input variables. Using these predicates we generate compound predicates to explore the captured input conditions in combination. Finally, we augment the collected data to show the truth-value of each generated predicate in the data set. The values of the inputs, an important subset of the generated predicates and the output value of the simulation are shown in Table. We use the convention that if the predicate is true for the inputs a value of 1 is recorded and if the predicate is false for the inputs a value of 0 is recorded.
Once the data is collected, analysis is performed to identify those predicates and variables that should be retained, combined and discarded (Calcagno & de Mazancourt). The resulting augmented bottom-up metamodel identified by our approach is shown in Equation. Given the limited scope of the example one can see that this metamodel exactly predicts MED's output for: (1) the inputs over the range [1, 3] and (2) all inputs that can ever be given to this simulation. While this may seem like an expected outcome from such a simple example, it is not. Using only sums of the terms created from the three input variables (x, y and z) to model the output of MED will not accurately predict the output of the model for unseen inputs because there is no way to capture that the output is not a direct combination of the inputs. For example, consider the baseline metamodel shown in Equation.
Equation does not employ predicates in its regression analysis and as a result it metamodels the median of X, Y and Z as the arithmetic mean. Unfortunately, this representation can be invalid. For example, parameterizing Equation with X = 3, Y = 30, and Z = 3,000 results in a prediction of 1,011, when the actual system output is 30. In general, the baseline metamodel enables one to maximize a single variable (i.e., Z) to compensate for shortcomings in the other variables (i.e., X and Y). The actual simulation does not have this property. Without the inclusion of predicates to control when substitutions and tradeoffs among variables are valid, an accurate first-order linear regression metamodel is not possible.
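The contrast between the two kinds of metamodel for MED can be sketched in Python. The predicates `p1`, `p2` and `p3` below are one illustrative set of compound static scalar pair predicates chosen for this sketch; they are an assumption and are not necessarily the predicates in the augmented metamodel selected by the approach. The baseline is taken to be the arithmetic mean, as described in the text.

```python
from itertools import product
from statistics import median

def baseline(x, y, z):
    """Baseline metamodel: a plain sum of scaled inputs (the arithmetic mean)."""
    return (x + y + z) / 3

# Illustrative compound predicates (at most four conditions each).
def p1(x, y, z):   # x lies between y and z (inclusive)
    return (x - y >= 0 and x - z <= 0) or (x - y <= 0 and x - z >= 0)

def p2(x, y, z):   # x is outside the pair and y is the closer value
    return (x - y < 0 and y - z <= 0) or (x - y > 0 and y - z >= 0)

def p3(x, y, z):   # x is outside the pair and z is strictly closer
    return (x - z < 0 and z - y < 0) or (x - z > 0 and z - y > 0)

def augmented(x, y, z):
    """Augmented metamodel: each input is multiplied by a 0/1 predicate."""
    return p1(x, y, z) * x + p2(x, y, z) * y + p3(x, y, z) * z

# Agreement with MED on the full factorial design over [1, 3].
design = list(product((1, 2, 3), repeat=3))
print(all(augmented(*run) == median(run) for run in design))

# An input far outside the design.
print(baseline(3, 30, 3000), augmented(3, 30, 3000))
```

Exactly one of the three predicates is true for any input, so the augmented form selects a single variable instead of trading variables off against one another.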

It is important to note that the goal of this example is not to set a standard of 100% accuracy over observed and unobserved inputs for our augmented metamodels. Instead, the goal is to demonstrate how our approach employs predicates to encode some of the valid tradeoffs and substitutions from the ABM into the metamodel to improve validity and accuracy.

In the following subsection we provide more details about: (1) how predicates and input variables are combined within the structure of an augmented metamodel and (2) how we conduct an automated search using a genetic algorithm to identify the augmented metamodel that retains the most information from the ABM.

Implementation
The MED example in the previous section demonstrates the improvements that are possible if metamodels are augmented with predicates. However, it also highlights several questions we must address in the implementation of our approach. These questions include:
1. The metamodel featured in the example contains three compound predicates applied to three input variables. What does this imply about the structure of the augmented metamodels built for ABMs?
2. What mechanism is used to identify the augmented metamodel that retains the most information from the ABM?
The remainder of this section answers these questions in detail.
Structure of the metamodels
Our approach augments baseline metamodels with predicates. Within an augmented metamodel, each predicate is a Boolean multiplier that is applied to an input variable in the metamodel. In other words, given a vector of predicates P and a vector of input variables V, the prediction of the metamodel is the result of applying P to V. For example, in the augmented metamodel for MED shown in Equation, the three compound predicates are applied to the terms X, Y and Z respectively.
This strategy ensures that any metamodel generated by our approach will include one predicate for every term included in the metamodel. This includes cases where qualifying a term with a Boolean condition does not improve the fit of the model. In these cases a predicate that is universally true will be applied to the input variable (i.e., x = 0 ∨ x ≠ 0). It also includes cases where including a variable does not improve the fit of the model. In these cases a predicate that is universally false will be applied to the input variable (i.e., x > 0 ∧ x < 0). Furthermore, to ensure that each predicate does not become overly complex we only generate compound predicates that include a maximum of four Boolean conditions.
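The structure described above, one Boolean multiplier per term, can be sketched as follows. The function `predict`, the representation of a term as a (coefficient, predicate, variable name) triple, and all coefficient values are hypothetical choices made for illustration.

```python
def predict(inputs, terms, intercept=0.0):
    """Evaluate an augmented first-order metamodel.

    inputs: dict mapping variable names to values.
    terms: list of (coefficient, predicate, variable_name); each predicate
           maps `inputs` to True/False and gates its variable.
    """
    return intercept + sum(
        coef * pred(inputs) * inputs[name] for coef, pred, name in terms
    )

# Universally true and universally false predicates from the text.
always = lambda v: v["x"] == 0 or v["x"] != 0
never = lambda v: v["x"] > 0 and v["x"] < 0

terms = [
    (2.0, always, "x"),                          # x always contributes
    (5.0, never, "y"),                           # y is effectively dropped
    (1.5, lambda v: v["x"] - v["y"] > 0, "z"),   # z counts only when x > y
]

print(predict({"x": 1, "y": 4, "z": 2}, terms))
print(predict({"x": 5, "y": 4, "z": 2}, terms))
```

A term gated by the universally true predicate behaves exactly as it would in a baseline metamodel, so the baseline form is a special case of the augmented one.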
Finding the best metamodel
The process of creating augmented metamodels begins with the data collected from the simulation. First, the input variables and an output across a set of test cases are used to generate and score the truth-values of the predicates employed in augmented metamodels. Then, using the data, we apply an automated model selection approach called glmulti. glmulti uses a genetic algorithm to identify the augmented metamodel that retains the most information from the data collected from the ABM.
The genetic algorithm in glmulti begins by generating an initial population of first-order augmented models that match the structure specified in the previous section. Each model in the population is encoded as a string of 0s and 1s. This encoding indicates which of the possible input variables and predicates in the model are present (1) and absent (0). The string serves as the model's chromosome that will undergo adaptive evolution and each bit in the string is a locus for possible adaptation.

Every generation, each model is treated as a first-order linear regression model and the Information Criterion (IC) of the model is used to determine the model's fitness. An in-depth discussion of the IC we employ is provided in the Evaluation section. The fitness of a model is computed using Equation, where ICmodel is the IC value of the current model and ICbest is the best IC in the current population of models.
Based on the previous generation, models in the next generation are produced through three different genetic operators applied in combination. These operators are: (1) asexual reproduction ( %), (2) sexual reproduction ( %), and (3) immigration ( %).
A model produced by asexual reproduction is simply a copy of its parent. The parent is drawn from the previous generation with a probability proportional to its fitness. Then the states of some of the model's loci are changed by mutation. In our implementation, each locus is changed with a % probability.
A model produced by sexual reproduction has two parents whose chromosomes are recombined. Again, parents are drawn from the previous generation with a probability proportional to their fitness. In addition to recombination, each locus can mutate. Within sexual reproduction mutations occur with % probability.
A model produced by immigration has the state of each locus assigned randomly. As a result its application can produce big changes in the structure of the models that will be fitted (Yang).
Three rules define when the algorithm should stop looking for better models. The first two rules reflect target improvements in the best IC and the average IC found in a given generation. In our implementation, if the observed improvements in the IC are below ., then the genetic algorithm is declared not to have significantly improved. The final rule reflects how many times in a row a lack of improvement has to occur to declare the model converged. In our implementation, if the best and average IC do not improve above the . threshold for five consecutive times we declare convergence and the best model is output.
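The search loop described above can be sketched in plain Python. This is not glmulti's code: the function `evolve` is hypothetical, the operator rates, mutation rates and improvement threshold are placeholder values because the actual figures are not given here, and the fitness form exp(−(ICmodel − ICbest)) is an assumption. Only the patience of five consecutive generations comes from the text.

```python
import math
import random

def evolve(ic, n_loci, rng, pop_size=30, p_asexual=0.5, p_sexual=0.4,
           mut_asexual=0.05, mut_sexual=0.05, tol=0.05, patience=5,
           max_generations=500):
    """Search 0/1 chromosomes for the lowest information criterion.

    ic: function mapping a tuple of 0/1 loci to an IC value (lower is better).
    All numeric defaults except patience=5 are hypothetical placeholders.
    """
    def random_model():
        return tuple(rng.randint(0, 1) for _ in range(n_loci))

    def mutate(model, rate):
        # Flip each locus independently with probability `rate`.
        return tuple(1 - bit if rng.random() < rate else bit for bit in model)

    population = [random_model() for _ in range(pop_size)]
    best_model, best_ic = None, math.inf
    prev_best, prev_avg, stalled = math.inf, math.inf, 0

    for _ in range(max_generations):
        scores = [ic(m) for m in population]
        gen_best, gen_avg = min(scores), sum(scores) / len(scores)
        if gen_best < best_ic:
            best_ic, best_model = gen_best, population[scores.index(gen_best)]

        # Stopping rules: neither the best nor the average IC improved by
        # at least `tol` for `patience` generations in a row.
        if prev_best - gen_best < tol and prev_avg - gen_avg < tol:
            stalled += 1
            if stalled >= patience:
                break
        else:
            stalled = 0
        prev_best, prev_avg = gen_best, gen_avg

        # Assumed fitness form, relative to the best IC in this generation.
        fitness = [math.exp(-(s - gen_best)) for s in scores]

        next_population = []
        for _ in range(pop_size):
            r = rng.random()
            if r < p_asexual:                      # asexual reproduction
                parent = rng.choices(population, weights=fitness)[0]
                child = mutate(parent, mut_asexual)
            elif r < p_asexual + p_sexual:         # sexual reproduction
                a, b = rng.choices(population, weights=fitness, k=2)
                mixed = tuple(rng.choice(pair) for pair in zip(a, b))
                child = mutate(mixed, mut_sexual)
            else:                                  # immigration
                child = random_model()
            next_population.append(child)
        population = next_population

    return best_model, best_ic

# Toy stand-in for a real IC: Hamming distance to a hypothetical target model.
target = (1, 0, 1, 1, 0, 0, 1, 0)
toy_ic = lambda model: sum(a != b for a, b in zip(model, target))
model, score = evolve(toy_ic, len(target), random.Random(0))
print(model, score)
```

In a real run, `ic` would fit a first-order regression using the variables and predicates switched on in the chromosome and return its information criterion.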

Evaluation
We evaluate the efficiency and effectiveness of our automated approach to augmenting regression analysis with Boolean conditions that highlight when tradeoffs and substitutions among variables are valid. Recall, these Boolean conditions and the regions that they bound reflect critical components within ABMs. The term critical component refers to a threshold within a single input or among multiple inputs that affects the behavior of the ABM. First we introduce the three established ABMs we use as subjects in our evaluation. Then we discuss our experimental setup and our measure of effectiveness. Finally, we present the results of our evaluation.
The results of the model match real-world influenza data. Furthermore, with proper parameterization, the model can be used to predict the peak number of influenza infections that occur during the course of a given season. Several parameters control how the flu is spread and determine the peak number of infections during

Creation of baseline and augmented metamodels
For each of the agent-based simulations we construct an experimental design by applying Latin hypercube sampling with , samples to the simulation's parameters. A Latin hypercube design yields a sample where the range of each variable is divided into equal levels and there is only one point (i.e., sample) at each level. We use an optimized random procedure to determine the point locations so that good coverage of the design space is ensured. Such an experimental design is recommended by Meckesheimer and Meckesheimer et al. in their review of metamodel design and assessment.
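The defining property of a Latin hypercube design, one sample per level in every dimension, can be sketched as below. This is a plain random Latin hypercube, not the optimized procedure mentioned above; the function name `latin_hypercube` and the example bounds are hypothetical.

```python
import random

def latin_hypercube(bounds, n, rng):
    """Draw n points so each variable has exactly one point per level.

    bounds: list of (low, high) pairs, one per input variable.
    """
    columns = []
    for low, high in bounds:
        # One uniform draw inside each of the n equal-width levels.
        levels = [(i + rng.random()) / n for i in range(n)]
        rng.shuffle(levels)   # decouple the level order across variables
        columns.append([low + u * (high - low) for u in levels])
    return [tuple(col[i] for col in columns) for i in range(n)]

# Hypothetical two-parameter design with 10 samples on the unit square.
samples = latin_hypercube([(0.0, 1.0), (0.0, 1.0)], 10, random.Random(1))
for point in samples:
    print(point)
```

Optimized variants keep this stratification and additionally rearrange the points, for example to spread them more evenly across the design space.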
Next, we run each simulation for all of the specified inputs in the experimental design and collect the results. Once all the data has been gathered we construct predicates. Then using glmulti we generate augmented and baseline metamodels for each simulation. The name, minimum value and maximum value of the parameters varied and the output metamodeled for each ABM are shown in Table.

Effectiveness
To study the effectiveness of the baseline and augmented metamodels we considered two different Information Criteria: (1) the Akaike Information Criterion (AIC) and (2) the Bayesian Information Criterion (BIC). Here we review the similarities and differences of each criterion and describe why we chose to use the BIC to measure effectiveness in our evaluation.
Information Criterion = kp − 2 ln L

In this equation, L is the function measuring the amount of information retained by the metamodel, p is the number of terms in the metamodel and k is the penalty coefficient. Given the dynamics of the equation, as k and p increase, the metamodel needs to retain more information to maintain the same IC value. In the computation of AIC, k = 2. However, in the computation of BIC, k is the natural log of the number of data points used to fit the model. Recall, we use , observations for each ABM. This results in a k-value of . in the computation of the BIC.
models. Thus, choosing to evaluate the effectiveness using the BIC allows us to conservatively measure the extent to which employing our augmented approach reduces the amount of information lost from the ABM in the resulting metamodel. This choice is noteworthy. Even when an augmented metamodel more accurately reflects the data generated from the ABM, it will be aggressively penalized because of the predicates it uses to model the ABM output. As a result, for an augmented metamodel to outperform a baseline metamodel it must produce results that are significantly more accurate.
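The information criterion formula, with k = 2 for the AIC and k equal to the natural log of the number of observations for the BIC, translates directly into code. The function names and the numbers in the usage lines are hypothetical; in particular, the observation count below is a placeholder and not the sample size used in the evaluation.

```python
import math

def information_criterion(log_likelihood, p, k):
    """IC = k * p - 2 * ln(L), with ln(L) supplied as `log_likelihood`."""
    return k * p - 2.0 * log_likelihood

def aic(log_likelihood, p):
    """Akaike Information Criterion: penalty coefficient k = 2."""
    return information_criterion(log_likelihood, p, 2.0)

def bic(log_likelihood, p, n_observations):
    """Bayesian Information Criterion: k = ln(number of observations)."""
    return information_criterion(log_likelihood, p, math.log(n_observations))

# Hypothetical metamodel: 3 terms, log-likelihood -10, fitted on 500 runs.
print(aic(-10.0, 3))
print(bic(-10.0, 3, 500))
```

Whenever the number of observations exceeds e² (roughly 7.4), ln(n) is greater than 2, so the BIC charges more per term than the AIC. This is why the BIC is the more conservative choice for metamodels that carry extra predicate terms.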
In addition, it is important to recall how the predicates are formed in the augmented metamodels. They reflect the elastic and static conditions formed from single variable, scalar pair and compound predicates generated from data collected by running the simulation for the experimental design specified in Table. Comparing the input values and the difference in input values to zero forms static conditions. Conversely, comparing the input values and the difference in the input values to the mean and standard deviations of the values they take on during data collection forms elastic predicates. The experimental design is based on a Latin hypercube design that employs an optimized random procedure. This ensures good coverage of the input space but also injects some randomness into the input values that are gathered during data collection.

Boids
The two metamodels generated for the Boids simulation are shown along with the BIC values of each metamodel in Table. The values of coefficients and intercepts employed in the model have been moved from the table into the caption to improve readability. Recall, a smaller BIC value is preferable because it reflects less information lost from the ABM.
The BIC values shown in Table demonstrate that augmenting the Boids model with predicates improves the accuracy of the metamodel. In particular the predicate P2 ensures that if the value of any input variable controlling the Boids flight pattern contributes to the estimation of the flocking index, then all the input variables controlling the flight pattern will contribute to the estimation. This property prevents an extreme value from being assigned to any input variable to compensate for another. For example, improving the alignment of Boids, even to an extreme degree, cannot compensate for insufficient cohesion.
We are not the first researchers to identify this sensitivity of the model. It has been identified by others (Stonedahl & Wilensky). However, our approach automatically generated an accurate metamodel that makes this constraint (i.e., P2) explicit to the user. The baseline metamodel does not share this constraint. As a result, it enables users to invalidly maximize or minimize any input variable to overcome an extreme value assigned to another input variable.

Schelling's model of segregation
The two metamodels generated for the Schelling model are shown along with the BIC values of each metamodel in Table. The values of coefficients and intercepts employed in the model have been moved from the table into the caption to improve readability. Recall, a smaller BIC value is preferable because it reflects less information lost from the ABM.
of agents in the model that belong to one group. When the mean satisfaction threshold of agents is less than the percentage of agents that belong to one group, the mean satisfaction level of agents is high. Intuitively, this makes sense. Both conditions make agents easier to satisfy because: (1) the threshold to be satisfied is lower and (2) there are more similar agents to surround them. Conversely, once the satisfaction threshold of agents in the model exceeds the percentage of agents that belong to one group the mean satisfaction drastically decreases. To enforce this property in the metamodel the scalar pairs predicate P1 is assigned to both variables in the augmented metamodel that are featured in the condition (Threshold and PrctInOneGroup).

This non-linear behavior cannot be represented in the baseline metamodel because it is impossible to compare the values of Threshold and PrctInOneGroup. As a result, the baseline metamodel misrepresents the dynamic that exists between these two variables, and information from the ABM is lost in the metamodel.
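The comparison at issue can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the coefficient values, and the way the predicate gates the two terms are hypothetical and are not the fitted metamodel reported in the tables.

```python
def scalar_pairs_predicate(threshold, prct_in_one_group):
    """Return 1 when Threshold < PrctInOneGroup, otherwise 0."""
    return 1 if threshold < prct_in_one_group else 0


def baseline_estimate(threshold, prct_in_one_group, intercept, b_thr, b_grp):
    """First-order sum: each input contributes regardless of the other."""
    return intercept + b_thr * threshold + b_grp * prct_in_one_group


def augmented_estimate(threshold, prct_in_one_group, intercept, b_thr, b_grp):
    """Assumed gating form: both terms contribute only when the predicate holds."""
    p1 = scalar_pairs_predicate(threshold, prct_in_one_group)
    return intercept + p1 * (b_thr * threshold + b_grp * prct_in_one_group)


# Hypothetical coefficients, chosen only to make the arithmetic easy to follow.
INTERCEPT, B_THR, B_GRP = 5.0, -0.25, 0.5

satisfied = augmented_estimate(30, 60, INTERCEPT, B_THR, B_GRP)
unsatisfied = augmented_estimate(70, 60, INTERCEPT, B_THR, B_GRP)
```

In this sketch the baseline form changes smoothly as Threshold crosses PrctInOneGroup, while the gated form drops to the intercept, which is the kind of abrupt change the predicate is meant to capture.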

Spread of influenza
The two metamodels generated for the Influenza Spread model are shown along with the BIC values of each metamodel in Table . The values of coefficients and intercepts employed in the model have been moved from the table into the caption to improve readability. Recall, a smaller BIC value is preferable because it reflects less information lost from the ABM.
The BIC values shown in Table demonstrate that augmenting the Influenza Spread model with predicates improves the accuracy of the metamodel. In this case, a compound predicate that combines a scalar pairs predicate with a single variable predicate imposes a constraint on the metamodel to improve its accuracy and validity. Specifically, the compound predicate highlights the relationship between the mean number of people within a home and the mean number of people within a workplace. Within the Influenza Spread model, the peak number of infections in a given season is largely determined by the opportunity to spread the infection (Dunham ). When agents work with more agents than they cohabitate with, the number of co-workers dominates the contribution to the number of peak infections. Conversely, when agents cohabitate with more agents than they work with, the number of cohabitators dominates the contribution to the number of peak infections. However, in each case there is a quorum on the number of people needed to provide a significant contribution to the peak number of infections in a season.

These relationships cannot be represented in a baseline metamodel because there is no mechanism to compare values of input variables against one another or to identify the threshold a single variable must reach to achieve a quorum. Instead, input variables are modeled as strictly continuous functions with coefficients to scale them to the output. All three metamodels constructed in this manner are invalid and lose significant information from the ABM that is retained in an augmented metamodel.

Table : Effectiveness of Decision Tree Metamodels measured by BIC.
Metamodel        ABM Metamodeled                     BIC
Decision Tree    Boids
Decision Tree    Schelling's Model of Segregation
Decision Tree    Spread of Influenza
Evaluation against machine learning methodologies
Our evaluation shows that each augmented metamodel results in a lower BIC than its baseline counterpart. This is significant; recall that a lower BIC means that less information is lost (i.e., more information is retained). Furthermore, BIC controls for the number of factors included in each model even more aggressively than the AIC measure. This means that the constraints included in augmented models retain significantly more information from the ABM even when evaluated under conservative conditions (Bozdogan ).
While the existence of this trend is necessary to demonstrate the effectiveness of our approach, it is not sufficient.
To address this deficiency we evaluate our approach against additional machine learning methodologies that could be employed to construct a baseline metamodel: (1) decision trees and (2) feature selection.

Decision tree metamodels
The construction of decision tree metamodels included in our evaluation uses the rpart package available for the R programming language (Therneau et al. ). The rpart package builds decision tree metamodels to predict the output of the ABM by constructing a binary tree using a subset of the input variables. The construction process consists of two steps. First, a threshold value for the input that best splits the data into two groups is identified. The data is then separated, and this process is applied separately to each subgroup, and so on recursively until the subgroups either reach a minimum size or no improvement can be made. The BIC of the decision tree metamodel resulting from application of rpart is shown in Table .
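The two-step process described above can be sketched for a single input as follows. This is a simplified illustration, not the rpart implementation: it assumes one numeric input, uses the sum of squared errors as the split criterion, and the names `best_split`, `build_tree`, and `min_size` are hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty group)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, ys):
    """Step 1: find the threshold on x that most reduces the squared error."""
    pairs = sorted(zip(xs, ys))
    best_threshold, best_error = None, sse(list(ys))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical input values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        error = sse([y for _, y in pairs[:i]]) + sse([y for _, y in pairs[i:]])
        if error < best_error:
            best_threshold, best_error = threshold, error
    return best_threshold, best_error


def build_tree(xs, ys, min_size=2):
    """Step 2: separate the data and recurse until groups are small or no split helps."""
    leaf = {"prediction": sum(ys) / len(ys)}
    threshold, _ = best_split(xs, ys)
    if threshold is None:
        return leaf
    left = [(x, y) for x, y in zip(xs, ys) if x < threshold]
    right = [(x, y) for x, y in zip(xs, ys) if x >= threshold]
    if len(left) < min_size or len(right) < min_size:
        return leaf
    return {
        "threshold": threshold,
        "left": build_tree(*zip(*left), min_size=min_size),
        "right": build_tree(*zip(*right), min_size=min_size),
    }


# Hypothetical data with one obvious break between x = 3 and x = 10.
tree = build_tree([1, 2, 3, 10, 11, 12], [1.0, 1.0, 1.0, 5.0, 5.0, 5.0])
```

Each leaf predicts a single constant, so the tree yields a step function over the input space.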
Table demonstrates that the decision tree metamodels are ineffective compared to the baseline and augmented metamodels shown in Tables -. They result in higher BIC values, which reflect more information lost from the ABM. This is due to an assumption within the decision tree methodology. Decision tree construction assumes that the output space can be discretized by hierarchically nesting conditions related to the ABM inputs.
For the ABMs included in our evaluation this is not the case. The output space of each ABM included in our evaluation is either continuous or discretized into fine-grained intervals. As a result, even when the decision tree metamodel correctly classifies the output of the metamodel, it does not produce a prediction that is close to matching the ABM output. Furthermore, the inputs in the ABMs included in our evaluation do not follow a strictly hierarchical order. Under some parameterizations one input will provide the best split of the data, while under other parameterizations another input will. Assuming that one input provides the best split of the data for all runs of the ABMs included in our evaluation therefore increases the information lost.

Feature selection metamodels
The construction of feature selection metamodels included in our evaluation uses the caret package available for the R programming language (Kuhn ). The caret package builds a regression metamodel that is similar to a baseline metamodel. The difference is that a feature selection metamodel is not guaranteed to include all of the input variables, while a baseline metamodel is. This ability to select only the most important variables (or features) can enable a feature selection metamodel to produce a lower BIC than a baseline metamodel.
The caret package uses a routine that tracks the reduction in the estimate of error of the metamodel for each input variable as it is added to the metamodel. The reduction is used to measure the importance of the input in the metamodel. High reductions in the estimate of error denote important variables, while low reductions denote unimportant variables. The minimum reduction in error for an input to be included in the metamodel is determined by the number of inputs in the ABM. The BIC of the feature selection metamodel resulting from application of caret is shown in Table .
Table : The number of times the best alternative metamodel is likely to minimize information loss compared to the augmented metamodel.
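Table demonstrates that the feature selection metamodels are exactly as effective as the baseline metamodels shown in Tables -. This result demonstrates that each of the inputs in the ABMs in our evaluation is an important feature. It is important to note that the metamodels included in our evaluation include a relatively small number of inputs ( , , and ). In future work we will explore how feature selection could be applied to baseline and augmented metamodels for ABMs with more inputs.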

Evaluation summary
Our evaluation shows that each augmented metamodel outperforms its baseline counterpart as well as other metamodels constructed using machine learning methodology. Despite the existence of this trend, it is hard to conceptualize the improvement provided by our augmented metamodels. To elucidate the improvement, we define it as the number of times the best alternative metamodel is likely to retain more information when compared to an augmented metamodel. This formula is shown in Equation .
The best alternative metamodel is the baseline, feature selection or decision tree metamodel with the lowest BIC for a given simulation. Using this measure, any result greater than . reflects a situation where the augmented metamodel is more likely to lose information from the ABM than the best alternative metamodel. Similarly, any result less than . reflects a situation where the best alternative metamodel is more likely to lose information from the ABM than the augmented metamodel.
Equation is an established means to compare the improvement with respect to BIC of two competing metamodels (Konishi & Kitagawa ). Within the formula, BIC_AUG is the BIC value for the augmented metamodel and BIC_ALT is the BIC value for the best alternative metamodel. In each of the simulations, the best alternative metamodel is less than . times as probable as the augmented metamodel to minimize information lost from the ABM. Furthermore, for two of the three models it is 0.50 times as probable as the augmented metamodel to minimize information lost from the ABM. This reflects a significant decrease in accuracy from using any metamodeling methodology included in our evaluation other than our augmented strategy, for any of the ABMs included in our evaluation.
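The equation itself is not reproduced in this text, so the sketch below assumes the commonly used relative-likelihood form for comparing two information criterion values, exp((BIC_AUG - BIC_ALT) / 2). The function name and the BIC values are hypothetical and are not results from the evaluation.

```python
import math


def relative_likelihood(bic_aug, bic_alt):
    """Assumed form: how probable the alternative is, relative to the
    augmented metamodel, to minimize information loss."""
    return math.exp((bic_aug - bic_alt) / 2.0)


# Hypothetical BIC values: the augmented metamodel is lower by 2 units.
ratio = relative_likelihood(bic_aug=100.0, bic_alt=102.0)
```

Under this assumed form, equal BIC values give a ratio of exactly 1, and the ratio shrinks toward 0 as the augmented metamodel's BIC falls further below that of the alternative.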
In addition to improving accuracy, the inclusion of predicates in the augmented metamodel makes the existence of constraints that limit substitutions and tradeoffs within and among input variables explicit to users. When users employ the augmented metamodels generated for the three established ABMs, they will no longer assume that one variable can compensate for another when such a substitution is inadequate or invalid. This is the most powerful capability of our augmented metamodels because even if no additional information from the ABM is retained in an augmented metamodel, the validity of the metamodel is still improved.
Table shows the amount of wallclock time that is required to construct the best alternative and augmented metamodels for each simulation included in the evaluation. The slowdown incurred is computed by comparing the time required to create each augmented metamodel to the time required to create the best alternative metamodel. Recall, the best alternative metamodel is the baseline, feature selection or decision tree metamodel with the lowest BIC for a given simulation. In the case where there are multiple best metamodels, the metamodel with the shortest construction time is chosen. Formally, this measure is shown in Equation .
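A minimal sketch of this selection and comparison follows, assuming the slowdown is the ratio of the two construction times. The tuple layout, function names, and timing values are hypothetical.

```python
def best_alternative(candidates):
    """Pick the candidate with the lowest BIC; break ties by shortest build time.

    Each candidate is a (name, bic, seconds) tuple.
    """
    return min(candidates, key=lambda c: (c[1], c[2]))


def slowdown(augmented_seconds, alternative_seconds):
    """Assumed form: time to build the augmented metamodel divided by the
    time to build the best alternative."""
    return augmented_seconds / alternative_seconds


# Hypothetical candidates: two tie on BIC, so the faster one is chosen.
candidates = [
    ("baseline", 120.0, 3.0),
    ("feature selection", 120.0, 5.0),
    ("decision tree", 150.0, 1.0),
]
chosen = best_alternative(candidates)
factor = slowdown(augmented_seconds=30.0, alternative_seconds=chosen[2])
```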

Table and Figure show that the construction of augmented metamodels incurs a x -x slowdown. In each case, the additional computational time required to construct augmented metamodels is spent: (1) generating the predicates from the experimental design and (2) finding the best statistical model in a search space with significantly more factors.
While our approach to creating augmented metamodels is less efficient, it only requires machine time, not user time. If users can remain productive while an augmented metamodel is generated, overall efficiency will be improved because the user is given a more effective metamodel. This rationale has made formal software verification methods useful despite execution times measured in days and hours (D'silva et al. ). Next, we discuss the validity of our study and its limitations.

Validity
Validity threats affect our evaluation. Threats to internal validity concern factors that might affect dependent variables without the researcher's knowledge (Schram ). The implementations of the simulations we used in our studies could contain errors. However, these simulations were all gathered from external sources and passed internal code reviews, and a face validation of their output was performed before any data was collected. Threats to external validity occur when the results of our evaluation cannot be generalized (Schram ). While we performed our evaluations on three established agent-based simulations, we cannot claim that the effectiveness observed in our evaluation can be generalized to other agent-based simulations. Threats to construct validity concern the appropriateness of the evaluation metrics used (Cronbach & Meehl ). While the BIC information criterion enables the effectiveness of our approach to be conservatively evaluated, some users may prefer a different measure of effectiveness. However, our general approach to constructing metamodels can still be applied even if users prefer different measures of effectiveness.

Assumptions and limitations
It is important to note that our evaluation does not include ABMs that are as complex as some that exist in the wild. This limitation exists because we wanted to employ an established set of ABMs that could be easily understood in the evaluation. Employing our enhanced validation approach on more complicated ABMs is an opportunity for future work.
), queueing (Friedman ) and estimating artificial neural networks (Fonseca et al. ).

Related Work
These results have improved accuracy and validity of metamodels in specific domains and/or by using advanced analyses. However, none of these efforts have proposed a general approach to improving accuracy and validity for first-order linear regression bottom-up metamodels. Several researchers have attempted to address this issue by complementing the statistical analysis used to construct a metamodel with user knowledge elicited manually (Kleijnen & Sargent ; Bigelow & Davis ). While these motivated metamodels are capable of capturing critical components, they require manual effort and significant input from a user. In contrast, our approach to constructing augmented metamodels is fully automated and requires no user input.

More recently, researchers have conducted a study using an integrated agent-based model and metamodel to test four kinds of policy varying along two dimensions. The results identified thresholds causing non-linear dynamics related to incentives and benefits (Polhill et al. ). In future work, we will look to apply our augmented metamodel construction technique to this model to see if it is capable of identifying the same critical thresholds.

Conclusion
The current practice of constructing first-order linear regression bottom-up metamodels needs improvement.
The linear sums employed by the statistical methods used to construct the model give the impression that one can compensate for one component of the system by improving another component even if such substitution is inadequate or invalid. As a result, the metamodel can fail to accurately reflect the critical components of the system and may be misleading. We propose an approach to constructing augmented metamodels where the predicates employed by statistical debuggers constrain substitutions and tradeoffs among and within variables. We demonstrate that our approach can reduce the information lost in the metamodel while making critical components of the ABM explicit to users. These augmented bottom-up metamodels are more effective than their baseline counterparts and other alternatives. Furthermore, they do not give users the impression that one can compensate for one component of the system by improving another when such substitution is invalid. In future work, we will explore how our augmentation approach can be adapted for advanced analysis techniques and apply our approach to more complex ABMs.

Figure : The Schelling ABM with two groups (red and blue).

Figure : A visualization of the Influenza Spread ABM. The colored circles reflect the four different states an agent can be in related to influenza: (S) Susceptible, (E) Exposed, (I) Infected and (R) Recovered. Influenza spreads as agents interact with other agents at work and at home (Dunham ).

The results from the computation are shown in Tables and visualized in Figure .

Figure : The effectiveness results in Table visualized. All the results are below the dotted line at . This reflects a case where each augmented metamodel is more likely than the best alternative to minimize information loss.

Figure : Slowdown incurred by generating an augmented metamodel as opposed to the best alternative metamodel.

Table : Experimental Design, Subset of Predicates and Collected Output

Table : ABM Parameters with Minimum/Maximum Values and ABM Output.
Equation defines the form for the AIC and BIC information criteria. Both are measures of fitness that estimate the amount of information lost by using a metamodel instead of the data set generated by the ABM. Lower AIC and BIC values for a metamodel are preferable because they reflect a metamodel where less information is lost from the ABM.
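For reference, the standard textbook forms of the two criteria can be sketched as below. The paper's own equation is not reproduced in this text, so this is an assumption about its form; `log_likelihood`, `k` (number of estimated parameters), and `n` (number of observations) are the usual symbols, not names taken from the paper.

```python
import math


def aic(log_likelihood, k):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * k - 2 * log_likelihood


def bic(log_likelihood, k, n):
    """Bayesian information criterion: k ln(n) - 2 ln(L)."""
    return k * math.log(n) - 2 * log_likelihood


# Hypothetical fit: 3 parameters, 100 observations, log-likelihood of -10.
example_aic = aic(-10.0, 3)
example_bic = bic(-10.0, 3, 100)
```

In these forms the BIC penalty per parameter, ln(n), exceeds the AIC penalty of 2 whenever there are at least 8 observations, which is why BIC is the more conservative of the two measures for models with additional factors.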

Table : Effectiveness of Feature Selection Metamodels measured by BIC.
Table demonstrates that the feature selection metamodels are exactly as effective as the baseline metamodels shown in Tables -. This result demonstrates that each of the inputs in the ABMs in our evaluation is an important feature.

The construction of metamodels is not new. Metamodels have been created for a variety of applications including: modeling output distribution parameters (Santos & Santos ), business practice optimization (McHaney