Model Exploration of an Information-Based Healthcare Intervention Using Parallelization and Active Learning

This paper describes the application of a large-scale active learning method to characterize the parameter space of a computational agent-based model developed to investigate the impact of CommunityRx, a clinical information-based health intervention that provides patients with personalized information about local community resources to meet basic and self-care needs. The di usion of information about community resources and their use is modeled via networked interactions and their subsequent e ect on agents’ use of community resources across an urban population. A random forest model is iteratively fitted to model evaluations to characterize themodel parameter space with respect to observed empirical data. We demonstrate the feasibilityofusinghigh-performancecomputingandactive learningmodel exploration techniques tocharacterize large parameter spaces; by partitioning the parameter space into potentially viable and non-viable regions, we rule out regions of space where simulation output is implausible to observed empirical data. We argue that such methods are necessary to enable model exploration in complex computational models that incorporate increasingly availablemicro-level behavior data. We provide public access to themodel and high-performance computing experimentation code.

Introduction ( ), we create a population of , agents with static sociodemographic characteristics matching the population of the South and West sides of Chicago (ages and older since the HRx information sheet for patients younger than was typically given to an accompanying adult), each of whom are assigned to a household, and work or school location. Additionally, using CRx and MAPSCorps data from www.mapscorps.org (Makelarski et al. ), we build a synthetic physical environment in which agents move, consisting of , unique places that provide services, including health-related services. The set of places also includes health care clinics where an agent could receive a HRx and exchange information with other agents about community resources. .
Agents perform an activity during each -hour time-step of a -hour activity schedule. Agents are randomly assigned an activity schedule each simulated day based on their sociodemographic characteristics. The activity schedules of the agents were developed using the American Time Use Survey (ATUS) dataset (https: //www.bls.gov/tus/data.htm). Activities are mapped onto relevant services that are associated with geolocated resources in (R). This approach allows us to connect two separate data sets, SPEW and ATUS, and maintain a stochastic variability of the population activities and use of resources based on matching demographic characteristics.

Agent decision behavior .
We model an agent's health maintenance behavior through an agent-based decision model. Agents can encounter two general types of activities in their activity schedules. The first type of activities are related to health maintenance behaviors, which correspond to types of services that are frequently referred to by HRxs. These activities provide a choice in behavior -either to decide in favor of engaging in a health maintenance related activity (use of wellness or health promoting community resource) in question, denoted by Decision A, or not, denoted by Decision B. The second type of agent activities does not contain any decision-making component. The agent will simply do the activity in question at the specified time if a relevant service type is known to the agent. .
Here α i , the agent activation score, denotes an individual agent's intrinsic threshold for health maintenance activities, resource score β t i,j denotes an agent's dynamic level of knowledge of the particular resource and its associated benefits, γ j is a measure of resource inertia, the inherent di iculty associated with performing an activity at a resource, and δ t i,j represents the distance threshold between an individual agent and the location for a resource/activity. An agent thus chooses to perform a health related activity, Decision A, only when their activation threshold α i is exceeded by a function of their perceived characteristics of resource j at time t.
. Agent activation score values (α i ) are derived from Table in Skolasky et al. ( ), stochastically matched to individual agents via demographic characteristics. An agent with a high activation score implies a higher level of di iculty in performing tasks, and ceteris paribus, is therefore less likely to perform health related activities. The resource score (β t i,j ) is dynamic (explained in the following subsection) and evolves over time based on "dosing," a term we use in this paper to refer to an agent's exposure to information about a particular resource at a specific time. The higher the resource score, the higher the likelihood of an agent performing an activity. Delta (δ t i,j ) accounts for a "location e ect" with respect to an agent's decision -the higher the distance to an activity, the lower the likelihood of an agent performing an activity. We derive resource-specific thresholds for what we classify as low-medium-high resource distances using survey data from Garibay et al. ( ).

Information di usion .
We model the dynamic di usion of information as a function of agent interactions, the source of information dosing, and the evolution of an agent's knowledge with respect to the information. Agents following their activity schedules will find themselves geographically co-located with other agents. All co-located agents interact and exchange information about resources with some probability. We model this probabilistic information exchange as dependent on an agent's propensity to share information, and the amenability of the activity itself to information sharing, e.g., the propensity to share information while an agent's activity is "socializing and communicating with others" will be higher than an activity like "doing aerobics." We associate a scaled propensity score for di erent types of activities representing the di erent likelihoods of an agent sharing information with other agents who are co-located at that time -none (e.g., sleeping), low (e.g., doing aerobics), medium (e.g., grocery shopping), and high (e.g., socializing and communicating with others). The propensity scores and their mapping to the activity types were obtained from expert surveys.
. Di erent sources for information dosing are considered in our model (to account for the relative trustworthiness of the source of information) and are denoted by a set X ∈ {Doctor, Nurse, PSR, Use, Peer}. x ∈ (0, 1), x ∈ X is a dosing parameter representing the e ect of each dosing source's relative trustworthiness on an agent's β. n t ij = 1 denotes the instance of information dosing for agent i about resource j, at time t by some source x ∈ X. An agent's evolving level of knowledge for a particular resource and its associated benefits is described through the time evolution of β t i,j , given by Equation , where λ is a decay parameter over time, representing the e ect of a resource receding from an agent's attention, possibly replaced with knowledge about other resources, and x ∈ (0, 1), the dosing parameter for the dosing source's e ect on recall, and n t ij = 1 denotes the instance of information dosing for agent i about resource j, at time t by a dosing source X.
Initialization of β follows from a function f β (I), which considers the implicit assumption that an agent (at time ) knows between and resources in a near ( -mile radius) distance around their home location, and to resources each in the medium (less than miles) and far (more than miles) distances. Each of these initial known resources has a β score equal to κ, while all other resources are unknown to the agent, hence having an e ective β score of . The CRx intervention was modeled by replicating the original algorithms that generated an HRx based on patient demographics, home address, health and social conditions, and preferred language (Kaligotla et al. ).

CRx model implementation .
The CRx model is implemented in C++ using the Repast for High Performance Computing (Repast HPC) (Collier & North ) and the Chicago Social Interaction Model (chiSIM) (Macal et al. ) toolkits. Repast HPC is a C++-based agent-based model framework for implementing distributed ABMs using MPI (Message Passing Interface). chiSIM, built on Repast HPC, is a framework for implementing models that simulate the mixing of a synthetic population. chiSIM itself is a generalization of a model of community associated methicillin-resistant Staphylococcus aureus (CA-MRSA) (Macal et al. ). In the chiSIM-based CRx model, each agent in the simulated population resides in a place (a household). Places are created on a process and remain there. Agents move among the processes according to their activity profiles. When an agent selects a next place to move to, the agent may stay on its current process, or it may have to move to another process if its next place is not on the agent's current process. A load-balancing algorithm is applied to the synthetic population to create an e icient distribution of agents and places, minimizing this computationally expensive cross-process movement of agents and balancing the number of agents on each process (Collier et al. ). The CRx model and the workflow code used to implement the parameter space characterization experiments are publicly available (at the following URL: https://github.com/jozik/community-rx).

Model Exploration using Active Learning
. Model calibration refers to the process of fitting simulation model output to observed empirical data by identifying values for the set of calibration parameters (Kennedy & O'Hagan ). We note that we use the term model exploration to refer to the family of approaches used for characterizing model parameter spaces, including model calibration. .
Model calibration techniques described in the extant literature generally fall into two approaches: direct calibration methods and model-based methods (Xu ). Direct calibration methods generally utilize direct search methods, e.g., stochastic approximation (Yuan et al. ) or discrete optimization (Xu et al. ), to explore the parameter space and identify calibration values that minimize di erences between model outputs and empirical observations. Model-based methods, including Bayesian calibration methods, make use of surrogate models, e.g., Gaussian process (Kennedy & O'Hagan ) or stochastic equilibrium models (Flötteröd et al. ), to combine empirical observations with prior knowledge, and to obtain a posterior distribution on a model's calibration parameters. Xu ( ) provides an overview of the merits and limitations of each of these approaches. Baker et al. ( ) provides a comprehensive review of Gaussian process surrogate models used for calibrating stochastic simulators. The method described in this paper falls into sequential model-based approaches. .
The increasing availability of computational resources has emboldened advances in the exploration and calibration of simulation models using new approaches. Han et al. ( ), for instance, introduce a statistical methodology for simultaneously determining empirically observable and unobservable parameters in settings where data are available. Reuillon et al. ( ) also describes an automated calibration process, computing the e ect of each calibration parameter on overall agent-based model behavior, independently from others. Particularly relevant to large scale parameter space exploration, Lamperti et al. ( ) describes a combination of machine learning and adaptive sampling to build a surrogate meta-model that combines model simulation with output analysis to calibrate their ABM. Another multi-method example of large-scale parameter space exploration is the use of dimension reduction techniques, common in climate science. Higdon et al. ( ), for instance, describe the combination of dimension reduction techniques and regression models to enable computation at scale. Also common in this class of model calibration is the use of history matching as an alternative to probabilistic matching (Williamson et al. ). Like traditional calibration, history matching identifies regions of parameter space that result in acceptable matches between simulation output and empirically observed data. This method has been used to calibrate complex simulators across domains -oil reservoir modeling (Craig et al. ), epidemiology (Andrianakis et al. ), and climate modeling (Edwards et al. ; Holden et al. ) .
Our primary goal in this paper is to characterize the parameter space of the CRx ABM, i.e., identify parameter value combinations where model output is potentially compatible with empirical data. The model exploration techniques described in this section partition the parameter space into potentially viable and non-viable regions. This approach has the benefit of allowing subsequent analyses to focus only on a subset of the parameter space that is likely to yield empirically representative behaviors. .
The stepwise tightening of viability constraints is not unique to our work but is a common ingredient in sequential parameter sampling algorithms such as sequential Approximate Bayesian Computing (ABC) approaches (Hartig et al. ; Holden et al. ; Rutter et al. ). Holden et al. ( ), for instance, describes an approach where history matching is initially performed to rule out regions of space that are implausible, as sequential Approximate Bayesian Computing (ABC) approaches without the use of history matching are computationally costly. The approach we describe in this Section is similarly the first step in validating our model while keeping within finite computational budgets. In the following paragraphs, we describe the empirical targets for the output of the CRx ABM, define the parameter space of the model, and then introduce an AL workflow to characterize the model's parameter space.

Empirical targets for model output .
To identify viable outputs from the CRx model, we use empirical data for total weekly visit volumes to select clinics over three years for patients age years and older, obtained from Lindau et al. ( ). To control for the natural variation in the observed weekly clinic visit volumes, we manually select stable time regions (periods without visible ramp up or ramp down variations in weekly visit volumes) for each of the clinics, from which we measure the visit statistics (mean and standard deviation) to serve as the target outputs for the same clinics in the CRx ABM. Figure depicts the observed empirical visit volumes for one exemplar clinic, showing the selected stable region. Manual selection of stable regions allows us to avoid tail events, which are outside the scope of our model, as are the seasonality of clinic visits. Table details the mean and standard deviation for stable weekly visit volumes for the clinics of interest, and represent the targeted outputs for the CRx ABM. The clinics were selected from the set of CRx model clinics C, based on the availability of empirical data from Lindau et al. ( ), a er ignoring clinics attached to schools (which have a significant population under and thus out of our model scope), clinics with too few weekly visits to obtain robust statistics (< 20), and emergency rooms (which included patient visits for reasons other than health maintenance, and thus were out of scope).
. The objective function used to characterize the model output calculates the z score (z = x−µ σ ) from the modelgenerated mean clinic visits for each of the clinics during the third week of the model run, against the empirically observed weekly mean and standard deviation (µ, σ) of visits for these clinics (as specified in Table ). The model exhibited stable behavior by week (steady-state output seen in the number of weekly visits to the selected clinics) and, thus, we calculate the z score using data from that period. Considering week instead of a later week, also allowed us to fit in more model evaluations within the same computational budget.  . A threshold condition was defined using the z score within which the model outputs are deemed to adequately resemble empirically observed weekly clinic volumes. The threshold condition used was a z score for each of the clinics in the range of + and -for the third week of the model run, i.e., −4 ≥ x t i −µi σi ≥ +4 where x is the total number of agents who visited clinic i (Clinic_Code in Table ) in week t = 3, and (µ i , σ i ) are the empirically observed mean and standard deviation for clinic i, as detailed in Table . Note that while empirical observations in Table imply that visits to some of these clinics would individually result in a z score within (-,+ ), it, however, does not satisfy our threshold condition. The implemented threshold condition necessitates that all clinics individually satisfy the z score threshold condition. Figure in Appendix D shows the weekly visit numbers across all clinics across all potentially viable parameter combinations; it is observed that while a low number of visits (< ) account for approximately % of observations, there are no zero visits during the week for any of the clinics despite the use of the z-score threshold of +/-. Figure shows the marginal distributions for the z-scores for each of the clinics across all potentially viable points (refer Figure in Appendix D for marginal distributions of actual visit numbers recorded by the simulation for each of the clinics across all potentially viable parameterizations.) Given our stated goal of characterizing the parameter space into potentially viable and non-viable regions as the initial part of a stepwise tightening of viability constraints, this threshold condition results in a restricted parameter space for subsequent, more constrained model calibration. Appendix D includes additional discussions on the sensitivity of the z score threshold on potentially viable parameterizations, as well as the stepwise tightening of viability constraints.

CRx ABM parameter space and discretization .
Clinic visits by agents within the CRx ABM occur through the decision calculus described in paragraph . . The implementation of the CRx ABM has several parameters that have the potential to influence agents' clinic visits (refer to Table in the appendix for a complete list of parameters used in the CRx ABM). For this study, we subselected the five most potentially influential parameters in Table , based on their potential to directly a ect an agent's decision calculus in Equations and , and for which we could not directly infer values from the extant Figure : Histograms of Z-score distribution for each of the clinics across all viable points.

Parameter Description Min Max Increment
dosing.decay rate of knowledge attrition 0.9910 0.9994 0.0006 dosing.peer dosing from network peer 0.8000 0.9500 0.1000 gamma.med resource inertia of performing a moderate level of activity 1.0000 3.0000 0.1430 propensity.multiplier multiplier applied to propensity of information sharing 0.5000 1.5000 0.0714 delta.multiplier multiplier applied to distance threshold δ l , δm, δ h 0.5000 1.5000 0.0714 . Given our inability to infer parameter values from empirical data, we utilize an AL model exploration method described in the following sections to identify potentially viable parameter combinations. The AL algorithm that we implemented utilizes a discretization of the parameter space. The algorithm selects from unique parameter points (combinations of parameter values across all dimensions) for evaluation and prediction. We selected discrete points along each parameter dimension to provide a su icient level of granularity with respect to the model output targets and to be able to show uncertainty gradients between the non-viable and potentially viable regions while keeping within an easily manageable memory footprint for the AL algorithm. For each combination of the five parameters in Table (which represent a unique point in parameter space), the CRx model produces weekly clinic visits for each of the simulated weeks.

Adaptive sampling methods for ME .
Given the targets for model output and threshold condition for determining potentially viable regions for model exploration, the challenge becomes one of computational feasibility. As described earlier, the CRx ABM parameter space consists of 15 5 points (each point corresponding to a unique combination of parameter values) across dimensions. Characterizing this parameter space into potentially viable and non-viable regions thus presents a computational challenge -evaluating , points by brute force methods or producing a su iciently dense covering of the space is computationally prohibitive. Instead, we strategically sample the parameter space by selectively applying computational budgets to more important regions.

.
There do exist multiple sampling methods to deal with large parameter spaces, each, however, present their own problems. Grid search methods (dividing the parameter space into equivalent grids), while naturally par-allel, su er from evaluations being made independent of each other, which in high-dimensional parameter spaces can result in high computational costs without corresponding information value. Bergstra & Bengio ( ), for instance, provide empirical and theoretical proof that even random selection is superior to grid selection in high-parameter optimization. Random search, however, is generally not as good as the sequential combination of manual and grid search (Larochelle et al. ) or other adaptive methods (Bergstra & Bengio ). .
Other commonly used a priori sampling methods include orthogonal sampling (Garcia ) and Latin Hypercube Sampling (LHS) (Tang ). While both methods ensure that sampled points (for evaluation) are representative of overall variability, orthogonal sampling ensures that each subspace is evenly sampled. However, these methods also su er in high-dimensional spaces, where checking parameter combinations and interactions across all dimensions are computationally costly. The challenge of computational feasibility in large parameter spaces o en results in computational compromises, like restricting the fidelity, size or scale of the model, or limiting the types of computational experiments that are undertaken. .
An adaptive approach to ME is thus necessary for models with large multi-dimensional parameter spaces, like the CRx ABM. While a number of di erent adaptive methods, including evolutionary algorithms and ABC could potentially be used (Ozik et al. ), an AL approach, described in the next section, maps naturally to the problem. This approach is generalizable and, as we demonstrate in this paper, computationally feasible with new approaches to HPC-scale techniques.
Active learning for ME . We use an AL (Settles ) approach to characterize the large parameter space of the CRx ABM, using metamodels (Cevik et al. ). The AL method combines the adaptive design of experiments (Jin et al. ) and machine learning, to iteratively sample and characterize the parameter space of the model (Xu et al. ). In this work, we use the EMEWS framework (described later in . ), to integrate an R-based AL algorithm, enabling its application at HPC scales, as reported in Wozniak et al. ( ). .
We use an uncertainty sampling strategy in our AL approach. We employ a random forest classifier on already evaluated points and choose subsequent samples close to the classification boundary, i.e., where the uncertainty between classes is maximal, to exploit the information of the classifier. With the availability of concurrency on HPC systems, samples at each iteration of the AL procedure are batch collected (and evaluated) in parallel. Candidate points are first clustered, and an individual point is chosen from this cluster. This approach decreases the overlap in reducing classification uncertainty and ensures a level of diversity in the sampled points (Xu et al. ). Each sampling iteration also includes randomly sampled points, striking a balance between the exploitation of information that the classifier provides with an exploration of the parameter space to prevent an incomplete meta-model due to premature convergence. .
The pseudo-code for our AL algorithm is shown in Figure . Parallel evaluations of the objective function F (), the CRx model simulation, are performed in lines and over a sampling of parameter space. At each iteration, the sampled results are fed into the random forest classifier R (lines and ). At the end of the workflow, the final meta-model predictions are generated for the unevaluated parts of the parameter space.

EMEWS (Extreme-scale Model Exploration with Swi )
. As ABMs improve their ability to model complex processes and interventions, while also incorporating increasingly disparate data sources, there is a concurrent need to develop methods for robustly characterizing model behaviors. The EMEWS framework (Ozik et al. ) was a response to the ubiquitous need for better approaches to large-scale model exploration on HPC resources. EMEWS, built on top of the Swi /T parallel scripting language (Wozniak et al. ) enables the user to directly plugin: ) native-coded models (i.e., without the need to port or re-code) and ) existing ME algorithms implementing potentially complex iterating logic (e.g., Active Learning (Settles )). The models can be implemented as external applications accessed directly by Swi /T (for fast invocation), which is the method that was used in this work. However, applications can also be invoked via command-line executables, or Python, R, Julia, JVM, and other language applications via Swi /T Figure : Pseudo-code of implemented AL algorithm multi-language scripting capabilities. ME algorithms can be expressed in the popular data analytics languages Python and R. We implemented an R-based Active Learning ME algorithm for this work. Despite its flexibility, an EMEWS workflow is highly performant and scalable to the largest cutting-edge HPC resources. This large-scale computation capability provided us the ability to e iciently characterize the parameter space of the CRx ABM and discover parameter space regions compatible with empirical data.

Parallelization to enable model exploration .
The CRx model is a computationally expensive application, primarily due to the interactions between the large number of agents and places. At each time step, every agent in each place can potentially share information with every other person in that place. In the absence of any parallelization, this entails iterating over each place in the model in order to iterate over all the agents in that place so that each person can share information with every other person in that place. In order to mitigate this expense and reduce the run time such that adaptive model exploration was feasible, we parallelized the model both across and within compute cores.
The model was parallelized across compute processes using the chiSIM framework (Macal et al. ). chiSIMbased models such as CRx are MPI applications in which the agent population and the agents' potential locations are distributed across compute processes, that is, across MPI ranks. Each process corresponds to an MPI rank and is identified with a rank id. Agents move across ranks while places remain fixed to a particular rank. A chiSIM model's time step consists, at the very least, of each agent determining where (i.e., what place) it moves to next and then moving to that place, possibly moving between process ranks in doing so. Once in that place, some model specific co-location dependent behavior occurs, for example, disease infection dynamics, or in the case of CRx, information sharing. The model code runs in parallel across processes and thus the iteration over each of the , places in the model described above becomes n number of parallel iterations over p number of places, where n is the number of process ranks, and p is the number of places on that rank. .
The success of this cross-process distribution is dependent on how well the process loads are balanced. Too many places or agents on a process results in a longer per time step run time for that process and other processes sitting idle while that slower process finishes. In addition, we also want to minimize the potentially expensive cross-process movement that occurs when agents move between places on di erent process ranks. Our load balancing strategy described in detail in Collier et al. ( ), uses the Metis (Karypis & Kumar ) graph partitioning so ware to allocate ranks to processes, balancing the number of agents per rank while minimizing cross-process movement. The movement of agents between places can be conceptualized as a network where each place is vertex in the network and where an edge between two places is created by agent movement between those two vertices. It is this network that Metis partitions. In previous work, (Macal et al. ) and (Ozik et al. ), this movement was dictated by relatively static agent schedules and the network was created directly from those schedules without having to run the model itself. In the CRx case, agents' movement schedules are selected at random every simulated day from a demographically appropriate set of schedules.
In the absence of a static schedule, we ran the model for simulated days in order to capture the stochastic variation in agent movement, logging all individual agent movements between places. These data were then used to create the weighted place-to-place network for Metis to partition using the strategy outlined in Collier et al. ( ). .
We experimented with distributing the model over di erent numbers of ranks -, , , , and -running the model for a simulated week and recording the run time to execute each simulated day. These experiments were performed on Bebop, an HPC cluster managed by the Laboratory Computing Resource Center at Argonne National Laboratory (further details on the Bebop cluster are given in paragraph . ). The results can be seen in Figure . Performance does not scale linearly, which is to be expected given the overhead of moving agents between process ranks. The intention here is to achieve good enough performance such that the model exploration can be run within the maximum job run time, memory, and node count constraints of our target HPC machines. In that respect, the or rank configurations are su icient. We utilize the rank configuration for the AL workflow described below. .
We were also interested in how the model distribution related to the geographic distribution of places across process ranks. This was prompted by previous attempts in the literature to distribute ABMs geographically (Lettieri et al. ), where it was determined that geographically based model distribution presented di iculties in achieving performance gains based on the uneven density of agent populations and, as a result, agent activities. In Figure we show all places across Chicago, including households, schools, workplaces, and services, assigned to each process rank in the rank configuration (while we restricted the synthetic population to zip codes, we included places across all Chicago locations). While there are di erences in the patterns observed across ranks, one cannot readily detect any process-specific geographic clustering. This observation strengthens the argument against regarding geographic extents as natural partitions for load balancing when complex travel patterns across geographic regions are involved. However, we do know that the information on the HRxs distributed to our agents is based on geographic proximity to the agent household location. Hence, we could expect that, through geographically local information exchanges between agents engaging in geographically local activities at service providers, the information about service providers could retain some locality. In fact, we do see a correspondence between the geographic location of a service and its process rank in Figure , using the rank configuration for clarity. This suggests that the location of services and the information flow about them produce correlations in agent movement between them that, in turn, create useful partitions for model load balancing. Figure : Locations of households, schools, workplaces, and service providers across all Chicago locations, assigned to each process rank in the rank load balancing configuration. Figure : Geographic locations of service providers colored by their assigned process rank in the rank load balancing configuration.

Intra-process parallelization
. In addition to parallelizing the model by distributing it across processes, we also experimented with intraprocess parallelization by multi-threading the model using the OpenMP (Dagum & Menon ) application programming interface. In distributing the model across processes, we e ectively reduced the number of agents and places on each process. And thus, the loop iteration through all the places in the model that occurs every time step was that much smaller (i.e., faster) and occurred in parallel. Intra-process parallelization with threads used the openMP pragma directive #pragma omp parallel for to iterate through this loop in parallel. An omp parallel for essentially divides the range spanned by the loop into some number of sub-ranges, each sub-range then executes the code within the loop on a separate parallel thread.

.
However, this code executed in the loop iteration discussed above contains many random draws from a shared random stream. In a multi-threaded context, this could, and in fact most certainly would, lead to race conditions, stream corruption and application crashes, for example, when one thread is drawing from the stream while another is simultaneously updating the stream a er its random draw. There are sophisticated strategies for implementing parallel random streams in simulations (see for example Freeth et al. ( )), but we implemented a simpler solution. OpenMP provides a function call to retrieve a running thread's unique numeric id. On model initialization we created a number of random streams equal to the number of threads and associated each with a thread's unique id, allowing each thread to retrieve its own random number stream by thread id. .
We ran the multi-threaded version of the model, again distributed over di erent numbers of ranks, setting the number of threads to , and using two di erent threading schedule directives: static (the default) and dynamic. With the static directive each thread is assigned a chunk of the total number of iterations in a fixed fashion and iterations are divided equally among threads. With the dynamic directive, each thread is assigned an iteration when that thread becomes available for work. Six threads were chosen as the best tradeo between how many cores to allocate to a process for individual threads and how many processes to pack into a HPC node (a collection of cores). The results of these runs can be seen in Figure . .
In all of the cases, we can see that the dynamic directive runs are the fastest for the same number of ranks and that the di erence between static and dynamic run times is especially prominent in those runs that were distributed over smaller number of ranks. This di erence is likely due to the non-uniform distribution of agents among places. For example, a typical household may have or so agents in that household, while a larger workplace or clinic may have much more. Given that the length of time it takes to iterate through all the places assigned to a particular thread is dependent on the number of agents in those places, a thread that contains places with larger numbers of agents takes longer to complete when compared to a thread with places with fewer numbers of people. The dynamic directive naturally load balances the place iteration by assigning a place to a thread when a thread is available for work. Consequently, while places that require a longer runtime will occupy a thread for a while, other threads are still capable of accepting new places, resulting in a more even Figure : Threaded Model Run Times for a Simulated Week spread of work across all the threads. The drawback of the dynamic directive is that the order of random draws is now no longer predictable but rather dependent on run time and the threading library, and as a result, we cannot be sure that runs using the same random seed will necessarily produce the same results. We hope to examine this issue of run reproducibility under di erent threading regimes (static, dynamic, and others) in future work.

AL Results
. In this section, we first evaluate the relative importance of model parameters, describe the evolution of the AL algorithm across iterations, present the parameter space characterization results of the CRx ABM and report on the meta-model performance. The AL runs were performed on the Cray XE Beagle at the University of Chicago, hosted at Argonne National Laboratory, and on the Bebop cluster, managed by the Laboratory Computing Resource Center at Argonne National Laboratory. Beagle has nodes, each with two AMD Operton processors, each having cores, for a total of cores per node. Each node has GB of RAM. Bebop has nodes comprised of Intel Broadwell processors with cores per node and GB of RAM and Intel Knights Landing processors with cores per node and GB of RAM. Additional development was done on the Midway cluster managed by the Research Computing Center at the University of Chicago. Midway has nodes, each with cores and GB of memory. The AL parameter space characterization runs were executed in rounds. The first round consisted of an LHS sweep used to seed the first AL round (see Section . ), followed by AL rounds. The second and third AL rounds were restarted from the serialized final state of the AL algorithm from the previous round. Each of the rounds was run on nodes with computational processes per node. The LHS sweep ran for hours. AL Rounds and ran for hours, and the rd AL round for hours for a total of hours ( , , total core hours) for all rounds. Each model run was distributed over processes and was allocated threads per process using the static threading directive. The combination of inter-and intra-node parallelism provides an estimated -fold runtime speedup from the base undistributed and unthreaded model.

Parameter importance .
The random forest classifier is an ensemble of decision trees, where each tree is trained on a subset of the data and votes on the classification of each observation, variable importance can be calculated from the characteristics of decision trees. Two commonly used measures for importance are: the mean accuracy decrease, which is a measure of mean classification error, and the mean decrease in Gini, which is a measure of the weighted average of a variable's total decrease in node impurity (which translates into a particular predictor variable's role in partitioning the data into the defined classes). A higher Gini decrease value indicates higher variable importance and vice versa. Table shows an analysis based on these two metrics for our results. .
Based on these results we assign the following designations in order of relative parameter importance: d1 := gamma.med, d2 := delta.multiplier, d3 := propensity.multiplier, d4 := dosing.decay and d5 := dosing.peer. We use this relative ordering of parametric importance to organize the visualizations of our results in the following sections.  Characterization of CRx ABM parameter space with active learning .
The AL algorithm was seeded with the results of evaluations where the sampled points (to evaluate) were selected via a Latin Hypercube Sampling. At each iteration step, new points are evaluated, points close to the classification boundary (exploitation) and randomly sampled points (exploration); the class (i.e., potential viability or non-viability) of each parameter point is determined using its z-score as previously described (to statistically match the empirically observed clinic visits). Following the evaluations, the random forest model is updated with this new class information and generates predictions for out-of-sample (non-evaluated) points across the remaining parameter space. Following iterations, the AL algorithm evaluated a total of unique points. .
The characterization (evaluated points and predictions for out-of-sample points) of this multi-dimensional parameter space following the AL workflow a er iterations is shown in Figure .  . Evaluated points are shown in red/green dots indicating non-viable / potentially viable points. Out of sample predictions are shaded as blue (non-viable), orange (potentially viable), and black (equiprobable), where the color gradient between blue to black to orange denotes the varying probability of classification.

Evolution of parameter space characterization across AL iterations .
Figure shows the progression of the AL algorithm for the subset of panels shown in Figure within the dashed bounding box, demonstrating the evolution of evaluated points and the corresponding random forest model predictions for out-of-sample points. We see in Figure that as the AL progresses across the four sets of panels (representing almost-equally spaced iterations , , and ), the initial prediction boundary (black regions) is gradually refined as additional points are evaluated across the iterations, while the meta-model prediction for the rest of the parameter space (blue/orange regions depicting the non-viable / potentially viable predictions) is clarified with less uncertainty in the model prediction.
. Next we consider the panel labeled A in the set of panels corresponding to iteration in Figure -shown in Figure , across iterations , , and . At iteration and in Figure , while no point was yet positively evaluated, we still see an evolution of the meta-model predictions for out-of-sample regions. There is a positive evaluation by iteration (green point with yellow halo), and we observe that as a result of the positive evaluation, the meta-model prediction for the potentially viable regions (orange) in the parameter space around this point has been refined with increased certainty (increased color intensity). Figure features the individual panel labeled B in Figure , also across iterations , , and . We see new points in iteration and iteration evaluated as non-viable (red dots with yellow halo). It is also observed that regions of parameter space with high uncertainty (black regions) decrease in width and area, sharpening the distinction between non-viable and potentially viable (blue/orange) regions. Figure thus shows reduced uncertainty about the non-viable regions following a negative evaluation.
Parameter space evaluation and meta-model performance .
The AL algorithm evaluated a total of unique points in the discretized parameter space (out of a total of , possible points) a er iterations. Of these, points were classified as potentially viable, and are ). Red/green dots indicate evaluated (non-viable/potentially viable) points in parameter space, and blue/orange regions correspond to out-of-sample predictions for non-viable/potentially viable regions while black represents equiprobable prediction. presented in Figure , with point sizes indicating the number of unique points at each collapsed -dimensional space point. One aspect to note is the strong correlation exhibited between parameters d and d and the sharp boundary within the parameter space that this produces. This observation shows an inverse relationship between the parameterization of resource inertia for moderate activities (d ) and the multiplier applied to the distance threshold (d ). This observed relationship suggests that this interplay between activity di iculty and distance to a resource is the primary driver within the set of five selected model parameters in determining resource use at the population level. Table lists all points. .
To analyze the performance of the random forest meta-model in classifying out-of-sample (non-evaluated) parameter space, we train the random forest algorithm on the evaluated points. Using a -fold cross-validation training method, we determine the Positive Predictive Value (PPV) measure of our meta-model prediction (Altman & Bland a,b). PPV is the proportion of true positives to the total positive classifications, i.e., P P V = True Positive True Positive+False Positive . As PPV increases, we increase our certainty that a point classified as X1 is a True Positive. .
The classification of a point in the parameter space is a mapping function f : s → {X0, X1}, of a probability score s assigned by the trained meta-model, to a classification class (X0 or X1), based on a threshold measure (Lipton et al. ) p t , such that f = X1 for s ≥ p t , X0 otherwise. Table shows the PPV against di erent thresholds as well as the expected number of X1 classified points in the non-evaluated parameter space, for each threshold. We observe that higher thresholds result in higher PPV, but with smaller total numbers of points classified as X1. As the threshold increases, the meta-model generates more False Negatives in exchange for a higher rate of True Positives. We chose a threshold of . for the meta-model.

.
As mentioned earlier, the random forest model predicted potentially viable points from the out-of-sample region (the remaining , non-evaluated points) with an associated probability, ranging from . (non-viable) Figure : Progression of the AL workflow across a x slice of the parameter space described in Figure ,  ). Red/green dots indicate points in the parameter space that were evaluated as non-viable/potentially viable points. Yellow halos indicate newly selected points for evaluation since the previous panel iteration. Blue/orange regions correspond to out-of-sample meta-model predictions for evaluating the rest of the parameter space into non-viable/ potentially viable regions. Black regions indicate uncertainty between class predictions. potentially viable points are predicted from the out-of-sample region when considering a . threshold probability; these are shown in Figure . .
The set of potentially viable points (Figure ) and , predicted potentially viable points (Figure ) identifies the region in the parameter space of the CRx model where further investigations are warranted. That is, the main contribution of these points is in characterizing the rest of the parameter space as non-viable, thereby avoiding the expenditure of unnecessary computing resources on other regions, i.e., those not likely to yield JASSS, ( ) , http://jasss.soc.surrey.ac.uk/ / / .html Doi: . /jasss.  empirically corresponding model outputs. These subsequent analyses can use, e.g., stepwise tightening of viability thresholds or surrogate model-based optimization approaches to yield model outputs that are better aligned with empirical observations. As the model calibration is improved, the CRx ABM can increasingly be applied to CRx intervention scenarios as a computational laboratory to conduct in silico experimentation and drive implementation policy (for instance, identifying the most e ective delivery mode, or ideal number of clinics to e ect an increase in resource recall).

Conclusion
. As techniques for building ABMs become more sophisticated and the ABMs become more complex, it has become increasingly di icult to produce trusted tools that can be used to aid in decision and policy making. This is partly due to the computational expense of running large (e.g., urban-scale) models and partly due to the size of the model parameter spaces that need to be explored to obtain a robust understanding of the possible model behaviors. These two issues exacerbate each other when iterative parameter space sampling techniques are utilized to strategically sample the parameter spaces since iterative algorithms require at least some form of sequential evaluation of potentially long-running simulations. .
In this work, we presented an urban-scale distributed ABM of information di usion built using the Repast HPC and chiSIM frameworks. Describing the application of a large-scale AL method, we characterize the parameter space of the CRx ABM to identify a sampling of potentially viable points, while characterization of the rest of the parameter space as non-viable. In the process, we developed intra-node and inter-node parallelization and load balancing schemes to enable an estimated -fold runtime speedup. The load balancing was done using the place-to-place agent-movement network, which produced geographic clustering of service providers, understood to be a demonstration of the local nature of resource information sharing between the agents. .
With the increased e iciency achieved through parallelization in running individual models, we were able to consider an iterative algorithm for characterizing the model parameter space. We exploited the polyglot and pluggable architecture of EMEWS framework and implemented an R-based random forest Active Learning algorithm, and ran it on nodes of a HPC cluster to calibrate the CRx ABM. The run produced potentially viable parameter combinations through direct evaluation and an additional parameter combinations inferred to be potentially viable, with an estimated % PPV, by a random forest model trained on the evaluated points. .
The parameter space characterization that was enabled by our approach represents one component of a comprehensive and iterative model development and validation cycle. Relative to the extant literature on model calibration (Section . ) and sequential sampling (Section . ), this paper presents the use of Active Learning as part of sequential model-based calibration methods, as seen in Edwards et al. ( ); Holden et al. ( ), for large scale parameter space exploration. The combination of AL and parallelization techniques (described in Section . ) enables model exploration in simulators with large parameter spaces and helps mitigate Bellman ( )'s curse of dimensionality. Another important aspect for model validation is the establishing of plausibility for the mechanisms (Edmonds et al. ) at multiple model scales, including the parameterization of agent attributes, decision making, and behaviors. The conception of the CRx ABM originated a er a discovery by ex-perts in medicine about the spread of information, that was not adequately captured, assessed, or analyzed using individual-level models. The demonstrable and empirical need for an ABM ensured a close collaboration. .
In jointly developing the CRx ABM, we have repeatedly incorporated assessments of domain experts on model design, mechanisms, and results, and, perhaps most importantly, these inputs have been sought and incorporated from the early model design stage. The CRx ABM is an additional analytic tool for public health o icials who are adopting new technologies such as geospatial statistics and predictive modeling. Potential users of these models include other academics, local and state public health departments, federal agencies such as the Centers for Disease Control and Prevention, health care systems, and health care payers. In the highly decentralized public health system of the U.S., open-source analytic tools that can capture and evaluate the interaction between service entities and community members are needed more than ever.
. One limitation of the present work is the potential bias resulting from excluding seasonality in agent behaviors to account for fluctuations in empirical clinic visit data. Furthermore, in order to simplify the scope of the parameter space characterization and produce meaningful results with the computational allocations we had access to, the five model parameters in Table were selected based on expert opinions on model parameters deemed to be most relevant to agent clinic visits. However, this process may have omitted additional parameters with significant e ects.
. For future work, the CRx ABM (with the potentially viable parameter points) can be used as a starting point on which to focus computing resources to tighten the viability thresholds and identify better-calibrated points, allowing us to use the CRx ABM to expand our investigations into the e ects of information spread on population health outcomes. These analyses may also find that structural modifications are needed to align the model with empirical data more closely. Additional model exploration algorithms can also be examined, particularly those that have the ability to incorporate stochastic noise from model replicates, e.g., sequential design with Gaussian Processes (Binois et al. ), for producing robust uncertainty estimates of model outputs. Additionally, in order to be able to include larger parameter spaces for our computational experiments, the application of dimensional reduction techniques can be investigated, e.g., active subspaces (Constantine ). Finally, we aim to continue expanding our approaches to improved model e iciency and look at reproducibility under di erent intra-node threading regimes. .
The methods we describe in this paper enable computation at HPC scale. A direct potential application of such methods is particularly relevant to public health emergencies like the COVID-pandemic, where empiricallyinformed large scale models can help aid and inform public health authorities in decision-making. Our current research e orts extending some of the methods described in this paper and in Ozik et al. ( ) involve modeling the spread of COVID-in Chicago. A subset of the authors (Macal, Ozik, Collier, Kaligotla) built the CityCOVID ABM, also based on the ChiSIM framework (Macal et al. ), which incorporates COVID-epidemiology and spread among the . Million people in the City of Chicago and across . Million geographical locations. We calibrate our model with daily reported hospitalizations and deaths. Model results are being provided to the City of Chicago and Cook County Public Health Departments as well as the Illinois Governor's COVID-Modeling Task Force. Our work contributes to the growing need for empirically-informed models using granular, local, and time-sensitive data to support decision-making during and in advance of public health and other crises. See, e.g., Rutter et al. ( ) for a description of the benefits of employing a stepwise narrowing of thresholds approach instead of narrower initial thresholds.

Notes
While this discretization was chosen for the current work, it is possible that either a less or a more granular representation could provide useful insights as well. Investigations of optimal discretization for the CRx model is a possible future area of work.

Model Documentation
A comprehensive model description of the CRx ABM is described in Kaligotla et al. ( ). The CRx ABM model is implemented using the Repast for High Performance Computing (Repast HPC) (Collier & North ) (Macal et al. ) toolkits. The CRx ABM model code and the workflow code used to implement the parameter space characterization experiments are publicly available (at the following URL: https://github.com/jozik/community-rx).

Acknowledgments
Research reported in this publication was supported by the National Institute on Aging of the National Institutes of Health R AG (ST Lindau., PI). This content is solely the responsibility of the authors and does not necessarily represent the o icial views of the National Institutes of Health's National Institute on Aging. This work is also supported by the U.S. Department of Energy under contract number DE-AC -CH . This work was completed in part with resources provided by the Research Computing Center at the University of Chicago (the Midway cluster), the Laboratory Computing Resource Center at Argonne National Laboratory (the Bebop cluster), and the University of Chicago (the Beagle supercomputer). Distance threshold (low,med,high) δ l = 1; δ m = 2; δ h = 3 delta.multiplier multiplier applied to distance threshold ∈ (0.5, . . . , 1.5) in increments of (0.0714) gamma.low (γ low ) resource inertia of performing an easy activity = 1 gamma.med (γ med ) resource inertia of performing a moderate activity ∈ (1, . . . , 3) in increments of (0.143) gamma.high (γ high ) resource inertia of performing a di icult activity = 3 propensity (pr none , pr l , pr m , pr h ) propensity of information sharing (none,low,medium,high) pr none = 0.0001, pr l = 0.005, pr m = 0.025, pr h = 0.01

Conflicts of Interest
propensity.multiplier multiplier applied to p.score ∈ (0.5, . . . , 1.5) in increments of (0.0714) dosing.decay (         A note on Z-score sensitivity and potentially viable parameterizations: We analyze the sensitivity of potentially viable points to di erent z-score thresholds, under current settings. Figure shows the number of viable parameter points as the z-score threshold is reduced. Note, however, that this set is a result of the +/threshold setting for the Active Learning algorithm from the first run of the simulation. If the threshold condition were initially tighter (say +/-or +/-,) we expect the resulting set of potentially viable points to be di erent. Figure below represents a set of points from which fall within z=+/ and z=+/-range. Figure : Sensitivity of the total number of viable parameterizations to z-score threshold.
A note on sensitivity of z-score thresholds: Another approach to a tighter fit (reduced z-score thresholds) between simulation output and empirical data in our simulation could be to consider a more selective subset of clinics used for history matching. E.g., dropping clinic from consideration (where visits result in a z-score between +/-,) results in points at z +/-and points at z +/-. The sensitivity of the number of viable points to di erent z-score criteria under this exemplar restricted scenario is shown below in Figure   A note on the stepwise tightening of viability constraints: The aim of our paper is a methodological case study in model exploration techniques, and our approach represents the first step in a stepwise tightening towards identifying parameter value combinations where model output is potentially compatible with empirical data. The model exploration techniques we use to partition the parameter space into potentially viable and non-viable regions is methodologically similar to history matching as described in Holden et al. ( ), where non-viable regions are selectively removed in expensive simulators. Our stepwise tightening approach is driven by the goal of reducing computational costs compared to using standalone approaches like ABC.