Network Meta-Metrics: Using Evolutionary Computation to Identify Effective Indicators of Epidemiological Vulnerability in a Livestock Production System Model

: We developed an agent-based susceptible / infective model which simulates disease incursions in the hog production chain networks of three U.S. states. Agent parameters, contact network data, and epidemiological spread patterns are output after each model run. Key network metrics are then calculated, some of which pertain to overall network structure, and others to each node’s positionality within the network. We run statistical tests to evaluate the extent to which each network metric predicts epidemiological vulnerability, finding significant correlations in some cases, but no individual metric that serves as a reliable risk indicator. To investigate the complex interactions between network structure and node positionality, we use a genetic programming (GP) algorithm to search for mathematical equations describing combinations of individual metrics — which we call “meta-metrics” — that may better predict vulnerability. We find that the GP solutions — the best of which combine both global and node -level metrics — are far better indicators of disease risk than any individual metric, with meta-metrics explaining up to 91% of the variability in agent vulnerability across all three study areas. We suggest that this methodology could be applied to aid livestock epidemiologists in the targeting of biosecurity interventions, and also that the meta-metric approach may be useful to study a wide range of complex network phenomena.


Introduction
. This paper reports on an experiment that leverages network analytics and evolutionary computation to identify indicators of network structure and node positionality that predict epidemiological vulnerability within simulated livestock production chains.We use an agent-based model (ABM) to generate network graphs and employ network analytical techniques, statistical analysis, and evolutionary computation to investigate the extent to which either single network metrics, or combinations of metrics, may serve as indicators of infection risk.Understanding these relationships will aid livestock production practitioners, managers, and epidemiologists in targeting interventions which may preempt the spread of socioeconomically-important diseases through livestock production networks. .
The strength of the modeling approach lies in its ability to distill real-world systems down to their core processes.However, the complex nature of biological systems has led some to question the extent to which models should be relied upon to forecast real-world disease incursions (Moss ).For example, during the United Kingdom's Foot-and-Mouth Disease (FMD) epidemic, flawed predictive models were used to inform the culling policy (Kitching et al. ).To prevent such occurrences, it is incumbent upon modelers to recognize the limitations of their chosen approaches when drawing conclusions (Bousquet et al. ; Barreteau et al. ; Garner et al. ; Klügl ).We note up-front that RUSH-PNBM in its current form is not intended as a forecasting tool, but rather to understand and quantify the interactions between network structures and disease spread dynamics within parallel real-world systems more generally.

Epidemiological agent-based models .
ABMs constitute a class of complex systems models in which the global dynamics of a system emerge as a result of many individuals' decision heuristics and resulting interaction patterns, rather than being defined from the top down.Since they o en incorporate stochasticity, ABMs can be di icult to validate, and they can also incur high computational overhead due to the quantity of calculations required (Bradhurst et al.
).On the plus side, ABMs provide several advantages over other methodologies when the goal is to discover previouslyunexplored patterns that emerge from heterogeneous behavioral patterns, environmental factors, and especially when agent behaviors are a ected by the state of other agents and/or a changing environment (Auchincloss & Diez Roux ; Parker & Epstein ; An ; Shi et al. ; Kaul & Ventikos ).Used as "virtual laboratories," ABMs can unveil insights into the inter-agent interaction patterns that underpin macro-level results (Macal & North ).
. ABMs focused on disease spread typically include two main components: within-host progression and betweenhost transmission (Hunter et al. ).Within-host progression describes the process of a pathogen infecting a given host, running its course, and eventually dying out.A common means of modeling this is the SIR framework, in which susceptible (S) agents may contract a disease and become infective; infective (I) agents can transmit the disease to others; and removed/recovered (R) agents are conceived of as either dead, or having acquired immunity to the disease (Anderson & May ). SI, a common variant, allows for repeated reinfections.In general, parameters including infection probability and average infection duration mediate the dynamics of within-host progression. .
Between-host transmission occurs probabilistically when a susceptible agent comes into contact with an infective one.Agents enter into contact either at a rate governed by a di erential equation, or in the case of ABMs according to the scheduling of their individual decision heuristics.Transmission probabilities may also be heterogeneous, depending on factors such as inter-agent distance, di ering agent parameters, or network positionality.Modeling the interactions between a network of agents permits the simulation of realistic betweenhost paths, as well as subsequent analysis of individual agent vulnerabilities (Barrett et al. ).

Network analytics and spreading dynamics
. Network analytic techniques have provided insights into spreading behavior in a wide range of contexts, including information spread, di usion of innovations, disease transmission, and other phenomena.Researchers compare and contrast spreading on di erent typologies of algorithmically-generated networks (i.e.random graphs, small-world, scale-free, etc.), as well as analyzing complex graphs constructed from real-world datasets and computer simulations (Wasserman & Faust ).
. The application of network science to epidemics has received considerable attention due to the ability of network models to simultaneously depict the global structure of a population as well as the personal interactions between individuals (Bell et al. ; Christley et al. ; El-Sayed et al. ).For example, the transmission pathways that mediate the spread of a sexually-transmitted disease will di er significantly from those of an airborne-transmitted disease, with the former characterized by interpersonal contact networks (Killworth et al. ), and the latter by global transportation patterns (Colizza et al. ).
. Many algorithms have been developed to measure and mathematically-codify aspects of network structure and node positionality (Albert & Barabási ).Owing largely to continual advancements in this network-analytical toolkit, researchers have increasingly worked to identify network metrics that correlate with spreading dynamics.Below we first present research examining connections between global network structures and spreading dynamics, and second those which focus on individual node positionality within a network.We then describe how these insights have been applied to the study of epidemiological spread through livestock production networks.
The role of network structure on disease spread .
Several studies have examined connections between the overall structure of a network and the propensity with which that network promotes disease spread.Christley et al. ( ) use an SIR model to compare spreading in undirected random networks versus small-world networks (which have more heterogeneous degree distributions), finding faster spreading but ultimately fewer infected individuals in the small-world networks.The smallworld networks had greater clustering and significantly higher k-core densities.In both network typologies, nodes in the k-core were at a higher risk of infection.Comparing unweighted centrality measures, the authors find that degree centrality is about as good a predictor of a node's infection risk as random-walk, shortest-path, or farness centrality, while being simpler to compute.

.
Salathé & Jones ( ) investigate the role of community structure on spreading in networks.Community structure (a.k.a.modularity) in a graph exists when nodes are densely connected in "cohesive subgroups," with only a few bridging connections between groups.Using SIR simulations, the authors find that targeting high-degree nodes for immunization can be e ective in low-modularity graphs, but with more community structure, targeting bridges between communities -identified by betweenness or random-walk centrality -becomes more e ective. .

Kitsak et al. (
) run SIR and SI models on four large, complex real-world networks.The authors demonstrate that a node's "coreness" or "core number" -as determined by the k-shell decomposition procedure -can in many cases be a better predictor of spreading propensity than either degree or betweenness centrality.Further, in their SI model results, the boundaries of k-cores tended to determine where an infection would become systemic versus die out.This suggests that coreness interacts with traditional indicators of risk in complex ways: for example, a peripheral yet high-degree / high-betweenness node may be less infective than a lower-degree or lower-betweenness node within the core.These findings point to the need for further research into the interplay between the overall structure of a graph and the position of a node within that graph when estimating the node's infectivity.

The role of node positionality on disease spread .
Research has also focused on correlations between the positionality of individual nodes and their resultant disease risk.In an early e ort, Rothenberg et al. ( ) use survey data to build a network tracking the spread of HIV within a small community.While their results are largely inconclusive -likely owing to the very low proportion of infected individuals -they identify several important methodological insights.First, they draw a distinction between egocentric network analysis (i.e.individual risk based on network positionality) vs. sociocentic analysis (i.e.how macro-level network structures impact group-level outcomes).Second, they consider whether weighted or unweighted network metrics are better indicators of risk.It is unclear whether, for example, a person who comes into infrequent contact with many other individuals would be more or less vulnerable than a person who comes into more frequent contact with fewer individuals.Since measures of centrality, assortativity, and other metrics can be based on either weighted or unweighted degree, the distinction is important (Wasserman & Faust ). .

Bell et al. (
) calibrate an SIR model based on interview data to reflect the network structure of a small group of individuals, and use model output data from a series of runs to investigate correlations between several indicators of network positionality and two dependent variables: (a) vulnerability, or the number of times a node became infected during a run; and (b) infectivity, the number of times a node spread the infection during a run.As in Rothenberg et al. ( ), both weighted (a.k.a valued) and unweighted (a.k.a.dichotomized) versions of metrics were analyzed.The authors find that vulnerability is best predicted by unweighted versions of eigenvector centrality, information centrality, degree centrality, and in-degree prestige; whereas infectivity was most highly correlated with weighted metrics including out-degree centrality, influence centrality, degree centrality, eigenvector centrality, and power prestige.Similar studies using empirically-derived infection-spreading networks also find that multiple centrality measures correlate with infection risk (De et al. ). .

Ghani & Garnett (
) develop a stochastic SI simulation model of gonorrhea transmission through large (N = 2000) networks of social partners.The authors use nested Poisson regression models to compare the relative influence of each node's unweighted degree (k), concurrency, k at distance = 2, k at distance = 3, closeness, betweenness, and information centrality on both vulnerability and infectivity.They find that k is highly significant in all cases, and concurrency improves the model fit for both vulnerability and infectivity.Increased "local" connectivity (k at distance = 2) improves the model fit only for vulnerability, whereas indicators of centrality within the full network, including closeness and betweenness, are important in predicting infectivity.The authors conclude that vulnerability is primarily contingent upon the structure of local network neighborhoods, whereas infectivity depends more on the interactions betweeen individual behavior and global network positionality.
Application to livestock epidemiology .
In the case of livestock disease outbreaks, the networks that underlie the movement of livestock, feed, supplies, workers, and visitors are o en the ones that pathogens follow as an epidemic spreads.In addition to direct disease transmission resulting from animal movements, disease can be also transmitted indirectly via contaminated feed, fomites, and transportation vehicles (Fèvre et al. ).Network models of livestock disease spread encode the contact patterns between producers and other actors such as auction houses, animal fairs, slaughter plants, and feed mills; each of which can be analyzed as a node in the overall network. .Using an SI model-based approach, Natale et al. ( ) leverage animal traceback data to build networks of Italian cattle supply networks, and examine the influence of "seeding node" positionality upon the final extent of an epidemic, finding that the eigenvector centrality and closeness of the seeding node are strongly correlated with epidemic size.The authors find power law degree distributions for both animal shipment sizes and the number of shipments per node; characteristics of a "scale free" network with "small world" properties.These properties have also been observed in other animal movement networks (Bigras-Poulin et al. , ). .

Wiltshire (
) uses an agent-based SI model to analyze the impact of increased network density and producer specialization on the size and scale of disease outbreaks in U.S. hog production systems, finding evidence of percolation-type phase changes, with more-specialized systems becoming vulnerable to catastrophic outbreaks at lower density levels.) suggest that the fusion of egocentric and sociocentric analytics would be a valuable approach for future research.Whereas in much of the previous work in this area only a single network is analyzed, our study compares and contrasts networks from three distinct study areas, allowing us to investigate the role of both egocentric (node-level) and sociocentric (global) metrics upon a node's vulnerability.

Contributions of this study
.
Our approach to determining the impact of each metric upon vulnerability is also innovative.In much of the previous work in this area, bivariate statistical methods are utilized, with some authors using multivariate regressions to investigate the relative impact of a suite of metrics taken together (Ghani & Garnett ).Rather than relying on traditional statistical techniques, we employ evolutionary computation to search for more-complex relationships between network metrics.Using a genetic programming algorithm, we obtain a set of "metametrics" along a complexity / fitness Pareto front.We find that the GP solutions predict infection risk in our data far better than any metric taken singly, lending support for the e icacy of this methodology.

Model Description
. This section provides an overview of the basic features of the RUSH-PNBM v. ., including initialization procedures, agent interaction heuristics, the epidemiological sub-model, parameterization and validation, and sensitivity analysis.The model was built using Anylogic v. multimethod modeling so ware.For additional details, reference the RUSH-PNBM v. .ODD+D Protocol (Web Appendix).

Agents and initialization
. Three types of hog production chain network agents, identified by industry experts as critical players in the transmission of fecal-oral diseases, are represented in the model, these being (a) hog producers, (b) feed mills, and (c) slaughter plants.Producer agents are assigned one of six industry roles based on the classification system used by the USDA and other industry analysts, these being (a) Farrow to Wean, (b) Farrow to Feeder, (c) Farrow to Finish, (d) Wean to Feeder, (e) Wean to Finish, and (f) Feeder to Finish.
. The spatial extent of the model is defined by the boundaries of one of three U.S. states: (a) North Carolina, (b) Illinois, or (c) Iowa.These study areas were chosen because each is a major producer of hogs, yet key di erences exist regarding the size of their networks and distribution of industry actors, allowing us to incorporate the impact of di erences in global network structure into our analysis.
. We use the Farm Location and Agricultural Production Simulator (FLAPS) tool -which draws upon USDA Census of Agriculture data along with aerial imaging to impute realistic distributions of livestock farms within a specified U.S. region (Burdett et al. ) -to set producer agent locations and key operational parameters including industry roles and capacities.Following FLAPS initialization, each producer generates a list of potential producer trading partners, constrained by maximum distance, size similarity, and maximum number parameters. .
Livestock in the model are represented in batches (or metapopulations) of animals of the same age.Producer agents are initialized with one or several pig batches -depending on their capacity -within the correct age range for their industry roles.Farrowing producers periodically generate new piglets, which are subsequently batched as weaner pigs at a rate dependent on their sow inventory, according to industry standards.  .
Infected livestock transferred to a susceptible producer automatically infect the recipient.If a susceptible producer transfers livestock to an infective premises, there is a small probability that the infection may be brought back in the form of biological material which has contaminated the transportation equipment.Once a producer agent is infected, it is assumed that its entire premises becomes infected, due to the high reported virulence of PEDv once in a herd (Goede & Morrison ).The infection event triggers a mortality calculation that decrements the infected agent's livestock inventory according to observed mortality rates from PEDv appropriate for the age of each pig batch.Producers remain infective for an average duration of days before transitioning back to susceptible.If a producer ships out its entire livestock inventory, it is assumed that the premises is disinfected prior to receiving a new shipment.

.
Feed mills may become infected if a delivery truck that has previously been contaminated (by visiting an infective producer) returns to the mill.Mills remain infective for an average of days.While a feed mill is infective, each delivery truck departing the mill may become contaminated.The truck may also become contaminated upon visiting an infective producer part-way through a route.If a contaminated truck visits a susceptible producer, that producer may be infected upon receiving a delivery of contaminated feed.
. Slaughter plant receiving areas may become contaminated upon receiving a shipment of infected hogs.The plant's receiving area will remain infective for an average of five days, during which arriving transportation equipment may become contaminated.Transportation equipment that has been contaminated in this way may then spread the infection upon returning to the originating producer.
. To skip the transient period and allow agent interactions to stabilize, the initial infection event, which randomly infects five percent of producer agents, occurs one model year post initialization.

Parameter calibration and validation
.
Due to the lack of precise animal movement data coupled with the inherent variability of epidemic events within complex networked systems, we are interested less in empirically-validating the model to be used as a forecasting tool, and more in developing su icient structural-and face-validity to allow for a deeper understanding of the fundamental dynamics of the modeled systems (Klügl ).Even given identical starting conditions, deviations in contact patterns over the course of a real-world disease incursion o en render precise forecasts unfeasible (Moss ).Our aim is rather to uncover and better understand the network features that lead to epidemiological vulnerability in livestock production systems more generally (Epstein ).
. Calibration procedures that leverage concrete historical data are o en regarded as the best way to bring a model in line with empirical evidence.Unfortunately, there is a marked lack of publicly-available data in the agricultural sector beyond aggregated county-or state-level statistics.To the extent that datasets containing explicit locations, operational parameters, livestock and feed movements, and disease histories exist; these data tend to be held by private enterprises, which view them as sensitive internal records.In light of this, following Windrum et al. ( ), we employ several alternative calibration procedures that have been widely-used in previous modeling endeavors in which fine-grained data are scarce.

.
The spatial locations and basic operational parameters of RUSH-PNBM agents associated with each study area are calibrated using the "indirect" approach, whereby stylized facts about the distribution of agents in the system are gleaned from statistical datasets (Windrum et al. ).While the FLAPS tool discussed above (Burdett et al. ) serves as our primary means to set agent locations and operational parameters, our team also gained access to internal records from a large family-owned hog production chain system, which are used to codify realistic contact rate and shipment size parameters (see Appendix ).

.
Calibration of model elements that define how and when inter-agent contact occurs, as well as epidemiological sub-model parameters, were iteratively honed throughout the model development process following the "companion modeling" approach (Bousquet et al. ; Barreteau et al. ).During the scoping phase of model development, our research team convened two expert panels that included leading policymakers drawn from industry associations (e.g. the National Pork Board) and the USDA, leading veterinary scientists, Agricultural Extension agents focused on livestock biosecurity from several states, veterinarians working with major livestock production chains, industry communications specialists, and agricultural economists.An additional two panels, involving many of these same experts, were convened during the model's development.Finally, a fi h expert panel was convened to discuss and interpret the results of the model runs shown here.

.
In these panel sessions, we used both targeted focus groups as well as quantitative survey questionnaires to elicit and hone parameter values.Through this process, model assumptions, data streams, and behavioral heuristics were shared, critiqued, and refined; and implications of model results were discussed.Detailed notes were taken during these expert panels, which were subsequently coded and analyzed.This information was used to bolster the face-validity of the distribution of epidemic patterns, scales, and durations produced by the model.Using this participatory methodology, the modeled system was brought in line with the collective understandings of stakeholders who are intimately familiar with the operational details of U.S. livestock production chains.
. A common perception among our expert panelists concerned the complexity of disease transmission, the variation of disease spread across di erent states and production chains, and the relatively-meager understanding industry and USDA professionals possesses relative to identifying e ective leverage points in production chains that are both most susceptible to disease, and most critical to its spread.Given the complexity of the production chain -with its segmentation of livestock producer roles from farrow to finish -and the importance of feed mills and slaughter plants, industry experts called for the development of an ABM that could capture di erences between regions and the configurations of their associated production chain networks.

Sensitivity analysis
.
We conducted a sensitivity analysis focusing on four key model parameters.While the model includes many parameters (Table ), these four were chosen because each focuses on a specific aspect of the model's network and/or epidemiological architecture.Prob.producer to pig truck infection highlights the impact of outgoing disease transmission from a producer agent to either another producer or a slaughter plant.Avg.producer infection length is an infection duration parameter, to which previous SIR-type models have been found to be sensitive.Max producer connection distance evaluates the e ect of altering the global network structure from more localized to more spatially-di used neighborhoods.Finally, Prob.feed truck to feed mill infection aims at elucidating the relative impact of infections stemming from the feed distribution network (versus the livestock transportation network) on system-wide disease resilience.
. Each parameter is varied in steps between % and % of the baseline values given in  in Illinois, whereas Prob.producer to pig truck infection has the largest impact in North Carolina, reflecting differences in network structure between study areas.Despite state-to-state di erences, the fact that increased infectivity in both hog transportation and feed distribution networks both led to similar increases in overall epidemiological vulnerability suggests that our epidemiological spread sub-model is relatively balanced between these two major modes of transmission.This ground-truths our model with respect to real-world findings implicating both contaminated feed and infected hogs / transportation equipment in the spread of PEDv and other diseases (Schulz & Tonsor ).
. We find that the model is not particularly sensitive to Max producer connection distance.The magnitude and direction of the e ect varies between study areas, with only Illinois demonstrating a significant (positive, in this case) relationship to the response variable.While somewhat surprising, this result likely reflects the fact that, despite increasing the Max producer connection distance, the Max.number of potential producer to producer trading partners parameter remained constant (at ) across all runs.This means that the average k of a given producer would only increase with Max producer connection distance in more spatially-di used networks wherein a sizable proportion of producers did not have potential producer trading partners within the baseline maximum connection distance of km.This is more likely to be the case in Illinois, as there are relatively fewer producers within a relatively large area in that state.
. By contrast, increasing Avg.producer infection length causes significant increases in average vulnerability across all study areas.In light of previous SIR / SI model studies, the observation that average infection duration heavily impacts average vulnerability is not a surprise.Further, the shape of the elasticity curves in the top right of Figure suggest that percolation dynamics may exist, with the nonlinearity -or percolation threshold -being lowest for North Carolina and higher for the other two study areas, corroborating findings from Wiltshire ( ).

Experimental Design and Data Processing
. The model was executed times for each of the three study areas, maintaining all common parameters constant across repetitions.The total run count of was chosen due to the large number of agents in each study area, along with the relative complexity of the model's livestock trade and epidemiological spread heuristics, rendering each run-along with its associated data analysis-quite computationally-intensive. The full experimental results encompass , individual agent-run combinations, with all network metrics and epidemiological statistics calculated for each, yielding a substantially large and rich dataset totaling approximately two million unique datapoints.

.
The numbers and spatial distributions of each type of agent were hard-coded and maintained across runs within each study area (Table ).Further, immutable agent properties set at model initialization-including the pool of potential trading partners-were also held across repetitions within each study area.A er initialization, all stochastic events in a run utilize a random seed, meaning that-while the underlying network structure associated with each study area production chain remains constant-the transportation of livestock and feed, along with the resultant outbreak progression, is unique from one run to the next.
. Throughout each run, a weighted, directed contact network was built up by tracking the number of times each agent contacted another agent-as a result of either delivering or receiving livestock or feed-with the final edge weights being equal to the number of contacts between connected agents.In a similar fashion, an infectivity network was constructed, with edge weights tracking the number of times the infection was passed between agents over the course of a run.An agent's vulnerability-the dependent variable used in the data analyses below-is defined as a node's in-degree within the infectivity network.
. Networks generated during the experiment, along with key statistics associated with each agent, were output as tabular data at the conclusion of each run.Using the Python NetworkX .library (Varoquaux et al.
), the weighted edgelists describing each network were used to generate weighted, directed network graphs, allowing key global and node-level network metrics previously linked to epidemiological spreading to be evaluated (Ghani &  ).Where applicable, both weighted and unweighted versions of these metrics were calculated, in order to compare between the two (Rothenberg et al.
; Bell et al.
).The metrics utilized in this study are listed and briefly described in Table .For clarity, we distinguish between the set of node-level metrics pertaining to centrality, versus those capturing other aspects of node positionality.The full sets of global and node metrics used in the analysis were: To generate meta-metrics-that is, formulae composed of multiple individual indicators-we utilized the Eureqa GP package (Schmidt & Lipson ).Due to the large size of our dataset, we first sampled the data by randomly selecting agents from each study area to include in the GP training and validation sets.Because of the lower numbers of feed mills and slaughter plants compared to producers, we ensured that all agent types were represented in the data by stipulating that a minimum of five agents of each type from each study area were included, yielding , total rows.The GP was trained on half of these data (selected randomly) and validated on the remaining half.which network metrics can predict individual nodes' infection vulnerabilities across the three states, examining both network-structural and node positionality factors.Finding that no individual metric adequately correlates with infection risk across study areas, we turn to the GP results to investigate the extent to which formulae composed of multiple metrics -which we call "meta-metrics" -may predict epidemiological vulnerability.

Characterizing study area production networks
.
Figure shows the degree distributions for each class of agent and study area.Weighted degree (k w ) represents the number of times an agent contacted another agent (i.e., sending or receiving either livestock or feed) over the course of a run, whereas unweighted degree (k) describes the number of unique agents with which an agent had contact.Plotted with log-scaled x and y axes.bins were used for producers, and for feed mills and slaughter plants. .
Overall, producers tend to have the lowest k w and k, followed in order by feed mills and slaughter plants.Feed mills' degrees are constrained to a smaller range, whereas the distributions for producers and slaughter plants are quite long-tailed.Slaughter plants in Iowa appear to have somewhat higher degrees on average than in the other study areas.
. A linear degree distribution in log-log space demonstrates the power law relationship associated with scalefree networks, satisfying P ).This appears to be roughly the case for producers, but it is less clear whether the other agent classes exhibit a power law degree distribution. .
Additionally, we observe that the degree distribution for producers appears to have a di erent pattern in North Carolina compared with Iowa and Illinois, with North Carolina having more high-k producers.An examination of Figure suggests that this may be related to the heavily-connected cluster surrounding Duplin County in the southeast of North Carolina.
.  chance of receiving the infection at least once during a model run, at p inf ≈ %, versus % for Iowa, and % for Illinois.Agents in North Carolina also had the highest average vulnerability ( v ≈ 6), as well as the longest average total infection duration (t inf ≈ 101 days).
. Examining the averaged network metrics provides some initial insights into these disease resilience discrepancies (Table ).While the Iowa networks have significantly more nodes, the North Carolina networks are the most densely-connected (higher k and k w ).Di erences are also apparent in the indicators associated with the networks' k-cores as well as average centrality values.In the sections below, we statistically evaluate the ways in which these patterns in the network data relate to epidemiological vulnerability.

Can network metrics predict infection risk?
.
Since the probabilities of disease transmission and agents' behavioral heuristics do not change across runs, the disease risk discrepancies noted above must necessarily result from di erences in contact networks.We use a three-pronged approach to analyze the relationships between network structure and epidemic risk factors in -. -. -. -. -. -. -. -.
- the RUSH-PNBM output data.First, we examine the relationships between global metrics that capture overarching features of each study area network, versus the average vulnerability of nodes in these networks, as well as individual agents' vulnerabilities.Second, we examine the relationships between node-level metrics that capture various aspects of positionality, versus each agent's vulnerability.Finally, we turn to the GP results to identify meta-metrics that may serve as better indicators of epidemiological risk.

Global network-structural factors .
Figure plots the global network metrics described in Table against the average vulnerability of agents in each run.These findings may be analyzed in two ways: first, we can determine whether an overall trend exists across all three study areas; second, we can determine whether a trend exists across the runs associated with each study area.Overall, we find that all global metrics with the exception of k-core fractional size do indeed correlate significantly with average vulnerability across all runs.
. However, the situation is more complex when examining within-study-area trends.We find that weighted and unweighted average degree, unweighted assortativity, unweighted clustering, weighted and unweighted degree CV, and k-core fractional size all exhibit positive correlations in some states and negative in others.Further, the within-state correlations do not achieve significance for several of the metrics.No metric we assessed significantly correlates with average vulnerability both overall as well as within each study area.
. While some global network metrics correlate strongly with the average vulnerability of agents in each network, we find that global metrics alone do little to predict the infection risk of individual nodes, owing to the wide range of vulnerabilities between agents.Figure evaluates the same metrics as Figure , only this time we plot the individual vulnerability of each agent, rather than per-run averages.Here we find that, in fact, none of the global metrics taken alone is su icient to explain more than % of the variability in the vulnerability of individual agents.

Node positionality factors .
For real-world livestock producers, production system managers, veterinarians, epidemiologists, and policymakers it would be useful to understand the relationships between the network positionality of a given actor and its risk of infection during an epidemic event.To investigate this, we plot each node-level metric from Table against each agent's vulnerability.shows the remaining node positionality indicators.
. We find that all node centrality metrics significantly correlate with infection risk when using the full dataset composed of all three study areas (Figure ).With the exception of weighted eigenvector, these trends hold within each study area as well.The centrality metric with the most consistent positive correlation to vulnerability across study areas would appear to be degree centrality.
. However, linear regressions reveal that, despite achieving significant p-values, no centrality indicator explains more than % of the variability in the response variable across all study areas.Additionally, we find significant variability in the explanatory power of the correlations between study areas.This lends further evidence to the assertion that there is a complex interplay between the structure of a network overall and the e icacy of node-level metrics for determining risk.
. Turning to the remaining node positionality indicators (Figure ), we find -unsurprisingly -that both weighted and unweighted in-degree are relatively-strong predictors of vulnerability, with the weighted metric explaining % of the variability in the response variable.Simply stated, the more times a node is exposed to a potential threat by way of incoming livestock or feed, the more likely it is to be infected.We find that out-degree and clustering coe icient (both weighted and unweighted) do not explain a node's vulnerability.A node's coreness is a moderate (R 2 = 0.09) predictor of vulnerability across study areas, suggesting that subsets of more highlyinterconnected nodes may be largely responsible for sustaining outbreaks.
Using Genetic Programming to develop better vulnerability indicators .
Our preliminary findings suggest that, while both global and node-level metrics correlate with epidemiological vulnerability in some contexts, there are likely complexities arising from the interplay between these factors, ).Of the possible solutions generated by the GP, six were chosen for further analysis (Table ).This subset of solutions aims to sample a range of complexity / fitness pairings, with solutions situated at "knees" of the Pareto front preferred where possible (Branke et al. ).To avoid overfitting the data, solutions beyond a complexity level of are not considered, as there is very little improvement in fitness beyond that point.

.
Overall, it is notable that the better-performing of the GP solutions incorporate both node-level and global metrics.The most frequently-used node-level metrics are first and foremost weighted in-degree -which is used in all six possible solutions -followed by coreness, unweighted in-degree, unweighted total degree, and weighted eigenvector centrality.The most common (and indeed, only) global metric included in the GP formuae is weighted clustering.
. Figure plots the results of the GP solutions against the vulnerability of each node in the same manner as Figures , , and .We find that all of the GP solutions far outperform any individual metric, providing support for the value of the GP-derived meta-metric approach.The best of the GP solutions (S and S ) explain % of the variability in the response variable, while the best single metric (k in w ) explains just %.
Table : Six possible GP solutions along the complexity / fitness Pareto front.Fitness is defined as the R 2 value on the validation data.Red variables are from the global set; blue are from the node set -yet require a full network graph to calculate -and green are from the node set and require only node-level data.Constants are truncated to two significant figures.Equations are algebraically-simplified where applicable.

Discussion and Conclusions
. In this study, we apply agent-based computer modeling, network analytics, and evolutionary computation to explore the impact of key network metrics on the epidemiological vulnerability of livestock production chain actors.We analyze the extent to which indicators describing the structure of a complex contact network can be combined with those pertaining to a node's positionality within that network to arrive at better risk indicators.To this end, we demonstrate the feasibility of using a GP algorithm to formulate such meta-metrics and, at least in this context, we find that the GP solutions outperform any single indicator of epidemiological vulnerability.Below, we discuss how the meta-metric method could be used by real-world practitioners, address methodological issues surrounding data availability and context specificity, and propose future research goals.).Multivariate statistical techniques have also been used to evaluate the relative e ect of multiple node-level metrics on individual vulnerability (Ghani & Garnett ).The GP methodology we have employed expands on this work, searching the solution space to identify mathematical relationships encompassing multiple metrics to predict a node's vulnerability across three networks with di ering sizes, densities, and internal structures.

.
The ability of our GP approach to identify meta-metrics that capture complexities arising from the interactions between global and node-level network features suggests that this methodology may have widespread applicability.The particular networks evaluated in this experiment were state-level livestock production systems generated by a computer model, but we believe the procedures used to analyze the network data and identify meta-metrics could be applied to study phenomena on a wide range of graphs, both empirically-and computationally-derived.While we have applied the meta-metric approach to study epidemiological dynamics, meta-metrics could equally be formulated to predict other important graph properties, such as the probability that a node will add or prune edges over time, the frequency with which a node will enter some state, or virtually any other outcome variable of interest.

Global vs. node metrics as indicators of vulnerability
.
We find that the global metrics, while serving as reasonably-accurate predictors of average vulnerability within each study area, do little to predict the risk of an individual agent within each run, as a result of the high variability in vulnerability across agents of di erent classes, and with di erent network positionality characteristics (Figure ).
. Several of the node-level metrics, such as degree centrality and weighted in-degree, perform reasonably well as predictors of an agent's vulnerability (R 2 = 0.22 and 0.30 respectively).However, many of the node metrics demonstrate di erential correlation coe icients across each study area, indicating that risk cannot be accurately predicted by node-level indicators alone (Figures and ).
Turning to the GP solutions, we find that the best-performing meta-metrics (S -S ) incorporate both global and node-level indicators.Further, among the node metrics, we note that some of the calculations -for example weighted in-degree -require only local information from each node for calculation (indicated in green in Table ).However, other node metrics -for example coreness -require that a full graph of the network is first codified (indicated in blue in Table ).Notably, all six of the GP solutions we analyzed do require at least one instance of a node metric that falls into this latter category (primarily coreness in this case), emphasizing the interplay between global network structures and node positionality in the prediction of infection risk.
. Weighing the six meta-metric solutions against one another, S and S are the best performing of the set, each achieving a high overall R 2 value of .on the full dataset.Table reveals that S , while more complex, cuts the maximum error from .for S to .for S , suggesting that the additional complexity pays o in terms of predictive accuracy in this context.Overall, we conclude that the GP approach to determining e ective predictors of epidemiological risk by combining indicators of network structure with node positionality represents an e ective and under-explored mechanism to evaluate the infection vulnerability of individual nodes within networks.

Application to real-world decision-making .
Aside from demonstrating the applicability of our methodology, this work is also intended to inform practitioners in their e orts to strategically-allocate resources which improve epidemiological resilience in existing livestock production systems.While we do not wish to imply that the results presented here can reliably predict the course of any real-world disease incursion, our findings suggest several novel ways for livestock epidemiologists to approach risk analysis.
. Perhaps the most important takeaway from our study is that risk is a function of both network structure and individual decision-making.Biosecurity measures -such as shower-in-shower-out facilities, lines of separation, or all-in-all-out protocols -have proven e ective in limiting the scope of livestock disease outbreaks.While a producer's primary motivation to adopt such measures is rooted in individual risk reduction, from a systems perspective, our research suggests that biosecurity protocols would be vastly more e ective if implemented at especially-vulnerable network loci.The methods presented in this study allow for the identification of nodes in a system most likely to play a role in disease spread, which can then inform targeted biosecurity interventions to reduce systemic risk.
. The lack of available fine-grained U.S. animal movement data currently represents a major hurdle for network analysts and modelers interested in livestock disease risk mitigation.Animal traceback protocols, which track the movement of livestock between premises over their full lifespans, remain controversial in the U.S., likely due to attitudes around privacy.While consumer protection (recalls, etc.) is the most-cited reason for the implementation of traceback protocols, in light of our findings, an additional consideration is that traceback data could be used to codify accurate network representations of livestock supply chains, as has been done in other countries (Caporale et al. ; Natale et al. ).
. Given su icient input data, RUSH-PNBM could be parameterized to guide on-the-ground policy and managerial decisions.By encoding the interaction patterns and animal movements between actors in real-world production systems into network graphs, and then incorporating data tracking the spread of a historical outbreak through the system, it becomes possible to apply the methods we have presented to actual disease incursion threat scenarios.
. Model use cases could focus on di erent scales.For example, national agencies such as the USDA could use RUSH-PNBM to target nationwide interventions -such as the placement of truck-wash sites -or to propose health and safety regulations focused on potential "hubs" like slaughter plants and feed mills.Alternatively, owners and managers of private livestock production chain systems could employ specifically-parameterized versions of the model to assess disease risks within their own systems.This would facilitate both insights into the individual network loci at which biosecurity protocols could do the most good, as well as analyses of how reorganization of network structures could improve system-wide disease resilience.

Expert panelist feedback .
The variation of individual node vulnerabilities found in our results did not surprise our panelists: the relationship between an individual node's vulnerability and its position and/or function in the network matters.The potential of our model to identify those nodes with the highest disease risk casts light on an important informational need raised in our panel sessions.Coupled with finer-grained and more-detailed data (such as the widespread use of animal traceback protocols), panelists agreed that RUSH-PNBM -together with the metametric methodology -could represent a valuable tool which may be leveraged to identify where best to target biosecurity measures.
. Interestingly, many of the expert panelists were less interested in some of the broader structural implications that may be drawn from this study.For example, despite its demonstrated biosecurity implications, minimizing the bridging links within livestock production networks by reducing producer over-specialization was not universally deemed desirable.However, more recent engagement between our research team and owners of mid-sized, leading-edge production chain systems has revealed an openness to rethinking the role of network connectivity structures upon biosecurity within the context of their own operational networks.

Limitations of our approach .
Macro-scale socio-economic and political factors can directly and indirectly influence the economic incentives, disease patterns, and network-structural parameters embedded in RUSH-PNBM.While this study demonstrates the limited generalizability of our model across three states in the U.S., application of the model to other socioeconomic and political contexts (e.g. the EU) will require careful re-calibration and validation of model parameters, including securing relevant datasets and convening additional expert panels.Extensibility of RUSH-PNBM to generate medium-to long-term (i.e.-to -year) scenarios would also require refinement of simulation processes and parameters to account for projected changes in economic and political conditions within which the simulated networks operate.
. As noted above, practical implementation of this new meta-metric approach (i.e. to guide policymaking) will require better contextualization of the network structures and underlying theoretical assumptions governing network dynamics.Due to complex government-industry relationships and competition within the livestock production industry, empirical data concerning underlying network structures, animal movements, epidemic spread data, and other important factors are not typically easily available.Fortunately, ABMs -acting as virtual laboratories (Macal & North ) -enable testing of alternate assumptions about network structures and their underlying dynamics.As described above, widespread adoption of animal traceback protocols would go a long way toward solving the data availability issue.
Directions for future research .
Previous studies investigating the impact of network positionality on epidemiological risk di erentiate between vulnerability -which we have used as our dependent variable -and infectivity, i.e. the frequency with which an actor spreads the disease to others (Rothenberg et al. ; Ghani & Garnett ).Understanding both factors is important to guide the implementation of interventions which curb disease spread, as some protocols are designed to prevent incoming disease threats, whereas others are aimed at limiting outgoing infection spread.An area for future research is to develop GP-derived meta-metrics that correlate with a node's infectivity rather than its vulnerability.For example, it would be interesting to determine whether the finding of Ghani & Garnett ( ) -that infectivity is more contingent on global network structures -is corroborated.This would o er a more complete picture of the overall risk environment.

.
In addition, future versions of the RUSH-PNBM model itself will be increasingly honed.For example, our team will conduct survey research to evaluate the economic and human dimensions underlying choices to adopt methods and technologies aimed at bolstering producer-level biosecurity.Another future research goal is to use experimental gaming data to determine the circumstances under which individuals choose to adopt biosecurity protocols in risky and/or uncertain decision environments (Merrill et al. ).We can then program the agents in RUSH-PNBM to adapt their behavior under parallel conditions in the model.Coupling such insights with network-based risk analyses -especially those calibrated around real-world production chain systems -will allow for a deeper understanding of the human-behavioral solutions that may promote the adoption of riskmitigating protocols, as well as helping to focus biosecurity protocol adoption upon the nodes at which it will have the greatest e ect on system-wide disease resilience.

Figure :
Figure : Agent connectivity key.Connections also illustrate potential disease transmission vectors.

Figure :
Figure : Sensitivity analysis plots for four key parameters.Scatter points show average values at each step, colored regions show % CIs, and dashed lines show linear trends.Blue represents North Carolina, red Iowa, green Illinois, and black the combined dataset.Pearson correlation coe icients, p-values, and R 2 values of linear regressions appear in legends below each figure.

Figure :
Figure : Weighted and unweighted degree distributions by agent type for each study area, across all runs.Plotted with log-scaled x and y axes.bins were used for producers, and for feed mills and slaughter plants.

Figure :
Figure : Sample model-generated production chain maps (le ), and corresponding network graphs (right), for each study area.Maps show agents of each type (see Figure for key) and infection-spreading edges in red.Graphs show contact (black) and infection-spreading (red) edges.Nodes are positioned using a spring layout.Gray nodes are producers, yellow are feed mills, and red are slaughter plants.
Figure : Correlations between key network-level metrics and the average vulnerability of nodes in each run.Weighted (W) and unweighted (UW) statistics are indicated.Scatter points are averages for each run, color coded by study area.Black lines show best fit, with Pearson correlation statistics and R 2 given in legends both by state and overall.

Figure :
Figure : Correlations between key network-level metrics and node-level vulnerability.Weighted (W) and unweighted (UW) statistics are indicated.Scatter points represent each agent in each run, color coded by study area.Black lines show best fit, with Pearson correlation statistics and R 2 given in legends both by state and overall.

Figure :
Figure : Correlations between centrality metrics and vulnerability.Weighted (W) and unweighted (UW) statistics are indicated.Scatter points represent each agent in each run, color coded by study area.Lines show best fit for each state as well as overall, with Pearson correlation statistics and R 2 given in legends both by state and overall.

Figure :
Figure : Correlations between key node-level metrics and vulnerability.Weighted (W) and unweighted (UW) statistics are indicated.Scatter points represent each agent in each run, color coded by study area.Lines show best fit for each state as well as overall, with Pearson correlation statistics and R 2 given in legends both by state and overall.

Figure :
Figure : Correlations between six GP solutions and vulnerability.Scatter points represent each agent in each run, color coded by study area (blue = NC; red = IA; green = IL).Lines show best fit for each study area as well as overall, with Pearson correlation statistics and R 2 given in legends.Note that, unlike the R 2 values given in Table -which are calculated only on the GP validation set -the R 2 values here are calculated on the full dataset output from the model.
Although the current version of RUSH-PNBM was built to study PEDv transmission in three U.S. states, its general structure allows for re-parameterizations focused on increased context specificity and/or adaptation to other diseases, study areas, or livestock species.This flexibility suggests that the model could serve as a valuable tool for practitioners to assess epidemic risk in a variety of livestock production contexts.

mill disease spread probabilities
Because FLAPS only covers animal production units, feed mill and slaughter plant locations are initialized by distributing them at random positions within each county in proportion to the number of producers in the county, with the overall numbers per state being derived from available datasets in conjunction with expert advisory panels.These non-producer agents are assigned service areas at initialization by having each producer agent connect to the nth-closest of each type, with n being drawn from Poisson distributions calibrated in consultation with industry experts.For example, in the case of feed mills, λ = 1.5, indicating that a producer is most likely to purchase feed from the first-or second-closest mill, with fewer purchasing from the third-closest, fewer still from the fourth-closest, etc.Alternatively, for slaughter plants, λ = 2, indicating that the most likely outcome is for a producer to ship hogs to the second-closest plant.A limitation of RUSH-PNBM is that livestock Prob. feed mill will become infected if returning feed truck is contaminated .EAP Prob.feed truck will become contaminated if feed mill is infected .EAP Slaughter plant disease spread probabilities Prob.slaughter plant receiving area will become infected if pig batch is infected .EAP Prob.pig truck will become contaminated if receiving area is infected .EAP Producer farrow, wean, and batch parameters Farrow to wean sow proportion (relative to total capacity)Table:Model parameters and values explored in the experiment."EAP"indicates that the value was derived through expert advisory panel sessions."FHPC"refers to the family-owned hog production chain system dataset.. and feed transfers occur only between model agents; thus, since the model's spatial extent is a single U.S. state, interstate transportation and trade is not represented.

in Avg. Vulnerability Parameter North Carolina Iowa Illinois All Study Areas
Table , with ten replications per step.Table shows the elasticity of the response variable -average vulnerability -across this range for each study area.Figure visualizes the sensitivity analysis data and provides correlation statistics.
.Results show that the model is moderately-sensitive to changes in Prob.producer to pig truck infection and Prob.feed truck to feed mill infection; with the e ect on average vulnerability being positive and significant in all study areas.It would appear that Prob.producer to pig truck infection has a larger impact on the response variable% ChangeTable:Elasticity of response variable (Avg.Vulnerability) resulting from variation between % and % of baseline values for four key parameters.
Mean values and % confidence intervals for epidemiological statistics and network metrics across agents in each study area.

.
Previous research in this area has largely explored bivariate correlations between of a suite of metrics and epidemiological outcomes -at either the network or individual level -in both simulated and real-world net-