A Hybrid Agent-Based and Equation Based Model for the Spread of Infectious Diseases

Both agent-based models and equation-based models can be used to model the spread of an infectious disease. Equation-based models have been shown to capture the overall dynamics of a disease outbreak, while agent-based models are able to capture heterogeneous characteristics of agents that drive the spread of an outbreak. However, agent-based models are computationally intensive. To capture the advantages of both the equation-based and agent-based models, we create a hybrid model where the disease component of the hybrid model switches between agent-based and equation-based. The switch is determined using the number of agents infected. We first test the model at the town level and then at the county level, investigating different switch values and geographic levels of switching. We find that a hybrid model is able to save time compared to a fully agent-based model without losing a significant amount of fidelity.

The additional detail and fidelity gained from using real data comes with a cost: the more detailed an agent-based model becomes, the more computing time and power is needed to run the model. Ferguson et al. ( ) mention the high computational requirements for their model, with each model run for the US simulation taking one to two hours using GB of RAM and CPUs. Scalability is often an issue when creating a highly detailed model.
Agent-based models also tend to be stochastic, which can lead to a level of uncertainty around the results of the model. However, this stochasticity is also an advantage of agent-based models for infectious diseases as it allows for the model to simulate a range of possible scenarios from the same initial conditions. This is important in understanding an outbreak and how there are multiple courses that an outbreak might take based on chance and the individual decisions and interactions by agents.
Although there are some disadvantages of agent-based models (most importantly their computational cost, but also the possible uncertainty around the model results and difficulties in interpreting and validating models), their advantages make them an important tool in infectious disease modelling. We believe the ability to simulate heterogeneous agents and their interactions, and the ability to produce a range of scenarios for the same outbreak, are important in modelling an infectious disease, and it is essential to retain these in any hybrid architecture we create.

Equation-based models
Equation-based models are another type of epidemiological model, with the most common type used for infectious disease modelling being the compartmental model. A compartmental model is made up of a set of differential equations (Hethcote ). The population in a compartmental model is assumed to be homogeneous and well mixed. Each compartment is defined with its own differential equation (Duan et al. ). The simplest compartmental model is the SIR model, where the population is split into three compartments: susceptible individuals (S), infected individuals (I), and recovered individuals (R) (Hethcote ). Typical variations of the SIR model include the SEIR model (susceptible, exposed, infected and recovered), the SIS model (susceptible, infected and susceptible) and the SEIRS model (susceptible, exposed, infected, recovered and susceptible). The models can be made more complicated and realistic by adding additional compartments for various characteristics of agents, including age groups or vaccination status.
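For reference, a common textbook form of the SIR model described above is sketched below. It assumes a closed population of size N = S + I + R, a transmission rate β, a recovery rate γ and frequency-dependent mixing; the symbols and the scaling by N are assumptions made here for illustration and are not taken from the cited papers.

```latex
\begin{aligned}
\frac{dS}{dt} &= -\beta \frac{S I}{N}, \\
\frac{dI}{dt} &= \beta \frac{S I}{N} - \gamma I, \\
\frac{dR}{dt} &= \gamma I .
\end{aligned}
```

Summing the three equations gives dN/dt = 0, so the total population is conserved, which is the "closed population" assumption made explicit.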
Similar to agent-based models, equation-based models can be used both to study disease dynamics and to analyse a specific outbreak or epidemic. For example, to study disease dynamics Hogan et al. ( ) created an age-structured model, where each age group has its own compartments, for Respiratory Syncytial Virus, a common childhood infection. Such a model can be useful when simulating age-dependent interventions such as vaccination; for example, the effects that vaccination rates have on measles outbreaks are studied using the Pang et al. ( ) model. There are many examples of equation-based models being used to analyse a specific outbreak or epidemic. For example, Vaidya et al. ( ) model the spread of H N in a rural university town and determine that a portion of the susceptible population was protected from infection through self-isolation, social distancing or other preventative measures, and that this protected population played a substantial role in the dynamics of the epidemic. Mamo & Rao ( ) show that, since isolation is commonly used in the treatment of Ebola cases, an additional isolated compartment is needed to best capture the dynamics of Ebola spreading in West Africa. Equation-based models have also been used to help shape policy during an outbreak: a series of models were used to help inform policy decisions to control the foot-and-mouth disease epidemic in the UK (Kao ). Other models have been created to investigate the seasonality of measles (Keeling & Grenfell ).
Difference equations, or discrete time models, are another type of equation-based compartmental model, although they are not used as frequently as differential equations. They are similar to differential equations but are defined over discrete time instead of continuous time. Difference equations can exhibit behaviour that a differential equation cannot, with even a simple non-linear difference equation being able to show chaotic behaviour (Allen ). While most epidemic models in the literature are differential equations, there are some that use difference equations to better capture the dynamics of an outbreak. Keeling & Grenfell ( ) refer to discrete time models as being some of the most accurate models of measles.
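The point about chaotic behaviour can be illustrated with the logistic map, a standard example of a simple non-linear difference equation. The sketch below is not an epidemic model and is not drawn from the cited work; the function name and parameter values are chosen here purely for illustration.

```python
def logistic_map(r, x0, steps):
    """Iterate the difference equation x_{n+1} = r * x_n * (1 - x_n)."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

# r = 2.5: the trajectory settles to the fixed point 1 - 1/r = 0.6.
stable = logistic_map(2.5, 0.2, 200)

# r = 4.0: chaotic regime, where two almost identical starting
# points end up on very different trajectories.
a = logistic_map(4.0, 0.2, 50)
b = logistic_map(4.0, 0.2 + 1e-9, 50)
max_gap = max(abs(u - v) for u, v in zip(a, b))
```

The same one-line update rule produces either a stable equilibrium or sensitive dependence on initial conditions, depending only on the value of r.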
The main advantages of equation-based models are their ability to capture the macro level dynamics of an infectious disease outbreak and their ability to do so at little computational cost. However, there are some disadvantages to using an equation-based model. Equation-based models cannot provide detailed information on the spread of the disease. In addition, the small set of variables that are used in an equation-based model may not be enough to define an outbreak. Assuming that the population is homogeneous within a compartment can also be a problem, as it does not capture the individual variations and actions that can have a major impact on the course of an outbreak (Duan et al. ). Equation-based homogeneous mixing models for fatal diseases such as AIDS have been known to fail because individuals adapt their behaviour to the epidemic (Duan et al. ). As equation-based models are at the population level, they often miss individual-level dynamics that can play an important part in an outbreak, especially when there are a smaller number of cases.

Hybrid Models
Hybrid agent-based models are a way to combine the advantages of the "top down" equation-based model and the "bottom up" agent-based modelling method while reducing the limitations of both (Marilleau et al. ). The hybrid allows for further scaling and modelling of a larger population while still keeping a heterogeneous population.
Although not abundant in the literature, there are some examples of hybrid agent-based models for infectious disease epidemiology. These hybrid models tend to fall into two major categories: a system where some parts are modelled using agents and other parts are modelled using an equation-based model, and a system that uses both an equation-based and an agent-based model and switches between the two (Binder et al. ). The model by Bobashev et al. ( ) is an example of the latter. They use a hybrid model to study pandemic influenza. The model is made up of cities in a network with transportation between the cities. When a city reaches a certain number of infected agents it switches to an entirely equation-based model. Kasereka et al. ( ) create a model that falls into the former category, where agents move between cities based on an agent-based model but the disease model is an equation-based model. Similarly, Yoneyama et al. ( ) use an equation-based disease component in their hybrid agent-based model. Hybrid models are also used in infectious disease epidemiology for non-human diseases. Bradhurst et al. ( ) and Bradhurst et al. ( ) create a model for the spread of foot-and-mouth disease in livestock. In the model, within-herd disease dynamics are modelled using an equation-based model and between-herd dynamics are modelled using an agent-based model. This is because the authors feel that, within a herd of cattle, interactions and contact patterns are relatively homogeneous. Thus it is not necessary to model those dynamics with an agent-based model, but for the between-herd dynamics it is more important to model the heterogeneity that occurs in these interactions.
Agent-based models are a way to capture the heterogeneity in a system that helps to drive the dynamics of that system. However, the heterogeneity can result in larger models that take more computational power and time. Hybrid models are a way to still capture that heterogeneity while reducing the computational power necessary to run the model. It is important, though, to decide in which parts of the model the fidelity can be reduced by making them equation-based, or else the model will lose performance. While making the disease portion of the model equation-based can save time and computing power, it misses individual agents' actions and the importance of contacts and different contact patterns between agents in the spread of a disease. Switching between agent-based and equation-based models allows the contact patterns in the early stages of the outbreak to help drive the infectious disease spread, but saves time when the outbreak gets large enough that a few individual movements and interactions no longer have as large an impact on the outbreak because there are enough other agents infected. However, switching the entire model over, or an entire city, ignores the fact that transportation between cities and the movement of agents is not homogeneous.
We propose a model that takes advantage of both versions of the hybrid architecture. The model uses a switching point to change from an agent-based to an equation-based model and back. However, the entire model does not switch; instead, only the disease model switches to an equation-based model and the rest of the model remains agent-based.

Town Hybrid Model
A town hybrid model is created based on the Hunter et al. ( a) model, which was designed to simulate the spread of measles through an Irish town, with a few changes. There are two main motivations behind the changes. The first is to improve the efficiency of the model: both the environment component and the transportation component are altered to improve efficiency. The second is to create a hybrid architecture: the disease component is altered to implement a hybrid model. The model is a hybrid agent-based and equation-based model where the disease component of the model switches between agent-based and equation-based when a certain percentage of agents are infected. The model is tested using the town of Schull in Ireland. Schull is a small town that has a population of approximately , individuals and was selected as there was a measles outbreak in the town in . As the model was created to simulate the spread of measles, which is mainly a childhood disease, we assume the main source of infection will be within schools and the networks between children. Thus we do not include large events or locations such as concerts, sporting events, retail stores, restaurants or gyms that may lead to super spreading events in other diseases but would not be largely attended by children. However, super spreading events can happen in the school setting.

Model components
The following sections give a brief overview of the model, describing the four main components of an agent-based model taxonomy outlined in Hunter et al. ( ).

Environment component
In an agent-based model the environment can be created with a high level of detail. For example, in the Hunter et al. ( a) model a town is made up of multiple small areas. In Irish Census data, small areas are the smallest geographic region over which the census is aggregated. Each small area contains between to dwellings (CSO ). Within the small areas the model uses real zoning data to assign agents' homes to residential areas and workplaces to industrial or commercial areas. In addition, schools are placed in the correct location. This version of the model, however, reduces environmental fidelity so that a small area is represented by one grid cell or Netlogo patch, as is done in the Hunter et al. ( ) model. When in a small area, an agent has access to certain information such as the number of schools or workplaces in the small area and the real world distance to each other small area in the model. All agents in the same small area are physically in the same location; however, the agents keep track of where they are in that small area (home, work, school or community) and restrict their interactions with other agents accordingly.

Society component
The society component is based on real world census data from the Irish Central Statistics Office (CSO ). For each small area we create a population that reflects the population statistics of that small area, including age, sex, household size and economic status. To be able to fully test the hybrid model, any previous immunity to the disease is not included in the model. This allows for larger outbreaks and more switching in the hybrid model. If immunity from vaccination and from previously having had measles was included in the model for the town of Schull, only % of agents would be susceptible. Thus the threshold for switching, discussed in later sections, would have to be well below %. As we are aiming to investigate the hybrid dynamics in this paper, we do not include immunity. Social networks are included in the model. There are three possible network types an agent could have: a family network made up of any agents living in their household, a work or school social network made up of other agents in their workplace or their school, and a class network for students that is made up of agents who are in their school and of the same age. If an agent is at home they will only come into contact with members of their family network who are also at home; if they are at work or school they will only come into contact with agents in their work or school networks who are also at work or school. If an agent is in the community they will have the highest chance of coming into contact with other agents in their family network who are also in the community, followed by agents who are in their work or school network, and finally, agents who are not in any of their networks.

Transportation component
Transportation in the model differs from the Hunter et al. ( a) model. Instead of moving in steps between a location and a desired destination, agents move in one step. Agents' movements are determined in one of two ways. Movements are either predetermined, with the agents moving between home and school or home and work at certain times in the model, or are determined randomly when an agent moves through the community. Agents moving through the community will pick a destination randomly from the small areas in the model. Although random movement is not completely realistic, at a small scale we feel that it is an acceptable approximation of how agents will move through a town.
In the agent-based disease component, agents will move between susceptible, exposed, infected and recovered states based on the disease dynamics and their interactions with other agents. If a susceptible agent comes into contact with an infectious agent, there is a percent chance the susceptible agent will be infected. If they are infected, the susceptible agent moves to the exposed state for a given period of time, where they are not infectious, before moving to the infectious state, where they can infect other agents. Then they will move to a recovered state, and once recovered they cannot be reinfected.
The model was originally created to simulate the spread of measles, thus the disease dynamics mimic measles. An agent will remain in the exposed state for an average of days (Nelson & Williams ). While in the exposed state the agent will not be infectious. The agent will then move to an infectious state where they will remain infectious for an average of days (Nelson & Williams ). The time an agent remains infectious in the model is determined for each agent from a normal distribution with a mean of and a standard deviation of . . To determine the probability of transmission per contact we use the method used in Hunter et al. ( a), which utilizes the components of the basic reproductive number, $R_0$, to find this probability.
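The state transitions described above can be sketched as a per-agent update rule. This is a minimal illustration rather than the authors' NetLogo implementation: the `Agent` class, the `step_agent` function, all parameter names, and the truncation of the sampled infectious period at one time step are assumptions introduced here. A full model would also evaluate all agents against a snapshot of the previous time step rather than updating them one after another.

```python
import random
from dataclasses import dataclass

@dataclass
class Agent:
    state: str = "S"    # one of "S", "E", "I", "R"
    timer: float = 0.0  # time steps left in the current E or I state

def step_agent(agent, contacts, p_transmit, exposed_steps,
               infectious_mean, infectious_sd, rng):
    """Advance one agent by one time step of an agent-based SEIR process."""
    if agent.state == "S":
        # Each contact with an infectious agent is an independent
        # chance of infection with probability p_transmit.
        for other in contacts:
            if other.state == "I" and rng.random() < p_transmit:
                agent.state, agent.timer = "E", exposed_steps
                break
    elif agent.state == "E":
        agent.timer -= 1
        if agent.timer <= 0:
            # Infectious period drawn per agent from a normal distribution
            # (truncated at one step here, which is an assumption).
            duration = max(1.0, rng.gauss(infectious_mean, infectious_sd))
            agent.state, agent.timer = "I", duration
    elif agent.state == "I":
        agent.timer -= 1
        if agent.timer <= 0:
            agent.state = "R"  # recovered agents cannot be reinfected
```

Passing the random number generator in explicitly keeps runs reproducible, which matters when many stochastic runs are compared against each other.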

Disease component: SEIR model
The equation-based part of the disease component uses a SEIR difference equation model. Although differential equations are more common in infectious disease modelling, we chose difference equations because they are modelled using discrete time, which is more analogous to the agent-based model and will allow for a more seamless transition between the two models. In the simulation, each geographic area selected runs its own SEIR difference equation model. The model can be run at the small area level or the town level. The equations are as follows: where $S_i$ is the number of susceptible agents in the geographic area in the previous time step and $S_{i+1}$ is the number of susceptible agents in the geographic area in the current time step. $E_i$ and $E_{i+1}$ are the number of exposed agents in the geographic area in the previous and current time steps, $I_i$ and $I_{i+1}$ are the number of infected agents in the geographic area in the previous and current time steps, and $R_i$ and $R_{i+1}$ are the number of recovered agents in the geographic area in the previous and current time steps. $\beta$ is the infection rate or the probability of infection per contact between agents, $\sigma$ is the rate of moving from exposed to infected and $\gamma$ is the recovery rate.
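As an illustration of how such a model advances in discrete time, the sketch below implements one step of a standard SEIR difference equation model using the symbols defined above. The exact form is an assumption: in particular, dividing the infection term by the total population N (frequency-dependent mixing) is a common convention and may not match the scaling used in the paper's own equations. The function name and the parameter values in the usage lines are hypothetical.

```python
def seir_step(s, e, i, r, beta, sigma, gamma):
    """One step of a discrete-time SEIR model (assumed standard form).

    S_{i+1} = S_i - beta * S_i * I_i / N
    E_{i+1} = E_i + beta * S_i * I_i / N - sigma * E_i
    I_{i+1} = I_i + sigma * E_i - gamma * I_i
    R_{i+1} = R_i + gamma * I_i
    """
    n = s + e + i + r
    new_exposed = beta * s * i / n if n > 0 else 0.0
    new_infectious = sigma * e
    new_recovered = gamma * i
    return (s - new_exposed,
            e + new_exposed - new_infectious,
            i + new_infectious - new_recovered,
            r + new_recovered)

# Illustrative run: 1000 individuals, 10 initially infected.
state = (990.0, 0.0, 10.0, 0.0)
for _ in range(100):
    state = seir_step(*state, beta=0.5, sigma=0.2, gamma=0.1)
```

Because every flow leaving one compartment enters the next, the total population is conserved at each step.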
In a fully equation-based disease component, each geographic area starts its difference equation model when an infected or exposed individual enters the area. In a hybrid model the difference equation model will start when the number of infected or exposed individuals is over a certain threshold. The threshold is discussed further in the next section. This could happen in two ways: either an agent from outside the area who is already exposed or infected moves into the area, or an agent who is from the area becomes infected outside and returns home. Once the difference equation model has started it continues until the number of exposed or infected agents in the model goes below the threshold.

At each time step, each geographic area will calculate the values for the difference equations and adjust the number of agents in the area in each category. If the rounded difference between $E_{i+1}$ and the number of exposed agents in the area is greater than , that number of susceptible agents in the small area will randomly be selected to move from the susceptible category to the exposed category. Similarly, if the rounded difference between $I_{i+1}$ and the count of infected agents in the area is greater than , then that number of exposed agents will be randomly selected to move from exposed to infected. If the rounded difference between $R_{i+1}$ and the count of recovered agents in the area is greater than , then that number of infected agents in the area will recover.
As movement between areas is possible in the model, there are times when the total number of agents in the area in one of the four compartments is different from the value predicted in the model. Adjustments are made to account for this. If the value for $E_i$, $I_i$, or $R_i$ is less than one and the count of agents exposed, infected or recovered in the area is greater than one, then the value for $E_i$, $I_i$, or $R_i$ is changed to the count of agents in that area who are exposed, infected or recovered. If the difference between $E_i$, $I_i$, or $R_i$ and the number of agents exposed, infected or recovered respectively in the geographic area is greater than the number of agents who could potentially move into the compartment (for example, if the difference between $E_i$ and the count of agents exposed is greater than the number of susceptible agents), the value for $E_i$, $I_i$, or $R_i$ is adjusted down to reflect the actual counts of agents in the geographic area.
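The adjustment of agents to match the equation values can be sketched as follows. This is one possible reading of the procedure, not the authors' code: the function name, the use of a positive rounded difference as the trigger, and the order in which compartments are processed (recovered first, then infected, then exposed, so that no agent moves more than one compartment per step) are all assumptions made here.

```python
import random

def sync_agents(states, e_next, i_next, r_next, rng):
    """Move randomly chosen agents so compartment counts track the equations.

    states: list of "S"/"E"/"I"/"R" labels for the agents in one area.
    The list is modified in place and also returned.
    """
    moves = (("I", "R", r_next), ("E", "I", i_next), ("S", "E", e_next))
    for source, target, predicted in moves:
        needed = round(predicted - states.count(target))
        if needed <= 0:
            continue
        candidates = [k for k, s in enumerate(states) if s == source]
        # Never move more agents than the source compartment holds.
        for k in rng.sample(candidates, min(needed, len(candidates))):
            states[k] = target
    return states

# Illustrative area of 100 agents.
area = ["S"] * 90 + ["E"] * 5 + ["I"] * 5
sync_agents(area, e_next=8, i_next=6, r_next=2, rng=random.Random(1))
```

Processing the downstream compartments first means the counts for all compartments match the predicted values after a single pass, provided enough agents are available in each source compartment.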
The model allows for geographic areas to switch between the equation-based model and the agent-based model. The idea behind using a switch is that agent-based models are especially important when only a few agents are sick, because at this stage individual movements are what drive the spread of the disease, so the heterogeneous movements of agents are more important. For example, if the one infected agent decides to stay home the outbreak might not take off, whereas it might if they decide to go to school or work every day. However, once the number of infected individuals reaches a certain number the individual movements should not matter as much because there are so many agents infected.
To capture this in the model, the geographic areas are allowed to switch between the agent-based model and the equation-based model. The decision of which model is used in a geographic area in a given time step is determined by the number of agents infected in that area. The user can set the switch threshold to be any percentage of agents infected or exposed, and the area will automatically switch between the agent-based model and the equation-based model when this threshold is passed. Note that if the number of infected or exposed agents in an area drops back below this threshold the model reverts to an agent-based model.

In the town model we consider two levels of the switch, the small area level and the town level. If the switch occurs at the small area level, then each small area will keep track of the number of agents who are infected and exposed in that small area. When the percentage of agents who are exposed or infected in the small area is equal to or greater than the selected switch value, the model switches from an agent-based disease component to an equation-based disease component. Each small area will run its own set of difference equations, and so the model can have some small areas running an agent-based disease component and some running an equation-based component. When the percentage of agents in the small area who are exposed or infected goes below the switch value, the small area returns to an agent-based disease component. If the switch is at the town level, when the total percentage of agents exposed or infected is greater than the switch value the whole model switches to an equation-based disease component. When the percentage of agents in the model who are exposed or infected is below the switch value, the model returns to an agent-based disease component.
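The two switching levels can be sketched as a small decision function. The names `use_equation_model` and `choose_modes`, the dictionary layout and the example figures are hypothetical, and for simplicity the sketch uses "equal to or greater than" at both levels, whereas the text describes the town level as strictly greater than.

```python
def use_equation_model(n_exposed, n_infected, population, threshold_pct):
    """True if the share of exposed or infected agents meets the threshold."""
    if population == 0:
        return False
    share = 100.0 * (n_exposed + n_infected) / population
    return share >= threshold_pct

def choose_modes(areas, threshold_pct, level="small_area"):
    """Pick the disease component per area.

    areas: dict mapping area name -> (n_exposed, n_infected, population).
    level: "small_area" decides per area, "town" decides once for all areas.
    """
    if level == "town":
        e = sum(a[0] for a in areas.values())
        i = sum(a[1] for a in areas.values())
        n = sum(a[2] for a in areas.values())
        mode = "equation" if use_equation_model(e, i, n, threshold_pct) else "agent"
        return {name: mode for name in areas}
    return {name: ("equation" if use_equation_model(*a, threshold_pct) else "agent")
            for name, a in areas.items()}

# Illustrative counts for two small areas.
areas = {"A": (3, 2, 50), "B": (0, 1, 200)}
per_area = choose_modes(areas, threshold_pct=5, level="small_area")
whole_town = choose_modes(areas, threshold_pct=5, level="town")
```

In this example area A has 10% of its agents exposed or infected and switches, while the town as a whole sits at 2.4% and stays agent-based, which shows why the small area level can sustain switching at higher thresholds than the town level.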
It is important to note that if the switch is set to % the model will be completely agent-based. If the switch is at % the disease component will always be equation-based. If the switch is at the town level then a switch at % results in an entirely equation-based model as the location of a given agent does not influence if that agent becomes infected. This is because when the model is switched at the town level all agents are considered in the same equation-based model and will thus mix homogeneously. Thus the results of the town hybrid model with a switch at % are only influenced by the initial conditions of the model such as the total number of agents, the number of initially infected agents or the number of immune agents at the start of the model.

Experiments and results
In this section we report a number of experiments on our hybrid town model that were designed to test whether our hybrid model successfully blends the fidelity of agent-based models with the computational efficiency of equation-based models. In these experiments we treat the behaviour of a completely agent-based model (i.e., a model with a switch threshold of %) as the ground truth because agent-based models are considered to have the higher fidelity of the two modelling approaches. Comparing the results of a hybrid model to an agent-based model has precedent in the literature, with Bobashev et al. ( ) using their agent-based model as the standard to compare their hybrid to, as it has the most micro level detail. Consequently, if the hybrid model produces similar results to a completely agent-based model, while using fewer computational resources, then we can consider our hybrid modelling approach to be successful.
Note that there are two hyper-parameters that may affect the performance of the hybrid model. The first hyper-parameter is the geographical scale that the switch is applied at: small area or town level. The second is the threshold within the relevant geographic area that is used to switch between the agent-based and equation-based models. To test the interactions between these hyper-parameters and the hybrid model performance, in each of the following experiments we run the following hybrid models: town switch with % threshold, town switch with % threshold, town switch with % threshold, small area switch with % threshold, small area switch with % threshold, small area switch with % threshold, small area switch with % threshold, and small area switch with % threshold. We limit the threshold at the town level to % because at higher thresholds there are not enough agents infected or exposed at the same time for switching to occur in the majority of runs. Experiments with higher thresholds determined that a threshold higher than % does not result in sustained use of the equation-based disease component. Similarly, through investigating different switch values for the small area level we determined that after % there are not enough agents infected or exposed to sustain switching. The small area level model has a higher switch threshold because the small areas have a smaller number of agents in them than the town, and thus need fewer agents infected and exposed to reach the threshold. For example, for Schull, which has a population of , , a threshold of % at the town level would require agents in the town to be infected or exposed at the same time to switch. A threshold of % at the small area level would only require about agents to be exposed or infected at the same time in the small area to switch (Schull is made up of seven small areas and the average population across the small areas is ). Also, because of the stochastic nature of agent-based models, we run each model times and use statistics calculated across these runs to compare with other models.
Within the above experimental framework, the first experiment we report is a sense-check analysis that counts the number of switches a hybrid model makes between the agent-based and equation-based components. The motivation for this experiment was that if we found that a hybrid model rarely switches, and remains agent-based for the majority of the runs, then the hybrid model is not useful. The second experiment we report analyses the time saved by a hybrid model when it switches to an equation-based disease component. To examine the time saved we compare the average number of seconds needed per time step of the hybrid model with that of a fully agent-based model. The final two experiments we report in this section are designed to compare the fidelity of the hybrid models with the fully agent-based model. The first of these fidelity experiments analyses the divergence between the number of infected agents in the hybrid models and the fully agent-based model. The second fidelity experiment analyses the divergence between the length of outbreaks in the hybrid models and the fully agent-based model.

Finally, switching to the equation-based disease component in the hybrid architecture will result in a loss of fidelity in the model results as the advantages of the agent-based disease component are lost. However, some of the advantages of using the equation-based component might outweigh the cost of losing the fidelity of the model. Consequently, we conclude these experiments by identifying a set of hyper-parameters (geographic switch area and switch threshold) for our hybrid model that usefully balances model fidelity and time savings.

Number of runs that switch
We first look at the number of runs that result in the disease model switching to equation-based. There are some cases where the model does not switch over to the equation-based component because the required number of agents is never infected. Table shows the percentage of runs in which the model switches for the small area switch and the town switch, along with the % confidence intervals for each value. As the switch threshold increases, the percentage of runs that switch to an equation-based model decreases. This is expected, as a higher percentage equates to a larger number of agents required to be exposed or infected before the model switches to equation-based from agent-based, and infecting a larger number of agents will take more time steps.

Run time
Before looking into the results, it is important to determine if using the hybrid will actually result in real savings when running the model. To test for this we find the average number of seconds per time step in each of our versions of the model.
It can be seen in the figure that as the switch gets higher, a higher percentage of agents need to be exposed or infected before the model switches to an equation-based disease component [...]
The outbreak size distributions for the switching models can be seen in Figure .
Although it is useful to visualize the change in distribution, it is possible to actually compare the distributions and get a value for the probability that the sample distributions come from the same population. The Wilcoxon rank sum test is a non-parametric alternative to a two-sample t-test that does not assume the population distribution is normal. The null hypothesis of the test is that the two populations have the same distribution. A Wilcoxon rank sum test is done for each of our distributions from the switching models compared to the distribution from the completely agent-based model. The p-values for those tests can be found in Table .
Table : P-values for the Wilcoxon rank sum test comparing the outbreak size distributions for the switching models to the completely agent-based model.
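To make the test concrete, the sketch below computes a two-sided Wilcoxon rank sum test using the large-sample normal approximation. It is an illustration only: the paper does not state which software or variant of the test was used, the function name is hypothetical, and in practice a library routine such as `scipy.stats.ranksums` would normally be preferred.

```python
import math

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum test via the normal approximation.

    Returns (z, p_value). Tied values receive average ranks and the
    variance is corrected for ties; no continuity correction is applied.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    tie_term = 0.0
    pos = 0
    while pos < n:
        end = pos
        while end < n and pooled[end][0] == pooled[pos][0]:
            end += 1
        t = end - pos                      # size of this group of ties
        avg_rank = (pos + 1 + end) / 2.0   # mean of ranks pos+1 .. end
        in_x = sum(1 for k in range(pos, end) if pooled[k][1] == 0)
        rank_sum_x += avg_rank * in_x
        tie_term += t ** 3 - t
        pos = end
    mean_w = n1 * (n + 1) / 2.0
    var_w = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_w == 0:
        return 0.0, 1.0                    # every observation is identical
    z = (rank_sum_x - mean_w) / math.sqrt(var_w)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value
```

A large p-value means the test finds no evidence that the two samples come from different distributions, which is the outcome sought when comparing a hybrid model to the fully agent-based model.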
The values show that as the switch gets larger, the distribution gets closer to that of the agent-based model. This can easily be explained: the larger the switch, the longer the model remains agent-based, and so the more similar the two distributions will be. Our aim was to find a range of switch points that still result in the model switching between an agent-based and equation-based disease model but that produce a distribution similar to that of the completely agent-based model. From the table we can see that a switch of % at the small area level results in a distribution that is not significantly different from the purely agent-based model and that a switch of % at the town level results in a distribution that is not significantly different from the agent-based model distribution at a % significance level.

Length of outbreak
Finally, the total time it takes for an outbreak to finish is investigated. An outbreak is considered finished if there are no agents exposed or infected within the model. The outbreak length is studied because it is another important characteristic of model output. If the hybrid model has a similar outbreak size but a different outbreak length, then it is not possible to say that the outbreaks are similar. The distributions of the number of time steps it takes for the outbreak to finish for the small area switching model and the town switching model can be seen in the corresponding figures. The p-values for the Wilcoxon rank sum tests comparing these distributions to that of the completely agent-based model can be found in Table . From the table it can be seen that as the switch increases the distribution gets closer to that of the agent-based model. For the switch at the small area level we can see that when the switch is % the distributions are not significantly different at the % significance level, and for the switch at the town level when the switch is at % the distributions are not significantly different at the % level.
Table : P-values for the Wilcoxon rank sum test comparing the outbreak time distributions for the switching models to the completely agent-based model.
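The definition of a finished outbreak used above translates directly into a small helper. The function name and the list-based inputs are hypothetical and chosen for illustration; the sketch assumes the exposed and infected counts have been recorded once per time step.

```python
def outbreak_length(exposed_counts, infected_counts):
    """Return the first time step with no exposed or infected agents.

    Returns None if the outbreak is still running when the series ends.
    """
    for step, (e, i) in enumerate(zip(exposed_counts, infected_counts)):
        if e + i == 0:
            return step
    return None

# Illustrative series recorded over five time steps.
length = outbreak_length([1, 2, 1, 0, 0], [0, 1, 2, 1, 0])
```

Collecting this value over many runs gives the outbreak length distribution that is compared between the hybrid and fully agent-based models.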
Based on the above results we can see that the hybrid model is able to switch between agent-based and equation-based for a majority of runs when the switch is % or below and switching at the small area level, and % or below and switching at the town level. In addition, at all values of the switch considered there are significant time savings when running the hybrid model over the purely agent-based model. The experiments to analyse the fidelity of the hybrid model show that at lower switch values, the hybrid model distributions for both the total number of agents infected and the length of the outbreak are significantly different from the agent-based model distributions. However, at a switch of % for the small area switch and % for the town switch, statistical tests show that there is not a significant difference between the hybrid and agent-based distributions. Although both the small area and town level models result in time savings and produce results that are statistically similar to the agent-based model, the time savings are greater at the town level because the town level switch converges faster to the agent-based model results. A hybrid model switching at the small area level with a threshold switch of % is statistically similar to a fully agent-based model and has a time savings of an average of one millisecond per time step, while a hybrid model switching at the town level with a threshold of % is also statistically similar to a fully agent-based model but has a time savings of an average of . milliseconds per time step. Because of this we feel that switching at the town level rather than the small area level provides a greater advantage.

County Hybrid Model
The hybrid model for a single town is a first step in showing that a hybrid model can save both time and computing power when running a large agent-based model. The results show that not only does a hybrid model save computing time compared to a fully agent-based model, but the results also start to converge to the results for the agent-based model as the switch point changes. However, even though in most cases the models appear to be converging, the results are still shown to be from different distributions based on the Wilcoxon rank sum test, and any larger switch values will not save time or result in the model actually switching. One factor causing this could be that the model is run on a small town. With only about , agents in the entire model, switching can only happen on a small scale. In addition, at such a small scale the fully agent-based model does not take much time to run, so the time saved by the hybrid model is negligible in many cases. To show the true advantage of a hybrid model it will be necessary to start with a model that is

Parts of the transportation model differ between the town model and the county model. Agents still move in one step from their location to their desired destination, and some movements are predetermined by the model, with the agents moving between home and school or home and work at certain times. However, commuting patterns within the model are determined using CSO Place of Work, School or College - Census of Anonymity Records (POWSCAR) data (CSO ). This dataset provides information on the commuting patterns of people in Ireland.

The random movements throughout the community when agents are not at home, school or work are also changed in the transportation model. While we think random movement is an acceptable modelling simplification when modelling a small town with a closed population, when modelling a county it is no longer acceptable to assume random movements throughout the county: it is much more likely that an agent will remain in their own town or go to a neighbouring town in the next hour than that they will be on the other side of the county. To account for this we use a gravity-type model for transportation. A gravity model determines the interactions between two locations based on the characteristics of each location and the distance between them (Rodrigue et al. ). The probability of an agent moving to another small area is proportional to the population density of the small area (an area that has a lot of other agents is more attractive) and inversely proportional to the distance to the small area from the agent's current location (areas that are farther away are less attractive). Admittedly, the attractiveness of a location is not always correlated with population density (for example, special attractions, such as monuments, may be located away from population centres). However, we believe that using a population density based gravity model to drive movement at the county level provides a reasonable trade-off between model simplicity and realism at this geographic scale, and so provides a more accurate simplification of movement within a larger area than that in the original town model.
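The destination rule described above can be illustrated with a minimal Python sketch. It is not the model's NetLogo code: the function names, the use of straight-line distance between area centroids, the exclusion of the agent's current area, and the three example areas are all assumptions made for illustration.

```python
import math
import random

def gravity_probabilities(current, areas):
    """Return {area_name: probability} for a move away from `current`.

    `areas` maps a name to (x, y, population_density). Each candidate
    destination gets weight density / distance, as described in the text.
    Assumes all centroids are distinct, so no distance is zero.
    """
    cx, cy, _ = areas[current]
    weights = {}
    for name, (x, y, density) in areas.items():
        if name == current:
            continue  # assumption: the agent picks a different small area
        distance = math.hypot(x - cx, y - cy)
        weights[name] = density / distance
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

def choose_destination(current, areas, rng=random):
    """Draw one destination according to the gravity probabilities."""
    probs = gravity_probabilities(current, areas)
    names = list(probs)
    return rng.choices(names, weights=[probs[n] for n in names], k=1)[0]

# Hypothetical small areas: B and C are equally dense, but C is ten times
# farther from A, so B should be ten times more likely to be chosen.
areas = {"A": (0.0, 0.0, 100.0), "B": (1.0, 0.0, 200.0), "C": (10.0, 0.0, 200.0)}
probs = gravity_probabilities("A", areas)
```

With these invented coordinates, the ratio of the two probabilities equals the inverse ratio of the distances, which is the behaviour the gravity model is meant to capture.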

Disease component
Both the agent-based and equation-based disease components of the model are the same as those of the town model. Switching is also similar; however, the county model switches at either the town level or the whole county level. The model used by a town in a given time step is determined by the number of agents infected in either the town or the whole county. As in the town model, it is important to note that if the switch is %, the model will switch from agent-based to equation-based when % of agents are exposed or infected; this means the disease component of the model will always be equation-based. If the switch is at %, the model will always be completely agent-based. However, when the disease model switches at the county level, there is one set of difference equations for the whole county and the model is essentially completely equation-based. Even though agents are allowed to move, because agents are infected at the county level their location does not influence whether they will be infected. The model with a switch of % has no stochasticity in it, and the only thing that would have an impact on the results is the initial conditions: whether the model starts with more or fewer agents, with more than one agent infected, or with a number of agents who are already immune.
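The difference between switching at the town level and at the county level can be sketched in Python as follows. This is an illustration only: the function names, the `>=` comparison, and the example counts are assumptions and are not taken from the model's NetLogo code.

```python
def disease_mode(n_exposed, n_infected, population, threshold):
    """Return 'equation' once the exposed + infected share reaches `threshold`.

    `threshold` is a fraction between 0 and 1. Using >= (rather than >) is an
    assumption; the text only states that the switch depends on this share.
    """
    if population == 0:
        return "agent"
    share = (n_exposed + n_infected) / population
    return "equation" if share >= threshold else "agent"

def town_modes(towns, threshold, level="town"):
    """Decide the disease component used by every town in one time step.

    towns: {name: (exposed, infected, population)}
    level: 'town' evaluates each town on its own counts;
           'county' pools all towns and applies one decision to all of them.
    """
    if level == "county":
        exposed = sum(t[0] for t in towns.values())
        infected = sum(t[1] for t in towns.values())
        population = sum(t[2] for t in towns.values())
        mode = disease_mode(exposed, infected, population, threshold)
        return {name: mode for name in towns}
    return {name: disease_mode(e, i, n, threshold)
            for name, (e, i, n) in towns.items()}

# Hypothetical counts: a small town with a local outbreak and a large town
# that is almost untouched.
towns = {"X": (5, 5, 100), "Y": (0, 1, 1000)}
by_town = town_modes(towns, 0.05, level="town")
by_county = town_modes(towns, 0.05, level="county")
```

In this invented example town X switches on its own counts (10% of its agents are exposed or infected), while the pooled county share is only 1%, so the county-level rule keeps every town agent-based at the same threshold.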

Experiments and results
To test the county hybrid model we run similar experiments to those presented in Section to look at the switching behaviour of the model, the time savings and the fidelity of the results when compared to the fully agent-based model. We do one additional fidelity test for the hybrid model to compare how the outbreak spreads through the network of towns in the county.
For both the town switch and the county switch we look at the switch values of %, % and %, and we also look at switches of % and % at the town level. Similar to the town model, the smaller geographic area, town, continues to switch between the agent-based and equation-based disease component at a higher threshold compared to the larger geographic area, county, because the actual number of agents needed to be exposed and infected at the same time is smaller for the town than the county with the same switching threshold. For each switch value except for % we run the model times. As mentioned in the previous section, there is no stochasticity in the model with a switch of % at the county level, so the model only needs to be run once to get the results.

Number of runs that switch
In order to make sure the model is utilizing the hybrid architecture we look at a number of measures: the number of time steps that the model has switched to hybrid and the maximum number of towns that have switched to hybrid during the model.
Table shows the percentage of runs that lead to the model switching from agent-based to equation-based.

Table : Percentage of runs that lead to the model switching from agent-based to equation-based.
The table shows that for all versions of the switch the model becomes equation-based for a large portion of runs. It can also be seen from the model that it is more likely for a switch to occur if the model is switching at the county level versus the town level. However, as noted in the previous section if the switch value is % the model switching at the county level will not switch to equation-based.
A run is counted in the percentages in Table if it has switched to equation-based for at least one time step, but models switching for only one or two time steps are not taking full advantage of the hybrid architecture of the model. Thus we look at the distributions of the number of time steps that have switched from agent-based to equation-based.
Looking at the distributions of the count of time steps where the model has switched to an equation-based disease model it can be seen that when the switch value is lower, the number of time steps where at least one town has switched to equation-based increases. This is as expected and makes sense as the model should reach the point where % of agents are infected or exposed before % of agents are infected or exposed and thus will remain equation-based for longer.
The maximum number of small areas that have had their town switch to an equation-based model can be found in Figure . This is only done for the town switch model because when the model switches at the county level all towns switch together at the same time. As expected, the model with a lower switch value has a higher maximum number of small areas that have switched to equation-based. The town switch model does not result in a large portion of the model switching at any one time. With a switch of % the maximum number of small areas switched is out of a total of small areas in the county. This number reduces even more as the switch increases to %, with only a maximum of small areas in the equation-based model at any given time step.

Similar to the run time experiment in Section , we look at the run time of the model to determine if the hybrid model produces savings over the fully agent-based model. Table shows the results for the average time in seconds for each time step in the model.

From the table it can be seen that there is almost half a second of time savings per step going from a fully agent-based model to a model where the disease component switches to equation-based when % of the agents are infected or exposed. Additionally, we can see that when the model switches when there are % or % of agents infected or exposed, there are no significant time savings when compared to the completely agent-based model.

A similar time saving of over half a second per time step can be seen in the model that switches at the county level, going from the agent-based model to the hybrid model that switches at % infected and exposed. However, even though the average number of seconds per time step is . seconds less than the agent-based model when the switch is %, the average value falls within the confidence interval of the agent-based model and vice versa, showing no significant difference.

From the figure it can be seen that the distributions for switching at or % infected or exposed are similar to the fully agent-based model. Switching at % infected or exposed results in a similar distribution; however, there appear to be some more obvious differences, such as a small cluster of outliers to the right of the distribution representing a number of runs with a much higher number of total infected agents. Comparing the % switching model to the fully agent-based model, it can also be noted that there is a higher number of runs with a smaller number of infectious agents when the model switches. The distribution for the % switching model looks distinctly different from the rest of the models.

A similar analysis is done when the switch is at the county level. The distribution of the total number of infected agents can be found in Figure . From the figure it can be seen that the models that switch from an agent-based to an equation-based disease component appear more similar to the model with an equation-based disease component than to the model with an agent-based disease component. They do, however, appear to be slowly converging towards the agent-based results.
To further compare the similarities of the distributions, the Wilcoxon rank sum test is run comparing the hybrid models to the completely agent-based model. The tests are used to determine if two sample distributions come from the same population.

Table : P-values for the Wilcoxon rank sum test comparing the outbreak size distributions for the switching models to the completely agent-based model.
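For readers unfamiliar with the test, the following is a minimal Python sketch of a two-sided Wilcoxon rank sum (Mann-Whitney U) test using the normal approximation with a tie correction. It is not the software used for the results in this paper, which is not specified here, and the two samples at the end are invented numbers used only to show the call.

```python
import math

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum test via the normal approximation.

    Returns (U statistic for sample x, p-value). Tied values receive the
    average of the ranks they span and the variance is tie-corrected.
    No continuity correction is applied.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        # 0-based positions i..j-1 hold ranks i+1..j; use their average
        avg_rank = (i + 1 + j) / 2.0
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0  # every observation is identical
    z = (u - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return u, p_value

# Invented outbreak sizes for illustration only (not results from the paper).
hybrid_sizes = list(range(1, 21))
abm_sizes = list(range(101, 121))
u_stat, p_value = rank_sum_test(hybrid_sizes, abm_sizes)
```

A small p-value leads to rejecting the null hypothesis that the two samples come from the same population; a large p-value means the test found no significant difference, which is the outcome the hybrid model is aiming for.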
For the model that switches at the county level, the p-values are close to , meaning that the null hypothesis of the distributions coming from the same population should be rejected. Thus switching at the county level does not result in distributions of infected agents that are similar to the agent-based model. When the switch is at the town level, the distribution with a % switch value has a p-value very close to , so the null should be rejected as well. However, the distributions for %, %, and % are not significantly different from the agent-based model.

Length of outbreak
To compare our outbreaks we also look at the time it takes for the outbreak to finish. An outbreak is complete when there are no longer any exposed or infected agents in the model. If the run times of the models are drastically different it will be hard to compare the results, as the outbreak length is a key descriptive feature of an outbreak. To compare the lengths of outbreaks across models, the distribution of the number of time steps taken for the model to finish is examined for both versions of the model and each switch value. The distribution for the model that switches at the town level can be seen in Figure . The p-values for the Wilcoxon tests to compare the hybrid models to the agent-based model can be seen in Table .

Another aspect of the outbreak that can be considered is which towns the outbreak spreads to. As the model is run in a scenario with a highly infectious disease and the population has no previous immunity, there is a large number of infected agents in the model.

Table shows the percent of runs that lead to an outbreak in the switch model for twelve different towns in Leitrim County, along with the population of the town and a weighted degree centrality. The degree centrality is a measure of the number of agents that commute in and out of the town. Six of the towns are larger towns that are made up of multiple small areas: Ballinamore, Dromahair, Leitrim, Lurganboy, Manorhamilton, and Mohill. The other six towns are smaller towns that are made up of only one small area: Aghacashel, Corrala, Glenfarn, Newtowngore, Munakill and Rinn. The results are given for each version of the model based on the switch and for the fully agent-based model. The version of the model with the fully equation-based disease component is not included, as that model results in nearly all the agents becoming infected in every run, and thus every town would have an outbreak in each run. From the results it can be seen that for the larger towns, the majority of the runs result in outbreaks, and for the completely agent-based model all of the larger towns have % of runs that lead to an outbreak. This is the same as the percent of runs that lead to an outbreak in the overall model, with % of runs having at least two agents in the county infected. This is likely because the model is run without immunity in order to fully test the hybrid model: when there is an outbreak in a county where no agents are immune, it will spread to the larger towns, which are likely more central, with more agents commuting in and out. As the smaller towns have fewer agents, they are not as likely to have the outbreak spread to them.
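One way to compute a weighted degree centrality of the kind described above is sketched below in Python. The exact definition used for the table is not given in this section, so summing inbound and outbound commuters and ignoring commutes within a town are assumptions, as are the function name and the placeholder towns T1 to T3.

```python
from collections import defaultdict

def weighted_degree(flows):
    """Sum the commuters entering and leaving each town.

    flows: {(origin_town, destination_town): number_of_commuters}
    """
    degree = defaultdict(int)
    for (origin, destination), commuters in flows.items():
        if origin == destination:
            continue  # assumption: commutes within a town are not counted
        degree[origin] += commuters       # commuters leaving the origin
        degree[destination] += commuters  # commuters arriving at the destination
    return dict(degree)

# Hypothetical commuting flows between three placeholder towns.
flows = {("T1", "T2"): 30, ("T2", "T1"): 10, ("T3", "T1"): 5, ("T1", "T1"): 40}
centrality = weighted_degree(flows)
```

In this invented example T1 has the highest centrality because it both sends and receives the most commuters, which is the property expected of the larger, more central towns.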
It can also be seen from Table that as the switch increases in size, and the model therefore gets closer to the fully agent-based model, the results become more similar to those of the fully agent-based model, further showing how the hybrid model converges to the agent-based model as the switch increases. The percent of runs that lead to an outbreak is also calculated for the model with the county level switch. However, when the model that switches at the county level switches from an agent-based disease component to an equation-based disease component, the location of an agent does not influence whether they become infected, and there is homogeneous mixing for all agents in the model. Thus the percentage of runs that lead to an outbreak for the model switching at the county level when % of agents are infected or exposed and when % of the agents are infected or exposed is equal to the percentage of runs that spread outside of a single town in the model. This shows a clear difference between the model that switches at the town level and the model that switches at the county level. The town switch still allows agents' movement patterns, such as their commutes, to influence the spread of the outbreak. Even though the larger, more central towns in the model have a similar percent of runs that lead to an outbreak, the smaller, less central towns have more variable results. This is not the case in the model that switches at the county level, where the outbreak is equally likely to spread to all towns regardless of their size and centrality.

Table : Percent of runs that the outbreak spreads to the given town.

Conclusion
Hybrid models allow us to utilize the advantages of two different modelling techniques. The paper shows that it is possible to create a hybrid model for infectious disease epidemiology where the disease component switches between agent-based and equation-based, determined by the number of agents infected. We looked at a number of levels for switching, both in the actual value of the switch ( %, % etc.) and in the size of the area where the switch occurs (small area, town or county). For each version of the hybrid model we compared the results to the fully agent-based model and found that a number of factors influence the results of the hybrid model. The value of the switch (whether the model turns to equation-based at % or % infected) is important as it determines the initial conditions. The higher the switch, the less likely it is that the model switches and stays switched for an extended period of time. The higher switch values do not result in as much savings of time and computing power as the lower switch values. In addition, these models were run on a scenario where the entire population was susceptible to the disease. While this may be the case for new and emerging diseases, for a disease such as or influenza a portion of the population will already be immune to the disease. This will create even less opportunity for switching at a higher percentage of agents infected or exposed.
Another factor influencing the results of the model is the area over which the switch occurs. In our tests we have looked at a number of levels, from small area to town to county. The results of our model show that the smaller the area of the switch, the less time is saved. This is because at the lower levels the equation-based and agent-based disease components will be running simultaneously based on the number infected at each town, so more of the model will be agent-based even when the model has switched. At the county level, however, the entire model will be equation-based at once, so there are more time savings. In addition, the size of the area that is switched has an impact on how similar the results will get to the agent-based model and on the largest switch value that can best be used. The smaller the area that is switching, the larger the switch can be. For example, when we switched at the small area level the model still switches at values up to %, but the county model only switches up to about %. We can also see that when the switch is at the town level the hybrid model converges to the agent-based model faster than if the switch is at the small area level or the county level.

Our analysis leads us to the conclusion that at both levels of the model, town and county, the switch for our hybrid model is best done at the town level. We think that the town level switch provides sufficient time savings compared to a fully agent-based model while still being able to produce results that are similar to the fully agent-based model. Not only do we capture a similar distribution of the number of infected agents, but the county model is also able to capture a similar spread of the outbreak through the county. Further, based on the analysis, a switch value between % and % is likely to produce the best results. A switch closer to % will better match the agent-based model, but a switch closer to % will result in greater time savings and more time steps with an equation-based disease component.
Generally, in creating a hybrid model there is a large amount of flexibility. We chose to use a switch between agent-based and equation-based models and to only switch the disease component. Others might choose different structures, and thus the aggregation and switch that we suggest here would not be applicable to their models. However, we feel that the method of testing, by comparing the hybrid model results at different thresholds of switching and different levels of aggregation to the fully agent-based model, is a valid method for testing a hybrid model that involves a switching behaviour.
Further work can be done to improve the model. To make the model more realistic, it should be tested where there is already some level of immunity in the population. This should require lower levels of switching and may produce different comparative results than what we have presented here. The difference equation model that we use for the equation-based portion of the model is simple. There are ways to create a more realistic equation-based model, for example, adding additional equations for age groups. However, every additional equation makes the model more complicated and will cause additional run time. The idea of creating a hybrid agent-based and equation-based model is to simplify the model and save time and computing power. Therefore, any work to further complicate the equation-based model should keep that in mind. Additionally, as the model was created for the spread of measles, schools are considered the main sources of transmission. Going forward, additional work should be done to focus on other areas of transmission that might be more relevant to other infectious diseases, for example public transportation, shopping centres, gyms and large events such as concerts. Further, the simplified gravity model can be made more realistic by adding in special attractions such as monuments and national parks that might draw agents to a particular location.

Model Documentation
The code and documentation for the model are available on the CoMSES Network - Computational Model Library as: Hybrid Agent-Based and Equation Based Model for Infectious Disease Spread (version . . ): https://www.comses.net/codebases/e30e36f0-5471-46b5-9c78-27b3f2185ff9/releases/1.0.0/ All experiments in this paper were run on NetLogo version . . on a Dell Laptop Latitude E with GB of RAM and an Intel® Core™ i -U processor.

Notes
The transportation component for the county model is the same as that described in (Hunter et al. )