Using a Socioeconomic Segregation Burn-in Model to Initialise an Agent-Based Model for Infectious Diseases

Socioeconomic status can have an important effect on health. In this paper we: (i) propose using house price data as a publicly available proxy for socioeconomic status to examine neighbourhood socioeconomic status at a finer-grained resolution than is available in Irish Central Statistics Office (CSO) data; (ii) use a dissimilarity index to demonstrate and measure the existence of socioeconomic clustering at a neighbourhood level; (iii) demonstrate that a standard agent-based model (ABM) initialisation process based on CSO small area data results in ABMs that systematically underestimate the socioeconomic clustering in Irish neighbourhoods; (iv) demonstrate that ABMs are better calibrated towards socioeconomic clustering after a segregation model has been run for a burn-in period following initial model setup; and (v) show that running a socioeconomic segregation model during the initialisation of an epidemiological ABM can have an effect on the outbreak patterns of the model. Our results support the use of segregation models as useful additions to the initialisation process of ABMs for epidemiology.


Introduction
Agent-based models (ABMs) are a type of computer simulation made up of agents that can interact with each other and with an environment (Gilbert ). They are a popular tool in many fields because they allow greater flexibility when creating the model than more common methods such as mathematical modelling. ABMs can also capture unexpected aggregate phenomena that result from combined individual behaviours in a model, which can lead to more accurate epidemiological modelling (Bruch & Atwell ).
One area where ABMs are particularly useful is infectious disease epidemiology, because ABMs can capture the dynamics of the disease spread combined with the heterogeneous mixing and social networks of the agents (Bobashev et al. ). To model an outbreak realistically and to be useful in real world scenarios, the model needs to be based on data and include the appropriate level of detail. If the right level of detail is used an ABM can produce accurate results. These details include not only characteristics of the disease, such as infection rates, but characteristics of the agents as well (Hunter et al. ). In an infectious disease model the demographic and socioeconomic characteristics of an agent can affect the health of the agent.
Many studies have found that demographic and socioeconomic factors can have an influence on individual health (Mackenbach et al. ; Doherty et al. ; Jessop et al. ; Endrich et al. ). Across Europe, there is a trend of higher morbidity and mortality among those with lower levels of income or education, both factors that contribute to an individual's socioeconomic status (Mackenbach et al. ). Although there are many ways in which socioeconomic factors can affect health, in relation to the spread of infectious diseases one major factor is vaccination rates. A number of studies have shown that various demographic factors can have an effect on vaccination status. Doherty et al. ( ) studied the inequality in childhood vaccination status in Ireland and found that the majority of the vaccination inequality can be explained by socioeconomic variables. They found that socioeconomic status, household structure and equalized income explained most of the inequality in vaccination status, but also that a mother with non-Irish ethnicity reduces the inequality in vaccinations while an employed mother increases this inequality. Jessop et al. ( ) studied predictors for the uptake of the first MMR vaccine and determined that a higher level of MMR vaccination was found among children with working mothers and higher income families. Lower levels of vaccination were found among children with degree level educated mothers, unmarried or lone parents and smoking mothers. A study by Endrich et al. ( ) found that in Ireland there is a lower chance of having been vaccinated for the flu if an individual has more than a primary level education and has a higher income. As many outbreaks of infectious diseases occur today due to low vaccination rates, it is essential to capture this variation in vaccination levels within a population in order to capture the spread of a disease accurately.
A cluster of individuals with a lower vaccination rate may lead to an outbreak even in a highly vaccinated population. Ireland's Central Statistics Office (CSO) provides detailed demographic statistics on the Irish population at multiple geographic levels, the smallest of which is the small area level that contains information for between to dwellings (CSO ). While the scale of the CSO small area data might provide information on small areas that have a higher proportion of individuals with one characteristic over another (for example, more single person households versus families with children), there may still be clustering of those individuals or households with those characteristics within the small areas that the data does not capture.
If that clustering is not captured it is possible that the results of the model will not be accurate. More severe outbreaks among clusters of non-vaccinated individuals may be missed. Conversely, less severe outbreaks due to clusters of vaccinated individuals leading to herd immunity in smaller communities might also be missed. This could create problems when using the results of such a model. For example, a vaccination policy based on model results that included socioeconomic clusters might focus on vaccinating individuals in certain socioeconomic classes, while a policy based on a model without clusters might not be as effective.
While synthetic populations that can be used to set up agent-based models with a population of agents that represents the actual population exist for some countries such as the USA (Wheaton et al. ), there is no such synthetic population for Ireland. Without access to such a population we need to create our own agent population that accurately represents the Irish population. Although the CSO provides rich data on the characteristics of the Irish population, there are some characteristics that are not available, such as the level of socioeconomic clustering within neighbourhoods. In order to account for the possibility of socioeconomic clustering within an epidemiological ABM we propose using a segregation model as a burn-in step in the setup of our ABM. The segregation burn-in will allow households in the model to move their home location within a given area in an attempt to find neighbours with similar socioeconomic status. The burn-in model should help include a factor, socioeconomic clustering, that we feel could be important in an infectious disease model but for which we do not have the appropriate data to simulate it initially. We believe that this allows better simulation of towns and cities than is possible using summary data at the small area level alone. We apply the segregation model step to an ABM created to simulate the spread of infectious diseases in Irish towns. Segregation models are a specific form of the more general social interaction model developed by James Minoru Sakoda (Hegselmann ). Sakoda's model features a checkerboard landscape where two different groups move across the landscape. Agents' movements are determined by positive, negative or neutral attitudes towards the other group. Although Sakoda's model includes segregation, segregation models were made famous by a set of models developed by Thomas Schelling in the s and s (Hegselmann ).
As we are only assuming the model does not capture socioeconomic clusters, before adding additional steps into the model we must first establish that there is a need for a segregation model. Section provides background on segregation models. Section of the paper discusses the process of using real data on house prices, as a proxy for socioeconomic status, to determine the level of clustering that already exists in Irish towns. This is done by using the dissimilarity index, a measure commonly used to determine a numerical value for the level of segregation in a given region. The dissimilarity index is calculated for two Irish towns, Schull and Tramore. These towns are selected as examples because they were previously used in the infectious disease model. In Section the distribution of agents by socioeconomic status in our initial model setup is used to calculate a second dissimilarity index for each town. The simulated dissimilarity index is compared to the real dissimilarity index (calculated in Section ) and is used to determine that an additional step in the model is needed to account for real world clustering by socioeconomic status that we do not capture in the model. Finally, in Section , the segregation step of the model is implemented and, again using dissimilarity indices, we show that the ABM is better calibrated towards socioeconomic clustering after the segregation model has been run. Section also discusses the effects that clustering by socioeconomic status has on the outbreaks in each town by comparing two sets of model results: one where the initial model setup includes the burn-in segregation model and one without it.

Segregation Models
Although Schelling presented a suite of models, the most famous one (often referred to as the Schelling model) is a two-dimensional spatial segregation model. Schelling's model environment is broken up into grid cells in a checkerboard pattern. Two groups of agents are scattered randomly throughout the checkerboard, each agent in its own grid cell. The two groups are primarily interpreted as individuals belonging to two ethnic groups, although it is possible to consider other groups. Some grid cells are left empty to allow for movement. Agents have a tolerance for the proportion of similar individuals that they want to live near, with the standard case being that agents do not want to be in the minority so they would have a tolerance of at least . . Agents look at their neighbours, the agents in the adjacent cells, and determine if the proportion of neighbours who are similar to themselves is greater than or less than their tolerance. If the proportion is equal to or greater than their tolerance the agent will not move. If the proportion is less than their tolerance the agent will move to the closest empty grid cell. Variations include changing the tolerance of agents, the proportion of the population in each group, movement rules, and neighbourhood size (Schelling , ).
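The stay-or-move rule described above can be sketched as a small function. This is only an illustration under stated assumptions: the function name, the use of `None` to mark an empty cell, the treatment of an agent with no occupied neighbours as content, and the tolerance value in the usage line are all illustrative choices and are not taken from Schelling's papers or from the model in this paper.

```python
def is_content(own_group, neighbour_groups, tolerance):
    """Return True if the share of like neighbours is at least `tolerance`.

    neighbour_groups lists the group label found in each adjacent cell,
    with None marking an empty cell (an assumption of this sketch).
    """
    occupied = [g for g in neighbour_groups if g is not None]
    if not occupied:
        # Assumption: an agent with no occupied neighbouring cells stays put.
        return True
    similar = sum(1 for g in occupied if g == own_group)
    return similar / len(occupied) >= tolerance


# Usage with an illustrative tolerance: one like neighbour out of three.
content = is_content("A", ["A", "B", None, "B"], tolerance=0.5)
```

An agent for which the function returns False would then relocate under whichever movement rule the model variant uses.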

Schelling's models illustrate the idea that slight individual behaviours or preferences can lead to aggregate results that the individuals did not intend. They show how a small individual preference to not be a racial minority in a neighbourhood leads to neighbourhood segregation (Schelling ). Schelling's results have proven to be robust, with many other researchers recreating and expanding on his work. Stoica & Flache ( ) create a model that extends the idea of residential segregation to school segregation, with families picking schools based on both distance and the existing racial mix at the school. Muldoon et al. ( ) investigate the effects that changing the utility function or the agents' preferences have on the final segregation of the model. They found that even under conditions where agents prefer to be in a small minority, segregation still occurs when agents have partial information about their surroundings. Survey data on real preferences has been used to show that, while different groups have different preferences for neighbourhood make-up, the real world preferences can be used in a Schelling segregation model that produces the results Schelling predicted (Clark ).
However, real world neighbourhoods are more complicated and involve factors other than race. For example, in looking at household survey data, Clark ( ) notes that white households will not discriminate against a number of equal status black households, showing that people take into account not only race but other factors such as income and education as well. Clark & Fossett ( ) assert that other factors such as multiple ethnic groups, socioeconomic status, and urban and demographic conditions are needed to truly understand and investigate residential patterns. These factors are not included in Schelling's model. Even when only one factor is considered, a constant tolerance level, which is often used in Schelling type models, is unrealistic. For example, Benenson et al. ( ) use surveys to investigate cases of wealthier households in poor neighbourhoods and find that these wealthier households do not discriminate against their less wealthy neighbours. In some cases there are advantages to the poorer neighbourhoods, such as lower house prices.
As the Schelling segregation model only considers a world with simplified features, work has been done to expand the Schelling model to include other factors that influence neighbourhood selection. Fossett ( ) includes not only race in his model but socioeconomic status as well, giving agents preferences for housing quality, neighbourhood socioeconomic status and neighbourhood ethnic mix. Benenson et al. ( ) use tolerance to different income levels as an agent variable in their model and allow it to change with the household when modelling segregation in Israeli cities.

Assessing the Degree of Clustering by Socioeconomic Status in Irish Towns
To determine if it is necessary to adjust a model for clustering by socioeconomic status we must first determine if there is evidence of the phenomenon in Ireland. We do not have data on the exact locations where individuals of different socioeconomic statuses live. However, the Property Services Regulatory Authority provides records in the Residential Property Price Register on the price of properties sold by address from to the present day (PSRA ). The Residential Property Price Register provides information including the date of sale, address, county and price.
Over the past few years there has been emerging literature showing the relationship between property value and socioeconomic status (Coffee & Lockwood ). Moudon et al. ( ) found that neighbourhood wealth measures such as property values had the potential to replace area-level socioeconomic status measures. Coffee & Lockwood ( ) found that a relative location factor based on property value can be used as a proxy for socioeconomic status. Their study determined that this factor can be used to enhance area level measures and identify groupings within a given area. If we can consider house price as a proxy for socioeconomic status then it should be possible to use the data to determine if clusters of households of the same socioeconomic status exist in Ireland.

Residential property price register data
For the purpose of finding clusters within the Residential Property Price Register we split the data into six subsets by the sale price of the house. The first subset has the houses with prices in the lower . %, the second subset has houses from the second . % and so forth.

Table : House Prices from PSRA ( )

The entire data set was then geocoded using QGIS ( ) so that it could be loaded into NetLogo (Wilensky a).
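The split of sales into six price subsets described above can be sketched as a rank-based grouping. This is a hedged illustration, not the procedure used for the paper's table: the function name, the rank-based way of forming near-equal groups, and the handling of tied prices (broken by input order) are assumptions, and the prices in the usage example are made up.

```python
def assign_price_groups(prices, n_groups=6):
    """Label each sale with a group 0..n_groups-1 of near-equal size.

    Group 0 holds the cheapest share of sales, group n_groups-1 the most
    expensive. Ties are broken by input order (an assumption).
    """
    order = sorted(range(len(prices)), key=lambda i: prices[i])
    groups = [0] * len(prices)
    for rank, idx in enumerate(order):
        # Integer arithmetic maps rank 0..N-1 onto 0..n_groups-1.
        groups[idx] = rank * n_groups // len(prices)
    return groups


# Usage with made-up prices (twelve sales, so two sales per group).
example_prices = [100, 50, 300, 200, 150, 250, 400, 350, 450, 500, 550, 600]
example_groups = assign_price_groups(example_prices)
```

Each geocoded sale would then carry its group label into the simulation environment alongside its coordinates.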

Calculating dissimilarity in Irish towns
To determine the level of clustering that exists in a given region the dissimilarity index is used. The dissimilarity index is a measure that determines the "evenness" of a population, or the differential distribution of social groups within a region that is composed of a set of areal units. It is a popular measure of the level of segregation in a space. The dissimilarity index is presented by Massey & Denton ( ) as one of the main measures of segregation. It is also used by the US Census Bureau in determining levels of segregation (Iceland et al. ). To calculate the dissimilarity index of a region, the region is broken up into smaller spatial units called areal units. If any group is segregated then that group will be unevenly distributed across the areal units. The index produces a value between 0 and 1. The closer the value is to 0 the more even and less segregated a region is, and the closer the value is to 1 the less even and more segregated a region is. The dissimilarity index is calculated as:

$$D = \sum_{i=1}^{n} \frac{t_i \, |p_i - P|}{2 \, T \, P \, (1 - P)}$$

where $n$ is the number of areal units in the region the index is being calculated for, $t_i$ is the number of households in areal unit $i$, $p_i$ is the proportion of minority households in areal unit $i$, $T$ is the total number of households in the region and $P$ is the proportion of minority households across the whole population of the region (Massey & Denton ).
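As a hedged illustration of the index defined above, the following sketch computes it from per-unit household counts. The function and argument names are assumptions, as is the choice to return 0 when the region is empty or when the minority share is 0 or 1 (cases in which the formula is undefined).

```python
def dissimilarity_index(unit_totals, unit_minority):
    """Dissimilarity index from per-areal-unit counts.

    unit_totals[i]   -> t_i, households in areal unit i
    unit_minority[i] -> minority households in areal unit i (t_i * p_i)
    """
    T = sum(unit_totals)
    if T == 0:
        return 0.0  # assumption: an empty region is treated as even
    P = sum(unit_minority) / T
    if P == 0.0 or P == 1.0:
        return 0.0  # assumption: index undefined, treated as even
    weighted = 0.0
    for t_i, m_i in zip(unit_totals, unit_minority):
        if t_i == 0:
            continue  # empty units contribute nothing to the sum
        p_i = m_i / t_i
        weighted += t_i * abs(p_i - P)
    return weighted / (2 * T * P * (1 - P))


# Usage: two units of ten households, minority split 8 and 2.
d_example = dissimilarity_index([10, 10], [8, 2])
```

With two units of equal size, a fully separated minority gives the maximum value and an even split gives the minimum, which matches the interpretation of the index given in the text.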
As our model is run on Irish towns we calculated the dissimilarity index for two towns that we have used our model for: Schull, a small town in West Cork, and Tramore, a town in Waterford. Schull has a population of under , and Tramore has a population of about , . As the Residential Property Price Register data has only houses sold in Schull and sold in Tramore between January and February , the dissimilarity index was also calculated for Cork and Waterford cities to determine if the low numbers of house sales in our target towns affected the dissimilarity index.
To calculate the dissimilarity index for each town, we break the NetLogo environment up into square grids. This is done by selecting a set of patches that will be the centre of each square grid in NetLogo. Then the radius function in NetLogo is used to select patches within a radius of and units of each centre patch ( unit equals patch in the NetLogo environment and approximately m 3 in the real environment being simulated). The radius of produces areal units of x patches and the radius of produces areal units made up of x patches. The square grids become our $n$ areal units. As we are exploring the effects of clustering on the model we use two different radii to determine how the size of the areal units affects clustering. Because the house price group that is in the minority might vary between towns, for each town the dissimilarity index is calculated separately for each house price group, treating that group as the minority. For example, for houses in the first price range, $p_i$ becomes the proportion of households in areal unit $i$ that are in that price range and $P$ is the proportion of households in the whole town that are in that price range.
Table shows the dissimilarity index for the two towns and two cities calculated with a radius of and units.
We investigated a number of other radii but only present results for two as we found similar patterns and results for all radii. It can be seen from the table that the dissimilarity index for Tramore is similar to that for Waterford. Although there are some differences between the dissimilarity indices, they are within a range that could be explained by the different make-ups of the towns. Schull, however, has larger differences in the less than € , range and the € , to € , range. This is likely due to the limited number of houses in the data set in those ranges. The dissimilarity indices for the towns based on house price data will be used as a benchmark against which to compare the dissimilarity indices for our simulated towns based on socioeconomic status, in order to determine if randomly allocating agents to locations within a small area creates an appropriate distribution of agents by socioeconomic status in the towns.
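The per-group calculation over square areal units described in this section can be sketched as follows. This is an assumption-laden illustration and not the NetLogo procedure used in the paper: it tiles the plane with fixed square cells by integer division instead of selecting patches around centre patches, and the function names, the `(x, y, group)` input format and the cell size in the usage example are all hypothetical.

```python
from collections import defaultdict


def unit_counts(households, cell_size):
    """Count households per square cell and group.

    households is a list of (x, y, group) tuples; returns
    {cell: {group: count}} where cell is an (column, row) pair.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for x, y, group in households:
        cell = (int(x // cell_size), int(y // cell_size))
        counts[cell][group] += 1
    return counts


def dissimilarity_by_group(households, cell_size):
    """Dissimilarity index for each group, treating it as the minority."""
    counts = unit_counts(households, cell_size)
    totals = {cell: sum(g.values()) for cell, g in counts.items()}
    T = sum(totals.values())
    result = {}
    for group in {g for _, _, g in households}:
        P = sum(c.get(group, 0) for c in counts.values()) / T
        if P == 0.0 or P == 1.0:
            result[group] = 0.0  # assumption: undefined index treated as even
            continue
        weighted = sum(
            t * abs(counts[cell].get(group, 0) / t - P)
            for cell, t in totals.items()
        )
        result[group] = weighted / (2 * T * P * (1 - P))
    return result


# Usage: two cells of side 10, one all "low" and one all "high".
separated = [(1, 1, "low"), (2, 3, "low"), (4, 4, "low"), (5, 8, "low"),
             (11, 1, "high"), (12, 3, "high"), (14, 4, "high"), (15, 8, "high")]
separated_result = dissimilarity_by_group(separated, cell_size=10)
```

Running the same function with two different cell sizes mirrors the comparison of two areal unit sizes made in the text.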

Modelling Irish Towns
This work is part of a larger project that is building large scale, high fidelity agent-based models of Irish towns for epidemiological simulation. This section describes how these models are set up using publicly available datasets, and compares the dissimilarity index scores measured in the models of the towns Schull and Tramore with those calculated from the PSRA data. We then describe how the inclusion of a segregation model following Schelling changes this. Schull and Tramore are chosen as they are two towns in Ireland where concentrated measles outbreaks occurred. In there were at least confirmed cases of measles in Schull in one outbreak ).

Calculating dissimilarity in the Basic Town Model
Our initial model is based on the model presented in Hunter et al. ( ), which was designed to model the spread of an infectious disease in an Irish town using only openly available data. The model uses small area data from the Central Statistics Office to populate the town being modelled with agents (CSO ). To populate the model, the households in each small area are added into the town and randomly placed throughout the small area. Households are assigned a household type (single, couple, couple plus others, couple with children, single parent, or other). Adult agents are added into the households and then assigned a sex based on a probability distribution determined from CSO data. Agents are given an age based on another CSO probability distribution for their sex and small area. Children are then added into households designated as having children and given an age and sex. Agents are assigned an economic status based on age and a random selection following the distribution of economic statuses in the small area for their sex. Agents over are assigned retired, children are assigned student, and other agents are assigned work, unemployed, looking for first job, sick/disabled or stay at home.
In addition to the characteristics given to agents in the Hunter et al. ( ) model, we assign working agents a social class, again based on CSO data. Social class is the CSO variable used to capture socioeconomic status. Agents are assigned to the classes Professional workers, Managerial and technical, Non-manual, Skilled Manual, Semi-skilled, and Unskilled. Households are then given the social class of one of the adults in the house, who is randomly selected from all the adults in the household. For household social class, Skilled Manual and Semi-skilled are combined into one group (Skilled/Semi-skilled) and Unskilled is combined with unemployed individuals to create the Other group. The groupings are based on those used by Doherty et al. ( ) in their analysis of the effects of socioeconomic status on vaccination rates in Ireland. Irish vaccination data is used to determine the percentage of each age group that received vaccinations, and this is adjusted based on socioeconomic status using the odds of having children vaccinated from Doherty et al. ( ). We also included a retired grouping that was not included in the research by Doherty et al. ( ).
Unoccupied households are also added into the model. The number of unoccupied households in a small area is taken from the census data and the households are randomly placed in locations in that small area. Agents with an economic status of work are given a workplace location, and school locations are added into the model. Agents move between work, school and within the community. At the start of the model a set number of agents are given the status of infected. The disease model is based on an SEIR compartment model. As the infected agents move through the town they have a certain probability of infecting susceptible agents they come into contact with (Hunter et al. ).
We initiate setup for our model for both Schull and Tramore times (each run will be slightly different due to random initialisation) and calculate the dissimilarity index for each of the household social classes each time the model is initiated. We then find the average of the dissimilarity index across the model setups. The dissimilarity index is calculated twice for each town, once using areal units with a radius of and once using a radius of . As with the real house price data, the dissimilarity index is calculated separately for each social class. Although house prices and socioeconomic status are not an exact match, based on the literature we can assume that house prices serve as a proxy for socioeconomic status. If randomly placing households of different social classes within a small area produced realistic neighbourhoods clustered by socioeconomic status, we would expect the dissimilarity indices from the real house price data set to be similar to the dissimilarity indices from the model. Comparing the values of the dissimilarity index for each social class from the initial setup of our model to the values for the house price ranges, it can be seen that the dissimilarity indices from the simulation are generally lower than those from the real data. As the dissimilarity indices from the house price data and from the simulation are not similar, we can conclude that even when using small area data an ABM that distributes agents randomly within these areas does not accurately portray neighbourhood socioeconomic status, and it may be necessary to adjust the model to account for this. The only exception is the house price range of less than € , in Schull, which has a dissimilarity index of . when a radius of is used, compared to values of about . for all the social classes from the simulation. The low dissimilarity index from Schull is likely due to the small data sample.

Using a segregation model to better model dissimilarity
In order to account for the clustering in neighbourhoods by socioeconomic status that we see in our house price data set, but not in the initial setup of the ABM, we add an additional burn-in step into our model setup process. Once the model is populated with agents and all of the agents are assigned the appropriate characteristics, we give households the opportunity to move using a Schelling-type segregation model. However, unlike in Schelling's models, where race is used as the segregating factor, we use social class. Each household in the model has a social class assigned to it and households will seek to surround themselves with households of the same social class. To allow this, households are given the opportunity to move to more attractive unoccupied houses during the burn-in process.
The model is run in discrete time steps. Each household is considered at each time step. For each household, if the proportion of neighbouring households with the same social class is below a pre-set tolerance level then the household will move to a location with more neighbouring households matching its social class. If the proportion of neighbouring households with matching social class is above the threshold the household does not move. Households are able to move to new locations because unoccupied houses are included in our model setup process. When a household moves, it moves to the unoccupied house in its current small area with the highest proportion of neighbours matching its own social class. If no location better than its current one is available, the household does not move. The burn-in process stops after a time step in which no households move.
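The burn-in loop described above can be sketched as follows. This is a minimal illustration under stated assumptions and is not the NetLogo implementation used in the paper: the data structures, function names, the use of Euclidean distance for the neighbourhood, treating a share equal to the tolerance as acceptable, giving a location with no neighbours a share of 0, breaking ties between equally good vacant houses by sorted order, and the `max_steps` safety cap are all hypothetical choices.

```python
import math


def like_share(loc, social_class, households, radius, skip):
    """Share of households within `radius` of `loc` with a matching class."""
    same = total = 0
    for hid, (other_loc, other_class) in households.items():
        if hid == skip:
            continue  # a household is not its own neighbour
        if math.dist(loc, other_loc) <= radius:
            total += 1
            same += other_class == social_class
    return same / total if total else 0.0  # assumption: no neighbours -> 0


def burn_in(households, vacant, area_of, tolerance, radius, max_steps=1000):
    """Run the segregation burn-in in place; return the number of steps run.

    households: {id: (location, social_class)}
    vacant:     set of unoccupied house locations
    area_of:    {location: small_area_id}
    """
    for step in range(1, max_steps + 1):
        moved = 0
        for hid in list(households):
            loc, cls = households[hid]
            current = like_share(loc, cls, households, radius, skip=hid)
            if current >= tolerance:
                continue  # content households stay where they are
            best, best_share = None, current
            # Only vacant houses in the same small area are considered.
            for cand in sorted(v for v in vacant if area_of[v] == area_of[loc]):
                share = like_share(cand, cls, households, radius, skip=hid)
                if share > best_share:
                    best, best_share = cand, share
            if best is not None:
                vacant.remove(best)
                vacant.add(loc)
                households[hid] = (best, cls)
                moved += 1
        if moved == 0:
            return step  # a full step with no moves ends the burn-in
    return max_steps


# Usage: six houses on a line in one small area, one of them vacant.
town = {"h1": ((0, 0), "A"), "h2": ((1, 0), "A"), "h3": ((2, 0), "B"),
        "h4": ((4, 0), "B"), "h5": ((5, 0), "B")}
empty_houses = {(3, 0)}
areas = {(x, 0): "sa1" for x in range(6)}
steps_run = burn_in(town, empty_houses, areas, tolerance=0.5, radius=1)
```

In this toy example the only discontent household, `h3`, moves next to the other class B households in the first step, and the second step sees no moves, so the loop stops.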
The burn-in process involves two parameters: neighbourhood size and a tolerance level for households of different social class. For neighbourhood size we use the same radii used for calculating dissimilarity (see Section . ): units and units. The next section describes experimental results that are used for setting the tolerance level.
Once the segregation model has stopped running, agents who are students are assigned to the school that is closest to their house and the disease model can be run.

Results
This section presents the results from two experiments. First we look at how the dissimilarity index for the two towns changes as we adjust the radius and tolerance of agents in the model. The second experiment determines how the clustering affects the results of the infectious disease ABM.
Using segregation modelling to model socioeconomic clustering
In this section we report an experiment that tested whether the use of a socioeconomic segregation model improved the calibration of an ABM, in the sense of making the dissimilarity index of the neighbourhoods within the model more similar to the real data after the segregation process has been run than it was before. In order to run a socioeconomic segregation model as part of an ABM setup process we must set three hyper-parameters: the number of iterations the segregation model runs for, the tolerance level used in the model, and the radius used in the model. Due to the stochastic nature of the setup, different results can be obtained in each iteration or run of the model. Each iteration is a different initial setup of the model, and there is a burn-in process for each iteration which runs until no agents move in a single time step. To explore the interactions between the hyper-parameters of the segregation model and the effect of the segregation model on the calibration of the model we fixed the number of iterations to be and set up a grid search process where tolerance took the values from . through . and the radius parameter was set to either or . For each combination of hyper-parameters an ABM of Schull and an ABM of Tramore were created and the socioeconomic segregation burn-in process was run. After each iteration, the dissimilarity index for each social class within each town was calculated and stored, and an average was taken across the iterations. Figures a and b show the setup for Schull and Tramore after the burn-in model. Clusters of households by colour can be seen in both towns; however, as Tramore has a larger population, the clusters are more distinct in Tramore.

Table gives the average dissimilarity index for the managerial and technical social class for each of the combinations of tolerance and radius across the iterations. Comparing the final dissimilarity index for each town with the starting dissimilarity index and the real dissimilarity index based on house price data, it can be seen that the final dissimilarity index is closer to the real dissimilarity index. Although we only present the results for one of the social classes in Table , Figures and present the results for each of the social classes for Schull and Tramore respectively. The plots in each figure show how the dissimilarity index changes as the tolerance increases. Dashed lines are the starting dissimilarity index while solid lines represent the final dissimilarity index after the burn-in process. This can be seen further in Tables and , which show the starting and ending dissimilarity index for each social class in Schull and Tramore respectively. The model used to find the dissimilarity index for Tables and had a radius of and a tolerance of . .

Within a town the changes in the dissimilarity index based on tolerance and radius follow a similar pattern for each social class. However, between Schull and Tramore the differences are greater. This is likely due to the difference in size between the two towns. Tramore has ten times the population of Schull and thus more households and a greater distribution of social classes. For both towns using a radius of results in higher levels of dissimilarity than a radius of . However, for Schull the difference between the starting dissimilarity and the final dissimilarity is much greater for radius . As the tolerance increases for the Schull model the difference between the dissimilarity for radius versus radius decreases. For most social classes in the Tramore model, as can be seen in Figure , the difference between results from radius and radius tends to be smallest using a tolerance of . and . and then, as the tolerance gets greater, the difference increases.

Table : The starting and ending dissimilarity index for all the CSO social classes for Schull. The model used had a tolerance of . and a radius of .

. Although in both towns the simulation still provides a lower value for the dissimilarity index compared to the house pricing data, we feel that the increase in value for the dissimilarity index especially for Tramore moves towards a more realistic artificial society for our simulation. The model shows it is possible to use a segregation type model as a step in the setup of an ABM to make the model more realistic. One of the factors causing the di erences in clustering could have to do with the extra retired social group that we added. This was done since social class was based on employment status in our model. However, in reality the socioeconomic status of retired individuals might be based on their socioeconomic status before retirement.  Table : The starting and ending dissimilarity index for the all CSO social classes for Tramore. The model used had a tolerance of . and a radius of . Assessing the impact of socioeconomic clustering on outbreak modelling .
Although it is useful to show that the segregation model does in fact lead to a model that includes socioeconomic clustering, it is important to determine if it has an effect on our infectious disease model. If clustering agents by socioeconomic status has no effect on the results of the final epidemiological model it will not be a useful addition. In order to determine what influence the clustering has on the end results of the model, the infectious disease model was run times with the socioeconomic segregation model as the final steps in setup. For the socioeconomic segregation model a radius of is used with a tolerance of . . The model is run for both Schull and Tramore and results are compared to model runs without clustering. Tables and show the summary of the results across the runs for Schull and Tramore respectively.
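As an illustration of what a burn-in step with a tolerance and a radius can look like, the following is a minimal Schelling-style sketch. It is not the model used in this paper: the square grid, the Chebyshev neighbourhood, the rule that a household moves when the share of unlike neighbours exceeds `tolerance`, and the function names `unlike_share` and `burn_in` are all assumptions made for the example.

```python
import random


def unlike_share(grid, row, col, radius):
    """Share of occupied cells within `radius` (Chebyshev distance) of
    (row, col) whose class differs from the household at (row, col)."""
    own = grid[row][col]
    unlike = total = 0
    for r in range(max(0, row - radius), min(len(grid), row + radius + 1)):
        for c in range(max(0, col - radius), min(len(grid[0]), col + radius + 1)):
            if (r, c) == (row, col) or grid[r][c] is None:
                continue  # skip the household itself and empty cells
            total += 1
            unlike += grid[r][c] != own
    return unlike / total if total else 0.0


def burn_in(grid, tolerance, radius, steps, seed=0):
    """Run up to `steps` sweeps; in each sweep every household whose share of
    unlike neighbours exceeds `tolerance` (assumed rule) moves to a random
    empty cell. Stops early once no household moves. Mutates and returns grid."""
    rng = random.Random(seed)
    for _ in range(steps):
        occupied = [(r, c) for r, line in enumerate(grid)
                    for c, value in enumerate(line) if value is not None]
        rng.shuffle(occupied)
        moved = False
        for r, c in occupied:
            if unlike_share(grid, r, c, radius) > tolerance:
                empties = [(i, j) for i, line in enumerate(grid)
                           for j, value in enumerate(line) if value is None]
                if not empties:
                    return grid  # nowhere to move to
                i, j = rng.choice(empties)
                grid[i][j], grid[r][c] = grid[r][c], None
                moved = True
        if not moved:
            break  # every household is within its tolerance
    return grid
```

In this sketch the burn-in only relocates households, so the number of households in each class is unchanged; only their spatial arrangement differs before and after.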
Comparing the results for Schull, it can be seen that the distribution of the total number of infected cases is similar for the runs with and without clustering. In fact, the clustering leads to a smaller median and mean. Including clustering leads to an almost % increase in the number of runs with -agents infected in the outbreak compared to models without clustering. The Tramore distribution when clustering is included shows a higher percentage of runs with a larger number of agents infected compared to the distribution when clustering is not included.
The results are not completely unexpected, as Schull is a small, low density town and the segregation model does not make a large difference in the level of socioeconomic clustering. For example, from Tables and , the dissimilarity index for the managerial and technical social class with a radius of is . before the segregation model and . after the segregation model. This compares with Tramore, where the dissimilarity index before the segregation model is . and after is . . Thus it makes sense that the segregation burn-in model does not have much of an influence on the outcome of the infectious disease model for Schull but does have an effect on Tramore. Intuitively, the greater magnitude of the Tramore outbreaks also makes sense: in the clustered Tramore model, agents with similar vaccination rates are living closer together and thus interacting more, leading to more infections. Since students choose the closest schools to their home, infections should also increase if students with lower vaccination rates due to their socioeconomic status attend the same schools. As Tramore has multiple schools for students to attend while Schull only has one primary and one secondary school, this could also be a factor in why there is less of a difference between the Schull runs. The increase in the number of runs with smaller outbreaks in the Schull model with clustering could be a result of the clustering leading to communities within Schull with a higher overall vaccination rate, so that the outbreak ends sooner in those communities due to a lack of susceptible individuals. The results show that the socioeconomic segregation model can have an effect on the outcome of the model.
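The school mechanism described above can be sketched as follows: each student attends the nearest school, so if households are clustered by vaccination status, the vaccination share can differ between schools. This is a hedged illustration only; the coordinates, the straight-line distance, the data layout, and the names `nearest_school` and `vaccination_share_by_school` are hypothetical and are not taken from the model.

```python
from collections import defaultdict
from math import dist


def nearest_school(home, schools):
    """Return the name of the school closest to `home`.
    `schools` maps a school name to an (x, y) location; straight-line
    distance is an assumption of this sketch."""
    return min(schools, key=lambda name: dist(home, schools[name]))


def vaccination_share_by_school(students, schools):
    """`students` is a list of (home_xy, vaccinated) pairs. Each student is
    assigned to the nearest school; returns school -> share vaccinated."""
    counts = defaultdict(lambda: [0, 0])  # school -> [vaccinated, total]
    for home, vaccinated in students:
        school = nearest_school(home, schools)
        counts[school][0] += int(vaccinated)
        counts[school][1] += 1
    return {school: vacc / total for school, (vacc, total) in counts.items()}
```

With a single school, as in Schull, every student maps to the same school and the share is the same however the homes are arranged, which is consistent with the argument made in the text.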

Conclusion
Our model successfully used a segregation model to create a more realistic distribution of socioeconomic status within small areas in order to set up our ABM. Using house price data, we determined that clusters by socioeconomic status exist in Ireland and that randomly placing households within a small area did not capture the correct level of clustering. Not having the appropriate mix of socioeconomic status could have an effect on an infectious disease model, as the overall health and vaccination level of a neighbourhood has an effect on individual health.

Not only have we shown that we can use a segregation model as a step in the setup of our ABM for infectious diseases, but we have determined that clusters of individuals by socioeconomic status can have an effect on the outcome of the overall model. While Schull does not show this effect, we believe this is due to characteristics of the town. A small town with less than , residents may not have enough of a sample of individuals in different socioeconomic groups for clustering to make a difference. In addition, we feel that the school settings should have a large impact on the results of the model. If a school is located in an area that has lower vaccination rates, an infectious disease might spread through the school quicker than if there was an equal distribution of vaccination rates within the school. As there is only one primary and one secondary school in Schull, the distribution of vaccination rates will not change regardless of the level of clustering. The increased magnitude of outbreaks in the Tramore runs leads to the conclusion that socioeconomic clustering can result in a different outbreak pattern. Tramore has a much greater population than Schull, with just under , residents, allowing more distinct clusters of agents by socioeconomic status to form and allowing the spatial distribution of agents to contribute to the model results. This leads to the conclusion that socioeconomic distribution should be considered as a factor in an agent-based model for infectious diseases.
More work can be done to investigate how a different radius or tolerance will affect the results of the model and how the results from other towns besides Schull and Tramore are influenced by including clustering. Other extensions to the model could include running something similar to the segregation model but with workplaces. Agents are currently randomly assigned to workplaces without regard to social class. A segregation model for work assignment could make a more realistic ABM. In addition, as both towns are small, future work should focus on the feasibility of scaling the burn-in to larger towns and cities and on how socioeconomic clustering might be different or have a different influence on outbreaks in a denser population.
It is impossible to make a direct comparison between the house price data set, used to determine if clustering exists, and the simulated socioeconomic data set due to data availability. The data desired to create a model is not always available. For example, while we can find house prices for sold houses, we do not know the characteristics of the individuals living in those houses. In addition, we have information on individual characteristics but do not have their house prices or exact location. We feel that to move forward in the field of ABMs it is necessary to take limited data and use assumptions, such as the comparison between house prices and socioeconomic status, to fill in the gaps of available data.
In addition, it should be noted that while the dissimilarity index is a measure commonly used to determine the level of clustering, it has some disadvantages. It is measured using non-overlapping areal units, and clustering across areal units is not taken into account with this measure. As an initial exploration into clustering within small areas, and because we are looking for clustering within and across the small areas (each small area has an accurate distribution of households by socioeconomic status before the burn-in model), we feel that the measure gives an acceptable comparison of the clustering before and after the burn-in. However, further work can be done using other spatial measures of clustering, such as autocorrelation, that might show the results to be more robust.
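For readers unfamiliar with the measure, a minimal sketch of a dissimilarity index over non-overlapping areal units is given below. It assumes the standard two-group form, D = 0.5 * sum_i |g_i / G - r_i / R|, where g_i and r_i are the counts of the group of interest and of everyone else in unit i, and G and R are their totals. The function name and input layout are illustrative and are not taken from the model code.

```python
from typing import Sequence


def dissimilarity_index(group: Sequence[float], rest: Sequence[float]) -> float:
    """Two-group dissimilarity index over areal units.

    group[i] is the count of the group of interest in unit i and rest[i] is
    the count of all other households in that unit. Returns a value between
    0 (identical distributions) and 1 (complete separation)."""
    if len(group) != len(rest):
        raise ValueError("group and rest must cover the same areal units")
    total_group = sum(group)
    total_rest = sum(rest)
    if total_group == 0 or total_rest == 0:
        raise ValueError("both populations must be non-empty")
    return 0.5 * sum(
        abs(g / total_group - r / total_rest) for g, r in zip(group, rest)
    )
```

Because each unit contributes independently to the sum, the value does not change if the units are rearranged in space, which is the limitation noted above: clustering that spans neighbouring units is invisible to this measure.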