* Abstract

Choice mechanisms and social networks, including "marriage markets", seem well-suited to be modelled using agent-type simulations. Few real-world empirical examples are available in the public literature, particularly ones using large human populations. We reviewed partnership models in both the micro-simulation and agent-based literatures. We then empirically implemented an algorithm derived from two established models using inter-censal data on first partnerships in New Zealand over the period 1981-2006. The purpose of the exercise was to test the robustness of different parameter settings and to determine whether a model simulating partnership selection among eligible never-married young adults at one census period is feasible for predicting patterns of partnership, co-habitation and marriage at the next. Varying the simulation time and social network size parameters of the model showed that patterns of ethnic partnering could be consistently produced and were not dependent on these model settings. Examining the different scoring methods showed that age similarity, education similarity, and previous partnering patterns could produce partnership patterns similar to those seen in the census. The simulation produced patterns of ethnic partnering similar to those seen in the census and seemed robust to different parameter settings. To further improve these results, an optimised combination of the scoring components is proposed. The simulations also provided preliminary evidence of ethnic preferences in the New Zealand marriage market.

Keywords: Marriage, Ethnicity, Homophily, Simulation

* Introduction

Simulation provides the opportunity for the analysis of choice and network mechanisms, such as the examination of partnership formation or "marriage markets" (Billari, Diaz, Fent, & Prskawetz 2007). Such simulation models have been applied in economics (Chantarat and Barrett 2008), criminology (Weerman 2011), anthropology (Hoppitt and Laland 2011), geography (Xu & Sui 2009), public health (Eubank et al. 2004), and innovation (Gilbert, Pyka, & Ahrweiler 2001). In this paper we review the models of partnership used in the micro-simulation and agent-based literatures, before implementing a population-scale marriage market model and testing different parameter settings for robustness.

The key measures of interest for this study are the patterns of inter-ethnic cohabitation, as they provide an insight into the changing interactions between ethnic groups, as well as being of interest in a population modelling context. The simulation modelling of these patterns uses non-ethnicity based micro variables, observation of the macro environment, and a stochastic factor to simulate the partnering process. The simulation is run to model the partnership choices of single people in the 18-30 age group empirically identified from micro-data in the New Zealand Census, with ethnicity as a key choice variable. The ethnic partnering patterns generated by the simulation are observed and compared to the actual patterns of cohabitation or partnership in the 23-35 age group detected in the micro-data at the following census five years later. We wish to test the feasibility of such a model and its robustness to different parameter settings.

* Modelling Partnership Formation

Here we review the micro-simulation and agent-based traditions of partnership modelling followed by the outline of a mate selection algorithm in the light of these literatures.

Micro-simulation models

A number of large-scale microsimulation projects have incorporated partnership matching into their population and policy simulation models. This section examines a number of these models. An early discussion of such models was provided by van Imhoff and Post (1998), who gave a brief history of microsimulation population models and contrasted micro and macro (microsimulation using a sample that is scaled up vs. microsimulation using a whole population) approaches to such models. The models were all simple Monte Carlo processes, so a major criticism of both types of models has been the lack of realism in describing the partnership process. The authors concluded that both approaches to microsimulation generally performed well under conditions of a sizeable state space and could provide rich output, but results could be influenced by model specifications, particularly if scaling up from a sample.

Bouffard et al. (2001) compared the standard life table method with an alternative stochastic algorithm. This was based on a logistic regression using both the age differences of the potential partners and economic factors to find better matches. Validation testing demonstrated that the stochastic algorithm provided better predictions against American census data than the fixed probability method. Similar findings on stochastic algorithms compared to fixed probabilities were reported by Perese (2002), who also applied a logistic regression-based algorithm to marriage matching within a microsimulation model.

The DYNASIM model was first developed by the Urban Institute (http://www.urbaninstitute.org) as an income model in the 1970s (Orcutt, Caldwell, & Wertheimer II 1976) and was updated in the 1980s (Zedlewski 1990) and the 2000s (Favreault & Smith 2004). The DYNASIM matching algorithm is a Monte Carlo styled method based around the age differences and education differences of the agents. This method is appealing in several ways. By creating a probability function rather than relying on regression models, the algorithm is not bound by assumptions of independence or other statistical factors. Other researchers have also used the same basic matching function, including the Australian Dynamic Population and Policy Microsimulation Model (APPSIM) (Bacon & Pennec 2007; Harding 2007).

Two European models, the Microsimulation Model of Family Dynamics (FAMSIM) (Spielauer & Vencatasawmy 2001) and the Simulating Social Policy in an Ageing Society (SAGE) model (Cheesbrough & Scott 2003), represent a more robust method than that of the regression approach. Each uses retrospective accounts of partnership histories, collected from stratified random samples of women in their respective regions, to inform their models, rather than using matrices of actual and hypothetical partnerships as logistic regression outcomes.

Not all microsimulation models of partnership formation focus on predictive outcomes. Chen (2005) uses an alternative paradigm, electing to examine possible strategies for the process of how people find their partners rather than focussing solely on an extrapolation-based outcome. He uses simulation models to examine five different theoretical matching strategies and finds that a "well-rounded" strategy, which combines satisficing and compensatory behaviour, provided the most efficient matches.

Table 1: Summary of microsimulation partnership models

| Model/Reference | Partner Matching Variables | Partner Matching Process |
|---|---|---|
| APPSIM (Bacon & Pennec 2007; Harding 2007) | Age difference; years of education difference | Similar to DYNASIM. Singles randomised. First pair evaluated using exponential probability function. Paired if random number less than probability, otherwise repeat for up to 10 potential partners. If no matches are made and total number of couples has not been met, then pair those with highest probability. |
| Bouffard et al. (2001) | Age difference; correlation of husband-wife earnings | Data is set up with each male paired with every possible female partner. Logistic regressions are run with outcome 1 for actual partner and every other woman with the same characteristics, and 0 otherwise. Transition probabilities from this model are used in Monte Carlo simulation. |
| CBOLT (Perese 2002) | Age; education; average lifetime earnings quintile; marriage number | Data is set up with each male paired with every possible female partner. Logistic regressions are run with outcome 1 for actual partner and every other woman with the same characteristics, and 0 otherwise. |
| Chen (2005) | Two generic rankable traits (i.e. a variable named trait rather than an actual demographic) | Trials 5 different methods of matching: best only, well-rounded, differential, compensatory, and immediate. |
| DYNASIM (Zedlewski 1990) | Age difference; years of education difference | Singles randomised. First pair evaluated using exponential probability function. Paired if random number less than probability, otherwise repeat for up to 10 potential partners. If no matches, then pair those with highest probability. |
| FAMSIM (Spielauer & Vencatasawmy 2001) | Children; age; education; pregnant | Monte Carlo simulation based on logistic regression models using retrospective partnership histories from survey. |
| SAGE (Cheesbrough & Scott 2003) | Age; marital status; education; pregnant | Monte Carlo simulation comparing transition probabilities based on logistic regression models using retrospective partnership histories from survey. |

Agent-Based Models

Some of the earliest agent-based simulations of partnership were conducted by Kalick and Hamilton (1986; 1988) in the mid- to late-1980s. Since 2000, following improvements in computing power and the availability of new programming languages and simulation tools, the number of publications has increased, in particular thanks to a number of different collaborations involving Peter Todd of the Max Planck Institute for Human Cognitive and Brain Sciences (http://www.cbs.mpg.de/index.html) (Hills & Todd 2008; Miller & Todd 1998; Simão & Todd 2001; 2002; 2003; Todd & Billari 2003; Todd, Billari, & Simão 2005; Todd & Miller 1999; Todd & Miller 2002).

Rather than building these matching models for prediction, one of the main goals of these works has been to actively search for emergent patterns in the simulated populations, such as the shape of the aggregate age-at-marriage distribution (Todd & Billari 2003; Todd, Billari & Simão 2005). In both of these articles, the authors demonstrated the effectiveness of a simple search strategy based on a two-sided (mutual) process where matches were based on the combined evaluations of the male agents and the female agents. By comparison, the Simão and Todd articles (2002; 2003) focussed less on the emergent outcomes of their partnership simulations and more on the simulation process itself.

Hills and Todd (2008) developed an agent-based model known as MADAM (Marriage and Divorce Annealing Model). MADAM built on earlier work by Miller and Todd (1998), Todd and Billari (2003) and Todd, Billari and Simão (2005), using a "homophilic trait matching model" which would gradually relax the expectations of the matching preferences as the agents aged. The main matching process involved a randomly generated world of agents, each of which has $k$ mate-relevant traits from a set of $N$ possible traits. Each will initially seek a mate who has a perfect match of the same $k$ traits but, as they age, they will settle for a mate with $j$ ($j < k$) traits. This satisficing level was built into the simulation by using the function $j = k \exp(-\lambda t)$, where $j$ is the current threshold required to partner, $k$ is the initial number of matched traits required, $\lambda$ is the rate of decay of expectation, and $t$ is time. Like the DYNASIM model, this is a decaying exponential function of the form $\exp(-f(x))$.
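The decaying threshold described above can be illustrated with a minimal Python sketch. The function names, the representation of traits as sets, and the rule that a match is accepted when the number of shared traits is at least $j$ are assumptions made for illustration; they are not taken from the MADAM implementation.

```python
import math


def required_matches(k, decay_rate, t):
    """Threshold j = k * exp(-decay_rate * t): matched traits required at time t."""
    return k * math.exp(-decay_rate * t)


def will_accept(own_traits, other_traits, k, decay_rate, t):
    """Hypothetical acceptance rule: enough shared traits to meet the current threshold."""
    shared = len(set(own_traits) & set(other_traits))
    return shared >= required_matches(k, decay_rate, t)


# An agent with k = 5 traits meets someone sharing 3 of them.
own = {1, 2, 3, 4, 5}
other = {1, 2, 3, 9, 10}
print(will_accept(own, other, k=5, decay_rate=0.1, t=0))   # threshold is 5.0
print(will_accept(own, other, k=5, decay_rate=0.1, t=10))  # threshold is about 1.84
```

In this sketch the same pair is rejected at the outset and accepted later, purely because the threshold has decayed.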

Alam and Meyer (2008) used an agent-based network simulation to study the spread of HIV/AIDS in an African village. They developed two matching algorithms to model the sexual networks in the village. Two different choice mechanisms were tested. The first one involved each agent having an attraction score and an aspiration value. The second scheme used an "endorsement mechanism", where the decision to partner was based on a potential partner having sufficient positive feedback from others in the network, akin to an explicitly communicated social norm. The authors found that both methods gave broadly similar results, although the endorsement mechanism had less variability than the attraction/aspiration one (see Table 2).

Table 2: Summary of agent based partnership models

| Model/Reference | Partner Matching Variables | Partner Matching Process |
|---|---|---|
| Alam & Meyer (2008) | Attraction scores and endorsement scores. | Partner selection occurs either through attraction scores of each agent being greater than the aspiration value of the other agent or through high cumulative endorsement scores from others in the network. |
| Hills & Todd (2008) MADAM model | N generic mate-relevant traits. | Agents partner when they find k matches out of N generic traits in another agent. The number of matches required decays exponentially as the agents age. |
| Kalick & Hamilton (1986) | A single generic rankable trait. | Agents are either attracted to the agent with the highest trait, the agent with the most similar trait, or a combination of both. |
| Simão & Todd (2002; 2003) | A normally distributed generic trait and aspiration level for evaluating partners. | Agents encounter each other randomly. They compare each other's traits with their aspiration levels, then make "offers" to one another if traits are both greater than aspiration. Otherwise, one will reject the other and they adjust their aspiration levels. |
| Todd & Billari (2003), Todd et al. (2005) | A normally distributed generic trait and aspiration level for evaluating partners. | Agents encounter each other randomly. They compare each other's traits with their aspiration levels, then make "offers" to one another if traits are both greater than aspiration. Otherwise, one will reject the other and they adjust their aspiration levels. |

Algorithms of Mate Selection

The algorithms for mate selection are pivotal to a successful model. In microsimulation models there are typically two approaches - the stable marriage approach, and the stochastic approach (Bacon & Pennec 2007) - whilst agent-based models are more likely to have some sort of rules or heuristics for decision making (Spielauer & Vencatasawmy 2001).

One of the earlier algorithms for partnership matching came about from an applied mathematical problem which aimed to match students to colleges such that the utility of individuals could not be increased by swapping any pair of individuals (Gale & Shapley 1962). It was shown that there was always a combination which would meet this requirement. The application of this algorithm to marriage problems became known as the "stable marriage problem" (Gusfield & Irving 1989). Although this algorithm is relatively simple and mathematically elegant, it is not suitable for the empirical simulation of the New Zealand census data for a number of reasons. The model assumes that every member of the population can evaluate every other member of the population, which is both unrealistic and computationally expensive at a population level. Also the algorithm requires the participants to be able to explicitly rank one another. In real life, traits such as education are also not immediately obvious, and neither is information about a person's potential pool of partners. This means that one of the problems that must be considered for the matching algorithm is that of incomplete information. For more recent applications of the stable marriage problem, see Mumcu and Saglam (2008) and Saglam (2011).
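For readers unfamiliar with the stable marriage approach, the following is a minimal Python sketch of the deferred-acceptance procedure usually attributed to Gale and Shapley. It assumes equal numbers on each side and complete, strict preference lists, which is exactly the full-information requirement criticised above; the names used are illustrative only.

```python
def gale_shapley(proposer_prefs, receiver_prefs):
    """Return a stable matching {proposer: receiver}.

    Both arguments map each person to a complete list of the other side,
    ordered from most to least preferred.
    """
    # rank[r][p] is the position of proposer p in receiver r's list.
    rank = {r: {p: i for i, p in enumerate(prefs)}
            for r, prefs in receiver_prefs.items()}
    next_choice = {p: 0 for p in proposer_prefs}  # next receiver to propose to
    engaged_to = {}                               # receiver -> proposer
    free = list(proposer_prefs)

    while free:
        p = free.pop()
        r = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        current = engaged_to.get(r)
        if current is None:
            engaged_to[r] = p                     # receiver was unmatched
        elif rank[r][p] < rank[r][current]:
            engaged_to[r] = p                     # receiver trades up
            free.append(current)
        else:
            free.append(p)                        # proposal rejected
    return {p: r for r, p in engaged_to.items()}


men = {"m1": ["w1", "w2"], "m2": ["w1", "w2"]}
women = {"w1": ["m2", "m1"], "w2": ["m1", "m2"]}
print(gale_shapley(men, women))
```

Every participant must hold a ranking of every member of the other side, so the preference lists alone grow with the square of the population size.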

One of the more famous matching problems which deals with the problem of incomplete information is the "Secretary Problem" (also known as the "Dowry Problem") (Freeman 1983). The basis of this "problem" is that, because of the way in which the interview and hiring process is set up, the "best" candidate might not be selected. By incorporating the concept of incomplete information (Ferguson 1989), the Secretary/Dowry strategy is a more realistic approach to partnership matching than the stable marriage algorithm. Although this strategy is more realistic than the stable marriage algorithm, it assumes that people in the population will wait until they have viewed or dated nearly 37% of potential mates before making a decision. Any reasonable algorithm must include some form of satisficing in order for it to be considered practical as a partnership choice model.
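The 37% figure comes from the classical cutoff rule: observe the first n/e candidates without committing, then accept the first candidate better than all of those observed. A minimal Python sketch follows; the fallback of accepting the last candidate when nobody beats the benchmark is an assumption of this illustration, as are the function names.

```python
import math
import random


def secretary_choice(candidates, observe_fraction=1 / math.e):
    """Apply the cutoff rule to a non-empty list of candidate values seen in order."""
    cutoff = int(len(candidates) * observe_fraction)
    benchmark = max(candidates[:cutoff], default=float("-inf"))
    for value in candidates[cutoff:]:
        if value > benchmark:
            return value
    return candidates[-1]  # assumed fallback: settle for the last candidate


def success_rate(n_candidates, trials, rng):
    """Share of trials in which the rule selects the single best candidate."""
    wins = 0
    for _ in range(trials):
        order = rng.sample(range(n_candidates), n_candidates)  # random arrival order
        if secretary_choice(order) == n_candidates - 1:
            wins += 1
    return wins / trials


print(success_rate(50, 2000, random.Random(1)))  # expected to be in the region of 0.37
```

The rule only performs this well if the searcher is prepared to pass over more than a third of the pool, which is the behavioural assumption questioned above.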

An alternative to combinatorial approaches such as the stable marriage algorithm is to use some sort of stochastic strategy. This is often referred to as Monte Carlo simulation and generally involves drawing random numbers and comparing them to some existing probability value. In a number of marriage studies (Bouffard et al. 2001; O'Donoghue et al. 2009; Perese 2002; Spielauer & Vencatasawmy 2001) these probabilities were generated via logistic regressions, but they may be drawn from other sources such as actuarial life tables or suitable probability distributions.

Two contrasting examples of stochastic methods are the DYNASIM (Zedlewski 1990) and APPSIM (Bacon & Pennec 2007) models and the Congressional Budget Office's Long-Term (CBOLT) model (Perese 2002). The DYNASIM/APPSIM model randomly sorts the bachelors and bachelorettes. The probability that the first available bachelor will partner the first available bachelorette is calculated by:

$$P(\text{union}_{mf}) = \exp\left(-0.5\sqrt{(\text{age}_m - \text{age}_f)^2 + (\text{edu}_m - \text{edu}_f)^2}\right)$$

This figure is compared to a random uniform number. If the random draw from a uniform distribution is less than the calculated probability then a match is made.
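As a minimal Python sketch of this matching test, under the assumption that age and education are both measured in years and that the function and parameter names are illustrative:

```python
import math
import random


def union_probability(age_m, age_f, edu_m, edu_f):
    """exp(-0.5 * Euclidean distance in age and education)."""
    distance = math.sqrt((age_m - age_f) ** 2 + (edu_m - edu_f) ** 2)
    return math.exp(-0.5 * distance)


def is_match(age_m, age_f, edu_m, edu_f, rng):
    """A match is made when a uniform random draw falls below the probability."""
    return rng.random() < union_probability(age_m, age_f, edu_m, edu_f)


rng = random.Random(42)
print(union_probability(25, 25, 12, 12))  # identical traits give probability 1.0
print(union_probability(28, 25, 12, 16))  # distance 5 gives exp(-2.5), about 0.082
print(is_match(28, 25, 12, 16, rng))
```

The probability falls off quickly: a pair three years apart in age and four years apart in education is accepted in roughly one draw in twelve.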

By comparison, the CBOLT model uses information about n couples to create an n-squared size dataset, where each male is paired to his real partner and then each other woman. A new binary variable is created, which takes a value of one for each male's actual partner, and each other woman with identical characteristics to his partner, and a zero otherwise. This binary variable is modelled by a series of covariates, using a logistic regression model. The resulting model is used to generate the probabilities of each match in the simulation. These probabilities are compared to randomly drawn values to determine which couples are matched together in the simulation.
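The construction of the n-squared dataset can be sketched in Python as follows. Representing each person as a tuple of characteristics is an assumption of this illustration, and the regression step itself is omitted.

```python
def build_pair_dataset(couples):
    """Pair every male with every female and flag actual or identical partners.

    `couples` is a list of (male_traits, female_traits) tuples, where each
    traits value is itself a tuple such as (age, education).
    """
    rows = []
    for male, partner in couples:
        for _, female in couples:
            # 1 for the actual partner and for any woman with identical traits.
            outcome = 1 if female == partner else 0
            rows.append((male, female, outcome))
    return rows


couples = [((30, 2), (28, 2)), ((25, 1), (24, 1)), ((40, 3), (28, 2))]
rows = build_pair_dataset(couples)
print(len(rows))                    # n squared rows
print(sum(r[2] for r in rows))      # rows with outcome 1
```

With three couples the dataset already has nine rows, so the approach becomes very large for population-scale data.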

The advantage of these kinds of stochastic approaches is that the simulation can be informed by empirical evidence, and it can incorporate variability through the use of random numbers. The downside of a stochastic approach is that it often requires more detailed information about the population or system that is being simulated, and can subsequently require further detailed data.

An additional problem with the logistic regression methods is that the way the data is formulated invalidates the regression assumption of independence. The algorithm works by creating a set of "hypothetical partners" for each male. It then uses logistic regression to generate the probability of the match. In reality, a person can only take one partner, so the additional "hypothetical partners" who are created are not independent observations, and therefore this has the potential to create bias in the logistic regression estimates.

A third approach that is seen more often in agent-based models is to use a set of sociological rules or heuristics for the agents in the model to make their partnership choices. These rules may be deterministic, or they may incorporate some kind of stochastic element into the decision-making process. For example, an agent may choose the partner with the most similar education level to themselves or consider a particular age difference to be ideal. Examples of such heuristics can be seen in Chen (2005) and Alam and Meyer (2008).

* Model Implementation

Our purpose is to implement a model of marriage markets with real-world data of population scale - i.e. the New Zealand Census - and to test its robustness for different parameter settings. This section discusses how our review of the literature on partnership matching led to a model implementation that extends the current work in the area, and is achievable given the available data.

Although the logistic matching routines used by Bouffard et al. (2001) and Perese (2002) are not practical with the New Zealand data, both highlight the value of a stochastic method over deterministic models, while Bouffard et al. also suggest that simulating segmented marriage markets through the use of independent sub-markets will improve accuracy. The DYNASIM (Zedlewski 1990) and APPSIM (Bacon & Pennec 2007; Harding 2007) models both utilised a simple probability function, which was demonstrated to be effective and can be produced with the available New Zealand Census data.

The literature on the agent-based modelling of partnership choice is sparser than that of microsimulation. The models typically use small artificial datasets rather than real data, although some of the more recent literature has incorporated some level of comparison to actual data (e.g. Hills & Todd (2008) comparing to real age-at-marriage distributions). One of the reasons for this is that the agent-based models tend to focus on process-related factors or on the demonstration of emergent properties, rather than prediction.

The simulation models of partnering fell into three broad categories. The first was combinatorial methods such as the stable marriage algorithm. These methods had their origins in mathematical theory, and although they were mathematically elegant, they assumed that every individual had prior knowledge about every other individual to use in their decision making. It was also a computationally expensive algorithm, which would potentially require billions of iterations for a city-sized dataset. The second category was microsimulation models that tended to use empirical data to match their agents. These models tended to be either very data intensive or use very simplistic matching algorithms. The third category was agent-based models. These were more likely to use artificial data but work with more complex algorithms and matching heuristics.

Our algorithm builds on the ones used in the DYNASIM (Zedlewski 1990) and APPSIM (Bacon & Pennec 2007) simulation models. The simplicity and intuitive reasoning used in the algorithms of these two models was one of the appealing factors for using a similar style of algorithm. In addition, the algorithm also appealed because it could be applied to the census datasets without requiring additional information or adaptation of the data.

Although there is evidence that social networks often exhibit homophily (McPherson, Smith-Lovin, & Cook 2001), most simulation models still use randomly allocated social networks (Bacon & Pennec 2007; Hills & Todd 2008; Todd & Billari 2003; Todd et al. 2005). The two main reasons for this are that either a more complex method of allocating networks would complicate the model and possibly confound any effects that are seen, or there is insufficient information about the structure of social networks in the population of interest for any significant improvements to be made in the allocation of the networks. For this simulation there was no information about the social networks of New Zealanders and there was only limited information available in the simulation dataset that could have been used for allocating the networks. Beyond this, a more complex method for allocating the social networks could confound any patterns that are generated.

One of the improvements that this simulation makes over the DYNASIM and APPSIM models is the use of a competitive marriage market, where the couples with the strongest level of attraction are paired first. Since the number of comparisons required in a competitive partnership model makes it computationally expensive, marriage models have tended to match agents as they go. This creates the problem that the agents that get matched first are not necessarily the most attracted to each other or the best match, but are in part just a by-product of being first on a randomly sorted list. Once an agent has been paired with a partner, they are no longer available to be partnered again and will not get added to any new social networks.

* Method

The simulation model was programmed in Java and applied to real, unit-level micro data from the New Zealand Census. In order to deal with the large amount of data and the associated number of iterations involved in the matching algorithms, the code was run across a secure grid computer system. In addition to the increased processing power, the grid computer system also provided the high level of security that was necessary for the unit-level census data that was used as the main input for populating the simulation.

Simulation Data

The simulation is populated with unit level data from the New Zealand Census of Population and Dwellings in such a way as to create a simulation environment which closely resembles regions of New Zealand at each census date.

The agents in the model represent all those individuals who are unpartnered (not living in a married or de facto relationship) and aged between eighteen and thirty in the Auckland, Wellington and Canterbury territorial regions. This definition of "single" was necessary as the census does not provide information on non-cohabiting relationships. The regions were chosen for two reasons. Firstly, they are three of the largest territorial authorities, thus alleviating privacy concerns with the use of unit-level data. Secondly, the three regions represent three different levels of ethnic diversity, from the high level of ethnic diversity in Auckland to the lower levels in Canterbury and Wellington. Due to privacy requirements the construction of the simulation algorithm and scoring method was constrained to using the age, education and ethnicity of the agents in each region.

The age grouping of eighteen to thirty year olds was used in order to create a fixed cohort that could be compared from one census to the next. The eighteen to thirty year old group at one census would become the twenty-three to thirty-five year old group at the next, allowing inter-census comparisons to be made and validation of the results to be performed.

Simulation Approach

The model is based on the DYNASIM model (Zedlewski 1990) which was used to simulate partnership as part of a taxation model on American data and has subsequently been applied to a national level simulation (APPSIM) of Australian data (Bacon & Pennec 2007). To allow for model transparency and replicability the full simulation code is available from: https://researchspace.auckland.ac.nz/bitstream/handle/2292/5823/02whole.pdf?sequence=6. However, the data used in the simulation is covered by a privacy agreement with Statistics New Zealand so cannot be provided.

The premise of the model is that each single male agent is initially assigned a random "social network" of potential female partners. Each of these potential partners is assigned a score using an exponential function that is based on the DYNASIM and APPSIM models. Starting with the couple with the highest score, partnerships are formed and those agents are removed from the system. The scoring of men by women and women by men is symmetrical so matches are based on mutual attraction. The total number of partnerships formed at each time step in the simulation is determined by dividing the total number of partnerships that were actually formed over the five-year census period by the number of time steps used in that simulation. Additional female agents are then added to the social networks, simulating the male agents "meeting" new people. The three cities are treated as three separate simulations as they are geographically separate and only a small amount of mixing would be expected between them. However, there is no geographic segregation of social networks within each of the cities.

One full execution of the simulation represents a five-year census period. It is run for a pre-determined number of time steps, where each time step represents a period of five years divided by the number of time steps that are used. The default number of time steps used is five, meaning that each time step represents one year. Once the simulation completes the final time step a cross-tabulation of partnerships by ethnicity and a calculation of the deviation from the actual census figures are generated. The simulations for each of the three regions are run independently of one another.
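The output step can be sketched in Python as below. The text does not specify how the deviation is calculated, so the simple cell-by-cell difference used here is an assumption, as are the function names and the example counts.

```python
from collections import Counter


def years_per_step(n_steps, census_period_years=5):
    """Length of one time step in years."""
    return census_period_years / n_steps


def ethnic_crosstab(couples):
    """Count couples by (male ethnicity, female ethnicity)."""
    return Counter(couples)


def deviation_from_census(simulated, census):
    """Assumed deviation measure: simulated count minus census count, per cell."""
    cells = set(simulated) | set(census)
    return {cell: simulated.get(cell, 0) - census.get(cell, 0) for cell in cells}


# Made-up illustrative counts, not census figures.
simulated = ethnic_crosstab([("Euro", "Euro"), ("Euro", "Maori"), ("Euro", "Euro")])
census = {("Euro", "Euro"): 3, ("Maori", "Maori"): 1}
print(years_per_step(5))
print(deviation_from_census(simulated, census))
```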

Mate Selection Algorithm

The simulation starts by having each male agent form a randomly assigned social network of female agents. This is the initial pool of women that will be evaluated as possible partners.

INITIALISE social-network
FOR EACH time-step
	FOR EACH single-male
		Append social-network
		Score all potential partners in social network
	END FOR EACH single-male
	Sort all partner scores
	Match pre-determined number of couples (from highest score)
	Remove partnered individuals
	Increment age of agents
END FOR EACH time-step
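
The following Python sketch shows one way the competitive matching loop could be realised. It is not the authors' Java code. The agent representation (dictionaries with `id` and `age`), the parameter names, and the use of age-difference scoring alone are assumptions made to keep the example short; scores are pooled across all men before sorting, so that the strongest pairs in the whole market are matched first.

```python
import math
import random


def age_score(male, female):
    """Illustrative score: exp(-sqrt(threshold + squared age difference))."""
    threshold = 80 - male["age"] - female["age"]
    return math.exp(-math.sqrt(threshold + (male["age"] - female["age"]) ** 2))


def run_simulation(males, females, couples_per_step, n_steps,
                   initial_contacts, new_contacts, rng):
    # Copy the agents so the caller's data is not mutated.
    single_males = {m["id"]: dict(m) for m in males}
    single_females = {f["id"]: dict(f) for f in females}
    networks = {mid: set() for mid in single_males}
    couples = []

    def meet(mid, how_many):
        # Add randomly chosen single women not already in this man's network.
        pool = sorted(fid for fid in single_females if fid not in networks[mid])
        networks[mid].update(rng.sample(pool, min(how_many, len(pool))))

    for mid in networks:                          # INITIALISE social-network
        meet(mid, initial_contacts)

    for _step in range(n_steps):                  # FOR EACH time-step
        scored = []
        for mid, male in single_males.items():    # FOR EACH single-male
            meet(mid, new_contacts)               # Append social-network
            for fid in networks[mid]:
                if fid in single_females:         # skip women partnered earlier
                    scored.append((age_score(male, single_females[fid]), mid, fid))
        scored.sort(reverse=True)                 # strongest attraction first
        formed = 0
        for _score, mid, fid in scored:
            if formed == couples_per_step:
                break
            if mid in single_males and fid in single_females:
                couples.append((mid, fid))
                del single_males[mid]             # Remove partnered individuals
                del single_females[fid]
                formed += 1
        for agent in list(single_males.values()) + list(single_females.values()):
            agent["age"] += 5 / n_steps           # Increment age of agents
    return couples


males = [{"id": i, "age": 20 + i} for i in range(4)]
females = [{"id": 10 + i, "age": 20 + i} for i in range(4)]
print(run_simulation(males, females, couples_per_step=1, n_steps=2,
                     initial_contacts=4, new_contacts=0, rng=random.Random(0)))
```

In this toy run the oldest same-age pair is matched first, because the threshold term is smallest for older agents.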

The potential partnerships of each male agent with each female agent in his network are scored using the exponential function shown below, which is based on the DYNASIM and APPSIM models. The scoring mechanism incorporates the similarity of the age and education levels of the agents, a stochastic "attraction" factor, an observed macro factor (in this set of experiments the number of relationships with the same ethnic pattern observed in the previous time step), and an age dependent component. This simulation incorporates the decaying expectations into the main scoring function rather than matching all couples above a certain threshold score. This is because the simulation sets the number of couples that will partner in each time period at the outset of the simulation and then matches the couples in order, starting from the highest attraction score.

The attraction score between male i and female j is calculated by:

$$\text{Score}_{i,j} = \exp\left(-\sqrt{\text{threshold}_{i,j} + \text{scoring variable}_{i,j}}\right)$$

where $\text{threshold}_{i,j} = 80 - \text{age}_i - \text{age}_j$ and the scoring variables and ranges are shown below in Table 3.

Table 3: Scoring variables and ranges

| Variable | Definition | Scoring Equation | Range of Values and Scores |
|---|---|---|---|
| Age | Age difference, squared to remove negatives and discourage extreme matches | $(\text{age}_i - \text{age}_j)^2$ | Agents aged 18 to 30 at start (23 to 35 at end) of simulation. Scores range 0 (same age) to 144 (maximum age difference) |
| Education | Education difference, squared to remove negatives and discourage extreme matches | $(\text{edu}_i - \text{edu}_j)^2$ | Education variable of 0 (no education), 1 (school), 2 (vocational), 3 (tertiary). Difference can range 0 to 9 |
| "Macro" | Social pressure, measured by subtracting rate of same kind of ethnic match per 10 couples in last time step. More similar matches equating to more pressure for same kind of match. | Macro score | -10 to 0 |
| Random | Random number from Uniform[0,9] distribution | Random score | 0-9 |

The scoring structure was based on the APPSIM model described in the "algorithms of mate selection" section (using the exponential function and the age and education differences). It was also constructed in a way that would allow for multiple scoring variables to be weighted and combined within the exponential function in future research (which is why the threshold and age decay component was included for the testing of each scoring variable, and the similar ranges of scores for Education, Macro and Random were chosen).
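A minimal Python sketch of the score with each of the four scoring variables is given below. The function names are illustrative, and the reading of the "Macro" variable as the negative of the rate per 10 couples is an assumption based on its stated range of -10 to 0.

```python
import math
import random


def threshold(age_i, age_j):
    """Age-dependent component: shrinks as the two agents get older."""
    return 80 - age_i - age_j


def attraction_score(age_i, age_j, scoring_value):
    """exp(-sqrt(threshold + scoring variable)) for one pair and one scoring variable."""
    return math.exp(-math.sqrt(threshold(age_i, age_j) + scoring_value))


def age_value(age_i, age_j):
    return (age_i - age_j) ** 2          # 0 to 144


def education_value(edu_i, edu_j):
    return (edu_i - edu_j) ** 2          # 0 to 9 for education codes 0..3


def macro_value(same_pattern_rate_per_10):
    return -same_pattern_rate_per_10     # assumed form, giving -10 to 0


def random_value(rng):
    return rng.uniform(0, 9)             # stochastic "attraction" factor


print(attraction_score(20, 25, age_value(20, 25)))
print(attraction_score(25, 20, age_value(25, 20)))  # same value: scoring is symmetric
print(attraction_score(30, 30, 0) > attraction_score(20, 20, 0))
```

Because the threshold falls as the agents age, an otherwise identical pair scores higher later in the simulation, which is how the decaying expectations enter the score. With ages capped at 35, the threshold is at least 10, so the term under the square root stays non-negative even for the lowest Macro score.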

At each time step, once the scores have been allocated, N/T couples are formed, where N is the net change in the number of couples in the age cohort over that five year census period and T is the number of time steps for that particular simulation. This method was one of the improvements that APPSIM made over DYNASIM, ensuring that the simulation will produce the correct number of couples seen in the census data. For this simulation it allows for a much easier comparison of the ethnic patterns within the simulation output.
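The N/T quota can be sketched in Python as follows. How a remainder is handled when N is not divisible by T is not stated in the text; spreading it over the earliest time steps is an assumption of this example.

```python
def couples_per_step(net_new_couples, n_steps):
    """Split N couples across T time steps, front-loading any remainder (assumed)."""
    base, remainder = divmod(net_new_couples, n_steps)
    return [base + 1 if step < remainder else base for step in range(n_steps)]


quota = couples_per_step(1003, 5)
print(quota)        # [201, 201, 201, 200, 200]
print(sum(quota))   # 1003
```

Whatever the rounding rule, the quotas sum to N, so the simulated number of couples matches the census total.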

The simulation algorithm requires the male agents to form social networks of the female agents. To ensure that this does not create different patterns from having the females select the males, the simulation was run with the male and female roles reversed. Given the symmetrical nature of the scoring/preferences, there was little change in the frequencies and patterns of ethnicity between the male-led and female-led simulations.

* Results

There are two subsections, the first examining the effect of changing some of the internal parameters of the model, and the second presenting the outcomes of running the simulation with each of the four scoring variables. The number of actual partnership formations is calculated by taking the difference in frequencies between the eighteen to thirty year-olds in the simulated census period and the twenty-three to thirty-five year-olds in the following census. For some groups, this net change is quite small, so the main focus is on the larger ethnic groups. One example of this is the number of couples where both partners are Maori. Although this group is fairly large overall, the net changes in the tables are relatively low. This is partly due to the small net change in the number of partnerships, but also due to changes in the self-definition of ethnicity (Carter, Hayward, Blakely, & Shaw 2009). Since there are so many combinations of year, region, ethnicity, and simulation parameter, only selected results are shown.
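
The net-change calculation described above amounts to a per-combination subtraction between two census tables. The sketch below shows the arithmetic only; the function name, the dictionary layout and the counts are all hypothetical, and the numbers are made up for illustration rather than taken from the census.

```python
def net_partnership_change(couples_at_start, couples_at_next):
    """Couples among 23-35 year-olds at the next census minus couples
    among 18-30 year-olds at the starting census, per ethnic combination."""
    combos = set(couples_at_start) | set(couples_at_next)
    return {
        combo: couples_at_next.get(combo, 0) - couples_at_start.get(combo, 0)
        for combo in combos
    }


# Made-up counts for illustration only; these are NOT census values.
start = {"A/A": 1200, "A/B": 300}
following = {"A/A": 2500, "A/B": 450, "B/B": 40}
change = net_partnership_change(start, following)
```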

Changing Internal Parameters

The first step in testing the model was to examine the impact of varying some of its internal parameters. This was done to check that the ethnic patterns produced by the simulation were not just a by-product of a particular parameter setting. The effects of varying the initial size of the social network and the number of time steps were examined. In addition, the simulation was run with the gender roles reversed to investigate whether there was any gender-based asymmetry in the patterns created by the simulation. The frequencies presented are for age difference scoring, but these patterns were consistent for the other scoring variables as well.

Table 4: Varying social network size

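| Year | City | Ethnicity Combination | n10 | n50 | n100 | n500 |
|---|---|---|---|---|---|---|
| 1981 | Canterbury | Euro only/Euro only | 6985 | 7028 | 7083 | 7083 |
| | | Euro only/Maori only | 225 | 228 | 224 | 181 |
| | | Euro only/Pacific only | 130 | 126 | 117 | 148 |
| | | Euro only/Asian only | 250 | 230 | 199 | 118 |
| 1991 | Auckland | Euro only/Euro only | 10761 | 10892 | 10875 | . |
| | | Maori only/Maori only | 392 | 416 | 422 | . |
| | | Pacific only/Pacific only | 336 | 286 | 319 | . |
| | | Asian only/Asian only | 98 | 100 | 92 | . |
| | | Euro only/Maori only | 4387 | 4230 | 4240 | . |
| | | Euro only/Pacific only | 3813 | 3860 | 3825 | . |
| | | Euro only/Asian only | 2229 | 2187 | 2238 | . |
| | | Maori only/Pacific only | 693 | 729 | 710 | . |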

Table 4 shows a selection of simulated frequencies (numbers of couples) for the sensitivity analysis of the size of the social network. The "only" refers to agents who hold only a single ethnic group (i.e. "Maori only" is an agent who belongs only to the Maori ethnic group). Two combinations of region and time point were used to test social network sizes of 10, 50 and 100 people. A social network of 500 people was attempted for Canterbury but not for Auckland, where it was too computationally expensive to complete. There were some slight reductions in the number of European/Asian and European/Pacific couples as the size of the social network increased, but otherwise the frequencies were very similar. More ethnic combinations are shown for Auckland since the simulated frequencies were larger.

Table 5: Varying time steps

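| 1981 | Canterbury | Euro only/Euro only | 6978 | 7028 | 7061 |
| | | Euro only/Maori only | 236 | 228 | 221 |
| | | Euro only/Pacific only | 131 | 126 | 118 |
| | | Euro only/Asian only | 256 | 230 | 210 |
| 1991 | Auckland | Euro only/Euro only | 10748 | 10892 | 10823 |
| | | Maori only/Maori only | 391 | 416 | 408 |
| | | Pacific only/Pacific only | 303 | 286 | 305 |
| | | Asian only/Asian only | 90 | 100 | 94 |
| | | Euro only/Maori only | 4312 | 4230 | 4279 |
| | | Euro only/Pacific only | 3877 | 3860 | 3820 |
| | | Euro only/Asian only | 2241 | 2187 | 2227 |
| | | Maori only/Pacific only | 735 | 729 | 730 |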

Table 5 shows the effect of the number of time steps used in the simulation. Overall, there was little variation in the simulated frequencies across the different numbers of time steps per simulation. The major constraint with increasing the number of time steps for the remainder of the simulations is the additional time the simulations would take to run. Since there are five years between each census, using five time steps equates to having an annual process. Five time steps provide sufficient opportunity for the macro variable to recalculate. Increasing the number of time steps to ten for the sensitivity analysis slowed the programme down to the extent that it was not practical to use it for the larger data sets.

Tables 4 and 5 both show some variation in the estimated number of couples for the different combinations of ethnic groups, depending on the number of time steps and the size of the social networks. The counts are not being compared to actual census data, so they are neither under- nor over-estimates. What they represent is slight variation in estimation as the parameters are varied, together with the natural variation of the social process of partnership matching (where running the same simulation several times will produce similar but not identical results). Qualitatively, the patterns remain similar when the parameters are varied. This allows the patterns of ethnicity to be examined relative to the census counts without concern that these patterns are a by-product of social network size or the number of time steps.
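
One way to put a number on this variation is the range of the simulated counts relative to their mean. The sketch below applies that measure to the Canterbury 1981 rows of Table 5. The measure itself and the names used are illustrative assumptions and were not part of the original analysis; only the counts come from the table.

```python
from statistics import mean

# Simulated couple counts across the three time-step settings,
# read from the Canterbury 1981 rows of Table 5.
table5_canterbury_1981 = {
    "Euro only/Euro only": [6978, 7028, 7061],
    "Euro only/Maori only": [236, 228, 221],
    "Euro only/Pacific only": [131, 126, 118],
    "Euro only/Asian only": [256, 230, 210],
}


def relative_spread(counts):
    """(max - min) / mean: a simple, scale-free measure of variation."""
    return (max(counts) - min(counts)) / mean(counts)


spreads = {
    combo: relative_spread(counts)
    for combo, counts in table5_canterbury_1981.items()
}
for combo, value in spreads.items():
    print(f"{combo}: {value:.1%}")
```

A relative measure of this kind makes it easier to compare large and small groups, since the same absolute difference matters more when the counts are small.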

Single Parameter Weight Results

After examining the effects of varying the internal model parameters, the simulation was run using each of the scoring variables in isolation. This was done in order to test the feasibility of the model by examining the effect of each of the characteristics individually, before considering further work to combine them. Repeated trials of each simulation were run to ensure that the results were stable. However, due to the computational time required per trial, too few trials were run for it to be worthwhile creating standard deviations and confidence intervals. Due to small frequencies, and limits on space, the mixed ethnicity results are presented together and are not analysed separately by gender.

Table 6: European/European Couples; Actual & Simulated


Table 6 shows the actual and simulated numbers of homogamous (both partners of the same ethnicity) European couples for each of the time periods, in each of the regions. It shows that the number of European/European couples in Auckland was underestimated, particularly in the later census periods where all of the scoring variables produced very poor estimates. By comparison, Canterbury had the most consistent and accurate simulated frequencies for all of the scoring variables. The macro variable was the least consistent of the individual scoring variables for Auckland and Wellington. In the earlier periods it produced higher estimates than the other scoring variables (over-estimates in the case of Wellington), but by 1991 in Auckland, and 1996 in Wellington, it was producing the biggest under-estimates.

Table 7: Selected other homogamous combinations

YearCityEthnicity CombinationActualAgeEduMacroRandom
1981AucklandAsian only/Asian only6661142175
Maori only/Maori only11346210580558
Pacific only/Pacific only17341171771021129
1991WellingtonAsian only/Asian only420173938521
Pacific only/Pacific only162162634723
1996CanterburyAsian only/Asian only252562039
Maori only/Maori only5491620111
Pacific only/Pacific only9020320
2001CanterburyAsian only/Asian only924507360556
Maori only/Maori only81172328314
Pacific only/Pacific only11424620

Table 7 shows the simulation results for a selection of other homogamous ethnic combinations. The number of homogamous Asian only couples estimated via the simulations was quite consistent across all of the regions and census periods. Unfortunately, this led to under-estimation of the frequencies in Canterbury. As previously mentioned, one difficulty with interpreting these figures is that the net changes between censuses for this group were very small. The macro variable produced the highest estimates for all three regions, and had the highest number of over-estimations. The age variable tended to produce the lowest estimates, followed by the education variable. As with the Asian/Asian partnerships, the Pacific/Pacific partnerships were substantially under-estimated. This was most noticeable in Auckland, where the simulated frequencies were similar to those in the other regions but much lower than the actual values. The actual frequencies in Canterbury were very low; they were over-estimated by the macro and random methods, but very closely estimated with the education variable.

Table 8: Selected mixed ethnicity simulation results

Ethnic CombinationYearCityActualAgeEduMacroRandom
Euro only/Asian only1981Auckland225653642114643
Euro only/Maori only1981Auckland1665152312563241529
Euro only/Pacific only1981Auckland504204917735172021
Maori only/Pacific only1996Auckland21325136485252

Table 8 shows a selection of the mixed ethnicity partnerships. The number of mixed European/Maori partnerships is over-estimated by most of the scoring variables in most of the periods other than 1981. The random variable and macro variable produced the largest over-estimates in all three regions. The number of Asian/European mixed partnerships was consistently over-estimated by the simulation. The random scoring variable had the highest over-estimates in most of the census periods. The numbers of Pacific/European mixed partnerships are heavily over-estimated by all of the methods, particularly in Auckland. As with the previous mixed ethnicity groups, each of the scoring methods tended to produce over-estimates. The random scoring variable created the largest over-estimates in Canterbury, Wellington, and in the earlier census periods in Auckland. For the later census periods in Auckland, the macro scoring variable creates the largest over-estimates.

* Discussion

The purpose of this paper was to review the literature relating to the simulation of marriage markets, and to introduce a simulation model that addressed some of the shortcomings identified. The literature in the area was divided into two distinct groups: micro-simulation and agent-based simulation. The micro-simulation models discussed tended to operate at a realistic scale, but with very simple matching algorithms (often a Monte Carlo "roll the dice" styled decision rule). In contrast, the agent-based models used small, artificial data sets, but had more complexity in terms of how the agents viewed and chose partners. This simulation model bridges the gap between the areas, using real world data, but with a more complex matching system than current micro-simulations. There have been no previous attempts to simulate New Zealand marriage markets, so it is difficult to benchmark the simulation.

The empirical simulation model was applied to unit-level census data. Although privacy requirements meant that the number of variables about each agent was limited, the data still provided unit-level information about all of the singles in each of the regions of interest. At each time step the male agents in the model examined a social network of possible female partners. Those who had the highest score in the matching algorithm were paired off and removed from the system. Their social networks were then extended and the process repeated in the next time step. The model parameters were examined to confirm that the size of the social networks and the number of time steps were not affecting the partnership matching. Simulations were then run to examine how each of the scoring variables affected the ethnic patterns in the data. The sensitivity testing showed that the model was robust: varying the size of the social network and the number of time steps had little impact on the ethnic patterns that were created.

One of the strengths of the study is that the agents in the simulation are the actual unit-level records for the entire population of the selected regions. They are not scaled up from a census sub-sample (Pennec & Bacon 2007), nor have they been reverse-engineered from frequency tables in order to match marginal distributions (Bouffard et al. 2001). They are the actual unit-level records for the sub-population that is being studied. Beyond this, the algorithm and matching extend the APPSIM model (Bacon & Pennec 2007). The simulation results show the feasibility of combining micro-simulation styled "big data" with agent-based simulation matching and sorting algorithms for examining patterns and matching dynamics.

A final consideration for this simulation model is whether it can provide any meaningful information about marriage matching, ethnic patterns and marriage markets in New Zealand. The simulations showed that a number of the patterns in ethnic partnering were qualitatively similar in the simulations that used age and education for matching to those that actually occurred. This suggests that these variables could play some part in the patterns of inter-ethnic cohabitation in New Zealand. However, they did not fully replicate the patterns, and in particular tended to under-estimate the homogamous partnerships. This indicates that there is some degree of ethnic preference that is not being captured by the model.

The "macro" scoring, where agent partnering decisions were not based on an individual ethnic preference, but were influenced by the patterns of ethnic partnering of the cohort at the previous time period, showed a much closer match of ethnic patterns to the census data. Although this could not be considered conclusive proof of an emergent pattern or a micro-macro link, it does provide encouragement to undertake further examination and experimentation with these simulation models for emergent properties in the New Zealand marriage market.

The simulations produced a mixture of under- and over-estimates for the different partnering combinations. This was most evident with the Auckland estimates, where the numbers of homogamous partnerships for the various ethnic groups were generally under-estimated and the numbers of non-homogamous partnerships were over-estimated. The age and education variables tended to produce the lowest estimates for the number of couples in each group. However, the simulated frequencies were not completely unrealistic, indicating that a city-level simulation of partnership matching is feasible. The combination of under- and over-estimates provides a case for future research in the area, looking at possible optimisation methods for combining the scoring variables.

It is not possible to give a single reason for the mixture of under- and over-estimates in the simulated number of couples for the different combinations of ethnic groups. However, three main factors between them provide the most compelling explanations. The first is that the agents did not have an explicit ethnic preference in their partner selection. This could explain why there was under-estimation for a number of the homogamous ethnic combinations, where there were fewer same-ethnicity matches in the simulations than actually occurred. This would also suggest that people do, either consciously or sub-consciously, have ethnic preferences in partner selection. The second factor is the assumption of random social networks. The random allocation of agents to social networks was justified by a lack of data about how such networks behave in New Zealand, and it provided a simpler structure for the model. However, it would also explain why there tended to be over-estimates of the partnerships between one European only partner (the largest ethnic group) and one minority partner. The third factor is that partnership matching is a complex process, and in reality these decisions are not made across a single dimension.

The variation in under- and over-estimates also provides a direction for future work, examining the effect of combining the scoring variables in the simulations. Sanderson (1998) describes how benefit can be gained by combining forecasts if they are not highly correlated. There is no evidence of the under- and over-estimates in the simulation being correlated, which would indicate that basing partner decisions on more than one scoring variable would improve the accuracy of the simulation. An extension of this is that weighted combinations of variables could enable further insights into the partnership process itself by looking at how estimates vary as their relative weights vary.
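
The point about combining forecasts that are not highly correlated can be made concrete with a standard variance identity. The notation here is introduced for illustration only and does not come from Sanderson (1998) or from the simulation: $e_1$ and $e_2$ are the errors of two unbiased estimates, with standard deviations $\sigma_1$ and $\sigma_2$ and correlation $\rho$, and the combination is assumed to be a simple equally weighted average.

```latex
\operatorname{Var}\!\left(\frac{e_1 + e_2}{2}\right)
  = \frac{\sigma_1^2 + \sigma_2^2 + 2\rho\,\sigma_1\sigma_2}{4},
\qquad
\text{and if } \sigma_1 = \sigma_2 = \sigma:\quad
\operatorname{Var}\!\left(\frac{e_1 + e_2}{2}\right)
  = \frac{\sigma^2\,(1 + \rho)}{2} \;<\; \sigma^2
  \quad \text{whenever } \rho < 1 .
```

Under these assumptions, the lower the correlation between the two sets of errors, the larger the reduction in error variance from averaging them.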

* Conclusion

This paper demonstrated an empirical simulation model of marriage matching to examine how New Zealand marriage markets operate. Census data was used to populate the model, and age similarity, educational similarity, the rate of inter-ethnic partnership in the previous time period (referred to as the macro variable), and a random stochastic factor were each used to pair agents. The simulations produced mixed results, but with sufficient accuracy to indicate the model is feasible and further development of the model would be worthwhile. Beyond this, the model also provided evidence of ethnic preferences in partnership choices in New Zealand marriage markets.

Having demonstrated the feasibility of partnership matching simulation using census data, there are several methodological and substantive implications for future work in the area. The combination of under- and over-estimates in the ethnic patterns indicates that finding an optimisation method for combining the scoring variables could produce a significant increase in the accuracy of the simulated results. Furthermore, being able to combine the scoring variables will allow for a closer examination of the ethnic patterns, with a view to exploring emergent properties in New Zealand marriage markets.

* Acknowledgements

Funding for this project was provided by the New Zealand Royal Society via the Marsden Fund. In addition, as a condition of use of Statistics New Zealand census data, the author confirms that:

  1. The results presented in this study are the work of the author, not Statistics New Zealand.
  2. Access to the data used in this study was provided by Statistics New Zealand in a secure data environment designed to give effect to the confidentiality provisions of the Statistics Act 1975.
  3. I acknowledge Statistics New Zealand as the source of the Census data used in this paper.

* References

ALAM, S. J., & Meyer, R. (2008). Comparing two sexual mixing schemes for modelling the spread of HIV/AIDS. Paper presented at the World Congress on Social Simulation, George Mason University, Washington DC.

BACON, B., & Pennec, S. (2007). APPSIM - Modelling Family Formation and Dissolution, Working Paper No. 4, from http://www.canberra.edu.au/centres/natsem/publications

BILLARI, F. C., Diaz, B. A., Fent, T., & Prskawetz, A. (2007). The "Wedding-Ring": An agent-based marriage model based on social interaction. Demographic Research, 17(3), 59-82. [doi:10.4054/DemRes.2007.17.3]

BOUFFARD, N., Easther, R., Johnson, T., Morrison, R. J., & Vink, J. (2001). Matchmaker, matchmaker, make me a match. Brazilian Electronic Journal of Economics, 4(2).

CARTER, K., Hayward, M., Blakely, T., & Shaw, C. (2009). How much and for whom does self-identified ethnicity change over time in New Zealand? Results from a longitudinal study. Social Policy Journal of New Zealand, 36, 32-45.

CHANTARAT, S., & Barrett, C. B. (2008). Social network capital, economic mobility and poverty traps. Journal of Economic Inequality. Available at SSRN: http://ssrn.com/abstract=1151353 [doi:10.2139/ssrn.1151353]

CHEESBROUGH, S., & Scott, A. (2003). Simulating demographic events in the SAGE model, Technical Note 4. London: ESRC-Sage Research Group.

CHEN, C.-Y. (2005). Picking and choosing: The simulation of sequential mate selection process. Paper presented at the Population Association of America 2006 Annual Meeting, Los Angeles, California.

EUBANK, S., Guclu, H., Kumar, V. S. A., Marathe, M. V., Srinivasan, A., Toroczkai, Z., et al. (2004). Modelling disease outbreaks in realistic urban social networks. Nature, 429, 180-184. [doi:10.1038/nature02541]

FAVREAULT, M., & Smith, K. (2004). A primer on the dynamic simulation of income model (DYNASIM3), from http://www.urban.org/UploadedPDF/410961_Dynasim3Primer.pdf

FERGUSON, T. S. (1989). Who solved the secretary problem? Statistical Science, 4(3), 282-289. [doi:10.1214/ss/1177012493]

FREEMAN, P. R. (1983). The Secretary Problem and its extensions: A review. International Statistical Review / Revue Internationale de Statistique, 51(2), 189-206. [doi:10.2307/1402748]

GALE, D., & Shapley, L. S. (1962). College admissions and the stability of marriage. American Mathematical Monthly, 69(1), 9-15. [doi:10.2307/2312726]

GILBERT, N., Pyka, A., & Ahrweiler, P. (2001). Innovation networks - A simulation approach. Journal of Artificial Societies and Social Simulation, 4(3), 8. https://www.jasss.org/4/3/8.html

GUSFIELD, D., & Irving, R. W. (1989). The Stable Marriage Problem. Cambridge, MA.: The MIT Press.

HARDING, A. (2007). APPSIM: The Australian dynamic population and policy microsimulation model. Paper presented at the Technical Workshops of the 1st General Conference of the International Microsimulation Association.

HILLS, T., & Todd, P. M. (2008). Population heterogeneity and individual differences in an assortative agent-based marriage and divorce model (MADAM) using search with relaxing expectations. Journal of Artificial Societies and Social Simulation, 11(4), 5. https://www.jasss.org/11/4/5.html

HOPPITT, W., & Laland, K. N. (2011). Detecting social learning using networks: A users guide. American Journal of Primatology, 72, 1-11. [doi:10.1002/ajp.20920]

KALICK, S. M., & Hamilton, T. E. (1986). The matching hypothesis reexamined. Journal of Personality and Social Psychology, 51(4), 673-682. [doi:10.1037/0022-3514.51.4.673]

KALICK, S. M., & Hamilton, T. E. (1988). Closer look at a matching simulation: Reply to Aron. Journal of Personality and Social Psychology, 54(3), 447-451. [doi:10.1037/0022-3514.54.3.447]

MCPHERSON, M., Smith-Lovin, L., & Cook, J. M. (2001). Birds of a feather: Homophily in social networks. Annual Review of Sociology, 27, 415-444. [doi:10.1146/annurev.soc.27.1.415]

MILLER, G. F., & Todd, P. M. (1998). Mate choice turns cognitive. Trends in Cognitive Sciences, 2(5), 190-198. [doi:10.1016/S1364-6613(98)01169-3]

MUMCU, A., & Saglam, I. (2008). Marriage formation/dissolution and marital distribution in a two-period economic model of matching with cooperative bargaining. Journal of Artificial Societies and Social Simulation, 11(4), 3. https://www.jasss.org/11/4/3/3.html

O'DONOGHUE, C., Lennon, J., & Hynes, S. (2009). The Life-Cycle Income Analysis Model (LIAM): A study of a flexible dynamic microsimulation modelling computing framework. International Journal of Microsimulation, 2(1), 16-31.

ORCUTT, G. H., Caldwell, S., & Wertheimer II, R. (1976). Policy Exploration Through Microanalytic Simulation. Washington DC: Urban Institute Press.

PENNEC, S., & Bacon, B. (2007). APPSIM - Modelling fertility and mortality. National Centre for Social and Economic Modelling (NATSEM) Working Paper, No 7., from http://www.canberra.edu.au/centres/natsem/publications

PERESE, K. (2002). Mate Matching for Microsimulation Models. Technical Paper Series, Congressional Budget Office. [Online] http://www.cbo.gov/ftpdocs/39xx/doc3989/2002-3.pdf.

SAGLAM, I. (2011). Divorce costs and marital dissolution in a one-to-one matching framework with nontransferable utilities. MPRA Paper 38493, University Library of Munich, Germany. http://mpra.ub.uni-muenchen.de/38493/

SANDERSON, W. C. (1998). Knowledge can improve forecasts: A review of selected socioeconomic population projection models. Population and Development Review, 24, 88-117. [doi:10.2307/2808052]

SIMÃO, J., & Todd, P. M. (2001). A Model of Human Mate Choice with Courtship That Predicts Population Patterns. In W. Banzhaf, T. Christaller, P. Dittrich, J. T. Kim & J. Ziegler (Eds.), Advances in Artificial Life (pp. 377-380). Dortmund: Springer. [doi:10.1007/3-540-44811-X_40]

SIMÃO, J., & Todd, P. M. (2002). Modeling mate choice in monogamous mating systems with courtship. Adaptive Behavior, 10(2), 113-136. [doi:10.1177/1059712302010002003]

SIMÃO, J., & Todd, P. M. (2003). Emergent patterns of mate choice in human populations. Artificial Life, 9(4), 403-417. [doi:10.1162/106454603322694843]

SPIELAUER, M., & Vencatasawmy, C. P. (2001). FAMSIM: Dynamic microsimulation of life course interactions between education, work, partnership formation and birth in Austria, Belgium, Italy, Spain and Sweden. Brazilian Electronic Journal of Economics, 4(2).

TODD, P. M., & Billari, F. C. (2003). Population-wide marriage patterns produced by individual mate-search heuristics. In F. C. Billari & A. Prskawetz (Eds.), Agent-Based Computational Demography: Using Simulation to Improve Our Understanding of Demographic Behaviour (pp. 117-137). New York: Physica-Verlag. [doi:10.1007/978-3-7908-2715-6_7]

TODD, P. M., Billari, F. C., & Simão, J. (2005). Aggregate age at marriage patterns from individual mate search heuristics. Demography, 42(3), 559-574. [doi:10.1353/dem.2005.0027]

TODD, P. M., & Miller, G. F. (1999). From pride and prejudice to persuasion: Satisficing in mate search. In G. Gigerenzer, P. Todd & ABC Research Group (Eds.), Simple heuristics that make us smart (pp. 287-308). New York: Oxford University Press.

TODD, P. M., & Miller, G. F. (2002). From pride and prejudice to persuasion: Satisficing in mate search. In G. Gigerenzer & P. Todd (Eds.), Simple Heuristics That Make Us Smart. New York: Oxford University Press.

VAN IMHOFF, E., & Post, W. (1998). Microsimulation methods for population projection. Population: An English Selection, 10(1), 97-138.

WEERMAN, F. M. (2011). Delinquent peers in context: A longitudinal network analysis of selection and influence effects. Criminology, 49(1), 253-286. [doi:10.1111/j.1745-9125.2010.00223.x]

XU, Z. A., & Sui, D. Z. B. (2009). Effect of small-world networks on epidemic propagation and intervention. Geographical Analysis, 41(3), 263-282. [doi:10.1111/j.1538-4632.2009.00754.x]

ZEDLEWSKI, S. R. (1990). The development of the dynamic simulation of income model (DYNASIM). In G. H. Lewis & R. C. Michel (Eds.), Microsimulation Techniques for Tax and Transfer Analysis. Washington, DC: Urban Institute Press.