The value of values and norms in social simulation

Social simulations gain strength when agent behaviour can (1) represent human behaviour and (2) be explained in understandable terms. Agents with values and norms lead to simulation results that meet human needs for explanations, but have not been tested on their ability to reproduce human behaviour. This paper compares empirical data on human behaviour to simulated data on agents with values and norms in a psychological experiment on dividing money: the ultimatum game. We find that our agent model with values and norms produces aggregate behaviour that falls within the 95% confidence interval wherein human behaviour lies more often than other tested agent models. A main insight is that values serve as a static component in agent behaviour, whereas norms serve as a dynamic component.


Introduction
Social simulations gain strength when explained in understandable terms. This paper proposes to explain agent behaviour in terms of values and norms, following (Hofstede ; Dechesne et al. ; Atkinson & Bench-Capon ). Values are generally understood as 'what one finds important in life', for example, privacy, wealth or fairness (van de Poel & Royakkers ). Norms generally refer to what is standard, acceptable or permissible behaviour in a group or society (Fishbein & Ajzen ). Using values and norms in explanations has several advantages: they are shared across society (Hofstede ), they have moral weight (van de Poel & Royakkers ), they are applicable to multiple contexts (Miles ; Cranefield et al. ) and they have been operationalized (Schwartz ). Moreover, humans use values and norms in folk explanations of their behaviour (Malle ; Miller ). Agents that use values and norms could thus lead to social simulation results that meet human needs for explanations.
To understand the relevance of agents with values and norms for social simulation, we need to know to what extent they can represent humans. Models are always simplifications of the system they are meant to represent, but understanding these differences clarifies the relevance of the model. Previous research primarily focused on constructing agents that use human values and norms in their decision-making (Dechesne et al. ; Atkinson & Bench-Capon ; Cranefield et al. ). It gained insights into how theories on values and norms can be synthesized, and into how one can formally argue in favour of an action in terms of values and norms. This paper aims to take the next step by comparing empirical data on human behaviour to simulated data on agents with values and norms.

We approach this by creating four agent models: a homo economicus model, an agent model with values, an agent model with norms and an agent model with both values and norms. By comparing several agent models we gain more insight into the relative properties of the models. We do not expect the models to fully reproduce human behaviour.
We compare the simulated data to empirical data from a meta-analysis that studied how humans play the ultimatum game. We focus on aggregated results: the mean and standard deviation of the demands and the acceptance rate. We find that, based on these measures, a combination of agents with values and norms produces aggregate behaviour that falls within the 95% confidence interval wherein human play lies more often than the other agent models. Furthermore, we find specific cases (responder behaviour in the multi-round scenario) for which agents with values and norms cannot reproduce the learning nuances humans display. We interpret this result as showing that agents with values and norms can provide understandable explanations that reproduce average human behaviour more accurately than the other tested agent models. It also shows that social simulation researchers should be aware that agents with values and norms can differ from human behaviour in nuanced learning dynamics. We gain several insights into the aspects in which agents with values and norms outperform agents with solely values, the role of values and norms as static and dynamic components, and how norms can produce different behaviour in different cases. We discuss the generalizability of these results given their dependence on our translation from theory to model, parameter settings, evaluation measures and the use case.
The remainder of the paper is structured as follows. The next section presents theories on how agents use a homo economicus view, values, norms, or values and norms in their decision making. Section presents the two ultimatum game (UG) scenarios and the data on human behaviour in these scenarios. Section presents our translation from theories to domain-specific computational agent models. Section presents the simulation experiments and the resulting behaviour of the different agent models. Section discusses the interpretation and generalizability of these results.

Theoretical Framework
We use theories on the homo economicus, values and norms to model the simulated agents. These theories are briefly summarized in this section.

Homo economicus
The homo economicus (HE) agent is the canonical agent in game theory (Myerson ) and classical economics (Mill ): it cares only about maximizing its own direct welfare, payoff or utility. As the agent cares only about its own direct welfare, it will accept any positive offer in the UG. Humans, in contrast, reject offers as high as % of the pie (Oosterbeek et al. ).
One approach to explaining these findings is to extend the HE agent model to incorporate learning (Gale et al. ; Roth & Erev ). The core of this explanation is that humans have learned, through the feedback of repeated interaction, to reject low offers in order to force the proposer into making higher offers. In this view, humans can be represented as learning homo economicus agents for which, roughly said, fairness only exists as an instrument for wealth.

Values
We view values as 'what a person finds important in life' (van de Poel & Royakkers ) that function as 'guiding principles in behaviour'. In the remainder of this subsection, we will describe some of the work on values in psychology, sociology and philosophy focusing on how we can use values in the decision making of agents.

Schwartz developed several instruments (e.g. surveys) to measure values (Schwartz ). Based on these measurements, Schwartz ( ) distinguishes ten different basic values: self-direction, stimulation, hedonism, achievement, power, security, conformity, tradition, benevolence and universalism. (These basic values, in turn, represent a number of more specific values like wealth and fairness.) Schwartz shows that although humans differ in what values they find important, there is a general pattern in how these values correlate. For example, people who give positive answers to survey questions on wealth are more likely to give negative answers to survey questions on fairness. These findings on inter-value comparison have been extensively empirically tested and shown to be consistent across nations representing various age, cultural and religious groups (Schwartz ; Schwartz et al. ; Bilsky et al. ; Davidov et al. ; Fontaine et al. ).
Values have a weak but general connection with actions (Miles ; Gifford ). Miles ( ) used data from the European Social Survey to show that values predict different measured actions over six behavioural domains and in every country included in the study. Gifford ( ) reviews environmental psychology and concludes that the correlation between action and values is consistent but weak, such that moderating and mediating variables are needed to predict actions from values. Following this research, we view values as abstract fixed points that actions over many contexts can be traced back to.
When making a decision between two actions there might be a conflict between two values. For example, when choosing between giving away money and keeping it, one might experience a conflict between the value of wealth and the value of fairness. van de Poel & Royakkers ( ) discuss different ways to resolve a value conflict: a 'multi-criteria analysis' or threshold comparison. In multi-criteria analysis, the different actions are weighted on the values and compared on a common measure; in threshold comparison, an option is good as long as both values are promoted above a certain threshold. If one action upholds both thresholds while the other one does not, the former is chosen. If both options uphold both thresholds, threshold comparison does not specify which option to take. This paper uses multi-criteria analysis, as this allows our agent to always make a concrete choice and therefore serve as a computational model for simulation.
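The difference between the two resolution strategies can be sketched as follows; the actions, value scores, weights and thresholds in the example are illustrative assumptions, not parameters of our model:

```python
# Sketch of the two conflict-resolution strategies described above. The
# actions, value scores, weights and thresholds are illustrative assumptions.

def multi_criteria(actions, scores, weights):
    """Pick the action whose weighted sum of value scores is highest."""
    return max(actions,
               key=lambda a: sum(weights[v] * scores[a][v] for v in weights))

def threshold_comparison(actions, scores, thresholds):
    """Keep the actions that promote every value above its threshold;
    zero or several actions may remain, so no unique choice is guaranteed."""
    return [a for a in actions
            if all(scores[a][v] >= t for v, t in thresholds.items())]

# Example: keep the money or give half away, scored on wealth and fairness.
scores = {"keep":  {"wealth": 1.0, "fairness": 0.2},
          "share": {"wealth": 0.5, "fairness": 1.0}}
actions = ["keep", "share"]

chosen = multi_criteria(actions, scores, {"wealth": 0.4, "fairness": 0.6})
viable = threshold_comparison(actions, scores,
                              {"wealth": 0.4, "fairness": 0.5})
```

Note that `multi_criteria` always returns exactly one action, which is why we adopt it for the computational model, whereas `threshold_comparison` may return an empty or a multi-element list.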

Our theory on values thus encompasses:

V1.
Humans are heterogeneous in the values they find important.

V2.
The importance one attributes to these values is correlated according to the findings of Schwartz ( ). For example, the values of wealth and fairness are negatively correlated.

V3.
Values are (for the aim of this study) the direct and only cognitive determinant of actions.

V4.
When values conflict in a decision, humans use a multi-criteria analysis to resolve the conflict.

Norms
We follow Crawford & Ostrom ( ) in that norms have four elements, referred to as the 'ADIC' elements: Attributes, Deontic, aIm and Condition. The attribute element distinguishes to whom the statement applies. The deontic element describes a permission, obligation or prohibition. The aim describes the action of the relevant agent. The condition gives the scope of when the norm applies. One example in the context of the UG can be found in Table .

Table : Attribute (A): proposers; Deontic (D): should; aIm (I): demand % of the pie; Condition (C): when in a one-shot Ultimatum Game.

Fishbein & Ajzen use the term 'perceived norm' (or: subjective norm) to make clear that it is a person's individual perception that influences behaviour, and that these perceptions may or may not reflect what most others actually do or expect. Thus, a norm exists, for a particular person, when that person perceives other people do or expect it. To put it in terms of the ADIC syntax: a norm exists, for a particular person, if and only if the attribute perceives that others do or expect the aim given that the condition holds.
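The ADIC syntax can be encoded directly; a minimal sketch, in which the `Norm` class and its field names are our own illustration rather than code from the original model:

```python
from dataclasses import dataclass

# A minimal encoding of the ADIC syntax; the Norm class and its field
# names are our own illustration, not code from the original model.

@dataclass(frozen=True)
class Norm:
    attribute: str  # to whom the statement applies
    deontic: str    # permission, obligation or prohibition
    aim: str        # the action of the relevant agent
    condition: str  # the scope in which the norm applies

# The example norm from the table above (the exact percentage is left
# unspecified, as in the text).
example = Norm(attribute="proposers",
               deontic="should",
               aim="demand a given share of the pie",
               condition="in a one-shot Ultimatum Game")
```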
Empirical work shows that there is a correlation between norms and action. For example, a meta-analysis on the theory of planned behaviour shows that a linear model that takes measurements of the subjective norm as input can on average explain a substantial part of the variation in measured intentions (Armitage & Conner ). (Intentions, in turn, can explain about half of the variance in behaviour.) There are many different theories on how this relation between action and norm precisely works. For the purpose of this study, we aim to explore to what extent we can explain agent behaviour using only an understandable concept such as norms.
Our theory on norms thus encompasses:

N1.
A statement is a norm if and only if it has the following four elements: Attributes, Deontic, aIm and Condition.

N2.
A norm exists, for a particular person, when that person perceives that other people do or expect it.

N3.
The action a human takes is the same as what they perceive as the norm.

Values and norms
We follow Finlay & Trafimow ( ) in that some humans use values while others use norms. In a meta-analysis covering 30 different behaviours, they found that some humans are primarily driven by attitude (which strongly correlates with values) and some individuals are primarily driven by norms. We choose this theory for its simplicity and postpone more complex combinations of values and norms to future work. Note that, in the case of this third theory, V3 and N3 apply only to a subset of the agents.

The Scenario
In this section, we describe how humans behave in two UG scenarios. We will use the simulations to check whether our models, which we will describe in the next section, can reproduce this behaviour.
The UG has been the subject of many experimental studies since its first appearance in Güth et al. ( ). In this study, we use the meta-analysis by Cooper & Dutcher ( ) as our main data source for human behaviour. We obtained the data of 5 of the 6 studies from the authors, namely: Roth et al. ( ), Slonim & Roth ( ), Anderson et al. ( ), Hamaguchi ( ) and Cooper et al. ( ). We obtain a dataset of demands and replies with on average the following specifics:
• An experiment has 32 players, split between proposers and responders.
• The pie size P is 1000.
• A proposer can demand any d ∈ D = [0, P].
• A responder can choose a reply z ∈ Z = {accept, reject}.
• The players are paired with a different player each round, but do not change roles.
• Players are anonymous to each other.

These studies can be separated by the number of rounds the subjects play. One round comprises one demand from each proposer and one reply from each matched responder. We consider two scenarios: the one-round ultimatum game and the multi-round ultimatum game.
The ultimatum game where players play only one round is called the one-shot ultimatum game. We subset the dataset on first-round games and depict what humans do in these rounds in Table .

Table : First-round human behaviour according to our adapted dataset (columns: datapoints, demand (µ) with CI, demand (σ), accept (µ) with CI, accept (σ)). We display the estimated average demand (with its confidence interval (CI)) and acceptance rate.
One popular explanation of why humans make these particular demands and accepts is that they have learned them in repeated interactions with other humans. When scholars talk about this type of learning, they mean an evolutionary sort of learning that takes place over long periods of time. Debove et al. ( ) reviewed theoretical models that all aim to explain first-round UG behaviour with such an evolutionary model. The idea behind these studies is that one simulates many rounds of behaviour in the ultimatum game and checks whether this results in the demands humans make in one-shot games. In Section Experiments & Results, we check whether our theories can explain the data in a similar way.
The original study of Cooper & Dutcher ( ) focuses on how the behaviour of responders evolves over 10 rounds. In Figure , we use the obtained data to represent two of their main findings. Figure : Multi-round human behaviour according to our adapted dataset. We display the estimated average demand (left) and acceptance rate (right) for different rounds. The grey area depicts the 95% confidence interval.
In the left figure, we see that the share proposers demand rises slightly over time. In the right figure, we see that the responders' acceptance rate falls slightly and then rises. According to Cooper & Dutcher ( ), the behaviour in the first five rounds significantly differs from the behaviour in the last five rounds. Although the differences are small, Cooper and Dutcher analyse them because they believe them to be informative. They assume that the mechanisms responsible for the change in behaviour over time are also the mechanisms that bring about the behaviour in the first round. In Section Experiments & Results, we present experiments that check whether our theories can explain this change in behaviour over time.

Model
If we want our results to be relevant to our theory (instead of to an ad-hoc model), we need to be clear about the relation between the theory and a domain-specific model. In this section, we present our ultimatum-game-specific implementation of our normative and value-based agent theory. The normative model has been implemented in Repast Java (North et al. ); the value-based model has been implemented both in Repast Java and in R for verification. The code, documentation and a standalone installer are provided at the CoMSES Library and GitHub.

Learning homo economicus agent
In the case of the learning homo economicus agent there are already a few models available that can be applied to the UG. This paper uses the reinforcement learning models presented in Roth & Erev ( ) and Erev & Roth ( ), because in our view they focus on the core mechanisms of the homo economicus and they are well documented.
In these models, each player keeps track of a utility u for a range of portions of the pie A (in our case A = {0, 0.1P, 0.2P, ..., P}). For the proposer, this number represents the demand it makes. For the responder, this number represents a threshold: if the demand is above this threshold it will reject; if the demand is equal to or below the threshold it will accept. The model is initiated by letting each player n attribute an initial utility i to each pie-portion a ∈ A, such that u_n(t = 1, a) = i.
Each round the players do the following:

1. Each player picks a pie-portion according to the distribution of these utilities. In other words, the probability H of picking pie-portion a is H(a) = u_n(t, a) / Σ_{a′∈A} u_n(t, a′).

2. The proposer's demand is equal to its chosen pie-portion. The responder accepts the demand if it is at or below its chosen pie-portion and rejects otherwise.

3. Each player n updates the utility u_n of the played action â by adding the obtained money r to the previous utility, i.e. u_n(t + 1, â) = u_n(t, â) + r. The utility of the other actions remains the same.
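The update cycle above can be sketched as follows; the pie size, the portion grid and the initial utility used in the example are illustrative assumptions, not the calibrated settings of this study:

```python
import random

# Minimal sketch of the Roth-Erev reinforcement learner described in the
# three steps above. Pie size, portion grid and initial utility are
# illustrative assumptions.

P = 1000
PORTIONS = [round(0.1 * k * P) for k in range(11)]  # {0, 0.1P, ..., P}

class LHEAgent:
    def __init__(self, initial_utility, rng):
        # equal initial utilities for every pie-portion (model version 1)
        self.u = {a: initial_utility for a in PORTIONS}
        self.rng = rng

    def pick_portion(self):
        # step 1: sample a with probability u_n(t, a) / sum of utilities
        total = sum(self.u.values())
        x = self.rng.random() * total
        for a, ua in self.u.items():
            x -= ua
            if x <= 0:
                return a
        return PORTIONS[-1]

    def update(self, portion, reward):
        # step 3: add the obtained money to the played action's utility
        self.u[portion] += reward

def play_round(proposer, responder):
    # step 2: the demand is the proposer's portion; the responder's
    # portion acts as an acceptance threshold
    demand = proposer.pick_portion()
    threshold = responder.pick_portion()
    accepted = demand <= threshold
    proposer.update(demand, demand if accepted else 0)
    responder.update(threshold, P - demand if accepted else 0)
    return demand, accepted
```

Because rewards are only ever added, successful demands become more probable over rounds, which is the learning dynamic the two model versions below parameterize.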

Roth & Erev ( ) and Erev & Roth ( ) present two versions of the homo economicus that differ in their approach to the initial utilities. Before introducing them we first introduce the parameter s(1), the initial strength of the model, defined as the ratio between the sum of the initial utilities and the average reward, i.e. s(1) = (Σ_{a∈A} u_n(1, a)) / r̄, with r̄ the average reward. The initial strength determines the initial learning speed of the agent. The two versions of the model are:

1. The initial utilities are all equal to each other, i.e. u(a) = u(b) for all actions a, b ∈ A (but s(1) is free).

2. The initial utilities sum to a fixed initial strength, but are randomly distributed over the actions.
For pragmatic reasons, we aim to test only one of the models on its ability to reproduce human behaviour. In Erev & Roth ( ), the authors show that for many games data can be reproduced with the simple reinforcement learning agent and equal initial utilities, but they do not treat the UG. In Roth & Erev ( ), the authors show that UG results can be crudely reproduced with random utilities and a fixed strength of 500, but they provide neither the exact parameter settings nor a specific comparison of the learned distributions to first-round play. In this study, we choose to further explore the first model (with equal utilities), as its parameter space is more manageable. Future work should explore other reinforcement models, including versions where one can vary the learning rate of the agents.
We now specify the extra assumptions that have been made when translating the theory to a domain-specific model:

LHE+1.
Players attach utilities to pie-portions that represent the demand for the proposer and a threshold for the responder.

LHE+2.
The initial utilities for these pie-portions are all equal to each other in the first round.

LHE+3.
There is a one-to-one relation between the utility of a pie-portion and the sum of the rewards it yielded (e.g., no discount factor or utilities attached to sequences of actions).
Value-based agent
Given V2, there are ten basic values that each represent a number of specific values. In the context of the UG, we assume that the values of wealth and fairness are more relevant than other values. This is an educated guess, based on the fact that the behavioural economics literature frames the decision in these terms (Cooper & Kagel ) and on the meaning we associate with the values of wealth and fairness.
Given V1, humans are heterogeneous in the values they find important. We represent this in the model by a parameter i_v that represents the importance (or weight) one attributes to value v.
Given V2, this importance is correlated according to the findings of Schwartz ( ). According to Schwartz ( ) the two values are strongly negatively correlated. For pragmatic reasons, we will assume these values are perfectly negatively correlated. This allows us to simplify the model to two parameters µ and σ that specify a normal distribution from which the difference (di) in value strengths is drawn, i.e. di ~ N(µ, σ) for every agent. To make a computational model, we propose a procedure where the agent attributes a utility to every action and chooses the action with the highest utility. This utility is determined by both the value of wealth and the value of fairness. In other words, the agent performs a multi-criteria analysis to decide on the best action (V4).
We present the decision-making model in three steps: (1) we relate the extent to which a value is satisfied to the resulting money the agent obtains in one round of UG play; (2) we relate this value satisfaction and the importance one attributes to the value to a utility per result; (3) we relate this utility to the action the agent chooses.
First, to relate the extent to which a value is satisfied to the resulting money the agent obtains, we have to interpret the meaning of wealth and fairness. Given the meaning of wealth, we assume that the higher one values wealth, the higher the demands one makes (and expects). Given the meaning of fairness, we assume that the higher one values fairness, the more equal the demands one makes (and expects). We represent this in a satisfaction function s_x(r), where s_x specifies the extent to which the resulting money r (of one round of UG play) satisfies value x and P is the pie size. The satisfaction of wealth thus increases as one gets more money and the satisfaction of fairness peaks around an equal split.
Second, to relate this value satisfaction (s) and value importance (i) to a utility (u) per result (r), we can combine s and i in several ways. This paper evaluates three possibilities, among them a divide function. Every utility function thus represents a different model. In the next section, we will evaluate which model can best reproduce human behaviour.
Third, to relate this utility to the action the agent chooses, we postulate that:

• the proposer demands the d ∈ [0, P] for which the utility (as given by u(r)) is maximal;
• the responder chooses to accept if (and only if) the utility of what it receives, u(P − d), is higher than the utility of a reject.
We choose to model the utility of rejection by filling in the chosen utility function with s_w(0) and s_f(0.5P), i.e. the agent interprets a reject as getting maximum fairness (as in the r = 0.5P case) but almost no wealth (as in the r = 0 case).
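The three-step decision model can be sketched as follows. Because the exact satisfaction and combination equations are not reproduced here, the concrete functional forms below (linear wealth satisfaction, fairness satisfaction peaking at an equal split, and a weighted sum as the multi-criteria combination) are illustrative assumptions, not the divide function of the model itself:

```python
# Sketch of the three-step value-based decision model. The functional
# forms are illustrative assumptions.

P = 1000  # pie size (assumption)

def s_wealth(r):
    # wealth satisfaction grows with the money r obtained in one round
    return r / P

def s_fairness(r):
    # fairness satisfaction peaks at an equal split (r = 0.5P)
    return 1 - abs(2 * r - P) / P

def utility(r, i_wealth, i_fairness):
    # step 2: weigh each value satisfaction by its importance
    return i_wealth * s_wealth(r) + i_fairness * s_fairness(r)

def best_demand(i_wealth, i_fairness):
    # step 3 (proposer): demand the d in [0, P] with maximal utility
    return max(range(P + 1), key=lambda d: utility(d, i_wealth, i_fairness))

def accepts(demand, i_wealth, i_fairness):
    # step 3 (responder): accept iff the utility of the share u(P - d)
    # exceeds the utility of a reject: s_w(0) and s_f(0.5P)
    u_reject = i_wealth * s_wealth(0) + i_fairness * s_fairness(P // 2)
    return utility(P - demand, i_wealth, i_fairness) > u_reject
```

Under these assumed forms, an agent that cares only about wealth demands the whole pie, while an agent that cares only about fairness demands an equal split, matching the intended interpretation of the two values.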
In summary, to translate our theory to our domain we have added the following parts to our theory:

V+1.
The importance one attributes to wealth and the importance one attributes to fairness are perfectly negatively correlated.

V+2.
The higher one values wealth the higher the demands one makes (and expects). The higher one values fairness the more equal the demands one makes (and expects).

V+3.
Humans compare to what extent wealth and fairness are satisfied by (a) a divide function (function ( )).

Normative agent
Given N1, a statement is a norm if and only if it has the attribute, deontic, aim and condition elements. In the context of the ultimatum game we consider the types of norms stated in Table . Note that according to our theory all sorts of norms could be considered, for example, 'responders should reject in all cases'. We consider only the types of norms in Table .

Table : The norms considered in the ultimatum game, split out according to the ADIC syntax, where p refers to a proposer, q to a responder, d to a demand and t to a threshold. Example: 'Responders should reject if and only if the demand is above threshold t in the UG'.
To know which norms actually exist in a particular game, we look at N2. This part of our theory states that a norm exists, for a particular person, when they perceive other people do or expect it. Note that in our scenario an agent does not switch roles (i.e. proposers stay proposers, responders stay responders). Proposers thus never see the actions of other proposers, but can only rely on what they think responders expect (from proposers). The situation is analogous for responders. The question is thus: how does one derive what the opponent expects from you, given his or her actions?
In the case of the responder this is fairly straightforward. What does a proposer expect from a responder when demanding X% of the pie? He or she probably expects that the responder would accept that demand (and anything lower), but reject everything higher. In other words, the demand becomes a threshold for acceptance. For multiple rounds, we assume this threshold is calculated by averaging over all seen demands. Formally, this amounts to: norm N_qt exists for responder q ∈ A and threshold t ∈ D if and only if t = (1/|OD|) Σ_{d∈OD} d, where OD is the set of demands responder q has observed in the games it participated in.
For the proposer, it is a bit trickier to deduce what behaviour is expected. We postulate that the demand a proposer is expected to make is equal to the average of two indicators: the lowest demand that is rejected and the highest demand that is accepted. Formally, this amounts to: norm N_pd̂ exists for a proposer p ∈ A and demand d̂ ∈ D if and only if d̂ = (min(RD) + max(AD)) / 2, where RD is the set of demands that proposer p has seen rejected and AD is the set of demands that proposer p has seen accepted.
For most cases the action of the proposer and responder is now clear: they act according to what they perceive as the norm (N3). However, our theory does not specify what agents should do when they perceive no norm. For the sake of making a computational model we postulate that if no norm exists the agent draws a random action from a uniform distribution. Section Experiments & Results explores uniform distributions with different means to gain insight into the relevance of this assumption for our results.
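These perception rules can be sketched as follows; the function and variable names are our own, and the uniform draw implements the assumption that an agent without a perceived norm acts randomly:

```python
import random

# Sketch of the norm-perception rules described above. Function and
# variable names are our own illustration.

P = 1000  # pie size (assumption)

def responder_threshold(observed_demands):
    """A responder perceives the average of all observed demands as the
    acceptance threshold proposers expect it to use."""
    if not observed_demands:
        return None  # no norm perceived yet
    return sum(observed_demands) / len(observed_demands)

def proposer_demand(rejected, accepted, rng=random):
    """A proposer perceives the expected demand as the average of the
    lowest rejected and the highest accepted demand it has seen; without
    such observations it draws a demand uniformly at random."""
    if not rejected or not accepted:
        return rng.uniform(0, P)
    return (min(rejected) + max(accepted)) / 2
```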

Note that to translate our theory to our domain we have added the following parts to our theory:

N+1.
Proposers expect that responders accept their demands, but reject everything higher than that.

N+2.
Responders expect that proposers demand the average of the lowest demand that is rejected and the highest demand that is accepted.

N+3.
If no norms exist, then humans draw a random action from a uniform distribution.

Experiments & Results
In this section, we test four agent models in both the one-shot and multi-round scenario. We evaluate the models on their ability to reproduce human behaviour by comparing the simulated behaviour to the 95% CI wherein human behaviour lies.

Reproducing first-round behaviour
To test our theories on their ability to reproduce human first-round behaviour we let the agents interact until their behaviour stabilizes. To set up this experiment we thus need to simulate a number of 'pre-rounds'. We assume that these 'pre-rounds' are similar to the scenario described above; for example, the number of players is 32 and the agents do not switch roles. If the stable behaviour is the same as the human first-round behaviour, then the theory serves as an explanation of how humans have learned to make the demands and rejects they display.
We find that if we average over 100 runs per parameter set-up, the confidence interval around the estimated means is very small. The remainder of this section therefore treats the estimated mean as the true mean.

Testing our learning homo economicus model
We test our learning homo economicus model on its ability to reproduce first-round behaviour in the UG. We can run different versions of the model dependent on the initial utilities the agents attribute to their actions (which are given by parameter s). Using explorative simulations we find that behaviour stabilizes around pre-round ' '.
We run simulations for s ∈ [0.00005, 8] with a logarithmic stepsize, as exploration shows that the results of simulations outside this interval do not significantly differ from the results at the bounds. We calculate for each parameter set-up the distance between the human demand and acceptance rate and the simulated demand and acceptance rate, and find that this distance is minimal for s = 0.03; the results for this parameter setting are displayed in Table . Furthermore, there is a negative exponential relation between s and the distance.
Table : The demand and acceptance rate of the LHE agent (columns: avg. demand, sd. demand, avg. accept, sd. accept). Note that 'avg.' and 'sd.' refer to the average and standard deviation of the demand and acceptance rate over one run.
We conclude from Table that the learning homo economicus agent differs from humans particularly in the distribution of demands and the acceptance rate. Although the learning homo economicus can reproduce human demands, it can only do so when other agents force it into making lower demands by rejecting often enough. This model cannot explain why proposers make relatively equal demands while responders accept almost all demands.

Testing our value-based agent model
We test our value-based agent model (V) on its ability to reproduce first-round behaviour in the UG. We can run different versions of our value-based agent depending on which function the agent uses to combine the satisfaction of different values (V+3) and on the µ and σ with which the difference in value strength (di) is normally distributed. By calculating which value-based agent model leads to which distribution of demands, we can gain insight into which model best reproduces the demands humans make. Figure compares the three agent models by showing what the best demand for each agent is given the importance it attributes to its values. Recall that human demands are normally distributed. Given that di is normally distributed, we can see that the divide function is the only function for which the demands agents make will be normally distributed. We conclude that our value-based agent model with extension V+3a has the best chance of reproducing human behaviour.
To find out if the value-based agent can reproduce the demand and acceptance rates humans display, we simulate the agent. We run experiments for µ ∈ [−2, 2] and σ ∈ [0, 2] with stepsize 0.01 and record the average demand and average reject rate they result in. We calculate for each parameter set-up the distance between human demand and acceptance rate and the simulated demand and acceptance rate (i.e., the error). We find that for µ = −0.55 and σ = 1.14 the distance is minimal; the results for this parameter setting are displayed in Table . Note that the resulting behaviour falls within the 95% CI wherein human play lies (see Table ). The distance between human play and simulated play increases linearly with how far µ and σ move away from this optimal setting.
Table : The demand and acceptance rate of the value-based agents (columns: avg. demand, sd. demand, avg. accept, sd. accept). Note that 'avg.' and 'sd.' refer to the average and standard deviation of the demand and acceptance rate over one run.
We conclude that our value-based model can, for a specific parameter range, quite accurately reproduce human demands and acceptance rates.

Testing our normative agent model
We test our normative agent model (N) on its ability to reproduce first-round behaviour in the UG. We can run different versions of our normative agent depending on what the agent does when no norm is perceived yet (i.e. in the initial round). Using explorative simulations we find that behaviour stabilizes around pre-round ' '.
In our first experiment, the normative agents draw their demand from U(0, P) and their acceptance rate from U(0, 1). Table presents the demand and acceptance rate the agents demonstrate when their behaviour stabilizes. The average demand and acceptance rate clearly and significantly differ from human play (see Table ). The agents demand just a bit less than half of the pie, whereas the humans demand more than half. The agents have an acceptance rate of 0.5 and humans 0.85. This experiment gives some evidence that a normative theory cannot serve as an explanation for first-round behaviour. However, the stable behaviour is close to the initial behaviour: the means of the uniform distributions the agents draw their initial actions from are close to 494.0 and 0.50. This raises the question of how dependent our results are on the initial conditions (N+3) and whether other initial conditions might reproduce first-round human behaviour.
Table: The demand and acceptance rate normative agents display when their behaviour has stabilized (in the pre-rounds). Note that 'avg.' and 'sd.' refer to the average and standard deviation of the demand and acceptance rate over one run.
We therefore run experiments for different uniform distributions. Figure depicts the resulting demands and acceptance rates for different initial demands and acceptance rates. We find that the resulting demands are fairly close to the initial demands. The demands can converge to higher or lower values than the initial demands, depending on the initial acceptance rates. For some initial conditions, human demands can be reproduced. In contrast, the acceptance rate converges to one of two values, but never comes close to the human acceptance rate of 0.85. Although hard to display in this figure, inspection of the data shows that it is the edge cases (e.g., extreme initial demands) that converge to one of the two acceptance rates.

Figure: Left: the average resulting demand for different initial demands (x-axis) and different initial acceptance rates (colour). Right: the average resulting acceptance rate (y-axis) for different initial acceptance rates (x-axis) and different initial demands (colour).
To gain more insight into the role norms can have in human decision-making, we highlight a few more aspects of these results. First, Figure shows that, although one might expect a normative agent model to 'normalize' behaviour, both the demand and the acceptance rate can converge to values different from those they started at. Second, Table shows a fairly large standard deviation for both the resulting demand and acceptance rate. This shows that, although agents act according to a norm, there are still individual differences per agent.
We conclude that our normative model can reproduce human demands, but not simultaneously reproduce human acceptance rates. The simulations do show that normative models can have counter-intuitive results: resulting norms can drift away from the original norm, and agents can hold individual norms and thereby reproduce a variance in behaviour similar to that of humans.

Testing a combination of normative and value-based agents
In our second experiment, we test if we can reproduce human behaviour with our theory that some people act according to their values, while others act according to their norms (VN). In Figure , we depict the demand and acceptance rate for different numbers of normative agents.
We compare the average demand and acceptance rate of our agents against the confidence interval wherein human play lies (grey area). As we already knew, we can reproduce human behaviour by simulating only value-based agents (under the specific parameter settings found earlier). We find that we can also reproduce the demands if we allow up to half of the agents to act out of norms. This can be explained as follows: if enough value-based agents make realistic demands, the normative agents will learn to adhere to the norm these agents set. In contrast, only for a very small range of numbers of normative agents can we reproduce the acceptance rate as well.
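As a back-of-envelope sketch of how the aggregate curves arise, the population average is a weighted mean of the per-type averages. The numbers below are hypothetical placeholders; in the actual simulation the per-type averages depend on the mix, because the normative agents learn from the demands the value-based agents make:

```python
def mixed_average(n_normative, norm_avg, value_avg, total=32):
    """Population-average behaviour for a mix of n_normative normative
    agents and (total - n_normative) value-based agents; e.g. 10
    normative agents imply 22 value-based agents (total of 32)."""
    n_value = total - n_normative
    return (n_normative * norm_avg + n_value * value_avg) / total
```

For example, `mixed_average(10, 0.5, 0.85)` weighs a (hypothetical) normative acceptance rate of 0.5 against a value-based rate of 0.85.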
We conclude that our theory on values and norms (VN) reproduces human demands and acceptance rates as long as the number of normative agents is limited.

Figure: The average demand (left) and average acceptance rate (right) over all agents (y-axis) for different numbers of normative agents (x-axis). Note that if there are, for example, 10 normative agents, there will be 22 value-based agents.

Reproducing multi-round behaviour
For multi-round behaviour, we are interested in the behaviour agents display after the first round. In contrast to the one-shot scenario, we have empirical data on the real demand and acceptance rate humans initially display (in the first round). Therefore, we initialize the model based on the empirical data and then test if the agents reproduce human behaviour in subsequent rounds. We evaluate the simulated data on whether it falls in the 95% confidence interval in which human play lies. In addition, we highlight aspects of the learning dynamics the agents display.
Testing our learning homo economicus agent model
The learning homo economicus agent is tested on its ability to reproduce multi-round behaviour in the UG. We assume here that the learning homo economicus agent already reproduces human play in the first round, and see if it can reproduce the learning process humans display. We can run different versions of the model depending on the initial utilities the agents attribute to their actions (which are given by the parameter s).
We run simulations for s ∈ [0, 50], as explorative simulations show that the results of simulations outside this interval do not significantly differ from the results at the bounds. We depict the results in Figure . We can see that, for both the average demand and the average acceptance rate, the simulated behaviour does not fall into the confidence interval in which human behaviour lies. However, we can see some similarities in the learning dynamics between the simulated behaviour and the empirical data. In the case of the proposer, on rounds 3, 5, 6 and 8, the simulated data shows a similar rise and fall as the empirical data for most values of s. In the case of the responder, from round 5 onwards, the simulated data shows a similar rise and fall as the empirical data for some values of s.
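The role of s can be illustrated with a generic reinforcement-learning sketch in the spirit of Roth-Erev learning; the paper's exact update rule is defined in its model section, and all names below are illustrative. Every action starts with propensity s, actions are chosen with probability proportional to their propensity, and realised payoffs are added to the chosen action's propensity, so a large s makes initial behaviour sticky while a small s lets early payoffs dominate:

```python
import random

def run_learner(actions, payoff, s, rounds, rng):
    """Propensity-based reinforcement learner: choose an action with
    probability proportional to its propensity, then add its payoff to
    that propensity. s is the initial propensity of every action."""
    prop = {a: float(s) for a in actions}
    for _ in range(rounds):
        total = sum(prop.values())
        r = rng.uniform(0, total)
        chosen = actions[-1]
        for a in actions:
            r -= prop[a]
            if r <= 0:
                chosen = a
                break
        prop[chosen] += payoff(chosen)
    return prop

rng = random.Random(0)
# Action 1 always pays 1.0, action 0 pays nothing; with a small s the
# learner quickly concentrates its propensity on action 1.
prop = run_learner([0, 1], lambda a: 1.0 if a == 1 else 0.0,
                   s=0.1, rounds=200, rng=rng)
```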
One explanation of these results is that the learning homo economicus differs from humans in wanting to explore options other than (and on average different from) its first-round behaviour. This other behaviour yields enough utility for the agent not to change its mind back.
We conclude that the learning homo economicus agent cannot reproduce the average demand and acceptance rate humans display. For some values of s, the learning homo economicus agent reproduces some of the learning dynamics humans display.

Testing our value-based agent
For our value-based agent model, we can see analytically that it will not be able to reproduce the human dynamics of multi-round behaviour: the value-based agent's behaviour does not change over time, as it does not learn.
Testing our normative agent model
The theory on norms (N) is tested on its ability to reproduce multi-round behaviour in the UG. We assume here that the normative agent already reproduces human play in the first round, and see if it can reproduce the learning process humans display. In other words, we adapt our normative agent such that it does not act randomly in the first round, but makes the demand and average accept humans do (i.e., we change N+).

Figure: The average demand (left) and average acceptance (right) at different rounds. The coloured lines depict the behaviour of the agents, where the colour signifies the value of the s parameter (which influences the initial conditions). The grey area represents the 95% confidence interval wherein human play lies.
In Figure , we depict the demand and acceptance rate of the normative agent over multiple rounds. We found that the average demands normative agents display are very similar to the average demands humans display. Almost all of the individual points fall within the 95% confidence interval wherein human play lies.
In contrast, the acceptance rates of the normative agents do not match those of humans. In the case of the proposer, the dynamics of the simulated data and the empirical data match in the general rise, but not in the small fluctuations. In the case of the responder, the dynamics primarily differ. In particular, the initial drop in the simulated data and the drop on round 5 in the empirical data have no counterpart in the other.

Figure: The average demand (left) and average acceptance (right) at different rounds. The black line represents the behaviour of the normative agents; the grey area represents the 95% confidence interval wherein human play lies.
We can explain the initial drop in acceptance rates as follows. After the first turn, the responders set their threshold to the first demand they saw (N+). After this, about half of the demands are above and about half of the demands are below this threshold, leading to the 0.5 acceptance rate.
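This anchoring mechanism can be illustrated with a small sketch (illustrative names and distributions; the demands below are drawn symmetrically around a hypothetical stable norm). Each responder fixes its threshold at the first demand it sees and then accepts only later demands at or below that threshold, which drives the average acceptance rate towards 0.5:

```python
import random

def average_acceptance(n_responders, later_demands, draw):
    """Each responder anchors its threshold to the first demand it
    sees, then accepts a later demand only if it does not exceed that
    threshold. Returns the acceptance rate averaged over responders."""
    rates = []
    for _ in range(n_responders):
        threshold = draw()                      # the anchoring first demand
        demands = [draw() for _ in range(later_demands)]
        accepted = sum(1 for d in demands if d <= threshold)
        rates.append(accepted / later_demands)
    return sum(rates) / n_responders

rng = random.Random(7)
# Demands fluctuate symmetrically around a (hypothetical) norm of 600.
rate = average_acceptance(2_000, 20, lambda: rng.uniform(550, 650))
```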
We conclude that the normative model cannot reproduce both human demands and acceptance rates. The learning of the proposer agent is similar to that of humans, but, at the same time, responder learning strongly differs.

Testing a combination of value-based and normative agents
In our last experiment, we test if we can reproduce human behaviour with our theory that combines both values and norms (VN). We depict the results in Figure . We found that no single combination of value-based and normative agents completely reproduces human play. However, when the number of normative agents is limited, the proposer and responder learning is similar to that of humans. In this case, most of the simulated data points fall within the 95% confidence interval wherein human play lies. Note that in the proponent case human behaviour is best matched with a large majority of normative agents, while in the respondent case a majority of value-based agents gives the best fit. We find that the dynamics of the proposer agents match the slight increase in demand humans display. The dynamics of the simulated responders differ from those of humans. In particular, a rise in acceptance rate after round 5 is not reproduced.

Figure: The average demand (left) and average acceptance (right) at different rounds. The coloured lines depict the behaviour of the agents, where the colour signifies the number of normative agents. The grey area represents the 95% confidence interval wherein human play lies.
We conclude that no single combination of normative and value-based agents can completely reproduce human play, but that for some particular combinations the average demand and acceptance rate often lie within the 95% confidence interval wherein human play lies. In addition, some of the dynamics of the proposer are reproduced. The dynamics of the simulated responder strongly differ from human play.

Discussion
This paper compared empirical data on human behaviour to simulated data on agents to gain insight into the extent to which agents with values and norms can represent human behaviour. We found that agents with values and norms can both evolutionarily reproduce one-shot UG behaviour and, for most rounds, reproduce aggregate human behaviour in a multi-round scenario. Given the methodology of this paper, an agent with values and norms thus outperforms a learning homo economicus agent or an agent that uses solely values or solely norms in its ability to reproduce human behaviour. It outperforms the learning homo economicus agent and the normative agent in the one-shot UG, and the learning homo economicus, value-based and normative agents in the multi-round UG. The remainder of this section discusses to what extent this means agents with values and norms could represent human behaviour in explainable terms.
The generalizability of these results depends on the fact that they hold only under a specific translation from theory to model, a specific evaluation, specific parameter settings and a specific use case.
We aimed to be transparent in our translation from theory to model, so that other researchers can pinpoint the aspects with which they agree and disagree and how these aspects influence the results. For aspects of the model that we found necessary from a pragmatic viewpoint, but not fundamental to the theory, we aimed to study their influence on the output. For example, we showed that the normative model cannot reproduce human acceptance rates independent of its initial conditions. We hope this paper can serve as a starting point for a discussion between the social and computational sciences to pinpoint essential and accidental properties of values and norms.
We evaluated the agent models on the extent to which aggregate measures of simulated behaviour fall inside a 95% confidence interval wherein aggregate measures of human behaviour lie. We find that, based on this evaluation measure, we can differentiate models that cannot possibly reproduce human behaviour from models that can. For example, our learning homo economicus agent cannot (under any parameter setting) come close to both the human demand and acceptance rate in the one-shot UG. Our value-based agent and our agent with values and norms can reproduce human demand and acceptance rates. However, this latter result depends on specific parameter settings for these models (i.e., differences in values follow a certain normal distribution and there is a specific number of normative agents). In future work, these parameter settings can be evaluated by using empirical data on how values are normally distributed and how many humans act out of norms (i.e., what Moss & Edmonds ( ) call cross-validation). Based on this, we argue that interviews that measure how humans individually explain results would be a valuable addition to the data available on the UG.
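This evaluation measure can be made concrete as follows; a sketch using the normal approximation for the interval, with illustrative names:

```python
import math

def ci95(sample):
    """95% confidence interval for the mean of a sample
    (normal approximation: mean +/- 1.96 * standard error)."""
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)
    half = 1.96 * math.sqrt(var / n)
    return mean - half, mean + half

def reproduces(simulated_mean, human_sample):
    """The evaluation used throughout: does the simulated aggregate
    fall inside the 95% CI wherein the human aggregate lies?"""
    lo, hi = ci95(human_sample)
    return lo <= simulated_mean <= hi
```

A simulated aggregate close to the human mean passes this check; one far outside the interval fails it, regardless of parameter setting.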
We compared our simulated agents to data obtained on human behaviour in the UG: a lab experiment. Lab experiments have the advantage of allowing measurements in a reproducible, controlled setting, which is the reason we were able to obtain a relatively large homogeneous dataset from a meta-study. Naturalistic decision-making research criticizes the lab setting as being unrepresentative of real-life decision making (Klein et al. ). In the case of the UG, it is indeed unclear what real-life process the ultimatum game is exactly meant to represent. This has at least two consequences. First, it is unclear what modelling assumptions should be made about the setting (e.g., what the initial conditions mean in an evolutionary setting). Second, in a setting with unstable conditions, high stakes and more uncertainty, humans could use a different decision-making process than in the UG. Future work should balance findings from this lab context with findings in a more natural context.

By comparing an agent with values and norms to other agent models, we gained several insights. First, we found that in the one-shot UG a value-based agent can reproduce human behaviour just as well as an agent model with values and norms. We conclude that both agent models can be used to explain aggregate human behaviour in this scenario. Note that, although an agent model with values and norms introduces another concept, a reason to choose it over a value-based model is that it can explain behaviour in a more general context (i.e., the multi-round scenario) or fits better with explanations humans give (e.g., in interviews). Second, the experiments show that values act as a static component that anchors the agent to certain behaviour, whereas norms form a dynamic component that can allow agent behaviour to drift away from human behaviour (e.g., in the one-shot UG) or towards human behaviour (e.g., in the multi-round UG). This gives insight into the role values and norms can have in humans and agents (i.e., as anchors and as dynamic learning components). Furthermore, this is relevant for creating ethical AI: AI that we create according to our ideas of values and norms can still drift away from behaviour we find acceptable. Third, we found that normative agents can still individually differ in behaviour, just like humans. This is because agents can form different ideas of 'the norm' based on the different interactions they have had.
Although social simulation focuses on reproducing general aggregate patterns in human behaviour, we should be aware that there are aspects of human behaviour that agents with values and norms do not reproduce. The most notable difference is that nuances in the learning dynamics of the proposer and (especially) the responder behaviour in the multi-round scenario are not reproduced. If one is primarily interested in nuanced learning dynamics, then this research suggests that other agent models should be used over agents with values and norms. For social simulation researchers, this means that they should be aware of possible differences in learning dynamics between their agents with values and norms and humans.
We suggest a few directions for future work to find agent models that reproduce human behaviour on an aggregate level. First, we could specify in more detail when to use norms and when to use values. The results in the multi-round scenario show that in the proponent case human behaviour is best matched with a large majority of normative agents, while in the respondent case a majority of value-based agents gives the best fit. We are careful not to conflate this with the claim that proposers mainly use norms while responders mainly use values. The average demand, the acceptance rate and the different rounds all depend on each other. There are simulation runs with predominantly normative agents (that explain proposer behaviour) and runs with predominantly value-based agents (that explain responder behaviour), but this is not the same as one run where both results are explained simultaneously by agents first proposing out of norms and then the same agents responding out of values. It could very well be that the low acceptance rate in runs with a high number of normative agents is due to the fact that in those same runs the demands are higher (and not because the agents use norms). Future work should use simulations to check if agents that sometimes use values and sometimes use norms can explain aggregate UG behaviour.
Second, there are deontic operators other than 'should' that can be used to improve the normative agent model (Wright ). For example, the deontic operator 'may' can represent a range of possible demands the proposer considers permissible. Future work could test to what extent these different deontic operators allow the normative agent to reproduce human aggregate behaviour.
Third, in experimental economics, there are several models that have been used to reproduce human behaviour (Kagel & Roth ). As discussed, one of these models, the learning homo economicus, outperforms the agent with values and norms by being able to reproduce some of the nuanced learning dynamics. Future work could look at how to combine the benefits of both models to reproduce human behaviour more accurately.
Last, considering that the motivation for this work is to use understandable terminology to explain agent behaviour, we are more inclined towards using concepts humans use in their explanations. For example, Fehr & Fischbacher ( ) suggested that the concept of reputation can explain the learning dynamics in the UG: a responder could aim to build a reputation as a strong ('selfish') player.

Conclusion
This paper aimed to compare empirical data on human behaviour to simulated data on agents with values and norms. We found that agents with values and norms can both evolutionarily reproduce average one-shot UG behaviour and, in most rounds, reproduce the average demands and acceptance rates humans display in a multi-round scenario. We interpret this result as showing that agents with values and norms can provide understandable explanations that reproduce average human behaviour more accurately than other tested agent models (e.g., the homo economicus).
We gained several insights into the role of values and norms in agent models. First, we found that our agents with values and norms cannot reproduce the nuanced learning dynamics humans display (in particular the responder behaviour in the multi-round scenario). Second, we found that agent models with solely values or solely norms can reproduce some human behaviour in one scenario, but that to reproduce behaviour in both scenarios a combined model is necessary. Third, the experiments show that values act as a static component that anchors the agent to certain behaviour, whereas norms form a dynamic component that can allow agent behaviour to drift away from human behaviour (e.g., in the one-shot UG) or towards human behaviour (e.g., in the multi-round UG). Fourth, normative agents can still individually differ in behaviour, just like humans, because agents can form different ideas of 'the norm' based on the different interactions they have had.
We discussed the dependence of these results on our translation from theory to model, parameter settings, evaluation and use case. Future work should be directed at pinpointing essential and accidental properties of values and norms, at interviews that measure how humans individually explain results (to validate micro aspects of the model), and at balancing findings from this artificial lab context with findings on natural decision-making.
Our study is a first step that shows how agents with values and norms can provide an improvement over simpler models in representing human behaviour in explainable terms.

Notes
Crawford & Ostrom ( ) distinguish norms from rules. Rules differ from norms in that they have a unique sanction when one does not abide by them. In the UG, there are predominantly norms at play and not rules, as players can differ in the sanctions they apply: reject the offer, or accept but lower their esteem of the opponent.
Note that the concept of norm of both Fishbein & Azjen ( ) and Crawford & Ostrom ( ) overlaps with what is often called a social norm (as opposed to, e.g., a legal or moral norm).
For ease of presentation, we chose 1000, with no monetary unit, as the pie size. Although empirical work (Oosterbeek et al. ) shows that the effect of the pie size is relatively small, in further work we need to check the criticality of this assumption.
The catch here is that these scholars do not believe that humans have played ultimatum games since the dawn of time, but that they have learned to make fair demands in (ultimatum-game-like) life experiences. Humans then display this behaviour in the first round of the actual psychological experiment. This is in contrast with the multi-round scenario, where the simulation is actually compared to multiple rounds of real human ultimatum game play.
Note that for a full statistical analysis we would need an ANOVA test. For our purposes, it is enough to concern ourselves with the findings of Cooper & Dutcher ( ).
For the Java model code see: https://www.comses.net/codebase-release/ b dec -cd -f -a da-f bcfa/ For the R model code see: https://github.com/rmercuur/UltimatValuesR
Note that we chose to model the denominator as 1000 and not as P; the rationale is that we think the satisfaction of wealth increases absolutely and not relative to the pie size. In further work, we should explore empirical work to support this modelling choice.
Note that in the case of the value-based agent the behaviour stabilizes in the first round, as the agents do not learn.
To conclude exactly what number of normative agents reproduces human behaviour, we would need more rigorous statistical analysis (e.g., an ANOVA test). However, for our purposes it suffices to look at the 95% confidence interval. This is not the same as being 95% confident that the two processes are the same. One way to check the similarity of time series is by fitting ARIMA models to the two lines and comparing those. However, for our purposes it suffices to look at the 95% confidence interval.
One advantage of agent-based models is that we do not have to restrict our theories to some linear combination of values and norms (as much of psychology does), but can theorize any functional connection between them (Castelfranchi ).