An Empirical Game-Theoretic Analysis of the Dynamics of Cooperation in Small Groups

Abstract: Many models of the evolution of cooperation have shown the importance of direct reciprocity (for example “tit for tat” strategies) or alternatively indirect reciprocity (conspicuous altruism based on a reputation or “image score”). In the latter case many models make the implicit assumption that group sizes are large relative to the expected number of interactions, which makes their analysis more tractable in several ways, not least by allowing us to ignore any strategic interaction between the direct and indirect classes of reciprocation strategy. However, in smaller groups the possibility arises that both classes of strategy will play a role in determining the equilibrium behaviour. Therefore we introduce a replicator dynamics model which incorporates both direct and indirect reciprocity, and use simulation and numerical methods to quantitatively assess how the level of cooperation in equilibrium is affected by changes in the group size and the frequency with which other group members are encountered. Our analysis shows that, for intermediate group sizes, direct reciprocity persists in equilibrium alongside indirect reciprocity. In contrast to previous simulation studies, we provide a sound game-theoretic underpinning to our analysis, and examine the precise conditions which give rise to a mix of both forms of reciprocity.


Introduction
Dunbar ( ) conjectures that many of the features of human cognition and neuro-anatomy that are unique compared to other species, for example large brain size and linguistic capabilities, can be explained by selection pressure for larger group sizes in the ancestral environment (the social brain hypothesis (Dunbar )). Although larger group sizes have many benefits, for example increased protection from predators, there are also many costs and unique challenges for species that attempt to exploit a niche by cooperating with other individuals.
Cooperation occurs when an individual takes an action which benefits another but at a cost to itself. Cooperators can be successful when their help is reciprocated. Reciprocation occurs when cooperators receive benefits in turn from the actions of others. When others do not reciprocate, we say that they defect or free-ride. The benefits to a cooperator may accrue directly from those who have been helped by the individual, in which case it is called direct reciprocity.
The classical example of cooperation via direct reciprocity is illustrated by the tit-for-tat strategy in the repeated Prisoner's Dilemma game (Axelrod & Hamilton ; Axelrod ). In this game, pairs of agents repeatedly interact over many rounds of play. In each round, both players simultaneously choose whether to cooperate (C), or to defect (D). The resulting payoffs are given in Table where R is the reward for cooperation, P is the penalty for defection, T is the payoff that results in a temptation to defect, and S is the so-called "sucker's payoff". If these values satisfy T > R > P > S then the game qualifies as a prisoner's dilemma.
Play then proceeds over many rounds with random stopping, and players are able to change their choice of cooperation or defection contingent on the history of play. Axelrod ( ) held a computerised Prisoner's Dilemma tournament and famously a strategy called "tit-for-tat" won the competition. This strategy cooperates conditionally based on the action chosen by its partner in the previous round of play; if the opponent cooperates then tit-for-tat reciprocates with cooperation. On the other hand it defects if, and only if, the opposing player defected in the previous round. If tit-for-tat encounters unconditional cooperators or other tit-for-tat players, then this results in direct reciprocity. Direct reciprocity occurs when cooperation is directly reciprocated by the partner who received the benefits of the cooperative act.

Table : Payoffs for a single stage of the Prisoner's Dilemma

        C       D
C     R, R    S, T
D     T, S    P, P
On the other hand, indirect reciprocity occurs when a cooperative action is not reciprocated directly, but rather via a third party who did not receive the original benefits. Indirect reciprocity can occur when agents make use of the reputation, or "image score", of other agents in conditioning their strategy, in contrast to the personal history of interactions which typically bootstraps direct reciprocity. Nowak & Sigmund ( a) model indirect reciprocity using a variant of the prisoner's dilemma called a donation game. As with the prisoner's dilemma, pairs of agents interact over many rounds of play and the resulting payoffs are given in Table . The payoffs are chosen such that R = γ(k − 1), S = −γ, T = γk and P = 0, where γ > 0 is a constant representing the cost of cooperation, and k > 1 is a constant which determines the cost/benefit ratio. Here the act of cooperation can be interpreted as a donation from one agent to another, where the recipient receives some multiple greater than one of the original investment.
Provided that the benefit/cost ratio k is greater than one, a social surplus can be generated through reciprocation. This allows us to model cooperation in group settings which closely resemble social dilemmas that occur in nature. For example, in an ecological context, we might interpret the interaction between agents as an allo-grooming activity in which the positive fitness payoff γk represents the fitness gains from parasite elimination, whereas the fitness cost −γ represents the opportunity cost of forgoing other activities, such as foraging, during the time γ allocated for grooming (Russell & Phelps ).
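The mapping from the donation game onto the Prisoner's Dilemma payoffs can be checked with a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, and the value γ = 1 is an assumption chosen for convenience (k = 3.5 is one of the values used later in the analysis).

```python
def donation_game_payoffs(gamma: float, k: float) -> dict:
    """Map the donation game onto the Prisoner's Dilemma payoffs R, S, T, P."""
    if gamma <= 0 or k <= 1:
        raise ValueError("requires gamma > 0 and k > 1")
    return {
        "R": gamma * (k - 1),  # both cooperate: receive gamma*k, pay gamma
        "S": -gamma,           # cooperate with a defector: pay gamma, receive nothing
        "T": gamma * k,        # defect against a cooperator: receive gamma*k for free
        "P": 0.0,              # both defect: nothing changes hands
    }


def is_prisoners_dilemma(p: dict) -> bool:
    """Check the ordering T > R > P > S."""
    return p["T"] > p["R"] > p["P"] > p["S"]


# gamma = 1.0 is an assumed, illustrative value
payoffs = donation_game_payoffs(gamma=1.0, k=3.5)
print(payoffs, is_prisoners_dilemma(payoffs))
```

For any γ > 0 and k > 1 the ordering T > R > P > S holds, so the donation game is always a prisoner's dilemma; mutual cooperation yields the surplus γ(k − 1) per agent.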
Nowak & Sigmund ( ) start with the donation game, and introduce an "image score", which is an integer counter which is incremented every time an agent cooperates, and decremented every time an agent defects. They show that strategies which cooperate conditional on whether their partner's image score is positive, which they call discriminators, are able to survive in equilibrium under various models of natural selection.

Note that both direct and indirect reciprocity make use of information about the other players in the game. Direct reciprocity, as embodied by tit-for-tat, makes use of direct observations of others' behaviour, and requires the player to personally remember the choices made by the other players they have interacted with. On the other hand, indirect reciprocity, as embodied by discriminatory cooperation, uses information that is shared with all players. We might expect the success of each form of reciprocity to be contingent on the reliability and availability of the underlying information used by each strategy. Moreover, we should expect the availability and quality of this information to vary between settings.
Many theoretical models of the evolution of cooperation via reciprocity start with the assumption that agents interact in groups that are large relative to the expected number of pairwise interactions. These simple models are analytically tractable. However, in small or intermediate sized groups, the analysis is complicated by the fact that strategies based on both direct and indirect reciprocity can interact.
This is of utmost importance if we are to take these models seriously as explanations of actual cooperative behaviour in the real world, since many collective-action problems, in both human and non-human societies, occur between small groups of agents. For example, in human societies collective-action problems can occur between small numbers of nation states in the context of trade and climate negotiations (Tietenberg ). Moreover, although we sometimes think of human societies as vastly interconnected, this is a parochial perspective; until very recently most people did not live in cities in developed nations, but rather in small isolated agricultural communities in the least developed countries (Ostrom , ; Diamond ). In nature, the representative group sizes of many non-human social animals are typically of the order of between 10 and 10^2 individuals (Baird & Dill ; Packer et al. ), and even single-celled organisms have demographic constraints limiting interactions to group sizes of the order of 10^3 individuals (Cremer et al. ). Finally, in the context of artificial agents there are many scenarios which constrain interactions to smaller groups of agents; for example, geographic and spatial constraints can lead to coordination problems between small numbers of autonomous vehicles, e.g. at traffic intersections (Arsie et al. ).
In this paper we introduce a framework for studying cooperation in groups of varying size and intimacy, which incorporates reputation in the form of indirect reciprocity (Nowak & Sigmund ) together with direct reciprocity in the form of tit-for-tat strategies (Axelrod & Hamilton ).
We begin with a review of existing studies of trust and reputation: in Section we discuss one of the simplest models for studying trust and cooperation (the Prisoner's Dilemma), which has been extensively studied both theoretically and experimentally (with both human and computerised agents), and review various refinements and extensions to the basic game. We proceed to discuss some of the issues inherent in achieving stable cooperation in groups where more than two agents interact, and review more advanced models that attempt to incorporate reputation. In Section we describe our model of cooperation in detail. In Section we describe our methodology, detailing how we solve the model numerically. Finally we present our results in Section and conclude in Section .

Related Work
There have been a number of experimental studies of the Prisoner's Dilemma with human subjects in which tit-for-tat-like strategies are commonly observed to be voluntarily used, e.g. (Wedekind & Milinski ).

However, Roberts & Sherratt ( ) noted that tit-for-tat-like strategies are not always observed in ecological field studies, and postulate that this is because the original model cannot account for differential levels of cooperation. They studied a simulated evolutionary tournament of a variant of the game that allows for different levels of cooperation, and found that a strategy called raise-the-stakes was an evolutionarily stable outcome. In later work Roberts & Renwick ( ) studied human subjects and found that they used a strategy similar to raise-the-stakes. This strategy starts off with a small level of cooperation and then rises to maximal cooperation dependent on the other player's level of cooperation in previous rounds. The behaviour of this strategy is qualitatively consistent with the self-reported behaviour of human subjects in longitudinal studies of friendship development as reported by Hays ( ). However, the latter study was restricted to North American students in their first year of study, and the model of Roberts & Sherratt ( ) has been questioned due to its reliance on discrete increments (Killingback & Doebeli ).
These earlier studies focused on social dilemmas which involve only dyadic interactions. However, in reality many social dilemmas arise when many agents interact with each other. In many-agent interactions, two key additional considerations come into play, which we discuss in turn below.
Firstly, the social structure in which a population of agents is embedded can have a significant effect on the outcome. For example, the topology of the social network can have a significant effect; in particular, scale-free networks can promote cooperation without the need for conditional reciprocation (Santos et al. b). This assumes that the social network is a given, which then constrains collective action, but it may be more appropriate to view social structure as arising from collective action; Santos et al. ( a) allow for the possibility that the social network can change as agents break connections with defectors, and form new links chosen at random from the neighbourhood of the severed node, and Phelps ( ) introduces a model which allows agents to choose new nodes with which to connect based on reputation information. Thus, it may be more appropriate to view social structure and strategies based on conditional reciprocation as being in co-evolution with each other.
The second issue with many-player interactions is that as the group grows larger, information about previous encounters with particular individuals becomes less useful simply because the probability of re-encountering the same individual grows smaller. In this case strategies like tit-for-tat are not sufficient on their own to prevent free-riding in larger groups.
Tit-for-tat relies on private information that has been obtained directly from previous personal encounters with other agents. However, there are other potential sources of information about the propensity of agents to cooperate. Nowak and Sigmund (Nowak & Sigmund a,b, ) use an evolutionary game-theoretic model to analyse the effect of reputation information which is globally available to all agents in a population, which they call "image scoring". The central idea is that defection and cooperation are globally tracked and made available in a public score; agents can increase their image score by cooperating, but when they defect their score is reduced. Thus, when deciding whether or not to cooperate with an agent, indirect information about that agent's propensity to cooperate is now available.
This information is indirect because it has not been obtained by personal experience, but rather from a third party. In turn, if agents now cooperate conditional on a positive image score, the presence of such discriminators can lead to indirect reciprocity: cooperating with somebody not because they are expected to reciprocate directly, but because the reputation so gained will encourage cooperation from strangers. Nowak & Sigmund ( a) showed that, under some restrictive assumptions, provided the population contains a sufficient fraction of discriminators at the outset, natural selection will eventually eliminate all defectors from the population.

Although the framework of evolutionary game-theory used in these models was originally formulated to describe the process of natural selection operating on genes, the same mathematical formalism can be used to describe a process of cultural evolution in which agents learn, rather than evolve, by imitating the strategies of other agents who appear to be more successful (Boyd & Richerson ; Weibull ; Kendal et al. ; Phelps & Wooldridge ). Indeed, Nowak and Sigmund's theoretical models are supported by evidence from empirical studies in which human subjects are observed to make use of image scores in social dilemma games played in the laboratory (Wedekind & Milinski ; Seinen & Schram ).
Nowak and Sigmund analysed their original models under the assumption that the size of the population is very large. This assumption makes modelling more tractable since many terms in the model become zero in the limit as the number of agents tends to infinity, and moreover it is not necessary to consider interaction between strategies based on indirect versus direct reciprocity, since the probability of re-encounter is negligible.
However, given that in reality many interactions do occur within smaller groups or populations, it is surprising that relatively little attention has been given in the literature to understanding the quantitative relationship between the size of the group and the resulting level of cooperation, when the effects of different forms of reciprocity are considered.
It is a well-known empirical observation that in a traditional public-goods setting, free-riding increases as the group size increases (Olson ; Kollock ; Nosenzo et al. ). Although there have been some attempts to explain such phenomena theoretically, typical models do not explicitly consider the quantitative relationship between the group size and the resulting form and reliability of information available to strategies which are based on trust and reputation. For example, Heckathorn ( ) provides a conceptual framework which formulates public-goods games in terms of an underlying evolutionary game played between pairs of players randomly chosen from a larger population. This model assumes that reputation-based strategies can acquire perfect information about their opponent's propensity to cooperate simply by paying a fixed information cost, without considering how this information is actually acquired, and how its reliability might vary with the group size. Under this restrictive assumption, the size of the population has no bearing on the final level of contribution to the public good.
However, it is interesting to ask whether similar results would be obtained if we drop this assumption, and explicitly consider how reputation information is obtained. This entails explicitly modelling direct and indirect reciprocity. A priori, we should expect the group size to have a significant effect on the outcome. For example, in a small population, it may pay for an individual to switch between direct reciprocity and reputation depending on the make-up of the rest of the group: in a population dominated by direct reciprocity there is little incentive to build a reputation. The converse holds if the rest of the population offers help conditional on reputation: there is then little incentive to rely on direct reciprocity. This reasoning suggests that the dynamics of switching between these two types of strategy would play an important role in determining the steady-state outcome.
Agent-based models have been used to analyse asymptotic outcomes in small populations in which agents can use both direct and indirect reciprocity in order to condition their donations (Conte & Paolucci ; Bravo & Tamburino ; Roberts ; Boero et al. ; Phelps ). These simulation analyses demonstrate that both forms of reciprocity can persist in steady-state, either when agents use individual learning to adjust their strategy (Phelps ), or when strategies evolve through natural selection (Bravo & Tamburino ; Roberts ). Although these models are able to account for both forms of reciprocity in smaller groups, their reliance on simulation methods means that they are not able to provide a systematic exploration of the dynamics of learning which lead to asymptotically cooperative outcomes.
For example, the model described in Roberts ( ) is able to deal with small populations and genetic drift, but the analysis is based on a restricted set of initial conditions in which the initial makeup of the population has equal propensity over all strategies. Similarly, Phelps ( ) provides a qualitative analysis of the dynamics of learning, but lacks an account of static equilibria, and analyses only the average level of cooperation without differentiating the social welfare of different attractors.
Bravo & Tamburino ( ) show that image-scoring in a simulated alternating trust game leads to cooperative outcomes under two distinct experimental treatments: one in which agents have a high probability of re-encountering one another, and another in which re-encounter is extremely improbable. Thus in the former treatment, the image score is more likely to encapsulate information about direct experience, whereas in the latter it encapsulates information from others. However, in this model agents cannot explicitly switch between using one form of information over the other, and there is no systematic analysis of the strategic interaction between these two forms of reciprocity other than reporting the final level of cooperation in each separate experimental treatment.
Similarly, Boero et al. ( ) introduce an agent-based model in which information about the returns of financial securities is communicated among a population of agents. Agents in this model can cheat by misreporting information. Two experimental treatments are analysed corresponding to trust and reputation: one in which agents can use their own private experience of previous interactions in order to judge the trustworthiness of other parties (analogous to direct reciprocity), and another in which the first-order accuracy of other agents' reports is itself shared and communicated (analogous to indirect reciprocity). However, again, as with the model of Bravo & Tamburino ( ), the interaction between these two behaviours is not considered, and agents do not have the ability to switch between them depending on how these strategies perform.
We address the issues in the aforementioned analyses by introducing a model of cooperation which incorporates both direct and indirect reciprocity, and analysing it using a methodology called empirical game-theory (Phelps et al. ; Walsh et al. ; Wellman ), which uses a combination of simulation and rigorous game-theoretic analysis. By so doing we are able to quantitatively analyse cooperation in smaller groups without making assumptions in the limit, and we are able to gain insights into the strategic interaction between different forms of reciprocal behaviour by analysing both static (Nash) equilibria and also the dynamics of evolution of each of these strategies.
In the following section we give a formal description of our model before describing the empirical game-theory methodology in Section .

The Model
The population consists of a set of agents A = {a_1, a_2, ..., a_n}. Interaction occurs over discrete time periods t ∈ {1, 2, ..., N}. During each time period a randomly chosen pair of agents (a_i, a_j) interact with each other. We refer to n as the group size and N as the expected number of interactions.
At each time period t agent a_i may choose to invest a certain amount of effort u_(i,j,t) ∈ [0, U] ⊂ R in helping their partner a_j, where U ∈ R is a parameter determining the maximum investment. This results in a negative fitness payoff −u to the donor, and a positive fitness payoff ku to the recipient of help a_j:

\[ \phi_{(i,t+1)} = \phi_{(i,t)} - u_{(i,j,t)}, \qquad \phi_{(j,t+1)} = \phi_{(j,t)} + k \, u_{(i,j,t)} \]

where φ_(i,t) ∈ R denotes the fitness of agent a_i at time t, and k ∈ R is a constant parameter.
In the special case that N = 2 and n = 2, this model has the same payoff structure as the original one-shot prisoner's dilemma. However, the more general social dilemma modelled here allows for repeated interaction between different pairs of individuals in a larger group n > 2, who can modify their state and remember the history of play over a number of repeated interactions N > 2. Once these interactions have occurred, all the agents' state is discarded except their accrued fitness, and then evolution proceeds. When the total expected number of interactions, N, is small relative to n then repeated encounters between the same pairs of individuals are less likely, and intuitively we should expect this to negatively influence the efficacy of direct reciprocity versus indirect reciprocity, since there is correspondingly less information relating to direct interactions.
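How the chance of meeting the same partner again depends on n and N can be made concrete with a short calculation. The sketch below is not part of the model itself: it assumes that one pair is drawn uniformly and independently in every period, and the function name is hypothetical.

```python
from math import comb


def repeat_encounter_fraction(n: int, N: int) -> float:
    """Expected fraction of the N interactions in which the chosen pair has met before.

    Assumes one pair is drawn uniformly at random, independently, in each period.
    """
    if n < 2 or N < 1:
        raise ValueError("need n >= 2 agents and N >= 1 interactions")
    p = 1.0 / comb(n, 2)  # chance that a particular pair is drawn in a given period
    # The pair drawn in period t has met before with probability 1 - (1 - p)^(t - 1).
    return sum(1.0 - (1.0 - p) ** (t - 1) for t in range(1, N + 1)) / N


for n in (2, 10, 100):
    print(n, round(repeat_encounter_fraction(n, N=100), 3))
```

Under these assumptions the fraction of repeat encounters rises with N and falls with n, which is the sense in which private history becomes less informative in larger groups.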
Since we are interested in the evolution of cooperation, we analyse outcomes in which agents switch between values of u that maximise their own fitness φ_i. Provided that k > 1, over many bouts of interaction it is possible for agents to enter into reciprocal relationships that are mutually beneficial, since the donor's initial cost u may be reciprocated with k × u, yielding a net benefit ku − u = u(k − 1). Provided that agents reciprocate, they can increase their net benefit by investing larger values of u. However, by increasing their investment they put themselves more at risk from exploitation, since just as in the alternating prisoner's dilemma (Nowak & Sigmund ), defection is the dominant strategy if the total number of bouts N is known: the optimal behaviour is to accept the help without any subsequent investment in others. In the case where N is unknown, and the number of agents is n = 2, it is well known that conditional reciprocation is one of several equilibria in the form of the so-called tit-for-tat strategy which copies the action that the opposing agent chose in the preceding bout at t − 1 (Miller ). However, this result does not generalise to larger groups n > 2 (Fader & Hauser ).
Nowak & Sigmund ( b) demonstrate that indirect reciprocity can emerge in large groups, provided that information about each agent's history of actions is summarised and made publicly available in the form of a reputation or "image-score" r_(i,t) ∈ [r_min, r_max] ⊂ Z. The image-score r_i summarises the propensity-to-cooperate of agent a_i. As in the Nowak and Sigmund model, image scores in our model are initialised ∀i r_(i,0) = 0 and are bounded at r_min = −5 and r_max = 5. An agent's image score is incremented at t + 1 if the agent invests a non-zero amount at time t, otherwise it is decremented:

\[ r_{(i,t+1)} = \begin{cases} \min(r_{(i,t)} + 1, \, r_{max}) & \text{if } u_{(i,j,t)} > 0 \\ \max(r_{(i,t)} - 1, \, r_{min}) & \text{if } u_{(i,j,t)} = 0 \end{cases} \]

and agents invest conditionally on their partner's image score:

\[ u_{(i,j,t)} = \begin{cases} \gamma & \text{if } r_{(j,t)} \geq \sigma_i \\ 0 & \text{otherwise} \end{cases} \]

where σ_i is a parameter determining the threshold image score above which agent a_i will cooperate, and γ ∈ R is a global parameter.
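The image-score bookkeeping and the threshold rule described above can be sketched in Python as follows. The function names are hypothetical, and the comparison "score ≥ σ" is an assumption, chosen because it makes a threshold of r_min cooperate with everyone and a threshold of r_max + 1 cooperate with no one.

```python
R_MIN, R_MAX = -5, 5  # image-score bounds stated in the text


def update_image_score(r: int, invested: float) -> int:
    """Increment the donor's score after a non-zero investment, else decrement it,
    keeping the result within [R_MIN, R_MAX]."""
    if invested > 0:
        return min(r + 1, R_MAX)
    return max(r - 1, R_MIN)


def investment(partner_score: int, sigma: int, gamma: float) -> float:
    """Invest gamma if the partner's image score reaches the threshold sigma.

    The use of >= (rather than >) is an assumption of this sketch.
    """
    return gamma if partner_score >= sigma else 0.0


score = 0                                     # all scores start at zero
score = update_image_score(score, 0.0)        # a defection lowers the score to -1
print(score, investment(score, sigma=0, gamma=1.0))
```

With a threshold of σ = 0 an agent helps partners whose score is still at its initial value, but withholds help from a partner whose most recent history is dominated by defection.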
In general, it is the cost-benefit ratio γ/(γk) = 1/k relative to the social viscosity of the group that determines whether or not cooperation persists in equilibrium (Nowak ). Accordingly, in our analysis we hold the cost-benefit ratio constant by choosing fixed parameter values for γ and k while systematically varying the number of agents n, and the expected number of rounds of play N, as described in the next section. We also repeat the analysis with k = 3.5.
Nowak & Sigmund ( a) demonstrate that the conditions under which cooperation is achieved depend on the presence of discriminators; that is, agents which use a threshold of σ_i = 0 and thus only cooperate with others if they have a good reputation. In their paper they show that if the initial fraction of discriminators in the population is above a critical value then the population converges to a mix of discriminators and cooperators, and defectors are completely eliminated. This implies that strategies based on indirect reciprocity via reputation are an essential prerequisite for the evolution of cooperation in large groups.

The above model contrasts with that of Roberts & Sherratt ( ) who study interactions in which agents make their investment decision solely on the basis of private information about the history of previous interactions. In their model an agent a_i decides on the level of investment to give a_j as a function ψ_i of the most recent encounter with a_j:

\[ u_{(i,j,t)} = \psi_i\left(u_{(j,i,t')}\right) \]

where t′ < t denotes the time of the most recent encounter with a_j. In our model, we are interested in the interplay between both of these forms of decision making, and thus we allow agents to use either form of decision function with a view to exploring the tension between reputation and personal experience as a basis for affiliative behaviour.
We are particularly interested in the effect of group size n and the number of interactions N on the evolution of cooperation. The analytical model of Nowak & Sigmund ( a) assumes: a) that the group size n is large enough relative to N that strategies based on private history, such as tit-for-tat, are irrelevant (since the probability of encountering previous partners is very small); and b) that we do not need to take into account the fact that an agent cannot cooperate with itself when calculating the probability with which any given agent is likely to encounter a particular strategy.

However, in order to model changes in group size, and hence interaction in smaller groups, it is necessary to drop both of these assumptions. The resulting model is more complicated, and it is not possible to derive closed-form solutions for the equilibrium behaviour. Therefore we use simulation to estimate payoffs, and numerical methods to compute asymptotic outcomes, as described in the next section.

Methodology
In order to study the evolution of populations of agents using the above strategies, we use methods based on evolutionary game-theory. However, rather than considering pairs of agents chosen randomly from an idealised very large population, our analysis concerns interactions amongst smaller groups of size n > 2 assembled from a larger population of individuals. The resulting game-theoretic analysis is complicated by the fact that this results in a many-player game, which presents issues of tractability for the standard methods for computing the equilibria of normal-form games.
Heuristic approaches are often used when faced with tractability issues such as these. In particular, heuristic optimisation algorithms, such as genetic algorithms, are often used to model real adaptation in biological settings (Bullock ). The standard heuristic approach to modelling multi-agent interactions is to use a co-evolutionary algorithm (Hillis ; Miller ). In a co-evolutionary optimisation, the fitness of individuals in the population is evaluated relative to one another in joint interactions (similarly to payoffs in a strategic game), and it is suggested that in certain circumstances the converged population is an approximate Nash solution to the underlying game; that is, the stable states, or equilibria, of the co-evolutionary process are related to the evolutionarily stable strategies (ESS) of the corresponding game. However, there are many caveats to interpreting the equilibrium states of standard co-evolutionary algorithms as approximations of game-theoretic equilibria, as discussed in detail by Ficici & Pollack ( , ).
In order to address this issue, we adopt the empirical game-theory methodology (Phelps et al. ; Walsh et al. ; Wellman ), which uses a combination of simulation and rigorous game-theoretic analysis. The empirical game-theory method uses a heuristic payoff matrix which is computed by running very many simulations, as detailed below.
The payoff matrix is said to be heuristic because several simplifying assumptions are made in the interests of tractability. We can make one important simplification by assuming that the game is symmetric, and therefore that the payoff to a given strategy depends only on the number of agents within the group adopting each strategy. Thus for a game with j strategies, we represent the payoff matrix as a map f : Z^j → R^j.

For a given mapping f(p) = q in the payoff matrix, the vector p = (p_1, ..., p_j) represents the group composition, where p_i specifies the number of agents who are playing the i-th strategy, and q represents the outcome in the form q = (q_1, ..., q_j) where q_i specifies the expected payoff to the i-th strategy.
For a game with n agents, the number of entries in the payoff matrix is given by

\[ s = \binom{n + j - 1}{n} = \frac{(n + j - 1)!}{n! \, (j - 1)!} \]

For example, for n = 10 agents and j = 5 strategies, we have a payoff matrix with s = 1001 entries.
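The entries of the heuristic payoff matrix correspond to the ways of distributing n agents over j strategies, that is, to multisets of size n drawn from j items. A minimal sketch in Python (the function names are hypothetical) counts and enumerates these group compositions and reproduces the figure of 1001 entries quoted above.

```python
from collections import Counter
from itertools import combinations_with_replacement
from math import comb


def payoff_matrix_size(n: int, j: int) -> int:
    """Number of distinct group compositions of n agents over j strategies."""
    return comb(n + j - 1, n)


def group_compositions(n: int, j: int):
    """Yield every vector p = (p_1, ..., p_j) of non-negative counts summing to n."""
    for combo in combinations_with_replacement(range(j), n):
        counts = Counter(combo)
        yield tuple(counts.get(i, 0) for i in range(j))


entries = list(group_compositions(10, 5))
print(len(entries), payoff_matrix_size(10, 5))
```

Because the game is assumed to be symmetric, only these compositions need to be simulated, rather than all j^n assignments of strategies to individual agents.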
For each entry in the payoff matrix we estimate the expected payoff to each strategy by running a total of 10^5 simulations and taking the mean fitness rounded according to the corresponding standard error.
For example, if we have an entry in the payoff matrix p = (5, 4, 1, 0, 0) then we would run 10^5 simulations with n = 5 + 4 + 1 + 0 + 0 = 10 agents, five of which would be initialised to use the first strategy, four to use the second strategy, one using the third strategy, and zero agents using the remaining strategies. The process for deriving a heuristic payoff matrix is illustrated in Figure .
With estimates of the payoffs to each strategy in hand, we are in a position to model the evolution of populations of agents using these strategies. In our evolutionary model, we do not restrict reproduction to within-group mating; rather, we consider a larger population which temporarily forms groups of size n in order to perform some ecological task. Thus we use the standard replicator dynamics equation (Weibull ) to model how the frequency of each strategy in the larger population changes over time in response to the within-group payoffs:

\[ \dot{m}_i = \left[ u(e_i, m) - u(m, m) \right] m_i \]

where m is a mixed-strategy vector, u(m, m) is the mean payoff when all players play m, u(e_i, m) is the average payoff to pure strategy i when all players play m, and ṁ_i is the first derivative of m_i with respect to time. Strategies that gain above-average payoff become more likely to be played, and this equation models a simple co-evolutionary process of adaptation.
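The right-hand side of the replicator dynamics can be sketched in a few lines of Python. This is an illustration only: the two-strategy pairwise payoff table used to exercise it is a hypothetical stand-in (a donation game with the assumed values γ = 1 and k = 3.5), not the heuristic payoff matrix estimated by simulation.

```python
from typing import Callable, List, Sequence


def replicator_rhs(
    m: Sequence[float],
    payoff: Callable[[int, Sequence[float]], float],
) -> List[float]:
    """Time derivative of each strategy frequency under the replicator dynamics."""
    u = [payoff(i, m) for i in range(len(m))]         # u(e_i, m)
    mean = sum(m_i * u_i for m_i, u_i in zip(m, u))   # u(m, m)
    return [m_i * (u_i - mean) for m_i, u_i in zip(m, u)]


# Hypothetical pairwise payoffs: row 0 = cooperate, row 1 = defect.
A = [[2.5, -1.0],
     [3.5, 0.0]]


def pairwise_payoff(i: int, m: Sequence[float]) -> float:
    """Expected payoff to pure strategy i against a population mixing according to m."""
    return sum(a * m_j for a, m_j in zip(A[i], m))


derivative = replicator_rhs([0.5, 0.5], pairwise_payoff)
print(derivative)
```

The derivatives always sum to zero, so the frequencies remain on the unit simplex; in this unconditional two-strategy example defection gains at the expense of cooperation, which is why conditional strategies are needed to sustain cooperation.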

Each of the individual simulations consists of the iterative process illustrated in
In our analysis we solve this system numerically: we choose $10^3$ initial values sampled uniformly from the unit simplex by drawing from a Dirichlet distribution (Kotz et al. ), and for each of these initial mixed strategies we solve Eq. (2) as an initial value problem using R (R Core Team ; Soetaert & Petzoldt ). This results in $10^3$ trajectories which either terminate at stationary points or enter cycles. This process is illustrated in Figure below.
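The numerical procedure can be sketched in a few lines. Note the substitutions: the paper solves the initial value problems in R, whereas this sketch uses Python with a fixed-step fourth-order Runge-Kutta integrator, and `toy_payoffs` is a made-up linear payoff function standing in for the heuristic payoffs. Uniform sampling from the simplex uses the fact that normalised independent Exp(1) draws follow a Dirichlet(1, ..., 1) distribution.

```python
import random


def sample_simplex(j, rng):
    """Draw a point uniformly from the unit simplex (Dirichlet with all parameters 1)."""
    draws = [rng.expovariate(1.0) for _ in range(j)]
    total = sum(draws)
    return [d / total for d in draws]


def replicator_rhs(m, payoff_fn):
    """Replicator dynamics: growth of each strategy relative to the mean payoff."""
    u = payoff_fn(m)
    mean = sum(mi * ui for mi, ui in zip(m, u))
    return [mi * (ui - mean) for mi, ui in zip(m, u)]


def rk4_step(m, dt, payoff_fn):
    """One classical Runge-Kutta step of size dt."""
    k1 = replicator_rhs(m, payoff_fn)
    k2 = replicator_rhs([x + 0.5 * dt * k for x, k in zip(m, k1)], payoff_fn)
    k3 = replicator_rhs([x + 0.5 * dt * k for x, k in zip(m, k2)], payoff_fn)
    k4 = replicator_rhs([x + dt * k for x, k in zip(m, k3)], payoff_fn)
    return [x + dt * (a + 2 * b + 2 * c + d) / 6
            for x, a, b, c, d in zip(m, k1, k2, k3, k4)]


def solve_trajectories(payoff_fn, j, samples, dt=0.1, steps=2000, seed=1):
    """Integrate the dynamics from `samples` random initial mixes; return the endpoints."""
    rng = random.Random(seed)
    endpoints = []
    for _ in range(samples):
        m = sample_simplex(j, rng)
        for _ in range(steps):
            m = rk4_step(m, dt, payoff_fn)
        endpoints.append(m)
    return endpoints


def toy_payoffs(m):
    """Made-up two-strategy (C, D) payoffs in which defection always earns 0.1 more."""
    matrix = [[0.9, -0.1], [1.0, 0.0]]
    return [sum(matrix[i][k] * m[k] for k in range(2)) for i in range(2)]


endpoints = solve_trajectories(toy_payoffs, j=2, samples=20)
print(endpoints[0])
```

A production analysis would more likely use an adaptive solver with a convergence check rather than a fixed number of steps; the fixed-step loop is used here only to keep the sketch dependency-free.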
We consider j = 4 strategies:

1. C, which cooperates unconditionally ($\sigma_i = r_{\min}$);
2. D, which defects unconditionally ($\sigma_i = r_{\max} + 1$);
3. S, which cooperates conditionally with agents who have a good reputation ($\sigma_i = 0$) but cooperates unconditionally when reputations have not yet been established;
4. T, which cooperates with agent $a_j$ provided that $a_j$ cooperated with $a_i$ on the previous encounter, and cooperates unconditionally against unseen opponents.

Figure : Computing the heuristic payoff matrix $f(\vec{p}) = \vec{q}$ for n = 3 agents and j = 3 strategies.
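The four decision rules listed above can be written compactly. This is a hedged sketch: it assumes the usual image-scoring rule in which a donor helps when the recipient's score is at least the donor's threshold $\sigma_i$, and the bounds `R_MIN = -5` and `R_MAX = 5` are hypothetical values chosen for illustration rather than taken from this section.

```python
R_MIN, R_MAX = -5, 5  # hypothetical image-score bounds, not taken from the text

# Thresholds sigma_i for the three score-based strategies.
THRESHOLDS = {"C": R_MIN, "D": R_MAX + 1, "S": 0}


def donates(strategy, recipient_score, recipient_last_move=None):
    """Return True if a donor using `strategy` helps the recipient.

    recipient_score: the recipient's public image score (starts at 0).
    recipient_last_move: True/False for what the recipient did to this donor
        on their previous encounter, or None if they have never met.
    """
    if strategy == "T":
        # Tit-for-tat: cooperate with strangers, otherwise mirror the last move.
        return recipient_last_move is None or recipient_last_move
    # Score-based strategies: help when the score reaches the threshold.
    return recipient_score >= THRESHOLDS[strategy]


print(donates("S", 0), donates("S", -1), donates("T", 0, False))
```

Because scores never exceed `R_MAX`, the threshold `R_MAX + 1` can never be met, which is what makes D an unconditional defector; likewise a threshold of `R_MIN` is always met.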
Although this space of strategies is limited, and there are many variants of the discriminatory strategy which, for example, use different assessment rules for ascribing reputation depending on the reputation of both the donor and the recipient (Nowak & Sigmund ), we restrict attention to the simplest strategies because of their universality across species: direct reciprocity has been observed in primate grooming interactions (Barrett et al. ), and there is some empirical evidence to suggest that chimpanzees are capable of at least the simplest assessment rule for indirect reciprocity, "scoring", as modelled here (Russell et al. ). The image scoring assessment rule is more plausible as a mechanism for understanding reciprocity in, e.g., primate allo-grooming, because it does not require language or other sophisticated cognitive resources.

Figure : Numerical solution procedure: choose a random vector x of size j such that $\sum x = 1$, then numerically integrate equation (2).
A major advantage of our approach over other simulation studies is that there are very few free parameter settings, which are summarised in Table . The key values are the parameter settings for the multiplier and cost values, which determine the cost-benefit ratio. As discussed previously, theoretical considerations suggest that it is the social viscosity relative to the cost-benefit ratio which determines the asymptotic outcome, and we systematically vary the former as described in the next section. The values $\gamma = 10^{-1}$ and k = 10 were chosen to correspond to those of the original simulation study of Nowak & Sigmund ( a) (see p. therein), but we also analyse the model with a different multiplier of k = 3.5 to test the robustness of our results.

The remaining parameters affect numerical estimates and sample sizes. As can be seen, we have chosen large sample sizes, and as shown in the next section the corresponding p-values and standard errors are very small throughout our study. All of the code used in our experiments is publicly available and can be downloaded under an open source license (Phelps ).

Results
Initially we restrict ourselves to a very simple scenario where we have n = 3 agents choosing between two strategies. Table shows the heuristic payoff matrix for three agents choosing between unconditional defection D or unconditional cooperation C. The left-hand columns show the number of agents adopting each pure strategy, and the right-hand columns show the estimated expected payoff to each strategy. It is illuminating to compare this with the standard analytical two-player normal-form payoff matrix for our model expressed in the same combinatorial form in Table . In the analytical two-player case the multiplier k determines the ranking of the four possible payoff combinations, which are denoted T, R, P and S, representing the temptation to defect, the reward for cooperation, the punishment for defection, and the "sucker's" payoff respectively. For only two agents the ranking of these payoffs determines the structure of the social dilemma. If k > 1 then T > R > P > S, which is the classical prisoner's dilemma. However, when there are more than two agents, there are more payoff combinations. In Table we see that when one cooperator interacts with two defectors, the temptation to defect is lessened since the two defectors interact with each other as well as with the cooperator. On the other hand, the single cooperator in this case suffers the full sucker's payoff. This can be contrasted with the next row in the table, where we have two cooperators interacting with a single defector. Here the temptation to defect is strong since the defector can fully exploit the cooperators without suffering any defection, but the sucker's payoff is compensated by the reward from cooperation arising from the interaction of the two cooperators.

Table : Heuristic payoff matrix for n = 3 agents choosing between unconditional cooperation C and unconditional defection D. Columns: n(C), n(D), u(C), u(D).

Table : Analytical payoff matrix for n = 2 agents.
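For intuition about why k > 1 yields the prisoner's dilemma ranking, consider the following worked example. It assumes a simple donation game in which each of the two agents has one opportunity to donate, a donation costs the donor $\gamma$, and the recipient gains $k\gamma$; this payoff form is an assumption made for illustration and is not a reproduction of the analytical table.

```latex
\begin{align*}
T &= k\gamma            && \text{(defect against a cooperator)} \\
R &= k\gamma - \gamma   && \text{(mutual cooperation)} \\
P &= 0                  && \text{(mutual defection)} \\
S &= -\gamma            && \text{(cooperate against a defector)}
\end{align*}
% For \gamma > 0:  T > R and P > S always hold, and R > P holds exactly when k > 1,
% so T > R > P > S if and only if k > 1.
% With \gamma = 0.1 and k = 10:  T = 1,  R = 0.9,  P = 0,  S = -0.1.
```

Under this assumed form the ranking depends only on whether k exceeds one, which is consistent with the statement that the two-player outcome is the same for all k > 1.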
Although Table could have been obtained analytically, the situation becomes much more subtle and complex when we introduce additional agents, and additional strategies representing reciprocity. We next extend our analysis to n = 10 agents and j = 3 strategies, resulting in a payoff matrix with 264 rows. Rather than tabulate the numerically obtained payoffs, we proceed directly to analysing this heuristic game by sampling $10^2$ initial values and integrating the replicator dynamics specified by Eq. (2).

Since mixed strategies represent population frequencies, the components of $\vec{m}$ sum to one. Therefore the vectors $\vec{m}$ lie in the unit simplex

$$ \Delta^{j-1} = \left\{ x \in \mathbb{R}^j : \sum_{i=1}^{j} x_i = 1,\; x_i \geq 0 \right\}. $$

In the case of j = 3 strategies the unit simplex $\Delta^2$ is a two-dimensional triangle embedded in three-dimensional space, with vertices at the coordinates corresponding to the pure strategies: (1, 0, 0), (0, 1, 0), and (0, 0, 1). We use a two-dimensional projection of this triangle to visualise the population dynamics. Figures and show the phase diagram for the population frequencies when we analyse the interaction between cooperators (C), defectors (D) and discriminators (S) in a small group of n = 10 agents.
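A two-dimensional projection of this kind can be obtained by treating the three frequencies as barycentric weights on the corners of an equilateral triangle. The sketch below is one such mapping; the assignment of strategies to corners (first component bottom-left, second bottom-right, third top) is an assumption and may differ from the layout used in the figures.

```python
import math

# Corner positions of an equilateral triangle with unit side length.
CORNERS = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2)]


def to_cartesian(m):
    """Map a mixed strategy (m1, m2, m3) on the simplex to 2D plot coordinates."""
    x = sum(weight * corner[0] for weight, corner in zip(m, CORNERS))
    y = sum(weight * corner[1] for weight, corner in zip(m, CORNERS))
    return x, y


print(to_cartesian((1, 0, 0)), to_cartesian((0, 1, 0)), to_cartesian((0, 0, 1)))
print(to_cartesian((1 / 3, 1 / 3, 1 / 3)))
```

Pure strategies land on the corners and the uniform mix lands on the centroid, so a trajectory of mixed strategies becomes a curve inside the triangle.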
Each point in the above graphs represents the state of the population at a given moment in time. The triangle represents the unit simplex; i.e. it contains all vectors whose components sum to one. In the bottom-left corner

As in Nowak & Sigmund ( a), we find that a minimum initial frequency of discriminators is necessary to prevent widespread convergence to the defection strategy (the basin of attraction whose trajectories terminate in the bottom right of the simplex).
Using our analysis, we can quantify how the size of this basin changes in response to the number of pairwise interactions per generation N. As N is increased from N = 13 (Fig. ) to N = 100 (Fig. ), the basin of attraction of the pure defection equilibrium shrinks significantly, as does the critical threshold of initial discriminators necessary to avoid widespread defection. Defection is less likely as we increase the number of interactions relative to the group size.
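One simple way to put a number on the size of a basin, sketched below under stated assumptions, is to report the fraction of uniformly sampled initial conditions whose trajectories end at the corresponding vertex, together with a binomial standard error. The function name, the tolerance `tol`, and the endpoint values are all hypothetical; the source does not state which estimator was used.

```python
import math


def basin_fraction(endpoints, strategy_index, tol=1e-2):
    """Fraction of trajectory endpoints lying within `tol` of a pure-strategy vertex.

    With initial conditions sampled uniformly from the simplex, this fraction
    estimates the share of the simplex occupied by that vertex's basin.
    """
    hits = sum(1 for m in endpoints if m[strategy_index] >= 1.0 - tol)
    fraction = hits / len(endpoints)
    standard_error = math.sqrt(fraction * (1.0 - fraction) / len(endpoints))
    return fraction, standard_error


# Hypothetical endpoints over strategies (C, D, S); index 1 is defection.
example_endpoints = [
    (0.0, 1.0, 0.0),
    (0.3, 0.0, 0.7),
    (0.5, 0.0, 0.5),
    (0.1, 0.0, 0.9),
]
print(basin_fraction(example_endpoints, strategy_index=1))
```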
In smaller groups it is important to take into account the interaction between strategies representing both direct and indirect reciprocity, since there is a non-negligible probability that agents will repeatedly encounter previous partners; if we increase N relative to n we need to consider the effect of strategies that take into account private interaction history (direct reciprocity), as represented by the T strategy.
Figures and show the co-evolution between the T, D and S strategies. For N = 13, the results are virtually indistinguishable from the scenario in which we substitute T with unconditional cooperation C (Fig. ). This is not surprising since the default behaviour of T is to cooperate in the absence of specific information about a particular partner. However, as we increase N we see that T becomes more effective; for N = 100 interactions, D remains a pure-strategy equilibrium, but with a significantly reduced basin size compared to the scenario where C interacts with S and D. Neither form of reciprocation is dominant over the other, suggesting that both forms of reciprocity could play an important role in smaller groups.
This is highlighted by an analysis of the heuristic payoffs for each strategy in a simplified setting where we have only three agents and three strategies. Table shows the heuristic payoff matrix for this setting. Clearly both the S and T strategies are vulnerable to defectors. However, there is a quantifiable relative difference in how well they perform against defection: indirect reciprocity (S) gains a slightly higher payoff than direct reciprocity (T) in the case where each agent adopts a different strategy (the fifth row of Table ).
This is not surprising, since the agent using the discriminatory strategy S can gain valuable information by observing the interaction of the other two agents. The T strategy has no prior information about the behaviour of the defector, and so sacrifices a significant payoff by cooperating on the first move. On the other hand, the discriminatory strategy has a probability of observing this interaction, in which case it will consistently defect against the defector on subsequent encounters. Moreover, the pure-strategy best reply for the defecting player in this situation is to switch to either S or T (row eight or nine). On the other hand, the decision for a player adopting one of the reciprocating strategies is asymmetric: starting from row five, we can obtain a significantly higher payoff (0.45) by choosing direct reciprocity (T) in the next row down, as compared with a switch to indirect reciprocity in the previous row (0.15). Here the informational advantages to a passive S player are cancelled out by the fact that the scoring strategy S considered here is unable to distinguish between punishment and defection, because a discriminator who fails to help a defector themselves incurs a reputation penalty.

Table : Tit-for-tat, discrimination and defection for n = 3 agents.
In small groups, this effect can lead to an increase in the frequency of direct reciprocity. By analysing the mixed-strategy case using the replicator dynamics we can see that although neither S nor T dominates the other, direct reciprocity attracts a slightly higher following; that is, the distribution of mixed-strategy equilibria containing both T and S is skewed towards T. This can be seen from the slight curvature of the trajectories towards the T direction in Fig. (we will return to this discussion with greater statistical rigour below).
We obtain qualitatively similar results as we increase the number of agents to n = 10 while holding N fixed: Figures and show how the direction field changes as we move from a smaller (Fig. ) to a larger (Fig. ) group. Here defection becomes slightly more stable, and the curvature of the trajectories in the T direction is

When repeated encounters are rare, T does not gain useful information and its behaviour, and corresponding frequency, is identical to that of C. However, as the viscosity of the group increases and we move towards the right on the graph, the frequency of direct reciprocity increases as it gains more information. Asymptotically, its frequency approaches that of S. In a well-mixed group both forms of reciprocity are effective in reducing the basin of attraction for all-out defection. As defectors become less prominent, so do the strengths and weaknesses of each form of reciprocity in dealing with them, and their frequencies converge. Thus, as the quality of the information available to direct reciprocity increases with increased interaction, we see direct reciprocity at intermediate frequencies between cooperation and discrimination.
Although there is some overlap in the standard error of the observed frequencies, we are able to reject the two null hypotheses that the frequencies of T versus C, and T versus S, are identically distributed.

Table : p-values under a t-test. We compare the observed frequency of the tit-for-tat strategy (T) with that of the cooperate strategy (C) and the discriminatory strategy (S) using a two-sample t-test. Intermediate values of N/n show statistically significant differences in expected frequencies.
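A two-sample comparison of this kind can be sketched as below. Two caveats apply: the text does not say whether equal variances were assumed, so the unequal-variance (Welch) form is used here as an assumption, and the p-value uses a normal approximation because the Python standard library has no Student-t distribution, which is only reasonable when the degrees of freedom are large. The sample values are made up.

```python
import math
import statistics


def welch_t_test(a, b):
    """Welch's two-sample t statistic, degrees of freedom, and approximate p-value."""
    var_a = statistics.variance(a) / len(a)
    var_b = statistics.variance(b) / len(b)
    t = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(var_a + var_b)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (var_a + var_b) ** 2 / (
        var_a ** 2 / (len(a) - 1) + var_b ** 2 / (len(b) - 1)
    )
    # Two-sided p-value under a normal approximation to the t distribution.
    p_approx = 2.0 * (1.0 - statistics.NormalDist().cdf(abs(t)))
    return t, df, p_approx


# Made-up equilibrium frequencies of two strategies across repeated runs.
freq_t = [0.30, 0.32, 0.31, 0.29, 0.33]
freq_s = [0.40, 0.41, 0.39, 0.42, 0.38]
t_stat, dof, p_value = welch_t_test(freq_t, freq_s)
print(t_stat, dof, p_value)
```

With samples as small as in this toy usage the normal approximation understates the p-value, so an exact t distribution from a statistics package would be preferable in practice.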
The inflexion in the graph for small values at N = 5 and N = 10 occurs because of noise in the payoffs introduced by the fact that, with a small number of interactions, not every agent is chosen to interact before reproduction or learning occurs. Nevertheless, the persistence of direct reciprocity at intermediate frequencies between cooperation and discrimination is still robust at this extremity. We obtain similar results when the number of agents is increased to n = 10 (Fig. ).
In the two-player version of the game the outcome of the social dilemma is the same for all values of the multiplier k > 1, since this defines the ranking in payoffs over the four possible strategy combinations. However, this does not necessarily hold in the extended version of the game, so we must take care to explore the effect of the multiplier. Fig. shows the effect of keeping the number of agents at n = 10, but reducing the multiplier to k = 3.5. Although, as we would intuitively expect, this reduces the level of cooperation, we see that both forms of reciprocity continue to persist, and that our central result still holds.
Returning to Figures and we see that the stationary points on the edge TS, which represent mixed-strategy equilibria over direct and indirect reciprocity, are the terminations of trajectories which originate in the interior of the simplex, which implies that they are also Nash equilibria. This would suggest that mixed strategies between direct and indirect reciprocity might be observed under alternative learning dynamics. As discussed in Section , the replicator dynamics was originally proposed as a model of genetic evolution but can also be interpreted as a model of social learning in which strategies replicate through imitation. However, when strategies are acquired through social learning it may be difficult for agents to accurately determine their utility (Boyd & Richerson ). In such cases, a useful heuristic to determine the most useful strategies may be to copy the most frequently occurring variant (Kendal et al. ; Skyrms ). This conformist bias can be modelled by introducing a weighting β over the utility $u(e_i)$ of a strategy i and its frequency $x_i$, and then substituting the resulting weighted utility in place of u in the ODE for the replicator dynamics (Eq. (2)). We might expect any small differences in equilibrium outcomes to be amplified by conformity, and indeed this is precisely what is observed. Fig. illustrates the effect of introducing a small amount of conformist bias β = 0.2 into the model. Differences in frequencies between direct and indirect reciprocity and unconditional cooperation are more pronounced, and more rounds of play are required in order to achieve symmetry between direct and indirect reciprocity. Nevertheless, our central finding is robust under this alternative model; both forms of reciprocity persist, and direct reciprocity is found at intermediate frequencies between cooperation and discrimination.
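The exact weighting expression is not reproduced in this text. One form consistent with the verbal description, assumed here purely for illustration, is a convex combination of a strategy's payoff and its current frequency, where $x_i$ denotes the population frequency of strategy i (the component $m_i$ of the mixed-strategy vector) and the primed symbol is notation introduced only for this sketch:

```latex
\begin{align*}
u'(e_i, \vec{m}) &= (1 - \beta)\, u(e_i, \vec{m}) + \beta\, x_i ,
  \qquad 0 \le \beta \le 1 , \\
\dot{m}_i &= \Bigl[ u'(e_i, \vec{m}) - \sum_{k=1}^{j} m_k\, u'(e_k, \vec{m}) \Bigr] m_i .
\end{align*}
% Setting \beta = 0 recovers the unweighted replicator dynamics;
% larger \beta gives more weight to how common a strategy already is.
```

Under this assumed form, β = 0.2 places one fifth of the weight on frequency, so a strategy that is already common receives a modest boost regardless of its payoff.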

Conclusion
We have introduced a framework for analysing reciprocity within small groups of varying size. By using simulation and numerical methods we are able to avoid making assumptions contingent on values tending to infinity or zero, while simultaneously retaining the rigour of a game-theoretic analysis.

Our model incorporates both direct and indirect reciprocity, and we showed that for small groups both direct and indirect reciprocity persist in equilibrium, with neither strategy dominating the other. This finding is robust to an alternative model of agents' adaptation based on social learning with conformist bias, and also under low-viscosity conditions which induce noise. In contrast to previous studies, by using the empirical game-theory methodology, we have been able to provide a sound game-theoretic underpinning to our analysis, showing that the results obtained from analytical models are a special case under our analysis. In so doing we have been able to show that our results are not contingent on a restricted set of initial conditions. Moreover, by identifying the existence of mixed-strategy Nash equilibria over direct and indirect reciprocity in a symmetric game of cooperation, we have shown that in settings where interactions with previously encountered agents are likely, it pays to use trust in addition to reputation in the design of autonomous agents.
The persistence of both forms of reciprocity occurs as a direct consequence of trade-offs inherent in the type of information each strategy uses; indirect reciprocity can gain useful information about unseen partners, provided that the scoring information is sufficiently accurate to enable selective aid to potential cooperators. However, there is a negative feedback effect, as increasing levels of discrimination coupled with the presence of defectors introduce unreliable scores due to the attribution issue. In such circumstances, the information provided by personal experience may be more reliable.
This is particularly relevant in a setting with bounds on rationality, which is representative of many real-world settings; there is much evidence to suggest that human strategic interaction is best explained from the perspective of bounded rationality (Erev & Roth ), and this is even more important in non-human species (Russell et al. ; Russell & Phelps ). Our findings are therefore also of relevance to biologists who seek to understand ecological interactions such as allo-grooming in terms of a social dilemma, or a "biological market" (Barrett et al. ; Henzi & Barrett ; Newton-Fisher & Lee ).
Although there is a great deal of research applying methods for detection of direct reciprocity to such interactions, in both non-human and human societies, our model suggests that direct and indirect reciprocity may interact, and thus it may be important to develop methods to additionally detect indirect reciprocity in field data. For example, it is a well-known empirical observation that in directed social networks from human studies, a link from A to B tends to be reciprocated with a link from B to A; see e.g. Rivera et al. ( ). In certain contexts, this type of dyadic link reciprocation in a social network might be interpreted as evidence for tit-for-tat-like behaviour, i.e. direct reciprocity in an underlying social dilemma. However, our model suggests indirect reciprocity should play an equally important role, in which case we should look for triadic patterns, e.g. A helps B who helps C who helps A, or more generally cycles in sub-graphs of size n > 2. There is evidence that such triadic patterns do indeed exist in human social networks (Cross et al. ; Rank et al. ). In future work, we will use similar studies to quantitatively validate our model.
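The two network patterns mentioned above can be counted directly from a list of directed "helps" links. The following sketch is illustrative only: the function names and the example edge list are hypothetical, and a real analysis would also need a null model to judge whether the counts exceed what is expected by chance.

```python
def reciprocated_dyads(edges):
    """Return the set of node pairs {A, B} with links in both directions."""
    edge_set = set(edges)
    return {
        frozenset((a, b))
        for a, b in edge_set
        if a != b and (b, a) in edge_set
    }


def directed_three_cycles(edges):
    """Return each directed cycle A -> B -> C -> A once, starting from its smallest node."""
    edge_set = set(edges)
    successors = {}
    for a, b in edge_set:
        successors.setdefault(a, set()).add(b)
    cycles = set()
    for a, b in edge_set:
        if a == b:
            continue
        for c in successors.get(b, ()):
            if c in (a, b):
                continue
            if (c, a) in edge_set:
                cycle = (a, b, c)
                start = cycle.index(min(cycle))
                # Rotate so the smallest node comes first, preserving direction.
                cycles.add(cycle[start:] + cycle[:start])
    return cycles


# Hypothetical "X helps Y" observations.
helps = [("A", "B"), ("B", "C"), ("C", "A"), ("A", "D"), ("D", "A")]
print(reciprocated_dyads(helps))
print(directed_three_cycles(helps))
```

In this toy network the A-D pair is the only reciprocated dyad, which is the signature associated with direct reciprocity, while A, B and C form the single triadic cycle of the kind associated with indirect reciprocity.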