Social Norms and the Dominance of Low-Doers

Social norms play a fundamental role in holding groups together. The rationale behind most of them is to coordinate individual actions into a beneficial societal outcome. However, there are cases where pro-social behavior within a community seems, to the contrary, to cause inefficiencies and suboptimal collective outcomes. An explanation for this is that individuals in a society are of different types and their type determines the norm of fairness they adopt. Not all such norms are bound to be beneficial at the societal level. When individuals of different types meet, a clash of norms can arise. This, in turn, can create an advantage for the "wrong" type. We show this by a game-theoretic analysis in a very simple setting. To test this result, as well as its possible remedies, we also devise a specific simulation model. Our model is written in NetLogo and is a first attempt to study our problem within an artificial environment that simulates the evolution of a society over time.


Introduction
Jon Elster ( ) addresses a fundamental question for both moral philosophers and social scientists: "what is it that glues societies together and prevents them from disintegrating?" A short answer to this question is, according to Elster and many others, "social norms". A norm is social insofar as it is "(a) shared by other people and (b) partly sustained by the approval or disapproval of others" (Elster , p. ). Indeed, in the absence of any norm of behavior that coordinates individual actions, the tendency to maximize individual utility may naturally lead to disastrous outcomes for everybody, as witnessed by many real-life instances of the tragedy of the commons. However, it is also true that some social norms which are perceived as fair by a subgroup (and sometimes by the whole society) may also lead to detrimental collective outcomes. Amoral familism is a general notion encompassing many such conducts. Bureaucracies, call centers, public and private providers etc. often seem to be driven by Kafkaesque norms promoting inefficiency among their members. The present work aims at providing a rigorous analysis of this problem. The key question is: why and how may things go wrong even when individuals follow social norms?
Section . In particular we show that fair ls agents fare much better than fair hs agents under neutral initial conditions in a totally connected network. Interestingly, only non-selfish high-minded agents (we may call them "heroes" or "saints") can sustain the impact of the ls agents. We then study the effect of some intuitive global policies to promote efficiency by fostering high-quality exchanges among agents. The first one consists of hiring more hs agents and the second one is to implement a system of rewards and sanctions. The model we present is quite abstract and, as we shall explain, there are many possible ways it can be refined and detailed in order to cope with more specific scenarios. Despite this, some interesting insights may already be drawn from our simulations. We conclude, in Section , by discussing our results, some possible refinements and lines for further inquiry.

A Game-Theoretic Analysis of the Collective Preference for Low-Doing
Building upon Gambetta and Origgi's analysis . We can frame the life of an institution as a series of exchanges among agents, such as co-authoring (among authors), teaching and learning (between teacher and students), and paying for a lecture and delivering it (between administration and lecturer). At a very abstract level, such exchanges can be viewed as combinations of two kinds of individual actions: agents may either deliver high quality of a given good (H) or low quality (L). The set A of possible actions in our games is therefore A = {H, L}. In game-theoretic terms an exchange among n agents is an action profile (a_1, ..., a_n) of n individual actions of players 1 to n. For simplicity, the analysis is here restricted to exchanges between two agents, i.e. binary action profiles (a_1, a_2) such as HH, HL, LL and LH. A specific game is defined when, for every action profile (a_1, a_2), we set an individual payoff for both player 1 and player 2. The payoff is determined by a function p_i(·) from the set of action profiles to a set V of values. Thus, for example, p_1(H, L) is the payoff that player 1 receives from the exchange in which player 1 provides H and player 2 provides L. For every player, the payoffs of the different action profiles should form a preferential order ≤. Incidentally, for the purposes of our present work we shall mostly restrict ourselves, as Gambetta & Origgi ( ) do, to scenarios where outcomes form a strict preferential order <. As an example, a possible preferential order for player 1 could be p_1(HL) < p_1(LL) < p_1(HH) < p_1(LH), i.e. player 1 most prefers to provide L and to receive H, as a second option he prefers to provide H and receive H, etc. For simplicity we shall also write HL < LL < HH < LH (without indexes) when no confusion is possible. The leftmost table in Table is a game in strategic form where both player 1 and player 2 have such a preferential order and the same payoffs.

According to Gambetta & Origgi ( ), there are three possible ways to explain why, in a context such as that of Italian Academia, LL should be the expected outcome and therefore high-doers, i.e. agents who deliver H, end up being at odds. The first option is to postulate that the preferential order for most of the players is HL < LL < HH < LH (leftmost table in Table ). This option provides an instance of the famous prisoner's dilemma where LL is the only Nash equilibrium. In this case rational agents, while preferring HH exchanges, will end up exchanging LL as a solution of this strategic game. The second possible explanation is that agents may instead have the preference ranking HL < HH < LL < LH, i.e. they ultimately prefer sloppiness (LL) to perfectionism (HH). This option is instantiated by the payoffs in the central table of Table (again with equal payoffs for both players). Here again LL is the only Nash equilibrium and the preference for LL is even stronger. A third possible explanation is depicted in the rightmost table and consists in players having the following preferential ranking: HL < HH < LH < LL, i.e. they prefer to be involved in an LL exchange rather than in an LH one. Using Gambetta and Origgi's words, they are "pro-social" L-doers, as they prefer receiving L over the embarrassment of receiving H and providing L. Here too LL is a Nash equilibrium that makes players better off.

All options can explain why individuals end up delivering L. However, only the third and possibly the second option can explain why someone who delivers L and receives H could complain. These are indeed the only cases where the LL action profile is Pareto optimal, i.e. any deviation from it would make some of the players worse off. LL is the action profile in both the center and rightmost options of Table which maximizes the total utility, and this explains why it is an expected collaborative outcome.

Table : Gambetta and Origgi's three possible explanations for low-doing.
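The equilibrium and optimality claims for the three rankings can be checked mechanically. The following is a minimal Python sketch, not the authors' code. It assumes that ordinal ranks (1 for the least preferred profile, 4 for the most preferred) stand in for the payoffs, that both players share the same ranking, and that a profile such as LH is read from a player's own side (own action first, partner's action second). All function names are hypothetical.

```python
from itertools import product

ACTIONS = ("H", "L")

def payoff_table(ranking):
    """Map each own-perspective profile (own action + partner's action)
    to its ordinal rank: 1 = least preferred, 4 = most preferred."""
    return {profile: rank for rank, profile in enumerate(ranking, start=1)}

def payoffs(table, a1, a2):
    """Payoffs of both players, who share the same table.
    Each player reads the exchange from her own side."""
    return table[a1 + a2], table[a2 + a1]

def pure_nash(table):
    """Profiles from which no player gains by deviating unilaterally."""
    equilibria = []
    for a1, a2 in product(ACTIONS, repeat=2):
        u1, u2 = payoffs(table, a1, a2)
        best1 = all(payoffs(table, d, a2)[0] <= u1 for d in ACTIONS)
        best2 = all(payoffs(table, a1, d)[1] <= u2 for d in ACTIONS)
        if best1 and best2:
            equilibria.append(a1 + a2)
    return equilibria

def pareto_optimal(table, a1, a2):
    """True if no other profile is at least as good for both players
    and strictly better for at least one of them."""
    u = payoffs(table, a1, a2)
    for b1, b2 in product(ACTIONS, repeat=2):
        v = payoffs(table, b1, b2)
        if v != u and v[0] >= u[0] and v[1] >= u[1]:
            return False
    return True

# The three rankings discussed above, from least to most preferred.
RANKINGS = {
    "first": ["HL", "LL", "HH", "LH"],
    "second": ["HL", "HH", "LL", "LH"],
    "third": ["HL", "HH", "LH", "LL"],
}

for name, ranking in RANKINGS.items():
    table = payoff_table(ranking)
    print(name, pure_nash(table), pareto_optimal(table, "L", "L"))
```

Under these assumptions LL is the only pure Nash equilibrium in all three games, while it is Pareto optimal only in the second and third.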
The HL framework
From our perspective, these three explanations are part of a more general scenario. Indeed, Table displays only three of many games in the HL framework, i.e. the class of all the two-player games that can be defined over the actions H and L. It is important to consider this larger class because exchanges need not happen solely between players with the same payoffs, nor even with the same preferential order over action profiles. In principle, a society may be populated by individuals with quite different individual tastes. To capture this aspect, we define the type of an agent as her preferential order over action profiles (when she is player 1). We restrict ourselves to strict preferential orders, where we have 4! = 24 different types, which are listed in Table .

Table : Types of players in the HL framework.

Preferential order as player 1    Selfishness    Mindedness    Name
                                  non-selfish    high          hn
HL < LH < LL < HH                 non-selfish    high          hn
LH < HL < HH < LL                 non-selfish    low           ln
HL < LH < HH < LL                 non-selfish    low           ln
LH < HH < HL < LL                 non-selfish    low           ln
HH < LH < HL < LL                 non-selfish    low           ln
HL < HH < LH < LL                 non-selfish    low           ln
HH < HL < LH < LL                 non-selfish    low           ln
LH < HH < LL < HL                 non-selfish    low           ln
HH < LH < LL < HL                 non-selfish    low           ln
LH < LL < HH < HL                 non-selfish    high          hn
LL < LH < HH < HL                 non-selfish    high          hn
HH < LL < LH < HL                 non-selfish    low           ln
LL < HH < LH < HL                 non-selfish    high          hn

We adopt the following conventions (see Table ). We call a player high-minded when her payoff for HH is greater than that for LL, and low-minded when HH < LL. A player is classified as selfish when the profile LH is on top of her preferences.

The total number of possible interactions among different types of players is 24^2 = 576. Gambetta and Origgi's first explanation describes a specific game played by two hs players, who are both high-minded and selfish: they prefer to exchange HH over LL (high-mindedness) but nevertheless don't mind (and indeed prefer) trading L in exchange for H (selfishness). Analogously, the second explanation describes a game played by two selfish and low-minded players of the ls type, while the third one describes a game between two ln players.
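The classification of types by mindedness and selfishness can be reproduced by enumeration. The sketch below is an illustration under the conventions stated above (high-minded when HH is ranked above LL, selfish when LH is ranked on top). The two-letter labels hs, ls, hn and ln omit the numerical indices used in the table of types, and the function name `classify` is hypothetical.

```python
from collections import Counter
from itertools import permutations

PROFILES = ("HH", "HL", "LH", "LL")

def classify(order):
    """Classify a strict preferential order, listed from least to most
    preferred, by mindedness (h/l) and selfishness (s/n)."""
    high_minded = order.index("HH") > order.index("LL")
    selfish = order[-1] == "LH"
    return ("h" if high_minded else "l") + ("s" if selfish else "n")

types = list(permutations(PROFILES))          # all strict orders: 4! = 24
counts = Counter(classify(order) for order in types)

print(len(types), len(types) ** 2)            # number of types and of ordered pairings
print(dict(counts))
```

For instance, `classify(("HL", "LL", "HH", "LH"))` gives `"hs"`, the type behind the prisoner's dilemma explanation.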
Having defined all types, we have not yet addressed the following fundamental question. What kind of strategy are two players, of any type whatsoever, going to play against each other? According to Gambetta and Origgi, both ls and hs seem destined to play L if they are rational in the game-theoretic sense, i.e. utility-maximizers. They are indeed bound to produce the socially detrimental outcome LL, since the latter is the only Nash equilibrium of the game. Game-theoretic wisdom is arguably not the only thing that dictates people's strategies. However, it is fair to claim that such an inference is a bit too fast in this context and that fair and pro-social behavior is instead compatible with standard rationality. Indeed, the most important point for our analysis is that the life of an institution is a repeated game: co-authoring, teaching and other exchanges of this kind are very likely (or even bound) to be repeated over time. Nash equilibria and rational strategies in such repeated games are usually different from those in the one-shot case. As we show in the next section, it is not incompatible for one player to be rational and to play according to a strategy which is dictated by a fair social norm.

Repeated games and social norms
We shall focus our analysis on two types of player, the hs and the ls. This allows us to analyze an interaction between high-minded and low-minded players where both types are otherwise very similar in their preferences.
They are both selfish insofar as they most prefer delivering L and receiving H. Both also most dislike delivering H and receiving L. Such agents are arguably likely to be found in a competitive society where individuals are incentivized to participate in many activities (for example improving their CV by publishing, teaching, participating in conferences and research projects) while at the same time economizing their efforts and getting the most out of them.
For simplicity, we assume that their individual payoffs are on the same scale, with numerical values from to , as described in Table (where the numbers represent the payoff of player , i.e. the row-player).

Table : Payoff tables for the hs type (left) and ls type (right).
Let us look more closely at the game played between two hs players. As mentioned, in an indefinitely (or infinitely) repeated game many equilibria are possible, in contrast with the one-shot case where the Nash equilibrium is LL. As a straightforward consequence of the Folk Theorem, any combination of strategies leading both hs players to play H together most of the time leads to a Nash equilibrium for this game. Therefore playing H repeatedly is reasonable in the hs vs hs repeated game. For similar reasons, playing L leads to an equilibrium in the ls vs ls game.
According to Binmore ( , ), social norms are best seen as a device of equilibrium selection in a society. A social norm somehow dictates an individual strategy s for the players to follow. The combination of such individual strategies should generate an outcome which fulfills three important properties: stability, efficiency and fairness. Of course, in a context with many games and different types of players, such as the HL framework, it is likely that no strategy s satisfies these properties for all possible interactions among types, and therefore there is no universal social norm. Nonetheless, some strategy can still satisfy the required properties relative to some specific game among players of the same type t, which is the case in our context. This allows us to endow different types with different norms and to work on the assumption that agents play fair, at least according to their type-specific norm.
We define a "t-norm follower" as someone who plays according to a strategy s which, if played by both players in a t vs t game, leads to
1. a Nash equilibrium (stability)
2. which is Pareto optimal (efficiency) and
3. provides both players with the same payoff (fairness).
In a Nash equilibrium no player has an incentive to deviate and this is an essential prerequisite for a norm to be stable. Pareto optimality means that any deviation from s would make some of the players worse off. Thus, the condition of Pareto optimality encodes efficiency and restricts the set of Nash equilibria a society converges upon. Providing both players with the same payoff is a way of encoding fairness in an egalitarian sense, which explains why a norm becomes shared with mutual benefit. We add a fourth condition on the strategy s, namely
4. The strategy s allows players to minimize their loss and to possibly sanction harmful deviations (enforcement).
The latter is a fundamental prerequisite for a norm to be enforced: deviations cause damage to someone else and should therefore be resisted. Enforcement can last only when players can endure deviations and are proactive in sanctioning them. Taken together, conditions 1-4 are widely endorsed minimal criteria for a player to be a fair norm follower.
Back to our case, it is not difficult to see that any strategy s that dictates to play H by default and switch to L if deceived a certain number of times (i.e. if the opponent plays L and thereby diminishes my payoff) fulfills conditions 1-4 of a fair norm among hs players. Players sticking to a strategy of this type will be categorized as followers of an hs-norm. The hs-type behavior is very welcome from a societal point of view because it leads to HH exchanges, i.e. to social efficiency. Symmetrically, it is important to remark that a strategy dictating L most of the time and deviating if deceived also enforces a fair norm among ls players (we shall call it an ls-norm). The latter clearly does not promote social efficiency, for it induces LL exchanges.
Since we are interested in studying the impact of social norms, in what follows we assume that hs players follow an hs-norm and ls players follow an ls-norm, i.e. that all types are playing fair (at least according to their type). It is then interesting to see what happens when we face a clash of norms, for example when an hs plays against an ls. It is not difficult to see that the initial outcome of such an exchange will be HL, i.e. very favorable to the ls (ls gets while hs gets ). Repeating the interaction may lead the hs to change her strategy and switch to L. The ls, however, has nothing to sanction: H is not unwelcome to her, since it generates a high payoff. Readjustment will therefore lead to LL, which is still more favorable to the ls than to the hs. As a consequence, we can conclude that an ls who plays fair will end up being better off, in terms of interpersonal welfare comparison, than an hs who does the same. This becomes particularly relevant if we assume, as is natural, that personal welfare is an indicator of "fitness" and that wealthier agents have a better chance of staying longer in the system. This assumption will indeed be implemented in our simulation model and will therefore provide an explanation for the dominance of the ls type in a society. The following more general result provides a series of sufficient conditions ensuring that a player of a given type is fitter than a player of another type. Such conditions will serve as a useful thread for our simulations.

Proposition . Let Player 1 follow an hs-norm and Player 2 follow an ls-norm. If the following conditions hold
• (a) p_2(HL) > p_1(HL),
• (b) p_2(HL) > p_2(LH),
• (c) p_2(HL) > p_2(HH) and
• (d) p_2(LL) > p_1(LL)
then Player 2's payoff in the repeated game will be higher than Player 1's.
Proof. Since both players follow their norm, the first exchange will be HL and p_2(HL) > p_1(HL) by condition (a). Following her strategy, Player 2 will not switch to H since, according to conditions (b) and (c), HL is a better outcome for her than HH or LH. Player 1 may either continue playing H or switch to playing L. In the first case Player 2 gets a higher payoff, again because of condition (a). Otherwise the outcome will be LL and Player 2 is again better off than Player 1 because of condition (d).
Assuming that the payoffs of both players are on the same scale, as in Table , the hs vs ls game is a case in point, for it satisfies the conditions (a)-(d) of the Proposition.
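The conditions of the Proposition can be checked on a concrete pair of payoff tables. The sketch below is only illustrative: it reads the conditions as the proof uses them (Player 1 is the hs-norm follower, Player 2 the ls-norm follower, and a profile such as HL lists Player 1's action first), and it uses hypothetical ordinal payoffs from 1 to 4 rather than the values of the paper's payoff table. The helper names are assumptions.

```python
# Hypothetical ordinal payoffs (1 = worst, 4 = best), keyed by
# own action + partner's action. These are NOT the paper's values.
HS = {"HL": 1, "LL": 2, "HH": 3, "LH": 4}   # ranking HL < LL < HH < LH
LS = {"HL": 1, "HH": 2, "LL": 3, "LH": 4}   # ranking HL < HH < LL < LH

def p1(table, profile):
    """Payoff of Player 1 for a profile written as (Player 1, Player 2)."""
    return table[profile]

def p2(table, profile):
    """Payoff of Player 2, who reads the same profile from her own side."""
    return table[profile[::-1]]

def proposition_conditions(t1, t2):
    """Conditions (a)-(d), with Player 1 using table t1 and Player 2 table t2."""
    return {
        "a": p2(t2, "HL") > p1(t1, "HL"),
        "b": p2(t2, "HL") > p2(t2, "LH"),
        "c": p2(t2, "HL") > p2(t2, "HH"),
        "d": p2(t2, "LL") > p1(t1, "LL"),
    }

def cumulative_payoffs(t1, t2, rounds, switch_after):
    """Player 1 plays H for `switch_after` rounds and L afterwards;
    Player 2 plays L throughout, as in the proof."""
    total1 = total2 = 0
    for r in range(rounds):
        profile = ("H" if r < switch_after else "L") + "L"
        total1 += p1(t1, profile)
        total2 += p2(t2, profile)
    return total1, total2

print(proposition_conditions(HS, LS))
print(cumulative_payoffs(HS, LS, rounds=10, switch_after=2))
```

With these hypothetical tables all four conditions hold, and Player 2 accumulates more than Player 1 whenever Player 1 decides to switch.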
We can make a similar case by framing the situation in terms of an evolutionary game. In Appendix A we show that hs is not an evolutionarily stable strategy against ls. It is also interesting to analyze the replicator dynamics of a game where more than two types are involved. However, the real-life situations we would like to model are difficult to capture by purely analytic means. Given the advantage of playing L and the consequent dominance of LL exchanges, we indeed want to study its consequences in more complex scenarios. Our main concern is the following: how can an hs-norm be enforced externally, e.g. by a policy maker? This is where we need a computer model to run artificial societies. The next section introduces its architecture and its main features.

Building Up an Artificial Society
Assumptions of the model
Briefly described, our model reproduces and runs an artificial society of agents of two different types. Our society is intended to reproduce, at a very abstract level, the structure of a private or public organization such as a university, as in the original case study of Gambetta & Origgi ( ). The agents in the artificial society are seen as members of the organization, e.g. academic and non-academic staff in the case of Academia, who are bound to collaborate over time. Collaborations are encoded by interactions among agents. Each interaction among agents generates an individual payoff for both of them. The payoff is determined by the action profile of the interaction and the type of the agent (as specified by Table ).
We investigate how social norms impact social efficiency when the society is populated by selfish low-doers (ls type following an ls-norm) and selfish high-doers (hs type following an hs-norm). The Proposition shows that the first type of agent has an advantage, in terms of interpersonal welfare comparison, over the second. In our model, this translates into a dominance of the first type in the long run. The model runs on the following main assumptions.
a The agents' welfare is determined by their cumulative payoff, i.e. the sum of the payoffs produced by all the interactions with other agents over time.
b Agents stay in the society for a limited amount of time (their working life). When they exit the society (retire), they are replaced by another agent whose type is randomly determined. The latter can be interpreted as a hiring procedure.
c Agents can retire earlier when their payoff falls under a certain threshold.
d The society has a fixed structure. The structure is determined by links between agents. Interactions among agents can only happen through the links.
e The agents play fair. Each agent follows a strategy fulfilling conditions 1-4 of their type-specific social norm (see Sections . -. ) and never deviates from it.
f Strategies are adaptive. The choice of action takes into consideration previous interactions and is calibrated to minimize loss. Memory and calibration are determined pairwise, i.e. relative to each specific partner the agent is interacting with.
g Agents don't know the type of the agent they are interacting with. This also means that they will not form a theory, on the basis of previous exchanges, about the strategy of the agent they are interacting with.

Description of the model
In Figure the main interface of the simulation program is presented. Agents are connected by social links which enable mutual exchanges. As time passes the agents get older and, when they reach a given age, retire and are replaced by a new agent. By means of mutual exchanges the agents accumulate an individual payoff (determined by their type and the outcome of the interaction) which is meant to represent how satisfied they are in their position. When their satisfaction falls under a certain threshold they decide to pre-retire. Therefore, computing the average time before retirement for a type of agent will give us an insightful measure of how well such a type of agent fares in the society. Importantly, our model also provides the percentage of H-actions as an indicator of social efficiency. In the next paragraphs we present all these features in more detail. In Appendix B we provide the full ODD description of our model (Grimm et al. ), which includes the precise formulations for its most relevant operations.
Setup. The setup procedure builds a society of n agents, with n being a parameter specified by the user. In this setup agents are of two types. Types are determined by two different payoff tables that can be reset by the experimenter. The default types for most (but not all) of the experiments are hs and ls. Agents are connected by directed links which enable exchanges among them. The experimenter may choose to arrange links either in a totally connected or a scale-free network. Exchanges happen with a certain probability determined by both the probability of one player to propose an interaction and the probability of the other player to accept it. The maximum number of interactions per agent is also determined by a parameter. The number of exchanges proposed in each round is drawn from a uniform distribution between and the maximum number of interactions. In our first batch of simulations we fixed the probability of accepting an interaction to . and the maximum number of interactions to , so that every agent interacts, on average, with every other agent once every two rounds.
The clock. The model simulates the evolution of a society over time. Ticks are meant to represent a unit of time (one year). During each tick agents perform exchanges with their connections as determined in the setup features. After every tick agents get one year older and the system updates. At the beginning agents are assigned a random age between and . Agents reaching the limit age of retire and are replaced with a new agent.
The new agent starts her career with a random age between and and is either an hs or an ls. The type of the new agent is determined according to a probability set by the experimenter (probability-hirehs -type).
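The aging and replacement step can be sketched as follows. This is an illustration in Python rather than the NetLogo code of the model. The retirement age and the entry-age range are passed as parameters with hypothetical values, and the names `Agent`, `hire` and `tick` are assumptions of this sketch.

```python
import random
from dataclasses import dataclass

@dataclass
class Agent:
    kind: str   # "hs" or "ls"
    age: int

def hire(rng, p_hire_hs, min_age, max_age):
    """Draw a new agent: hs with probability p_hire_hs, otherwise ls,
    with a random entry age in [min_age, max_age]."""
    kind = "hs" if rng.random() < p_hire_hs else "ls"
    return Agent(kind, rng.randint(min_age, max_age))

def tick(society, rng, p_hire_hs, retirement_age, entry_ages):
    """Advance the clock by one year: every agent ages, and agents who
    reach the retirement age are replaced in place by a new hire."""
    for i, agent in enumerate(society):
        agent.age += 1
        if agent.age >= retirement_age:
            society[i] = hire(rng, p_hire_hs, *entry_ages)

# Hypothetical parameter values, for illustration only.
rng = random.Random(0)
society = [hire(rng, 0.5, 25, 60) for _ in range(10)]
for _ in range(50):
    tick(society, rng, p_hire_hs=0.5, retirement_age=65, entry_ages=(25, 35))
print(sorted(agent.kind for agent in society))
```

Replacing agents in place keeps the population size fixed, which matches the assumption that the structure of the society does not change.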

Exchanges and norms.
Available actions for the agents are H and L. As mentioned, agents are of two types. The first type follows an hs-norm and the second type follows an ls-norm (Sections . -. ). In the default payoff settings hs-norm followers are hs and ls-norm followers are ls. Each exchange determines a payoff for both agents involved. The payoff of each agent is calculated according to the given payoff table. Payoff tables are set by default to the values of Table .

The hs-norm dictates the following strategy. The agent starts by playing H with everyone else. She keeps track, for each one of her partners pa, of the cumulative payoff she obtains by exchanging with pa. She also keeps track of the optimal cumulative payoff she should get with pa. The optimal cumulative payoff is the payoff she would have gained with a series of fair HH exchanges with another hs-norm follower. When her actual cumulative payoff falls by a given amount under the optimal cumulative payoff, she reconsiders her future action towards pa. This amount is fixed by a specific parameter. The procedure of reconsidering runs as follows. First she computes (a) the payoff accumulated since the last action switch w.r.t. pa (e.g. from H to L). Then she computes (b) the payoff she would have accumulated by playing otherwise with pa (e.g. H instead of L) in the same period. Subtracting (b) from (a) gives her balance against pa; let us call it BALANCE(pa). If BALANCE(pa) is positive then the player keeps playing the same action, otherwise she switches. We shall call this an hs-strategy.

An hs-strategy fulfills the conditions provided in Section for being an hs-norm: it clearly leads to a Pareto optimal Nash equilibrium among hs players. It also enables sanctioning and minimizing losses thanks to the reconsidering rule. Moreover, the reconsidering procedure keeps track of several factors (see Section ) that are not considered by standard sanctioning strategies such as TIT-FOR-TAT or GRIM. Therefore, it seems more adequate to describe the behavior of a rational player in this kind of situation.

ls-norm followers behave analogously. They start by playing L by default and possibly switch to H. The mechanism is the same except that, in this case, a fair exchange is considered to be an LL one (ls-strategy). Here again the ls-strategy enforces a fair social norm among ls agents.
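The reconsidering rule can be made concrete with a short sketch. It is a simplified Python reading of the description above, not the model's NetLogo implementation. It assumes hypothetical ordinal payoffs from 1 to 4, that the period considered by BALANCE(pa) restarts after each switch, and that reconsideration is triggered as soon as the gap between optimal and actual cumulative payoff reaches the threshold. The class and function names are hypothetical.

```python
# Hypothetical ordinal payoffs keyed by own action + partner's action.
HS = {"HL": 1, "LL": 2, "HH": 3, "LH": 4}
LS = {"HL": 1, "HH": 2, "LL": 3, "LH": 4}

class NormFollower:
    """Plays a default action (H for hs, L for ls) and reconsiders it,
    partner by partner, according to the BALANCE rule."""

    def __init__(self, table, default_action, threshold):
        self.table = table
        self.default = default_action
        self.fair = table[default_action * 2]   # payoff of a fair HH or LL exchange
        self.threshold = threshold
        self.memory = {}                        # partner id -> pairwise state

    def _state(self, pa):
        return self.memory.setdefault(
            pa, {"action": self.default, "actual": 0, "optimal": 0, "since_switch": []}
        )

    def action(self, pa):
        return self._state(pa)["action"]

    def record(self, pa, partner_action):
        """Update the pairwise memory after one exchange with partner pa."""
        s = self._state(pa)
        s["actual"] += self.table[s["action"] + partner_action]
        s["optimal"] += self.fair
        s["since_switch"].append(partner_action)
        if s["optimal"] - s["actual"] >= self.threshold:
            self._reconsider(s)

    def _reconsider(self, s):
        other = "L" if s["action"] == "H" else "H"
        got = sum(self.table[s["action"] + a] for a in s["since_switch"])   # (a)
        alt = sum(self.table[other + a] for a in s["since_switch"])         # (b)
        if got - alt <= 0:          # BALANCE(pa) is not positive: switch
            s["action"] = other
            s["since_switch"] = []

def play(agent1, agent2, rounds):
    """Repeated exchange between two agents; returns profiles and totals."""
    history, totals = [], [0, 0]
    for _ in range(rounds):
        a1, a2 = agent1.action("p2"), agent2.action("p1")
        totals[0] += agent1.table[a1 + a2]
        totals[1] += agent2.table[a2 + a1]
        agent1.record("p2", a2)
        agent2.record("p1", a1)
        history.append(a1 + a2)
    return history, totals

print(play(NormFollower(HS, "H", 4), NormFollower(LS, "L", 4), 6))
```

In this sketch an hs-norm follower facing an ls-norm follower starts with HL exchanges and then settles on LL, while the ls-norm follower never has a reason to reconsider.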
Pre-retirement rules. Two different exit strategies are available for the experimenter before setting up the simulation. The first option is a quantile-based strategy. In this mode the agent calculates her payoff at each tick and compares it with that of other agents. If her acquired payoff falls within a given percentage (selected as a parameter) of people with the lowest payoff, then she decides to pre-retire. The second option is an expectation-based strategy. The latter simply compares the agent's actual cumulative payoff with the maximum possible cumulative payoff that she could have got from fair past exchanges (i.e. the payoff of an HH exchange for an hs, and of an LL exchange for an ls). If the actual payoff falls below a given ratio of the best possible payoff then the agent decides to leave. As an additional parameter the experimenter can make the agents postpone their decision about pre-retirement by a given number of years.
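The two exit rules can be written as small predicates. The sketch below is an illustrative Python reading of the description above. The handling of ties and rounding in the quantile rule and the way the best possible payoff is computed (fair payoff times number of past exchanges) are assumptions, and the function names are hypothetical.

```python
def quantile_leavers(payoffs, fraction):
    """Quantile-based rule: return the ids of the agents whose payoff lies
    in the lowest `fraction` of the population (rounded down)."""
    k = int(len(payoffs) * fraction)
    ranked = sorted(payoffs, key=payoffs.get)   # ids ordered by increasing payoff
    return set(ranked[:k])

def expectation_leaves(actual, fair_payoff, exchanges, ratio):
    """Expectation-based rule: leave when the actual cumulative payoff is
    below `ratio` times the payoff of as many fair exchanges."""
    return actual < ratio * fair_payoff * exchanges

# Hypothetical values, for illustration only.
payoffs = {"a": 5, "b": 1, "c": 3, "d": 9, "e": 2}
print(quantile_leavers(payoffs, 0.4))
print(expectation_leaves(actual=10, fair_payoff=3, exchanges=10, ratio=0.5))
```

The quantile rule is relative, so some agents always leave, whereas the expectation rule is absolute and may let every agent stay.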
Running the model. At each tick the system implements the following procedure: every agent
1. Proposes a number k of exchanges to each one of its links.
2. Accepts proposals with a given probability.
3. Computes her next moves according to her strategy.
4. Decides whether to stay or to leave.
At every tick the system calculates the following:

Simulation Results
First we check whether the model confirms the analytic result of the Proposition (Section ), i.e. that selfish low-minded agents (ls) have an advantage over selfish high-minded ones (hs). We test this hypothesis both on the totally connected and the scale-free network and the outcomes are quite different. The totally connected configuration confirms our analytic result. Furthermore, this advantage is quite robust and holds even when the payoff tables are rescaled. Indeed, the only way to obtain an equilibrium between types is by transforming the hs agents into non-selfish agents of the types hn or hn (see Table ). On the other hand, in the scale-free network the hs agents fare as well as the ls agents. We conjecture that this result is highly dependent on the low connectivity of the scale-free network (an average of . links per node), but this should be tested further.
The totally connected and the scale-free network with low connectivity are two extreme cases. For most real-life social networks connectivity falls in between and, as a consequence, in most cases ls agents keep an advantage. External policies are therefore needed to improve the situation. Hiring more hs agents (Sections . -. ) and implementing a system of rewards and sanctions (Sections . -. ) are the most intuitive options.

Basic setup
Our first setup is meant to test the fitness of the hs and ls types in a totally connected network under normal conditions, i.e. where there is a fifty-fifty probability that a new agent entering the society is hs or ls, with the payoff tables set as in Table , under a quantile-based pre-retirement strategy and no additional features specified. With the help of NetLogo's BehaviorSpace we ran a society with , , , and individuals for ticks (a good approximation of the asymptotic behavior of the system) with repetitions, which are sufficient to reach a % confidence interval. The change-of-strategy-threshold ranges between , and . The quantile ranges over %, % and %. We therefore get a total of × × × = runs of the simulation. Each run records the final value of (a) the mean time before retirement for both types of agents and (b) the rate of H actions. We also re-ran some configurations setting the parameter "postpone" to (for postponing the pre-retirement decision), which was set to in the original case.
Results are quite unambiguous. When "postpone" is set to the final values of the H actions fall steadily below % for all configurations and decrease w.r.t. the size of the society (Figure , left). The situation improves slightly when we set postpone to (Figure , right) for the simple fact that hs agents endure longer. Furthermore, no matter what, the ls agents always fare much better in terms of mean time before retirement than the hs agents.

Figure : Rate of H actions (Y-axis) after steps for a % quantile-based pre-retirement strategy and "postpone" set to ticks (left subfigure) and ticks (right subfigure). X-axis values represent the number of agents in the society. Different line colors are attributed to different thresholds for the change-of-strategy parameter.
We repeated the same experiment on the scale-free configuration and the situation changes significantly. We indeed get an equilibrium between types: the rate of H actions is constantly at % and the mean time before retirement is the same.
These experiments indicate that, as far as selfish low-minded and selfish high-minded agents are concerned, the former have a big advantage when the connectivity is sufficiently high. Of course this happens when the probability of both types being hired is equal and there is no external pressure or incentive by the system to modify their behavior. A provisional conclusion is therefore that, insofar as these conditions can be deemed realistic, the prevalence of low-doing is a natural outcome and low-minded people are in a better position overall.
To test the robustness of this result in the totally connected network we varied the distances among the payoff values for the agents. We tried several combinations and the outcomes confirmed the robustness of the advantage for the ls agents. Indeed, one may suspect that insofar as we validate the conditions (a)-(d) of the Proposition the outcome will not change much. However, quite surprisingly, the advantage was confirmed even by allowing the hs agents to get a higher payoff for LL than that of the ls agents. The latter corresponds to undermining condition (d) of the Proposition. To do so we repeated the first experiment twice (this time with quantiles % and %) by setting the payoff tables as in Table .

Table : Revised payoff tables for the hs type (left) and ls type (right). Here the hs agents receive a higher individual payoff for LL than that of the ls agents.
In other words, we augmented the distance between the payoff of HL and LL for the hs agents while keeping the same distances for the ls agents. Distances are adjusted so that the minima and the maxima of utility are the same for both types. The results are plotted in Figure and . Under such conditions, the efficiency of an institution can be sustained if high-minded people are not selfish; we may call them "heroes" or "saints". However, this is not the end of the story. Fortunately, in actual societies many different measures are taken by policy makers, research councils, employers, etc. to improve the efficiency of an institution. Any kind of evaluation, project funding, career incentive and selection process is meant to work in this direction. Such measures are often combined. The fundamental question to ask is then: what is the most effective incentive strategy? Answering this question with precision requires much more data and lies far beyond the advancement and the goals of the present work. However, we shall investigate the effects of two distinct ideal policies to see whether and to what extent they can improve the dramatic situation thus far depicted.
Possible Improvements. Introducing More hs s.
The most natural policy for keeping up the efficiency of an institution is to try to raise the quality at the outset by a careful selection process of new employees. In our setup this corresponds to increasing the probability of introducing (hiring) an hs as a new agent in the society. The plots in Figure show the result of varying the probability of hiring an hs respectively to % and % under the same setup of the first experiment with "postpone" set to .
Figure : Rate of HH exchanges (Y-axis) after steps for a % quantile-based pre-retirement strategy with % of probability of hiring HS (left) and % of probability of hiring HS (right) with "postpone" set to .
These simulations show that, in order to achieve an appreciable improvement, one has to ensure a very high precision in selecting hs agents. Indeed, with just % of probability of hiring an hs the rate of H exchanges falls under % as the society grows (Figure , left), and things are worse with "postpone" set to . With a % probability such rate reaches % (Figure , right) but tends nonetheless to decrease as the society grows and falls to % with "postpone" set to . Moreover, even at %, ls agents keep their advantage, since their mean time before pre-retirement is still higher (between and years) than that of the hs agents (around years).
We repeated these experiments on the scale-free network and the results of Sections . -. are confirmed: with % of probability of hiring an ls the rate of H actions rises to %. Analogously, it rises to % with a % probability. Again, high connectivity makes the situation worse. Furthermore, it is quite challenging for a policy maker or employer to succeed in hiring such a high percentage of high-minded individuals.

Possible Improvements. Rewards and Sanctions.
Another way of promoting the efficiency of an institution is to implement a system of rewards (promotions, monetary rewards, facilities etc.) and sanctions (firing, fines or any kind of negative reward). In our setting, this amounts to modifying the payoff attributed to agents as a consequence of their behavior. The functioning of actual rewarding systems can be quite complex and its consequences on the individual satisfaction very hard to measure. For the purposes of this initial stage of our research we choose to implement an ideal mechanism. Such a mechanism keeps track of every exchange among the agents and rewards them with a probability proportional to the number of high-quality exchanges they have been involved in. Such a rewarding system is idealized insofar as it is perfect in recognizing and rewarding high quality.

Our mechanism of rewards and sanctions consists of proportional raises and decreases of the agent's payoff.
Each agent can be rewarded for the HH exchanges he has taken part in. Analogously he can be sanctioned for the LL exchanges. The frequency with which rewards and sanctions are attributed is an experimental parameter. The amount of the reward (sanction) is calculated every f years and is a percentage of the payoff accumulated by the individual during f . Such a percentage is also a parameter. Additionally, the reconsidering procedure of agents now also takes into account whether a certain agent contributed to rewards (sanctions), thus enabling a change in strategy towards this individual. For a detailed description of the reconsider procedure and the reward procedure, please refer to Appendix B.
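The periodic computation of the reward (sanction) amount can be illustrated with a minimal Python sketch. It is not the NetLogo code of the model: the function names, the reset of the payoff counter after each rewarding time, and the numbers in the example are assumptions made only for illustration.

```python
def reward_amount(payoff_between_rewards: float, rate: float) -> float:
    """Reward (rate > 0) or sanction (rate < 0), computed as a fraction
    of the payoff accumulated since the previous rewarding time."""
    return rate * payoff_between_rewards


def apply_reward(total_payoff: float, payoff_between_rewards: float, rate: float):
    """Return the updated total payoff and the payoff counter for the next period.

    Assumption: the between-rewards counter restarts from zero after each
    rewarding time.
    """
    delta = reward_amount(payoff_between_rewards, rate)
    return total_payoff + delta, 0.0


# Hypothetical numbers: 200 units earned during the period, 25% reward.
new_total, new_counter = apply_reward(1000.0, 200.0, 0.25)
# Same period, but a 25% sanction instead.
sanctioned_total, _ = apply_reward(1000.0, 200.0, -0.25)
```

The same function covers rewards and sanctions because a sanction is simply a negative percentage of the payoff earned in the period.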
Bringing back the probability of hiring hs s to % we then tried the following experiment. We fix a society of agents with a quantile pre-retirement threshold fixed at %. The change-of-strategy threshold is fixed at . The frequency of rewards and sanctions varies: it is either every tick or every ticks. The reward for HH exchanges varies in the range % and %. The sanction for LL exchanges varies between %, -% and -%.
Table : Varying sanctions and rewards at % hiring probability for hs with a reward frequency of . Reported quantity: fraction of H exchanges, by reward for doing HH exchanges. In parentheses the % confidence interval.
The results of this simulation show that in order to achieve a percentage of H exchanges above % we should couple high rewards (above % for HH exchanges) with consistent sanctions (above % for LL exchanges) and with high frequency (Table ). Indeed, as soon as the rewarding frequency is set to ticks all benefits tend to vanish (Table ).
Finally we also tried to change the probability of hiring hs s to % to check the benefits of a combined policy. Results are presented in Table and . As shown, the situation improves with respect to the equal probability case and seems to be less dependent on the frequency of rewards and sanctions.


Discussion and Conclusions
The initial input for our work was to analyze and test Origgi and Gambetta's game-theoretic explanations for the odd dynamics and the inefficiency of Italian academia. The first outcome of our analysis (Proposition ) leads to a more general insight. Insofar as the HL framework is a reliable approximation of the "game of life" in certain communities, the advantage of low-doing is an extreme one. We suspect that a similar problem undermines the efficiency and quality standards of many organized groups and institutions. Indeed, assuming that both high-minded and low-minded individuals are selfish, and nevertheless play according to social norms in their type-specific way, the low-minded ones have a consistent advantage, in terms of interpersonal welfare comparison, and social efficiency gets radically compromised.
This prompted us to test the HL framework at work in complex artificial societies where the programmer can modularly bring in additional factors in order to represent more articulated scenarios. Our first batch of simulations, where the hs s and ls s are equally likely to enter the system, confirms our analytic result (Proposition ), but only in cases of high connectivity. Furthermore, it shows that, as the society develops, the hs s are even worse off. Therefore, the larger the collectivity the milder the advantage of hiring more high-minded individuals. The only way, under these conditions, to reach an equilibrium between types is to modify the payoff settings for the high-minded agents and transform them into "heroes" and "saints" (hn and hn ), which, the reader may agree, is an unrealistic thing for a society to have.
For a second and third batch of simulations we modified our settings and allowed for external interventions, meant to model the action of a policy-maker, such as raising the probability of hiring hs agents and introducing a system of rewards and sanctions. Under external influence the system can improve its efficiency to higher standards. There is, however, an argument for claiming that both strategies are quite expensive for a policy-maker. The best combination of strategies is still unknown and further policies are to be tested. Another problematic point is that both kinds of external influence are implemented by an ideal policy-maker, i.e. someone who has full knowledge of the state of the system and can distinguish perfectly high quality from low quality. Both features are quite unrealistic. Intuition tells us instead that evaluations and decisions are often made under partial ignorance and are also biased by the evaluator's preference: only in ideal cases may one hope to have an expert evaluation committee composed only of high-minded and totally knowledgeable members. This problem is a version of the dilemma quis custodiet ipsos custodes. In future research we aim to study, within our model, the effects of more realistic and fallible systems of rewards. Other interesting avenues consist of studying more specific instances of our game and of using our model to test the effect of different social norms in action.
We conclude with some considerations on the case of Italian academia which inspired our work. Structural problems of the Italian research system are one major cause of its "brain drain". The latter is a constant object of debate in media, newspaper articles and books (Di Giorgio ) as well as articles in scientific journals (Abbott ; Battiston ; Burr ; Morano-Foadi ). Individual reports of Italian researchers working abroad stress, as the major "push" factors to leave Italian academia, the absence of meritocracy, nepotism, the baron system and, last but not least, the scarcity of funding (Morano-Foadi ). In one sense nepotism and the baron system are based on a strong link of loyalty between parties: the professor promotes the careers and secures the position of those who collaborated with him (or her) over several years and adapted to their standards. Arguably, the scarcity of investments and resources is likely to emphasize the dominance of a group when this is already in place: the few positions and allocated funds are to be secured to the members of the group who abide by the law (Morano-Foadi ). As a consequence, different cultures or clusters are unlikely to emerge in the system. To this we must add a long-term resilience to external evaluation in many sectors (Boffo & Moscati ). All these causes are complex to disentangle and some of them seem to be correlated. Our analysis provides a general clue to understand how they come together. Academia, as well as other institutions, is made of different "cultures", i.e. groups holding to different standards of efficiency and cooperation. Each of these cultures is likely to promote a different ingroup social norm. Deviations from the ingroup norm tend to be resisted and outsiders are often sanctioned by other ingroup members, e.g. in the ways documented by Gambetta and Origgi. Sanctioning becomes marginalization when the ingroup culture is largely dominant in the society.
However, such dominance need not be the outcome of an evil global plan, for it may only reflect a natural tendency of the system.

It is easy to check that neither (a) nor (b) is the case and therefore hs can be invaded by an arbitrarily small population of ls .

Appendix B: ODD protocol

Overview

Purpose
The purpose of the model is to understand how the social norms followed by two different types of individuals affect the efficiency of an institution and determine the dominance of one type over the other. Individuals are either low-minded or high-minded. Low-minded individuals prefer collaborating with others providing low quality of a given good and receiving low quality over providing high quality and receiving high quality. The contrary holds for high-minded individuals. Both types of individuals are pro-social in the sense that their attitude is fair, stable and efficient when they collaborate with individuals of the same type.

State variables and scales
The model comprises a single level of entities, individuals and links among them. Individuals are characterized by state variables that determine their identity (one of two types), age and payoff (per single time unit and cumulative) and a number of variables used to store information about agents they are linked to. Links enable collaborations among two individuals and are characterized by state variables that determine the number of collaborations to propose (from one individual to the other), the probability of accepting a collaboration and the number of collaborations in common. Global variables are used to account for the payoff tables of the individuals as well as a number of statistics. A complete list of the variables employed is reported in Table .

Variable: Description

Agent variables
type-of-academic: LL or HH etc.
number-of-collaborations: Number of total collaborations
age: Age in ticks
age-hired: Age at which the agent was hired
total-payoff: Total payoff earned so far
payoff-this-year: Payoff only for this year
payoff-between-rewards: Maps payoff between two reward times
difference-from-optimal-per-agent: Maps other agents to the difference from the optimal payoff cumulated so far. It is reset each time the agent changes strategy
my-id: Agent ID
type-of-exchange-to-number-of-exchanges: Maps the type of exchanges to the number of exchanges of that type
type-of-exchange-to-ids: Maps the type of exchanges to ids (not necessarily unique). It is reset after every reward period
payoff-per-type-of-exchange: Maps the type of exchange to the payoff gained doing that type of exchange
payoff-per-type-of-exchange-between-rewards: Maps the type of exchange to the payoff gained doing that type of exchange between two reward times
percentage-of-payoff-per-type-of-exchange: Maps the type of exchange to the percentage of the payoff gained by doing that type of exchange
real-exchanges-to-id: Maps ids to the map (exchanges -> number of exchanges of that type). It is reset per agent when a reconsideration is done
saldo-per-agent: Maps the other agents to the saldo (balance) so far
counterfactual-payoff-between-rewards: Accounts for the payoff the agent would have gained by exerting the counterfactual strategy between reward times
counterfactual-payoff-per-type-of-exchange-between-rewards: Accounts for the payoff the agent would have gained by exerting the counterfactual strategy between reward times, subdivided by type of exchange
delay-counter: Time before executing the first change

Link variables
probability-of-acceptance: Probability of accepting a collaboration
collabs-to-propose: Number of collaborations to propose per link
collabs-in-common: Number of collaborations in common

Global variables
types-of-academics: LL HH etc.
color-of-types: Color of LL HH etc.
age-of-retirement: Maximum age to exit academia
age-min: Minimum age to enter academia
payoff-tables: Maps type of academic to payoff tables
payoff-

Submodels
Rewards: The mechanism works as follows. We call $A$ the set of agents active between two rewarding times. For each $p \in A$, let $N^p_{XX}$ be the number of exchanges of type XX (e.g. HH) of agent $p$ between two rewarding times. Let $P^p_{XX}$ be the probability of getting a reward (resp. sanction) per exchange of type XX for agent $p$ between two rewarding times. We calculate the latter as $P^p_{XX} = N^p_{XX} / \max_{p' \in A} N^{p'}_{XX}$, i.e. the ratio between the XX exchanges performed by the agent between two rewarding times and the maximum number of XX exchanges over all the agents active between those rewarding times. The gist of this choice is to give a higher reward to the relatively more efficient agents, as should happen in a fair and proportional rewarding system.
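The ratio defining the reward probability can be made concrete with a short Python sketch. It is only an illustration of the formula, not the NetLogo implementation: the function name, the agent labels and the counts are hypothetical, and returning zero for every agent when nobody performed an XX exchange is an assumption the text does not address.

```python
from typing import Dict


def reward_probabilities(exchange_counts: Dict[str, int]) -> Dict[str, float]:
    """For one exchange type XX, map each active agent p to
    N_p / max over active agents of N, where N counts the XX exchanges
    performed between two rewarding times."""
    if not exchange_counts:
        return {}
    top = max(exchange_counts.values())
    if top == 0:
        # Assumption: with no XX exchange at all, nobody can be rewarded.
        return {agent: 0.0 for agent in exchange_counts}
    return {agent: count / top for agent, count in exchange_counts.items()}


# Hypothetical HH counts for three agents active in the same period.
hh_counts = {"a1": 8, "a2": 4, "a3": 0}
probs = reward_probabilities(hh_counts)
```

In this example the agent with the most HH exchanges gets probability 1, an agent with half as many gets 0.5, and an agent with none gets 0.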

Reconsider:
Given the presence of rewards and sanctions, we consider it natural that agents should weigh them when reconsidering whether or not to change their actions towards a given opponent $p$. When the agent reconsiders, she takes into account how much she gained by pursuing her actions since her last reconsideration and how much she would have gained by pursuing the opposite actions. More formally, if the real actions since the last reconsideration are $a = \{a_i, a_{i+1}, \ldots, a_k\}$, the opposite actions would be $\bar{a} = \{\bar{a}_i, \bar{a}_{i+1}, \ldots, \bar{a}_k\}$, where the bar operator denotes the opposite of the non-barred action, e.g. $\bar{H} = L$.
The agent computes the payoff she has gained since her last reconsideration, $u_j(a)$, and the payoff she would have gained by pursuing the opposite actions, $u_j(\bar{a})$, without taking into account rewards or sanctions. Then she computes the expected reward/sanction she would receive, due solely to her opponent, if rewards and sanctions were given at this moment, given her actions, $E[r_j(a)]$, and the expected reward/sanction she would have received, again due solely to her opponent, by pursuing the opposite actions, $E[r_j(\bar{a})]$. She then computes $u_j(a) + E[r_j(a)] - u_j(\bar{a}) - E[r_j(\bar{a})]$ and if this sum is less than , she plays the opposite of the last action she played.
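The decision rule of the reconsider procedure can be sketched in Python as follows. This is an illustrative reading of the description, not the model's NetLogo code. The function and argument names are hypothetical, and the comparison value is left as an explicit `threshold` argument; the value passed in the example is a placeholder chosen only for illustration.

```python
def opposite(action: str) -> str:
    """Bar operator: the opposite of H is L and vice versa."""
    return "L" if action == "H" else "H"


def reconsider(last_action: str,
               u_real: float,
               u_counterfactual: float,
               expected_r_real: float,
               expected_r_counterfactual: float,
               threshold: float) -> str:
    """Return the action to play next against a given opponent.

    balance = u(a) + E[r(a)] - u(a_bar) - E[r(a_bar)]
    The agent flips her last action when the balance is below `threshold`.
    """
    balance = (u_real + expected_r_real
               - u_counterfactual - expected_r_counterfactual)
    return opposite(last_action) if balance < threshold else last_action


# Hypothetical case: playing L earned slightly more than H would have,
# but the expected sanction attached to L more than cancels the gain.
with_sanction = reconsider("L", 10.0, 8.0, -5.0, 0.0, threshold=0.0)
without_sanction = reconsider("L", 10.0, 8.0, 0.0, 0.0, threshold=0.0)
```

In the first call the balance is negative, so the agent switches back to H; without the expected sanction the balance is positive and she keeps playing L.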

Notes
The tragedy of the commons is a social dilemma introduced and explored by the ecologist G. Hardin (Hardin ). Hardin's analysis originates from the insight of W.F. Lloyd (Lloyd ). Lloyd analyses the possible effects of unregulated grazing over common fields. He remarks that, when the individual advantage of a certain extent of free-riding, e.g. grazing some sheep in a common parcel for cows, overrides the loss induced by spoiling the common good, then individuals may easily destroy the shared resource.
The term "amoral familism" was coined by E. Banfield in his book The Moral Basis of a Backward Society (Banfield ). Amoral familism amounts to the maxim "Maximize the material, short-run advantage of the nuclear family; assume that all others will do likewise". Banfield employs it to explain the backwardness of certain rural societies in southern Italy. Backwardness would have been very difficult to understand otherwise, especially in the light of the welfare of other rural communities with similar initial conditions.
As witnessed by many reports of Italian researchers in Italy and abroad, misbehavior is not only deeply rooted but also largely justified in Italian Academia. For example, the authors illustrate several cases of public debate over complaints of plagiarism where most of the arguments go in defense of plagiarists and against whistle-blowers or victims.
"We investigate a phenomenon which we have experienced as common when dealing with an assortment of Italian public and private institutions: people promise to exchange high quality goods and services (H), but then something goes wrong and the quality delivered is lower than promised (L). While this is perceived as 'cheating' by outsiders, insiders seem not only to adapt but to rely on this outcome. They do not resent low quality exchanges, in fact they seem to resent high quality ones, and are inclined to put pressure on or avoid dealing with agents who deliver high quality." (Gambetta & Origgi , p. )
Three such conditions encode stability, efficiency and fairness, as specified by Binmore ( , , ). Alternative game-theoretic accounts of normativity have been recently formulated, e.g. Bicchieri ( , ) and Gintis ( , ), which may not fully agree with one another, as shown by Paternotte & Grose ( ). It is not among our present purposes to give a fully developed definition of the notion of social norm nor to discuss the pros and cons of the competing approaches.
However, the minimal conditions we set for agents to be norm-followers are agreed upon by all mainstream accounts of normativity.

As explained by Gambetta and Origgi ( ), this is a strong simplification (maybe too strong in those contexts where the quantities of deliverable goods are finely scalable) but good enough for the points we want to make. Alternatively one might read H as "work in accordance with the best of the agent's abilities" and L as the negation of H.
This is, again, a simplification leaving out some specific types of exchanges such as coauthoring among more than two individuals.
We remark however that our model of Section is not bound to such restriction.
Here player is the row player and player is the column player, and the payoffs of both are associated to each of the four possible outcomes (the ones on the left for player and the ones on the right for player ).
A Nash equilibrium is an action profile for which no player has an incentive to be the only one who deviates. We can easily check that this is the case for LL in the leftmost table. Indeed, both players get and would get less (namely ) by playing H instead. On the contrary, HH is not a Nash equilibrium, although it provides a better payoff for both players, because anyone would be better off by being the one to play L, i.e. she would get instead of .
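The unilateral-deviation check behind this definition can be written as a few lines of Python. The payoff table below is hypothetical: its numbers are not taken from the paper's tables and were chosen only so that, as in the note above, LL is an equilibrium while HH is better for both players yet unstable.

```python
from itertools import product

ACTIONS = ("H", "L")

# Hypothetical payoffs (row player, column player) for each action profile.
PAYOFFS = {
    ("H", "H"): (3, 3),
    ("H", "L"): (0, 4),
    ("L", "H"): (4, 0),
    ("L", "L"): (1, 1),
}


def is_nash(profile, payoffs=PAYOFFS):
    """True if neither player gains by being the only one who deviates."""
    row, col = profile
    u_row, u_col = payoffs[profile]
    row_ok = all(payoffs[(dev, col)][0] <= u_row for dev in ACTIONS)
    col_ok = all(payoffs[(row, dev)][1] <= u_col for dev in ACTIONS)
    return row_ok and col_ok


equilibria = [p for p in product(ACTIONS, repeat=2) if is_nash(p)]
```

With these illustrative numbers the only profile that survives the check is LL, even though both players would earn more under HH.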
The full HL framework has a larger cardinality. This restriction excludes for example the well-known Hi-Lo game where the preferences for both players are HL = LH < LL < HH.
Among other things, experimental evidence on actual social groups shows that agents most often avoid "rational moves" that are perceived as unfair, e.g. when they are put in strategic settings like the Ultimatum game (see Sanfey et al. ). Such results are usually interpreted as demonstrating a deeply rooted sense of normativity or sociality in human agents, which should partly compromise with their game-theoretic wisdom.
The payoff of both players in this case dominates the minmax profile (LL) and this is a sufficient condition for it to be a Nash equilibrium in the repeated game. Moreover, if the player gives a sufficiently high value to the payoff earned in future exchanges with respect to the payoff of his next move, then the former condition becomes also a necessary one. See Fudenberg & Tirole ( ).
There are many alternative ways of encoding fairness in philosophical literature. J. Rawls ( ) is probably the most prominent contemporary defender of an egalitarian conception of fairness. On the contrary, Harsanyi ( ) defends a utilitarian reading of it, i.e. fairness as maximization of total utility. We do not take a stand in this discussion here. We only point out that, in our specific case, where both players have the same scaling of individual payoffs, the t-norms we will introduce turn out to be fair both in an egalitarian and utilitarian sense.
They surely are in the game-theoretic framework of Binmore ( ). Prima facie it seems that this is not the case for other accounts of social norms such as for Gintis ( , , ) and Bicchieri ( , ), as stressed by Paternotte & Grose ( ). For example, Gintis sees social norms as "choreographers", inducing a correlated equilibrium on a game G. The latter becomes the central notion instead of that of Nash equilibrium. However, a correlated equilibrium on a game G is still described as a Nash equilibrium of a larger game G + . To some extent, this is analogous to what happens when one transforms a one-shot game into a repeated one. In general, most of the examples provided by both Bicchieri and Gintis -are also satisfied.
We shall expand more on this point in Section .
The study of replicator dynamics for the HL framework is a very interesting subject in itself but it would carry us too far away from our present concern. We therefore leave it for future work.
In Sections . -. we perform experiments with other types of players, namely, hn and hn .
The nature of an exchange is left unspecified at this stage. While totally connected networks approximate situations of full collaboration within a group, scale-free networks are best suited to capture, e.g., social dynamics such as co-authoring, where some agents are more connected than others and a power-law structure applies. Modelling more specific dynamics may require a more sophisticated structure for the social network.
Both the hs -strategy and the ls -strategy allow agents to change their action after some point to minimize their loss. However, in Section we shall introduce some modifications (sanctions and rewards) that push agents to further considerations and allow them to possibly flip their action back in order to improve their payoff.
The kind of strategy described is indeed very close to the no-regret-learning method (see Hannan ; Hart & Mas-Colell ).
Making the agents postpone their decision about pre-retirement by a sufficient number of years immunizes them (and the outcome of an experiment) from some unwanted initial side effects, i.e. it helps the steady state not to be dependent on the transient states (see Section ).
To generate a scale-free network here, we used the Albert & Barabási ( ) algorithm. It generates a scale-free network with an exponent of . If $N$ is the number of nodes, the average number of connections a node has is $\sum_{k=1}^{N} k^{-\alpha+1}$. It quickly converges to about . as $N$ increases.
Different thresholds stand for different levels of adherence to a norm. If the threshold is set to then the agent will revise her future action profile immediately after the first loss. On the other hand, when the threshold is set to , the agent will revise only after losing units w.r.t. the maximum payoff given by fair exchanges. In our setup agents exchange on average once every two ticks. This means that hs agents, by losing units each time, will wait on average ticks before reconsidering their action.
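The arithmetic relating the threshold to the waiting time can be sketched in Python under explicit assumptions. The function names and all numerical values are hypothetical; the sketch also assumes a constant loss per exchange and that the agent revises as soon as her cumulative loss reaches the threshold, which is one possible reading of the description. The default of two ticks per exchange follows the average exchange rate stated in the note.

```python
import math


def exchanges_before_revision(threshold: float, loss_per_exchange: float) -> int:
    """Number of losing exchanges needed for the cumulative loss to reach
    the threshold; at least one loss is always required."""
    if loss_per_exchange <= 0:
        raise ValueError("loss_per_exchange must be positive")
    return max(1, math.ceil(threshold / loss_per_exchange))


def ticks_before_revision(threshold: float,
                          loss_per_exchange: float,
                          ticks_per_exchange: int = 2) -> int:
    """Average waiting time in ticks, given one exchange every
    `ticks_per_exchange` ticks on average."""
    return exchanges_before_revision(threshold, loss_per_exchange) * ticks_per_exchange


# Hypothetical values: threshold of 10 units, 2 units lost per exchange.
n_exchanges = exchanges_before_revision(10, 2)
n_ticks = ticks_before_revision(10, 2)
```

With a zero threshold the function returns a single exchange, matching the case in which the agent revises immediately after her first loss.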
The average time before retirement for the hs s ranges between and with postpone and between and with postpone . Instead, ls s retire after around - ticks with postpone and - with postpone .
We tried a lower quantile threshold to make exit conditions quite relaxed for the agents. Indeed, repeated trials showed that after some point the ls s start to take over numerically. We therefore wanted to allow the hs s more chances to survive longer in the game.
Anecdotal evidence may give a hint for this result. The authors have experienced various examples of zealous functionaries who, working alone and following the call of duty, help keep dysfunctional institutions at an acceptable level of efficiency.
The agent spreads the value of her reward or sanctions for exchanges of type XX (i.e. HH or LL) over all the agents active in exchanges of type XX. This action counteracts the change of strategy triggered in order to contain the losses. We have seen that an hs agent will play L against an ls agent when the loss due to playing H falls beyond the change of strategy threshold. However, if the sanction due to playing L outweighs the loss, the hs agent will go back to playing H even with ls agents.
The scarce allocation of funds to research and development is most likely to be a major cause of inefficiency. According to most recent data provided by the Italian agency for the evaluation of the university and research system (ANVUR), the percentage of GDP allocated by the Italian government to research and development is . % over the years -, where the average of the OECD countries is . %.
The so-called Baroni in Italian Academia are representatives of a system of specific hierarchical relationships between professors and assistants, which was dominant and is still present in certain areas of Academia in many countries.
If we have a discrete probability distribution $\Pr\{x = i\}$ with support $D$, then the expected value is $E[x] = \sum_{i \in D} i \cdot \Pr\{x = i\}$.
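As a small worked illustration of this definition, the following Python sketch computes the expected value of a discrete distribution given as a mapping from values to probabilities. The function name and the example distribution are hypothetical and do not come from the model.

```python
from typing import Dict


def expected_value(pmf: Dict[float, float]) -> float:
    """Sum of i * Pr{x = i} over the support of the distribution."""
    total = sum(pmf.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(value * prob for value, prob in pmf.items())


# Hypothetical reward distribution: nothing with probability 0.5,
# 10 units with probability 0.25, 20 units with probability 0.25.
example = expected_value({0: 0.5, 10: 0.25, 20: 0.25})
```

Here the expected reward is 0 * 0.5 + 10 * 0.25 + 20 * 0.25 = 7.5 units.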