
Cristiano Castelfranchi, Rosaria Conte and Mario Paolucci (1998)

Normative reputation and the costs of compliance

Journal of Artificial Societies and Social Simulation vol. 1, no. 3, <https://www.jasss.org/1/3/3.html>

To cite articles published in the Journal of Artificial Societies and Social Simulation, please reference the above information and include paragraph numbers if necessary

Received: 28-Nov-97      Accepted: 12-Jun-98      Published: 30-Jun-98

----

* Abstract

In this paper, the role of normative reputation in reducing the costs of complying with norms will be explored. In previous simulations (Conte & Castelfranchi 1995), in contrast to a traditional view of norms as means for increasing co-ordination among agents, the effects of normative and non-normative strategies on the control of aggression among agents in a common environment were compared. Normative strategies were found to reduce aggression to a much greater extent than non-normative strategies, and also to afford the highest average strength and the lowest polarisation of strength among the agents. The present study explores the effects of the interaction between populations following different criteria for aggression control. In such a situation the normative agents alone bear the cost of the norms, owing to their less aggressive behaviour, while the other agents benefit from their presence. Equity is then restored by raising the cost of aggression through the introduction of agents' reputation. This allows normative agents to withhold respect for the cheaters' private property, and thus to impose a price for transgression. The relevance of knowledge communication is then emphasised by allowing neighbouring normative agents to communicate. In particular, the spreading of agents' reputation via communication allows normative agents to co-operate without deliberation at the expense of non-normative agents, thereby redistributing the costs of normative strategies.

Keywords:
norms, reputation, compliance

* Open Questions In The Computational Study Of Norms

1.1
The computational study of norms through simulation is on the increase. While in previous studies (Shoham & Tenneholtz, 1992a; 1992b) norms were essentially shown to improve co-ordination in multi-agent systems, other functions of norms have more recently been addressed, for instance their role in controlling aggression (Conte & Castelfranchi, 1995; Walker & Wooldridge, 1995). In these studies, the spreading of norms is usually accounted for in terms of the imitation of successful strategies. However, in real life, norms do not spread only through imitation. Their diffusion is also due to (a) a recognition of norms as such, and (b) an active defence of norms on the part of norm-observers. We will call this behaviour normative influencing. Existing theories of the emergence of norms (be they computational or merely formal-theoretical) have usually focused on the behaviour of the agent who undergoes the external pressures to comply. But some social scientists (e.g., Heckathorn, 1990) have drawn attention to the complementary side of social control, that is, to the agent who exercises such pressures. Let us see this in some detail.

Related Work On Normative Influencing

1.2
In social psychological terms, agents model their behaviour on that of other agents taken as social "models" (Bandura, 1977). More precisely, according to Homans (1951; 1974), agents comply with intra-group obligations in order to obtain the approval they need. In his theory of compliance in cohesive groups, Homans (1974) argues that the higher the agents' cohesion, the higher their need for social "approval", and, consequently, the higher their compliance with intra-group obligations. But as some authors seem to suggest (Macy and Flache, 1995; Flache, 1996), a strong need for approval may, under given conditions, weaken people's compliance with obligations internal to the group. Furthermore, agents are not only vectors and targets of social influence, but also actors of it. They may influence others by means of informal sanctions, or they may indirectly put pressure on others by means of their actions.

1.3
But why do people exercise social control? Heckathorn (1990, p. 368) refers to the agents sanctioning those who disobey the norms as the "intra-group sanctioning system": people not only conform to others' behaviours, but also want others to obey the norms. They even urge others to do so. The question is, of course, why agents exercise this type of influence on their "peers". Self-interested agents are expected to do precisely the opposite, that is, to leave this burden to others (Coleman, 1990). Oliver (1980) observed that social control may be defined as the "second-order free-riding problem". In terms of the rational theory of collective action, norm compliance is seen as a public good, because its benefits can be enjoyed by all members regardless of their contribution to its provision. Therefore, each agent faces not only the first-order choice of whether or not to comply with a norm, but also a second-order choice, namely whether or not to urge other people to comply (Heckathorn, 1990). In other words, agents are expected, even obliged, to urge others to obey the norms. Full compliance is achieved only when an agent chooses to comply at both levels. As with any other form of "co-operation", normative influencing is then explained in terms of an iterated n-person prisoner's dilemma game. The factors accounting for the first-order choice (to comply or not) are also called upon to explain the second-order choice (to urge others to obey the norms or not): the agent calculates the chances of enjoying a share of the public good (the application of the norm) without co-operating at either level (by neither obeying the norm nor urging others to obey it). Depending on these expected chances, the agent decides whether to co-operate at the first level, at the second, or at both. More specifically, within this view, agents are supposed to co-operate as long as they expect some chance of interacting in the future with the same agents, since future interactions may lead them to expect their current co-operation to be reciprocated (or, what amounts to the same thing in the game-theoretic framework, not punished). The application of such a recursive argument leads to an infinite regress. If normative influencing serves to avoid being punished, then there is a social prescription that normative influencing occur. Therefore, a third-order collective problem arises, namely that of influencing others to influence others to comply with norms, and so on ad libitum.
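
For concreteness, recall the standard two-person benchmark behind this "shadow of the future" argument (Axelrod's classic result, stated here as background rather than as a finding of the present paper): with prisoner's dilemma payoffs T > R > P > S and probability w of a further interaction with the same partner, conditional co-operation such as TIT-FOR-TAT is collectively stable only when the future weighs enough, that is, when

$$ w \;\ge\; \max\!\left(\frac{T-R}{T-P},\; \frac{T-R}{R-S}\right) $$

Below this threshold defection pays, and a fortiori so does second-order defection (not sanctioning), which is where the regress described above begins.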

1.4
But if normative influencing is not to be explained in terms of avoidance of punishment, how else can it be explained? Macy and Flache (1995; cf. Flache, 1996) have attempted an answer to this question in terms of a theory of social approval. Social approval is defined by the authors as an "abstract social commodity", that is, a resource that anyone is in search of and at the same time can dispense. Therefore, it is an ideal commodity for social exchanges. Agents will give approval in order to obtain it. On the grounds of such a conceptualisation of social approval, it should be possible to explain social control both from the side of the controlled agents and from the side of the controllers. However,
  1. This theory does not account for social control as a specific phenomenon, different from, say, personal relationships. A married lady will probably approve of her husband coming home late at night in order to obtain his approval (prevent his disapproval) when she stays out all day long. Analogously, I will not say a word against my colleague's flexible concept of working time in order to prevent her from raising objections against my frequent absences from work. While exchange of approval is sufficient to account for colleagues' and spouses' reciprocal complaisance, it does not account for normative approval. For the latter, at least one further mechanism is necessary, namely agents' recognition and representation of the behaviour under approval as ruled by a norm.
  2. The hypothesis of approval exchange does not easily account for disapproval, which is the most direct and common form of social control. While it is reasonable to expect that agents give approval in order to obtain it, why should they disapprove of others? As one can see, the hypothesis in question depicts a society of complaisant, rather than compliant, agents.

1.5
In this paper, we are not going to provide a complete answer to the crucial issue of why self-interested agents exercise social control. However, we will start tackling some aspects of it. In particular, we will try to explore the effects of a specific mechanism of normative influencing, i.e. the spreading of normative reputation, on the re-distribution of the costs of normative compliance. The assumption underlying this paper is that the decision to comply with norms in mixed populations, where norm-observers may happen to interact with cheaters, is strongly disadvantageous for norm-observers. The latter are indeed bound to bear all the costs of normative regulation, unless they resort to some mechanism of cost re-distribution. The hypothesis put forward here is that the spreading of normative reputation, under certain conditions, allows the costs of normative compliance to be re-distributed over the whole population. When urging others to obey the norms, agents co-operate with the norms, but this is not necessarily an intended consequence of communication about normative reputation. Such communication allows compliant agents to acquire preventive information cheaply, sparing them the costs of direct confrontations with cheaters. By spreading the news that some guys cheat, the good guys (a) protect themselves, (b) at the same time punish the cheaters, and possibly (c) exercise an indirect influence on the bad guys to obey the norm. Social control is therefore explained as an indirect effect of a "reciprocal altruism" of knowledge. Some authors have provided a game-theoretical interpretation of the theory of reciprocal altruism (Julstrom, 1997), in which the TIT-FOR-TAT strategy (co-operate until you are cheated; cf. Axelrod, 1987) is taken to correspond to the notion of reciprocal altruism. But while the TIT-FOR-TAT strategy applies to the same opponent, reciprocal altruism is intended to explain the dislocation of reciprocity to organisms which have not been met in the past. How is this possible? In line with Trivers (1971), we put forward the hypothesis that the spread of norms over a given population is achieved thanks to the spread of information about normative reputation. Compliant agents co-operate among themselves by spreading information about the reputation of other agents. Agents might deliver important preventive knowledge to members of their group (say, other good guys) because they may receive the same type of information from them. Normative influencing will not be explained as a means for avoiding punishment, but more intuitively as a means for punishing others, thus restoring or increasing equity. Let us see how.
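
The contrast can be made concrete in code. TIT-FOR-TAT conditions co-operation on one's own history with this particular opponent; reputation-based reciprocity conditions it on information that may come from third parties, which is what allows reciprocity to be "dislocated" to agents never met before. A schematic sketch (ours, not the paper's implementation; all identifiers are invented):

```cpp
#include <unordered_set>

// TIT-FOR-TAT: co-operate with an opponent until that same opponent cheats you.
struct TitForTat {
    std::unordered_set<int> cheatedMe;   // filled only by direct experience
    bool cooperateWith(int opponentId) const {
        return cheatedMe.count(opponentId) == 0;
    }
};

// Reputation-based reciprocity: co-operate unless the opponent is a known
// cheater, where "known" may come from gossip as well as direct experience.
struct ReputationBased {
    std::unordered_set<int> knownCheaters;
    bool cooperateWith(int opponentId) const {
        return knownCheaters.count(opponentId) == 0;
    }
    void hearGossip(const ReputationBased& informer) {   // union of cheater lists
        knownCheaters.insert(informer.knownCheaters.begin(),
                             informer.knownCheaters.end());
    }
};
```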

* Aims And Organisation Of The Paper

2.1
In this paper, we will present an experimental simulation of the role of normative reputation in re-distributing the costs of norm compliance among agents in a mixed population including norm-observers and cheaters. A normative mechanism of aggression control will be compared with a non-normative one within the same population. Two sub-populations, the Strategic and the Normative, are implemented in a 2-dimensional finite world with scarce food randomly scattered. Each sub-population is defined according to a given mechanism of aggression control: strategic agents attack only non-stronger (equally strong or weaker) agents, while normative agents observe a sort of finders-keepers precept: they do not attack "eaters" who have found some food within their territories at the onset of the experiment. The inclusion of reputation in the algorithm will allow the normative agents to identify the cheaters. Differences between these two mechanisms will be measured in terms of both average strength and polarisation of strength within each sub-population. This study proceeds from a previous one (Conte & Castelfranchi 1995) about the effects of norms controlling aggression in social groups. There, the normative mechanism of aggression control had been found both to increase the average strength and to strongly diminish the polarisation (variance) of strength within the same population. However, in that study, the two mechanisms were not allowed to co-exist. In the present study, we want to observe the effects of these two strategies competing within one and the same population in order to: (a) explore more realistically the costs of norm compliance; (b) find out the circumstances in which norms are likely to survive and spread; and, finally, (c) explore through simulation the role of normative reputation. A secondary objective of this work is to observe the effects of norms in populations of agents endowed with some normative beliefs. More specifically, we are interested in the contribution of cognitive, or quasi-cognitive, mechanisms to social, namely normative, regulation. Is cognition merely an obstacle to the functioning of norms, or should it be taken as an instrument implementing external forces, that is, supporting their action? Our claim is that cognition plays the latter role, and that the mechanism of normative reputation is an example of this. Normative reputation supports normative regulation in two possible ways: (a) it allows the costs of norm compliance to be borne also by cheaters, thus re-distributing them more equitably within the same population; (b) it accounts, at least to some extent, for the phenomenon of normative influence. Gossip and the spreading of reputation represent one important instrument of intra-group control.

2.2
The paper is organised as follows:

  1. A short description of the previous study and its findings will be provided. The experiment has been repeated in order to control for the role of some parameters, and its results are shown here for ease of comparison. The previous results have been confirmed and are now more accurate.
  2. The new study will be presented as consisting of two sets of experiments: the former set is aimed at confronting the normative and non-normative strategies of aggression control in a mixed population; the latter is aimed at exploring the role of normative reputation, and its spreading, in the re-distribution of the costs of norm compliance.
  3. Finally, some conclusions will be drawn, and suggestions for further studies will be discussed.

* Glossary

3.1
In this section, a vocabulary for understanding the model presented below is provided. (A data-structure sketch collecting these notions in code follows the glossary.)

  • Agents. In the present context, agents are roughly defined as endowed with: (a) knowledge, (b) some elementary rules of action, and (c) an initially equal energy level which we call strength (varying over time as a direct function of the quantity of food eaten and as an inverse function of the attempts to obtain it). An agent operates transitions between states of the simulated world (e.g., "eat food") on the grounds of previous knowledge about others (their strength and, possibly, their reputation) or the world (e.g., the location of food) and by means of simple production rules (e.g., if the eater is stronger than you, then do not attack). In our simulation models, agents share some features of cellular automata in that they are situated in a two-dimensional torus (a grid); however, they differ from automata in that they are not homogeneous and have quasi-cognitive features. Finally, agents learn via communication and previous experience.
  • Actions. Actions are defined as transitions between states of the world operated by agents in the way described above; actions are performed upon verified conditions, and produce results in the simulated world. In previous studies, the following types of action were possible: move to a source of food seen (one cell away from the agent's location) or smelled (two cells away from the agent's location), eat food, attack another agent which is eating food, stay (if moves are physically impossible or inconvenient) and do nothing. In the present study, a further action is implemented, namely the exchange of information about others' reputation. Actions have different but fixed costs; they are therefore ranked according to the following preference order: eat = exchange > stay > move > attack, where ">" stands for "is preferred over".
  • Mental objects (states, intentions, rules). In a general sense, these are internal representations upon which agents' behaviours are based and which agents can, in principle, manipulate when recording, communicating, planning, etc. In the context of the present simulation model, where the agents' internal architecture is still rudimentary, mental objects are segments of the algorithm, in which logistic and social information are conditions for the application of given routines.
  • Routines (normative, strategic, etc.). In our simulation model, aggression is not deterministic; depending on built-in routines and knowledge, agents may decide to attack one another. In previous studies, three routines were available, blind ("attack an eater to get its food, unless free food is available at a lower cost"); strategic ("attack an eater whenever you perceive it as no stronger than you, unless free food is available at a lower cost"); normative ("attack an eater unless the food item being eaten is marked as 'owned' by that agent at the onset of simulation"). At the beginning of the simulation, possession of food is randomly attributed; food is scarce but self-replenishing (eaten food items randomly reappear in the simulated world). In the new study, only the normative and strategic routines have been compared.
  • Social knowledge. This is here intended as agents' knowledge about properties of the others. In previous studies, knowledge was always learned by agents through perception (others' strength is perceived and recorded only within a limited spatial (one-cell) distance), or personal experience (if I am a food-owner and you attack me you are a cheater), or via exchange of information (if a normative meets an agent known to be a normative, they exchange their knowledge about others' reputation).
  • Norm (of aggression control). A norm is here viewed as a special type of routine of aggression; it differs from the others in that it does not obtain any immediate advantage for the executor. The specific norm implemented in our model is a sort of "finders-keepers" precept: as they appear on the grid, food items are "assigned" to the closest agents, and are flagged as theirs until consumption. When a food item is eaten, a new one randomly reappears on the grid. As a consequence, food possession may be lost and gained by agents several times during the same simulation experiment.
  • Aggression. This is a type of action. It is merely physical (steal a food item) and involves two or more agents (an eater may be simultaneously attacked by several agents).
  • Cheat (cheater, cheating). This is one possible result of aggression. A strategic agent attacking a food-owner is a cheater. More generally, strategic agents are cheaters since they do not respect the norm of possession.
  • Aggression control. This is here intended as a mere reduction of the number of physical attacks; however, since attacks are costly for all the agents involved, a reduction of attacks allows for an increase in the average "strength" of the overall population.
  • Normative reputation. This is viewed here as the knowledge others have of one's routine of aggression. There can be only two types of reputation, cheater and compliant. Agents who are observed to comply with the "finders-keepers" precept are recorded as compliant; those who are observed to violate it are recorded as cheaters.
  • Communication concerns only knowledge about reputation; when meeting, normative agents exchange with each other their information about others' normative reputation.
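
As announced above, the glossary items can be collected into a minimal data-structure sketch. This is our reconstruction for illustration only, not the authors' C++ source; all identifiers are invented:

```cpp
#include <vector>

enum class Routine { Blind, Strategic, Normative };
enum class Reputation { Respectful, Cheater };   // binary normative reputation

struct Food {
    int x, y;        // grid cell
    int ownerId;     // agent to whom the item was assigned, -1 if none
};

struct Agent {
    int x, y;                         // position on the grid (a torus)
    double strength;                  // initially equal; grows with food eaten, shrinks with action costs
    Routine routine;                  // built-in rule of aggression control
    std::vector<Reputation> beliefs;  // one entry per agent; used in the reputation experiments below
};

// Whether `attacker` will consider attacking agent `eaterId`, who is eating `item`.
bool willAttack(const Agent& attacker, const Agent& eater, int eaterId, const Food& item) {
    switch (attacker.routine) {
    case Routine::Blind:
        return true;  // constrained only by the cost of alternatives, handled elsewhere
    case Routine::Strategic:
        return eater.strength <= attacker.strength;  // attack only non-stronger eaters
    case Routine::Normative:
        return item.ownerId != eaterId;  // finders-keepers: never attack an eater on its own food
    }
    return false;
}
```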

* The Previous Study: What's The Use Of Norms?

4.1
In the Artificial Intelligence literature, a norm is treated as a behavioural constraint, that is, a reduction of the action repertoire and therefore of the actions physically available to the system. According to this view, norms allow co-ordination within social systems, the latter being defined as groups of jointly acting agents (Shoham & Tenneholtz 1992a; 1992b; Moses & Tenneholtz 1992). This view is insufficient because it accounts exclusively for what are known as norms of co-ordination (Schelling 1960; Lewis 1969). The function of these norms is essentially that of permitting or improving co-ordination among participants. However, we claim that norms of co-ordination are only a sub-set of possible norms. For example, besides preventing head-on collisions, norms can also reduce aggression. One specific function of norms is to control and reduce aggression among agents in a common world, that is, in a world where one's actions have consequences for another's achievements. In our simulations (Conte & Castelfranchi 1995), agents performed some elementary routines for surviving in a situation of food scarcity. A utilitarian strategy was compared with a normative one, and the differences were observed and measured on several indicators (rate of aggression, average strength, and its variance).

The Experimental Design

4.2
The program, implemented in C++, defines agents as objects moving in a two-dimensional common world (a 10 x 10 grid) with randomly scattered food. An experiment consists of a set of matches, each including a fixed number of turns. At the beginning of a match, agents and food items are assigned locations at random. A location is a cell in the grid. The same cell cannot contain more than one object at a time (except when an agent is eating). The agents move through the grid in search of food, stopping to eat when they find it. The agents can be attacked only when eating: no other type of aggression is allowed. At the beginning of each turn, every agent selects an action from the six available routines: EAT, MOVE-TO-FOOD-SEEN, MOVE-TO-FOOD-SMELLED, AGGRESS, MOVE-RANDOM, and PAUSE. Actions are supposed to be simultaneous and time-consuming. The most convenient choice for an agent is EAT. Eating begins at a given turn and may end two turns later if it is not interrupted by aggression. To simplify matters, the eater's strength changes only when eating has been completed. Therefore, while the action of eating is gradual (to give players the chance of attacking each other), both the food's nutritional value and the eater's strength change instantaneously. When a food item has been consumed, it is immediately restored at a randomly chosen location; this allows all agents to survive and makes the results of different simulations comparable. The agent will then look for unoccupied food items within its "territory" (consisting of the four cells to which an agent can move in one step from its current location), choosing MOVE-TO-FOOD-SEEN if any is found. If not, it will smell within its neighbourhood (extending two steps in each direction from the agent's current location), in order to choose MOVE-TO-FOOD-SMELLED; the agent does not know whether this food location will be occupied or not, because agents can detect one another only within their "territory". At this point, the agent will consider AGGRESSing against any other eating neighbour, weighing this option according to its own norm affiliation. The outcome of an attack is determined by the agents' respective strengths (the stronger agent is always the winner). When the competitors are equally strong, the defender is the winner. The cost of aggression is equal to the cost of receiving aggression; however, winners obtain the contested food item. Agents might be attacked by more than one agent at a time, in which case the victim's cost is multiplied by the number of aggressors. In the case of multiple aggression, only the strongest attacker carries out the attack, while the others must pass. If none of the above actions is possible, the agent is left with the sad options of MOVE-RANDOM or, if no adjacent cell is free, PAUSE. Matches consisted of 2000 time steps, and included 50 agents with a default strength of 40, plus 25 food items with a nutritive value of 20. The costs are 0 for pausing, 1 for moving to an adjacent cell, and 4 for attacking or being attacked.
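
Reusing the structures from the glossary sketch, the resolution of a (possibly multiple) aggression might look as follows. The text leaves some details open (e.g., whether passing attackers also pay the attack cost); this sketch encodes one reading and is illustrative only:

```cpp
#include <vector>

// Parameters from the text: 50 agents (default strength 40), 25 food items
// (nutritive value 20), 2000 turns; costs: 0 to pause, 1 to move, 4 to
// attack or to be attacked.
const double COST_ATTACK = 4.0;

// Among simultaneous aggressors only the strongest actually fights; the
// stronger side wins, with ties going to the defender; the victim pays the
// attack cost once per aggressor. Returns the agent who keeps the food item.
Agent* resolveAggression(Agent& eater, std::vector<Agent*>& attackers) {
    Agent* strongest = attackers.front();
    for (Agent* a : attackers)
        if (a->strength > strongest->strength) strongest = a;

    bool attackerWins = strongest->strength > eater.strength;  // tie: defender wins

    strongest->strength -= COST_ATTACK;                // cost of attacking
    eater.strength -= COST_ATTACK * attackers.size();  // victim pays once per aggressor

    return attackerWins ? strongest : &eater;          // winner takes the contested item
}
```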

4.3
The experiment was designed to investigate the role of norms in the control of aggression. To do this, situations in which agents follow norms were compared with identical situations in which they follow utilitarian rules. In other words, a normative routine was compared with a strategic and a "blind" one. However, given the extreme naiveté of the modelling of the agents, norms could not be distinguished from other rules in terms of their implementation. The difference between norms and rules is only factual: a normative strategy is one which may be disadvantageous to the agent that applies it, and which would therefore be discarded, were that agent able to decide in strictly utilitarian terms. Three conditions were compared:
  1. Blind or savage aggression (B), in which aggression is constrained only by personal utility, with no reference to the eaters' strength. Agents attack eaters when the cost of the alternative action, if any, is higher. They are aware neither of their own strength nor of the eaters'.
  2. Strategic (S): in this condition, aggression is constrained by strategic reasoning. Agents will only attack those eaters whose strength is not higher than their own. An eater's strength is visible one step away from the agent's current location.
  3. Normative (N): in this condition, norms are introduced. A norm of precedence to finders was implemented: finders become possessors of food. At the onset of a match, agents are randomly allocated on the grid and are assigned those food items which happen to fall into their own territories (one step away in each direction from their location). Possession of food is ascribed to an agent on the grounds of spatial vicinity. Food possessed is flagged and every player knows to whom it belongs. Agents cannot attack possessors eating their own food.

Findings

4.4
Findings have been gathered from 100 matches for each condition (that is, 300 matches in total). For each set of 100 matches, the number of attacks (Agg), the average strength (Str), and the variance of individual strengths (Var) were recorded, and the significance of the differences tested. Findings are reported in tables together with the standard deviation. From now on, the variance of strength is taken as a measure of inequality: the larger the variance, the more polarised, that is, the less equitable, the distribution of strength, and vice versa. Alternative measures, for example the number of survivors, do not allow the results from different simulations to be compared easily. The results of the repeated experiment are shown in Table 1 and Figure 1.
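
For completeness, the two strength indicators reported in the tables are the standard mean and variance of individual strengths over the $N$ agents of a (sub-)population; whether the $1/N$ or $1/(N-1)$ normalisation was used is not stated in the text, and does not affect the comparisons:

$$ \mathrm{Str} = \bar{s} = \frac{1}{N}\sum_{i=1}^{N} s_i, \qquad \mathrm{Var} = \frac{1}{N}\sum_{i=1}^{N} \left(s_i - \bar{s}\right)^2 $$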

Table 1: Strength, variance and aggression in the three standalone conditions

             Str    st. dev.   Var    st. dev.   Agg    st. dev.
Blind        4287   204        1443   58         9235   661
Strategic    4727   135        1775   59         4634   248
Normative    5585   27         604    41         3018   76


Figure 1

4.5
As can be seen from Table 1, both the strategic and the normative rules constrain aggression, but the normative rule does far better than the strategic one. In terms of average strength, the normative condition scores higher than the other conditions. This is due to the efficacy of this strategy in the control of aggression. With variance, or inequality, however, the pattern is different. The normative condition is found to be the most equitable: although the average strength is considerably higher than in any other condition, variance is considerably lower. Therefore, while strength is an increasing function of the control of aggression, variance is not. In sum, the control of aggression per se is neutral: it is neither pro-social nor anti-social. Whether it plays a positive or negative social role depends on the type of control employed. In the strategic condition, the control of aggression is carried out mainly by the weaker individuals, who, although employing a strictly utilitarian strategy (avoiding fatal attacks), do not obtain advantages from it: the gap between the weak and the strong agents grows with time. In the normative condition, the distribution is more equitable. The costs of the control of aggression, which might seem to be equally shared between the strong and the weak, are actually borne in direct proportion to the agents' strength: the stronger part of the population gives up possibly advantageous aggressions. Because the total amount of food is constant and the match length is fixed, the differences among agents introduced by the norm of precedence are not maintained throughout the match. Every time a food item re-appears, an agent's status may change: some agents who earlier were non-possessors now find themselves endowed with a new possession, while others, who were privileged in the past, now join the unlucky part of the population. The finders-keepers norm helps the agents find a solution to competitions for the same food item, whenever such competitions occur. Interestingly, this type of norm, which seems to have a biological foundation (de Waal 1982; Eibl-Eibesfeldt 1978), while efficaciously controlling aggression, also reduces the variance of strength among the agents, that is, their inequality.
4.6
Of course, several aspects of this experiment should be controlled: what happens with non-self-replenishable resources? What about the role of food concentration and density? These, and other questions, should be examined in future studies. The experiment suggests that: (a) norms may have the function of constraining aggression; (b) by controlling aggression, norms also have an indirect positive effect on the average strength of the population (at least as effective as utilitarian constraints); (c) norms have a rather specific and positive impact on each agent's share of the overall "cake". This seems to be due to the specific type of aggression control allowed by normative constraints, which reduce both advantageous and disadvantageous attacks and redistribute the costs of controlling aggression over the population. In the non-normative conditions, advantageous attacks were not reduced, causing an increasing gap between strong and weak agents.

* The Present Study: When Is Compliance Convenient?

The Costs Of Compliance In Mixed Populations

5.1
The preceding work shows the advantages of normative behaviour. But we are also compelled to ask what would happen if a normative sub-population interacted with a more utilitarian one. One would expect the normative mechanism to be sensitive to such interference: simply stated, the non-normative agents could benefit from exploiting the normative behaviour. The highly efficient normative population could also prove the most fragile, the most sensitive to interference.

The Experimental Design And The Findings
5.2
To test the above hypothesis, the general form of the experiment has been kept the same, while allowing agents to follow different routines within the same match. In order to maintain a balance between populations, three experimental simulations, each with two equally sized sub-populations (Blind/Strategic, Blind/Normative, Strategic/Normative), have been run. The average scores from three sets of 100 matches are reported in Table 2 and Figures 2, 3, and 4. We now have two values for all indicators, including aggressions (Agg): these are the active aggressions, i.e. the number of aggressions performed by each sub-population, without distinction of target.

Table 2: The costs of compliance (50%/50%)

                 Str    st. dev.   Var    st. dev.   Agg    st. dev.
B/S  Blind       4142   335        1855   156        4686   451
     Strategic   4890   256        1287   102        2437   210

B/N  Blind       5221   126        1393   86         4911   229
     Normative   4124   187        590    80         1856   74

S/N  Strategic   5897   85         1219   72         3168   122
     Normative   3634   134        651    108        2034   71

Figure 2
Figure 3
Figure 4

5.3
The Blind/Strategic pairing behaves as we would have expected: the Strategic agents, whose strategy is simply an improvement on the Blind one, are far better off than everybody else: they obtained the highest average strength and the lowest variance, and performed the smallest number of aggressions. It is interesting to see that the strategic results are also slightly better than in the standalone condition, and are much less polarised. Here, however, the "magic" of normative efficiency has vanished. In both the S/N and B/N simulations the Normative agents obtained far lower results than their counterparts, which appear to benefit from this co-existence. Indeed, rather surprisingly, the presence of normative agents not only allows both the strategic and the blind agents to obtain a better average strength than in the standalone condition, but also allows them, especially the strategic agents, to obtain significantly less polarised results. It must also be noted that the normative agents maintain their exceptionally low variance: even in this unfavourable setting, they manage to redistribute costs among themselves. It is quite obvious that, in order to obtain a better performance, the normative agents need some kind of knowledge of the others' attitudes. Their algorithm is essentially a defensive one, which works best in a standalone condition, where there is no chance of transgression. It performs poorly in mixed populations, because it gives a cost-free advantage to transgressors. To introduce a cost for transgression, we need to implement social knowledge: a mechanism of normative reputation. This experiment's findings can be summarised as follows: (a) in a mixed population, the normative strategy, which was the best in the standalone condition, becomes the worst; (b) the non-normative agents benefit from the existence of normative ones; (c) some penalty for transgression therefore needs to be introduced into the system, and one such penalty can be implemented through a specific type of social knowledge, namely normative reputation.

Re-Distributing The Costs Of Compliance: The Role Of Reputation

The Experimental Settings
5.4
A "reputation" is added to the preceding experimental picture, in the following way: each normative agent can have access to a vector of information about the behaviour of other agents. This information is binary and discriminates between "friends", that will abide with the norm (from now on, the Respectful) and "enemies", that will not respect the principle of finders-keepers (Cheaters). The reputation vector is initialised to "all Respectful" (presumption of innocence), but every time a normative agent is attacked while eating its own food the attacker is recorded as a Cheater. Moreover, the normative algorithm is modified so that agents respect the norm only with agents known as Respectful. Finally, a form of communication has been added, allowing neighbours (agents on each other's territory) to exchange their lists of cheaters. From now on, our attention will be focused only on the Normative-Strategic experiment which will be repeated with and without communication of reputations.

First Experiment: No Communication
5.5
In the first set of experimental runs the normative agents can learn about the behaviour of other agents from direct interaction, but they cannot exchange the knowledge thus gained. The results, reported in Table 3 and Figure 5, are quite disappointing: a slight increase in the Cheaters' average strength is accompanied by a slight decrease in the average strength of the Respectful, yielding an overall worse result than in the preceding situation. We note also that the average number of recognised Cheaters is quite high at 2000 turns: the Respectful have recognised about 84% of the Cheaters. We could infer that knowledge does not prove very useful in this scheme. But let us look at the next experiment.

Table 3: The role of social knowledge

             Str    st. dev.   Var    st. dev.   Agg    st. dev.
Cheaters     5973   89         1314   96         3142   140
Respectful   3764   158        631    101        1284   59


Figure 5: The role of social knowledge

Second Experiment: Neighbours' Communication
5.6
In this run the normative agents exchange their knowledge about Cheaters: when two Respectfuls meet, each transmits its knowledge to the other. Since in this world knowledge can be incomplete but must be true, the exchange of information is incremental: each agent adds the other's information to its own. The results (Table 4 and Figure 6) are very different from the no-communication situation. The average strengths of the Respectful and the Cheaters are now nearly comparable; both populations seem to pay for this redistribution of strength with a rise in variance, to near-standalone levels for the Cheaters and to well above standalone levels for the Respectful. The number of attacks by the strategic agents is much reduced, owing to the higher strength of the normatives, while the latter perform more attacks in this experiment. Communication has brought the Respectfuls' average recognition of Cheaters to 100%.
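
Because knowledge here can be incomplete but never false, the exchange reduces to a monotone, incremental union of Cheater lists; a sketch:

```cpp
// When two Respectful neighbours meet they exchange their lists of Cheaters:
// each agent simply adds the other's information to its own.
void exchangeReputation(Agent& a, Agent& b) {
    for (std::size_t i = 0; i < a.beliefs.size(); ++i) {
        if (a.beliefs[i] == Reputation::Cheater) b.beliefs[i] = Reputation::Cheater;
        if (b.beliefs[i] == Reputation::Cheater) a.beliefs[i] = Reputation::Cheater;
    }
}
```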

Table 4: The role of communication

             Str    st. dev.   Var    st. dev.   Agg    st. dev.
Cheaters     4968   309        2130   108        2417   227
Respectful   4734   301        737    136        2031   253


Figure 6: The role of communication

5.7
The difference from the preceding run suggests the importance of acquiring knowledge in advance. Consider what happens when a Cheater first meets a Respectful endowed only with personal memory. A stronger Cheater will immediately attack the other agent. The latter will soon be informed of the former's affiliation (Cheater), but will be unable to use this information: being weaker, it cannot retaliate. Conversely, a weaker Cheater will not attack its enemy, and the latter will have lost both an advantageous attack and useful information. This shows why useful knowledge cannot be drawn from personal experience alone: there must be some form of communication.

Findings
5.8
From our experiments we suggest that: (a) a form of knowledge about others' behaviours improves normative agents' outcomes in a mixed context; (b) communicating such knowledge is necessary, although in the long run (that is, once an agent has been informed about everybody else's reputation without paying any personal cost for the information) and with an unchanging population, the advantages deriving from such communication tend to vanish; (c) even so, the final result is slightly favourable to Cheaters under the current conditions of agent density over the grid (50%), food concentration (1 food item every 2 agents), and sub-population proportion (50% Cheaters against 50% Respectful). But we have reason to believe that at least two of these factors have a differentiating effect: while average strength is a decreasing function of density for both sub-populations, the normative curve is less steep (cf. Paolucci et al. 1997), up to the point where normative strength exceeds strategic strength. With regard to the proportion between the sub-populations, a society half composed of non-compliant individuals is unlikely to hold together! We also have evidence of a prevalence of normative efficiency in societies biased towards normatives (25%/75%). But again, the normative agents seem to be less sensitive than the other sub-population to the effect of such proportions (cf. Paolucci et al. 1997): both sub-populations do better in favourable than in unfavourable numerical conditions; but while the average strength of strategics is six to eight times higher when they represent a majority (75%/25%) than when they represent a minority (25%/75%) of the total population, the average strength of a majority of normatives is only twice that of a minority of normatives.

Robustness Of Findings
5.9
A partial but relevant exploration of the parameter space of the model presented in this paper has been conducted, checking for variation of the numerical parameters (dimension of the population, density, food concentration, number of time steps, and sub-population proportion) under different algorithms, namely different retaliation strategies used by normative agents when meeting cheaters (sketched in code after this list), in particular:
  1. rational: normative agents punish only non-stronger cheaters
  2. quasi-rational: normative agents punish any cheater
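
The two retaliation variants differ only in the strength test applied to a recognised Cheater; a sketch of the two predicates (our reconstruction, identifiers invented):

```cpp
// 1. "Rational": punish a Cheater only when it is not stronger than you.
bool rationalRetaliation(const Agent& self, const Agent& cheater) {
    return cheater.strength <= self.strength;
}

// 2. "Quasi-rational": punish any Cheater, whatever its strength.
bool quasiRationalRetaliation(const Agent& /*self*/, const Agent& /*cheater*/) {
    return true;
}
```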

The results (cf. Paolucci et al., 1997) confirm those previously established; moreover, some interesting new features emerge. First, normative agents perform better in critical structural conditions (high global density) than in less troublesome ones. This recalls the Hobbesian grounds for normative regulation: norms are called for in a really tough world, in which personal utility no longer guarantees the growth of global outcomes. Secondly, the rapid spread of knowledge is crucial for the functioning of normative systems when they are compared with non-normative opponents: in all the experiments without information exchange, the performance of compliant agents was never comparable with that of cheaters, not even in dramatic density conditions. In other words, gossip (communication about others' reputation) may play an important role in rendering normative behaviour competitive with non-normative behaviour. Finally, the rational strategy of retaliation, which proved the most efficient in the equally balanced population condition (50%/50%), was outperformed by the quasi-rational strategy in all unbalanced situations.

* Conclusions And Future Work

6.1
In an initial experiment, repeated here, a function of norms as aggression-reducers has been simulated. Normative populations were found to obtain better average outcomes than purely utilitarian, strategic populations, in terms of aggression rate, average strength and equality. In mixed populations, however, where normative agents could meet non-normative ones, the simulation results show the opposite pattern: the non-normatives do much better than the normatives on the same measures. They benefit from the behaviour of the normatives, who bear all the costs of normative regulation. A social knowledge condition has then been introduced, namely normative reputation, which works as a penalty for transgressors once they have been detected. However, social knowledge by itself has been found insufficient to redistribute the costs of normative compliance. As merely local knowledge, gained exclusively through personal learning, it proves of little use: normative agents who learn the reputation of cheaters at their own expense will not profit much from such knowledge, since they will have paid dearly for it. On the other hand, if knowledge about the cheaters' reputation is spread among normative agents through communication, their average strength improves considerably. Not surprisingly, communication works as a mechanism for spreading vital information at zero or low cost. Of course, such a hypothesis needs further testing. In a current study (Paolucci et al., 1997), the robustness of the findings is being tested with different parameter values (dimension of the population, density, food concentration, number of time steps, and sub-population proportion). Apparently, some factors (density) have a stronger and more consistent impact than others (dimension of the population) on the simulation results, which suggests that, in these simulations, social structures are more influential than mere statistical phenomena. Furthermore, the normative strategy that we have implemented behaves in a consistent way (lower polarisation of strength and slightly lower average strength than the strategic routine) and is relatively less sensitive to some environmental and structural changes (e.g. density). The present experiment should, however, be repeated in order to explore the effects of extinguishable resources (i.e., food items which do not re-appear).
6.2
To sum up, normative agents have an individual interest in being informed about the reputation of Cheaters (just as they are interested in receiving any other type of help) rather than in acquiring this information via direct experience. To this end, they must resort to communication. On the other hand, given its low cost, this "co-operative" behaviour is less likely to be invaded by non-co-operative strategies (keeping silent). Whether or not it pushes cheaters to comply with the norm, communication about reputation has a balancing effect on the population: it redistributes costs between Cheaters and Respectfuls, reducing the distance between the two categories and rendering affiliation to the latter more rewarding, or less penalised. The reason why agents should stick to a locally less convenient but co-operative behaviour is here identified in a mechanism of cost redistribution: agents communicating about others' normative reputation. Interestingly, such a mechanism does not need a "second-order" explanation, since it is based upon a direct reward: if normative agents communicate their social knowledge to one another, they can use this preventive information to punish Cheaters before suffering the direct experience of being cheated! In this way, normative agents kill two birds with one stone: they redistribute the costs of compliance over the whole population, including the Cheaters (the global outcome); and they avoid the extra cost of complying with the norm at their own expense and to the advantage of the Cheater (the local outcome). The latter outcome explains their "punishing" behaviour directly, without hypothesising, as Heckathorn seems to do, that they are co-operating with the norm itself. Or rather, they may be said to co-operate with the norm de facto, while achieving a local benefit.
6.3
In our model, agents have built-in strategies of aggression control. So far, we have endeavoured to show the relative global and local advantages of such strategies. But several well-known questions arise: how are such strategies formed? How do they spread over the population, and at what relative speed? In particular, through which learning mechanisms does a normative strategy emerge? In this paper, we have shown that, in order not to be out-competed by a merely utilitarian strategy, the normative strategy needs to be supported by a social cognitive phenomenon, namely communication about reputation. This mechanism seems to be responsible for the spreading and the stability of norms (their capacity to compete with a non-normative strategy). But which cognitive mechanisms are responsible for the emergence of norms? This is the crucial question we want to address in future studies.

----

* References

AXELROD, R. 1987. The evolution of strategies in the iterated prisoner's dilemma. In Genetic Algorithms and simulated annealing. Ed. L.D. Davis, Los Altos, CA, Kaufmann, 32-41.

BANDURA, A. 1977. Social learning theory. Englewood Cliffs, NJ, Prentice Hall.

COLEMAN, J.S. 1990. Foundations of social theory. Cambridge, MA, Harvard University Press.

CONTE, R., and Castelfranchi, C. 1995. Understanding the effects of norms in social groups through simulation. In Artificial societies: the computer simulation of social life. Eds. G.N. Gilbert and R. Conte, London, UCL Press.

EIBL-EIBESFELDT, I. 1967/1978. Grundriß der vergleichenden Verhaltensforschung: Ethologie. München, Piper.

FLACHE, A. 1996. The double edge of networks: An analysis of the effect of informal networks on co-operation in social dilemmas (PhD Thesis), Amsterdam, Thesis Publishers.

HECKATHORN, D.D. 1988. Collective sanctions and the emergence of Prisoner's Dilemma norms, American Journal of Sociology, 94, 535-562.

HECKATHORN, D.D. 1990. Collective sanctions and compliance norms: a formal theory of group-mediated social control, American Sociological Review, 55, 366-383.

HOMANS, G.C. 1951. The human group. New York, Harcourt.

HOMANS, G.C. 1974. Social behaviour: its elementary forms. New York, Harcourt.

JULSTROM, B.A. 1997. Effects of contest length and noise on reciprocal altruism, co-operation, and payoffs in the iterated Prisoner's Dilemma, Proceedings of the 7th International Conference on Genetic Algorithms, 386-392.

LEWIS, D. 1969. Convention. Cambridge, MA, Harvard University Press.

MACY, M. and Flache, A. 1995. Beyond rationality in models of choice. Annual Review of Sociology, 21, 73-91.

MOSES, Y. and Tenneholtz, M. 1992. On computational aspects of artificial social systems. Proceedings of the 11th DAI Workshop, Glen Arbor, February.

OLIVER, P. 1980. Rewards and punishments as selective incentives for collective action: theoretical investigations, American Journal of Sociology, 85, 1357-75.

PAOLUCCI, M., Marsero, M., Conte, R. 1997. What's the use of gossip? A sensitivity analysis of the spreading of respectful reputation. Paper presented at the Schloss Dagstuhl Seminar on Social Science Microsimulation. Tools for Modeling, Parameter Optimization, and Sensitivity Analysis, May 1-5.

SCHELLING, T.C. 1960. The strategy of conflict. Oxford, Oxford University Press.

SHOHAM, Y. and Tenneholtz, M. 1992a. On the synthesis of useful social laws for artificial agent societies (preliminary report). Proceedings of the AAAI Conference, 276-281.

SHOHAM, Y. and Tenneholtz, M. 1992b. Emergent conventions in multi agent systems: Initial experimental results and observations. Proceedings of the 3rd International Conference on KR&R. Cambridge, MA, 225-232.

TRIVERS, R.L. 1971. The evolution of reciprocal altruism. The Quarterly Review of Biology, 46, 35-57.

WAAL, F. de. 1982. Chimpanzee politics. London, Jonathan Cape.

WALKER, A. and Wooldridge, M. 1995. Understanding the emergence of conventions in multi-agent systems. Proceedings of ICMAS, (International Joint Conference on Multi Agent Systems), San Francisco.

----

Return to Contents of this issue

© Copyright Journal of Artificial Societies and Social Simulation, 1998