Ilan Fischer: Evolutionary Development and Learning

Ilan Fischer (2003)

Evolutionary Development and Learning: Two Facets of Strategy Generation

Journal of Artificial Societies and Social Simulation vol. 6, no. 1
<https://www.jasss.org/6/1/7.html>

To cite articles published in the Journal of Artificial Societies and Social Simulation, please reference the above information and include paragraph numbers if necessary

Received: 7-Dec-2002 Published: 31-Jan-2003

Abstract

The study examines two approaches to the development of behavioral strategies: i) the evolutionary approach manifested in a Genetic Algorithm, which accounts for gradual development and simultaneous refinement of an entire population; and ii) the behavioral learning approach, which focuses on reinforcements at the individual's level. The current work is part from an ongoing project dealing with the development of strategic behavior. The reported study evaluates the potential of differential reinforcements to provide probabilistic noisy Tit-For-Tat strategies with the motivation to adopt a pure Tit-For-Tat strategy. Results show that provocability and forgiveness, the traits that account for Tit-For-Tat's successes, also prevent it from gaining relative fitness and become an attractor for noisy (non-perfect) Tit-For-Tat strategies.
Keywords:: Errors; Evolution; Genetic-algorithm; Learning; Noise; Strategy; Tit-For-Tat

Introduction

1.1

The success and survival of a behavioral strategy depends upon its capacity to provide resources and gain relative fitness. Axelrod (1984) explained the survival of a strategy as indicating a personal or societal selection process. "In human terms, a rule which was not scoring well might be less likely to appear in the future for several different reasons. One possibility is that a player will try different strategies over time, and then stick with what seems to work best. Another possibility is that a person using a rule sees that other strategies are more successful and therefore switches to one of those strategies." (p. 50). The underlying assumption is that a behavioral strategy should equip the organism with relative advantages, relevant to its surroundings. A strategy that does not provide such outcomes must be abandoned, or else its host might suffer negative, or even fatal, consequences. If one thinks of a strategy development in pure behavioral and evolutionary terms (disregarding the role of thinking and reasoning), then the environment, comprised of learning opportunities, rivals, and reinforcements, should account for most, if not all, possible developments.

1.2

A well-known result of Axelrod's tournaments (1980a, 1980b), supported by a theoretical analysis (Axelrod 1981) and computer simulations (Axelrod 1984), is that when cooperation was likely to be reciprocated, the Tit-For-Tat (TFT) strategy (contributed by Anatol Rapoport) exhibited the highest fitness of all participating strategies. TFT, which starts with cooperation, followed by mimicking of the opponent's last move, is cooperative, provocable and forgiving.

1.3

However, Axelrod (1980a, 1980b) seeded the computer tournaments, with "ready-made" strategies. These were generated by game theory experts (as well as laymen), who were asked to generate successful strategies. Thus the tournament studies have focused on conditions and processes that lead to the endurance and survival of existing strategies. They have left open an important question regarding the processes which generate strategies and render them as candidates for evolutionary selection.

1.4

Two main possibilities need to be considered: i) an evolutionary process, where relative advantages increase fitness and account for gradual development; and ii) a learning process where successful strategies provide attractors for less successful strategies, which in turn mimic the formers' more successful behavior. Of particular interest to the current study is the question whether the TFT strategy may become an attractor for other similar, yet less fit, strategies.

1.5

Axelrod (1997) addressed the first possibility by applying a Genetic Algorithm (GA) that builds upon the three last outcomes to make a choice in the current move. Axelrod reports that "Most of the strategies that evolved in the simulation actually resemble Tit For Tat, having many of the properties that make Tit For Tat so successful." (p.20).

1.6

GAs, developed by Holland (1975) have proven effective in searching large and complex solution spaces and in solving nonlinear problems (see Gorrini and Dorigo 1996). Instead of progressing from point to point like other techniques, GAs search from one set of problem solutions to another. This feature allows them to escape local optimums, to be dependent on relative rather than absolute fitness, and to function without global system level knowledge (Chattoe 1998). GAs process coded linear sequences (vectors) of binary or real numbers resembling the information structure of a chromosome.

1.7

Although there are several variants of GAs, the generic configuration comprises the following stages: the initialization of a population of vectors; the transformation and recombination of vectors according to specific evolutionary operands; and the evaluation and selection of individual vectors according to a fitness function (so that the least fit individuals of each generation are removed from the set). The evaluation and transformation stages are iterated until convergence occurs. The addition of a small mutation rate allows for the appearance of new genotypes, helping to avoid local minima. Assuming a sufficiently lengthy evolutionary modeling sequence, each individual of the converged set should represent a good solution for the evolutionary conditions embedded in the fitness function. Figure 1 depicts an evolutionary development of a simple GA operating on a vector set with five possible elements (depicted as five distinct colors) distributed along a sequence of 100 possible locations. The (initially defined) fitness function is programmed to favor a predefined color distribution, and thus grants vectors, which exhibit such color distributions, a higher fitness^[1]. Panel a depicts an initial random distribution of the colored vectors (rows), panel b shows the same population after several hundred generations, and panel c shows the converged set. The convergence along the evolutionary pass is easily observed, as more colored columns pop out (since similar vectors have identical color distributions). Clearly, the similarity of the vector set increases along the evolutionary pass, until all vectors have highly identical structures. Applying GAs to the study of strategy development implies that the development of a strategy is conditioned upon the existence of other relatively similar strategies. Once convergence occurs, the survival of strategies depends on their ability to interact with mutants and copies of themselves, which exhibit only small strategic deviations from each other.

(a)

(b)

(c)

Figure 1. Three stages in a Genetic Algorithm development: a - Initial random distribution; b - Partial convergence; c - A converged vector set

1.8

One may conclude that the GA approach provides a powerful tool for exploring group and social developments only for the cases where all individuals develop along an identical evolutionary pass, governed by a similar evolutionary fitness function. Yet GAs do not provide good vehicles for exploring the individual's perspective of learning and adaptation.

1.9

The current study uses the simulation tool to examine the individual's ability to learn from a TFT strategy. The simulation is designed to illustrate the payoffs accumulated by agents, which exhibit various levels of similarity to a perfect TFT strategy, with which they interact. The simulation allows the agents to interact for a sufficiently long period. During this period the agents may accumulate gains and losses, providing the basis for reinforcement learning and motivating the shift towards or away from the opponent's strategy.

1.10

For simplicity each agent is characterized by a single probability, which determines to what extent the TFT response mechanism is being executed without errors. An agent with 0% errors will respond to cooperation with cooperation and to defection with defection at each and every interaction (following the initial move of cooperation). A 30% error will result in 30 out of 100 responses to be wrongly executed (thus responding to cooperation with defection and vice versa); and a 100% error will result in a "Bully" type of behavior (always responding with the opposite behavior).

1.11

Such errors (or noise) in the execution of strategies have been pointed out as an important determinant in strategic games. Selten (1975) refers to the "trembling hand", Bendor (1987) describes errors in implementation of behavior, and Kollock (1993) speaks of exogenous events that may prevent an actor from performing a cooperative act (see also Fudenberg & Maskin 1990; Bendor, Kramer, & Stout 1991; Wu & Axelrod 1997). Kollock (1993) reports that the introduction of noise to a population of TFT strategies results in their performance plunging almost immediately to a level expected of actors playing a completely random strategy. However, for a Suspicious TFT (a TFT variant that defects on the first round instead of cooperating) the addition of a relative low level of noise may provide a positive effect. Kollock concludes that under no noise or low noise levels relaxed strategies have advantages against other strategies, but lose their advantage when the levels of noise increase.

1.12

The following simulation manipulates the noise levels (in the execution of the TFT response mechanism) of both players, examining the prospects of "noisy agents" to acquire a pure TFT strategy (executed with 0% errors).

Initialization and Simulation Parameters

2.1

All simulations were initialized by fixing the: number of agents (n), duration of each individual simulation (d), after which the simulation terminates and the overall gains are recorded, the number of replications of each simulation (r), and the error level (e) for each interacting agent. All reported simulations were run with n = 2, d = 10,000 and r = 30. The error levels for the two opponents varied according to a simple factorial design, with the first player assuming the values of 30, 5, 0, and the second player assuming the values of 0, 0.01, 0.05, 0.1, 0.25, 0.50, 0.75, 1, 2, 3, 4, 5, and 10 to 100 in intervals of 10. The simulations were run as a simple PD tournament according to the payoff matrix denoted in table 1.


Table 1: A Prisoner's Dilemma Payoff Matrix

		Player B
		C	D
Player A	C	5, 5	-10, 10
	D	10, -10	-5, -5

Results

3.1

Figure 2 depicts the gains of a TFT playing agent with a 30% error level, against a series of opponents with error levels ranging from 0 to 100%. Both the gains of the 30% error level TFT player and those of its opponents vary across the error levels with no clear trend (ranging from -300 to 650 units). Observing one's own outcomes and comparing them to the various opponents does not lead to any systematic conclusion. The 30% error level agent might accidentally conclude that the range of 90% error is the best strategy, but this result is likely to change randomly when other games are played. Generally, this agent can conclude that no error level seems to perform better than others, and thus find no reason to change its 30% error level.

Figure 2. Gains of a TFT playing agent with 30% errors (square), against TFT opponents with various error levels (diamond).

3.2

Assuming that a better approximation to pure TFT has been obtained, figure 3 depicts both the gains of a 5% error level TFT player and those of its opponents, with error levels ranging from 0 to 100%.

3.3

The results for this lower error level playing agent seem quite confusing. Playing against agents with higher error levels resulted in a relatively systematic average of zero gains. However, the interactions with the lower error level strategies, exhibit a very disordered pattern, including both relatively high gains and relatively high losses. Once more, this mix is not systematic and cannot provide the grounds for a learning process.

3.4

The simulations depicted in Figure 4 allow agents with various error levels (0 to 100% errors) to experience a game against a perfect TFT player (error level of 0%). The results show that: i) agents with high error levels inflict an average of zero gains on themselves as well as on the pure TFT strategy; ii) lower error levels result in a random mix of gains and losses; and iii) the outcomes of both players seem indistinguishable. These characteristics do not provide motivation to alter one's strategy or to reduce one's error levels. They do not encourage acquisition of the pure TFT strategy.

Figure 3. Gains of a TFT playing agent with 5% errors (square) , against TFT opponents with various error levels (diamond).

Figure 4. Gains of a pure TFT agent (square), playing against TFT opponents with various error levels (triangle)

Discussion

4.1

The simulations show that pure TFT is not easily gained. The two traits of provocability and forgiveness that account for TFT's successes, also prevent it from gaining relative advantages. TFT's efficiency in responding to both cooperation and defection ensures a balanced outcome for both players, but it does not provide the reinforcement required to encourage adoption of the pure TFT strategy. Paradoxically, agents that have very low error levels and thus perform almost pure TFT, obtain (and inflict upon their opponents) the most confusing payoffs. Their outcomes are disordered and may result in either gains or losses with similar probabilities; while under the high error levels the positive and negative outcomes balance each other within a relatively short interaction period. The crucial factor here is actually not the error level itself, but the ratio between the error level and the period required for learning and consolidation of the newly acquired knowledge. If learning would take place only after an infinite amount of information has accumulated then the TFT response mechanism would balance the outcomes for both players. Under such conditions the outcomes would not reinforce any shift towards or away from the pure TFT strategy. However, when learning takes place before a sufficient quantity of information has been acquired, the shifts in strategy reflect local non-representative sequences of information. Such a disordered pattern may result in some agents being motivated to distance themselves from the TFT strategy after it has already been approximated, but it may also result in some strategies being motivated to adopt the TFT strategy. Interestingly, local representativeness was suggested by Kahneman and Tversky (1972) as the reason for biases in human reasoning. It has been used to explain phenomena like the gambler's fallacy first described by LaPlace (1951), the hot hand fallacy (Gilovich, Vallone, & Tversky 1985), and the misperception of random sequences (Budescu 1987).

4.2

Three conclusions regarding the emergence of TFT suggest themselves:

Learning from an existing TFT strategy is rather unlikely for opponents that already utilize TFT, yet still exhibit high error levels. Approximation of TFT with relatively low noise levels and shorter interaction periods results in disordered outcomes, which may provide the motivation for either: drifting away (since both strategies are losing), not changing one's strategy (since it does better than TFT), or adopting the pure TFT strategy (since it gained higher payoffs).
Since the TFT response mechanism is very efficient in leveling the outcomes of both players, it is not efficient in motivating strategic shifts (towards TFT) of opponents and observers.
The development of TFT does not necessarily derive from the motivation induced by TFT playing agents. It may be based on compliance to more general principles (such as similarity, resemblance, or equity), and/or upon the capability to reason and analyze outcomes in regard to their respective causes. In fact, it was higher order reasoning, which accounted for the contribution of the TFT and most other strategies to Axelrod's (1981) original tournament.

Notes

¹ The function is stylized after Fischer & Sullivan's (in preparation) Time Use Genetic Algorithm, which models a set of human activities, distributed along a diurnal cycle, within a typical week.

Acknowledgment

This research is part of a project supported by the Israel Science Foundation founded by The Academy of Sciences and Humanities.

References

AXELROD, R. (1980a). Effective choice in the Prisoner's Dilemma. Journal of Conflict Resolution, 24, 3-25.

AXELROD, R. (1980b). More effective choice in the Prisoner's Dilemma. Journal of Conflict Resolution, 24, 379-403.

AXELROD, R. (1981). The emergence of cooperation among egoists. The American Political Science Review, 75, 306-318.

AXELROD, R. (1984). The evolution of cooperation. New York: Basic Books.

AXELROD, R. (1997). Evolving new strategies: The evolution of strategies in the iterated Prisoner's Dilemma. In R. Axelrod (Ed.), The complexity of cooperation (pp. 41-29). New Jersey: Princeton University Press.

BENDOR, J. (1987). In good times and bad: Reciprocity in an uncertain world. American Journal of Political Science, 31, 531-538.

BENDOR,J., KRAMER, R. M., & STOUT, S. (1991). When in doubt: Cooperation in a noisy Prisoner's Dilemma. Journal of Conflict Resolution, 35, 691-719.

BUDESCU, D.V. (1987). A Markov model for generation of random binary sequences. Journal of Experimental Psychology: Human Perception and Performance, 13, 25-39.

CHATTOE, E. (1998). Just How (Un)realistic are Evolutionary Algorithms as Representations of Social Process? Journal of Artificial Societies and Social Simulation. 1, 3, <https://www.jasss.org/1/3/2.html>

FISCHER, I., & SULLIVAN, O. (in preparation). Evolutionary modeling of time-use vectors.

FUDENBERG, D., MASKIN, E. (1990). Evolution and cooperation in noisy repeated games. American Economic Review, 80, 274-279.

GILOVICH, T., VALLONE, R., & TVERSKY, A. (1985). The hot hand in basketball: On the misperception of random sequences. Cognitive Psychology, 17, 295-314.

GORRINII, V., & DORIGO, M., (1996). An application of evolutionary algorithms to the scheduling of robotic operations. In Alliot, J. M., Lutton, E., Ronald, E., and Schoenauer, M. (Eds.) Artificial Evolution. New York: Anchor Books

HOLLAND J. H. 1975. Adaption in Natural and Artificial Systems. University of Michigan Press, Ann Arbor, MI.

KAHNEMAN, D., & TVERSKY, A. (1972).Subjective probability: A judgment of representativeness. Cognitive Psychology, 3, 430-454.

KOLLOCK, P. (1993). An eye for an eye leaves everyone blind: Cooperation and accounting systems. American Sociological Review, 58, 768-786.

LAPLACE, P. -S. (1951), A philosophical essay on probabilities (F. W. Truscott & F. L. Emory, Trans.). New York: Dover. (original work published 1814).

SELTEN, R., (1975). Reexamination of the perfectness concept for equilibrium points in extensive games. International Journal of Game Theory, 4, 22-55.

WU, J., & AXELROD, R. (1997). Coping with noise: How to cope with noise in the Iterated Prisoner's Dilemma. In R. Axelrod (Ed.), The complexity of cooperation (pp. 41-29). New Jersey: Princeton University Press.

Button Return to Contents of this issue