Scott Moss: Critical Incident Management

Scott Moss (1998)

Critical Incident Management: An Empirically Derived Computational Model

Journal of Artificial Societies and Social Simulation vol. 1, no. 4, <https://www.jasss.org/1/4/1.html>

To cite articles published in the Journal of Artificial Societies and Social Simulation, please reference the above information and include paragraph numbers if necessary

Received: 20-Aug-98 Accepted: 10-Oct-98-98 Published: 15-Oct-98

Abstract

The main purpose of this paper is to demonstrate an empirical approach to social simulation. The systems and the behaviour of middle-level managers of a real company are modelled. The managers' cognition is represented by problem space architectures drawn from cognitive science and an endorsements mechanism adapted from the literature on conflict resolution in rulebased systems. Both aspects of the representation of cognition are based on information provided by domain experts. Qualitative and numerical results accord with the views of domain experts.

Keywords:

crisis management, agent cognition, model verification, simulation methodology

Introduction

1.1

The simulation models and experiments reported in this paper extend the social simulation literature in three ways.

The structure of the models and the key parameter values conform to descriptions and data provided by a key manager concerned with the organizational function of concern here. In particular, the representation of cognition not only supports important elements of well verified representations from computational cognitive science (Soar (Laird et al., 1987) and ACTR (Anderson, 1993)), but it does so in a way which is open to inputs from and evaluations by the decision makers whose behaviour is captured by the model.
Where social simulation models are devised to support policy analysis, it is clearly a virtue if they are constructed in a way which is open to empirical inputs and, moreover, which makes clear the limits of those inputs. Although no effort in this direction was made on the basis of the experiments reported here, other work (e.g., Moss and Edmonds, 1997) has shown that appropriately specified models can be used to help domain experts identify and ameliorate inconsistencies between their own qualitative judgements and quantitative relationships drawn either from statistical data or physical theory. The models reported here were used to investigate a particular issue of the relationship between direct communication among a group of managers and the efficiency with which their common organizational task is carried out. The results were in line with the beliefs and experience of our informant.
The model includes a direct representation of relevant physical relations so that physical and social aspects of the processes captured by the model are integrated without (intensionally) changing the representations of either the physical or social processes in order to effect that integration. Since integrated physical and social modelling is arguably an important item on the social simulation agenda in, for example, studies of climate change or sustainable development, the methods for capturing and integrating the relevant data in this model might well inform the methods and procedures for integrated modelling and assessment more generally. Empirical openness also helps to develop models which integrate the modelling of physical and social relationships and processes. Although it was not indicated in this case, if the outputs from the simulations had turned out to be systematically inconsistent with the historical record, we would have considered the accuracy of the managers' beliefs about the relevant physical relationships as well as their qualitative judgements about social processes.
Specifically in relation to computational organization theory, the models reported in this paper fill a gap between organizational models set in a pure recognition task environment and models set in a co-operative action task environment. Prime examples of pure recognition task environments are the various radar models and their canonical form in which agents are tasked to recognise characteristics of digit strings representing the environment. Key examples are Ye and Carley (1995) and Carley and Svoboda (1996). Co-operative action task environments are represented in the VDT system reported by Jin and Levitt (1996) and in the Soar-based models of Tambe (e.g. 1997).
While many tasks undertaken within organizations require cooperation among several of the units comprising the organization, there are important cases in which tasks are efficiently and accurately represented as requiring action by a single individual or unit within the organization. One such task is the identification and management of critical incidents. In such cases, it would always be possible to model the process at a sufficiently low level that the required co-operation among units or individuals must be represented explicitly. However, it is not always necessary to capture that level of detail in order to analyse the management issues involved. The issue of detail or, equivalently, reductionism is pragmatic: we choose the level of detail to elucidate the relationships of interest and to hide unwanted detail.

1.2

In the case reported here, there are managers who identify tasks to be completed, there are systems to schedule and plan the work, and there are repair teams to carry it out. The model described below is concerned with the representation of learning by the managers whose job is to diagnose and remedy problems which create critical incidents. These managers work in shifts and the effects of communication between the manager finishing one shift and the manager starting the next shift is investigated. If the work of the repair teams were a subject of the analysis, then a representation of the co-operative activity among electricians, plumbers, building teams, etc. would be indicated. For the purposes of the present analysis, however, such detail would obscure the main points and it is enough to presume that the repair teams do the work required of them.

1.3

A clear virtue of recognition task models is that the hierarchical structure and reporting arrangements are the key elements in determining the performance of the organization. The complexity of the task itself is of no moment. One of the results of the simulations reported here is the demonstration of the importance of complexity in turning a critical incident into a full-blown crisis. But just as the independent action task environment reported here does not support issues addressed by models of co-operative task environments (such as the VDT system), the recognition task environment does not support analysis of communication among sequentially active agents or the effects of endogenously increasing complexity of the task to be accomplished. These models have different purposes.

1.4

One early objective in the development of social simulation as a discipline will surely be to provide an overarching framework for these various task environments. It is first necessary, however, to elaborate the properties of each task environment and to demonstrate the empirical relevance of each. The independent action task environment reported here bridges the gap between the recognition and co-operative action task environments thereby to provide a useful step in the development of a general framework for the analysis of the influence of organizational structure on individual cognition.

The North West Water incident management organization: model design and implementation

2.1

The organization and information systems and networks relevant to incident management were described and documented by the company's emergency planning officer who also provided impressionistic information about the nature of critical incidents as well as the likelihood of an incident of one kind leading to indicents of other kinds. For example, when intruders gain access to remote sites, there is a substantial probability that the intruders will set a fire. Fires damage equipment thereby causing various failures that themselves constitute critical incidents. One virtue of the model reported here is that it provides a check of the consistency between the assessments of cause and effect, the nature of the information systems and the informants' views of the effectiveness of their systems and procedures in containing critical incidents.

2.2

The information systems of North West Water were developed to support local empowerment of the operations managers who have to rely on central services and systems for the provision of resources to manage critical incidents. A central systems organization decides on the activities to be undertaken during the course of a critical incident but leaves the planning and scheduling of these activities to a specialist planning and scheduling department. The operational activities modelled here are inspection, reporting and repair. There is an infrastructure of telemetry from operations sites and verbal communications channels among specified agencies within North West Water. The public can report incidents at any time of the day or night but the company's contact point with the public varies over the course of the day. From 8h00 until 22h00 the company's Customer Service Centre takes all calls from the public and passes on calls indicating the possibility of a critical incident directly to Central Systems. During the night-time hours, 22h00 until 8h00, calls from the public are received by the Operations Control Centre who pass on any reports indicating critical incidents to Central Systems. The information flows involved in the incident management procedures are captured in Figure 1.

Figure 1: Incident management organization

2.3

In this section, we describe the model structure and the substantive implications of the implementation in the SDML modelling environment^[1] and then, in the next section, the representation of cognition including its roots in cognitive science.

2.4

A key element in any model of an organization must be the specification of who knows what. Some knowledge is common to everyone in the organization, some is common within departments and some only among specific groupings of individuals. It is not always necessary explicitly to model how information comes to be commonly held in this way. Sometimes, however, individuals decide whom to tell about specific phenomena and then address the relevant information to those specified individuals. Such communication can usefully be modelled.

2.5

SDML supports a "container hierarchy" in which some types of agents (composite agents) can contain subagents. An organization can contain departments which can, in turn, contain activities. SDML also supports compilation based on the prior definition of clauses which can be asserted to databases. These clauses are defined as part of the definitions of agents. If an agent, say an individual contributing to an activity, is represented as a subagent of the activity and a clause is defined at the level of the activity, then every time the agent asserts an instance of that clause definition, the clause is actually asserted to a database of the activity and can therefore be accessed by every individual in that activity as if it were on their own respective databases. If the clause were defined at the level of the department, then whether asserted by the department, a component activity or an individual contained by an activity, the clause would be stored on the department's database and could be read by every activity and individual as if it were on their own respective databases.^[2]

Figure 2: Container structure of critical-incident model

2.6

The container structure of the critical-incident model is shown in Figure 2. The outermost container is an instance of CrisisModel which specifies the sequence in which its immediate subagents will fire their rulebases.

2.7

An instance of the type PhysicalWorld has databases with information about the causes and effects of events which can occur at the various operating sites together with the combinations of actions that will remedy an untoward event. Seventeen such events were considered including discoloured, evil tasting or smelling water, road collapse, burst mains or foul sewers, fire, intruders, contamination incidents, power supply or pump failures and chlorine leaks. These and the other events were allocated probabilities of spontaneous occurrence as well as probabilities of occurring as a consequence of one of the other events. In some cases, the probability of spontaneous occurrence is 0 because the events can only be consequences of some other event. Pollution incidents, for example, are always caused by something. That one event will be a consequence of another involves both a primary and a secondary probability. The primary probability is that the first event causes the second. The secondary probability is that the remedial action in respect of the first event causes the second.

2.8

On each occasion when an event might occur, the PhysicalWorld decides which, if any, primary events will occur spontaneously according to the specified probabilities and, if any such event does occur, assigns it to an operating site at random. If there are already events occurring at an operating site, the PhysicalWorld propagates consequential events at random from the specified probabilities and assigns them to the appropriate operations site.

2.9

The causes of specific events are not in practice known to the individuals involved in critical incident management until the manifestations of the incident have been observed and the situation analysed. Even then, they might make the wrong diagnosis. For these reasons, it would not be appropriate to post the causes of a particular incident to a database accessible by all agents. Consequently, the relevant clauses are asserted privately by the PhysicalWorld to its own databases and the fact of the occurrence of the event is asserted to the database of the operation site at which the event is to occur. This assertion is achieved by the explicit addressing of the clause eventOccuring event where event is actually fire, pumpFailure, contaminationIncident, or the like.

2.10

The domain experts have specified probabilities that the incidence of an event of one type will be followed by an event of some other type. Consequently, the allocation of a critical event to an operating site is followed with the expert-specified probability at the next time period by the other events which are known to follow from the initial event. Of course, many of these consequential events are followed with some known probabilities by yet other events. There is therefore the possibility of chains of events over time beginning with the randomly allocated events at a given site. Events, once allocated, remain a feature of the site until they are remedied, if there are remedies, and the events which gave rise to them have been eliminated.

2.11

The operating sites (in practice unstaffed) recognise two kinds of event: telemetered and publicly observable events. When a site has had a telemetered event asserted to its databases, it sends a message stating that it has that event to the OperationsControlCentre. When a site has a publicly observable event asserted to its databases, it selects at random a percentage of households and asserts the occurrence of that event to their databases. In the simulations reported here, there were 100 households and each household had a 10 per cent probability of being selected to receive such a message. Because the information contained in those assertions refers to actual information which is available selectively, once again explicit addressing of the assertions is appropriate.

2.12

The OperationsControlCentre agent forwards the telemetry and public reports to the CentralSystems agent who decides on the actions to be taken. The instructions to take these actions are addressed explicitly to the WorkPlanningAndScheduling agent who allocates the work by addressing the instructions of an agent of type RepairGang or Controller as appropriate. The reports by the repair gangs or controllers are addressed to CentralSystems agent. Repairs take the form of actions asserted by the repair gang to the operating site and then read from the operating site's databases by the PhysicalWorld instance.

2.13

The cognitive behaviour in this model is by the instances of type Controller, and the workPlanningAndScheduling, OperationControl and CentralSystems instances. These agents learn by specifying and testing models which are held on their respective databases as private clauses -- i.e. clauses which can be asserted by an agent only to its own database and read only by that agent.

Agent cognition: learning as modelling

3.1

Agent cognition is represented as a process of model generation and testing within a means-ends framework (Moss and Edmonds, 1998). This approach has its origins in the cognitive science literature, classically the Soar software architecture as specified by Laird, Newell and Rosenbloom (1987) and, in a slightly different vein, Anderson's (1993) ACT-R. Both rely on problem-space architectures which are in effect relationships between goals and the sub-goals needed to achieve those goals. Soar is an implementation of Newell's (1990) unified theory of cognition which is intended in principle to bring together in a single theory representations of decision-making, learning, memory and other aspects of less relevance here. The ACT-R theory is intended as a theory of memory. Both embody their theoretical structures in software architectures, though Cooper et. al. (1996) have argued that the theories are not implementation-specific so that the Soar and ACT-R software must entail implicit non-theoretical elements. For this reason, the particular representation of agent cognition specified in this paper is not claimed to be an exact implementation of any particular theory of cognition.

3.2

Both Soar and ACT-R represent cognition as the building up of structures of declarative relationships. Both entail the learning of increasingly complex models as a process of "chunking" though "chunking" itself has different meanings in the two theories. In Soar, chunking amounts to the replacement of procedural knowledge in the form of elements of the problem-space architecture with declarative knowledge in the form of condition-action rules. In ACT-R, chunking is the creation of data structures with special slots. Since some of the slots can themselves contain chunks, there is no theoretical limit on chunk complexity.

3.3

There are, as yet, no models of multi-agent interaction in ACT-R. There are several such models implemented in Soar (e.g. Ye and Carley 1995, Tambe and Rosenbloom 1996) but these have a small number of agents and, if any hierarchical relations, only two layers. Moreover, Ye and Carley found that the full cognitive capabilities of Soar became unusable in even a simple multi-agent model for two reasons. One is that Soar becomes computationally expensive in multi-agent settings; the other is that the repetitiveness that Soar requires for learning is not found in those models. In the North West Water model, the complexity and probabilistic nature of the causal relationships makes exact repetition of complex events unlikely and certainly infrequent. So learning dependent on the observation of repetitive events is not a promising representation of cognition in those cases. This result is itself important since it enables us to distinguish between events which, by virtue of organizational learning, become routine and which require special structures, measures and cognitive processes to be embodied in crisis management teams.

3.4

The virtue of the representation cognition implemented in this model is that it conforms to key elements in the Soar and Act-R theories particularly in relation to the problem space architechture. No attempt is made to capture chunking as is done in either Soar or ACT-R. Instead, the increasing ability of agents to observe ever more complicated sets of events and to act quickly and effectively in response to such observations is captured by assuming agents to learn by building models of causal relationships complemented by a system of endorsements which enable them quickly to select the models which are appropriate to the situation and which they have found to be most reliable in the past. In some cases, the perceived reliability of the model is influenced by agents' assessments of the reliability of agents who have suggested to them relationships incorporated in their mental models.

3.5

Since, as Cooper et al. have pointed out, Soar (and, by extension, ACT-R) must contain non-theoretical elements, we have no possibility of conforming exactly to either of the underlying cognitive theories. However, the implementation of agent cognition used in this model does support the explicit introduction of domain knowledge into the representation of cognition. This is because the problem space architecture can be built up on the basis of descriptions by the doamin experts of their planning and operational procedures in the course of a critical incident. In addition, the accounts of domain experts concerning the relationships they believe to prevail and why they have different degrees of confidence in different identified relationships is used to develop the system of endorsements which itself determines the agents' choices of mental models applicable at any time.

3.6

In the North West Water model, the controllers build models relating remediable causes to consequences. They are assumed to know which individual events are causes of which other individual events but not the associated probabilities. Consequently, it is not always clear which causal events are the most important to remedy first. The procedure they follow to formulate models is, in the absence of any applicable model, to generate a new model which postulates as a single cause the kind of event which is a cause of the largest number of the other observed events at the same site. For example, early in one simulation run, two events occurred spontaneously at one of the operating sites: an intruder forced entry and the water pressure dropped. As a result, virtually everything else that could happen did happen. These events included a fire, a chlorine leak, a power supply failure, discoloured water, contamination and pollution, low water levels, no water to customers and a water taste or odour. The controller sent to inspect the site concluded that the key event to resolve was the presence of the intruder because among the observed events, more of them had intrusion as a possible cause than they had for any other causal event. The runner-up as the key event was fire which came second to intrusion only because one possible cause of a fire is an intrusion but fires do not cause intrusion.

3.7

The models were used by the controllers to identify and report to central systems the primary cause or causes of an incident. If, as a result of that report, the remedy applied on the instruction of central systems eliminated at least one of the events identified by the model, then the model was endorsed as having reduced the severity of the incident. If the result of applying the model was to eliminate all of the events covered by the model (i.e. all of the causes and all of the effects), then there was a further endorsement to that effect. These endorsements are part of an endorsements framework which provides an alternative to chunking in the representation of knowledge gained by experience.

Endorsement

3.8

The use of endorsements was first defined by Paul Cohen (1985) as a device for resolving conflicts in rule-based expert systems. It was modified and extended by Moss (1995) within a model of learning by social agents. The Moss version is the one that is used here.

3.9

Endorsements are tokens that have associated numerical values. The agents of type Controller, for example, could endorse their own mental models with any or all of the endorsements in Table 1.

token	value
Table 1: Controllers' endorsements
noEffect	-1
newModel	0
reducedEvents	1
eliminatedAllEvents	2
reportedModel	2
specializedModel	3

3.10

Whenever a new model was formulated by a controller, it was endorsed as "newModel". If the model was used to generate recommend an action or set of actions and, that action having been taken, fewer critical events were reported in the next period, then the model was endorsed has having reduced the number of events. If the incident were ended, then the model would be endorsed as having eliminated all events.

3.11

Whenever there was a choice of models to invoke during a critical incident, the best endorsed model was used. The best endorsed model was the one with the highest endorsement value. The total endorsement value E was calculated as

where b is an arbitrary number base not less than 1. Each term on the right of equation (1) is the decimal value of sum of the values of the endorsements in number base b. So if the number base is 2, then an endorsement of the third level of importance (such as specializedModel in Table 1) will be twice as important as an endorsement of the second level of importance (such as eliminatedAllEvents in Table 1). An endorsement of any level of importance will always be b times as important as an endorsement of the next lower level of importance.

3.12

Negative endorsement values are interesting for their magnitude and their negativity so the second term on the left in equation (1) is the sum of the magnitudes of the negative endorsement value in number base b. The magnitude of the negative endorsement values is subtracted from the magnitude of the positive endorsement values to obtain the total endorsement value of the model or other endorsed object.

3.13

In the North West Water models reported below, the endorsement base was either 1 or 1.2. In other models, where a coarser distinction is to be made among endorsement levels, the number base might be (say) 2 or 3. For higher values of b, the choice of model will tend always to be dominated by the single most important endorsement. The choice is up to the modeller and, in empirical models such as this, that choice will be informed by discussions with the decision makers. The mnemonic endorsement tokens and the relative importance of each endorsement can be obtained form domain experts. In this case, they were obtained from the emergency planning manager of North West Water but in a more extensive and detailed model it would be appropriate to devise the endorsement schemes in consultation with the network controllers. Although the repair gangs represented here are not specified as cognitive agents, it is by no means impossible that the members of repair gangs would have views of what procedures and relationships can be relied on that differ fromt he views of the network controllers. Certainly one line of enquiry would reasonably be to investigate conflict arising from different approaches to mental model evaluation where these different approaches are represented either by different endorsement schemes or different number bases for evaluating total endoresment values.

Basic model-specification process

3.14

In the first of the simulation set-ups reported here, b = 1 so that each endorsement added one unit of "weight" to the model. Since each model started with a weight of 1 (with an endorsement as a new model), a model which had once been believed to have reduced the number of events at a site had a weight of 2 and, if all of the causes ascribed by the model as the sources of all of the effects and all of the claimed effects of those causes were eliminated, the model had a weight of 3.

3.15

Any cognitive agent in the model will try to specialise successful models by taking the union of the set of causes specified by two models and the union of the predicted effects. Generalisation involves taking the intersection of the causes of two models and the intersection of their predicted effects. Because a specialised model has more conditions of application, it will apply to fewer cases. The specialisation procedure and endorsement together serve the same role as chunking in Soar and ACT-R in that together they allow for more complicated patterns and sets of causal relations to be used in the course of cognition. The controller would report every condition of the successful model and would be able to identify with greater accuracy the core causes of any incident. Consequently, we would expect more remedies to be applied more quickly as a result of model specialisation and the most effective remedies to be applied as a result of model generalisation.

3.16

The controllers filtered the models they retained actively in memory over time be remembering from one day to the next only those models with weights of 2 or more. The controller would select a model to apply by devising a list of candidate models and then choosing the best of the candidates. Among those models which had a single causal element, the agent would consider those which specified as a cause an event which had actually occurred and among all such models would keep in consideration those which had the largest number of actual events among their predicted states. There could be several of these. A multi-cause model would be selected whenever all of the events it treated as conditions and as actions were actually observed.

3.17

All of the candidate models were collected and given a probability of being chosen which was proportional to their respective weights. The selected model was then chosen at random according to the weighted probabilities.

Elaborated model-specification process

3.18

In the second simulation set-up reported here, the basic cognitive process was altered in several ways. From the perspective of organizational science, the most important addition was that, at the end of each shift, the controller would "tell" the controller on the next shift about any models which had been applied and resulted in the elimination of all hypothesised causes and effects. If the succeeding controller did not have an equivalent model, it would formulate one. If it already had such a model, it would note the reported success achieved with that model. The remaining changes concerned the valuation of models by the individual controllers. Specifically, the number of endorsements which the controllers could apply to their respective models was increased so that specialised models were valued for their specialisation and different endorsements were valued differently. The endorsements were put into classes so that a the "new model" endorsement was in class 0, the "reduced critical events" endorsement was in class 1, the endorsement corresponding to the elimination of all modelled events, the endorsement corresponding to a report of successful use of a model by another controller were both in class 2 and the "specialised model" endorsement was in class 3. The value of an endorsement in each class was 1.2 times the value of the endorsement in the next lower class with the least valuable endorsements (in class 0) having a value of 1.

3.19

In effect, it was assumed that each controller would value a report from the previous shift's controller as being 1.2 times as important as the successful use of a mental model by the controller himself. The sensitivity of emergent behaviour to the specification of these relationships is clearly an important issue for further research.

Results

4.1

The model results are reported in terms of overall organizational performance as well as an account of the models and procedures which emerged from agents' cognition. Two results have proved to be general relative to these simulations. One is that, the sharing of successful models by controllers is sufficient and essential to the achievement of significant reductions in the time required to run simulations of any given number of time frames. Model sharing was implemented on top of the simpler model entirely by the addition of rules and clauses for network controllers to communicate the successful use of a model to the controller on the next shift and also to copy the content of a reported model and then to endorse it as a reported model. Since no other rules were affected, the increased simulation speed implies that the cognitive burden on the agents was much less than in the absence of model-sharing. Each agent was able to use reported models rather than to generate and test sequences of models to build up the experience necessary to identify reliably useful mental models on its own. The second result is that, with model-sharing and increased value accorded to specialised models, the controllers more rapidly and systematically developed procedures for dealing with relatively small incidents but neither with nor without model-sharing were the controllers able to formulate procedures for dealing with the larger and more complex sets of critical events.

Organizational performance

4.2

Incidents are better managed if critical events are remedied within a shorter time and have fewer consequential results. In this model the second criterion is a consequence of the first since consequential results occur with a constant likelihood in each event cycle preceding by a causal event. We observed no long-term improvement in organizational performance over any simulation run under any set of conditions. As indicated by the data reported in Table 2, where there is learning, it takes place early in the simulation run and the results thereafter show some variance due to environmental noise resulting in different patterns of spontaneous occurrence of events.

Figure 3: Time pattern of the duration of critical incidents

4.3

The data in Figure 3 is taken from a run of 40 simulation days, each containing 18 event cycles. The vertical axis measures the number of event cycles during which an operating site was continuously host to at least one critical incident. The horizontal axis indicates the date at which the site became incident-free. By observation, the pattern gets thinner from the bottom upwards. That is, the number of incidents resolved in a single event cycle exceeded the number resolved in two or more event cycle; the number solved in two event cycles was greater than the number solved in three or more cycles, and so on. But the pattern from left to right - that is, over the course of the simulation run - shows no systematic changes. Indeed, the longest-lived incident was resolved after 15 event cycles at elapsed event cycle 552 (day 30, event cycle 12) out of 720 event cycles (40 days) in all. However the density of points towards the bottom of the graph indicates that, most of the time, critical episodes lasted only one or two event cycles.

4.4

Table 2 and Table 3 report the distribution over time of lengths of critical incidents with and without model-sharing, respectively. Much of the variation in the distribution over the simulation runs is due to the variation in the numbers and types of events occurring within each period. As we see from Table 3, after the first 50 or so event cycles, between 50 and 87.5 percent of incidents in every three-day period were completely resolved within two event cycles. The position, as indicated in Table 2, was much more variable and generally less successful when agents did not share successful models.

4.5

The difference is in the way that the cognitive development of the controller agents supported the emergence of operating procedures for identifying and dealing with the causes of incidents. Initially, there was no systematic difference. The procedures which emerged to resolve the events lowLevel, lowPressure, noWater, discolouredWater, contaminationOrPollutionIncident and tasteOrOdorOfWater followed from the same cognitive processes in both simulation set-ups. The excerpt from the simulation transcript reported in Figure 4 shows how these procedures emerged as a result of the formulation of mental models by the controllers.

Elapsed event cycles	0<n<=2	2<n<=4	4<n<=6	6<n<=8	8<n

Table 2: Percentage distributions of episode lengths (no model-sharing)

0-53	68.75	6.25	12.5	12.5	0
54-107	58.82	17.65	11.76	0	11.76
108-161	60.00	13.33	13.33	6.67	6.67
162-215	40.00	20.00	26.67	13.33	0
216-269	25.00	25.00	25.00	16.67	8.33
270-323	22.22	22.22	33.33	11.11	11.11
324-377	61.11	22.22	16.67	0	0
378-431	53.33	26.67	13.33	6.67	0
432-485	64.71	17.65	5.88	0	11.76
486-539	43.75	25.00	25.00	6.25	0
Avg overall	49.77	19.60	18.35	7.32	4.95
Std dev	15.54	5.89	8.18	5.72	5.15

Table 3: Percentage distributions of episode lengths (with model-sharing)

Elapsed event cycles 0<n<=2 2<n<=4 4<n<=6 6<n<=8 8<n

0-53 37.50 12.50 12.50 37.50 0

54-107 56.26 18.75 12.50 6.25 6.25

108-161 53.33 13.33 0 20.00 13.33

162-215 64.71 29.41 0 5.88 0

216-269 57.89 15.79 21.05 0 5.26

270-323 53.85 23.08 15.38 0 7.69

324-377 68.72 10.53 10.53 0 10.53

378-431 87.50 6.25 0 6.25 0

432-485 50 31.25 6.25 0 12.50

486-539 64.71 17.65 11.76 0 5.88

Avg overall 59.45 17.85 9.00 7.59 6.41

Std dev 12.57 7.64 6.86 11.59 4.77

4.6

Table 3: Percentage distributions of episode lengths (with model-sharing)

Elapsed event cycles	0<n<=2	2<n<=4	4<n<=6	6<n<=8	8<n
0-53	37.50	12.50	12.50	37.50	0
54-107	56.26	18.75	12.50	6.25	6.25
108-161	53.33	13.33	0	20.00	13.33
162-215	64.71	29.41	0	5.88	0
216-269	57.89	15.79	21.05	0	5.26
270-323	53.85	23.08	15.38	0	7.69
324-377	68.72	10.53	10.53	0	10.53
378-431	87.50	6.25	0	6.25	0
432-485	50	31.25	6.25	0	12.50
486-539	64.71	17.65	11.76	0	5.88
Avg overall	59.45	17.85	9.00	7.59	6.41
Std dev	12.57	7.64	6.86	11.59	4.77

The excerpt from the simulation transcript in Figure 4 reports the application by controller-2 of that agent's model designated controller-2: model-2. That model specified lowLevel as a cause with lowPressure and noWater as the effects. Since the action repairLeak specified by central systems resolves the lowLevel problem and there are no other causes of either lowPressure or noWater at the site, those effects of lowLevel are also resolved. This means that all three events related by the model have been eliminated and the model is, as a result, strongly endorsed. The same pattern recurred whenever there was a spontaneous occurrence of lowLevel. In addition, the controller with the immediately preceding shift to that of controller-2 repeatedly reported that its own equivalent model had correctly predicted the causal relations. These different sources of endorsement were asserted sufficiently often that controller-2: model-2 became the second-best endorsed of controller-2's models.

4.7

The same model was adopted by controller-1 on the basis of the report by controller-2. controller-1 then had the same successful experience with it which was reported to, and adopted by the controller in the next shift, controller-3. In the transcript excerpt of its first use, reported for day 8 in Figure 5, the model is identified as controller-1(reported): model-1.

Day 0, Event Cycle 5: Spontaneous occurrence of lowLevel at operationSite-3 Day 0, Event Cycle 5: Occurrence of lowPressure at operationSite-3 is an impact consequence of lowLevel Day 0, Event Cycle 5: Occurrence of noWater at operationSite-3 is an impact consequence of lowLevel Day 0, Event Cycle 5: Occurrence of discolouredWater at operationSite-3 is an impact consequence of noWater Day 0, Event Cycle 5: Occurrence of lowPressure at operationSite-3 is an impact consequence of noWater Day 0, Event Cycle 5: Occurrence of contaminationOrPollutionIncident at operationSite-3 is an impact consequence of discolouredWater Day 0, Event Cycle 5: Occurrence of tasteOrOdorOfWater at operationSite-3 is an impact consequence of discolouredWater Day 0, Event Cycle 5: Occurrence of discolouredWater at operationSite-3 is an impact consequence of contaminationOrPollutionIncident centralSystems is instructing inspection of operationSite-3 controller-2 is using controller-2: model-2 conditions: [(eventOccurring lowLevel)] consequences: [(eventOccurring lowPressure) (eventOccurring noWater)] controller-2is reporting belief ['operationSite-3' (eventOccurring lowLevel)] to centralSystems centralSystems is instructing remedial action [repairLeak] at operationSite-3 Day 0 Event Cycle 5 The events at operationSite-3 are [lowPressure discolouredWater lowLevel noWater contaminationOrPollutionIncident tasteOrOdorOfWater] The remedial actions taken at day 0 event cycle 5 (the previous event cycle) were [repairLeak] The events eliminated were [lowPressure lowLevel noWater]

Figure 4: Excerpt from simulation transcript
(spontaneous occurrence of lowLevel at day 0)

Day 8, Event Cycle 15: Spontaneous occurrence of lowLevel at operationSite-1 Day 8, Event Cycle 15: Occurrence of noWater at operationSite-1 is an impact consequence of lowLevel Day 8, Event Cycle 15: Occurrence of discolouredWater at operationSite-1 is an impact consequence of noWater Day 8, Event Cycle 15: Occurrence of lowPressure at operationSite-1 is an impact consequence of noWater centralSystems is instructing inspection of operationSite-1 controller-3 is using controller-1(reported): model-1 conditions: [(eventOccurring lowLevel)] consequences: [(eventOccurring lowPressure) (eventOccurring noWater)] controller-3 is reporting belief ['operationSite-1' (eventOccurring lowLevel)] to centralSystems centralSystems is instructing remedial action [repairLeak] at operationSite-1 Day 8 Event Cycle 15 The events at operationSite-1 are [lowPressure lowLevel noWater discolouredWater] The remedial actions taken at day 8 event cycle 15 (the previous event cycle) were [repairLeak] The events eliminated were [lowPressure lowLevel noWater discolouredWater]

Figure 5: Excerpt from simulation transcript
(spontaneous occurrence of lowLevel at day 8)

4.8

The same model, formulated initially by controller-2, was the third best-endorsed model of the other two controllers. The identification of robust and simple relationships such as those associated with the occurrence of the lowLevel event are not necessarily a benefit. Indeed, they sometimes prevented the controller agents from developing more useful models for application in more complicated and difficult episodes. One such episode is reported in the transcript excerpt of Figure 6.

4.9

In Figure 6, the episode starts with a spontaneous occurrence of the event fire that has a number of immediate and then, in the subsequent event cycle, secondary consequences. The impact of the fire is to create the events disinfectionFailure, lowLevel, pumpFailure, contaminationOrPollutionIncident and noWater. These impact consequences themselves have impact consequences which, in this case, constitute additional causes of events resulting directly from the fire. controller-1's inspections leads that agent to apply the previously endorsed model controller-1: model-10 which is itself a specialised model relating lowLevel and contaminationOrPollutionIncident to the remaining events other than fire (and incorrectly including discolouredWater). Since fire is directly or indirectly a cause of all of the events, no event is remedied until the following event cycle when the model controller-1: model-3 relating fire to other events is applied. In the meantime, however, further secondary consequences of the original fire event and its impact effects are manifest which will require further action keeping the incident alive for several more event cycles.

4.10

The result of this complexity of actual cause-effect relations relative to the cognitive abilities of the controller agents is that no strategy to reduce the time required to remedy fire-induced episodes has ever been learned in the course of some 30 simulation runs with different assumptions about model sharing and model development.

4.11

In part, this failure of effective procedures to emerge for such cases is a result of the limitations of the cognitive behaviour ascribed to the cognitive agents in these simulation models. One natural extension, for example, would be to write rules for cognitive agents to identify sequences of mental models which were successful in eliminating events that became more complicated over time as a result of delayed consequences of earlier events or additional spontaneously generated events. Whether the controllers or some other agent should be given this responsibility is a matter for further experimentation and the results will relate clearly to the crisis management literature. The issues involved will become clear when we have analysed the cognitive activities of the individual controllers in these simulations.

Figure 6: Excerpt from simulation transcript
(spontaneous occurrence of fire at day 4)

Individual abilities and performance

4.12

Our assessment of the abilities of individual agents is taken from cognitive science. An individual is better able to function effectively in a domain of activity the better that individual "understands" the relationships in that domain. A "better understanding" connotes the recognition of which relationships are applicable or inapplicable and the ability to act more quickly because more complicated relationships are seen and used to determine successful activity in any given situation. At the same time, relationships are not more appropriate in any sense simply because they are more complicated. Indeed, one aspect of a "better understanding" is the choice of the right relationships of the degree of complexity necessary for correct action. In the simulation model reported here, correct action is action which leads to the elimination of critical incidents by remedying the events which cause and sustain those incidents. The agent's understanding is represented by that agent's models relating some events as causes to other events as effects. As is common in cognitive science, we represent the development of more sophisticated understanding as a process of chunking which is the combination of simple or elementary bits of knowledge into relationships and then combining those relationships into more complicated relationships. The standard reference is to Miller (1956) who argued that individual humans can hold a limited number (five to nine) chunks in short-term memory at one time but that with increased understanding each chunk contains more information. We therefore equate increased ability of the individual to understand and act within a domain of expertise with

increased ability specifically to know when to apply relationships,
the knowledge of a growing number of relevant relationships,
the knowledge of increasingly complicated relationships.

Figure 7: Problem space architecture of type Controller

4.13

The first of these criteria is represented in the simulation model by the process of model selection. This process fits into the goal structure of the agent. Agents of type Controller have the goal structure depicted in Figure 7. The adoption of the goal and each subgoal is explicitly specified by rules. So the problem space noAlarms is entered when there is an instruction from central systems. The problem space communicate is a direct consequence of the noAlarms problem space and getInstruction follows directly from communicate. Being in the problem space communicate and actually having an instruction cause the problem space executeInstruction to pertain. In order to execute the instruction, it is necessary to satisfy the goal decideAction which itself requires the goal selectModel to be achieved. Since the model selected determines the action to be decided, all of the procedural, cognitive work goes on in the selectModel problem space.

4.14

So far, all of this is consistent with the Soar and ACT-R theories of cognition. The difference comes with the representation of chunking which takes the form of model specialisation and endorsement. Specialised models are composed of more elaborate conditions in which they are to be applied and also more elaborate specifications of the implications of those conditions. If a more specialised, hence more complex, model is highly endorsed as reliable and important, then that model will be selected quickly in the appropriate circumstances. The increased complexity and speed of application is the essential effect of chunking.

4.15

In the selectModel problem space, single-cause models are candidate models if the cause is realized in the current state and the effects predicted by the model include the largest number of other currently realized events. All models specifying multiple causes are selection candidates provided that all of the causes specified by the model are currently realized and at least one of the effects of that collection of causal events is predicted by the model. The model to be used in deciding on the action to be taken is drawn from the set of candidate models with the probability proportional to its endorsement value relative to the endorsement values of the other candidate models.

4.16

The difference in results between simulations in which agents shared their understandings with one another and those in which they simply learned in parallel was not in what they learned but, rather, the efficiency of their individual cognitive behaviour.

4.17

This difference is seen in part by comparing Figure 8 with Figure 9. Figure 8 is taken from data generated in a simulation without model sharing by controllers while Figure 9 is taken from a simulation run with model sharing after the same number of simulation "days". The models are ranked by endorsement values and the values themselves are given by the heights of the bars in each graph. The numbers are not directly comparable here because a part of the model sharing involves the sharing of endorsements, thereby to inflate the endorsement values of the shared models relative to the models which are not shared. None the less, we see the same characteristic skewness in the distribution of endorsement values of models in both cases. Where we do see a difference is in the tail of the distribution. With model sharing, the number of models used by each controller is about half the number used by agents without model sharing. In Figure 9, the smallest number of models held by an agent was 18 as compared with 33 in Figure 8. The largest numbers of models held by a single controller were 23 and 54, respectively. One consequence of this increased focus on a smaller number of successful models is in the speed of simulations: notwithstanding the additional communication among agents and more highly articulated endorsement scheme, simulations without model-sharing run at the rate of about three simulation days per hour of computer time and simulations with model-sharing run at the rate of nearly six simulation days per hour.

4.18

Although each controller informs only the controller in the succeeding shift of models which have performed exactly as specified, they do end up with much the same rank order of shared models. We also find that the individual cognitive activities involved in specialising models is less important than the social activity of sharing models. For all three controllers, the most successful models, and therefore the models which determine the procedures of the organization, are shared models. A few specialised models are also shared but by and large the specialised models are the least well-endorsed. This is in part, of course, that simply because they are specialised and therefore not often used.

4.19

We conclude that, in the simulation experiments reported here, efficient organizational performance is enhanced by some sharing of cognitive representations of the problem space and that, for relatively simple problems, this is more important than increasing the cognitive capacities of the agents. At the same time, neither specialisation resulting from the combination of predictively successful agent models nor sharing are individually or together sufficient for organization-level learning to cope more efficiently with the more complex problems.

Figure 8: Model endorsement values by instances of Controller without communication between controllers

Figure 9: Model endorsement values by instances of Controller with communication between controllers

Conclusion and implications for further development

5.1: The results reported above demonstrate a technique for representing cognition that both captures essential features of computational cognitive science as represented by Soar and ACT-R and can be implemented to correspond to information obtained from domain experts. This technique has been effective in the analysis of organizational procedures for sharing experience locally.
5.2: This representation of cognition is effectively an explicit encoding of some of the literature on business strategy and organizations. The dominant managerial logic defined by Prahalad and Bettis (1986) "is a mind set or a world view or conceptualisation of the business and administrative tools to accomplish goals and make decisions.... It is stored as a shared cognitive map (or set of schemas) among the dominant coalition. It is expressed as a learned, problem-solving behaviour." The North West Water model clearly entails the schema and the learned, problem-solving behaviour which define the dominant logic in the sense of Prahalad and Bettis. In these models, moreover, the representations are well grounded in cognitive science (Soar and ACT-R). The process of model development by the agents representing the network controllers of North West Water also fits neatly within Huber's (1991) taxonomy of organizational learning. It amounts to knowledge acquisition through experiential learning by means of organizational experiments and self-appraisal. It is therefore reasonable to claim that the agent representations in the North West Water model complement and enhance the meaning of concepts expressed verbally in the softer end of the management sciences.
5.3: The results obtained with the North West Water model indicate a clear need for an investigation of appropriate organizational structures and procedures to deal with full-blown crises. When events interact and get beyond the control of agents working in the normal organizational environment, it is customary to convene a crisis management team. When asked what is expected to make such a team more effective in a crisis than the normal organization and procedures in those circumstances, managers in organizations typically in our experience claim that crisis management teams can wield more authority than operations managers or that they will have more information. These reasons might be true or false, partial or complete. Investigation of different compositions of organizational locations and procedures by means of simulation experiments will help to identify questions in more detail and to test putative answers before crises are actually encountered.
5.4: A virtue of the modelling techniques and agent representations reported here is the need to state explicitly the problem space architecture and the endorsement scheme representing planning and judgement in any decision-making environment. Empirically based models of crisis management will therefore require discussion with the managers involved to determine how they structure their approaches to dealing with crisis, represented by the problem space architecture, and the criteria they use and the relative importance of the difference criteria in deciding on the important features of a crisis and the various tasks to be undertaken in attempting to resolve the crisis. The process of building simulation models based on this information will support explicit analysis of crisis management procedures. The simulations with the resulting models will support the evaluation of the compatibility of existing systems with the crisis management procedures. These benefits follow directly from the representation of agent cognition in a way that is open to the use of domain expertise and the use of a modelling environment that supports such representations together with efficient representations of communication among agents within articulated social structures.

Acknowledgements

: I am very grateful to the editor and the anonymous referees for their helpful comments. These results were first presented at Kathleen Carley's CMOT Workshop in San Diego in 1997. Comments from the participants there were important in the development of my thinking on the matters of concern here. This paper could not have been written without the unstinting support of Geoff Miller of North West Water. None of the above are responsible for the remaining errors and omissions. The influence and advice of my colleagues, Bruce Edmonds, Steve Wallis and Helen Gaylard (now at UMIST) has been important in this and all my recent work.

Notes

¹ The SDML module containing this model is available by ftp (binary mode) from ftp://www.cpm.mmu.ac.uk/pub/scott/critical/nww.sdm . Instructions are in the readme.txt file in the same directory. To run the model, it is necessary to install SDML. For instructions on obtaining and running SDML, see http://www.cpm.mmu.ac.uk/sdml.

² It should be noted that clause definitions can be made private in which case they can be asserted and retrieved only by the agent on whose databases they are stored. Any such agent must be of the type for which the clause is defined.

References

ANDERSON, J.R. (1993), Rules of the Mind (Hillsdale NJ: Lawrence Erlbaum Associates).

CARLEY, K. M. and D. Svoboda (1996), "Modeling Organizational Adaptation as a Simulated Annealing Process," Sociological Methods and Research 25(1), pp. 138-168.

COHEN, P.R. (1985), Heuristic Reasoning: An Artificial Intelligence Approach (Boston: Pitman Advanced Publishing Program). COOPER, R., J. Fox, J. Farringdom and T. Shallice (1996), "A systematic methodology for cognitive modelling", Artificial Intelligence, v. 85, pp. 3-44.

HUBER, G.P. 1991, "Organizational learning: The contributing processes and the literatures", Organizational Science, v. 2, pp. 88-115.

JIN, Y. and R. Levitt (1996), "The Virtual Design Team: A computational Model of Project Organizations", Computational and Mathematical Organization Theory, v. 2, pp. 171-195.

LAIRD, J.E., A. Newell and P.S. Rosenbloom (1987), "Soar: An architecture for general intelligence", Artificial Intelligence, v. 33, pp. 1-64.

MILLER, G.A. (1956), "The magic number seven, plus or minus two: Some limits on our capacity for processing information", Psychological Review, v. 63, pp. 81-97.

MOSS, S. (1995), "Control metaphors in the modelling of decision-making behaviour", Computational Economics, v. 8, pp. 283-301.

MOSS, S. and Edmonds, B. (1997), "A knoweldge-based model of context-dependent attribute preferences for fast-moving consumer goods", Omega, v. 25, pp. 155-169.

MOSS, S. and Edmonds, B. (1998), "Modelling economic learning as modelling", Cybernetics and Systems, v. 29, pp. 215-247.

NEWELL, A.(1990), Unified Theories of Cognition, (Cambridge MA: Harvard University Press).

PRAHALAD, C.K. and R.A. Bettis (1986), "The dominant logic: A new linkage between diversity and performance", Strategic Management Journal, v.7, pp.485-501.

TAMBE, M. (1997) "Towards Flexible Teamwork", Journal of Artificial Intelligence Research, v. 7, pp. 83-124.

TAMBE, M. and P.S. Rosenbloom (1996) "Architectures for Agents that Track Other Agents in Multi-agent Worlds", Intelligent Agents, II, Springer Verlag Lecture Notes in Artificial Intelligence (LNAI 1037).

YE, M. and K.E. Carley (1995), "Radar Soar: towards and artificial organization composed of intelligent agents", Journal of Mathematical Sociology, v. 20, pp. 219-246.

Button Return to Contents of this issue