Theory Development Via Replicated Simulations and the Added Value of Standards

Using the agent-based model of Miller et al. (2012), which depicts how different types of individuals' memory affect the formation and performance of organizational routines, we show how a replicated simulation model can be used to develop theory. We also assess how standards, such as the ODD (Overview, Design concepts, and Details) protocol and DOE (design of experiments) principles, support the replication, evaluation, and further analysis of this model. Using the verified model, we conduct several simulation experiments as examples of different types of theory development. First, we show how previous theoretical insights can be generalized by investigating additional scenarios, such as mergers. Second, we show the potential of replicated simulation models for theory refinement, such as analyzing in-depth the relationship between memory functions and routine performance or routine adaptation. Third, we connect the results back to related theoretical concepts, such as dynamic capabilities.


The replication aims to reproduce the output pattern of the original model (Grimm et al.) as a criterion of success (Wilensky & Rand 2007). We further evaluate our replication according to the three-tier classification of Axelrod (1997):

Numerical identity. The re-implemented model generates identical results to the original model. Such "numerical identity" is only possible with a model having no stochastic elements or using the same random number generator and seeds.

Distributional equivalence. The results of the re-implemented model do not statistically deviate from the original; they are "distributionally equivalent," which is sufficient for most purposes.

Relational equivalence. The results of the re-implemented model show "relational equivalence" to the results produced by the original model. This weakest level refers to models with approximately similar internal relationships among their results. For example, output functions may have comparable gradients but deviate statistically (e.g., differing coefficients of determination).
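Axelrod's tiers can be illustrated with a small Python sketch. This is our toy rendering, not the statistical procedure used in the replication itself: we classify two samples of an output measure (e.g., cycle times) as distributionally equivalent when a Welch t statistic on their means stays below an assumed critical value.

```python
import math
import statistics

def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    return (ma - mb) / math.sqrt(va / len(a) + vb / len(b))

def equivalence_level(original, replication, crit=1.96):
    """Rough mapping onto Axelrod's tiers (illustrative sketch only)."""
    if list(original) == list(replication):
        return "numerical identity"
    if abs(welch_t(original, replication)) < crit:
        return "distributional equivalence"
    return "relational equivalence at best"

# Deterministic toy "cycle time" samples (hypothetical data):
original = [30 + 3 * math.sin(i) for i in range(1000)]
shuffled = original[::-1]            # same distribution, different order
shifted = [x + 5 for x in original]  # systematically higher cycle times
```

With these toy samples, a reordered copy counts as distributionally equivalent, while a systematically shifted copy can at best be relationally equivalent.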
Additional DOE analysis (see Appendix C) allows examination "under the hood" of a simulation result. Opening the typical "black box" of simulation results allows systematic verification and validation, further increasing the credibility of the replication. Based on the replicated model, we perform additional experiments to complement and extend the results of Miller et al. (2012), thereby developing a deeper understanding of routines by analyzing agents' knowledge base and developing a broader understanding by modeling merging organizations and organizations operating in volatile environments.

Model Description
A condensed model description follows (for a full description, see the ODD protocol in Appendix A).
The model aims to show how cognitive properties of individuals and their distinct forms of memory affect the formation and performance of organizational routines in environments characterized both by stability and by crisis (see also Miller et al. 2012).
The table below gives an overview of the model parameters. Agents represent human individuals; together, they form an organization. By default, the organization comprises n agents. The organization must handle problems that it faces from its environment. A problem consists of a sequence of k different tasks (Miller et al. 2012).

Variable | Description | Value (default)
n | Number of agents in the organization |
k | Number of different tasks in a problem |
a | Task awareness of an agent |
p_t | Probability that an agent updates its transactive memory |
p_d | Probability that an agent updates its declarative memory |
w_d | Declarative memory capacity of an agent |

Table: Overview of model parameters as applied by Miller et al. (2012).
Agents have different skills, though skills themselves are not varied. Each agent has the skill to perform a particular task (Miller et al. 2012). The number of agents equals at least the number of different tasks in a problem, ensuring that the organization is always capable of solving a problem. The number of agents can exceed the number of tasks (n > k), according to the parameter ranges (Miller et al. 2012). The k different skills are assumed to be distributed uniformly among the agents.
Each agent is aware of a number a of randomly assigned tasks and is at least aware of the task it is skilled for (Miller et al. 2012). Agents can recognize tasks of which they are aware and are blind to unfamiliar tasks (Miller et al. 2012). Each agent is aware of a limited number of tasks in any problem (1 ≤ a ≤ k).
Once an agent has performed a task and handed the problem over to another agent, who then accomplishes the next task, the first agent has a chance to memorize this subsequent task in its declarative memory, which holds at most w_d entries. An agent memorizes a task with the probability given by the variable p_d. Additionally, agents can memorize the skills of other agents in their transactive memory. The number of agents and skills that each agent can memorize is limited only by the number of agents in the organization. An agent adds an entry to its transactive memory with probability p_t (Miller et al. 2012).
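The two memory updates can be sketched as follows. This is a simplified Python rendering under our own data layout, not the original implementation; the parameter values passed in are placeholders.

```python
def update_memories(agent, performed_task, next_task, performer,
                    p_d, p_t, w_d, rng):
    """After handing a problem over, the agent may store (i) which task
    followed the one it performed, in declarative memory (probability p_d,
    at most w_d entries), and (ii) who performed the next task, in
    transactive memory (probability p_t)."""
    if rng.random() < p_d and len(agent["declarative"]) < w_d:
        agent["declarative"][performed_task] = next_task
    if rng.random() < p_t:
        agent["transactive"][next_task] = performer
```

The capacity check applies only to declarative memory; transactive memory is bounded only by the number of agents, as described above.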
Agents are distributed across the organization. Scale and distance are not modeled explicitly, but time is crucial. First, operationally, each organizational problem-solving process is time-consuming. Second, strategically, an organization that consecutively solves problems might form routines over time.
Organizations have to perform the tasks in a given order to solve a problem. Once each task has been performed, the problem is solved (Miller et al. 2012). The organization copes with several problems over time, whether recurring or changing in terms of the task sequence.
Agents self-organize the problem-solving process (see the flow chart figure) for the given task sequences of the generated problems, except for the first task of each problem, which is always assigned to an agent that is aware of the task and has the required skill. An agent in charge of performing a task is also responsible for passing the next task in the sequence to another agent. Thus, the agent in charge might remember, or must search for, another agent that seems capable of handling the next task (Miller et al. 2012). As long as the performed task is not the last in the problem sequence, each agent is responsible for advancing the solution by assigning the next task to an agent. Once a problem is solved, a new problem is generated, initiating a new problem-solving process (Miller et al. 2012).
Figure: Flow chart of an agent's behavior according to Miller et al. (2012) and the improved conceptual design. We provide reasons for the highlighted changes in the section on critical reflections below.
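The hand-over logic in the flow chart can be condensed into a Python sketch. This is our hypothetical re-rendering, not the original NetLogo code; the referral behavior described later in the text is omitted for brevity.

```python
import random

def hand_over(next_task, agents, skills, searcher_memory, rng):
    """The agent in charge passes `next_task` on: it first consults its
    transactive memory; only if that fails does it search randomly until
    a skilled agent is found. Returns (performer, failed_attempts)."""
    remembered = searcher_memory.get(next_task)
    if remembered is not None:
        return remembered, 0
    failed = 0
    while True:
        candidate = rng.choice(agents)
        if skills[candidate] == next_task:
            return candidate, failed
        failed += 1  # unsuccessful attempts add to search costs
```

A remembered colleague is approached directly with no search cost; otherwise every failed random attempt lengthens the problem-solving process.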

Organizational performance is measured by cycle time, calculated for each problem-solving process. Until a problem is solved, cycle time increases incrementally as agents perform necessary (n_t) or unnecessary (u_t) tasks and incur search costs (s_t) from unsuccessful random search attempts. An organization achieves the minimum cycle time if it performs only necessary tasks and no search costs occur (Miller et al. 2012). The minimum cycle time equals the number of tasks in a problem.
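The measure is simple arithmetic and can be stated explicitly. A minimal sketch, with the names n_t, u_t, and s_t taken from the text:

```python
def cycle_time(n_t, u_t, s_t):
    """Cycle time = necessary tasks + unnecessary tasks + search costs."""
    return n_t + u_t + s_t

def minimum_cycle_time(k):
    """Best case: only the k necessary tasks, no detours, no failed searches."""
    return cycle_time(n_t=k, u_t=0, s_t=0)
```

The best case reproduces the statement above: with no unnecessary tasks and no search costs, cycle time equals the number of tasks k in a problem.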

Clarification of the Conceptual Model and Critical Reflections on the Design
The ODD protocol enables standardized descriptions of agent-based models, intended to increase the efficiency of communicating conceptual models and to prevent ambiguous model descriptions (Grimm et al. 2006, 2010). In particular, the ODD protocol fosters the clear, comprehensive, and non-overlapping model specifications required to replicate a model.

The ODD protocol can be used to transfer the unstructured, possibly scattered descriptions of a model into a standardized, accessible format for efficient subsequent consultation. A replicating modeler should avoid re-implementing a model from the original code to prevent bias (Wilensky & Rand 2007). Using the explicit intermediate result of the ODD protocol avoids this problem.
Experimental clarification of ambiguous model assumptions. We discovered an unclear assumption in the model description of Miller et al. (2012) when transferring their information into the structure of the ODD protocol. We clarified this ambiguity experimentally, without consulting the original code, to identify the underlying assumptions used in the original paper. The abstract model description also allows for model improvements without violating its original assumptions.
Specifically, Miller et al. (2012) state that the first task of a new problem is assigned at random to an agent that is skilled for this task. Hence, one can conclude that this statement is valid for each problem, although the modeled organization faces recurring problems by default. Another passage, on changing problems, makes this statement ambiguous, however: to simulate a one-time exogenous change in the organization's operating environment, a permanent change in the problem to be solved was introduced; at a given point, the k tasks were randomly reordered, and the organization faced this new problem repeatedly for the remaining duration of a simulation run (Miller et al. 2012).
This passage suggests that new problems are characterized by reordered task sequences. Hence, one can also conclude that recurring problems are not new problems. This opens two different model assumptions:
A. The first task of each problem is assigned to an agent who is skilled in that task.
B. Only the first task of a changed problem with a reordered task sequence is assigned to an agent who is skilled in that task.
While a model description in the ODD format cannot protect against all ambiguities, it does make models' conceptual foundations more explicit. The overall value of the standardized ODD model description has been comprehensively discussed elsewhere (Grimm et al. 2006, 2010); here, we particularly emphasize its value for replication. Our precisely formulated submodel descriptions form a solid basis for writing corresponding functions in the NetLogo code. The model description in the ODD format explicitly expresses the formerly ambiguous assumption (see Appendix A, ODD Protocol, Submodels, problem generation, and task assignment). The final ODD description comprehensively specifies the model in an acknowledged format, which both helps other scholars understand the model of Miller et al. (2012) more precisely and provides solid ground for further extensions.
Critical reflections on the conceptual design
Transferring information from the conceptual model into the ODD structure enhanced our understanding of the model, and subsequent pretests revealed two opportunities for improvement.
The flow chart figure highlights the first improvement, a modification that does not break any model assumptions. In the modified flow chart, an agent that fails a random search attempt simply searches randomly again until another agent accepts the problem. In the original model, a failed random search attempt instead triggers repeated scrutiny of the task and consultation of memory; since neither step changes the agent's cognitive state, the outcome is again a random search.
Second, we argue that the random search can be made more sophisticated. The original random search is designed as an urn model with replacement: the active agent randomly approaches other agents that might be able to perform the requested task or that can help by referring the searcher to another skilled agent. After an unsuccessful attempt, the agent again searches randomly among all agents. Hence, the searching agent might approach the same agent again, implying that it does not remember which agents were approached unsuccessfully before. This assumption is counterintuitive and empirically unlikely: agents in general can remember other agents and their skills, yet they supposedly forget having just approached an agent during a search attempt. An alternative model design could be tested in which agents also learn from unsuccessful random search attempts. Such an urn model without replacement would reduce the search costs and cycle time of a problem-solving process.
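The effect of the two urn models on search costs can be checked with a quick Monte Carlo sketch. This simplifies the model to a single skilled target agent and omits referrals; the agent count and run count are our own choices for illustration.

```python
import random

def search_attempts(n_agents, with_replacement, rng):
    """Random search for the single skilled agent among n_agents.
    With replacement: previously approached agents may be approached again.
    Without replacement: the searcher remembers failed attempts."""
    target = rng.randrange(n_agents)
    candidates = list(range(n_agents))
    attempts = 0
    while True:
        pick = rng.choice(candidates)
        attempts += 1
        if pick == target:
            return attempts
        if not with_replacement:
            candidates.remove(pick)

rng = random.Random(1)
runs = 20_000
with_r = sum(search_attempts(10, True, rng) for _ in range(runs)) / runs
without_r = sum(search_attempts(10, False, rng) for _ in range(runs)) / runs
# Expected attempts for n = 10: about n with replacement,
# about (n + 1) / 2 without replacement.
```

With ten agents, the searcher needs on average about ten attempts with replacement but only about five and a half without, which is why the alternative design would lower search costs.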
Overall, using the ODD protocol helped to define the conceptual model and revealed where the original model description allowed two contradictory assumptions. Furthermore, the ODD structure helped to identify opportunities for model improvements without violating the initial assumptions and highlighted alternative model designs that extend the original model.

Using DOE Principles to Evaluate the Replicated Model
Since the simulation model has stochastic elements, the reported results risk being unrepresentative, which could threaten the reliability of conclusions drawn from the simulation experiments. DOE principles therefore demand specifying the required number of runs based on the coefficient of variation for the performed experiments, which accounts for stochastically induced variation and thereby enhances the credibility of results.
Our design incorporates low (L), medium (M), and high (H) factor levels. These three design points reflect the settings applied to estimate the number of simulation runs needed to produce sufficiently robust results given the model's properties and stochasticity. The error variance matrix reports mean values and coefficients of variation for design point M (for the full error variance matrix, see Appendix C). We measured cycle time at five selected steps during the simulation runs, namely when five selected problems (P) are solved, to account for the dynamic character of the dependent variable. The coefficient of variation (c_v) is calculated as the standard deviation (σ) divided by the arithmetic mean (µ) of a specific number of runs (Lorscheid et al.). The reported cycle times result from different numbers of simulation runs. The coefficients of variation stabilize as the number of runs increases; beyond the stabilization point, the mean values and coefficients of variation change only slightly. We therefore conclude that this number of runs is sufficient to produce robust results.
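The stabilization criterion can be sketched in a few lines of Python. The definition of c_v follows the text; `simulate_once` is a hypothetical stand-in for one model run, not part of the replicated model.

```python
import random
import statistics

def coefficient_of_variation(values):
    """c_v = sigma / mu, the criterion used to decide when enough runs
    have been made."""
    return statistics.pstdev(values) / statistics.mean(values)

def cv_by_run_count(simulate_once, run_counts, rng):
    """Re-estimate c_v of an output measure for growing numbers of runs;
    c_v stabilizing indicates that additional runs change little."""
    return {n: coefficient_of_variation([simulate_once(rng) for _ in range(n)])
            for n in run_counts}
```

In practice one would tabulate c_v for each design point and output step, exactly as in the error variance matrix, and stop increasing the run count once the column of c_v values flattens out.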
Given the significant error variance detected for small numbers of simulation runs, results averaged over few runs should be interpreted carefully. For the cycle time of the last measured problem, the coefficient of variation differs considerably between a small and a large number of runs. Visual comparison of experimental results based on a few averaged runs is thus imprecise and error-prone compared to a comparison based on the full number of simulation runs.
A high number of simulation runs also confirms the expected values for cycle time as determined analytically (see Appendix D), which offers further evidence that the conceptual model is implemented correctly. The analytically calculated cycle time for the first problem-solving instance (P1) of the medium-sized organization closely matches the simulated average cycle time over the full set of runs. Such approximate "numerical identity" is also found for a small and a large organization, whose expected and simulated cycle times likewise nearly coincide (see Appendix C).

Number of runs
To illustrate the value of defining the number of runs based on the coefficient of variation, we offer the following example. In their final experiment, Miller et al. (2012) model an external change to, and simultaneous downsizing of, an organization; downsizing is thus modeled as a response to external change. The organization faces a changed problem once a given number of recurrent problems have been solved. At the same time, the organization is downsized, in one scenario moderately and in another heavily, to only ten remaining agents.
The figure shows the considerable increase in cycle time after the simultaneous problem change and downsizing. In terms of cycle time, the organization that continues to operate at full size shows the lowest peak, whereas the downsized organizations peak markedly higher, with the organization reduced to ten members peaking highest. Hence, downsizing initially interferes with organizational performance (see also Miller et al. 2012). The organization lost experienced members and their knowledge, which is crucial for coordinating activities.
Although results averaged over few simulation runs suggest that downsized organizations potentially learn more quickly in the new situation, no reliable statement can be made about which organization performs better after the change. An increased number of runs enables more detailed interpretation (see the figure). The heavily downsized organization with only ten remaining members shows the highest performance after the change: at first it performs worst, but it learns much faster to handle the new situation. Still, none of the organizations regain optimal performance. This suggests that smaller organizations are more agile in creating a new knowledge network among agents.
In line with this example, we have replicated each experiment of Miller et al. (2012) both with the original and with the increased number of runs (see Appendix B). The results, while qualitatively identical, nevertheless differ slightly in quantitative terms, which is likely driven by stochasticity. Based on the qualitative equivalence of the results, especially regarding the behavioral patterns after problem changes and downsizing, we conclude that the original model and our replication rest on identical assumptions.
The simulation results show high variance derived from model stochasticity (for a detailed analysis, see Appendix C). We used the coefficient of variation to improve our understanding of the model's behavior and to assess the precision of both our results and those published by Miller et al. (2012). Calculating effect sizes and interaction effects (see Appendix C) further deepened our understanding of the model's behavior, offering still further evidence that both models behave identically.

Overall, applying DOE principles enabled us to analyze the model's behavior systematically. For evaluating the replicated model, we found it crucial to determine the number of runs and to understand stochastically induced variance. The replicated model produces quantitatively similar and qualitatively identical results. According to the classification of Axelrod (1997), the results are "relationally equivalent" and, once error variance is taken into account, point overall to "distributional equivalence." Hence, we conclude that the model is replicated successfully.

Theory comprises constructs, propositions that link those constructs together, logical arguments that explain the underlying theoretical rationale for the propositions, and assumptions that define the scope or boundary conditions of the theory. Consistent with these views, we define theory as consisting of constructs linked together by propositions that have an underlying, coherent logic and related assumptions.

Miller et al. (2012) address the theory of routines by Feldman & Pentland (2003), which posits a reciprocal relationship between the performative and ostensive aspects of routines. The interaction between these two aspects, however, is only vaguely understood, with only partial empirical grounding (Biesenthal et al.). Formal modeling provides the means to investigate underlying mechanisms by operationalizing theoretical constructs. In this respect, Miller et al. (2012) operationalize the dynamic interdependence of actions and memory distributed across an organization. In their computational representation, routines' ostensive aspect is constructed via three types of memory residing in individuals distributed across the organization. As individuals draw on their memory to solve incoming problem sequences, the performative aspect of routines is made observable.
Davis et al. (2007) suggested a roadmap for developing theory using simulations, including the vital step of experimentation, given the traditional strengths of simulation: testing in a safe environment, low costs to explore experimental settings, and high experimental precision. New theoretical insights may thereby be generated by unpacking or varying the value of constructs, modifying assumptions, or adding new features to the computational representation.
We proceed from our successful replication to this crucial step of experimentation, developing theory in three ways: extension, in-depth analysis, and theoretical connection. First, we extend the model by exploring a merger in addition to the downsizing analyzed in the original study. By adding another scenario of external change, we extend the scope or boundary conditions and therefore further generalize the theory. Second, we analyze the model more deeply to show how an initial problem leads to a traceable path dependency in routine formation, gaining nuance on how memory functions affect the formation of routines. We thus unpack the theoretical constructs analytically rather than representationally. Third and finally, we elucidate connections to dynamic capabilities, taking our new insights back to the literature to look for intertwined processes not previously considered. In brief, we uncover the path dependency of routines (Vergne & Durand), look for related theory, identify the concept of dynamic capabilities, and extend the experiment to investigate this concept in more detail.
The model simulates organizational routines, which Feldman & Pentland (2003) define as "repetitive, recognizable patterns of interdependent action, involving multiple actors." Feldman & Pentland (2003) conceptualized routines as comprising recursively connected performative and ostensive aspects, which helps explain the mechanisms of stability and change. The ostensive aspect embodies the abstract, stable idea of a routine, while the performative aspect embodies the patterns of action individuals perform at specific times and places (Feldman & Pentland 2003).
Hodgson suggested defining routines as capabilities because of their inherent potential. The capabilities they generate underpin organizations' ambidextrous ability to balance the exploitation of existing competencies with the exploration of new opportunities (Carayannis et al.). On the one hand, organizational performance is contingent on exploration, so that the organization can remain competitive in the face of changing demands. On the other hand, it is contingent on the capability to exploit resources and knowledge. The latter type of performance can be measured in terms of efficiency, that is, a reduction in cycle time achieved by drawing on past experience (Lubatkin et al.). Ambidexterity is usually related to fundamental measures of success such as firm survival, resistance to crises, and corporate reputation (Raisch et al.).
Organizations' ability to operate in a specific environmental setting is determined by the suitability of their routine portfolios (Aggarwal et al.; Nelson & Winter 1982). Routines facilitate efficiency, stability, robustness, and resilience (Feldman & Rafaeli); innovation (Carayannis et al.); and variation, flexibility, and adaptability (Farjoun). An underlying assumption is that organizations achieve optimal performance by finding appropriate responses to changes in the environment. Hence, organizations aim to align external problems with internal problem-solving procedures so that they may respond adequately to their environment and maintain equilibrium between internal (organizational) and external (environmental) aspects (Roh et al.).

Generalizing theory: Routine disruptions when organizations merge
Besides downsizing, which Miller et al. (2012) studied, as mentioned above, mergers are another frequent activity by which organizations respond to external changes (Andrade et al.; Bena & Li). Because mergers require the integration of new personnel, human resource issues are critical, but the literature on mergers and acquisitions often neglects this aspect (Sarala et al.). Therefore, to complement the experimental results of Miller et al. (2012) concerning downsizing, we investigate a merger scenario to generalize the understanding of routine disruptions.
Organizations comprise personnel with different experiences, which, as indicated by previous results, are crucial to forming routines. Thus, we expect that integrating new staff, whether experienced or inexperienced, affects post-merger routine performance. We model untrained employees as agents with empty declarative memory (a) and experienced employees as agents with randomly replenished declarative memories (b), thereby assuming that experienced agents have some operational knowledge. The figure below depicts organizational performance under different post-merger processes of routine formation. The following analysis models the merger as an organization's response to an external shock, reflected by a change in problem.

Figure: Routine disruptions in a merger and acquisition scenario. Note: Each case is averaged over the full set of simulation runs. The initial organization comprises ten agents. The remaining parameters are set to their defaults. The organization acquires new personnel at the fiftieth problem-solving instance; in Case 2, the problem changes at the same time. The solid lines represent merger type a, in which the new agents have empty declarative memories.
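The two merger variants differ only in how the declarative memories of the incoming agents are initialized. A minimal sketch, under our own data layout and with illustrative parameter values:

```python
import random

def new_agent(experienced, tasks, w_d, rng):
    """Merger variant (a): empty declarative memory (untrained employee);
    variant (b): declarative memory randomly replenished up to capacity
    w_d (experienced employee with some operational knowledge)."""
    declarative = {}
    if experienced:
        for t in rng.sample(tasks, min(w_d, len(tasks))):
            declarative[t] = rng.choice(tasks)  # random task association
    return {"declarative": declarative, "transactive": {}}
```

In both variants the transactive memory starts empty: new staff do not yet know who in the merged organization is skilled at what, which is what initially disrupts coordination.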
Case 1 represents an organization that integrates new personnel under stable environmental conditions. This integration initially disrupts the original routines whether the new personnel are inexperienced (a) or experienced (b), which negatively affects organizational performance in a pattern similar to downsizing, albeit less intensively (see Appendix B). The integration of inexperienced personnel (Case 1a) allows the organization to form new routines with optimal performance, suggesting that the new staff adopt the lived routines. In contrast, the integration of experienced personnel (Case 1b) results in lower organizational performance, even in the long run; the new staff do not completely unlearn obsolete sequences of task accomplishment.
Case 2 represents an organization that integrates new personnel in response to an external shock, reflected by a problem change. The change and simultaneous integration of new personnel force the organization to learn new routines. The learning curves of merged organizations are quite similar to those of downsized organizations (see Appendix B). Organizations with new, inexperienced personnel (Case 2a) perform worse, suggesting that the new staff are not well integrated; organizational behavior is predominantly determined by the core personnel. On the other hand, organizations integrating experienced personnel (Case 2b) can form routines that result in optimal performance.
We can now generalize that merged and downsized organizations show similar patterns in organizational performance (see Appendix B); both involve disrupted routines. Comparison between Cases 1 and 2 shows the conditions under which merging organizations can develop efficient routines. The finding that mergers can initially weaken adherence to routines agrees with empirical results (see, e.g., Anand et al.). Moreover, the literature on successful mergers highlights the importance of forming new, higher-order routines that can resist blocking effects from existing routines; successful mergers can then, afterward, realize radical innovations (Heimeriks et al.; Lin et al.). In other words, the success of a merger depends on individuals' experience, as this affects whether lived routines can be maintained and whether new, efficient routines can be formed.
In conclusion, organizations that downsize or merge in response to an external shock stimulate the formation of new routines. We found that both downsizing and merging initially reinforce the disruption of established routines. Loss of organizational knowledge initially reduces performance in downsized organizations, but such organizations quickly form new, efficient routines. In a complementary finding, Brauer & Laamanen found that the pressure of downsizing on the remaining individuals forces them to engage in path-breaking cognitive efforts that can lead to better results than repairing routines by drawing on experience. In a further generalization of the ideas Miller et al. (2012) presented, we conclude that organizational routines are similarly affected when organizations downsize or merge in response to an external shock.

Deeper analysis: Routine persistence in organizations facing volatility
If routines are a recurrent pattern of actions, the question remains which patterns can emerge. An appropriate organizational routine matches the task sequence of the problem at hand. Some less efficient organizations, however, struggle to coordinate their activities with the problem. In particular, inappropriate behavior by agents might create unnecessary activity.
To explore the link between the behavior of individuals and emerging routines, we performed an experiment in which an organization again begins by facing recurrent problems. The organization thereby has the chance to form a routine. Thereafter, it faces different problems, each characterized by a new, randomly shuffled task sequence. Hence, the modeled organization must adapt to multiple, distinct problems. At the end of the simulation, in the final problem-solving instance, we measure the frequency of emerging patterns of actions to investigate whether the organization has unlearned the routine, initially developed over the first set of recurrent problems, that has since become obsolete.
The occurrence-probability table shows the frequencies of subsequently performed tasks. The matrix contains the relative frequency of performed actions as measured in the final problem-solving instance of a simulation, averaged over all runs. The actions that the organization performs to solve the generated problems comprise both necessary and unnecessary actions. Most combinations of subsequently accomplished tasks occur similarly often, with a low baseline probability; a few interdependent actions, however, emerge with a markedly higher likelihood. These correspond to the subsequent, ordered tasks of the initial problem.

Table: Occurrence probabilities of recurrent patterns of interdependent actions. Note: The rows indicate the performed tasks, and the columns indicate the task subsequently performed. The values indicate the relative frequency that one task is performed after another, calculated as P(E) = n(E) / N × 100, where n(E) is the number of trials in which event E occurred and N is the total number of trials. The occurrence probability that tasks are immediately repeated is very low: agents with a misleading notion of what to do can get stuck in loops in which the problem is passed back and forth, and such loops are broken in the model. We therefore exclude entries on the matrix diagonal when calculating the occurrence probabilities.

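The tallying behind such a matrix follows directly from the formula P(E) = n(E) / N × 100. A minimal Python sketch over hypothetical per-run task sequences, excluding the diagonal as described:

```python
from collections import Counter

def transition_probabilities(task_sequences):
    """Relative frequency (%) that task j is performed right after task i,
    pooled over runs; immediate repetitions (diagonal entries) are
    excluded, mirroring the loop-breaking rule in the model."""
    pairs = Counter()
    for seq in task_sequences:
        for i, j in zip(seq, seq[1:]):
            if i != j:  # exclude diagonal entries
                pairs[(i, j)] += 1
    total = sum(pairs.values())  # N in the formula
    return {pair: 100 * count / total for pair, count in pairs.items()}
```

Applied to the simulation output, a near-uniform matrix would indicate that the old routine has been unlearned, while a few dominant off-diagonal cells reveal the persisting initial routine.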
The initially learned routine, developed to solve the recurrent problems, persists. Although the organization more recently copes with diverse situations (randomly shuffled problems), the prior, learned behavior of the organization remains traceable. This persistence of organizational behavior matches the detected behavior of individuals (see Appendix E). Individuals and the organization maintain obsolete knowledge, implying that an organization's past pattern of action partially persists. Recurrent patterns of interdependent actions reduce organizational performance if these actions do not match the situation at hand. Developed routines can thus be detrimental when an organization faces change.
The development of organizational capabilities in terms of routines is path dependent (Aggarwal et al.). The results of a similarly designed experiment offer further support: when the organization exclusively copes with different problems, the original action pattern remains traceable (see Appendix F). Therefore, one might consider this development of organizational capabilities to be path dependent. This is in line with some scholars' position portraying routines as organizational dispositions or even genes. However, conceptualizing routines as dispositions is untenable, because other factors, such as individuals' high task awareness, can prevent the persistence of routines (see Appendix C).

Refining simple theory: Dynamic capabilities
Insofar as processes of knowledge integration provide micro-foundations for dynamic capabilities, the model of Miller et al. ( ) resembles knowledge-integration routines: it conceptualizes an individual's memory as three different types or functions. The distinct properties of an agent's memory function correspond to distributed, specialized knowledge in a firm. To solve collective problems, agents coordinate their actions based on their memory functions. The ability to learn from previous actions leads to the development of routines with recurring properties for problem-solving, with the formation and performance of these routines affected by the distinct properties of individuals' memory.
We found that an initial problem leads to traceable path dependency in the routine-formation process, which prevents an organization from again reaching initially achieved cycle times after an external shock and thereby constitutes a natural limitation on dynamic capabilities. This newly gained insight motivates a closer investigation of the effects of such path dependencies on dynamic capabilities, using our replicated model.

The model enables interpretations from an operational and a strategic perspective. On an operational level, a change in the problem decreases organizational performance because established working procedures become obsolete and forming new routines incurs search costs. This consideration is short term, however. On a strategic level, organizations that face environmental changes have the opportunity to learn; in the long run, the experience thus gained might improve their capability to handle such changes.
In Figure , an organization learns sequentially over ten different problems with problem instances each, highlighting the organization's performance on both levels. The individuals in the organization search for new paths to adapt their activities to the new situations induced by the problem changes. The organization thereby develops operational capabilities to reduce the cycle time between problem changes and gains a dynamic capability over the long run to manage external changes. Figure : Strategic and operational perspectives on organizational routines. Note: The learning curves on the operational level result from , simulation runs for the default parameter setting and at three sizes of organization (n). Each run covers ten problem changes induced at discrete steps of problems. The learning curve on the strategic level is the interpolated result from the peak cycle times of the operational curves of the default-sized organization (n = ).
The organization's dynamic capability emerges from the cognitive properties of individuals. The development of such dynamic capabilities has, according to the model design, two prerequisites. First, individuals can revise their declarative memories so that they can change their learned problem-solving sequence. Second, the internal staffing structures of the organization are non-rigid. The more individuals are forced to search for new paths to solve problems, the more likely they are to search for and randomly meet other individuals. This yields an experienced organization comprising members who know each other very well. The organization exploits this knowledge when it faces a change. Modeled here is an ambidextrous organization that can both exploit acquired knowledge and explore new paths.
Organizations that recurrently encounter external changes develop dynamic capabilities that enable them to handle changes in an experienced manner, which enhances their operational performance during crisis-like events. Overall, the simulation offers evidence that organizations can form both dynamic and operational capabilities based on routines formed through individuals' memory functions. In the long run, organizations that regularly form new routines develop dynamic capabilities. Given this result, we hypothesize that even an organization operating in a highly volatile environment can form routines.
Therefore, we model a volatile environment using continuous changes in the problem. Figure shows the averaged results over , simulation runs for three different organizations operating in volatile environments. We set the model parameters to the defaults except for the memory update probabilities of individuals. The organization without memory (p_t = and p_d = ) is unable to learn and solves problems exclusively through random search, which results in consistently poor performance over time. The simulated cycle time is approximately . , which tracks the analytically determined cycle time (see Appendix D). The organization with transactive and declarative memory (p_t = . and p_d = . ) can learn and performs better over the long run. The organization with transactive memory but without declarative memory (p_t = . and p_d = . ) shows, in the long run, the best operational performance in the volatile environment.
The results suggest that organizations can learn and form routines, even in volatile environments. Routines may be flexibly enacted based on organizational experience, through mechanisms that can be explained by incorporating the previous findings.
Transactive memory allows agents to learn about the skills of their colleagues, implementing a network of who knows what. Continuously changing problems force agents to coordinate to accomplish tasks, which teaches agents about the skills of multiple colleagues. Agents in charge of, but not skilled at or aware of, a task draw on their personally developed networks. Most agents, by gaining experience over time, develop such networks, which are interrelated. These networks allow the organization to retrieve distributed knowledge and flexibly coordinate whichever activities are appropriate to the current situation.
Agents' declarative memory negatively affects organizational performance in the midst of volatility, in contrast to its positive effect in stable environments. Besides their personal networks, agents' actions also result from their learned problem-solving sequence, which becomes inappropriate when tasks change. The resulting behavior is then detrimental to organizational performance and perturbs the formation of efficient routines.
In summary, individuals' learning capabilities enable organizations to form efficient (meta)routines, independent of environmental conditions. The performance of organizations in terms of learning varies with the type of memory combined with the type of environment. The particular effect of transactive memory was highlighted in a follow-up study by Miller et al. ( ), which applied a similar model design. Investigating organizations operating in volatile environments, we found that individuals' transactive memory enables organizations to develop dynamic capabilities, while their declarative memory can weaken that effect.
Overall, our results show that individual and organizational learning are antecedents of the development of both routines and dynamic capabilities in organizations, as Argote ( ) postulated. Individuals in an organization learn problem-solving sequences and apply their knowledge, which is a prerequisite for the formation of routines. This positively affects organizational performance as long as the organization operates in a stable environment. However, a learned problem-solving sequence is detrimental to organizational performance when conditions change, although this detrimental effect is not necessarily linear, because interactions among individuals can compensate for some problem-inappropriate behavior.
Routines are related to the concepts of cognitive efficiency and the complexity of problem-solving processes (Feldman & Pentland ), but existing literature has not examined whether environmental shocks and volatility counter the cognitive efficiency generated by organizational routines (Billinger et al. ). Using the replicated model, we demonstrated that organizations can form routines while operating in volatile environments. When problems change frequently or continuously, such (meta)routines are not detectable merely based on observable patterns of action.

Conclusion
This paper used a replication of a simulation model, namely that of Miller et al. ( ), to develop theory, and demonstrated the benefit of using standards, such as ODD and DOE, in the replication process. Our replicated model produces quantitatively similar and qualitatively identical results that are "relationally equivalent" and, overall, hint at "distributional equivalence," following the classification of Axelrod ( ).
Replications of simulation models must rely on published conceptual model descriptions, which are often not straightforward (Will & Hegselmann ), even for a relatively simple model, as was the case here. The use of the ODD protocol fosters a full model description through its sophisticated, standardized structure. It is an explicit intermediate result that provides a steppingstone in the replication process (Thiele & Grimm ). Transferring the original model description published by Miller et al. ( ) into the ODD format helped to identify formally ambiguous assumptions, which we subsequently clarified during pretests with the re-implemented model.
The application of DOE principles was also helpful in several respects. The original model results were unavailable as raw data, presented mainly graphically, averaged over simulation runs, and subject to stochastic influences. Using the DOE principles suggested by Lorscheid et al. ( ), we quantified statistical errors to determine , simulation runs as a number sufficient for reliable visual comparison of graphically depicted outputs. The results of the replicated model generated on this basis match those highlighted by Miller et al. ( ). Hence, we can largely exclude errors due to stochasticity in the replicated results. Moreover, applying the DOE principles yielded insight into model behavior and validated simulation results against the conceptual model. Analyses of the original code further increased the credibility of the replication.
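The run-number determination can be illustrated with a simple error criterion in the spirit of these DOE principles (our sketch, not the exact procedure of Lorscheid et al.): increase the number of runs until the standard error of the mean output falls below a chosen tolerance.

```python
import random
import statistics

def runs_needed(simulate, tol, start=100, max_runs=200_000, seed=42):
    """Double the number of runs until the standard error of the mean
    simulation output drops below `tol`."""
    rng = random.Random(seed)
    n = start
    while n <= max_runs:
        sample = [simulate(rng) for _ in range(n)]
        se = statistics.stdev(sample) / n ** 0.5
        if se < tol:
            return n, se
        n *= 2
    raise RuntimeError("tolerance not reached within max_runs")

# Stand-in for one stochastic simulation run returning a cycle time:
noisy_cycle_time = lambda rng: 10 + rng.gauss(0, 3)
n, se = runs_needed(noisy_cycle_time, tol=0.1)
```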

Our successfully replicated and verified model offered a solid foundation for further extensions and experiments to develop and refine theory. First, we generalized previous theoretical insights by investigating a merger scenario in addition to the downsizing scenario examined in the original paper, finding a similar qualitative pattern for both. Either disrupts an organization's established routines, initially reducing performance due to lost organizational knowledge, but organizations can quickly form new, efficient routines. Second, we illustrated how replicated simulation models may be used to refine theory, for example by analyzing in depth the relationship between memory functions and the performance of routines. In this respect, we showed that initially learned routines persist, locating their path dependence in the memory functions of individuals. Building on this finding, new experiments with multiple problem changes allowed us to clarify and formally specify a potential mechanism (Smaldino et al. ) underlying the still actively debated theoretical concept of dynamic capabilities. Here, given the longitudinal and processual character of the concept, as well as the fact that empirical data are challenging to obtain, simulations offer comparative methodological advantages (Davis et al. ).
Some limitations exist as well. We document the benefits of using the ODD protocol and DOE principles with respect to a replication endeavor. Also, as discussed above, we used quite a large number of runs to obtain stable results. The model's abstract design enables general interpretations, but its assumptions have not been validated empirically. Moreover, we investigate dynamic capabilities with respect to knowledge integration, but the foundations of the concept of dynamic capabilities are not restricted to this aspect. Nevertheless, the agent-based model depicts a potential fundamental mechanism for routine formation and the factors that affect routine performance.
The model suggests promising directions to explore in future research on organizational routines. First, the performance of routines that organizations enact to handle volatility could be empirically investigated. Second, regarding model design, future research could test additional submodels. For example, agents' search could be modeled as an urn model without replacement, which would reduce organizations' search costs and cycle times. Third, regarding the use of the ODD protocol and DOE principles in model replications, we suggest further testing of these standards in future replication studies to more broadly establish their benefits.

Appendix A: ODD protocol

Purpose
The model aims to show how the cognitive properties of individuals and their distinct types of memory affect the formation and performance of organizational routines in environments characterized by stability, crisis (see Miller et al. ), and volatility.

Entities, state variables, and scales
Entities in the model are agents, representing human individuals. The collective of agents forms an organization. Table reports the model parameters. The global variables are the numbers of agents and tasks. By default, the organization comprises (n = ) agents. The organization faces problems from its environment. A problem involves a sequence of (k = ) different tasks (Miller et al. ). The organization must perform the tasks in a given order to solve a problem; the order of tasks defines the abstract problem in terms of its complete solution process. Once the organization performs each task in the required sequence, the problem is solved (Miller et al. ). The organization solves several problems over time, which can either recur or change in terms of the required task sequence. The time an organization requires to solve a problem is defined as the cycle time (Miller et al. ), which represents organizational performance.

Variable   Description                                                Value (Default)
n          Number of agents in the organization                       , ,
k          Number of different tasks in a problem
a          Task awareness of an agent                                 , ,
p_t        Probability that an agent updates its transactive memory   . , . , . , .
p_d        Probability that an agent updates its declarative memory   . , . , . , .
w_d        Declarative memory capacity of an agent                    , ,

Table further defines the individual variables used to set agent behavior. Agents are heterogeneous in terms of skill, but the skills themselves are not varied and are thus not reflected by a variable. Each agent has a particular skill stored in its procedural memory that enables the agent to perform a specific task (Miller et al. ). On the one hand, the number of agents equals at least the number of different tasks in a problem, ensuring that an organization can always solve a problem if it can organize the task accomplishment in the defined sequential order. On the other hand, the number of agents can exceed the number of tasks (n > k) (Miller et al. ). In such cases, the k different skills are assumed to be uniformly distributed among the agents.
Each agent is aware of a randomly assigned tasks (Miller et al. ); that is, each agent is aware of a limited number of a problem's tasks (1 ≤ a ≤ k). An agent's awareness set contains at least the task for which it is skilled, on the assumption that agents who can perform a specific task are also capable of recognizing it. Agents are otherwise blind to unfamiliar tasks (Miller et al. ).
Declarative memory enables agents to memorize the subsequently assigned task once they have performed their own task. Agents have limited declarative memory capacity (w_d = ) and memorize a task with a probability set by the variable p_d (p_d = . ) (Miller et al. ). Further, agents can memorize the skills of other agents in their transactive memory. The number of agents and skills that each agent can memorize is limited by the number of agents in the organization. The probability that an agent adds an entry to transactive memory is defined by the parameter p_t (p_t = . ) (Miller et al. ).
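These state variables can be sketched as a simple data structure (our illustration; the class, method names, and the concrete parameter values below are placeholders, not the original code or its defaults):

```python
import random
from collections import deque

class Agent:
    """One agent: a fixed skill (procedural memory), an awareness set, a
    capacity-limited declarative memory, and a transactive memory that
    maps colleagues to their skills."""
    def __init__(self, skill, awareness, w_d, p_t, p_d, rng):
        self.skill = skill
        self.awareness = set(awareness) | {skill}   # aware at least of its own task
        self.declarative = deque(maxlen=w_d)        # oldest entries are forgotten
        self.transactive = {}
        self.p_t, self.p_d, self.rng = p_t, p_d, rng

    def observe_next_task(self, done, nxt):
        """Memorize the subsequently assigned task with probability p_d."""
        if self.rng.random() < self.p_d:
            self.declarative.append((done, nxt))

    def learn_colleague(self, colleague, skill):
        """Add a transactive-memory entry with probability p_t."""
        if self.rng.random() < self.p_t:
            self.transactive[colleague] = skill

a = Agent(skill=2, awareness={4}, w_d=2, p_t=1.0, p_d=1.0, rng=random.Random(1))
a.observe_next_task(2, 3); a.observe_next_task(2, 4); a.observe_next_task(2, 5)
a.learn_colleague(colleague=7, skill=1)
```

The bounded `deque` captures the limited declarative memory capacity: with w_d = 2, the oldest of the three observations above is forgotten.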
The agents are distributed across the organization. Scale and distance are not modeled explicitly, but time is crucial in two ways. On an operational dimension, the problem-solving process requires the accomplishment of tasks, as measured by the cycle time. An organization that consecutively solves problems over time might form routines.

Process overview and scheduling
The organization faces consecutively occurring problems. The generated problems trigger organizational activities. Except for the first task of each problem, the agents self-organize the problem-solving processes given the task sequences of the generated problems. The first task in each task sequence is assigned to an agent that is skilled at performing it. An agent in charge of performing a task in a problem is also responsible for passing the next task in the sequence to another agent. Thus, the agent in charge might remember, or must search for, another agent that seems capable of handling the next task (Miller et al. ). Then, the agent in charge hands the problem over to the identified agent, who in turn becomes in charge of the problem (Miller et al. ).
Figure depicts the schedule that an agent follows when in charge of a problem. An agent first scrutinizes the task. If the agent is aware of and skilled for the task, the agent updates its declarative memory and performs the necessary task. The agent then advances to the next task if the problem has not yet been solved (Miller et al. ).
An agent that lacks the skill to perform the task at hand starts a local search process. An agent that is aware of the task but not skilled consults its transactive memory. If the transactive memory reveals another agent skilled to perform the required task, the searching agent tries to hand the task off to this agent. An agent that is unaware of a task consults its declarative memory, which might reveal a task that is usually due. If declarative memory indicates a task (what usually should be done), the agent further consults its transactive memory (of who has the appropriate skill) to hand the task over to a skilled agent. If this local search is unsuccessful, or if an agent's memory is undeveloped, the agent proceeds with a distance search process to hand off the problem (Miller et al. ).
Distance search involves a random search for a skilled agent to hand over the problem. If the searching agent finds a skilled agent, the agent updates the respective types of memory and hands off the problem. An approached agent without the skill required for the task of the searching agent might nevertheless be able to make a referral to another agent. In this case, the searching agent hands off the task to the referred agent and updates its transactive and declarative memory (Miller et al. ). An unsuccessful search attempt results in a new random search.
As long as the performed task is not the last in the problem, an agent advances to the next task of the problem. Once a problem is solved, a new problem is generated and a new problem-solving process is initiated (Miller et al. ).
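The schedule can be condensed into the following decision sketch for the agent in charge (our simplification; the actual schedule also covers memory updates and referrals, and all names here are ours, not from the original code):

```python
import random

def choose_successor(agent, done_task, next_task, skills, rng):
    """Pick the colleague to receive `next_task` after `done_task` was
    performed. `agent` is a dict with keys 'aware' (tasks it recognizes),
    'declarative' (task just done -> task usually due next), and
    'transactive' (colleague index -> known skill). `skills[i]` is agent
    i's skill; tasks and skills share the same labels here."""
    if next_task in agent["aware"]:
        wanted = next_task                             # knows which skill is required
    else:
        wanted = agent["declarative"].get(done_task)   # notion of what is usually due
    if wanted is not None:                             # local search: transactive memory
        for idx, skill in agent["transactive"].items():
            if skill == wanted:
                return idx
    while True:                                        # distance search: random draws
        idx = rng.randrange(len(skills))
        if skills[idx] == next_task:                   # only a skilled agent accepts here
            return idx

skills = [0, 1, 2]                                     # agent i is skilled for task i
aware = {"aware": {1}, "declarative": {}, "transactive": {1: 1}}
blind = {"aware": set(), "declarative": {}, "transactive": {}}
```

The aware agent hands task 1 over via its transactive memory; the blind agent with empty memories falls through to random distance search.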

Design concepts

Basic principle
The model design is abstract. Conceptually, it proceeds from the idea that organizational routines form as a result of individuals' cognitive properties and activities. The model is designed from the perspective of distributed cognition: the individuals are distributed and have distinct properties. The model assumes that individuals self-organize the problem-solving process and adapt their behavior to recurrent or different problems. Individuals can learn, which affects the coordination of activities and organizational performance.

Emergence
Organizational routines emerge from individuals' initially independent skill sets and capacities (Winter ). The micro-foundations on the individual level are thus well-reasoned and explicitly modeled. Organizational macro-behavior is not explicitly modeled. The organizational behavior that emerges from the properties of the individuals is analyzed. The presumed emergent phenomenon is that the modeled organization develops routines over time.

Adaptation
The individuals in the organization adapt their activities to recurrent and changing problems. Recurrent problems reflect stable environmental conditions. In this case, the organization adapts to the problem by forming a routine. A crisis event is modeled as a one-time change in the problem, which forces the organization to adapt to the new situation and learn a new routine. A volatile environment is modeled as a continuously changing problem, in which the organization has to cope with varying conditions. The organization might even instantiate routines to operate efficiently in such a volatile environment. In terms of the flexible use of action patterns and their adaptation to certain situations, routine dynamics can be traced back to individuals (Howard-Grenville ). Since individuals perform activities contingent on their situations, routines can be applied flexibly in a volatile environment (Adler et al. ; Bogner & Barr ).

Objectives
The organization's objective is to organize the problem-solving process as efficiently as possible in terms of cycle time. Agents' primary objective is to perform tasks and to organize the problem-solving process. Overall, agents follow this objective to ensure the completion of all task sequences for each occurring problem.

Learning
Learning is an important design concept. On the individual level, agents have three types of memory: procedural, transactive, and declarative. In procedural memory, agents store their skill (Miller et al. ). According to the model design, each agent owns one skill; agents are assumed to have learned the skill in prior training. Agents do not learn new skills; agents are assumed to be specialists in their roles. Agents learn through their transactive and declarative memory. Transactive memory allows the agents to store who knows what in the organization. Declarative memory enables the agents to learn what should usually be done given a problem's task sequence. On the macro level, the organization can learn to handle problems in a routinized manner.

Prediction
Prediction by agents in the organization is only implicitly modeled. Agents that are not aware of a task at hand try to predict the task from the information in their declarative memory. Agents not skilled to perform the task at hand try to predict who else in the organization is skilled to perform that task. This prediction is based on their transactive memory.

Sensing
On the macro level, the organization senses problems. On the micro level, agents sense tasks. Their awareness models their sensing capabilities. Organizational sensing capabilities depend on the organization's ability to include task-aware agents in the problem-solving process at the right time.

Interaction
Interaction between two agents is communication in which they can exchange information about a task, their skills, and the skills of other agents. Communication can also result in a problem being handed over between the agents. This interaction is not explicitly modeled; task handoffs may encompass communication or a virtual or physical exchange of work in progress. Moreover, after a task handover, the transmitting agent can observe the actions of the receiving agent. Some scholars of social cognition distinguish such social observations from interactions (see, e.g., Tylén et al. ), but one might indeed consider this observation to be an indirect interaction.

Stochasticity
Organizations that regularly face problems from the environment might not be aware of the specifics of a particular problem. This is reflected by stochasticity. Moreover, an organizational member who searches for a colleague with a particular skill but has no clue whom to ask will ask randomly chosen colleagues. This random choice is also modeled as stochasticity.

Collectives
An organization is the resulting collection of individuals, the personnel at a company. Furthermore, within the organization, small collectives or dyads can form. Dyads form, for example, when an agent interacts with another agent to hand over a task or to exchange information about other colleagues.

Observations
The performance of an organization is observable as cycle time. The model allows the observation of cycle time under different conditions, since the problem and the parameters of the individuals can be varied.

Initialization
The model is initialized according to the variable settings. After the agents of the organization are created, an initial problem is generated. The model is generic and requires no input.

Submodels

Problem generation and task assignment
The problems comprise a set of k tasks [1 . . . k]. The tasks are initially shuffled at random to reflect a specific problem that the organization must solve. In a stable scenario, each problem is generated with an identical task sequence. A crisis event is modeled as a permanent change in a problem: the task sequence is shuffled once. A volatile scenario is modeled as a continuous change in a problem: the task sequence is shuffled for each problem or at a defined frequency. In any case, the first task of a problem is always assigned to an agent that is aware of and skilled in performing the task.
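The three scenarios differ only in when the task sequence is reshuffled, which can be sketched as follows (our illustration; the reshuffle point for the crisis scenario is a placeholder parameter, not a value from the paper):

```python
import random

def generate_problems(k, n_problems, scenario, rng, crisis_at=None):
    """Task sequences for the stable, crisis, and volatile scenarios:
    'stable' repeats one shuffled sequence, 'crisis' reshuffles it once at
    problem index `crisis_at`, 'volatile' reshuffles for every problem."""
    seq = list(range(1, k + 1))
    rng.shuffle(seq)                      # initial shuffle defines the problem
    problems = []
    for i in range(n_problems):
        if scenario == "volatile" or (scenario == "crisis" and i == crisis_at):
            rng.shuffle(seq)
        problems.append(seq[:])           # copy of the current task sequence
    return problems

probs = generate_problems(k=4, n_problems=6, scenario="crisis",
                          rng=random.Random(7), crisis_at=3)
```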

Agents scrutinize tasks
An agent scrutinizes a task at hand to check whether the task is represented in its awareness set.

Random search
An agent's random search attempt is modeled as an urn sample with replacement. The searching agent draws another agent to approach at random. This search attempt is successful if the searching agent finds another agent to take the task. Otherwise, the agent searches again, repeating the search until an approached agent accepts the task.
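As a sketch in code (our names; in the full model the acceptance condition depends on the approached agent's awareness and skill, which is reduced here to a boolean):

```python
import random

def random_search(searcher, accepts, rng):
    """Urn sample with replacement: repeatedly draw a colleague uniformly
    at random (repeats are possible) until one accepts the task. Returns
    the accepting agent's index and the number of failed attempts."""
    failed = 0
    while True:
        idx = rng.randrange(len(accepts))
        if idx != searcher and accepts[idx]:
            return idx, failed
        failed += 1                        # each failed attempt adds search cost

accepts = [False, False, True, False]      # only agent 2 would take the task
idx, failed = random_search(0, accepts, random.Random(3))
```

Because draws are with replacement, the number of failed attempts is geometrically distributed; modeling search as an urn without replacement, as suggested in the outlook above, would bound it instead.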

Communication between searching and approached agents
Agents communicate to hand over tasks. This communication also affects which task is performed next and particularly depends on agents' task awareness. Agents that are aware of a task at hand approach an agent to perform the required task. Agents who are unaware of a task, but have a notion of what to do due to their declarative memory, approach an agent to perform the requested task. The response of the approached agent depends on its task awareness and skill. Four responses are possible: [1a] the agent is aware of and skilled for the required task and performs it; [1b] the agent is aware of but unskilled for the required task and tries to make a referral to another agent with the skill to perform it; [2a] the agent is unaware of the required task but skilled for the requested task and performs this necessary or unnecessary task; [2b] the agent is unaware of and unskilled for the requested task and tries to make a referral to another agent with the skill for the requested task (Miller et al. ).

Problem responsibility and task handover
An agent hands off a task if another agent is found to have the skill to perform the required or requested task. With the task handoff, the approached agent becomes responsible for the problem, while the agent handing over the task relinquishes responsibility for it. Thus, a single agent in the organization is always responsible for advancing the problem-solving process.

Declarative memory
Agents observe and can learn what is done next. An approaching agent that hands off a task to another agent has the chance to store information about the task performed next in its declarative memory. An experienced agent who is unaware of a task can draw on declarative memory to obtain an idea of what usually should be done. In this case, the agent assumes that the next task is the one that occurs most frequently in the declarative memory. This assumption can be misleading, particularly if the problem has changed over time.
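The retrieval rule, "the task occurring most frequently in declarative memory," can be sketched with a counter (our illustration, not the original code):

```python
from collections import Counter

def presumed_next_task(declarative, done_task):
    """Return the task most frequently observed to follow `done_task` in
    the agent's declarative memory, or None if there is no entry (the
    agent must then search). Ties resolve by first occurrence here."""
    observed = [nxt for done, nxt in declarative if done == done_task]
    if not observed:
        return None
    return Counter(observed).most_common(1)[0][0]

memory = [(2, 3), (2, 3), (2, 5), (4, 1)]   # (task done, task observed next)
```

With this memory, the agent presumes task 3 follows task 2; for an unseen task, the lookup fails and the agent falls back to search.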

Transactive memory
An agent that hands off a task to another agent has a chance to learn about the skill of the successor. Agents store this information in their transactive memory, which is updated with probability p_t (0 ≤ p_t ≤ 1).

Necessary and unnecessary task accomplishments
Agents perform two types of tasks: necessary tasks, according to the given task sequence, and unnecessary tasks, resulting from task requests by searching agents with a wrong notion of what to do based on an incorrect inference from their declarative memory. The problem-solving process only advances with the accomplishment of necessary tasks.

Cycle time
Cycle time measures the length of organizations' problem-solving processes and is calculated for each problem individually. Until a problem is solved, the cycle time increases by one increment whenever an agent performs a necessary task (n_t) or an unnecessary task (u_t), and for each unsuccessful random search attempt, which incurs search costs (s_t). An organization achieves the minimum cycle time if it performs only necessary tasks and no search costs occur (Miller et al. ). The minimum cycle time equals the number of tasks in a problem.
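Read this way, the cycle time of a single problem is simply the sum of the three event counts, cycle time = n_t + u_t + s_t (our reading of the description, not a formula quoted from the paper):

```python
def cycle_time(n_t, u_t, s_t):
    """Cycle time of one problem: one increment per necessary task (n_t),
    unnecessary task (u_t), and failed random-search attempt (s_t)."""
    return n_t + u_t + s_t

k = 10                           # a problem with ten tasks
optimum = cycle_time(k, 0, 0)    # only necessary tasks, no search costs
detour = cycle_time(k, 3, 2)     # three unnecessary tasks, two failed searches
```

The first call illustrates the minimum: with no unnecessary tasks and no search costs, the cycle time equals the task count k.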

The effect of individual transactive memory on the initial formation of a routine
In the first experiment, we investigate how individuals' transactive memory affects routine formation when an organization faces a recurrent problem. We test four different settings of the memory update probability p_t; the other model parameters are held constant at their defaults. While the overall qualitative result is the same for the replicated and the original simulation, quantitative divergences must be discussed. The replicated model shows, for a low value of p_t and in the first two problem-solving instances, an increase in the average cycle time from to . In the original model, the cycle time does not exceed . Which result is correct? The divergence could be due to a mistaken model assumption or stochastic variance. Therefore, Figure depicts the results averaged over , simulation runs. The graph shows smooth learning curves compared to results averaged over runs. This indicates that the model has high stochastic variance that can be reduced by a higher number of runs. Although the three results are qualitatively similar, they are quantitatively different.

The effect of individual declarative memory on the initial formation of a routine
The second experiment investigates the effect of declarative memory on an organization's routine-formation process. As in the first experiment, the parameter values of p_d are varied and the other parameters are held constant. The organization again faces a recurrent problem.
Figure depicts the resulting cycle times for the repetitive problem-solving process of the modeled organization. The generated results of the replicated model again show learning curves decreasing to approximate the optimal cycle time, as expected. Different parameter settings of p_d appear to have a small effect, indicating that higher declarative memory capability among members slightly increases the organization's learning capability. Thus, individual learning enables the organization to reach higher routine performance in less time. Miller et al. ( ) explain that they found an effect of declarative memory, but that the effect is low because agents can discern half of the tasks (a = ). Furthermore, they mention that a high update probability for declarative memory could substitute for low task awareness. Figure depicts the averaged results over , simulation runs, which allows a more precise determination of the influence of declarative memory. While the effect of declarative memory is low, a higher update probability of declarative memory does make organizations quicker to form routines. Indeed, the divergences between the original and replicated results demand an investigation of experimental error (see Appendix C). However, Miller et al. ( ) stated that declarative memory affects routine formation in this setting due to the model design: "If the agent holding the problem is unaware of the next task, then it presumes that the next task is that occurring most frequently in its declarative memory associated with the task it just completed" (p. ).
Consequently, an agent's declarative memory can substitute for lacking task awareness. Moreover, agents initially learn the correct sequence because the helping agents are aware of the task a searching agent is looking for and thus correctly decide what should be done next: "If the agent completed the task for the first time, so that its declarative memory is blank, then the agent moves to step in the search process and seeks help from a randomly chosen agent who happens to be aware of the next task in the problem" (Miller et al. , p. ).
Hence, if organizations face recurrent problems, agents' declarative memory can correctly substitute for lacking task awareness. This positively affects organizational performance, although the effect is weak, as the authors noted: "Over a wide range of positive values ( . ≤ p_d ≤ ), the probability of remembering past task sequences has little effect on the cycle-time path. Because agents can discern half of the tasks (a = ) and the task sequence is fixed across problems, agents quickly fill the gaps in their knowledge of the task sequence needed to solve problems" (Miller et al. , p. ).
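The search order described in these quotes can be sketched as follows. The function and field names are ours, chosen for illustration, and the medical task labels merely echo the authors' example of a medical service unit; this is not the original MATLAB code.

```python
import random

def next_task(agent, just_completed, true_next, awareness, rng):
    """One agent's choice of the next task, sketching the search order
    described above: (1) own task awareness, (2) declarative memory,
    (3) ask another agent for help."""
    if rng.random() < awareness:
        return true_next                      # agent discerns the task itself
    history = agent["declarative"].get(just_completed)
    if history:                               # fall back on remembered sequences
        return max(history, key=history.get)  # most frequent successor task
    return "ask-for-help"                     # blank memory: seek a helper

# An agent that remembers "x-ray" following "triage" three times and
# "lab-test" once, but cannot discern the next task itself:
agent = {"declarative": {"triage": {"x-ray": 3, "lab-test": 1}}}
choice = next_task(agent, "triage", "x-ray", awareness=0.0, rng=random.Random(1))
# -> "x-ray": the most frequently remembered successor substitutes
#    for the missing task awareness.
```

With a blank declarative memory the same call would fall through to the help-seeking step, which is what drives the interactions among agents discussed above.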
In summary, the authors mention, and the replicated results show, that declarative memory (p_d) has a slight effect on the initial formation of routines. The authors reason about this effect logically but do not provide experimental evidence. Our replication also establishes the small effects experimentally, suggesting that the divergence from the original model is rooted in model stochasticity.

Routine disruption due to downsizing
The third experiment is performed to analyze what happens if the organization loses staff. In this experiment, the organization faces recurrent problems and then abruptly drops staff when the fiftieth problem is solved. Two scenarios are analyzed: a moderate staff reduction from to and a substantial reduction from to organizational members.
Figure highlights the resulting learning curves of the organization. In both cases, the cycle time greatly increases once the organization downsizes. In the extensive downsizing scenario, the average cycle time peaks at ; in the moderate scenario, it peaks at . This indicates that downsizing, particularly extensive downsizing, disrupts an initially formed routine. The loss of organizational knowledge explains this effect. However, once the downsized organization has solved approximately ten more problems, it regains optimal performance, implying that the organization has formed a new routine. Moreover, the extensively downsized organization recovers slightly faster (see also Miller et al. ). A comparison between the replicated and original results shows that they are quite similar. The moderately downsized organization recovers more slowly because learning in this organization is distributed over more redundant agents (Miller et al. ). This supports the hypothesis that organizations might fail to adapt due to their inertia. Moreover, the cognitive properties of the organization depend not only on the properties of its constituting elements; the number of redundant elements also matters and increases the effort required for coordination.
The averaged results over simulation runs (see Figure ) offer evidence of the reliability of the results from both models, especially the conclusion about which size of organization recovers more quickly, since the differences are quite small.

Adaptation of routine to an external change contingent on organization size
Experiment represents an external change, modeled as a permanent, one-time change in the problem. This is considered an environmental change because organizations do not influence the given problem structure. Figure illustrates the formation of routines when organizations with , , and members face such a change after solving fifty problems. Problems one to fifty and problems fifty-one to one hundred are identical (see also Miller et al. ).
Initially, organizations learn to handle the recurrent problem, as observed in the previous experiments. Small organizations learn faster than larger organizations. Once the problem changes, the organization's initially formed routines fail to meet the new challenge. The disruption in routine results in an abrupt increase in cycle time. The organization's acquired experience is obsolete. Beyond that, organizational knowledge hampers the formation of new efficient routines, as the organization does not achieve the optimal cycle time again. This indicates that organizations are unable to unlearn initially learned routines. On the micro-level, this might be explained by persistent and misleading entries within individuals' declarative memory that impede the learning of new task sequences (see also Miller et al. ). Thus, organizations get stuck in less-than-optimal routines caused by residuals of prior routines. The averaged results over simulation runs do not make it possible to predict which organization performs better over the long term. The averaged results over , runs suggest that bigger organizations develop better-performing routines over the long run, although smaller organizations recover faster (Figure ). However, an explanation of long-term performance is given neither by Miller et al. ( ) nor by a simple examination of the model design and experiment. This demands an investigation of this scenario and of the agents' declarative memory states, which reflect their knowledge base.

Adaptation of routine to an external change contingent on declarative memory
In Experiment , similar to the previous experiment, organizations again face an external change, but agents' declarative memory capacity (w_d) is analyzed. Figure illustrates the learning curves of two organizations comprising agents with different declarative memory capacities (w_d = and w_d = ). Once the problem changes, the organizations' formed routines collapse, as observed in the previous experiment. The organization comprising agents with a rather low declarative memory capacity shows slightly higher performance in the long run (see also Miller et al. ). This indicates that highly experienced organizational members may restrain adaptation. Organizational unlearning of obsolete activities might thus be hampered because outdated memory remains stored across the distributed system. The results generated with both models and those averaged over more simulation runs are qualitatively comparable (see also Figure ). The replicated results indicate that declarative memory capacity does not affect initial routine formation. In the initial phase, agents exclusively memorize the correct subsequent task. Therefore, it does not matter whether they store the right task only once or fifty times. This result gives further evidence that the replicated model is built on the same assumptions as the original. Given the external change, the precise effect size of the parameter (w_d) remains unclear. The analysis also shows that the complexity of the modeled behavior increases when the problem changes due to agents' learning capability.

Adaptation of routine to an external change contingent on task awareness
Experiment is designed to test how agents' task awareness affects organizations' adaptive properties. This experiment is similar to experiments and , but agents' task awareness is varied (a = , a = , and a = ). Furthermore, the experiment addresses the substitution of task awareness with declarative memory. Figure depicts the organizational learning curves.
The replicated model shows that agents' task awareness has a marginal effect on the initial formation of routines. Agents with high task awareness enable their organizations to form routines more efficiently compared to organizations comprising agents with low task awareness. Once the problem changes, this effect intensifies. Indeed, organizations with agents that have limited task awareness cannot recover their previously achieved performance (see also Miller et al. ). The presence of obsolete declarative memory can again explain this observation; organizations must have members with high task awareness to unlearn old routines.

Figure : Replicated results of experiment compared to the original results.
The results of both models are qualitatively similar, but the replicated model has slightly deviating curves in the initial routine-formation phase. This marginal difference becomes visible when averaged over , runs (Figure ).

Routine adaptation throughout an external change and simultaneous downsizing
(This experiment is discussed in Section of the main text to exemplify how DOE principles enabled systematic analysis of model behavior and results. For readability, and to provide a complete overview of all replicated experiments in this appendix, we present it again below.) The final experiment models an external change to, and simultaneous downsizing of, an organization; downsizing is thus modeled as a response to external change. The organization faces a changed problem once the th recurrent problem is solved. At the same time, the organization is downsized from (n = ) to (n = ) and from (n = ) to (n = ) agents.
Figure shows the considerable increase in modeled cycle time after simultaneous problem change and downsizing. In terms of cycle time, the organization that continuously operates with agents peaks at , whereas the downsized organization of members peaks at , and the downsized organization with ten members peaks at . Hence, downsizing initially disrupts organizational performance (see also Miller et al. ). The organization lost experienced members and their knowledge, which is crucial for coordinating activities. Although the averaged results of simulation runs suggest that downsized organizations potentially learn more quickly in the new situation, we can make no reliable statement about which organization performs better after the change. An increased number of runs enables more detailed interpretation (see Figure ). The heavily downsized organization with only ten remaining members shows the highest performance after the change. At first, the heavily downsized organization performs worst but learns much faster to handle the new situation. Still, none of the organizations regains optimal performance. This suggests that smaller organizations are more agile in creating a new knowledge network among agents. Overall, while the comparison of the results reveals some differences, both models appear to be built on identical assumptions. The generation of such similar results by two quite simple models built on unequal assumptions would be unlikely. The complex model behavior after problem changes and downsizing is qualitatively equal. The results instead suggest high variance that results from stochasticity in the model.

Appendix C: The value of the systematic design of experiments (DOE)
Comparing the results generated by the original and replicated models reveals slight differences. The additional simulation runs yield, when averaged, smooth graphs without outliers. The original graphs point towards more stochasticity, raising the question of statistical error. The DOE technique addresses this issue, facilitating standardized communication of the experimental design and determination of the effect sizes of model parameters with established statistical methods. Here, we apply the DOE technique to improve our understanding of the replicated model and to illustrate the value of the DOE technique for model evaluation.

Definition of the factorial design
DOE is appropriate for systematically exploring diverse parameter settings (Lorscheid et al. ). The replicated experiments are based on selected parameter variations. The factorial design, as applied in the following, predefines the varied parameter settings. A k factorial design is also chosen to incorporate the parameter settings used by Miller et al. ( ). This setting excludes a control variable: the number of tasks in a problem (k = ) is held constant. The dependent variable is cycle time. For a full overview of the classification of the variables as applied, see Appendix A. The cycle time is measured for each problem instance. Since the modeled organizations face several consecutive problems, the resulting cycle time is dynamic. To incorporate dynamic behavior over time, we conduct our analysis in discrete steps according to the number of problems.
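A full factorial design of this kind can be enumerated mechanically. In the sketch below the factor names follow the paper, while the two coded levels per factor are placeholders rather than the original settings.

```python
from itertools import product

# The model's experimental factors (names follow the paper); the
# two coded levels per factor are placeholders, not the original values.
factors = {
    "n":   ("L", "H"),   # number of agents
    "p_t": ("L", "H"),   # transactive-memory update probability
    "p_d": ("L", "H"),   # declarative-memory update probability
    "w_d": ("L", "H"),   # declarative-memory capacity
    "a":   ("L", "H"),   # task awareness
}

# Full factorial design: every combination of factor levels is one
# design point; cycle time is then measured at each point over
# repeated simulation runs.
design_points = [dict(zip(factors, levels))
                 for levels in product(*factors.values())]
# Five two-level factors yield 2**5 = 32 design points.
```

Enumerating the design points this way makes the experimental plan reproducible and communicable, which is the main benefit DOE is credited with above.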

Determining an appropriate number of simulation runs
(In reduced form, Tables and are already discussed in Section of the main text. For readability, and to provide a complete presentation of all examined design points, we reproduce the full tables and some discussion here in the Appendix.) The simulation model uses stochasticity, which demands determining the error variance. Determining the error variance supports the choice of an appropriate number of simulation runs for the experiments. Disclosure of the error variance also enhances the credibility of reported results and allows the inclusion of stochastically induced error in model evaluation. The factorial design defines three design points with low (L), medium (M), and high (H) factor levels, as highlighted in Table . The design points reflect the applied settings used to estimate the number of simulation runs needed to produce sufficiently robust results.

Table : Design points for the estimation of error variance (low, medium, and high factor levels, represented as L, M, and H).

Table shows the mean values and coefficients of variation for the design points H, M, and L. We measured cycle time at five selected steps during the simulation runs, when problems (P) , , , , and are solved, to account for the dynamic characteristic of the dependent variable. The coefficient of variation (c_v) is calculated as the standard deviation (σ) divided by the arithmetic mean (µ) of a specific number of runs (Lorscheid et al. ), i.e., c_v = σ / µ. The results in Table come from numbers of simulation runs ranging between and , .

Table : Error variance matrix of the replicated model and coefficients of variation.

The coefficients of variation stabilize with an increasing number of runs up to about , runs; the mean values change only slightly from , to , runs. We therefore conclude that , runs are sufficient to produce robust results.
With the large error variance detected for simulation runs, results averaged over runs or fewer should be interpreted carefully. Given the design point H and the cycle time of the twenty-fifth problem, the coefficient of variation is . for runs and . for , runs, which is a substantial difference. A quantitative evaluation of experimental results based on averaged runs is thus imprecise and error-prone compared to an assessment based on , or more simulation runs.
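The run-count estimation procedure can be sketched as follows, with synthetic cycle-time samples standing in for real model output (the distribution below is invented for illustration).

```python
import random
import statistics

def coefficient_of_variation(samples):
    # c_v = sigma / mu, as used for the error-variance matrix.
    return statistics.stdev(samples) / statistics.mean(samples)

# Hypothetical cycle-time samples standing in for simulation output;
# real values would come from runs of the replicated model.
rng = random.Random(42)
samples = [max(6.0, rng.gauss(12, 5)) for _ in range(20000)]

# Watch c_v stabilize as the number of runs grows: once successive
# values change only marginally, the smaller run count is judged
# sufficient for robust results.
cv_by_runs = {n: coefficient_of_variation(samples[:n])
              for n in (50, 500, 5000, 20000)}
```

The stopping rule is the one described above: increase the run count until the coefficient of variation changes only slightly between successive levels.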
The results confirm the expected values of cycle time as determined analytically (see Appendix D). The analytically calculated cycle time for the first problem (P ) and a small organization (n = ) is . . The simulated average cycle time over , runs is . for a small organization (design point L). These results are approximately equal. Such approximate "numerical identity" is also found for medium-sized and large organizations, with expected cycle times compared to simulated cycle times of . to . and . to . , respectively. This offers further evidence that the replicated model is implemented correctly.

An investigation of model-induced stochasticity
The relatively high error variance for simulation runs explains the deviations identified between the replicated and original results, illustrated by the example of experiment (see Appendix B). The deviation in terms of cycle time is depicted in Table . The results are broadly scattered, as expected. Variance is limited at the lower bound by the minimum cycle time ( ). The median values of the learning curves approximate this lower bound over time, but outliers still occur above the threshold value of the minimum cycle time. This skews the individual distributions (see Table ).
For problem , the % quantile cycle time is , the median cycle time is , and the % quantile cycle time is (see Table ). Consequently, for half of the simulation runs, the resulting cycle time lies in a wide range between and . The other % of results deviate even more, suggesting that the divergence between the original and replicated results could be caused by high stochasticity, particularly due to the relatively low number of performed simulation runs.

Figure shows the effect sizes for each factor over problem-solving instances and includes a problem change once organizations have solved the fiftieth problem. The calculation is based on the , , simulation runs resulting from the full DOE setting. The effect sizes are standardized beta coefficients, and the coefficients indicate the negative and positive effects of the factors on the dependent variable. The graph thus shows in which situation a rather high or low level of a factor increases or decreases cycle time.
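As a sketch of how such standardized beta coefficients arise, assume a coded two-level factorial design and an invented response function (the factor names follow the paper; the response and its coefficients are ours, for illustration only). Because the coded factor columns of a balanced full factorial design are orthogonal, each standardized beta equals the factor's correlation with the response.

```python
import random
import statistics

def zscore(xs):
    mu, sigma = statistics.mean(xs), statistics.stdev(xs)
    return [(x - mu) / sigma for x in xs]

def standardized_beta(x, y):
    # In a balanced full factorial design the coded factors are
    # orthogonal, so the standardized regression coefficient of a
    # factor equals its correlation with the response.
    zx, zy = zscore(x), zscore(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)

rng = random.Random(7)
x_n, x_pt, y = [], [], []
for n in (-1, 1):                 # coded levels: -1 = low, +1 = high
    for p_t in (-1, 1):
        for _ in range(500):      # replicated stochastic runs per design point
            x_n.append(n)
            x_pt.append(p_t)
            # Hypothetical response: more agents raise cycle time,
            # transactive memory lowers it, plus simulation noise.
            y.append(20 + 4 * n - 2 * p_t + rng.gauss(0, 3))

beta_n = standardized_beta(x_n, y)    # positive: n increases cycle time
beta_pt = standardized_beta(x_pt, y)  # negative: p_t decreases cycle time
```

The signs of the recovered coefficients mirror the pattern discussed next: organization size raises cycle time, while transactive memory lowers it.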
The number of agents has a positive effect on cycle time. More agents (n) in an organization thus increase cycle time, with a varying effect size over the number of solved problems. The effect size peaks at . in the seventh problem-solving instance and declines thereafter. The problem change again increases the effect size to . . However, in the long run, the effect size of organization size approximates zero. This result indicates that small organizations are more agile and outperform larger ones in changing environments.
The effect of agents' updating probability of transactive memory (p_t) moves contrary to the effect of (n), and the most substantial effects are negative. A high updating probability of agents' transactive memory decreases the cycle time. The peak effect size ( . ) is observed for the fifth problem-solving instance, peaking again shortly after the problem changes ( . ). The cognitive capability of agents to learn who knows what in the organization is consequently crucial to reducing cycle time, as observed in experiment (see Appendix B). Moreover, the transactive memory updating probability has a significant effect after a problem change; higher cognitive capabilities of agents might compensate for an increase in organizational size.
The effect of agents' updating probability of declarative memory (p_d) is similar to but weaker than the impact of transactive memory until the problem changes. After the problem change, the effect size increases to a marginal value of . but returns to slightly negative territory once the organization has solved the new problem three times. This supports the results of experiment observed for the replicated model (see Appendix B). Declarative memory capacity (w_d) does not have an effect until the problem changes, when the effect size increases to . ; hence, higher cognitive capacity in terms of agents' declarative memory reduces organizational performance.
Higher task awareness of agents (a) reduces cycle time. The effect is already strong for the first fifty problem-solving instances, peaking at -. . The effect becomes still more substantial after a problem change, when agents who can discern many tasks avoid taking actions guided by misleading declarative memory. Unlike the other factors, the influence of task awareness on cycle time continuously increases. Over the long run, the effect size approximates -. , in line with the observations of experiment (see Appendix B).
Overall, the effect sizes support the observations from the discrete experiments. The standardized linear regression coefficients show dynamic model behavior, including for the scenario of a problem change. The effects of agents' individual cognitive properties differ before and after an organization faces a change in problem, suggesting that further investigation of more volatile scenarios might yield valuable insights. Besides, some factors and their effects might compensate for or reinforce each other. The following section considers such interaction effects.

Interaction effects of model parameters
Factors might affect the dependent variable differently depending on the state of other factors; one parameter might moderate the effect size of another parameter. Therefore, we analyze interaction effects among factors based on linear regression. Table depicts the main effects and two-way interaction effects of the model parameters, measured for problem-solving instances and , cases at which the previous analysis indicated particularly strong effects. This selection of cases enables comparison of interaction effects before and after a problem change.
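A two-way interaction analysis can be sketched by adding the product of two coded factor columns as an extra regressor. In a balanced coded design this interaction column is orthogonal to the main-effect columns, so its standardized beta is again its correlation with the response. The response function below is invented for illustration and builds in a negative n × p_t interaction; it is not the original model.

```python
import random
import statistics

def zscore(xs):
    mu, sigma = statistics.mean(xs), statistics.stdev(xs)
    return [(x - mu) / sigma for x in xs]

def standardized_beta(x, y):
    # With coded +/-1 factors, main-effect columns and their product
    # (the interaction column) are mutually orthogonal, so each
    # standardized beta equals the column's correlation with y.
    zx, zy = zscore(x), zscore(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)

rng = random.Random(3)
x_n, x_pt, x_int, y = [], [], [], []
for n in (-1, 1):
    for p_t in (-1, 1):
        for _ in range(1000):
            x_n.append(n)
            x_pt.append(p_t)
            x_int.append(n * p_t)  # two-way interaction column
            # Hypothetical response with a negative interaction:
            # transactive memory offsets the cost of a larger organization.
            y.append(20 + 4 * n - 2 * p_t - 1.5 * n * p_t + rng.gauss(0, 3))

betas = {
    "n": standardized_beta(x_n, y),
    "p_t": standardized_beta(x_pt, y),
    "n:p_t": standardized_beta(x_int, y),  # the interaction effect
}
```

A negative interaction coefficient recovered this way corresponds to the moderation pattern discussed below: raising both factor levels together reduces cycle time by more than the main effects alone would suggest.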
Table : Interaction effects of the model parameters. Note: The matrices contain the standardized beta coefficients of linear regression. The values on the diagonal indicate the main effects of the individual factors. The other values are the two-factor interaction effects between two different experimental factors. Each effect = is significant at the ≤ . level.
The performance of an organization that has solved ten recurrent problems is predominantly affected by the number of agents in the organization and by their transactive memory. On the one hand, a higher number of agents (n) increases cycle time; on the other hand, a higher updating probability of transactive memory (p_t) decreases it. The slight interaction effect (-. ) shows that these effects are interdependent; a proportional increase in both factor levels would reduce cycle time because the interaction effect is negative. The lower performance of bigger organizations can be compensated for by improved cognitive capabilities of their agents, specifically their transactive memory.
A slight interaction effect ( . ) between task awareness and declarative memory updating probability supports the assumption that low task awareness might be substituted by agents frequently updating their declarative memory.
After a problem change, in the sixtieth problem-solving instance, the previously discussed interaction effects largely diminish, except for a marginal increase in the interaction effect between the number of agents and their task awareness. The performance of organizations that handle a problem change is primarily positively affected (that is, cycle time is reduced) by high task awareness and a high update rate of agents' transactive memory. Performance is reduced by an increased number of agents and an increased declarative memory capacity.
These interaction effects support previous observations. The effect sizes help us to understand the fundamental model behavior and simulation results. The error variance matrix supports the estimation of an appropriate number of simulation runs of the replicated model to facilitate robust results. Thereby, we have provided a demonstration of the value of DOE for agent-based model analyses.
In this model, organizational behavior results from individual behavior. During a simulation run, agents learn. In their declarative memory, agents store which task follows the task they accomplished. They retrieve information from this memory whenever they are not aware of the next task, behaving in accordance with their declarative memory. Therefore, the habits that agents acquire over time are reflected in their knowledge base.
The following investigation aims to analyze agents' developed knowledge base, measuring the amount and correctness of information stored in their declarative memory. The average agent knowledge base is representative because the agents are homogeneous. The analysis focuses on the one-hundredth problem-solving instance to ensure that organizational performance has re-stabilized after the change in the fiftieth problem instance. In other words, the agents have been given a decent chance to learn how to handle the new problem.
Table shows the results of the investigation of agents' knowledge base. Three types of organizations are investigated, comprising agents with low, medium, and high task awareness. The cycle time for the hundredth problem-solving instance is depicted, as well as the average experience of the agents and their behavior.

Table : Experience of agents and their behavior, compared for agents with low (a = ), medium (a = ), and high task awareness. Note: Except for the varied task awareness, the default parameter settings are applied. Results are averaged over , simulation runs, each including a problem change in the fiftieth problem-solving instance. Measurements of agents' knowledge are based on simulation data from the hundredth problem-solving instance.

Regardless of agents' task awareness, most agents gained experience. For this analysis, the number of agents in the organizations is held constant at (n = ). The results show that most agents are involved in the problem-solving process, even though the organizations only require (n = ) agents to solve the (k = ) tasks related to a problem. This offers evidence that the knowledge of routines is highly distributed among agents.
Agents gain the most experience if their task awareness is rather low. About % of agents with low task awareness (a = ) gain experience, compared to about % of agents with high task awareness (a = ). An agent that cannot recognize what to do next always searches randomly for help from others. This drives the number of interactions among different agents. Therefore, most agents are involved in the problem-solving processes and gain experience.
Agents that gain experience can also develop habits. Agents who are unaware of what to do next draw on their experience of what they have done in similar situations. They then behave as suggested by their declarative memory. However, their habits can be more or less appropriate, given the job at hand. Three types of habits are identified: (1) agents behave appropriately given the problem and perform a necessary action; (2) agents behave inappropriately because they perform an action that has become obsolete since the problem changed; and (3) agents behave inappropriately because they learned something wrong and perform an unnecessary action.
Although most agents are experienced, their habits are, overall, inappropriate for the problem, even for those aware of % of the tasks (a = ). One might expect, in this case, that agents' habits would also match % of the situations at hand. However, agents' habits only match the problem at hand in . % of cases, as reflected in their declarative memories. Nevertheless, agents with high task awareness efficiently unlearn obsolete behavior (reduced to . % of actions, compared to . % of actions by agents with low task awareness). Notably, too, the habits of agents with low task awareness ( . %) outperform their awareness, as they are only aware of % of the tasks (a = ). Experienced agents are likely to develop habits ( . %, . %, and . %, respectively, for agents with low, medium, and high task awareness) that match neither the initial nor the actual problem.
Most habits can lead to unnecessary activities for a given problem. Organizations are thus often enmeshed in special subroutines, reducing their performance. Indeed, agents with high task awareness seldom rely on their declarative memory. In contrast, agents with low task awareness commonly consult their declarative memory.
Overall, . % of agents perform unnecessary ( . %) or obsolete ( . %) actions, increasing the expected average cycle time proportionally to × % / ( % − . %) = . . Yet the average cycle time is only . , indicating that interactions among agents prevent the performance of unnecessary tasks.
To sum up, agents' habits alone cannot explain organizational performance. Organizations need personnel with high task awareness to mitigate the emergence of unusual routines. Agents with low task awareness have a high potential to develop unusual routines when facing change. In such an organization, experienced agents follow habits that are inappropriate to the organization's goal, although the enactment of their inappropriate habits is mitigated through interactions among individuals.

Appendix F: The path dependency of the development of organizational routines
In this experiment, the organization faces different problems that are randomly generated, except for the very first problem. Although the simulated organization solves distinct problems, the original action pattern that matches the very first problem is still detectable (see Table ).

Table : Occurrence probabilities of recurrent patterns of interdependent actions. Note: The rows indicate the performed tasks, numbered according to the initial sequence of the very first problem, and the columns indicate the task subsequently performed. The values indicate the frequency (in percent) with which one task is performed after another, calculated as P(E) = (n(E) / N) × 100, where n(E) is the number of trials in which event E occurred and N is the total number of trials. The occurrence probability that tasks are immediately repeated is very low. Agents with a misleading notion of what to do can get stuck in loops in which the problem is passed between agents. Such loops are broken in the model. Therefore, we exclude entries on the matrix diagonal when calculating the occurrence probabilities.
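The calculation in the note can be sketched as follows, with toy action traces in place of simulation data (the traces are invented; they merely show the diagonal exclusion and the percentage formula at work).

```python
from collections import Counter

def occurrence_probabilities(sequences):
    """Occurrence probability (in %) that one task directly follows
    another, pooled over observed action sequences. Immediate repeats
    (the matrix diagonal) are excluded, as described in the note above."""
    counts = Counter()
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            if a != b:                  # skip diagonal entries
                counts[(a, b)] += 1
    total = sum(counts.values())
    return {pair: 100 * c / total for pair, c in counts.items()}

# Toy action traces (hypothetical, not model output): the initial
# pattern 1 -> 2 -> 3 keeps recurring even in later sequences.
traces = [[1, 2, 3, 1, 2, 3], [1, 2, 3, 2, 3], [3, 1, 2, 3]]
probs = occurrence_probabilities(traces)
# probs[(2, 3)] is the largest entry: task 3 most often follows task 2,
# so the original 1 -> 2 -> 3 pattern remains detectable.
```

Pairs that never occur simply get no entry, which corresponds to empty cells of the occurrence matrix.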

Notes
One could argue that simulation models that are not independently replicated have only marginal scientific value due to their prototype character.
Procedural memory reflects agents' "know how," declarative memory reflects their knowledge of "what to do," and transactive memory reflects "who knows what" (for a comprehensive description of the three memory concepts of routines see Miller et al. , p. ). Agents draw information from their memory to perform routines.
Based on Google Scholar citations through June . Another model -Pentland et al. ( ) -has citations, but it is not agent-based (Kahl & Meyer ). For a more recent agent-based model of routines, see Gao et al. ( ).
The studied conceptual model is, in principle, generic, shifting the focus to verification of the fit between the conceptual and implemented models. In explaining their assumptions, the authors refer briefly to the example of a medical service unit (see Miller et al. , p. ), but their conceptual model can represent diverse organizational settings because of its design at a high level of abstraction.
One incentive to replicate agent-based models was the Volterra Replication Prize, but the prize has not been awarded since (http://cress.soc.surrey.ac.uk/essa2009/volterraPrize.php).
However, we want to recognize the trend within the ABM community to make models and accompanying data fully available online (Hauke et al. ; Janssen ). This sets an example of good scientific practice in comparison to other disciplines, where transparent data sharing is often still lacking.
To our surprise, the authors do not mention agent-based modeling among the listed simulation approaches, which is perhaps why they fail to highlight the implications of the strength of the Keep It Descriptive, Stupid (KIDS) approach, namely handling social complexity in connection with theory development (Edmonds & Moss ), relative to the Keep It Simple, Stupid (KISS) approach.
We acknowledge that such a view on theory is not uncontroversial; the question of what constitutes a theory has occupied the philosophy of science to this day. We see their concept of simple theory as a useful substantiation of the building blocks of a theory under development: "Constructs, propositions that link those constructs together, logical arguments that explain the underlying theoretical rationale for the propositions, and assumptions that define the scope or boundary conditions of the theory" (Davis et al. ). Explicitly addressing these building blocks supports theory development as an evolutionary process (Weick ; Whetten ). As such, it might be understood as a theory under construction.
For an assessment of the concept of dynamic capabilities as a theory, see Denrell & Powell ( ).
We also screened the MATLAB code of the original model for anomalies and misspecifications.
The replicated model code can be found at https://www.comses.net/codebase-release/ a -e cd-c -b ad ce ac / (file name: Dynamics of Organizational Routines: A Model Replication).
If skills are not approximately distributed uniformly among agents, this can lead to different results, as highlighted in the original study.
Cycle time does not increase when an agent scrutinizes or hands off a task to another agent who accepts the problem. This simplifying assumption implies that scrutinizing tasks and handoffs require no effort. Miller et al. (2012) provided no information regarding how they chose the number of simulation runs or regarding the coefficient of variation.
Recall that the problem changes after the organization solves the fiftieth problem.
We acknowledge that this number of runs is rather high. In this study, we aimed to obtain particularly stable results to enable visual comparison with the original graphs; see, in this respect, the discussions regarding Figures and . More recent approaches to determining the appropriate number of runs adopt a power analysis framework (Secchi & Seri ), which supports an argument for fewer simulation runs. In fact, Secchi & Seri ( ) concluded that the original simulation experiments with the model are overpowered, while the majority of papers investigated in their review lacked sufficient model runs and are therefore underpowered. Besides the added computational cost, having too many runs poses the risk that economically insignificant results become statistically significant. For this reason, we argue that effect sizes should be considered to distinguish between economic and statistical significance. For a discussion of the problems of over- and underpowered simulation experiments, including issues with Type II errors, see Secchi & Seri ( ).
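The power-analysis logic can be illustrated with a generic sketch: given a target effect size, significance level, and power, the standard normal-approximation formula yields the required runs per experimental condition. This is a simplified illustration of the reasoning discussed by Secchi & Seri, not their exact procedure.

```python
from math import ceil
from statistics import NormalDist

def runs_per_group(effect_size, alpha=0.05, power=0.8):
    """Approximate runs per condition for a two-sample comparison,
    using the normal-approximation power formula:
    n = 2 * ((z_{1-alpha/2} + z_power) / d)^2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)
```

For example, detecting a medium effect (d = 0.5) at alpha = 0.05 with power 0.8 requires about 63 runs per condition, far fewer than the thousands we used for visual stability.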
Another advantage of having a replicated model is that we can now calculate the respective effect sizes. We calculated Cohen's d for the relevant effects; they fall in the range assumed by Secchi & Seri ( ).
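Cohen's d with a pooled standard deviation can be computed as in the following minimal sketch (an illustration of the effect-size calculation, not our exact analysis code):

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(sample_a, sample_b):
    """Cohen's d: standardized mean difference between two samples,
    using the pooled sample standard deviation."""
    n_a, n_b = len(sample_a), len(sample_b)
    pooled_var = ((n_a - 1) * variance(sample_a)
                  + (n_b - 1) * variance(sample_b)) / (n_a + n_b - 2)
    return (mean(sample_a) - mean(sample_b)) / sqrt(pooled_var)
```

Applied to two samples of simulated cycle times, the sign indicates the direction of the effect and the magnitude its practical relevance (conventionally, 0.2 small, 0.5 medium, 0.8 large).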
Moreover, after we completed the replication, we examined the code provided by Miller et al. (2012). The basic processes correspond to the flow chart depicted in Figure , and key model elements were implemented in the same way, conceptually, as in our replicated model. Miller et al. (2012) refer to these aspects as performative and ostensive routines.
The term "organization" also refers to organizations within organizations; that is, departments are suborganizations within firms.
Knowledge can represent experience gained in another organization or acquired in training. For example, an employee might learn to follow a new procedure that does not correspond to the previously lived routine.
In Germany, in line with this result, Tesla avoids recruiting experienced personnel from the automotive industry (according to a personal conversation with one of the authors).
The random shuffling of tasks in a sequence of k tasks allows the generation of k! distinct problems, or with tasks, as in the experiment here, ! = , , . An identical sequence is unlikely to reoccur.
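The combinatorics can be checked directly; the snippet below assumes k = 10 tasks purely for illustration (the exact k is stated in the original study):

```python
from math import factorial

# Illustrative only: assume k = 10 tasks. The number of distinct
# task sequences (problems) generated by random shuffling is k!.
k = 10
distinct_problems = factorial(k)        # 3,628,800 for k = 10
# Probability that one specific sequence is drawn again in a trial:
p_reoccurrence = 1 / distinct_problems
```

Even for moderate k, the probability of an identical sequence reoccurring is negligible.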
The occurrence probability that tasks are immediately repeated is very low. Agents with a misleading notion of what to do can get stuck in loops in which the problem is passed between agents. Such loops are broken in the model. Therefore, we exclude entries on the matrix diagonal for calculating the occurrence probabilities.
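The diagonal exclusion can be sketched as a row normalization of a task-transition count matrix; this is an illustrative reconstruction of the calculation, with hypothetical counts, not the replication's exact code:

```python
def occurrence_probabilities(counts):
    """Row-normalize a task-transition count matrix, zeroing the
    diagonal first so immediate repetitions are excluded."""
    probs = []
    for i, row in enumerate(counts):
        off_diag = [c if j != i else 0 for j, c in enumerate(row)]
        total = sum(off_diag)
        probs.append([c / total if total else 0.0 for c in off_diag])
    return probs
```

Each row then gives the conditional probability of the next task, given the current one, with self-transitions excluded.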
For example, a new company might develop a particular behavior in its start-up phase. This behavior becomes the company's disposition (firm culture). The company might act according to this disposition even years later.
An extended explanation according to the model design follows: An agent that is aware of a task performs it. Otherwise, the agent searches for help from another agent. The approached agent is likely to have different task awareness. Thus, both agents taken together are, with a higher probability, aware of what to do. The approached agent also knows other agents and is thus often able to refer the task to an agent that can perform it. A distant (random) search is then unnecessary.
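The lookup cascade just described can be sketched as follows; this is an illustrative simplification (hypothetical data structures, not the authors' or our exact implementation):

```python
import random

def handle_task(agent, task, agents):
    """Cascade: perform the task if known; otherwise follow
    transactive memory ("who knows what"); otherwise search
    randomly among the remaining agents."""
    if task in agent["skills"]:
        return agent                        # perform it directly
    referral = agent["knows_who"].get(task) # transactive memory
    if referral is not None and task in referral["skills"]:
        return referral                     # targeted handoff
    candidates = [a for a in agents if task in a["skills"]]
    return random.choice(candidates) if candidates else None
```

Because two agents jointly cover more tasks than one, the distant (random) search in the last step is rarely needed.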
The formation of such routines depends on organizational size. In volatile environments, small organizations are more agile and form routines faster than larger organizations.
If skills are not approximately distributed uniformly among agents, this can lead to different results.
In pretests with the replication model, different submodels were tested to clarify ambiguous assumptions.
Cycle time does not increase when an agent scrutinizes or hands off a task to another agent who accepts the problem. This simplifying assumption implies that scrutinizing tasks and handoffs require no effort.
One might regard downsizing as an endogenous change.
An averaged calculation of cycle time over several problem-solving instances would dilute the informative value of the effect sizes.
The factorial design and , repetitions of each simulation run yield × , = , , simulation runs in total.
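A full factorial design is simply the Cartesian product of all factor levels, with each cell repeated many times; the factors and counts below are hypothetical placeholders, not the study's actual design:

```python
from itertools import product

# Hypothetical factor levels for illustration only.
factors = {
    "memory_capacity": [1, 2, 4],
    "turnover_rate":   [0.0, 0.1],
    "org_size":        [10, 50],
}
design = list(product(*factors.values()))  # one tuple per condition
runs_per_cell = 1000                       # illustrative repetitions
total_runs = len(design) * runs_per_cell
```

Total runs grow multiplicatively with factors and levels, which is why the repetition count per cell matters for computational cost.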
No main effect or interaction effects are observable for declarative memory capacity because the capacity is not varied below the value of . Agents are always capable of solving a subsequent task. Moreover, in the initial routine-formation phase, declarative memory contains only correct entries. Therefore, it does not matter how often a correct subsequent task is stored.
Moreover, an appropriate number of simulation runs is incorporated to obtain representative results.
Moreover, transactive memory always indicates correctly who knows what.