Seed Selection Strategies for Information Diffusion in Social Networks: An Agent-Based Model Applied to Rural Zambia

The successful adoption of innovations depends on the provision of adequate information to farmers. In rural areas of developing countries, farmers usually rely on their social networks as an information source. Hence, policy-makers and program-implementers can benefit from social diffusion processes to effectively disseminate information. This study aims to identify the set of farmers who initially obtain information (‘seeds’) that optimises diffusion through the network. It systematically evaluates different criteria for seed selection, number of seeds, and their interaction effects. An empirical Agent-Based Model adjusted to a case study in rural Zambia was applied to predict diffusion outcomes for varying seed sets ex ante. Simulations revealed that informing farmers with the most connections leads to highest diffusion speed and reach. Also targeting village heads and farmers with high betweenness centrality, who function as bridges connecting different parts of the network, enhances diffusion. An increased number of seeds improves reach, but the marginal effects of additional seeds decline. Interdependencies between seed set size and selection criteria highlight the importance of considering both seed selection criteria and seed set size for optimising seeding strategies to enhance information diffusion.


Introduction
Many regions in developing countries lack adequate access to formal information sources, which is why information is spread through social networks (Saint Ville et al. ; Songsermsawas et al. ; Rink & Wong-Grünwald ). This channel of knowledge dissemination is especially important for resource-poor small-scale farmers. They rely on informal information sources since sharing information through word-of-mouth communication is convenient, reduces transaction costs, and is easily accessible (Feder et al. ; Matuschke ). Access to information is particularly important in the context of adopting innovations, which have high potential to improve farmers' productivity and adaptation abilities (Pratiwi & Suzuki ; Vasilaky & Leonard ). Farmers might decide to adopt an innovation only if they are aware of it and have sufficient knowledge about its benefits and application. Consequently, access to information increases the likelihood of innovation adoption (Rogers ; Khonje et al. ; Xiong et al. ).
However, when designing interventions to disseminate knowledge, policy-makers and program-implementers often face limited resources that obstruct the provision of information to every farmer individually. Consequently, they commonly target only a subset of farmers who receive the information initially. They then rely on the initially informed farmers, who are referred to as 'seeds', to spread the knowledge within their community. Because all other farmers depend on the seeds to obtain the information through their network, seeds are crucial to the success of the diffusion process (Genius et al. ; Magnan et al. ; D'Angelo et al. ). Choosing a subgroup of farmers as seeds therefore raises the question of how to identify the actors with the highest diffusion potential (Erlandsson et al. ).
Many studies examine the diffusion of innovations and thereby acknowledge information as an essential prerequisite for adoption, but research addressing the improvement of information spread remains limited. Some studies focus on the role of seed selection in effective information dissemination, but they predominantly analyse centrality measures as seed selection criteria exclusively, which account for local hierarchical structures only indirectly, or develop approximation algorithms to maximize spread. Furthermore, due to the paucity of empirical data many studies follow a theoretical approach. Empirical research is particularly limited in the context of developing countries despite the high relevance of information spread through word-of-mouth communication in rural economies (Muller & Peres ).
To address these limitations in research, this paper focusses on the diffusion of information as a precondition of innovation adoption. By systematically evaluating the performance of different seed sets, the study aims to improve information dissemination through optimising seed selection. Specifically, the present research evaluates information of low complexity, such as awareness of the existence of an agricultural innovation, which can be transmitted via single encounters between farmers and thus spreads like a simple contagion (Centola ). We focus on word-of-mouth communication as the means of information sharing between farming households. By adjusting an agent-based model (ABM) to a case study in rural Zambia, the study provides empirical evidence while accounting for context-specific dynamics and heterogeneity of farmers. The ABM systematically assesses the outcomes of various seeding strategies a priori to support the design of diffusion processes (Scheller et al. ; Barbuto et al. ; Dijkxhoorn et al. ). Thus, this study provides policy-makers and development practitioners with insights on the importance of seeding strategies for diffusion processes prior to the intervention implementation. Results are specifically relevant for applications in rural areas of developing countries where social networks are the main source of information and projects depend on seeds to disseminate knowledge. In particular, the paper takes the following steps. Firstly, it systematically assesses centrality measures as seed selection criteria with respect to their impact on the speed and reach of information diffusion. Additionally, it tests hierarchy (village heads) as a seed selection criterion. Secondly, the paper examines the influence of the number of seeds on the diffusion process. Thirdly, a robustness check analyses how the success of the seed selection criteria depends on the number of seeds.
The analysis confirms that optimising the seeding strategy with respect to selection criteria and size strongly impacts the diffusion success.
The remainder of this paper is structured as follows: Section presents the state of the literature. The subsequent section describes the study area and data, followed by the outline of the ABM in Section . Section presents and discusses the simulation results, and Section summarizes and concludes.

Literature Review
; Xiong et al. ). Whereas most investigations focus on the process of adoption diffusion, some researchers also assess the information effect of social networks (e.g., Cadger et al. ; Mekonnen et al. ; Xiong et al. ; Shikuku ). Studies such as Anderson & Feder ( ) confirm that endowment with human capital is frequently linked to farmers' performance because innovation adoption requires access to information. Especially in rural areas of developing countries, farmers rely on their social networks to access knowledge which in turn influences adoption decisions (Foster & Rosenzweig ; Conley & Udry ; Bandiera & Rasul ; Saint Ville et al. ). Social learning between farmers influences the diffusion process in all phases, but is particularly important in the early stages (Xiong et al. ). Therefore, social networks are of major relevance for information and innovation diffusion processes due to their function as informal communication channels ( ; Bogner et al. ). Several authors also emphasize the role of weak ties for connecting communities and thereby preventing local trapping of information (Granovetter ; Zhao et al. ). Overall, the specific network structure is an important factor for the diffusion of information (Barbuto et al. ; Qiao et al. ).
In addition, the likelihood of receiving relevant information depends on the farmer's centrality in the network (Hinz et al. ; Muange et al. ; Pratiwi & Suzuki ). To identify well-connected actors with high abilities to exchange information, social network theory provides several centrality measures such as degree, closeness, betweenness, and eigenvector centrality (Lü et al. ; Borgatti et al. ). Degree centrality equals the number of connections an actor has in the network. The inverse sum of distances of an actor to all other actors in the network describes closeness centrality, which represents how close an actor is to all others in the network. Betweenness centrality indicates the frequency with which an actor is on the shortest path between any combination of two actors in the network. Actors with high betweenness function as bridges between communities and are more likely to influence people from different groups. Eigenvector centrality is based on the idea that each actor's centrality depends on the weighted average of the actors with whom it is connected (Hussain et al. ; Borgatti et al. ). A variety of studies confirms that network centrality is positively associated with enhanced information sharing capabilities (Hinz et al. ; ; Barbuto et al. ). The literature has proposed algorithms to identify actors that maximize spread (e.g. Kempe et al. ; Wang et al. ; D'Angelo et al. ). In practice, however, well-connected or popular persons are commonly selected as seeds (Delre et al. ; Mochalova & Nanopoulos ; Banerjee et al. ). Seeding strategies have been analysed in contexts such as diffusion of microfinance (Banerjee et al. ) and adoption of agricultural technologies (Beaman et al. ).
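To make the verbal definitions of degree, closeness, and betweenness centrality concrete, the following minimal Python sketch computes them on a small undirected toy network. It is an illustration only and not the implementation used in this study: the function names and the toy graph are hypothetical, closeness is computed over reachable actors only (an assumption needed for disconnected networks), betweenness follows Brandes' standard algorithm for unweighted graphs, and eigenvector centrality is omitted for brevity.

```python
from collections import deque


def degree_centrality(adj):
    """Number of direct connections of each actor."""
    return {node: len(neighbours) for node, neighbours in adj.items()}


def closeness_centrality(adj):
    """Inverse of the summed shortest-path distances to all reachable actors.

    Assumption: unreachable actors are ignored, which matters in
    disconnected networks.
    """
    result = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[source] = 1.0 / total if total > 0 else 0.0
    return result


def betweenness_centrality(adj):
    """Brandes' algorithm for unweighted, undirected networks (unnormalised)."""
    bc = {node: 0.0 for node in adj}
    for s in adj:
        stack = []
        preds = {node: [] for node in adj}
        sigma = {node: 0 for node in adj}  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {node: 0.0 for node in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair of actors was counted twice.
    return {node: value / 2 for node, value in bc.items()}


# Hypothetical toy network: B is a hub, C bridges towards D.
toy = {"A": ["B"], "B": ["A", "C", "E"], "C": ["B", "D"], "D": ["C"], "E": ["B"]}
degree = degree_centrality(toy)
closeness = closeness_centrality(toy)
betweenness = betweenness_centrality(toy)
```

In this toy network, actor B has the most direct connections and also lies on the largest number of shortest paths, whereas C has fewer connections but still acts as a bridge towards D.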
Not only do seed selection criteria matter, but also the number of seeds is relevant for a successful diffusion process since a larger number of seeds leads to greater reach (Bampo et al. ; Barbuto et al. ). Expanding the number of randomly chosen seeds can even outperform centrality-based seed selections (Akbarpour et al. ). Additionally, the interaction between the number of seeds and centrality measures chosen as criteria for seed selection affects the diffusion success (Mochalova & Nanopoulos ). Overall, the literature shows that seeding strategies play an important role for diffusion processes, but detailed studies comparing criteria and considering interdependencies in an empirical context remain limited.

Study Area and Data
The model is adjusted to a study area called Mantapala, which is located within the Congo Basin in northern Luapula Province of Zambia as illustrated in Figure . The study area was selected within the framework of the project Food Security in Rural Zambia (FoSeZa), funded by the German Federal Ministry of Food and Agriculture (BMEL). According to stakeholder discussions and results of a pilot survey, the region shows typical features of a rural village in a developing country such as a remote location, lack of infrastructure development (roads, electricity, market integration, etc.), pronounced malnutrition, and food insecurity (Hampwaye et al. ; Central Statistical Office Zambia ; Gronau et al. ). Given the distinct impoverishment combined with high food insecurity and dependence on farming, research regarding the spread of agricultural innovations can contribute to improving local livelihoods (Hampwaye et al. ; Central Statistical Office Zambia , ; Gronau et al. ).
The household characteristics derived from the survey data displayed in Tables - illustrate the poverty, food insecurity, low levels of education, and agriculturally oriented livelihoods that prevail in the study area.
The study area consists of eight villages, which are overseen by seven village heads. Although the study area is rather isolated, exchange in terms of communication and economic activities amongst the eight villages takes place, offering interesting potential for social network analysis. Each village consists of about ten to households summing up to households. A structured household survey was conducted in as part of a census. Thus, the data set covers all households in the study area. In addition to sections on socio-economic characteristics, agricultural activities, and food security amongst others, the interviews contained a comprehensive segment on social capital, which provides information about the social network in the study area, which households exchange agricultural information, and how frequently they do so. Furthermore, GPS-data of the households were collected.

Household characteristic | Number of households | Share of households (in %)
Female headed households
Farming as the main income source of household head
Member of an agricultural, livestock, producer group
Households with poor food security according to food consumption index
Households with borderline food security according to food consumption index
Households with acceptable food security according to food consumption index
Table : Descriptives: household characteristics ( ). Note: n = .

Characteristics Mean
Age

Methodology: The information diffusion model
The present agent-based model simulates information diffusion through word-of-mouth communication between farming households in rural Zambia to identify seed sets with high potential for information spread. The following section provides an overview of the ABM, and Appendix B contains the full description in detail.

The households:
The acting entities in the model are farming households, which exchange agricultural information. Variables indicating both location and hierarchical status describe the households. Households can be connected via links, which include data specifying whether and how frequently the respective households discuss agricultural matters. Data to initialize the households and the network originate from a household survey conducted in in Zambia.

Information transmission:
The model employs an empirical approach to simulate transmission of agricultural information between households. Thereby, the diffusion is modelled as a simple contagion process, where information can be successfully transmitted through a single contact between farming households (Centola ). Each time step, information exchange between households takes place as follows: the households already informed randomly choose another household with which they share an information link. If the selected household has not yet obtained the information, the acting household transmits the information based on a probability that increases with higher frequency of agricultural discussions between these households, as indicated in the survey. The newly informed households then update their status accordingly.
Each model run starts with the initialization of the households and the social network, which remains static over the course of the simulation. Initially, only those households selected as seeds set their state to "informed", whereas all other households are "uninformed". The model proceeds in weekly time steps. As information is assumed to lose relevance after a certain point in time, the time frame is set to one year, limiting the simulation to steps.
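The transmission rule described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions rather than the model documented in Appendix B: the function name, the frequency labels, and the transmission probabilities are hypothetical placeholders, and newly informed households are assumed to become active only in the following time step.

```python
import random


def simulate_diffusion(links, seeds, p_transmit, steps, rng):
    """Simple-contagion diffusion on a static network.

    links:      dict mapping a household to a list of
                (neighbour, discussion_frequency) pairs
    seeds:      households that are informed at initialization
    p_transmit: dict mapping a discussion frequency to a transmission
                probability (hypothetical values, higher for more
                frequent discussions)
    steps:      number of time steps to simulate
    rng:        random.Random instance, for reproducible runs

    Returns the number of informed households after each step,
    starting with the initial state at index 0.
    """
    informed = set(seeds)
    history = [len(informed)]
    for _ in range(steps):
        newly_informed = set()
        for household in sorted(informed):
            contacts = links.get(household, [])
            if not contacts:
                continue  # isolated households cannot pass information on
            neighbour, frequency = rng.choice(contacts)
            if neighbour not in informed and rng.random() < p_transmit[frequency]:
                newly_informed.add(neighbour)
        # Assumption: status updates take effect at the end of the step.
        informed |= newly_informed
        history.append(len(informed))
    return history


# Hypothetical toy network and probabilities, for illustration only.
toy_links = {
    "h1": [("h2", "weekly"), ("h3", "yearly")],
    "h2": [("h1", "weekly"), ("h4", "monthly")],
    "h3": [("h1", "yearly")],
    "h4": [("h2", "monthly")],
    "h5": [],
}
toy_probabilities = {"weekly": 0.5, "monthly": 0.2, "yearly": 0.05}
toy_history = simulate_diffusion(
    toy_links, ["h1"], toy_probabilities, steps=10, rng=random.Random(1)
)
```

Repeating such a run many times with different random seeds and averaging the resulting trajectories mirrors the repeated-simulation design described below.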

Model output:
The model output includes data regarding reach (total number of households informed) and speed (rate of diffusion, number of households informed monthly). Reach provides information on final spread; speed provides information on progress over time.
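As a minimal sketch, both output measures can be derived from the trajectory of informed households over time. The function name and the aggregation of four weekly steps into one month are assumptions made for illustration.

```python
def reach_and_speed(history, steps_per_month=4):
    """Derive reach and monthly speed from a trajectory of informed counts.

    history:         number of informed households after each time step,
                     with the initial state at index 0
    steps_per_month: assumed number of weekly steps per month
    """
    reach = history[-1]
    last_step = len(history) - 1
    monthly_new = []
    for start in range(0, last_step, steps_per_month):
        end = min(start + steps_per_month, last_step)
        monthly_new.append(history[end] - history[start])
    return reach, monthly_new


# Hypothetical trajectory over eight weekly steps, starting from two seeds.
example_reach, example_speed = reach_and_speed([2, 3, 5, 5, 6, 8, 8, 8, 9])
```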

Scenarios:
The analysis distinguishes between three scenarios to find the most eligible set of seeds for widespread and rapid information diffusion within a social network.
The first scenario examines the impact of the following selection criteria for seeds on the diffusion process: i) random, ii) hierarchy (village heads), iii) degree centrality, iv) closeness centrality, v) betweenness centrality, and vi) eigenvector centrality. In this first scenario, the number of seeds is eight, which corresponds to approximately % of the population (Leeuwis ; Natcher et al. ). If the seeds are selected according to hierarchy, the total number of seeds is seven, which equals the number of village heads in the study region.
The second scenario assesses the effect of the number of seeds on the diffusion process by varying it from two to by increments of two, corresponding to to % of the total population (Bampo et al. ; Erlandsson et al. ). Seed selection is random.
In order to test robustness of the results, scenario examines the interaction effects between the selection criteria and number of seeds by simultaneously changing the seed set size and selection criteria.
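The selection criteria compared in the scenarios can be illustrated with a small helper that returns a seed set for a given criterion and seed set size. The sketch is hypothetical: the function and argument names are not taken from the model code, centrality scores are assumed to be precomputed, and ties are broken by household identifier, which is an assumption.

```python
import random


def select_seeds(households, k, criterion, scores=None, village_heads=None, rng=None):
    """Return a list of seed households according to a selection criterion.

    criterion: "random", "hierarchy", or the name of a centrality measure
               that is a key of `scores` (e.g. "degree", "betweenness")
    scores:    dict mapping a centrality name to a dict of household scores
    """
    if criterion == "random":
        rng = rng or random.Random()
        return rng.sample(sorted(households), k)
    if criterion == "hierarchy":
        # All village heads are seeded, so the seed set size is fixed.
        return sorted(village_heads)
    ranking = sorted(households, key=lambda h: (-scores[criterion][h], h))
    return ranking[:k]


# Hypothetical households and scores, for illustration only.
toy_households = ["h1", "h2", "h3", "h4"]
toy_scores = {"degree": {"h1": 1, "h2": 3, "h3": 2, "h4": 3}}
degree_seeds = select_seeds(toy_households, 2, "degree", scores=toy_scores)
head_seeds = select_seeds(toy_households, 2, "hierarchy", village_heads={"h3", "h1"})
random_seeds = select_seeds(toy_households, 2, "random", rng=random.Random(0))
```

Looping over criteria and seed set sizes with such a helper yields the full grid of seeding strategies examined in the interaction scenario.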

Model calibration and verification:
The applied model provides generic insights into how information spreads in a sparse social network. To ensure consistency with reality, the model parameters such as network structure and interaction behaviour are based on survey data from a case study in rural Zambia. A careful scan of the code was carried out whilst discussing it amongst researchers. Additionally, extreme values of the input variables were implemented to test corner cases. The simulation was run with at least repetitions for each specific scenario.

Robustness tests:
In order to test the dependence of the results on the specific network, the static network, as derived from the survey, was varied by changing the total number of links in the network during initialization (random addition and deletion of % and % of the existing links). Furthermore, % and % of existing links were rewired while keeping the total number of links constant to test for the results' dependence on network properties and transferability.
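The three network perturbations can be sketched as follows. The code is illustrative only: function names are hypothetical, links are represented as sorted pairs of household identifiers, the share of links to change is a free parameter, and rewired links are assumed to be replaced by links that did not exist in the original network.

```python
import random


def delete_links(edges, share, rng):
    """Randomly remove a share of the existing links."""
    ordered = sorted(edges)
    removed = set(rng.sample(ordered, round(share * len(ordered))))
    return {edge for edge in ordered if edge not in removed}


def _absent_links(edges, nodes):
    """All possible links between distinct households that do not exist yet."""
    ordered = sorted(nodes)
    return [
        (a, b)
        for i, a in enumerate(ordered)
        for b in ordered[i + 1:]
        if (a, b) not in edges
    ]


def add_links(edges, nodes, share, rng):
    """Randomly add new links amounting to a share of the existing links."""
    edges = set(edges)
    candidates = _absent_links(edges, nodes)
    n_add = min(round(share * len(edges)), len(candidates))
    return edges | set(rng.sample(candidates, n_add))


def rewire_links(edges, nodes, share, rng):
    """Replace a share of the links while keeping the total number constant."""
    edges = set(edges)
    candidates = _absent_links(edges, nodes)
    n_rewire = min(round(share * len(edges)), len(candidates))
    removed = set(rng.sample(sorted(edges), n_rewire))
    added = set(rng.sample(candidates, n_rewire))
    return (edges - removed) | added


# Hypothetical toy network with five links; 20 % of them are changed.
toy_nodes = ["a", "b", "c", "d", "e", "f"]
toy_edges = {("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "f")}
toy_rng = random.Random(42)
fewer = delete_links(toy_edges, 0.2, toy_rng)
more = add_links(toy_edges, toy_nodes, 0.2, toy_rng)
rewired = rewire_links(toy_edges, toy_nodes, 0.2, toy_rng)
```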

Data analysis:
The results of the simulations were analysed by applying t-tests and one- and two-way Analysis of Variance (ANOVA) statistics using Stata .
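For readers without access to Stata, the logic of the one-way ANOVA can be illustrated with a short Python sketch that computes the F statistic and its degrees of freedom, for example for the final reach of repeated runs grouped by seed selection criterion. The function name and the numbers are hypothetical, and the p-value, which requires the F distribution, is not computed here.

```python
def one_way_anova_f(groups):
    """F statistic and degrees of freedom of a one-way ANOVA.

    groups: list of lists, each holding the outcomes (e.g. final reach)
            of the simulation runs for one factor level
    """
    k = len(groups)
    n = sum(len(group) for group in groups)
    grand_mean = sum(sum(group) for group in groups) / n
    means = [sum(group) / len(group) for group in groups]
    ss_between = sum(
        len(group) * (mean - grand_mean) ** 2 for group, mean in zip(groups, means)
    )
    ss_within = sum(
        (x - mean) ** 2 for group, mean in zip(groups, means) for x in group
    )
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical reach values for three selection criteria, three runs each.
f_stat, df_between, df_within = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```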

Results
The following section presents descriptive findings and simulation results. The presentation of the simulation results is structured according to the three scenarios.

Descriptive results
The subsequent section presents selected descriptive findings with respect to the social network in the study area. Further seed-specific network descriptives are included in Appendix A.
The household heads have on average . connections to other households in the study area as visualized in Figure . Connections are links between farmers that the respondent perceived as important for business or personal reasons. The low number of connections results in an overall network density of . implying low trust in the study area. This result is in line with Leavy ( ), who also does not find large networks in rural Zambia. On average, closeness centrality is . , betweenness centrality . , eigenvector centrality . , and the cluster coefficient . . The average shortest path in the largest component is . , and transitivity is . . In the network, households ( %) are connected in a big component, but disconnected from the ( %) remaining households, who are partially connected with each other in smaller components. Most connections represent close friendships or are based on family relations as displayed in Table .

Type of connection | Absolute | Relative (in %)
Number of connections that are blood relatives
Number of connections that are related by marriage
Number of connections that are best friends
Number of connections that are friends
Number of connections that are neighbours
Table : Social network descriptives.
Despite the fact that the network is sparse and trust seems to be generally low in the study area, social networks nevertheless play a vital role for information exchange, and the existing connections are used to disseminate information as presented in Table .

Number of households / connections | Absolute | Relative (in %)
Households relying mostly on family and friends for agricultural information
Households relying mostly on experts and extension services for agricultural information
Households relying mostly on village meetings for agricultural information
Households relying mostly on media for agricultural information
Connections exchanging agricultural advice
Connections that discuss business several times per week
Connections that discuss business several times per month
Connections that discuss business several times per year or only when a decision is made

The following section presents the results of scenario , which tests the effects of different seed selection criteria on the diffusion process. As summarized in Table , seed selection based on degree centrality leads to highest final reach amongst all selection criteria with mean diffusion rates of . % of the whole population. Hierarchy-based ( . %), betweenness-based ( . %), and random selection ( . %) achieve very similar results. In contrast, seed selection based on closeness and eigenvector centrality leads to significantly lower reach with only a share of . % and . % of the population informed (p = . , degrees of freedom (DF) = , F = . ). Degree-, betweenness-, and hierarchy-based seeds lead to similar results in each simulation run, whereas random, closeness-, and eigenvector-based seeds fluctuate more between simulations.
As visualized in Figure , all seed selection criteria lead to distinct numbers of informed households during the first stages of the diffusion process (p = . , DF = , F = . for timestep ). However, results caused by selection based on hierarchy, betweenness, degree, and random choice converge at the end of the year. When about % of the population received the information, a saturation level seems to be reached. However, the distinct seed selection criteria attain this saturation level at different points in time. The graph shows that degree-based selection leads to the quickest spread.

Figure : Scenario : Effect of seed selection criteria over time.

As most village heads are well connected in the network, hierarchy-based seed selection yields relatively good results. Despite the slightly lower number of seeds in the hierarchy-based scenario (seven compared to eight), reach does not significantly differ compared with betweenness-based (p = . ), degree-based (p = . ), and random seed selection (p = . ) as the Bonferroni multiple comparison test shows (DF = , F = . ). In addition, random seed selection performs quite well and yields similar results to degree-based, betweenness-based, or hierarchy-based seed selection.

Scenario : The impact of number of seeds
Scenario investigates how the number of seeds influences spread over time. Hereby, the number of randomly chosen seeds is varied from to seeds, incrementally increased by two covering the span of circa % up to % of the population (Bampo et al. ; Erlandsson et al. ).
The ANOVA statistic examining the differences in seed set size shows that a higher number of seeds is positively associated with reach (p = . , DF = , F = . ). Furthermore, seed set size impacts the diffusion speed as the distinct slopes for different seed set sizes indicate (Figure ). Relatively large seed set sizes increase speed in early stages of the diffusion process; relatively small seed set sizes enhance speed to a larger extent in later stages instead.

Figure : Scenario : Effect of seed set size over time.

For smaller seed set sizes diffusion speed is lower at the beginning. Consequently, increasing small seed sets still significantly impacts reach by the end of the year (p = . for increasing from to seeds in step ), contrary to increasing rather large seed sets (p = . for increasing from to seeds in step , DF = , F = . ). However, due to the less frequent information transmission in the beginning and resulting long take-off phase, small seed set sizes result in lower overall reach despite the higher speed at the end.

Scenario : Simulating interaction effects
Scenario explores how the success of seed selection criteria depends on the number of seeds. By analysing the interaction effect between seed selection criteria and set size, robustness of the previous results is tested. The ANOVA investigating seed set size, selection criterion, and their interaction effects confirms that at all points in time the number of seeds, the selection criteria, and their interaction effect influence reach (p = . , DF = , F = . in time step ). The interaction effects, depicted in Figures -, show that random choice, betweenness-, and degree-based seed selection support the previous results from Scenario : for these selection criteria, an increase in seed set size leads to higher reach, but the effect of each additional seed on the diffusion success declines. In contrast, the performances of closeness- and eigenvector-based seed selection are not robust to the number of seeds since seed set size impacts the performances of seed selection based on these criteria. Thus, closeness- and eigenvector-based seed selection should be considered critically.
The simulations show that choosing village heads as seeds leads to quite high reach although the number of seeds is always restricted to seven (this restriction is the reason for excluding hierarchy from Figures -)  difference of only additional households informed at the end of the year compared to hierarchy-based seed selection. Thus, informing village heads seems quite efficient.

Robustness of the results
To test the robustness of the previous results, the network was systematically varied through rewiring % and % of the links, while the total number of links remained constant, and adding as well as deleting links ( % and %). The results show that while the number of links is positively associated with reach and speed, the general findings remain valid. Thus, the results prove to be robust towards small changes in the network and therefore demonstrate transferability of the derived policy recommendations for seeding strategies to other applications with similar network characteristics.

Discussion and Conclusions
Our analysis explored how the selection of farmers who receive information first ('seeds') influences knowledge diffusion in a sparse social network. An empirical ABM was implemented and adjusted to a case study area in rural Zambia. Our results provide policy makers and development practitioners with insights into how seeding strategies can support the promotion of innovations, while making use of social diffusion processes in rural communities in developing countries. Our study emphasizes the role of adequate seed selection strategies, which take into account both the seed selection criterion and the set size for enhancing information dissemination.
Scenario showed that selection of farmers with most connections (highest degree centrality) improves the speed and reach of diffusion the most. Selecting farmers with high betweenness centrality, who link network  ) have confirmed that degree centrality performs well as a seed selection criterion for rapid and widespread information dissemination. Households with high degree centrality have the largest number of direct connections and therefore a relatively high level of immediate influence on others (Delre et al. ; Barbuto et al. ). In addition, degree centrality offers a simple and cost-effective method for seed selection, which does not require census data (Erlandsson et al. ; Borgatti et al. ). Thus, choosing the most popular farmers, that is, farmers with the most connections, offers high potential for improving the dissemination of information.
The finding that betweenness-based seeds achieve quick and wide reach is in line with Goldenberg et al. ( ). Thus, targeting farmers who function as bridges and connect villages or friendship groups is important for the diffusion success, particularly in sparse networks.
The fact that seeds selected on the basis of eigenvector centrality lead to poor diffusion is rather surprising as it is in contrast to studies such as Banerjee et al. ( ). However, eigenvector centrality relates to the centrality of the household's links rather than to the household itself (Chen et al. ; Borgatti et al. ). According to Barbuto et al. ( ), directly exercised influence has a stronger impact on diffusion than indirect influence. Besides, Borgatti et al. ( ) argued that eigenvector centrality is not well defined in disconnected networks. Overall, while it may be advantageous for some applications to address farmers with high eigenvector centrality, in the context of our study area this leads to poor diffusion success.

In line with Mochalova & Nanopoulos ( ), closeness-based selection causes low average reach. As closeness centrality represents the distance from an actor to all other actors in the network but does not capture direct influence, effects of closeness-based seeds are more likely to be visible in the long run (Muller & Peres ).  Closeness-based seed selection indeed induces a relatively long take-off phase of the diffusion process, and in some simulation runs the diffusion is not even fully executed at the end of the year. Consequently, closeness-based seed selection calls for critical consideration and is not advisable in this case study for dissemination of information related to agricultural innovations.
Hierarchy-based seed selection causes reach that does not significantly differ compared with betweenness-based, degree-based, or random seed selection despite the slightly lower number of seeds in this scenario. Thus, informing village heads is cost-effective due to the smaller number of required seeds and gains particular appeal if budget restrictions exist. Furthermore, village heads are easy to identify, and informing them is not only convenient to implement, but also meets social norms. Receiving information from village heads directly might be perceived and accepted differently than knowledge transmitted by other community members (Muller & Peres ). However, higher ranks can increase social distance, which reduces the probability of information exchange (Beaman & Dillon ; Shikuku ). Additionally, informing the village heads first might reinforce existing hierarchical structures and favour elites that may capture early rents from the innovation. Which effect predominates is subject to further research. Overall, the choice of village heads as seeds is an attractive option due to the resulting high reach, but might cause unforeseen complications in the practical implementation.
All these results are compared with the scenario where seeds are randomly selected. Random selection of seeds achieves comparatively good results because information can be seeded in the main as well as smaller components and consequently be disseminated simultaneously to several parts of the network. Since random distribution in the community reduces the risk of knowledge being trapped, it can cause a large network reach. In contrast, centrality-based seeding strategies might lead to seed redundancy due to the choice of farmers who are part of the connected core in the network's big component (Akbarpour et al. ; Beaman & Dillon ). Furthermore, Isaac et al. ( ) found intensified information exchange between farmers in the core of the network compared with peripheral farmers. However, the high reach of random seeds relates to the mean value of the simulations, but can vary considerably in individual simulation runs as the high standard deviation indicates. Therefore, random seed selection achieves comparatively good results on average, but should be treated carefully in practice due to high uncertainty.
Scenario reveals that an increase in seed set sizes generally improves information spread, but the marginal effects of additional seeds decline. Whereas larger seed set sizes accelerate the diffusion process and therefore reach saturation levels earlier, smaller seed set sizes lead to an initially slower information exchange. These results are in line with expectations and the literature (Bampo et al. ; Akbarpour et al. ). Bampo et al. ( ) explained that the probability of information transmission between households and thus speed in early time steps increase with larger seed set sizes. If the number of seeds is high, the maximum spread is reached more promptly. Because a saturation level of the diffusion is attained eventually, increasing large quantities of seeds after a certain point does not further enhance reach, and the marginal value of additional seeds decreases (Bampo et al. ; Akbarpour et al. ). Overall, policy-makers can improve the reach by increasing the seed set size. Especially if information is time-critical, expanding the seed set is efficient to enhance reach in the short run. Enlarging seed sets also increases reach in the long run if the seed set size is small. However, for larger seed set sizes other interventions than increasing the number of seeds can be more efficient to improve diffusion as a saturation level will be eventually reached.
Scenario 3 shows that interaction effects between seed set size and selection criteria exist. In the case of random, degree-, and betweenness-based seed selection, larger seed set sizes result in wider reach with decreasing marginal effects of additional seeds, whereas the performance of closeness- and eigenvector-based seed selection depends on the size of the seed set. Targeting a large number of farmers with high degree centrality achieves the best results in terms of speed and reach. Overall, the simulation of the interaction effects highlights that both seed selection criterion and seed set size should be considered simultaneously in the design of innovation interventions.
Our simulations, as well as the growing use of ABMs in policy in general, demonstrate that ABMs can be a beneficial tool to support policy and the design of interventions. By conducting experiments in a virtual setting, several policy options and possible future development paths can be explored ex ante at relatively low cost (Ahrweiler ; Gilbert et al. ). Accordingly, the ABM implemented in this study allows the user to investigate the performance of a range of different seed sets before deciding which strategy to employ for the actual project implementation. ABMs offer some advantages over other modelling techniques as they do not impose assumptions of stationarity, linearity, and homogeneity. Furthermore, they account for individual decision making and interaction between heterogeneous agents and can cover a range of potential system states, including outcomes that might be of low probability, path-dependent, or emergent (Dechesne et al. ; Polhill et al. ). In this way, ABMs enable the user to relate the system's dynamics and structure to the properties and behaviours of individual agents (Ahrweiler ). Consequently, ABMs contribute to the understanding of system dynamics and development paths that can be expected under different policy scenarios (Ahrweiler ; Gilbert et al. ). Our ABM shows how the transfer of information to a set of individuals and the resulting information exchange on the farmer level shape the process of knowledge dissemination at the community level.

However, ABMs do not provide predictions of the future. Rather, they help to identify possible and probable paths and undesirable or unintended consequences (Ahrweiler ; Polhill et al. ). For instance, our analysis showed that degree-based seeds perform well in all scenarios, but choosing seeds based on eigenvector centrality leads to undesirable diffusion results and should therefore be avoided. Our ABM thus contributes to understanding the role of different actors in knowledge diffusion processes, and our insights can inform development practitioners in designing innovation interventions.
This said, our study also has several limitations. Whereas our study focuses on information as a prerequisite for adoption, other factors also play a role in adoption decisions of farmers. According to the literature, these include financial constraints (Krejci & Beamon ; Nguyen et al. ; Perello-Moragues et al. ), attractiveness of alternative choices (Xu et al. ), compatibility of the innovation with existing livelihood strategies (Gassner et al. ) and characteristics of the innovation itself (Okello et al. ; Caffaro et al. ). In addition, adoption decisions are influenced by socio-economic characteristics of farmers such as education and land size (Nguyen et al. ; Beyene et al. ), biophysical factors such as irrigation (Olum et al. ), as well as psychological and behavioural factors such as risk attitude (Kahneman & Tversky ; Chavas & Nauges ) and time preference (Lawrance ; Liebenehm & Waibel ; Llewellyn & Brown ). Our study only looked at information that is transferred by word-of-mouth communication and thus does not consider dissemination via other information channels, such as observation or media (Perello-Moragues et al. ; Xu et al. ). Furthermore, networks may evolve over time as a result of knowledge transfer and interaction between farmers (Isaac et al. ; Wood et al. ). However, in our study, the network was assumed to be static. Our analysis focussed on information of low complexity that is transferable between farmers via single encounters as simple contagion. This implies that the validity of our results is restricted to this type of information.
At the same time, these limitations can stimulate further research. The role of alternative information sources may be considered for further investigation. Furthermore, a systematic testing of additional criteria such as diffusion centrality and combinations of criteria, strategies that consider individual characteristics like persuasiveness, and, in this context, the role of village heads and gender are worthy of note (Hinz et al. ; Banerjee et al. ). Future research should also assess inefficiencies that may occur in the transfer process, such as declining values of information, lack of incentives to pass information on, or individual resistance to knowledge transfer due to competitiveness between farmers. Therefore, it would be interesting to evaluate ways to enhance not only seed selection, but also the transfer process itself by testing different interventions and comparing their cost-effectiveness. The simultaneous consideration of several factors as prerequisites for adoption could increase our understanding of adoption decisions. In general, more empirical research is needed on comprehensive and system-level approaches to improve information dissemination (Aral et al. ; Barbuto et al. ).

Appendix B: ODD-protocol
The description of the ABM follows the ODD (Overview, Design concepts, Details) protocol as presented in the subsequent section (Grimm et al. , ). The model was implemented using NetLogo . . .

Overview
Purpose: The purpose of the model is to simulate the information diffusion by word-of-mouth communication between households in rural Zambia. Thereby, the model aims at identifying the set of seeds that enhances information spread the most. It addresses policy-makers, extension officers, and project managers to support them in spreading information within a social network effectively when resources limit the direct contact with all farmers.
Entities, state variables, and scales: The acting entities in the model are households represented by their household head. They correspond to the interviewed households in the study area. Households are defined by a variable describing their state of information. For the village heads, a variable indicating their hierarchical status is introduced. GPS data of the households provide input for the model environment, which reproduces the original village structure. Households can be connected via links. The links include variables indicating whether the respective households discuss agricultural matters and how often they exchange agricultural information. One simulation run includes steps and represents a year.
Process overview and scheduling: Each model run starts with the initialization procedure (see initialization). The model proceeds in weekly time steps. During each time step, the households can exchange information as illustrated in Figure : Each already informed household randomly picks another household to which it is connected through an information link. If the other household has not received the information yet, the acting household may inform it: the more frequently the two households discuss agriculture with each other according to the survey, the higher the likelihood of information transmission. In case of successful knowledge transmission, the newly informed household updates its status. At the end of each time step, plots and global variables are updated.
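The scheduling described above can be illustrated with a minimal Python sketch. It is not the NetLogo implementation used in the study: the function names `step` and `run`, the frequency labels, and the probabilities in `TRANSMISSION_PROB` are hypothetical placeholders chosen only for illustration, and 52 weekly steps per run is an assumption derived from "weekly time steps" and "a year".

```python
import random

# Hypothetical mapping from surveyed discussion frequency to a weekly
# transmission probability; the labels and values are illustrative assumptions.
TRANSMISSION_PROB = {"rarely": 0.1, "sometimes": 0.3, "often": 0.6}


def step(links, informed, rng):
    """Run one weekly time step; return the households informed in this step.

    links: dict mapping household -> {neighbour: discussion frequency label}
    informed: set of households already holding the information (mutated)
    """
    newly_informed = set()
    actors = sorted(informed)  # snapshot, so new recipients act next week
    rng.shuffle(actors)        # informed households act in random order
    for household in actors:
        neighbours = sorted(links.get(household, {}))
        if not neighbours:
            continue  # isolated household, nobody to talk to
        partner = rng.choice(neighbours)  # equal likelihood for each contact
        if partner in informed:
            continue  # partner already holds the information
        if rng.random() < TRANSMISSION_PROB[links[household][partner]]:
            informed.add(partner)
            newly_informed.add(partner)
    return newly_informed


def run(links, seeds, steps=52, seed=0):
    """Simulate one run; return the cumulative informed count per step."""
    rng = random.Random(seed)
    informed = set(seeds)
    counts = [len(informed)]
    for _ in range(steps):
        step(links, informed, rng)
        counts.append(len(informed))
    return counts


# Toy network: a chain a-b-c plus an isolated household d.
links = {
    "a": {"b": "often"},
    "b": {"a": "often", "c": "sometimes"},
    "c": {"b": "sometimes"},
    "d": {},
}
counts = run(links, seeds=["a"])
```

Because the actors are taken from a snapshot of the informed set, a household informed in a given week starts passing the information on only in the following week. This is one plausible reading of the schedule, not a confirmed detail of the original model.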

Design concepts
Basic principles: The model employs an empirical approach to simulate information exchange between households. The application of empirically grounded behavioural information based on the household census requires no further assumptions regarding a specific behavioural theory, and the model resembles reality. The information is assumed to be non-rival in consumption and to have low complexity so that one-time contacts between farming households suffice for successful transmission (simple contagion) (Centola ).
Emergence: System dynamics emerge from actions on the household level. The individual behaviour of households represented by the household heads in the model replicates the patterns observed in the survey coupled with some randomness. The network itself is static.
Adaptation: Decision rules are implicitly modelled based on the empirical data on information exchange from the census.

Objectives: The households do not follow an explicit objective; they act according to the behavioural data collected.
Sensing: Households know whether they have received the information, to whom they are linked, and with whom they discuss business. They are also aware whether their contacts have obtained the information.
Interaction: Interaction occurs every time step through direct word-of-mouth communication during which households can exchange information.
Stochasticity: In each time step, households choose other households to interact with in a random order. If a household is connected to several households, the choice of a household to exchange information with is random with equal likelihoods. Additionally, the probability of information exchange between two households depends on the frequency of discussion as indicated in the survey, but is also subject to stochasticity. Randomness can also occur during seed selection.

Observation: The model output includes data regarding the reach (total number of households informed) and speed (rate of diffusion, number of households informed monthly). Thereby, reach provides information about final spread; speed provides information about progress over time. The interface presents several graphic outputs that visualize the dynamics of the model.
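As an illustration of the two output measures, the following Python sketch derives reach and monthly speed from a series of cumulative informed counts recorded once per weekly step. The helper names `reach` and `monthly_speed` are hypothetical, and treating four weekly steps as one month is an assumption made here, not a detail taken from the model.

```python
def reach(counts):
    """Total number of households informed at the end of the run."""
    return counts[-1]


def monthly_speed(counts, steps_per_month=4):
    """Households newly informed per month, from cumulative weekly counts.

    counts[0] is the number of seeds; counts[t] is the cumulative number of
    informed households after t weekly steps. steps_per_month=4 is assumed.
    """
    monthly = counts[::steps_per_month]
    if (len(counts) - 1) % steps_per_month != 0:
        monthly.append(counts[-1])  # include the final partial month
    return [later - earlier for earlier, later in zip(monthly, monthly[1:])]


# Cumulative counts for eight weekly steps, starting from one seed.
example = [1, 2, 2, 3, 5, 5, 6, 6, 8]
final_reach = reach(example)    # 8 households informed in total
speeds = monthly_speed(example)  # [4, 3] newly informed per month
```

The monthly increments always sum to the reach minus the number of seeds, which gives a quick consistency check on recorded output.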

Details
Initialization: To initialize the model, households (agents), their characteristics (location, hierarchical status) as well as the links and their features describing the exchange of agricultural information are set up matching the survey data. Initially, all households set their status to "uninformed"; only the households selected as seeds set their state to "informed". Global variables are set to zero.
Input data: A household survey conducted in in Zambia provides the data used as input. The data includes household and network characteristics. GPS-data of the households spatially locate the households.
Interventions: Three scenarios are conducted to find the most eligible set of seeds for improved information diffusion within a social network.
1. The first scenario examines the impact of selection criteria for seeds on the diffusion process. Seeds are selected according to the following criteria: i) random, ii) hierarchy (village heads), iii) degree centrality, iv) closeness centrality, v) betweenness centrality, and vi) eigenvector centrality. If seeds are chosen according to a centrality measure, isolated households are rejected as seeds. For closeness centrality, the intra-component closeness of a household is calculated, which considers only the distances to the households that are part of the same component, since distance to households in other components is undefined. The closeness centrality of an isolated household is defined to be zero. While comparing these seed selection criteria, the number of seeds is kept constant at eight in the first scenario. This corresponds to approximately % of the whole population (Leeuwis ; Natcher et al. ). Because seven village heads administer the study area, in the case of hierarchy-based seed selection the number of seeds is held constant at seven.
2. To assess the effect of the number of seeds on the diffusion process, the number of seeds in the setup varies in the second scenario from two to by increments of two, which corresponds to up to % of the whole population (Bampo et al. ; Erlandsson et al. ). Thereby, the seed selection is random.
3. The third scenario investigates the interaction effects between the selection criteria of the seeds and number of seeds by simultaneously changing the seed set size and selection criteria as summarized in Table .

Parameter | Function | Options
Number-entry-points | Indicates the seed set size | From to ; increment by
Choosing-mechanism | Indicates seed selection criterion | Random; Degree centrality; Eigenvector centrality; Closeness centrality; Betweenness centrality; Hierarchy (village heads)
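The centrality-based selection rules can be sketched in Python as follows. This is a hedged illustration, not the study's NetLogo code: the function names are hypothetical, the closeness normalisation (reachable households minus one, divided by the sum of shortest-path distances within the component) is one common convention assumed here, and ties are broken alphabetically only to make the example deterministic.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path distances from source to every household it can reach."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adj[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def intra_component_closeness(adj, node):
    """Closeness using only distances inside the node's own component."""
    dist = bfs_distances(adj, node)
    total = sum(dist.values())
    if total == 0:
        return 0.0  # isolated household: closeness defined as zero
    return (len(dist) - 1) / total  # assumed normalisation


def degree(adj, node):
    """Number of direct information links of a household."""
    return len(adj[node])


def select_seeds(adj, k, score):
    """Return the k highest-scoring households, rejecting isolated ones."""
    candidates = [n for n in adj if adj[n]]
    # Highest score first; alphabetical tie-break is an assumption.
    ranked = sorted(candidates, key=lambda n: (-score(adj, n), str(n)))
    return ranked[:k]


# Toy network with one five-household component and an isolated household f.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a"},
    "c": {"a"},
    "d": {"a", "e"},
    "e": {"d"},
    "f": set(),
}
degree_seeds = select_seeds(adj, 2, degree)
closeness_seeds = select_seeds(adj, 1, intra_component_closeness)
```

In a network with several components, intra-component closeness can rank households in small components highly, because their distances are computed only within that component.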