Utility, Impact, Fashion and Lobbying: An Agent-Based Model of the Funding and Epistemic Landscape of Research

The paper presents an agent-based model of the evolution of research interests in a scientific community. The epistemic/funding landscape of research is divided into separate domains, which differ in their impact on society and their perceived utility, which may determine the public willingness to fund them. Scientific domains also differ in their potential for attention-grabbing, crucial discoveries, which make them fashionable and also attract funding. The scientists may 'follow' the availability of funds via a stylized grant-based scheme. The model includes possible effects of additional public relations and lobbying efforts, promoting certain disciplines at the cost of others. Results are based on two multi-parameter NetLogo models. The first uses an abstract, square lattice topology, and serves as a tool to understand the effects of the parameters describing the individual preferences. The second model, sharing the internal dynamics with the first one, is based on an actual map of research topics and project statistics, derived from the UK Research Council data for 2007–2016. Despite simplifications, the results reproduce characteristics of the British research community surprisingly well.


Introduction
Science as a social process has long been the topic of agent-based model (from now on, ABM) studies.
These models have covered many aspects of research organization and activities. For instance, a lot of attention has been paid to citation networks and bibliometric statistics (Watts & Gilbert ; Zhou et al. ). Discussing utility- and societal-impact-dependent funding decisions, we should remember that scientists are, by and large, very intelligent people. They are capable of influencing (not to say manipulating) the system. After all, senior researchers often serve on the very committees that decide the funding allocation, or, at the very least, on important advisory boards. Any discussion of science-related appropriations should, therefore, include the effects of lobbying, as well as the advantage that one research community or domain may get over others that are less effective in their lobbying efforts. This leads to situations where the amount of funding devoted to specific topics and technologies exceeds not only the short-term application revenues, but also long-term estimates of future benefits. In some cases, such over-funding is due to a natural interest in the topic; in others it is the result of successful lobbying (an example is readily provided by graphene research).
The decisions of individual scientists and research teams regarding the choice of topics are not based solely (or even predominantly) on funding availability. Curiosity, the desire to make a lasting contribution and the hope for recognition are just as important. It is therefore fortunate that not all research policies are decided with utility and impact in mind. Some research fields are still driven by the excitement due to breakthrough, paradigm-changing discoveries. Scientists are sometimes able to convince governments or companies that some domains of basic research, even those without a visible link to economic growth, are still worthy of increased attention and funds. The recent experimental discovery of gravitational waves may serve as an example. The 'fertility' of a research field, measured by the expectations for such attention-grabbing discoveries, is an important factor when young scientists choose their career paths, and weighs in decisions related to the distribution of funds, especially in basic science. There are numerous examples of research directions that gain popularity (and research jobs) because they are 'fashionable'. In some cases, these fashions, leading to the virtual exclusion of competing investigation routes, have attracted bitter criticism and heated discussions (for example in the case of string theory and its criticisms (Smolin ; Woit )).
The financial landscape, determined by the interplay of the forces described above (as well as others, not mentioned), is superimposed on the epistemic landscape (Ahrweiler et al. ). The concept is rather vague and may differ between 'large-scale' disciplines: the scope of a single domain in the epistemic landscape would likely be different for physics, biology or sociology. Weisberg & Muldoon ( ) have focused on a scale where a domain corresponds to a topic to which a specialized research conference or an advanced-level monograph might be devoted. While not always applicable, this is also a reasonable unit at which funding preferences can be recognized, so it provides a natural common ground for combining the financial interests and the research subjects. The connections between the domains are rather complex and multidimensional, involving the similarity of the conceptual base, research tools and equipment, the hierarchical dependence of the concepts, etc. Despite this complexity, the visual representation of the domain relationships as a simple two-dimensional map, on which some of them are close to each other while others are more distant, has the obvious advantage of easy, if not accurate, understanding. It allows one to visualize the domain characteristics, including the amount of funds available, the number of scientists working on the related topics, etc., so it is most useful for modeling purposes.

Looking at these complex problems also requires considering the individual decisions of scientists, who are typically faced with the challenge of combining the desire to conduct meaningful and interesting research with the necessity of securing funds. The diminishing 'success ratio' observed in most funding programmes, with the percentage of funded proposals sometimes falling to single digits, forces researchers to make tough decisions. Should they continue to try, yet again, to submit proposals in their current domain, or should they change direction, learn new skills and move to a 'neighbouring' one, which might offer better chances of success? Because the effects of these individual decisions take some time before bearing fruit, they may lead to all kinds of phenomena found in classical studies of population dynamics, such as cyclical changes in the 'population' of research disciplines or overgrazing effects. Our goal is to introduce a model that is simple enough to allow a meaningful play with the parameters describing the various aspects of the processes described above, yet rich enough to include the most important factors.
We will first present the groundwork for an application of the dynamics tested in the conceptual model, and then turn to a realistic research landscape and discuss the remaining requirements for a full-scale model.

Abstract research domains
As noted in the introduction, the current model is built from two core components: research domains and individual scientists. In contrast to models focusing on the creation of separate research fields due to the underlying similarities and differences of the studied topics, such as the classical model of Gilbert with his concept of 'kenes' (Gilbert ), we assume the existence of a certain number of related but distinct domains, located on a simple square grid on a 2D surface. The topics and research tools of neighboring domains allow a scientist to move from one to a neighbour (Moore neighbourhood) with minimal disruption of the research career (we assume that one year has to pass before the scientist can successfully apply for a research grant in the new domain).
Each domain D is characterized, at time t, by its capacity to attract funding V_TOT(D, t) (the research potential), which is a sum of three components. The first is the domain utility, or impact on economic or societal needs, V_U(D).
To simplify the situation we assume that V_U(D) is constant over time, an assumption that is obviously not true for longer time periods. The importance and the economic impact of many research fields change over time, increasing or decreasing in accordance with social needs, external technological advances enabling progress, and political factors such as wars. We note that technically it is possible to extend the model to any form of time dependence of the parameters, but these would probably differ between disciplines and social environments. The values of V_U(D) are drawn randomly from a uniform distribution ranging between V_U^min and V_U^max.
The second component of the research potential at time t is due to the effects of the crucial discoveries made in the domain D, V_CD(D, t). Each domain 'contains' a finite number of such discoveries still to be made, L_cruc(D, t), initially distributed uniformly at random between zero and L_cruc^max. The initial value L_cruc(D, 0) is 'unknown' to both the scientists and the funding agency. The decisions are based on the actual number of such breakthrough discoveries made in each year, L_cruc^act(D, t).
The two components of the research potential, V_U(D) and V_CD(D, t), are treated as independent. This reflects a dual vision of the motivation to conduct and support research. Curiosity-driven research is determined by V_CD(D, t), via the chances of making significant discoveries, as perceived by the scientists. On the other hand, the utility potential V_U(D) corresponds to the public and governmental perception of the contribution that the discipline D can bring to the economy or to societal well-being. Various combinations of high/low values of V_U(D) and V_CD(D, t) are possible. For example, astrophysics or archeology belong to the class of curiosity-driven, low-utility-value topics. Technical studies of machine manufacturing or soil science are mainly utility driven. Modern biology and biochemistry offer examples of disciplines where there are many fundamental discoveries and, at the same time, important contributions to medicine.
Within the model, the value of V_CD(D, t) is determined by the following process. Once the number of crucial discoveries made in a given year, L_cruc^act(D, t), passes a threshold value L_cruc^T, the domain 'grabs the attention' of the research community (and the funding agencies) and V_CD(D, t) jumps from zero to T_F · V_CD, where V_CD is an adjustable model parameter and T_F denotes the fashion multiplier. The fame of the domain is not permanent, however. A fashionable patch will decrease its appeal if the number of crucial discoveries made in a year falls below L_cruc^T − 2; in such a case the value of T_F is decreased by one. Thus, a single 'miraculous' year with no repeated significant advances can 'ensure' only a linearly diminishing appeal, lasting at most T_F years. On the other hand, if L_cruc^act(D, t) ≥ L_cruc^T − 2, the decay process is stopped, and if L_cruc^act(D, t) once again passes the L_cruc^T value, the fashion multiplier is reset to its maximum value. Thus, truly 'fruitful' disciplines have a lasting advantage.
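The fashion-multiplier bookkeeping described above can be sketched as follows. The function and constant names (`update_fashion`, `v_cd`, `T_F_MAX`, `L_CRUC_T`, `V_CD_BASE`) and all numeric values are our own illustrations, not parameter values from the paper.

```python
T_F_MAX = 5        # maximum fashion multiplier T_F (illustrative value)
L_CRUC_T = 3       # yearly crucial-discovery threshold L_cruc^T (illustrative)
V_CD_BASE = 10.0   # the adjustable model parameter V_CD (illustrative)

def update_fashion(t_f, l_cruc_act):
    """One yearly update of the fashion multiplier T_F for a domain.

    t_f        -- current multiplier (0 means 'not fashionable')
    l_cruc_act -- number of crucial discoveries made this year
    """
    if l_cruc_act >= L_CRUC_T:
        return T_F_MAX             # attention grabbed, or fame reset to maximum
    if l_cruc_act >= L_CRUC_T - 2:
        return t_f                 # decay process is stopped
    return max(t_f - 1, 0)         # linear decay after a 'quiet' year

def v_cd(t_f):
    """Fashion component of the research potential, V_CD(D, t) = T_F * V_CD."""
    return t_f * V_CD_BASE
```

A single successful year thus yields at most `T_F_MAX` years of slowly fading appeal, exactly as described in the text.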
The third component of the research potential is due to the effects of lobbying. The lobbying process works as follows. If the number of senior scientists N_S(D, t) in the domain is greater than a clique size threshold N_C and, at the same time, at least one breakthrough discovery has been made (so that the lobbyists have a 'story' to narrate), then with a probability p_L the domain may achieve a special status. In such a case, there is an additional component of the research potential, V_L(D, t), which is set to V_L, another model parameter. For domains without the special status, V_L(D, t) = 0. As with fashion, the special status obtained via lobbying is not permanent. It vanishes if the number of senior scientists falls below a minimum clique size N_C^min, or if more than T_0 years pass without any crucial discovery being made. In both cases, the modelled behaviour aims at reproducing the necessary conditions for a clique to successfully portray the domain as an important and active one. The total research potential is thus V_TOT(D, t) = V_U(D) + V_CD(D, t) + V_L(D, t). In addition to replacements, there are new agents entering the system, so that the total number grows exponentially, N_TOT(t) = N_TOT(0)(1 + r_S)^t, with the growth rate r_S being a model parameter.
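A minimal sketch of the lobbying rules and of the resulting total potential, assuming the conditions exactly as stated above; the function names and the numeric defaults (`n_c`, `n_c_min`, `t_0`, `p_l`) are illustrative assumptions.

```python
import random

def update_lobby_status(active, n_senior, l_cruc_act, years_quiet,
                        n_c=10, n_c_min=5, t_0=3, p_l=0.25, rng=random):
    """Yearly update of a domain's lobbying 'special status'.

    active      -- whether the domain currently holds the status
    n_senior    -- N_S(D, t), senior scientists in the domain
    l_cruc_act  -- crucial discoveries made this year (the 'story')
    years_quiet -- years since the last crucial discovery
    """
    if active:
        # status vanishes if the clique shrinks or the domain stays quiet
        return n_senior >= n_c_min and years_quiet <= t_0
    # a large enough clique with a 'story' may win the status with prob. p_L
    return n_senior > n_c and l_cruc_act >= 1 and rng.random() < p_l

def v_tot(v_u, v_cd, v_l):
    """Total research potential: V_TOT(D, t) = V_U(D) + V_CD(D, t) + V_L(D, t)."""
    return v_u + v_cd + v_l
```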
The new agents and the replacements are distributed among the patches in the following way. A certain fraction f_F follows the combined effects of utility, fashion, and lobbying, and is distributed randomly among the ten domains with the highest research potential V_TOT(D, t). The remaining ones do not look for the research potential maxima but are, instead, distributed randomly among all domains where there is at least one senior scientist. The two classes correspond to the 'active' and 'passive' career movement of young researchers. The active ones are those who evaluate the funding chances and the current fashions, while the passive ones rely on the guidance of their tutors and PhD advisers, and follow the local interests of their universities.

Discovery process
The research process that leads to discoveries is relatively simple in our model. At any time, all agents have a chance of making a discovery. The word should be understood very broadly, and we divide the discoveries into two classes: ordinary ones (leading to 'normal', low-impact papers) and the breakthrough, crucial discoveries, leading to a large number of citations and a boost both to the individual career and to the interest in the research topic. There are several possible situations.
• A junior agent with funding makes a discovery with a probability of p_norm.
• A senior agent with funding makes a discovery with a probability increased by a factor k_S > 1; thus the probability becomes k_S · p_norm.
• For any agent (senior or junior) working without funding, the probability of making a discovery is decreased by a factor k_U < 1.
• The chance that a discovery is a crucial one is given by L_cruc(D, t)/L_normalize, where the denominator L_normalize is greater than the maximum possible number of crucial discoveries within a domain, L_cruc^max, introduced above. The remaining discoveries are considered ordinary.
Each time a crucial discovery is made in domain D, the remaining number L_cruc(D, t) is decreased by one. Thus the chances of making a breakthrough discovery diminish in time.
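The discovery rules listed above condense into a single per-agent routine; all names and numeric values here are illustrative, and the decrement of L_cruc(D, t) is left to the caller.

```python
import random

P_NORM = 0.1       # base yearly discovery probability p_norm (illustrative)
K_S = 2.0          # seniority boost factor, k_S > 1 (illustrative)
K_U = 0.5          # unfunded penalty factor, k_U < 1 (illustrative)
L_NORMALIZE = 50   # chosen > L_cruc^max, so the crucial chance stays below 1

def attempt_discovery(senior, funded, l_cruc_remaining, rng=random):
    """One agent-year of research: returns 'crucial', 'ordinary' or None."""
    p = P_NORM
    if senior:
        p *= K_S
    if not funded:
        p *= K_U
    if rng.random() >= p:
        return None
    # a discovery happened; decide whether it is a crucial one
    if rng.random() < l_cruc_remaining / L_NORMALIZE:
        return 'crucial'           # the caller then decrements L_cruc(D, t)
    return 'ordinary'
```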

Funding distribution
The model assumes an exponentially growing amount of available funding, given by M_TOT(t) = M_TOT(0)(1 + r_M)^t, with M_TOT(0) defined in 'grant units', each sufficient to fund a scientist for a period of two years (the grant duration), and r_M being another model parameter.
This amount is divided among the patches according to an allocation equation combining the number of scientists present and the research potential. Domains with no scientists receive no funds. Domains with the same number of scientists may receive different amounts, due to differences in V_TOT(D, t), which means that the success ratio in obtaining funding will also differ between them.
Once funding is distributed among the domains, agents may submit research proposals. The proposals are submitted by 'eligible' scientists, i.e. those who are neither in the midst of running a project nor have just migrated to a new domain and are 'learning new tricks' (see above). The model does not distinguish between 'good' and 'bad' proposals, nor between individual success histories, and assumes random award distribution between applicants of the same type (junior/senior). Therefore, there is no provision for quality improvement, grantsmanship or learning, with the sole exception that senior agents have their chances increased by a 'seniority advantage factor' Z_S (Z_S > 1). If we denote the numbers of eligible senior and junior agents in the domain D by N_S,E(D, t) and N_J,E(D, t), the probability of a senior scientist getting a grant is Z_S times that of a junior one. We remind the reader that it is possible to conduct research without funding, albeit with lesser chances of success.
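The explicit award-probability formulas are not reproduced above. One consistent reading, sketched here purely as an assumption, treats each grant unit as a weighted lottery in which an eligible senior applicant carries weight Z_S and a junior one carries weight 1; `award_probabilities` and the value of `Z_S` are ours.

```python
Z_S = 2.0   # seniority advantage factor, Z_S > 1 (illustrative value)

def award_probabilities(n_grants, n_senior_eligible, n_junior_eligible):
    """Per-applicant award probabilities under a weighted-lottery reading.

    n_grants -- number of grant units allocated to the domain this year.
    Returns (p_senior, p_junior); the senior/junior ratio equals Z_S.
    """
    weight = Z_S * n_senior_eligible + n_junior_eligible
    if weight == 0:
        return 0.0, 0.0
    p_senior = min(1.0, n_grants * Z_S / weight)
    p_junior = min(1.0, n_grants * 1.0 / weight)
    return p_senior, p_junior
```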

Agent migration
As already noted, unlucky agents who do not get their proposals approved for more than Y_Q years are forced to quit research. But even before this happens, the agents may change their 'strategy'. Some (chosen among the unfunded ones with probability 1 − p_M) remain in the same domain and re-apply for funding the next year. Others (with probability p_M) decide to migrate to a neighbouring domain, facing the loss of a year in applying for funding due to the necessary training, but perhaps facing decreased competition later on.
As in the case of new agents, migrating agents may either follow their evaluation of the chances offered by other disciplines or simply move randomly. A fraction f_F of migrants picks the neighbouring domain with the highest research potential, while the remaining ones pick a random neighbouring domain, including 'empty' ones.

Time evolution and starting conditions
The time evolution happens in discrete steps, each corresponding to a year. Each year the following actions are performed:
1. Updating the domain characteristics, in particular the research potential, and allocation of funds
2. Distribution of grants among the scientists
3. Conducting the research
4. Recalculation of the fashions among the domains according to the discoveries
5. Migration of unlucky scientists, ageing and retirement
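The yearly schedule can be summarised as an ordered pipeline of stages; the stage functions below are mere placeholders that record the order of execution, not the model's actual implementation.

```python
# A minimal skeleton of the yearly update cycle described in the text;
# each stage function is a placeholder recording the order of execution.
def make_world():
    return {"log": []}

def update_domains(world):
    world["log"].append("update potentials & funds")

def distribute_grants(world):
    world["log"].append("distribute grants")

def conduct_research(world):
    world["log"].append("conduct research")

def update_fashions(world):
    world["log"].append("recalculate fashions")

def migrate_and_retire(world):
    world["log"].append("migrate, age, retire")

STAGES = [update_domains, distribute_grants, conduct_research,
          update_fashions, migrate_and_retire]

def yearly_step(world):
    """One tick corresponds to one year."""
    for stage in STAGES:
        stage(world)
```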
For the simulations reported in this work, we have chosen a specific form of the initial conditions, designed to mimic an expanding research field in which there are many 'virgin' research domains. At time t = 0, agents are randomly distributed within a central patch of the epistemic landscape (covering about 1/4 of its total area). We also assume that the starting age distribution is uniform. The latter choice most likely overestimates the number of senior scientists, but as time passes it quickly evolves into a pattern in which the number of scientists decreases with age.

Model implementation
The model architecture fits perfectly within the conceptual framework of the NetLogo modelling environment (Wilensky ). The two-dimensional landscape of research domains corresponds directly to NetLogo patches, the agents representing the scientists to NetLogo turtles, and ticks allow an efficient description of the time evolution. This has made the implementation of the model relatively quick. Moreover, the flexibility of the language makes further extensions possible. The NetLogo code of the model will be made publicly available.

The number of parameters of the model and the flexibility in choosing their values allow simulating a broad range of situations. Depending on the relative values of the parameters determining the relative size of the components of the research potential, V_U(D), V_CD(D, t) and V_L(D, t), the agents group in different domains, in accordance with the natural intuition of 'following the money'.
Figure compares the situation in year and year for a rather small landscape. The bright squares indicate fashionable domains, while the magenta ones indicate domains successfully lobbied for. These domains attract significantly larger numbers of agents than the neighboring ones.

Figures – present examples of results obtained for a much larger system, after years of evolution, with higher growth rates for both funding and the number of scientists, r_M = 0.006 and r_S = 0.02.

Towards a Realistic Model of Science
Empirical background for a refined model

The open nature of the model makes it possible to consider developments that would make it more realistic. Some of these possible enhancements focus on the dynamics of the system. The first such direction would be the inclusion of a 'PhD/postdoc slavery' stage for very junior scientists, in which their choices and work are related to (dictated by) their senior partners. The second direction would be to change the assumption of a fixed number of crucial discoveries per discipline. The assumption has the advantage of being simple to program. However, it implies that once a research field exhausts all its crucial discoveries, it becomes 'boring' forever after. Yet it is quite frequent that advances in other disciplines (new mathematical techniques, new instrumentation, interdisciplinary ideas) may 'revive' a field previously considered fully explored. It is possible to rewrite the model into an open-ended configuration, in which the probability of making such discoveries differs between disciplines and diminishes in time, but remains higher than zero. While these improvements are possible, our attention is drawn to a more fundamental issue: a realistic description of the epistemic landscape.
Our focus in the remaining part of the paper is on a different, much more fundamental model characteristic: the change from the abstract square grid topology to a network of correlated research topics/subdisciplines, based on the real-world connections between them. A topology based on similarities between research topics allows us to describe the 'migratory' patterns of the researchers better: changing one's research focus is much easier within a domain (e.g. between landscape and environmental archaeology and prehistoric archaeology, or between astronomy and space technology and solar system physics) than between, say, quantum gravity and general relativity and neurobiology (even though such distant moves are also sometimes viable). Such networks and science maps are already studied in the literature. There are multiple ways of determining which topics may be grouped together. The simplest, fixed classifications are based on the traditional macro-disciplines (physics, chemistry, biology, . . . ) and their subdivisions. More advanced methods are based on the automated identification of topics and their linkage via text analysis of publications, citation networks, and similar scientometric approaches.

For our current analysis we have chosen the data provided by the UK Gateway to Research website (http://gtr.rcuk.ac.uk/), which contains a wealth of information on the research projects funded by the UK Research Councils (RCUK), stretching back to . The categories defining the research disciplines and subdisciplines are based on traditional classifications, so the dataset does not automatically recognize new or changed research topics by yielding a dynamic taxonomy (Klavans & Boyack ). However, the network connecting the categories is constructed using the declarations of the proposal authors defining their projects. Specifically, the project proposals contain keywords indicating the relevant subdisciplines. The connections between the topics result, therefore, directly from the way the actual research is done, sometimes with surprising, interdisciplinary links. Each project listing research fields A and B strengthens the network link between these two fields; as a result, we obtain a weighted network connecting subdisciplines. There are links (with weights between and ); the resulting network is relatively dense, with a graph density of . We should note here that the GtR dataset contains only the successful projects, i.e. those that actually received funding. The only way of estimating the characteristics of the 'invisible' part of the research community's activities (projects that did not receive funding) is via the average success ratios published by the six research Councils (http://www.rcuk.ac.uk/research/efficiency/successrates/), which are much less detailed.
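The construction of the weighted co-listing network can be sketched in a few lines; `build_topic_network`, `graph_density` and the sample field labels are our own illustrations, not part of the GtR tooling.

```python
from collections import Counter
from itertools import combinations

def build_topic_network(projects):
    """Weighted subdiscipline network from project keyword lists.

    projects -- iterable of lists of research-field labels, one per project;
    every pair of fields co-listed on a project strengthens their link by 1.
    """
    weights = Counter()
    for fields in projects:
        for a, b in combinations(sorted(set(fields)), 2):
            weights[(a, b)] += 1
    return weights

def graph_density(weights, n_nodes):
    """Fraction of all possible undirected links that actually occur."""
    return 2 * len(weights) / (n_nodes * (n_nodes - 1))

projects = [["astronomy", "space technology"],
            ["astronomy", "solar system physics", "space technology"]]
net = build_topic_network(projects)
```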

Figures and present the resulting data.
Unfortunately, despite the relative richness of the information provided by the RCUK (compared with some other funding agencies), quite a lot of data is still missing or impossible to obtain using automated tools. Among the unknowns are the numbers of scientists working on each project (the data contains only the principal investigator's name) and their age/seniority distribution. Monitoring the potential 'movements' of particular scientists between the research fields, even if we limit ourselves to the principal investigators, would require manual work on over thirty thousand successful projects, with no way to monitor the unsuccessful proposals. Nevertheless, the GtR dataset seems well suited to provide the basis for a more realistic ABM of the dynamics of the research community.

Groundwork for the ABM based on the GtR discipline network
Discipline network topology
Using the empirical network of the disciplines and subdisciplines defined by the GtR data instead of the abstract square grid is the easiest part of the transition from a simplistic, conceptual domain to a semi-realistic setting.
The import of the topology and characteristics of the research fields and their relationships is done using the graphml format. The movements of agents who are unsuccessful in getting funding in their current field are now restricted to the domains directly linked with the current subdiscipline, with the probability of choosing a destination driven by the popularity of the neighbouring subdisciplines.

There is a notable shift in the funding from basic science to domains better connected with industrial applications, for example to material characterization, biosciences and computing. At the same time, high energy physics/astrophysics (violet) has suffered a significant reduction in funding, much more visible than in the case of the number of projects.
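A popularity-weighted choice among directly linked subdisciplines might look as follows; `pick_destination` and its arguments are assumed names, and the uniform fallback for all-empty neighbourhoods is our own simplification.

```python
import random

def pick_destination(neighbours, popularity, rng=random):
    """Choose a migration target among directly linked subdisciplines,
    weighting each neighbour by its popularity (e.g. number of scientists).

    neighbours -- list of subdiscipline ids linked to the current one
    popularity -- dict mapping subdiscipline id -> non-negative weight
    """
    weights = [popularity.get(n, 0) for n in neighbours]
    if sum(weights) == 0:
        return rng.choice(neighbours)   # all neighbours empty: pick uniformly
    return rng.choices(neighbours, weights=weights, k=1)[0]
```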

We note here that the topology of the network is assumed to be fixed: neither new disciplines nor new links are created during the simulation runs. The dynamics of the system is thus limited to the numbers of agents working on specific subjects, their movements between research topics, and the related numbers of crucial and normal discoveries, funding allocation changes, etc., determined exactly in the same way as for the square lattice.
Figure presents a snapshot of the simulation program based on the GtR data. The size of the nodes corresponds to the number of projects. The colour indicates the funding: the lighter the colour, the higher the average funding received during the period.

Initial conditions
The square lattice topology model was not intended to resemble either the actual landscape of research or realistic starting conditions. The aim of the initial random distribution of the agents in the central group of patches (Figure , left panel) was to emphasize the existence of the 'unexplored' part of the landscape. The situation is quite different in the case of the model based on the network of actual subdisciplines: the use of unrealistic initial conditions would destroy the gains due to the more realistic landscape description. For this reason, the preparation of the starting configuration must be considered very carefully. Moreover, it requires a considerable amount of data mining and guesswork.
The first challenge is the initial distribution of scientist-agents between domains. As the GtR landscape is constructed from research projects, which typically list only one or two principal investigators, the number of projects in a subdiscipline is only a poor proxy for the number of scientists working in the field (and, as we have noted, covers only the successful ones). There are systematic differences in the typical number of scientists actually working on research projects in various disciplines. This may vary from thousands (typical for advanced experiments in elementary particle physics), through mid-sized teams in biology or medical sciences, to single scientists, found in some humanities.

The lighter the colour of a node, the higher the social utility of the topic, calculated as the average of the funding in the period.
One could be tempted to use the amounts of money provided to each project as an estimate of its size (assuming standardized yearly researcher costs). However, such an approach requires knowing which part of the project budget is devoted to researchers, as opposed to infrastructure, equipment, materials and other non-personnel costs. These parts differ considerably not only between subdisciplines (e.g. between astrophysics and psycholinguistics) but also within a discipline (for example, when one project uses equipment already paid for while another includes, in its budget, some new and expensive infrastructure). Taking into account the number of projects in the GtR database, the task of realistically estimating the starting numbers of scientists for each subdiscipline would require extensive data analysis. Even if successful, the information would cover only the scientists who succeeded in their grant applications, who, depending on the year and discipline, are a small part of the whole community.
A solution to the problem, lacking finesse but useful as a starting point for the model development, is to use a simple rule: the initial number of scientists in each domain is equal to the number of projects devoted to this domain in one of the early years of the GtR database, corrected by a success ratio coefficient. A further simplification is provided by the use of a single success ratio for all research disciplines. Such an approach allows an effective, if not deeply realistic, initial distribution of the scientist-agents.
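The head-count rule amounts to a one-step correction; the function name and the default `success_ratio` of 0.25 are purely illustrative, and the single global ratio is the simplification discussed above.

```python
def initial_scientists(projects_per_domain, success_ratio=0.25):
    """Rough starting head-count per domain: projects in an early year of
    the GtR database, divided by a single global success ratio.
    Domains with no projects start empty."""
    return {d: max(1, round(n / success_ratio))
            for d, n in projects_per_domain.items() if n > 0}
```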
The next challenge is related to the estimation of the societal and economic utility of each research subdiscipline. The square lattice model simply assumed a random distribution of the V_U(D) values, constant in time. In a fully realistic model such a random distribution is obviously not applicable. Some disciplines are considered to provide greater value to society and, therefore, to be worthy of greater funding. This value changes in time, not only thanks to the crucial discoveries (included in the model) but also due to other factors, such as changing societal needs (e.g. the need to solve specific problems in the health and defense industries, or to respond to climate change). The latter changes are not modeled in our approach.
The proposed solution is to use the actual amounts of funding devoted to each subdiscipline, averaged over the period covered by the data, as measures of the societal utility. These values are then re-scaled to the model parameter range. The total research potential is then given, as before, by the sum of the utility, fashion and lobbying components. For a future version of the model, the baseline V_U(D) values could be made time dependent.
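The re-scaling step admits a straightforward min-max sketch; the target range defaults (`v_u_min`, `v_u_max`) are illustrative, not the paper's parameter values.

```python
def rescale_utilities(avg_funding, v_u_min=1.0, v_u_max=10.0):
    """Min-max rescale the average funding per subdiscipline onto the
    model's V_U parameter range [v_u_min, v_u_max]."""
    lo, hi = min(avg_funding.values()), max(avg_funding.values())
    if hi == lo:
        # degenerate case: all subdisciplines equally funded
        return {d: (v_u_min + v_u_max) / 2 for d in avg_funding}
    span = v_u_max - v_u_min
    return {d: v_u_min + span * (f - lo) / (hi - lo)
            for d, f in avg_funding.items()}
```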

Initial results
The model has a significant number of adjustable parameters: the probabilities of certain events (such as a grant proposal being funded, an agent's decision to change the research topic, or normal and crucial discoveries being made) and the general settings, such as the average success ratio, the number of scientists in the system, the fashion benefit due to crucial discoveries and lobbying, etc. With a very large number of topics and projects, and the reliance on guesswork in the choice of the parameters, it is not reasonable to expect a faithful reproduction of the evolution of the popularity of individual topics.

While the distributions for the most popular topics are almost indistinguishable between the early and late periods, the distributions for topics with to projects differ significantly. The black line is the power-law fit for the most popular topics, with an exponent of −2.54 ± 0.5, similar for the three periods. The choice of the particular division of the timespan into three periods is explained by the inset figure, which shows the number of topics in a given year that have attracted fewer than one project. A rapid change happening around the transition year indicates a major change in the funding system, and separates two regimes: one favouring the concentration of grants and the neglect of 'obscure' subdisciplines, and one in which these less popular subjects are preferred.

At the current stage, the model does, however, allow us to reproduce certain statistical properties of the UK science funding landscape. Once the basic network structure and the relative initial popularity of the subdisciplines are reproduced (see Figure ), we were able to track the time dependence of the statistical distribution of the popularity of the topics (measured by the number of grants).
Figure presents the actual statistical data as derived from the GtR dataset. The inset of the figure indicates that the timespan divides into two very distinct periods. The first one, until , was characterized by a relatively large number of subdisciplines which had less than one active project. These 'forgotten' or 'niche' topics constituted almost % of the subdisciplines. Beginning in , the number of forgotten topics became much smaller, in the %-% range.

When we average the distribution of the number of projects per subdiscipline (main part of Figure ), we also observe significant differences between these two periods. The distributions for the most popular topics (attracting more than projects) remain almost unchanged in time. They decrease in accordance with a power law, with an exponent of −2.54 ± 0.5. On the other hand, the values for the medium-popularity subjects (attracting between and projects) are significantly different for the two periods. In this context, it may be recognized as a transition year.
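The log-binning of the popularity distribution and the power-law fit of its tail can be sketched as follows; this is a minimal illustration of the procedure, not the authors' actual fitting code.

```python
import math

def log_binned_counts(projects_per_topic):
    # Histogram of topic popularity with logarithmic (base-2) bins:
    # bin k collects topics whose project count n satisfies
    # 2**k <= n < 2**(k+1).
    bins = {}
    for n in projects_per_topic:
        if n > 0:
            k = int(math.log2(n))
            bins[k] = bins.get(k, 0) + 1
    return bins

def powerlaw_exponent(xs, ys):
    # Least-squares slope of log(y) versus log(x), i.e. the exponent a
    # in y ~ x**a, as applied here to the high-popularity tail.
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den
```

Log-binning smooths the sparse counts at high popularity, and the straight-line fit in log-log coordinates yields the quoted exponents.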

While there is no direct 'smoking gun' evidence for a single decisive action in the UK research funding practice around - which would lead to such a step-wise decrease in the statistics of the 'forgotten' topics, there are some possible connections. First, most of these subdisciplines were new. Their creation (invention) might be due to the desire to fit within the new, multidisciplinary focus. A candidate cause is the change in the British industrial strategy introduced in , which led to the 'Eight great technologies' initiative (http://www.stfc.ac.uk/research/engineering-and-enabling-technologies/the-eight-great-technologies/). The technologies that are the focus of the initiative (advanced materials, agricultural science, big data, energy storage, regenerative medicine, robotics and autonomous systems, satellites and commercial applications of space, and synthetic biology) force a view of science different from the traditional domains (such as physics, chemistry, sociology, etc.). It is reasonable to expect that, to address the changed preferences in the funding programmes, proposal authors would tend to list more diverse topics in their applications, leading to the increase in the number of 'active' topics observed since .
Despite the crudeness of the current version of the model, it has been possible, via a careful choice of the model parameters, to reproduce the observed characteristics of the UK science landscape. Figure presents the results of a simulation corresponding to the statistical properties of the empirical data. Most parameters are chosen to reproduce the initial distribution of the topic popularity. The transition in is simulated by a change of a single parameter: the relative attention that migrating agents pay to 'empty' topics, i.e. topics with no other agent present, in their search for a more profitable or interesting field. The model reproduces quite well the general properties of the real system: the different behaviours of the low, medium and high popularity topics, and the time dependence of the number of niche disciplines. Even the exponent of the power law describing the highest popularity subjects, equal to −2.44 ± 0.2, is very close to the observed one. The ease with which such a simple model modification leads to behaviour closely matching the empirical observations was surprising and encouraging at the same time.
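The single-parameter mechanism described above, scaling the attractiveness of 'empty' topics during migration, can be sketched as a weighted random choice; all names and the exact weighting scheme are illustrative assumptions.

```python
import random

def choose_topic(rng, potentials, populations, empty_weight):
    # Pick a destination topic for a migrating agent. Each topic's
    # weight is its research potential; topics with no agents are
    # additionally scaled by `empty_weight`. Lowering `empty_weight`
    # starves niche topics; raising it repopulates them (the single
    # knob varied to mimic the observed transition).
    weights = [p * (empty_weight if populations[t] == 0 else 1.0)
               for t, p in enumerate(potentials)]
    total = sum(weights)
    r = rng.random() * total
    acc = 0.0
    for t, w in enumerate(weights):
        acc += w
        if r <= acc:
            return t
    return len(weights) - 1
```

With `empty_weight = 0` agents never enter unpopulated topics, concentrating grants; with a larger value the 'forgotten' topics regain migrants.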

Conclusions
Science as a social process is incredibly complex. Discoveries are unpredictable, especially the crucial ones opening new avenues of research, because science is an experimental endeavour that deals with the unknown. The individual decisions of scientists regarding the topics of their research are driven by varying combinations of curiosity, skills, risk acceptance or aversion, financial interests, obedience, etc. On top of the individual differences, there are many levels of social influence, from the role of senior scientists (spanning the range from gentle advisor to 'postdoc-slave' owner), through institutional hierarchies, up to the political decisions shaping the general research funding policies.
As with similarly complex social phenomena, ABMs can only aim at a partial and statistical description of the observed characteristics. Our goal was to construct a model that would take into account some aspects of individual behaviour and describe the resulting trends in researcher mobility (defined as the change of research discipline), success rate and popularity of research topics. The square-lattice ('toy') model allows us to focus on the effects of the parameters representing individual behaviours (such as the tendency to follow fashion, productivity differences between senior and junior scientists, etc.) and of the parameters statistically describing the research landscape (e.g. the relative numbers of crucial discoveries or the funding distribution due to societal needs). The abstract nature of the epistemic landscape prohibits any realistic comparison of the model results with the real world, but the model dynamics is 'reasonable', showing variations of the popularity of the subdisciplines comparable to those recorded by funding agencies in many countries.
The more realistic model presented here, using mostly the same parameters as the toy version but based on the actual research topic list and project funding data from the UK Research Council, could become a reasonable basis for further work that would capture the contextual factors more precisely. The planned future developments focus on more realistic initial conditions and a more careful treatment of research teams and individual scientists, for example via the use of bibliometric data on the publications resulting from the projects.
In looking into future directions it is quite illuminating to refer to two editorial reviews accompanying collections of papers devoted to ABM of science (Edmonds et al. ; Börner et al. ). During the six years separating the reviews, the modelling community has made significant progress, especially by exploiting the improved data availability. Interestingly, both editorials stress the importance of the potential predictive use of science models. Achieving the goal of using ABM in data-driven, informed decision and policy making related to research depends on the completeness of the models and on the quality of the data on which they are based. We are not there yet.

The path envisaged in our approach, combining the individual scientist's motivation perspective with large-scale social phenomenology (funding priorities, industrial use...), may be a step in the direction of predictive models. Such models, including factors that come from outside of science's own perspective, become even more important in the light of financial austerity and a decreasing understanding of the role of science in society. Recent drastic budget cuts in Argentina (Kornblihtt ) and Brazil (Escobar ) show how fragile the system may be.

Figure : Example of the graphical representation of a simulation run of the 'toy' model, for a relatively small (11 × 11) epistemic/funding landscape. Left panel: year of the simulation; right panel: year . Patch brightness corresponds to increased values of the research potential of the domain. Very bright patches are domains that benefit from being fashionable and thus stand out against the others. The magenta patches are those for which the lobbying campaign has been successful. Green humanoid figures represent junior scientists without funding, blue ones junior agents with grants, orange symbols senior scientists without funding, and red ones seniors with grants.
Figure shows the exponential dependence of the number of discoveries made by the scientists. After a sufficiently long time the domain funding follows an approximately power-law distribution and the scientists' age an exponentially decreasing one, while the relation between the average number of scientists in a domain and its research potential may be approximated by a shifted power law (Figure ).

Figure : An example of the large landscape after years of evolution and a rather sparse population. This particular simulation instance does not include (at the specified time step) any 'special' domains, neither fashionable nor successfully lobbied for. The largest populations of scientists are found in the domains with the highest values of the utility component of the research potential V_U(D), shown in light colour.

Figure : Distribution of the number of discoveries per (living) agent for the system and year as in Figure . Black markers: all discoveries; red points: crucial discoveries. The vertical axis is logarithmic, to emphasize the roughly exponentially decreasing partial distribution functions, which show as straight lines.

Figure : Some statistical properties of the system shown in Figure . Panel A: age distribution of the scientists, together with an exponential fit. Panel B: distribution of the funds among the domains, together with a power-law fit. Panel C: scatter plot of the number of scientists as a function of the domain funding, together with a shifted power-law fit. Panel D: scatter plot of the success ratio as a function of the research potential.

Figure : Examples of the time evolution of the scientist population in specific research domains. Red lines (right vertical axis): research potential; black lines (left vertical axis): the number of agents. Panel A: a domain that lies outside the initial distribution range but has a strong utility potential V_U(D). At time t = 10 it becomes fashionable; at time t = 23 it is successfully lobbied for (for a period of three years). The population continues to grow for some years and then declines somewhat. Panel B: a domain within the initial distribution, but with low utility. Despite the fact that at time t = 4 it briefly attained special status due to lobbying, the population diminishes rather rapidly.

Figure : An example of the time evolution of the popularity of several patches. One can distinguish disciplines for which the popularity wanes (either almost to zero: dark blue and light green lines, or to a stable level: dark green line) and disciplines which slowly become popular (red line). The popularity of the individual topics fluctuates slightly in time. The figure qualitatively resembles the statistics of funding for disciplines in the real world.
Visualization of the network, with the size of the nodes corresponding to the number of projects funded in and . One can easily discern the changes in the popularity of various fields.

Figure : Visualization of the network of connections between distinct research subdisciplines in the UK Research Council GtR database. The projects were automatically grouped into macrodiscipline categories by a modularity algorithm (Blondel et al. ), denoted by different colours. The size of a node corresponds to the number of projects funded in . The details of the network in the graphml format are available in the supplementary materials.

The choice of the GtR data has been motivated by the richness of the database and the existence of well-defined APIs, allowing the use of automated data mining. It has been possible, for example, to determine the temporal changes of the number of projects in each subdiscipline, and to create a dynamic map of the changing landscape of projects funded by the RCUK. This is important because only a limited number of previous works on the 'maps of science' have gone beyond single time-frame snapshots (e.g. Mane & Börner ; Porter & Rafols ; Small et al. ).
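Building the per-subdiscipline project counts from API responses can be sketched as below. The payload shape shown is a simplified assumption for illustration, not the actual GtR API schema, and the topic and title values are invented.

```python
import json
from collections import Counter

# A simplified, hypothetical payload; the real GtR schema differs.
SAMPLE = json.dumps({
    "projects": [
        {"title": "A", "year": 2012, "topics": ["Graphene", "Materials"]},
        {"title": "B", "year": 2012, "topics": ["Astrophysics"]},
        {"title": "C", "year": 2013, "topics": ["Graphene"]},
    ]
})

def projects_per_topic(payload, year=None):
    # Count funded projects per subdiscipline, optionally restricted
    # to one year, as needed for the popularity distributions and the
    # dynamic map of the funding landscape.
    counts = Counter()
    for proj in json.loads(payload)["projects"]:
        if year is None or proj["year"] == year:
            counts.update(proj["topics"])
    return counts

counts_2012 = projects_per_topic(SAMPLE, year=2012)
```

Running the same count year by year yields the time series behind the inset of the popularity-distribution figure.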

Figure : Visualization of the network of connections between distinct research subdisciplines in the UK Research Council GtR database. The size of a node corresponds to the funding awarded for . There is a notable shift in funding from basic science to domains better connected with industrial applications, for example material characterization, biosciences and computing. At the same time, high energy physics/astrophysics (violet) has suffered a significant reduction in funding, much more visible than in the case of the number of projects.

Figure : Visualization of the network of connections within the NetLogo model of the UK research landscape. The size of a node corresponds to the number of projects (agents are indicated graphically within the nodes). The lighter the colour of a node, the higher the social utility of the topic, calculated as the average of the funding in the period.

Figure : The distribution of the popularity of research topics, measured by the number of projects related to a topic in the GtR system. Main figure: log-binned distribution for three time periods within the timespan: initial period ( - ), transition year ( ) and late period ( - ). While the distributions for the most popular topics are almost indistinguishable, the distributions for topics with to projects differ significantly. The black line is the power-law fit for the most popular topics, with an exponent of −2.54 ± 0.5, similar for the three periods. The choice of this particular division of the timespan into three periods is explained by the inset figure. It shows the number of topics in a given year which have attracted less than one project. A rapid change happening around indicates a major change in the funding system, and separates two regimes: one favouring concentration of grants and neglect of 'obscure' subdisciplines, and one in which these less popular subjects are preferred.

Figure : The distribution of the popularity of research topics, measured by the number of projects resulting from a model run with parameters selected to qualitatively reproduce the data from Figure . The main figure and the inset correspond to the previous figure. The transition in the year of the simulations (meant to correspond to ) was effected by changing a single parameter: the relative probability of migration of agents to 'empty' topics. Note the change of the horizontal scale from Figure . The black line is the power-law fit for the most popular topics, with an exponent of −2.44 ± 0.2, similar for the three periods.