Synchronizing histories of exposure and demography: the construction of an agent-based model of the Ecuadorian Amazon colonization and exposure to oil pollution hazards

Since the 1970s, the northern part of the Amazonian region of Ecuador has been colonized with the support of intensive oil extraction, which has opened up roads and supported the settlement of people from outside Amazonia. These dynamics have caused extensive forest clearing but also regular oil leaks and spills, contaminating both soil and water. The PASHAMAMA model seeks to simulate the effects of these dynamics on both environment and population by examining exposure and demography over time through a retro-prospective and spatially explicit agent-based approach. The aim of the present paper is to describe this model, which integrates two dynamics: (a) Oil companies build roads and oil infrastructure; this infrastructure has a probability of leaks, pipeline ruptures and other accidents that produce oil pollution affecting rivers, soils and people. (b) New colonists settle in rural areas, mostly as close as possible to roads, and produce food and/or cash crops. The innovative aspect of this work is the presentation of a qualitative-quantitative approach explicitly intended to formalize interdisciplinary modeling when data contexts are almost always incomplete.


Introduction
Since the 1970s, the northern part of the Amazonian region of Ecuador has been colonized with the support of intensive oil extraction companies that have opened up roads and supported the settlement of people from outside Amazonia. These dynamics have caused extensive forest clearing but also regular oil leaks and spills, contaminating both soil and water. The trial opposing Texaco and the Ecuadorian government, ongoing since , is the major conflict regarding petrol-related contamination, but other more recent and ongoing contaminations are affecting the Ecuadorian Amazon.
However, present-time oil-related contaminations belong to the progressively acknowledged category of small but long-term exposures to pollution. The point is that collecting significant information regarding such exposure requires data collection over a far longer time, up to the scale of a generation. Do we have to wait until enough death data are collected to assess and evaluate contamination dynamics? This issue is even more crucial for areas and countries where few data sources are available at the relevant scale, meaning that, once these data are used as model inputs, no external data remain for confrontation and confidence-building purposes. We propose here to construct such a model as a platform for future evaluations of coexisting socio-economic and contamination dynamics.
The aim of this article is to present the PASHAMAMA model, which studies the colonization of the territory of the RAE (Región Amazónica Ecuatoriana) and the exposure of this territory and its population to petroleum pollution hazards. As a consequence, many dynamics have to be taken into account together, ranging from the purely physical (the physical dynamics of the environment related to pollution spread) to the social, economic and institutional (the colonization process over the years). Our interest focuses on the interactions and influences between these dynamics and on the possible emergent patterns rather than on a precise study of each dynamic independently. We have thus chosen to consider this study object as a complex system (Mitchell ) and more specifically a socio-environmental system in the sense of Ostrom ( ). Modeling and computer simulation are appropriate approaches to studying these kinds of complex systems and have had great success in the past few decades (Gaudou et al. ; Parrott ). More specifically, we chose the Agent-Based Modeling (ABM) approach (Treuil et al. ) among many others (Bousquet & Le Page ), both for its generative feature (macro phenomena generated from micro behaviors) and because it favors interdisciplinary work. The innovative aspect of this work is the presentation of an approach explicitly dedicated to formalizing interdisciplinary modeling in an almost always incomplete data context, with both qualitative and quantitative information.
The article is organized as follows. Section presents the context of the study: it introduces the case study site and previous studies of oil exposure in the area. Section describes the model following the ODD protocol and Section presents the results of the simulation. Section discusses both the results and the method (in particular the validation of such a model). Finally, Section concludes the article.

Context
The study site: the Región Amazónica Ecuatoriana
This work focuses on the Amazonian part of Ecuador, informally named the Oriente and administratively called the Región Amazónica Ecuatoriana. This area is one of the poorest of the country: the poverty indices in this region are among the lowest of the country, and the Gini index is the highest (Murphy et al. ; Hentschel et al. ). Moreover, it contains a large proportion of indigenous people (labelled nacionalidades: Huaorani, Kichwa, Shuar, Siona-Secoya, and Ai'Co'fan, altogether representing . % of the population in Sucumbios but . % in Orellana, the two northern Amazonian provinces).
More precisely, the study sites (Table ) are located in the two northern provinces of the RAE, Sucumbios and Orellana. Three sites (mapped in Figure ) were chosen in agreement with other partners of the research project under which this investigation is conducted, the Franco-Ecuadorian research and development project MONOIL, funded by the ANR (Agence Nationale de la Recherche, the French national research funding agency), using the criteria in Table .
From the first petrol discovery in in these provinces until the beginning of the conflictual and trial era ( -) (Juteau-Martineau et al. ), this territory lived practically in a "Texaco era". Due to the weakness of the central State, it was more an "inner colony" (Díaz & Bolívar ) than an integrated part of the national territory: the company played a central role in local governance, exploiting the petrol resource with a policy vis-à-vis neighboring communities that combined threats, violence and bribes and that had a disastrous impact on the environment and human communities. During this era, the north of the RAE was also the object of a colonization plan supported by the successive central governments in an effort to "export" the surplus of peasants of the Sierra and the Costa, most of whom lacked land tenure: according to the agropastoral census of , . % of farms covered only . % of farmland, with less than ha per farm, while . % of farms occupied . % of total agricultural land, with more than ha per farm (Naranjo Chiriboga ). The plan was supported by two laws ( and ) that were more laws in support of colonization than laws of agrarian reform: twenty-three percent ( , km²) of the national territory was allocated to colonization vs % ( , km²) for the land redistributed through the agrarian reform (Gondard & Mazurek ). These laws led to the creation of the IERAC (Ecuadorian Institute of Agrarian Reform and Colonization), which organized this colonization. The plan was also supported by the opening of gravel roads into the forest, connecting oil wells, by the Texaco petrol company following an agreement with the government. This means that the current spatial structure of the colonization reflects the organization of the geological resources more than the potentialities of the surface.
Primary forests were therefore exploited for their wood and colonized along and around these roads and tracks: each family of colonos received approximately hectares and had to clear at least half of it for agricultural purposes. The forest lost ground while indigenous communities regressed, either changing their way of life or disappearing. However, progressively effective applications of the law of comunas ( ) leased parts of indigenous territories to natives (both locals such as Huaorani, Siona-Secoya, Ai'Co'fan and many Kichwas, and those coming from southern provinces, such as some Shuar families), offering some protection thanks to collective land tenure. Since and the time of conflict, the Texaco company has been more or less expelled thanks to this famous court trial, but oil exploitation has continued and even spread all over the provinces.
Past studies on land cover, population and contamination in the Ecuadorian Amazon
Many investigations have been conducted previously in this area to evaluate environmental degradation in the Amazonian part of Ecuador. These harms were evaluated individually as separate threats.
Contaminants other than those originating from petroleum-related industries have been the focus of several works over time: e.g., mercury (Requelme et al. ) in the Zamora-Chinchipe province and pesticides (Hurtig et al. ). A more global dichotomy between colonists and indigenous people, regarding the spatial and social accessibility of sanitary infrastructure but also disease sensitivity, was explored as being reminiscent of past epidemic shocks (Larrick et al. ; Kaplan et al. ; Pan et al. ).
Petroleum has progressively become the major contaminant studied in the Ecuadorian Amazon since Kimerling ( ), who explored the dichotomy in oil production between existing laws and reality; this theme expanded suddenly after (Kimerling , ; Mena et al. ). Four land use and land cover change (LUCC) dynamics were then explored as "both causes and consequences of (a) road development, (b) agricultural extensification and land abandonment, (c) major shifts in world markets and crop prices, and (d) urban expansion of the central city within the region" (Messina et al. ).
A simple but meaningful reconstruction of exposure to oil contamination and demography over time
Prior to establishing national law to control oil exploitation in , including waste disposal, oil companies did not undertake measures to protect the environment; their main goal was to maximize production and revenue. A large part of the untreated drilling wastes, including formation water and drilling muds, was directly dumped into the environment in open and unlined pits. This practice did not prevent the waste from leaching out into the environment as the pools degraded or overflowed during rain events. Approximately separate waste pools have been found in the Ecuadorian Amazon (NGO Acción Ecológica, pers. com. ), of which are referenced geographically ( in the two provinces we study) and often historically. In , . % of them ( . % in the two provinces we study) were not yet registered as having been remediated by the dedicated department of the Ministry of Environment (MAE), the PRAS (Programa de Remediacion Ambiental y Social). Moreover, pools are also used to contain spill products. According to local testimonies, these practices still occur, although less frequently, even though they are forbidden. Serious health and environmental damage near oil fields has been identified (Sarria-Villa et al. ).
PetroEcuador and PetroAmazonas, the public oil companies, have taken several remediation actions regarding these pools in recent years, monitored by the Ecuadorian Ministry of the Environment (MAE) under the Amazonia Viva Program but executed by PetroAmazonas. From -, oil contamination sources were remediated, including pools and holes, and validated by the national authorities (MAE-PRAS data ). Each step in the oil and gas exploitation process causes environmental and social impacts. In fact, whether it is field exploration, drilling, refining or transportation, each affects the biosphere to a different extent. Deforestation, accidental oil spills, especially along the trans-Andean pipelines (both SOTE and OCP, altogether designated as the poliducto), and industrial waste discharge are some of the main chronic or accidental impacts caused by oil activities. The release of chemicals, including polycyclic aromatic hydrocarbons (PAHs) and trace metals (TMs), to the environment due to oil activities is one of the major harms caused, affecting human health (San Sebastián & Hurtig ; Kuang-Yao Pan et al. ) and the environment (Finer et al. ). In particular, PAHs are considered priority pollutants by the US Environmental Protection Agency (US CFR, ). Even though several large-scale pollution incidents linked to well explosions or leaking pipelines have been described (Finer et al. ), no exhaustive environmental impact assessment has been published concerning chronic contamination. In the s, several small studies coordinated by NGOs (like Acción Ecológica) identified local contamination of natural waters used for domestic purposes in the Sucumbíos Province (Pacayacu), but they only focused on the contamination risks of the US EPA-recommended PAHs without taking into account toxic volatile molecules or trace metals (Zhang et al. ).
The analyses were conducted in Ecuadorian laboratories, and no information regarding the validity or accuracy of these results is available. Until , no large environmental study had been performed in the Ecuadorian Amazon to evaluate the environmental impacts of long-term oil activities and infrastructure presence in the region, especially on river quality and associated health risks.
In this context, the IRD-CNRS consortium, in partnership with several French and Ecuadorian partners, launched the MONOIL program. This transdisciplinary project studies the impacts of oil activities in Ecuadorian territory on the environment and society at different levels, including geographical, political, economic and social levels (Houssou ; Chapotat ). Three petroleum areas are considered in northeastern Ecuador, each of which is affected by oil exploitation. No other studies to date have approached these specific subjects in Ecuador. Within this project, the purpose of this paper is to present a model that aims to assess the historical and spatial exposure of the colonizing population to past oil contaminations through a simple confrontation of social and environmental dynamics spatially impacting each other.

Contribution and novelty of this work within the research community simulating coupled human-natural systems
Different modeling approaches have been used to analyze social and economic questions of the rural world, as initially reviewed by Lambin et al. ( ) among others. Spatially organized agent-based models have been found particularly useful for simulating the intricate and multidisciplinary complexity of any local reality, and many publications have attempted to assess the state of the art, such as (Abar et al. ). Such models can manage data integration gaps such as the quantitative vs. qualitative one and the micro vs. macro level one (with micro-level data such as typologies of farmers and macro-level data such as soil or elevation maps), and can also formalize the combination of disciplines (Rouchier & Requier-Desjardins ). They are particularly relevant for empirically describing the behaviors of populations without necessarily needing to define their rationalities upon predefined paradigms (Janssen & Ostrom ; Saqalli et al. ; Schutte ). As a result of this adaptability and versatility, one may consider this work as another case study regarding socio-ecological systems, with the particularity of a site with a recent and somewhat well-documented history and both pollution and colonization/deforestation issues. However, using the GIS (Geographic Information System) capacities of the GAMA platform (Taillandier et al. ), we aim here to establish a methodology for combining qualitatively acquired farmer-level information with quantitative spatial and historically reconstructed demographic data: the main question in this article is the way a socio-ecological dynamic evolves, taking into account various factors among which the human power of transformation is crucial. In the present study, the object of our modeling is the population itself and the territory it transforms, taken as a global system, i.e. the definition of the socio-ecological system.
Two issues are thereby explored here, in line with the many studies that use a case study to explore a methodological issue following an anthropologically grounded process in which the case creates the issue. We explore both the case of a complex present-time situation, which has the advantage of being easily positioned in time, and the methodological issue of combining quantitative data on human impacts with qualitative data regarding the rationality of the human processes inducing these impacts.

Model Description
This section is dedicated to the description of the model using the standard O.D.D. (Overview, Design concepts, Details) protocol (Grimm et al. ).

Purpose
This model aims at studying the colonization and exposure of the RAE territory and population to petroleum pollution hazards, focusing on the interactions and influences of purely physical (the physical dynamics of the environment related to pollution spread) with social, economic and institutional factors (the colonization process over years). Our interest is focused on the possible emergent patterns rather than on a precise study of each independently.
Three case studies are considered: the parishes of Pacayacu, Joya de Los Sachas and Dayuma. These three sites were chosen to cover various combinations of the environment (quality of soils) and population (age of colonization) factors: good soils for Joya de Los Sachas and poor soils and territory for Pacayacu and Dayuma; colonization in the 's for Pacayacu and Joya de Los Sachas and in the 's for Dayuma (see Table ).
Entities, state variables, and scales

Scales.
As detailed in Sections . -. , we consider three sites which have approximately the same dimensions of km by km. In this area, the smallest spatial unit chosen is the cell (a square of m by m) coming from the DEM (Digital Elevation Model) file with the best resolution. It is used in particular to route water flows. It should be noted that, apart from large streams, streams are less than one hundred ( ) meters wide, which makes it possible to represent them, as well as their immediate shoreline, within a single cell. This spatial scale is the basis of the cellular automata component of the model. It can also be noted that this cell area is close to hectare, which is meaningful from an agricultural point of view.
The simulations are launched from the st of January until the st of December . The minimum time for a farming step (i.e., weeding a one-ha pasture) is about one month, and we use monthly rainfall data from the WorldClim project (Hijmans et al. ). The choice of this time step is also justified by the fact that the stream flows are high enough to allow for the capture of the total discharge of the water resulting from precipitation, as well as the occasional pollution, in a period of one month. In addition, colonization and demography data were processed on a monthly basis. Therefore, we used a one-month interval for our model.

Entities. The integrated model has been subdivided into an environmental sub-model, comprising the contamination and the hydro-geomorphological sub-models, and a socio-economic sub-model ( Figure ). As the model is spatially explicit, each agent is located (has a location attribute) and has a geometry (shape attribute, that can be a point, polyline or polygon).
The environmental sub-model is responsible for managing the water flow and flooding over the water catchment, the occurrence of oil leaks and their spread in the environment. The model is spatially distributed following the Digital Elevation Model (DEM) and is thus composed of a grid of Cells agents that contain, among others, information about the average height in meters, the corresponding land use, the soil type (fertility and flood sensitivity proportion), the water volume and the forest biomass. The water flows through the hydrographic network composed of connected Waterways, each of them carrying its "Strahler" stream order (Strahler ) to conceptually simulate river discharge and the associated propagation of oil contamination. Watersheds agents are composed of Cells and are used to compute water volume statistics. The rain_grid agents (also organized on a grid) compute the rainfall from their mean and standard deviation precipitation attributes. Finally, the Pollution_sources are represented by points or lines (pipelines) and discharge oil leaks on the Cells on which they are located, given their discharge probability, volume and duration attributes.
The socio-economic sub-model is responsible for managing the migration and settlement of the population, and its demography. According to field surveys (Morin ; Béguet ; Morin & Saqalli ), the autonomous decision-making entity that is capable of moving geographically is the Household, which is the main kind of agent in this part of the model. A Household is an aggregate of Person agents, characterized by their sex, age, spouse and children Person attributes. These individuals grow older, get married, have children or die; they are thus responsible for the demography of the settled population. They also define the composition and manpower of the Household available for its activities. The main Household activities are agriculture (coffee, cocoa and food crops), livestock (both for sale and subsistence) and seasonal jobs, mainly in oil and oil-related companies. These activities are managed by the Production_System agents. Once arrived in the area, each Household is assigned a Plot (which can be either a privately owned and managed property, named finca, for colonist Households, or a larger reserve or community land exploited in common by indigenous Households). These Plots are an administrative border for a set of physical Cells, with an average area of ha. The choice of the Plot by each family depends on the evolution of the Road network. Each Road agent is of one of three types (and evolves over time): forest track, road covered with laterite and gravel, and asphalt road. A specific average speed is assigned to each type.
Biophysical and demographical environment variables. The environment variables are used both in biophysical dynamics (such as the rain and the evapotranspiration rate, describing the rain water quantity reaching rivers) and in demographical dynamics (in particular the reproduction, birth, death and migration occurrences used to describe the population dynamics). In addition, the speeds on the various road types are also defined as global variables.

Process overview and scheduling
At each step of the simulation, the processes of the socio-economic sub-model are executed first, followed by those of the environmental sub-model.

First, the Road agents update their state: each of them stores the date when it was built and the dates when it changed state from forest track to road covered with laterite and from covered road to asphalt road. Then the colonization process occurs: new Household agents (with their set of individual Person agents) are created with the appropriate type (colonist or indigenous). They choose the best Plot for them and settle on it. Finally, the demography is processed: Person agents age, some of them die, and some of them get married, so that new children are created.
Once the socio-economic sub-model has been executed, the environmental dynamics are performed. First, the rainfall is computed on each rain_grid agent and then applied to each of the Cells spatially overlapped by that rain_grid agent. The water flows along the Waterways network and the water volume is computed at the Watershed level. Finally, the contamination process creates new Pollution_source agents and lets the Pollution_sources discharge pollution on the Cells on which they are located. This pollution spreads from Cell to Cell toward the Waterways following the slope and contaminates all Cells covered by the river.
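The scheduling described above can be sketched as a monthly loop. This is only an illustrative Python sketch, not the GAMA implementation of the model: the process names, the `handlers` dictionary and the `run` function are hypothetical, and each process is reduced to a placeholder callable.

```python
from typing import Callable, Dict, List, Tuple

# Order of execution within one monthly step, as described in the text:
# socio-economic processes first, then environmental ones.
SOCIO_ECONOMIC = ("update_roads", "colonization", "demography")
ENVIRONMENTAL = ("rainfall", "water_flow", "contamination")


def run(n_months: int,
        handlers: Dict[str, Callable[[int], None]]) -> List[Tuple[int, str]]:
    """Run n_months monthly steps and return the execution trace.

    handlers maps a process name to a callable taking the month index;
    processes without a handler are treated as no-ops (assumption).
    """
    trace: List[Tuple[int, str]] = []
    for month in range(n_months):
        for name in SOCIO_ECONOMIC + ENVIRONMENTAL:
            handler = handlers.get(name)
            if handler is not None:
                handler(month)
            trace.append((month, name))
    return trace
```

Calling `run(12, {})` would simply trace one simulated year; real sub-models would be plugged in through `handlers`.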

Design concepts
As recommended by O.D.D. protocol authors, only the appropriate design concepts are presented below.
The main dynamics of the model (e.g. water and pollution flow, demography...) are not driven by objectives to be reached. Only the parcel choice during the settlement of new families is driven by an objective: each family seeks to settle in a free parcel as close as possible to the local main city.

Interactions
No direct interactions between family agents are modeled for now. Planned improvements to the farming module include exchanges of information regarding soil reactions to the various crops. In the model presented here, the sole interactions between family agents, and between these agents and the territory, go through the spatial dimension, i.e. through land appropriation, which links a farm to a family agent and forces newly arriving families to settle elsewhere.

Sensing
Each family agent is aware of the location of its finca, whether a finca is owned or not, and of all of its demographic, ownership and economic data. It is also aware of farm characteristics (type, location and progress of each crop, soil and distance to the local main city center that defines the transportation costs).

Stochasticity
Three factors are set stochastically:
• Computation of monthly rainfall, producing a rainfall value for each km²,
• Occurrence of oil spills for each oil infrastructure element,
• Various demographical events (family size and number, birth, death...).

Collectives
Family agents group human individuals within the household they belong to; we consider that these individuals act according to a common rationality, which allows each family to be treated as a single unit. On the other hand, family plots, also named fincas or farms, group cells, each one equivalent to one pixel of * m. Two cases are considered:
• For families of colonos who settle into comunidades with private ownership, fincas were delimited ex ante by government authorities: they correspond to lots of around ha. For the territory for which we obtained the cadaster, this cadaster was included directly as the settlement plan, each lot thereby corresponding to a finca;
• For families of indigenos who settle into comunas with collective ownership, each finca is the territory used de facto by each family: it is thereby the territory of use of each family and not the farm owned as in the previous case.

Observation
We observe the annual rate of land appropriation by both colonos and indigenos in the three sites and its correspondence with deforestation data derived from satellite analysis, through the parameter linking family manpower and family rate of deforestation. As a basis for further investigations, we also observe the frequency of oil spills, but we do not look for a correspondence with observed data, as no data are available other than those used for the simulation.
The simulation starts by initializing the entities related to biophysical processes, and then the roads and plots that are the physical ground of the demographical dynamics.

Details
• The cells are initialized from the Digital Elevation Model of the simulated parish. Every cell has an initial surface, slope, fertility and biomass. Its biomass is calculated according to the local maximum biomass. Fertility and local biomass are provided through shapefile layers.
• The waterways and the rain grid are initialized (spatial location and shape and other attributes) from a dedicated shapefile.
• The plots and roads are then initialized from their GIS shapefile.
• Finally, the association between plots and cells is made through simple geographic queries: a plot is linked to all the cells it spatially overlaps.
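The last initialization step, linking each plot to the cells it overlaps, can be illustrated with a minimal Python sketch. It makes a strong simplifying assumption: plots are reduced to axis-aligned bounding boxes, whereas the model works on the actual shapefile geometries in GAMA. The names `Plot`, `overlaps` and `link_cells`, and the cell size used in the usage note, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax), in meters


def overlaps(a: Box, b: Box) -> bool:
    """True when two axis-aligned boxes share a non-empty area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


@dataclass
class Plot:
    name: str
    bounds: Box  # simplification: a bounding box instead of a polygon
    cells: List[Tuple[int, int]] = field(default_factory=list)


def link_cells(plot: Plot, n_rows: int, n_cols: int, cell_size: float) -> None:
    """Attach to the plot every grid cell (row, col) that its box overlaps."""
    plot.cells = [
        (row, col)
        for row in range(n_rows)
        for col in range(n_cols)
        if overlaps(
            plot.bounds,
            (col * cell_size, row * cell_size,
             (col + 1) * cell_size, (row + 1) * cell_size),
        )
    ]
```

For instance, with an illustrative cell size of 100 m, a plot whose box is `(50, 50, 150, 150)` on a 3 by 3 grid would be linked to the four cells around the grid point (100, 100).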

No family is created at initialization.
Input data
The input data are composed of several geographical (raster or vector) and tabular data files (details are provided in Table ). This dataset has been built from a raw dataset (summarized in Table ) through various data cleaning and transformations presented below.
USGS-originated LANDSAT * data are used for elevation. We chose the m* m data rather than the more precise m* m data to improve model computing speed but also because this is the closest scale to ha, an area unit meaningful from an agricultural point of view.
From this raw DEM data file, the river and water catchment networks have been derived. We could not use raw administrative data for the hydrological network because Ecuadorian administration data do not have the same precision across all of the sites, which prevents their use if the same model is to be applied to the three case studies. To obtain a harmonized river network of equivalent precision, we preferred to apply one process to the whole territory of each site, using Rivertools from ArcGIS on elevation data, rather than collecting heterogeneous river data from various sources. However, two sites are in contact with the two main rivers of the RAE (the Aguarico and the Napo), so we had to add a complementary upstream flow input using the INAMHI data for these rivers (Figure ). We built rainfall data from the WorldClim project (Hijmans et al. ): this database reconstitutes all the monthly rainfall values from to and extends rainfall values through simulated projections integrating global warming impacts. It is therefore the most useful data source for reconstituting climate values with their temporal and spatial variability. Finally, evapotranspiration (a global parameter) is considered in this model version to remove a fixed % of the rainfall before it reaches rivers.
For everything related to soil, we use the IRD pedology map with the USDA soil taxonomy and deduce from it:
• The soil flood potential, defined as a flood probability, is derived from the sum of the areas with hydromorphic characteristics on the IRD pedological map, i.e. hydromorphic soils and swamp land. The extent of the flood within these floodable areas is defined by the difference between the simulated flow of a river and its medium value, thereby characterizing a flooded surface that expands from the riverbank concerned and potentially contaminates different areas. The value is then the extent (Figure ).
• The soil fertility is based on the IRD map, where soils are described only by their physical characteristics. It is simply a clustering into classes of the soils of the three sites, according to the crop potential of the different soils as described by settlers during qualitative field investigations on agriculture (Maestripieri & Saqalli ), thereby producing a qualitative soil fertility index ranging from (the least fertile) to (the most fertile).

All oil infrastructure data were collected from ministries (Environment, Agriculture) but also from GIS mapping services in decentralized authorities (Municipios), which collected local spatial data on oil infrastructure in their territory but did not always share them. Implementing a series of derrames (spills) in the model means establishing their probability of occurrence, the volume associated with each incident and the duration of emission. We note that the PetroEcuador/PetroAmazonas data on derrames can be trusted for the blocks owned by these companies but that other companies did not provide any data for their own spills. Oil and gas infrastructure was referenced in the GAD in which it was located, so we hypothesized that the derrame frequency was equivalent across this infrastructure, assuming the data were representative of all blocks. From the PetroEcuador/PetroAmazonas dataset, we extracted the probability of occurrence ( . incidents per month on average) and the volume per incident (a normal distribution with oil barrels on average ( * L = L) and ± . barrels ( L) as the standard deviation).
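A minimal Python sketch of how such a spill series could be drawn for one infrastructure element is given here, under stated assumptions: the monthly incident rate is treated as a Bernoulli probability (at most one incident per source and per month), the volume is drawn from a normal distribution truncated at zero, and the duration of emission is ignored. The function names and all parameter values are hypothetical and are not those of the PetroEcuador/PetroAmazonas dataset.

```python
import random
from typing import List, Optional, Tuple


def sample_spill(rng: random.Random, p_month: float,
                 mean_barrels: float, sd_barrels: float) -> Optional[float]:
    """Return the spilled volume (barrels) for one source in one month,
    or None when no incident occurs."""
    if rng.random() >= p_month:
        return None
    # Assumption: negative draws are truncated to zero.
    return max(0.0, rng.gauss(mean_barrels, sd_barrels))


def simulate_source(rng: random.Random, n_months: int, p_month: float,
                    mean_barrels: float,
                    sd_barrels: float) -> List[Tuple[int, float]]:
    """Return the list of (month, volume) incidents for one source."""
    events: List[Tuple[int, float]] = []
    for month in range(n_months):
        volume = sample_spill(rng, p_month, mean_barrels, sd_barrels)
        if volume is not None:
            events.append((month, volume))
    return events
```

Passing a seeded `random.Random` instance keeps a simulated spill history reproducible from one run to the next.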
Oil wells and pipelines, which generate oil spills, are used in the model. Data on other contamination sources and other media, including flares and aerial imagery and data on petrol pits and shallow groundwater, were not included in order to focus on the main visible contamination type and medium, namely water pollution from oil spills and its spread ( Figure ). Portions of routes and paths were also collected from various GADs. Speed according to terrain (tar roads, gravel roads, pistes, paths, untraced), history of appearance and changes in type (from paths to pistes, then gravel roads and finally tar roads) are defined according to local qualitative interviews conducted from to . A shapefile has been created containing all the roads in , with, for each road, the date at which it reached each state (tar, gravel or track). To each of these types we associate an average speed, used to compute the distance of a plot to the market: tar: km/h; gravel: km/h; track: km/h; off-road: km/h.
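The use of road states and speeds to express the distance from a plot to the market as a travel time can be sketched as follows. This Python sketch is illustrative only: the function names are hypothetical, and any years or speed values passed to them are placeholders rather than the values used in the model.

```python
from typing import Dict, Iterable, Optional, Tuple


def road_state(year: int, track_year: int,
               gravel_year: Optional[int] = None,
               tar_year: Optional[int] = None) -> str:
    """Return the state of a road in a given year from its upgrade dates."""
    if tar_year is not None and year >= tar_year:
        return "tar"
    if gravel_year is not None and year >= gravel_year:
        return "gravel"
    if year >= track_year:
        return "track"
    return "off"  # the road does not exist yet


def travel_time_hours(segments: Iterable[Tuple[str, float]],
                      speed_kmh: Dict[str, float]) -> float:
    """Sum length / speed over the (state, length_km) segments of a route."""
    return sum(length_km / speed_kmh[state] for state, length_km in segments)
```

With placeholder speeds such as `{"tar": 60.0, "gravel": 30.0, "track": 10.0, "off": 2.0}`, a route made of 30 km of tar road and 15 km of gravel road would take one hour.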

Demography is based on the quinquennial censuses from INEC (National Institute of Statistics and Censuses of Ecuador), from which we deduced, for the whole simulation duration, a monthly probability of appearance of families, each defined with a family size and an age distribution chosen to fit these demographic data. Demographic factors are translated into settler family appearance probabilities: the population data inventoried in the three sites by INEC, which are available by age class, are first decomposed into the local birth rate, identified through the one-month age class, and immigration growth. The latter should then be translated into families: following ) is here used to be compared in terms of value at the scale of each site with our data, acknowledging.

Sub-models
Hydro-geomorphological model ( ).
The amount of rainfall in each of the basins is computed on a monthly basis from the rainfall recorded at the level of the cells constituting the basin. Monthly water streams are reconstructed and derived from rainfall: for each m* m group of pixels, the monthly rainfall is randomly sampled from a Gaussian distribution, the mean and standard deviation of which are derived from the WorldClim data collected for the study areas (Hijmans et al. ). We then deduce the water fluxes of all rivers as the collection of water runoff from their respective water catchments, minus a loss of % due to evapotranspiration and infiltration. Such a simple hydrological model is sufficient, as we are seeking not an absolute value of the water fluxes but rather a relative value that is comparable between rivers and water catchments.
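The monthly water balance described above can be illustrated with a short sketch. It assumes that each cell carries a (mean, standard deviation) pair for the current month, that negative rainfall draws are clipped to zero, and that the loss to evapotranspiration and infiltration is a single fraction passed as `loss_fraction`; the function name and the numbers in the usage lines are hypothetical.

```python
import random


def monthly_catchment_flux(cell_climate, loss_fraction, rng):
    """Return the monthly runoff of one catchment in rainfall units.

    `cell_climate` is a list of (mean_mm, sd_mm) pairs, one per cell of the
    catchment for the current month. Rainfall is drawn per cell from a
    Gaussian distribution, summed over the catchment, and reduced by a
    fixed loss fraction (evapotranspiration and infiltration).
    """
    total_rain = 0.0
    for mean_mm, sd_mm in cell_climate:
        # Clip negative draws: rainfall cannot be negative.
        total_rain += max(0.0, rng.gauss(mean_mm, sd_mm))
    return total_rain * (1.0 - loss_fraction)


# Hypothetical usage with placeholder climate values and loss fraction.
rng = random.Random(0)
flux = monthly_catchment_flux([(250.0, 40.0), (300.0, 55.0)], 0.5, rng)
```

Because only relative values between rivers and catchments matter here, the result can stay in rainfall units without being converted to a discharge.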

The forest is modeled based on biomass per hectare. As long as a cell is not affected by human activity or by random treefalls, its associated biomass remains constant because the Amazonian forest is roughly considered to be at its climax. Otherwise, the forest suffers a biomass loss at the level of the affected cell(s) and immediately enters a growth process unless the human activity persists.

Table : Data used for the PASHAMAMA spatially explicit agent-based model on the three study sites, processed from the raw data

Contamination model ( ).
The pollution sources are represented by points or lines (pipelines) and are associated with the DEM cells. At each time step, a certain amount of oil is generated by the pollution points using a Gaussian function based on the available data (see Section . ). The cells that include pollution points are then identified as contaminated and propagate pollution to all the neighboring cells with an altitude lower than or equal to theirs. This pattern of propagation is reproduced by the newly contaminated cells until there are no more adjacent cells of equal or lower altitude or until the pollution touches a stream. In the second case, pollution is propagated along the hydrographic network as long as its intensity is above the predefined pollution threshold.
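The propagation rule above can be sketched as a breadth-first spread over the elevation grid. Several choices here are assumptions made for illustration: a 4-cell neighborhood, a stream represented as an ordered list of downstream cells, and a constant multiplicative `decay` standing in for dilution along the network. The function names are hypothetical.

```python
from collections import deque


def spread_overland(dem, sources, streams):
    """Spread pollution downhill from `sources` on a grid of altitudes.

    Returns (contaminated_cells, stream_cells_reached). A contaminated cell
    passes pollution to each neighbor whose altitude is lower or equal;
    overland spreading stops at stream cells.
    """
    rows, cols = len(dem), len(dem[0])
    contaminated = set(sources)
    reached = set()
    queue = deque(sources)
    while queue:
        r, c = queue.popleft()
        if (r, c) in streams:
            reached.add((r, c))
            continue  # the hydrographic network takes over from here
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if (inside and (nr, nc) not in contaminated
                    and dem[nr][nc] <= dem[r][c]):
                contaminated.add((nr, nc))
                queue.append((nr, nc))
    return contaminated, reached


def spread_along_stream(downstream_path, intensity, decay, threshold):
    """Follow the stream while the intensity stays above `threshold`."""
    polluted = []
    for cell in downstream_path:
        if intensity <= threshold:
            break
        polluted.append(cell)
        intensity *= decay  # assumed dilution per cell
    return polluted


# Hypothetical usage on a 3 x 3 grid with a stream cell in one corner.
dem = [[5, 4, 3], [6, 5, 2], [7, 6, 1]]
cells, hit = spread_overland(dem, [(0, 0)], {(2, 2)})
```

Keeping the overland and in-stream steps separate mirrors the two cases in the text: spreading ends either when no lower or equal neighbor remains or when a stream is reached.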

Demographic model ( ).
At each simulation step, new households are created in the simulation (given the immigration rate derived from the data) with a random size between and people of various ages, following the relative proportions of age classes within the calculated immigrant population. As farmers produce both food and cash crops, accessibility is as important as soil potentialities. Therefore, families settle close to roads, close to other farms and in areas with fertile soil.
In addition, at every step, there is also a demographic process over the existing population: people become older, they can give birth to new people, die, get married... In particular, the birth of new people and their aging will increase, after some steps, the manpower of the family.
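The settlement preference stated above (close to roads, close to other farms, fertile soil) can be illustrated with a simple weighted score. The linear form, the weights, the plot attributes and the function names are all assumptions introduced for this sketch; the source does not specify how the three criteria are combined.

```python
def settlement_score(plot, w_road=1.0, w_farm=1.0, w_soil=1.0):
    """Score a free plot: higher is more attractive to a new family.

    Assumed linear trade-off between soil fertility (a benefit) and the
    travel time to the nearest road and the distance to the nearest farm
    (both costs).
    """
    return (w_soil * plot["fertility"]
            - w_road * plot["hours_to_road"]
            - w_farm * plot["km_to_nearest_farm"])


def choose_plot(free_plots, **weights):
    """Return the free plot with the highest settlement score."""
    return max(free_plots, key=lambda p: settlement_score(p, **weights))


# Hypothetical usage with three candidate plots.
plots = [
    {"id": "a", "fertility": 0.9, "hours_to_road": 5.0, "km_to_nearest_farm": 8.0},
    {"id": "b", "fertility": 0.6, "hours_to_road": 0.5, "km_to_nearest_farm": 1.0},
    {"id": "c", "fertility": 0.2, "hours_to_road": 0.2, "km_to_nearest_farm": 0.5},
]
best = choose_plot(plots)
```

With equal weights the most accessible plot is chosen; raising `w_soil` shifts the choice toward more fertile but more remote plots, which is one way to explore the balance between accessibility and soil potentialities.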

Results
In this section, we present various results. We start by describing the results of each of the main dynamics alone (the bio-physical dynamics and the colonization process). We then analyze the model coupling both.

Spatially modeling the riverine and soil oil contamination hazard
The hydrographic network organized along the Strahler order-based hierarchy is fed by rainwater, except for the Joya de Los Sachas and Pacayacu sites, where we had to add the water catchment discharges from major rivers, Napo and Aguarico respectively. The network recreates the dynamics of the water flow from small watercourses to large ones as well as inundations. One representation shows the elevation and the river network in D, and the second representation categorizes the network according to each watercourse Strahler order in D (Figure ).

Modeling migration demography & colonization over time & space
When the colonization process is considered alone, the model reproduces, to some extent, the evolution of the population pressure on the environment. As an example, Figure shows the settlement process in the Dayuma parish, through new settlements in fincas but also in non-tenured comunas. Road conditions and networks evolve over time. Rectangles indicate the parcels of the settlers, while polygons represent indigenous comunas.
To validate our simulated settlement evolution, we compare these outputs with external data, i.e., land cover maps derived from satellite image analyses at different times, using time series that correspond to similar time periods. We used land cover classification GIS data layers derived from satellite images procured by the Ministry of Agriculture and Fisheries (MAGAP) and the Ministry of Environment (MAE) in , focusing on the evolution of forest area extent, which we purposely did not use in the model. Forested land cover is defined as an area of at least one hectare covered by trees at least m tall and in which foliage covers at least % of the ground.
In addition, a network layer of different road types corresponding to the same time periods was added to the forest maps to be compared with the simulated road network. The coarse resolution of the data available for the year results in a divergent estimation of forest extent, so we decided to combine forest cover with a class of mixed forest and arboriculture. The comparison reported in Table is encouraging, as it shows that our reconstruction matches the spatial patterns of colonization in Dayuma and its water catchment since the model initialization in (Figure ).

Table : Forested area extent evolution in the Dayuma water catchment: comparison between satellite image-derived land cover and simulation outputs
We can see that the model successfully represents the colonization along the main roads that started to spread from the north side, first following forest tracks and intensifying thereafter with the improvement of the roads. The colonization is faster and stronger in the west due to a more substantial development of the road network. Meanwhile, we can see a growth in the number of dots within comuna limits, indicating the settlement of indigenous families there.
Combined spatial modeling of population and territory exposure
Finally, combining demography, water catchment dynamics and pollution events, the model is able to represent the exposure dynamics of both territories and populations in the three study sites (Figure ).
The Sacha and Pacayacu sites have a longer history of colonization and oil exploitation, meaning that their simulations should begin earlier and should show more impacts from territory and population exposure. The Pacayacu site may exhibit concentrated oil pollution exposures at scattered sites along the Aguarico River in the south due to a highly concentrated spatial distribution of oil wells and pipelines. The Sacha site, owing to its better soils, may provide a better income to farmers, which has not been included in the simulation. However, the far larger number of oil wells and pipelines and their spatial distribution, often upstream, may expose a higher proportion of the territory and population, especially the indigenous comunas downstream along the Napo River. Finally, the later colonized Dayuma site has the same distribution of wells and pipelines, often upstream, meaning that the downstream population, a mix of colonos and indigenous people, who are often the poorest as they are far from the tar road, may be the most exposed.
One may already note that the extent of contaminated areas is by definition limited to talwegs and regularly flooded areas, thereby reducing the territory exposure and therefore the affected environment. Based on field interviews with farmers, petroleum technicians and public officers during interview sessions in , , and , this reconstruction is perceived as correct in terms of spatial leak coverage. Scholars from the Monoil project also agree on the spatially and temporally limited extent of the derrames, even when we apply the same frequency of PetroEcuador/PetroAmazonas derrames to the entire network of oil infrastructure.

Discussion
Value as a risk estimation tool
PASHAMAMA is a practical, sustainable and easy-to-explain model to assess the impact of a spatially "invisible" hazard in a socio-environmental system. The main challenges were to build a framework for a simple evaluation of the exposure to this hazard even though some data were unavailable, and to construct a software framework that can support additional modules for assessing the impacts of other risks affecting both the environment and the population.
The model uses both historical spatial and non-spatial data, together with expert knowledge and experience, to account for physical and conceptual processes and rules that are only partly validated due to the lack of observation time series within the study sites, including discharge and pollutant transport, stream flood dynamics, and the occurrence and quantity of oil leaks. Therefore, we may assume that one strength of this simulation is its capacity to estimate, even roughly, the spatial and human impact of chemical pollution over time by combining accessible, open-access data with process-based dynamics. Based on the agent-based modeling approach, the integration of socio-economic and environmental aspects allows the model to support a) knowledge integration and sharing among studies from different disciplines and sources and b) future decision making and planning. Connectivity, accessibility and pedology seem to be the only major explanatory factors, as the global colonization pattern matched the satellite-based imagery not only at the end of the simulation but throughout it, meaning that the pace of colonization corresponds to our demographic forcing. Our results differ from those of the closest simulation approach, i.e., the Repast NEA model (Messina & Walsh ; Messina et al. ; Walsh et al. , ; Mena et al. ), because the model purpose, approach and scale are different. We focus on exposure and not on deforestation in the three sites, with each site being to times bigger than that in the other model. Furthermore, our model shows the colonization process itself and not the consequences in terms of deforestation of already settled populations. As such, the model we built here may be able to include the agriculture module from the Repast NEA model in the future to simulate the consequences in terms of deforestation at a far larger scale.
From a more epistemological point of view, the model is a practical application of the following principle: rather than acquiring precise data, the relevant focus of modeling is to obtain, infer or generate the valuable variables that, when combined, best serve the question raised. Here, we show that a simple combination of variables may provide interesting insights, suggesting that this variable-then-data-first principle provides generalizable results in an easier way.

Model validation: Lack of data, confidence-building and knowledge emergence
This socio-eco-hydrological model was validated only against existing spatio-temporal data, namely the deforestation resulting from land appropriation/colonization dynamics. Indeed, the hydrological monitoring programs conducted by the Ecuadorian government focus on major rivers (Aguarico, Napo) and do not yet cover the less important ones, such as those draining our study sites. The processes and importance of pollution spreading to the environment need to be further quantified and validated using the research outputs of the Monoil research project, which are not yet fully developed. For instance, the first global epidemiology investigation was conducted within the Monoil project, but its data have not yet been analyzed in space and time. From a confidence-building perspective, among the dynamics we simulate, only the appropriation/colonization process can be tested: hydrological measurements in the RAE are available for the major rivers but not for the less important ones we simulated here, meaning that there are no data that can be contrasted with our simulation outputs. The oil pollution values have yet to be analyzed, estimated and published by Monoil scholars, meaning that the related test can only be carried out in the future. Demographics are inputs to the model. No procedure can yet be provided to test the exposure outputs of this simulation, because the epidemiology data have yet to be analyzed both temporally and spatially. However, one should not neglect the impacts of repeated low doses of contaminants on population health. The environmental consequences of such a succession of oil-related pollution accidents have yet to be determined. The first global estimation of these impacts is on its way to being proposed within the same project (Durango et al. ). Therefore, the procedure we propose can only be used as a confidence-building step for now.

Thematically, this simulation may underestimate the exposed population, as only the solute transport in water runoff is taken into account, whereas gas flares and oil pools should also be considered. Conversely, exposure is likely to be overestimated because large parts of the population, mainly those close to roads or living in villages, have access to tap water, which, even if it is not of good quality, allows the population to avoid accidental contamination. One should not forget that the environmental and sanitary impacts of the two other oil-related pollution sources have yet to be assessed in terms of their chronic and permanent impacts: the aerial pollution from the gas flares in the three study areas and the pollution from the oil pools, mainly from the Texaco era, which are at least geographically referenced in the same sites. A new source of pollution may also be noted: newly implemented oil palm plantations, which, according to field interviews, use many phytosanitary pesticides.
Methodologically, field investigations induce both sampling and data flaws. Apart from the choice of sites, a question that can be raised only after field investigations, one may question the origins of the data: they come from central administrations (INEC, MAE) and GADs. However, some GADs have refused to share their data, as is their right, possibly affecting the representativeness of the data. Any GIS data may have flaws in positioning accuracy and/or attributes, and a sampled field check of some of these points has yet to be done. The data source inventory also revealed the heterogeneity of these data, even if the national information system (SNI) tends to centralize all available spatial information. The geo-localization precision varies according to the source, and there are potential duplicates and redundancies. As the derrame occurrence is known only for PetroEcuador/PetroAmazonas blocks, the equivalence hypothesis we postulate may underestimate the number of spills for other, less cautious companies. Similar problems appear regarding the volumes of these derrames.
Regarding the modeling aspect, ABMs do not require strong hypotheses about the system. The main assumption is that the investigated macroscopic outputs can be generated by the behaviors of and interactions between the individual elements of the system. This is a weak hypothesis compared to those needed for representing equation-based systems, and it thereby provides more freedom in modeling. Such freedom has a counterpart: the ability to analyze a system in its entirety invites the introduction of many dynamics, entities and details, making it difficult to identify the individual influence of each part ( ).
More globally, this model proposal confronts the general validation issue for models that combine several intricate dynamics with few and incomplete data: we are forced to carry out, and we assume, a partial validation limited to the dynamics for which we have external data to serve as a confrontation test, thereby assuming that "validating" one dynamic gives confidence in the simulated dynamics for which no test can be carried out. Otherwise, no modeling could be done on environmental issues without sufficient data, whereas modeling is usually undertaken precisely to compensate for the lack of data! Moreover, in our case, building such a platform is a necessary step for expanding the model with other social and environmental dynamics to be presented in future articles.

Conclusion
The PASHAMAMA model should be considered as a first step in an on-going research process that employs a combination of results from various disciplines, methodologies and research practices within the same project. The model, a spatially explicit agent-based model, has been designed as a global model of system structure in which natural and socio-economic components are harmonized at the most "atomic scale", i.e., the family scale for the human agents, the * m pixel for the territory and the month for time.
A first step is to test our model by comparing simulation outputs with newly produced data from the Monoil project or from the Ecuadorian government, for instance data on hydrology and water contamination. For now, these two ensembles of dynamics (exposure as a product of demography, contamination and hydrology) are juxtaposed. The model provides a good reproduction of the colonization dynamics through time and establishes a dynamic pollution exposure risk to population that can be improved with further research results. This expected knowledge about pollution epidemiology and transportation processes and levels will allow us to implement retroactive simulations of colonization, climate variability and pollution, for instance modeling families fleeing from contaminated areas.
More globally, the model is purposely designed to be incremental: several modules can be added later, each being internally consistent in the sense that its value can be tested by comparing outputs with external data. This modularity implies the consistency of the ensemble as well, allowing selected dynamics to be simulated ceteris paribus for testing hypotheses, thereby allowing the model to serve its most valuable purposes and providing a testbed for discussion and prospective modeling. We can then envision further add-ons for agriculture and deforestation, epidemiology and health, and finally the impacts of various policies affecting all these issues in a prospective approach.
Furthermore, the modeling focus is on processes, i.e., introducing data and processes only on dynamics (demography, hydrology and contamination) and not on patterns, thereby assuming the causality of phenomena. This allows us to position within the modeling methodology a confidence-building step where the simulation outputs are compared with pattern data not used in the model, such as satellite-based land cover data. This aspect is, for now, restricted to the human choice criteria for settlement in the model, but it is the main driver of the model.
In conclusion, we address here practical interdisciplinarity: usually, risk assessment is based on either deterministic or often qualitative methods. Neither approach justifies the chosen variables in the investigation or the combination of dynamics. We stand for the use of modeling for quantitatively assessing socio-anthropological and often qualitative variables, such as the human choice criteria for settlement, alongside environmental variables. Using such a model opens up opportunities for improving this information in order to support both health and sanitary assessment planning and decision-making processes.
Each of the model components has a behavior for which we do not always have timely, reliable or complete data. Such uncertainty is in fact the reason for building such a model, and modeling is the best alternative for producing and/or assessing data to which we do not have access. Nevertheless, this uncertainty is not itself harmless: any simulation output provides both the magnitude of the result we look for and the sensitivity of this result to the different variables affecting it, thereby suggesting the most effective research direction. The magnitude of the result itself can then be seen as an early-warning system through a prioritization of risks through combined modeling: even at present, many assertions are provided in popular media describing the actual contamination situation in the RAE. In terms of the applicability and potential of such a model, the more practical and unbiased methodologies that are available for decision making, the more legitimate such decisions may be.