Agent-Based Land ChangeModeling of a Large Watershed: Space-Time Locations of Critical Threshold

Land use and land cover change has been recognized to have significant environmental impacts in a watershed, such as regulation of water quality. However, the identification of potential regions that are sensitive to land change activities for the protection of water quality poses a grand challenge particularly in a large watershed. These potential regions are o en associated with critical thresholds in terms of, for example, water quality. In this study, we developed an agent-based land changemodel to investigate the relationship between land development activities and water quality in eight North Carolina counties that cover the lower High Rock Lake Watershed area. This agent-based model, which is empirically calibrated, is used to identify space-time locations of those regions at critical thresholds of water quality in this study area. Our experimental results suggest that land development as a form of system stress is of pivotal importance in a ecting water quality at sub watershed level and the state transition of water quality. The agent-based model developed in this study provides solid support for investigations on the impact of land development under alternative scenarios in a large watershed.


Introduction
. Land use and land cover change (LULCC) has been playing a critical role in driving landscape dynamics of watershed ecosystems. These ecosystems are o en coupled human and natural systems (Liu et al. a,b) in which anthropogenic activities or policies may lead to alternative consequences in terms of watershed conditions (Haidary et al. ). The condition of a watershed such as water quality is inherently related to land cover patterns (e.g., impervious cover, forest, and farmland) that have been modified by spatial processes of urbanization, deforestation, or agriculture activities. A set of models, represented by impervious cover model (Kau man & Brant ; Schueler ) and forest cover model (Gerow et al. , ), have been proposed to study the relationship between land cover and water quality at the watershed level. A threshold e ect of percent land cover (e.g., impervious cover or forest cover) on water quality at the watershed level has been identified in the literature. Watersheds within this threshold are at a tipping point that may transit to other equilibriums in terms of water quality (good or poor) in response to alternative land management practices or policies. A number of empirical studies have reported this threshold e ect, which may be specific to watersheds of interest (Schueler et al. ). For example, a range of -% of percent urban cover is a generic critical threshold based on the impervious cover model (Schueler ). -% of percent forest cover was identified as a critical threshold of water quality specific to the High Rock Lake Watershed in North Carolina, USA (Gerow et al. , ).
. Threshold e ects in terms of water quality at (sub) watershed level have been well established in the literature.
Most of the studies are based on the use of historic or present data (land cover and indicators of water quality) to .
To identify these potential regions experiencing water quality threshold and their response to land development and policy intervention o en requires the use of land change simulation modeling. Agent-based models (ABMs) of land systems are such a spatiotemporal simulation approach for their support in the representation of complex pattern-process interactions and scenario analysis (Epstein & Axtell ; Grimm et al. ). As one of the approaches recommended for land change modeling (Council ), ABMs are a bottom-up simulation approach that uses agents as building blocks to represent interactions among actors (e.g., stakeholders or land owners) or with their environments. Agents make explicit decisions on their actions based on the contextual information that they perceive from their situated environments that are o en spatially heterogeneous. The explicit representation of actor decision-making and actor-environment interactions in ABMs allows for the simulation of land developments that are o en complex and uncertain (Parker et al. ). A number of agentbased land change models have been proposed and detailed review conducted over the past two decades (An et al. , ; Groeneveld et al. ; Matthews et al. ; Ng et al. ; O'Sullivan et al. ; Parker et al. ), which o ers insights into the development of agent-based model in this study. .
Therefore, the goal of this study is to present an agent-based land change simulation model to identify the space-time locations of potential regions experiencing critical threshold of water quality in a watershed in response to land change activities. Our study area includes eight North Carolina (NC) counties that cover the lower High Rock Lake Watershed region. This study area is representative of coupled human and natural systems in which LULCC plays an important role in driving and interpreting complex ecosystem dynamics. Land-related activities such as land development and agricultural activities render the ecosystem of this region vulnerable facing a series of issues in terms of, for example, water supply and water quality. These issues may further lead to degradation in quality of life or public health concern in this study region. Thus, such an agent-based land change simulation model is needed that allows us to investigate the potential impact of land-related activities or policies on the landscape (water quality here). The agent-based modeling approach is preferred here over other simulation approaches (e.g., cellular automata or system dynamics). This is because the agentbased modeling approach provides an integrative modeling platform that can combine with other approaches as needed and, more importantly, support the explicit representation of individual-level decision-making processes of actors (e.g., preferences of land developers or owners here) and their interactions. The remainder of this article is organized in the following manner. First, we introduce the study area and data used in this study. Second, we focus on the design of the agent-based land change model. Then, we present experimental results and discussion. This article ends with conclusion of the study. . Three years ( , , and ) of land cover data used in this study were retrieved from U.S. National Land Cover Database (NLCD). These land cover data were available at a spatial resolution of m by m. We further aggregated these land cover data into five major types: urban (including developed areas with low, medium and high intensity, and open space), farmland (including pasture and cropland), forest (including deciduous, evergreen, and mixed forest), natural (including shrub, scrub, grassland, and barren land), and water (including open water and wetlands) (see Figure and Table ). Data of road network, cities or towns and their populations, and county boundaries were obtained from US Census website. Stream network and watershed data ( digit Hydrologic Unit Code was used) was from the USDA National Hydrography Dataset. Elevation data is from NASA SRTM data. All the vector GIS data were rasterized using a m × spatial resolution. The landscape size is , × , in terms of number of rows and columns of the raster GIS data. .

Land cover
The High Rock Lake in our study region have been facing a water quality issue due to the nutrient loads from, for example, land use and management activities. This water quality issue in this region has received considerable attention from government agencies such as US Environmental Protection Agency and NC Forest Service. NC Forest Service developed a forest cover model to evaluate the relationship between forest cover and water quality in the HRLW region (Gerow et al. , ). Benthic macroinvertebrates are aquatic organisms (most of them are aquatic insects) that are sensitive to the conditions of their living environments, and have been extensively used as indicators of water or habitat quality (Mandaville ). A series of metrics based on structural or functional characteristics of benthic macroinvertebrate communities have been developed to evaluate water conditions (Barbour et al. ). Based on the empirical analysis of five years of land cover data and benthic macroinvertebrate data, a range of -% of percent forest cover was identified as the threshold of water quality for the HRLW region (Gerow et al. , ). Sub watersheds (or catchments) with percent forest cover higher than % (or lower than %) correspond to good (or poor) water quality indicated by benthic macroinvertebrate metrics. Any sub watersheds with the percentage of forest cover within this range are at the tipping point with fair water quality that may be shi ed to either good or poor states. Our overall study region is at this tipping point (fair water quality) as its percent forest cover varies from . % down to . % from to . Figure shows the maps of sub watershed-level water quality based on forest cover model for , , and . Of more interest is to identify where those regions at sub watershed level are at the tipping point particularly in the future and how the conditions of these sub watershed regions may change in response to LULCC. This drives the development of the agent-based land change model in this study.

Design of the agent-based model
. Figure illustrates the architecture of the agent-based land change model in this study. The agent-based model was designed using the demand-allocation framework, which is commonly used in the land change modeling domain (Aquilué et al. ; Liu et al. ; Verburg et al. ). The demand-allocation land change framework consist of two modules: demand and allocation. The demand module controls the quantity of land development activities in a top-down manner and the allocation module determines where these activities will occur. The demand module can be linked to, for example, system dynamics model (Lauf et al. ; Liu et al. ) or a population density approach (McGarigal et al. ; Meentemeyer et al. ) to estimate land change demand. In this study, the land change demand in each county each year is the area of land that will be converted. .
The allocation module that determines where land development will occur is driven by the interactions of two types of agents (Mustafa et al. ): land owners and land developers. Land owners and land developers are situated within each county. These agents interact with each other on the transactions of lands for development. Land owners decide whether to sell their lands to developers for residential development. Land developers are in charge of decision making in terms of where a residential development event will occur, the area of the development, and how to realize the development. Both land owners and land developers are hypothetical agents (Matthews et al. ; Valbuena et al. ). Through interactions between land owners and land developers, the land change allocation process is implemented in a bottom-up manner.
. We documented this ABM using ODD protocol (Grimm et al. ) in Appendix. Please refer to the ODD protocol of the ABM for more detail. In this article, we focus on discussing the two types of agents: land owners and land developers.

Land owner agents .
Land owners are in charge of management of their lands including farmlands, forest, and natural. Land owners make decisions on whether to sell their lands to developers or not (i.e., two actions). This decision is driven by land owner's perception on surrounding environments. We used a distance-decayed model to represent land owner's decision on selling their lands for development -i.e., the utility function of land owner agents (Lauf et al. ; McGarigal et al. ; Tang & Jia ). Land owner's decision on selling their lands can be regarded as a form of spatial interactions in which distance plays an important role (Gould ; Taylor & Openshaw ). This is fundamentally related to Tobler's First Law of Geography: "Everything is related to everything else, but near things are more related than distant things" (see Tobler ). In other words, these interactions are spatially dependent and may lead to the di usion or movement of, for example, innovation, culture, and settlements (Gould ; Hagerstrand ). Spatial dependence in these interactions exhibits distance decayed e ect. Thus, in this study we used the distance-decayed model, similar to the concept of information field (Morrill & Pitts ), to explore the spatial di usion of land owner's decision on selling lands. Distance to nearest urban is a proxy variable here that represents the influence of existing urban area on the decisionmaking process of land owners in terms of selling lands.

.
The utility function of a land owner agent is formulated using Equation : where u_owner(i, j, k) is the utility function of land owner agent i in county j with respect to the land cover type k of his/her land parcel (k = for forest/natural; k = for farmland). The percentage of natural land in our study region is low ( . , . and . % for , , and ; see Table ). Thus, in this study, we combine forest and natural cover together to represent land owners' preference on land transactions (forest and natural cover information is separated when deriving percent in forest cover from simulation results). α(i, j, k) is the coe icient that controls magnitude of the utility of the land owner agent (increase in α will lower the magnitude of the utility). β(i, j, k) is the coe icient that is associated with the steepness of the distance-decayed function representing the utility of the agent. The higher the value of β is, the faster the utility of the land owner agent will decay with distance. d is the distance of the land owned by the agent to the nearest urban area (associated with contextual information that an agent can perceive). Thus, a collection of utility functions of land owner agents constitutes the collective knowledge in terms of decisions on land transactions for urban development. The pool of utility functions of land owner agents in county j, noted as D(j), can be formulated as: Utility functions of land owner agents can be calibrated by fitting with empirical land conversion data in each county in terms of distance to nearest urban land cover. In this study, we used the land conversion data from to calibrate the utility functions of land owner agents in each county. Table and Figure report the calibration results of distance-decayed coe icients for each county. The calibrated distance-decayed model for land owner agents in each county of our study region is normalized -i.e., the utility of a land owner agent is the probability that the agent will sell his/her land for residential development. In this study, we use the % confidence intervals of the fitted coe icients to represent the pool of decision rules of land owner agents, which becomes Equation : whereα(j, k) andβ(j, k) are the fitted coe icients of the distance decayed model for land owner agents in county j with respect to land cover type k. CI(.) denotes the confidence interval of the corresponding variables. The decision pool of land owner agents represents the heterogeneity of agent decision-making. We assume each land unit (cell here) is managed by a land owner agent, whose utility function is randomly drawn from the decision pool of land owner agents in the county. Every time when a land owner agent is contacted by land developers for land transaction, the land owner agent will make decisions on whether to proceed on land transactions based on his/her utility function. This land transaction process is stochastic (a random probability is drawn and compared with the utility value of the land owner agent).

County
Forest/natural to urban conversion Farmland to urban conversion .

Land developers
. Land developers lead on converting lands into residential area (i.e., urban land cover). In this model, we used a patch-based approach to represent the conversion of lands -each land development event by a land developer leads to the occurrence of a new patch of residential area. This patch-based approach has been used in the literature for land change simulation at a fine spatial scale ( ; Meentemeyer et al. ). The behavior of a land developer for land conversion consists of three major steps: ) search for the initial location of residential development, ) determine the area of the residential development, and ) generate a patch of land development to realize the land conversion. .
The first step of decision making associated with land development for a land developer agent is to determine where to acquire a land for residential development. Specifically, a land developer agent will randomly inspect a land that is feasible for residential development (i.e., farmland or forest/natural). The land developer decides if a land will be chosen for development based on the development probability of the land derived from his/her utility function (discussed next). A land with higher development probability is more likely to be selected. In this study, those areas that are water bodies, wetlands, or protected areas are excluded (i.e., these areas are infeasible or prohibited for land conversion). Once a land is selected, the land developer agent decides whether he/she will acquire the land for residential development using his/her utility function. The decision of a land developer on whether to acquire a land for residential development or not is a binary variable (two actions; : acquire; : do not acquire). This decision is dependent on the composite influence of a set of driving factors. In this study, the utility function of a land developer agent to acquire a land for development is based on a logistic regression model, which is represented as in the following equations (Equations and .): where u_developer(i, j) is the utility of a land developer agent i in county j. x 1 − x m are the driving factors (m: the number of driving factors). β 1 − β m are coe icients for the corresponding driving factors. β 0 is the intercept of the logistic regression model.

.
In this study, the factors that drive the decision of land developers include elevation, slope, distance to cities, distance to major roads, distance to streams, stream density, and development pressure (see Figure ). Neighborhood size for the calculation of development pressure (the percentage of urban cover) is by cells. A search radius of meters was used to derive stream density. For each county, the size of samples to fit the logistic regression model is , ( , for urban, , for nonurban) and stratified random sampling scheme was used. Land developer agents in a county share the same utility function calibrated using empirical land change data in the county from to . Figure shows the map of utility values (development probability) of land developers in the study region. .
Once the land developer decides to acquire the land, he/she will contact the owner of the land (a land owner agent). The land owner will decide whether to sell lands to the developer or not. Only the land owner agent agrees on land transaction, the land developer can launch residential development (i.e., conversion into urban land cover type) and this land is used as the initial location of the new patch of residential development.  . The second step of land conversion for a land developer agent is to determine the area of the new patch of development. The area of the new patch of development is randomly drawn from empirical area distribution of new patches of development from to . In this study, we use the observed patch of change of the entire study region from to to derive the empirical distribution that a land developer agent will use to determine the area of new patch of development. .
Once the initial location and area of residential development are determined, the land developer agent will generate the new patch of development. A patch growing mechanism is used to form the new patch: neighboring lands (cells) are recursively added into the new patch until the patch area is reached. The probability that a neighboring land cell is added into the development patch is the composite utility that the land developer agent decides on land acquisition multiplied by the land owner agent's utility on land transactions at that cell. A first-order von Neumann neighborhood rule is used to search for neighboring cells during the process of forming a new patch of development.

Implementation .
The code of this agent-based model was written in C/C++ programming language. All GIS data were processed in ESRI ArcGIS Pro version . . . All distance-based variables were calculated using the Euclidean distance tool in ArcGIS. All the raster GIS data were imported into the agent-based model -i.e., a loose coupling of agent-based modeling with GIS (Brown et al. b). The ABM was deployed on a high-performance computing cluster (total number of CPUs: , ; Redhat Linux-based) for acceleration of Monte Carlo runs of the ABM. The task scheduler for high-performance computing on the Linux cluster is PBS/TORQUE. We used Shell scripts to automate the execution of the ABM and the post-processing of simulation results.

Experiments and Discussion
. Our agent-based land change model allows us to explore the impact of LULCC activities on water quality in the study region. In this section, we design two experiments to explore the impact of macro-and micro-level LULCC activities on the spatiotemporal pattern of water quality at sub-watershed scale. The first experiment focuses on macro-level LULCC activities by the manipulation of annual land development demand. In the second experiment, land owners' preference on land transactions was varied to represent LULCC activities at micro or individual level. We first discuss performance metrics and model calibration and validation before we present the design and results of the two experiments.

Performance metric .
We used bivariate similarity metrics that are based on confusion matrix to evaluate the agreement between simulated and reference land cover patterns. A variety of similarity metrics based on confusion matrix have been developed in the literature. These similarity metrics include percent correct match, figure of merits, Kappa coe icients, producer accuracy, and user accuracy (Fielding & Bell ; Pontius Jr ). While these metrics are limited to categorical variables and the cell level (instead of object level), they provide support for the evaluation of locational agreement between simulation outcome and observed patterns. Further, these similarity metrics only provide an overall evaluation for a study region of interest based on cell-level comparison. Thus, o en time, a study region is partitioned into a collection of sub-regions (e.g., using lattice with coarser spatial resolutions) when using these metrics to evaluate the agreement between simulation outcome and reference patterns (Kok et al. ; Pontius Jr ). In this study, we used two percent correct match indices (pcm 1 and pcm 2 ) that are based on confusion matrix (Pontius et al. ; Pontius Jr & Millones ; van Vliet et al. ), shown in the following equations (Equations , ).
pcm 1 = n a n a + n b ( ) where n a is the number of observed cells of change simulated as cells of change (i.e., hits). n b is the number of observed cells of persistence simulated as change (false alarms). n c is the number of observed cells of change simulated as persistence (misses). n d represents the number of observed cells of persistence that are simulated as persistence (correct rejections). In general, larger values of these accuracy indices indicate higher agreement between simulated patterns and reference patterns. However, it has been recognized in the literature that model accuracies that focus on change (e.g., pcm 1 here, figure of merit, or kappa) tend to be low if the study region of interest is dominated by persistence (Pijanowski et al. ; Pontius et al. ; Pontius Jr et al. ). As reported in Pontius et al. ( ), a positive relationship exists between model accuracy (figure of merit in their study) and observed net land change. A multi-resolution approach (e.g., using lattice at coarse spatial resolutions) has been suggested to cope with this issue (Pontius Jr. et al. ; Pontius Jr et al. ).

Model calibration and validation
.
Model calibration and validation are important steps of land change simulation modeling (Brown et al. a; van Vliet et al. ). In general, there are two approaches for model calibration and validation: partitioning of temporal duration or spatial extent of a study system into calibration set (period or area) and validation set that do not overlap with each other (Pontius Jr. et al. ). A model is instantiated using data in the calibration set for the specific study region of interest. Simulation outcome of the calibrated model is compared with data in the validation set to ensure the model has su icient predictive power (instead of being overfitted to calibration data) and thus is valid for experimentation use. Typically, temporal partitioning is o en used for land change modeling given the availability of land cover data and related data over time. The simulation period of a model is divided into calibration period and validation period. In this study, the calibration period is from -, and the validation period is from to . The utility functions (the distance-decayed model) of land owners were calibrated using the land conversion data from -in each county. The utility function of land developers in a county was calibrated using logistic regression based on empirical land change data in the county from to . We used the area of land conversion from to in each county to represent the annual land development demand during the calibration phase. Please see the ODD protocol of our ABM in the Appendix for more detail. .
Figure shows the maps of simulated land cover patterns for year and . These maps depict the spatial patterns of persistence and change of land cover types. From these maps, we can see that most of farmland and forest land cover remain persistent over time -i.e., the landscape of our study region is dominated by persistence. The simulation accuracy of pcm 1 for -(calibration accuracy) is . % on average (with a standard deviation of . %; based on Monte Carlo runs). The corresponding simulation accuracy in terms of pcm 2 is averaged at . % (std: . %). .
We ran our simulation model to for validation purpose. Annual land change demand from to was based on the empirical land change quantity during this period. The validation accuracy of pcm 1 is . % on average (std: . %). The averaged pcm 2 is . % (std: . %) for the validation period. The simulation accuracy in terms of pcm 1 (pcm 2 ) is low (high) because land change quantity during the two simulation periods only occupies about -. % of the study area and most of the regions do not experience land change (i.e., high-level persistence; see Figure ). Increase in simulation accuracy in terms of pcm 1 is partially attributed to the occurrence of more land development from to (Pontius et al. ). Figure  The ROC plot suggests that while our model accuracy in terms of pcm 1 is low, the model performance is acceptable by comparing with the performance due to chance (the performance of all model runs for calibration and validation is higher than chance performance in the ROC plot). ; specificity = n d /(n b + n d ); sensitivity = n a /(n a + n c ); for notation, please refer to Equation and ). .
While low net land change or high-level persistence explains low simulation accuracy of our model (Pontius et al. ; Pontius Jr et al. ), of more interest is the variation of simulation accuracy across our study region. Thus, we extracted sub region-level accuracy using a grid with coarser spatial resolution ( km by km here). Figure shows the spatial distribution of averaged accuracy (pcm 1 ) for calibration and validation phases at sub region level. As we could see, the averaged simulation accuracy varies from -. % for -(calibration period), and can reach up to . % for the simulation period of -(validation period). For rural areas (e.g., Davie, Randolph, Rowan Counties), simulation accuracies are low (even reach zero hits because the development probability calibrated using historical data in these rural areas is low). High simulation accuracies tend to concentrate in urban areas (e.g., those in Cabarrus, Forsyth, Guilford, and Iredell Counties). Those urban areas typically have high net land change, which is likely to have reasonably high accuracies as indicated in the literature (Pontius et al. ). The accuracy assessment results suggest that our ABM provides reasonable support for the simulation of land development in our study region (most developments typically concentrate in urbanized areas) and thus empowers the exploration of the relationship between LULCC and water quality.

Experiment of land development demand at macro level
. In this experiment, we aim to explore the impact of alternative land development demand on the water quality at the sub watershed level in our study region. We use the annual land development demand derived from empirical land change data from to as a baseline scenario. We vary the annual land development demand at the county level by %, %, %, %, and %. Correspondingly, we have five treatments (T 1 , T 2 , T 3 , T 4 , and T 5 ) for this experiment. For each treatment, the agent-based simulation model ran from to . Each treatment was repeated Monte Carlo runs. These treatments were designed to represent di erent land development scenarios. The first two treatments are designed to represent the scenarios of slow land development, and treatment T 4 and T 5 for the scenarios of rapid land development.
. Figure shows the maps of water quality patterns at sub watershed level for the five treatments. Table reports the number of sub watersheds falling within the three types of water quality states (good, fair, and poor). Figure illustrates the time series of percent in forest cover over years for four sub watersheds. A sub watershed belongs to good or poor water quality states only when it remains in the same state for all simulation runs (otherwise, it belongs to the category of fair water quality -i.e., within the threshold). As we could observe, the number of sub watersheds with good water quality decreases (from to ) when land development demand increases. In other words, more sub watersheds (from to ) fall within the critical threshold (fair water quality). The average number of years that these critical sub watersheds remain its state is . , . , . , . , and . for the five treatments. It shows that increment in the quantity of land development tends to reduce the length of the time that critical sub watersheds remain in the same state.  Table : Distance-decayed coe icients for each county in the study region. Figure : Time series of percent in forest cover over years for the first experiment (T -T : five treatments corresponding to varying land development demand by , , , , and %; four sub watersheds were used with IDs of , , , and ).

Experiment of land owner preference at micro level .
The second experiment is to evaluate the influence of land owner's preference on water quality in the study area. In this model, the land owners' preference is associated with the decision pool of utility functions characterized by distance-decayed models (see Equation ). Distance-decayed coe icients (α ijk , β ijk ) in the decision pool with respect to conversion from forest or natural lands to urban were systematically varied by adding a specific amount of change (σ * r(α ijk ), σ * r(β ijk )), where r(.) is the range of the corresponding coe icients see Equation ) . Utility functions of land owner agents on farmland to urban conversion remain unchanged. We design five treatments (T 1 -T 5 ) by assigning σ to . , . , , -. , and -. correspondingly. We ran our model from to for each treatment. The number of Monte Carlo runs for each treatment is .Varying distancedecayed coe icients will modify the shape of the distance-decayed curves: high coe icients will render that the magnitude of the utility becomes lower and the distance-decayed curve becomes steep. In other words, land owner agents with high distance-decayed coe icients tend to have low preference on selling their forest or natural lands for residential development. .
Table reports the simulation results in terms of the number of sub watersheds at di erent states. Figure shows the maps of water quality patterns across the five treatments. Figure illustrates change in the percentage of forest cover of four sub watersheds. In general, we could observe a decreasing pattern in terms of the number of sub watersheds with good water quality when distance-decayed coe icients that evaluate land owner agents' preference on land development become lower (corresponding to high utility values of selling their forest land). The average number of years that critical sub watersheds (fair water quality) remain their states is . , . , . , . , and . for the five treatments. The number of years that a sub watershed stays within the critical threshold tends to be shortened as distance-decayed coe icients become lower (from . years to . years). This means that land owner agents with higher preference on selling their forest lands for development tend to accelerate the state transition of critical sub watersheds (from fair to poor water quality).

Discussion
. The quantity of land development functions as a top-down mechanism that drives LULCC in a study region.
Variation in land development demand (e.g., due to change in economy or population) is of pivotal importance in a ecting the sub watershed-level water quality in our study region. As increase in land development demand, more sub watersheds with good water quality will transit to the critical threshold state, as indicated by the first experiment. Further, the length of the time that critical sub watersheds remain their states is reduced in response to increased land development demand. This is because land developments at sub-watershed level will generally occur more in response to increased annual demand for land development at county level. More land development events consume more forest lands, leading to the detrimental e ect on water quality (see Figure ). In particular, if we remain the land develop demand same as that between and (baseline scenario), within . years from (i.e., around year ) all sub watersheds in our study region will tend to shi their states (from fair to poor water quality) in general. However, this transition time will be delayed if policies or regulations that control the number of land development events are adopted by, for example, government agencies. In particular, experimental results (the first experiment) suggest that gain in the delayed time of transition due to regulated land development ( . years for % of reduction in development demand) is higher than loss in the transition time resulting from increased land development ( . years for % increment in demand). This implies the importance of land conversation policies in the protection of water quality in watershed ecosystems. Figure : Map of simulation results of water quality distribution at the sub watershed level (A-E: five treatments in which distance-decayed coe icients for land owners' utility function were varied; labels over sub watersheds in the maps indicate the averaged time of transition -i.e., the averaged number of years staying within the critical threshold).  .

Good Fair Poor
Human decision-making processes and subsequent choices o en play a fundamental role in driving the landscape dynamics of watershed ecosystems that are modified by human actors. Our experimental results (the second experiment) demonstrate this role of human decision-making. Change in coe icients of utility functions of land owners (based on distance-decayed models) correspond to a continuum from organic land development (high values) to spontaneous development (low values) (Liu et al. , ; Torrens ). Varied preference of land owners may lead to alternative landscape patterns in terms of watershed conditions such as water quality here. This suggests the necessity of environmental education in terms of water quality protection for decision-makers involved in land development (Giri & Qiu ). Environmental education, possibly together with incentives from governments or institutional e orts on land conservation, will increase the awareness of decision makers in the potential negative e ect of land development on the water quality of residents' living environments. This will warrant the sustainability of watershed ecosystems shaped by land development. .
Further, the critical threshold of water quality driven by LULCC in this study is similar to (or can be explained as) relaxation phenomena (Malanson ), in which time-delayed e ect exists between the occurrence of an event (e.g., disturbance, stress, or land development here) in a system and response of the system to that event (e.g., change in water quality in this study). The temporal transition patterns of system states are so-called relaxation paths, which are driven by the strength of the event (land development in this study) and the capability that the system adjusts its states to equilibriums (di erent water quality states in this study). The time that a system stays within the relaxation phase is relaxation time. In our model, the relaxation time of a sub watershed is the number of years it will maintain the state of fair water quality. Figure depicts the distribution of the relaxation time of a specific sub watershed in our study region for the two experiments. As we could observe, the relaxation time of this specific sub watershed tends to be shortened as land development demand (macro level) increases or land owner agents have more preference on land transactions for development (micro level). This shortened relaxation time pattern can be observed for many sub watersheds with initial state of fair water quality (see Figure and ). However, for sub watersheds in which initial water quality state is good, the relaxation time tends to be delayed in response to increased land development demand and higher agent preference on land transactions. This is because it takes some time for sub watersheds to transit from the good water quality state to fair and this amount of time is shortened for higher development demand or higher preference of land owner agents. As a result, the relaxation time, which corresponds to the number of years sub watersheds stay at the fair water quality state, is longer for higher development demand or higher individual-level preference. Thus, we can see that the agent-based land change model together with the forest cover model of water quality provides substantial support for the exploration of the relaxation e ect in terms of the impact of LULCC on water quality in this study. This will be of great help in terms of developing a promoted understanding of the spatiotemporal complexity of watershed ecosystems.

Conclusion
. In this study, the agent-based land change model was developed to enable investigations on the relationship between LULCC and water quality in eight NC counties. Our study area covers the lower High Rock Lake Watershed region. The agent-based model was designed within a demand-allocation framework, including modules of land demand and spatial allocation. Two types of agents, land owners and land developers, were used to represent the decision-making processes involved with land development. Land owners have preference on selling their lands for land development, which is represented using distance-decayed modeling. Land developers interact with land owners and decide where and how to realize their needs of land development. Interactions between land owners and land developers produce a complex landscape of land development, which inherently drives spatiotemporal patterns of water quality at sub-watershed level in our study region.
. The contribution of this article is the use of agent-based modeling approach to investigate the space-time locations of critical threshold of water quality driven by land use and land cover change at various decision levels (macro and micro). Critical threshold or threshold e ect is a representative property of complex coupled human and natural system (Liu et al. a). However, studies on where and when threshold e ects may occur are inadequate in the literature. The work reported in this article has unique contribution to this thread, particularly, the study of space-time locations of critical threshold within the context of land use and land cover change. The water quality of sub watersheds in this study is represented by a forest cover model, which uses a threshold of -% of percent forest cover that classifies sub watersheds into di erent states (three: good, fair, and poor water quality). Spatiotemporally explicit land development modifies land cover patterns, which consequentially lead to the transition of water quality in sub watersheds into di erent states. In particular, those sub watersheds within the threshold are of more interest in terms of the protection of water quality. These sub watersheds experience transient dynamics (Malanson , ) that serve as a bu er state of transition between good and poor water quality (most of the time, from good to poor water quality). Our experiments indicate that the length of the transition phase, also known as relaxation time, is influenced by land development at di erent decision scales (macro-level: land change demand; micro-level: land owner preference). The agent-based land change model enables us to conduct alternative what-if scenario analysis that helps identify complex spatiotemporal patterns of sub watersheds experiencing critical thresholds. Further, the agent-based modeling approach is necessary (and preferred over other modeling approaches such as cellular automata or system dynamics models) because of its ability to supporting explicit representation of decision-making processes of individuals (land owners and land developers here) that are o en heterogeneous and uncertain. This allows us to investigate the impact of individual-level behavior on macro-level system response (watershedlevel water quality here).

.
Our agent-based land change model provide e icacious support for the exploration of complex relationship between LULCC and water quality in watershed-level ecosystems. Future work may focus on the following threads. First, we will apply this ABM to other study regions which may have alternative ecoregions (e.g., coastal plain or mountain) to further examine its utility in representing LULCC and associated spatiotemporal patterns of watershed conditions. Second, our current ABM has limitations in the representation of adaptive behavior of decision-or policy makers. Thus, in the future work, we plan to integrate machine learning approaches with the agent-based model so as to identify optimal or adaptive land conservation strategies, which will greatly support policy makers to prioritize regions in need of, for example, water quality protection. Machine learning approaches such as evolutionary algorithms, neural networks, swarm intelligence, and reinforcement learning hold great promise for the representation of adaptation in land systems (Bone & Dragićević ; Tang ) and will be used to explore the influence of adaptation in decision making of individuals and policy making of government agencies (e.g., for local government planning) on ecosystem-level response. Further, machine learning approaches can be of great help in the calibration and validation of agent-based modeling (e.g., use of neural networks for calibration of land developer's decision instead of logistic regression; see Pijanowski et al. ). Third, our ABM is an integration of agent-based approach, cellular automata, and distance-decayed models together with a set of parameters and input data. In future studies, we will conduct sensitivity analysis and uncertainty analysis to evaluate the relationship between uncertainties in model input or structure and uncertainties in model outcome. This will be of great help in minimizing the occurrence of possible structural mistakes and misspecification of our ABM (O'Sullivan et al. ). Fourth, the agent-based land change modeling is computationally demanding, in particular, for the experimentation of scenario analysis. Thus, a parallel computing solution of the agent-based land change model will be developed to accelerate model execution by leveraging high-performance computing power from cutting-edge cyberinfrastructure (NSF ; Tang & Jia ; Tang et al. ).

Model Documentation
This model was implemented in C++, based on the modification of an agent-based simulation platform, GAIASP (Tang ). The code of the model is available online here: https://github.com/wenwu-tang/agentlcm.

Appendix: The Overview, Design concepts, Details (ODD) Protocol
In this supplementary material, we document our agent-based model (ABM) using the ODD protocol by Grimm et al. ( ).

Purpose
Our ABM aims to simulate land changes in a large watershed (the lower High Rock Lake Watershed, HRLW, in North Carolina, USA), thus to identify the space-time locations of potential regions (sub watersheds) that may experience critical threshold of water quality.

Entities, state variables, and scales
There are two types of agents in the model: land owners and land developers. Both land owners and land developers are hypothetical agents situated within each county. Also, we assume each land unit (grid cell here) that is covered by farmland, forest, or natural land is managed by a land owner agent, and can be aquired by a land developer for residential development. Land developers are in charge of decision making in terms of where, how, and how many to acquire land for development based on his/her utility function represented by a logistic regression model. The land owners decide whether to sell their lands for development based on his/her perception on surrounding environments when contacted by a land developer. The utility function of a land owner agent to sell land is represented using a distance-decayed model.
The grid cells in the watershed serve as spatial context where land owners and land developers interact with each other. The state variables of land owners, land developers, and grid cells are listed in Table . One time step in the model represents one year in reality. We ran the model for years which covers three periods: for parameters calibration, -for model performance validation, and -for scenario-based future projection. The spatial resolution of the raster grid is m × m, resulting in , rows and , columns in the landscape of interest.

Description (unit) Value
Grid cell k Land cover type Farmland, forest, natural, water, urban (x, y) Coordinate of the grid cell (m) Distance to stream (m) x 5 Distance to major road (m) x 6 Distance to city (m) x 7 Development pressure Land owner agent i Identifier of the land owner agent j Identifier of county that land owner agent located in [ , ] α(i, j, k) Coe icient that controls magnitude of the utility of the land owner agent i in county j with respect to the land cover type k of his/her land parcel β(i, j, k) Coe icient that is associated with the steepness of the distancedecayed function representing the utility of land owner agent i in county j with respect to the land cover type k of his/her land parcel u_owner(i, j, k) Utility of land owner agent i in county j with respect to the land cover type k of his/her land parcel [ , ] Land developer agent B 0(j) , B 1(j) ,. . . ,β m(j) Coe icients for the corresponding driving factors in county j u_developer(i, j) Utility of a land developer agent i in county j Table : Overview of the state variables that characterize the agents and their situated environments in the ABM

Process overview and scheduling
The time step is one year in our ABM. The model starts from and run into for model calibration, to for model validation, and to for future projection under di erent scenarios. The overall processes in the ABM are presented in Figure . These processes mainly include ) acquire land for residential development, which includes four major steps: development initialization, negotiation with land owner, demand estimation, and patch growing; ) sell land for residential development; and ) estimation of water quality. Figure : The overall processes of the ABM.

Design concepts Basic principles
The model is the spatial representation of the urban development process in real-world, which involves the decision-making of land developers and land owners, as well as their direct and indirect interactions. In this study, we focus on the land conversion process from farmland, forest, and natural land to urban land. A land developers decide on whether to acquire a feasible land for residential development based on his/her utility function which is calculated from a set of driving factors using a logistic regression model. A land owner can be contacted by a land developer, and make his/her own decision on whether to sell land for development based on his/her utility function which is a distance-decayed abstraction of the surrounding environments.

Emergence
The spatial pattern of residential development in the lower HRLW, especially in each sub watershed, emerges from the interactions between land developer agents and land owner agents. The results of residential development in the previous period update the surrounding environment of agents, thus influencing their decisionmaking in subsequent steps.

Adaptation
The utility of land developer is an adaptive trait changing along with consecutive residential development activities in surrounding environments. So does the utility function of the land owners.

Objectives
Land owners closer to existing urban areas have higher utility of land conversion, thus more likely to sell their lands. Also, lands with more residential development in surrounding environment are more likely to be converted by land developers.

Learning
The ABM involves no learning attributes.

Prediction
The ABM involves no prediction attributes.

Sensing
Land developer agent can sense the state variables of grid cells that he/she intends to convert for estimating his/her utility. Land owner can sense the location and land cover type of the land that he/she manages, as well as the surrounding residential development, thus to calculate distance to nearest development areas in order to estimate his/her utility of selling land.

Interaction
Land developer has to directly contact a land owner to ask whether the owner will sell his/her land for residential development. Second, the spatial pattern of residential development emerging from interactions between land developers and land owners can influence their decision-making, which indicates indirect interactions between agents through the shared environment.

Stochasticity
First, there is a collection of utility functions of land owner agents in a county, which represents the heterogeneity of agent decision-making. For a specific land owner agent, his/her utility function is randomly drawn from the decision pool of land owner agents in the county. Second, the land selling process is stochastic. A random probability is drawn and compared against the utility value of the land owner agent to decide whether he/she will sell the land. Third, a land developer agent conducts a random inspection to find a land that is feasible to initialize a residential development activity. Fourth, the area (or size) of the new patch generated by land developer is randomly drawn from empirical distribution of area of new development patches from to . Fi h, during the patch growing process, whether a neighboring cell will be added into the new patch is determined stochastically, based on a composite utility that is the multiplication of the utility of a land developer and that of a land owner with respect to the neighboring cell.

Collectives
A patch which indicates an urban land parcel in real-world is a collective of grid cells, whose area is randomly drawn from observed distribution of new patches generated by historical development activities.

Observation
In each time step, the model observes the total amount of lands that are converted to urban in each county. At the end of the simulation, the model records the percent of forest and urban cover in each sub watershed to estimate the state of water quality.

Initialization
The ABM started from and ran to for model calibration and to for model validation. The total area of our study region is , km with size of , × , in terms of number of rows and columns. The region is administrated by counties and is divided into sub watersheds. Each grid cell covered by farmland, forest, or natural is assumed to be managed by a land owner agent. The collective of utility functions (the distancedecayed model) of land owners was calibrated using the land conversion data from -in each county. Land developers lead on the residential development process in each county. The utility function (the logistic regression model) of land developers in a county was calibrated using empirical land change data in the county from to . The factors used to fit the logistic regression model include elevation, slope, distance to cities, distance to major roads, distance to streams, stream density, and development pressure. Neighborhood size for the calculation of development pressure is by cells. A search radius of meters was used to derive stream density. The size of samples to fit the logistic regression model is , ( , for urban, , for nonurban) in each county and stratified random sampling scheme was used. We use the observed patch of change of the entire study region from to to derive the empirical distribution that a land developer agent will use to determine the area of new patch of development. We used the area of land conversion from to in each county to represent the annual land development demand during the calibration phase.

Input data
Three years ( , , and ) of land cover data were used to calibrate and validate the ABM, including the utility functions of land developers and land owners, and the structure of the ABM. Data of road and stream networks, locations of cities or towns, and DEM were used to calculate the factors that impact decisions of land developer agent. The spatial extent of HRLW, the sub watersheds in this region, and the county boundaries were from the USDA National Hydrography Dataset. The land cover data were retrieved from U.S. National Land Cover Database (NLCD). Elevation data is from NASA SRTM data.

Submodels
( ) Acquire land for residential development: The process of acquiring land for residential development consists of four major steps: ) Development initialization The land developers initialize the acquisition of land for residential development by searching for an initial location that is feasible under the guidance of his/her utility function. The utility function was derived through a logistic regeression analysis between historical residential development activities and a set of driving factors, which can be represented as in the following equations: where u_developer(i, j) is the utility of a land developer agent i in county j. x 1 − x m are the driving factors (m: the number of driving factors). β 1 − β m are coe icients for the corresponding driving factors. β 0 is the intercept of the logistic regression model.
Land developer agents in a county share the same utility function calibrated using empirical land change data in the county from to . Specifically, a land developer agent inspect a land that is feasible for conversion in a stochastic manner (based on the development probability of the land). In this study, those areas that are water bodies, wetlands, or protected areas are excluded from development.
) Negotiation with land owner Once the land developer selects a feasible initial location (a grid cell) for development, he/she has to contact the land owner dwelling in this cell to negotiate about the selling of the land. The land owner will decide whether to sell the land for residential development based on his/her utility. If the land owner agrees to sell the land, then the land developer will initialize a new patch of residential development with specified area from this location.
) Demand estimation If the land developer and the land owner reach an agreement on initializing a residential development activity, the land developer will specify the amount of land conversion, i.e. the area of the new patch. In this study, the area of the new patch is randomly drawn from a pool of patch size derived from the observed patches of change of the entire study region from to (calibration period).
) Patch growing A patch growing mechanism is used to form the new patch with speicfied area: neighboring lands (cells) are iteratively added into the new patch until the specified patch area is reached, or the annual demand of residential development in a county is satisfied. Similarily, each time a new cell is to be included in the new patch, the land developer has to negotiate with the land owner. In other words, the probability that a neighboring land cell is added into the development patch is the composite utility that the land developer agent decides on land acquisition multiplied by the land owner agentâĂŹs utility on land transactions at that cell.
( ) Sell land for residential development The land owner decides whether to sell his/her land (including farmland, forest, and natural in this study) during the negotiation with land developer according to the utility from land transaction. The utility function is represented by a distance-decayed model, which is formulated as follow: u_owner(i, j, k) = e −α(i,j,k) e β(i,j,k) * d ( ) where u_owner(i, j, k) is the utility function of land owner agent i in county j with respect to the land cover type k of his/her land parcel (k = for forest/natural; k = for farmland). α(i, j, k) is the coe icient that controls magnitude of the utility of the land owner agent (increase in α will lower the magnitude of the utility). β(i, j, k) is the coe icient that is associated with the steepness of the distance-decayed function representing the utility of the agent. d is the distance of the land owned by the agent to the nearest urban area. In this study, we assume that each land owner has his/her unique utility function, thus all land owners in the watershed generate a collection of utility functions. The pool of utility functions of land owner agents in county j, noted as D(j), can be formulated as: This collective of utility functions of land owner agents in each county can be calibrated by fitting the distancedecayed function using the empirical land conversion data in terms of distance to nearest urban land cover.
Here we used the land conversion data from -in each county to calibrate the utility functions, and the % confidence intervals of the fitted coe icients was used to represent the pool of decision rules of land owner agents, which becomes Equation : D(j) = {(α ijk , β ijk ) | α ijk ∈ CI α(j, k) , β ijk ∈ CI β (j, k) ( ) whereα(j, k) andβ(j, k) are the fitted coe icients of the distance decayed model for land owner agents in county j with respect to land cover type k. CI(.) denotes the confidence interval of the corresponding variables. During each negotiation process between the land owner and the land developer, the land owner agents randomly draw an utility function for the pool, and compare the his/her utility with a random probability. If the utility is larger, then he/she will sell the land. Otherwise, he/she will not.
( ) Estimation of water quality: The water quality in the HRLW was related to the percent of forest cover in a sub watershed. The relationship was established based on a empirical analysis of five years of land cover data and benthic macroinvertebrate data. Sub watersheds with percent forest cover higher than % (or lower than %) correspond to good (or poor) water quality. Any sub watersheds with the percentage of forest cover within the range of -% are at the tipping point with fair water quality that may be shi ed to either good or poor states.