1. ABOUT THE DATASET
--------------------

Amazon deforestation causes strong regional warming

Edward W. Butt1*, Jessica C. A. Baker1, Francisco G. Silva Bezerra2, Celso von Randow2, Ana P. D. Aguiar2, 3 and Dominick V. Spracklen1

1School of Earth and Environment, University of Leeds, Leeds, UK
2National Institute for Space Research (INPE), São José dos Campos, Brazil.
3Stockholm Resilience Centre, Stockholm, Sweden.

*Correspondence to Edward W. Butt: e.butt@leeds.ac.uk

Copyright 2022 University of Leeds

This document describes the dataset used to support the findings of this study.

The dataset can be downloaded from:
https://doi.org/10.5518/1325

Publication year: 2023

Cite as: E. W. Butt. Amazon deforestation causes strong regional warming. [Dataset]. https://doi.org/10.5518/1325 

2. TERMS OF USE
---------------

This dataset is licensed under a Creative Commons Attribution 4.0 International Licence: https://creativecommons.org/licenses/by/4.0/

3. PROJECT AND FUNDING INFORMATION
----------------------------------

This work received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (Grant agreement no. 771492), the Natural Environment Research Council (NE/J009822/1) and the Newton Fund, through the Met Office Climate Science for Service Partnership Brazil (CSSP Brazil).

4. CONTENTS
-----------

The dataset is saved as a CSV file ('Dataset_butt_etal.csv') and can be opened using the Python pandas package, for example:

> python
> import pandas as pd
> df = pd.read_csv('Dataset_butt_etal.csv')

The dataset comprises 23 column features:

1)	'lat':
Latitude of the centre of each 1 km x 1 km grid cell, in degrees.

2)	'lon':
Longitude of the centre of each 1 km x 1 km grid cell, in degrees.

3)	'Latitude_rescale':
Latitude of the grid cell centre, normalised to values between 0 and 1.

4)	'Elevation_rescale':
Elevation (in metres), normalised to values between 0 and 1.

5)	'Distance_coast_rescale':
Distance to the nearest coast (in km), normalised to values between 0 and 1.
 
6)	'local_0-2km_start':
Local 0-2 km forest fraction start (averaged from 2001 to 2003) in grid cell.

7)	'regional_2-5km_start':
Regional halo: 2-5 km forest fraction start in grid cell.

8)	'regional_5-10km_start':
Regional halo: 5-10 km forest fraction start in grid cell. 

9)	'regional_10-25km_start': 
Regional halo: 10-25 km forest fraction start in grid cell.

10)	'regional_25-50km_start':
Regional halo: 25-50 km forest fraction start in grid cell.

11)	'regional_50-100km_start':
Regional halo: 50-100 km forest fraction start in grid cell.

12)	'local_0-2km_end':
Local 0-2 km forest fraction end (averaged from 2018 to 2020) in grid cell.

13)	'regional_2-5km_end':
Regional halo: 2-5 km forest fraction end in grid cell.

14)	'regional_5-10km_end':
Regional halo: 5-10 km forest fraction end in grid cell.

15)	'regional_10-25km_end':
Regional halo: 10-25 km forest fraction end in grid cell.

16)	'regional_25-50km_end':
Regional halo: 25-50 km forest fraction end in grid cell.

17)	'regional_50-100km_end':
Regional halo: 50-100 km forest fraction end in grid cell.

18)	'Delta_T':
Target feature: surface temperature change, calculated as the average surface temperature of the driest month at the end of the study period (2018 to 2020) minus that at the start (2001 to 2003).

19)	'regional_2-10km_start':
Regional halo: 2-10 km forest fraction start in grid cell.

20)	'regional_2-10km_end':
Regional halo: 2-10 km forest fraction end in grid cell.

21)	'regional_10-100km_start':
Regional halo: 10-100 km forest fraction start in grid cell.

22)	'regional_10-100km_end':
Regional halo: 10-100 km forest fraction end in grid cell.

23)	'train_test_split':
Indicates whether the data point is part of the training or testing dataset.
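
As a minimal sketch of how the 'train_test_split' column might be used, the helper below separates the rows into one table per label. The function name split_by_label is hypothetical, and because the label values are not listed in this document, none are assumed: the helper simply groups on whatever values are present in the column.

```python
import pandas as pd


def split_by_label(df: pd.DataFrame, column: str = "train_test_split") -> dict:
    """Return one sub-DataFrame per distinct value found in `column`."""
    # The label values are not documented here, so none are hard-coded.
    # The label column itself is dropped from each returned table.
    return {label: part.drop(columns=column) for label, part in df.groupby(column)}


# Example use (assumes the CSV file is in the working directory):
# df = pd.read_csv('Dataset_butt_etal.csv')
# parts = split_by_label(df)
# print({label: len(part) for label, part in parts.items()})
```

Printing the sizes first is a quick way to see which labels the file actually uses before selecting the training and testing tables.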

The features used for machine learning are those described in Table 1 of the main paper. The forest fraction features are the changes between the start and end forest fractions given in columns 6 to 17.
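
The change features can be sketched as follows. This is an illustration under stated assumptions rather than the code used in the study: the function name forest_change and the '_change' column suffix are hypothetical, and the end-minus-start sign convention is assumed by analogy with 'Delta_T'. Table 1 of the main paper and 'XGBoost.ipynb' remain the reference for the exact definition.

```python
import pandas as pd


def forest_change(df: pd.DataFrame) -> pd.DataFrame:
    """End-minus-start forest fraction for every '<name>_start'/'<name>_end' pair."""
    changes = {}
    for start_col in df.columns:
        if not start_col.endswith("_start"):
            continue
        base = start_col[: -len("_start")]
        end_col = base + "_end"
        # Only pair up columns that have both a start and an end version.
        if end_col in df.columns:
            # Assumed sign convention: negative values indicate forest loss.
            changes[base + "_change"] = df[end_col] - df[start_col]
    return pd.DataFrame(changes, index=df.index)


# Example use, restricted to columns 6 to 17 (zero-based positions 5 to 16),
# assuming the CSV columns are in the order listed in this document:
# df = pd.read_csv('Dataset_butt_etal.csv')
# change = forest_change(df.iloc[:, 5:17])
```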


Code is provided in the form of two Jupyter notebook files:

 1. 'reproduce_figure2.ipynb': provides code to reproduce Figure 2 as shown in the manuscript.

 2. 'XGBoost.ipynb': provides the code to run the XGBoost model with the best model feature selection, which uses all halo distance features. This code requires the NumPy file 'best_params.npy' to load the selected model parameters. This code also produces panel d of Figure 3 as shown in the main manuscript.
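
For inspecting 'best_params.npy' outside the notebook, a minimal sketch is given below. It assumes, without confirmation from this document, that the file may hold a pickled Python object such as a dictionary of parameters, which NumPy stores as a zero-dimensional object array. The function name load_params is hypothetical; the notebook itself shows how the file is actually read.

```python
import numpy as np


def load_params(path: str = "best_params.npy"):
    """Load a .npy file and unwrap it if it holds a single pickled object."""
    # allow_pickle=True is required for object arrays; use it on trusted files only.
    arr = np.load(path, allow_pickle=True)
    # A saved dict comes back as a 0-d object array; .item() recovers the object.
    return arr.item() if arr.ndim == 0 else arr


# Example use (assumes the file is in the working directory):
# params = load_params()
# print(type(params), params)
```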