Yang, Yuanxuan and Turner, Andy (2020) Dataset of the enhanced demand prediction models in bike-sharing systems using graph structural information. University of Leeds. [Dataset] https://doi.org/10.5518/851

The data collection underpins the findings reported in:
Yang, Y., Heppenstall, A., Turner, A., & Comber, A. (2020). Using graph structural information about flows to enhance short-term demand prediction in bike-sharing systems. Computers, Environment and Urban Systems, 83, 101521. https://doi.org/10.1016/j.compenvurbsys.2020.101521
This paper is openly available and is hereafter referred to as "the paper".

Please adapt the text in line 1 of this README for data citation and reference purposes.

The data collection includes the input data downloaded from the original sources, preprocessing scripts, processed data and the processing scripts that generated the data generalisations and visualisations shown in the paper. The scripts are R code. R Version 3.6.0 was used to generate the processed data and the results presented in the paper.

The data collection has been organised into two directories/folders as depicted below.
|
+ data
| |
| + input
| | |
| | + CitiBike
| | | |
| | | + 201611-citibike-tripdata.zip
| | | + 201612-citibike-tripdata.zip
| | | + 201701-citibike-tripdata.csv.zip
| | | + 201702-citibike-tripdata.csv.zip
| | | + 201703-citibike-tripdata.csv.zip
| | | + 201704-citibike-tripdata.csv.zip
| | | + 201705-citibike-tripdata.csv.zip
| | | + 201706-citibike-tripdata.csv.zip
| | | + 201707-citibike-tripdata.csv.zip
| | | + 201708-citibike-tripdata.csv.zip
| | | + 201709-citibike-tripdata.csv.zip
| | | + 201710-citibike-tripdata.csv.zip
| | | + LICENSE.txt
| | | + README.txt
| | |
| | + weather
| |   |
| |   + hourly_weather.csv
| |   + LICENSE-CCASA.txt
| |   + LICENSE-odbl-10.txt
| |   + README.txt
| |
| + processed
|   |
|   + bikedata.Rdata
|
+ code
| |
| + feature_importance
| | |
| | + feature_imp_XGB.R
| |
| + models
| | |
| | + LSTM_TD_PGI_FGI.R
| | + MLP_TD_PGI_FGI.R
| | + XGB_TD_PGI_FGI.R
| |
| + preprocess
|   |
|   + preprocess.R
|
+ bike_forecast.Rproj

Individual file description list:

1. bike_forecast.Rproj
This file is an RStudio project file. It is provided to help users of this data reproduce results. It is located in the top level directory so that RStudio will link to all the necessary parts.
Details of this type of file are provided at the following URLs.
https://fileinfo.com/extension/rproj
https://support.rstudio.com/hc/en-us/articles/200526207-Using-Projects
It is not known whether there is an open standard file specification for this format of data.

2. data > input > CitiBike > README.txt
The README file for the source New York Citi Bike Scheme data. It links to the LICENSE for these data and provides details of all the source data Zip files.

3. data > input > CitiBike > LICENSE.txt
This is the LICENSE for the New York Citi Bike Scheme data.
In accordance with the LICENCE these data are included as source material, as applicable, in analyses, reports, or studies published or distributed for non-commercial purposes.
The file contains the text of the Web Page at the following URL as of 2021-01-06:
https://www.citibikenyc.com/data-sharing-policy

4. data > input > CitiBike > 201611-citibike-tripdata.zip
New York Citi Bike Scheme data obtained via: https://www.citibikenyc.com/system-data
The Zip file contains bike travel records for November 2016.
Please see data > input > CitiBike > README.txt for details.
Please see data > input > CitiBike > LICENSE.txt for the LICENSE.
In accordance with the LICENCE these data are included as source material, as applicable, in analyses, reports, or studies published or distributed for non-commercial purposes.

5. data > input > CitiBike > 201612-citibike-tripdata.zip
New York Citi Bike Scheme data obtained via: https://www.citibikenyc.com/system-data
The Zip file contains bike travel records for December 2016.
Please see data > input > CitiBike > README.txt for details.
Please see data > input > CitiBike > LICENSE.txt for the LICENSE.
In accordance with the LICENCE these data are included as source material, as applicable, in analyses, reports, or studies published or distributed for non-commercial purposes.

6. data > input > CitiBike > 201701-citibike-tripdata.csv.zip
New York Citi Bike Scheme data obtained via: https://www.citibikenyc.com/system-data
The Zip file contains bike travel records for January 2017.
Please see data > input > CitiBike > README.txt for details.
Please see data > input > CitiBike > LICENSE.txt for the LICENSE.
In accordance with the LICENCE these data are included as source material, as applicable, in analyses, reports, or studies published or distributed for non-commercial purposes.

7. data > input > CitiBike > 201702-citibike-tripdata.csv.zip
New York Citi Bike Scheme data obtained via: https://www.citibikenyc.com/system-data
The Zip file contains bike travel records for February 2017.
Please see data > input > CitiBike > README.txt for details.
Please see data > input > CitiBike > LICENSE.txt for the LICENSE.
In accordance with the LICENCE these data are included as source material, as applicable, in analyses, reports, or studies published or distributed for non-commercial purposes.

8. data > input > CitiBike > 201703-citibike-tripdata.csv.zip
New York Citi Bike Scheme data obtained via: https://www.citibikenyc.com/system-data
The Zip file contains bike travel records for March 2017.
Please see data > input > CitiBike > README.txt for details.
Please see data > input > CitiBike > LICENSE.txt for the LICENSE.
In accordance with the LICENCE these data are included as source material, as applicable, in analyses, reports, or studies published or distributed for non-commercial purposes.

9. data > input > CitiBike > 201704-citibike-tripdata.csv.zip
New York Citi Bike Scheme data obtained via: https://www.citibikenyc.com/system-data
The Zip file contains bike travel records for April 2017.
Please see data > input > CitiBike > README.txt for details.
Please see data > input > CitiBike > LICENSE.txt for the LICENSE.
In accordance with the LICENCE these data are included as source material, as applicable, in analyses, reports, or studies published or distributed for non-commercial purposes.

10. data > input > CitiBike > 201705-citibike-tripdata.csv.zip
New York Citi Bike Scheme data obtained via: https://www.citibikenyc.com/system-data
The Zip file contains bike travel records for May 2017.
Please see data > input > CitiBike > README.txt for details.
Please see data > input > CitiBike > LICENSE.txt for the LICENSE.
In accordance with the LICENCE these data are included as source material, as applicable, in analyses, reports, or studies published or distributed for non-commercial purposes.

11. data > input > CitiBike > 201706-citibike-tripdata.csv.zip
New York Citi Bike Scheme data obtained via: https://www.citibikenyc.com/system-data
The Zip file contains bike travel records for June 2017.
Please see data > input > CitiBike > README.txt for details.
Please see data > input > CitiBike > LICENSE.txt for the LICENSE.
In accordance with the LICENCE these data are included as source material, as applicable, in analyses, reports, or studies published or distributed for non-commercial purposes.

12. data > input > CitiBike > 201707-citibike-tripdata.csv.zip
New York Citi Bike Scheme data obtained via: https://www.citibikenyc.com/system-data
The Zip file contains bike travel records for July 2017.
Please see data > input > CitiBike > README.txt for details.
Please see data > input > CitiBike > LICENSE.txt for the LICENSE.
In accordance with the LICENCE these data are included as source material, as applicable, in analyses, reports, or studies published or distributed for non-commercial purposes.

13. data > input > CitiBike > 201708-citibike-tripdata.csv.zip
New York Citi Bike Scheme data obtained via: https://www.citibikenyc.com/system-data
The Zip file contains bike travel records for August 2017.
Please see data > input > CitiBike > README.txt for details.
Please see data > input > CitiBike > LICENSE.txt for the LICENSE.
In accordance with the LICENCE these data are included as source material, as applicable, in analyses, reports, or studies published or distributed for non-commercial purposes.

14. data > input > CitiBike > 201709-citibike-tripdata.csv.zip
New York Citi Bike Scheme data obtained via: https://www.citibikenyc.com/system-data
The Zip file contains bike travel records for September 2017.
Please see data > input > CitiBike > README.txt for details.
Please see data > input > CitiBike > LICENSE.txt for the LICENSE.
In accordance with the LICENCE these data are included as source material, as applicable, in analyses, reports, or studies published or distributed for non-commercial purposes.

15. data > input > CitiBike > 201710-citibike-tripdata.csv.zip
New York Citi Bike Scheme data obtained via: https://www.citibikenyc.com/system-data
The Zip file contains bike travel records for October 2017.
Please see data > input > CitiBike > README.txt for details.
Please see data > input > CitiBike > LICENSE.txt for the LICENSE.
In accordance with the LICENCE these data are included as source material, as applicable, in analyses, reports, or studies published or distributed for non-commercial purposes.

16. data > input > Weather > README.txt
The README file for the source Open Weather Map (https://openweathermap.org/) data. It links to the LICENSE for these data and provides details of the source data file.

17. data > input > Weather > LICENSE-CCASA.txt
Creative Commons Attribution-ShareAlike 4.0 International Public License
A license for the Open Weather Map (https://openweathermap.org/) data.

18. data > input > Weather > LICENSE-odbl-10.txt
ODC Open Database License (ODbL) Version 1.0
A license for the Open Weather Map (https://openweathermap.org/) data.

19. data > input > Weather > hourly_weather.csv
Open Weather Map (https://openweathermap.org/) hourly meteorological records for New York from 2012-10-01 12:00:00 to 2017-10-31 23:00:00.
A Comma Separated Values (CSV) rectangular data file with 7 columns and 44557 rows.
The first row of data is a header providing some field names.
Field 1 has a blank name, values are sequential numerical IDs for the rows of data and are enclosed in double quotes.
Field 2 is named "weather.datetime", values are in the standard date and time format YYYY-MM-DD HH:MM:SS and are enclosed in double quotes.
Field 3 is named "weather.humidity", values are numeric.
Field 4 is named "weather.pressure", values are numeric.
Field 5 is named "weather.temperature", values are numeric.
Field 6 is named "weather.weather_description", values are text and are enclosed in double quotes.
Field 7 is named "weather.wind_speed", values are numeric.
Example data record:
"3","2012-10-01 14:00:00",57,1012,288.24767617,"few clouds",7
Open Weather Map provided all products and services under the terms of the Creative Commons Attribution-ShareAlike 4.0 International licence (CC BY-SA 4.0). The data and database are open and licensed under the Open Data Commons Open Database License (ODbL). Under these licences, the data can be freely used for non-commercial or commercial purposes. It is requested that OpenWeather is made visible as the weather data source.

20. code > preprocess > preprocess.R
R script to preprocess the raw travel records and hourly weather data. Cluster identifiers were determined using Hierarchical Clustering based on spatial proximity; the hourly graph structures were constructed to facilitate the calculation of temporal graph structural information. Categorical features were converted by Multiple Correspondence Analysis (MCA) to generate numeric variables in lower dimensions. Details of the preprocessing are described in the paper.

21. data > processed > bikedata.Rdata
These preprocessed data are provided for convenience. They can be recreated by running "preprocess.R", which takes a little over 1 hour on a standard desktop PC (Intel i7-7700 CPU @ 3.60 GHz, 16.0 GB RAM).
The preprocessed data for each time step contain: temporal variables; meteorological variables; bike travel graph structural variables; bike trip counts (the target variable); and cluster identifiers.
The format of the data is the Rdata binary format as described at the following URLs:
https://www.loc.gov/preservation/digital/formats/fdd/fdd000470.shtml
https://cran.r-project.org/doc/manuals/r-release/R-data.html

22. code > feature_importance > feature_imp_XGB.R
R script used to estimate feature importance for predicting short-term travel demand in the bike-sharing system. The XGBoost model is used to compare the importance of time-lagged graph structural information and meteorological variables.

23. code > models > XGB_TD_PGI_FGI.R
R script used to enhance short-term bike travel demand prediction using graph structural information and XGBoost models. Three variants are included in the script, namely the XGB-TD, XGB-PGI and XGB-FGI models. Each of them takes advantage of different sets of time-lagged graph structural information; details of the models are described in the paper.

24. code > models > MLP_TD_PGI_FGI.R
R script used to enhance short-term bike travel demand prediction using graph structural information and Multilayer Perceptron (MLP) neural network models. Three variants are included in the script, namely the MLP-TD, MLP-PGI and MLP-FGI models. Each of them takes advantage of different sets of time-lagged graph structural information; details of the models are described in the paper.

25. code > models > LSTM_TD_PGI_FGI.R
R script used to enhance short-term bike travel demand prediction using graph structural information and Long Short-Term Memory (LSTM) neural network models. Three variants are included in the script, namely the LSTM-TD, LSTM-PGI and LSTM-FGI models. Each of them takes advantage of different sets of time-lagged graph structural information; details of the models are described in the paper.
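
Example of reading hourly_weather.csv outside R:
The field layout described for hourly_weather.csv (item 19) can be checked with a short script. The following is a minimal sketch in Python, not part of the data collection. It parses the example data record from this README rather than the real file. The exact quoting of the header row, the name "row_id" for the blank first field, the function name parse_weather, and the reading of "weather.temperature" as kelvin are assumptions made for illustration only.

```python
import csv
import io
from datetime import datetime

# Header and example record following the description of hourly_weather.csv.
# Assumption: the header row is quoted and field 1 has an empty name.
SAMPLE = (
    '"","weather.datetime","weather.humidity","weather.pressure",'
    '"weather.temperature","weather.weather_description","weather.wind_speed"\n'
    '"3","2012-10-01 14:00:00",57,1012,288.24767617,"few clouds",7\n'
)


def parse_weather(handle):
    """Yield one dict per hourly record, with values converted to Python types."""
    reader = csv.reader(handle)
    header = next(reader)
    # "row_id" is a hypothetical name for the blank first field;
    # the "weather." prefix is stripped from the remaining field names.
    names = ["row_id"] + [name.split(".", 1)[1] for name in header[1:]]
    for row in reader:
        rec = dict(zip(names, row))
        yield {
            "row_id": int(rec["row_id"]),
            "datetime": datetime.strptime(rec["datetime"], "%Y-%m-%d %H:%M:%S"),
            "humidity": float(rec["humidity"]),
            "pressure": float(rec["pressure"]),
            "temperature": float(rec["temperature"]),
            "weather_description": rec["weather_description"],
            "wind_speed": float(rec["wind_speed"]),
        }


records = list(parse_weather(io.StringIO(SAMPLE)))
first = records[0]

# Assumption: temperature is recorded in kelvin, so subtract 273.15 for Celsius.
temperature_celsius = first["temperature"] - 273.15
print(first["datetime"], first["weather_description"], round(temperature_celsius, 2))
```

To read the real file, replace io.StringIO(SAMPLE) with an open file handle, for example open("data/input/weather/hourly_weather.csv", newline=""), adjusting the path to where the data collection was unpacked. The sketch does not handle missing values, should the file contain any.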