
1\. ABOUT THE DATASET

\--------------------



Title:	Dataset for 'A data-driven model of pedestrian stepping behaviour including randomness and interpersonal interactions'



Creator(s): Samuel Curtis \[1], Mateusz Bocian \[1], Artur Soczawa-Stronczyk \[1,2]



Organisation(s): 1. University of Leeds 2. Bridges and Civil Structures Team, Buro Happold, London, UK



Rights-holder(s): Unless otherwise stated, Copyright 2026 University of Leeds



Publication Year: 2026



Description:

Data collected from the experiment of Soczawa-Stronczyk et al. (DOI: 10.1016/j.humov.2019.06.007), consisting of 3D acceleration data from APDM Opal sensors from 9 pedestrians across 44 trials, in which the participants walked in a 3-by-3 formation in a straight line. The trial speed was set by a pacer with a metronome, and participant positions were randomised in each trial. Only trials 1-33 (those without the instruction to synchronise) are used in the analysis in the current paper. The random positions and order of the trials are given in a PDF file, and code is provided which unshuffles them before analysis. In addition, code is provided which extracts the gait cycle (phase) using a wavelet transform and converts this to a series of step/stride intervals. Code is provided that analyses these time series in pairs to identify synchronised runs. The code used to fit the ARMA model is also provided. The code that uses the model to simulate trials is given, along with the output time series from 100 simulations for various values of the coupling constant K(f) and coupling function G. Functions associated with the simulation are given in one subfolder, and the code and data used to plot the figures in another.



Cite as: 

Curtis et al. (2026) Dataset for 'A data-driven model of pedestrian stepping behaviour including randomness and interpersonal interactions'. University of Leeds. https://doi.org/10.5518/1858



Contact: vdft5519@leeds.ac.uk



2\. TERMS OF USE

\---------------



This dataset is licensed under a Creative Commons Attribution 4.0 International Licence: https://creativecommons.org/licenses/by/4.0/

Copyright 2026 University of Leeds



3\. PROJECT AND FUNDING INFORMATION

\----------------------------------



Title: Examining the dynamic response of bridges to pedestrian crowds



Dates: 2025-28



Funding organisation: School of Civil Engineering, University of Leeds



4\. CONTENTS

\-----------



**IN MAIN FOLDER:**



*MATLAB code files (all .m), listed approximately in the order in which the associated analysis appears in the paper:*



Step1\_PreProcessing

chops the raw acceleration signals (lateral and vertical), re-arranges participants so that column i corresponds to position i (i = 1:9), and sorts tests in order of speed, with the first 33 being the uninstructed trials.



Step2new\_BivariateAnalysis\_lateral, Step2new\_BivariateAnalysis\_vert

cuts the start and end of the (lateral, vertical) signal, applies a wavelet transform, identifies the frequency with the greatest power in the wavelet transform and extracts the phase at that frequency. calculates the phase difference between all 36 pairs in each trial.
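The phase-extraction step can be illustrated with a minimal Python sketch. This is not the MATLAB code in this dataset. It assumes three things: a complex Morlet wavelet with a hypothetical width of 6 cycles, a caller-supplied grid of candidate frequencies, and "greatest power" taken as the mean squared magnitude of the transform over the whole trial. The original analysis may use a different wavelet, normalisation and scale grid. The function name is hypothetical.

```python
import numpy as np

def morlet_phase(signal, fs, freqs, n_cycles=6.0):
    """Complex Morlet transform at each candidate frequency (Hz).

    Returns the frequency whose transform has the greatest mean power,
    together with the phase (radians, wrapped to [-pi, pi]) at that frequency.
    """
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()                                   # remove DC offset
    best_power, best_freq, best_coeffs = -np.inf, None, None
    for f in freqs:
        sigma_t = n_cycles / (2.0 * np.pi * f)         # Gaussian envelope width in s
        half = int(np.ceil(4.0 * sigma_t * fs))
        t = np.arange(-half, half + 1) / fs            # symmetric time grid
        wavelet = np.exp(2j * np.pi * f * t) * np.exp(-t**2 / (2.0 * sigma_t**2))
        # normalise by total envelope weight so responses at different
        # frequencies are comparable
        wavelet /= np.abs(wavelet).sum()
        coeffs = np.convolve(x, wavelet, mode="same")
        power = np.mean(np.abs(coeffs) ** 2)
        if power > best_power:
            best_power, best_freq, best_coeffs = power, f, coeffs
    return best_freq, np.angle(best_coeffs)
```

In use, `f, phase = morlet_phase(acc, fs, np.arange(1.0, 3.01, 0.1))` would return the dominant frequency and its phase series for one signal. Here `acc`, `fs` and the grid are placeholders. Pairwise phase differences then follow by subtracting two such phase series.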



SC\_Step\_Times\_model\_fitting

finds the position of each subject in each test (used to unshuffle pedestrians for analysis). extracts stride and step interval series from the phase. fits ARMA(p,q) models with p, q = 0 to 5 to the step interval series, calculates the residuals, AICc and BIC, and performs the Ljung-Box Q (LBQ) test on the residuals. plots the ARMA lags that give the lowest AICc and BIC. extracts the fitted coefficients A1, C1 and the innovation standard deviation from the ARMA(1,1) model. fits probability distributions to these and fits a quadratic relationship between speed and standard deviation. averages the ratios between the quadratic coefficients so that the quadratic for each pedestrian (with positive leading coefficient) depends on only one parameter, p1. fits a distribution to p1.
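Converting a phase series into an interval series can be sketched in Python as follows. This is not the MATLAB code in this dataset. The sketch assumes one phase cycle per step for the vertical signal and one per stride for the lateral signal, consistent with the step/stride comparison described for SC\_stepstride\_discrep. It also assumes that an event is marked each time the phase passes a fixed reference value. The reference value used in the original analysis is not stated here, and the function names are hypothetical.

```python
import numpy as np

def event_times_from_phase(t, phase, ref=np.pi):
    """Times at which the unwrapped phase passes ref + 2*pi*k (one event per cycle).

    Assumes the phase increases monotonically through the trial; the choice of
    reference phase (default: the +pi -> -pi wrap point) is a hypothetical convention.
    """
    t = np.asarray(t, dtype=float)
    unwrapped = np.unwrap(np.asarray(phase, dtype=float))
    cycles = (unwrapped - ref) / (2.0 * np.pi)        # integer values mark events
    ks = np.arange(np.ceil(cycles[0]), np.floor(cycles[-1]) + 1)
    return np.interp(ks, cycles, t)                   # linear interpolation between samples

def intervals(event_times):
    """Interval series (e.g. step or stride durations) from event times."""
    return np.diff(event_times)
```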



SC\_stepstride\_discrep

calculates the time discrepancy between stride times calculated from the lateral data and those obtained by summing two consecutive steps from the vertical data.
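The step/stride comparison can be sketched in Python as follows, assuming step and stride interval series in seconds. The caller chooses an offset that pairs the steps so that each pair spans one stride. Variables such as t\_firststep, t\_firststride and firstfoot could inform that alignment, but the rule used in SC\_stepstride\_discrep is not stated here. The function name is hypothetical.

```python
import numpy as np

def stride_step_discrepancy(step_intervals, stride_intervals, offset=0):
    """Difference between stride intervals and sums of consecutive step pairs.

    offset (0 or 1) selects which step starts the first pair; aligning it with
    the first stride is left to the caller.
    """
    steps = np.asarray(step_intervals, dtype=float)[offset:]
    n_pairs = len(steps) // 2
    summed = steps[: 2 * n_pairs].reshape(n_pairs, 2).sum(axis=1)  # two steps -> one stride
    strides = np.asarray(stride_intervals, dtype=float)
    m = min(n_pairs, len(strides))
    return strides[:m] - summed[:m]
```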



SC\_time\_series\_correlation

calculates measures of self-similarity (DFA exponent, RSR, power spectrum and correlogram slope) from the step interval series (not discussed in detail in the paper). calculates the ACF, PACF and standard deviation of the step interval series, and the number of lags taken for the ACF and PACF to fall within the 95% confidence bounds for zero correlation.
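The "lags to fall within the 95% confidence bounds" measure can be sketched in Python for the ACF as below. The sketch assumes the large-sample white-noise bound of ±1.96/√N. The bounds used by the original MATLAB analysis are not stated here. The PACF is omitted, since it would need an extra Durbin-Levinson or regression step. Function names are hypothetical.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Biased sample autocorrelation at lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[: len(x) - k], x[k:]) / denom for k in range(max_lag + 1)])

def first_lag_within_bounds(x, max_lag=50, z=1.96):
    """Smallest lag >= 1 with |ACF| inside +/- z/sqrt(N), or None if none up to max_lag."""
    acf = sample_acf(x, max_lag)
    bound = z / np.sqrt(len(x))
    inside = np.nonzero(np.abs(acf[1:]) < bound)[0]
    return int(inside[0]) + 1 if inside.size else None
```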



SC\_time\_series\_synced\_runs

for each pair of step interval series, identifies synced periods based on steps taken close in time. the lengths of synced periods and the proportion of synced steps are calculated and compared between connections. calculates and plots the variance and mean step interval for synced periods and for whole trials.
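The README does not define "close in time" here, so the following Python sketch uses a hypothetical tolerance `tol`. Under this sketch, a step of pedestrian A counts as synchronised if the nearest step of pedestrian B lies within `tol` seconds. Runs are maximal sequences of consecutive synchronised steps. The PSS for a minimum run length is the fraction of A's steps that belong to runs at least that long. These definitions and the function names are assumptions for illustration, not necessarily those used in SC\_time\_series\_synced\_runs.

```python
import numpy as np

def synced_flags(t_a, t_b, tol):
    """True where a step time in t_a has a step in t_b within tol seconds."""
    t_a = np.asarray(t_a, dtype=float)
    t_b = np.sort(np.asarray(t_b, dtype=float))
    idx = np.searchsorted(t_b, t_a)
    left = np.abs(t_a - t_b[np.clip(idx - 1, 0, len(t_b) - 1)])   # nearest earlier step
    right = np.abs(t_a - t_b[np.clip(idx, 0, len(t_b) - 1)])      # nearest later step
    return np.minimum(left, right) <= tol

def run_lengths(flags):
    """Lengths of maximal runs of consecutive True values."""
    runs, count = [], 0
    for f in flags:
        if f:
            count += 1
        elif count:
            runs.append(count)
            count = 0
    if count:
        runs.append(count)
    return runs

def pss(flags, min_run):
    """Fraction of steps lying in synchronised runs of length >= min_run."""
    return sum(r for r in run_lengths(flags) if r >= min_run) / len(flags)
```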



SC\_Lat\_Connection\_Comparison\_SC

calculates the trial speed. calculates the instantaneous frequency difference from the phase. identifies periods of trials spent in (anti)phase, with step frequencies within 0.015 Hz of each other. plots phase histograms for different connections, both for all pairs and for only those periods with step frequencies within 0.015 Hz of each other.
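The frequency-difference step can be sketched in Python as follows. The phase difference is unwrapped, differentiated with respect to time (central differences via numpy.gradient, a choice of this sketch) and divided by 2π to give a difference in Hz. The fraction of samples with |Δf| below the 0.015 Hz threshold quoted above is then counted. Because the lateral phase has one cycle per stride, its frequency difference is a stride-frequency difference. How the original code relates this to the step-frequency threshold is not stated here. Function names are hypothetical.

```python
import numpy as np

def frequency_difference(t, phase_a, phase_b):
    """Instantaneous frequency difference (Hz) between two phase series (rad)."""
    dphi = np.unwrap(np.asarray(phase_a, dtype=float) - np.asarray(phase_b, dtype=float))
    return np.gradient(dphi, np.asarray(t, dtype=float)) / (2.0 * np.pi)

def fraction_within(freq_diff, threshold=0.015):
    """Fraction of samples with |frequency difference| below threshold (Hz)."""
    return float(np.mean(np.abs(np.asarray(freq_diff)) < threshold))
```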



SC\_arma\_params\_random

simulates 10000 random draws of the parameters a\_1, c\_1, p\_1 as in the model and calculates from these the (theoretical) variance of the random part of the step interval series, with and without the thresholding described in Section 3.2.2.3 of the paper.
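The theoretical variance of an ARMA(1,1) process can be illustrated in Python. The sketch assumes the convention x_t = a1 x_{t-1} + e_t + c1 e_{t-1} with innovations of standard deviation sigma, which is the additive-sign form used by MATLAB's arima. If the paper writes the model with different signs, a1 or c1 must be negated accordingly. For |a1| < 1 the stationary variance is `sigma^2 * (1 + 2*a1*c1 + c1^2) / (1 - a1^2)`. The random draws of a\_1, c\_1, p\_1 and the thresholding are not reproduced, and the function names are hypothetical.

```python
import numpy as np

def arma11_variance(a1, c1, sigma):
    """Stationary variance of x_t = a1*x_{t-1} + e_t + c1*e_{t-1}, sd(e) = sigma."""
    a1 = np.asarray(a1, dtype=float)
    if np.any(np.abs(a1) >= 1):
        raise ValueError("stationarity requires |a1| < 1")
    return sigma**2 * (1 + 2 * a1 * c1 + c1**2) / (1 - a1**2)

def simulate_arma11(a1, c1, sigma, n, rng):
    """Simulate n samples of the same ARMA(1,1) process, starting from zero."""
    e = rng.normal(0.0, sigma, n + 1)
    x = np.zeros(n + 1)
    for t in range(1, n + 1):
        x[t] = a1 * x[t - 1] + e[t] + c1 * e[t - 1]
    return x[1:]
```

Comparing `np.var(simulate_arma11(...))` with `arma11_variance(...)` for a long series is a quick consistency check of the formula.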



SC\_crowdsim\_varying\_Kc

runs simulations for each coupling function G and each specified value of K(f), and stores the resulting step interval series with their component parts. calculates the proportion of synchronised steps (PSS) for pairs of pedestrians in the simulations.



SC\_crowdsim\_analysis\_1

from the simulation and experimental results, fits a Weibull function to the proportions of synchronised steps (PSS) in runs of length >2 for each value of K(f) used in the simulations. obtains the distribution of PSS across simulations for each value of K(f), and chooses the K(f) that minimises the distance between the mean PSS in simulations and experiments. calculates the variance and ACF of the simulated step interval series.
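Given arrays shaped like avpss\_sim (5x40x25) and syncratioexp (5x40), the choice of K(f) can be sketched in Python as an argmin over the K(f) axis. Taking the absolute difference of mean PSS as the "distance" is an assumption. The returned indices are 0-based, so if the same rule is used they correspond to the MATLAB indices in bestKf\_ind minus one. The function name is hypothetical.

```python
import numpy as np

Ks = np.round(np.arange(25) * 0.001, 3)   # 0.000:0.001:0.024, as listed for Ks

def best_kf(avpss_sim, pss_exp):
    """avpss_sim: (n_types, n_runlengths, n_K); pss_exp: (n_types, n_runlengths).

    Returns 0-based indices into Ks minimising |sim - exp| per type and run length.
    """
    diff = np.abs(np.asarray(avpss_sim) - np.asarray(pss_exp)[..., None])
    return np.argmin(diff, axis=-1)
```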





*MATLAB data files (all .mat) and variable contents. Numbers in brackets in the variable descriptions give the dimension along which a quantity varies (1 = rows, 2 = columns, and so on). Only variables containing raw data or relevant to the analysis and figure plots are described. Unless stated otherwise, trials 1-33 (usually corresponding to rows) are sorted in ascending order of metronome frequency:*



Data\_extracted

raw accelerometer data, and variables for sorting tests

accLCSraw, accWCSraw 894702x27 double - raw accel data in local coordinate system (LCS) and world coordinate system (WCS). Each set of 3 columns represents the x, y, z components for one sensor (9 in total)

arrangement 132x3 double - each entry in each 3x3 block corresponds to the sensor/pedestrian in that position in the experiment (44 blocks)

pattern 2x36 double - 36 columns correspond to 36 unique pairwise connections, the two entries in each column are the two positions involved. many other variables (e.g. phase difference) that are indexed by pairs use this as a key

testInfo 44x4 double - test info for unsorted trials 1 to 44 (rows). column 2 is the metronome frequency given to the pacer, columns 3 and 4 are the indices in the acceleration data of the start and end of each test.

testSort 44x1 double - order of tests used to sort in ascending order of speed, see the Test Notes PDF
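There are C(9, 2) = 36 unordered pairs of the nine positions, which is why pair-indexed variables have 36 columns. The Python sketch below builds a hypothetical lexicographic version of `pattern`, only to illustrate this count. It also includes a lookup that finds the column for a given pair of positions in any 2x36 key array. For the actual data, the stored `pattern` variable should be used, since its column order may differ from the lexicographic one. Function names are hypothetical, and returned indices are 0-based (MATLAB column = index + 1).

```python
from itertools import combinations
import numpy as np

def make_pattern(n_positions=9):
    """Illustrative 2 x C(n, 2) key: every unordered pair of 1-based positions,
    in lexicographic order (the stored `pattern` may be ordered differently)."""
    return np.array(list(combinations(range(1, n_positions + 1), 2))).T

def pair_index(pattern, pos_a, pos_b):
    """0-based column of `pattern` holding the pair (pos_a, pos_b), in either order."""
    p = np.asarray(pattern)
    mask = ((p[0] == pos_a) & (p[1] == pos_b)) | ((p[0] == pos_b) & (p[1] == pos_a))
    hits = np.nonzero(mask)[0]
    return int(hits[0]) if hits.size else None
```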



Speed\_unshuffled

walking speeds, pedestrian heights and positions

speed - walking speed of trials 1 to 44 (1); all columns are identical

height 1x9 double - heights of pedestrians with sensors 1 to 9

position 44x9 double - column indices represent the position of that pedestrian/sensor in the trial corresponding to the row. for example, '3' in row 1 column 2 means that in trial 1 (after sorting), pedestrian/sensor 2 was in position 3.
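Reading `position` as described above, reordering one trial's sensor-ordered matrix into position order amounts to sending column s to column position(trial, s). Step1\_PreProcessing is described as performing this re-arrangement. The minimal Python sketch below only illustrates the indexing. It assumes a samples-by-9 array with one column per sensor (e.g. one acceleration component) and the 1-based positions used in MATLAB. The function name is hypothetical.

```python
import numpy as np

def to_position_order(data_by_sensor, position_row):
    """Reorder columns so that column i holds the sensor that was in position i.

    data_by_sensor: (n_samples, n_sensors) array, column s = sensor s + 1.
    position_row: 1-based positions of sensors 1..n for one trial (a row of `position`).
    """
    data = np.asarray(data_by_sensor)
    pos = np.asarray(position_row, dtype=int)
    if sorted(pos.tolist()) != list(range(1, data.shape[1] + 1)):
        raise ValueError("position_row must be a permutation of 1..n_sensors")
    out = np.empty_like(data)
    out[:, pos - 1] = data            # sensor s goes to column position[s]
    return out
```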



Data\_PreProcessed

acceleration data after pre-processing (chopping signal and sorting order of tests)

accLCSy 1x44 cell - lateral acceleration data for each trial (1 to 44) as a 9-column matrix

accWCSz 1x44 cell - vertical acceleration data for each trial (1 to 44) as a 9-column matrix

pattern 2x36 double - 36 columns correspond to 36 unique pairwise connections, the two entries in each column are the two positions involved. many other variables (e.g. phase difference) that are indexed by pairs use this as a key

testInfoSort 44x4 double - test info for trials 1 to 44 after sorting by frequency. column 1 is the original order of the tests. column 2 is the metronome frequency given to the pacer, columns 3 and 4 are the indices in the acceleration data of the start and end of each test. column 5 is the trial speed in m/s.



Data\_Lateral\_Step2\_1, Data\_Vertical\_Step2\_1 (vertical variables preceded by a 'V')

phase data extracted from lateral, vertical signal using wavelet transform

timeTest 1x44 cell - time of each sample in seconds, starting from 0 for each truncated trial, for trials 1 to 44

phind 44x9 cell - phase for trials 1 to 44 (1), peds in positions 1 to 9 (2)

ph 44x36 cell - pairwise phase difference for trials 1 to 44 (1), pairs 1 to 36 (2)

nFreqind 44x9 double - frequency (1/scale) at which wavelet transform is taken (corresponding to greatest power) for trials 1 to 44 (1), peds in positions 1 to 9 (2)

cwtind 44x9 cell - complex wavelet transform at chosen scale for trials 1 to 44 (1), peds in positions 1 to 9 (2)



Data\_time\_series

steptimes,t\_steps,stridetimes,t\_strides 44x9 cell - step interval series, time instances in s of steps, stride interval series, time instances in s of strides for trials 1 to 44 (1), peds in positions 1 to 9 (2)

avstepfreq,avsteptime,stdsteptime,avstridefreq,avstridetime,stdstridetime 44x9 double - mean step frequency in Hz, mean step duration in s, SD of step durations, mean stride frequency, mean stride duration, SD of stride durations for trials 1 to 44 (1), peds in positions 1 to 9 (2)

firstfoot 44x9 logical - first foot (0=L,1=R) used in each step series for trials 1 to 44 (1), peds in positions 1 to 9 (2)

t\_firststep, t\_firststride 44x9 double - time instance of first step, stride for trials 1 to 44 (1), peds in positions 1 to 9 (2)



Data\_synced\_runs

experimental results concerning synchronised runs

sspropexp 33x36x40 double - PSS from experiments for trials 1 to 33 (1), pairs 1 to 36 (2), run lengths 1 to 40 (3)

st\_sy\_ftb4 33x33 struct - run data for trials 1 to 33 (1), pairs 1 to 36 with FtB indices 3,11,18,24,29,33 nonempty (2). structure for each pair consists of step interval and time instance data (including mean and SD) for both pedestrians for synchronised steps in runs of 4 or more steps

syncratio 5x40 double - average PSS from experiments for connection type 1 to 5 (1), run length 1 to 40 (2)

durdata,tdata 44x9 cell - step interval series, time instances of steps for trials 1 to 44 (1), peds in positions 1 to 9 (2)



Data\_model\_fitting\_A

variables concerning the fitted ARMA(1,1) model and relationship with walking speed

A1,C1 33x9 double - fitted ARMA(1,1) coefficients for trials 1 to 33 (1), peds in positions 1 to 9 (2)

XV 33x9 double - walking speed for trials 1 to 33 (1), peds 1 to 9 (all peds have identical speed in each trial) (2)

Yfh 33x9 double - product of mean step frequency and height for trials 1 to 33 (1), peds 1 to 9 (2)

Ymodelsd 33x9 double - SD of innovations of the fitted ARMA(1,1) model for trials 1 to 33 (1), peds 1 to 9 (2)

Ysteptimesd 33x9 double - SD of series of step intervals for trials 1 to 33 (1), peds 1 to 9 (2)

p1,p2,p3 1x9 double - coefficients of the fit p1\*x^2 + p2\*x + p3 between walking speed and SD of innovations for pedestrians 1 to 9
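The quadratic relationship in p1, p2 and p3 is an ordinary least-squares polynomial fit. A minimal Python sketch uses numpy.polyfit, which returns coefficients from the highest power down and so matches the p1, p2, p3 ordering above. The later reduction to a single parameter p1 is not reproduced, and the function name is hypothetical.

```python
import numpy as np

def fit_speed_sd_quadratic(speed, sd):
    """Least-squares fit sd ~ p1*speed**2 + p2*speed + p3; returns (p1, p2, p3)."""
    p1, p2, p3 = np.polyfit(np.asarray(speed, dtype=float), np.asarray(sd, dtype=float), 2)
    return p1, p2, p3
```

The fitted curve can then be evaluated at any speed with `np.polyval([p1, p2, p3], speed)`.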



Data\_model\_fitting\_ARMA

variables related to fitting an ARMA model and choosing the number of lags

Vmodel 44x9 cell - for trials 1 to 44 (1), peds in positions 1 to 9 (2) a 6x6 array of arima objects corresponding to fitted ARMA models for lags p,q from 0 to 5

aicp,aicq,bicp,bicq 44x9 double - for trials 1 to 44 (1), peds in positions 1 to 9 (2), best choice of p and q according to lowest AICc and BIC

num\_corr 6x6 double - for lags p,q from 0 to 5, number of pedestrian time series in first 33 trials that test positive for autocorrelation of residuals (LBQ test)

resid 105x1 double - example residual series
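The order selection recorded in aicp, aicq, bicp and bicq rests on three standard definitions, with log L the log-likelihood, k the number of estimated parameters and n the number of observations:

- AIC = -2 log L + 2k
- AICc = AIC + 2k(k + 1)/(n - k - 1)
- BIC = -2 log L + k ln n

The Python sketch below applies them to a dictionary of fitted log-likelihoods. Counting k = p + q + 2 (constant and innovation variance included) is an assumption, as are the function names.

```python
import math

def information_criteria(log_likelihood, k, n):
    """Return (AIC, AICc, BIC) for k estimated parameters and n observations."""
    aic = -2.0 * log_likelihood + 2.0 * k
    aicc = aic + 2.0 * k * (k + 1) / (n - k - 1)   # small-sample correction
    bic = -2.0 * log_likelihood + k * math.log(n)
    return aic, aicc, bic

def best_order(fits, n, criterion="aicc"):
    """fits: dict mapping (p, q) -> log-likelihood; returns the (p, q) with the lowest score.

    k = p + q + 2 (constant + innovation variance) is an assumed parameter count.
    """
    col = {"aic": 0, "aicc": 1, "bic": 2}[criterion]
    scores = {pq: information_criteria(ll, pq[0] + pq[1] + 2, n)[col] for pq, ll in fits.items()}
    return min(scores, key=scores.get)
```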



Data\_corr

ACF and PACF related

acf 33x9 cell - ACF at all lags for trials 1 to 33 (1), pedestrians in positions 1 to 9 (2)

actime, pctime 33x9 double - number of lags taken for ACF, PACF to fall within the 95% confidence limits



Data\_Lat\_cts

frequency difference derived from lateral (stride) phase data

freqdiffwin 44x36 cell - instantaneous frequency difference series for trials 1 to 44 (1), pairs 1 to 36 (2) calculated from phase difference



changing\_KC\_1\_fun\_82, changing\_KC\_2\_fun\_82

complete simulation results for coupling functions G1,G2.

Ks 1x25 double - values of K(f) used, 0.000:0.001:0.024

durstore 25x100 cell - step interval time series, rows correspond to values of K(f), columns are simulation runs 1 to 100

gammastore 25x100 cell - coupling term time series

tstore 25x100 cell - step time instance time series

t\_rndstore 25x100 cell - random term time series

simstepvar 33x9x25x100 double - variance of step interval series for trials 1 to 33 (1), pedestrians in positions 1 to 9 (2), values of K(f) (3), simulations 1 to 100 (4)

syncratio 5x40x25x100 double - averaged PSS from simulations for connections of type 1 to 5 (see types) (1), run lengths 1 to 40 (2), values of K(f) (3), simulations 1 to 100 (4)

ssprop 33x36x40x25x100 - PSS of individual pairs from simulations for trials 1 to 33 (1), pairs 1 to 36 (2), run lengths 1 to 40 (3), values of K(f) (4), simulations 1 to 100 (5)



Data\_arma\_params\_random

variance of experimental step interval time series, and generated variances using randomly drawn parameters

expvar 33x9 double - experimental variance for trials 1 to 33 (1), pedestrians in positions 1 to 9 (2)

modelvar 10000x1 double - simulated theoretical variances of 10000 step time series using procedure from section 3.2.2.3

modelvar\_init 10000x1 - as above, before the thresholding procedure



Data\_simulations\_G1\_82, Data\_simulations\_G2\_82

results from simulations, with experimental results - see also changing\_KC\_1\_fun

actime 33x9 double - number of lags taken for ACF, PACF to fall within the 95% confidence limits in experiments for trials 1 to 33 (1), peds in positions 1 to 9 (2)

actimesim 33x9x25x100 double - number of lags taken for ACF, PACF to fall within the 95% confidence limits in simulations for trials 1 to 33 (1), peds in positions 1 to 9 (2), increasing values of K(f) (3), simulation number (4)

avpss\_sim,sdpss\_sim 5x40x25 double - average,SD PSS across all simulations across connection types 1 to 5 (1), run length 1 to 40 (2), values of K(f) (3)

bestKf\_ind 5x40 double - best choice of K(f) for simulations (index in Ks) according to closeness to the experimental PSS value, for connection types 1 to 5 (1), run length 1 to 40 (2)

expbeta, explambda 1x5 double - Weibull fit parameters from experimental data for connection types 1 to 5

simbetaavg,simlambdaavg 5x25 double - Weibull fit parameters from simulations for connection types 1 to 5 (1), values of K(f) (2)

simstepmean 33x9x25x100 double - mean step interval from simulations for trials 1 to 33 (1), peds in positions 1 to 9 (2), values of K(f) (3), simulation number (4)

syncratioexp 5x40 double - average PSS from experiments for connection type 1 to 5 (1), run length 1 to 40 (2)



*pdf files:*



Participant ID - height and weight of each pedestrian 1 to 9



Test Notes - walking formation and metronome frequency of each test (in time order, before the sorting by frequency performed in the analysis) (see 'testSort' and 'position' variables)



**IN simulation SUBFOLDER:**

scripts (functions) used by scripts in the main folder, carrying out the simulations of the 3x3 group of pedestrians with randomly generated parameters at the speeds corresponding to the experiment. the function 'crowd1\_3x3\_article1\_KM.m' is where the coupling function and coupling constant can be provided as inputs, see file for details



**IN plots SUBFOLDER:**

scripts and data used to plot all figures appearing in the paper. most of the data is duplicated from variables in the main folder



5\. METHODS

\----------


Full details of data collection can be found in Soczawa-Stronczyk et al. (DOI: 10.1016/j.humov.2019.06.007). Details of data processing and simulations can be found in the main paper, and associated references.





