################### Description ###################

The MINDSEYE2015 dataset is a subset of the Mind's Eye Year 2 dataset (http://www.visint.org/datasets), which contains videos showing a variety of people interacting with various objects. It was primarily created to improve carried object tracking using event analysis, based on the following work:

[1] A. Tavanai, M. Sridhar, E. Chinellato, A. G. Cohn, D. C. Hogg. "Joint Tracking and Event Analysis for Carried Object Detection". British Machine Vision Conference (BMVC), 2015.

As a result, the dataset also contains annotations for events, objects and people for each of the videos. Additionally, the object detections, tracklets and tracks from [1] are provided.

For more information and relevant code please visit:
http://www.engineering.leeds.ac.uk/joint-tracking-and-event-analysis

If this dataset is used in your work, please cite the above paper ([1]).

################### The Dataset ###################

The dataset contains 6 folders:

- "Videos"
Consists of 15 videos (5 recordings captured from 3 viewpoints: C1, C2 and C3), each lasting approximately 6000 frames. The videos have a resolution of 360x640 at 20 frames per second.

######

- "GroundTruth"
Has three subfolders containing ground truth text files for "Events", "Objects" and "People". We have defined 7 events, namely Carry, Static, Pickup, Putdown, Drop, Raise and Roll, to allow for a full description of the scene.

######

- "ObjectDetections"
Includes two types of object detections for each video: "detection_Pers_Auto_NMS.mat", detections obtained from automatic person tracks, and "detection_Pers_GT_NMS.mat", detections obtained from ground truth person tracks. Each .mat (MATLAB) file includes a struct "dres" where the detections are stored. The fields [x,y,w,h] represent the bounding box of the detection, r is the detection strength, fr is the frame number and PRect is the bounding box of the person track the detection was obtained from; if PRect is [1 1 639 359] the detection was obtained from the scene (reference track). PID is the person track ID; a value of zero corresponds to the scene. All other fields were NOT used. (A loading sketch is given at the end of this file.)

######

- "ObjectTracklets"
Includes two types of object tracklets for each video: "tracklets_Pers_Auto.mat", tracklets obtained from the "detection_Pers_Auto_NMS" object detections, and "tracklets_Pers_GT.mat", tracklets obtained from the "detection_Pers_GT_NMS" object detections. Each .mat file includes two struct types; only "dres_ALLTrackletsObj" is of use. It contains one cell per clip of a video, and each cell contains a large number of tracklets. An additional field "id" is included to represent the tracklet id.

######

- "ObjectTracks"
Includes two types of object tracks for each video: "tracks_Pers_Auto_HMM.mat", tracks obtained from the "tracklets_Pers_Auto" object tracklets, and "tracks_Pers_GT_HMM.mat", tracks obtained from the "tracklets_Pers_GT" object tracklets. Each .mat file includes the "dres_Tracks_Obj" structure, which represents the final object tracks in a video. An additional binary field "intp" indicates whether each point in a track was interpolated.

######

- "PersonTracks"
Includes two subfolders: "GT_Tracks", containing ground truth person tracks, and "Auto_Tracks", containing automatic person tracks for each of the 15 videos.

If you have any questions please send an email to:
Aryana Tavanai: fy06at@leeds.ac.uk
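
################### Example (MATLAB) ###################

The sketch below shows one possible way to load and inspect the .mat files described above in MATLAB. It is only illustrative: the file paths are placeholders, and the assumption that the fields of "dres" (and the "intp" field of "dres_Tracks_Obj") are stored as parallel per-detection arrays is not stated in this README; please check both against your copy of the data and the code from [1].

    % Minimal loading sketch. Assumptions (not guaranteed by this README):
    % the fields of "dres" (x, y, w, h, r, fr, PRect, PID) are parallel
    % arrays, so detection i is dres.x(i), dres.y(i), ... and PRect(i,:) is
    % a 1x4 box. The paths below are placeholders; adjust them to the
    % location of the files in your copy of the dataset.

    detFile   = 'detection_Pers_GT_NMS.mat';   % placeholder path
    trackFile = 'tracks_Pers_GT_HMM.mat';      % placeholder path

    det  = load(detFile);         % contains the struct "dres"
    dres = det.dres;

    % Bounding box [x y w h] and metadata of the first detection.
    i = 1;
    bbox      = [dres.x(i) dres.y(i) dres.w(i) dres.h(i)];
    strength  = dres.r(i);        % detection strength
    frame     = dres.fr(i);       % frame number
    personBox = dres.PRect(i,:);  % person box; [1 1 639 359] = scene (reference) track
    personID  = dres.PID(i);      % person track id; 0 = scene

    fprintf('det %d: frame %d, box [%g %g %g %g], r=%.3f, PID=%d\n', ...
            i, frame, bbox, strength, personID);

    % Final object tracks; "intp" is assumed to be a binary vector marking
    % interpolated points.
    trk     = load(trackFile);    % contains the struct "dres_Tracks_Obj"
    tracks  = trk.dres_Tracks_Obj;
    nInterp = sum(tracks.intp(:) == 1);
    fprintf('%d interpolated points in the final object tracks\n', nInterp);

The tracklet files ("tracklets_Pers_Auto.mat" / "tracklets_Pers_GT.mat") can be loaded the same way; their "dres_ALLTrackletsObj" variable holds one cell per clip, each containing that clip's tracklets with an extra "id" field.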