This Biofilms_harbour_Cdifficile_readme.txt file was generated on 02-12-2020 by Anthony Buckley


GENERAL INFORMATION

1. Title of Dataset: Taxonomic analysis of simulated gut biofilms

2. Author Information
    A. Principal Investigator Contact Information
        Name: Dr Anthony Buckley
        Institution: University of Leeds
        Address: Old Medical School, Leeds General Infirmary, Leeds, UK, LS1 3EX
        Email: A.Buckley1@leeds.ac.uk

    B. Associate or Co-investigator Contact Information
        Name: Prof Mark Wilcox
        Institution: University of Leeds
        Address: Old Medical School, Leeds General Infirmary, Leeds, UK, LS1 3EX
        Email: Mark.Wilcox@nhs.net

3. Date of data collection (single date, range, approximate date): 30-04-2018

4. Geographic location of data collection: Leeds, UK

5. Information about funding sources that supported the collection of the data: Funding provided by Seres Therapeutics and Rosetrees seedcorn charity grant.


SHARING/ACCESS INFORMATION

1. Licenses/restrictions placed on the data:

2. Links to publications that cite or use the data:

3. Links to other publicly accessible locations of the data:

4. Links/relationships to ancillary data sets:

5. Was data derived from another source? yes/no
    A. If yes, list source(s):

6. Recommended citation for this dataset:


DATA & FILE OVERVIEW

1. File List: Biofilm_Taxonomic_analysis_by_16S_sequencing.xlsx

2. Relationship between files, if important: N/A

3. Additional related data collected that was not included in the current data package: N/A

4. Are there multiple versions of the dataset? NO
    A. If yes, name of file(s) that was updated:
        i. Why was the file updated?
        ii. When was the file updated?


METHODOLOGICAL INFORMATION

1. Description of methods used for collection/generation of data:
Luminal gut model fluid was collected from vessel 3 of control and recipient models at each sampling point. One DNA extraction was performed from each biofilm support structure.
Total DNA from luminal or biofilm samples was extracted using the FastDNA™ SPIN Kit for Soil (MP Biomedicals™, UK) following the manufacturer’s instructions, with DNA stored at -80 °C. 16S rRNA gene V4 sequences were PCR-amplified from 1 µl of DNA extract using the AccuPrime High Fidelity PCR kit (Invitrogen, Catalog No. 12346094) with the primer pair 515F (5’ AGCMGCCGCGGTAA 3’) and 806R (5’ GGACTACHVGGGTWTCTAAT 3’) containing Illumina MiSeq adaptors and single-end barcodes. PCR temperature cycles were: 98 °C for 3 seconds; 33 cycles of 98 °C for 20 seconds, 50 °C for 30 seconds and 72 °C for 90 seconds; then 72 °C for a final 10 minutes. Amplicons were pooled in equal quantities, cleaned with AMPure beads (Beckman Coulter) and paired-end sequenced on the MiSeq platform following Nextera XT library preparation (Illumina).

2. Methods for processing the data:
Reads were demultiplexed with the split_libraries_fastq.py script in QIIME (version 1.9.1) [Caporaso et al., 2010] and identical sequences were binned into amplicon sequence variants (ASVs) using the program DADA2 (version 1.4.0, parameters EE=2, TruncL=c(200, 180) and q=10) [Callahan et al., 2016]. The assignTaxonomy function in DADA2 was used to assign a taxonomic name to each unique ASV using the RDP Classifier with the SILVA 16S rRNA database (Silva nr v128) [Wang et al., 2007; Quast et al., 2013]. Low-abundance reads (≤10 reads) were removed from further analysis. Reads for each sample were aggregated to the family taxonomic level and converted to percentage abundance. Results shown are the mean abundance from at least 3 biofilm support structures from each model. Bacterial families shown in Figure 2 are all families that reached ≥1 % abundance at one or more sampling points during the model timeline; the values from the remaining bacterial families, whose abundance was <1 %, were aggregated and labelled as ‘other’.

3. Instrument- or software-specific information needed to interpret the data: Included in above descriptions

4. Standards and calibration information, if appropriate: N/A

5. Environmental/experimental conditions:
    Primary CDI - models H and I
    Recurrent CDI - models X, E and F
    FMT treatment - models Y and G


DATA-SPECIFIC INFORMATION FOR: Biofilm_Taxonomic_analysis_by_16S_sequencing.xlsx

1. Number of variables:
    Primary CDI models - 2 replicates
    Recurrent CDI models - 3 replicates
    FMT treatment models - 2 replicates

2. Number of cases/rows: 63 data points from 7 independent model datasets

3. Variable List: Bacterial taxonomic units aggregated at the family level and expressed as percentage abundance of the total.

4. Missing data codes: N/A

5. Specialized formats or other abbreviations used: N/A
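

ILLUSTRATIVE SKETCH OF THE FAMILY-LEVEL AGGREGATION

The aggregation described under "Methods for processing the data" (low-abundance filter, family-level aggregation, conversion to percentage abundance, and grouping of minor families as 'other') can be sketched in Python as follows. This is a minimal illustration only and is not the code used to generate the dataset. The function names, sample names, ASV identifiers and counts are hypothetical, and the filter is assumed to act on each ASV's total read count across all samples, which the description above does not specify.

```python
from collections import defaultdict


def family_percentages(asv_counts, asv_family, min_reads=10):
    """Aggregate ASV read counts to family level as percentage abundance.

    asv_counts: {sample: {asv_id: reads}}
    asv_family: {asv_id: family name}
    ASVs whose total reads across all samples are <= min_reads are dropped
    (assumed interpretation of the low-abundance filter).
    """
    totals = defaultdict(int)
    for counts in asv_counts.values():
        for asv, reads in counts.items():
            totals[asv] += reads
    kept = {asv for asv, reads in totals.items() if reads > min_reads}

    result = {}
    for sample, counts in asv_counts.items():
        family_reads = defaultdict(int)
        for asv, reads in counts.items():
            if asv in kept:
                family_reads[asv_family.get(asv, "Unassigned")] += reads
        sample_total = sum(family_reads.values())
        result[sample] = (
            {fam: 100.0 * n / sample_total for fam, n in family_reads.items()}
            if sample_total
            else {}
        )
    return result


def collapse_minor(percentages, threshold=1.0, label="other"):
    """Keep families reaching the threshold in any sample; sum the rest."""
    shown = {
        fam
        for sample_pct in percentages.values()
        for fam, value in sample_pct.items()
        if value >= threshold
    }
    collapsed = {}
    for sample, sample_pct in percentages.items():
        row = {fam: v for fam, v in sample_pct.items() if fam in shown}
        rest = sum(v for fam, v in sample_pct.items() if fam not in shown)
        if rest > 0:
            row[label] = rest
        collapsed[sample] = row
    return collapsed


# Hypothetical example data (not taken from the dataset).
counts = {
    "biofilm_1": {"asv1": 600, "asv2": 394, "asv3": 4, "asv4": 6},
    "biofilm_2": {"asv1": 500, "asv2": 492, "asv3": 5, "asv4": 8},
}
families = {
    "asv1": "Lachnospiraceae",
    "asv2": "Bacteroidaceae",
    "asv3": "Rikenellaceae",
    "asv4": "Enterococcaceae",
}

pct = family_percentages(counts, families)
for sample, row in collapse_minor(pct).items():
    print(sample, row)
```

In this example asv3 is removed by the read filter (9 reads in total), and Enterococcaceae never reaches 1 % in either sample, so it is reported under 'other'. Averaging across biofilm support structures from the same model would be a separate step.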