
Brain Imaging Data Structure and Center for Reproducible Neuroscience

Talk given at Montreal Neurological Institute in December 2015

Published in: Science

  2. Getting lost in your data
  3. Getting lost in your data
      • MRI has been used to study the human brain for over 20 years.
      • Despite similarities in experimental designs and data types, each researcher tends to organize and describe their data in their own way.
      http://www.nature.com/news/brain-imaging-fmri-2-0-1.10365
  4. Getting lost in your data
      Heterogeneity in data description practices causes:
      • problems in sharing data (even within the same lab),
      • unnecessary manual metadata input when running processing pipelines,
      • no way to automatically validate the completeness of a given dataset,
      • difficulties in combining data from multi-center studies.
  5. Brain Imaging Data Structure
      Brain Imaging Data Structure (BIDS) is a new standard for describing and organizing the results of a human neuroimaging experiment.
  6. Who is it for?
      1. Lab PIs. It will make handing over a dataset from one student/postdoc to another easy.
      2. Workflow developers. It's easier to write pipelines that expect a particular file organization.
      3. Database curators. Accepting one dataset format will make curation easier.
  7. Principles behind BIDS
      1. Adoption is crucial.
      2. Don't reinvent the wheel.
      3. Some metadata is better than no metadata.
      4. Don't rely on external software (databases) or complicated file formats (RDF).
      5. Aim to capture 80% of experiments, but give the remaining 20% space to extend the standard.
  8. Implementation
      1. Some metadata is encoded in the folder structure.
      2. Some metadata is replicated in the file name for simplicity.
      3. Use of tab-separated files for tabular data.
      4. Use of NIfTI files for imaging data.
      5. Use of JSON files for dictionary-type metadata.
      6. Use of legacy text file formats for b vectors/values and physiological data.
      7. Certain folder hierarchy levels are optional for simplicity.
      8. Arbitrary files not covered by the spec can be included in any way the researchers deem appropriate.
  9. Why TSV?
      1. Simple text format with wide software support.
      2. Strings with commas do not need to be escaped by quotation marks.
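
The second point on the TSV slide can be illustrated with a small sketch using Python's standard `csv` module. The row values are hypothetical and chosen only because one field contains a comma; nothing here comes from a BIDS tool.

```python
import csv
import io

# Hypothetical row: the second field contains a comma.
row = ["sub-01", "go, then stop"]

# Comma-separated output: the writer must quote the field that contains a comma.
csv_buf = io.StringIO()
csv.writer(csv_buf, lineterminator="\n").writerow(row)
csv_line = csv_buf.getvalue()

# Tab-separated output: the comma is ordinary text, so no quoting is needed.
tsv_buf = io.StringIO()
csv.writer(tsv_buf, delimiter="\t", lineterminator="\n").writerow(row)
tsv_line = tsv_buf.getvalue()

print(repr(csv_line))
print(repr(tsv_line))
```

A field that itself contains a tab would still need quoting in TSV, so the advantage holds for typical free-text descriptions rather than for arbitrary strings.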
  10. Why NIfTI?
      Pros:
      1. Widest support from software packages.
      2. Designed for neuroimaging.
      Cons:
      1. Poor metadata support.
      2. Memory-mapped random access to compressed NIfTI is hard to implement.
  11. Why JSON?
      1. Simple text (you can use Notepad to edit).
      2. Wide support from different programming languages.
      3. Simpler than XML, but almost as powerful.
      4. Extensible with linked data.
  12. BIDS features
      1. Handles multiple sessions and runs.
      2. Supports sparse acquisition (via slice timing).
      3. Supports continuous acquisition covariates (breathing, cardiac, etc.).
      4. Supports multiple field map formats.
      5. Supports multiple types of anatomical scans.
      6. Supports functional MRI, both task-based and resting-state.
      7. Supports diffusion data (together with the corresponding bvec and bval files).
      8. Supports behavioral variables at the level of subjects (demographics), sessions, and runs.
  13. Folder organization (simplified)
      sub-control01/
        anat/
          sub-control01_T1w.nii.gz
          sub-control01_T1w.json
          sub-control01_T2w.nii.gz
          sub-control01_T2w.json
        func/
          sub-control01_task-nback_bold.nii.gz
          sub-control01_task-nback_bold.json
          sub-control01_task-nback_events.tsv
          sub-control01_task-nback_cont-physio.tsv
          sub-control01_task-nback_cont-physio.json
          sub-control01_task-nback_sbref.nii.gz
        dwi/
          sub-control01_dwi.nii.gz
          sub-control01_dwi.bval
          sub-control01_dwi.bvec
        fmap/
          sub-control01_phasediff.nii.gz
          sub-control01_phasediff.json
          sub-control01_magnitude1.nii.gz
        sub-control01_scans.tsv
      participants.tsv
      dataset_description.json
      README
      CHANGES
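
Because metadata is replicated in the file name as `key-value` pairs joined by underscores, a pipeline can recover it with plain string handling. The following is a minimal sketch, not part of any BIDS tool: `parse_bids_name` is a hypothetical helper, and it assumes every underscore-separated chunk except the last is a `key-value` pair and that the last chunk is the data-type suffix.

```python
def parse_bids_name(filename):
    """Split a BIDS-style file name into entities, suffix, and extension.

    Hypothetical helper for illustration; assumes names of the form
    key-value_key-value_suffix.ext as shown on the slide.
    """
    # Everything after the first dot is treated as the extension (e.g. "nii.gz").
    stem, _, extension = filename.partition(".")
    # All chunks but the last are key-value pairs; the last is the suffix.
    *pairs, suffix = stem.split("_")
    entities = {}
    for pair in pairs:
        key, _, value = pair.partition("-")
        entities[key] = value
    return entities, suffix, extension


entities, suffix, extension = parse_bids_name("sub-control01_task-nback_bold.nii.gz")
print(entities, suffix, extension)
```

With this layout a workflow can select, for example, all files whose suffix is `bold` and whose `task` entity is `nback` without asking the user for any extra input.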
  14. Example events file
      onset   duration   trial_type   ResponseTime
      1.2     0.6        go           1.435
      5.6     0.6        stop         1.739
      …
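
An events file like the one on this slide can be read with a few lines of standard-library Python. This is a minimal sketch under stated assumptions: the file content is embedded as a string holding only the two rows shown on the slide, and `read_events` is a hypothetical helper, not a function from any BIDS package.

```python
import csv
import io

# The two rows shown on the slide, tab-separated.
EVENTS_TSV = (
    "onset\tduration\ttrial_type\tResponseTime\n"
    "1.2\t0.6\tgo\t1.435\n"
    "5.6\t0.6\tstop\t1.739\n"
)


def read_events(handle):
    """Parse a tab-separated events table into a list of dictionaries."""
    events = []
    for row in csv.DictReader(handle, delimiter="\t"):
        events.append(
            {
                "onset": float(row["onset"]),
                "duration": float(row["duration"]),
                "trial_type": row["trial_type"],
                "response_time": float(row["ResponseTime"]),
            }
        )
    return events


events = read_events(io.StringIO(EVENTS_TSV))
# Onsets of all "stop" trials, e.g. to build a regressor for a statistical model.
stop_onsets = [e["onset"] for e in events if e["trial_type"] == "stop"]
print(stop_onsets)
```

For a file on disk, the same function can be called with `open(path, newline="")` in place of the `io.StringIO` object.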
  15. Example metadata file
      {
        "RepetitionTime": 3.0,
        "EchoTime": 0.0003,
        "FlipAngle": 78,
        "SliceTiming": [0.0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0, 2.2, 2.4, 2.6, 2.8],
        "MultibandAccelerationFactor": 4,
        "ParallelReductionFactorInPlane": 2
      }
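
Since the metadata file is plain JSON, simple consistency checks need only the standard library. The sketch below loads the `RepetitionTime` and `SliceTiming` values from the slide and checks that every slice onset falls within one repetition. The check and the helper name `slice_timing_problems` are assumptions made for illustration; they are not taken from the BIDS validator.

```python
import json

# RepetitionTime and SliceTiming values as shown on the slide.
SIDECAR = """
{
  "RepetitionTime": 3.0,
  "SliceTiming": [0.0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4,
                  1.6, 1.8, 2.0, 2.2, 2.4, 2.6, 2.8]
}
"""


def slice_timing_problems(metadata):
    """Return a message for every slice onset outside [0, RepetitionTime).

    Illustrative check only; assumes both values are given in seconds.
    """
    tr = metadata["RepetitionTime"]
    problems = []
    for index, onset in enumerate(metadata.get("SliceTiming", [])):
        if not 0 <= onset < tr:
            problems.append(f"slice {index}: onset {onset} s outside [0, {tr}) s")
    return problems


metadata = json.loads(SIDECAR)
problems = slice_timing_problems(metadata)
n_slices = len(metadata["SliceTiming"])
print(n_slices, problems)
```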
  16. Example demographics file
      participant_id   age   sex
      sub-001          34    M
      sub-002          12    F
      sub-003          33    F
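
A tabular demographics file makes simple automated checks possible. As a minimal sketch, the code below flags participant IDs that do not follow the lowercase `sub-<label>` form used by the folder names. The sample data deliberately includes a hypothetical capitalized ID to show the check firing; the regular expression and the helper `malformed_ids` are illustrative assumptions, not rules copied from the specification.

```python
import csv
import io
import re

# Hypothetical file content; "Sub-002" is deliberately capitalized.
PARTICIPANTS_TSV = (
    "participant_id\tage\tsex\n"
    "sub-001\t34\tM\n"
    "Sub-002\t12\tF\n"
    "sub-003\t33\tF\n"
)

# Assumed ID form: lowercase "sub-" followed by letters and digits.
ID_PATTERN = re.compile(r"sub-[A-Za-z0-9]+")


def malformed_ids(handle):
    """Return participant IDs that do not match the assumed sub-<label> form."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        row["participant_id"]
        for row in reader
        if not ID_PATTERN.fullmatch(row["participant_id"])
    ]


bad_ids = malformed_ids(io.StringIO(PARTICIPANTS_TSV))
print(bad_ids)
```

A mismatch like this matters because the ID is meant to match the subject folder name, and a case-sensitive file system treats `Sub-002` and `sub-002` as different names.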
  17. Keys to success
      1. Involve the community in the design process.
      2. Provide a good validation tool (browser based!).
      3. Build tools/workflows/pipelines that make adopting BIDS worthwhile (AA, Nipype, C-PAC, etc.).
      4. Get support from databases (LORIS, COINS, SciTran, OpenfMRI, XNAT, etc.).
  18. Existing tools
      1. bids-validator: https://github.com/INCF/bids-validator (demo)
      2. openfmri2bids: https://github.com/INCF/openfmri2bids
      3. bidsutils: https://github.com/INCF/bidsutils
      4. dcm2niix: https://github.com/neurolabusc/dcm2niix
      5. dicm2nii: http://www.mathworks.com/matlabcentral/fileexchange/42997-dicom-to-nifti-converter--nifti-tool-and-viewer
      6. Quality Assessment Protocol: http://preprocessed-connectomes-project.github.io/quality-assessment-protocol
      7. SciTran: https://scitran.github.io
  19. Upcoming tools
      1. OpenfMRI (internal format)
      2. XNAT (import)
      3. COINS (export)
      4. heudiconv (conversion)
      5. LORIS (import)
      6. C-PAC (import)
      7. NIAK (import)
      8. Nipype (import)
  20. Why do I care
  21. Data sharing drives progress
  22. Data sharing drives progress
      $878,400: how much it would cost to perform the studies that used OpenfMRI data if the repository did not exist.
  23. Convincing people to share data is hard
      1. Publication as an incentive (data papers; Gorgolewski et al. 2013)
      2. Sharing only statistical derivatives (NeuroVault; Gorgolewski et al. 2014)
  24. Poldrack and Gorgolewski, 2014
  25. Convincing people to share data is hard
      1. Publication as an incentive (data papers; Gorgolewski et al. 2013)
      2. Sharing only statistical derivatives (NeuroVault; Gorgolewski et al. 2014)
      3. Journal policies (see PLoS ONE, F1000Research, Scientific Data)
  26. Data sharing fears
      1. Fear of being scooped
      2. Fear of someone finding a mistake
      3. Misconceptions about the ownership of the data
  27. Stanford | Center for Reproducible Neuroscience
      Analyzing for reproducibility
      reproducibility.stanford.edu
      • Automated quality control reporting
      • Data analysis service
      • Using cutting-edge, robust, and well-tested methods
      • Leveraging supercomputer power not accessible to most labs
      • Quantifying reproducibility by out-of-sample prediction estimates
      • "Glass box": in-depth documentation describing all data analysis steps
  28. Stanford | Center for Reproducible Neuroscience
      Analyzing for reproducibility
      reproducibility.stanford.edu
      • The service is completely free of charge
      • Under one condition: the data will be publicly available after a grace period
  29. Stanford | Center for Reproducible Neuroscience
      Analyzing for reproducibility
      reproducibility.stanford.edu
      CRN will:
      • Make more data publicly available
      • Improve access to the best methods and algorithms (including yours!)
      • Enable automatic data exploration and hypothesis generation
      • Foster a culture of looking at out-of-sample predictions and effect sizes
  30. Acknowledgments
      • The Poldrack Lab @ Stanford
      • Data Sharing Task Force
  31. bids.neuroimaging.io