What Makes Transfer Learning Work for Medical Images: Feature Reuse and Other Factors
Christos Matsoukas1,2,3 , Johan Fredin Haslum1,2,3, Moein Sorkhei1,2, Magnus Soderberg3, Kevin Smith1,2
1 KTH Royal Institute of Technology, Stockholm, Sweden
2 Science for Life Laboratory, Stockholm, Sweden
3 AstraZeneca, Gothenburg, Sweden
Presenter: Mithunjha Anandakumar
What is Transfer Learning?
Figure: source domain → model → knowledge → model → target domain.
Transfer learning reuses knowledge gained in one domain, the source domain, to improve performance in another, the target domain.
Source domain vs. target domain
Source domain (ImageNet) | Target domain (medical images)
Natural images with a clear global subject | Large images of a bodily region of interest; pathologies are identified from variations in local texture
Millions of images | Larger images, but fewer of them*
1000 classes | Fewer classes
* Medical data are scarce because of the rarity of some diseases, ethical concerns, and the expense of acquisition.
Image credits: https://www.researchgate.net/figure/Examples-of-pictures-randomly-sampled-from-the-Tiny-ImageNet-dataset_fig1_354590544
Content credits: Raghu, M., Zhang, C., Kleinberg, J., & Bengio, S. (2019). Transfusion: Understanding transfer learning for medical imaging. Advances in Neural Information Processing Systems, 32.
Contribution of the paper
• Shows that the benefits of transfer learning (TL) increase with:
  - reduced data size
  - a smaller distance between the source and target domains
  - models with fewer inductive biases
  - models with more capacity (to a lesser extent)
• Shows that the benefits of TL correlate with feature reuse.
• Shows that pretraining also has feature-independent benefits, namely faster training.
Related work
• Raghu et al. (2019), "Transfusion: Understanding transfer learning for medical imaging":
  - Two datasets: a large dataset (CheXpert) and a private retinal fundus dataset
  - Architectures: ResNet and Inception
  - Findings: transfer offers little performance benefit, and what benefit there is stems from overparameterization and weight statistics rather than feature reuse; pretraining does speed up training.
Methodology
• Datasets
Dataset | N | Images | Task
APTOS2019 | 3,662 | High-resolution diabetic retinopathy images | Classification, 5 classes
DDSM | 10,239 | Mammography | Detection of the presence of masses
ISIC 2019 | 25,331 | Dermoscopic images | Classification, 9 classes
CheXpert | 224,316 | Chest X-rays | Classification, 14 classes
PatchCamelyon | 327,680 | Patches of H&E-stained whole-slide images (WSIs) of lymph node sections | Classification, 2 classes
Methodology
• Architectures
  - Vision transformer (ViT) models: DEIT, SWIN
  - CNN models: INCEPTION, RESNET
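Methodology
• Initialization: three schemes isolate the contribution of feature reuse from that of weight statistics
1. Random initialization (RI): Kaiming initialization
2. Weight statistics transfer (ST): weights are sampled from a normal distribution whose mean and standard deviation are taken, layer by layer, from an IMAGENET-pretrained model
3. Weight transfer (WT): the IMAGENET-pretrained weights are transferred directly
Comparing ST with RI measures the effect of weight statistics alone; comparing WT with ST measures the additional effect of feature reuse.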
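The three schemes can be illustrated with a toy sketch in plain Python. It assumes each layer is a flat list of weights with a known fan-in; the `PRETRAINED` dict is a hypothetical stand-in for an IMAGENET-pretrained model, and a real implementation would operate on framework tensors. The exact sampling details used in the paper may differ.

```python
import math
import random
import statistics

# Hypothetical stand-in for an IMAGENET-pretrained model: each layer is a flat
# list of weights plus its fan-in (toy values, not real pretrained weights).
PRETRAINED = {
    "conv1": {"weights": [0.12, -0.05, 0.30, -0.22, 0.08, 0.01], "fan_in": 3},
    "conv2": {"weights": [0.02, -0.01, 0.04, -0.03, 0.00, 0.05], "fan_in": 6},
}


def random_init(model, rng):
    """RI: Kaiming (He) normal initialization, std = sqrt(2 / fan_in)."""
    return {
        name: [rng.gauss(0.0, math.sqrt(2.0 / layer["fan_in"])) for _ in layer["weights"]]
        for name, layer in model.items()
    }


def stats_transfer_init(model, rng):
    """ST: sample each layer from a normal distribution with the mean and
    standard deviation of that layer's pretrained weights."""
    init = {}
    for name, layer in model.items():
        mu = statistics.fmean(layer["weights"])
        sigma = statistics.pstdev(layer["weights"])
        init[name] = [rng.gauss(mu, sigma) for _ in layer["weights"]]
    return init


def weight_transfer_init(model):
    """WT: copy the pretrained weights unchanged."""
    return {name: list(layer["weights"]) for name, layer in model.items()}


rng = random.Random(0)
ri = random_init(PRETRAINED, rng)
st = stats_transfer_init(PRETRAINED, rng)
wt = weight_transfer_init(PRETRAINED)
```

ST keeps only each layer's first two moments, so any learned spatial structure, and with it the possibility of feature reuse, is discarded; only WT preserves the actual pretrained values.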
Results
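When is TL to the medical domain beneficial, and how important is feature reuse?
Relative increase in performance: $\frac{WT}{RI}$
Relative gain attributed to feature reuse: $\frac{WT - ST}{RI}$
Here RI, ST and WT denote the performance of the models initialized with each scheme.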
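A worked example of the two ratios on this slide, using hypothetical test scores (e.g. AUC) for the three initializations. The numbers are invented for illustration and are not results from the paper.

```python
def relative_gains(ri, st, wt):
    """Return the two slide metrics from the scores of RI-, ST- and
    WT-initialized models (higher score = better)."""
    if ri == 0:
        raise ValueError("the RI score must be non-zero")
    return {
        "relative_performance": wt / ri,       # WT / RI
        "feature_reuse_gain": (wt - st) / ri,  # (WT - ST) / RI
    }


# Hypothetical scores for one dataset/architecture pair.
gains = relative_gains(ri=0.80, st=0.82, wt=0.88)
print(gains)  # relative_performance ~= 1.10, feature_reuse_gain ~= 0.075
```

Because both quantities share the RI denominator, the part of the WT improvement explained by feature reuse can be read off directly against the random-initialization baseline.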
Which layers benefit from feature reuse?
Weights are transferred (WT) for the first n blocks, and the remaining m blocks are initialized with ST.
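The partial-transfer (WT-ST) initialization can be sketched as follows, assuming the network is a list of blocks, each represented as a flat list of pretrained weights. `hybrid_init` and the toy `blocks` are hypothetical names for illustration, not the authors' code.

```python
import random
import statistics


def hybrid_init(pretrained_blocks, n, rng):
    """Copy pretrained weights (WT) for the first n blocks and sample the
    remaining blocks with weight-statistics transfer (ST)."""
    if not 0 <= n <= len(pretrained_blocks):
        raise ValueError("n must be between 0 and the number of blocks")
    init = []
    for i, block in enumerate(pretrained_blocks):
        if i < n:
            init.append(list(block))  # WT: reuse the pretrained values
        else:
            mu = statistics.fmean(block)
            sigma = statistics.pstdev(block)
            init.append([rng.gauss(mu, sigma) for _ in block])  # ST
    return init


# Hypothetical 4-block network; transfer the first half (n = 2, m = 2).
blocks = [[0.1, -0.2, 0.3], [0.05, 0.0, -0.05], [0.2, 0.4, -0.1], [-0.3, 0.1, 0.0]]
n = 2
init = hybrid_init(blocks, n, random.Random(0))
wt_fraction = n / len(blocks)  # fraction of the network initialized with WT
```

Sweeping n from 0 to the number of blocks traces out the curves on this slide: n = 0 is pure ST, and n equal to the number of blocks is pure WT.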
What properties of TL are revealed via feature similarity?
Feature similarity of the transfer-learned (WT) model before and after fine-tuning.
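The slides do not state how feature similarity is measured. A common choice for comparing layer representations is linear centered kernel alignment (CKA); the sketch below assumes that measure and takes activations as lists of per-example feature vectors (the same examples, in the same order, for both models). It is an illustration, not the paper's implementation.

```python
import math


def _center(rows):
    """Subtract the per-feature (column) mean from every example."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def _cross_frob_sq(a, b):
    """Squared Frobenius norm of A^T B for row-major A (n x p) and B (n x q)."""
    total = 0.0
    for col_a in zip(*a):
        for col_b in zip(*b):
            dot = sum(x * y for x, y in zip(col_a, col_b))
            total += dot * dot
    return total


def linear_cka(x, y):
    """Linear CKA between two activation matrices over the same n examples:
    ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F) after column centering."""
    if len(x) != len(y):
        raise ValueError("x and y must contain the same examples")
    xc, yc = _center(x), _center(y)
    num = _cross_frob_sq(xc, yc)
    den = math.sqrt(_cross_frob_sq(xc, xc)) * math.sqrt(_cross_frob_sq(yc, yc))
    return num / den  # undefined if either representation has zero variance


acts_a = [[1.0, 2.0], [2.0, 0.5], [3.0, 4.0], [0.0, 1.0]]
acts_b = [[2.0 * v for v in row] for row in acts_a]  # same features, rescaled
print(linear_cka(acts_a, acts_b))  # ~1.0: CKA ignores isotropic scaling
```

A value near 1 means the two layers encode essentially the same features, which is how high similarity between WT before and after fine-tuning signals feature reuse.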
What properties of TL are revealed via feature similarity?
Feature similarity between ST- and WT-initialized models after fine-tuning.
Which transferred weights change?
L2 distance between each network's initial weights and its weights after fine-tuning.
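Per-layer L2 distances can be computed as below, assuming weights are stored as flat lists keyed by layer name. The toy values are hypothetical, and the paper may normalize the distance (for example by layer size) in a way not shown here.

```python
import math


def l2_distance_per_layer(initial, finetuned):
    """Euclidean (L2) distance between each layer's weights before and
    after fine-tuning."""
    distances = {}
    for name, w0 in initial.items():
        w1 = finetuned[name]
        if len(w0) != len(w1):
            raise ValueError(f"shape mismatch in layer {name!r}")
        distances[name] = math.sqrt(sum((a - b) ** 2 for a, b in zip(w0, w1)))
    return distances


# Hypothetical weights: block1 is unchanged by fine-tuning, block2 moves.
initial = {"block1": [0.5, -0.2, 0.1], "block2": [0.3, 0.3, -0.4]}
finetuned = {"block1": [0.5, -0.2, 0.1], "block2": [0.0, 0.7, -0.4]}
dist = l2_distance_per_layer(initial, finetuned)
```

Small distances for WT-initialized layers indicate that the transferred weights stay in the same vicinity during fine-tuning.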
Which transferred weights change?
Reinitialization robustness: the impact of resetting a layer's weights to their initial values.
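A minimal sketch of the reinitialization-robustness probe: each layer in turn is reset to its pre-fine-tuning weights while all other layers keep their fine-tuned values, and the model is re-evaluated. `evaluate` is a hypothetical caller-supplied scoring function, and the `toy_evaluate` below (a plain sum of weights) only exists to make the example runnable; the `tol` threshold for calling a layer critical is also an assumption.

```python
def reinit_robustness(initial, finetuned, evaluate, tol=0.01):
    """Reset one layer at a time to its initial weights and re-score the model.
    Layers whose reset drops the score by more than `tol` are 'critical'."""
    baseline = evaluate(finetuned)
    scores = {}
    for name in finetuned:
        probe = dict(finetuned)       # shallow copy; only one entry is swapped
        probe[name] = initial[name]   # reset this layer to its initial weights
        scores[name] = evaluate(probe)
    critical = [name for name, s in scores.items() if baseline - s > tol]
    return baseline, scores, critical


def toy_evaluate(weights):
    """Hypothetical stand-in for a validation metric."""
    return sum(sum(layer) for layer in weights.values())


initial = {"a": [1.0, 2.0], "b": [0.0, 0.0]}
finetuned = {"a": [1.0, 2.0], "b": [3.0, 1.0]}
baseline, scores, critical = reinit_robustness(initial, finetuned, toy_evaluate)
# Layer "a" never moved, so resetting it changes nothing; "b" is critical.
```

A network with few critical layers is robust to reinitialization, which is the signature of transferred weights being reused rather than rewritten.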
What is the impact of TL for different model capacities?
What is the impact of TL on convergence speed?
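The slides do not define how convergence speed is measured. One simple proxy, assumed here purely for illustration, is the first epoch at which the validation score reaches a fixed fraction of its best value; the two curves below are hypothetical.

```python
def epochs_to_converge(val_scores, fraction=0.95):
    """Return the 1-based epoch at which the validation score first reaches
    `fraction` of its best value over the run (assumed, not from the paper)."""
    if not val_scores:
        raise ValueError("need at least one epoch of scores")
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    best = max(val_scores)
    if best < 0:
        raise ValueError("scores are assumed to be non-negative")
    target = fraction * best
    for epoch, score in enumerate(val_scores, start=1):
        if score >= target:
            return epoch
    return len(val_scores)  # not reached for non-negative scores


# Hypothetical validation curves (e.g. AUC per epoch).
wt_curve = [0.70, 0.85, 0.88, 0.89, 0.89]
ri_curve = [0.50, 0.60, 0.70, 0.78, 0.84, 0.86, 0.87]
print(epochs_to_converge(wt_curve), epochs_to_converge(ri_curve))
```

Under this proxy, a faster-converging initialization simply reaches its plateau in fewer epochs; other criteria (e.g. a fixed absolute threshold) would work the same way.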
Contribution of the paper
• Shows that the benefits of transfer learning (TL) increase with:
  - reduced data size
  - a smaller distance between the source and target domains
  - models with fewer inductive biases
  - models with more capacity (to a lesser extent)
• Shows that the benefits of TL correlate with feature reuse.
• Shows that pretraining also has feature-independent benefits, namely faster training.
Thank you
Questions?

Speaker notes

  1. What is transfer learning? The feature reuse hypothesis assumes that weights learned in the source domain yield features that can readily be used in the target domain. The lack of large public datasets has led to the widespread adoption of transfer learning from IMAGENET to improve performance on medical tasks. Transfer learning is typically performed by taking an architecture, along with its IMAGENET pretrained weights, and then fine-tuning it on the target task.
  2. Why does TL improve performance: good initialization, faster training, or feature reuse?
  3. Raghu et al. (NeurIPS 2019) showed that the actual values of the weights are not always necessary for good transfer learning performance: similar performance can be achieved by initializing the network using only its weight statistics. In this setting, transfer amounts to providing a good range of values for randomly initializing the network, which eliminates feature reuse as a factor. Many other works have shown that transfer learning does not significantly help with medical images.
  4. DEIT (data-efficient image transformer): a pure transformer. SWIN: self-attention with a hierarchical structure. Inductive biases: locality, translational equivariance, hierarchical scale. INCEPTION: processes the signal in parallel at multiple scales before propagating it to the next layer.
  5. TL is least beneficial for CNN architectures on large datasets. DEIT, which lacks inductive biases, gets a larger boost from TL than SWIN, even on large datasets. All models gain from TL on small datasets. ISIC closely resembles IMAGENET, so CNN models also show higher gains there. SWIN falls between DEIT and the CNNs. Because DEIT lacks inductive biases, even a large dataset is insufficient for it to learn better features than those transferred from IMAGENET.
  6. For large datasets, CNNs exhibit a relatively flat line throughout the network, i.e. no significant benefit over statistics transfer. For smaller datasets, a linearly increasing trend implies that every layer benefits from feature reuse. DEIT shows sharp jumps in the early layers, where local attention is learned; learning these local features from scratch requires a huge amount of data. SWIN shows properties of both DEIT and CNNs: on small and IMAGENET-like datasets it behaves like DEIT, but on large datasets it resembles the CNNs. On average, ViTs (lacking inductive biases) benefit more from feature reuse, but mainly in the early layers. CNNs benefit from feature reuse to a lesser extent, but consistently throughout the network, reflecting the hierarchical nature of the architecture.
  7. Red indicates high feature reuse, i.e. features that do not change after fine-tuning. For DEIT, feature similarity is strongest in the early to middle layers; in later layers, the trained model adapts to the new task and drifts away from the IMAGENET features. RESNET50 after transfer learning shows broader feature similarity, except in the final layers, which must adapt to the new task. A trend shared by ViTs and CNNs is that when more data is available, the transition point from feature reuse to feature adaptation shifts toward earlier layers, because the network has enough data to adapt more of the transferred IMAGENET features to the new task.
  8. ViTs: the early layers of ST-initialized models are similar to features from the first half of the WT-initialized models. If the network is denied these essential pretrained weights, it attempts to learn them rapidly using only a few layers (due to the lack of data), resulting in poor performance. CNNs: the bottom row of Figure 3 shows that CNNs seem to learn similar features from different initializations, suggesting that their inductive biases may naturally lead to these features (although the final layers used for classification diverge). Given more data, the ST-initialized models are also able to learn some novel mid- to high-level features not found in IMAGENET.
  9. The general trend is that transferred weights (WT) remain in the same vicinity after fine-tuning, more so when transfer learning gains are strongest. As more of the network is initialized with ST, the transferred weights tend to “stick” less well. Certain layers, however, undergo substantial changes regardless: the early layers in ViTs (the patchifier) and INCEPTION, and the first block at each scale in RESNET50. These are the first layers to encounter the data, or a scale change.
  10. Our main finding is that networks with weight transfer (WT) undergo few critical changes, indicating feature reuse. When transfer learning is least effective (RESNET on CHEXPERT and PATCHCAMELYON) the gap in robustness between WT and ST is at its smallest. Interestingly, in ViTs with partial weight transfer (WT-ST), critical layers often appear at the transition between WT and ST. Rather than change the transferred weights, the network quickly adapts. But following this adaptation, no critical layers appear. As the data size increases, ViTs make more substantial early changes to adapt to the raw input (or partial WT). Transferred weights in CNNs, on the other hand, tend to be less “sticky” than ViTs. We see the same general trend where WT is the most robust, but unlike ViTs where WT was robust throughout the network, RESNET50 exhibits poor robustness at the final layers responsible for classification, and also periodically within the network at critical layers where the scale changes, as observed by [44].
  11. We observe a slight increase in TL performance as model size increases: the red curve dominates the other curves when the WT fraction is close to 1.
  12. Convergence speed increases monotonically with the number of WT layers. CNNs converge faster at a roughly linear rate as more WT layers are included, whereas vision transformers see a rapid increase in convergence speed over the first half of the network but diminishing returns after that.
  13. Why does TL improve performance: good initialization, faster training, or feature reuse?