ARCHITECTURAL CONDITIONING FOR DISENTANGLEMENT OF OBJECT IDENTITY AND POSTURE INFORMATION

홍배 김

Proposes a ROLLABLE LATENT SPACE: an auto-encoder is trained so that a pose change of the image corresponds to a shift of the latent vector by circular permutation.

Authors: Kazutoshi Sagi, Takahiro Toizumi & Yuzo Senda
Data Science Research Laboratories
NEC Corporation
https://openreview.net/forum?id=HkaYjG6Lf
Summary by: 김홍배

3D Object Identification Problem
Typically, images of an object are collected in many poses and the network is trained to be invariant to pose.
→ High-cost approach

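Equivariance
A mapping function $\Phi(\cdot)$ takes an image $X$ to a latent vector $Z = \Phi(X)$. A transformation $T_g^1$ acts on image space and $T_g^2$ acts on latent space:
$$X_2 = T_g^1 X_1, \qquad Z_2 = T_g^2 Z_1$$
$\Phi$ is equivariant when
$$Z_2 = T_g^2 Z_1 = T_g^2 \Phi(X_1) = \Phi(T_g^1 X_1)$$
Here $Z_1 \neq Z_2$, but the relationship between them is preserved.
Invariance is the special case of equivariance where $T_g^2$ is the identity.
What if an explicit transformation in latent space could be found for a given pose transformation of the image?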
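A small numeric check can make the two definitions concrete. The sketch below is an illustration only, not the authors' code: it assumes a 1-D cyclic signal in place of an image, takes both $T_g^1$ and $T_g^2$ to be cyclic shifts, and uses two hypothetical maps, `phi_equivariant` (a circular moving average) and `phi_invariant` (a sum).

```python
def shift(v, s):
    """Cyclically shift a list left by s positions (stands in for T_g)."""
    s %= len(v)
    return v[s:] + v[:s]


def phi_equivariant(x):
    """Circular 2-tap moving average: commutes with cyclic shifts."""
    n = len(x)
    return [(x[k] + x[(k + 1) % n]) / 2 for k in range(n)]


def phi_invariant(x):
    """Sum of all samples: unchanged by any cyclic shift."""
    return [sum(x)]


x1 = [0.0, 1.0, 4.0, 9.0, 16.0, 25.0]
x2 = shift(x1, 2)                      # X2 = T_g^1 X1

z1 = phi_equivariant(x1)
z2 = phi_equivariant(x2)
equivariant_ok = (z2 == shift(z1, 2))  # Z2 = T_g^2 Z1, while Z1 != Z2

# For the invariant map, T_g^2 is the identity: the output does not move.
invariant_ok = (phi_invariant(x1) == phi_invariant(x2))
```

In the equivariant case the latent vector changes with the input, but in a predictable way. The rollable latent space is built on that property.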

ROLLABLE LATENT SPACE
[Figure: the idea proposed in this work. An angular rotation of the image from $X_{\theta_i}$ to $X_{\theta_j}$ corresponds to a shift of the latent vector by circular permutation, from $Z_{\theta_i}$ to $Z_{\theta_j}$. Example with 12 latent positions: 0 1 2 3 4 5 6 7 8 9 10 11 becomes 3 4 5 6 7 8 9 10 11 0 1 2.]

ROLLABLE LATENT SPACE
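What if a pose change in image space (angular rotation) could be expressed as a shift of the latent vector by circular permutation?
→ The mapping between the two spaces would then be known explicitly, and the latent vector for poses not seen during training could be inferred.
→ Here, the auto-encoder is slightly modified so that training forces this structure.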

ROLLABLE LATENT SPACE
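[Figure: the encoder maps $X_{\theta_i}$ to $Z_{\theta_i}$; the rolled latent vector $Z_{\theta_j}$ is decoded to $X_{\theta_j}$.]
Here, Roll(Z, s) cyclically permutes $Z_{\theta_i}$ by the shift parameter s (the angle difference) and passes the result to the decoder as its input latent vector.
The encoder receives $X_{\theta_i}$ as input, and the rotated image $X_{\theta_j}$ is given as the target for the decoder output.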
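Roll(Z, s) can be sketched in a few lines. The memory layout is an assumption: the latent vector is taken to be a flat list made of consecutive blocks of `block` values, one block per viewing direction, and a shift of s moves whole blocks toward the front. The names `roll` and `block` are hypothetical and not taken from the paper.

```python
def roll(z, s, block=1):
    """Cyclically permute latent vector z by s viewing directions.

    z is assumed to hold consecutive blocks of `block` values, one block
    per viewing direction; whole blocks are moved toward the front.
    """
    if len(z) % block != 0:
        raise ValueError("len(z) must be a multiple of block")
    n_dirs = len(z) // block
    k = (s % n_dirs) * block
    return z[k:] + z[:k]


z = list(range(12))
rolled = roll(z, 3)   # [3, 4, 5, 6, 7, 8, 9, 10, 11, 0, 1, 2]

# One encoded view yields a latent vector for each of the 12 poses.
all_poses = [roll(z, s) for s in range(12)]
```

The example reproduces the index pattern from the figure: positions 0 to 11 shifted by 3 become 3 to 11 followed by 0, 1, 2.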

ROLLABLE LATENT SPACE
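[Figure: $X_{\theta_i}$ is encoded to $Z_{\theta_i}$, rolled to $Z_{\theta_j}$, and decoded to $X_{\theta_j}$.]
The network is trained so that the decoder output approximates $X_{\theta_j}$.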
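The training target can be written as a loss function. The sketch below assumes a mean squared error, which the slide does not specify (it only says the decoder output should approximate $X_{\theta_j}$); `rls_loss`, `encoder` and `decoder` are hypothetical names, and the identity map stands in for a trained network in the toy check.

```python
def roll(z, s):
    """Cyclically shift a list left by s positions."""
    s %= len(z)
    return z[s:] + z[:s]


def rls_loss(encoder, decoder, x_i, x_j, s):
    """Mean squared error between Decoder(Roll(Encoder(x_i), s)) and x_j."""
    z_i = encoder(x_i)
    x_j_hat = decoder(roll(z_i, s))
    return sum((a - b) ** 2 for a, b in zip(x_j_hat, x_j)) / len(x_j)


def identity(v):
    """Stand-in for a trained encoder or decoder (assumption for the demo)."""
    return list(v)


# Toy data: a 12-sample cyclic signal where a pose change is a cyclic shift.
x_i = [float(k * k) for k in range(12)]
x_j = roll(x_i, 3)

loss_match = rls_loss(identity, identity, x_i, x_j, 3)     # correct shift
loss_mismatch = rls_loss(identity, identity, x_i, x_j, 0)  # shift ignored
```

In this toy setting the loss vanishes only when the latent shift matches the pose change, which is the pressure that ties the two together during training.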

Feature Augmentation by RLS
When training a classifier, augmentation at the image level is not needed: randomly shifting the latent vector $Z_i$ of a given image $X_i$ provides augmentation at the feature level.
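Feature-level augmentation might look like the following sketch. It assumes the class label does not change with pose and that the shift is drawn uniformly over all positions; `augment_features` and the label `"chair"` are hypothetical.

```python
import random


def roll(z, s):
    """Cyclically shift a list left by s positions."""
    s %= len(z)
    return z[s:] + z[:s]


def augment_features(z, label, n_samples, rng):
    """Return n_samples (rolled latent, label) pairs with random shifts."""
    return [(roll(z, rng.randrange(len(z))), label) for _ in range(n_samples)]


rng = random.Random(0)   # fixed seed so the sketch is repeatable
z_i = [0.1, 0.7, 0.2, 0.9, 0.4, 0.3]
batch = augment_features(z_i, "chair", 4, rng)
```

Each augmented sample costs one list rotation, with no image rendering and no extra pass through the encoder.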

EXPERIMENTAL RESULTS
Exp. 1 : DISENTANGLING 2D IMAGE ROTATION
- The encoder and the decoder each consist of one hidden fully connected layer with ReLU activation.
- The latent space has 24 dimensions, corresponding to 2 dimensions for each of 12 viewing directions.
[Figure: reconstructions of the test dataset. Each row shows an input in the left column, followed by reconstructions at the given rotation angles.]
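The relation between a rotation angle and the shift parameter can be sketched as below. It assumes the 12 viewing directions are spaced uniformly over 360 degrees (30 degrees apart), which the slide does not state; `angle_to_shift` is a hypothetical helper.

```python
def angle_to_shift(delta_deg, n_dirs=12):
    """Convert an angle difference in degrees to a shift in viewing directions.

    Assumes n_dirs directions spaced uniformly over 360 degrees.
    """
    step = 360 / n_dirs
    s = delta_deg / step
    if abs(s - round(s)) > 1e-9:
        raise ValueError("angle difference is not a multiple of the step")
    return int(round(s)) % n_dirs


latent_dim = 2 * 12             # 2 dimensions per direction, 12 directions
s_90 = angle_to_shift(90)       # 3 steps of 30 degrees
s_neg30 = angle_to_shift(-30)   # 11, equivalent to one step backwards
```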

EXPERIMENTAL RESULTS
Exp. 2 : DISENTANGLING 3D OBJECT ROTATION
• 809 chair models are selected.
• The first 500 models are used as the training set and the remaining 309 models as the test set.
• Each chair model is rendered from 31 azimuth angles and 2 elevation angles (20° and 30°).
• A deep convolutional encoder-decoder architecture is used.
• The latent space has 992 dimensions, corresponding to 32 dimensions for each of 31 viewing directions.
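The figures in this setup can be cross-checked with a few lines of arithmetic. The sketch assumes every model is rendered at every azimuth and elevation combination, and that the 992-dimensional latent vector is stored as 31 consecutive blocks of 32 values; only the azimuth roll is shown, since the slide does not describe how elevation is handled. All names are hypothetical.

```python
N_MODELS, N_TRAIN = 809, 500
N_AZIMUTH, ELEVATIONS = 31, (20, 30)
DIMS_PER_DIR = 32

n_test = N_MODELS - N_TRAIN                     # 309
views_per_model = N_AZIMUTH * len(ELEVATIONS)   # 62
n_train_images = N_TRAIN * views_per_model      # 31000
n_test_images = n_test * views_per_model        # 19158
latent_dim = DIMS_PER_DIR * N_AZIMUTH           # 992


def roll_azimuth(z, s):
    """Shift a flat latent vector by s azimuth steps (blocks of 32 values)."""
    k = (s % N_AZIMUTH) * DIMS_PER_DIR
    return z[k:] + z[:k]


z = list(range(latent_dim))
z_next = roll_azimuth(z, 1)   # the block for direction 1 moves to the front
```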

EXPERIMENTAL RESULTS
Exp. 2 : DISENTANGLING 3D OBJECT ROTATION
(a) Network architecture used in the 3D object rotation experiment.
(b) Reconstructions of the test dataset. Each row shows an input in the left column, followed by reconstructions at the given rotation angles.
