UPC at MediaEval Hyperlinking 2013

Universitat Politècnica de Catalunya

Joint work with Carles Ventura and Marcel Tella. More details:

MediaEval Hyperlinking
Carles Ventura
Marcel Tella
Xavier Giró-i-Nieto

Barcelona, Catalonia
18th October 2013

Challenge

Visual Features

SURF descriptors
[Bay et al, CVIU 2008]

Visual Features
Bag of Features [Sivic & Zisserman, 2003]

Figure: Fergus, ICCV 2009
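
A minimal sketch of the bag-of-features idea, not the authors' implementation: each local descriptor is assigned to its nearest visual word, and the keyframe is represented by the normalized histogram of word counts. The function name, the toy 2-D descriptors, and the three-word vocabulary are all illustrative assumptions; real SURF descriptors are 64-dimensional and the vocabulary would normally come from clustering.

```python
from math import dist


def bag_of_features(descriptors, vocabulary):
    """Quantize descriptors to their nearest visual word; return an L1-normalized histogram."""
    counts = [0] * len(vocabulary)
    for d in descriptors:
        # Index of the closest visual word under Euclidean distance.
        nearest = min(range(len(vocabulary)), key=lambda i: dist(d, vocabulary[i]))
        counts[nearest] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(vocabulary)


# Toy data (assumed for illustration only).
vocabulary = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
descriptors = [(0.1, 0.0), (0.9, 1.2), (1.1, 0.8), (4.8, 5.1)]
hist = bag_of_features(descriptors, vocabulary)
print(hist)
```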

Visual Features
Histogram intersection
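
Histogram intersection can be sketched as the sum of bin-wise minima of two histograms. The example below assumes both histograms are L1-normalized and have the same number of bins, so the score falls in [0, 1]; the function name and the toy values are illustrative, not taken from the slides.

```python
def histogram_intersection(h1, h2):
    """Similarity between two histograms: sum of the bin-wise minima."""
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    return sum(min(a, b) for a, b in zip(h1, h2))


# Toy normalized histograms (assumed for illustration only).
query = [0.5, 0.25, 0.25, 0.0]
candidate = [0.25, 0.25, 0.0, 0.5]
score = histogram_intersection(query, candidate)
print(score)
```

With normalized inputs, identical histograms score 1 and histograms with no overlapping bins score 0.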

Approach
Figure: query video with shot boundaries, the anchor, and its surrounding context.

Approach

Figure: one ranked list per query keyframe (keyframe 1 ranked list, keyframe 2 ranked list, ...).

Approach
Figure: fused ranked list of keyframes.
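
The slides do not state which fusion rule was used. As one possible sketch, the per-keyframe ranked lists could be merged by keeping, for every candidate keyframe, its best similarity score across all lists. The max-score rule, the function name, and the keyframe identifiers below are assumptions made for illustration.

```python
def fuse_ranked_lists(ranked_lists):
    """Merge ranked lists of (keyframe_id, score) pairs, keeping each keyframe's best score."""
    best = {}
    for ranked in ranked_lists:
        for keyframe_id, score in ranked:
            if score > best.get(keyframe_id, float("-inf")):
                best[keyframe_id] = score
    # Highest score first; ties broken by identifier for a deterministic order.
    return sorted(best.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical ranked lists for two query keyframes.
list_1 = [("v7_k3", 0.9), ("v2_k1", 0.6), ("v5_k8", 0.4)]
list_2 = [("v2_k1", 0.8), ("v9_k2", 0.7), ("v7_k3", 0.3)]
fused = fuse_ranked_lists([list_1, list_2])
print(fused)
```

Other rules, such as summing scores or combining rank positions, would fit the same interface.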

Approach
Figure: linked segments.

Results
             MAP      P@5      P@10     P@20
No context   0.0282   0.2600   0.2000   0.1233
Context      0.0260   0.2400   0.1967   0.1217
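
For reference, the metrics in the table can be sketched with their standard textbook definitions: P@k is the fraction of relevant items among the top k results, and MAP is the mean over queries of average precision. The official MediaEval evaluation may judge relevance at segment level in its own way, so the function names and the toy relevance list below are illustrative assumptions.

```python
def precision_at_k(relevance, k):
    """Fraction of the top-k results that are relevant (relevance holds 0/1 flags)."""
    return sum(relevance[:k]) / k


def average_precision(relevance, num_relevant):
    """Mean of the precision values at each relevant rank, divided by the total relevant count."""
    hits = 0
    total = 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / num_relevant if num_relevant else 0.0


# Toy ranked result list for one query: 1 = relevant, 0 = not relevant.
relevance = [1, 0, 1, 0, 0]
p5 = precision_at_k(relevance, 5)
ap = average_precision(relevance, num_relevant=4)
print(p5, ap)
```

MAP would then be the average of `average_precision` over all queries.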

Conclusions
● First MediaEval participation completed.
● Visual features alone are useful, but not enough.
● Handling 1.2M keyframes is challenging; full video is even more so.
● Visual performance can be improved with:
  ○ spatial coding
  ○ larger vocabulary
  ○ face / concept detectors

Poster


Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020

Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020

Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020

Universitat Politècnica de Catalunya

UPC at MediaEval Hyperlinking 2013

1. MediaEval Hyperlinking. Carles Ventura, Marcel Tella, Xavier Giró-i-Nieto. Barcelona, Catalonia, 18th October 2013

3. Visual Features: SURF descriptors [Bay et al, CVIU 2008]

4. Visual Features: Bag of Features [Sivic & Zisserman, 2003]. Figure: Fergus, ICCV 2009

5. Visual Features: Histogram intersection
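
Slides 4 and 5 name bag-of-features histograms and histogram intersection as the visual similarity. The following is a minimal sketch of that measure in plain Python, not the authors' implementation. The L1 normalization step, the function names, and the example visual-word counts are assumptions made for illustration.

```python
def l1_normalize(hist):
    """Scale a histogram so its bins sum to 1 (assumed preprocessing step)."""
    total = sum(hist)
    return [x / total for x in hist] if total else list(hist)


def histogram_intersection(h1, h2):
    """Similarity of two histograms: the sum of bin-wise minima."""
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    return sum(min(a, b) for a, b in zip(h1, h2))


# Hypothetical visual-word counts for two keyframes (4-word vocabulary).
query = l1_normalize([2, 0, 1, 1])
candidate = l1_normalize([1, 1, 0, 2])
score = histogram_intersection(query, candidate)
```

With L1-normalized histograms the score lies between 0 (no shared visual words) and 1 (identical histograms), so it can be used directly to rank candidate keyframes against a query keyframe.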

6. Approach (figure: query video with shot boundaries, showing context, anchor, context)

7. Approach (figure: keyframe 1 ranked list, keyframe 2 ranked list, ...)

8. Approach (figure: fused ranked list of keyframes)

9. Approach (figure: linked segments)

10. Results

             MAP     P@5     P@10    P@20
No context   0.0282  0.2600  0.2000  0.1233
Context      0.0260  0.2400  0.1967  0.1217
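
The P@5, P@10 and P@20 columns report precision at a cutoff k. The sketch below shows the standard definition for a single query. It is an illustration only: the relevance judgments are made up, and the official MediaEval evaluation may differ in its details.

```python
def precision_at_k(relevance, k):
    """Fraction of the top-k ranked results judged relevant.

    `relevance` is a list of 0/1 judgments in rank order. If fewer than
    k results are returned, the missing ones count as non-relevant.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(relevance[:k]) / k


# Hypothetical judgments for one query's ranked list (1 = relevant).
judgments = [1, 0, 1, 0, 0, 0, 1, 0, 0, 0]
p_at_5 = precision_at_k(judgments, 5)
p_at_10 = precision_at_k(judgments, 10)
```

Under this definition, a P@5 of 0.2600 averaged over queries corresponds to 1.3 relevant results among the top 5 on average.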

11. Conclusions
● First MediaEval participation completed.
● Visual alone is useful, but not enough.
● 1.2M keyframes is challenging, video is more.
● Visual performance can be improved with:
  ○ spatial coding
  ○ larger vocabulary
  ○ face / concept detectors