Introduction to MPEG-7

     Guest lecture for ECE417 TSH



      Charlie Dagli
            [dagli@illinois.edu]


            April 7, 2009
Contents
This lecture: a general overview of MPEG-7
MPEG-7
 –Background
 –Introduction
 –Components of MPEG-7
    Description Definition Language (DDL)
    Multimedia Description Scheme (MDS)
    Video Descriptors
    Audio Descriptors
 –References




Background
 Search and Retrieval of Multimedia data
  – In recent years, the amount of audiovisual data that is
    available has grown enormously
  – Applications
     Large-scale multimedia search engines on the Web
     Media asset management systems in corporations
     AV broadcast servers
     Personal media servers…
  – Need: retrieval, search, and storage of AV data using higher-level concepts
  – A solution:
      Efficient processing tools to create descriptions of AV material or to support the
      identification or retrieval of AV documents.
  – Alongside research on processing tools, the need for interoperability
    between devices has been recognized, and standardization activities have
    been launched.
     MPEG-7, “MULTIMEDIA CONTENT DESCRIPTION INTERFACE”,
      standardizes the description of multimedia content, supporting a wide range of
      applications.
      MPEG stands for Moving Picture Experts Group (established in 1988)
Introduction : What is MPEG-7?
“Multimedia Content Description Interface”
 –Intuition:
    NOT focus so much on processing tools
    Concentrate more on the selection of features that have to be described
    Find a way to structure and instantiate the selected features with a
     common language
 –Efficient representation of audio-visual (AV) meta-data
 –Goal: allow interoperable searching, indexing, filtering and
  access of multimedia content by enabling interoperability
  among devices that deal with multimedia content description.




MPEG-7 Main Elements
 Descriptor (D) – standardized “audio
  only” and “visual only” descriptors. Ex.:
  a time code for duration, color histograms
  for color.
 Multimedia Description Scheme (MDS)
  – standardized description schemes for
  audio and visual descriptors. Ex.: video
  temporally structured as scenes and shots,
  including textual descriptors at the scene
  level and color, motion, audio amplitude
  descriptors at the shot level.
 Description Definition Language
  (DDL) – provides a standardized
  language to express description schemes,
  – based on XML (eXtensible Markup
    Language) – a language that allows the
    creation of new description schemes, and
    possibly, descriptors. Also allows the
    extension and modification of existing
    description schemes.
What can MPEG-7 do?
 Increasing availability of potentially interesting audiovisual
  materials makes search more difficult.

 MPEG-7 aims at search systems in which any type of AV material may be
  retrieved by means of any type of query material, such as video,
  music, speech, etc.
  – Some query examples
      Music : Play a few notes on a keyboard and get in return a list of musical pieces
       containing the required tune or images somehow matching the notes.
      Image : Define objects, including color patches or textures and get in return
       examples among which you select the interesting objects to compose your image
      Voice : Using an excerpt of Pavarotti’s voice, and getting a list of Pavarotti’s
       records, video clips where Pavarotti is singing or video clips where Pavarotti is
       present.
      Sports video analysis: can be done much more easily and with better results




Application Areas
 Application domains listed in the MPEG-7 Applications document:
   – Education
   – Journalism (e.g. searching for speeches of a person using their name, voice, or
     face)
   – Tourist information
   – Cultural services (museum, art gallery, digital library)
   – Entertainment (searching a game, karaoke)
   – Investigation services (human characteristics recognition)
   – Geographical information systems
   – Remote sensing
   – Surveillance (traffic control, surface transportation)
   – Shopping
   – Architecture, real estate, and interior design
   – Social (Dating Service)
   – Film, Video and Radio archives. ……..
   – Audiovisual content production

MPEG-7 vs. previous MPEG activities
 MPEG-1,2, & 4 are designed to represent the information itself,
  while MPEG-7 is meant to represent information about the
  information.

 MPEG-1,2, & 4 made content available, MPEG-7 allows you to
  find the content you need.

 Also, MPEG-7 can be used independently of the other MPEG
  standards – the description might even be attached to an analog
  movie.




MPEG-7 Parts
 ISO/IEC 15938-1 (Systems)
  – The binary format for encoding MPEG-7 descriptions and the terminal architecture.
 ISO/IEC 15938-2 (Description Definition Language)
  – The language for defining the syntax of the MPEG-7 Description Tools and for
    defining new Description Schemes.
 ISO/IEC 15938-3 (Visual)
  – The Description Tools dealing with Visual descriptions.
 ISO/IEC 15938-4 (Audio)
  – The Description Tools dealing with Audio descriptions.
 ISO/IEC 15938-5 (Multimedia Description Schemes)
  – The Description Tools dealing with generic features and multimedia descriptions.
 ISO/IEC 15938-6 (Reference Software)
  – A software implementation of relevant parts of the MPEG-7 Standard with
    normative status.
 ISO/IEC 15938-7 (Conformance Testing)
  – Guidelines and procedures for testing conformance of MPEG-7 implementations.
 ISO/IEC TR 15938-8 (Extraction and use of descriptions)
  – Informative material (in the form of a Technical Report) about the extraction and use
    of some of the Description Tools.


Next… Description Definition Language (DDL)




Description Definition Language (DDL)
 Foundation of the MPEG-7 standard; provides the language for
  defining the structure and content of multimedia information
 A schema language to represent the results of modeling
  audiovisual data (i.e., descriptors and description schemes) as
  a set of syntactic, structural, and value constraints to which
  valid MPEG-7 descriptors, description schemes, and
  descriptions must conform.
 Also provides the rules by which users can combine, extend, and
  refine existing description schemes and descriptors.
 Based on XML. Example:
  <PersonName>
          <Title>Prof.</Title>
          <Firstname>Thomas</Firstname>
          <Lastname>Huang</Lastname>
          <Nickname>Tom</Nickname>
  </PersonName>

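As a rough illustration of what "constraints to which descriptions must conform" means in practice, the sketch below parses the PersonName example with Python's standard xml.etree.ElementTree and checks it against a hand-written list of allowed child elements. The ALLOWED_CHILDREN list and the check_person_name helper are assumptions made for this example only; an actual DDL schema is written in XML, not in Python.

```python
import xml.etree.ElementTree as ET

# The example description from the slide, as an XML string.
DESCRIPTION = """
<PersonName>
  <Title>Prof.</Title>
  <Firstname>Thomas</Firstname>
  <Lastname>Huang</Lastname>
  <Nickname>Tom</Nickname>
</PersonName>
"""

# Hypothetical structural constraint: which child elements may appear.
ALLOWED_CHILDREN = ["Title", "Firstname", "Lastname", "Nickname"]


def check_person_name(xml_text):
    """Return a dict of field values, or raise ValueError on a violation."""
    root = ET.fromstring(xml_text)
    if root.tag != "PersonName":
        raise ValueError("expected a PersonName element")
    fields = {}
    for child in root:
        if child.tag not in ALLOWED_CHILDREN:
            raise ValueError("unexpected element: " + child.tag)
        fields[child.tag] = (child.text or "").strip()
    return fields


person = check_person_name(DESCRIPTION)
print(person["Firstname"], person["Lastname"])
```

A description containing an element outside the allowed list is rejected, which is the kind of structural check a schema language makes possible.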
Next…Multimedia Description Schemes (MDS)




Multimedia Description Schemes (MDS)
 Overview of the organization of the MPEG-7 MDS: organized
  into 6 areas: Basic Elements, Content Description, Content
  Organization, Content Management, Navigation and Access, and
  User Interaction




MDS: Basic Elements
Basic Elements – fundamental constructs of the
 definition of MPEG-7 description schemes
 –Schema Tools :
    facilitate the creation and packaging of valid MPEG-7 descriptions.
 –Basic Data types :
    Integer & Real – represent constrained integer and real values
    Vectors & Matrices – represent arbitrary-sized vectors and matrices of
     integer or real values
    Probability Vectors & Matrices – represent probability distributions
     described using vectors/matrices
    String – represents codes identifying content type, countries, regions,
     currencies, and character sets
 –Linking, Identification and Localization Tools :
    tools for referencing MPEG-7 descriptions, for linking descriptions to
     multimedia content and for describing time in multimedia content


MDS: Basic Elements
 –Example: Three kinds of media time representation:
    (Figure: time axis showing TimeBase, RelTimePoint, the time points t1 and t2, and the Duration between them)

    A) Simple time: Specify a time point and a duration
    B) Relative time: Specify a media time point relative to a time base, and a
     duration
    C) Incremental time: Specification of time using a predefined interval
     called Time Unit and counting the number of intervals (efficient for
     periodic signals)
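The three time representations can be compared by converting each to an absolute (start, end) interval. The sketch below is only an illustration: the class and field names (SimpleTime, RelativeTime, IncrementalTime, time_base, offset, time_unit, start_count, length_count) are invented for this example and are not MPEG-7 element names, and all times are assumed to be in seconds.

```python
from dataclasses import dataclass


@dataclass
class SimpleTime:
    start: float      # time point, in seconds
    duration: float   # length of the interval, in seconds

    def interval(self):
        return (self.start, self.start + self.duration)


@dataclass
class RelativeTime:
    time_base: float  # reference point, in seconds
    offset: float     # start of the interval relative to the time base
    duration: float

    def interval(self):
        start = self.time_base + self.offset
        return (start, start + self.duration)


@dataclass
class IncrementalTime:
    time_base: float
    time_unit: float   # length of one predefined interval, in seconds
    start_count: int   # number of time units from the time base to the start
    length_count: int  # number of time units covered by the interval

    def interval(self):
        start = self.time_base + self.start_count * self.time_unit
        return (start, start + self.length_count * self.time_unit)


# The same interval, 10.0 s to 12.5 s, expressed in all three ways.
a = SimpleTime(start=10.0, duration=2.5)
b = RelativeTime(time_base=8.0, offset=2.0, duration=2.5)
c = IncrementalTime(time_base=8.0, time_unit=0.5, start_count=4, length_count=5)
print(a.interval(), b.interval(), c.interval())
```

The incremental form stores only integer counts once the time unit is fixed, which is why it suits periodic signals such as video frames.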

MDS: Basic Elements
 – Basic Description Tools : A library of description schemes and data types, which
   are used as primitive components for building more complex and
   functionality-specific description tools found in the rest of MPEG-7.
     Graph and relation tools: weave together complex multimedia description
      structures
       – Ex. (Figure: graph with nodes A, B, C, D, E connected by relations r1, r2, r3, r4)
         <Graph>
           <Node id="A"/> <Node id="B"/> <Node id="C"/> <Node id="D"/> <Node id="E"/>
           <Relation type="#r1" source="#A" target="#B"/>
           <Relation type="#r2" source="#A" target="#C"/>
           …………..
         </Graph>


      Textual annotations: represent textual descriptions
        – Free text annotation : Spain scores a goal against Sweden.
        – Keyword annotation : score, Sweden, Spain
      Classification schemes and terms: define and reference vocabularies for
       multimedia content descriptors.
        – Ex. Part of a ClassificationScheme for sports:
            sports: soccer, basketball, baseball, tennis
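To make the Graph example above concrete, the following sketch parses a graph description with Python's standard xml.etree.ElementTree and builds an adjacency list. Only the two relations visible on the slide are included; the elided ones are left out rather than guessed. The build_adjacency helper is an assumption of this example, not part of MPEG-7.

```python
import xml.etree.ElementTree as ET

GRAPH = """
<Graph>
  <Node id="A"/> <Node id="B"/> <Node id="C"/> <Node id="D"/> <Node id="E"/>
  <Relation type="#r1" source="#A" target="#B"/>
  <Relation type="#r2" source="#A" target="#C"/>
</Graph>
"""


def build_adjacency(xml_text):
    """Map each node id to a list of (relation type, target id) pairs."""
    root = ET.fromstring(xml_text)
    adjacency = {node.get("id"): [] for node in root.findall("Node")}
    for rel in root.findall("Relation"):
        # References are written as "#name"; strip the leading "#".
        source = rel.get("source").lstrip("#")
        target = rel.get("target").lstrip("#")
        adjacency[source].append((rel.get("type").lstrip("#"), target))
    return adjacency


adjacency = build_adjacency(GRAPH)
print(adjacency["A"])
```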
MDS: Basic Elements
 People and locations: represent people and places related to
  multimedia content
  – Agent: persons, organizations, groups of persons,…
      Ex. <PersonGroup>
         <Name>Spanish National Soccer Team </Name>
           <Kind><Name>Soccer Team </Name></Kind>
         <Member>
             <Name> Fernando </Name>
         </Member>
         <Member>
            ….
         </PersonGroup>

  – Places: existing, historical, and fictional places.
 Affective description: describe emotional response to
  multimedia content
  – Ex. Recording an audience’s excitement while watching an action movie
 Ordering tools:
  – Provides a hint for ordering descriptions for presentation based on
    information contained in those descriptions
  – Ex. Order a set of video segments in a soccer game by the amount of
    camera zoom within each segment.
Content management and content description




MDS: Content Management
 Content management : the description of the life cycle of the
  content, from creation to consumption
  – Creation and Production Description,
     Including title, textual annotation, creators, creation locations, dates, how the data
      is classified, review and guidance information, and related multimedia material.
  – Usage Description
     Describes information related to the usage rights, usage record, and financial
      information.
     Rights information is not explicitly included in the description but links are
      provided to the rights holders or right management.
     Usage record description provides information related to the use of the content,
      such as broadcasting or on-demand delivery.
     Financial information provides information related to the cost of production and
      the income resulting from content use.
     Usage description is dynamic and subject to change during the lifetime of the
      multimedia content.
  – Media Description
     Describes the storage media, in particular the compression, coding, and storage
      format of the multimedia content. It describes the master media, that is, the original
      source from which different instances of the multimedia content are produced.
Content management and content description




MDS: Structural Content Description
 Content Description: structural and conceptual aspects
  – Structure Description: describes the structure of multimedia content, built around the
   notion of the Segment Description Scheme, which represents a spatial, temporal, or
   spatiotemporal portion of the multimedia content
     Segment DSs (the core element)
       – Example: Mosaic DS – panoramic view of video segment constructed by
         aligning together and warping the frames of a Video Segment upon each other




MDS: Structural Content Description
    Specific features for structural data description

     Feature          Video Segment   Still Region   Moving Region   Audio Segment
     Time                   X                              X               X
     Shape                                  X              X
     Color                  X               X              X
     Texture                                X
     Motion                 X                              X
     Camera motion          X
     Audio features         X                              X               X
MDS: Structural Content Description
    Examples of Image description with Still Regions




MDS: Conceptual Content Description
 Conceptual aspects: describes the multimedia content from
  the viewpoint of real-world semantics and conceptual
  notions.
  – Involve entities such as objects, events, abstract concepts and relationships.
  – Segment description schemes and semantic description schemes are related
    by a set of links that allows the multimedia content to be described on the
    basis of both content structure and semantics together.




MDS: Conceptual Content Description
    Example of video segments and Regions   Corresponding SegmentRelationship Graph




Navigation and access




MDS: Navigation and Access
 Facilitating browsing and retrieval by defining summaries,
  views, and variations of the multimedia content.
 Summaries: provide compact highlights of the multimedia
  content to enable discovering, browsing, navigation, and
  visualization of multimedia content.
  – Hierarchical navigation mode
  – Sequential navigation mode




MDS: Navigation and Access
 Views: based on partitions and decompositions, which
  describe different decompositions of the multimedia signals
  in space, time, and frequency. The partitions and
  decompositions can be used as different views of the
  multimedia content → important for multi-resolution access
  and progressive retrieval.

 Variations: provide different variations of multimedia
  programs, such as summaries and abstracts; scaled,
  compressed, and low-resolution versions; and versions with
  different languages and modalities – audio, video, image, text,
  and so forth → allow the selection of the most suitable
  variation of a multimedia program


Content organization




MDS: Content Organization
 Content Organization – tools that describe collections and models
  – Collection: unordered sets of multimedia content, segments, descriptor
   instances, concepts or mixed sets of the above
      (Figure: example of collections of AV content, including the relationships (i.e.
        RAB, RBC, RAC) within and across Collection Clusters)
      Collection structure: the abstract Collection type is specialized into
        Content collection, Segment collection, Descriptor collection,
        Concept collection, and Mixed collection
MDS: Content Organization
  – Model tools: Parameterized representation of an instance or class of
    multimedia content, descriptors or collections, as follows:
       Probability model : Associates statistics or probabilities with the attributes of
        multimedia content, descriptors or collections
       Analytic model: Associates labels or semantics with multimedia content or
        collections
       Cluster model: Associates labels or semantics and statistics or probabilities with
        multimedia content collections
       Classification model: Describes information about known collections of
        multimedia content in terms of labels, semantics, and models that can be used to
        classify unknown multimedia content

       (Figure: model hierarchy)
       Model (abstract)
         Probability Model: Probability Model, Discrete distribution, Continuous distribution, Finite State Model
         Analytic Model: Collection Model, Probability Model class
         Cluster Model: Cluster Model
         Classification Model: ClusterClassification Model, ProbabilityClassification Model
MDS: Content Organization
 – Clusters of positive and negative examples of images are described using the
   Cluster Model tool.
 – A soccer video sequence is modeled using the State Transition Model tool.




User Interaction




MDS: User Interaction
 User interaction describes user preferences and usage history
 Allows matching between user preferences and MPEG-7
  content descriptions → facilitates personalization of multimedia
  content access, presentation, and consumption.




Introduction : What is MPEG-7?
“Multimedia Content Description Interface”
 –Intuition:
    NOT focus so much on processing tools
    Concentrate more on the selection of features that have to be described
    Find a way to structure and instantiate the selected features with a
     common language
 –Provide a way to get information about the audiovisual (AV)
  data without the need of performing the actual decoding of these
  data.
 –Goal: allow interoperable searching, indexing, filtering and
  access of multimedia content by enabling interoperability
  among devices that deal with multimedia content description.



MPEG-7 Main Elements
 Descriptor (D) – provides standardized “audio only” and “visual only”
  descriptors. Ex.: a time code for duration, color histograms for color.
 Multimedia Description Scheme (MDS) – provides standardized description
  schemes involving both audio and visual descriptors. Ex.: a movie,
  temporally structured as scenes and shots, including textual descriptors at the
  scene level and color, motion, audio amplitude descriptors at the shot level.
 Description Definition Language (DDL) – provides a standardized language
  to express description schemes,
  – based on XML (eXtensible Markup Language) – a language that allows the creation
    of new description schemes, and possibly, descriptors. Also allows the extension and
    modification of existing description schemes.
 Coding Schemes – compressing MPEG-7 textual XML descriptions into
  Binary format (BiM) to satisfy application requirements for compression
  efficiency, error resilience, ...

 SYSTEM:



Visual Descriptors
 Cover 6 basic visual features:
   –Color
   –Texture
   –Shape
   –Motion
   –Localization
   –Face Recognition




Color descriptors
 Color Descriptors
  – Color Space : defines the color components as continuous-value entities
      R, G, B
      Y, Cr, Cb
        – Y = 0.299R + 0.587G + 0.114B
        – Cb = -0.169R - 0.331G + 0.500B
        – Cr = 0.500R - 0.419G - 0.081B
      H, S, V (Hue, Saturation, Value)
        – A nonlinear transform of RGB
        – Quantized into 16, 32, 64, 128, or 256 bins for the
          scalable color descriptor and the frames
          histogram descriptor
      HMMD (Hue, Max, Min, Diff, Sum)
        – Max = max (R, G, B) (indicates blackness)
        – Min = min (R, G, B) (indicates whiteness)
        – Diff = Max - Min
        – Sum = (Max + Min) / 2
      Linear transformation matrix with reference to R, G, B
        – Any 3 x 3 color transform matrix that specifies the linear
          transformation between RGB and the respective color space.
      Monochrome: the Y component alone of YCrCb is used
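The color-space formulas on this slide translate directly into code. The sketch below implements the Y, Cb, Cr equations and the Max, Min, Diff, Sum components of HMMD exactly as written above. The function names are chosen for this example, the hue component of HMMD is deliberately left out, and 8-bit input values are an assumption.

```python
def rgb_to_ycbcr(r, g, b):
    """Apply the Y, Cb, Cr equations from the slide to one RGB triple."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.169 * r - 0.331 * g + 0.500 * b
    cr = 0.500 * r - 0.419 * g - 0.081 * b
    return y, cb, cr


def rgb_to_mmds(r, g, b):
    """Return the (Max, Min, Diff, Sum) components of HMMD; hue is omitted."""
    max_c = max(r, g, b)
    min_c = min(r, g, b)
    diff = max_c - min_c
    sum_c = (max_c + min_c) / 2
    return max_c, min_c, diff, sum_c


# A mid-gray pixel has no chroma; a pure red pixel has the largest Diff.
gray = rgb_to_ycbcr(128, 128, 128)
red = rgb_to_mmds(255, 0, 0)
print(gray, red)
```

For a gray pixel the Cb and Cr values come out at (numerically) zero, since each chroma row of coefficients sums to zero.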
Color Descriptors
 –Color Quantization Descriptor : specifies the partitioning of the
  given color space into discrete bins.
 –Dominant Color Descriptor (DCD): allows specification of a small
  number of dominant color values as well as their statistical properties, such as
  distribution and variance → provides an effective and compact representation
  of the colors present in a region or an image.
    DCD is defined to be
         F = {(c_i, p_i, v_i), s}, (i = 1, 2, ..., N), where N is the number of dominant colors
        c_i: dominant color value, a vector of the corresponding color space component
         values
        p_i: the fraction of pixels in the image corresponding to c_i
        v_i: the variation of the color values of the pixels in a cluster around the
         corresponding representative color
        s: the spatial coherency, which represents the overall spatial homogeneity
        (Figure: examples of low and high spatial coherency of color)




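A rough sketch of how the (c_i, p_i, v_i) triples of the DCD could be computed is given below. It is not the extraction method of the standard: coarse color binning is swapped in for a proper clustering step, the spatial coherency s is not computed (it would need pixel positions), and the function name and parameters are assumptions of this example.

```python
from collections import defaultdict


def dominant_colors(pixels, max_colors=3, levels=4):
    """Return up to max_colors triples (mean color, pixel fraction, variance).

    pixels is a list of (r, g, b) tuples with 8-bit components. Pixels are
    grouped by quantizing each channel into `levels` bins, which stands in
    for a real clustering algorithm.
    """
    step = 256 // levels
    clusters = defaultdict(list)
    for r, g, b in pixels:
        clusters[(r // step, g // step, b // step)].append((r, g, b))

    # Keep the clusters that contain the most pixels.
    largest = sorted(clusters.values(), key=len, reverse=True)[:max_colors]

    result = []
    for members in largest:
        n = len(members)
        mean = tuple(sum(p[k] for p in members) / n for k in range(3))
        variance = tuple(
            sum((p[k] - mean[k]) ** 2 for p in members) / n for k in range(3)
        )
        # Fractions are relative to all pixels, so they need not sum to 1.
        result.append((mean, n / len(pixels), variance))
    return result


pixels = [(250, 0, 0)] * 6 + [(0, 0, 250)] * 3 + [(0, 250, 0)] * 1
descriptor = dominant_colors(pixels, max_colors=2)
print(descriptor)
```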
Color Descriptors
 –Scalable Color Descriptor : a Haar transform-based encoding
  scheme applied across values of a color histogram in the HSV
  color space
      – Useful for image-to-image matching and retrieval based on color feature. Its
        binary representation is scalable in terms of bin numbers and bit
        representation accuracy over a broad range of data rate.
 –Group-of-Frame or Group-of-Picture Descriptor :
    For joint representation of color-based features for multiple images or multiple
     frames in a video segment
    Traditionally, for a group of frames or pictures, a key frame or image is
     selected and the color-related features of the entire collection are represented by
     the chosen sample → unreliable
    With GoF and GoP → histogram-based descriptors that reliably capture the color
     content of multiple images or video frames.
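The scalability of the Scalable Color Descriptor comes from the Haar transform: coarse coefficients describe a histogram with few bins, and finer coefficients refine it. The sketch below shows a plain 1-D Haar decomposition with pairwise sums and differences on a toy histogram. It illustrates the principle only; the quantization steps of the actual descriptor are not shown, and the function name is an assumption.

```python
def haar_transform(values):
    """Full 1-D Haar decomposition using pairwise sums and differences.

    Returns [overall sum, differences from coarsest to finest level].
    The input length must be a power of two.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    sums = list(values)
    details = []
    while len(sums) > 1:
        pairs = [(sums[i], sums[i + 1]) for i in range(0, len(sums), 2)]
        # Differences of this level go in front so coarse levels end up first.
        details = [a - b for a, b in pairs] + details
        sums = [a + b for a, b in pairs]
    return sums + details


coefficients = haar_transform([4, 2, 5, 5])
print(coefficients)
```

Keeping only the first two coefficients of this toy example (16 and -4) is enough to recover the 2-bin histogram [6, 10], as (16 + (-4)) / 2 and (16 - (-4)) / 2; adding the remaining coefficients restores all four bins.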




Color Descriptors
 – Color Layout Descriptor (CLD) : represents the spatial distribution of
   representative colors on a grid superimposed on a region or image. Representation is
   based on coefficients of Discrete Cosine Transform. This is a very compact
   descriptor being highly efficient in fast browsing and search applications.
 – Color Structure Descriptor (CSD): based on color histograms, but aims at
   identifying localized color distributions using a small structuring window. To
   guarantee interoperability, the CSD is bound to the HMMD color space.
 – Color structure: the degree to which the pixels of a color are clumped together relative to the scale of an
   associated structuring element.




             Examples of structured and unstructured color.
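For the Color Layout Descriptor described on this slide, the two core steps (representative colors on a grid, then a Discrete Cosine Transform) can be sketched as follows. This is a simplified single-channel illustration in pure Python: the 8 x 8 default grid size, the use of block averages as representative colors, and the function names are assumptions of this example, and the coefficient selection and quantization of a real descriptor are omitted. The image must be at least grid x grid pixels.

```python
import math


def grid_averages(channel, grid=8):
    """Average a 2-D list of pixel values over a grid x grid partition."""
    height, width = len(channel), len(channel[0])
    cells = [[0.0] * grid for _ in range(grid)]
    for gy in range(grid):
        y0, y1 = gy * height // grid, (gy + 1) * height // grid
        for gx in range(grid):
            x0, x1 = gx * width // grid, (gx + 1) * width // grid
            block = [channel[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cells[gy][gx] = sum(block) / len(block)
    return cells


def dct_2d(cells):
    """Orthonormal 2-D DCT-II of a square matrix."""
    n = len(cells)

    def scale(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0.0
            for y in range(n):
                for x in range(n):
                    total += (cells[y][x]
                              * math.cos((2 * y + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * x + 1) * v * math.pi / (2 * n)))
            out[u][v] = scale(u) * scale(v) * total
    return out


# A flat 16 x 16 image: all energy ends up in the DC coefficient.
flat = [[10] * 16 for _ in range(16)]
coefficients = dct_2d(grid_averages(flat))
print(round(coefficients[0][0], 6))
```

Because most natural images concentrate their energy in the low-frequency coefficients, keeping only a few of them gives the compactness the slide mentions.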


Texture Descriptors
 Homogeneous Texture Descriptor (HTD):
  – provides a quantitative representation using 62 numbers, consisting of the
    mean energy and energy deviation from a set of frequency channels
  – Useful for similarity retrieval
  – Effective in characterizing homogeneous texture regions
 Texture Browsing Descriptor (TBD):
  – Defined for coarse level texture browsing
  – Provides a perceptual characterization of texture, similar to human
    characterization, in terms of regularity, coarseness and directionality of the
    texture pattern.
 Edge Histogram Descriptor (EHD):
  – Captures the spatial distribution of edges in an image
  – Useful in matching regions with partially varying, non-uniform texture.



Homogeneous Texture Descriptor
• Texture Descriptor
  – Homogeneous Texture Descriptor (HTD): characterizes the region
    texture using the mean energy and the energy deviation from a set of
    frequency channels. The 2D frequency plane is partitioned into 30
    channels.
    (Figure: frequency layout for feature extraction)

  The syntax of the HTD is as follows:
                HTD = [fDC, fSD, e1, e2, ..., e30, d1, d2, ..., d30]
  where fDC and fSD are the mean and standard deviation of the input image, and ei
   and di are the nonlinearly scaled and quantized mean energy and energy
   deviation of the i-th channel.
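The HTD syntax above is a flat vector of 2 + 30 + 30 = 62 numbers. The sketch below packs and unpacks such a vector so the layout is explicit. The function names and the dictionary keys are assumptions of this example; computing the channel energies themselves is not shown.

```python
NUM_CHANNELS = 30


def pack_htd(f_dc, f_sd, energies, deviations):
    """Build the 62-element vector [fDC, fSD, e1..e30, d1..d30]."""
    if len(energies) != NUM_CHANNELS or len(deviations) != NUM_CHANNELS:
        raise ValueError("expected 30 energies and 30 deviations")
    return [f_dc, f_sd] + list(energies) + list(deviations)


def unpack_htd(htd):
    """Split a 62-element HTD vector back into its named parts."""
    if len(htd) != 2 + 2 * NUM_CHANNELS:
        raise ValueError("an HTD vector has exactly 62 elements")
    return {
        "f_dc": htd[0],
        "f_sd": htd[1],
        "energies": htd[2:2 + NUM_CHANNELS],
        "deviations": htd[2 + NUM_CHANNELS:],
    }


htd = pack_htd(120, 35, list(range(1, 31)), list(range(101, 131)))
parts = unpack_htd(htd)
print(len(htd), parts["energies"][0], parts["deviations"][-1])
```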
Texture Browsing Descriptor
     – Texture Browsing : Perceptual characterization of a texture, similar to a human
       characterization, in terms of regularity, coarseness and directionality

     – TBD = [v1, v2, v3, v4, v5]
         v1 ∈ {1, 2, 3, 4} or {00, 01, 10, 11}: represents the regularity
         v2, v3 ∈ {1, 2, 3, 4, 5, 6}: capture the directionality of the texture
         v4, v5 ∈ {1, 2, 3, 4}: capture the coarseness of the texture

                                Regularity   Semantics
                                    00       irregular
                                    01       slightly regular
                                    10       regular
                                    11       highly regular
                                (Table: semantics of regularity)

                                (Figure: examples of regularity, showing textures with regularity 11, 10, 01, and 00)
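The five TBD components and the regularity table can be turned into a small validator and decoder, as sketched below. The value ranges are taken from this slide; v1 is treated here as the 2-bit code from the table (0 to 3), which is one of the two encodings the slide lists. The function name and the returned dictionary keys are assumptions of this example.

```python
REGULARITY = {
    0b00: "irregular",
    0b01: "slightly regular",
    0b10: "regular",
    0b11: "highly regular",
}


def describe_tbd(tbd):
    """Validate a 5-component TBD vector and name its parts."""
    if len(tbd) != 5:
        raise ValueError("TBD must have exactly 5 components")
    v1, v2, v3, v4, v5 = tbd
    if v1 not in REGULARITY:
        raise ValueError("v1 must be a 2-bit regularity code")
    if not all(1 <= v <= 6 for v in (v2, v3)):
        raise ValueError("v2 and v3 must lie in 1..6")
    if not all(1 <= v <= 4 for v in (v4, v5)):
        raise ValueError("v4 and v5 must lie in 1..4")
    return {
        "regularity": REGULARITY[v1],
        "directionality": (v2, v3),
        "coarseness": (v4, v5),
    }


info = describe_tbd([0b11, 1, 4, 2, 3])
print(info["regularity"])
```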
Edge Histogram Descriptor
 – Edge Histogram: represents the local edge distribution in the image
    Five types of edges: 5 histogram bins for each sub-image

        BinCounts[k]     Semantics
        BinCounts[0]     Vertical edges in sub-image (0,0)
        BinCounts[1]     Horizontal edges in sub-image (0,0)
        BinCounts[2]     45 degree edges in sub-image (0,0)
        BinCounts[3]     135 degree edges in sub-image (0,0)
        BinCounts[4]     Non-directional edges in sub-image (0,0)
        BinCounts[5]     Vertical edges in sub-image (0,1)
        …                …
        BinCounts[74]    Non-directional edges in sub-image (3,2)
        BinCounts[75]    Vertical edges in sub-image (3,3)
        BinCounts[76]    Horizontal edges in sub-image (3,3)
        BinCounts[77]    45 degree edges in sub-image (3,3)
        BinCounts[78]    135 degree edges in sub-image (3,3)
        BinCounts[79]    Non-directional edges in sub-image (3,3)
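The rows of the table follow a regular pattern: 16 sub-images in a 4 x 4 layout, visited row by row, with 5 edge types each, for 80 bins in total. The sketch below spells out that index rule, k = 5 * (4 * row + col) + edge type. The rule is inferred from the rows shown in the table, and the function names are assumptions of this example.

```python
EDGE_TYPES = ["vertical", "horizontal", "45 degree", "135 degree", "non-directional"]
GRID = 4  # sub-images per row and per column


def bin_semantics(k):
    """Return (row, col, edge type) for histogram bin index k."""
    if not 0 <= k < GRID * GRID * len(EDGE_TYPES):
        raise ValueError("bin index must lie in 0..79")
    sub_image, edge = divmod(k, len(EDGE_TYPES))
    row, col = divmod(sub_image, GRID)
    return row, col, EDGE_TYPES[edge]


def bin_index(row, col, edge_type):
    """Inverse mapping: bin index for a sub-image and an edge type."""
    return len(EDGE_TYPES) * (GRID * row + col) + EDGE_TYPES.index(edge_type)


print(bin_semantics(74), bin_index(3, 3, "45 degree"))
```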



Shape Descriptors
 Shape Descriptors
  – Region-based Shape Descriptor
     Expresses pixel distribution within a 2-D object or region.
     Based on both boundary and internal pixels and can describe complex objects
      consisting of multiple disconnected regions as well as simple objects with or
      without holes.
  – Contour-based Shape Descriptor
     Based on the Curvature Scale Space (CSS) representation of the contour
  – 3-D Spectrum Descriptor
     Expresses characteristic features of objects represented as discrete polygonal 3-D
      meshes.
     Based on the histogram of local geometrical properties of the 3-D surfaces of the
      object.




Shape Descriptors
 – The Region-based Shape Descriptor uses a set of ART (Angular Radial
   Transform) coefficients. Twelve angular and three radial functions are used
   (n < 3, m < 12).

         F_nm is an ART coefficient of order n and m, V is the ART basis function, and f is the image function

        V (the ART basis function) is separable along the angular and radial directions

         (Figure: real part of the ART basis functions)
    ART coefficients are divided by the magnitude of the ART coefficient of order n = 0, m = 0, which is not used
     as a descriptor element.
    Quantization is applied to each coefficient using 4 bits per coefficient to minimize the size of the descriptor
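The formulas for the ART coefficients on this slide did not survive as text. The block below restates the commonly cited definition of the Angular Radial Transform on the unit disk in polar coordinates, using the slide's symbols (F_nm, V, f). It should be read as a reconstruction, not as the slide's exact notation.

```latex
F_{nm} = \langle V_{nm}(\rho,\theta),\, f(\rho,\theta) \rangle
       = \int_{0}^{2\pi}\!\!\int_{0}^{1} V_{nm}^{*}(\rho,\theta)\, f(\rho,\theta)\, \rho \, d\rho \, d\theta

V_{nm}(\rho,\theta) = A_m(\theta)\, R_n(\rho), \qquad
A_m(\theta) = \frac{1}{2\pi}\, e^{jm\theta}, \qquad
R_n(\rho) = \begin{cases} 1 & n = 0 \\ 2\cos(\pi n \rho) & n \neq 0 \end{cases}
```

With n < 3 and m < 12 there are 3 x 12 = 36 coefficients. Leaving out the n = 0, m = 0 coefficient, which is used only for normalization, gives 35 descriptor elements, or 35 x 4 = 140 bits at 4 bits per coefficient.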
Shape Descriptors
 – Contour-based Shape Descriptor : describes a closed contour of a 2D object or
   region in image or video sequence. Based on the Curvature Scale Space (CSS)
   representation of the contour




          (A 2D visual object (region) and its corresponding shape)



          Field                No. of bits   Meaning
          No. of peaks         6             No. of peaks in CSS image
          GlobalCurvature      2×6           Circularity and eccentricity of the contour
          PrototypeCurvature   2×6           Circularity and eccentricity of the smoothed contour
          HighestPeakY         7             Absolute height of the highest peak (quantized)
          PeakX[]              6             X-position on the contour of a peak (quantized)
          PeakY[]              3             Height of the peak (quantized)

          (CSS Image Formation: smoothing evolution of zero-crossings)
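The idea behind the CSS image can be sketched as follows: the closed contour is smoothed with Gaussians of increasing width, and at each width the positions where the curvature changes sign are recorded. This is an illustrative sketch only; the function names, the FFT-based circular smoothing, and the finite-difference curvature are assumptions, and the peak extraction and quantization that produce the fields in the table are not shown.

```python
import numpy as np

def smooth_closed(values, sigma):
    """Circular Gaussian smoothing of a periodic signal (sigma in samples)."""
    freqs = np.fft.fftfreq(len(values))
    gaussian_ft = np.exp(-2.0 * (np.pi * freqs * sigma) ** 2)
    return np.real(np.fft.ifft(np.fft.fft(values) * gaussian_ft))

def curvature_zero_crossings(x, y, sigma):
    """Indices where the curvature of the smoothed contour changes sign."""
    xs, ys = smooth_closed(x, sigma), smooth_closed(y, sigma)
    # Periodic finite differences for the first and second derivatives.
    dx = (np.roll(xs, -1) - np.roll(xs, 1)) / 2.0
    dy = (np.roll(ys, -1) - np.roll(ys, 1)) / 2.0
    ddx = np.roll(xs, -1) - 2.0 * xs + np.roll(xs, 1)
    ddy = np.roll(ys, -1) - 2.0 * ys + np.roll(ys, 1)
    numerator = dx * ddy - dy * ddx  # has the same sign as the curvature
    return np.nonzero(numerator * np.roll(numerator, -1) < 0)[0]

def css_image(x, y, sigmas):
    """List of (sigma, zero-crossing positions) pairs, one row per scale."""
    return [(s, curvature_zero_crossings(x, y, s)) for s in sigmas]

# Usage: a three-lobed contour loses its concavities as smoothing increases.
t = np.linspace(0.0, 2.0 * np.pi, 512, endpoint=False)
r = 1.0 + 0.4 * np.cos(3.0 * t)
rows = css_image(r * np.cos(t), r * np.sin(t), sigmas=[1.0, 20.0, 60.0])
```

Plotting the zero-crossing positions against the smoothing width gives the arch-shaped CSS image; the arch peaks are what the descriptor stores.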
Shape Descriptors
    Contour-based Shape Descriptor has the following properties:
     – It can distinguish between shapes that have similar region-shape properties but
       different contour-shape properties.
     – It supports search for shapes that are semantically similar for humans.
     – It is robust to significant non-rigid deformations.
     – It is robust to distortions in the contour due to perspective transformations, which are
       common in images and video.
     – It is robust to noise present on the contour.
     – It is very compact (14 bytes per contour on average).
     – The descriptor is easy to implement and offers fast extraction and matching.

Shape Descriptors
 (3-Dimensional Class)
 – 3-D Shape Spectrum Descriptor : specifies an intrinsic shape
  description for 3D mesh models. It exploits some local attributes of the 3D surface.

    The shape index, introduced by Koenderink, is defined as a function of the two principal
     curvatures, $k_p^1$ and $k_p^2$, associated with point p on the 3D surface S:

       $$I_p = \frac{1}{2} - \frac{1}{\pi} \arctan \frac{k_p^1 + k_p^2}{k_p^1 - k_p^2}, \qquad \text{with } k_p^1 \ge k_p^2$$

    By definition, the shape index value is in the interval [0,1].
    The shape spectrum of the 3D mesh (3D-SSD) is the histogram of the shape indices ($I_p$'s)
     calculated over the entire mesh.
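A small sketch of the shape index and its histogram, assuming the principal curvatures have already been estimated at each mesh point (curvature estimation itself is not shown). The function names and the choice of 100 histogram bins are assumptions for illustration. Note that `arctan2` maps a planar point (both curvatures zero) to 0.5 here, whereas a real implementation would typically treat planar and degenerate points separately.

```python
import numpy as np

def shape_index(k1, k2):
    """Shape index in [0, 1] from the two principal curvatures."""
    k_max = np.maximum(k1, k2)
    k_min = np.minimum(k1, k2)
    # arctan2 handles the umbilic case k_max == k_min without dividing by zero.
    return 0.5 - np.arctan2(k_max + k_min, k_max - k_min) / np.pi

def shape_spectrum(k1, k2, n_bins=100):
    """Normalized histogram of shape indices over all mesh points."""
    indices = shape_index(np.asarray(k1, dtype=float), np.asarray(k2, dtype=float))
    hist, _ = np.histogram(indices, bins=n_bins, range=(0.0, 1.0))
    return hist / hist.sum()

# Usage with made-up curvature samples (hypothetical data, not a real mesh).
rng = np.random.default_rng(0)
spectrum = shape_spectrum(rng.normal(size=1000), rng.normal(size=1000))
```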




Motion Descriptors
 Camera Motion Descriptor
 Motion Trajectory Descriptor
 Parametric Motion Descriptor
 Motion Activity Descriptor


 (Figure: motion descriptors attached to a video segment, a mosaic and a moving region:
  camera motion, motion activity, warping parameters, motion trajectory, parametric motion)
Motion Descriptors
  – Camera Motions: pan, track, tilt, boom, zoom, dolly, roll, absence

                                   (perspective projection and camera motion parameters)

Motion Descriptors
 – Motion Trajectory : describes the displacements of objects in time. A high-level
   feature associated with a moving region, defined as the spatiotemporal
   localization of one of its representative points (such as its center) as a list of key
   points (x, y, z, t).
 – Parametric Motion : describes the motion of objects in video sequences as a 2D
   parametric model.
     Affine models (6 parameters): translations, rotations, scaling and combinations of these.
     Planar perspective models (8 parameters): global deformations with perspective projections.
     Quadratic models (12 parameters): describe more complex movements.
 – Motion Activity : the intuitive notion of ‘intensity of action’ or ‘pace of action’ in a
   video segment.
     Example of high “activity”: goal scoring in a soccer match.
     Can be used in diverse applications such as content repurposing, video summarization,
      surveillance, content-based querying, etc.
     Four attributes:
       – Intensity of activity: indicates high or low activity by an integer in the range [1, 5]
       – Direction of activity: expresses the dominant direction of the activity, if any
       – Spatial distribution of activity: the number and size of active regions in a frame
       – Temporal distribution of activity: expresses the variation of activity over the duration
         of the video segment
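To make the 6-parameter affine model concrete, here is a sketch that evaluates an affine velocity field and fits its parameters to observed motion vectors by least squares. The parameter ordering (a1..a6), the function names and the fitting method are assumptions chosen for illustration; the slides do not specify how the parameters are estimated.

```python
import numpy as np

def affine_flow(params, x, y):
    """Velocity field of a 6-parameter affine motion model."""
    a1, a2, a3, a4, a5, a6 = params
    vx = a1 + a2 * x + a3 * y
    vy = a4 + a5 * x + a6 * y
    return vx, vy

def fit_affine(x, y, vx, vy):
    """Least-squares estimate of (a1..a6) from motion vectors at points (x, y)."""
    design = np.column_stack([np.ones_like(x), x, y])
    px, *_ = np.linalg.lstsq(design, vx, rcond=None)
    py, *_ = np.linalg.lstsq(design, vy, rcond=None)
    return np.concatenate([px, py])

# Usage: recover the parameters of a synthetic motion field.
rng = np.random.default_rng(1)
x, y = rng.uniform(-1, 1, 50), rng.uniform(-1, 1, 50)
true_params = np.array([1.0, 0.1, -0.2, -0.5, 0.3, 0.05])
vx, vy = affine_flow(true_params, x, y)
estimated = fit_affine(x, y, vx, vy)
```

The perspective and quadratic models extend the same idea with 8 and 12 parameters respectively; only the form of the velocity equations changes.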
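The intensity attribute of motion activity can be illustrated by quantizing the spread of motion-vector magnitudes into five levels. This is a sketch under stated assumptions: using the standard deviation of the magnitudes as the activity measure is one common choice, and the threshold values below are hypothetical placeholders, not values from the slides or the standard.

```python
import numpy as np

def activity_intensity(motion_vectors, thresholds=(1.0, 3.0, 6.0, 12.0)):
    """Map a field of (vx, vy) motion vectors to an intensity level in 1..5.

    `thresholds` are hypothetical cut points on the standard deviation of the
    motion-vector magnitudes; real systems would tune or look these up.
    """
    vectors = np.asarray(motion_vectors, dtype=float).reshape(-1, 2)
    spread = np.hypot(vectors[:, 0], vectors[:, 1]).std()
    return int(np.searchsorted(thresholds, spread, side="right")) + 1

# Usage: a static frame versus a frame with strongly varying motion.
static_frame = np.zeros((8, 8, 2))
busy_frame = np.zeros((8, 8, 2))
busy_frame[::2, :, 0] = 100.0
levels = (activity_intensity(static_frame), activity_intensity(busy_frame))
```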
Localization Descriptors
  – Region Locator : localizes regions within images or frames by specifying
   them with a brief and scalable representation of a Box or a Polygon. The procedure
   consists of the following 2 steps:
      Extraction of the vertices of the region to be localized
      Localization of the region within the image or frame

        (localization using a Polygon and a Box element of the RegionLocator)
  – Spatio Temporal Locator : describes spatio-temporal regions in a video
    sequence, such as moving object regions, and provides localization
    functionality.
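Once a region is stored as polygon vertices, a typical use is to ask whether a given pixel lies inside it. The sketch below uses the standard ray-casting test; the function name and the vertex representation (a list of (x, y) tuples) are assumptions for illustration, and a Box is simply treated as a four-vertex polygon.

```python
def point_in_polygon(px, py, vertices):
    """Ray-casting test: is the point (px, py) inside the closed polygon?"""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > py) != (y2 > py):
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside

# Usage: a Box expressed by its four corner vertices.
box = [(0, 0), (4, 0), (4, 4), (0, 4)]
hits = [point_in_polygon(2, 2, box), point_in_polygon(5, 2, box)]
```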

Face Recognition Descriptor

FaceRecognition Descriptor : used to retrieve face images which match a query
face image.
    – Face Recognition : the projection of a face vector $\Gamma$ onto a set of 48 basis
    eigenvectors U (‘eigenfaces’) which span the space of possible face vectors.
    – Feature Extraction : the FaceRecognition feature set is extracted from a
    normalized face image. This normalized face image contains 56 lines with 46
    intensity values in each line. The centres of the two eyes in each face image are
    located on the 24th row, at the 16th and 31st columns for the right and left eye
    respectively.
    Features are given by the vector $W = U^T (\Gamma - \Psi)$, where $\Psi$ is the mean face vector.
    The features are then normalized and clipped using Z = 2048.
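The projection step can be sketched with NumPy as follows. In the descriptor the 48 basis vectors and the mean face are fixed in advance; here, purely for illustration, they are derived by PCA from synthetic random images, so `train_eigenfaces` and its training data are assumptions. The final normalization and clipping step is not reproduced.

```python
import numpy as np

ROWS, COLS, N_BASIS = 56, 46, 48  # image size and basis count from the slide

def train_eigenfaces(faces, n_basis=N_BASIS):
    """Illustrative PCA: returns (basis U of shape (ROWS*COLS, n_basis), mean face)."""
    data = faces.reshape(len(faces), -1).astype(float)
    mean_face = data.mean(axis=0)
    _, _, vt = np.linalg.svd(data - mean_face, full_matrices=False)
    return vt[:n_basis].T, mean_face

def face_features(face_image, basis, mean_face):
    """Project a normalized face image onto the eigenface basis."""
    gamma = np.asarray(face_image, dtype=float).reshape(-1)
    return basis.T @ (gamma - mean_face)

# Usage with synthetic stand-in images (not real faces).
rng = np.random.default_rng(2)
training = rng.uniform(0, 255, size=(100, ROWS, COLS))
U, psi = train_eigenfaces(training)
W = face_features(training[0], U, psi)
```

Matching a query face then reduces to comparing 48-element feature vectors rather than full images.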




Face descriptor
 – Automatic Face Image Localization




    (Block diagram of the automatic face image localization algorithm)
    Color Segmentation




      (A color segmentation example: a) the skin color region in the Cb-Cr plane
       b) original image c) results of the color segmentation algorithm)



Audio descriptors
 Overview of Audio Framework including Descriptors




Audio Descriptors
 Basic Descriptors: temporally sampled scalar values for general use,
 applicable to all kinds of signals
  – AudioWaveform Descriptor : Audio waveform envelope (minimum and
    maximum), typically for display purposes
  – AudioPower Descriptor : the temporally smoothed instantaneous power,
    which is useful as a quick summary of a signal, and in conjunction with the
    power spectrum.
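The two basic descriptors can be sketched as simple per-frame summaries. The hop size, the non-overlapping framing, and the function names are assumptions for illustration; AudioPower is approximated here as the mean squared sample value of each frame.

```python
import numpy as np

def frame_signal(signal, hop):
    """Split a signal into consecutive non-overlapping frames of `hop` samples."""
    n_frames = len(signal) // hop
    return np.asarray(signal[: n_frames * hop], dtype=float).reshape(n_frames, hop)

def audio_waveform(signal, hop):
    """Per-frame (minimum, maximum) envelope, e.g. for drawing a waveform."""
    frames = frame_signal(signal, hop)
    return frames.min(axis=1), frames.max(axis=1)

def audio_power(signal, hop):
    """Per-frame mean of the squared samples."""
    frames = frame_signal(signal, hop)
    return np.mean(frames ** 2, axis=1)

# Usage: one second of a 100 Hz sine sampled at 8 kHz, one period per frame.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 100 * t)
env_min, env_max = audio_waveform(tone, hop=80)
power = audio_power(tone, hop=80)
```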

 Basic Spectral Descriptors: all deriving from a single time-frequency
 analysis of an audio signal
  – AudioSpectrumEnvelope Descriptor : a logarithmic-frequency spectrum, with bands
    spaced by a power-of-two divisor or multiple of an octave
  – AudioSpectrumCentroid Descriptor : the center of gravity of the log-frequency
    power spectrum, which describes the shape of the power spectrum
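A sketch of the log-frequency centroid for a single frame, assuming frequencies are expressed in octaves relative to 1 kHz. The reference frequency, the decision to simply discard bins below 62.5 Hz, and the function name are assumptions made for this illustration.

```python
import numpy as np

def audio_spectrum_centroid(frame, sample_rate, low_edge=62.5):
    """Power-weighted mean of log2(f / 1 kHz) over the frame's spectrum."""
    power = np.abs(np.fft.rfft(frame)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    keep = freqs >= low_edge  # also removes the DC bin, where log2 is undefined
    log_freq = np.log2(freqs[keep] / 1000.0)
    return float(np.sum(log_freq * power[keep]) / np.sum(power[keep]))

# Usage: pure tones at 1 kHz and 2 kHz are one octave apart.
sample_rate = 8000
n = np.arange(800)
centroid_1k = audio_spectrum_centroid(np.sin(2 * np.pi * 1000 * n / sample_rate), sample_rate)
centroid_2k = audio_spectrum_centroid(np.sin(2 * np.pi * 2000 * n / sample_rate), sample_rate)
```

Working in octaves rather than hertz means that a doubling of frequency always shifts the centroid by the same amount, which matches how pitch is perceived.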


Audio Descriptors
 – AudioSpectrumSpread Descriptor : complements the previous descriptor
   by describing the second moment of the log-frequency power spectrum. This may
   help distinguish between pure-tone and noise-like sounds.
 – AudioSpectrumFlatness Descriptor : the flatness properties of the spectrum of
   an audio signal for each of a number of frequency bands. When this indicates a high
   deviation from a flat spectral shape for a given band, it may signal the presence of
   tonal components.
 (Example of an AudioSpectrumEnvelope description of a pop song, visualized
  using a spectrogram. The required data storage is NM values, where N is the no.
  of spectrum bins and M is the no. of time points.)
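The spread and flatness measures can be sketched from a power spectrum as follows. Spread is computed as the power-weighted standard deviation of the log-frequency values, and flatness as the ratio of geometric to arithmetic mean of the power in a band. The 1 kHz reference, the 62.5 Hz lower edge, the small `eps` guard, and the function names are assumptions for illustration.

```python
import numpy as np

def audio_spectrum_spread(freqs, power, low_edge=62.5):
    """Power-weighted standard deviation of log2(f / 1 kHz)."""
    keep = freqs >= low_edge
    log_freq = np.log2(freqs[keep] / 1000.0)
    weights = power[keep]
    centroid = np.sum(log_freq * weights) / np.sum(weights)
    return float(np.sqrt(np.sum((log_freq - centroid) ** 2 * weights) / np.sum(weights)))

def band_flatness(band_power, eps=1e-12):
    """Geometric mean over arithmetic mean: near 1 for flat, near 0 for tonal."""
    p = np.asarray(band_power, dtype=float) + eps  # eps avoids log(0)
    return float(np.exp(np.mean(np.log(p))) / np.mean(p))

# Usage: a single-bin "tone" versus a flat, noise-like spectrum.
freqs = np.linspace(0.0, 4000.0, 401)
tone_power = np.zeros_like(freqs)
tone_power[100] = 1.0
noise_power = np.ones_like(freqs)
spreads = (audio_spectrum_spread(freqs, tone_power), audio_spectrum_spread(freqs, noise_power))
flatness = (band_flatness(tone_power[90:110]), band_flatness(noise_power[90:110]))
```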




Audio Descriptors
 Spectral Basis Descriptor: low-dimensional projections of a high-
 dimensional spectral space to aid compactness and recognition, which are
 used primarily with the Sound Classification and Indexing Description Tools
  – AudioSpectrumBasis : a series of basis functions that are derived from the
    singular value decomposition of a normalized power spectrum
  – AudioSpectrumProjection : Used with above descriptor, and represents low-
    dimensional features of a spectrum after projection upon a reduced rank basis.
  (Example: a 10-basis-component reconstruction showing most of the detail of the
    original spectrogram, including guitar, bass guitar, etc. The left vectors are an
    AudioSpectrumBasis Descriptor and the top vectors are the corresponding
    AudioSpectrumProjection Descriptor. The required data storage is 10(M+N) values.)
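The basis and projection pair can be sketched with a truncated singular value decomposition. The spectrogram is taken as an M x N matrix (M time points, N spectrum bins); the function name is an assumption, and the normalization of the power spectrum that precedes the decomposition is omitted here.

```python
import numpy as np

def spectrum_basis(spectrogram, k):
    """Return (basis of shape N x k, projection of shape M x k) from a truncated SVD."""
    _, _, vt = np.linalg.svd(spectrogram, full_matrices=False)
    basis = vt[:k].T                  # k spectral basis functions
    projection = spectrogram @ basis  # low-dimensional features per time point
    return basis, projection

# Usage: a synthetic rank-3 spectrogram is reconstructed exactly from 3 components.
rng = np.random.default_rng(3)
M, N, k = 50, 20, 3
spectrogram = rng.uniform(size=(M, k)) @ rng.uniform(size=(k, N))
basis, projection = spectrum_basis(spectrogram, k)
reconstruction = projection @ basis.T
stored_values = basis.size + projection.size  # k * (M + N) instead of M * N
```

The storage saving is the point of the descriptor pair: k(M+N) values replace the MN values of the full spectrogram, at the cost of discarding the components beyond rank k.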




Audio Descriptors
 Signal Parameters : apply chiefly to periodic or quasi-periodic
  signals

  – AudioFundamentalFrequency Descriptor : the fundamental frequency of an
    audio signal. Its representation allows for a confidence measure, in recognition of
    the fact that the various extraction methods, commonly called “pitch-tracking”,
    are not perfectly accurate.

  – AudioHarmonicity Descriptor : the harmonicity of a signal, allowing
    distinction between sounds with a harmonic spectrum (e.g., musical tones
    or voiced speech [vowels like ‘a’]), sounds with an inharmonic spectrum
    (e.g., metallic or bell-like sounds) and sounds with a non-harmonic
    spectrum (e.g., noise, unvoiced speech [fricatives like ‘f’], or dense
    mixtures of instruments).
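One simple way to obtain a fundamental-frequency estimate together with a confidence value is autocorrelation: the lag of the strongest autocorrelation peak gives the period, and the normalized peak height indicates how periodic the frame is. This is only one of many pitch-tracking methods and is not prescribed by the slides; the function name and the search range of 50 to 1000 Hz are assumptions.

```python
import numpy as np

def estimate_f0(frame, sample_rate, fmin=50.0, fmax=1000.0):
    """Return (f0 in Hz, confidence in roughly [0, 1]) for one frame."""
    frame = np.asarray(frame, dtype=float)
    frame = frame - frame.mean()
    # One-sided autocorrelation: index = lag in samples.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    if ac[0] <= 0.0:
        return 0.0, 0.0  # silent frame: no pitch, no confidence
    lo = int(sample_rate / fmax)
    hi = min(int(sample_rate / fmin), len(ac) - 1)
    lag = lo + int(np.argmax(ac[lo:hi + 1]))
    return sample_rate / lag, float(ac[lag] / ac[0])

# Usage: a 200 Hz tone versus white noise.
sample_rate = 8000
n = np.arange(1024)
f0_tone, conf_tone = estimate_f0(np.sin(2 * np.pi * 200 * n / sample_rate), sample_rate)
f0_noise, conf_noise = estimate_f0(np.random.default_rng(4).normal(size=1024), sample_rate)
```

The same normalized peak height also gives a rough feel for harmonicity: it is high for harmonic sounds and low for noise-like ones.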



Audio Descriptors
 Timbral Temporal Descriptor : temporal characteristics of segments
 of sounds, useful for the description of musical timbre (characteristic tone
 quality independent of pitch and loudness).
  – LogAttackTime Descriptor : characterizes the ‘attack’ of a sound, the time it takes for the signal
    to rise from silence to the maximum amplitude. It distinguishes between a
    sudden and a smooth sound.
  – TemporalCentroid Descriptor : characterizes the signal envelope, representing where in time the
    energy of a signal is focused. It can distinguish between a decaying piano
    note and a sustained organ note when the lengths and the attacks of the two notes
    are identical.
 Timbral Spectral Descriptor : spectral features in a linear-frequency
 space, especially applicable to the perception of musical timbre.
  – SpectralCentroid Descriptor : the power-weighted average of the frequency of the
    bins in the linear power spectrum. Very similar to the AudioSpectrumCentroid, but
    specialized for use in distinguishing musical instrument timbres. It indicates the
    “sharpness” of a sound.
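The two timbral temporal descriptors can be sketched from an amplitude envelope. The attack is measured here from the first sample that exceeds a small fraction of the peak up to the peak itself; that start threshold (2% of the maximum) and the function names are assumptions for illustration.

```python
import numpy as np

def log_attack_time(envelope, sample_rate, start_fraction=0.02):
    """log10 of the time from the (assumed) attack start to the envelope maximum."""
    envelope = np.asarray(envelope, dtype=float)
    peak_idx = int(np.argmax(envelope))
    threshold = start_fraction * envelope[peak_idx]
    start_idx = int(np.argmax(envelope >= threshold))  # first sample above threshold
    duration = max(peak_idx - start_idx, 1) / sample_rate
    return float(np.log10(duration))

def temporal_centroid(envelope, sample_rate):
    """Envelope-weighted mean time, in seconds."""
    envelope = np.asarray(envelope, dtype=float)
    times = np.arange(len(envelope)) / sample_rate
    return float(np.sum(times * envelope) / np.sum(envelope))

# Usage: same attack and length, but a decaying (piano-like) versus a
# sustained (organ-like) envelope.
sample_rate = 1000
attack = np.linspace(0.0, 1.0, 10, endpoint=False)
piano = np.concatenate([attack, np.exp(-np.arange(990) / 100.0)])
organ = np.concatenate([attack, np.ones(990)])
lat = (log_attack_time(piano, sample_rate), log_attack_time(organ, sample_rate))
centroids = (temporal_centroid(piano, sample_rate), temporal_centroid(organ, sample_rate))
```

The attack times of the two notes are identical, so only the temporal centroid separates them, which is the piano versus organ case described above.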



Audio Descriptors
  – HarmonicSpectralCentroid Descriptor : the amplitude-weighted mean of the
    harmonic peaks of the spectrum. It has a similar semantic to the other centroid
    descriptors, but applies only to the harmonic parts of the musical tone.
  – HarmonicSpectralDeviation Descriptor : the spectral deviation of log-amplitude
    components from a global spectral envelope.
  – HarmonicSpectralSpread Descriptor : the amplitude-weighted standard deviation
    of the harmonic peaks of the spectrum, normalized by the instantaneous
    HarmonicSpectralCentroid.
  – HarmonicSpectralVariation Descriptor : the normalized correlation between the
    amplitude of the harmonic peaks between two subsequent time-slices of the signal.
 Silence Segment : attaches the simple semantic of “silence” (i.e. no
 significant signal) to an Audio Segment. It may be used to aid further
 segmentation of the audio stream, or as a hint not to process a segment.
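The harmonic centroid and spread described above can be sketched for a single time slice, given the frequencies and amplitudes of the detected harmonic peaks (peak detection is not shown). The function names are assumptions, and the weighting follows the wording on the slide (amplitude-weighted); the exact weighting used by the standard may differ.

```python
import numpy as np

def harmonic_spectral_centroid(freqs, amps):
    """Amplitude-weighted mean frequency of the harmonic peaks."""
    freqs, amps = np.asarray(freqs, dtype=float), np.asarray(amps, dtype=float)
    return float(np.sum(freqs * amps) / np.sum(amps))

def harmonic_spectral_spread(freqs, amps):
    """Amplitude-weighted standard deviation of the peaks, divided by the centroid."""
    freqs, amps = np.asarray(freqs, dtype=float), np.asarray(amps, dtype=float)
    centroid = harmonic_spectral_centroid(freqs, amps)
    deviation = np.sqrt(np.sum(amps * (freqs - centroid) ** 2) / np.sum(amps))
    return float(deviation / centroid)

# Usage: two equally strong harmonics at 100 Hz and 300 Hz.
centroid = harmonic_spectral_centroid([100.0, 300.0], [1.0, 1.0])
spread = harmonic_spectral_spread([100.0, 300.0], [1.0, 1.0])
```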




Audio Descriptors
 High-level Audio Description Tools (Ds and DSs)
  – Audio Signature DS : a condensed representation of an audio signal designed to
   provide a unique content identifier for the purpose of robust automatic identification of audio
   signals. Applications include audio fingerprinting and identification of audio based on a
   database of known works.
  – Musical Instrument Timbre Description Tools
     HarmonicInstrumentTimbre Descriptor : Four harmonic timbral spectral
      Descriptors with the LogAttackTime Descriptor
     PercussiveInstrumentTimbre Descriptor : The timbral temporal Descriptors
      with a SpectralCentroid Descriptor
  – Melody Description Tools
     Include a rich representation for monophonic melodic information to
      facilitate efficient, robust, and expressive melodic similarity matching.
     MelodyContour DS: terse, efficient melody contour representation
     MelodySequence DS: a more verbose, complete, expressive melody
      representation


Audio Descriptors
 – General Sound Recognition and Indexing Description Tools
    A collection of tools for indexing and categorization of sound (effects) in
     general
    SoundModelStatePath Descriptor: states generated by a sound model
    SoundModelStateHistogram Descriptor: normalized histogram of the state
     sequence generated by a sound model
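Given a state path, the state histogram is a normalized count of how often each state occurs. The sketch below assumes states are numbered from 0 and that the path has already been produced by some sound model (for example a hidden Markov model decoder, which is not shown); the function name is hypothetical.

```python
import numpy as np

def state_histogram(state_path, n_states):
    """Fraction of time spent in each of `n_states` states along the path."""
    counts = np.bincount(np.asarray(state_path, dtype=int), minlength=n_states)
    return counts / counts.sum()

# Usage: a short path over a 4-state model.
histogram = state_histogram([0, 1, 1, 2, 1], n_states=4)
```

Two sounds can then be compared by the distance between their histograms, without aligning the state paths in time.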

 – Spoken Content Description Tools
    Consist of combined word and phone lattices for each speaker in an audio
     stream. Phone lattices are used to alleviate the out-of-vocabulary (OOV) problem.
    SpokenContentLattice Description Scheme : the actual decoding produced by
     an ASR (Automatic Speech Recognition) engine.
    SpokenContentHeader : information about the speakers being recognized and
     the recognizer itself.




References
 Book – Introduction to MPEG-7: Multimedia Content Description Interface
  B. S. Manjunath, Philippe Salembier, Thomas Sikora (Editors)
  ISBN: 0-471-48678-7
  http://www.wiley.com/WileyCDA/WileyTitle/productCd-0471486787.html

 MPEG-7
  http://www.chiariglione.org/mpeg/standards/mpeg-7/mpeg-7.htm

 MPEG-7 DDL Homepage
  http://archive.dstc.edu.au/mpeg7-ddl/

Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 

Mpeg7

  • 1. Introduction to MPEG-7 Guest lecture for ECE417 TSH Charlie Dagli [dagli@illinois.edu] April 7, 2009
  • 2. Contents This lecture : A general idea of MPEG – 7 MPEG-7 –Background –Introduction –Components of MPEG-7  Description Definition Language (DDL)  Multimedia Description Scheme (MDS)  Video Descriptors  Audio Descriptors –References 2
  • 3. Background
      – Search and retrieval of multimedia data
          – In recent years, the amount of audiovisual (AV) data becoming available has grown enormously.
          – Applications
              – Large-scale multimedia search engines on the Web
              – Media asset management systems in corporations
              – AV broadcast servers
              – Personal media servers…
          – Need: retrieval, search, and storage of AV data using higher-level concepts
          – A solution: efficient processing tools to create descriptions of AV material or to support the identification or retrieval of AV documents.
          – Alongside the research on processing tools, the need for interoperability between devices has been recognized, and standardization activities have been launched.
              – MPEG-7, the “Multimedia Content Description Interface”, standardizes the description of multimedia content in support of a wide range of applications.
              – MPEG stands for Moving Picture Experts Group (1988)
  • 4. Introduction : What is MPEG-7? “Multimedia Content Description Interface” –Intuition:  NOT focus so much on processing tools  Concentrate more on the selection of features that have to be described  Find a way to structure and instantiate the selected features with a common language –Efficient representation of audio-visual (AV) meta-data –Goal: allow interoperable searching, indexing, filtering and access of multimedia content by enabling interoperability among devices that deal with multimedia content description. 4
  • 5. MPEG-7 Main Elements  Descriptor (D) – standardized “audio only” and “visual only” descriptors. <ex> a time code for duration, color histograms for color.  Multimedia Description Scheme (MDS) – standardized description schemes for audio and visual descriptors. <ex> video: temporally structured scenes and shots, including textual descriptors at the scene level and color, motion, audio amplitude descriptors at the shot level.  Description Definition Language (DDL) – provides a standardized language to express description schemes, – based on XML (eXtensible Markup Language) – a language that allows the creation of new description schemes, and possibly, descriptors. Also allows the extension and modification of existing description schemes. 5
  • 6. What can MPEG-7 do?
      – The increasing availability of potentially interesting audiovisual material makes search more difficult.
      – The aim is a search system in which any type of AV material can be retrieved by means of any type of query material, such as video, music, or speech.
      – Some query examples
          – Music: play a few notes on a keyboard and get in return a list of musical pieces containing the required tune, or images somehow matching the notes.
          – Image: define objects, including color patches or textures, and get in return examples from which you select the interesting objects to compose your image.
          – Voice: using an excerpt of Pavarotti’s voice, get a list of Pavarotti’s records, video clips where Pavarotti is singing, or video clips where Pavarotti is present.
          – Sports video analysis: can be solved much more easily and with better results.
  • 7. Application Areas  Application domains listed in the MPEG-7 Applications document: – Education – Journalism (e.g. searching speeches of person using his name, his voice or his face) – Tourist information – Cultural services (museum, art gallery, digital library) – Entertainment (searching a game, karaoke) – Investigation services (human characteristics recognition) – Geographical information systems – Remote sensing – Surveillance (traffic control, surface transportation) – Shopping – Architecture, real estate, and interior design – Social (Dating Service) – Film, Video and Radio archives. …….. – Audiovisual content production 7
  • 8. MPEG-7 vs. previous MPEG activities
      – MPEG-1, 2, and 4 are designed to represent the information itself, while MPEG-7 is meant to represent information about the information.
      – MPEG-1, 2, and 4 made content available; MPEG-7 allows you to find the content you need.
      – MPEG-7 can also be used independently of the other MPEG standards: the description might even be attached to an analog movie.
  • 9. MPEG-7 Parts
      – ISO/IEC 15938-1 (Systems): the binary format for encoding MPEG-7 descriptions and the terminal architecture.
      – ISO/IEC 15938-2 (Description Definition Language): the language for defining the syntax of the MPEG-7 Description Tools and for defining new Description Schemes.
      – ISO/IEC 15938-3 (Visual): the Description Tools dealing with visual descriptions.
      – ISO/IEC 15938-4 (Audio): the Description Tools dealing with audio descriptions.
      – ISO/IEC 15938-5 (Multimedia Description Schemes): the Description Tools dealing with generic features and multimedia descriptions.
      – ISO/IEC 15938-6 (Reference Software): a software implementation of relevant parts of the MPEG-7 Standard with normative status.
      – ISO/IEC 15938-7 (Conformance Testing): guidelines and procedures for testing conformance of MPEG-7 implementations.
      – ISO/IEC TR 15938-8 (Extraction and use of descriptions): informative material (in the form of a Technical Report) about the extraction and use of some of the Description Tools.
  • 10. Next… Description Definition Language (DDL) 10
  • 11. Description Definition Language (DDL)
      – The foundation of the MPEG-7 standard: it provides the language for defining the structure and content of multimedia information.
      – A schema language that represents the results of modeling audiovisual data (i.e. descriptors and description schemes) as a set of syntactic, structural, and value constraints to which valid MPEG-7 descriptors, description schemes, and descriptions must conform.
      – Also provides the rules by which users can combine, extend, and refine existing description schemes and descriptors.
      – XML example:
          <PersonName>
            <Title>Prof.</Title>
            <Firstname>Thomas</Firstname>
            <Lastname>Huang</Lastname>
            <Nickname>Tom</Nickname>
          </PersonName>
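As a rough illustration of how an application might read such a description, the sketch below parses the slide's PersonName fragment with Python's standard xml.etree.ElementTree module. The element names are the ones shown on the slide; a real MPEG-7 description would use the schema's own element names and namespaces, which this sketch does not attempt to model. The helper name parse_person_name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Fragment copied from the slide (illustrative element names, no namespace).
PERSON_XML = """
<PersonName>
  <Title> Prof. </Title>
  <Firstname>Thomas </Firstname>
  <Lastname>Huang</Lastname>
  <Nickname>Tom</Nickname>
</PersonName>
"""

def parse_person_name(xml_text):
    """Return a dict mapping each child tag to its whitespace-stripped text."""
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "").strip() for child in root}

person = parse_person_name(PERSON_XML)
```

Because the description is ordinary XML, any XML toolchain can read it; what the DDL adds is the schema that says which elements and values are valid.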
  • 13. Multimedia Description Schemes (MDS)  An overview of the organization of MPEG-7 MDS : Organized in 6 Areas, Basic Elements, Content Descriptions, Content Organization, Content management, Navigation and Access, and User Interaction 13
  • 14. MDS: Basic Elements Basic Elements – fundamental constructs of the definition of MPEG-7 description schemes –Schema Tools :  facilitate the creation of valid MPEG-7 descriptions and packing.. –Basic Data types :  Integer & Real – represent constrained integer and real value  Vectors & Matrix – represent arbitrary sized vectors and matrices of integer or real values  Probability Vectors & Matrices – represent probability distribution described using vectors/matrices  String – represents codes identifying content type, countries, regions, currencies, and character sets –Linking, Identification and Localization Tools :  tools for referencing MPEG-7 descriptions, for linking descriptions to multimedia content and for describing time in multimedia content 14
  • 15. MDS: Basic Elements –Example: Three kinds of media time representation: t1 t2 Duration TimeBase RelTimePoint  A) Simple time: Specify a time point and a duration  B) Relative time: Specify a media time point relative to a time base, and a duration  C) Incremental time: Specification of time using a predefined interval called Time Unit and counting the number of intervals (efficient for periodic signals) 15
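The three time representations can be sketched as small helper functions that each return a (start, end) pair in seconds. This is only an illustration of the idea; the function names and the 25 frames-per-second time unit are assumptions, not MPEG-7 syntax.

```python
from fractions import Fraction

def simple_time(start_s, duration_s):
    """A) Simple time: an absolute time point plus a duration."""
    return start_s, start_s + duration_s

def relative_time(time_base_s, rel_point_s, duration_s):
    """B) Relative time: the time point is an offset from a time base."""
    start = time_base_s + rel_point_s
    return start, start + duration_s

def incremental_time(time_unit_s, n_start, n_duration):
    """C) Incremental time: count whole intervals of a fixed time unit."""
    start = n_start * time_unit_s
    return start, start + n_duration * time_unit_s

# Hypothetical time unit: one frame at 25 frames per second.
frame = Fraction(1, 25)
segment = incremental_time(frame, 50, 75)  # starts at frame 50, lasts 75 frames
```

Counting intervals with exact fractions avoids rounding drift, which is one reason the incremental form suits periodic signals such as video frames.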
  • 16. MDS: Basic Elements
      – Basic Description Tools: a library of description schemes and data types used as primitive components for building the more complex, functionality-specific description tools found in the rest of MPEG-7.
          – Graph and relation tools: weave together complex multimedia description structures.
              Ex. (the accompanying figure shows nodes A to E linked by relations r1 to r4):
              <Graph>
                <Node id="A"/> <Node id="B"/> <Node id="C"/> <Node id="D"/> <Node id="E"/>
                <Relation type="#r1" source="#A" target="#B"/>
                <Relation type="#r2" source="#A" target="#C"/>
                …………..
              </Graph>
          – Textual annotations: represent textual descriptions
              – Free text annotation: Spain scores a goal against Sweden.
              – Keyword annotation: score, Sweden, Spain
          – Classification schemes and terms: define and reference vocabularies for multimedia content descriptors.
              – Ex. Part of a ClassificationScheme for sports: sports, with the terms soccer, basketball, baseball, tennis
  • 17. MDS: Basic Elements
      – People and locations: represent people and places related to multimedia content
          – Agent: persons, organizations, groups of persons, …
              Ex.
              <PersonGroup>
                <Name>Spanish National Soccer Team</Name>
                <Kind><Name>Soccer Team</Name></Kind>
                <Member> <Name>Fernando</Name> </Member>
                <Member> ….
              </PersonGroup>
          – Places: existing, historical, and fictional places.
      – Affective description: describes the emotional response to multimedia content
          – Ex. Recording an audience’s excitement while watching an action movie
      – Ordering tools:
          – Provide a hint for ordering descriptions for presentation, based on information contained in those descriptions
          – Ex. Order a set of video segments in a soccer game by the amount of camera zoom within each segment.
  • 18. Content management and content description 18
  • 19. MDS: Content Management
      – Content management: the description of the life cycle of the content, from creation to consumption
          – Creation and Production Description
              – Includes title, textual annotation, creators, creation locations, dates, how the data is classified, review and guidance information, and related multimedia material.
          – Usage Description
              – Describes information related to the usage rights, usage record, and financial information.
              – Rights information is not explicitly included in the description; instead, links are provided to the rights holders or rights management.
              – The usage record description provides information related to the use of the content, such as broadcasting or on-demand delivery.
              – Financial information covers the cost of production and the income resulting from content use.
              – The usage description is dynamic and subject to change during the lifetime of the multimedia content.
          – Media Description
              – Describes the storage media, in particular the compression, coding, and storage format of the multimedia content. It describes the master media, the original source from which different instances of the multimedia content are produced.
  • 20. Content management and content description 20
  • 21. MDS: Structural Content Description
      – Content Description: structural and conceptual aspects
          – Structure Description: describes the structure of multimedia content, built around the notion of the Segment Description Scheme, which represents a spatial, temporal, or spatiotemporal portion of the multimedia content
              – Segment DSs (the core element)
              – Example: Mosaic DS, a panoramic view of a video segment constructed by aligning and warping the frames of a Video Segment upon each other
  • 22. MDS: Structural Content Description  Specific features for structural data description Feature Video Still Moving Audio Segment Region Region Segment Time X X X Shape X X Color X X X Texture X Motion X X Camera X motion Audio X X X features 22
  • 23. MDS: Structural Content Description  Examples of Image description with Still Regions 23
  • 24. MDS: Conceptual Content Description
      – Conceptual aspects: describe the multimedia content from the viewpoint of real-world semantics and conceptual notions.
          – Involve entities such as objects, events, abstract concepts, and relationships.
          – Segment description schemes and semantic description schemes are related by a set of links that allows the multimedia content to be described on the basis of both content structure and semantics together.
  • 25. MDS: Conceptual Content Description  Example of video segments and Regions Corresponding SegmentRelationship Graph 25
  • 27. MDS: Navigation and Access  Facilitating browsing and retrieval by defining summaries, views, and variations of the multimedia content.  Summaries: provide compact highlights of the multimedia content to enable discovering, browsing, navigation, and visualization of multimedia content. – Hierarchical navigation mode – Sequential navigation mode 27
  • 28. MDS: Navigation and Access  View: based on partitions and decompositions, which describes different decompositions of the multimedia signals in space, time, and frequency. The partitions and decompositions can be used as different views of the multimedia content  important for multi-resolution access and progressive retrieval.  Variations: provides different variations of multimedia programs, such as summaries and abstract, scaled, compressed and low-resolution versions and versions with different languages and modalities – audio, video, image, text, and so forth  allow the selection of the most suitable variation of a multimedia program 28
  • 30. MDS: Content Organization  Content Organization – tools describe collections and models – Collection: unordered sets of multimedia content, segments, descriptor instances, concepts or mixed sets of the above (Example of collections of AV content including the relationships (i.e. RAB,RBC,RAC) within and across Collection Clusters) Collection structure Content collection Segment collection Descriptor collection Collection (abstract) Concept collection Mixed collection 30
  • 31. MDS: Content Organization
      – Model tools: parameterized representations of an instance or class of multimedia content, descriptors, or collections, as follows:
          – Probability model: associates statistics or probabilities with the attributes of multimedia content, descriptors, or collections
          – Analytic model: associates labels or semantics with multimedia content or collections
          – Cluster model: associates labels or semantics and statistics or probabilities with multimedia content collections
          – Classification model: describes information about known collections of multimedia content in terms of labels, semantics, and models that can be used to classify unknown multimedia content
      (Figure: model tool hierarchy. The abstract Model specializes into Probability Model, Analytic Model, Cluster Model, and Classification Model, each with further subtypes.)
  • 32. MDS: Content Organization – Clusters of positive and negative examples of images are described using Cluster Model tool. – Soccer video sequence modeled using State Transition Model tool. 32
  • 34. MDS: User Interaction  User interaction describes user preferences and usage history  Allow matching between user preferences and MPEG-7 content description  facilitate personalization of multimedia content access, presentation, and consumption. 34
  • 36. Introduction : What is MPEG-7? “Multimedia Content Description Interface” –Intuition:  NOT focus so much on processing tools  Concentrate more on the selection of features that have to be described  Find a way to structure and instantiate the selected features with a common language –Provide a way to get information about the audiovisual (AV) data without the need of performing the actual decoding of these data. –Goal: allow interoperable searching, indexing, filtering and access of multimedia content by enabling interoperability among devices that deal with multimedia content description. 36
  • 37. MPEG-7 Main Elements  Descriptor (D) – provides standardized “audio only” and “visual only” descriptors. <ex> a time code for duration, color histograms for color.  Multimedia Description Scheme (MDS) – provides standardized description schemes involving both audio and visual descriptors. <ex> a movie, temporally structured as scenes and shots, including textual descriptors at the scene level and color, motion, audio amplitude descriptors at the shot level.  Description Definition Language (DDL) – provides a standardized language to express description schemes, – based on XML (eXtensible Markup Language) – a language that allows the creation of new description schemes, and possibly, descriptors. Also allows the extension and modification of existing description schemes.  Coding Schemes – compressing MPEG-7 textual XML descriptions into Binary format (BiM) to satisfy application requirements for compression efficiency, error resilience, ...  SYSTEM: 37
  • 38. Visual Descriptors  Cover 6 basic visual features as –Color –Texture –Shape –Motion –Localization –Face Recognition 38
  • 39. Color Descriptors
      – Color Space: defines the color components as continuous-value entities
          – R, G, B
          – Y, Cr, Cb
              – Y = 0.299R + 0.587G + 0.114B
              – Cb = -0.169R - 0.331G + 0.500B
              – Cr = 0.500R - 0.419G - 0.081B
          – H, S, V (Hue, Saturation, Value)
              – A nonlinear transform of RGB
              – Quantized into 16, 32, 64, 128, or 256 bins for the scalable color descriptor and the frames histogram descriptor
          – HMMD (Hue, Max, Min, Diff, Sum)
              – Max = max(R, G, B) (labeled "blackness" in the figure)
              – Min = min(R, G, B) (labeled "whiteness" in the figure)
              – Diff = Max - Min
              – Sum = (Max + Min) / 2
          – Linear transformation matrix with reference to R, G, B
              – Any 3 × 3 color transform matrix that specifies the linear transformation between RGB and the respective color space.
          – Monochrome: the Y component alone in YCrCb is used
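The color space formulas on this slide translate directly into code. The sketch below applies the slide's YCbCr coefficients and the HMMD definitions to RGB values in [0, 1]. One assumption is made that the slide does not state: the HMMD hue is taken to be the same hue as in HSV, computed here with the standard-library colorsys module and expressed in degrees.

```python
import colorsys

def rgb_to_ycbcr(r, g, b):
    """Linear RGB -> (Y, Cb, Cr) using the coefficients from the slide."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.169 * r - 0.331 * g + 0.500 * b
    cr = 0.500 * r - 0.419 * g - 0.081 * b
    return y, cb, cr

def rgb_to_hmmd(r, g, b):
    """RGB -> (Hue, Max, Min, Diff, Sum) as defined on the slide.

    Assumption: hue is the HSV hue, returned in degrees [0, 360).
    """
    mx, mn = max(r, g, b), min(r, g, b)
    hue = colorsys.rgb_to_hsv(r, g, b)[0] * 360.0
    return hue, mx, mn, mx - mn, (mx + mn) / 2

white_ycbcr = rgb_to_ycbcr(1.0, 1.0, 1.0)   # chroma components vanish for gray
orange_hmmd = rgb_to_hmmd(1.0, 0.5, 0.0)
```

For any gray input the two chroma components are zero, because each chroma row of coefficients sums to zero, which is why the Y component alone serves as the monochrome space.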
  • 40. Color Descriptors
      – Color Quantization Descriptor: specifies the partitioning of the given color space into discrete bins.
      – Dominant Color Descriptor (DCD): allows specification of a small number of dominant color values as well as their statistical properties, such as distribution and variance. It provides an effective and compact representation of the colors present in a region or an image.
          – The DCD is defined as F = {(c_i, p_i, v_i), s}, i = 1, 2, …, N, where N is the number of dominant colors
          – c_i: the i-th dominant color value, a vector of the corresponding color space component values
          – p_i: the fraction of pixels in the image corresponding to c_i
          – v_i: the variation of the color values of the pixels in a cluster around the corresponding representative color
          – s: the spatial coherency, which represents the overall spatial homogeneity
      (Figure: examples of low and high spatial coherency of color)
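To make the (c_i, p_i, v_i) triples concrete, the sketch below computes them from pixels that have already been assigned to color clusters. How the clusters are found is outside this sketch, and the spatial coherency s is omitted; the function name and the input layout (a list of color tuples plus a parallel list of cluster labels) are assumptions made for illustration.

```python
def dominant_color_triples(pixels, labels):
    """Return [(c_i, p_i, v_i), ...] for pre-clustered pixels.

    pixels: list of color tuples, e.g. (R, G, B)
    labels: cluster label for each pixel (same length as pixels)
    """
    clusters = {}
    for pixel, label in zip(pixels, labels):
        clusters.setdefault(label, []).append(pixel)

    total = len(pixels)
    triples = []
    for label in sorted(clusters):
        members = clusters[label]
        count = len(members)
        channels = list(zip(*members))  # one tuple of values per color component
        # c_i: mean color of the cluster
        mean = tuple(sum(ch) / count for ch in channels)
        # v_i: per-component variance around the mean color
        var = tuple(sum((x - m) ** 2 for x in ch) / count
                    for ch, m in zip(channels, mean))
        # p_i: fraction of all pixels that fall in this cluster
        triples.append((mean, count / total, var))
    return triples

dcd = dominant_color_triples([(0, 0, 0), (2, 2, 2), (10, 10, 10)], [0, 0, 1])
```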
  • 41. Color Descriptors
      – Scalable Color Descriptor: a Haar transform-based encoding scheme applied across the values of a color histogram in the HSV color space.
          – Useful for image-to-image matching and retrieval based on color features. Its binary representation is scalable in terms of the number of bins and the bit representation accuracy over a broad range of data rates.
      – Group-of-Frames or Group-of-Pictures Descriptor:
          – For joint representation of color-based features for multiple images or multiple frames in a video segment.
          – Traditionally, for a group of frames or pictures, a key frame or image is selected and the color-related features of the entire collection are represented by the chosen sample, which is unreliable.
          – GoF and GoP instead use histogram-based descriptors that reliably capture the color content of multiple images or video frames.
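The scalability of the Scalable Color Descriptor comes from the Haar transform: coarse coefficients come first, so a truncated descriptor is still a valid, lower-resolution summary. The sketch below shows a plain, unnormalized 1-D Haar transform (repeated pairwise sums and differences) over a histogram. It illustrates the principle only; the quantization and coefficient ordering of the actual descriptor are not reproduced here.

```python
def haar_transform(hist):
    """Unnormalized 1-D Haar transform of a histogram.

    Returns [total, coarsest difference, ..., finest differences].
    The histogram length must be a power of two.
    """
    n = len(hist)
    if n == 0 or n & (n - 1):
        raise ValueError("histogram length must be a power of two")
    sums = list(hist)
    details = []
    while len(sums) > 1:
        pairs = [(sums[i], sums[i + 1]) for i in range(0, len(sums), 2)]
        # Differences at this level go in front of the finer ones collected so far.
        details = [a - b for a, b in pairs] + details
        sums = [a + b for a, b in pairs]
    return sums + details

coeffs = haar_transform([4, 2, 5, 5])
```

Keeping only the first k coefficients is equivalent to describing the same histogram with fewer, wider bins.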
  • 42. Color Descriptors
      – Color Layout Descriptor (CLD): represents the spatial distribution of representative colors on a grid superimposed on a region or image. The representation is based on coefficients of the Discrete Cosine Transform. It is a very compact descriptor and is highly efficient in fast browsing and search applications.
      – Color Structure Descriptor (CSD): based on a color histogram, but aims at identifying localized color distributions using a small structuring window. To guarantee interoperability, the CSD is bound to the HMMD color space.
          – The CSD captures the degree to which pixels of a color are clumped together relative to the scale of an associated structuring element.
      (Figure: examples of structured and unstructured color)
  • 43. Texture Descriptors  Homogeneous Texture Descriptor (HTD): – provides a quantitative representation using 62 numbers, consisting of the mean energy and energy deviation from a set of frequency channel – Useful for similarity retrieval – Effective in characterizing homogeneous texture regions  Texture Browsing Descriptor (TBD): – Defined for coarse level texture browsing – Provides a perceptual characterization of texture, similar to human characterization, in terms of regularity, coarseness and directionality of the texture pattern.  Edge Histogram Descriptor (EHD): – Capture spatial distribution of edges in an image – Useful in matching regions with partially varying, non-uniform texture. 43
  • 44. Homogeneous Texture Descriptor
      – Homogeneous Texture Descriptor (HTD): characterizes the region texture using the mean energy and the energy deviation from a set of frequency channels. The 2D frequency plane is partitioned into 30 channels. (Figure: frequency layout for feature extraction)
      – The syntax of the HTD is: HTD = [f_DC, f_SD, e_1, e_2, …, e_30, d_1, d_2, …, d_30], where f_DC and f_SD are the mean and standard deviation of the input image, and e_i and d_i are the nonlinearly scaled and quantized mean energy and energy deviation of the i-th channel.
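The 62-number layout can be sketched as a simple vector builder. The sketch also includes a channel-numbering helper that assumes the 30 channels form a grid of 5 radial bands by 6 angular sectors, numbered i = 6s + r + 1; that layout is an assumption about the frequency partition in the figure, not something the slide text spells out. Both function names are hypothetical.

```python
N_CHANNELS = 30

def build_htd(f_dc, f_sd, energies, deviations):
    """Assemble HTD = [f_DC, f_SD, e_1..e_30, d_1..d_30] (62 values)."""
    if len(energies) != N_CHANNELS or len(deviations) != N_CHANNELS:
        raise ValueError("expected 30 energies and 30 deviations")
    return [f_dc, f_sd, *energies, *deviations]

def channel_index(radial, angular):
    """1-based channel number, ASSUMING 5 radial bands x 6 angular sectors."""
    if not (0 <= radial < 5 and 0 <= angular < 6):
        raise ValueError("radial must be 0..4 and angular 0..5")
    return 6 * radial + angular + 1

htd = build_htd(0.5, 0.1, list(range(30)), list(range(30, 60)))
```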
  • 45. Texture Browsing Descriptor
      – Texture Browsing: perceptual characterization of a texture, similar to a human characterization, in terms of regularity, coarseness, and directionality
      – TBD = [v1, v2, v3, v4, v5]
          – v1 ∈ {1, 2, 3, 4} or {00, 01, 10, 11}: represents the regularity
          – v2, v3 ∈ {1, 2, 3, 4, 5, 6}: capture the directionality of the texture
          – v4, v5 ∈ {1, 2, 3, 4}: capture the coarseness of the texture
      – Semantics of regularity:
          Regularity | Semantics
          00 | irregular
          01 | slightly regular
          10 | regular
          11 | highly regular
      (Figure: examples of regularity, with sample textures labeled 11, 01, 00, and 10)
  • 46. Edge Histogram Descriptor
      – Edge Histogram: represents the local edge distribution in the image
          – Five types of edges: 5 histogram bins per sub-image
          BinCounts[k] | Semantics
          BinCounts[0] | Vertical edges in sub-image (0,0)
          BinCounts[1] | Horizontal edges in sub-image (0,0)
          BinCounts[2] | 45 degree edges in sub-image (0,0)
          BinCounts[3] | 135 degree edges in sub-image (0,0)
          BinCounts[4] | Non-directional edges in sub-image (0,0)
          BinCounts[5] | Vertical edges in sub-image (0,1)
          …
          BinCounts[74] | Non-directional edges in sub-image (3,2)
          BinCounts[75] | Vertical edges in sub-image (3,3)
          BinCounts[76] | Horizontal edges in sub-image (3,3)
          BinCounts[77] | 45 degree edges in sub-image (3,3)
          BinCounts[78] | 135 degree edges in sub-image (3,3)
          BinCounts[79] | Non-directional edges in sub-image (3,3)
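The table implies a regular indexing rule: 4 × 4 sub-images in row-major order, with 5 edge types per sub-image, giving 80 bins. A minimal sketch of that mapping follows; the function names are hypothetical, and the rule is inferred from the rows shown in the table.

```python
EDGE_TYPES = ("vertical", "horizontal", "45-degree",
              "135-degree", "non-directional")
GRID = 4  # the image is split into 4 x 4 sub-images

def bin_index(row, col, edge_type):
    """Map sub-image (row, col) and an edge type to its BinCounts index."""
    return (row * GRID + col) * len(EDGE_TYPES) + EDGE_TYPES.index(edge_type)

def describe_bin(k):
    """Inverse mapping: BinCounts index -> (row, col, edge type)."""
    sub_image, edge = divmod(k, len(EDGE_TYPES))
    row, col = divmod(sub_image, GRID)
    return row, col, EDGE_TYPES[edge]

last_bin = bin_index(3, 3, "non-directional")
```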
  • 47. Shape Descriptors  Shape Descriptors – Region-based Shape Descriptor  Expresses pixel distribution within a 2-D object or region.  Based on both boundary and internal pixels and can describe complex objects consisting of multiple disconnected regions as well as simple objects with or without holes. – Contour-based Shape Descriptor  Based on CSS representation of the contour – 3-D Spectrum Descriptor  Expresses characteristic features of objects represented as discrete polygonal 3-D meshes.  Based on the histogram of local geometrical properties of the 3-Dsurfaces of the object. 47
  • 48. Shape Descriptors
      – The region-based shape descriptor uses a set of ART (Angular Radial Transform) coefficients. Twelve angular and three radial functions are used (n < 3, m < 12).
          – F_nm is the ART coefficient of order n and m, V is the ART basis function, and f is the image function.
          – The ART basis function V is separable along the angular and radial directions. (Figure: real part of the ART basis functions)
          – The ART coefficients are divided by the magnitude of the ART coefficient of order n = 0, m = 0, which is not used as a descriptor element.
          – Quantization is applied using 4 bits per coefficient to minimize the size of the descriptor.
  • 49. Shape Descriptors
      – Contour-based Shape Descriptor: describes a closed contour of a 2D object or region in an image or video sequence. It is based on the Curvature Scale Space (CSS) representation of the contour. (Figure: a 2D visual object (region) and its corresponding shape)
          Field | No. of bits | Meaning
          No. of peaks | 6 | No. of peaks in CSS image
          GlobalCurvature | 2×6 | Circularity and eccentricity of the contour
          PrototypeCurvature | 2×6 | Circularity and eccentricity of the smoothed contour
          HighestPeakY | 7 | Absolute height of the highest peak (quantized)
          PeakX[] | 6 | X-position on the contour of a peak (quantized)
          PeakY[] | 3 | Height of the peak (quantized)
      (Figure: CSS image formation, showing the smoothing evolution of zero-crossings)
  • 50. Shape Descriptors
      – The Contour-based Shape Descriptor has the following properties:
          – It can distinguish between shapes that have similar region-shape properties but different contour-shape properties.
          – It supports search for shapes that are semantically similar for humans.
          – It is robust to significant non-rigid deformations.
          – It is robust to distortions in the contour due to perspective transformations, which are common in images and video.
          – It is robust to noise present on the contour.
          – It is very compact (14 bytes per contour on average).
          – The descriptor is easy to implement and offers fast extraction and matching.
  • 51. Shape Descriptors (3-Dimensional Class)
      – 3-D Shape Spectrum Descriptor: specifies an intrinsic shape description for 3D mesh models. It exploits some local attributes of the 3D surface.
          – The shape index, introduced by Koenderink, is defined as a function of the two principal curvatures at a point p on the 3D surface S.
          – By definition, the shape index value lies in the interval [0, 1].
          – The shape spectrum of the 3D mesh (3D-SSD) is the histogram of the shape indices (I_p's) calculated over the entire mesh.
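The formula for the shape index did not survive in the transcript. The sketch below uses the form usually quoted for this descriptor, I_p = 1/2 - (1/π) arctan((k1 + k2) / (k1 - k2)) with principal curvatures k1 ≥ k2, which is consistent with the stated [0, 1] range; treat it as an assumption rather than the slide's exact equation. The histogram helper, its name, and the default bin count are likewise illustrative choices.

```python
import math

def shape_index(k1, k2):
    """Shape index from two principal curvatures (assumed formula, see lead-in)."""
    if k1 < k2:
        k1, k2 = k2, k1          # enforce k1 >= k2
    if k1 == k2:
        raise ValueError("shape index is undefined when k1 == k2")
    return 0.5 - math.atan((k1 + k2) / (k1 - k2)) / math.pi

def shape_spectrum(indices, n_bins=100):
    """Normalized histogram of shape index values over [0, 1]."""
    hist = [0] * n_bins
    for value in indices:
        # Clamp so that a value of exactly 1.0 falls into the last bin.
        hist[min(int(value * n_bins), n_bins - 1)] += 1
    return [count / len(indices) for count in hist]

saddle = shape_index(1.0, -1.0)   # symmetric saddle point
spectrum = shape_spectrum([0.25, 0.5, 0.5, 0.99], n_bins=4)
```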
  • 52. Motion Descriptors  Camera Motion Descriptor  Motion Trajectory Descriptor  Parametric Motion Descriptor  Motion Activity Descriptor Moving region Video segment Camera motion Mosaic Motion trajectory Motion activity Warping Parametric motion parameters 52
  • 53. Motion Descriptors
      – Camera motions: pan, track, tilt, boom, zoom, dolly, roll, and absence of motion
      (Figure: perspective projection and camera motion parameters)
  • 54. Motion Descriptors
      – Motion Trajectory: describes the displacement of objects over time. It is a high-level feature associated with a moving region, defined as the spatiotemporal localization of one of its representative points (such as its center), given as a list of key points (x, y, z, t).
      – Parametric Motion: describes the motion of objects in video sequences as a 2D parametric model.
          – Affine models (6 parameters): translations, rotations, scaling, and combinations of these.
          – Planar perspective models (8 parameters): global deformations with perspective projections.
          – Quadratic models (12 parameters): describe more complex movements.
      – Motion Activity: the intuitive notion of ‘intensity of action’ or ‘pace of action’ in a video segment.
          – Example of high “activity”: goal scoring in a soccer match.
          – Can be used in diverse applications such as content repurposing, video summarization, surveillance, and content-based querying.
          – Four attributes:
              – Intensity of activity: indicates high or low activity by an integer in the range 1 to 5
              – Direction of activity: expresses the dominant direction of the activity, if any
              – Spatial distribution of activity: the number and size of active regions in a frame
              – Temporal distribution of activity: expresses the variation of activity over the duration
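As an illustration of the 6-parameter affine model, the sketch below evaluates a motion vector at a pixel position using the common parameterization v_x = a1 + a2·x + a3·y and v_y = a4 + a5·x + a6·y. The parameter ordering is an assumption; the slide gives only the parameter count.

```python
def affine_velocity(params, x, y):
    """Motion vector (v_x, v_y) at position (x, y) for a 6-parameter affine model.

    params = (a1, a2, a3, a4, a5, a6), ASSUMED ordering:
        v_x = a1 + a2 * x + a3 * y
        v_y = a4 + a5 * x + a6 * y
    """
    a1, a2, a3, a4, a5, a6 = params
    return a1 + a2 * x + a3 * y, a4 + a5 * x + a6 * y

# Pure translation: every pixel moves by the same vector.
translation = affine_velocity((1, 0, 0, 2, 0, 0), 5, 7)

# Uniform zoom about the origin: velocity grows with distance from it.
zoom = affine_velocity((0, 0.1, 0, 0, 0, 0.1), 10, 20)
```

The perspective and quadratic models extend the same idea with 8 and 12 parameters respectively, at the cost of a harder estimation problem.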
  • 55. Localization Descriptors  Localization Descriptors – Region Locator : Localization of regions within images or frames by specifying them with a brief and scalable representation of a Box or a Polygon. Procedure consists of the following 2 steps  Extraction of vertices of the region to be localized  Localization of the region within the image or frame (localization using a polygonal and Box element of the RegionLocator) – Spatio Temporal Locator: describes spatial-temporal regions in a video sequence, such as moving object regions, and provides localization functionality. 55
  • 56. Face Recognition Descriptor
      FaceRecognition Descriptor: used to retrieve face images that match a query face image
        – Face Recognition: the projection of a face vector onto a set of 48 basis eigenvectors U ("eigenfaces") which span the space of possible face vectors
        – Feature Extraction: the FaceRecognition feature set is extracted from a normalized face image. This normalized face image contains 56 lines with 46 intensity values in each line. The centers of the two eyes in each face image are located on the 24th row, at the 16th and 31st columns for the right and left eye respectively. The features are given by the vector W, computed from the face vector and the mean face vector. The features are normalized and clipped using Z = 2048.
      (Formula: normalization and clipping of the features with Z = 2048)
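The eigenface projection described on this slide can be sketched as follows. This is a toy illustration, not the descriptor's normative extraction: it assumes the usual eigenface formulation in which the mean face is subtracted before projecting onto each basis vector. The slide's exact normalization and clipping formula is not available in this transcript, so the clip limits `lo` and `hi` below are placeholder parameters, and all function names are hypothetical.

```python
from typing import List, Sequence


def project_face(face: Sequence[float],
                 mean_face: Sequence[float],
                 basis: Sequence[Sequence[float]]) -> List[float]:
    """Project a mean-subtracted face vector onto each basis vector.

    For the descriptor on the slide, `face` would hold 56 * 46 = 2576
    intensities and `basis` would hold 48 eigenvectors of that length.
    """
    if len(face) != len(mean_face):
        raise ValueError("face and mean face must have the same length")
    centered = [g - m for g, m in zip(face, mean_face)]
    return [sum(u_k * c_k for u_k, c_k in zip(u, centered)) for u in basis]


def normalize_and_clip(weights: Sequence[float],
                       z: float = 2048.0,
                       lo: float = -1.0,
                       hi: float = 1.0) -> List[float]:
    """Divide by Z, then clip to [lo, hi].

    Z = 2048 comes from the slide; lo and hi are ASSUMED placeholders.
    """
    return [max(lo, min(hi, w / z)) for w in weights]


# Toy example with a 4-pixel "face" and 2 basis vectors.
face = [3.0, 1.0, 0.0, 2.0]
mean_face = [1.0, 1.0, 1.0, 1.0]
basis = [[1.0, 0.0, 0.0, 0.0],
         [0.5, 0.5, 0.5, 0.5]]
w = project_face(face, mean_face, basis)
print(w)
print(normalize_and_clip([4096.0, -1024.0]))
```

Matching a query face then reduces to comparing short feature vectors (48 values each) instead of full images.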
  • 57. Face Descriptor
        – Automatic Face Image Localization
          (Figure: block diagram of the automatic face image localization algorithm)
            Color Segmentation
            (Figure: a color segmentation example: a) the skin color region in the Cb-Cr plane, b) original image, c) results of the color segmentation algorithm)
  • 58. Audio Descriptors
      Overview of the Audio Framework, including Descriptors
  • 59. Audio Descriptors
      Basic Descriptors: temporally sampled scalar values for general use, applicable to all kinds of signals
        – AudioWaveform Descriptor: the audio waveform envelope (minimum and maximum), typically for display purposes
        – AudioPower Descriptor: the temporally smoothed instantaneous power, which is useful as a quick summary of a signal and in conjunction with the power spectrum
      Basic Spectral Descriptors: all derived from a single time-frequency analysis of an audio signal
        – AudioSpectrumEnvelope Descriptor: a logarithmic-frequency spectrum, with bands spaced by a power-of-two divisor or multiple of an octave
        – AudioSpectrumCentroid Descriptor: the center of gravity of the log-frequency power spectrum, which describes the shape of the power spectrum
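A minimal sketch of the two Basic Descriptors above, under simplifying assumptions: the signal is split into non-overlapping frames of `hop` samples, the waveform envelope is the per-frame minimum and maximum, and the mean square per frame stands in for "temporally smoothed instantaneous power". The frame scheme and function names are illustrative choices, not taken from the slide.

```python
from typing import List, Sequence, Tuple


def split_frames(samples: Sequence[float], hop: int) -> List[Sequence[float]]:
    """Cut the signal into consecutive non-overlapping frames of `hop` samples."""
    if hop <= 0:
        raise ValueError("hop must be positive")
    return [samples[i:i + hop] for i in range(0, len(samples), hop)]


def waveform_envelope(samples: Sequence[float], hop: int) -> List[Tuple[float, float]]:
    """(minimum, maximum) sample value for each frame."""
    return [(min(f), max(f)) for f in split_frames(samples, hop)]


def frame_power(samples: Sequence[float], hop: int) -> List[float]:
    """Mean of the squared samples for each frame."""
    return [sum(s * s for s in f) / len(f) for f in split_frames(samples, hop)]


signal = [0.0, 0.5, -0.5, 1.0, -1.0, 0.0]
print(waveform_envelope(signal, hop=3))
print(frame_power(signal, hop=3))
```

Both outputs are short series of scalars (or scalar pairs), which is why they are cheap to store and suited to display and quick summaries.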
  • 60. Audio Descriptors
        – AudioSpectrumSpread Descriptor: complements the AudioSpectrumCentroid Descriptor by describing the second moment of the log-frequency power spectrum. This may help distinguish between pure-tone and noise-like sounds
        – AudioSpectrumFlatness Descriptor: the flatness properties of the spectrum of an audio signal for each of a number of frequency bands. When it indicates a high deviation from a flat spectral shape for a given band, this may signal the presence of tonal components
      (Figure: example of an AudioSpectrumEnvelope description of a pop song, visualized as a spectrogram. The required data storage is NM values, where N is the number of spectrum bins and M is the number of time points.)
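The centroid and spread of a log-frequency power spectrum can be sketched as the first moment and the square root of the second central moment of the power distribution over log2 frequency. The 1 kHz reference frequency and the function names below are assumptions made for illustration; the slides do not state them.

```python
import math
from typing import Sequence


def log_freq_centroid(freqs_hz: Sequence[float],
                      powers: Sequence[float],
                      ref_hz: float = 1000.0) -> float:
    """Power-weighted mean of log2(f / ref_hz), in octaves (ref_hz is assumed)."""
    total = sum(powers)
    if total <= 0:
        raise ValueError("spectrum has no power")
    return sum(math.log2(f / ref_hz) * p for f, p in zip(freqs_hz, powers)) / total


def log_freq_spread(freqs_hz: Sequence[float],
                    powers: Sequence[float],
                    ref_hz: float = 1000.0) -> float:
    """Power-weighted RMS deviation from the centroid, in octaves."""
    total = sum(powers)
    c = log_freq_centroid(freqs_hz, powers, ref_hz)
    var = sum((math.log2(f / ref_hz) - c) ** 2 * p
              for f, p in zip(freqs_hz, powers)) / total
    return math.sqrt(var)


# A pure tone has zero spread; energy one octave either side of 1 kHz
# keeps the centroid at the reference but has a spread of one octave.
print(log_freq_spread([1000.0], [1.0]))
print(log_freq_centroid([500.0, 2000.0], [1.0, 1.0]),
      log_freq_spread([500.0, 2000.0], [1.0, 1.0]))
```

This illustrates the point made on the slide: two sounds with the same centroid can still be told apart by their spread, which is small for pure tones and large for noise-like sounds.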
  • 61. Audio Descriptors
      Spectral Basis Descriptors: low-dimensional projections of a high-dimensional spectral space to aid compactness and recognition, used primarily with the Sound Classification and Indexing Description Tools
        – AudioSpectrumBasis: a series of basis functions derived from the singular value decomposition of a normalized power spectrum
        – AudioSpectrumProjection: used together with the AudioSpectrumBasis Descriptor; represents low-dimensional features of a spectrum after projection onto a reduced-rank basis
      (Figure: a 10-basis component reconstruction showing most of the detail of the original spectrogram, including guitar, bass guitar, etc. The left vectors are an AudioSpectrumBasis Descriptor and the top vectors are the corresponding AudioSpectrumProjection Descriptor. The required data storage is 10(M+N) values.)
  • 62. Audio Descriptors
      Signal Parameters: apply chiefly to periodic or quasi-periodic signals
        – AudioFundamentalFrequency Descriptor: the fundamental frequency of an audio signal. The representation allows for a confidence measure, in recognition of the fact that the various extraction methods, commonly called "pitch-tracking", are not perfectly accurate
        – AudioHarmonicity Descriptor: the harmonicity of a signal, allowing distinction between sounds with a harmonic spectrum (e.g., musical tones or voiced speech, such as vowels like "a"), sounds with an inharmonic spectrum (e.g., metallic or bell-like sounds) and sounds with a non-harmonic spectrum (e.g., noise, unvoiced speech such as fricatives like "f", or dense mixtures of instruments)
  • 63. Audio Descriptors
      Timbral Temporal Descriptors: temporal characteristics of segments of sounds, useful for the description of musical timbre (the characteristic tone quality, independent of pitch and loudness)
        – LogAttackTime Descriptor: characterizes the "attack" of a sound, the time it takes for the signal to rise from silence to its maximum amplitude. It distinguishes a sudden sound from a smooth one
        – TemporalCentroid Descriptor: characterizes the signal envelope, representing where in time the energy of a signal is focused. It can distinguish a decaying piano note from a sustained organ note when the lengths and the attacks of the two notes are identical
      Timbral Spectral Descriptors: spectral features in a linear-frequency space, especially applicable to the perception of musical timbre
        – SpectralCentroid Descriptor: the power-weighted average of the frequency of the bins in the linear power spectrum. Very similar to the AudioSpectrumCentroid, but specialized for distinguishing musical instrument timbres. It indicates the "sharpness" of a sound
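The two Timbral Temporal Descriptors can be sketched from a sampled amplitude envelope. This is an illustrative reading of the slide, not the normative extraction: the attack is taken to start where the envelope first exceeds a small fraction of its peak (`start_fraction`, an assumed parameter) and to end at the first peak sample, and the temporal centroid is the envelope-weighted mean time.

```python
import math
from typing import Sequence


def log_attack_time(envelope: Sequence[float],
                    sample_rate: float,
                    start_fraction: float = 0.02) -> float:
    """log10 of the attack duration in seconds.

    start_fraction (share of the peak that counts as 'signal has started')
    is an ASSUMED threshold, not a value from the slide.
    """
    peak = max(envelope)
    if peak <= 0:
        raise ValueError("envelope has no positive values")
    threshold = start_fraction * peak
    start = next(i for i, v in enumerate(envelope) if v > threshold)
    stop = list(envelope).index(peak)  # first sample at the maximum
    if stop <= start:
        raise ValueError("attack is shorter than one envelope sample")
    return math.log10((stop - start) / sample_rate)


def temporal_centroid(envelope: Sequence[float], sample_rate: float) -> float:
    """Envelope-weighted mean time in seconds."""
    total = sum(envelope)
    if total <= 0:
        raise ValueError("envelope has no energy")
    return sum((i / sample_rate) * v for i, v in enumerate(envelope)) / total


# A sustained, organ-like envelope versus a decaying, piano-like one.
sustained = [1.0, 1.0, 1.0, 1.0]
decaying = [4.0, 2.0, 1.0, 1.0]
print(temporal_centroid(sustained, 1.0), temporal_centroid(decaying, 1.0))

# An envelope that rises to its peak over one sample period at 10 Hz.
print(log_attack_time([0.0, 0.0, 0.5, 1.0, 0.5, 0.25], 10.0))
```

The decaying envelope yields an earlier centroid than the sustained one, which matches the piano versus organ distinction on the slide.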
  • 64. Audio Descriptors
        – HarmonicSpectralCentroid Descriptor: the amplitude-weighted mean of the harmonic peaks of the spectrum. It has similar semantics to the other centroid descriptors, but applies only to the harmonic parts of the musical tone
        – HarmonicSpectralDeviation Descriptor: the spectral deviation of log-amplitude components from a global spectral envelope
        – HarmonicSpectralSpread Descriptor: the amplitude-weighted standard deviation of the harmonic peaks of the spectrum, normalized by the instantaneous HarmonicSpectralCentroid
        – HarmonicSpectralVariation Descriptor: the normalized correlation of the amplitudes of the harmonic peaks between two subsequent time slices of the signal
      Silence Segment: attaches the simple semantic of "silence" (i.e., no significant signal) to an Audio Segment. It may be used to aid further segmentation of the audio stream, or as a hint not to process a segment
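Two of the harmonic descriptors can be sketched directly from the wording on the slide, given the frequencies and amplitudes of the harmonic peaks in one or two analysis frames. Peak picking itself is out of scope here, and the function names are hypothetical.

```python
import math
from typing import Sequence


def harmonic_spectral_centroid(peak_freqs_hz: Sequence[float],
                               peak_amps: Sequence[float]) -> float:
    """Amplitude-weighted mean frequency of the harmonic peaks in one frame."""
    total = sum(peak_amps)
    if total <= 0:
        raise ValueError("no harmonic energy in this frame")
    return sum(f * a for f, a in zip(peak_freqs_hz, peak_amps)) / total


def harmonic_amplitude_correlation(prev_amps: Sequence[float],
                                   curr_amps: Sequence[float]) -> float:
    """Normalized correlation of harmonic amplitudes in two consecutive frames."""
    num = sum(a * b for a, b in zip(prev_amps, curr_amps))
    den = (math.sqrt(sum(a * a for a in prev_amps))
           * math.sqrt(sum(b * b for b in curr_amps)))
    if den == 0:
        raise ValueError("one of the frames has no harmonic energy")
    return num / den


harmonics = [100.0, 200.0, 300.0]
print(harmonic_spectral_centroid(harmonics, [1.0, 1.0, 1.0]))  # flat harmonic spectrum
print(harmonic_spectral_centroid(harmonics, [3.0, 0.0, 1.0]))  # energy in the fundamental
print(harmonic_amplitude_correlation([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

A correlation near 1 means the harmonic balance is stable from one time slice to the next (only the overall level changed in the last example), while lower values indicate a changing spectrum.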
  • 65. Audio Descriptors
      High-level Audio Description Tools (Ds and DSs)
        – Audio Signature DS: a condensed representation of an audio signal designed to provide a unique content identifier for the purpose of robust automatic identification of audio signals. Applications include audio fingerprinting and identification of audio against a database of known works
        – Musical Instrument Timbre Description Tools
            HarmonicInstrumentTimbre Descriptor: combines the four harmonic timbral spectral Descriptors with the LogAttackTime Descriptor
            PercussiveInstrumentTimbre Descriptor: combines the timbral temporal Descriptors with a SpectralCentroid Descriptor
        – Melody Description Tools
            Include a rich representation of monophonic melodic information to facilitate efficient, robust and expressive melodic similarity matching
            MelodyContour DS: a terse, efficient melody contour representation
            MelodySequence DS: a more verbose, complete, expressive melody representation
  • 66. Audio Descriptors
        – General Sound Recognition and Indexing Description Tools
            A collection of tools for indexing and categorization of sounds (sound effects) in general
            SoundModelStatePath Descriptor: the sequence of states generated by a sound model
            SoundModelStateHistogram Descriptor: a normalized histogram of the state sequence generated by a sound model
        – Spoken Content Description Tools
            Consist of combined word and phone lattices for each speaker in an audio stream. Phone lattices are used to alleviate the out-of-vocabulary (OOV) problem
            SpokenContentLattice Description Scheme: the actual decoding produced by an ASR (Automatic Speech Recognition) engine
            SpokenContentHeader: information about the speakers being recognized and the recognizer itself
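The relationship between a state path and its normalized histogram can be shown with a short sketch. States are assumed to be numbered from 0 to `num_states - 1`, which is an illustrative convention rather than something the slide specifies; the function name is hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def state_histogram(state_path: Sequence[int], num_states: int) -> List[float]:
    """Fraction of frames spent in each state of the sound model.

    state_path: one state index per analysis frame (assumed 0-based).
    """
    if not state_path:
        raise ValueError("state path is empty")
    counts = Counter(state_path)
    n = len(state_path)
    return [counts.get(s, 0) / n for s in range(num_states)]


# Eight frames decoded by a hypothetical 4-state sound model.
path = [0, 1, 1, 2, 1, 0, 1, 1]
hist = state_histogram(path, num_states=4)
print(hist)
```

The histogram discards the ordering of the states, so it has a fixed length regardless of how long the sound is, which makes it convenient for comparing sounds of different durations.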
  • 67. References
      Book
        – Introduction to MPEG-7: Multimedia Content Description Interface. B. S. Manjunath, Philippe Salembier, Thomas Sikora (Editors). ISBN: 0-471-48678-7. http://www.wiley.com/WileyCDA/WileyTitle/productCd-0471486787.html
      MPEG-7: http://www.chiariglione.org/mpeg/standards/mpeg-7/mpeg-7.htm
      MPEG-7 DDL Homepage: http://archive.dstc.edu.au/mpeg7-ddl/