multimedia

MULTIMEDIA APPLICATION A Multimedia Application is an Application which uses a collection of multiple media sources e.g. text, graphics, images, sound/audio, animation and/or video. (Hypermedia can be considered as one of the multimedia applications.) APPLICATIONS of Multimedia (general) World Wide Web Hypermedia courseware Video conferencing Video-on-demand Interactive TV Groupware Home shopping Games Virtual reality Digital video editing and production systems ,[object Object],INTERPERSONAL APPLICATIONS 1. Conferencing call is a telephone call in which the calling party wishes to have more than one called party listens in to the audio portion of the call. The conference calls may be designed to allow the called party to participate during the call, or the call may be set up so that the called party merely listens into the call and cannot speak. Participants are usually able to call into the conference call themselves, by dialing into a special telephone number that connects to a quot;
conference bridgequot;
(a specialized type of equipment that links telephone lines). 2. Audio bridge- Connects the telephones at remote sites, equalizes the noise distortion and background noise for a live audio teleconference (audio conference). INTERACTIVE APPLICATIONS A multimedia application, in which an user actively participates, instead of just sitting as a passive recipient of information is called, Interactive Multimedia. 1. VOIP or Internet telephony refers to communications services — voice, facsimile, and/or voice-messaging applications — that are transported via the Internet, rather than the public switched telephone network (PSTN). The basic steps involved in originating an Internet telephone call are conversion of the analog voice signal to digital format and compression/translation of the signal into Internet protocol (IP) packets for transmission over the Internet; the process is reversed at the receiving end. 2. IPTV Internet Protocol television (IPTV) is a system through which digital television service is delivered using the architecture and networking methods of the Internet Protocol Suite over a packet-switched network infrastructure, e.g., the Internet and broadband Internet access networks, instead of being delivered through traditional radio frequency broadcast, satellite signal, and cable television (CATV) formats. 3. Video conferencing is a set of interactive telecommunication technologies which allow two or more locations to interact via two-way video and audio transmissions simultaneously. It has also been called 'visual collaboration' and is a type of groupware. 4. Facsimile is the telephonic transmission of scanned-in printed material (text or images), usually to a telephone number associated with a printer or other output device. The original document is scanned with a fax machine, which treats the contents (text or images) as a single fixed graphic image, converting it into a bitmap. 5. Voice Mail (vmail or VMS) is a centralized system of managing telephone messages for a large group of people. It has features such as answer many phones at the same time store incoming voice messages in personalized mailboxes forward messages, send messages to other users ,[object Object],Answer text messages with voice. Use the visual inbox on your cell phone for managing received messages. Send messages to email and get a text reply. Reply to or forward received messages. Send personal memos to yourself. 7. Interaction with websites ENTERTAINMENT APPLICATIONS Multimedia is heavily used in entertainment industry, especially to develop special effects in movies and animation for cartoon characters. 1. Movie / video on demand Video on Demand (VOD) or Audio Video on Demand (AVOD) are systems which allow users to select and watch/listen to video or audio content on demand. Television VOD systems either stream content through a set-top box, allowing viewing in real time, or download it to a device such as a computer for viewing at any time. As in pay-per-view, whereby a user buys or selects a movie or television program and it begins to play on the television set almost instantaneously, or downloading to a DVR rented from the provider, for viewing in the future. Internet television, using the Internet, is a form of video on demand. Found in airlines as in-flight entertainment. AVOD systems offer users the opportunity to select specific stored video or audio content and play it on demand including pause, fast forward, and rewind. 2. Interactive TV - (or iTV) describes a number of techniques that allow viewers to interact with television content as they view it. To be truly interactive, the viewer must be able to alter the viewing experience (e.g. choose which angle to watch a football match), or return information to the broadcaster. This quot;
return pathquot;
or quot;
back channelquot;
can be by telephone, mobile SMS (text messages), radio, and digital subscriber lines (ADSL) or cable. 3. Games Multimedia games, which are software programs with interactive animation, videos and controls. 4. Virtual Reality-The environment is created by using many equipment such as like headset, goggles, joystick, sensors and by coordinating various multimedia components. They provide an environment which is experienced by users as similar to reality. This technique is used in some arcade games and also in flight simulators, to impart training to pilots, without having to go for a real flight. MULTIMEDIA IN THE INTERNET Multimedia in the INTERNET includes the already described applications Audio/Video on demand, Internet TV, VoIP, online and multiplayer games.Some others are described: Websites these days are interactive with Hypermedia, images, audio and video. 1. Webcasting- A webcast is a media file distributed over the Internet using streaming media technology to distribute a single content source to many simultaneous listeners/viewers. A webcast may either be distributed live or on demand. Essentially, webcasting is “broadcasting” over the Internet. 2. Web conferencing is used to conduct live meetings, training, or presentations via the Internet. In a web conference, each participant sits at his or her own computer and is connected to other participants via the internet. This can be either a downloaded application on each of the attendees' computers or a web-based application where the attendees access the meeting by clicking on a link distributed by e-mail (meeting invitation) to enter the conference. A webinar is a web seminar is a specific type of web conference. 3. Net Chat Applications-An interface that allows real time exchange of information between 2 parties. Information these days is in the form of text, links, images and videos. 4. Group SMS-It is bulk SMS sent to a group of people. Several sites offer this option eg: way2sms.com, smsjunction.com etc. 5. Animation- includes moving images created by markup or software such as flash, 3Ds MAX etc. and is included in the website/page. 6. Forums- An online forum where participants post Text, images, Videos and Hypermedia and is viewable by the net population. 7. Internet Fax- Internet faxing (or quot;
online faxingquot;
) is a general term which refers to sending a document facsimile using the Internet, rather than using only phone networks (traditional faxing). 8. Blogs are types of website/pages, usually maintained by an individual with regular entries of commentary, descriptions of events, or other material such as graphics or video. Entries are commonly displayed in reverse-chronological order. A typical blog combines text, images, and links to other blogs, Web pages, and other media related to its topic. Huffman Encoding algorithm Definition: A minimal variable-length character coding based on the frequency of each character. First, each character becomes a trivial binary tree, with the character as the only node. The character's frequency is the tree's frequency. Two trees with the least frequencies are joined as the subtrees of a new root that is assigned the sum of their frequencies. Repeat until all characters are in one tree. One code bit represents each level. Thus more frequent characters are near the root and are coded with few bits, and rare characters are far from the root and are coded with many bits. Basic technique The technique works by creating a binary tree of nodes. These can be stored in a regular array, the size of which depends on the number of symbols, n. A node can be either a leaf node or an internal node. Initially, all nodes are leaf nodes, which contain the symbol itself, the weight (frequency of appearance) of the symbol and optionally, a link to a parent node which makes it easy to read the code (in reverse) starting from a leaf node. Internal nodes contain symbol weight, links to two child nodes and the optional link to a parent node. As a common convention, bit '0' represents following the left child and bit '1' represents following the right child. A finished tree has up to n leaf nodes and n − 1 internal nodes. A Huffman tree that omits unused symbols produces the most optimal code lengths. The process essentially begins with the leaf nodes containing the probabilities of the symbol they represent, then a new node whose children are the 2 nodes with smallest probability is created, such that the new node's probability is equal to the sum of the children's probability. With the previous 2 nodes merged into one node (thus not considering them anymore), and with the new node being now considered, the procedure is repeated until only one node remains, the Huffman tree. The simplest construction algorithm uses a priority queue where the node with lowest probability is given highest priority: Create a leaf node for each symbol and add it to the priority queue. While there is more than one node in the queue: Remove the two nodes of highest priority (lowest probability) from the queue Create a new internal node with these two nodes as children and with probability equal to the sum of the two nodes' probabilities. Add the new node to the queue. The remaining node is the root node and the tree is complete. Since efficient priority queue data structures require O(log n) time per insertion, and a tree with n leaves has 2n−1 nodes, this algorithm operates in O(n log n) time. If the symbols are sorted by probability, there is a linear-time (O(n)) method to create a Huffman tree using two queues, the first one containing the initial weights (along with pointers to the associated leaves), and combined weights (along with pointers to the trees) being put in the back of the second queue. This assures that the lowest weight is always kept at the front of one of the two queues: Start with as many leaves as there are symbols. Enqueue all leaf nodes into the first queue (by probability in increasing order so that the least likely item is in the head of the queue). While there is more than one node in the queues: Dequeue the two nodes with the lowest weight by examining the fronts of both queues. Create a new internal node, with the two just-removed nodes as children (either node can be either child) and the sum of their weights as the new weight. Enqueue the new node into the rear of the second queue. The remaining node is the root node; the tree has now been generated. It is generally beneficial to minimize the variance of codeword length. For example, a communication buffer receiving Huffman-encoded data may need to be larger to deal with especially long symbols if the tree is especially unbalanced. To minimize variance, simply break ties between queues by choosing the item in the first queue. This modification will retain the mathematical optimality of the Huffman coding while both minimizing variance and minimizing the length of the longest character code. Diatomic encoding This is a variation of run-length coding based on a combination of two data bytes. For a given media type the most common co-occurring pairs of data bytes are identified. These are then replaced in the data stream by single bytes that do not occur anywhere else in the stream. Statistical encoding Statistical encoding techniques comprise techniques for characterizing data according to its statistical probability of occurrence. Data with a higher probability of occurrence is encoded with a shorter code than data having a lesser probability of occurrence. For example, the American National Standard Code for Information Interchange (ASCII) and the Extended Binary Coded Decimal Interchange (EBCDIC) comprise standard formatting schemes in which numbers, letters, punctuation, carriage controlstatements and other data are assigned various hexadecimal positions in a data formatting scheme using 8-bit bytes. These alphanumeric symbols, which are assigned different positions depending upon the standard used, have differing probabilities ofoccurrence. Since a quot;
spacequot;
or an quot;
equot;
has a much higher probability of occurrence than a quot;
yquot;
or a quot;
zquot;
or other nonfrequently occurring hexadecimal numbers, the quot;
spacequot;
or quot;
equot;
is encoded into a code of a lesser number of bits, e.g., 3 or 4 bits, ratherthan the standard 8 bit per byte code for these alphanumeric symbol. On the other hand, alphanumeric symbols such as quot;
yquot;
and quot;
zquot;
that have a much lower probability of occurrence are encoded into a code having more bits than the standard 8 bit byte codeused in ASCII and EBCDIC standards, e.g., quot;
yquot;
and quot;
zquot;
may have 11 bits. The quot;
Huffman Codequot;
generated as a result of the statistical encoding employed, is a code which can be uniquely identified as it is read in a serial fashion. In other words, the encoded data is uniquely arranged so that no ambiguity existsinidentifying a particular encoded word as the bits of the code are read in a serial fashion. Consequently, flagging signals and other extraneous data is notrequired in the encoded database. A problem with the Huffman statistical encoding technique is that the statistical probability of occurrence of particular alphanumeric symbols in any database will be different depending upon the data in the database, the formatting techniqueused (i.e., ASCII, EBCDIC, or other formatting technique), the nature of the database and various other factors. Several techniques have been used to overcome these disadvantages. For example, one technique which has been used is to study theparticular database to be encoded and generate astatistical encoding table for each particular database. The disadvantage of this technique is that the database must be read and studied prior to statistical encoding and cannot, therefore, be encoded asthe data is received for the first time. Another technique which has been used is to study large quantities of data to produce a statistical encoding table which is generally applicable to most databases. Although compression of data can be achieved to some extent, in many cases thedata is expanded because the particular database does not match the statistical probability set forth in the generic table used to encode the data. Additionally, maximum compression and maximum entropy of the data encoded is not achieved with this sortof generic database. 1. LINEAR PREDICTIVE CODING Linear predictive coding (LPC) is a tool used mostly in audio signal processing and speech processing for representing the spectral envelope of a digital signal of speech in compressed form, using the information of a linear predictive model. It is one of the most powerful speech analysis techniques, and one of the most useful methods for encoding good quality speech at a low bit rate and provides extremely accurate estimates of speech parameters. Overview LPC starts with the assumption that a speech signal is produced by a buzzer at the end of a tube (voiced sounds), with occasional added hissing and popping sounds (sibilants and plosive sounds). Although apparently crude, this model is actually a close approximation to the reality of speech production. The glottis (the space between the vocal folds) produces the buzz, which is characterized by its intensity (loudness) and frequency (pitch). The vocal tract (the throat and mouth) forms the tube, which is characterized by its resonances, which give rise to formants, or enhanced frequency bands in the sound produced. Hisses and pops are generated by the action of the tongue, lips and throat during sibilants and plosives. LPC analyzes the speech signal by estimating the formants, removing their effects from the speech signal, and estimating the intensity and frequency of the remaining buzz. The process of removing the formants is called inverse filtering, and the remaining signal after the subtraction of the filtered modeled signal is called the residue. The numbers which describe the intensity and frequency of the buzz, the formants, and the residue signal, can be stored or transmitted somewhere else. LPC synthesizes the speech signal by reversing the process: use the buzz parameters and the residue to create a source signal, use the formants to create a filter (which represents the tube), and runs the source through the filter, resulting in speech. Because speech signals vary with time, this process is done on short chunks of the speech signal, which are called frames; generally 30 to 50 frames per second give intelligible speech with good compression. Early history of LPC According to Robert M. Gray of Stanford University, the first ideas leading to LPC started in 1966 when S. Saito and F. Itakura of NTT described an approach to automatic phoneme discrimination that involved the first maximum likelihood approach to speech coding. In 1967, John Burg outlined the maximum entropy approach. In 1969 Itakura and Saito introduced partial correlation, May Glen Culler proposed realtime speech encoding, and B. S. Atal presented an LPC speech coder at the Annual Meeting of the Acoustical Society of America. In 1971 realtime LPC using 16-bit LPC hardware was demonstrated by Philco-Ford; four units were sold. In 1972 Bob Kahn of ARPA, with Jim Forgie (Lincoln Laboratory, LL) and Dave Walden (BBN Technologies), started the first developments in packetized speech, which would eventually lead to Voice over IP technology. In 1973, according to Lincoln Laboratory informal history, the first realtime 2400 bit/s LPC was implemented by Ed Hofstetter. In 1974 the first realtime two-way LPC packet speech communication was accomplished over the ARPANET at 3500 bit/s between Culler-Harrison and Lincoln Laboratories. In 1976 the first LPC conference took place over the ARPANET using the Network Voice Protocol, between Culler-Harrison, ISI, SRI, and LL at 3500 bit/s. And finally in 1978, Vishwanath et al. of BBN developed the first variable-rate LPC algorithm. Basic Principles LPC starts with the assumption that the speech signal is produced by a buzzer at the end of a tube. The glottis (the space between the vocal cords) produces the buzz, which is characterized by its intensity (loudness) and frequency (pitch). The vocal tract (the throat and mouth) forms the tube, which is characterized by its resonances, which are called formants. LPC analyzes the speech signal by estimating the formants, removing their effects from the speech signal, and estimating the intensity and frequency of the remaining buzz. The process of removing the formants is called inverse filtering, and the remaining signal is called the residue. The numbers which describe the formants and the residue can be stored or transmitted somewhere else. LPC synthesizes the speech signal by reversing the process: use the residue to create a source signal, use the formants to create a filter (which represents the tube), and run the source through the filter, resulting in speech. Because speech signals vary with time, this process is done on short chunks of the speech signal, which are called frames. Usually 30 to 50 frames per second give intelligible speech with good compression. LPC coefficient representations LPC is frequently used for transmitting spectral envelope information, and as such it has to be tolerant of transmission errors. Transmission of the filter coefficients directly (see linear prediction for definition of coefficients) is undesirable, since they are very sensitive to errors. In other words, a very small error can distort the whole spectrum, or worse, a small error might make the prediction filter unstable. There are more advanced representations such as Log Area Ratios (LAR), line spectral pairs (LSP) decomposition and reflection coefficients. Of these, especially LSP decomposition has gained popularity, since it ensures stability of the predictor, and spectral errors are local for small coefficient deviations. Applications LPC is generally used for speech analysis and resynthesis. It is used as a form of voice compression by phone companies, for example in the GSM standard. It is also used for secure wireless, where voice must be digitized, encrypted and sent over a narrow voice channel, an early example of this is the US government's Navajo I. LPC synthesis can be used to construct vocoders where musical instruments are used as excitation signal to the time-varying filter estimated from a singer's speech. This is somewhat popular in electronic music. Paul Lansky made the well-known computer music piece notjustmoreidlechatter using linear predictive coding. A 10th-order LPC was used in the popular 1980's Speak & Spell educational toy.Waveform ROM in digital sample-based music synthesizers made by Yamaha Corporation is compressed using LPC algorithm.0-to-32nd order LPC predictors are used in FLAC audio codec. The prediction model The most common representation is where is the predicted signal value, x(n − i) the previous observed values, and ai the predictor coefficients. The error generated by this estimate is where x(n) is the true signal value. These equations are valid for all types of (one-dimensional) linear prediction. The differences are found in the way the parameters ai are chosen. For multi-dimensional signals the error metric is often defined as where is a suitable chosen vector norm. Estimating the parameters The most common choice in optimization of parameters ai is the root mean square criterion which is also called the autocorrelation criterion. In this method we minimize the expected value of the squared error E[e2(n)], which yields the equation for 1 ≤ j ≤ p, where R is the autocorrelation of signal xn, defined as , and E is the expected value. In the multi-dimensional case this corresponds to minimizing the L2 norm. The above equations are called the normal equations or Yule-Walker equations. In matrix form the equations can be equivalently written as where the autocorrelation matrix R is a symmetric, Toeplitz matrix with elements ri,j = R(i − j), vector r is the autocorrelation vector rj = R(j), and vector a is the parameter vector. Another, more general, approach is to minimize where we usually constrain the parameters ai with a0 = − 1 to avoid the trivial solution. This constraint yields the same predictor as above but the normal equations are then where the index i ranges from 0 to p, and R is a (p + 1) × (p + 1) matrix. Optimization Optimization of the parameters is a wide topic and a large number of other approaches have been proposed. Still, the autocorrelation method is the most common and it is used, for example, for speech coding in the GSM standard. Solution of the matrix equation Ra = r is computationally a relatively expensive process. The Gauss algorithm for matrix inversion is probably the oldest solution but this approach does not efficiently use the symmetry of R and r. A faster algorithm is the Levinson recursion proposed by Norman Levinson in 1947, which recursively calculates the solution. Later, Delsarte et al. proposed an improvement to this algorithm called the split Levinson recursion which requires about half the number of multiplications and divisions. It uses a special symmetrical property of parameter vectors on subsequent recursion levels. Prediction Example:: Cameraman s[x,y] uH[x,y]= s[x,y]-0.95s[x-1,y] uV[x,y]=s[x,y]-0.95 uD[x,y]=s[x,y]- s[x,y-1] 0.5(s[x,y-1]+ s[x-1,y]) DELTA MODULATION Delta modulation (DM or Δ-modulation) is an analog-to-digital and digital-to-analog signal conversion technique used for transmission of voice information where quality is not of primary importance. DM is the simplest form of differential pulse-code modulation (DPCM) where the difference between successive samples is encoded into n-bit data streams. In delta modulation, the transmitted data is reduced to a 1-bit data stream. Some of the features are ,[object Object]

each segment of the approximated signal is compared to the original analog wave to determine the increase or decrease in relative amplitude

the decision process for establishing the state of successive bits is determined by this comparison

only the change of information is sent, that is, only an increase or decrease of the signal amplitude from the previous sample is sent whereas a no-change condition causes the modulated signal to remain at the same 0 or 1 state of the previous sample.To achieve high signal-to-noise ratio, delta modulation must use oversampling techniques, that is, the analog signal is sampled at a rate several times higher than the Nyquist rate. Derived forms of delta modulation are continuously variable slope delta modulation, delta-sigma modulation, and differential modulation. The Differential Pulse Code Modulation is the super set of DM Principle Rather than quantizing the absolute value of the input analog waveform, delta modulation quantizes the difference between the current and the previous step, as shown in the block diagram in Fig. 1. Fig. 1 - Block diagram of a Δ-modulator/demodulator The modulator is made by a quantizer which converts the difference between the input signal and the average of the previous steps. In its simplest form, the quantizer can be realized with a comparator referenced to 0 (two levels quantizer), whose output is 1 or 0 if the input signal is positive or negative. The demodulator is simply an integrator (like the one in the feedback loop) whose output rises or falls with each 1 or 0 received. The integrator itself constitutes a low-pass filter. Applications of Delta Modulation 1) Telecommunications: Digitized signals are easily routed and multiplexed with low cost digital gates. Voice channels may be easily added to existing multiplexed digital data transmission systems. The digital signals are much more immune to crosstalk and noise when transmitted over long distances by wire, R.F., or optical paths. CVSD has better intelligibility than PCM when random bit errors are introduced during transmission. 2) Secure Communications: Digital data can be quite securely encrypted using fairly simple standard hardware (Figure 4A). Scrambled speech for audio channels may also be accomplished by encoding into a shift register, then selecting different segments of the shifted data in pseudo-random fashion and decoding it (Figure 4B). FIGURE 4A. DIGITAL TRANSMISSION ENCRYPTION FIGURE 4B. VOICE TRANSMISSION SCRAMBLING 3) Audio Delay Lines: Although charge-coupled deviced (CCD) will perform this function, they are still expensive and choice of configurations is quite limited. Also, there is a practical limit to the number of CCD stages, since each introduces a slight degradation to the signal. As shown in Figure 5, the delay line consists of a CVSD modulator, a shift register and a demodulator. Delay is proportional to the number of register stages divided by the clock frequency. This can be used in speech scrambling, as explained above, echo suppression in PA systems; special echo effects; music enhancement or synthesis; and recursive or nonrecursive filtering. FIGURE 5. AUDIO DELAY LINE 4) Voice I/O: Digitized speech can be entered into a computer for storage, voice identification, or word recognition. Words stored in ROM’s, disc memory, etc. can be used for voice output. CVSD, since it can operate at low data rates, is more efficient in storage requirements than PCM or other A to D conversions. Also, the data is in a useful form for filtering or other processing. continuously variable slope delta modulation Continuously variable slope delta modulation (CVSD or CVSDM) is a voice coding method. It is a delta modulation with variable step size (i.e. special case of adaptive delta modulation), first proposed by Greefkes and Riemens in 1970. CVSD encodes at 1 bit per sample, so that audio sampled at 16kHz is encoded at 16kbit/s. Additional digital logic, a second integrator, and an analog multiplier are added to the simple modulator. Under small input signal conditions, the second integrator (known as the syllabic filter) has no input, and circuit functionis identical to the simple modulator, except that the multiplier is biased to output quite small ramp amplitudes giving good resolution to the small signals. CVSD A larger signal input is characterized by consecutive strings of 1’s or 0’s in the data as the integrator attempts to track the input. The logic input to the syllabic filter actuates whenever 3 or more consecutive 0’s or 1’s are present in the data. When this happens, the syllabic filter output starts to build up increasing the multiplier gain, passing larger amplitude ramps to the comparator, enabling the system to track the larger signal. Up to a limit, the more consecutive 1’s or 0’s generated, the larger the ramp amplitude. Since the larger signals increase the negative feedback of the modulator and the forward gain of the demodulator, companding takes place. By listening tests, the syllabic filter time constant of 4 to 10ms is generally considered optimum. An outstanding characteristic of CVSD is its ability, with fairly simple circuitry, to transmit intelligible voice at relatively low data rates. Companded PCM, for telephone quality transmission, requires about 64K bits/sec data rate per channel. CVSD produces equal quality at 32K bits/sec. (However, at this rate it does not handle tone signals or phase encoded modern transmissions as well.) CVSD is useful at even lower data rates. At 16K bits/sec the reconstructed voice is remarkably natural, but has a slightly “Fuzzy Edge”. At 9.6K bits/sec intelligibility is still excellent, although the sound is reminiscent of a damaged loudspeaker. Of course, very sophisticated speech compression techniques have been used to transmit speech at even lower data rates; but CVSD is an excellent compromise between circuit simplicity and bandwidth economy Applications 12 kbit/s CVSD is used by Motorola's SECURENET line of digitally encrypted two-way radio products. 16 kbit/s CVSD is used by military digital telephones [DNVT, DSVT] for use in deployed areas to provide voice recognition quality audio. 64 kbit/s CVSD is one of the options to encode voice signals in telephony-related Bluetooth service profiles, e.g. between mobile phones and wireless headsets. The other options are PCM with logarithmic a-law or μ-law quantization. Delta-sigma modulation Delta-sigma (ΔΣ; or sigma-delta, ΣΔ) modulation is a method for encoding high resolution signals into lower resolution signals using pulse-density modulation. This technique has found increasing use in modern electronic components such as analog-to-digital and digital-to-analog converters, frequency synthesizers, switched-mode power supplies and motor controls. One of the earliest and most widespread uses of delta-sigma modulation is in data conversion. An ADC or DAC circuit which implements this technique can relatively easily achieve very high resolutions while using low-cost CMOS processes, such as the processes used to produce digital integrated circuits. For this reason, even though the technique was first presented in the early 1960s, it is only in recent years that it has come into widespread use with improvements in silicon technology. Almost all analog integrated circuit vendors offer delta-sigma converters. Given a particular fabrication process, a sigma-delta ADC can give more bits of resolution than any other ADC structure, with the only exception of the integrating ADC structure. Both kinds of ADCs use an analog integrating amplifier to cancel out many kinds of noise and errors. Relationship to Δ-modulation Fig. 2: Derivation of ΔΣ- from Δ-modulation ΔΣ modulation (SDM) is inspired by Δ modulation (DM), as shown in Fig. 2. If quantization was homogeneous (e.g., if it was linear), the following would be a sufficient derivation of the equivalence of DM and SDM: Start with a block diagram of a Δ-modulator/demodulator. The linearity property of integration () makes it possible to move the integrator, which reconstructs the analog signal in the demodulator section, in front of the Δ-modulator. Again, the linearity property of the integration allows the two integrators to be combined and a ΔΣ-modulator/demodulator block diagram is obtained. However, the quantizer is not homogeneous, and so this explanation is flawed. It's true that ΔΣ is inspired by Δ-modulation, but the two are distinct in operation. From the first block diagram in Fig. 2, the integrator in the feedback path can be removed if the feedback is taken directly from the input of the low-pass filter. Hence, for delta modulation of input signal u, the low-pass filter sees the signal However, sigma-delta modulation of the same input signal places at the low-pass filter In other words, SDM and DM swap the position of the integrator and quantizer. The net effect is a simpler implementation that has the added benefit of shaping the quantization noise away from signals of interest (i.e., signals of interest are low-pass filtered while quantization noise is high-pass filtered). This effect becomes more dramatic with increased oversampling, which allows for quantization noise to be somewhat programmable. On the other hand, Δ-modulation shapes both noise and signal equally. Additionally, the quantizer (e.g., comparator) used in DM has a small output representing a small step up and down the quantized approximation of the input while the quantizer used in SDM must take values outside of the range of the input signal, as shown in Fig. 3. Fig. 3: An example of SDM of 100 samples of one period a sine wave. 1-bit samples (e.g., comparator output) overlaid with sine wave where logic high (e.g., + VCC) represented by blue and logic low (e.g., − VCC) represented by white. In general, ΔΣ has some advantages versus Δ modulation: ,[object Object]

The demodulator can be a simple linear filter (e.g., RC or LC filter) to reconstruct the signal

The quantizer (e.g., comparator) can have full-scale outputs

The quantized value is the integral of the difference signal, which makes it less sensitive to the rate of change of the signal.Principle The principle of the ΔΣ architecture is to make rough evaluations of the signal, to measure the error, integrate it and then compensate for that error. The mean output value is then equal to the mean input value if the integral of the error is finite. A demonstration applet is available online to simulate the whole architecture. Variations There are many kinds of ADC that use this delta-sigma structure. The above analysis focuses on the simplest 1st-order, 2-level, uniform-decimation sigma-delta ADC. Many ADCs use a second-order 5-level sinc3 sigma-delta structure. Quantization theory formulas When a signal is quantized, the resulting signal approximately has the second-order statistics of a signal with independent additive white noise. Assuming that the signal value is in the range of one step of the quantized value with an equal distribution, the root mean square value of this quantization noise is In reality, the quantization noise is of course not independent of the signal; this dependence is the source of idle tones and pattern noise in Sigma-Delta converters. Oversampling ratio, where fs is the sampling frequency and 2f0 is Nyquist rate The rms noise voltage within the band of interest can be expressed in terms of OSR Adaptive Lossless Data Compression Algorithm (ALDC) Scope This ECMA standard specifies a lossless compression algorithm to reduce the number of bytes required to represent data. The algorithm is known as Adaptive Lossless Data Compression algorithm (ALDC). The numerical identifiers according to ISO/IEC 11576 allocated to this algorithm are: ALDC 512-Byte History Buffer: 3 ALDC 1024-Byte History Buffer: 4 ALDC 2048-Byte History Buffer: 5 Definitions For the purposes of this ECMA Standard, the following definitions apply. 1 Compressed Data Stream The output stream after encoding. 2 Copy Pointer A part of the Compressed Data Stream which represents a group of two or more consecutive bytes for which there already exists an identical group in the History Buffer. It comprises a Length Code Field and a Displacement Field. 3 Current Address The location within the History Buffer where the Data Byte is written. 4 Data Byte The current byte of incoming data which is written into the History Buffer and is compared to all data bytes previously written into the History Buffer. 5 Displacement Field That part of the Copy Pointer which specifies the location within the History Buffer of the first byte of a Matching String. 6 End Marker A string of 12 ONEs indicating the end of the Compressed Data Stream. 7 History Buffer A data structure where incoming data bytes are stored for use in the compression and decompression process. 8 Literal A Data Byte for which no match was found in the History Buffer. 9 Matching String A sequence of bytes in the incoming data which is identical with a sequence of bytes in the History Buffer. 10 Match Count The number of bytes in a Matching String. 11 Match Count Field That part of the Copy Pointer which specifies the number of consecutive bytes for which a match was found in the History Buffer. 12 Pad Bits Bits set to ZERO and included in the Compressed Data Stream, as required, to maintain an 8-bit byte boundary. Conventions and Notations 1 Representation of numbers The following conventions and notations apply in this Standard, unless otherwise stated. ,[object Object]

Numbers in binary notation and bit combinations are represented by ZEROs and ONEs with the most significant

All other numbers shall be in decimal form.2 Names The names of entities are given with a capital initial letter. ALDC compression algorithm Encoding description for a 512-byte History Buffer At the start of encoding, all bytes of the History Buffer shall be reset to all ZEROs. Data bytes shall be stored in sequence in the History Buffer, starting with a Current Address of 0. The encoder processes the incoming data stream one byte at a time. The current byte being processed is referred to as the Data Byte. When a Data Byte is received from the input data stream, it shall be written into the History Buffer at the Current Address. Then the Current Address shall be incremented by 1. If it exceeds the maximum address, which is 511 for a History Buffer size of 512 bytes, it shall be reset to 0. Step 1 The Data Byte shall be compared with each byte previously written into the History Buffer to identify any identical bytes. Step 2 If the Data Byte does not match any byte in the History Buffer, the process shall continue at step 6. If the Data Byte matches one or more bytes in the History Buffer, for every matching byte it shall be noted whether this matching byte is a continuation of a previous sequence of matching bytes or not. If it is not a continuation, the Displacement Field of the matching byte shall be noted and recorded as having a Match Count of one byte. If the matching byte is a continuation of a previous string, the Match Count for that string shall be incremented by 1. Step 3 If a Match Count equals 271, the corresponding bytes shall be identified by a Copy Pointer, which shall be added to the Compressed Data Stream. Its Match Count Field and Displacement Field shall be specified. The next Data Byte shall then be read and the process shall continue at Step 1.If there is no more data to be read, the process shall continue at Step 7. Note: The value of 271 was chosen for implementation reasons. Step 4 If the Match Count has not reached 271, any pending Matching Strings shall be checked to see if any are continued by the Data Byte. If none of the previous Matching Strings is continued and if any of the previous Matching Strings consists of two or more bytes, that Matching String having the lowest Displacement Field shall be identified by a Copy Pointer, which shall be added to the Compressed Data Stream. The next Data Byte shall then be read. If there is no more data to be read, the process shall continue at Step 7. If no previous Matching Strings are continued and the previous matches were only 1-byte matches, the previous 1-byte match shall be identified as a Literal and shall be added to the Compressed Data Stream. The next Data Byte shall then be read. If there is no more data to be read, the process shall continue at Step 7. Step 5 If there are no Matching Strings with a Match Count of 271 and there is the continuation of at least 1 previous Matching String, the next Data Byte shall be read. The process shall continue at Step 1. If there is no more data to be read, the pending Matching String shall be identified as a Copy Pointer, which shall be added to the Compressed Data Stream. The process shall then continue at Step 7. Step 6 If the Data Byte does not match any bytes of the History Buffer, a check shall be made for any previous Matching Strings that may be pending. If there are any previous Matching Strings of two or more bytes, they shall be identified by a Copy Pointer, which shall be added to the Compressed Data Stream. Then the Data Byte that did not match shall be added to the Compressed Data Stream as a Literal. The next Data Byte shall be read and the process shall continue at Step 1. If there is no more data to be read, the process shall continue at Step 7. If there are no pending Matching Strings of two or more bytes and there are any pending 1-byte matches, the byte preceding the Data Byte shall be identified as a Literal, which shall be added to the Compressed Data Stream. Then the Data Byte that did not find a match shall be identified as a Literal, which shall be added to the Compressed Data Stream. The next Data Byte shall be read and the process shall continue at Step 1. If there is no more data to be read, the process shall continue at Step 7. Step 7 An End Marker shall be added to the Compressed Data Stream and Pad Bits shall be included as required. This ends the encoding process. Description of the Compressed Data Stream As described above, the processing of the input data generates as its output the Compressed Data Stream. The completed Compressed Data Stream shall consist of: _ Literals, each preceded by a bit set to ZERO. _ Copy Pointers, each preceded by a bit set to ONE. _ An End Marker preceded by a bit set to ONE. _ Pad Bits. Once all data has been read, the Compressed Data Stream shall be terminated by a bit set to ONE, followed by an End Marker, followed by Pad Bits. During the encoding, if more than one Matching String of the same Match Count is found, the Copy Pointer with the lowest Displacement Field shall be used. The Match Count Field shall consist of 2, 4, 6, 8, or 12 bits, identifying Match Counts as specified in table 1. The length of the Displacement Field shall be 9 bits, 10 bits, or 11 bits for History Buffer sizes of 512 bytes, 1024 bytes, or 2048 bytes, respectively. ALDC Overview The ALDC algorithm accepts input in 8-bit data bytes and outputs a bit stream representing data in compressed form. The ALDC algorithm is one implementation of the Lempel-Ziv 1 (LZ1) class of data compression algorithms. LZ1 algorithms achieve compression using a data structure called a History Buffer where incoming data is stored and compared to previous data in the same History Buffer. An LZ1 encoding process and an LZ1 decoding process both initialize this structure to the same known state and update it in an identical fashion. Consequently, these two histories remain identical, so it is not necessary to include history content information within the compressed data stream. Incoming data is entered into the History Buffer. Each incoming byte is compared with all other bytes previously stored in the History Buffer. Compression results from finding sequential matches. At the beginning of the encoding process the History Buffer is empty. The result of not finding any matching bytes in the History Buffer is that the Compressed Data Stream contains only Literals. Bytes are encoded as Literals until a matching sequence of two or more bytes occurs. As the History Buffer fills, it becomes increasingly possible for the encoder to represent incoming data by encoding it as a Copy Pointer for a string already present in this History Buffer. This is the principal mechanism by which LZ1 algorithms are able to achieve compression. Pixel Short for Picture Element, a pixel is a single point in a graphic image. Graphics monitors display pictures by dividing the display screen into thousands (or millions) of pixels, arranged in rows and columns. The pixels are so close together that they appear connected. Each pixel can only be one color at a time. However, since they are so small, pixels often blend together to form various shades and blends of colors. The number of colors each pixel can be is determined by the number of bits used to represent it. A pixel does not need to be rendered as a small square. This image shows alternative ways of reconstructing an image from a set of pixel values, using dots, lines, or smooth filtering. COLOR PRINCIPLES Human eye sees just a single color when a particular set of three Primary colors are mixed and displayed simultaneously. In fact a whole spectrum of colors known as a color gamut can be produced by using different proportions of the three primary colors RED (R) GREEN (G) BLUE (B). Color Derivation Principles ,[object Object],The mixing technique used in part (A) is known as Additive Color Mixing which , since black is produced when all three primary colors are zero, is particularly useful for producing a color image on a black surface as in the case of display applications. b) Subtractive color mixing It is also possible to perform the complimentary subtractive color mixing operation to produce a similar range of colors. This is shown in the figure and, as we can see, with subtractive mixing white is produced when all three chosen primary colors cyan (C) , magenta (M) , Yellow (Y) all are zero . Hence the choice of colors is particularly useful for producing a color image on a white surface as is the case of printing Applications. Image Formats JPEG JPEG stands for quot;
Joint Photographic Expert Groupquot;
. It was voted as international standard in 1992. It works with both color and grayscale images, e.g., satellite, medical, etc. It has both lossy and lossless compression. First generation JPEG uses DCT + Run length Huffman entropy coding. Second generation JPEG (JPEG2000) uses wavelet transform + bit plane coding + Arithmetic entropy coding. Graphics Interchange Format (GIF) GIF is used extensively with the internet for the representation and compression of graphical images. Although images comprising of 24bit pixels are supported – 8 bits each for R, G and B . GIF reduces the number of possible colors that are present by choosing the 256 colors from the original set of 224 colors that match most closely those used in the original image. The resulting table of colors therefore consists of 256 entries each of which contains a 24bit color value. Hence instead of sending each pixel as a 24bit value, only the 8bit index to the Table entry that contains the closest match color to the original is sent. This results in a compression ratio of 3:1. The table of colors can relate either to the whole image in which case it is referred as the GLOBAL COLOR TABLE or To a portion of the image, when it is referred to as a LOCAL COLOR TABLE. The contents of the table are sent across the network – together with the compressed image data and other information such as the screen size and aspect ratio in a standard format. The LZW coding algorithm can be used to obtain further levels of compression. Same as text compression, but in case of image compression this works by extending the basic color table dynamically as the compressed image data is being encoded and decoded. Since each entry in the color table comprises 24 bits, in order to save memory, to represent each string of pixel values just the corresponding string of 8 bit indices to the basic color table used. GIF also allows an image to be stored and subsequently transferred over the network in an interlaced mode. This can be useful when transferring images over either low bit rate channels or the internet which provides a variable transmission rate. The Maximum compression available with GIF therefore depends on the amount of repetition is there in an image. A flat color will compress well – sometime even down to one tenth of the original file size. While a complex non repetitive image will fare worse, perhaps only saving 20% or so. Tagged Image File format (TIFF) The TIFF is also used extensively. It supports pixel resolutions up to 48 bits – 16 bit for each R, G and B and is intended for both images and digitized documents. The image, data, therefore can be stored and hence transferred over a network in a number of different formats. TIFF is the leading commercial and professional image standard and most widely supported format across all platforms – windows, MAC, UNIX. TIFF supports Layers. TIFF supports most color spaces, RGB CMYK YCbCr; etc TIFF is a flexible format with many options. The data contains tags to declare what type of data follows. Several compression formats are used with TIFF. TIFF with G3 compression is the universal standard for fax and Multiage line art documents. Capturing Camera operation ,[object Object]

More tubes (CCD) and better lens produce better pictures

Video composed of luminance and chrominance signals

Composite video combines luminance and chrominance

Component video sends signals separatelyWhen light reflected from an object passes through a video camera lens, the light is converted into an electronic signal by a special sensor called a charge coupled device (CCD). The output of the CCD is processed by the camera into a signal containing through the channels of colors RGB. There are several video standards for managing CCD output, each dealing with the amount of separation between the components of the signal. Standard video camera outputs two main elements luminance and chrominance. In production and post-production recording, the luminance and chrominance is kept separate resulting in higher quality. For broadcast and distribution the luminance and chrominance is combined into one signal. Luminance is the brightness of a pixel point. Chrominance is the color information. In color TV’s, the chrominance signals are interlaced with the luminance signals. Chrominance is made up of two parts, hue and saturation. Hue describes the actual color displayed, and saturation is the intensity of the color. Image Processing and Storage In electrical engineering and computer science, image processing is any form of signal processing for which the input is an image, such as photographs or frames of video; the output of image processing can be either an image or a set of characteristics or parameters related to the image. Most image-processing techniques involve treating the image as a two-dimensional signal and applying standard signal-processing techniques to it. Image processing usually refers to digital image processing, but optical and analog image processing are also possible. This article is about general techniques that apply to all of them. The acquisition of images (producing the input image in the first place) is referred to as imaging Among many other image processing operations are: Euclidean geometry transformations such as enlargement, reduction, and rotation Color corrections such as brightness and contrast adjustments, color mapping, color balancing, quantization, or color translation to a different color space Digital compositing or optical compositing (combination of two or more images). Used in film-making to make a quot;
mattequot;
 Interpolation, demosaicing, and recovery of a full image from a raw image format using a Bayer filter pattern Image registration, the alignment of two or more images Image differencing and morphing Image recognition, for example, extract the text from the image using optical character recognition or checkbox and bubble values using optical mark recognition Image segmentation High dynamic range imaging by combining multiple images Geometric hashing for 2-D object recognition with affine invariance Software for image editing and compression Image Editing Image editing encompasses the processes of altering images, whether they be digital photographs, traditional analog photographs, or illustrations. Graphic software programs, which can be broadly grouped into vector graphics editors, raster graphics editors, and 3d modelers, are the primary tools with which a user may manipulate, enhance, and transform images. The Basics of Image Editing Raster images are stored in a computer in the form of a grid of picture elements, or pixels. These pixels contain the image's color and brightness information. Image editors can change the pixels to enhance the image in many ways. Whereas, vector graphics software are used to create and modify vector images, which are stored as descriptions oflines, Bézier splines, and text instead of pixels. Vector images can be modified more easily, because they contain descriptions of the shapes for easy rearrangement. They are also scalable, being rasterizable at any resolution. Due to the popularity of digital cameras, image editing programs are readily available. Minimal programs, that perform such operations as rotating and cropping, are often provided within the digital camera itself, while others are returned to the user on a compact disc (CD) when images are processed at a discount store. The more powerful programs contain functionality to perform a large variety of advanced image manipulations. Popular raster-based digital image editors include Adobe Photoshop, GIMP, Corel Photo-Paint, Paint Shop Pro and Paint.NET. Image Editing and Compression Many image file formats use data compression to reduce file size and save storage space. Digital compression of images may take place in the camera, or can be done in the computer with the image editor. When images are stored in JPEG format, compression has already taken place. Both cameras and computer programs allow the user to set the level of compression. Some compression algorithms, such as those used in PNG file format, are lossless and others are lossy. The greater the compression, the more information is lost, ultimately reducing image quality or detail. Main features of Image Editing and compression softwares Mostly currently used Softwares have the following features. ,[object Object],Layers Image size alteration Cropping an image Histogram Noise reduction Removal of unwanted elements Selective color change Image orientation Perspective correction and distortion Lens correction Sharpening and softening images Selecting and merging of images Slicing of images Special effects Change color depth Contrast change and brightening Color adjustments Printing Some of these features allow the images to be compressed. Features like Image size alteration or Image cropping can improve compression of the image. Examples with Adobe Photoshop: Adobe Photoshop, or simply Photoshop, is a graphics editing program developed and published by Adobe Systems. It is the current market leader for commercial bitmap and image manipulation software. It has been described as quot;
an industry standard for graphics professionalsquot;
. Adobe Photoshop CS4 is the 11th major release of Adobe Photoshop. Photoshop can utilize the color models RGB, lab, CMYK, grayscale, binary bitmap, and duotone. Photoshop has the ability to read and write raster and vector image formats such as .EPS, .PNG, .GIF, .JPEG, and Adobe Fireworks. Image compression with Photoshop: Photoshop offers a standard JPEG compression accessible from quot;
File -> Save For Web And Devicequot;
menu.You can choose the compression level by sliding quality and then preview the result on a window.The main problem is that the better the image’s quality, the more space it will use. And more space means slow upload and download speed on the Web.Sample image compressed by Photoshop: Original PNG image.File size: 75,315 bytesJPG Compressed by Photoshop CS4.Maximum Quality (100) File size: 40,964 bytesJPG Compressed by Photoshop CS4.Low Quality (10) File size: 3847 bytes The lower the compression is done, the best the quality will be and vice versa. Image Editing and compression Softwares on the Internet There exists online image Editing softwares that allow you to implement some feature such as: Resize or Crop your images (also for animated gifs)Add Text to your images (also for animated gifs)Add Borders, or Overlay your images (also for animated gifs)Put your image in a Picture Frame or add a Mask. (also for animated gifs)Resize or Crop your images (also your animated gifs)Overlay your images with predefined animationsConvert, sharpen, reduce size of your animated gif... and much, much, more.... It is a new generation of software that do not give you all the features of powerful softwares like photoshop, but they help you to do the job of compression and gives you some editing functions. Examples of online software for image editing and compression: http://www.online-image-editor.com/ http://www.pixlr.com/editor/ Pixlr is a powerful online flash-based image editor that can be included as a LAMS activity. The editor has powerful image creation and editing features, and the interface will be familiar to anyone who has used Paint, or more advanced editors like Photoshop or GIMP. Conclusion Software for image editing and compression are various. All have the ability to transform images on different aspects depending on the purpose for which they are used by people. It is a need to have compression done for images that have to be hosted on the web. How Audio Compression works Well, there are two essential properties to audio compression. 1)Reduction of data used to represent -The zip (Huffman) type of compression where patterns are searched for in order to decrease the amount of data that needs to be stored, the net result is that you can make files smaller without getting rid of any of the data. In audio there are a few codecs that can but a popular lossless codec is Monkey's Audio codec which is quite cool. 2) Psychoacoustic models-This is the lossy part of the compression where an encoder will throw away information in order to reduce the size. This is based on a mathematical model which attempts to describe what the human ear actually hears - i.e. with the intention of disregarding any information that cannot actually be heard. (Psychoacoustic - means “the way the brain interprets sound.”) The human ear is sensitive to the range 20Hz-20Khz Silence removal- Detecting the silent portions of the signal and not representing or removing them. Masked Sounds: There are two main types of masking effects - Simultaneous Masking and Temporal Masking. Simultaneous Masking works under the principle that certain sounds can drown out other sounds when played at the same time. However, if you have two distinct sounds playing, even if you can't hear one, you have much more information. This is the kind of information that is removed and that is the principle of simultaneous masking - the removal of the sounds the brain doesn't hear because of other sounds being present. Temporal Masking works in a similar way but here the idea isn't that you can't hear one sound because of another one being similar, it's the fact that if you play one sound slightly after another one you won’t be able to hear the second one (and vice versa) Again, this is sound information that would be removed. Joint Stereo Most of the time, the left and right channels are very similar. So why bother having twice the data for most of the song when lots of it can be duplicated for each channel? This is there the Joint Stereo idea comes in. It compares the left and right channels and works out how much data it can save by making them identical and encoding the data once. This means there will be elements of your wav that are, in effect, mono. These are only elements however and it is a very useful addition for the reduction of file sizes. All forms of compressed audio use powerful algorithms to discard audio information that we can't hear. Exactly what information to throw away depends on the codec being used. The most significant in recent years is undoubtedly the psychoacoustic models used in mpeg1 layer 3 (mp3) compression. ,[object Object]

MP3 is a form of compression. It's an acronym which stands for Mpeg 1 Audio Layer 3.The Stages of MP3 Compression First, let's look at the stages that take place in compressing an audio file. For this example, the mp3 codec is described: The waveform is separated into small sections called frames (think of them as similar to video frames) and it is within each frame that the audio will be analyzed. The section is analyzed to see what frequencies are present (aka spectral analysis). These figures are then compared to tables of data in the codec that contains information of the psychoacoustic models. In the mp3 codec, these models are very advanced and a great deal of the modeling is based on the principle known as masking .Any information that matches the psychoacoustic model is retained and the rest is discarded. This is the majority of the audio compression. Depending on the bitrate, the codec uses the allotted amount of bits to store this data. Once this has taken place, the result is then passed through the lossless Huffman zip-type compression which reduces the size by another 10%. [this is why there is no point in zipping an mp3… it's already been 'zipped'] Audio Editing Audio Editors designed for use with music typically allow the user to do the following: Record audio from one or more inputs and store recordings in the computer's memory as digital audio Edit the start time, stop time, and duration of any sound on the audio timeline Fade into or out of a clip (e.g. an S-fade out during applause after a performance), or between clips (e.g. crossfading between takes) Mix multiple sound sources/tracks, combine them at various volume levels and pan from channel to channel to one or more output tracks Apply simple or advanced effects or filters, including compression, expansion, flanging, reverb, audio noise reduction and equalization to change the audio Playback sound (often after being mixed) that can be sent to one or more outputs, such as speakers, additional processors, or a recording medium Conversion between different audio file formats, or between different sound quality levels Typically these tasks can be performed in a manner that is both non-linear and non-destructive. Examples of popular Audio Editing Software: Fruity Loops (now FL Studio) Cool Edit Pro Sony Vegas Cubase Audacity MP3 Cutter Video compression refers to reducing the quantity of data used to represent digital video images, and is a combination of spatial image compression and temporal motion compensation. Another way to explain video compression is as follows: Compression is a reversible conversion (encoding) of data that contains fewer bits. This allows a more efficient storage and transmission of the data. The inverse process is called decompression (decoding). Software and hardware that can encode and decode are called decoders. Both combined form a codec and should not be confused with the terms data container or compression algorithms. Figure: Relation between codec, data containers and compression algorithms. Why is video compression used? A simple calculation shows that an uncompressed video produces an enormous amount of data: a resolution of 720x576 pixels (PAL), with a refresh rate of 25 fps and 8-bit colour depth, would require the following bandwidth: 720 x 576 x 25 x 8 + 2 x (360 x 576 x 25 x 8) = 1.66 Mb/s (luminance + chrominance) For High Definition Television (HDTV): 1920 x 1080 x 60 x 8 + 2 x (960 x 1080 x 60 x 8) = 1.99 Gb/s Even with powerful computer systems (storage, processor power, network bandwidth), such data amount cause extreme high computational demands for managing the data. Fortunately, digital video contains a great deal of redundancy. Thus it is suitable for compression, which can reduce these problems significantly. Especially lossy compression techniques deliver high compression ratios for video data. However, one must keep in mind that there is always a trade-off between data size (therefore computational time) and quality. The higher the compression ratio, the lower the size and the lower the quality. The encoding and decoding process itself also needs computational resources, which have to be taken into consideration. It makes no sense, for example for a real-time application with low bandwidth requirements, to compress the video with a computational expensive algorithm which takes too long to encode and decode the data. Compression Principles Compression is like making orange juice concentrate. Fresh oranges go in one end and concentrate comes out the other. The concentrated orange juice takes up less space, is easier to distribute, and so forth. There are different brands and types of concentrate to meet the consumers' needs or desires. Likewise, video compression takes a large file and makes it smaller. The smaller files require less hard disk space, less memory to run, and less bandwidth to play over networks or the Internet. Many compression schemes exist and have their specific strengths and weaknesses. Lossless vs. Lossy Compression There are two types of compression: •Lossless—Lossless compression preserves all the data, but makes it more compact. The movie that comes out is exactly the same quality as what went in. Lossless compression produces very high quality digital audio or video, but requires a lot of data. The drawback with Lossless compression is that it is inefficient when trying to maximize storage space or network and Internet delivery capacity (bandwidth). •Lossy—Lossy compression eliminates some of the data. Most images and sounds have more details than the eye and ear can discern. By eliminating some of these details, Lossy compression can achieve smaller files than Lossless compression. However, as the files get smaller, the reduction in quality can become noticeable. The smaller file sizes make Lossy compression ideal for placing video on a CD-ROM or delivering video over a network or the Internet. Most codecs in use today are Lossy codecs. Spatial and Temporal Compression There are two different ways to compress digital media: Spatial compression—Spatial refers to compression applied to a single frame of data. The frame is compressed independently of any surrounding frames. Compression can be Lossless or Lossy. A spatially compressed frame is often referred to as an “intraframe.”, I frame or Keyframe Temporal compression—Temporal compression identifies the differences between frames and stores only those differences. Unchanged areas are simply repeated from the previous frame(s). A temporally compressed frame is often referred to as an “interframe.” or P frame Interframe vs. Intraframe Compressed video frames are defined as interframes or intraframes. Interframes—There are codecs that are categorized as “interframe” codecs. Interframe means many frames are described based on their difference from the preceding frame. Mpeg1, Mpeg2 etc. (P frames) Intraframes—“Intraframe” codecs compress each frame separately and independent of surrounding frames (JPEG is an intraframe codec). However, interframe codecs also use intraframes. The intraframes are used as the reference frames (keyframes) for the interframes. (I frames) Generally codecs always begin with a keyframe. Each keyframe becomes the main reference frame for the following interframes. Whenever the next frame is significantly different from the previous frame, the codec compresses a new keyframe. Some video compression algorithms use both interframe and intraframe compression. For example, Motion Picture Experts Group (MPEG) uses Joint Photographic Experts Group (JPEG), which is an intrafame technique, and a separate interframe algorithm. Motion-JPEG (M-JPEG) uses only intraframe compression. Interframe Compression Interframe compression uses a system of key and delta frames to eliminate redundant information between frames. Key frames store an entire frame, and delta frames record only changes. Some implementations compress the key frames, and others don't. Either way, the key frames serve as a reference source for delta frames. Delta frames contain only pixels that are different from the key frame or from the immediately preceding delta frame. During decompression, delta frames look back to their respective reference frames to fill in missing information. Different compression techniques use different sequences of key and delta frames. For example, most video for Windows CODECs calculate interframe differences between sequential delta frames during compression. In this case, only the first delta frame relates to the key frame. Each subsequent delta frame relates to the immediately preceding delta frame. In other compression schemes, such as MPEG, all delta frames relate to the preceding key frame. All interframe compression techniques derive their effectiveness from interframe redundancy. Low-motion video sequences, such as the head and shoulders of a person, have a high degree of redundancy, which limits the amount of compression required to reduce the video to the target bandwidth. Until recently, interframe compression has addressed only pixel blocks that remained static between the delta and the key frame. Some new CODECs increase compression by tracking moving blocks of pixels from frame to frame. This technique is called motion compensation (also known as dynamic carry forwards) because the data that is carried forward from key frames is dynamic. Consider a video clip in which a person is waving an arm. If only static pixels are tracked between frames, no interframe compression occurs with respect to the moving parts of the person because those parts are not located in the same pixel blocks in both frames. If the CODEC can track the motion of the arm, the delta frame description tells the decompressor to look for particular moving parts in other pixel blocks, essentially tracking the moving part as it moves from one pixel block to another. Although dynamic carry forwards are helpful, they cannot always be implemented. In many cases, the capture board cannot scale resolution and frame rate, digitize, and hunt for dynamic carry forwards at the same time. Dynamic carry forwards typically mark the dividing line between hardware and software CODECs. Hardware CODECs, as the name implies, are usually add-on boards that provide additional hardware compression and decompression operations. The benefit of hardware CODECs is that they do not place any additional burden on the host CPU in order to execute video compression and decompression. Software CODECs rely on the host CPU and require no additional hardware. The benefit of software CODECs is that they are typically cheaper and easier to install. Because they rely on the host's CPU to perform compression and decompression, software CODECs are often limited in their capability to use techniques such as advanced tracking schemes. Intraframe Compression Intraframe compression is performed solely with reference to information within a particular frame. It is performed on pixels in delta frames that remain after interframe compression and on key frames. Although intraframe techniques are often given the most attention, overall CODEC performance relates more to interframe efficiency than intraframe efficiency. The following are the principal intraframe compression techniques: •Run Length Encoding (RLE)—A simple lossless technique originally designed for data compression and later modified for facsimile. RLE compresses an image based on quot;
runsquot;
of pixels. Although it works well on black-and-white facsimiles, RLE is not very efficient for color video, which have few long runs of identically colored pixels. •JPEG—A standard that has been adopted by two international standards organizations: the ITU (formerly CCITT) and the ISO. JPEG is most often used to compress still images using discrete cosine transform (DCT) analysis. First, DCT divides the image into 88 blocks and then converts the colors and pixels into frequency space by describing each block in terms of the number of color shifts (frequency) and the extent of the change (amplitude). Because most natural images are relatively smooth, the changes that occur most often have low amplitude values, so the change is minor. In other words, images have many subtle shifts among similar colors but few dramatic shifts between very different colors. Next, quantization and amplitude values are categorized by frequency and averaged. This is the lossy stage because the original values are permanently discarded. However, because most of the picture is categorized in the high-frequency/low-amplitude range, most of the loss occurs among subtle shifts that are largely indistinguishable to the human eye. After quantization, the values are further compressed through RLE using a special zigzag pattern designed to optimize compression of like regions within the image. At extremely high compression ratios, more high-frequency/low-amplitude changes are averaged, which can cause an entire pixel block to adopt the same color. This causes a blockiness artifact that is characteristic of JPEG-compressed images. JPEG is used as the intraframe technique for MPEG. •Vector quantization (VQ)—A standard that is similar to JPEG in that it divides the image into 88 blocks. The difference between VQ and JPEG has to do with the quantization process. VQ is a recursive, or multistep algorithm with inherently self-correcting features. With VQ, similar blocks are categorized and a reference block is constructed for each category. The original blocks are then discarded. During decompression, the single reference block replaces all of the original blocks in the category. After the first set of reference blocks is selected, the image is decompressed. Comparing the decompressed image to the original reveals many differences. To address the differences, an additional set of reference blocks is created that fills in the gaps created during the first estimation. This is the self-correcting part of the algorithm. The process is repeated to find a third set of reference blocks to fill in the remaining gaps. These reference blocks are posted in a lookup table to be used during decompression. The final step is to use lossless techniques, such as RLE, to further compress the remaining information. VQ compression is by its nature computationally intensive. However, decompression, which simply involves pulling values from the lookup table, is simple and fast. VQ is a public-domain algorithm used as the intraframe technique for both Cinepak and Indeo. NTSC NTSC is a color TV standard developed in the U.S. in 1953 by the National Television System Committee. NTSC uses a Frame consisting of 486 horizontal lines in the Active Area and a Frame rate of 29.97fps. The frame is interlaced, meaning it's composed of two individual fields (pictures) with a Field rate of 59.94fps. The term NTSC may also be used to describe any video, including digital video, formatted for playback on a NTSC TV. This generally includes any Standard Definition (SD) video with a vertical Resolution of up to 480 Pixels and a horizontal Resolution no greater than 720, which also has a Frame rate of 29.97fps. NTSC is sometimes referred to as 525/60, in reference to the total number of lines (including lines not in the Active Area) and approximate Field rate. Digital formats include only 480 of NTSC's 486 visible Scan lines due to the need to guarantee mod16 Resolution, meaning its divisible evenly by 16. PAL The PAL (Phase Alternating Line) TV standard was introduced in the early 1960's in Europe. It has better Resolution than NTSC, having 576 lines in the Active Area of the Frame. The Frame rate, however, is slightly lower at 25fps. The term PAL may also be used to describe any video, including digital video, formatted for playback on a PAL TV. This generally includes any Standard Definition (SD) video with a vertical Resolution of up to 576 Pixels and a horizontal resolution no greater than 720, which also has a Frame rate of 25fps. PAL may also be called 625/50, in reference to the total number of lines (including lines not in the Active Area) and field rate. SECAM SECAM (Sequential Couleur Avec Memoire or Sequential Colour with Memory color TV standard was introduced in the early 1960's and implemented in France. Except for the color encoding scheme, it's nearly identical to the PAL standard. SECAM uses the same 576 line Active Area as PAL, as well as nearly all other. SECAM is used in France, former French colonies and in several eastern European countries. Because of its great similarities with PAL, including the same frame rate and Active Area, all of the modern video systems, such as DVD, VCD and Super VHS use PAL internally (for storing the data in the storage media, etc) and just change the color encoding to SECAM when outputting the signal back to SECAM TV. Television color encoding systems. Countries using NTSC are shown in green. Countries using PAL are shown in blue. Countries using SECAM are shown in orange. The MPEG standards MPEG stands for Moving Picture Coding Exports Group [4]. At the same time it describes a whole family of international standards for the compression of audio-visual digital data. The most known are MPEG-1, MPEG-2 and MPEG-4, which are also formally known as ISO/IEC-11172, ISO/IEC-13818 and ISO/IEC-14496. More details about the MPEG standards can be found in [4],[5],[6]. The most important aspects are summarised as follows: The MPEG-1 Standard was published 1992 and its aim was it to provide VHS quality with a bandwidth of 1,5 Mb/s, which allowed to play a video in real time from a 1x CD-ROM. The frame rate in MPEG-1 is locked at 25 (PAL) fps and 30 (NTSC) fps respectively. Further MPEG-1 was designed to allow a fast forward and backward search and a synchronisation of audio and video. A stable behaviour, in cases of data loss, as well as low computation times for encoding and decoding was reached, which is important for symmetric applications, like video telephony. In 1994 MPEG-2 was released, which allowed a higher quality with a slightly higher bandwidth. MPEG-2 is compatible to MPEG-1. Later it was also used for High Definition Television (HDTV) and DVD, which made the MPEG-3 standard disappear completely. The frame rate is locked at 25 (PAL) fps and 30 (NTSC) fps respectively, just as in MPEG-1. MPEG-2 is more scalable than MPEG-1 and is able to play the same video in different resolutions and frame rates. MPEG-4 was released 1998 and it provided lower bit rates (10Kb/s to 1Mb/s) with a good quality. It was a major development from MPEG-2 and was designed for the use in interactive environments, such as multimedia applications and video communication. It enhances the MPEG family with tools to lower the bit-rate individually for certain applications. It is therefore more adaptive to the specific area of the video usage. For multimedia producers, MPEG-4 offers a better reusability of the contents as well as a copyright protection. The content of a frame can be grouped into object, which can be accessed individually via the MPEG-4 Syntactic Description Language (MSDL). Most of the tools require immense computational power (for encoding and decoding), which makes them impractical for most “normal, nonprofessional user” applications or real time applications. The real-time tools in MPEG-4 are already included in MPEG-1 and MPEG-2. The MPEG Compression The MPEG compression algorithm encodes the data in 5 steps: First a reduction of the resolution is done, which is followed by a motion compensation in order to reduce temporal redundancy. The next steps are the Discrete Cosine Transformation (DCT) and a quantization as it is used for the JPEG compression; this reduces the spatial redundancy (referring to human visual perception). The final step is an entropy coding using the Run Length Encoding and the Huffman coding algorithm. Step 1: Reduction of the Resolution The human eye has a lower sensibility to colour information than to dark-bright contrasts. A conversion from RGB-colour-space into YUV colour components help to use this effect for compression. The chrominance components U and V can be reduced (subsampling) to half of the pixels in horizontal direction (4:2:2), or a half of the pixels in both the horizontal and vertical (4:2:0). Figure: Depending on the subsampling, 2 or 4 pixel values of the chrominance channel can be grouped together. The subsampling reduces the data volume by 50% for the 4:2:0 and by 33% for the 4:2:2 subsampling. MPEG uses similar effects for the audio compression, which are not discussed at this point. Step 2: Motion Estimation An MPEG video can be understood as a sequence of frames. Because two successive frames of a video sequence often have small differences (except in scene changes), the MPEG-standard offers a way of reducing this temporal redundancy. It uses three types of frames: I-frames (intra), P-frames (predicted) and B-frames (bidirectional). The I-frames are “key-frames”, which have no reference to other frames and their compression is not that high. The P-frames can be predicted from an earlier I-frame or P-frame. P-frames cannot be reconstructed without their referencing frame, but they need less space than the I-frames, because only the differences are stored. The B-frames are a two directional version of the P-frame, referring to both directions (one forward frame and one backward frame). B-frames cannot be referenced by other P- or Bframes, because they are interpolated from forward and backward frames. P-frames and B-frames are called inter coded frames, whereas I-frames are known as intra coded frames. Figure:. An MPEG frame sequence with two possible references: a P-frame referring to a I-frame and a B-frame referring to two P-frames. The usage of the particular frame type defines the quality and the compression ratio of the compressed video. I-frames increase the quality (and size), whereas the usage of B-frames compresses better but also produces poorer quality. The distance between two I-frames can be seen as a measure for the quality of an MPEG-video. In practise following sequence showed to give good results for quality and compression level: IBBPBBPBBPBBIBBP. The references between the different types of frames are realised by a process called motion estimation or motion compensation. The correlation between two frames in terms of motion is represented by a motion vector. The resulting frame correlation, and therefore the pixel arithmetic difference, strongly depends on how good the motion estimation algorithm is implemented. Good estimation results in higher compression ratios and better quality of the coded video sequence. However, motion estimation is a computational intensive operation, which is often not well suited for real time applications. The following figure shows the steps involved in motion estimation, which will be explained as follows: Frame Segmentation - The Actual frame is divided into nonoverlapping blocks (macro blocks) usually 8x8 or 16x16 pixels. The smaller the block sizes are chosen, the more vectors need to be calculated; the block size therefore is a critical factor in terms of time performance, but also in terms of quality: if the blocks are too large, the motion matching is most likely less correlated. If the blocks are too small, it is probably, that the algorithm will try to match noise. MPEG uses usually block sizes of 16x16 pixels. Search Threshold - In order to minimise the number of expensive motion estimation calculations, they are only calculated if the difference between two blocks at the same position is higher than a threshold, otherwise the whole block is transmitted. Block Matching - In general block matching tries, to “stitch together” an actual predicted frame by using snippets (blocks) from previous frames. The process of block matching is the most time consuming one during encoding. In order to find a matching block, each block of the current frame is compared with a past frame within a search area. Only the luminance information is used to compare the blocks, but obviously the colour information will be included in the encoding. The search area is a critical factor for the quality of the matching. It is more likely that the algorithm finds a matching block, if it searches a larger area. Obviously the number of search operations increases quadratically, when extending the search area. Therefore too large search areas slow down the encoding process dramatically. To reduce these problems often rectangular search areas are used, which take into account, that horizontal movements are more likely than vertical ones. Prediction Error Coding - Video motions are often more complex, and a simple “shifting in 2D” is not a perfectly suitable description of the motion in the actual scene, causing so called prediction errors. The MPEG stream contains a matrix for compensating this error. After prediction the, the predicted and the original frame are compared, and their differences are coded. Obviously less data is needed to store only the differences (yellow and black regions in the figure). Vector Coding - After determining the motion vectors and evaluating the correction, these can be compressed. Large parts of MPEG videos consist of B- and P-frames as seen before, and most of them have mainly stored motion vectors. Therefore an efficient compression of motion vector data, which has usually high correlation, is desired. Block Coding - see Discrete Cosine Transform (DCT) below. Step 3: Discrete Cosine Transform (DCT) DCT allows, similar to the Fast Fourier Transform (FFT), a representation of image data in terms of frequency components. So the frame-blocks (8x8 or 16x16 pixels) can be represented as frequency components. The transformation into the frequency domain is described by the following formula: Standard Application Bit Rate The DCT is unfortunately computational very expensive and its complexity increases disproportionately (O(N 2 ) ). That is the reason why images compressed using DCT are divided into blocks. Another disadvantage of DCT is its inability to decompose a broad signal into high and low frequencies at the same time. Therefore the use of small blocks allows a description of high frequencies with less cosineterms. Figure: Visualisation of 64 basis functions (cosine frequencies) of a DCT The first entry (top left in this figure) is called the direct current-term, which is constant and describes the average grey level of the block. The 63 remaining terms are called alternating-current terms. Up to this point no compression of the block data has occurred. The data was only well-conditioned for a compression, which is done by the next two steps. Step 4: Quantization During quantization, which is the primary source of data loss, the DCT terms are divided by a quantization matrix, which takes into account human visual perception. The human eyes are more reactive to low frequencies than to high ones. Higher frequencies end up with a zero entry after quantization and the domain was reduced significantly. Where Q is the quantisation Matrix of dimension N. The way Q is chosen defines the final compression level and therefore the quality. After Quantization the DC- and AC- terms are treated separately. As the correlation between the adjacent blocks is high, only the differences between the DC-terms are stored, instead of storing all values independently. The AC-terms are then stored in a zig-zag-path with increasing frequency values. This representation is optimal for the next coding step, because same values are stored next to each other; as mentioned most of the higher frequencies are zero after division with Q. Figure: Zig-zag-path for storing the frequencies If the compression is too high, which means there are more zeros after quantization, artefacts are visible (next figure). This happens because the blocks are compressed individually with no correlation to each other. When dealing with video, this effect is even more visible, as the blocks are changing (over time) individually in the worst case. Step 5: Entropy Coding The entropy coding takes two steps: Run Length Encoding (RLE ) [2] and Huffman coding [1]. These are well known lossless compression methods, which can compress data, depending on its redundancy, by an additional factor of 3 to 4. All five Steps together Figure: Illustration of the discussed 5 steps for a standard MPEG encoding. As seen, MPEG video compression consists of multiple conversion and compression algorithms. At every step other critical compression issues occur and always form a trade-off between quality, data volume and computational complexity. However, the area of use of the video will finally decide which compression standard will be used. Most of the other compression standards use similar methods to achieve an optimal compression with best possible quality. H.261 is a 1990 ITU-T video coding standard originally designed for transmission over ISDN lines on which data rates are multiples of 64 kbit/s. It is one member of the H.26x family of video coding standards in the domain of the ITU-T Video Coding Experts Group (VCEG). The coding algorithm was designed to be able to operate at video bit rates between 40 kbit/s and 2 Mbit/s. MPEG-1: Initial video and audio compression standard. Later used as the standard for Video CD, and includes the popular Layer 3 (MP3) audio compression format. MPEG-2: Transport, video and audio standards for broadcast-quality television. Used for over-the-air digital television ATSC, DVB and ISDB, digital satellite TV services like Dish Network, digital cable television signals, and (with slight modifications[citation needed]) for DVDs. MPEG-3: Originally designed for HDTV, but abandoned when it was discovered that MPEG-2 (with extensions) was sufficient for HDTV. (Do not confuse with MP3, which is MPEG-1 Layer 3.) MPEG-4: Expands MPEG-1 to support video/audio quot;
objectsquot;
, 3D content, low bitrate encoding and support for Digital Rights Management. Several new (newer than MPEG-2 Video) higher efficiency video standards are included (an alternative to MPEG-2 Video). These are explained as follows: MPEG-1 It is a standard for lossy compression of video and audio. It is designed to compress VHS-quality raw digital video and CD audio down to 1.5 Mbit/s (26:1 and 6:1 compression ratios respectively) without excessive quality loss, making video CDs, digital cable/satellite TV and digital audio broadcasting (DAB) possible. Today, MPEG-1 has become the most widely compatible lossy audio/video format in the world, and is used in a large number of products and technologies. Perhaps the best-known part of the MPEG-1 standard is the MP3 audio format it introduced. The MPEG-1 standard is published as ISO/IEC 11172 - Information technology—Coding of moving pictures and associated audio for digital storage media at up to about 1,5 Mbit/s. The standard consists of the following five Parts: Systems (storage and synchronization of video, audio, and other data together) Video (compressed video content) Audio (compressed audio content) Conformance testing (testing the correctness of implementations of the standard) Reference software (example software showing how to encode and decode according to the standard) Applications Most popular computer software for video playback includes MPEG-1 decoding, in addition to any other supported formats. The popularity of MP3 audio has established a massive installed base of hardware that can play back MPEG-1 Audio (all three layers). quot;
Virtually all digital audio devicesquot;
can play back MPEG-1 Audio.[30] Many millions have been sold to-date. Before MPEG-2 became widespread, many digital satellite/cable TV services used MPEG-1 exclusively.[9][19] The widespread popularity of MPEG-2 with broadcasters means MPEG-1 is playable by most digital cable and satellite set-top boxes, and digital disc and tape players, due to backwards compatibility. MPEG-1 is the exclusive video and audio format used on Video CD (VCD), the first consumer digital video format, and still a very popular format around the world. The Super Video CD standard, based on VCD, uses MPEG-1 audio exclusively, as well as MPEG-2 video. The DVD-Video format uses MPEG-2 video primarily, but MPEG-1 support is explicitly defined in the standard. Color space Before encoding video to MPEG-1, the color-space is transformed to Y'CbCr (Y'=Luma, Cb=Chroma Blue, Cr=Chroma Red). Luma (brightness, resolution) is stored separately from chroma (color, hue, phase) and even further separated into red and blue components. The chroma is also subsampled to 4:2:0, meaning it is reduced by one half vertically and one half horizontally, to just one quarter the resolution of the vid

multimedia

Recomendados

Recomendados

Mais conteúdo relacionado

Mais procurados

Mais procurados (20)

Destaque

Destaque (15)

Semelhante a multimedia

Semelhante a multimedia (20)

multimedia