SlideShare uma empresa Scribd logo
1 de 13
Baixar para ler offline
Notes on a Standard:
    UNICODE
     Elena-Oana Tabaranu
  elena.tabaranu@info.uaic.ro
           UAIC, Iasi
Plan
●   Introduction
●   Design Goals
●   Code Points and Characters
●   Encoding Forms, UTF-32, UTF-16, UTF-8
●   Conclusion




                                            2
Introduction
●   UNIversal character enCODing system
●   Unicode = universal character encoding scheme
    for written characters and text
●   Advantages
    ●   Consistent way of encoding multilingual text
    ●   Data stability instead of proliferating character sets
    ●   Encode ALL characters used for the written languages (> 1
        million characters can be encoded)
    ●   Creates a foundation for global software

                                                                    3
Design Principles




                    4
Characters, not Glyphs
●   The Unicode Standard draws a distinction between
    characters and glyphs.
●   Characters are the abstract representations of the
    smallest components of written language that have
    semantic value.




                                                         5
Logical Order
       ●   The order in which
           Unicode text is stored
           in the memory
           representation is
           called logical order
       ●   Unicode Standard
           includes characters to
           explicitly specify
           changes in direction
           when necessary
                                6
Code Points and Characters
●   Abstract characters are
    encoded internally as
    numbers
●   Codespace: 0 to 10FFFF16
    => 1,114,112 code points
    available
●   Abstract character -> code
    point
●   Example:
    U+0061 latin small letter a

                                     7
Encoding Forms
●   Encoding forms specify how
    each code point is to be
    expressed as a sequence of
    one or more code unit (8-bit,
    16-bit, 32-bit units)
●   Encoding forms for Unicode
    characters: UTF-8, UTF-16,
    UTF-32
●   Each form can be efficiently
    transformed into either of the
    other two without any loss of
    data

                                     8
UTF-32
●   The simplest Unicode encoding form
●   Each Unicode code point is represented directly
    by a single 32-bit code unit (fixed-width)
●   restricted to representation of code points in
    the range 0..10FFFF16
●   Example:
    U+10000 is represented as <00010000>
●   preferred encoding form for processing
    characters on most Unix platforms                 9
UTF-16
●   Code unit values often change from the code
    point value => conversion required
●   Variable-width encoding:
    ➢   U+0000..U+FFFF are represented as a single 16-bit
        code unit
    ➢   U+10000..U+10FFFF are represented as pairs of
        16-bit code units (surrogate pairs)
●   Optimized for BMP (Basic Multilingual Plain) =
    majority of common-use characters for all
    modern scripts of the world
                                                        10
UTF-8
●   UTF-8 encodes each character (code point) in 1 to 4
    octets (8-bit bytes), with the single–octet encoding
    used only for the 128 US-ASCII characters
    ●   U+0000 to U+007F → 1 byte
    ●   above → 2, 3, up to 4 bytes
●   Backwards compatible with ASCII
●   Standard for XML (XHTML) documents
●   Example:
    U+10000 is represented as <F0 90 80 80>

                                                           11
Conclusion
●   The Unicode Standard is a superset of all
    characters in widespread use today.
●   It contains characters from major international
    and national standards (e.g. the SGML
    standard) as well as prominient industry
    character sets (e.g. industy code from Apple,
    Adobe, Fujitsu, etc).
●   Responds to changing industry demands by
    encoding important new characters (e.g. the €
    sign )
                                                      12
Questions?
●   Thank You!




                              13

Mais conteúdo relacionado

Mais procurados

Computer graphics chapter 4
Computer graphics chapter 4Computer graphics chapter 4
Computer graphics chapter 4PrathimaBaliga
 
Attributes of output Primitive
Attributes of output Primitive Attributes of output Primitive
Attributes of output Primitive SachiniGunawardana
 
Multimedia system, Architecture & Databases
Multimedia system, Architecture & DatabasesMultimedia system, Architecture & Databases
Multimedia system, Architecture & DatabasesHarshita Ved
 
Random scan displays and raster scan displays
Random scan displays and raster scan displaysRandom scan displays and raster scan displays
Random scan displays and raster scan displaysSomya Bagai
 
Introduction to computer graphics
Introduction to computer graphics Introduction to computer graphics
Introduction to computer graphics Priyodarshini Dhar
 
Chapter 4 : SOUND
Chapter 4 : SOUNDChapter 4 : SOUND
Chapter 4 : SOUNDazira96
 
Multimedia tools (sound)
Multimedia tools (sound)Multimedia tools (sound)
Multimedia tools (sound)dhruv patel
 
Graphics inputdevices
Graphics inputdevicesGraphics inputdevices
Graphics inputdevicesBCET
 
Segments in Graphics
Segments in GraphicsSegments in Graphics
Segments in GraphicsRajani Thite
 
Multimedia operating system
Multimedia operating systemMultimedia operating system
Multimedia operating systemHome
 
Mpeg video compression
Mpeg video compressionMpeg video compression
Mpeg video compressionGem WeBlog
 
Graphics software standards
Graphics software standardsGraphics software standards
Graphics software standardsAnkit Garg
 
itft-Decision making and branching in java
itft-Decision making and branching in javaitft-Decision making and branching in java
itft-Decision making and branching in javaAtul Sehdev
 

Mais procurados (20)

Computer graphics chapter 4
Computer graphics chapter 4Computer graphics chapter 4
Computer graphics chapter 4
 
Attributes of output Primitive
Attributes of output Primitive Attributes of output Primitive
Attributes of output Primitive
 
Multimedia system, Architecture & Databases
Multimedia system, Architecture & DatabasesMultimedia system, Architecture & Databases
Multimedia system, Architecture & Databases
 
Random scan displays and raster scan displays
Random scan displays and raster scan displaysRandom scan displays and raster scan displays
Random scan displays and raster scan displays
 
Text compression
Text compressionText compression
Text compression
 
Introduction to computer graphics
Introduction to computer graphics Introduction to computer graphics
Introduction to computer graphics
 
Sound
SoundSound
Sound
 
Chapter 4 : SOUND
Chapter 4 : SOUNDChapter 4 : SOUND
Chapter 4 : SOUND
 
Multimedia chapter 5
Multimedia chapter 5Multimedia chapter 5
Multimedia chapter 5
 
Multimedia chapter 2
Multimedia chapter 2Multimedia chapter 2
Multimedia chapter 2
 
Multimedia tools (sound)
Multimedia tools (sound)Multimedia tools (sound)
Multimedia tools (sound)
 
Computer graphics realism
Computer graphics realismComputer graphics realism
Computer graphics realism
 
Graphics inputdevices
Graphics inputdevicesGraphics inputdevices
Graphics inputdevices
 
Segments in Graphics
Segments in GraphicsSegments in Graphics
Segments in Graphics
 
Multimedia operating system
Multimedia operating systemMultimedia operating system
Multimedia operating system
 
Multimedia
MultimediaMultimedia
Multimedia
 
Huffman Coding
Huffman CodingHuffman Coding
Huffman Coding
 
Mpeg video compression
Mpeg video compressionMpeg video compression
Mpeg video compression
 
Graphics software standards
Graphics software standardsGraphics software standards
Graphics software standards
 
itft-Decision making and branching in java
itft-Decision making and branching in javaitft-Decision making and branching in java
itft-Decision making and branching in java
 

Destaque

Multimedia Presentation
Multimedia PresentationMultimedia Presentation
Multimedia PresentationRajesh R. Nair
 
Lecture # 3
Lecture # 3Lecture # 3
Lecture # 3Mr SMAK
 
Multimedia file formats
Multimedia file formatsMultimedia file formats
Multimedia file formatsShruti Garg
 
Hypertext,hypermedia and multimedia
Hypertext,hypermedia and multimediaHypertext,hypermedia and multimedia
Hypertext,hypermedia and multimediagaflores2
 
Hypertext, hypermedia and multimedia
Hypertext, hypermedia and multimediaHypertext, hypermedia and multimedia
Hypertext, hypermedia and multimediafernandadavalos2566
 
multimedia data and file format
multimedia data and file formatmultimedia data and file format
multimedia data and file formatALOK SAHNI
 
MultiMedia dbms
MultiMedia dbmsMultiMedia dbms
MultiMedia dbmsTech_MX
 
Multimedia data and file format
Multimedia data and file formatMultimedia data and file format
Multimedia data and file formatNiketa Jain
 
Optical Character Recognition( OCR )
Optical Character Recognition( OCR )Optical Character Recognition( OCR )
Optical Character Recognition( OCR )Karan Panjwani
 
File formats and its types
File formats and its typesFile formats and its types
File formats and its typesAnu Garg
 
Pulse modulation
Pulse modulationPulse modulation
Pulse modulationstk_gpg
 
Text-Elements of multimedia
Text-Elements of multimediaText-Elements of multimedia
Text-Elements of multimediaVanitha Chandru
 

Destaque (20)

Multimedia Presentation
Multimedia PresentationMultimedia Presentation
Multimedia Presentation
 
Pablo 9r multimedia
Pablo 9r multimediaPablo 9r multimedia
Pablo 9r multimedia
 
Unit 4 and 5
Unit 4 and 5Unit 4 and 5
Unit 4 and 5
 
Hypertext: An Overview
Hypertext: An OverviewHypertext: An Overview
Hypertext: An Overview
 
CLI313
CLI313CLI313
CLI313
 
Lecture # 3
Lecture # 3Lecture # 3
Lecture # 3
 
Unicode (and Python)
Unicode (and Python)Unicode (and Python)
Unicode (and Python)
 
Multimedia Technology - text
Multimedia Technology - textMultimedia Technology - text
Multimedia Technology - text
 
Ch04
Ch04Ch04
Ch04
 
Multimedia file formats
Multimedia file formatsMultimedia file formats
Multimedia file formats
 
Hypertext,hypermedia and multimedia
Hypertext,hypermedia and multimediaHypertext,hypermedia and multimedia
Hypertext,hypermedia and multimedia
 
Hypertext, hypermedia and multimedia
Hypertext, hypermedia and multimediaHypertext, hypermedia and multimedia
Hypertext, hypermedia and multimedia
 
multimedia data and file format
multimedia data and file formatmultimedia data and file format
multimedia data and file format
 
MultiMedia dbms
MultiMedia dbmsMultiMedia dbms
MultiMedia dbms
 
Multimedia data and file format
Multimedia data and file formatMultimedia data and file format
Multimedia data and file format
 
Optical Character Recognition( OCR )
Optical Character Recognition( OCR )Optical Character Recognition( OCR )
Optical Character Recognition( OCR )
 
File formats and its types
File formats and its typesFile formats and its types
File formats and its types
 
Pulse modulation
Pulse modulationPulse modulation
Pulse modulation
 
File formats
File formatsFile formats
File formats
 
Text-Elements of multimedia
Text-Elements of multimediaText-Elements of multimedia
Text-Elements of multimedia
 

Semelhante a Notes on a Standard: Unicode

Unicode Fundamentals
Unicode Fundamentals Unicode Fundamentals
Unicode Fundamentals SamiHsDU
 
Abap slide class4 unicode-plusfiles
Abap slide class4 unicode-plusfilesAbap slide class4 unicode-plusfiles
Abap slide class4 unicode-plusfilesMilind Patil
 
Xml For Dummies Chapter 6 Adding Character(S) To Xml
Xml For Dummies   Chapter 6 Adding Character(S) To XmlXml For Dummies   Chapter 6 Adding Character(S) To Xml
Xml For Dummies Chapter 6 Adding Character(S) To Xmlphanleson
 
Comprehasive Exam - IT
Comprehasive Exam - ITComprehasive Exam - IT
Comprehasive Exam - ITguest6ddfb98
 
Data encryption and tokenization for international unicode
Data encryption and tokenization for international unicodeData encryption and tokenization for international unicode
Data encryption and tokenization for international unicodeUlf Mattsson
 
Unicode and Legacy Representations of Emoji (IUC 36)
Unicode and Legacy Representations of Emoji (IUC 36)Unicode and Legacy Representations of Emoji (IUC 36)
Unicode and Legacy Representations of Emoji (IUC 36)David Yonge-Mallo
 
Character encoding and unicode format
Character encoding and unicode formatCharacter encoding and unicode format
Character encoding and unicode formatAdityaSharma1452
 
Lecture_ASCII and Unicode.ppt
Lecture_ASCII and Unicode.pptLecture_ASCII and Unicode.ppt
Lecture_ASCII and Unicode.pptAlula Tafere
 
Overview of character encoding
Overview of character encodingOverview of character encoding
Overview of character encodingDuy Lâm
 
The Good, the Bad, and the Ugly: What Happened to Unicode and PHP 6
The Good, the Bad, and the Ugly: What Happened to Unicode and PHP 6The Good, the Bad, and the Ugly: What Happened to Unicode and PHP 6
The Good, the Bad, and the Ugly: What Happened to Unicode and PHP 6Andrei Zmievski
 
Type हिन्दी in Java
Type हिन्दी in JavaType हिन्दी in Java
Type हिन्दी in Javagagmansa
 
Type हिन्दी in Java
Type हिन्दी in JavaType हिन्दी in Java
Type हिन्दी in Javagagmansa
 
Unicode for Small Children (and Children at Heart)
Unicode for Small Children (and Children at Heart)Unicode for Small Children (and Children at Heart)
Unicode for Small Children (and Children at Heart)Feihong Hsu
 

Semelhante a Notes on a Standard: Unicode (20)

Unicode Fundamentals
Unicode Fundamentals Unicode Fundamentals
Unicode Fundamentals
 
Unicode
UnicodeUnicode
Unicode
 
Abap slide class4 unicode-plusfiles
Abap slide class4 unicode-plusfilesAbap slide class4 unicode-plusfiles
Abap slide class4 unicode-plusfiles
 
Unicode basics in python
Unicode basics in pythonUnicode basics in python
Unicode basics in python
 
Xml For Dummies Chapter 6 Adding Character(S) To Xml
Xml For Dummies   Chapter 6 Adding Character(S) To XmlXml For Dummies   Chapter 6 Adding Character(S) To Xml
Xml For Dummies Chapter 6 Adding Character(S) To Xml
 
Comprehasive Exam - IT
Comprehasive Exam - ITComprehasive Exam - IT
Comprehasive Exam - IT
 
Data encryption and tokenization for international unicode
Data encryption and tokenization for international unicodeData encryption and tokenization for international unicode
Data encryption and tokenization for international unicode
 
Using unicode with php
Using unicode with phpUsing unicode with php
Using unicode with php
 
Unicode and Legacy Representations of Emoji (IUC 36)
Unicode and Legacy Representations of Emoji (IUC 36)Unicode and Legacy Representations of Emoji (IUC 36)
Unicode and Legacy Representations of Emoji (IUC 36)
 
Using unicode with php
Using unicode with phpUsing unicode with php
Using unicode with php
 
Character encoding and unicode format
Character encoding and unicode formatCharacter encoding and unicode format
Character encoding and unicode format
 
Lecture_ASCII and Unicode.ppt
Lecture_ASCII and Unicode.pptLecture_ASCII and Unicode.ppt
Lecture_ASCII and Unicode.ppt
 
Overview of character encoding
Overview of character encodingOverview of character encoding
Overview of character encoding
 
The Good, the Bad, and the Ugly: What Happened to Unicode and PHP 6
The Good, the Bad, and the Ugly: What Happened to Unicode and PHP 6The Good, the Bad, and the Ugly: What Happened to Unicode and PHP 6
The Good, the Bad, and the Ugly: What Happened to Unicode and PHP 6
 
Type हिन्दी in Java
Type हिन्दी in JavaType हिन्दी in Java
Type हिन्दी in Java
 
Type हिन्दी in Java
Type हिन्दी in JavaType हिन्दी in Java
Type हिन्दी in Java
 
Unicode Primer for the Uninitiated
Unicode Primer for the UninitiatedUnicode Primer for the Uninitiated
Unicode Primer for the Uninitiated
 
Uncdtalk
UncdtalkUncdtalk
Uncdtalk
 
Unicode & PHP6
Unicode & PHP6Unicode & PHP6
Unicode & PHP6
 
Unicode for Small Children (and Children at Heart)
Unicode for Small Children (and Children at Heart)Unicode for Small Children (and Children at Heart)
Unicode for Small Children (and Children at Heart)
 

Mais de Elena-Oana Tabaranu

Recunoasterea organizatiilor in postarile pe Tweeter
Recunoasterea organizatiilor in postarile pe TweeterRecunoasterea organizatiilor in postarile pe Tweeter
Recunoasterea organizatiilor in postarile pe TweeterElena-Oana Tabaranu
 
SXSW 2012 JavaScript MythBusters
SXSW 2012 JavaScript MythBustersSXSW 2012 JavaScript MythBusters
SXSW 2012 JavaScript MythBustersElena-Oana Tabaranu
 
A Survey on Unsupervised Graph-based Word Sense Disambiguation
A Survey on Unsupervised Graph-based Word Sense DisambiguationA Survey on Unsupervised Graph-based Word Sense Disambiguation
A Survey on Unsupervised Graph-based Word Sense DisambiguationElena-Oana Tabaranu
 
A Survey on Unsupervised Graph-based Word Sense Disambiguation
A Survey on Unsupervised Graph-based Word Sense DisambiguationA Survey on Unsupervised Graph-based Word Sense Disambiguation
A Survey on Unsupervised Graph-based Word Sense DisambiguationElena-Oana Tabaranu
 
Graph-based Word Sense Disambiguation
Graph-based Word Sense DisambiguationGraph-based Word Sense Disambiguation
Graph-based Word Sense DisambiguationElena-Oana Tabaranu
 
Semantic Tagging for the XWiki Platform with Zemanta and DBpedia
Semantic Tagging for the XWiki Platform with Zemanta and DBpediaSemantic Tagging for the XWiki Platform with Zemanta and DBpedia
Semantic Tagging for the XWiki Platform with Zemanta and DBpediaElena-Oana Tabaranu
 
Miscarea "NoSQL" in contextul Web-ului social/semantic
Miscarea "NoSQL" in contextul Web-ului social/semanticMiscarea "NoSQL" in contextul Web-ului social/semantic
Miscarea "NoSQL" in contextul Web-ului social/semanticElena-Oana Tabaranu
 
Folosirea instumentului Zemanta in recomandarea de continut
Folosirea instumentului Zemanta in recomandarea de continutFolosirea instumentului Zemanta in recomandarea de continut
Folosirea instumentului Zemanta in recomandarea de continutElena-Oana Tabaranu
 

Mais de Elena-Oana Tabaranu (9)

Recunoasterea organizatiilor in postarile pe Tweeter
Recunoasterea organizatiilor in postarile pe TweeterRecunoasterea organizatiilor in postarile pe Tweeter
Recunoasterea organizatiilor in postarile pe Tweeter
 
SXSW 2012 JavaScript MythBusters
SXSW 2012 JavaScript MythBustersSXSW 2012 JavaScript MythBusters
SXSW 2012 JavaScript MythBusters
 
A Survey on Unsupervised Graph-based Word Sense Disambiguation
A Survey on Unsupervised Graph-based Word Sense DisambiguationA Survey on Unsupervised Graph-based Word Sense Disambiguation
A Survey on Unsupervised Graph-based Word Sense Disambiguation
 
A Survey on Unsupervised Graph-based Word Sense Disambiguation
A Survey on Unsupervised Graph-based Word Sense DisambiguationA Survey on Unsupervised Graph-based Word Sense Disambiguation
A Survey on Unsupervised Graph-based Word Sense Disambiguation
 
Graph-based Word Sense Disambiguation
Graph-based Word Sense DisambiguationGraph-based Word Sense Disambiguation
Graph-based Word Sense Disambiguation
 
Semantic Tagging for the XWiki Platform with Zemanta and DBpedia
Semantic Tagging for the XWiki Platform with Zemanta and DBpediaSemantic Tagging for the XWiki Platform with Zemanta and DBpedia
Semantic Tagging for the XWiki Platform with Zemanta and DBpedia
 
Miscarea "NoSQL" in contextul Web-ului social/semantic
Miscarea "NoSQL" in contextul Web-ului social/semanticMiscarea "NoSQL" in contextul Web-ului social/semantic
Miscarea "NoSQL" in contextul Web-ului social/semantic
 
Folosirea instumentului Zemanta in recomandarea de continut
Folosirea instumentului Zemanta in recomandarea de continutFolosirea instumentului Zemanta in recomandarea de continut
Folosirea instumentului Zemanta in recomandarea de continut
 
Adobe Flex Framework
Adobe Flex FrameworkAdobe Flex Framework
Adobe Flex Framework
 

Último

Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 

Último (20)

Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 

Notes on a Standard: Unicode

  • 1. Notes on a Standard: UNICODE Elena-Oana Tabaranu elena.tabaranu@info.uaic.ro UAIC, Iasi
  • 2. Plan ● Introduction ● Design Goals ● Code Points and Characters ● Encoding Forms, UTF-32, UTF-16, UTF-8 ● Conclusion 2
  • 3. Introduction ● UNIversal character enCODing system ● Unicode = universal character encoding scheme for written characters and text ● Advantages ● Consistent way of encoding multilingual text ● Data stability instead of proliferating character sets ● Encode ALL characters used for the written languages (> 1 million characters can be encoded) ● Creates a foundation for global software 3
  • 5. Characters, not Glyphs ● The Unicode Standard draws a distinction between characters and glyphs. ● Characters are the abstract representations of the smallest components of written language that have semantic value. 5
  • 6. Logical Order ● The order in which Unicode text is stored in the memory representation is called logical order ● Unicode Standard includes characters to explicitly specify changes in direction when necessary 6
  • 7. Code Points and Characters ● Abstract characters are encoded internally as numbers ● Codespace: 0 to 10FFFF16 => 1,114,112 code points available ● Abstract character -> code point ● Example: U+0061 latin small letter a 7
  • 8. Encoding Forms ● Encoding forms specify how each code point is to be expressed as a sequence of one or more code unit (8-bit, 16-bit, 32-bit units) ● Encoding forms for Unicode characters: UTF-8, UTF-16, UTF-32 ● Each form can be efficiently transformed into either of the other two without any loss of data 8
  • 9. UTF-32 ● The simplest Unicode encoding form ● Each Unicode code point is represented directly by a single 32-bit code unit (fixed-width) ● restricted to representation of code points in the range 0..10FFFF16 ● Example: U+10000 is represented as <00010000> ● preferred encoding form for processing characters on most Unix platforms 9
  • 10. UTF-16 ● Code unit values often change from the code point value => conversion required ● Variable-width encoding: ➢ U+0000..U+FFFF are represented as a single 16-bit code unit ➢ U+10000..U+10FFFF are represented as pairs of 16-bit code units (surrogate pairs) ● Optimized for BMP (Basic Multilingual Plain) = majority of common-use characters for all modern scripts of the world 10
  • 11. UTF-8 ● UTF-8 encodes each character (code point) in 1 to 4 octets (8-bit bytes), with the single–octet encoding used only for the 128 US-ASCII characters ● U+0000 to U+007F → 1 byte ● above → 2, 3, up to 4 bytes ● Backwards compatible with ASCII ● Standard for XML (XHTML) documents ● Example: U+10000 is represented as <F0 90 80 80> 11
  • 12. Conclusion ● The Unicode Standard is a superset of all characters in widespread use today. ● It contains characters from major international and national standards (e.g. the SGML standard) as well as prominient industry character sets (e.g. industy code from Apple, Adobe, Fujitsu, etc). ● Responds to changing industry demands by encoding important new characters (e.g. the € sign ) 12
  • 13. Questions? ● Thank You! 13