Localizing your apps
for multibyte languages
Ken ISHIMOTO (K’s Room Japan)
Localizing your apps
• Part I - WebObjects
• Part II - What is a multibyte Language
• Part III - Combine multibyte Language with WebObjects
• Part IV - multibyte & WOdka
Part I - WebObjects
• Eclipse
• Ant build
• Properties (to make WebObjects ready)
• Database
Eclipse
• Set your workspace encoding to UTF-8.
If you don't, you can run into
all kinds of problems;
non-English text in source
files can even break
compilation.
Ant build
• Set the encoding of your Ant compile task to UTF-8
Properties in your app
• These are the properties that we use:
• file.encoding=UTF-8
• er.extensions.ERXApplication.DefaultEncoding=UTF-8
• er.extensions.ERXApplication.DefaultMessageEncoding=UTF-8
• er.extensions.ERXLocalizationEditor.encoding=UTF-8
• wodka.Application.LanguageEncoding={Japanese = UTF-8; }
CSS
@charset "UTF-8";
Javascript
<script type="text/javascript" charset="UTF-8">
Database - MySQL
• Append &useUnicode=true&characterEncoding=UTF-8 to the JDBC connection URL
• Don't forget to create the database with the ‘utf8’ character set
Database - FrontBase
Nothing to do, just works
Part II - What is a multibyte
Language (Japanese)
• Basics
• Alphabet (how Japanese works)
• Encoding (which encoding to use)
Basics
• This is a sample page from a book.
• A book is read from right to left, so
you open it where you would usually close it.
• You read from right to left and
from top to bottom.
• This can be very complex for word-processing
software, so XX Word isn't a good choice for
writing books or magazines. That's also one reason
why there are Japanese text editors that can
do this.
Spaces between Words
• This is a pen.
• これはペンです。
• Today we have good weather in Tokyo.
• 今日、東京はとてもいい天気です。
Another big problem can be
that there are no spaces
between words.
yen symbol vs backslash
• If you're familiar with the Japanese keyboard, the backslash key (\) is replaced by the symbol for the Yen (¥).
Way back when, we did a Japanese version of BRIEF, so I was familiar with this phenomenon: paths would
be separated by Yen symbols, but everything worked as expected.
• set the URL_A_chars to “$+!’,?;&@=#%><{}[]\"~`^\\|*()”
• completely failed to compile, because it looked like this:
• set the URL_A_chars to “$+!’,?;&@=#%><{}[]¥"~`^¥¥|*()”
• and ¥ didn't escape as you'd expect.
• If I create a new file, either on my system or on the English-only system, I can use any font, type the \ key,
and get the \ glyph. Side by side in this file I can use exactly the same font, but when I type the \ symbol I
get the ¥ glyph.
Japanese Alphabet
• 漢字 Kanji (Chinese characters)
• ひらがな Hiragana (Japanese Alphabet)
• カタカナ Katakana (Alphabet for Foreign Words)
• ローマ字 Romaji (English characters)
• 記号 Kigo (Sign)
• 絵文字 Emoji (Smilies)
• 外字 Gaiji (Self-made characters)
• 振り仮名 Furigana
漢字 Kanji
• The complexity of these characters
• The vast majority of these are not in common use in either Japan or China; as discussed below,
approximately 2,000 to 3,000 characters are in common use in Japan, a few thousand more find occasional
use, and a total of about 13,000 characters can be encoded in various Japanese Industrial Standards for
kanji.
• Kyōiku kanji: the Kyōiku kanji (教育漢字, "education kanji") are 1,006 characters that Japanese children
learn in elementary school.
• Jōyō kanji: the Jōyō kanji (常用漢字, "regular-use kanji") are 2,136 characters consisting of all the Kyōiku
kanji, plus 1,130 additional kanji taught in junior high and high school. In publishing, characters outside this
category are often given furigana.
• Jinmeiyō kanji Since September 27, 2004, the Jinmeiyō kanji (人名用漢字, "kanji for use in personal
Encoding of 生
• UNICODE : 751F
• UTF-8 : E7 94 9F
• Shift-JIS : 90B6
A character does not always fit in 16 bits; today a
multibyte character can need up to 32 bits (four
bytes in UTF-8). So it is difficult to say that a name
field in a database only needs varchar(20). That would be enough for
some languages, but if the limit counts UTF-8 bytes it holds only a few
characters, which is not enough.
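To make the byte counts above concrete, here is a minimal Java sketch (not from the original slides). It prints the UTF-8 bytes of 生 (U+751F) and compares character counts with byte counts for a four-kanji name (山田太郎) and for one emoji outside the 16-bit range. The class and method names are invented for illustration, and the non-ASCII characters are written as Unicode escapes so the file compiles regardless of source encoding.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of WebObjects or Wonder.
class EncodingDemo {

    // Formats bytes as upper-case hex pairs separated by spaces.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Number of bytes the string occupies once encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String sei = "\u751F";                    // the kanji discussed above
        String name = "\u5C71\u7530\u592A\u90CE"; // a four-kanji name
        String emoji = "\uD83D\uDE00";            // U+1F600 as a surrogate pair

        // Three bytes for a single kanji.
        System.out.println(hex(sei.getBytes(StandardCharsets.UTF_8)));
        // Four characters, twelve bytes.
        System.out.println(name.length() + " chars, " + utf8Length(name) + " bytes");
        // One visible symbol, two UTF-16 units, four UTF-8 bytes.
        System.out.println(emoji.length() + " UTF-16 units, "
                + emoji.codePointCount(0, emoji.length()) + " code point, "
                + utf8Length(emoji) + " bytes");
    }
}
```

If a column limit is counted in bytes, it is `utf8Length` rather than `String.length()` that has to fit; whether a given database counts bytes or characters depends on the database and its settings.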
生
Pronunciation : 生
• ON : Chinese-style reading for kanji.
ショウ, ショウ_ジル, ショウ_ズル, ジョウ, セイ, ゼイ
Shou, Shou_jiru, Shou_zuru, Jou, Sei, Zei
• KUN : Japanese-style reading for kanji.
イ_カス, イ_キ, イ_キル, イ_ケル, ウ_マレ, ウ_マレル, ウ_ム, ウブ, ウマ_レ, ウマ_レル, オ
_イ, オ_ウ, キ, ナ_ス, ナ_ル, ナマ, ハ_エ, ハ_エル, ハ_ヤス, バ_エ
i_kasu, i_ki, i_kiru, i_keru, u_mare, u_mareru, u_mu ....
• Special reading.
アイ, イク, イケ, エ, オ, サ, ナリ, ニュウ, ヌク, フ, ブ, ム_ス, ヨイ
ai, iku, ike, e, o, sa, nari, nyuu, nuku, fu, bu, mu_su, yoi
• In Chinese this is read: Shēng
Difference between countries
手紙
Japanese: letter / Chinese: toilet paper
Japanese and Chinese are very different,
even if some kanji look
the same.
It is like English and French: they share
some letters, but can you read and
understand both?
Character : 生
• 生きる Ikiru ..... live, living , alive
• 生クリーム Nama kuri-mu ..... fresh cream
• 生涯 Shougai ..... lifetime
• 生命 Seimei ..... life
• 生む Umu ..... to give birth
We can see that one kanji can have many
different meanings and pronunciations.
So it makes no sense at all to sort
database records by kanji.
People wouldn't find the data where they
expect it, and the sort would only be a
Unicode sort that has no meaning.
Every character is very easy to
use and access; no special
treatment is necessary.
ひらがな Hiragana
• Hiragana is a Japanese syllabary,
one basic component of the
Japanese writing system.
• Hiragana is used to write native
words for which there are no
kanji, including grammatical
particles, and suffixes such as さん
~san "Mr., Mrs., Miss, Ms.".
Every character is very easy to
use and access; no special
treatment is necessary.
カタカナ Katakana
• Katakana is a Japanese syllabary, one
component of the Japanese writing system.
• In contrast to the hiragana syllabary, which is
used for those Japanese language words and
grammatical inflections which kanji does not
cover, the katakana syllabary is primarily used
for transcription of foreign language words into
Japanese.
Every character is very easy to use
and access; no special
treatment is necessary.
Half-width kana 半角カナ
• Half-width kana (半角カナ Hankaku kana) are katakana characters displayed at half their normal width (a
2:1 aspect ratio), instead of the usual square (1:1) aspect ratio.
• Half-width kana were used in the early days of Japanese computing, to allow Japanese characters to be
displayed on the same grid as monospaced fonts of Latin characters.
• Half-width hiragana or kanji were not used.
• Half-width kana characters are not generally used today, but find some use in specific settings, such as cash
register displays, on shop receipts, and Japanese digital television and DVD subtitles.
注意! (Caution!)
These kinds of characters can be a pain, so a good program will
convert half-width katakana to full-width.
String s1 = "ｱﾅﾀ";   // half-width (hankaku) katakana
String s2 = "アナタ"; // full-width (zenkaku) katakana
ERXStringUtilitiesEXTENDED.changeHanKatakanaToZenkakuKatakana(s1);
// RESULT = "アナタ"
s1.equalsIgnoreCase(s2);
// RESULT = false
s1.length();
// RESULT = 3
s2.length();
// RESULT = 3
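For readers without the ERXStringUtilitiesEXTENDED helper, the same half-width to full-width conversion can be sketched with the JDK alone. This is an assumption-laden alternative, not the author's implementation: it uses `java.text.Normalizer` with the NFKC form, which folds half-width katakana into the standard katakana block. Note that NFKC also folds other compatibility characters (for example full-width digits and letters), so it is broader than a katakana-only conversion. The class and method names are invented, and the characters are written as Unicode escapes (U+FF71 U+FF85 U+FF80 is ｱﾅﾀ, U+30A2 U+30CA U+30BF is アナタ).

```java
import java.text.Normalizer;

// Hypothetical helper for illustration; not part of Wonder.
class KanaWidthDemo {

    // NFKC normalization maps half-width katakana to full-width katakana
    // (and also folds other compatibility forms).
    static String normalizeWidth(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        String half = "\uFF71\uFF85\uFF80"; // half-width a-na-ta
        String full = "\u30A2\u30CA\u30BF"; // full-width a-na-ta

        System.out.println(half.equals(full));                 // different code points
        System.out.println(normalizeWidth(half).equals(full)); // equal after folding

        // A half-width kana plus its separate voiced mark becomes one character.
        String halfGa = "\uFF76\uFF9E";
        System.out.println(halfGa.length() + " -> " + normalizeWidth(halfGa).length());
    }
}
```

The last case matters for length checks: half-width text with voiced marks gets shorter when converted.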
NUMBER 数字
• As with spaces, numbers also have
variations:
• single byte (Hankaku)
• double byte (Zenkaku)
• Chinese character version (Kanji)
• Hankaku (single) - 0123456789
• Zenkaku (double) - ０１２３４５６７８９
• Kanji - 0 is 零 or 〇 / 1 is 一 or 壱 / 2 is 二 or 弐 / 3 is 三 or 参 /
4 to 9 are 四 五 六 七 八 九
The easy way to go is to convert every number
to single-byte before
storing it in the database.
isDigitsOnly
String s1 = "0123456789";           // single-byte (hankaku)
String s2 = "０１２３４５６７８９"; // double-byte (zenkaku)
ERXStringUtilities.isDigitsOnly(s1);
// RESULT = true
ERXStringUtilities.isDigitsOnly(s2);
// RESULT = true
s1.equalsIgnoreCase(s2);
// RESULT = false
replace double to single
String s = "０１２３４５６７８９";
ERXStringUtilitiesEXTENDED.changeZenkakuNumberToHanNumber(s);
// RESULT = "0123456789"
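The digit conversion above can also be sketched with plain JDK calls, which may help when the Wonder extension is not available. This is an illustrative sketch under that assumption, not the library's code: `Character.isDigit` accepts full-width digits (U+FF10 to U+FF19) as well as ASCII ones, and `Character.digit` returns their numeric value. Kanji numerals such as 一 (U+4E00) are not Unicode decimal digits, so this sketch leaves them untouched. The class and method names are invented.

```java
// Hypothetical helper for illustration; not part of Wonder.
class DigitWidthDemo {

    // Replaces every Unicode decimal digit in the Basic Multilingual Plane
    // with the matching ASCII digit; all other characters are kept as-is.
    static String toAsciiDigits(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isDigit(c)) {
                sb.append((char) ('0' + Character.digit(c, 10)));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String zenkaku = "\uFF10\uFF11\uFF12\uFF13"; // full-width 0 1 2 3
        System.out.println(toAsciiDigits(zenkaku));
        System.out.println(toAsciiDigits(zenkaku).equals("0123"));
    }
}
```

Because `Character.isDigit` is true for digits of every script, a "digits only" check that passes does not mean the string is safe to hand to `Integer.parseInt` consumers that expect ASCII in a stored value; converting first keeps the stored data uniform.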
LETTER 英字
• Everybody loves the simple 26
letters, which in most schools take
2 years to learn.
• In some countries there are
variations, like German with ÜÖÄ.
• For each letter there is a double-byte
counterpart:
• ‘U’ (single-byte) and ‘Ｕ’ (double-byte)
The easy way to go is to convert every letter
to single-byte before
storing it in the database.
String s1 = "BC";   // single-byte
String s2 = "ＢＣ"; // double-byte
s1.equalsIgnoreCase(s2);
// RESULT = false
s1 = ERXStringUtilitiesEXTENDED.changeZenkakuEijiToHanEiji(s2);
// RESULT = "BC"
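As a rough picture of what such a conversion involves, here is a JDK-only sketch; it is not how ERXStringUtilitiesEXTENDED is implemented, and the names are invented. It relies on one fact about Unicode: the full-width forms U+FF01 to U+FF5E mirror the printable ASCII characters U+0021 to U+007E at a fixed offset of 0xFEE0, so both letters and signs can be folded with a single subtraction.

```java
// Hypothetical helper for illustration; not part of Wonder.
class AsciiWidthDemo {

    private static final int OFFSET = 0xFEE0; // distance between full-width and ASCII forms

    // Converts full-width letters, digits and signs to their ASCII counterparts.
    static String toHalfWidthAscii(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= '\uFF01' && c <= '\uFF5E') {
                sb.append((char) (c - OFFSET));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String zenkaku = "\uFF22\uFF23";            // full-width B and C
        System.out.println("BC".equals(zenkaku));  // not equal before conversion
        System.out.println(toHalfWidthAscii(zenkaku));
    }
}
```

The full-width space (U+3000) sits outside this range and needs separate handling.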
Sign 記号
• For each sign there is a double-byte
counterpart:
• ‘!’ (single-byte) and ‘！’ (double-byte)
The easy way to go is to convert every sign to
single-byte before storing it in
the database.
String s1 = "!@#$%^&*()";            // single-byte
String s2 = "！＠＃＄％＾＆＊（）"; // double-byte
s1 = ERXStringUtilitiesEXTENDED.changeZenkakuKigouToHanKigou(s2);
// RESULT = "!@#$%^&*()"
SPACE スペース
• String a = " ";  // single-byte space (U+0020)
• String b = "　"; // double-byte (zenkaku) space (U+3000)
The easy way to go is to convert every space
to single-byte before
storing it in the database.
trim
// head and tail are 3 single-byte space chars
String s = "   A B C   ";
s.trim();
// RESULT = "A B C"
ERXStringUtilities.trimString(s);
// RESULT = "A B C"
ERXStringUtilitiesEXTENDED.trimStringWithZenkaku(s);
// RESULT = "A B C"
better trim
// head and tail are 3 Japanese zenkaku (double-byte) space chars
String s = "　　　A B C　　　";
s.trim();
// RESULT = "　　　A B C　　　" (unchanged)
ERXStringUtilities.trimString(s);
// RESULT = "　　　A B C　　　" (unchanged)
ERXStringUtilitiesEXTENDED.trimStringWithZenkaku(s);
// RESULT = "A B C"
remove Space between chars
// between A and B are 2 single spaces + 2 double spaces + 2 single spaces
String s = "A  　　  B";
s.replace(" ", "");
// RESULT = "A　　B"
ERXStringUtilities.removeCharacters(s, " ");
// RESULT = "A　　B"
ERXStringUtilitiesEXTENDED.changeZenkakuToHanKakaku(s).replace(" ", "");
// RESULT = "AB"
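The trimming behaviour shown on these slides can be reproduced with the JDK alone. The sketch below is an illustration, not the Wonder implementation, and its names are invented. It relies on two documented facts: `String.trim()` only strips characters up to U+0020, while `Character.isWhitespace` also returns true for the ideographic (zenkaku) space U+3000.

```java
// Hypothetical helper for illustration; not part of Wonder.
class WideSpaceDemo {

    // Trims leading and trailing whitespace, including the zenkaku space U+3000.
    static String trimWide(String s) {
        int start = 0;
        int end = s.length();
        while (start < end && Character.isWhitespace(s.charAt(start))) {
            start++;
        }
        while (end > start && Character.isWhitespace(s.charAt(end - 1))) {
            end--;
        }
        return s.substring(start, end);
    }

    // Removes every whitespace character, single- or double-byte.
    static String removeSpaces(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (!Character.isWhitespace(c)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String padded = "\u3000\u3000\u3000A B C\u3000\u3000\u3000";
        System.out.println(padded.trim().length());    // zenkaku spaces survive trim()
        System.out.println(trimWide(padded).length()); // only "A B C" remains
        System.out.println(removeSpaces("A  \u3000\u3000  B"));
    }
}
```

On Java 11 and later, `String.strip()` uses the same whitespace definition and could replace `trimWide`.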
絵文字 Emoji (Smilies)
• Emoji (絵文字, Japanese pronunciation: [emodʑi]) is the Japanese term for the
ideograms or smileys used in Japanese electronic messages and webpages.
• Emoji pictograms by au are specified using the IMG tag. SoftBank Mobile emoji
are wrapped between SI/SO escape sequences, and support colors and
animation. DoCoMo's emoji are the most compact to transmit, while au's
version is more flexible and based on open standards.
If you are creating a CMS or a data-entry application such as a blog,
forum, or anything else, you will have to deal with
emoji. Japanese people love to use them.
WOEmoji
Last year at WOWODC 2012, I spoke about
SnoWOman CMS, which includes a framework named
WOEmoji. With this framework it is easy to
convert emoji for saving to the database, and they
automatically work on Windows or Android
devices too.
Version 2 of this framework (in progress) can
also convert to the new open-standard emoji that is
under development right now in Japan.
I am a paying supporter of this project and am waiting
for delivery, so that WOEmoji can be updated.
外字 Gaiji (Self-made characters)
• Gaiji (外字), literally meaning "external characters", are kanji that are not represented in existing
Japanese encoding systems. These include variant forms of common kanji that need to be
represented alongside the more conventional glyph in reference works, and can include non-kanji
symbols as well.
Win XP: it had only a few thousand kanji, and kanji that were not available were not easy to use,
so people started creating their own;
the look was also sometimes different.
Win Vista: you can see the font is a little different.
But you have to buy the 1,500-character gaiji package for about USD 500.
OS X: works out of the box and it is free.
Gaiji 外字 Editor
• This is an old gaiji editor, which let the user
make their own characters, and
that was nice. It started with the first
version of Windows. But now with the
Internet there is a problem, because
many people notice that
such a character can be seen only on
the one machine it was made on. After sending it
by mail or entering it into a
database, it looks different on every
other machine. So you need to strip out
these characters and give feedback
not to use them.
ERXStringUtilitiesEXTENDED.delete_ModelDependenceCharacters(true, s, 200, false, false);
Because I don't have a Windows machine here, I wasn't able to create a sample string,
but there is a command for deleting characters in that range.
Furigana 振り仮名
• Furigana (振り仮名) is a Japanese reading aid, consisting of smaller kana, or syllabic characters, printed
next to a kanji (ideographic character) or other character to indicate its pronunciation. It is typically used
to clarify rare, nonstandard or ambiguous readings, or in children's or learners' materials.
Encoding
• UTF-8
• EUC-JP
• Shift JIS
• ISO/IEC 2022
• and some more ...
UTF-8
• UTF-8 (UCS Transformation Format, 8-bit) is a variable-width encoding that can represent every
character in the Unicode character set. It was designed for backward compatibility with ASCII and to avoid
the complications of endianness and byte order marks in UTF-16 and UTF-32.
We now use UTF-8 for every project. With it you are
mostly safe and do not have to care about other
encodings, but...
EUC-JP
• EUC-JP: Extended Unix Code (EUC) is a multibyte character encoding system used primarily for Japanese, Korean,
and simplified Chinese.
• The structure of EUC is based on the ISO-2022 standard, which specifies a way to represent character
sets containing a maximum of 94 characters, or 8836 (94^2) characters, or 830584 (94^3) characters, as
sequences of 7-bit codes. Only ISO-2022 compliant character sets can have EUC forms. Up to four coded
character sets (referred to as G0, G1, G2, and G3 or as code sets 0, 1, 2, and 3) can be represented with
the EUC scheme. G0 is almost always an ISO-646 compliant coded character set (e.g. US-ASCII/KS X
1003/ISO 646:KR in EUC-KR and US-ASCII/the lower half of JIS X 0201 in EUC-JP) that is invoked on GL
(i.e. with the most significant bit cleared).
If you have to work with some Windows machines, it
can happen that you have to import data that is
encoded with this encoding.
In my experience, I have never used it.
Shift JIS
• Shift JIS (Shift Japanese Industrial Standards, also SJIS, MIME name Shift_JIS) is a character encoding for the
Japanese language, originally developed by a Japanese company called ASCII Corporation in conjunction
with Microsoft and standardized as JIS X 0208 Appendix 1.
This is the most used encoding in Japan, and you can
be sure that if you get data from an existing
database or have to connect to a database, you
will have to deal with it.
We did a lot of SJIS to UTF-8 conversion in the past.
ISO/IEC 2022
• ISO/IEC 2022 (Information technology: Character code structure and extension techniques) is an ISO
standard (equivalent to the ECMA standard ECMA-35) specifying
• a technique for including multiple character sets in a single character encoding system, and
• a technique for representing these character sets in both 7 and 8 bit systems using the same encoding.
You only have to deal with this if you build
mailing solutions, but I really don't worry about it
anymore; JavaMail works just fine.
Localization ローカライズ
• Localization of your App
• Localization Data
• Sorting
Localization of your App
ERXLocalizer
// Writing components and code with ERXLocalizer makes your life very easy.
// There are so many things you can do with it, so get comfortable with it.
// Localized String from Code
ERXLocalizer.defaultLocalizer().localizedStringForKey("Nav.Main");
// Localized String in HTML
<wo:str value = "$localizer.Nav.Main" />
<wo:localized value="Nav.Main" />
* This is a bad example because I am using the power of the ‘dark force’, inline bindings. You shouldn't do that,
* but I always use it. Sorry, I am a bad guy.
.strings
In your app's ‘Resources’ folder, create a folder named with the language name + ‘.lproj’.
Make the file a plist of key-value pairs
and save it as
UTF-8 rather than UTF-16.
With UTF-8 it is easier to read, and git commits can also be viewed.
Localization of Data
1. Attributes in Entity
2. Set data in the edit page
3. Display the attribute
depending on the localizer
[[eo]].name_en()
or
[[eo]].name_ja()
or
[[eo]].valueForKey("name")
Sorting
Sorting 1
• name (how it is written)
• furigana (how it is pronounced)
Sorting 2
• Person 1: 森 in 漢字 Kanji (Chinese characters), もり in ひらがな Hiragana or カタカナ Katakana (Japanese Alphabet): Mr. Mori
• Person 2: 林 in Kanji, はやし in Hiragana or Katakana: Mr. Hayashi
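A small Java sketch of the idea (not taken from the slides): each person carries both the written name and its furigana, and the list is sorted on the furigana. The `Person` class, the comparators, and the third sample person 青木 (あおき, Aoki) are invented for illustration. Plain `String` comparison is used here as a simplification; it follows code-point order, which for basic hiragana is close to dictionary order, and a `java.text.Collator` for a Japanese locale may be preferable in real applications.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical example classes; not part of WebObjects or WOdka.
class FuriganaSortDemo {

    static final class Person {
        final String name;     // how it is written (kanji)
        final String furigana; // how it is pronounced (hiragana)

        Person(String name, String furigana) {
            this.name = name;
            this.furigana = furigana;
        }
    }

    static final Comparator<Person> BY_NAME = Comparator.comparing((Person p) -> p.name);
    static final Comparator<Person> BY_FURIGANA = Comparator.comparing((Person p) -> p.furigana);

    // Sample data: Mori, Hayashi, and an invented third person, Aoki.
    static List<Person> sample() {
        return Arrays.asList(
                new Person("\u68EE", "\u3082\u308A"),             // Mori
                new Person("\u6797", "\u306F\u3084\u3057"),       // Hayashi
                new Person("\u9752\u6728", "\u3042\u304A\u304D")); // Aoki
    }

    // Returns the written names in the order produced by the given comparator.
    static List<String> namesSortedBy(List<Person> people, Comparator<Person> order) {
        List<Person> copy = new ArrayList<>(people);
        copy.sort(order);
        List<String> names = new ArrayList<>();
        for (Person p : copy) {
            names.add(p.name);
        }
        return names;
    }

    public static void main(String[] args) {
        // Kanji order is just Unicode order: Hayashi, Mori, Aoki.
        System.out.println(namesSortedBy(sample(), BY_NAME));
        // Furigana order matches the reading: Aoki, Hayashi, Mori.
        System.out.println(namesSortedBy(sample(), BY_FURIGANA));
    }
}
```

In a database this corresponds to storing a furigana column next to the name column and ordering by the furigana column.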
WOdka improvements
• Language-switching
WOdkaLanguageEnums
• Language name
• Locale Code
• Date format + 24 hours setting
• Data for Flag information
WOdkaCountryEnums
• Country name
• code2 : ISO Code for Country
• code3 : ISO Code for Country
• money : ERXMoneyEnums
• language : WOdkaLanguageEnums
• telephone code
• tax : tax info
• zip : zip format
• company Mailing Format
• family Mailing Format
• Localized words : male, female, sexMale, sexFemale
• flag : Path to Flag-data
• continent : ERXContinentEnums
• EU : ERXEuropeanUnionsEnums
family Mailing Format
"[S][CR][T][_][F][_][L]"
"[L] [F]様"
s = sex
t = title
f = first name
l = last name
cr = next line
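How such a format string might be expanded can be sketched as follows. This is a guess at the mechanism, not WOdka's actual code: the class, the method, and the sample values are invented, and the meaning of the `[_]` token is not given on the slide, so the sketch assumes it stands for a single space. Text outside brackets, such as the honorific 様 (U+69D8), is copied through unchanged.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical formatter for illustration; not part of WOdka.
class MailingFormatDemo {

    // Expands [S], [T], [F], [L], [CR] and [_] tokens in the pattern.
    static String format(String pattern, String sex, String title,
                         String firstName, String lastName) {
        Map<String, String> values = new HashMap<>();
        values.put("S", sex);
        values.put("T", title);
        values.put("F", firstName);
        values.put("L", lastName);
        values.put("CR", "\n");
        values.put("_", " "); // assumption: the slide does not define [_]

        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < pattern.length()) {
            int close = pattern.charAt(i) == '[' ? pattern.indexOf(']', i) : -1;
            String key = close > i
                    ? pattern.substring(i + 1, close).toUpperCase(Locale.ROOT)
                    : null;
            if (key != null && values.containsKey(key)) {
                out.append(values.get(key));
                i = close + 1;
            } else {
                out.append(pattern.charAt(i)); // literal text is kept as-is
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(format("[S][CR][T][_][F][_][L]", "Mr.", "Dr.", "Ken", "Ishimoto"));
        System.out.println(format("[L] [F]\u69D8", "", "", "Ken", "Ishimoto"));
    }
}
```

Keeping the pattern per country lets the same person data be rendered family-name-first with an honorific for Japan and given-name-first elsewhere.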
Thanks to
• Masahiko TANI - A10 Objects Inc., (Japan)
• Hiroyuki FUKUI - Astonish Create (Japan)
Special Thanks to
• Paul YU - Green orchid llc (USA)
Thank You
WOWODC
2013

Mais conteúdo relacionado

Mais procurados

Red beetle car top view powerpoint presentation slides ppt templates
Red beetle car top view powerpoint presentation slides ppt templatesRed beetle car top view powerpoint presentation slides ppt templates
Red beetle car top view powerpoint presentation slides ppt templates
SlideTeam.net
 
Pickup brown truck side view powerpoint presentation templates
Pickup brown truck side view powerpoint presentation templatesPickup brown truck side view powerpoint presentation templates
Pickup brown truck side view powerpoint presentation templates
SlideTeam.net
 
Green beetle car top view powerpoint presentation templates
Green beetle car top view powerpoint presentation templatesGreen beetle car top view powerpoint presentation templates
Green beetle car top view powerpoint presentation templates
SlideTeam.net
 
4 door red car side view powerpoint presentation templates
4 door red car side view powerpoint presentation templates4 door red car side view powerpoint presentation templates
4 door red car side view powerpoint presentation templates
SlideTeam.net
 
Red truck top view powerpoint presentation slides ppt templates
Red truck top view powerpoint presentation slides ppt templatesRed truck top view powerpoint presentation slides ppt templates
Red truck top view powerpoint presentation slides ppt templates
SlideTeam.net
 
4 door blue car side view powerpoint presentation templates
4 door blue car side view powerpoint presentation templates4 door blue car side view powerpoint presentation templates
4 door blue car side view powerpoint presentation templates
SlideTeam.net
 
Blue truck top view powerpoint presentation templates
Blue truck top view powerpoint presentation templatesBlue truck top view powerpoint presentation templates
Blue truck top view powerpoint presentation templates
SlideTeam.net
 
Yellow truck top view powerpoint presentation templates
Yellow truck top view powerpoint presentation templatesYellow truck top view powerpoint presentation templates
Yellow truck top view powerpoint presentation templates
SlideTeam.net
 
Pickup brown truck top view powerpoint presentation slides ppt templates
Pickup brown truck top view powerpoint presentation slides ppt templatesPickup brown truck top view powerpoint presentation slides ppt templates
Pickup brown truck top view powerpoint presentation slides ppt templates
SlideTeam.net
 
Pickup brown truck top view powerpoint presentation templates
Pickup brown truck top view powerpoint presentation templatesPickup brown truck top view powerpoint presentation templates
Pickup brown truck top view powerpoint presentation templates
SlideTeam.net
 

Mais procurados (12)

Red beetle car top view powerpoint presentation slides ppt templates
Red beetle car top view powerpoint presentation slides ppt templatesRed beetle car top view powerpoint presentation slides ppt templates
Red beetle car top view powerpoint presentation slides ppt templates
 
Pickup brown truck side view powerpoint presentation templates
Pickup brown truck side view powerpoint presentation templatesPickup brown truck side view powerpoint presentation templates
Pickup brown truck side view powerpoint presentation templates
 
Green beetle car top view powerpoint presentation templates
Green beetle car top view powerpoint presentation templatesGreen beetle car top view powerpoint presentation templates
Green beetle car top view powerpoint presentation templates
 
Green beetle car top view powerpoint presentation slides ppt templates
Green beetle car top view powerpoint presentation slides ppt templatesGreen beetle car top view powerpoint presentation slides ppt templates
Green beetle car top view powerpoint presentation slides ppt templates
 
4 door red car side view powerpoint presentation templates
4 door red car side view powerpoint presentation templates4 door red car side view powerpoint presentation templates
4 door red car side view powerpoint presentation templates
 
Red truck top view powerpoint presentation slides ppt templates
Red truck top view powerpoint presentation slides ppt templatesRed truck top view powerpoint presentation slides ppt templates
Red truck top view powerpoint presentation slides ppt templates
 
4 door blue car side view powerpoint presentation templates
4 door blue car side view powerpoint presentation templates4 door blue car side view powerpoint presentation templates
4 door blue car side view powerpoint presentation templates
 
Blue truck top view powerpoint presentation templates
Blue truck top view powerpoint presentation templatesBlue truck top view powerpoint presentation templates
Blue truck top view powerpoint presentation templates
 
Yellow truck top view powerpoint presentation templates
Yellow truck top view powerpoint presentation templatesYellow truck top view powerpoint presentation templates
Yellow truck top view powerpoint presentation templates
 
Pickup brown truck top view powerpoint presentation slides ppt templates
Pickup brown truck top view powerpoint presentation slides ppt templatesPickup brown truck top view powerpoint presentation slides ppt templates
Pickup brown truck top view powerpoint presentation slides ppt templates
 
Pickup brown truck top view powerpoint presentation templates
Pickup brown truck top view powerpoint presentation templatesPickup brown truck top view powerpoint presentation templates
Pickup brown truck top view powerpoint presentation templates
 
Timberline team
Timberline teamTimberline team
Timberline team
 

Semelhante a Localizing your apps for multibyte languages

Fuzzy search on plone & search for east asian language
Fuzzy search on plone & search for east asian languageFuzzy search on plone & search for east asian language
Fuzzy search on plone & search for east asian language
Manabu Terada
 
textprocessingboth.pptx
textprocessingboth.pptxtextprocessingboth.pptx
textprocessingboth.pptx
bdiot
 

Semelhante a Localizing your apps for multibyte languages (20)

Design with japanese characters 151104
Design with japanese characters 151104Design with japanese characters 151104
Design with japanese characters 151104
 
Development of TeXShop - The Past and the Future (TUG 2013)
Development of TeXShop - The Past and the Future (TUG 2013)Development of TeXShop - The Past and the Future (TUG 2013)
Development of TeXShop - The Past and the Future (TUG 2013)
 
This talk lasts 三十分钟
This talk lasts 三十分钟This talk lasts 三十分钟
This talk lasts 三十分钟
 
Modeless Japanese Input Method
Modeless Japanese Input MethodModeless Japanese Input Method
Modeless Japanese Input Method
 
Exploring Natural Language Processing in Ruby
Exploring Natural Language Processing in RubyExploring Natural Language Processing in Ruby
Exploring Natural Language Processing in Ruby
 
Common Challenges of Japanese – English Translation
Common Challenges of Japanese – English TranslationCommon Challenges of Japanese – English Translation
Common Challenges of Japanese – English Translation
 
Fuzzy search on plone & search for east asian language
Fuzzy search on plone & search for east asian languageFuzzy search on plone & search for east asian language
Fuzzy search on plone & search for east asian language
 
LocJAM April 2014
LocJAM April 2014LocJAM April 2014
LocJAM April 2014
 
Foreign Languages for Humans and Computers
Foreign Languages for Humans and ComputersForeign Languages for Humans and Computers
Foreign Languages for Humans and Computers
 
Chinese basics and translation guide
Chinese basics and translation guideChinese basics and translation guide
Chinese basics and translation guide
 
State of CJK issues of LibreOffice,2020 edition
State of CJK issues of LibreOffice,2020 editionState of CJK issues of LibreOffice,2020 edition
State of CJK issues of LibreOffice,2020 edition
 
Episode 6 write it on paper pdf
Episode 6 write it on paper pdfEpisode 6 write it on paper pdf
Episode 6 write it on paper pdf
 
Learning Japanese
Learning JapaneseLearning Japanese
Learning Japanese
 
Learning Japanese Hiragana and Katakana_ Workbook and Practice Sheets ( PDFDr...
Learning Japanese Hiragana and Katakana_ Workbook and Practice Sheets ( PDFDr...Learning Japanese Hiragana and Katakana_ Workbook and Practice Sheets ( PDFDr...
Learning Japanese Hiragana and Katakana_ Workbook and Practice Sheets ( PDFDr...
 
State of CJK issues of LibreOffice, 2018 edition
State of CJK issues of LibreOffice,  2018 editionState of CJK issues of LibreOffice,  2018 edition
State of CJK issues of LibreOffice, 2018 edition
 
Colourful japanese 3
Colourful japanese 3Colourful japanese 3
Colourful japanese 3
 
textprocessingboth.pptx
textprocessingboth.pptxtextprocessingboth.pptx
textprocessingboth.pptx
 
Internationalizing Your Apps
Internationalizing Your AppsInternationalizing Your Apps
Internationalizing Your Apps
 
Lesson 5(slide share)
Lesson 5(slide share)Lesson 5(slide share)
Lesson 5(slide share)
 
State of CJK issues of LibreOffice, 2019 edition
State of CJK issues of LibreOffice, 2019 editionState of CJK issues of LibreOffice, 2019 edition
State of CJK issues of LibreOffice, 2019 edition
 

Mais de WO Community

Mais de WO Community (20)

KAAccessControl
KAAccessControlKAAccessControl
KAAccessControl
 
In memory OLAP engine
In memory OLAP engineIn memory OLAP engine
In memory OLAP engine
 
Using Nagios to monitor your WO systems
Using Nagios to monitor your WO systemsUsing Nagios to monitor your WO systems
Using Nagios to monitor your WO systems
 
Build and deployment
Build and deploymentBuild and deployment
Build and deployment
 
High availability
High availabilityHigh availability
High availability
 
Reenabling SOAP using ERJaxWS
Reenabling SOAP using ERJaxWSReenabling SOAP using ERJaxWS
Reenabling SOAP using ERJaxWS
 
Chaining the Beast - Testing Wonder Applications in the Real World
Chaining the Beast - Testing Wonder Applications in the Real WorldChaining the Beast - Testing Wonder Applications in the Real World
Chaining the Beast - Testing Wonder Applications in the Real World
 
D2W Stateful Controllers
D2W Stateful ControllersD2W Stateful Controllers
D2W Stateful Controllers
 
Deploying WO on Windows
Deploying WO on WindowsDeploying WO on Windows
Deploying WO on Windows
 
Unit Testing with WOUnit
Unit Testing with WOUnitUnit Testing with WOUnit
Unit Testing with WOUnit
 
Life outside WO
Life outside WOLife outside WO
Life outside WO
 
Apache Cayenne for WO Devs
Apache Cayenne for WO DevsApache Cayenne for WO Devs
Apache Cayenne for WO Devs
 
Advanced Apache Cayenne
Advanced Apache CayenneAdvanced Apache Cayenne
Advanced Apache Cayenne
 
Migrating existing Projects to Wonder
Migrating existing Projects to WonderMigrating existing Projects to Wonder
Migrating existing Projects to Wonder
 
iOS for ERREST - alternative version
iOS for ERREST - alternative versioniOS for ERREST - alternative version
iOS for ERREST - alternative version
 
iOS for ERREST
iOS for ERRESTiOS for ERREST
iOS for ERREST
 
"Framework Principal" pattern
"Framework Principal" pattern"Framework Principal" pattern
"Framework Principal" pattern
 
Filtering data with D2W
Filtering data with D2W Filtering data with D2W
Filtering data with D2W
 
WOver
WOverWOver
WOver
 
WOdka
WOdkaWOdka
WOdka
 


Localizing your apps for multibyte languages

  • 1. Localizing your apps for multibyte languages Ken ISHIMOTO (K’s Room Japan)
  • 2. Localizing your apps • Part I - WebObjects • Part II - What is a multibyte Language • Part III - Combine multibyte Language with WebObjects • Part IV - multibyte & WOdka
  • 4. Part I - WebObjects • Eclipse • Ant build • Properties (to make WebObjects ready) • Database
  • 5. Eclipse • Set your workspace to UTF-8. If you do not, you can get all kinds of problems; non-English text in the source code can even break compilation.
  • 6. Ant build • Set the encoding of your Ant compile task to UTF-8
  • 7. Properties in your app • These are the properties that we use • file.encoding=UTF-8 • er.extensions.ERXApplication.DefaultEncoding=UTF-8 • er.extensions.ERXApplication.DefaultMessageEncoding=UTF-8 • er.extensions.ERXLocalizationEditor.encoding=UTF-8 • wodka.Application.LanguageEncoding={Japanese = UTF-8; }
  • 10. Database - MySQL • Append &useUnicode=true&characterEncoding=UTF-8 to the connection URL, and don’t forget to create a ‘utf8’ database
  • 11. Database - FrontBase • Nothing to do, it just works
  • 12. Localizing your apps • Part I - WebObjects • Part II - What is a multibyte Language • Part III - Combine multibyte Language with WebObjects • Part IV - multibyte & WOdka
  • 13. Part II - What is a multibyte Language (Japanese) • Basics • Alphabet (how Japanese works) • Encoding (which encoding to use)
  • 14. Basics • This is a sample page from a book. • A book is read from right to left, so you open it where you would usually close it. • You read from right to left and from top to bottom. • This can be very complex for word-processing software, so XX Word isn’t a good choice for writing books or magazines. That is also one reason why there are Japanese text editors that can handle this.
  • 16. Spaces between words • This is a pen. • これはペンです。 • Today we have good weather in Tokyo. • 今日、東京はとてもいい天気です。 • Another big problem can be that there are no spaces between words.
  • 17. Yen symbol vs. backslash • If you’re familiar with the Japanese keyboard, the backslash key (\) is replaced by the symbol for the Yen (¥). Way back when, we did a Japanese version of BRIEF, so I was familiar with this phenomenon: paths would be separated by Yen symbols, but everything worked as expected. • set the URL_A_chars to “$+!’,?;&@=#%><{}[]\"~`^\\|*()” • completely failed to compile, because it looked like this: • set the URL_A_chars to “$+!’,?;&@=#%><{}[]¥"~`^¥¥|*()” • and ¥ didn’t escape as you’d expect. • If I create a new file, either on my system or the English-only system, I can use any font and type the \ key and I get the \ glyph. Side by side in this file I can use exactly the same font, but when I type the symbol I get the ¥ glyph.
  • 18. Japanese Alphabet • 漢字 Kanji (Chinese characters) • ひらがな Hiragana (Japanese Alphabet) • カタカナ Katakana (Alphabet for Foreign Words) • ローマ字 Romaji (English characters) • 記号 Kigo (Sign) • 絵文字 Emoji (Smilies) • 外字 Gaiji (Self-made characters) • 振り仮名 Furigana
  • 20. 漢字 Kanji • The complexity of these characters • The vast majority of these are not in common use in either Japan or China; as discussed below, approximately 2,000 to 3,000 characters are in common use in Japan, a few thousand more find occasional use, and a total of about 13,000 characters can be encoded in various Japanese Industrial Standards for kanji. • Kyōiku kanji: the Kyōiku kanji (教育漢字, "education kanji") are 1,006 characters that Japanese children learn in elementary school. • Jōyō kanji: the Jōyō kanji (常用漢字, "regular-use kanji") are 2,136 characters consisting of all the Kyōiku kanji, plus 1,130 additional kanji taught in junior high and high school. In publishing, characters outside this category are often given furigana. • Jinmeiyō kanji: since September 27, 2004, the Jinmeiyō kanji (人名用漢字, "kanji for use in personal
  • 21. Encoding of 生 • Unicode: U+751F • UTF-8: E7 94 9F • Shift-JIS: 90B6 • A character is not limited to 16 bits: in UTF-8 a single code point takes up to 4 bytes (32 bits), and combined character sequences can be longer still. So it is risky to declare a name field in the database as only VARCHAR(20). That would be enough for some languages, but where the limit is counted in bytes, UTF-8 text may fit only a few characters, which is not enough.
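
To make the storage point above concrete, here is a minimal Java sketch (not from the slides) that reports how many bytes a string needs in a given encoding. The class name `EncodedLength` and the sample strings are assumptions chosen for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodedLength {
    /** Number of bytes the text occupies in the given charset. */
    static int byteLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    /** Hex dump of the encoded bytes, separated by spaces. */
    static String hex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String kanji = "\u751F";        // 生
        String tokyo = "\u6771\u4EAC";  // 東京
        System.out.println(hex(kanji, StandardCharsets.UTF_8));        // E7 94 9F
        System.out.println(byteLength("Tokyo", StandardCharsets.UTF_8)); // 5
        System.out.println(byteLength(tokyo, StandardCharsets.UTF_8));   // 6
    }
}
```

Two kanji already need six bytes in UTF-8, so a byte-counted 20-unit column would hold at most six such characters.
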
  • 22. Pronunciation : 生 • ON : Chinese-style reading for kanji. ショウ, ショウ_ジル, ショウ_ズル, ジョウ, セイ, ゼイ / shou, shou_jiru, shou_zuru, jou, sei, zei • KUN : Japanese-style reading for kanji. イ_カス, イ_キ, イ_キル, イ_ケル, ウ_マレ, ウ_マレル, ウ_ム, ウブ, ウマ_レ, ウマ_レル, オ_イ, オ_ウ, キ, ナ_ス, ナ_ル, ナマ, ハ_エ, ハ_エル, ハ_ヤス, バ_エ / i_kasu, i_ki, i_kiru, i_keru, u_mare, u_mareru, u_mu .... • Special readings. アイ, イク, イケ, エ, オ, サ, ナリ, ニュウ, ヌク, フ, ブ, ム_ス, ヨイ / ai, iku, ike, e, o, sa, nari, nyuu, nuku, fu, bu, mu_su, yoi • In China this is read: Shēng
  • 23. Difference between countries • 手紙 means "letter" in Japanese but "toilet paper" in Chinese. • Japanese and Chinese are very different, even though some kanji look the same. It is like English and French: they share letters, but can you read and understand both?
  • 24. Character : 生 • 生きる Ikiru ..... live, living, alive • 生クリーム Nama kuri-mu ..... fresh cream • 生涯 Shougai ..... lifetime • 生命 Seimei ..... life • 生む Umu ..... give birth • We can see that one kanji can have many different meanings and pronunciations. So it makes no sense at all to sort database records by kanji: people wouldn’t find the data where they expect it, and the sort would only be a Unicode sort that has no meaning. • Every character is very easy to use and access; no special treatment is necessary.
  • 26. ひらがな Hiragana • Hiragana is a Japanese syllabary, one basic component of the Japanese writing system. • Hiragana is used to write native words for which there are no kanji, including grammatical particles, and suffixes such as さん ~san "Mr., Mrs., Miss, Ms.". • Every character is very easy to use and access; no special treatment is necessary.
  • 28. カタカナ Katakana • Katakana is a Japanese syllabary, one component of the Japanese writing system. • In contrast to the hiragana syllabary, which is used for those Japanese language words and grammatical inflections which kanji does not cover, the katakana syllabary is primarily used for transcription of foreign language words into Japanese. • Every character is very easy to use and access; no special treatment is necessary.
  • 29. Half-width kana 半角カナ • Half-width kana (半角カナ Hankaku kana) are katakana characters displayed at half their normal width (a 2:1 aspect ratio), instead of the usual square (1:1) aspect ratio. • Half-width kana were used in the early days of Japanese computing, to allow Japanese characters to be displayed on the same grid as monospaced fonts of Latin characters. • Half-width hiragana or kanji were not used. • Half-width kana characters are not generally used today, but find some use in specific settings, such as cash register displays, on shop receipts, and Japanese digital television and DVD subtitles. • 注意! (Caution!) These characters can be a pain, so a good program converts half-width katakana to full-width.
  • 30. Half-width kana 半角カナ
        String s1 = "ｱﾅﾀ";   // half-width
        String s2 = "アナタ"; // full-width
        ERXStringUtilitiesEXTENDED.changeHanKatakanaToZenkakuKatakana(s1); // RESULT = "アナタ"
        s1.equalsIgnoreCase(s2) // RESULT = false
        s1.length() // RESULT = 3
        s2.length() // RESULT = 3
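
If the ERXStringUtilitiesEXTENDED helper is not at hand, plain Java can do a similar fold. The sketch below uses `java.text.Normalizer` with the NFKC form, which maps half-width katakana to the regular full-width characters. The class and method names are hypothetical, and this is not the implementation behind `changeHanKatakanaToZenkakuKatakana`.

```java
import java.text.Normalizer;

final class KanaWidth {
    /** Folds half-width katakana (and other compatibility forms) to standard forms via NFKC. */
    static String toFullWidthKana(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        String half = "\uFF71\uFF85\uFF80"; // ｱﾅﾀ (half-width)
        String full = "\u30A2\u30CA\u30BF"; // アナタ (full-width)
        System.out.println(half.equals(full));                  // false
        System.out.println(toFullWidthKana(half).equals(full)); // true

        // A half-width kana followed by its voicing mark is two chars;
        // NFKC composes the pair into a single full-width character.
        String ga = "\uFF76\uFF9E"; // ｶﾞ
        System.out.println(toFullWidthKana(ga).length());       // 1
    }
}
```

Note that NFKC is broader than a kana-only conversion: it also turns full-width Latin letters and digits into ASCII, so apply it only where that side effect is acceptable.
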
  • 33. NUMBER 数字 • As with spaces, numbers also come in variations. • Single byte (Hankaku) • Double byte (Zenkaku) • Chinese character version (Kanji)
  • 34. • Hankaku (single) - 0123456789 • Zenkaku (double) - ０１２３４５６７８９ • Kanji - 0 is 零 or 〇 / 1 is 一 or 壱 / 2 is 二 or 弐 / 3 is 三 or 参 / 四 五 六 七 八 九 • Converting every number to single size before storing it in the database is the easy way to go.
  • 35. isDigitsOnly
        String s1 = "0123456789";
        String s2 = "０１２３４５６７８９";
        ERXStringUtilities.isDigitsOnly(s1); // RESULT = true
        ERXStringUtilities.isDigitsOnly(s2); // RESULT = true
        s1.equalsIgnoreCase(s2); // RESULT = false
  • 36. Replace double with single
        String s = "０１２３４５６７８９";
        ERXStringUtilitiesEXTENDED.changeZenkakuNumberToHanNumber(s); // RESULT = "0123456789"
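
The same conversion can be sketched without the helper class. `Character.isDigit` and `Character.digit` already understand full-width digits, so the hypothetical `toAsciiDigits` below rewrites any Unicode decimal digit as its ASCII counterpart. It makes no attempt to handle kanji numerals such as 一 or 壱, which are not decimal digits in Unicode terms.

```java
final class DigitWidth {
    /** Replaces every Unicode decimal digit (for example full-width digits) with its ASCII digit. */
    static String toAsciiDigits(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (Character.isDigit(cp)) {
                // digit() returns the numeric value 0..9 for any decimal digit
                sb.append((char) ('0' + Character.digit(cp, 10)));
            } else {
                sb.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String zenkaku = "\uFF10\uFF11\uFF12\uFF13\uFF14\uFF15\uFF16\uFF17\uFF18\uFF19"; // ０-９
        System.out.println(toAsciiDigits(zenkaku)); // 0123456789
    }
}
```

Because it accepts every decimal digit, this also converts digits from other scripts; restrict the range to U+FF10..U+FF19 if only Zenkaku digits should be touched.
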
  • 38. LETTER 英字 • Everybody loves the simple 26 characters, which take two years to learn in most schools. • In some countries there are variations, like German with ÜÖÄ.
  • 39. LETTER 英字 • For each letter there is a double-byte letter • ‘U’ has the counterpart ‘Ｕ’ • Converting every letter to single size before storing it in the database is the easy way to go.
  • 40. LETTER 英字
        String s1 = "BC";
        String s2 = "ＢＣ";
        s1.equalsIgnoreCase(s2); // RESULT = false
        s1 = ERXStringUtilitiesEXTENDED.changeZenkakuEijiToHanEiji(s2); // RESULT = "BC"
  • 43. Sign 記号 • For each sign there is a double-byte counterpart • ‘!’ has the counterpart ‘！’ • Converting every sign to single size before storing it in the database is the easy way to go.
  • 44. Sign 記号
        String s1 = "!@#$%^&*()";
        String s2 = "！＠＃＄％＾＆＊（）";
        s1 = ERXStringUtilitiesEXTENDED.changeZenkakuKigouToHanKigou(s2); // RESULT = "!@#$%^&*()"
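
Full-width letters and signs sit in one contiguous Unicode range (U+FF01 to U+FF5E) that mirrors ASCII `!` to `~` at a fixed distance, so a single subtraction covers both cases. The following is a minimal sketch under that observation; `AsciiWidth` and `toHalfWidthAscii` are hypothetical names, not part of Wonder or WOdka.

```java
final class AsciiWidth {
    // Distance between a full-width form and its ASCII twin: U+FF01 - U+0021.
    private static final int OFFSET = 0xFEE0;

    /** Maps full-width ASCII variants (U+FF01..U+FF5E) to ASCII; everything else is unchanged. */
    static String toHalfWidthAscii(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= '\uFF01' && c <= '\uFF5E') {
                sb.append((char) (c - OFFSET));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toHalfWidthAscii("\uFF22\uFF23")); // ＢＣ -> BC
        System.out.println(toHalfWidthAscii("\uFF01\uFF20")); // ！＠ -> !@
    }
}
```

Kana and kanji fall outside the range and pass through untouched, which keeps this narrower than a full NFKC normalization.
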
  • 46. SPACE スペース • String a = " "; // space char • String b = "　"; // double-size (Zenkaku) space char • Converting every space to single size before storing it in the database is the easy way to go.
  • 47. trim
        // head and tail are 3 space chars
        String s = "   A B C   ";
        s.trim(); // RESULT = 'A B C'
        ERXStringUtilities.trimString(s); // RESULT = 'A B C'
        ERXStringUtilitiesEXTENDED.trimStringWithZenkaku(s); // RESULT = 'A B C'
  • 48. better trim
        // head and tail are 3 Japanese ZENKAKU (double-byte) space chars
        String s = "　　　A B C　　　";
        s.trim(); // RESULT = '　　　A B C　　　'
        ERXStringUtilities.trimString(s); // RESULT = '　　　A B C　　　'
        ERXStringUtilitiesEXTENDED.trimStringWithZenkaku(s); // RESULT = 'A B C'
  • 49. remove space between chars
        // between A and B are 2 single spaces + 2 double spaces + 2 single spaces
        String s = "A  　　  B";
        s.replace(" ", ""); // RESULT = 'A　　B'
        ERXStringUtilities.removeCharacters(s, " "); // RESULT = 'A　　B'
        ERXStringUtilitiesEXTENDED.changeZenkakuToHanKakaku(s).replace(" ", ""); // RESULT = 'AB'
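
The reason `String.trim()` misses the Zenkaku space is that it only strips characters up to U+0020, while the full-width space is U+3000. A minimal plain-Java sketch of a width-aware trim and space removal follows; `WideSpace` and its methods are hypothetical names, offered as an illustration rather than as what `trimStringWithZenkaku` does internally.

```java
final class WideSpace {
    /** Trims leading and trailing whitespace, including the full-width space U+3000. */
    static String trimWide(String s) {
        int start = 0;
        int end = s.length();
        while (start < end && Character.isWhitespace(s.charAt(start))) {
            start++;
        }
        while (end > start && Character.isWhitespace(s.charAt(end - 1))) {
            end--;
        }
        return s.substring(start, end);
    }

    /** Removes every half-width and full-width space inside the string. */
    static String removeSpaces(String s) {
        return s.replace(" ", "").replace("\u3000", "");
    }

    public static void main(String[] args) {
        String padded = "\u3000\u3000\u3000A B C\u3000\u3000\u3000";
        System.out.println(padded.trim().length());     // 11, nothing was trimmed
        System.out.println(trimWide(padded));           // A B C
        System.out.println(removeSpaces("A  \u3000\u3000  B")); // AB
    }
}
```

On Java 11 and later, `String.strip()` uses the same `Character.isWhitespace` definition, so it can replace `trimWide` there.
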
  • 52. 絵文字 Emoji (Smilies) • Emoji (絵文字; Japanese pronunciation: [emodʑi]) is the Japanese term for the ideograms or smileys used in Japanese electronic messages and webpages. • Emoji pictograms by au are specified using the IMG tag. SoftBank Mobile emoji are wrapped between SI/SO escape sequences, and support colors and animation. DoCoMo's emoji are the most compact to transmit, while au's version is more flexible and based on open standards. • If you are creating a CMS or a data-entry application such as a blog or forum, you will have to deal with emoji. Japanese people love to use them.
  • 53. WOEmoji • At WOWODC 2012 last year I spoke about the SnoWOman CMS, and there is a framework named WOEmoji. With this framework it is easy to convert emoji for saving to the database, and they automatically work on Windows and Android devices as well. Version 2 of the framework (in progress) will also convert to the new open-standard emoji that is under development in Japan right now. I am a paying supporter of that project and am waiting for delivery so that WOEmoji can be updated.
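
One practical consequence for data entry, assuming the emoji arrive as standard Unicode emoji rather than in the carrier-specific encodings described above: most of them lie outside the Basic Multilingual Plane, so they occupy two Java `char`s and four UTF-8 bytes each. The sketch below (hypothetical class `EmojiCheck`) shows how to count and detect such characters.

```java
final class EmojiCheck {
    /** Number of Unicode code points, which can be smaller than String.length(). */
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    /** True if the string contains a character outside the Basic Multilingual Plane. */
    static boolean hasSupplementary(String s) {
        return s.codePoints().anyMatch(Character::isSupplementaryCodePoint);
    }

    public static void main(String[] args) {
        String smile = "\uD83D\uDE00"; // U+1F600, a grinning face
        System.out.println(smile.length());          // 2
        System.out.println(codePoints(smile));       // 1
        System.out.println(hasSupplementary(smile)); // true
    }
}
```

A check like `hasSupplementary` is useful before saving, because MySQL's 3-byte `utf8` character set cannot store 4-byte characters; `utf8mb4` is needed for those.
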
  • 55. 外字 Gaiji (Self-made characters) • Gaiji (外字), literally meaning "external characters", are kanji that are not represented in existing Japanese encoding systems. These include variant forms of common kanji that need to be represented alongside the more conventional glyph in reference works, and can include non-kanji symbols as well. • Win XP: it had only a few thousand kanji, and it wasn’t easy to use kanji that were not available, so people started creating their own; the look was sometimes different too. • Win Vista: you can see the font is a little different, but you have to buy this 1,500-character gaiji package for about USD 500. • OS X: works out of the box, and it is free.
  • 56. Gaiji 外字 Editor • This is an old gaiji editor. Users could make their own characters, and that was nice; it started with the first version of Windows. But now with the Internet there is a problem, because many people do not realize that such a character can be seen only on that one machine: after it is sent by mail or entered into a database, it looks different on every other machine. So you need to strip out these characters and tell the user not to use them.
  • 57. Gaiji 外字
        ERXStringUtilitiesEXTENDED.delete_ModelDependenceCharacters(true, s, 200, false, false);
        Because I don’t have a Windows machine here, I wasn’t able to create a sample string, but there is a command for deleting that range of characters.
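
As a rough stand-in for a sample, the sketch below strips characters from the Unicode Private Use Areas. It rests on the assumption that user-defined gaiji reach the application as private-use code points, which is how Windows commonly maps them; it is not the logic of `delete_ModelDependenceCharacters`, whose parameters the slides do not explain. `GaijiFilter` is a hypothetical name.

```java
final class GaijiFilter {
    /** True if the code point belongs to a Unicode private-use area. */
    static boolean isPrivateUse(int codePoint) {
        return Character.getType(codePoint) == Character.PRIVATE_USE;
    }

    /** True if the string contains at least one private-use character. */
    static boolean containsPrivateUse(String s) {
        return s.codePoints().anyMatch(GaijiFilter::isPrivateUse);
    }

    /** Returns the string with all private-use characters removed. */
    static String stripPrivateUse(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        s.codePoints()
         .filter(cp -> !isPrivateUse(cp))
         .forEach(sb::appendCodePoint);
        return sb.toString();
    }

    public static void main(String[] args) {
        String input = "A\uE000B"; // U+E000 is the first private-use code point
        System.out.println(containsPrivateUse(input)); // true
        System.out.println(stripPrivateUse(input));    // AB
    }
}
```

`containsPrivateUse` can drive the feedback to the user, while `stripPrivateUse` cleans the value before it is stored.
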
  • 59. Furigana 振り仮名 • Furigana (振り仮名) is a Japanese reading aid, consisting of smaller kana, or syllabic characters, printed next to a kanji (ideographic character) or other character to indicate its pronunciation. It is typically used to clarify rare, nonstandard or ambiguous readings, or in children's or learners' materials.
  • 61. Encoding • UTF-8 • EUC-JP • Shift JIS • ISO/IEC 2022 • and some more ...
  • 62. UTF-8 • UTF-8 (UCS Transformation Format, 8-bit) is a variable-width encoding that can represent every character in the Unicode character set. It was designed for backward compatibility with ASCII and to avoid the complications of endianness and byte order marks in UTF-16 and UTF-32. • We now use UTF-8 for every project. You are mostly safe with it and do not have to care about other encodings, but...
  • 63. EUC-JP • Extended Unix Code (EUC) is a multibyte character encoding system used primarily for Japanese, Korean, and simplified Chinese. • The structure of EUC is based on the ISO-2022 standard, which specifies a way to represent character sets containing a maximum of 94 characters, or 8836 (94²) characters, or 830584 (94³) characters, as sequences of 7-bit codes. Only ISO-2022 compliant character sets can have EUC forms. Up to four coded character sets (referred to as G0, G1, G2, and G3 or as code sets 0, 1, 2, and 3) can be represented with the EUC scheme. G0 is almost always an ISO-646 compliant coded character set (e.g. US-ASCII/KS X 1003/ISO 646:KR in EUC-KR and US-ASCII/the lower half of JIS X 0201 in EUC-JP) that is invoked on GL (i.e. with the most significant bit cleared). • If you have to work with some Windows machines, it can happen that you have to import data encoded this way. In my experience, I have never needed it.
  • 64. Shift JIS • Shift JIS (Shift Japanese Industrial Standards, also SJIS, MIME name Shift_JIS) is a character encoding for the Japanese language, originally developed by a Japanese company called ASCII Corporation in conjunction with Microsoft and standardized as JIS X 0208 Appendix 1. • This is the most used encoding in Japan, and you can be sure that if you get data from an existing database, or have to connect to one, you will have to deal with it. We did a lot of SJIS to UTF-8 conversion in the past.
  • 65. ISO/IEC 2022 • ISO/IEC 2022, Information technology: Character code structure and extension techniques, is an ISO standard (equivalent to the ECMA standard ECMA-35) specifying • a technique for including multiple character sets in a single character encoding system, and • a technique for representing these character sets in both 7-bit and 8-bit systems using the same encoding. • You only have to deal with it if you build mailing solutions, but I really don’t care about that anymore; JavaMail works just fine.
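
A Shift JIS to UTF-8 conversion needs nothing beyond the JDK, provided the runtime ships the extended charsets (standard JDK distributions do). The sketch below is a minimal illustration with a hypothetical class name; it is not the conversion code the author used.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class SjisConverter {
    /** Decodes Shift_JIS bytes into a String. */
    static String decodeSjis(byte[] sjisBytes) {
        return new String(sjisBytes, Charset.forName("Shift_JIS"));
    }

    /** Decodes Shift_JIS bytes and re-encodes the text as UTF-8. */
    static byte[] sjisToUtf8(byte[] sjisBytes) {
        return decodeSjis(sjisBytes).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] sjis = {(byte) 0x90, (byte) 0xB6}; // 生 in Shift_JIS
        byte[] utf8 = sjisToUtf8(sjis);
        for (byte b : utf8) {
            System.out.printf("%02X ", b & 0xFF); // E7 94 9F
        }
        System.out.println();
    }
}
```

Data exported from Windows is often in Microsoft's variant of Shift JIS, which Java exposes as `windows-31j`; if a few symbols come out wrong with `Shift_JIS`, trying that charset name is a reasonable next step.
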
  • 66. Localizing your apps • Part I - WebObjects • Part II - What is a multibyte Language • Part III - Combine multibyte Language with WebObjects • Part IV - multibyte & WOdka
  • 67. Localization ローカライズ • Localization of your App • Localization Data • Sorting
  • 69. ERXLocalizer
        // Writing components and code with ERXLocalizer makes your life very easy.
        // There are so many things you can do with it, so get comfortable with it.
        // Localized string from code
        ERXLocalizer.defaultLocalizer().localizedStringForKey("Nav.Main");
        // Localized string in HTML
        <wo:str value = "$localizer.Nav.Main" />
        <wo:localized value="Nav.Main" />
        * This is a bad example because I am using the power of the ‘dark force’, inline bindings. You shouldn’t do that, but I always use it. Sorry, I am a bad guy.
  • 70. .strings • In your app’s ‘Resources’ folder, create a folder named language name + ‘.lproj’. • Make it a plist file with key-value pairs and save the file as UTF-8 rather than UTF-16: with UTF-8 it is easier to read, and git commits can be viewed too.
  • 72. Localization of data • 1. Attributes in the entity • 2. Set the data in the edit page • 3. Display the attribute depending on the localizer: [[eo]].name_en() or [[eo]].name_ja() or [[eo]].valueForKey("name")
  • 74. Sorting 1 • name (how it is written) • furigana (how it is pronounced)
  • 75. Sorting 2 • 漢字 Kanji (Chinese characters): Person 1 林, Person 2 森 • ひらがな Hiragana or カタカナ Katakana (Japanese Alphabet): Person 1 はやし (Mr. Hayashi), Person 2 もり (Mr. Mori)
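
The third step can be sketched in plain Java. The example below models a row as a `Map` instead of an enterprise object and picks the attribute by language suffix; the fallback to a default language is an assumption added for illustration, and `LocalizedAttribute` is a hypothetical name, not a Wonder or WOdka class.

```java
import java.util.HashMap;
import java.util.Map;

final class LocalizedAttribute {
    /**
     * Returns the value stored under base + "_" + language (for example "name_ja"),
     * falling back to the default language when that attribute is missing or empty.
     */
    static String localizedValue(Map<String, String> row, String base,
                                 String language, String defaultLanguage) {
        String value = row.get(base + "_" + language);
        if (value == null || value.isEmpty()) {
            value = row.get(base + "_" + defaultLanguage);
        }
        return value;
    }

    public static void main(String[] args) {
        Map<String, String> product = new HashMap<>();
        product.put("name_en", "Pen");
        product.put("name_ja", "\u30DA\u30F3"); // ペン
        System.out.println(localizedValue(product, "name", "ja", "en")); // ペン
        System.out.println(localizedValue(product, "name", "de", "en")); // Pen (fallback)
    }
}
```

In a real application the language code would come from the session's localizer, and the lookup would go through `valueForKey` on the enterprise object.
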
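
To see why the reading is the better sort key, the sketch below sorts people by their furigana instead of their kanji name. The third person (青木, あおき) is not on the slide and was added only to make the difference visible; the class names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class FuriganaSort {
    static final class Person {
        final String name;     // how it is written
        final String furigana; // how it is pronounced

        Person(String name, String furigana) {
            this.name = name;
            this.furigana = furigana;
        }
    }

    /** Returns the names ordered by reading instead of by kanji code point. */
    static List<String> namesByReading(List<Person> people) {
        List<Person> sorted = new ArrayList<>(people);
        sorted.sort(Comparator.comparing((Person p) -> p.furigana));
        List<String> names = new ArrayList<>();
        for (Person p : sorted) {
            names.add(p.name);
        }
        return names;
    }

    public static void main(String[] args) {
        List<Person> people = Arrays.asList(
            new Person("\u68EE", "\u3082\u308A"),              // 森 / もり (Mori)
            new Person("\u6797", "\u306F\u3084\u3057"),        // 林 / はやし (Hayashi)
            new Person("\u9752\u6728", "\u3042\u304A\u304D")); // 青木 / あおき (Aoki), added sample
        System.out.println(namesByReading(people)); // [青木, 林, 森]
    }
}
```

Sorting the same list by `name` would put 青木 last, because its first kanji has the highest code point. Comparing hiragana strings directly is a simplification: mixed hiragana and katakana readings would need to be normalized to one script first.
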
  • 76. Localizing your apps • Part I - WebObjects • Part II - What is a multibyte Language • Part III - Combine multibyte Language with WebObjects • Part IV - multibyte & WOdka
  • 78. WOdkaLanguageEnums • Language name • Locale Code • Date format + 24 hours setting • Data for Flag information
  • 79. WOdkaCountryEnums • Country name • code2 : ISO code for country • code3 : ISO code for country • money : ERXMoneyEnums • language : WOdkaLanguageEnums • telephone code • tax : tax info • zip : zip format • company mailing format • family mailing format • Localized words : male, female, sexMale, sexFemale • flag : path to flag data • continent : ERXContinentEnums • EU : ERXEuropeanUnionsEnums • Family mailing format examples: "[S][CR][T][_][F][_][L]" and "[L] [F]様" • Legend: S = sex, T = title, F = first name, L = last name, CR = next line
  • 80. Thanks to • Masahiko TANI - A10 Objects Inc. (Japan) • Hiroyuki FUKUI - Astonish Create (Japan) • Special thanks to • Paul YU - Green orchid llc (USA)
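
The mailing format strings can be read as small templates. The sketch below expands them in a single pass; it assumes that `[_]` stands for a single space (the slide's legend does not say) and uses a hypothetical `MailingFormat` class with made-up sample values, so it should not be taken as WOdka's own formatter.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MailingFormat {
    private static final Pattern TOKEN = Pattern.compile("\\[(S|T|F|L|CR|_)\\]");

    /** Expands [S], [T], [F], [L], [CR] and [_] tokens in a mailing format string. */
    static String format(String pattern, Map<String, String> values) {
        Matcher m = TOKEN.matcher(pattern);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String token = m.group(1);
            String replacement;
            if (token.equals("CR")) {
                replacement = "\n"; // next line
            } else if (token.equals("_")) {
                replacement = " ";  // assumption: [_] is a single space
            } else {
                replacement = values.getOrDefault(token, "");
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> values = new HashMap<>();
        values.put("S", "Mr.");
        values.put("T", "Dr.");
        values.put("F", "Taro");
        values.put("L", "Yamada");
        System.out.println(format("[S][CR][T][_][F][_][L]", values));
        System.out.println(format("[L] [F]\u69D8", values)); // Yamada Taro様
    }
}
```

Scanning the pattern once, instead of chaining `String.replace` calls, keeps a value that happens to contain bracketed text from being expanded a second time.
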