"Character sets and iconv" PHP source code

First play

<?php

//note that this script file is UTF-8

//set browser to ISO-8859-1
header("Content-Type: text/html; charset=ISO-8859-1;");

$utf8_sentence = 'That will be £500 please';

//gives [That will be Â£500 please] as UTF-8 is being displayed
//as ISO-8859-1
echo $utf8_sentence . ' ';

$iso_sentence = iconv('UTF-8', 'ISO-8859-1', $utf8_sentence);

//gives [That will be £500 please] as no mismatch
//between actual character set of string
//and browser
echo $iso_sentence . ' ';

//YOU TRY IT! When viewing this in your browser,
//set the page's encoding to UTF-8 and you will
//see the mojibake reverse!

?>

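Why the pound sign picks up a stray "Â" can be seen at the byte level. The following is a minimal sketch that is not part of the original slides; the variable names are illustrative, and it assumes the script file is saved as UTF-8.

```php
<?php
// This file is assumed to be saved as UTF-8, like the scripts above.
$utf8_pound = '£';
$iso_pound  = iconv('UTF-8', 'ISO-8859-1', $utf8_pound);

// UTF-8 stores the pound sign as two bytes: 0xC2 0xA3
echo bin2hex($utf8_pound) . "\n"; // c2a3

// ISO-8859-1 stores it as the single byte 0xA3
echo bin2hex($iso_pound) . "\n";  // a3

// Deliberately misread the two UTF-8 bytes as ISO-8859-1:
// 0xC2 is "Â" and 0xA3 is "£", which is the mojibake seen in the browser
$mojibake = iconv('ISO-8859-1', 'UTF-8', $utf8_pound);
echo $mojibake . "\n";            // Â£
```
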
Within reason

<?php



//set browser to ISO-8859-1
header("Content-Type: text/html; charset=ISO-8859-1;");

//some Korean (contains two space characters)
$utf8_sentence = '연예가 뒷 이야기';

//gives [ì—°ì˜ˆê°€ ë’· ì ´ì•¼ê¸°] as UTF-8 is being displayed
//as ISO-8859-1
echo $utf8_sentence . ' ';

//gives [Notice: iconv(): Detected an illegal character in input string]
$iso_sentence = iconv('UTF-8', 'ISO-8859-1', $utf8_sentence);

//gives an empty string here; on current PHP versions iconv() returns
//false on failure, so this shows bool(false)
var_dump($iso_sentence);

?>

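One way to avoid the notice is to check before converting whether the string can be represented in ISO-8859-1 at all. The sketch below is an assumption-laden illustration, not something from the slides: `fits_iso_8859_1()` is a hypothetical helper, and it relies on ISO-8859-1 covering exactly the code points U+0000 to U+00FF.

```php
<?php
// Hypothetical helper (not part of the iconv extension): true when no
// character of the UTF-8 string lies above U+00FF, the top of ISO-8859-1.
// Invalid UTF-8 makes preg_match() return false, so it is reported as not fitting.
function fits_iso_8859_1(string $utf8): bool
{
    return preg_match('/[\x{100}-\x{10FFFF}]/u', $utf8) === 0;
}

var_dump(fits_iso_8859_1('That will be £500 please')); // bool(true)
var_dump(fits_iso_8859_1('연예가 뒷 이야기'));           // bool(false)
```
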
First transliteration

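<?php

//set browser to ISO-8859-1
header("Content-Type: text/html; charset=ISO-8859-1;");

//some Korean (contains two space characters)
$utf8_sentence = '연예가 뒷 이야기';

//gives [ì—°ì˜ˆê°€ ë’· ì´ì•¼ê¸°] as UTF-8 is being displayed
//as ISO-8859-1
echo $utf8_sentence . ' ';

//approximate characters that aren't in target character set
$iso_sentence = iconv('UTF-8', 'ISO-8859-1//TRANSLIT', $utf8_sentence);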

//gives [??? ? ???]
echo $iso_sentence . ' ';

?>

More realistic transliteration (extended)

<?php


//set browser to UTF-8
header("Content-Type: text/html; charset=UTF-8;");

//some German
$utf8_sentence = 'Weiß, Goldmann, Göbel, Weiss, Göthe, Goethe und Götz';

//fine as UTF-8 is being displayed as UTF-8
echo $utf8_sentence . ' ';

$trans_sentence = iconv('UTF-8', 'ASCII//TRANSLIT', $utf8_sentence);

//gives [Weiss, Goldmann, G?bel, Weiss, G?the, Goethe und G?tz]
//which is not quite what we expected (only 'ß' has been flattened)
echo $trans_sentence . ' ';

//BUT iconv interacts with system locale setting so let's have a play:

$current_locale = setlocale(LC_ALL, '0');
//gives, for me, "C" which is a kind of nondescript default
echo $current_locale . ' ';

//we set the locale of the *target* character set
setlocale(LC_ALL, 'en_GB');

//try again...
$trans_sentence = iconv('UTF-8', 'ASCII//TRANSLIT', $utf8_sentence);

//gives [Weiss, Goldmann, Gobel, Weiss, Gothe, Goethe und Gotz]
//which is our original string flattened into 7-bit ASCII!
echo $trans_sentence . ' ';

//out of curiosity...
setlocale(LC_ALL, 'de_DE');

$trans_sentence = iconv('UTF-8', 'ASCII//TRANSLIT', $utf8_sentence);

//gives [Weiss, Goldmann, Goebel, Weiss, Goethe, Goethe und Goetz]
//which is exactly how a German would transliterate those
//umlauted characters if forced to use 7-bit ASCII!
//(because really ä = ae, ö = oe and ü = ue)
echo $trans_sentence . ' ';

?>

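Because the //TRANSLIT result above depends on which locales are installed on the server, a fixed mapping can be used where the German-style output has to be reproducible. This is a minimal sketch under that assumption; `german_to_ascii()` and its mapping table are hypothetical and cover only the German umlauts and ß.

```php
<?php
// Hypothetical helper: German-style ASCII transliteration that does not
// depend on setlocale() or on the iconv implementation in use.
function german_to_ascii(string $utf8): string
{
    $map = [
        'ä' => 'ae', 'ö' => 'oe', 'ü' => 'ue',
        'Ä' => 'Ae', 'Ö' => 'Oe', 'Ü' => 'Ue',
        'ß' => 'ss',
    ];

    // strtr() with an array replaces each key wherever it occurs
    return strtr($utf8, $map);
}

$ascii_sentence = german_to_ascii('Weiß, Goldmann, Göbel, Weiss, Göthe, Goethe und Götz');

// Weiss, Goldmann, Goebel, Weiss, Goethe, Goethe und Goetz
echo $ascii_sentence . "\n";
```

Characters outside the table are passed through untouched, so this complements iconv rather than replacing it.
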
Ignore example

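<?php

//set browser to ISO-8859-1
header("Content-Type: text/html; charset=ISO-8859-1;");

//some Korean (contains two space characters)
$utf8_sentence = '연예가 뒷 이야기';

//gives [ì—°ì˜ˆê°€ ë’· ì´ì•¼ê¸°] as UTF-8 is being displayed
//as ISO-8859-1
echo $utf8_sentence . ' ';

//discard characters that aren't in target character set
//STILL gives [Notice: iconv(): Detected an illegal character in input string]
$iso_sentence = iconv('UTF-8', 'ISO-8859-1//IGNORE', $utf8_sentence);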

//gives "  " (two space characters)
var_dump($iso_sentence);

?>

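Since //IGNORE still raises the notice here, the unrepresentable characters can instead be removed before iconv ever sees them. The following is a sketch only; `drop_non_latin1()` is a hypothetical helper, and it assumes that everything above U+00FF should simply be discarded.

```php
<?php
// Hypothetical helper: strip every character above U+00FF, then convert
// the remainder to ISO-8859-1. Returns null if the input is not valid UTF-8
// or the conversion fails.
function drop_non_latin1(string $utf8): ?string
{
    $stripped = preg_replace('/[\x{100}-\x{10FFFF}]/u', '', $utf8);
    if ($stripped === null) {
        return null;
    }

    $converted = iconv('UTF-8', 'ISO-8859-1', $stripped);
    return $converted === false ? null : $converted;
}

// Only the two space characters survive
var_dump(drop_non_latin1('연예가 뒷 이야기')); // string(2) "  "
```
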
ob_iconv_handler

<?php


//character set of PHP scripts etc
iconv_set_encoding('internal_encoding', 'UTF-8');

//character set of browser output
//(sends HTTP header of "Content-Type: text/html; charset=ISO-8859-1;")
iconv_set_encoding('output_encoding', 'ISO-8859-1//TRANSLIT');

ob_start('ob_iconv_handler'); //start output buffering

//Unicode string
$utf8_sentence = 'The Japanese title is "指輪物語"';

//when buffer is flushed, outputs [The Japanese title is "????"]
echo $utf8_sentence;

?>

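Since PHP 5.6, iconv_set_encoding() raises a deprecation notice for these settings, so the same idea can also be expressed with a custom output-buffer callback. The sketch below is an illustration under stated assumptions: `utf8_to_latin1_handler()` is a hypothetical name, the outer buffer exists only so the converted bytes can be inspected, and unlike ob_iconv_handler this callback does not send a Content-Type header.

```php
<?php
// Hypothetical callback: converts buffered UTF-8 output to ISO-8859-1.
// If the conversion fails, the buffer is passed through unchanged.
function utf8_to_latin1_handler(string $buffer): string
{
    $converted = iconv('UTF-8', 'ISO-8859-1//TRANSLIT', $buffer);
    return $converted === false ? $buffer : $converted;
}

ob_start();                         // outer buffer, only used to capture the result
ob_start('utf8_to_latin1_handler'); // inner buffer with the converting callback

echo 'That will be £500 please';

ob_end_flush();                     // runs the callback, result goes to the outer buffer
$sent = ob_get_clean();             // what would have been sent to the browser

// 24 bytes: the pound sign is now the single byte 0xA3
echo strlen($sent) . "\n";
```
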
iconv_strlen()

<?php


//set browser to UTF-8
header("Content-Type: text/html; charset=UTF-8;");

//some Russian (13 characters)
$utf8_sentence = 'Правительство';

//gives 13 which is correct
echo iconv_strlen($utf8_sentence, 'UTF-8') . ' ';

//let's try core PHP
//gives 26 (the *byte* count). Oops!
echo strlen($utf8_sentence) . ' ';

?>

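The same byte-versus-character difference affects searching and slicing. As a minimal sketch that is not from the original slides, the following compares iconv_strpos() and iconv_substr() with their byte-based core counterparts; the variable names are illustrative.

```php
<?php
$utf8_word = 'Правительство';

// Character position of "тель" (counting from 0)
echo iconv_strpos($utf8_word, 'тель', 0, 'UTF-8') . "\n"; // 5

// Core PHP reports the byte offset instead
echo strpos($utf8_word, 'тель') . "\n";                   // 10

// First five characters
echo iconv_substr($utf8_word, 0, 5, 'UTF-8') . "\n";      // Прави

// First five *bytes*: two whole characters plus half of a third,
// which is no longer valid UTF-8
$first_bytes = substr($utf8_word, 0, 5);
var_dump(preg_match('//u', $first_bytes) === 1);          // bool(false)
```
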
Inter-Japanese conversion (not on presentation)

<?php


//set browser to EUC-JP (a Japanese character set)
header("Content-Type: text/html; charset=EUC-JP;");

//some Japanese
$utf8_sentence = '一斗缶に詰められた遺体は、藤森容疑者の妻と息子と判明。';

//gives mojibake as UTF-8 is being displayed as EUC-JP
echo $utf8_sentence . ' ';

$euc_sentence = iconv('UTF-8', 'EUC-JP', $utf8_sentence);

//gives intact Japanese string
echo $euc_sentence . ' ';

?>

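A round trip is a quick way to confirm that a conversion like this is lossless. The sketch below uses a shorter, illustrative string and assumes the installed iconv implementation supports EUC-JP.

```php
<?php
// A short illustrative string (three characters)
$utf8_text = '日本語';

$euc_text = iconv('UTF-8', 'EUC-JP', $utf8_text);

// These characters take three bytes each in UTF-8...
echo strlen($utf8_text) . "\n"; // 9

// ...and two bytes each in EUC-JP
echo strlen($euc_text) . "\n";  // 6

// Converting back gives the original string, so nothing was lost
$round_trip = iconv('EUC-JP', 'UTF-8', $euc_text);
var_dump($round_trip === $utf8_text); // bool(true)
```
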