"Character sets and iconv" PHP source code
First play
<?php
//note that this script file is UTF-8
//set browser to ISO-8859-1
header("Content-Type: text/html; charset=ISO-8859-1;");
$utf8_sentence = 'That will be £500 please';
//gives [That will be Â£500 please] as UTF-8 is being displayed
//as ISO-8859-1
echo $utf8_sentence . '<br>';
$iso_sentence = iconv('UTF-8', 'ISO-8859-1', $utf8_sentence);
//gives [That will be £500 please] as no mismatch
//between actual character set of string
//and browser
echo $iso_sentence . '<br>';
//YOU TRY IT! When viewing this in your browser,
//set the page's encoding to UTF-8 and you will
//see the mojibake reverse!
?>
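Looking at the raw bytes can make it clearer where the stray character in the first echo comes from. The listing below is a minimal sketch, assuming the script file is saved as UTF-8 and the iconv extension is loaded; bin2hex() is core PHP and the variable names are only illustrative.

```php
<?php
//note that this script file is UTF-8
$utf8_pound = '£';
//gives [c2a3]: the pound sign is two bytes in UTF-8
echo bin2hex($utf8_pound) . '<br>';
$iso_pound = iconv('UTF-8', 'ISO-8859-1', $utf8_pound);
//gives [a3]: the pound sign is one byte in ISO-8859-1
echo bin2hex($iso_pound) . '<br>';
//a browser reading the bytes c2 a3 as ISO-8859-1 shows
//two characters: Â (c2) followed by £ (a3)
?>
```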
Within reason
<?php
//note that this script file is UTF-8
//set browser to ISO-8859-1
header("Content-Type: text/html; charset=ISO-8859-1;");
//some Korean (contains two space characters)
$utf8_sentence = '연예가 뒷 이야기';
//gives [연예가 ë’· ì´ì•¼ê¸°] as UTF-8 is being displayed
//as ISO-8859-1
echo $utf8_sentence . '<br>';
//gives [Notice: iconv(): Detected an illegal character in input string]
$iso_sentence = iconv('UTF-8', 'ISO-8859-1', $utf8_sentence);
//gives an empty string on older PHP versions;
//PHP 5.4 and later give bool(false) instead
var_dump($iso_sentence);
?>
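Because a failed conversion is easy to miss, it may be worth testing the return value before using it. The listing below is a sketch under the assumption that iconv() returns false on failure (PHP 5.4 and later with a glibc-style iconv); other iconv implementations may substitute a placeholder character instead of failing.

```php
<?php
//note that this script file is UTF-8
//some Korean (contains two space characters)
$utf8_sentence = '연예가 뒷 이야기';
//@ suppresses the notice so we can test the return value ourselves
$iso_sentence = @iconv('UTF-8', 'ISO-8859-1', $utf8_sentence);
if ($iso_sentence === false) {
    //the string contains characters that ISO-8859-1 cannot hold
    echo 'Cannot be represented in ISO-8859-1<br>';
} else {
    echo $iso_sentence . '<br>';
}
?>
```

Comparing with === matters here, since an empty string and false are both "falsy" but mean different things.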
First transliteration
<?php
//note that this script file is UTF-8
//set browser to ISO-8859-1
header("Content-Type: text/html; charset=ISO-8859-1;");
//some Korean (contains two space characters)
$utf8_sentence = '연예가 뒷 이야기';
//gives [연예가 ë’· ì´ì•¼ê¸°] as UTF-8 is being displayed
//as ISO-8859-1
echo $utf8_sentence . '<br>';
//approximate characters that aren't in target character set
$iso_sentence = iconv('UTF-8', 'ISO-8859-1//TRANSLIT',
$utf8_sentence);
//gives [??? ? ???]
echo $iso_sentence . '<br>';
?>
More realistic transliteration (extended)
<?php
//note that this script file is UTF-8
//set browser to UTF-8
header("Content-Type: text/html; charset=UTF-8;");
//some German
$utf8_sentence = 'Weiß, Goldmann, Göbel, Weiss, Göthe, Goethe und Götz';
//fine as UTF-8 is being displayed as UTF-8
echo $utf8_sentence . '<br>';
$trans_sentence = iconv('UTF-8', 'ASCII//TRANSLIT', $utf8_sentence);
//gives [Weiss, Goldmann, G?bel, Weiss, G?the, Goethe und G?tz]
//which is not quite what we expected (only 'ß' has been flattened)
echo $trans_sentence . '<br>';
//BUT iconv interacts with system locale setting so let's have a play:
$current_locale = setlocale(LC_ALL, '0');
//gives, for me, "C" which is a kind of nondescript default
echo $current_locale . '<br>';
//we set the locale of the *target* character set
setlocale(LC_ALL, 'en_GB');
//try again...
$trans_sentence = iconv('UTF-8', 'ASCII//TRANSLIT', $utf8_sentence);
//gives [Weiss, Goldmann, Gobel, Weiss, Gothe, Goethe und Gotz]
//which is our original string flattened into 7-bit ASCII!
echo $trans_sentence . '<br>';
//out of curiosity...
setlocale(LC_ALL, 'de_DE');
$trans_sentence = iconv('UTF-8', 'ASCII//TRANSLIT', $utf8_sentence);
//gives [Weiss, Goldmann, Goebel, Weiss, Goethe, Goethe und Goetz]
//which is exactly how a German would transliterate those
//umlauted characters if forced to use 7-bit ASCII!
//(because really ä = ae, ö = oe and ü = ue)
echo $trans_sentence . '<br>';
?>
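Since changing the locale affects the rest of the script, one option is to wrap the transliteration in a helper that puts the old locale back afterwards. The listing below is only a sketch: flatten_to_ascii() is a hypothetical name, not part of PHP or iconv, and it assumes that LC_CTYPE is the locale category iconv consults (as with glibc) and that a locale called 'de_DE.UTF-8' may or may not be installed.

```php
<?php
//note that this script file is UTF-8
//flatten_to_ascii() is a hypothetical helper, not part of PHP or iconv
function flatten_to_ascii($utf8_string, $locale)
{
    //remember the current LC_CTYPE locale so it can be put back
    $previous = setlocale(LC_CTYPE, '0');
    //setlocale() returns false if the locale is not installed
    if (setlocale(LC_CTYPE, $locale) === false) {
        return false;
    }
    $ascii = iconv('UTF-8', 'ASCII//TRANSLIT', $utf8_string);
    //restore the locale that was active before the call
    setlocale(LC_CTYPE, $previous);
    return $ascii;
}
$locale_before = setlocale(LC_CTYPE, '0');
$result = flatten_to_ascii('Göbel', 'de_DE.UTF-8');
//false means the locale is missing or the conversion failed;
//the exact transliteration depends on the system's iconv
echo ($result === false ? 'not available' : $result) . '<br>';
?>
```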
Ignore example
<?php
//note that this script file is UTF-8
//set browser to ISO-8859-1
header("Content-Type: text/html; charset=ISO-8859-1;");
//some Korean (contains two space characters)
$utf8_sentence = '연예가 뒷 이야기';
//gives [연예가 ë’· ì´ì•¼ê¸°] as UTF-8 is being displayed
//as ISO-8859-1
echo $utf8_sentence . '<br>';
//discard characters that aren't in target character set
//STILL gives [Notice: iconv(): Detected an illegal character in input string]
$iso_sentence = iconv('UTF-8', 'ISO-8859-1//IGNORE', $utf8_sentence);
//gives "  " (two space characters)
var_dump($iso_sentence);
?>
ob_iconv_handler
<?php
//note that this script file is UTF-8
//character set of PHP scripts etc
iconv_set_encoding('internal_encoding', 'UTF-8');
//character set of browser output
//(sends HTTP header of "Content-Type: text/html; charset=ISO-8859-1;")
iconv_set_encoding('output_encoding', 'ISO-8859-1//TRANSLIT');
ob_start('ob_iconv_handler'); //start output buffering
//Unicode string
$utf8_sentence = 'The Japanese title is "指輪物語"';
//when buffer is flushed, outputs [The Japanese title is "????"]
echo $utf8_sentence;
?>
iconv_strlen()
<?php
//note that this script file is UTF-8
//set browser to UTF-8
header("Content-Type: text/html; charset=UTF-8;");
//some Russian (13 characters)
$utf8_sentence = 'Правительство';
//gives 13 which is correct
echo iconv_strlen($utf8_sentence, 'UTF-8') . '<br>';
//let's try core PHP
//gives 26 (the *byte* count). Oops!
echo strlen($utf8_sentence) . '<br>';
?>
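The same bytes-versus-characters distinction applies to cutting and searching strings. The listing below is a short sketch using iconv_substr() and iconv_strpos(), again assuming the script file is saved as UTF-8; the variable names are only illustrative.

```php
<?php
//note that this script file is UTF-8
//some Russian (13 characters)
$utf8_sentence = 'Правительство';
//gives [Прави], the first 5 characters
$first_five = iconv_substr($utf8_sentence, 0, 5, 'UTF-8');
echo $first_five . '<br>';
//core PHP counts bytes: 5 bytes is two and a half Cyrillic
//characters, so the result ends in a broken byte sequence
$broken = substr($utf8_sentence, 0, 5);
//gives [5 vs 10] (byte lengths of the two results)
echo strlen($broken) . ' vs ' . strlen($first_five) . '<br>';
//gives 8, the character offset of 'ь' (strpos() would give 16)
$position = iconv_strpos($utf8_sentence, 'ь', 0, 'UTF-8');
echo $position . '<br>';
?>
```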
Inter-Japanese conversion (not on presentation)
<?php
//note that this script file is UTF-8
//set browser to EUC-JP (a Japanese character set)
header("Content-Type: text/html; charset=EUC-JP;");
//some Japanese
$utf8_sentence = '一斗缶に詰められた遺体は、藤森容疑者の妻と息子と判明。';
//gives mojibake as UTF-8 is being displayed as EUC-JP
echo $utf8_sentence . '<br>';
$euc_sentence = iconv('UTF-8', 'EUC-JP', $utf8_sentence);
//gives intact Japanese string
echo $euc_sentence . '<br>';
?>
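One way to check that a conversion like this loses nothing is to convert back and compare with the original. The listing below is a minimal sketch, assuming the script file is saved as UTF-8 and the system's iconv supports EUC-JP; it uses the shorter title from the ob_iconv_handler listing to keep the byte counts easy to follow.

```php
<?php
//note that this script file is UTF-8
$utf8_title = '指輪物語';
$euc_title = iconv('UTF-8', 'EUC-JP', $utf8_title);
//gives [12 / 8]: these kanji take 3 bytes each in UTF-8
//and 2 bytes each in EUC-JP
echo strlen($utf8_title) . ' / ' . strlen($euc_title) . '<br>';
//converting back should reproduce the original bytes
$round_trip = iconv('EUC-JP', 'UTF-8', $euc_title);
//gives bool(true) when nothing was lost on the way
var_dump($round_trip === $utf8_title);
?>
```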