I have read some other threads on this subject but I cannot understand what I am doing wrong.
I have a function
public function test($item)
{
    if (! mb_detect_encoding($item, 'utf-8', true)) {
        $item = utf8_encode($item);
    }
    return $item;
}
I am writing a test for this. I want to test a string that is not UTF-8
to see if this statement is hit. I am having trouble creating the test string.
$contents = file_get_contents('CyrillicKOI8REncoded.txt');
var_dump(mb_detect_encoding($contents));
$sanitized = $this->test($contents);
var_dump(mb_detect_encoding($sanitized));
Initially I used file_get_contents on a file I encoded in Sublime as Cyrillic (KOI8-R), HEX and DOS (CP 437), as it has been stated that file_get_contents ignores the encoding. This seems to be true, as the characters returned are a jumbled mess.
That said, every time I use mb_detect_encoding on these variables, I always get ASCII or UTF-8. The statement is never triggered because ASCII is a subset of UTF-8.
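To make the point above concrete, here is a minimal sketch that assumes the test input is written as explicit byte escapes instead of being read from a file. The variable names are illustrative only. The byte 0xE9 is "é" in ISO-8859-1 and, standing alone at the end of a string, is not a valid UTF-8 sequence, so a strict check against UTF-8 should fail for it while a pure ASCII string passes.

```php
<?php
// Pure ASCII: every byte is below 0x80, so it is also valid UTF-8.
$ascii = 'cafe';

// "café" in ISO-8859-1: the lone 0xE9 byte is not a valid UTF-8 sequence.
$latin1 = "caf\xE9";

// Strict detection against a single candidate returns its name or false.
$asciiResult  = mb_detect_encoding($ascii, 'UTF-8', true);
$latin1Result = mb_detect_encoding($latin1, 'UTF-8', true);

// mb_check_encoding answers the same yes/no question directly.
$latin1IsUtf8 = mb_check_encoding($latin1, 'UTF-8');

var_dump($asciiResult, $latin1Result, $latin1IsUtf8);
```

Note that the calls in the test code above pass no encoding list and no strict flag, so they may behave differently from the strict call inside the function.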
So I have tried mb_convert_encoding and iconv to convert a basic string to UTF-16, UTF-32, base64, hex, etc., but every time mb_detect_encoding returns ASCII or UTF-8.
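One possible reason these conversions do not help, sketched under the assumption that the "basic string" contains only ASCII characters: base64 and hex output is ASCII text by construction, and the UTF-16 or UTF-32 form of an ASCII character is the same byte padded with zero bytes. Every resulting byte stays below 0x80, so the data still validates as UTF-8. The helper name allBytesBelow0x80 is hypothetical.

```php
<?php
// Hypothetical helper: true when no byte in $bytes has the high bit set.
function allBytesBelow0x80(string $bytes): bool
{
    return preg_match('/[\x80-\xFF]/', $bytes) === 0;
}

$basic = 'abc';

// The same ASCII-only input pushed through each conversion from the question.
$candidates = [
    'UTF-16LE' => mb_convert_encoding($basic, 'UTF-16LE', 'UTF-8'),
    'UTF-32BE' => mb_convert_encoding($basic, 'UTF-32BE', 'UTF-8'),
    'base64'   => base64_encode($basic),
    'hex'      => bin2hex($basic),
];

foreach ($candidates as $label => $bytes) {
    // Report whether the bytes are ASCII-only and whether they validate as UTF-8.
    printf(
        "%s: ascii-only=%s, valid UTF-8=%s\n",
        $label,
        var_export(allBytesBelow0x80($bytes), true),
        var_export(mb_check_encoding($bytes, 'UTF-8'), true)
    );
}
```

A string containing non-ASCII characters would be a more useful starting point, since its converted bytes can actually fall outside the ASCII range.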
In my tests I want to assert the encoding type before and after this function is called.
$sanitized = $this->test($contents);
$this->assertEquals('UTF-32', mb_detect_encoding($contents));
$this->assertEquals('UTF-8', mb_detect_encoding($sanitized));
I cannot understand what basic mistake I am making that causes mb_detect_encoding to constantly return ASCII or UTF-8.
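A rough sketch of how the before/after check might look with plain assert() instead of PHPUnit, assuming the input is ISO-8859-1 bytes written as escapes. The standalone function sanitize is a hypothetical stand-in for the method above, and utf8_encode is swapped for mb_convert_encoding from ISO-8859-1, which performs the same conversion and avoids the deprecation notice utf8_encode raises as of PHP 8.2. The sketch checks validity with mb_check_encoding rather than asserting on a detected encoding name.

```php
<?php
// Hypothetical standalone version of the method under test.
function sanitize(string $item): string
{
    if (! mb_detect_encoding($item, 'UTF-8', true)) {
        // Same conversion utf8_encode performs: ISO-8859-1 to UTF-8.
        $item = mb_convert_encoding($item, 'UTF-8', 'ISO-8859-1');
    }
    return $item;
}

$contents  = "caf\xE9";            // ISO-8859-1 bytes, not valid UTF-8
$sanitized = sanitize($contents);

assert(mb_check_encoding($contents, 'UTF-8') === false);  // input is not UTF-8
assert(mb_check_encoding($sanitized, 'UTF-8') === true);  // output is UTF-8
assert($sanitized === "caf\xC3\xA9");                     // "café" in UTF-8
```

Whether the converted text is meaningful still depends on the input really being ISO-8859-1; KOI8-R bytes run through the same path would become valid UTF-8 but not the intended Cyrillic text.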