Complex performance and correctness issues across multiple encodings require deep TextDecoder expertise.
The issue reports both correctness failures across multiple encodings and significant performance problems in the TextDecoder implementation. While the requirements are reasonably clear from the benchmark comparisons, fixing them requires deep knowledge of text encoding standards and of performance optimization. The maintainer discussion indicates this may involve cross-browser coordination.
Encodings that return invalid results:

- ibm866 (fails on even ascii input)
- koi8-u
- windows-874
- windows-1252
- windows-1253
- windows-1255
- gb18030
- gbk (should be identical to gb18030, but is instead broken)
- big5
- euc-jp
- iso-2022-jp
- shift_jis (fails on even ascii input)
- euc-kr

Unimplemented encodings that throw:

- iso-8859-16
- x-user-defined

If built without ICU, the utf-16le encoding also returns invalid results:
```js
> new TextDecoder('utf-16le').decode(Uint16Array.of(0xd800))
'�'      // correct
'\ud800' // no ICU
```
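The expected value here comes from the WHATWG Encoding Standard, which requires an unpaired surrogate to decode to U+FFFD. A minimal sketch of the same check, written against raw bytes rather than a Uint16Array:

```javascript
// A lone high surrogate (0xD800) encoded as two little-endian bytes.
const bytes = new Uint8Array([0x00, 0xd8]);
const out = new TextDecoder('utf-16le').decode(bytes);

// Per the WHATWG Encoding Standard, an unpaired surrogate must decode to
// U+FFFD REPLACEMENT CHARACTER; an ICU-less build instead leaks the raw
// surrogate code unit ('\ud800') into the result.
console.log(out === '\ufffd' ? 'spec-compliant' : 'broken: ' + JSON.stringify(out));
```

On a build with full ICU this prints `spec-compliant`; the issue is about the other branch.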
Performance issues:

- utf-8 (aka the default) TextDecoder is much slower on ascii input than it can and should be:
  - 1.3x on 4096 bytes, ~3x on 1 MiB input
- buffer.toString() too:
  - it's much slower on ASCII input than a checked js impl (same 1.3x-3x)
- windows-1252, aka new TextDecoder('ascii'), aka new TextDecoder('latin1'):
  - ~2x-4x slower than an optimized impl on ascii input
  - ~6x-12x slower than an optimized impl on latin1 input
  - ~7x-12x slower than an optimized js impl
- iso-8859-3, iso-8859-6, iso-8859-7, iso-8859-8, iso-8859-8-i, windows-1253, windows-1255, windows-1257 are >=10x slower than the js impl on ascii input (windows-1252 is only ~2-4x slower)

None of the above requires any changes on the native side; I compared against a somewhat optimized JS implementation.
See https://docs.google.com/spreadsheets/d/1pdEefRG6r9fZy61WHGz0TKSt8cO4ISWqlpBN5KntIvQ/edit
See tests in https://github.co