
GoodFirstPicks by Leaveitblank © 2026


TextDecoder is wrong and very slow

nodejs/node 12 comments 1mo ago
high · open · Scope: somewhat clear · Skill match: no · Node.js · JavaScript

Why this is a good first issue

Complex performance and correctness issues across multiple encodings require deep TextDecoder expertise.

AI Summary

The issue reports both correctness failures in multiple encodings and significant performance problems in TextDecoder implementations. While the requirements are somewhat clear from benchmark comparisons, this requires deep knowledge of text encoding standards and performance optimization. The maintainer discussion indicates this may involve cross-browser coordination.

Issue Description

Correctness

Encodings that return invalid results:

  • Single-byte:
    • ibm866 (fails even on ASCII input)
    • koi8-u
    • windows-874
    • windows-1252
    • windows-1253
    • windows-1255
  • Multi-byte (all except gb18030):
    • gbk (should decode identically to gb18030, but is broken instead)
    • big5
    • euc-jp
    • iso-2022-jp
    • shift_jis (fails even on ASCII input)
    • euc-kr

Unimplemented encodings that throw:

  • iso-8859-16
  • x-user-defined
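
The three failure modes listed above (wrong output, wrong output even on ASCII, and a constructor that throws) can be told apart with a small probe. The following is a minimal sketch, not the test suite referenced in the issue: `checkAsciiRoundTrip` and the `labels` sample are hypothetical names chosen for illustration, and the probe only covers printable ASCII, so it would not catch errors in the non-ASCII range.

```javascript
// Hypothetical helper: classify how a TextDecoder label handles printable ASCII.
// Returns 'ok', 'mismatch', or 'throws'.
function checkAsciiRoundTrip(label) {
  // Bytes 0x20..0x7e, which every ASCII-compatible encoding should map to
  // the same code points.
  const bytes = new Uint8Array(0x5f);
  for (let i = 0; i < bytes.length; i++) bytes[i] = 0x20 + i;
  const expected = String.fromCharCode(...bytes);

  try {
    const decoder = new TextDecoder(label);
    return decoder.decode(bytes) === expected ? 'ok' : 'mismatch';
  } catch (err) {
    // Unsupported labels make the constructor throw a RangeError.
    return 'throws';
  }
}

// A sample of labels taken from the lists above; results depend on the
// Node.js version and on how it was built.
const labels = ['utf-8', 'ibm866', 'shift_jis', 'windows-1252', 'iso-8859-16', 'x-user-defined'];
const report = Object.fromEntries(labels.map((label) => [label, checkAsciiRoundTrip(label)]));
console.log(report);
```

utf-16le is deliberately left out of the sample because it is not ASCII-compatible at the byte level.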

If Node.js is built without ICU, the utf-16le decoder also returns invalid results:

> new TextDecoder('utf-16le').decode(Uint16Array.of(0xd800))
'�' // correct: U+FFFD replacement character
'\ud800' // without ICU: the lone surrogate passes through
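
To make the expected behavior concrete, here is a minimal sketch of lossy UTF-16 decoding that replaces every unpaired surrogate with U+FFFD. It is an illustration under stated assumptions, not Node's implementation: `decodeUtf16leLossy` is a hypothetical name, it takes 16-bit code units rather than raw bytes, and it ignores BOM handling and streaming.

```javascript
// Hypothetical helper: decode UTF-16 code units, replacing unpaired
// surrogates with U+FFFD instead of passing them through.
function decodeUtf16leLossy(units) {
  let out = '';
  for (let i = 0; i < units.length; i++) {
    const unit = units[i];
    if (unit >= 0xd800 && unit <= 0xdbff) {
      // Lead surrogate: valid only when directly followed by a trail surrogate.
      const next = i + 1 < units.length ? units[i + 1] : 0;
      if (next >= 0xdc00 && next <= 0xdfff) {
        out += String.fromCharCode(unit, next);
        i++; // consume the trail surrogate as well
      } else {
        out += '\ufffd';
      }
    } else if (unit >= 0xdc00 && unit <= 0xdfff) {
      // Trail surrogate with no preceding lead surrogate.
      out += '\ufffd';
    } else {
      out += String.fromCharCode(unit);
    }
  }
  return out;
}

console.log(decodeUtf16leLossy(Uint16Array.of(0xd800)) === '\ufffd');
```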

Performance

  • utf-8 (the default) TextDecoder is much slower on ASCII input than it can and should be: 1.3x on 4096 bytes, ~3x on 1 MiB input
  • The above applies to buffer.toString() too. It's much slower on ASCII input than a checked JS impl (same 1.3x-3x)
  • windows-1252 aka new TextDecoder('ascii') aka new TextDecoder('latin1') is ~2x-4x slower than an optimized impl on ascii input
  • windows-1252 aka new TextDecoder('latin1') is ~6x-12x slower than an optimized impl on latin1 input
  • windows-1252 is ~7x-12x slower than an optimized js impl
  • Other single-byte encodings that are significantly slower than js impl even on non-ascii input: iso-8859-3, iso-8859-6, iso-8859-7, iso-8859-8, iso-8859-8-i, windows-1253, windows-1255, windows-1257
  • None of the single-byte encodings are faster than the js impl even on non-ascii input
  • All of the single-byte encodings except windows-1252 are >=10x slower than the js impl on ascii input (windows-1252 is only ~2-4x slower)
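
For context on what a JS implementation of a single-byte encoding can look like, here is a minimal table-driven sketch for windows-1252. It is not the implementation the issue was benchmarked against, and no speed claim is made for it: `decodeWindows1252`, `TABLE_1252`, and the chunk size are illustrative choices. The 0x80..0x9f mappings follow the WHATWG windows-1252 index; all other bytes map to the code point equal to the byte value.

```javascript
// Code points for bytes 0x80..0x9f in windows-1252 (WHATWG index).
const HIGH_1252 = [
  0x20ac, 0x0081, 0x201a, 0x0192, 0x201e, 0x2026, 0x2020, 0x2021,
  0x02c6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008d, 0x017d, 0x008f,
  0x0090, 0x2018, 0x2019, 0x201c, 0x201d, 0x2022, 0x2013, 0x2014,
  0x02dc, 0x2122, 0x0161, 0x203a, 0x0153, 0x009d, 0x017e, 0x0178,
];

// 256-entry lookup table: byte value -> UTF-16 code unit.
const TABLE_1252 = new Uint16Array(256);
for (let b = 0; b < 256; b++) {
  TABLE_1252[b] = b >= 0x80 && b <= 0x9f ? HIGH_1252[b - 0x80] : b;
}

// Chunk size is an assumption; it keeps the argument count passed to
// String.fromCharCode.apply well below engine limits.
const CHUNK = 8192;

// Hypothetical helper: decode a Uint8Array of windows-1252 bytes.
function decodeWindows1252(bytes) {
  let out = '';
  const units = new Uint16Array(Math.min(CHUNK, bytes.length));
  for (let start = 0; start < bytes.length; start += CHUNK) {
    const end = Math.min(start + CHUNK, bytes.length);
    // Map each byte through the table, then build the string chunk at once.
    for (let i = start; i < end; i++) units[i - start] = TABLE_1252[bytes[i]];
    out += String.fromCharCode.apply(null, units.subarray(0, end - start));
  }
  return out;
}

console.log(decodeWindows1252(Uint8Array.of(0x48, 0x69, 0x20, 0x80)));
```

The same shape would work for other single-byte encodings by swapping the table, although encodings with unmapped bytes would also need to emit U+FFFD for those entries.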

References

None of the above requires any changes on the native side; I compared against a somewhat optimized JS implementation.

See https://docs.google.com/spreadsheets/d/1pdEefRG6r9fZy61WHGz0TKSt8cO4ISWqlpBN5KntIvQ/edit

See tests in https://github.co

GitHub Labels

performance


Risk Flags

  • performance-sensitive
  • encoding expertise required
  • potential cross-browser implications

Details

Points: 10 pts
Difficulty: high
Scope: somewhat clear
Skill Match: no
Test Focused: no