The Absolute Minimum Every Software Developer Must Know About Unicode in 2023 (Still No Excuses!) (tonsky.me)
Posted by snaggen@programming.dev to Programming@programming.dev · 2 years ago · 2 comments
abhibeckert@lemmy.world · edited 2 years ago
I love the comparison of string length of the same UTF-8 string in four programming languages (only the last one is correct, by the way):
Python 3: len("🤦🏼‍♂️") → 5
JavaScript / Java / C#: "🤦🏼‍♂️".length → 7
Rust: println!("{}", "🤦🏼‍♂️".len()); → 17
Swift: print("🤦🏼‍♂️".count) → 1
Walnut356@programming.dev · edited 2 years ago
That depends on your definition of correct lmao. Rust's len() explicitly returns the number of UTF-8 bytes, not scalar values or graphemes, because that's the length of the raw data contained in the string. There are many times where that value is more useful than the grapheme count.
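For anyone wondering where 5, 7 and 17 come from: each is the length of the same string measured in a different unit. Here is a minimal Python sketch that reproduces all three using only the standard library. The names `FACEPALM` and `length_report` are made up for illustration, and the emoji is written with escapes on the assumption that it is the usual five-code-point sequence (facepalm, skin tone, zero-width joiner, male sign, variation selector).

```python
# The facepalm emoji from the thread, spelled out with escapes so the
# zero-width joiner (U+200D) cannot get lost in copy/paste.
FACEPALM = "\U0001F926\U0001F3FC\u200D\u2642\uFE0F"


def length_report(s: str) -> dict[str, int]:
    """Return the length of `s` in three different units."""
    return {
        # What Python 3's len() counts: Unicode code points.
        "code_points": len(s),
        # What JavaScript / Java / C# count: UTF-16 code units (2 bytes each).
        "utf16_code_units": len(s.encode("utf-16-le")) // 2,
        # What Rust's str::len() counts: bytes of the UTF-8 encoding.
        "utf8_bytes": len(s.encode("utf-8")),
    }


print(length_report(FACEPALM))
# {'code_points': 5, 'utf16_code_units': 7, 'utf8_bytes': 17}
```

The Swift answer of 1 is the count of extended grapheme clusters. Python's standard library has no built-in for that, so getting it would need a third-party segmentation library, which is left out of this sketch.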