#1,001 – Representing Unicode Surrogate Pairs

UTF-16 encodes Unicode code points above U+FFFF using surrogate pairs: two 16-bit code units that together take up 4 bytes.
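As a rough illustration of how a code point above U+FFFF maps onto a surrogate pair, here is a minimal sketch of the standard UTF-16 arithmetic. The class and method names (SurrogateMath, ToSurrogatePair) are made up for this example and are not part of the .NET framework.

```csharp
using System;

static class SurrogateMath
{
    // Hypothetical helper: splits a supplementary code point
    // (U+10000..U+10FFFF) into its UTF-16 high and low surrogates.
    public static (char High, char Low) ToSurrogatePair(int codePoint)
    {
        if (codePoint < 0x10000 || codePoint > 0x10FFFF)
            throw new ArgumentOutOfRangeException(nameof(codePoint));

        int offset = codePoint - 0x10000;               // 20-bit value
        char high = (char)(0xD800 + (offset >> 10));    // top 10 bits
        char low = (char)(0xDC00 + (offset & 0x3FF));   // bottom 10 bits
        return (high, low);
    }

    static void Main()
    {
        var (high, low) = ToSurrogatePair(0x20213);
        Console.WriteLine($"{(int)high:X4} {(int)low:X4}");  // D840 DE13
    }
}
```

In everyday code you would normally let the framework do this (for example with char.ConvertFromUtf32); the manual version is only meant to show where the two 16-bit values come from.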

You can specify a surrogate pair within a string literal by inserting the character directly into the string (provided that you have a keyboard that can insert the character):

            string myString = "𠈓";   // CJK Ideograph

You can also represent the surrogate pair within a string literal using the \Unnnnnnnn syntax (eight hex digits) to specify the Unicode code point, or the \unnnn\unnnn syntax to specify the two 16-bit values of the encoded surrogate pair.

            string s1 = "\U00020213";    // Code point U+20213
            string s2 = "\uD840\uDE13";  // Surrogate pair
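To check that both escape forms produce the same string, a small sketch like the following could be used. The class name LiteralForms and its two methods are invented for this example; char.ConvertFromUtf32 is the framework method that builds the same string from a numeric code point.

```csharp
using System;

static class LiteralForms
{
    // Hypothetical helpers that simply return the two literal forms.
    public static string FromCodePoint() => "\U00020213";
    public static string FromSurrogates() => "\uD840\uDE13";

    static void Main()
    {
        string s1 = FromCodePoint();
        string s2 = FromSurrogates();
        string s3 = char.ConvertFromUtf32(0x20213);

        Console.WriteLine(s1 == s2);   // True
        Console.WriteLine(s1 == s3);   // True
        Console.WriteLine(s1.Length);  // 2 (Length counts 16-bit chars, not code points)
    }
}
```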

Note that because a surrogate pair requires more than 2 bytes, you cannot represent a surrogate pair within a single character (System.Char) literal.
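Since each System.Char holds only one half of the pair, code that walks a string character by character has to recombine the halves itself. The sketch below shows one possible way to do that with the framework's char.IsSurrogatePair and char.ConvertToUtf32 methods. The class CodePointReader and its methods are hypothetical names used only for illustration.

```csharp
using System;

static class CodePointReader
{
    // Returns the code point that starts at the given index,
    // combining a surrogate pair into one value if one is present.
    public static int CodePointAt(string s, int index) => char.ConvertToUtf32(s, index);

    // Counts code points rather than 16-bit chars.
    public static int CountCodePoints(string s)
    {
        int count = 0;
        for (int i = 0; i < s.Length; i++)
        {
            if (char.IsSurrogatePair(s, i)) i++;  // skip the low surrogate
            count++;
        }
        return count;
    }

    static void Main()
    {
        string s = "A\U00020213";
        Console.WriteLine(s.Length);                    // 3
        Console.WriteLine(CountCodePoints(s));          // 2
        Console.WriteLine(char.IsHighSurrogate(s[1]));  // True
        Console.WriteLine($"U+{CodePointAt(s, 1):X}");  // U+20213
    }
}
```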


Filed under: Basics Tagged: Basics, C#, String, Surrogate Pair, Unicode