You can use the string.Length property to get the length of a string. Length counts UTF-16 code units (char values), however, so it equals the number of characters only when every code point in the string is no larger than U+FFFF. This set of code points is known as the Basic Multilingual Plane (BMP).
Unicode code points outside of the BMP are represented in UTF-16 using a 4-byte surrogate pair (two 16-bit code units) rather than a single 2-byte code unit, so each of them adds 2 to Length.
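To make the surrogate pair encoding concrete, here is a minimal sketch of how a code point above U+FFFF is split into two code units. The class and method names (SurrogatePairDemo, ToSurrogates) are hypothetical and the code point U+20213 is chosen only for illustration; char.ConvertFromUtf32 and char.IsSurrogatePair are the framework calls that do the same work for you.

```csharp
using System;

public static class SurrogatePairDemo
{
    // Splits a code point above U+FFFF into its UTF-16 high and low surrogates.
    // Hypothetical helper for illustration; it does not validate its input.
    public static (char High, char Low) ToSurrogates(int codePoint)
    {
        int offset = codePoint - 0x10000;              // 20-bit value
        char high = (char)(0xD800 + (offset >> 10));   // top 10 bits
        char low = (char)(0xDC00 + (offset & 0x3FF));  // bottom 10 bits
        return (high, low);
    }

    public static void Main()
    {
        var (high, low) = ToSurrogates(0x20213);
        Console.WriteLine($"{(int)high:X4} {(int)low:X4}");        // D840 DE13

        // The framework produces the same two code units.
        string viaFramework = char.ConvertFromUtf32(0x20213);
        Console.WriteLine(viaFramework.Length);                    // 2
        Console.WriteLine(char.IsSurrogatePair(viaFramework, 0));  // True
    }
}
```

In real code you would normally call char.ConvertFromUtf32 and char.ConvertToUtf32 rather than doing the arithmetic by hand.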
To correctly count the number of characters in a string that may contain code points higher than U+FFFF, you can use the StringInfo class (from System.Globalization). Its LengthInTextElements property counts text elements, so a surrogate pair is counted once.
// 3 Latin (ASCII) characters
string simple = "abc";

// 3 character string where one character
// is a surrogate pair
string containsSurrogatePair = "A𠈓C";

// Length=3 (correct)
Console.WriteLine(string.Format("Length 1 = {0}", simple.Length));

// Length=4 (not quite correct)
Console.WriteLine(string.Format("Length 2 = {0}", containsSurrogatePair.Length));

// Better, reports Length=3
StringInfo si = new StringInfo(containsSurrogatePair);
Console.WriteLine(string.Format("Length 3 = {0}", si.LengthInTextElements));
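It may also help to see that a text element is not the same thing as a code point. The sketch below compares three counts for the same strings: code units (Length), code points, and text elements. TextLength and CountCodePoints are hypothetical names introduced here, and the combining-accent string is an extra example that is not in the original post.

```csharp
using System;
using System.Globalization;

public static class TextLength
{
    // Counts Unicode code points by treating each valid surrogate pair as one.
    // Hypothetical helper for illustration.
    public static int CountCodePoints(string s)
    {
        int count = 0;
        for (int i = 0; i < s.Length; i++)
        {
            bool isPair = char.IsHighSurrogate(s[i])
                          && i + 1 < s.Length
                          && char.IsLowSurrogate(s[i + 1]);
            if (isPair)
            {
                i++; // skip the low surrogate of the pair
            }
            count++;
        }
        return count;
    }

    public static void Main()
    {
        // 'A', one code point outside the BMP (U+20213), 'C'
        string withPair = "A\U00020213C";
        Console.WriteLine(withPair.Length);                                // 4
        Console.WriteLine(CountCodePoints(withPair));                      // 3
        Console.WriteLine(new StringInfo(withPair).LengthInTextElements);  // 3

        // 'e' followed by U+0301 (combining acute accent)
        string accented = "e\u0301";
        Console.WriteLine(accented.Length);                                // 2
        Console.WriteLine(CountCodePoints(accented));                      // 2
        Console.WriteLine(new StringInfo(accented).LengthInTextElements);  // 1
    }
}
```

StringInfo is usually the better fit when you care about what a reader would perceive as one character; counting code points is closer to what the surrogate pair discussion above describes.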
Filed under: Strings Tagged: C#, Length, Strings, Surrogate Pairs, Unicode, UTF-16