2,000 Things You Should Know About C# » Unicode

#1,002 – Specifying Character Encoding when Writing to a File

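In .NET, string data is stored in memory as Unicode text encoded as UTF-16. Each char is a 2-byte code unit, so most characters take 2 bytes, while characters outside the Basic Multilingual Plane take 4 bytes, stored as a surrogate pair of two chars.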
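To make the surrogate-pair case concrete, here is a small sketch that inspects the CJK ideograph U+20213 (the same character used in the file example below) next to a single-unit character. It assumes a .NET console project with C# 9 or later, since it uses top-level statements; the variable names are illustrative only.

```csharp
using System;
using System.Text;

// U+20213 lies outside the Basic Multilingual Plane, so the string
// holds it as two UTF-16 code units: a high and a low surrogate.
string cjk = "\U00020213";

Console.WriteLine(cjk.Length);                                  // 2 (code units, not characters)
Console.WriteLine(((int)cjk[0]).ToString("X4"));                // D840 (high surrogate)
Console.WriteLine(((int)cjk[1]).ToString("X4"));                // DE13 (low surrogate)
Console.WriteLine(char.ConvertToUtf32(cjk, 0).ToString("X"));   // 20213 (the original code point)
Console.WriteLine(Encoding.Unicode.GetByteCount(cjk));          // 4 bytes in UTF-16

// U+00E9 is inside the BMP, so it needs only one code unit (2 bytes).
string eAcute = "\u00e9";
Console.WriteLine(eAcute.Length);                               // 1
Console.WriteLine(Encoding.Unicode.GetByteCount(eAcute));       // 2
```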

When you persist string data out to a file, however, you must be aware of which encoding is being used. In the example below, we use a StreamWriter to write string data to a file. By default, StreamWriter uses UTF-8 (without a byte order mark), which writes each character as 1 to 4 bytes rather than copying the in-memory UTF-16 data.

            // Requires: using System.IO;
            string s1 = "A";             // U+0041 Latin capital letter A
            string s2 = "\u00e9";        // U+00E9 é (e with acute accent)
            string s3 = "\u0100";        // U+0100 Ā (capital A with macron)
            string s4 = "\U00020213";    // U+20213 CJK ideograph (surrogate pair D840 DE13)

            using (StreamWriter sw = new StreamWriter(@"C:\Users\Sean\Documents\sometext.txt"))
            {
                // No Encoding argument, so StreamWriter writes UTF-8 without a BOM
                sw.WriteLine(s1);
                sw.WriteLine(s2);
                sw.WriteLine(s3);
                sw.WriteLine(s4);
            }
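To see exactly which bytes the default UTF-8 encoding produces, the sketch below writes the same four strings into a MemoryStream instead of a file, so the result can be printed. The MemoryStream substitution is an assumption made for illustration, and Write is used instead of WriteLine so that the platform-specific newline sequence does not change the output. It assumes C# 9 or later (top-level statements).

```csharp
using System;
using System.IO;

// Same four strings as the file example, written to memory instead of disk.
var ms = new MemoryStream();
using (var sw = new StreamWriter(ms))   // no Encoding argument: UTF-8, no BOM
{
    sw.Write("A");           // U+0041  -> 1 byte
    sw.Write("\u00e9");      // U+00E9  -> 2 bytes
    sw.Write("\u0100");      // U+0100  -> 2 bytes
    sw.Write("\U00020213");  // U+20213 -> 4 bytes
}

// MemoryStream.ToArray still works after the writer has closed the stream.
byte[] utf8Bytes = ms.ToArray();
Console.WriteLine(BitConverter.ToString(utf8Bytes));
// 41-C3-A9-C4-80-F0-A0-88-93  (9 bytes, no byte order mark)
```

Note that the surrogate pair D840 DE13 does not appear in the output: UTF-8 encodes the code point U+20213 directly as the 4-byte sequence F0 A0 88 93.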

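We could also explicitly specify a UTF-16 encoding (Encoding.Unicode, which is UTF-16 little-endian) when creating the StreamWriter object, keeping the same WriteLine calls in the body of the using block. With this encoding, StreamWriter starts the file with the byte order mark FF FE.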

            // append: false overwrites any existing file; Encoding lives in System.Text
            using (StreamWriter sw = new StreamWriter(@"C:\Users\Sean\Documents\sometext.txt", false, Encoding.Unicode))
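For comparison with the UTF-8 output, here is a sketch that writes the same four strings with Encoding.Unicode, again into a MemoryStream (an illustrative stand-in for the file) and using Write rather than WriteLine to keep newlines out of the result. It assumes C# 9 or later (top-level statements).

```csharp
using System;
using System.IO;
using System.Text;

// Same four strings, encoded as UTF-16 little-endian.
var ms = new MemoryStream();
using (var sw = new StreamWriter(ms, Encoding.Unicode))
{
    sw.Write("A");
    sw.Write("\u00e9");
    sw.Write("\u0100");
    sw.Write("\U00020213");
}

byte[] utf16Bytes = ms.ToArray();
Console.WriteLine(BitConverter.ToString(utf16Bytes));
// FF-FE-41-00-E9-00-00-01-40-D8-13-DE
// FF FE is the byte order mark; each BMP character takes 2 bytes (low byte first),
// and the CJK ideograph is stored as its surrogate pair D840 DE13.
```

Here the file bytes mirror the in-memory UTF-16 representation, including the surrogate pair. For this mostly Latin text, UTF-16 is larger (12 bytes including the BOM, versus 9 for UTF-8).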

Filed under: Basics Tagged: Basics, C#, Encoding, StreamWriter, Unicode
