Επεξεργασία

Κοινή χρήση μέσω


How to use Utf8JsonReader in System.Text.Json

This article shows how you can use the Utf8JsonReader type for building custom parsers and deserializers.

Utf8JsonReader is a high-performance, low allocation, forward-only reader for UTF-8 encoded JSON text. The text is read from a ReadOnlySpan<byte> or ReadOnlySequence<byte>. Utf8JsonReader is a low-level type that can be used to build custom parsers and deserializers. (The JsonSerializer.Deserialize methods use Utf8JsonReader under the covers.)

The following example shows how to use the Utf8JsonReader class. This code assumes that the jsonUtf8Bytes variable is a byte array that contains valid JSON, encoded as UTF-8.

var options = new JsonReaderOptions
{
    AllowTrailingCommas = true,
    CommentHandling = JsonCommentHandling.Skip
};
var reader = new Utf8JsonReader(jsonUtf8Bytes, options);

while (reader.Read())
{
    Console.Write(reader.TokenType);

    switch (reader.TokenType)
    {
        case JsonTokenType.PropertyName:
        case JsonTokenType.String:
            {
                string? text = reader.GetString();
                Console.Write(" ");
                Console.Write(text);
                break;
            }

        case JsonTokenType.Number:
            {
                int intValue = reader.GetInt32();
                Console.Write(" ");
                Console.Write(intValue);
                break;
            }

            // Other token types elided for brevity
    }
    Console.WriteLine();
}
' This code example doesn't apply to Visual Basic. For more information, go to the following URL:
' https://learn.microsoft.com/dotnet/standard/serialization/system-text-json-how-to#visual-basic-support

Note

Utf8JsonReader can't be used directly from Visual Basic code. For more information, see Visual Basic support.

Filter data using Utf8JsonReader

The following example shows how to synchronously read a file and search for a value.

using System.Text;
using System.Text.Json;

namespace SystemTextJsonSamples
{
    public class Utf8ReaderFromFile
    {
        private static readonly byte[] s_nameUtf8 = Encoding.UTF8.GetBytes("name");
        private static ReadOnlySpan<byte> Utf8Bom => new byte[] { 0xEF, 0xBB, 0xBF };

        public static void Run()
        {
            // ReadAllBytes if the file encoding is UTF-8:
            string fileName = "UniversitiesUtf8.json";
            ReadOnlySpan<byte> jsonReadOnlySpan = File.ReadAllBytes(fileName);

            // Read past the UTF-8 BOM bytes if a BOM exists.
            if (jsonReadOnlySpan.StartsWith(Utf8Bom))
            {
                jsonReadOnlySpan = jsonReadOnlySpan.Slice(Utf8Bom.Length);
            }

            // Or read as UTF-16 and transcode to UTF-8 to convert to a ReadOnlySpan<byte>
            //string fileName = "Universities.json";
            //string jsonString = File.ReadAllText(fileName);
            //ReadOnlySpan<byte> jsonReadOnlySpan = Encoding.UTF8.GetBytes(jsonString);

            int count = 0;
            int total = 0;

            var reader = new Utf8JsonReader(jsonReadOnlySpan);

            while (reader.Read())
            {
                JsonTokenType tokenType = reader.TokenType;

                switch (tokenType)
                {
                    case JsonTokenType.StartObject:
                        total++;
                        break;
                    case JsonTokenType.PropertyName:
                        if (reader.ValueTextEquals(s_nameUtf8))
                        {
                            // Assume valid JSON, known schema
                            reader.Read();
                            if (reader.GetString()!.EndsWith("University"))
                            {
                                count++;
                            }
                        }
                        break;
                }
            }
            Console.WriteLine($"{count} out of {total} have names that end with 'University'");
        }
    }
}
' This code example doesn't apply to Visual Basic. For more information, go to the following URL:
' https://learn.microsoft.com/dotnet/standard/serialization/system-text-json-how-to#visual-basic-support

The preceding code:

  • Assumes the JSON contains an array of objects, and each object might contain a "name" property of type string.

  • Counts objects and "name" property values that end with "University".

  • Assumes the file is encoded as UTF-16 and transcodes it into UTF-8.

    A file encoded as UTF-8 can be read directly into a ReadOnlySpan<byte> by using the following code:

    ReadOnlySpan<byte> jsonReadOnlySpan = File.ReadAllBytes(fileName);
    

    If the file contains a UTF-8 byte order mark (BOM), remove it before passing the bytes to the Utf8JsonReader, since the reader expects text. Otherwise, the BOM is considered invalid JSON, and the reader throws an exception.

Here's a JSON sample that the preceding code can read. The resulting summary message is "2 out of 4 have names that end with 'University'":

[
  {
    "web_pages": [ "https://contoso.edu/" ],
    "alpha_two_code": "US",
    "state-province": null,
    "country": "United States",
    "domains": [ "contoso.edu" ],
    "name": "Contoso Community College"
  },
  {
    "web_pages": [ "http://fabrikam.edu/" ],
    "alpha_two_code": "US",
    "state-province": null,
    "country": "United States",
    "domains": [ "fabrikam.edu" ],
    "name": "Fabrikam Community College"
  },
  {
    "web_pages": [ "http://www.contosouniversity.edu/" ],
    "alpha_two_code": "US",
    "state-province": null,
    "country": "United States",
    "domains": [ "contosouniversity.edu" ],
    "name": "Contoso University"
  },
  {
    "web_pages": [ "http://www.fabrikamuniversity.edu/" ],
    "alpha_two_code": "US",
    "state-province": null,
    "country": "United States",
    "domains": [ "fabrikamuniversity.edu" ],
    "name": "Fabrikam University"
  }
]

Tip

For an asynchronous version of this example, see .NET samples JSON project.

Read from a stream using Utf8JsonReader

When reading a large file (a gigabyte or more in size, for example), you might want to avoid loading the entire file into memory at once. For this scenario, you can use a FileStream.

When using the Utf8JsonReader to read from a stream, the following rules apply:

  • The buffer containing the partial JSON payload must be at least as large as the largest JSON token within it so that the reader can make forward progress.
  • The buffer must be at least as large as the largest sequence of white space within the JSON.
  • The reader doesn't keep track of the data it has read until it completely reads the next TokenType in the JSON payload. So when there are bytes left over in the buffer, you have to pass them to the reader again. You can use BytesConsumed to determine how many bytes are left over.

The following code illustrates how to read from a stream. The example shows a MemoryStream. Similar code will work with a FileStream, except when the FileStream contains a UTF-8 BOM at the start. In that case, you need to strip those three bytes from the buffer before passing the remaining bytes to the Utf8JsonReader. Otherwise the reader would throw an exception, since the BOM is not considered a valid part of the JSON.

The sample code starts with a 4 KB buffer and doubles the buffer size each time it finds that the size is not large enough to fit a complete JSON token, which is required for the reader to make forward progress on the JSON payload. The JSON sample provided in the snippet triggers a buffer size increase only if you set a very small initial buffer size, for example, 10 bytes. If you set the initial buffer size to 10, the Console.WriteLine statements illustrate the cause and effect of buffer size increases. At the 4 KB initial buffer size, the entire sample JSON is shown by each call to Console.WriteLine, and the buffer size never has to be increased.

using System.Text;
using System.Text.Json;

namespace SystemTextJsonSamples
{
    public class Utf8ReaderPartialRead
    {
        public static void Run()
        {
            var jsonString = @"{
                ""Date"": ""2019-08-01T00:00:00-07:00"",
                ""Temperature"": 25,
                ""TemperatureRanges"": {
                    ""Cold"": { ""High"": 20, ""Low"": -10 },
                    ""Hot"": { ""High"": 60, ""Low"": 20 }
                },
                ""Summary"": ""Hot"",
            }";

            byte[] bytes = Encoding.UTF8.GetBytes(jsonString);
            var stream = new MemoryStream(bytes);

            var buffer = new byte[4096];

            // Fill the buffer.
            // For this snippet, we're assuming the stream is open and has data.
            // If it might be closed or empty, check if the return value is 0.
            stream.Read(buffer);

            // We set isFinalBlock to false since we expect more data in a subsequent read from the stream.
            var reader = new Utf8JsonReader(buffer, isFinalBlock: false, state: default);
            Console.WriteLine($"String in buffer is: {Encoding.UTF8.GetString(buffer)}");

            // Search for "Summary" property name
            while (reader.TokenType != JsonTokenType.PropertyName || !reader.ValueTextEquals("Summary"))
            {
                if (!reader.Read())
                {
                    // Not enough of the JSON is in the buffer to complete a read.
                    GetMoreBytesFromStream(stream, ref buffer, ref reader);
                }
            }

            // Found the "Summary" property name.
            Console.WriteLine($"String in buffer is: {Encoding.UTF8.GetString(buffer)}");
            while (!reader.Read())
            {
                // Not enough of the JSON is in the buffer to complete a read.
                GetMoreBytesFromStream(stream, ref buffer, ref reader);
            }
            // Display value of Summary property, that is, "Hot".
            Console.WriteLine($"Got property value: {reader.GetString()}");
        }

        private static void GetMoreBytesFromStream(
            MemoryStream stream, ref byte[] buffer, ref Utf8JsonReader reader)
        {
            int bytesRead;
            if (reader.BytesConsumed < buffer.Length)
            {
                ReadOnlySpan<byte> leftover = buffer.AsSpan((int)reader.BytesConsumed);

                if (leftover.Length == buffer.Length)
                {
                    Array.Resize(ref buffer, buffer.Length * 2);
                    Console.WriteLine($"Increased buffer size to {buffer.Length}");
                }

                leftover.CopyTo(buffer);
                bytesRead = stream.Read(buffer.AsSpan(leftover.Length));
            }
            else
            {
                bytesRead = stream.Read(buffer);
            }
            Console.WriteLine($"String in buffer is: {Encoding.UTF8.GetString(buffer)}");
            reader = new Utf8JsonReader(buffer, isFinalBlock: bytesRead == 0, reader.CurrentState);
        }
    }
}
' This code example doesn't apply to Visual Basic. For more information, go to the following URL:
' https://learn.microsoft.com/dotnet/standard/serialization/system-text-json-how-to#visual-basic-support

The preceding example sets no limit to how large the buffer can grow. If the token size is too large, the code could fail with an OutOfMemoryException exception. This can happen if the JSON contains a token that is around 1 GB or more in size, because doubling the 1 GB size results in a size that is too large to fit into an int32 buffer.

ref struct limitations

Because the Utf8JsonReader type is a ref struct, it has certain limitations. For example, it can't be stored as a field on a class or struct other than a ref struct.

To achieve high performance, Utf8JsonReader must be a ref struct, because it needs to cache the input ReadOnlySpan<byte> (which itself is a ref struct). In addition, the Utf8JsonReader type is mutable since it holds state. Therefore, pass it by reference rather than by value. Passing the Utf8JsonReader by value would result in a struct copy, and the state changes wouldn't be visible to the caller.

For more information about how to use ref structs, see Avoid allocations.

Read UTF-8 text

To achieve the best possible performance while using Utf8JsonReader, read JSON payloads already encoded as UTF-8 text rather than as UTF-16 strings. For a code example, see Filter data using Utf8JsonReader.

Read with multi-segment ReadOnlySequence

If your JSON input is a ReadOnlySpan<byte>, each JSON element can be accessed from the ValueSpan property on the reader as you go through the read loop. However, if your input is a ReadOnlySequence<byte> (which is the result of reading from a PipeReader), some JSON elements might straddle multiple segments of the ReadOnlySequence<byte> object. These elements would not be accessible from ValueSpan in a contiguous memory block. Instead, whenever you have a multi-segment ReadOnlySequence<byte> as input, poll the HasValueSequence property on the reader to figure out how to access the current JSON element. Here's a recommended pattern:

while (reader.Read())
{
    switch (reader.TokenType)
    {
        // ...
        ReadOnlySpan<byte> jsonElement = reader.HasValueSequence ?
            reader.ValueSequence.ToArray() :
            reader.ValueSpan;
        // ...
    }
}

Read multiple JSON documents

In .NET 9 and later versions, you can read multiple, white space–separated JSON documents from a single buffer or stream. By default, Utf8JsonReader throws an exception if it detects any non-white-space characters that trail the first top-level document. However, you can configure that behavior using the JsonReaderOptions.AllowMultipleValues flag.

JsonReaderOptions options = new() { AllowMultipleValues = true };
Utf8JsonReader reader = new("null {} 1 \r\n [1,2,3]"u8, options);

reader.Read();
Console.WriteLine(reader.TokenType); // Null

reader.Read();
Console.WriteLine(reader.TokenType); // StartObject
reader.Skip();

reader.Read();
Console.WriteLine(reader.TokenType); // Number

reader.Read();
Console.WriteLine(reader.TokenType); // StartArray
reader.Skip();

Console.WriteLine(reader.Read()); // False

When AllowMultipleValues is set to true, you can also read JSON from payloads that contain trailing data that's invalid JSON.

JsonReaderOptions options = new() { AllowMultipleValues = true };
Utf8JsonReader reader = new("[1,2,3]    <NotJson/>"u8, options);

reader.Read();
reader.Skip(); // Succeeds.
reader.Read(); // Throws JsonReaderException.

To stream multiple top-level values, use the DeserializeAsyncEnumerable<TValue>(Stream, Boolean, JsonSerializerOptions, CancellationToken) or DeserializeAsyncEnumerable<TValue>(Stream, JsonTypeInfo<TValue>, Boolean, CancellationToken) overload. By default, DeserializeAsyncEnumerable attempts to stream elements that are contained in a single, top-level JSON array. Pass true for the topLevelValues parameter to stream multiple top-level values.

ReadOnlySpan<byte> utf8Json = """[0] [0,1] [0,1,1] [0,1,1,2] [0,1,1,2,3]"""u8;
using var stream = new MemoryStream(utf8Json.ToArray());

var items = JsonSerializer.DeserializeAsyncEnumerable<int[]>(stream, topLevelValues: true);
await foreach (int[] item in items)
{
    Console.WriteLine(item.Length);
}

/* This snippet produces the following output:
 * 
 * 1
 * 2
 * 3
 * 4
 * 5
 */

Property name lookups

To look up property names, don't use ValueSpan to do byte-by-byte comparisons by calling SequenceEqual. Instead, call ValueTextEquals, because this method unescapes any characters that are escaped in the JSON. Here's an example that shows how to search for a property that's named "name":

private static readonly byte[] s_nameUtf8 = Encoding.UTF8.GetBytes("name");
while (reader.Read())
{
    switch (reader.TokenType)
    {
        case JsonTokenType.StartObject:
            total++;
            break;
        case JsonTokenType.PropertyName:
            if (reader.ValueTextEquals(s_nameUtf8))
            {
                count++;
            }
            break;
    }
}

Read null values into nullable value types

The built-in System.Text.Json APIs return only non-nullable value types. For example, Utf8JsonReader.GetBoolean returns a bool. It throws an exception if it finds Null in the JSON. The following examples show two ways to handle nulls, one by returning a nullable value type and one by returning the default value:

public bool? ReadAsNullableBoolean()
{
    _reader.Read();
    if (_reader.TokenType == JsonTokenType.Null)
    {
        return null;
    }
    if (_reader.TokenType != JsonTokenType.True && _reader.TokenType != JsonTokenType.False)
    {
        throw new JsonException();
    }
    return _reader.GetBoolean();
}
public bool ReadAsBoolean(bool defaultValue)
{
    _reader.Read();
    if (_reader.TokenType == JsonTokenType.Null)
    {
        return defaultValue;
    }
    if (_reader.TokenType != JsonTokenType.True && _reader.TokenType != JsonTokenType.False)
    {
        throw new JsonException();
    }
    return _reader.GetBoolean();
}

Skip children of token

Use the Utf8JsonReader.Skip() method to skip the children of the current JSON token. If the token type is JsonTokenType.PropertyName, the reader moves to the property value. The following code snippet shows an example of using Utf8JsonReader.Skip() to move the reader to the value of a property.

var weatherForecast = new WeatherForecast
{
    Date = DateTime.Parse("2019-08-01"),
    TemperatureCelsius = 25,
    Summary = "Hot"
};

byte[] jsonUtf8Bytes = JsonSerializer.SerializeToUtf8Bytes(weatherForecast);

var reader = new Utf8JsonReader(jsonUtf8Bytes);

int temp;
while (reader.Read())
{
    switch (reader.TokenType)
    {
        case JsonTokenType.PropertyName:
            {
                if (reader.ValueTextEquals("TemperatureCelsius"))
                {
                    reader.Skip();
                    temp = reader.GetInt32();

                    Console.WriteLine($"Temperature is {temp} degrees.");
                }
                continue;
            }
        default:
            continue;
    }
}

Consume decoded JSON strings

Starting in .NET 7, you can use the Utf8JsonReader.CopyString method instead of Utf8JsonReader.GetString() to consume a decoded JSON string. Unlike GetString(), which always allocates a new string, CopyString lets you copy the unescaped string to a buffer that you own. The following code snippet shows an example of consuming a UTF-16 string using CopyString.

var reader = new Utf8JsonReader( /* jsonReadOnlySpan */ );

int valueLength = reader.HasValueSequence
    ? checked((int)reader.ValueSequence.Length)
    : reader.ValueSpan.Length;

char[] buffer = ArrayPool<char>.Shared.Rent(valueLength);
int charsRead = reader.CopyString(buffer);
ReadOnlySpan<char> source = buffer.AsSpan(0, charsRead);

// Handle the unescaped JSON string.
ParseUnescapedString(source);
ArrayPool<char>.Shared.Return(buffer, clearArray: true);

void ParseUnescapedString(ReadOnlySpan<char> source)
{
    // ...
}

See also