Logic Apps: Message Validation with XML, JSON and Flat-File Schemas (part 1)

Article Series

This article is part of a series; the other parts can be found here:

Introduction

Schemas are an important part of any integration solution. They describe the structure of the messages exchanged between systems and form a contract that all parties agree to adhere to. Validating received messages against a schema allows invalid messages to be rejected early, so that downstream parts of the solution can assume their input is correct.

Logic Apps supports two types of schema: XML and JSON. In addition to standard XML schemas, Logic Apps also supports Flat-File (BizTalk) schemas, which extend XML Schema Definition (XSD) functionality through the use of xs:annotation elements to describe flat-file structures. Further, BizTalk Server comes with several thousand flat-file schemas for the EDIFACT, HIPAA, and X12 standards out of the box, and many of these schemas have now been released as open source on GitHub (see here), making them available for use in Logic Apps (with an Integration Account). From a JSON validation perspective, there are a couple of limitations: JSON schemas must be hard-coded within the Logic App itself, and they cannot be stored in or retrieved from an Integration Account.

The purpose of this article is to describe the message validation choices available within Logic Apps, highlight any limitations, and suggest workarounds. It also addresses validation issues that have previously been raised on the Developer Network Logic Apps forum, and considers validation alternatives to using the Integration Account for schema storage and its associated actions for message validation. It is assumed the reader has knowledge of the schema definitions being discussed and used, and is familiar with Azure Logic Apps, the Enterprise Integration Pack, and Azure Functions.

 

Overview

Logic Apps provides the Integration Account, part of the Logic Apps Enterprise Integration Pack (EIP), which can store XML schemas (XSDs). Once uploaded to the Integration Account, these schemas can be used in a Logic App XML Validation action to perform XML message validation, or in the X12, EDIFACT, and Flat File actions for decoding (and encoding). There are a few points worth noting when using the EIP schema storage:

  • Schema Number Limits: The maximum number of schemas for each tier is as follows: Free 25, Basic 500, Standard 500;
  • XML Version 1.0: The Logic Apps EIP will store any XML 1.0 compliant XML schema. The schema name can contain only letters, numbers, '-', '(', ')' or '.' and is limited to a maximum of 80 characters. Any attempt to upload an XML version 1.1 file results in the error “The content of schema ‘<schema name>’ of type ‘Xml’ must be a valid XML.” and the file is not uploaded (XML version 1.1 is best avoided, see here);
  • XML File Encoding: There seems to be an issue with the XSD file encoding for external schema references (see here). Schemas that are referenced (by the xsd:import or xsd:include elements) and have their encoding attribute set to UTF-16 (irrespective of the actual file encoding, i.e. ANSI, UTF-8, Unicode) are not loaded when the XML Validation action is run. This causes the schema validation of the importing schema to fail with the error “InvalidSchema. The provided schema content is not valid. If the schema has any references, please upload the referenced schemas to the Integration Account first. The compilation of schema failed with error: 'Type 'xxxxx' is not declared.'.” This is something to be aware of when migrating BizTalk Server schemas (where Envelope/Document schemas exist in separate files). A schema that has the encoding attribute set to UTF-16 but does not reference external schemas validates without error;

The schema validation functions (XML Validation and Flat File Decode/Encode), together with the transformation actions (XML Transform, Liquid Transform), are hosted behind the scenes in a separate Azure Web Apps environment (https://xslt-*****.azurewebsites.net where ***** is your 32-character Integration Account ID, found in the Callback URL property e.g. https://prod-22.northcentralus.logic.azure.com:443/integrationAccounts/*****? ...). The environment is not accessible via Kudu, but browsing to the URL reveals whether it is up and running.

An overview on how to validate XML messages with XML schemas in Azure Logic Apps with the EIP is given here. Some points to note are:

  • XSD Version 1.1: The EIP does allow the uploading of a version 1.1 XSD (note: XML 1.1 and XSD 1.1 are not the same), however, attempting to use the schema with the XML Validation action will fail with an error similar to the following (depending on the XSD 1.1 feature being used) “InvalidSchema. The provided schema content is not valid. If the schema has any references, please upload the referenced schemas to the Integration Account first. The compilation of schema failed with error: 'The 'http://www.w3.org/2001/XMLSchema:assert' element is not supported in this context.'.”;
  • XSD Large Files (greater than 2MB): Schemas larger than 2MB cannot be uploaded to the Integration Account directly; instead they must be uploaded to a blob container (see this article on how to do this). Attempts to use a schema larger than 2MB will generate the error “Gateway issue encountered. Cannot fulfill the request to the remote server.” and the schema will not be added to the Schemas blade. Schemas in a blob container are limited to 8MB in size;
  • XSD File Metadata: Custom metadata can be added to schema artifacts and retrieved at runtime (see here for further details). An example of this, which demonstrates how BizTalk Server schema promoted properties can be migrated to Logic Apps/Service Bus, can be seen here;
  • XSD External Schema References: The Logic Apps EIP allows the use of the xsd:import and xsd:include elements to reference external schemas. The use of the xsd:redefine element is not currently supported and its use will result in the type not being imported and the error  “An error occurred while processing the XML schemas: ''SchemaLocation' must successfully resolve if <redefine> contains any child other than <annotation>.'.”;

An overview of each of the decoding scenarios is available in the How To guides: X12 (here), EDIFACT (here), and Flat-File (here). The flat-file schema editor in BizTalk Server has now been made available for Logic Apps in the form of the Logic Apps Enterprise Integration Tools for Visual Studio 2015 and, in addition, many of the out-of-box standards flat-file schemas have been made available as open source. Some points to note:

  • X12 Schemas: Soon after the EDI schemas were made open source, a question arose relating to the licensing of the X12 schemas; they were subsequently removed from the repository;
  • BizTalk flat-file schemas: If the flat-file schema(s) being used in the Logic App Tools for Visual Studio were created using the BizTalk Server flat-file schema editor, the extension class in the annotation needs to be changed to the following: Microsoft.Azure.Integration.DesignTools.FlatFileExtension.FlatFileExtension (see here), otherwise the following error will occur when trying to validate an instance: “Failed to validate ‘xxxxx’ as an instance of schema file ‘xxxxx.xsd’.”;
  • Content Validation: There is no option on the Flat-File Decode action to validate the flat-file message contents (unlike in BizTalk Server, where the Validate document structure property in the Flat-File Disassembler component validates the document, header and footer structures). This must be done using an additional XML Validation action;
  • Suppress Empty Nodes property: The behaviour of the suppress_empty_nodes attribute in the Flat-File schemas has changed from BizTalk Server to Logic Apps and may produce unintended results (see here).

In contrast to XML and flat-file schemas, there is no EIP functionality to store or retrieve JSON schemas to validate JSON messages (unlike Liquid Transforms, which can be stored). Instead, JSON schemas are embedded within the Logic App workflow, in the HTTP Request trigger and the Parse JSON action, to perform JSON message validation. There are a couple of points about JSON schemas to note:

  • JSON Schema Version Draft 3, 4, & 6: The HTTP Request trigger and Parse JSON action can validate against the draft 3, draft 4, & draft 6 JSON schema specifications; 
  • JSON External Schema References: The $ref keyword in JSON schemas cannot be used to resolve external references. Attempting to reference an external schema will generate the error “InvalidTemplate. Unable to process template language expressions in action 'xxxxx' inputs at line 'xx' and column 'xx': 'Could not resolve schema reference 'https://xxxxx'. Path 'xxxxx'.'.”;
  • JSON Schema pattern keyword: The JSON schema pattern keyword has been blocked due to DoS security concerns (see here & here) and cannot be used in the HTTP Request trigger or Parse JSON action to validate a JSON instance.

The following sections will now examine message validation in Logic Apps in more detail and suggest workarounds to some of the issues highlighted above.

 

Retrieving Schemas from the Integration Account

Logic Apps provides an Integration Account Artifact Lookup action to retrieve Integration Account artifacts (see Get artifact metadata). The action has a drop-down list to select the artifact type (Schema, Map, Partner, Agreement) and a name parameter for the artifact name in the Integration Account, which can be entered at design-time or specified dynamically at runtime.  

The Integration Account Artifact Lookup action is limited in that it cannot be used for searching artifacts. So, for example, BizTalk Server identifies an incoming message by both the target namespace and root node name and uses both these values to find the appropriate XML schema; it is possible for there to be different messages with the same root node name or the same message with a different target namespace (sometimes used for versioning). One solution to this is to write a custom Integration Account Artifact Lookup function. An example of an Azure Function to do this is given below (the RootNodeName and TargetNamespace values are passed in the header):

run.csx

using Microsoft.Azure.Management.Logic;
using Microsoft.Azure.Management.Logic.Models;
using Microsoft.Azure.Services.AppAuthentication;
using Microsoft.Rest;
using System;
using System.Collections.Concurrent;
using System.Net;
 
public static ConcurrentBag<IntegrationAccountSchema> _schemas = null;

public static async Task<HttpResponseMessage> Run(HttpRequestMessage req, TraceWriter log)
{
    string subscriptionId = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx";
    string resourceGroup = "xxxxxxxxxxxxxxx";
    string integrationAccount = "xxxxxxxxxxxxxxxx";
  
    // Get Integration Account Schemas
    if (_schemas == null)
    {
        // Get LogicManagementClient object to access the Integration Account artefacts
        var provider = new AzureServiceTokenProvider();
        var token = await provider.GetAccessTokenAsync("https://management.azure.com/");
        var client = new LogicManagementClient(new TokenCredentials(token)) { SubscriptionId = subscriptionId };
        _schemas = new ConcurrentBag<IntegrationAccountSchema>(client.Schemas.ListByIntegrationAccounts(resourceGroup, integrationAccount));
    }
  
    // Find schema name
    string rootNodeName = req.Headers.GetValues("RootNodeName").First();
    string targetNamespace = req.Headers.GetValues("TargetNamespace").First();
    // Return the name of the first matching schema, or an empty string when none matches.
    var match = _schemas.FirstOrDefault(s => s.DocumentName == rootNodeName && s.TargetNamespace == targetNamespace);
    string name = match?.Name ?? String.Empty;
 
    return req.CreateResponse(HttpStatusCode.OK, name);
} 

 

project.json

{ 
 "frameworks": { 
   "net46":{ 
     "dependencies": { 
       "Microsoft.Azure.Management.Logic": "3.0.0", 
       "Microsoft.Rest.ClientRuntime": "2.3.10",
       "Microsoft.IdentityModel.Clients.ActiveDirectory": "3.14.2",
       "Microsoft.Azure.Services.AppAuthentication": "1.1.0-preview"
     } 
   } 
 } 
}

 

The example above demonstrates how to access the Integration Account artifacts from an Azure Function (for a given Subscription ID, Resource Group, and Integration Account). The LogicManagementClient class allows us to access all the Integration Account artifacts and extend the existing lookup functionality of the Integration Account Artifact Lookup action. For more information see here.
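The lookup function above expects the caller to supply the RootNodeName and TargetNamespace header values. As a minimal sketch of how a caller might derive those two values from an incoming message, the following reads only as far as the root element. The class and method names (MessageIdentity, Read) are hypothetical and used purely for illustration; they are not part of Logic Apps or the Integration Account SDK.

```csharp
using System;
using System.IO;
using System.Xml;

public static class MessageIdentity
{
    // Returns the root element's local name and namespace URI
    // without loading the whole document into memory.
    public static (string RootNodeName, string TargetNamespace) Read(string xml)
    {
        var settings = new XmlReaderSettings { DtdProcessing = DtdProcessing.Prohibit };
        using (var reader = XmlReader.Create(new StringReader(xml), settings))
        {
            // MoveToContent skips the XML declaration, comments and whitespace
            // and stops on the first content node, which should be the root element.
            if (reader.MoveToContent() != XmlNodeType.Element)
            {
                throw new XmlException("The message has no root element.");
            }
            return (reader.LocalName, reader.NamespaceURI);
        }
    }

    public static void Main()
    {
        var id = Read("<?xml version=\"1.0\"?><p:Person xmlns:p=\"http://example.org/Person\"><Forename>Ann</Forename></p:Person>");
        Console.WriteLine(id.RootNodeName + " | " + id.TargetNamespace);
        // prints "Person | http://example.org/Person"
    }
}
```

The two returned values can then be sent as the RootNodeName and TargetNamespace headers of the HTTP call to the lookup function.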

XML Validation

 

The Logic App XML Validation Action

To use the Logic App XML Validation action a Logic App Integration Account is required. Attempting to use it without one will result in the error “The workflow must be associated with an integration account to use the workflow run action 'XML_Validation' of type 'XmlValidation'.” The Logic App XML Validation action has two parameters: Content and Schema. The latter can be selected from a dropdown list or the value can be specified dynamically at runtime.

The schema is retrieved from the Integration Account, together with any referenced external schemas. Any referenced XSD file specified in a schemaLocation attribute must have been uploaded to the Integration Account. Any path included in the schemaLocation is ignored (i.e. Path.GetFileName(schemaLocation) is searched for). The schema(s) and message are then passed to the XML Validation Azure Function to validate. If either the schema array or the content passed to the function is null, an exception is thrown. The schemas are added to an XmlSchemaSet and compiled (XmlSchemaSet.Compile()). The compiled schemas are added to an XmlReaderSettings object which has the ValidationFlags property set to XmlSchemaValidationFlags.ReportValidationWarnings. An XmlReader object validates the XML contents (with the XmlReaderSettings) and a System.Collections.Generic.List<XmlError> list is populated with any warnings and errors found. If warnings or errors exist, they are added to the HttpResponseMessage.Content property of the response and returned to the calling Logic App.
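The steps described above can be approximated locally with the standard System.Xml.Schema classes. The following is only a sketch of that sequence (schema set, compile, reader settings, read loop), not the actual implementation of the XML Validation action; the class name XsdValidationSketch, the DemoXsd schema and the string-based findings list are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class XsdValidationSketch
{
    // Illustrative schema: a single root element holding an integer.
    public const string DemoXsd =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema' targetNamespace='urn:demo' " +
        "xmlns='urn:demo' elementFormDefault='qualified'>" +
        "<xs:element name='Qty' type='xs:int'/></xs:schema>";

    public static List<string> Validate(string xml, params string[] xsds)
    {
        if (xml == null) throw new ArgumentNullException(nameof(xml));
        if (xsds == null) throw new ArgumentNullException(nameof(xsds));

        // No resolver: every referenced schema has to be supplied by the caller.
        var schemaSet = new XmlSchemaSet { XmlResolver = null };
        foreach (var xsd in xsds)
        {
            using (var schemaReader = XmlReader.Create(new StringReader(xsd)))
            {
                // A null namespace means "use the targetNamespace declared in the schema".
                schemaSet.Add(null, schemaReader);
            }
        }
        schemaSet.Compile();

        var findings = new List<string>();
        var settings = new XmlReaderSettings
        {
            ValidationType = ValidationType.Schema,
            Schemas = schemaSet,
            ValidationFlags = XmlSchemaValidationFlags.ReportValidationWarnings
        };
        // Collect warnings and errors instead of throwing on the first one.
        settings.ValidationEventHandler += (sender, e) =>
            findings.Add($"{e.Severity} ({e.Exception.LineNumber},{e.Exception.LinePosition}): {e.Message}");

        using (var reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read()) { }
        }
        return findings;
    }

    public static void Main()
    {
        foreach (var finding in Validate("<Qty xmlns='urn:demo'>five</Qty>", DemoXsd))
        {
            Console.WriteLine(finding);
        }
    }
}
```

An empty list indicates a valid message. Because the resolver is null, this sketch shares the limitation discussed next: schemas that rely on the resolver to load external references will not compile.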

The XmlSchemaSet has the XmlResolver property set to null (possibly due to security considerations, see here), which, in turn, causes the xsd:redefine element to fail. The xsd:redefine schemas are passed to the Azure Function, and they do get added to the XmlSchemaSet, but it seems there is a preprocessor issue which causes the Compile() method to fail with the error described in the Overview. The solution is either to rework the xsd:redefine to use an xsd:include element (see Six strategies for extending XML schemas in a single namespace) or to implement a custom XML Validation function (see the next section).

Large schemas (greater than 2MB) are still passed to the Azure Function in full, i.e. the blob URL is not passed for the Azure Function to retrieve, and no caching is implemented. In other words, each time the XML Validation action is called the schema(s) are retrieved from the Integration Account and passed to the XML Validation Azure Function. There are a few options that may help reduce the overall size of a schema and possibly avoid the use of Azure blob storage:

  • If the schema file has been saved with Unicode encoding, consider resaving it with ANSI or UTF-8 encoding;
  • Move common types into their own XSD and use the xsd:import and xsd:include elements to reference the external schema;
  • If the schema is autogenerated it may be possible to simplify the schema, reduce size of type names, remove annotations etc.;
  • If the schemas are domain standard schemas (such as the EDI schemas included with BizTalk Server) consider removing unused parts of the schema (see here);

Each schema has custom metadata defined which can be retrieved using the Integration Account Artifact Lookup action described in the previous section. It should be noted that the documentName property is the name of the first root node in the XSD file. This could cause an issue if a schema name is specified at runtime based on the root node name of the XML message. One solution is to move each root node into its own file and use xsd:import or xsd:include elements for any dependencies. There is also a restriction on the number of root elements a schema may contain: trying to upload a schema with more than 2000 will fail with the error "The provided schema has 'xxxx' root elements which exceeds the maximum limit of '2000'." The solution in this instance is to divide the root elements across two or more schema files and include them all in any referencing schemas.
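Before uploading a schema it can be useful to check which root elements it declares, and in what order. The following is a minimal sketch that lists the global element declarations of a single XSD in document order using XmlSchema.Items; the class name RootElementLister and the DemoXsd schema are hypothetical. Based on the behaviour described above, the first name returned would be expected to correspond to the documentName property, and the count can be compared against the 2000 limit.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class RootElementLister
{
    // Illustrative schema with one named type and two root elements.
    public const string DemoXsd =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema' targetNamespace='urn:demo'>" +
        "<xs:complexType name='T'><xs:sequence/></xs:complexType>" +
        "<xs:element name='Order' type='xs:string'/>" +
        "<xs:element name='Invoice' type='xs:string'/></xs:schema>";

    // Returns the names of the global (root) element declarations in document order.
    public static List<string> GetRootElementNames(string xsd)
    {
        XmlSchema schema;
        using (var reader = XmlReader.Create(new StringReader(xsd)))
        {
            // A null handler means schema parsing errors are thrown as XmlSchemaException.
            schema = XmlSchema.Read(reader, null);
        }

        var names = new List<string>();
        // Items holds the top-level declarations (elements, types, groups, ...);
        // xsd:import, xsd:include and xsd:redefine are kept separately in Includes.
        foreach (XmlSchemaObject item in schema.Items)
        {
            if (item is XmlSchemaElement element)
            {
                names.Add(element.Name);
            }
        }
        return names;
    }

    public static void Main()
    {
        var names = GetRootElementNames(DemoXsd);
        Console.WriteLine(names.Count + " root element(s): " + string.Join(", ", names));
        // prints "2 root element(s): Order, Invoice"
    }
}
```

Only the elements declared in the file itself are listed; root elements brought in through xsd:include are not followed in this sketch.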

Executing the XML Validation action can produce several results in addition to a valid response. These include an empty validation request, a timeout, and an error response. The article Logic App XML Validation with all the error handling describes these scenarios in detail and how to handle each of them in a Logic App.

XML Validation without an Integration Account

It is possible to write a custom XML Validation function and call it from a Logic App. The System.Xml.Schema namespace can be used within an Azure Function to perform any required validation. This option may be appropriate when the overhead of an Integration Account is not needed or where some functionality is lacking in the current Integration Account XML Validation action (e.g. xsd:redefine). The real design question is how (and from where) the schemas are retrieved: either the calling Logic App retrieves them and passes them to the Azure Function (as the XML Validation action does), or the Azure Function is passed a schema name and retrieves the schemas itself from a file store.

The example below retrieves the schema(s) from a file share for a given schemaName value passed in the header and validates the XML message passed in the body. The file share is created within the same storage account as the Azure Function, which is why the example uses the AzureWebJobsStorage connection string (otherwise a new connection string to some other file storage would need to be added to the function’s configuration). The function checks for external references and, if the schema referenced is a simple filename, retrieves it from the file share (expecting it to exist in the same folder as the parent schema). This example also correctly processes nested external references and any xsd:redefine elements:

run.csx

#r "Microsoft.WindowsAzure.Storage"
#r "Newtonsoft.Json"
using Microsoft.Azure;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.File;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;
using Newtonsoft.Json.Serialization;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Text;
using System.Xml;
using System.Xml.Schema;
using System.Xml.Resolvers;
 
public static async Task<HttpResponseMessage> Run(HttpRequestMessage req, TraceWriter log)
{
    log.Info("C# HTTP trigger function processed a request.");

    // Read the XML message from the request body and the schema file name from the header.
    string input = await req.Content.ReadAsStringAsync();
    string schemaName = req.Headers.GetValues("schemaName").First();
 
    // Get Azure file share and schemas folder reference.
    CloudStorageAccount storageAccount = CloudStorageAccount.Parse(CloudConfigurationManager.GetSetting("AzureWebJobsStorage"));
    CloudFileClient fileClient = storageAccount.CreateCloudFileClient();
    CloudFileShare fileShare = fileClient.GetShareReference("integration");
    CloudFileDirectory rootDir = fileShare.GetRootDirectoryReference();
    CloudFileDirectory schemasDir = rootDir.GetDirectoryReference("Schemas");
 
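    // Load schema specified in the header schemaName field (the stream is disposed once the schema has been read).
    CloudFile schemaFile = schemasDir.GetFileReference(schemaName);
    XmlSchema xs;
    using (Stream schemaStream = schemaFile.OpenRead())
    {
        xs = XmlSchema.Read(schemaStream, null);
    }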
 
    // Resolve all the external XSD references first, then add the schema and compile.
    XmlPreloadedResolver xpr = new XmlPreloadedResolver();
    XmlSchemaSet xss = new XmlSchemaSet
    {
        XmlResolver = xpr
    };
    RetrieveExternals(xss, xpr, xs.Includes, schemasDir);
    xss.Add(xs);
    xss.Compile();
     
    // Create XML reader settings and validation handler.
    List<ValidationError> validationErrors = new List<ValidationError>();
    XmlReaderSettings xrs = new XmlReaderSettings
    {
        ValidationType = ValidationType.Schema,
        ValidationFlags = XmlSchemaValidationFlags.ReportValidationWarnings
    };
    xrs.ValidationEventHandler += delegate (object sender, ValidationEventArgs eventArgs)
    {
        validationErrors.Add(new ValidationError(eventArgs));
    };
 
    // Validate the XML messages passed in the request body.
    xrs.Schemas.Add(xss);
    using (StringReader stringReader = new StringReader(input))
    {
        using (XmlReader xr = XmlReader.Create(stringReader, xrs))
        {
            while (xr.Read()) {}
        }
    }
 
    // Create validation response.
    string response = String.Empty;
    if (validationErrors.Count > 0)
    {
        response = JsonConvert.SerializeObject(validationErrors);
    }
  
    return req.CreateResponse(HttpStatusCode.OK, response);
}
 
// Recursive method to retrieve all referenced external schemas.
static void RetrieveExternals(XmlSchemaSet schemaSet, XmlPreloadedResolver preloadedResolver, XmlSchemaObjectCollection includes, CloudFileDirectory schemasDir)
{
    foreach (XmlSchemaExternal xse in includes)
    {
        // Check if the schemaLocation attribute is just a simple filename (URI's to external XSDs can be loaded by the resolver).
        if (Path.GetFileName(xse.SchemaLocation) == xse.SchemaLocation)
        {
            using (StreamReader sr = new StreamReader(schemasDir.GetFileReference(xse.SchemaLocation).OpenRead()))
            {
                // Retrieve schema
                var include = sr.ReadToEnd();
                XmlSchema xs = XmlSchema.Read(new StringReader(include), null);
 
                // Check for any nested external references before adding the schema to the schema set.
                if (xs.Includes.Count > 0) { RetrieveExternals(schemaSet, preloadedResolver, xs.Includes, schemasDir); }
                schemaSet.Add(xs);
                 
                // For xsd:redefine the schema also needs adding to the resolver.
                if (xse is XmlSchemaRedefine)
                {
                    // Need to create a valid URI before adding the schema to the resolver.
                    xse.SchemaLocation = @"file:///" + xse.SchemaLocation;
                    preloadedResolver.Add(new Uri(xse.SchemaLocation), include);
                }
            }
        }
    }
}
 
// Validation details class.
public class ValidationError
{
    public ValidationError(ValidationEventArgs vea)
    {
        Line = vea.Exception.LineNumber;
        Position = vea.Exception.LinePosition;
        Message = vea.Exception.Message;
        Severity = vea.Severity;
    }

    [JsonProperty]
    public int Line { get; set; }

    [JsonProperty]
    public int Position { get; set; }

    [JsonProperty]
    public string Message { get; set; }

    [JsonProperty]
    public XmlSeverityType Severity { get; set; }
}

 

project.json

{
  "frameworks": {
    "net46":{
      "dependencies": {
        "Microsoft.WindowsAzure.ConfigurationManager": "3.2.3"
      }
    }
   }
}

 

XML Validation with Integration Account Maps & Assemblies

It is possible to perform XML validation within an XSLT transformation. The xsl:import-schema declaration, introduced in XSLT 2.0, makes it possible to validate the input, the output, and temporary trees. Unfortunately, this requires a schema-aware XSLT processor, and the XSLT 2.0/3.0 processor used in Logic Apps (the open-source Saxon HE 9.8.0.8) is not schema-aware. However, with XSLT 1.0 we can make use of the msxsl:script element and C# script (not available in XSLT 2.0/3.0) to perform this type of validation.

Obviously, the XSD files can be uploaded to the Schemas blade of the Logic App Integration Account. However, because msxsl:script can be used to reference assemblies, the XSD files could instead be added to a .NET assembly as embedded resources and uploaded to the Assemblies blade. The assembly can then be loaded using the msxsl:assembly element, the XSD retrieved at runtime, and the source message validated.

When an XSLT 1.0 map is uploaded to the Integration Account it gets compiled (the XSL into one assembly, the inline C# code into another), and all the compiled map components are cached; there are separate caches for the XSLT 1.0 maps, XSLT 1.0 scripts, uploaded assemblies, and the XSLT 2.0/3.0 maps (although the latter get compiled when the map is first executed). Moreover, the Logic App XML Validation action resides within the same Azure Web App, so it is possible to reference and use the same XML Validation functionality from the XSLT map.

An example of how to do this is given below. The XSDs to use in the validation are passed in an xsl:param parameter and retrieved from a .NET assembly (Examples.Schemas):

XSLT 1.0

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:msxsl="urn:schemas-microsoft-com:xslt"  exclude-result-prefixes="msxsl s0 ScriptNS0" version="1.0" xmlns:s0="http://AzureSchemasDemo.examples.org/Person" xmlns:ns0="http://AzureLogicAppsMapsDemo/Contact" xmlns:ScriptNS0="http://schemas.microsoft.com/BizTalk/2003/ScriptNS0">
  <xsl:output omit-xml-declaration="yes" method="xml" indent="yes" version="1.0" />
  <xsl:param name ="SchemaList" select ="'Work.xsd,Person.xsd'" />
  <xsl:variable name ="validate" select ="ScriptNS0:Validate(., $SchemaList)" />
  <xsl:template match="/">
    <xsl:choose>
      <xsl:when test="$validate=''">
        <xsl:apply-templates select="/s0:Person" />      
      </xsl:when>
      <xsl:otherwise>
        <ValidationError><xsl:value-of select="$validate" /></ValidationError>
      </xsl:otherwise>
    </xsl:choose>    
  </xsl:template>
  <xsl:template match="/s0:Person">
    <ns0:Contact>
      <Title><xsl:value-of select="Title/text()" /></Title>
      <Forename><xsl:value-of select="Forename/text()" /></Forename> 
      <Surname><xsl:value-of select="Surname/text()" /></Surname> 
    </ns0:Contact>
  </xsl:template>
  <msxsl:script language="C#" implements-prefix="ScriptNS0">
    <msxsl:assembly name ="System.Core, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089"/>
    <msxsl:assembly name ="Microsoft.Azure.Function.Common, Version=1.0.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35"/>
    <msxsl:assembly name ="Microsoft.Azure.Function.XmlValidation, Version=1.0.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35"/>
    <msxsl:assembly name ="Newtonsoft.Json, Version=10.0.0.0, Culture=neutral, PublicKeyToken=30ad4fe6b2a6aeed"/>
    <msxsl:assembly name ="Examples.Schemas, Version=1.0.0.0, Culture=neutral, PublicKeyToken=3e534f4f2fa19166"/>
    <msxsl:using namespace ="System.Reflection"/>
    <msxsl:using namespace ="System.IO"/>
    <msxsl:using namespace ="System.Collections.Generic"/>
    <msxsl:using namespace ="Microsoft.Azure.Function.Common"/>
    <msxsl:using namespace ="Microsoft.Azure.Function.Common.Xml"/>
    <msxsl:using namespace ="Microsoft.Azure.Function.XmlValidation"/>
    <msxsl:using namespace ="Newtonsoft.Json" />
    <msxsl:using namespace ="Newtonsoft.Json.Linq" />
    <![CDATA[
      public string Validate(XPathNodeIterator nodes, string schemaList)
      {      
        byte[] plainTextBytes = null;
        string[] schemas = schemaList.Split(',');
         
        if (nodes.MoveNext())
        {
            plainTextBytes = System.Text.Encoding.UTF8.GetBytes(nodes.Current.InnerXml);
        }
       
        var assembly = Assembly.Load("Examples.Schemas, Version=1.0.0.0, Culture=neutral, PublicKeyToken=3e534f4f2fa19166");
        JArray jaXSDs = new JArray();         
        foreach (string schema in schemas) 
        { 
            using (Stream stream = assembly.GetManifestResourceStream(assembly.GetName().Name + "." + schema))
            {
                byte[] bytes = new byte[stream.Length];
                stream.Read(bytes, 0, bytes.Length);
                ContentEnvelope ce = new ContentEnvelope
                {
                    Content = Convert.ToBase64String(bytes),
                    ContentType = "application/xml"
                };
                jaXSDs.Add(JToken.FromObject(ce));
            }
        }
         
        ContentEnvelope ceXML = new ContentEnvelope
        {
            Content = Convert.ToBase64String(plainTextBytes),
            ContentType = "application/xml"
        };        
        ContentAndSchemasInput csi = new ContentAndSchemasInput
        {
            Content = JToken.FromObject(ceXML),
            Schemas = jaXSDs.ToObject<JToken[]>()
        };
 
        Microsoft.Azure.Function.Common.Xml.Validation.ContentAndSchemasFunctionInput(csi);
        List<XmlError> errList = XmlValidator.Instance.Execute(csi);
 
        if (errList.Count > 0)
        {
            return errList[0].Message;
             
            // errList[0].LineNumber
            // errList[0].LinePosition
            // errList[0].Message
            // errList[0].Severity (System.Xml.Schema.XmlSeverityType)
        }
        return String.Empty;       
      }
    ]]>
  </msxsl:script>    
</xsl:stylesheet>

 

C# Class Library

As mentioned previously, the schemas are added to a .NET assembly (a C# class library with the default Class1 class removed) with their Build Action set to Embedded Resource, and the assembly is uploaded to the Integration Account Assemblies blade.
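The XSLT script above builds each resource name as the assembly name plus the file name. Embedded resource names are actually derived from the project's default namespace (plus any folder path), so this only holds when the default namespace matches the assembly name and the XSD files sit at the project root. The following is a minimal sketch, using a hypothetical EmbeddedSchemas helper class, of reading an embedded schema defensively: it returns null when the resource is missing rather than failing on a null stream.

```csharp
using System;
using System.IO;
using System.Reflection;

public static class EmbeddedSchemas
{
    // Builds the resource name the same way the XSLT script does:
    // "<assembly name>.<file name>". This is an assumption about the project layout.
    public static string ResourceName(Assembly assembly, string fileName)
    {
        return assembly.GetName().Name + "." + fileName;
    }

    // Returns the resource bytes, or null when no such resource is embedded.
    public static byte[] ReadAll(Assembly assembly, string fileName)
    {
        using (Stream stream = assembly.GetManifestResourceStream(ResourceName(assembly, fileName)))
        {
            if (stream == null)
            {
                return null;
            }
            using (var buffer = new MemoryStream())
            {
                // CopyTo keeps reading until the end of the stream is reached.
                stream.CopyTo(buffer);
                return buffer.ToArray();
            }
        }
    }

    public static void Main()
    {
        Assembly assembly = typeof(EmbeddedSchemas).Assembly;

        // Listing the embedded names is a quick way to diagnose a naming mismatch.
        foreach (string name in assembly.GetManifestResourceNames())
        {
            Console.WriteLine(name);
        }

        byte[] xsd = ReadAll(assembly, "Person.xsd");
        Console.WriteLine(xsd == null ? "Person.xsd is not embedded" : xsd.Length + " bytes read");
    }
}
```

Calling Assembly.GetManifestResourceNames() against the uploaded schemas assembly shows the exact names to pass in the SchemaList parameter.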

XML Validation with XSD 1.1 Schemas

There are a number of third-party XML validators that can be used with .NET, and it is possible to make use of these within an Azure Function rather than the .NET System.Xml.Schema namespace. There may be several reasons for choosing a different XML validator, including the requirement to validate with XSD 1.1 schemas. As mentioned previously, XSD 1.1 schemas can be uploaded to the Integration Account (i.e. they load as a valid XmlDocument), but they can’t be used (i.e. they cannot be read into an XmlSchema).
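The distinction between "loads as an XmlDocument" and "cannot be read into an XmlSchema" can be reproduced locally. The following sketch, with a hypothetical Xsd11Probe class and an illustrative schema that uses the XSD 1.1 xs:assert element, checks both conditions using only System.Xml; it is an approximation of the behaviour described above, not the Integration Account's own upload check.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class Xsd11Probe
{
    // Illustrative XSD 1.1 schema: xs:assert is not part of XSD 1.0.
    public const string Xsd11 =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>" +
        "<xs:element name='Range'><xs:complexType>" +
        "<xs:attribute name='min' type='xs:int'/>" +
        "<xs:attribute name='max' type='xs:int'/>" +
        "<xs:assert test='@min le @max'/>" +
        "</xs:complexType></xs:element></xs:schema>";

    // True when the text is well-formed XML, regardless of what it describes.
    public static bool IsWellFormedXml(string text)
    {
        try
        {
            new XmlDocument().LoadXml(text);
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }

    // True when System.Xml.Schema (an XSD 1.0 processor) can load and compile the schema.
    public static bool CanCompileAsXsd10(string text, out string error)
    {
        error = null;
        try
        {
            var schemaSet = new XmlSchemaSet { XmlResolver = null };
            using (var reader = XmlReader.Create(new StringReader(text)))
            {
                schemaSet.Add(null, reader);
            }
            schemaSet.Compile();
            return true;
        }
        catch (XmlSchemaException ex)
        {
            error = ex.Message;
            return false;
        }
    }

    public static void Main()
    {
        Console.WriteLine("Well-formed XML: " + IsWellFormedXml(Xsd11));
        string error;
        Console.WriteLine("Compiles as XSD 1.0: " + CanCompileAsXsd10(Xsd11, out error));
        if (error != null) Console.WriteLine(error);
    }
}
```

A check like this could be run before upload to flag schemas that will need an XSD 1.1 capable validator.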

The Azure Function example below demonstrates how to use the Saxon EE XML Validator to validate an XML message against XSD 1.1 schemas that have been uploaded to the Logic App Integration Account (note: a valid Saxon license is required). Like the previous Azure Function example, it also correctly processes nested external references and any xsd:redefine elements:

run.csx

#r "D:\home\site\wwwroot\SaxonXMLValidation\bin\saxon9ee.dll"
#r "D:\home\site\wwwroot\SaxonXMLValidation\bin\saxon9ee-api.dll"
using Microsoft.Azure;
using Microsoft.Azure.Management.Logic;
using Microsoft.Azure.Management.Logic.Models;
using Microsoft.Azure.Services.AppAuthentication;
using Microsoft.Rest;
using System;
using System.Collections.Generic;
using System.Collections.Concurrent;
using System.IO;
using System.Linq;
using System.Net;
using Newtonsoft.Json;
using Saxon.Api;
 
private static ConcurrentBag<IntegrationAccountSchema> schemas = null;
private static Processor processor = new Processor(true);
private static XPathCompiler xPathCompiler = processor.NewXPathCompiler();
private static XPathSelector xPathSelector = null;
 
public static  async Task<HttpResponseMessage> Run(HttpRequestMessage req, TraceWriter log)
{
    log.Info("C# HTTP trigger function processed a request.");
 
    string subscriptionId = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx";
    string resourceGroup = "xxxxxxxxxxxxxxxx";
    string integrationAccount = "xxxxxxxxxxxxxxxxxx";
 
    // Get Integration Account Schemas.
    if (schemas == null)
    {
        // Get LogicManagementClient object to access the Integration Account artefacts.
        var provider = new AzureServiceTokenProvider();
        var token = await provider.GetAccessTokenAsync("https://management.azure.com/");
        var client = new LogicManagementClient(new TokenCredentials(token)) { SubscriptionId = subscriptionId };
        schemas = new ConcurrentBag<IntegrationAccountSchema>(client.Schemas.ListByIntegrationAccounts(resourceGroup, integrationAccount));
    }
 
    // Set licence location.
    processor.SetProperty<string>(Saxon.Api.Feature<string>.LICENSE_FILE_LOCATION, @"D:\home\site\wwwroot\SaxonXMLValidation\bin\saxon-license.lic");
 
    // The request body is the XML instance to validate.
    string input = await req.Content.ReadAsStringAsync();
    string schemaName = req.Headers.GetValues("schemaName").First();
 
    // Set processors.
    SchemaManager sm = processor.SchemaManager;
    PreloadedSchemaResolver psr = new PreloadedSchemaResolver();
 
    // Configure XPathSelector for xs:import/xs:include/xs:redefine schemaLocation attributes.
    string schemaLocationsXPath = @"/*[local-name()='schema' and namespace-uri()='http://www.w3.org/2001/XMLSchema']/*[(local-name()='import' or local-name()='include' or local-name()='redefine') and namespace-uri()='http://www.w3.org/2001/XMLSchema']/@schemaLocation";
    xPathSelector = xPathCompiler.Compile(schemaLocationsXPath).Load();
 
    // Get source schema.
    var ias = from s in schemas where s.Name == Path.GetFileNameWithoutExtension(schemaName) select s;
    DocumentBuilder db = processor.NewDocumentBuilder();
    XdmNode xn = db.Build(new Uri(ias.First<IntegrationAccountSchema>().ContentLink.Uri));
    xPathSelector.ContextItem = xn;
 
    // Resolve any external schema references.
    List<string> includes = xPathSelector.Evaluate().Select(x => x.GetStringValue()).ToList<string>();
    RetrieveExternals(psr, includes, log);
 
    // Compile schemas.
    sm.SchemaResolver = psr;
    sm.Compile(xn);
 
    // Validate XML instance.
    SchemaValidator validator = sm.NewSchemaValidator();
    XdmDestination xdmDest = new XdmDestination();
    List<ValidationFailure> failures = new List<ValidationFailure>();
    validator.IsLax = false;
    validator.SetSource(new MemoryStream(System.Text.Encoding.UTF8.GetBytes(input)), new Uri("file:///"));
    validator.ErrorList = failures;
 
    // Create validation response.
    string response = string.Empty;
    try
    {
        validator.Run();
    }
    catch
    {
        response = JsonConvert.SerializeObject(validator.ErrorList.Select(x => x.GetMessage()).ToList());
    }
 
    return req.CreateResponse(HttpStatusCode.OK, response);
}
 
// Recursive method to retrieve all referenced external schemas.
static void RetrieveExternals(PreloadedSchemaResolver psr, List<string> includes, TraceWriter log)
{
    foreach (string include in includes)
    {
        var ias = from s in schemas where s.Name == Path.GetFileNameWithoutExtension(include) select s;
 
        // Retrieve schema
        DocumentBuilder db = processor.NewDocumentBuilder();
        XdmNode xn = db.Build(new Uri(ias.First<IntegrationAccountSchema>().ContentLink.Uri));
        xPathSelector.ContextItem = xn;
 
        // Search for any external references.
        List<string> nestedIncludes = xPathSelector.Evaluate().Select(x => x.GetStringValue()).ToList<string>();
        if (nestedIncludes.Count > 0) { RetrieveExternals(psr, nestedIncludes, log); }
 
        // Add schema to the preload schema resolver.
        Uri schemaLocationURI = new Uri(@"file:///" + include);
        psr.Add(schemaLocationURI, xn);
    }
}
 
// PreloadedSchemaResolver class.
public class PreloadedSchemaResolver : SchemaResolver
{
    private Dictionary<Uri, XdmNode> xsdStore = new Dictionary<Uri, XdmNode>();
    public void Add(Uri uri, XdmNode xdmNode)
    {
        xsdStore.Add(uri, xdmNode);
    }
    public object GetEntity(Uri absoluteUri)
    {
        return xsdStore[absoluteUri].OuterXml;
    }
    public Uri[] GetSchemaDocuments(string targetNamespace, Uri baseUri, string[] locationHints)
    {
        if (locationHints != null)
        {
            string location = (Path.GetFileName(locationHints[0]) == locationHints[0]) ? @"file:///" + locationHints[0] : locationHints[0];
            return xsdStore.Where(x => x.Key == new Uri(location)).Select(x => x.Key).ToArray();
        }
        return null;
    }
}

 

project.json

{
  "frameworks": {
    "net46": {
      "dependencies": {
        "Microsoft.WindowsAzure.ConfigurationManager": "3.2.3",
        "Microsoft.Azure.Management.Logic": "3.0.0",
        "Microsoft.Rest.ClientRuntime": "2.3.10",
        "Microsoft.IdentityModel.Clients.ActiveDirectory": "3.14.2",
        "Microsoft.Azure.Services.AppAuthentication": "1.1.0-preview"
      }
    }
  }
}

 

bin folder SAXON assemblies

IKVM.OpenJDK.Charsets.dll

IKVM.OpenJDK.Core.dll

IKVM.OpenJDK.Security.dll

IKVM.OpenJDK.Text.dll

IKVM.OpenJDK.Util.dll

IKVM.OpenJDK.XML.API.dll

IKVM.Runtime.dll

saxon9ee.dll

saxon9ee-api.dll

saxon9ee-api.xml

saxon-license.lic
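Calling the Saxon validation function

The function above reads the XML instance from the request body and the schema name from a schemaName request header. A caller could therefore look something like the sketch below. This is an illustration only: the class name, method names, function URL and function key handling are assumptions, not part of the original solution. Per the function code, the response is always HTTP 200, with the serialised list of Saxon error messages in the body when validation fails and an empty string otherwise.

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

public static class SaxonValidationClient
{
    // Builds the POST request the function expects: XML in the body, schema name in a header.
    public static HttpRequestMessage BuildRequest(Uri functionUri, string schemaName, string xml)
    {
        var request = new HttpRequestMessage(HttpMethod.Post, functionUri)
        {
            Content = new StringContent(xml, Encoding.UTF8, "application/xml")
        };
        request.Headers.Add("schemaName", schemaName);
        return request;
    }

    // Sends the request and returns the raw response body (validation errors, if any).
    // functionUri is a placeholder; a real URL would normally carry the function key.
    public static async Task<string> ValidateAsync(HttpClient client, Uri functionUri, string schemaName, string xml)
    {
        using (var request = BuildRequest(functionUri, schemaName, xml))
        using (var response = await client.SendAsync(request))
        {
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }
    }
}
```

Within a Logic App the same call would typically be made with the Azure Function or HTTP action, setting the schemaName header and passing the message content as the body.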

Summary

This first article has provided an overview of the different message validation options available within Logic Apps and focused in detail on how to perform XML validation with and without the use of the EIP Integration Account. It described an alternative method of searching for and retrieving Integration Account artefacts, explained in detail how the XML Validation action works behind the scenes, and provided a workaround for using the xsd:redefine element. In the second article we’ll review the JSON validation options currently available in Logic Apps and look at how they can be supplemented.

References

Logic Apps XML retrieval & validation:
Validate XML with schemas in Azure Logic Apps with Enterprise Integration Pack
Manage artifact metadata in integration accounts with Azure Logic Apps and Enterprise Integration Pack

BizTalk Server schema:
BizTalk server EDI Schemas
BizTalk Server: How to Simplify Complex XML Schemas

C# XML schema classes:
XmlSchemaSet Class
XmlResolver Class
XmlSchemaObjectCollection Class

Script APIs:
Script Blocks Using msxsl:script
Saxon.API - the Saxon Application Programming Interface for .NET
Azure Functions C# script (.csx) developer reference
Microsoft.Azure.Storage.File Namespace