Rob Kennedy

Father, Leader, Author, Developer, Entrepreneur

How to Query SQL Server XML data type Columns in Entity Framework Core 3.x

I’m sure if you’ve found this page, you’re at wits end in trying to figure out how to use the long-built-in XML query functionality of SQL Server while using Entity Framework, Dapper, or other ORM. Some might find it easy to simply write their query in plain SQL and execute it. However in my case, I was building my LINQ expressions dynamically for a ad-hoc query builder interface.

We are taking in a query model in JSON passed from the user interface where the query is constructed by the user, and converting each expression in our JSON model into a related LINQ expression.

The problem arises when we want to query the contents of our XML column using an XPath command. Contains([ColumnName], 'text') or [ColumnName].query('path') is the typical SQL Server XML query syntax that unfortunately prevents us from using any other SQL Server technology as a work around to the EF LINQ shortcomings.

The only option you have in a scenario like mine, is to treat the XML column as a string, and allow the EF LINQ to SQL engine generate the SQL. Then intercept the SQL string before it’s executed against the database server, determine if the column is in use in the query, and if so, parse the SQL string and replace the syntax (typically [XmlColumn] LIKE '%value%') to something like Contains([XmlColumn], '"value"')

The other problem with EF Core 3.x is they now default to using CHARINDEX SQL function instead of LIKE for in-string LINQ Contains and comparisons. This blows up the use of SQL Server free-text and our parse and replace logic. There is however the EF.Functions.Like() replacement function in DbFunctionExtensions. Not happy with this but alas the EF Core team is looking to go with maximum speed by default.

Thankfully in EF Core 3.0, we now have interceptors which allow us to hook into the process of intercepting the SQL command built by the LINQ to SQL engine, modifying that SQL, and passing it on for execution to SQL Server.

So using the Like function above in our LINQ, we can get our needed SQL LIKE command back. With the interceptor, we simply look for a string "[XmlColumn] LIKE" (in the sample below, our XML content column is always named [XmlContent]) in the SQL Command query string, and if found, execute a parse and replace with our SQL Server XML query syntax. Here’s what that interceptor class might look like for you.

Let me know if this code was helpful for you! Comment and follow!

using System;
using System.Data.Common;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore.Diagnostics;

/// <summary>
/// This class is used to encapsulate and register the XmlContentInterceptor code with Entity Framework Core.
/// </summary>
public class XmlContentInterceptor : DbCommandInterceptor
{
	/// <summary>
	/// This method is used to intercept database commands and alter them if the query contains a search for content within an XML content column.
	/// </summary>
	/// <param name="command">Contains the database command.</param>
	/// <param name="eventData">Contains the event data.</param>
	/// <param name="result">Contains the data reader result.</param>
	/// <param name="cancellationToken">Contains a cancellation token.</param>
	/// <returns>Returns the interception result.</returns>
	[System.Diagnostics.CodeAnalysis.SuppressMessage("Security", "CA2100:Review SQL queries for security vulnerabilities", Justification = "Reviewed")]
	public override Task<InterceptionResult<DbDataReader>> ReaderExecutingAsync(DbCommand command, CommandEventData eventData, InterceptionResult<DbDataReader> result, CancellationToken cancellationToken = default)
	{
		if (command != null)
		{
			string sqlQuery = command.CommandText;

			// if our query contains XmlContent LIKE...
			if (ContainsXmlSearch(sqlQuery))
			{
				// we want to convert this to our T-SQL XQuery syntax .[XmlContent].exist() = 1
				command.CommandText = ParseReplaceXmlCompares(sqlQuery);
			}
		}

		return base.ReaderExecutingAsync(command, eventData, result, cancellationToken);
	}

	/// <summary>
	/// This method is used to intercept database commands and alter them if the query contains a search for content within an XML content column.
	/// </summary>
	/// <param name="command">Contains the database command.</param>
	/// <param name="eventData">Contains the event data.</param>
	/// <param name="result">Contains the data reader result.</param>
	/// <returns>Returns the interception result.</returns>
	[System.Diagnostics.CodeAnalysis.SuppressMessage("Security", "CA2100:Review SQL queries for security vulnerabilities", Justification = "Reviewed")]
	public override InterceptionResult<DbDataReader> ReaderExecuting(DbCommand command, CommandEventData eventData, InterceptionResult<DbDataReader> result)
	{
		if (command != null)
		{
			string sqlQuery = command.CommandText;

			// if our query contains XmlContent LIKE...
			if (ContainsXmlSearch(sqlQuery))
			{
				// we want to convert this to our T-SQL XQuery syntax .[XmlContent].exist() = 1
				command.CommandText = ParseReplaceXmlCompares(sqlQuery);
			}
		}

		return base.ReaderExecuting(command, eventData, result);
	}

	/// <summary>
	/// This method is used to determine if the specified command text contains a pattern that matches a need to refactor SQL query
	/// to support XML XPath query syntax.
	/// </summary>
	/// <param name="commandText">Contains the command to evaluate.</param>
	/// <returns>Returns a value indicating whether the refactoring is needed.</returns>
	private static bool ContainsXmlSearch(string commandText)
	{
		return commandText.Contains(".[XmlContent] LIKE", StringComparison.InvariantCulture);
	}

	/// <summary>
	/// This method is used to parse a SQL query string to find and replace query clauses
	/// that are doing string compares with [XmlContent] and replace them with T-SQL XML .exist()
	/// syntax.
	/// </summary>
	/// <param name="commandText">Contains the query string to parse and modify.</param>
	/// <returns>Returns the modified query string.</returns>
	private static string ParseReplaceXmlCompares(string commandText)
	{
		string outputQuery = commandText;
		const string SyntaxToFind = ".[XmlContent] LIKE ";
		int index = 0;

		// loop while there are still syntax to replace...
		while (index > -1 && index < outputQuery.Length && (index = outputQuery.IndexOf(SyntaxToFind, index, StringComparison.InvariantCulture)) > -1)
		{
			// find preceding bracket
			// for example: [Extent3].[XmlContent] LIKE...
			// in this example, need to find the "[" before Extent3
			int bracketIndex = outputQuery.Substring(0, index).LastIndexOf('[');

			if (bracketIndex > -1)
			{
				// get everything up to the point we found our [{extentName}].[XmlContent] LIKE text
				string queryPrefix = outputQuery.Substring(0, bracketIndex);

				// determine the name of the extent, such as [Extent3]
				string extentName = outputQuery.Substring(bracketIndex, index - bracketIndex);

				// move forward until we hit % and get the value between the two % % in the LIKE operation
				// yes this assumes it will be LIKE %value%, we will not support other types right now.
				// e.g. ([Extent3].[XmlContent] LIKE N'%blood%')
				index = outputQuery.IndexOf('%', index);

				// if we found the next % and we have an extent name...
				if (index > -1 && extentName.Length > 0)
				{
					// move past %
					++index;

					// get the query value
					string queryValue = outputQuery.Substring(index, outputQuery.IndexOf("%'", index, StringComparison.InvariantCulture) - index);

					// move on to the next %, then move past the trailing '
					index = outputQuery.IndexOf("%'", index, StringComparison.InvariantCulture) + 1;

					// check if there's one or more escaped characters used within our queryValue by looking for ESCAPE syntax...
					if (outputQuery.Substring(index + 2).TrimStart().StartsWith("ESCAPE", StringComparison.InvariantCulture))
					{
						// ignore ESCAPE characters for now.
						// ESCAPE was found so move past that syntax...
						index += 8;

						// we must parse the escape character here, then update the queryValue
						var escapeCharIndex = outputQuery.IndexOf('\'', index);

						// if we found the escape character index...
						if (escapeCharIndex > -1)
						{
							// retrieve the escape character defined...
							var escapeChar = outputQuery.Substring(escapeCharIndex + 1, 1);

							// modify our query value by removing the escape character
							queryValue = queryValue.Replace(escapeChar, string.Empty, StringComparison.Ordinal);

							// move forward
							index = escapeCharIndex + 1;
						}
					}

					// find the end of the query clause we've modified...
					index = outputQuery.IndexOf(")", index, StringComparison.InvariantCulture);

					const string XPathPrefix = "!XPath!";
					string replacementCommand;

					// determine if the query is advanced, using XPath, or basic content search using CONTAINS by looking for prefix
					if (queryValue.StartsWith(XPathPrefix, StringComparison.OrdinalIgnoreCase))
					{
						queryValue = queryValue.Remove(0, XPathPrefix.Length);
						replacementCommand = $"{extentName}.[XmlContent].exist('{queryValue.Replace("'", "'", StringComparison.InvariantCulture)}') = 1";
					}
					else
					{
						// build the new query command to replace the LIKE statement...
						// surround query value with double quotes to support phrase matching, remove any double quotes entered by user
						replacementCommand = $"Contains({extentName}.[XmlContent], '\"{queryValue.Replace("\"", string.Empty, StringComparison.InvariantCulture)}\"')";
					}

					// append the new command and the rest of the query syntax we are parsing.
					outputQuery = queryPrefix + replacementCommand + outputQuery.Substring(index);
				}
				else
				{
					// expected character not found, break out of while loop without further changes
					break;
				}
			}
			else
			{
				// expected character not found, break out of while loop without further changes
				break;
			}
		}

		// add in namespaces 
		string xmlNamespaces = XmlExtensions.GetNamespaces().Select(x => $"'{x.Value}' as {x.Key}").Aggregate((s1, s2) => s1 + ", " + s2);

		return "WITH XMLNAMESPACES (" + xmlNamespaces + ")\n" + outputQuery;
	}
}

Also, there’s a handy function used in one of our extension classes to populate XML namespaces in the query. Here it is below:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Schema;

/// <summary>
/// This class contains all XML related extension methods.
/// </summary>
public static class XmlExtensions
{
	/// <summary>
	/// This method is used to build a list of namespaces.
	/// </summary>
	/// <returns>Returns a dictionary of common namespaces.</returns>
	public static Dictionary<string, string> GetNamespaces()
	{
		return new Dictionary<string, string>
		{
			{ "docbook", "http://docbook.org/ns/docbook" },
			{ "xi", "http://www.w3.org/2001/XInclude" },
			{ "xlink", "http://www.w3.org/1999/xlink" },
			{ "xml", "http://www.w3.org/XML/1998/namespace" }
		};
	}
}

3 thoughts on “How to Query SQL Server XML data type Columns in Entity Framework Core 3.x

  • Chris Haskins
    March 2, 2021 at 8:08 PM

    Hi Rob,
    Thanks for this great post. This is an issue which has been bugging me for some time and I like this solution. I used it my MVC5 project and saw a huge performance hit over repeated page loads. I know it’s a different environment from the Core product, but I’m curious if you’ve encountered any performance issues with this?

    Thanks!
    Chris

    • Rob
      March 2, 2021 at 9:12 PM

      We haven’t seen any significant degradation that would make our query experience unbearable. The processing of our queries is relatively quick. relatively short String searches using Contains and then the parsing and replacing all happens pretty fast. The rest is speed on the back-end in SQL server. Overall, queries are about 1sec-ish. We did originally port this from a .NET Framework code, in that case we implemented the IDbCommandInterceptor interface but all the other parsing logic was pretty much the same inside of that implementation’s ReaderExecuting()… What sort of speed impact are you seeing?

  • Pragmatic
    May 11, 2023 at 10:22 PM

    This looks interesting but how is it used?

    It might be super obvious do you have some examples of the queries this supports ?

Leave a Reply