When Structured Query Language (SQL) Becomes Spaghetti Query Language

Slalom Consultant David Van De Somple

David Van De Sompele

Avoiding the Overuse and Misuse of Dynamic SQL

Spaghetti code is a derogatory term for poorly written source code that contains a tangled and complex control structure, is prone to errors that are often difficult to find, correct, and modify. Anybody who has worked in an IT organization is likely familiar with this term. Perhaps you’ve even had the misfortune of twisting and winding through someone’s spaghetti in an attempt to document, troubleshoot, or make modifications. If so, then you’ve lived the nightmare and will empathize with me in this article.  Spaghetti code can be written in just about any programming language and SQL is no exception. For this post I will focus on what I consider to be one of most egregious types of SQL spaghetti: Dynamic SQL.

In the context of SQL Server’s Transact-SQL language, Dynamic SQL could be defined as a statement that is composed or constructed “on the fly,” meaning that the SQL engine will get wildly varying SQL statements each time the Dynamic SQL executes. There are advantages and disadvantages to using Dynamic SQL, and I have listed a few of each:

Advantages:

  • Code is highly reusable
  • Use-variable names in statements that require constants
  • Can code statements that would be either impossible or very difficult to code otherwise because of the high number of possibilities involved
  • Return row sets with a variable number of columns and/or variable column names; highly useful for SQL Server’s PIVOT command
  • Allow parameterized filtering with the IN clause
  • Sorting and grouping by almost any column from a table

Disadvantages:

  • Performance loss: the execution plan for a dynamic query cannot be cached
  • Difficult to debug
  • Difficult to modify or enhance
  • Error management is unreliable—there isn’t an easy way to validate the dynamic code or control its effects
  • Temporary tables from the main statement cannot be used, unless they are global
  • If the algorithm of the main statement has many loops, calculations or slow queries, that time will increase the time of executing the dynamic code
  • Scalability issues will surface when a Dynamic SQL code base grows large
  • Security can be compromised (SQL injection attacks are a good example of this)

This isn’t to say that all Dynamic SQL is bad; there are some cases in which it is a perfect and elegant solution. But the misuse and overuse of Dynamic SQL—writing large-stored procedures comprised entirely of it, for example—is problematic because it will quickly become a tangled, under-performing mess. Here’s a real-world example, albeit simplified.

Several years ago my current client inherited an application, part of which had a customer-facing web UI designed so the customer could run sophisticated historical data searches. The UI contained a complex arrangement of drop-down multi-select combo boxes, radio button arrays, and date-range calendars—any combination of which could be selected to refine a desired search criteria. There were also some check-box arrays for different sorting and grouping options. Accounting for any combination of search criteria the customer might select, the developers reasoned, necessitated the use of Dynamic SQL in the stored procedures. But all of this flexibility came at a price. The stored procedures suffered tremendous performance issues, were error prone, and very difficult to troubleshoot and modify. Rewriting the stored procedures and replacing the Dynamic SQL with Static SQL was the long-term solution. The most used type of Dynamic SQL in the reporting stored procedures was the case in which a comma delimited list of values gathered from a multi-select combo box was passed as a variable to the procedure and used to build an IN list as follows; and I’m simplifying here.

Below we have a simplified stored procedure designed to receive a comma delimited list of values which it will use to build and execute a Dynamic SQL string:

CREATE PROCEDURE prcDynamicList

@vchINList nvarchar(500)

AS

SET NOCOUNT ON

DECLARE @vchSQL nvarchar(4000)

BEGIN

SET @vchSQL = ‘SELECT   Column1,

Column2

FROM  Table1 t1

WHERE t1.Column3 IN (‘ + @vchINList + ‘)’

EXECUTE sp_executesql @vchSQL

END

Assume you want to hand this procedure a comma delimited list of values like this:

EXEC prcDynamicList @vchINList = ”’value1”,”value2”,”value3”’

Using the value list passed to the parameter, the stored procedure will build a dynamic SQL statement that looks like this:

SELECT Column1,

Column2

FROM   Table1 t1

WHERE  t1.Column3 IN (‘value1′,’value2′,’value3’)

And execute it.

Now I’m going to illustrate an alternative method to accomplish the same task with the same functionality, but without using Dynamic SQL. To do this we first need a method for handling the delimited IN list. My solution was to create a SQL table-valued function, which I named fnParser, that receives the IN list, parses it, and returns the results in a table. The function requires two input parameters; the value string to parse, and the character used to delimit the string.  The following three lines of Transact-SQL illustrate how to pass the function a comma delimited list of values:

DECLARE @vchStringToParse varchar(500)

SET    @vchStringToParse = ‘Value1,Value2,Value3,Value4,Value5′

SELECT * FROM dbo.fnParser(@vchStringToParse,’,’)

Execution of the function via the SELECT statement will return the following table:

Id ParsedValue
1 Value1
2 Value2
3 Value3
4 Value4
5 Value5

Leveraging the fnParser function, I can rewrite the prcDynamicList stored procedure as follows:

CREATE PROCEDURE prcDynamicList

@vchINList nvarchar(255)

AS

SET NOCOUNT ON

BEGIN

SELECT t1.Column1,

t1.Column2

FROM   Table1 t1

JOIN   dbo.fnParser(@vchINList,’,’) t2 ON t1.Column3 = t2.ParsedValue

END

Calling the rewritten stored procedure looks like this:

DECLARE @vchStringToParse varchar(500)

SET    @vchStringToParse = ‘Value1,Value2,Value3,Value4,Value5’

EXEC prcDynamicList @vchINList = @vchStringToParse

Notice how I can use the table-valued function in the JOIN clause as you would any other table and I have completely eliminated the need for Dynamic SQL. This function is completely portable and you can use it as often as desired/necessary in the same SQL statement, stored procedure, and/or database. You might be wondering why I haven’t provided the specific code for the parser function; there are two reasons for that:

  • I want to challenge you to write your own function to perform this task.
  • My parser function is in production at a client site and is therefore considered proprietary.

I can, however, provide some hints on how a table-valued parser function might work and I would encourage you to read up on the workings of table-valued functions within SQL Server:

  1. It should trim leading and trailing spaces from the delimit parameter and the source string parameter. Unless blank spaces in the string to parse are valid in which case, your code should account for that.
  2. It will need to calculate the exact length of the string to parse.
  3. You will need to use a function like CHARINDEX or PATINDEX to find all instances of the delimit character within the string.
  4. Iterate the string for each delimiter found using a WHILE loop.
  5. Use the SUBSTRING function to extract each value between the delimiter.

Earlier I mentioned that there are some specific cases in which Dynamic SQL is the perfect solution for the problem at hand. SQL Server’s PIVOT command, introduced in SQL 2005, is a good example of that.

The PIVOT command has the ability to return a dynamic number of columns, determined by the data on which you pivot. Doing this requires Dynamic SQL and the result can be spectacular. There are plenty of online code examples, which illustrate how to use the PIVOT command with Dynamic SQL to account for this situation. I strongly encourage you to read up on it and experiment.

Dynamic SQL, while highly useful in specific situations, is often overused or misused to the point that it quickly becomes spaghetti code. The result of this usually manifests in the form of applications suffering from performance problems that are difficult to troubleshoot and rectify. You can avoid using Dynamic SQL in many situations by leveraging other SQL Server features, such as table-valued functions, to create a highly portable, scalable, and performance-oriented alternative.

About David Van De Sompele
Slalom Consultant David Van De Sompele's expertise includes performance tuning, production DBA activities, data modelling, ETL processes and reporting. He's based in Slalom's Seattle office.

One Response to When Structured Query Language (SQL) Becomes Spaghetti Query Language

  1. marek koenig says:

    Good read, you can also use a bit of XML to simplify the process. Most clients that I’ve been at have some kind of fnSplit already developed. Here is the XML snippet that will work in 2005 and later:

    DECLARE @S varchar(max),
    @Split char(1),
    @X xml

    SELECT @S = ‘1,2,3,4,5’,
    @Split = ‘,’

    SELECT @X = CONVERT(xml,” + REPLACE(@S,@Split,”) + ”)

    SELECT [Value] = T.c.value(‘.’,’varchar(20)’)
    FROM @X.nodes(‘/root/s’) T(c)

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: