Improving Page Load Performance

Tags: ColdFusion | Object Oriented (OO) | Structures


Posted on October 26, 2009 by Daria |


From Scopes to Structures

As I think about my transition to Object Oriented Programming, I have to reflect on my 'ah-hah' moments when things started to make sense. I thought this next post was going to be about my first experiences with CFCs, but in analyzing what drove me to them I realized that my understanding of structures was an important step in the transition to OO thinking.

I was introduced to structures while consulting for a large corporation in one of their small Competitive Intelligence (CI) divisions. (CI = Think business domain specific library services) At one point in my time with them, we hosted their site on their internal shared ColdFusion host. To host a CF site with them, our site's code had to comply with their best practice guidelines. The one rule that has stuck with me all these years is: Avoid CFQUERY within a loop (CFLOOP or CFOUTPUT with the query attribute)

Note: This is the actual rule from that time period. Notice the capital tag names. I still have old sites with all their tags capitalized. (shutter) That's why I still have HomeSite installed, to use the tag case conversion tool. I'm just saying.

Prior to learning these techniques, I had only used simple variables and had never used structures or arrays. I had also used the session and application scopes extensively. My big 'ah-hah' moment came when I realized that scopes are just persistent structures. The concept of structures is a big stepping point in understanding the caching capabilities in ColdFusion and in CFCs.

Here is the documentation, with some updates using today's techniques and best practices:

Avoid CFQUERY within a loop (CFLOOP or CFOUTPUT with the query attribute)

Each time you go through the loop you create a database connection, send a query, the DBMS parses the query, executes the query, returns any results or error codes and closes the connection. Database calls are the most costly process for web applications so it should be avoided as much as possible.

Solution #1: For insert/update/delete scenarios

For INSERT/UPDATE/DELETE scenarios you should instead make a larger SQL statement that consists of several queries bundled together and sent to the database as one <cfquery>. This eliminates the repetitive connection time.

Before:

<cfloop query="outerQuery">
 <cfquery name="InsertData" datasource="#dbdsn#">
  INSERT INTO TableName (Field1, Field2, Field3)
   VALUES ('#outerQuery.Value1#', #outerQuery.Value2#, '#outerQuery.Value3#')
 </cfquery>
</cfloop>

Solution:

<cfset sqlstmt = "">
<!--- build a larger sql statement by concatenating all the different inserts into a single query --->
<cfloop query="outerQuery">
 <cfset sqlstmt = sqlstmt & "INSERT INTO TableName (Field1, Field2, Field3) VALUES ('#outerQuery.Value1#', '#outerQuery.Value2#', '#outerQuery.Value3#');">
</cfloop>

<!--- execute a SINGLE CFQUERY using the sqlstatement created in the loop --->
<cfquery name="InsertData" datasource="#dbdsn#">
 #PreserveSingleQuotes(SQLSTMT)#
</cfquery>

WARNING: This technique has a security flaw that can expose your web application to cross site scripting attacks. It is now best practice to use the <cfqueryparam> tag for any dynamic query variable (anywhere you use the ## signs). If you use this technique, be sure to validate the data before creating the query string. One way around this security flaw is to loop within the <cfquery> tag.

Better:

<cfquery name="InsertData" datasource="#dbdsn#">
 <cfloop query="outerQuery">
  INSERT INTO TableName (Field1, Field2, Field3)
   VALUES (<cfqueryparam value="#outerQuery.Value1#" cfsqltype="cf_sql_varchar">,
    <cfqueryparam value="#outerQuery.Value2#" cfsqltype="cf_sql_integer">,
    <cfqueryparam value="#outerQuery.Value3#" cfsqltype="cf_sql_varchar">);
 </cfloop>
</cfquery>

For simplicity, further queries in this post will not include the <cfqueryparam> tags, but should be used for the security of your application.

Solution #2: Nested SELECT statements with 0 to 1 results

Bring back the data prior to the loop. Do NOT bring back entire tables when it is unnecessary. Use IN clauses when possible to limit the results brought back. Within the loop it is usually more efficient to use structures rather than looping through a query or array.

Before:

<!--- outer query --->
<cfquery name="getEmployees" datasource="#dbdsn#">
 SELECT FirstName, LastName, EmpID
  FROM Employees
  ORDER BY LastName, FirstName
</cfquery>

<table>
 <cfloop query="getEmployees">
  <!--- nested query A --->
  <cfquery name="getTickets" datasource="#dbdsn#">
   SELECT Count(*) As NumTickets
    FROM Tickets
    WHERE EmpID = #getEmployees.EmpID#
  </cfquery>
  <!--- nested query B --->
  <cfquery name="getRMA" datasource="#dbdsn#">
   SELECT Count(*) As NumRMA
    FROM RMA
    WHERE EmpID = #getEmployees.EmpID#
  </cfquery>
  <cfoutput>
   <tr><td colspan="2">#getEmployees.FirstName# #getEmployees.LastName#</td></tr>
   <tr><td>Tickets</td><td>#getTickets.NumTickets#</td></tr>
   <tr><td>RMA</td><td>#getRMA.NumRMA#</td></tr>
  </cfoutput>
 </cfloop>
</table>

This query generates (outerquery.recordcount * 2) + 1 database hits. For example, if the outer query generated 500 queries, this code would generate 1001 database hits.

Here is an example that generates the same number of results in 3 total queries rather than the (number of records in the first query * 2) + 1:

Solution:

<cfquery name="getEmployees" datasource="#dbdsn#">
 SELECT FirstName, LastName, EmpID
  FROM Employees
  WHERE Management = 1
  ORDER BY LastName, FirstName
</cfquery>

<!--- get all the tickets in one query --->
<cfquery name="getTickets" datasource="#dbdsn#">
  SELECT Count(*) As NumTickets, EmpID
   FROM Tickets
   WHERE EmpID IN (#ValueList(getEmployees.EmpID)#)
   GROUP BY EmpID
</cfquery>

<!--- stick the counts in a structure for easy retrieval --->
<cfscript>
 strTickets = StructNew();
 for(i=1;i LTE GetTickets.RecordCount;i = i + 1)
  StructInsert(strTickets, GetTickets.EmpID[i], GetTickets.NumTickets[i]);
</cfscript>
<!--- end structure building --->

<cfquery name="getRMA" datasource="#dbdsn#">
  SELECT Count(*) As NumRMA, EmpID
   FROM RMA
   WHERE EmpID IN (#ValueList(getEmployees.EmpID)#)
   GROUP BY EmpID
</cfquery>

<!--- stick the counts in a structure for easy retrieval --->
<cfscript>
 strRMA = StructNew();
 for(i=1;i LTE GetRMA.RecordCount;i = i + 1)
  StructInsert(strRMA, GetRMA.EmpID[i], GetRMA.NumRMA[i]);
</cfscript>
<!---end structure building--->

<table>
 <cfloop query="getEmployees">
  <!--- no nested queries --->
  <cfoutput>
   <tr><td colspan="2">#getEmployees.FirstName# #getEmployees.LastName#</td></tr>
   <tr><td>Tickets</td>
    <td>
     <cfif StructKeyExists(strTickets, getEmployees.Empid)>
      #StructFind(strTickets, getEmployees.Empid)#
     <cfelse>
      0
     </cfif>
    </td>
   </tr>
   <tr><td>RMA</td>
    <td>
     <cfif StructKeyExists(strRMA, getEmployees.Empid)>
      #StructFind(strRMA, getEmployees.Empid)#
     <cfelse>
      0
     </cfif>
    </td>
   </tr>
  </cfoutput>
 </cfloop>
</table>

Using the structure technique yields 17x faster execution time than the same results generated via nested queries.

Note the difference in the notation to retrieve the data from the structure. #StructFind(strTickets, getEmployees.Empid)# can alternatively be called by #strTickets[getEmployees.Empid].NumTickets)# which includes the name of the query column. This notation can be used for queries with multiple query columns by referencing the name of the column in the variable output.

The original documentation was written before UDFs were added to the language. This technique can also be achieved using the QueryToStructOfStructures() function from cflib.org.

<cfquery name="getEmployees" datasource="#dbdsn#">
 SELECT FirstName, LastName, EmpID
  FROM Employees
  WHERE Management = 1
  ORDER BY LastName, FirstName
</cfquery>

<!--- get all the tickets in one query --->
<cfquery name="getTickets" datasource="#dbdsn#">
 SELECT Count(*) As NumTickets, EmpID
  FROM Tickets
  WHERE EmpID IN (#ValueList(getEmployees.EmpID)#)
  GROUP BY EmpID
</cfquery>

<!--- stick the counts in a structure for easy retrieval --->
<cfset strTickets = QueryToStructOfStructures(getTickets,"EmpID")>

<cfquery name="getRMA" datasource="#dbdsn#">
 SELECT Count(*) As NumRMA, EmpID
  FROM RMA
  WHERE EmpID IN (#ValueList(getEmployees.EmpID)#)
  GROUP BY EmpID
</cfquery>

<!--- stick the counts in a structure for easy retrieval --->
<cfset strRMA = QueryToStructOfStructures(GetRMA,"EmpID")>

<table>
 <cfloop query="getEmployees">
  <!--- no nested queries --->
  <cfoutput>
   <tr><td colspan="2">#getEmployees.FirstName# #getEmployees.LastName#</td></tr>
   <tr><td>Tickets</td>
    <td>
     <cfif StructKeyExists(strTickets, getEmployees.Empid)>
      #strTickets[getEmployees.Empid].NumTickets)#
     <cfelse>
      0
     </cfif>
    </td>
   </tr>
   <tr><td>RMA</td>
    <td>
     <cfif StructKeyExists(strRMA, getEmployees.Empid)>
      #strRMA[getEmployees.Empid].NumRMA#
     <cfelse>
      0
     </cfif>
    </td>
   </tr>
  </cfoutput>
 </cfloop>
</table>

Solution #3: Nested SELECT statements with 0 to many results

In this example, two techniques are used to achieve the same end result. The first technique is using the group attribute of the CFOUTPUT tag. This technique was not applicable in either of the two previous examples. This works when all data can be returned in a SINGLE query typically via joins. Slightly more data than necessary is returned in this method, however, the time saved is worth the heavier query. This technique is less flexible but more efficient than using structures.

Before:

<cfquery name="getLocations" datasource="#dbdsn#">
 SELECT LocationID, LocationName, City, State
  FROM Location
  ORDER BY LocationID
</cfquery>

<cfloop query="getLocations">
 <cfquery name="getEquipment" datasource="#dbdsn#">
  SELECT EquipmentID, EquipmentDescription
   FROM Equipment
   WHERE LocationID = #getLocations.LocationID#
   ORDER BY EquipmentID
 </cfquery>
 <cfoutput>
  #getLocations.LocationName#<br>
  #getLocations.City#, #getLocations.State#<br>
 </cfoutput>
 <p>Equipment:<br>
 <ul>
  <cfoutput query="getEquipment">
   <li>#EquipmentID# - #EquipmentDescription#</li>
  </cfoutput>
 </ul>
 <hr>
</cfloop>

Group Solution:

<cfquery name="getEquipment" datasource="#dbdsn#">
 SELECT Location.LocationID, Location.LocationName, Location.City, Location.State,
   Equipment.EquipmentDescription, Equipment.EquipmentID
  FROM Equipment, Location
  WHERE Equipment.LocationID = Location.LocationID
  ORDER BY Location.LocationID, Equipment.EquipmentID
</cfquery>

<cfoutput query="getEquipment" group="LocationID">
 #LocationName#<br>
 #City#, #State#
 <p>Equipment:<br>
 <ul>
  <cfoutput>
   <li>#EquipmentID# - #EquipmentDescription#</li>
  </cfoutput>
 </ul>
 <hr>
</cfoutput>

When using nested queries, the number of records in the outer query is essentially the number of iterations through the loop. Since there is a query nested within each iteration plus the outer query, the number of total queries is the number of location records plus one. When using structures, only two queries will always be executed. Using the group attribute of <cfoutput>, there will always be just one query.

The structure technique used in this next example is very different than in the previous examples. In the previous examples the nested queries that were replaced all returned single rows. In this example, the nested query may return zero to many results. This requires a modification to the key-single value pair method used in the previous examples. The different keys within the structure contain lists of the recordset positions of the second query that apply to that key. If you are familiar with common data structures, this is very similar to a linked-list hash table.

Structure Solution:

<cfquery name="getLocations" datasource="#dbdsn#">
 SELECT LocationID, LocationName, City, State
  FROM Location
  ORDER BY LocationID
</cfquery>

<!--- get all the equipment in one query --->
<cfquery name="Equipment" datasource="#dbdsn#">
 SELECT EquipmentID, LocationID, EquipmentDescription
  FROM Equipment
  ORDER BY EquipmentID
</cfquery>

<!--- build the linked-list hash table --->
<!--- stick the query indicies in a structure for easy retrieval --->
<!--- example:
say the locationid is '201'
and records 1,5,17,90 all have a locationid of '201'
the structure would contain a key named '201' with the value of '1,5,17,90' --->
<cfscript>
 strEquipment = StructNew();
 for(i=1;i LTE Equipment.RecordCount;i = i + 1) {
  if (StructKeyExists(strEquipment, Equipment.LocationID[i])) {
   temp = ListAppend(StructFind(strEquipment, Equipment.LocationID[i]), i);
   StructUpdate(strEquipment, Equipment.LocationID[i], temp);
  }
  else
   StructInsert(strEquipment, Equipment.LocationID[i], i);
 }
</cfscript>
<!--- end structure building --->

<cfloop query="getLocations">
 <cfoutput>
  #getLocations.LocationName#<br>
  #getLocations.City#, #getLocations.State#<br>
 </cfoutput>
 <p>Equipment:<br>
 <ul>
  <cfset currentLocationID = getLocations.LocationID>
  <cfif StructKeyExists(strEquipment, currentLocationID)>
   <cfoutput>
    <cfloop list="#StructFind(strEquipment, currentLocationID)#" index="i">
     <li>#Equipment.EquipmentID[i]# - #Equipment.EquipmentDescription[i]#</li>
    </cfloop>
   </cfoutput>
  </cfif>
 </ul>
 <hr>
</cfloop>