cfExcelProxy - Optimizing A Query Dump
I've been trying to finalize the rewritten version of cfExcelProxy, and one of the final obstacles I'm trying to get across was reducing the time required when populating an Excel page. In most situations, processing time is rather short. However, in one of my examples I need to dump a query with 4000 rows and 25+ columns... that's over 100k different cells!
In my first draft, the setCellsFromQuery method simply looped over the query and columnlist and turned the value over to the setCell method. Using this logic, the setCellsFromQuery method took a whopping 120 seconds to run! I don't know about you, but that's a little too much. Looking through setCell's code I realized that there were a couple of private method calls to make sure the workbook was ready to receive a setCell command. In our context, this validation would only need to happen once, before we even loop over the query.
WIth this new logic in mind, I removed the reference to the setCell method from setCellsFromQuery and decided to set the cells' values directly. Running the method again I managed to reduce processing time by 75%!
Total processing time setting the cells' values directly from setCellsFromQuery: 30 seconds.
This is already much better, however I still wasn't satisfied. I placed a couple of timers around my methods, trying to identify which portions were slowing down the processing. After a couple of refreshes and outputs, I managed to find the culprits. When looping over my query and columns, I made calls to the private __getRow and __getCell methods. These methods simply return a reference to the HSSFRow and HSSFCell classes. The reason why they have their own methods is that you need to check if the row or cell exists before returning it and I didn't want to have to check for the same logic every time I needed a reference to a row or cell. However, these calls were taking much more time than I would've liked, so I copied the logic out back into my setCellsFromQuery method. After this new modification, processing time was even further reduced!
Total processing time without referencing __getRow and __getCell: 8-9 seconds.
This last improvement left me baffled... I was using the exact same logic in setCellsFromQuery that I was within my private methods. Why was it suddenly taking less time? I can understand a little overhead, but shouldn't I be able to abstract logic into separate methods without sacrificing processing time? I'm really at a loss as to how I could improve this. Although I'm satisfied with the new and improved processing time (under 10 seconds for 100k+ cells on my laptop is fine for me) I was kind of hoping I could encapsulate my method calls better. If there are any ideas out there I would be really happy to hear from you.
As always, the project can be checked out from the repository. Just follow the instructions on the project's page. In the meantime, here are two snapshots of the setCellsFromQuery method.
Calling __getRow and __getCell.
<cfargument name="query" type="query" required="true" />
<cfargument name="row" type="numeric" required="false" />
<cfargument name="column" type="numeric" required="false" />
<cfargument name="cell" type="string" required="false" />
<cfargument name="columnList" type="string" required="false" />
<cfargument name="headerList" type="string" required="false" />
<cfargument name="showHeaderRow" type="boolean" required="true" default="true" />
<cfargument name="headerClass" type="string" required="false" />
<cfargument name="bodyClass" type="string" required="false" />
<cfargument name="altRowClass" type="string" required="false" />
<cfset var locals={} />
<cfset variables.__preCheck() />
<cfset locals.cellcoords=variables.__parseCell( arguments ) />
<cfset variables.__validateSheetSelected() />
<cfif structKeyExists(arguments, "columnList")>
<cfset locals.columnList=argumetns.columnList />
<cfelse>
<cfset locals.columnList=arguments.query.columnList />
</cfif>
<cfif structKeyExists(arguments, "headerList")>
<cfset locals.headerList=arguments.headerList />
<cfelse>
<cfset locals.headerList=locals.columnList />
</cfif>
<cfif arguments.showHeaderRow>
<cfloop from="1" to="#listLen(locals.headerList)#" index="locals.h">
<cfset locals.row=variables.__getRow( locals.cellcoords.row ) />
<cfset locals.cell=variables.__getCell( locals.row, locals.cellcoords.column + locals.h - 1 ) />
<cfset locals.cell.setCellValue( listGetAt( locals.headerList, locals.h ) ) />
</cfloop>
<cfset locals.cellcoords.row++ />
</cfif>
<cfloop query="arguments.query">
<cfloop from="1" to="#listLen(locals.columnList)#" index="locals.c">
<cfset locals.row=variables.__getRow( locals.cellcoords.row ) />
<cfset locals.cell=variables.__getCell( locals.row, locals.cellcoords.column + locals.c - 1 ) />
<cfset locals.cell.setCellValue( arguments.query[ listGetAt( locals.columnList, locals.c ) ][ currentrow ] ) />
</cfloop>
<cfset locals.cellcoords.row++ />
</cfloop>
<cfreturn this />
</cffunction>
<cffunction name="__getRow" access="private" returntype="any" output="false">
<cfargument name="row" type="numeric" required="true" />
<cfset var locals={} />
<cfset locals.row=variables.instance.activeSheet.getRow( arguments.row ) />
<cfif not structKeyExists( locals, "row" )>
<cfset locals.row=variables.instance.activeSheet.createRow( arguments.row ) />
</cfif>
<cfreturn locals.row />
</cffunction>
<cffunction name="__getCell" access="private" returntype="any" output="false">
<cfargument name="row" type="any" required="true" />
<cfargument name="cell" type="numeric" required="true" />
<cfset var locals={} />
<cfset locals.cell=arguments.row.getCell( arguments.cell ) />
<cfif not structKeyExists( locals, "cell" )>
<cfset locals.cell=arguments.row.createCell( arguments.cell ) />
</cfif>
<cfreturn locals.cell />
</cffunction>
Getting the row and cell references directly.
<cfargument name="query" type="query" required="true" />
<cfargument name="row" type="numeric" required="false" />
<cfargument name="column" type="numeric" required="false" />
<cfargument name="cell" type="string" required="false" />
<cfargument name="columnList" type="string" required="false" />
<cfargument name="headerList" type="string" required="false" />
<cfargument name="showHeaderRow" type="boolean" required="true" default="true" />
<cfargument name="headerClass" type="string" required="false" />
<cfargument name="bodyClass" type="string" required="false" />
<cfargument name="altRowClass" type="string" required="false" />
<cfset var locals={} />
<cfset variables.__preCheck() />
<cfset locals.cellcoords=variables.__parseCell( arguments ) />
<cfset variables.__validateSheetSelected() />
<cfif structKeyExists(arguments, "columnList")>
<cfset locals.columnList=argumetns.columnList />
<cfelse>
<cfset locals.columnList=arguments.query.columnList />
</cfif>
<cfif structKeyExists(arguments, "headerList")>
<cfset locals.headerList=arguments.headerList />
<cfelse>
<cfset locals.headerList=locals.columnList />
</cfif>
<cfif arguments.showHeaderRow>
<cfloop from="1" to="#listLen(locals.headerList)#" index="locals.h">
<cfset locals.row=variables.instance.activeSheet.getRow( locals.cellcoords.row ) />
<cfif not structKeyExists( locals, "row" )>
<cfset locals.row=variables.instance.activeSheet.createRow( locals.cellcoords.row ) />
</cfif>
<cfset locals.cell=locals.row.createCell( locals.cellcoords.column + locals.h - 1 ) />
<cfset locals.cell.setCellValue( listGetAt( locals.headerList, locals.h ) ) />
</cfloop>
<cfset locals.cellcoords.row++ />
</cfif>
<cfloop query="arguments.query">
<cfloop from="1" to="#listLen(locals.columnList)#" index="locals.c">
<cfset locals.row=variables.instance.activeSheet.getRow( locals.cellcoords.row ) />
<cfif not structKeyExists( locals, "row" )>
<cfset locals.row=variables.instance.activeSheet.createRow( locals.cellcoords.row ) />
</cfif>
<cfset locals.cell=locals.row.createCell( locals.cellcoords.column + locals.c - 1 ) />
<cfset locals.cell.setCellValue( arguments.query[ listGetAt( locals.columnList, locals.c ) ][ currentrow ] ) />
</cfloop>
<cfset locals.cellcoords.row++ />
</cfloop>
<cfreturn this />
</cffunction>


To help speed up the process a little I would recommend you use Array's instead of Lists when you are retrieving the list of columns for the query.
Just a thought.
Thank You
-Hem
Thanks for the info! I've just tried changing my lists to arrays, and I've indeed shaved a couple of seconds off of my processing time (both methods).
However, calling the private methods to retrieve row and cell still takes about twice as long as taking the logic to setCellsFromQuery.
http://www.batteryfast.co.uk/laptop-ac-adapter/com... AC Adapter for Compaq 6.5A 18.5V Pavilon ZD7000 NX9500
http://www.batteryfast.co.uk/dell/d620.htm dell d620 battery,