PHP Excel exponentially slow on large exports

Topics: Developer Forum, User Forum
Jan 14, 2013 at 7:46 PM

I'm currently working on a large inventory management system in PHP. We sometimes need to export very large Excel reports that can run from 10,000 to 50,000 rows and up.

The problem with PHPExcel, based on my own benchmarks, is that memory usage grows exponentially as rows are added, and the time it takes to generate the export grows exponentially as well. In other words, the additional time needed to generate 5,000 more rows on top of an existing 5,000 rows is much longer than the time it took to generate the first 5,000 rows.

I have looked into caching mechanisms with PHPExcel and they do not solve anything. I've also looked into XML generation outside of PHPExcel, but it is not compatible with OpenOffice: Excel can read specific XML files, but OpenOffice can't.

I could just create a tab delimited file outside of PHPExcel, and that could work - but I'd much prefer the ability to create a very large export through PHPExcel directly into Excel format.

Simply increasing the execution time is not a solution, as the user shouldn't have to wait 30 seconds to a minute for a generated Excel file. Nor is increasing the memory limit, as there's no need to allocate 1 to 2 GB for Excel generation.

The only solutions I've found in the discussions and through Google were to increase the execution time, increase the memory limit, or use the caching method:

PHPExcel_Settings::setCacheStorageMethod()

The first two solutions do not fix the problem, and the cache storage method has not made a dent in the problem.

Is there a way to just write out the file, save it, then just append the file without creating such large objects? Or is there a way to directly convert from tab delimited files, or xml without using so much memory?

Many people have encountered the same problem, but it doesn't seem like there is a viable solution yet.

Thank you for any advice you can give me.

Coordinator
Jan 14, 2013 at 8:41 PM

Most users, when working with large exports, run them as background tasks, informing the user when they are completed, rather than downloading as part of the user request. The more rows of data you are working with, the more memory will be used and the longer it will take to execute, but the memory increase is close to linear. At least, all my tests indicate that this is the case.

Memory usage should be reduced by the use of cell caching. What methods are you using, and with what optional arguments? There is a post in this thread that gives some details of the effects of cell caching. Used properly, it can reduce memory usage quite significantly, though there is always a cost in execution time.


When you say there is no viable solution... please recognise that we have provided solutions such as cell caching and range styling, but cannot work miracles. PHPExcel is designed as an "in memory" representation of a spreadsheet, and PHP cannot access data outside of its own memory limits as quickly as it can access data within them. Cell caching uses the available options for storing cell data outside of PHP memory, but there is an overhead associated with doing so, whether it's to disk, database or memcache.


There is no way of simply editing files. PHPExcel works the way it does, building a spreadsheet representation "in memory" and then allowing it to be written to a formatted file. There are plenty of documented methods for reducing memory usage and keeping execution times to a minimum, and work on this is still ongoing. You're welcome to look at alternatives... I maintain a list on StackOverflow.

Jan 14, 2013 at 9:20 PM

This is the caching method I was using:


    $cacheMethod = PHPExcel_CachedObjectStorageFactory::cache_to_phpTemp;
    $cacheSettings = array('memoryCacheSize' => '32MB');
    PHPExcel_Settings::setCacheStorageMethod($cacheMethod, $cacheSettings);

But, it doesn't seem to make much of a dent in the performance.

We could let the file generate in the background and have it available to the user later, but they need the reports as soon as the data is available, so I don't think our users will want to wait 10 minutes for an Excel file.

I also noticed that you could save raw HTML output as a table and open it in Excel, and then resave it as an Excel file and the formatting would be exactly the same as the HTML. Would there be a way for PHPExcel to optionally load an HTML table and convert it to Excel then save it?

Also, what about saving the file every N number of rows and then just appending the data to the excel file? Is this possible? If the data is only going to be written out to a cell just once, why load the entire spreadsheet into memory? Or if we don't need fancy formatting for each cell, why load formatting objects for each cell? Couldn't there be a more streamlined way of handling this?

Spreadsheet Excel Writer could write out very large Excel files where PHPExcel could not - but Spreadsheet Excel Writer is old code and does not handle the new Excel formats or save to them. So that's not a viable option either.

Our clients also don't wish to deal with CSV files, so we need to find some way of generating large Excel spreadsheets.

My point is this: to create an Excel spreadsheet with, say, 30,000 rows or more, where the majority of the cells have zero formatting, why should it take so much memory and so much time to generate a file?

What about some method or some set of functions to just import a large table of data into an Excel file without setting up so many objects or using up so much memory?

Jan 14, 2013 at 9:40 PM

Update: You're right, it doesn't seem AS exponential now that I've done a few benchmarks. The caching seems to help so that it doesn't run out of memory. Some of the benchmark results I've found follow:


[14-Jan-2013 17:31:59] Excel Generation took 16.9410340786 sec
[14-Jan-2013 17:31:59] This was 0.00242014772551 sec per row for 7000 rows
[14-Jan-2013 17:32:31] -----------
[14-Jan-2013 17:32:31] Excel Generation took 16.856844902 sec
[14-Jan-2013 17:32:31] This was 0.00187298276689 sec per row for 9000 rows
[14-Jan-2013 17:33:19] -----------
[14-Jan-2013 17:33:19] Excel Generation took 30.5762820244 sec
[14-Jan-2013 17:33:19] This was 0.00203841880163 sec per row for 15000 rows
[14-Jan-2013 17:38:06] -----------
[14-Jan-2013 17:38:06] Excel Generation took 86.8547978401 sec
[14-Jan-2013 17:38:06] This was 0.002895159928 sec per row for 30000 rows

If you'll notice, the time it took per row does start increasing for anything over 10,000 rows.

For benchmarks, I am producing a single worksheet document with 6 random numbers in 6 cells on each row.
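For what it's worth, a minimal sketch of the kind of benchmark script being described - the row count, logging calls and output filename here are my own choices, not the real code:

```php
<?php
require_once 'PHPExcel.php';

$rowCount = 30000;
$objPHPExcel = new PHPExcel();
$sheet = $objPHPExcel->getActiveSheet();

$start = microtime(true);

// Six random numbers per row, in columns A-F (0-indexed 0-5)
for ($row = 1; $row <= $rowCount; $row++) {
    for ($col = 0; $col < 6; $col++) {
        $sheet->setCellValueByColumnAndRow($col, $row, mt_rand(1, 100000));
    }
}

$objWriter = PHPExcel_IOFactory::createWriter($objPHPExcel, 'Excel2007');
$objWriter->save('benchmark.xlsx');

$elapsed = microtime(true) - $start;
error_log(sprintf('Excel Generation took %s sec', $elapsed));
error_log(sprintf('This was %s sec per row for %d rows', $elapsed / $rowCount, $rowCount));
```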

Jan 14, 2013 at 9:52 PM

This is also interesting - there is a huge jump in time per row from 30,000 rows to 60,000:

[14-Jan-2013 17:38:06] Excel Generation took 86.8547978401 sec
[14-Jan-2013 17:38:06] This was 0.002895159928 sec per row for 30000 rows
[14-Jan-2013 17:50:16] -----------
[14-Jan-2013 17:50:16] Excel Generation took 364.755429029 sec
[14-Jan-2013 17:50:16] This was 0.00607925715049 sec per row for 60000 rows

Coordinator
Jan 14, 2013 at 11:17 PM
Edited Jan 14, 2013 at 11:21 PM
wfarmer wrote:

This is the caching method I was using:

    $cacheMethod = PHPExcel_CachedObjectStorageFactory::cache_to_phpTemp;
    $cacheSettings = array('memoryCacheSize' => '32MB');
    PHPExcel_Settings::setCacheStorageMethod($cacheMethod, $cacheSettings);

 But, it doesn't seem to make much of a dent in the performance.

Ensure you're setting the method before instantiating your PHPExcel object. And try with the other methods. With those settings, PHPExcel should use only 32MB of memory for cells, with all cells above that point being saved to a temporary disk file, plus an index entry for each cell to allow PHPExcel to locate where in the temporary file the cell data is stored.
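To illustrate the ordering point, and the other storage backends that can be swapped in for comparison, a sketch (the particular backends listed for benchmarking are my suggestion):

```php
<?php
require_once 'PHPExcel.php';

// 1. Configure cell caching FIRST...
$cacheMethod   = PHPExcel_CachedObjectStorageFactory::cache_to_phpTemp;
$cacheSettings = array('memoryCacheSize' => '32MB');
PHPExcel_Settings::setCacheStorageMethod($cacheMethod, $cacheSettings);

// 2. ...and only then instantiate the workbook, so its cells use the cache.
$objPHPExcel = new PHPExcel();

// Other storage methods worth benchmarking against each other:
//   PHPExcel_CachedObjectStorageFactory::cache_to_discISAM
//   PHPExcel_CachedObjectStorageFactory::cache_in_memory_gzip
//   PHPExcel_CachedObjectStorageFactory::cache_to_sqlite3
```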

If you're using the Excel5 Writer, then there will be a difference in performance for larger workbooks as the file structures change once a certain bytesize threshold is reached. It's difficult to measure this in terms of rows or cells because it depends on the size of data contained in each.

wfarmer wrote:

I also noticed that you could save raw HTML output as a table and open it in Excel, and then resave it as an Excel file and the formatting would be exactly the same as the HTML. Would there be a way for PHPExcel to optionally load an HTML table and convert it to Excel then save it?

PHPExcel can do this: there is an HTML Reader that can be used to load an HTML table - though it's still very basic and doesn't support much in the way of formatting yet - and then it can be written out as an Excel file.
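A minimal sketch of that round trip, assuming the table markup sits in a local file (the filenames here are illustrative):

```php
<?php
require_once 'PHPExcel.php';

// Load a file containing an HTML <table> into a workbook...
$objReader = new PHPExcel_Reader_HTML();
$objPHPExcel = $objReader->load('report.html');

// ...then write it back out as a real Excel 2007 (xlsx) file.
$objWriter = PHPExcel_IOFactory::createWriter($objPHPExcel, 'Excel2007');
$objWriter->save('report.xlsx');
```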

wfarmer wrote:

Also, what about saving the file every N number of rows and then just appending the data to the excel file? Is this possible? If the data is only going to be written out to a cell just once, why load the entire spreadsheet into memory?

Because the more complex spreadsheet file formats simply aren't structured in a way that allows this. They simply don't support editing "in situ". You can't simply create a blank file, and then "edit in" a cell. Otherwise it would be easy, and everyone would be writing Excel editors. These comprise a whole series of directories and files (in a zip archive) with information about each cell and its contents being scattered across several of those files, while row information is stored across different files, and worksheet information across others.

If you're only reading using PHPExcel, you can tell it to read only certain worksheets, or certain cells.

wfarmer wrote:

Or if we don't need fancy formatting for each cell, why load formatting objects for each cell? Couldn't there be a more streamlined way of handling this?

If you're not using formatting when creating a workbook, then there is minimal overhead for styles as it simply uses a set of default values. Cell format objects only exist if they're created, either by the Readers (to reflect cell formatting read from the loaded file) or if you set cell properties. If you're reading a workbook from a file using PHPExcel, you can explicitly tell it not to read formatting information to reduce memory. 
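As a sketch, those read-side options can be combined like this (the sheet name and filename are illustrative):

```php
<?php
require_once 'PHPExcel.php';

$objReader = PHPExcel_IOFactory::createReader('Excel2007');

// Skip all styling/formatting information - read cell values only.
$objReader->setReadDataOnly(true);

// Only load the named worksheet(s), ignoring the rest of the workbook.
$objReader->setLoadSheetsOnly(array('Inventory'));

$objPHPExcel = $objReader->load('report.xlsx');
```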

wfarmer wrote:

Spreadsheet Excel Writer could write out very large excel files where PHPExcel could not - only Spreadsheet Excel Writer is old code and does not handle new Excel files or save them. So that's not a viable option either.

Of course, SEW can only do a fraction of what PHPExcel can do (only write files, and only in one format, and doesn't support a lot of the spreadsheet features that PHPExcel does), and has a number of bugs (many of which have been rectified in PHPExcel). In fact, PHPExcel used the core of SEW as the basis of its Excel5 Writer.

I'm sure that the PEAR community would be more than happy if somebody would revive SEW for them; but I'm not going to take it on - I have neither the time nor the inclination to fix the faults with SEW, and expand its functionality to handle all the features that PHPExcel can handle in one library.... it's been over 10 years since PHPExcel evolved from SEW; and I have no intention of taking it all back to square one again - especially as SEW is also still subject to the PHP memory limitations, so exactly the same problems would arise again.

wfarmer wrote:

My point is that to create an excel spreadsheet with say 30,000 rows or more with a majority of the cells having zero formatting, why should it take so much memory and so much time to generate a file?

What about some method or some set of functions to just import a large table of data into an Excel file without setting up so many objects or using up so much memory?

One of the benefits of Open Source is that you can actually look at the PHPExcel code. You can look at the methods that PHPExcel uses, make your own changes to that code, keep the changes to yourself, submit those changes back to the library (it's a public git repository), or publish them as your own version (LGPL is wonderful). The format details for xls, xlsx and ods files are also publicly available, making it a lot easier now to see exactly how the spreadsheet details are structured in the spreadsheet files.

I spend about 3-4 hours every day, 7 days a week and perhaps 300 days a year working on improving PHPExcel, and am very aware of the memory and speed problems (I'm constantly criticised for precisely that, several tweets a day, and was called the "god of shit" just last week for precisely that reason). Every change I make gets tested extensively to ensure it doesn't adversely affect performance, and I spend a lot of that time trying new methods to reduce memory and execution time - go back to PHPExcel 1.7.3 or 1.7.4 to see how much things have improved since then. Currently, I'm rewriting the XML-based readers (Excel2007, OOCalc and Gnumeric) to use the pull-based XMLReader rather than SimpleXML, and completely rewriting the calculation engine to boost its speed.


Jan 14, 2013 at 11:31 PM

Thanks. It seems the memory isn't much of a problem now, but the time is the greater issue. Our users don't want to wait 5-8 minutes for an Excel export. I'll have to make sure our version of PHPExcel is at head, then go over your recommendations to see if I can make it faster.

You're doing a fine job with the script, and if there are some improvements I can make to make it faster I'll let you know.

Coordinator
Jan 15, 2013 at 12:33 AM

There's a wealth of advice here on the forum for improving speed, from setting styles across ranges rather than for each individual cell, and using the applyFromArray() method to set several style features in one go, to setting preCalculateFormulas to FALSE in the Writer. The current github develop branch isn't yet stable enough for a 1.7.9 release, but there have been a number of speed improvements already implemented there, and I should be doing a new release sometime toward the end of February with all those changes.
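Both of those tips look like this in practice - a sketch, with the range and filename being my own examples:

```php
<?php
require_once 'PHPExcel.php';

$objPHPExcel = new PHPExcel();
$sheet = $objPHPExcel->getActiveSheet();

// Apply a style to a whole range in one call, not cell-by-cell.
$sheet->getStyle('A1:F1')->applyFromArray(array(
    'font' => array('bold' => true),
));

// Skip pre-calculating formula results when saving - faster writes.
$objWriter = PHPExcel_IOFactory::createWriter($objPHPExcel, 'Excel2007');
$objWriter->setPreCalculateFormulas(false);
$objWriter->save('report.xlsx');
```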