Jan 15, 2013 at 12:17 AM
Edited Jan 15, 2013 at 12:21 AM
This is the caching method I was using:
$cacheMethod = PHPExcel_CachedObjectStorageFactory::cache_to_phpTemp;
$cacheSettings = array( 'memoryCacheSize' => '32MB');
But, it doesn't seem to make much of a dent in the performance.
Ensure you're setting the method before instantiating your PHPExcel object. And try with the other methods. With those settings, PHPExcel should use only 32MB of memory for cells, with all cells above that point being saved to a
temporary disk file, plus an index entry for each cell to allow PHPExcel to locate where in the temporary file the cell data is stored.
If you're using the Excel5 Writer, then there will be a difference in performance for larger workbooks as the file structures change once a certain bytesize threshold is reached. It's difficult to measure this in terms of rows or cells because it depends
on the size of data contained in each.
I also noticed that you could save raw HTML output as a table and open it in Excel, and then resave it as an Excel file and the formatting would be exactly the same as the HTML. Would there be a way for PHPExcel to optionally load an HTML table and convert
it to Excel then save it?
PHPExcel can do this: there is an HTML Reader than can be used to load an HTML table - though it's still very basic and doesn't support much in the way of formatting yet - and then it can be written out as an Excel file.
Also, what about saving the file every N number of rows and then just appending the data to the excel file? Is this possible? If the data is only going to be written out to a cell just once, why load the entire spreadsheet into memory?
Because the more complex spreadsheet file formats simply aren't structured in a way that allows this. They simply don't support editing "in situ". You can't simply create a blank file, and then "edit in" a cell. Otherwise it would be
easy, and everyone would be writing Excel editors. These comprise a whole series of directories and files (in a zip archive) with information about each cell and its contents being scattered across several of those files, while row information is stored across
different files, and worksheet information across others.
If you're only reading using PHPExcel, you can tell it to read only certain worksheets, or certain cells.
Or if we don't need fancy formatting for each cell, why load formatting objects for each cell? Couldn't there be a more streamlined way of handling this?
If you're not using formatting when creating a workbook, then there is minimal overhead for styles as it simply uses a set of default values. Cell format objects only exist if they're created, either by the Readers (to reflect cell formatting read from the
loaded file) or if you set cell properties. If you're reading a workbook from a file using PHPExcel, you can explicitly tell it not to read formatting information to reduce memory.
Spreadsheet Excel Writer could write out very large excel files where PHPExcel could not - only Spreadsheet Excel Writer is old code and does not handle new Excel files or save them. So that's not a viable option either.
Of course, SEW can only do a fraction of what PHPExcel can do (only write files, and only in one format, and doesn't support a lot of the spreadsheet features that PHPExcel does), and has a number of bugs (many of which have been rectified in PHPExcel).
In fact, PHPExcel used the core of SEW as the basis of its Excel5 Writer.
I'm sure that the PEAR community would be more than happy if somebody would revive SEW for them; but I'm not going to take it on - I have neither the time nor the inclination to fix the faults with SEW, and expand its functionality to handle all the features
that PHPExcel can handle in one library.... it's been over 10 years since PHPExcel evolved from SEW; and I have no intention of taking it all back to square one again - especially as SEW is also still subject to the PHP memory limitations, so exactly the same
problems would arise again.
My point is that to create an excel spreadsheet with say 30,000 rows or more with a majority of the cells having zero formatting, why should it take so much memory and so much time to generate a file?
What about some method or some set of functions to just import a large table of data into an Excel file without setting up so many objects or using up so much memory?
One of the benefits of Open Source is that you can actually look at the PHPExcel code. You can look at the methods that PHPExcel uses, make your own changes to that code, keep the changes to yourself, submit them changes back to the library (its a public
git repository), or publish as your own version (LGPL is wonderful). The format details for xls, xlsx and ods files are also publicly available; making it a lot easier now to see exactly how the spreadsheet details are structured in the spreadsheet files.
I spend about 3-4 hours every day, 7 days a week and perhaps 300 days a year working on improving PHPExcel, and am very aware of the memory and speed problems (I'm constantly criticised for precisely that, several tweets a day, and was called the "god
of shit" just last week for precisely that reason). Every change I make gets tested extensively to ensure it doesn't adversely affect performance, and I spend a lot of that time trying new methods to reduce memory and execution time - go back to PHPExcel
1.7.3 or 1.7.4 to see how much things have improved since then. Currently, I'm rewriting the XML-based readers (Excel2007, OOCalc and Gnumeric) to use the push-based XMLReader rather than SimpleXML, and completely rewriting the calculation engine to boost its