More on less memory and faster execution

Topics: Developer Forum, User Forum
Jul 9, 2008 at 10:02 PM
Edited Jul 9, 2008 at 10:04 PM

Hi Maarten,

As much as I like PHPExcel, creating Excel files has still been giving me headaches over the last few weeks. I’ve figured out a solution and I want to see if we can incorporate it into PHPExcel.

My current PHPExcel related project involves outputting database contents to an Excel file.  A typical output file consists of a total of 134 columns distributed over multiple worksheets and each worksheet having up to 1500 rows leading 210.000 cells in total.

My current issue is with speed. For a medium file, say 250 rows and up PHP Excecution time surpasses 60 seconds. Upgrading it to 300 seconds both in php.ini and apache’s httpd.conf extends capabilities but merely postpones the problem. Moreso… waiting users are complaining users and there is no way I can get away with this. I’ve been reading about more people running into this so I’ve been doing some research into solving this.

Now, before I go into it I just want to make clear that I understand it’s all a matter of what software is designed for. So, as much as do have some issues in my particular usage of the library I don’t think of it less brilliant. I love the way it is set up, especially the API and the flexibility. Personally I mostly use it as an outputting engine but do consider the Writer classes as just one aspect of a wide functionality. And, as usual, if you zoom in on one thing observations may appear out of context but I do want to make sure you understand that it’s not my intent.

Anyway: here’s what I’ve found and I assume most of this will be familiar to you:

1. PHPExcel stores the spreadsheet in memory twice.

Ofcourse, the PHPExcel object only needs to exist once for a single workbook, but as soon as I save the workbook the Writer class constructs the actual xml sheet files (xl/worksheets/sheet1.xml, etc) inside a string before adding them to the zip file. To me this is clearly beyond obvious because we need to construct the file, but it does mean that a second representation of the spreadsheet is created when we save.

2. Multiple cell parses

If we populate and save a 100.000 cell worksheet we parse all the data while copying it from the database into the worksheet (creating cells, styles, etc.). Then, if we save it, we parse it again from the Writer classes.

3. Shared strings

If we populate the database with many similar strings we create many copies of the same data in memory. The Writer classes consolidate the data into a shared string table. However, in the meantime we do use a lot of memory.

These three items make perfect sense to me. Especially if the intent is to have an in memory spreadsheet with capabilities of modification and a rich feature set. However, if we look purely and solely from the point of view in which we want to output data from a database to an excel file these steps aren’t necessarily needed, in fact may be slowing down the process and I’m thinking of finding a way to have this rich and flexible feature set that is in PHPExcel now, while maintaining a very high performance level if we just want to use it to save data, to effectively use it as a conversion method, really.

Here’s what I tried:

I created a completely different lite version of PHPExcel from scratch. It stores cell data similar to PHPExcel with PHPExcel_Cell classes. Styles are not it yet. The basic interface is similar to PHPExcel, yet very limited for now.

While constructing the sheets, after writing all the cell information to <sheetData> but before finishing that section, I created an indefinite loop structure that triggers an OnTraverse() event for every iteration and then OnCreateData() for every sheet individually.

In the OnTraverse() event the calling application may forward the pointer to a datasource, as in ‘go to the next record’. In OnCreateData the calling application returns a set of data that represents one single row in the spreadsheet. The Writer saves it into the constructed worksheet xml files immediately, bypassing the creation of cell and style classes. All content is passes through a sharedStrings filter first and as such compressed.

Result: Writing 3.000 rows of 56 columns (168.000 cells) consumed 23Mb peak memory and 7 seconds. That is but a fraction of PHPExcel which required almost 5 minutes and 165Mb peak memory.

So, I guess the question is: can we find a way to incorporate the best of both worlds (I hope so)… or should PHPExcel have a sister library that is primarily focused on outputting data fast and quickly (I hope not).

I will email you the source code of what I did but I’m posting this here anyway, so that anyone can join in the thinking.

Let me know what you think. There is so much more to this than just an idea. I think it also comes down to how you want to position PHPExcel.




Jul 10, 2008 at 1:27 PM
Will look into this later this week.
Jul 11, 2008 at 7:44 AM
Hello Joost,

The whole background story is right: vision on a project makes a project evolve in a certain direction that may cause other usage to be slower/less usable. Luckily, there's open source projects where one can suggest to also look from another vision, for which I thank you. It is true that PHPExcel is reather slow on large amounts of data, especially if there's not formatting involved and data just needs to be "dumped" in an XLSX file.

I've had a look at the files you provided, and clearly see the concept you are trying to build. That said, I think this is quite error-prone to beginner developers using PHPExcel --> one needs to know how SpreadsheetML works, i.e. it is very possible to "break" output and create an invalid XLSX file.

However, your solution IS performant! And I indeed think we should find a method to incorporate this into PHPExcel without making too much concessions on features, for example adding formula's in a callback should still be possible in my opinion, no? Styles is a little bit harder and perhaps a thing to worry about later on.

Will see if I can find some time to incorporate your solution, thank's for the suggestions!

Jul 11, 2008 at 7:46 AM
A work item has been created:
Jul 11, 2008 at 12:20 PM

Hi Maarten,




You’re making a brilliant point by the way in your second paragraph. The concept I sent you is obviously very simple and because the callback is free to return an array with any data in it the chances for “breaking” output are very high. Clearly the concept displays the potential for improved performance and clearly it lacks the level of abstraction and validation we would expect from a final implementation.


From how I look at it, working on the final implementation requires a bit more thinking than simply making the performance point I just did and I haven’t gone so far yet.


Some ideas that might be helpful:


-          The concept included shared strings, not only in the output file but also for the in memory storage. This may not be necessary for the objective. Leaving that aspect out improves chances for compatibility with existing code.


-          I suggest anything that addresses the callback in the Writer class is way smarter and more intelligent than simply allowing the application to return an array. I wonder if we could somehow create an interface, similar to PHPExcel_Worksheet through which the callback has to pass it’s data. (e.g. function callback(PHPExcel_Writer_Worksheet $interfaceWorksheet, PHPExcel_Worksheet $callForSheet, [int] $callForRow). This interface would then incorporate most features (including formulas, shared formulas, merge cells, etc.) similar to PHPExcel_Worksheet but not all features (e.g. no calculation). Writing data to PHPExcel_Writer_Worksheet should only be allowed to the particular row pointed to by $callForRow.


-          Eventually styles become shared in SpreadsheetML. So, maybe applying a style during callback(…) me be done through a method similar to something like CopyStyle(). However, in PHPExcel_Worksheet and related classes this would cause the creation of more in memory objects while PHPexcel_Writer_Worksheet would figure out which of the SpreadsheetML shared styles is being applied so that they can be referenced directly.


-          We could leave an option open for the caller to choose whether to parse rows, then sheets or sheets then rows, depending on their kind of content.


Anyway… before I get lost ;)


What would you say if I come over to Belgium? We'll meet, have one or two of those wonderful beers you have over there and we'll spend a couple of hours aligning ideas and figuring out how we can merge the best of both worlds? After that I’ll be happy to spend time writing code for the work item. I bet that whatever we come up with wi'll be quite a bit of work but the truth is that writing an alternative that works for me will probably be at least as much work, yet less fancy. I'd prefer working on making PHPExcel better.




Jul 14, 2008 at 10:26 AM
Hello Joost,

Some ideas on how this can be implemented flexible yet error-proof...
Each worksheet can define 0, 1 or more "template" cells, each worksheet can define 0, 1 or more "template" styles. These templates are not rendered, but can be used for data-dumping. Using this approach, one can make a row template out of 8 cell templates, style template with alternating rows, ... On the worksheet, a property is set when to call the data-dumping procedures, for example if the writer reaches row 10, data-dumper is called.
This data-dumper class will fetch 1 row of data and return this data, also telling which cell and style template to use. Data-dumper class can decide to add alternating row styles, intermediate totals, ...
Optionally, when data-dumper has finished, the writer will continue normal processing, so you can add some other cells after the data dump.

The cell/row templates will obviously not create any objects, just fix references in OpenXML to a style definition for example.

Let me know what you think.


Jul 16, 2008 at 9:42 PM
Edited Jul 16, 2008 at 9:43 PM
Hi Maarten,

Sorry for being a bit slow. The weather was great and I've finally gotten the boat I bought some time back. I've been cruising the canals of Amsterdam all week. Some things simply outweigh everything else in the world ;)

Essentially what you are saying is the idea. The data-dumping procedures could even be a class wich implements PHPExcel_Interface_DataDumper. Obviously PHPExcel be checking the returned data.

Exactly how you see the template files I do not see but there's benefit of the doubt to be given here ;) Essentially the data dumper must have a way to know where it's dumped data ends up. It could dump to a column (like in my examply), to a name or may return data upon being queried ('give data for column X'). Either way would work.

For styles to be assigned we need to incorporate a way for the data dumper to refer to styles. Option 1: 'set style equivalent to that of A1' and then the writer knows what the OpenXML style Id of A1 is. Option 2: relate this to shared styles. Option 3, set styles for columns, Option 4... etc.

Jul 17, 2008 at 7:41 AM

Hello Joost,

First things first :-) Enjoying the weather is something you can't seem to do much this summer, so no one will blame you :-)

Seems my idea was just too conceptual, so let's do some "virtual changes" to PHPExcel.

  • PHPExcel_Cell contains a property isTemplateCell
    • If this property is true, the writer will NOT include this cell upon rendering
    • If it's false (default), normal behaviour is expected
  • No changes with styles --> even this template cell can have a style!
  • PHPExcel_Worksheet contains a property dataDumper of type PHPExcel_IDataDumper
    • This property is sheet dependent, so each worksheet can have its own dumper
  • PHPExcel_IDataDumper has the following structure (some sort of pseudo-php code):
    • {{interface PHPExcel_IDataDumper {

        // Reference to sheet
        PHPExcel_Worksheet $sheet;

        // Determines if the dumper process will start
        bool dumperShouldStart(int currentRowRendering);

        // Has more rows? Also advances the row pointer
        bool nextRow();

        // Dumps a row of data
        // Example row:
        // $row['ColumnCount'] = 2;
        // $row['A']['CellTemplate'] = $this->sheet->getCell('templatedcellRowA');
        // $row['A']['CellValue'] = 'Test';
        // $row['A']['CellCalculatedValue'] = 'Test';
        // $row['A']['Style'] = 'style_hash_code';
        // $row['B']['CellTemplate'] = $this->sheet->getCell('templatedcellRowA');
        // $row['B']['CellValue'] = 'Test';
        // $row['B']['CellCalculatedValue'] = 'Test';
        // $row['B']['Style'] = 'style_hash_code';
        // Possible that each call returns a different style, for example:
        // $row['A']['Style'] = 'alternating_style_hash_code';
        // $row['B']['Style'] = 'alternating_style_hash_code';
        array currentRow();

  • PHPExcel writer follows this flow:
    • Write row 1
    • dumpShouldStart()? --> false
    • Write row 2
    • dumpShouldStart()? --> false
    • Write row 3
    • dumpShouldStart()? --> true
    • while ($dumper->nextRow())
      • write $dumper->currentRow()
    • Write row X
    • dumpShouldStart()? --> false
    • Write row Y
    • dumpShouldStart()? --> false

There are some alternatives to consider:

  • Move the template cells into PHPExcel_IDataDumper
  • Move styles into PHPExcel_IDataDumper
  • Automatically set pre-calculate formulas to false on the writer (performance!)
  • Add multiple dumpers to a sheet so you can add "classic" rows follwed by a dump followed by classic followed by another dump ...


Jul 17, 2008 at 9:07 PM
Edited Jul 17, 2008 at 9:09 PM
I like it...

The data dumper goes pretty much the way I forsee it. I now understand your proposed template cells are slightly different from my original understanding. I understand you don't intent to make C12, or B2 a template cell. Instead you intend to add a cell object and give it a name. I like that.

Then, conceptually I would say that a template cell collection is different from a cell collection. So, as opposed to having isTemplateCell (or in addition too that) create PHPExcel_Worksheet::$_templateCellCollection in addition to PHPExcel_Worksheet::$_cellCollection. This *may* be just philosophical and symantics but the writers all read cellCollection and may assume it's only the actual cells. Also, in _cellCollection[ref] the ref identifier may mean a location on the sheet or may mean an abstract name. That's two different things.

Moving the templates to PHPExcel_DataDumper is a different question. It is in line with splitting the two collections but but the big question here is: is a template part of the sheet or is it not. Is it? Is it not? Is it? Is it not? My stomach is telling me that either way will have consequences in the future but right now I'm not able to come up with any logical reason to prefer either one. I'm not picky about it though. I know for sure that it will work both ways. However if you split the two collections, your style consolidation code in the writer must be adapted to take both sng tyles into account.

I would think the styles would go with the cells so whereever the cells are the styles are. I see them as part of the cell like I think you do. Getting the 'alternating_style_hash_code' (I assume this refers to OpenXML's styles.xml style index) will need to find a way, but I'm sure you can figure that out. Possibly by adding the has code to the style object or so.

I like multiple dumpers. You would have to know up front how many rows will be dumped by your dumper to know which rows to set as a footer as real cells in the worksheet though. Some databases may find that difficult, but that's another software layer.

Oh, definately allow for not precalculating the formulas. Yes, please. Did I say I like that? By the time we load this in Excel it will be so much faster to calculate than PHPExcel can ever be.


PS: Anyone else following this thread with suggestions?
Jul 18, 2008 at 7:05 AM
Hello Joost,

This is staring to be a nice, long series of posts :-)

I agree with a different cell collection for cell templates, this way there's no need for all writers to be adapted immediately. Perhaps template styles should also be located in a different property? This way we are sure that we're not writing styles that aren't used in the first place.

About these hash codes: these are simply the hash codes that are generated when calling PHPExcel_Style->getHashCode(). The collection of styles is indexed based on that hash, and can thus return the corresponding style (and its XLSX-id) very fast.

Eventually, the dumper can receive more context in the "dumperShouldStart" method. Perhaps you want it to start right after another dumper? Or when a cell contains the text "STARTDUMPHERE" or when the weather is actually summer-ish for one day.

Jul 18, 2008 at 10:21 PM


As I'm thinking about it... why would actually still need this line?

$row['A']['CellTemplate'] = $this->sheet->getCell('templatedcellRowA');

After all, we write the value, the calculated value and the style directly into $row[].

Maybe all we need is shared styles and data dumper or am I missing something?


Jul 22, 2008 at 5:22 PM

Good thinking, Joost! Probably also a cell data type (like with setCellValueExplicit) which allows the writer to determine what type of cell it should render.

Will be on vacation for two weeks starting this friday, I propose we wait with implementation untill after that? Feel free to do some proof-of-concept stuff in the meanwhile :-)


Jul 23, 2008 at 11:20 AM
Edited Jul 23, 2008 at 11:23 AM
Hi Maarten, hi Joost!
Sorry of my pure English. Try to understand it, please. As I understand, you would like to implement “template styling” inside code again. Sorry if I’m wrong. If not, please listen my short story.
This week a new girl- designer/web-designer- was getting in my company. In next month she will upgrade the design of my site and generated documents (invoices, orders, reports etc).  As you understand she is very stupid in programming. Absolutely zero! But good enough in CSS. So upgrading of the site design is probably because of CSS. My point is to provide the same cooperative work style into PHPExcel. I’m imaging ideal as:
1. She (or anybody else) prepare “template” in pure Excel. “Pure Excel” means that she has a PC with one installed program Microsoft Excel only. She opens Microsoft Excel and designs template without PHP-coding absolutely.
2. I (any other programmer) read this template by lReader and fill it by data only. “Data only” means no styling in PHP-code absolutely.
3. Writing Excel by IWriter.
The main point is to outside formatting and styling of document out from PHP to Excel. This is much more flexible and perspective. Just image how easy it will to redesign this report in next month (and how difficult it now). And how interesting is to open the design ability to another people. This is much needed to middle and heavy projects to participate and cooperate work.

Looks like I wrote before you start implementing.
Sorry of my English again.

Jul 23, 2008 at 12:10 PM
Hello Alexey,

Seems to be a different suggestion afterall. Nevertheless, a good one :-)
Will discuss this after my vacation, half August.

Aug 11, 2008 at 2:34 PM
Back! Joost, any news on this one?   
Aug 11, 2008 at 4:09 PM

Welcome back Maarten. I hope you had a good time.

No, no developments here. Time pressured, I took the proof of concept I sent you earlier and turned in into something I could satisfy my client with. We went live just last week. My own vacation started a couple of days ago and I'll be gone untill september 1st. I'll catch up when I get back and we'll see where we can go then.


Aug 25, 2008 at 8:12 PM
Is there any way I can get a hold of those files you posted for the lite version?  I am doing something very similar and the performance of PHPExcel is causing me a lot of trouble.