Writing an Excel2007 file from pre-constructed array

Topics: Developer Forum
Mar 7, 2013 at 9:22 AM
Hi,

I'm dealing with a large chunk of rows record file. Apparently, I've managed to get all the data, and re-constructed it into an array as per below:
$lines_data = array(
  ['A1'] => 'id',
  ['B1'] => 'fname',
  ['C1'] => 'postalcode',
.....
  ['BK1'] => 'customhead30',
  ['BL1'] => 'customval30',
.....
  ['A2'] => '1001',
  ['B2'] => 'sarah',
  ['C2'] => '06950',
.....
  ['BK2'] => 'dept',
  ['BL2'] => 'IT',
.....
);
After finished constructing the array, I proceed with the Excel Writing. While debugging, I captured the load time for the array reconstruction, and it just run for less than 0.9 second for about 6K of data. Which I consider acceptable.

However, After the process of writing into Excel started, the time runs exceed beyond my expectation, which from test, around 20-25 seconds.
foreach ($lines_data as $key_cols => $write){
    if (array_key_exists($key_cols, $text_format_array)){ //set formatting for some selected data
        $objPHPExcel->getActiveSheet()->getCell($key_cols)
                            ->setValueExplicit($write, PHPExcel_Cell_DataType::TYPE_STRING); //set as text if this is the selected data
    } else {
        $objPHPExcel->getActiveSheet()->setCellValue($key_cols, $write); //default formatting
    }
}
It checks with another sets of array that contains cells that need to be formatted into text as per array sample below:
$text_format_array = array(
    ['C2'] => 'C2',
    ['C3'] => 'C3',
    ['C4'] => 'C4',
......
);
Does this slow-ness issues is coming from checking for the formatting of selected cells, or I'm doing this wrong?

Any Idea about this?

Thanks in advance.
Mar 10, 2013 at 4:46 PM
Hum... A few ideas...
If you pre-built an array with the data, why not make more depending on the data type (or an array with an extra dimension that contains the type)? This would prevent the test in the loop. On the other hand, there should be multiple loops (or two nested loops).
Use setValueExplicit is a good idea, but why limit to the String type.

Alternative: as you have many columns, the fromArray method could make a profit, by writing a whole line at once, unfortunately, this method does not support the explicit declaration of the type.
Depending on the source of your data, this could prevent you from getting to build your first table: a query returns an array that you pass directly to PHPExcel

Test one and other should not ask for writing lots of code.

I'm not sure that your method takes so long that, but you can measure it by not doing other actions within the loop.
Your table $texte_format_array containing keys and values identical, why not fill it only with values and test with in_array?
Mar 11, 2013 at 3:59 AM
Edited Mar 11, 2013 at 4:00 AM
LWol wrote:
Hum... A few ideas...
If you pre-built an array with the data, why not make more depending on the data type (or an array with an extra dimension that contains the type)? This would prevent the test in the loop. On the other hand, there should be multiple loops (or two nested loops).
Initially, I have done the nested loop. But, I think the nested loop are slowing down the performance of the process, so I make it to run only one loop.
Use setValueExplicit is a good idea, but why limit to the String type.
As of now, I only need the String type to be formatted for certain cells. The rest can just use default formatting. However, as per your suggestion, I can try putting the type as the key, i.e:
$lines_data = array(
  ['A1_default'] => 'id',
  ['B1_default'] => 'fname',
  ['C1_string'] => 'postalcode',
.....
Or you have any other suggestion that might make this thing easier. If possible, I would like to avoid using multidimensional array as it would require writing a nested loop.
Alternative: as you have many columns, the fromArray method could make a profit, by writing a whole line at once, unfortunately, this method does not support the explicit declaration of the type.
Depending on the source of your data, this could prevent you from getting to build your first table: a query returns an array that you pass directly to PHPExcel

Test one and other should not ask for writing lots of code.
I tried using the fromArray method, but resulting in error. Apparently, the method tried to generate the cell coordinate automatically, in which I already supply the coordinates beforehand. I'm not sure whether I'm using it correctly or not, but, AFAIK, the usage is quite straight forward. However, if you could provide me with some sample that I can follow, then, maybe I can try to re-adjust the code as per your sample.
I'm not sure that your method takes so long that, but you can measure it by not doing other actions within the loop.
Your table $texte_format_array containing keys and values identical, why not fill it only with values and test with in_array?
If I merge the type together with the keys as per illustrated above, this will not be an issue anymore, right?

All and all, thanks for the suggestion. I'll try it and provide with some result later.
Mar 11, 2013 at 4:51 PM
You need to impose the formatting for some cells, such as a postal code that can begin with a zero, but is not the sole purpose of setValueExplicit, its use avoids PHPExcel wasting time to divination.
Hence my suggestion to create your data tables based on this: you have no test in loops.
$stringArray = array('A1'='>Name", 'B1'=>Town, 'C1'=>'ZipCode');
$numericArray = array ('D2' = > 123, 'F2' = > 345);
$F =$ objPHPExcel-> getActiveSheet();
foreach($stringArray as $Coord=>$Value) $F-> setCellValueExplicit($Coord, $Value, PHPExcel_Cell_DataType::TYPE_STRING);
foreach($numericArray as $Coord=>$Value) $F-> setCellValueExplicit($Coord, $Value, PHPExcel_Cell_DataType::TYPE_NUMERIC);

You can see also I stores the worksheet in a variable before the loop to avoid calling the method many times within it, similarly, I use the method setCellValueExplicit of the worksheet object to avoid calling getCell.
When I spoke of nested loops and additional dimension, I thought a picture of this style:
$data = array (PHPExcel_Cell_DataType:TYPE_STRING => array('A1'='>Name", 'B1'=>Town, 'C1'=>'ZipCode'),
PHPExcel_Cell_DataType::TYPE_NUMERIC => array ('D2' => 123, 'F2' => 345));
and two loops:
foreach($data as $cellType=>$dataCells) {
foreach($dataCells as $Coord=>$Value) {
$F->setCellValueExplicit($Coord, $Value, $cellType);
}
}
Perhaps a little more resource intensive.
Mar 12, 2013 at 6:53 AM
Edited Mar 12, 2013 at 6:55 AM
Hi LWOL, Thanks for the idea.

The method works. The execution time did reduced. I had no problem processing 10k rows record. However, I always get Memory Exhausted Error Message when processing 13K row of records onwards.
Fatal error: Allowed memory size of 536870912 bytes exhausted (tried to allocate 64 bytes) in /var/www/myapp/application/third_party/PHPExcel/Cell.php on line 532
I've already use the setCachedMemoryMethod, tried using cache_in_memory_gzip, phpTemp and apc when writing the data to Excel File. Note that I changed from one method to another, not use them all at once.
//cache_in_memory_gzip
        $cacheMethod = PHPExcel_CachedObjectStorageFactory::cache_in_memory_gzip;
        PHPExcel_Settings::setCacheStorageMethod($cacheMethod);        

//cache_to_phpTemp
        $cacheMethod = PHPExcel_CachedObjectStorageFactory::cache_to_phpTemp;
        $cacheSettings = array( 'memoryCacheSize' => '1024MB');
        PHPExcel_Settings::setCacheStorageMethod($cacheMethod, $cacheSettings);        

//cache_to_apc
        $cacheMethod = PHPExcel_CachedObjectStorageFactory::cache_to_apc;
        $cacheSettings = array( 'cacheTime' => '600');
        PHPExcel_Settings::setCacheStorageMethod($cacheMethod, $cacheSettings);        
I'm not sure whether I'm using it correctly or not, but, I assumed that when I set the memoryCacheSize to 1024MB, it should allow the cache to be store up to 1024MB, which I'm quite frustrated when the Memory Exhausted still showing 512MB allowed. I've already changed my php.ini memory_limit to 1024MB, as well as APC shm_size to 1024MB (even though I doubt it has nothing to do if I choose phpTemp for the CacheStorageMethod). Did I miss somewhere else that need to be change as well?

Why does it still shows 512MB is the limit?
Mar 12, 2013 at 2:41 PM
You can check that the method initializes correctly by testing the returns of setCacheStorageMethod (true/false).
You can also verify that the memory_limit setting is taken into account (the reached 512 MB while the setting is set to 1024MB is curious) displaying or storing the results of an ini_get('memory_limit'). If you do not have the same value, somewhere a configuration file change the value, but you must be able to impose it in the script even (memory_limit is PHP_INI_ALL)
For the phpTemp, the memoryCacheSize is misleading, in fact this value corresponds to the memory that php has the right to use before writing to disk, so increasing it always generates problems.
In addition, if I read correctly the documentation, the value must be specified in bytes, nowhere is mention with a capacity of the wrapper to take into account the short notation (K, M, G), I know not how a value like 512 MB is interpreted, as well for the value that is used by default by Mark (1 MB), therefore, unless test this, I would try 2097152 (2 MB, the default value according to the php documentation - this same documentation where the example clearly shows the conversion of 5 MB in the equivalent in bytes for the passage of the parameter)