Again memory

Topics: Developer Forum, User Forum
Oct 21, 2008 at 5:21 PM
Edited Oct 21, 2008 at 5:32 PM

I've written a small app for my boss that takes 7 or 8 xls files and outputs the contents to a csv file.
The data in the files is just plain text fields ...
The script uses over 10 gigs of RAM to read the 50MB excel file (8GB physical, 2GB swap) and drags the server down to a crawl.

Is there anything that can be done about this? A previous poster has pointed out it seems the scripts do not release the memory used for the objects. I second this.
Watching the top output while the script runs shows no decrease in memory usage between the reads of the different xls files even thought the variables are unset after each file.

After the first file
13:26:46 Peak memory usage: 810.75 MB
Second
13:27:27 Peak memory usage: 2779.5 MB
Third
13:27:59 Peak memory usage: 4330.25 MB
Fourth
13:29:19 Peak memory usage: 8081.75 MB

And right about there is where I get this in top
Cpu(s):  0.0%us, 72.7%sy,  0.0%ni,  0.0%id, 27.3%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   8018092k total,  8003476k used,    14616k free,      268k buffers
Swap:  2031608k total,  2031608k used,        0k free,     5480k cached

 

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
  285 root      10  -5     0    0    0 D   40  0.0  41:19.61 kswapd0
  286 root      10  -5     0    0    0 D   27  0.0  52:08.42 kswapd1
 6375 root      18   0  9.8g 7.6g 4568 R   22 98.9   3:40.79 php


More Details ...
The output of program is like this (I have to delimit the fields using ~!~ because of commas in the name fields):
"Marketing Division"~!~"Clark,Ranjana B"~!~""~!~""~!~""~!~""~!~""~!~"Marketing Division"~!~"Clark,Ranjana B"~!~"a020253"~!~"Covington, Debra"~!~"A"~!~"F"~!~"I"~!~"I"~!~"NR"~!~"NR"~!~"I"~!~"NR"~!~"NR"~!~"NR"~!~"NR"~!~"I"~!~"NR"~!~"Incomplete"~!~"Y"
XLS File sizes are:
-rwxr--r-- 1 webdev webdev 27252736 Oct 21 12:37 CapitalManagement1.xls
-rwxr--r-- 1 webdev webdev  5411328 Oct 21 12:37 CorporateInvestmentBanking.xls
-rwxr--r-- 1 webdev webdev 21814272 Oct 21 12:37 Finance.xls
-rwxr--r-- 1 webdev webdev 50259456 Oct 21 12:37 GeneralBankingGroup.xls
-rwxr--r-- 1 webdev webdev  3201024 Oct 21 12:37 HRCorporateRelations.xls
-rwxr--r-- 1 webdev webdev  1312256 Oct 21 12:37 Legal.xls
-rwxr--r-- 1 webdev webdev   753152 Oct 21 12:37 Marketing1.xls
-rwxr--r-- 1 webdev webdev  1761280 Oct 21 12:37 RiskManagement.xls
-rwxr--r-- 1 webdev webdev  3918336 Oct 21 12:37 WealthManagement.xls

Script is below...
<?php
error_reporting(E_ALL);
ini_set('memory_limit', '2048M');
set_include_path(get_include_path() . PATH_SEPARATOR . './Classes/');
include 'PHPExcel/IOFactory.php';
$excelfiles[] = "Marketing1.xls";
$excelfiles[] = "RiskManagement.xls";
$excelfiles[] = "Legal.xls";
$excelfiles[] = "HRCorporateRelations.xls";
$excelfiles[] = "GeneralBankingGroup.xls";
$excelfiles[] = "Finance.xls";
$excelfiles[] = "CorporateInvestmentBanking.xls";
$excelfiles[] = "CapitalManagement1.xls";
$excelfiles[] = "WealthManagement.xls";
foreach ($excelfiles as $excelfile) {
    $objReader = PHPExcel_IOFactory::createReader('Excel5');
    $objPHPExcel = $objReader->load( $excelfile );

// Get the First and second Sheets for ($s = 0; $s <= 1; $s++) { $objPHPExcel->setActiveSheetIndex( $s ); $maxrows = $objPHPExcel->getActiveSheet()->getHighestRow(); $maxcols = $objPHPExcel->getActiveSheet()->getHighestColumn();
// Start at ROW 3 continue to end for ($i = 3; $i <= $maxrows ; $i++) {
// Get COLUMNS A-Z for ($j = 0; $j <= 25 ; $j ++) {
$cell = $j . $i ; $celldata = $objPHPExcel->getActiveSheet()->getCellByColumnAndRow( $j , $i )->Getcalculatedvalue(); if ($j == "25") { echo "\"". addslashes($celldata) . "\"" ; } else { echo "\"". addslashes($celldata) . "\"~!~" ; } } echo "\n"; } } unset($objReader); unset($objPHPExcel); } ?>
Developer
Oct 22, 2008 at 1:43 AM
Edited Oct 22, 2008 at 1:45 AM
This is unfortunately a known problem. As far as I understand this is really a problem with PHP rather than with PHPExcel.

There are circular object references between some objects so that unsetting the PHPExcel object will not free memory due to this bug:
http://bugs.php.net/bug.php?id=33595
http://paul-m-jones.com/?p=262

Looking at the first link, it seems that PHP 5.3 will fix this. However, I would like to see a confirmation of this somewhere.

This problem was also noted by whisper here
http://www.codeplex.com/PHPExcel/Thread/View.aspx?ThreadId=18404
He proposes Destroy methods for breaking circular references which will enable one to get rid of the PHPExcel object while freeing the memory.
Not sure about what came out of it. At the moment, it may be better to just wait and see if PHP 5.3 fixes the issue.

In your situation, I would try to split the job over multiple scripts each handling only one workbook. Memory should be released as each script finishes.