Memory problem to load

Topics: User Forum
Jan 21, 2011 at 8:19 AM

Hi everybody !!

I have a huge file that I would like to read so I can fill my MySQL database. I tried to use the PHPExcel library, but I get an error when I try to load my file:

Fatal error: Allowed memory size of 1610612736 bytes exhausted (tried to allocate 22 bytes) in C:\Wamp\www\Classes\PHPExcel\Worksheet.php on line 964

I have already increased the value of memory_limit in the php.ini file, but it's still not enough. My Excel file is 55MB (5,109,719 cells). Do you have any idea how to solve this problem? Thanks :)

Coordinator
Jan 21, 2011 at 9:14 AM
On 21/01/2011 09:19, kaptainchok wrote:

I have a huge file that I would like to read so I can fill my MySQL database. I tried to use the PHPExcel library, but I get an error when I try to load my file:

I have already increased the value of memory_limit in the php.ini file, but it's still not enough. My Excel file is 55MB (5,109,719 cells). Do you have any idea how to solve this problem? Thanks :)
There's plenty been written about the memory usage of PHPExcel here on the forum; so reading through some of those previous discussions might give you a few ideas. PHPExcel holds an "in memory" representation of a spreadsheet, and is susceptible to PHP memory limitations.

The "rule of thumb" that I've always used is an average of about 1k/cell, so your 5M cell workbook is going to require 5GB of memory. However, there are a number of ways that you can reduce that requirement. These can be combined, depending on exactly what information you need to access within your workbook, and what you want to do with it.

If you have multiple worksheets, but don't need to load all of them, then you can limit the worksheets that the Reader will load using the setLoadSheetsOnly() method.
To load a single named worksheet:
$inputFileType = 'Excel5'; 
$inputFileName = './sampleData/example1.xls'; 
$sheetname = 'Data Sheet #2'; 
/**  Create a new Reader of the type defined in $inputFileType  **/
$objReader = PHPExcel_IOFactory::createReader($inputFileType);
/**  Advise the Reader of which WorkSheets we want to load  **/ 
$objReader->setLoadSheetsOnly($sheetname); 
/**  Load $inputFileName to a PHPExcel Object  **/
$objPHPExcel = $objReader->load($inputFileName);

Or you can specify several worksheets with one call to
setLoadSheetsOnly() by passing an array of names:

$inputFileType = 'Excel5'; 
$inputFileName = './sampleData/example1.xls'; 
$sheetnames = array('Data Sheet #1','Data Sheet #3'); 
/** Create a new Reader of the type defined in $inputFileType **/
$objReader = PHPExcel_IOFactory::createReader($inputFileType);
/** Advise the Reader of which WorkSheets we want to load **/
$objReader->setLoadSheetsOnly($sheetnames); 
/**  Load $inputFileName to a PHPExcel Object  **/
$objPHPExcel = $objReader->load($inputFileName);


If you only need to access part of a worksheet, then you can define a Read Filter to identify just which cells you actually want to load:

$inputFileType = 'Excel5'; 
$inputFileName = './sampleData/example1.xls'; 
$sheetname = 'Data Sheet #3'; 
/**  Define a Read Filter class implementing PHPExcel_Reader_IReadFilter  */ 
class MyReadFilter implements PHPExcel_Reader_IReadFilter 
{ 
    public function readCell($column, $row, $worksheetName = '') { 
        //  Read rows 1 to 7 and columns A to E only 
        if ($row >= 1 && $row <= 7) { 
            if (in_array($column,range('A','E'))) { 
                return true; 
            } 
        } 
        return false; 
    } 
} 
/**  Create an Instance of our Read Filter  **/ 
$filterSubset = new MyReadFilter(); 
/** Create a new Reader of the type defined in $inputFileType **/
$objReader = PHPExcel_IOFactory::createReader($inputFileType);
/**  Advise the Reader of which WorkSheets we want to load 
     It's more efficient to limit sheet loading in this manner rather than coding it into a Read Filter  **/ 
$objReader->setLoadSheetsOnly($sheetname); 
echo 'Loading Sheet using filter', PHP_EOL;
/**  Tell the Reader that we want to use the Read Filter that we've Instantiated  **/ 
$objReader->setReadFilter($filterSubset); 
/**  Load only the rows and columns that match our filter from $inputFileName to a PHPExcel Object  **/
$objPHPExcel = $objReader->load($inputFileName);

Using read filters, you can also read a workbook in "chunks", so that only a single chunk is memory-resident at any one time:

$inputFileType = 'Excel5'; 
$inputFileName = './sampleData/example2.xls'; 
/**  Define a Read Filter class implementing PHPExcel_Reader_IReadFilter  */ 
class chunkReadFilter implements PHPExcel_Reader_IReadFilter 
{ 
    private $_startRow = 0; 
    private $_endRow = 0; 
    /**  Set the list of rows that we want to read  */ 
    public function setRows($startRow, $chunkSize) { 
        $this->_startRow    = $startRow; 
        $this->_endRow      = $startRow + $chunkSize; 
    } 
    public function readCell($column, $row, $worksheetName = '') { 
        //  Only read the heading row, and the rows that are configured in $this->_startRow and $this->_endRow 
        if (($row == 1) || ($row >= $this->_startRow && $row < $this->_endRow)) { 
            return true; 
        } 
        return false; 
    } 
} 
/**  Create a new Reader of the type defined in $inputFileType  **/
$objReader = PHPExcel_IOFactory::createReader($inputFileType);
/**  Define how many rows we want to read for each "chunk"  **/ 
$chunkSize = 20; 
/**  Create a new Instance of our Read Filter  **/ 
$chunkFilter = new chunkReadFilter(); 
/**  Tell the Reader that we want to use the Read Filter that we've Instantiated  **/ 
$objReader->setReadFilter($chunkFilter); 
/**  Loop to read our worksheet in "chunk size" blocks  **/ 
/**  $startRow is set to 2 initially because we always read the headings in row #1  **/
for ($startRow = 2; $startRow <= 65536; $startRow += $chunkSize) { 
    /**  Tell the Read Filter, the limits on which rows we want to read this iteration  **/ 
    $chunkFilter->setRows($startRow,$chunkSize); 
    /**  Load only the rows that match our filter from $inputFileName to a PHPExcel Object  **/ 
    $objPHPExcel = $objReader->load($inputFileName); 
    //    Do some processing here 

    //    Free up some of the memory 

    $objPHPExcel->disconnectWorksheets(); 
    unset($objPHPExcel); 
}


If you don't need to load formatting information, but only the worksheet data, then the setReadDataOnly() method will tell the reader only to load cell values, ignoring any cell formatting:

$inputFileType = 'Excel5';
$inputFileName = './sampleData/example1.xls';
/** Create a new Reader of the type defined in $inputFileType **/
$objReader = PHPExcel_IOFactory::createReader($inputFileType);
/** Advise the Reader that we only want to load cell data, not formatting **/
$objReader->setReadDataOnly(true);
/**  Load $inputFileName to a PHPExcel Object  **/
$objPHPExcel = $objReader->load($inputFileName);


Use cell caching. This is a method for reducing the PHP memory required for each cell, at a cost in speed. It works by storing the cell objects in a compressed format, or outside of PHP's memory (e.g. disk, APC, memcache)... but the more memory you save, the slower your scripts will execute. You can, however, reduce the memory required by each cell to about 300 bytes, so your 5M cells would require about 1.4GB of PHP memory.

Cell caching is described in section 4.2.1 of the Developer Documentation
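As a minimal sketch (assuming PHPExcel 1.7.x; cache_in_memory_gzip is just one of the available methods, alongside e.g. cache_to_phpTemp and cache_to_discISAM), enabling cell caching looks like this. Note that caching must be configured before the workbook is loaded:

```php
/**  Cell caching must be enabled BEFORE the workbook is loaded.
     cache_in_memory_gzip keeps cells in PHP memory, but gzip-compressed;
     other methods move cell data out of PHP memory entirely,
     trading execution speed for memory.  **/
$cacheMethod = PHPExcel_CachedObjectStorageFactory::cache_in_memory_gzip;
if (!PHPExcel_Settings::setCacheStorageMethod($cacheMethod)) {
    die($cacheMethod . ' caching method is not available');
}
$objPHPExcel = PHPExcel_IOFactory::load('./sampleData/example1.xls');
```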



Coordinator
Jan 21, 2011 at 9:23 AM

Note that the 1k/cell and 300 bytes/cell figures above are estimates, based on a 32-bit server. If you're running on a 64-bit server, this adds an overhead of about 60%.

If you're writing large files, then double those figures.

Jan 21, 2011 at 9:47 AM

I could imagine that a PHP process requiring over 5GB would be a problem on a 32-bit system ;)

Jan 24, 2011 at 2:31 AM

Thanks a lot for your answers !! I tried to implement some of the above solutions. Here is my code :

set_time_limit(0);

$inputFileType = 'Excel5';
$inputFileName = 'EUFRA_base_export.xls';
$sheetname = '_UL'; 
//$sheetname = '_AgCentre'; 
$chunkSize = 1; 


$objReader = PHPExcel_IOFactory::createReader($inputFileType);
$chunkFilter = new chunkReadFilter(); 
$objReader->setReadFilter($chunkFilter);
$objReader->setReadDataOnly(true);
$objReader->setLoadSheetsOnly($sheetname);
echo 'start'; 
for ($startRow = 2; $startRow <= 10; $startRow += $chunkSize) 
{ 
	echo $startRow;
	echo("<br>");
	$chunkFilter->setRows($startRow,$chunkSize);
	$objPHPExcel = $objReader->load($inputFileName);
	echo('B'.$startRow);
	$value = $objPHPExcel->getActiveSheet()->getCell('B'.$startRow)->getValue();
	$objPHPExcel->disconnectWorksheets(); 
	unset($objPHPExcel);
	//echo $startRow;
	echo $value;
	echo("<br>");
}
echo 'end';

This works fine... except for the processing time. I just accessed 10 records to display them and it took 5 minutes. Is my code well written? Is there a way to improve the processing speed?

Coordinator
Jan 24, 2011 at 8:50 AM
kaptainchok wrote:

Thanks a lot for your answers !! I tried to implement some of the above solutions. 

 This works fine... except for the processing time. I just accessed 10 records to display them and it took 5 minutes. Is my code well written? Is there a way to improve the processing speed?

I have to agree that it seems excessively slow, and I can't explain why... even my little netbook should process 10 records from a 5M cell worksheet a lot faster than that. However, I wouldn't use a chunk size of 1; I would set the value as high as I could without exceeding memory limits. For every "chunk" that's loaded, you're still reading the entire file, so you're reading and parsing 55MB on every iteration of the chunk loop... and that takes time. You should try to minimize the number of iterations: read "chunks" of 100 rows at a time, or 500 rows at a time, as many as you can handle in memory.

If you remove the lines

   $objPHPExcel->disconnectWorksheets();
   unset($objPHPExcel);

you will get a degree of memory leakage, but it will also process the loop faster.

Jan 25, 2011 at 4:16 AM

It seems that in my case (a very large file) it would be better to use another tool... I will try VBA :(  Anyway, thanks for your help and congratulations on the work you have done !!

Mar 9, 2012 at 7:55 AM
MarkBaker wrote:

For every "chunk" that's loaded, you're still reading the entire file, so you're reading and parsing 55MB on every iteration of the chunk loop... and that takes time. You should try to minimize the number of iterations: read "chunks" of 100 rows at a time, or 500 rows at a time, as many as you can handle in memory.

As far as I understand, each time it parses the whole file to know where the needed rows are, so it effectively builds a kind of map of the file.

So why should it create this map of the file each time? Couldn't it be cached? It would create the cache only on the first pass, and then either use the cache (if it exists or is passed to the method) or generate a new one. There could be an option to use the cache (possibly with a link to the cache file) or to generate a new one.

Can this be implemented?

Thanks.

Coordinator
Mar 9, 2012 at 8:25 AM
Edited Mar 9, 2012 at 8:25 AM

For each "chunk" it simply instantiates a new PHPExcel object, reads through the (entire) Excel file, and tests each cell read from the file against the read filter to determine whether to store it or discard it. If the choice is "discard" then no memory is used within the PHPExcel object for that cell, only in the reader. It doesn't create any "map" during this process.

The first point to note is that the read filter is designed simply to allow reading part of a worksheet rather than the whole of it. It doesn't create any map, because that is not part of its purpose, which is simply to identify whether a cell should be read or not. Reading a file in chunks is simply a way of taking advantage of this filtering feature to process a large file in several stages rather than all in one go. The filter doesn't even know it's being used in a loop.

If you were to create such a map, it would require a substantial amount of memory.... almost exactly the amount of memory required as though you'd actually read in every cell... so there would be no real benefit, and a lot of drawbacks.

Mar 9, 2012 at 9:07 AM

Here is what I want.

I want to read a line, process it with my own tasks, read the next line, and so on.

This is how I work with a huge CSV. I have a class, and here is the logic:
1. I read line by line and check whether the elapsed time is getting close to the time limit. When reading a line I don't use much memory; I never hold more than one line in memory.
2. If I'm close to the time limit, I save the current file position to a class variable, store my class to a file (JSON encoded), and rerun my web page with a parameter in the query string, like "continue".
3. When my web page loads, it reads the query string and, if there is a continue key, reads the stored class and decodes it (remember, it knows the file position stored in the class) and reads the next line.

Time is not important in my case. Memory limit is anyway important, but only for one step.
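That resume technique could be combined with the chunked read filter from earlier in this thread. A rough sketch (assuming $objReader, $inputFileName and $lastRow are set up as in the earlier examples, and using an arbitrary 25-second time budget as an illustration):

```php
/**  Resume at the row passed in the query string, or start at row 2  **/
$startRow  = isset($_GET['continue']) ? (int) $_GET['continue'] : 2;
$chunkSize = 100;
$startTime = time();

$chunkFilter = new chunkReadFilter();   // the filter class defined earlier in this thread
$objReader->setReadFilter($chunkFilter);

while ($startRow <= $lastRow) {
    $chunkFilter->setRows($startRow, $chunkSize);
    $objPHPExcel = $objReader->load($inputFileName);
    //  ... process this chunk here ...
    $objPHPExcel->disconnectWorksheets();
    unset($objPHPExcel);
    $startRow += $chunkSize;
    //  Close to the time limit?  Hand the current position to a fresh request.
    if (time() - $startTime > 25) {
        header('Location: ?continue=' . $startRow);
        exit;
    }
}
```

Note that every chunk still rereads and parses the whole file, so this only bounds memory and per-request time, not total processing time.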

Even with time not being important, you note above that chunksize=1 is extremely slow. I'm afraid of the word "extremely".

You suggest: "However, I wouldn't use a chunksize of 1, but would set the value to as high as I could without exceeding memory limits". This could be a way, but it demands some extra logic to determine the optimal chunk size. And the XML is non-uniform: some rows can be small, others larger. One row is always within the limits, but whether a number of rows fits depends.

Any suggestion?

Thanks.

Coordinator
Mar 9, 2012 at 4:48 PM

I am aware of a wrapper class that uses PHPExcel, whose purpose is to read files line by line. I've not used it, and don't have the link to hand... I'll try to track it down later.

May 28, 2012 at 10:26 AM
Edited May 28, 2012 at 11:10 AM

Hi,

I used the above code and it works fine with a .xls file, but I am facing an issue with a .xlsx file. I need to get the total number of rows in the sheet, but I am unable to get it with the .xlsx file: it always returns 1 total row. When I removed the chunk filter code, it returned the correct total. Following is the code.

 

$ext = strtolower(array_pop(explode(".", $xlsFile)));
	
$objLib = ($ext == "xls") ? 'Excel5' : 'Excel2007';
	
// Cell caching to reduce memory usage.
$cacheMethod = PHPExcel_CachedObjectStorageFactory::cache_to_phpTemp;
$cacheSettings = array('memoryCacheSize' => '8MB');
PHPExcel_Settings::setCacheStorageMethod($cacheMethod, $cacheSettings);
	
$objReader = PHPExcel_IOFactory::createReader($objLib);
if (!$objReader->canRead($xlsFile)) {
	return false;
}
	
// Load only first sheet.
$sheets = $objReader->listWorksheetNames($xlsFile);
$objReader->setLoadSheetsOnly( array($sheets[0]) );
	
/**  Create a new Instance of our Read Filter  **/ 
$chunkFilter = new chunkReadFilter(); 
	
/**  Tell the Reader that we want to use the Read Filter that we've Instantiated  **/ 
// WHEN I COMMENTED FOLLOWING LINE THEN IT RETURNS CORRECT TOTAL ROWS.
$objReader->setReadFilter($chunkFilter);

$chunkSize = 100;

$chunkFilter->setRows(1, $chunkSize);
$objPHPExcel = $objReader->load($xlsFile);
$objPHPExcel->setActiveSheetIndex(0);
$objSheet = $objPHPExcel->getActiveSheet();

// get total rows
$totalRows = $objSheet->getHighestRow();

 

Can you please help me on this matter?

I also checked by increasing or decreasing the chunk size, but I always got 1 total row in the case of the .xlsx file. Please note again that there is no issue with the .xls file. I also checked by setting read data only ($objReader->setReadDataOnly(true);), but there was no difference.

Awaiting your reply.

Thanks.

Coordinator
May 28, 2012 at 8:38 PM

If you're setting a read filter, the highest row returned by getHighestRow() will normally be the highest row identified by that read filter.

You can get the value of the highest row before actually reading the file by using the Reader's listWorksheetInfo() method
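For instance (a sketch assuming $xlsFile holds the file path as in the code above), listWorksheetInfo() returns the dimensions of each sheet without building the in-memory workbook:

```php
$objReader = PHPExcel_IOFactory::createReader('Excel2007');
/**  Returns one entry per worksheet, with its name and dimensions,
     without loading cell data into a PHPExcel object  **/
$worksheetData = $objReader->listWorksheetInfo($xlsFile);
foreach ($worksheetData as $sheet) {
    echo $sheet['worksheetName'], ': ',
         $sheet['totalRows'], ' rows, ',
         $sheet['totalColumns'], ' columns', '<br />';
}
```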

May 29, 2012 at 5:50 AM

Thanks, it works. But this way we need to read the file twice: first without the read filter & a second time with the read filter. Doesn't it take extra time to load the file twice? If we have a big file, it will take a long time.

Coordinator
May 29, 2012 at 6:28 AM

Why do you need to load twice? If you need to load the entire file a first time without the read filter, why reload a second time?

May 29, 2012 at 8:29 AM

Because I need to read the file with the read filter, otherwise an out of memory error may occur with large files. Thanks.

Coordinator
May 29, 2012 at 11:29 AM

But you say you're already loading the entire file once, before you reread it using the read filter... I'm clearly missing something in what you're trying to do, because I can't see why you should need to load it twice: once without the read filter ("first without read filter") and a second time with it ("second time with read filter")?

I'm kinda stupid, so please explain... If you've already loaded the file once, why reload it?

May 29, 2012 at 1:51 PM
Edited May 30, 2012 at 2:04 PM

Hi MarkBaker,

Thanks for your kind help. I understand there is no need to reload the file. Thanks.

Actually, I am facing an "out of memory" issue on writing when I process a large file. I have a file of about 4MB with 55,308 rows & 16 columns. I am processing this file on 64-bit Windows Server 2008 with 8GB RAM. I assigned a memory limit of up to 6GB for the PHP script.

Following is the flow of the program.

Step 1: Verify Header Columns & Reset Styles
In this step, the program verifies the header row & resets the background colors and borders of all cells. First the program verifies the column headings; if required column headings are missing from the sheet, the program automatically adds these new columns at the appropriate column indexes & resets the styles. Then the program saves the file.

When I process the above file of 4MB, I get the following error.

Fatal error: Out of memory (allocated 1431568384) (tried to allocate 58 bytes) in C:\WAMP\www\PHPExcel-1.7.7\Classes\PHPExcel\Worksheet.php on line 1161

Please check the following image to see the system memory status before processing this file.

http://s7.postimage.org/jbkhppqvv/Before.png

The following image was taken while the file was in process & shows the peak memory status.

http://s17.postimage.org/f5cyw7rf3/In_process.png

As you can notice, there is a lot of free memory available, but the program still generates an out of memory error.

Following is sample code.

 

$xlsFile = "test.xls";
$ext = strtolower(array_pop(explode(".", $xlsFile)));

$objLib = ($ext == "xls") ? 'Excel5' : 'Excel2007';

$objReader = PHPExcel_IOFactory::createReader($objLib);
if (!$objReader->canRead($xlsFile)) {
	return false;
}

// Load only first sheet.
$sheets = $objReader->listWorksheetNames($xlsFile);
$objReader->setLoadSheetsOnly( array($sheets[0]) );

$objPHPExcel = $objReader->load($xlsFile);
$objPHPExcel->setActiveSheetIndex(0);
$objSheet = $objPHPExcel->getActiveSheet();

$total_rows = $objSheet->getHighestRow();
$total_cols = PHPExcel_Cell::columnIndexFromString($objSheet->getHighestColumn());

// reset background color and borders of all cells
// NB: PHPExcel_Cell::columnIndexFromString() returns a 1-based index, while
// stringFromColumnIndex() expects a 0-based one, so this range extends one
// column beyond the last data column
$row_range = "A2:" . PHPExcel_Cell::stringFromColumnIndex($total_cols) . $total_rows;
$objSheet->getStyle($row_range)->getFill()->setFillType(PHPExcel_Style_Fill::FILL_NONE);
$objSheet->getStyle($row_range)->getBorders()->getAllBorders()->setBorderStyle(PHPExcel_Style_Border::BORDER_NONE);

// Call custom function to read row.
$arrRow = readRow($objSheet, 1, $total_cols);

// Style
$stl_green = array(
				'fill' => array(
							'type'	=> PHPExcel_Style_Fill::FILL_SOLID,
							'startcolor' => array('rgb' => '99FFCC'),
							'endcolor' => array('rgb' => '99FFCC'),
						),
			);

// Check missing required columns & add new cols if not exist
$arrReqCols = array("Req Col 1", "Req Col 2", "Req Col 3", "Req Col 4");
$newFlag = false;
$missingFlag = false;
foreach ($arrReqCols as $key => $value) {
	if (!in_array($value, $arrRow)){
		$objSheet->insertNewColumnBeforeByIndex(0);
		$objSheet->setCellValueByColumnAndRow(0, 1, $value);
		$objSheet->getStyle("A1:A$total_rows")->applyFromArray($stl_green);
		$objSheet->getColumnDimension(PHPExcel_Cell::stringFromColumnIndex(0))->setAutoSize(true);
		$total_cols++;
		$newFlag = true;
		$missingFlag = true;
	}
}

// Add some new cols if not exist
$arrNewCols = array("Test Col 1", "Test Col 2");
foreach ($arrNewCols as $key => $value) {
	if (!in_array($value, $arrRow)){
		$objSheet->insertNewColumnBeforeByIndex(0);
		$objSheet->setCellValueByColumnAndRow(0, 1, $value);
		$objSheet->getStyle("A1:A$total_rows")->applyFromArray($stl_green);
		$objSheet->getColumnDimension(PHPExcel_Cell::stringFromColumnIndex(0))->setAutoSize(true);
		$total_cols++;
		$newFlag = true;
	}
}	

if ($newFlag) {
	// Save File
	$objWriter = PHPExcel_IOFactory::createWriter($objPHPExcel, $objLib);		
	$objSheet->unfreezePane();
	$objWriter->save($xlsFile);
}

if($missingFlag){
	//let the user download the file
	echo "<span class='error'>Please download <a href='{$xlsFile}' target='_blank'>this file</a> with missing columns added to it.</span>";
	
}else{
	// Perform Step 2
}


Can you please let me know how I can process this file with the above code?

Step 2: Reading Data.
In this step, the program reloads the saved file & reads all the data & stores it in the database. The memory issue is resolved here by using the above technique.

 

Step 3: Export Sheet
In this step, the program performs some calculations & updates data in the database. When exporting this calculated data to a new sheet, the program takes a long time on writing & at the end an out of memory error is generated.

Please let me know how to resolve this issue. Is there any way to write the data into multiple files & merge these files into one file at the end?

Awaiting your response.

Many Thanks.

Coordinator
May 29, 2012 at 7:57 PM

>> I assigned memory limit up to 6GB for PHP script.
>> Fatal error: Out of memory (allocated 1431568384) (tried to allocate 58 bytes) in C:\WAMP\www\PHPExcel-1.7.7\Classes\PHPExcel\Worksheet.php on line 1161

1,431,568,384 doesn't equate to a limit of 6GB PHP memory limit, it's about 1.33GB - double check how much memory PHP is really set to use.

If you're running 32-bit PHP, you have an upper limit of 4GB, irrespective of the 64-bit Operating System. My "rule of thumb" is about 1k/cell on 32-bit PHP, 1.6k/cell on 64-bit PHP.
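As a quick sanity check of those rule-of-thumb figures against the 55,308-row, 16-column file mentioned above:

```php
// Rough estimate using the ~1.6k/cell rule of thumb for 64-bit PHP
$cells        = 55308 * 16;     // 884,928 cells
$bytesPerCell = 1.6 * 1024;
printf("%.2f GB\n", ($cells * $bytesPerCell) / (1024 * 1024 * 1024));
// roughly 1.35 GB to hold the workbook; about double that when writing
```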

Rather than saving the file and then reloading it to do the additional processing, simply loop through the worksheet a second time doing your database update, and only save the file and present it to the user at the very end of the script, once all processing has finished.

Consider using cell caching to save memory.

 

May 30, 2012 at 2:03 PM

1. Yes, I have rechecked; the memory limit is 6GB. I do not know why it stops at 1.33GB. Is it restricted in the PHPExcel library?

2. If it's 1.6k/cell, then this file should be processed within 1.36GB.

3. Actually, after storing the data in the database, there are a lot of operations performed on it & a lot of reports & analyses generated from it. The user also has the ability to perform many operations. So, due to the large number of operations, I needed to store it in a database. The program also derives a lot of other data from this data, so at the export step the program exports this data in different ways depending on the user's selection. The 55,308 rows may increase up to 3 times. That's why the data is stored in the database & the program creates a new Excel file on export. The file is exported with formatting; every cell or every row may contain different styles.
All this functionality works fine with a small file. The only issue occurs with a large file on writing.

4. I used cell caching & I got following error.

Fatal error: Call to a member function getCellCacheController() on a non-object in C:\WAMP\www\PHPExcel-1.7.7\Classes\PHPExcel\Cell.php on line 108

This error occurred when we reset the background color and borders of all cells. If I commented out that code, it worked fine.

I also noticed that resetting the background color & borders takes more resources. When I commented out that code & enabled cell caching, I was able to process the file, but it took a long time. But the reset background color & borders functionality is also necessary, so I need to process the file with it. Please let me know more about this. Thanks.

5. Is there any way we write data into multiple files & at the end merge these files into one file?

Awaiting your response.

Thanks,

Coordinator
May 30, 2012 at 8:18 PM
sohailisonline wrote:

1. Yes, I have rechecked; the memory limit is 6GB. I do not know why it stops at 1.33GB. Is it restricted in the PHPExcel library?


PHPExcel does nothing to restrict the size of files that it can process, other than applying the column and row restrictions of Excel itself within the writers. The only memory and execution time limitations are those imposed by PHP itself.

And I'm still confused by the figures in your error message.

 

sohailisonline wrote:

2. If it's 1.6k/cell, then this file should be processed within 1.36GB.

 

This is a rule of thumb for maintaining the workbook in memory. Using a writer almost doubles that, but it's still well within your 6GB.

 

 

4. I used cell caching & got the following error.

Fatal error: Call to a member function getCellCacheController() on a non-object in C:\WAMP\www\PHPExcel-1.7.7\Classes\PHPExcel\Cell.php on line 108

This error occurred when we reset the background color and borders of all cells. If I commented out that code, it worked fine.

I also noticed that resetting the background color & borders takes more resources. When I commented out that code & enabled cell caching, I was able to process the file, but it took a long time. But the reset background color & borders functionality is also necessary, so I need to process the file with it. Please let me know more about this. Thanks.

Styling does take more resources.

Can you provide more details of the cell caching you're using? If there's a problem with one or other of the caching mechanisms, it would be useful to know which one; otherwise I have to check 3,000 lines of code for potential errors rather than a couple of hundred.

 

sohailisonline wrote:

5. Is there any way we write data into multiple files & at the end merge these files into one file?

 You can create multiple files, but you'd still need to merge them, and that would use all the memory required for the fully merged PHPExcel object, as well as for (at least) the last file you loaded, so it would take even more memory.

 

May 31, 2012 at 5:47 AM
Edited May 31, 2012 at 5:55 AM

I just added the following cell caching code before the above sample code.

// Cell caching to reduce memory usage.
$cacheMethod = PHPExcel_CachedObjectStorageFactory::cache_to_phpTemp;
$cacheSettings = array('memoryCacheSize' => '8MB');
PHPExcel_Settings::setCacheStorageMethod($cacheMethod, $cacheSettings);

If I comment out the following code, it works fine; otherwise it generates a fatal error (Call to a member function getCellCacheController() on a non-object).

// reset background color and borders of all cells
$row_range = "A2:" . PHPExcel_Cell::stringFromColumnIndex($total_cols) . $total_rows;
$objSheet->getStyle($row_range)->getFill()->setFillType(PHPExcel_Style_Fill::FILL_NONE);
$objSheet->getStyle($row_range)->getBorders()->getAllBorders()->setBorderStyle(PHPExcel_Style_Border::BORDER_NONE);

Many Thanks.

Jun 2, 2012 at 5:53 AM

Hi MarkBaker,

Were you able to find a solution to the above issue?

Awaiting your response.

Thanks.

Jun 2, 2012 at 10:24 AM
Edited Jun 2, 2012 at 12:06 PM

Hi MarkBaker,

I just found that the above issue occurs when we include the last row in the range used to reset the background color & borders. For example, if we have a total of 50 rows in the sheet & we provide a range like "A1:AB50", then the above cache error occurs. If we provide a range one row less, like "A1:AB49", then it works fine.

So I have edited above code as following & issue resolved.

for($i = 0; $i < 1; $i++){ // NB: with $i < 1 this loop runs only once, so the else branch (styling the last row) never executes
	if($i == 0){
		$row_range = "A2:" . PHPExcel_Cell::stringFromColumnIndex($total_cols) . ($total_rows - 1);
	}else{
		$row_range = "A".$total_rows.":" . PHPExcel_Cell::stringFromColumnIndex($total_cols) . $total_rows;
	}
	$objSheet->getStyle($row_range)->getFill()->setFillType(PHPExcel_Style_Fill::FILL_NONE);
	$objSheet->getStyle($row_range)->getBorders()->getAllBorders()->setBorderStyle(PHPExcel_Style_Border::BORDER_NONE);
}

I think this is a bug & it should be fixed in the library.

Thanks,

Jun 23, 2012 at 11:34 AM

Hi MarkBaker,

I am facing an issue with a .xlsx file. I have another large .xlsx file of 13MB, with 50,000 rows & 60 columns. The memory limit is 7GB. When I process this file with the above code, I get the following warning message. After this, the program is unable to read cell values; it always returns an empty cell value.

Warning: simplexml_load_string(): Memory allocation failed : growing buffer in PHPExcel-1.7.7\Classes\PHPExcel\Reader\Excel2007.php on line 760

Then I converted this file to .xls format & processed it again, and it worked fine.

Can you please let me know what the issue is with the .xlsx format when there was a lot of free memory available?

Thanks,

Coordinator
Jun 23, 2012 at 6:09 PM

The problem with the SimpleXML parser used by PHPExcel's readers (for the XML-based formats such as .xlsx files) is that it loads the entire file into memory for parsing, though I'm surprised that 7GB of memory can't handle the file. I assume you're using a 64-bit version of PHP (otherwise you're limited to 4GB), which is about 60% more greedy in memory usage than 32-bit PHP.

The roadmap for this year includes plans to switch to using a pull parser (XMLReader) which doesn't load the entire file into memory, only those elements that are explicitly requested, as and when they are needed, and the elements are removed from memory when no longer being accessed.

Coordinator
Jun 29, 2012 at 9:05 PM
Edited Jun 30, 2012 at 9:28 AM

I've been doing some experimentation with SimpleXML and XMLReader, including testing different methods of extracting XML directly from zip files to pass to the XML parser.

Testing the following techniques:

getFromName() read string to simpleXML

(This is the method currently used)

$z1 = new ZipArchive;
if ($z1->open($testWorkBook)) {
    $xmlSheet = simplexml_load_string($z1->getFromName($testFileStream));
}

getStream() and stream_get_contents() read string to simpleXML

$z2 = new ZipArchive();
if ($z2->open($testWorkBook)) {
    $fp = $z2->getStream($testFileStream);
    if(!$fp) {
        echo "failed to retrieve test stream from zip file<br />";
    } else {
        $xmlSheet = simplexml_load_string(stream_get_contents($fp));
        fclose($fp);
    }
}

getStream() and fread() read string to simpleXML

$z3 = new ZipArchive();
if ($z3->open($testWorkBook)) {
    $fp = $z3->getStream($testFileStream);
    if(!$fp) {
        die("failed to retrieve test stream from zip file");
    } else {
        $contents = '';
        while (!feof($fp)) {
            $contents .= fread($fp,8192);
        }
        fclose($fp);
        $xmlSheet = simplexml_load_string($contents);
        unset($contents);
    }
} else {
    die('Unable to open zipped file');
}

getFromName() read string to XMLReader

$z4 = new ZipArchive;
$x4 = new XMLReader();
if ($z4->open($testWorkBook)) {
    $result = $x4->xml($z4->getFromName($testFileStream));
} else {
    die('Unable to open zipped file');
}

getStream() and stream_get_contents() read string to XMLReader

$z5 = new ZipArchive();
$x5 = new XMLReader();
if ($z5->open($testWorkBook) === true) {
    $fp = $z5->getStream($testFileStream);
    if (!$fp) {
        die("failed to retrieve test stream from zip file");
    } else {
        $result = $x5->xml(stream_get_contents($fp));
        fclose($fp);
    }
} else {
    die('Unable to open zipped file');
}

getStream() and fread() read string to XMLReader

$z6 = new ZipArchive();
$contents = '';
$x6 = new XMLReader();
if ($z6->open($testWorkBook) === true) {
    $fp = $z6->getStream($testFileStream);
    if (!$fp) {
        die("failed to retrieve test stream from zip file");
    } else {
        while (!feof($fp)) {
            $contents .= fread($fp, 8192);
        }
        fclose($fp);
        $result = $x6->xml($contents);
    }
    unset($contents);
} else {
    die('Unable to open zipped file');
}

open() zip stream directly using XMLReader

$x7 = new XMLReader();
if (!($result = $x7->open('zip://'.dirname(__FILE__).'/'.$testWorkBook.'#'.$testFileStream))) {
    die('Unable to open zipped stream');
}

Then I have two different parsing methods: one using SimpleXML as at present, the second using XMLReader. Working with a simple worksheet of 4,096 rows by 16 columns, I get the following results for the reader techniques:

 

| Technique                                              | Extract & Load Time (s) | Memory after Extract & Load (bytes) | Memory after Parse (bytes) | Peak Memory Usage (bytes) | Total Time (s) |
|--------------------------------------------------------|-------------------------|-------------------------------------|----------------------------|---------------------------|----------------|
| getFromName() read string to SimpleXML                 | 0.5349                  | 524,288                             | 1,048,576                  | 4,980,736                 | 0.8330         |
| getStream() + stream_get_contents() to SimpleXML       | 0.5874                  | 524,288                             | 1,048,576                  | 4,980,736                 | 0.8855         |
| getStream() + fread() to SimpleXML                     | 0.5888                  | 524,288                             | 1,048,576                  | 4,980,736                 | 0.8869         |
| getFromName() read string to XMLReader                 | 0.0348                  | 524,288                             | 524,288                    | 4,980,736                 | 1.3534         |
| getStream() + stream_get_contents() to XMLReader       | 0.0711                  | 524,288                             | 524,288                    | 4,980,736                 | 1.3896         |
| getStream() + fread() to XMLReader                     | 0.0824                  | 524,288                             | 524,288                    | 4,980,736                 | 1.4009         |
| open() zip stream directly using XMLReader             | 0.0009                  | 524,288                             | 524,288                    | 524,288                   | 1.3194         |

Clearly, streaming the XML directly from the zip file to XMLReader is incredibly fast, and eliminates all the memory overheads of the extraction and of SimpleXML. Parsing using XMLReader is slower than SimpleXML, but the memory savings are well worth the performance cost. It's the old trade-off... although I'd expected XMLReader to be slower than SimpleXML, I hadn't expected it to be quite as slow as that.

I'm looking at a possible hybrid method using XMLReader to locate the nodes and then SimpleXML to parse them, which may or may not improve the XMLReader method speed... initial experiments suggest that it's even slower than straight XMLReader.
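The hybrid method mentioned above can be sketched roughly as follows. This is an illustrative sketch rather than PHPExcel code, reusing the $testWorkBook / $testFileStream variables from the benchmarks above and assuming sheet XML built from `<row>` and `<c>` elements:

```php
<?php
// Hybrid sketch: XMLReader locates each <row> node while streaming, then
// expand() converts just that node to DOM and SimpleXML parses it. Only
// the current <row> subtree is ever held in memory.
$reader = new XMLReader();
$reader->open('zip://' . dirname(__FILE__) . '/' . $testWorkBook . '#' . $testFileStream);

while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'row') {
        $row = simplexml_import_dom($reader->expand());
        foreach ($row->c as $cell) {
            // ... process each cell via the familiar SimpleXML API ...
        }
    }
}
$reader->close();
```

The appeal of this shape is that the node-level code keeps the convenient SimpleXML API, while the document as a whole is never fully loaded; the per-node expand()/import overhead is presumably where the extra time goes.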

The real drawback is that it adds more complexity to the code, and converting the existing code to use these methods is a significant effort. It isn't something that can be done for the next release, but I will target at least the Excel2007 (.xlsx) Reader for the 1.7.9 release, with the other zipped-XML Readers before the end of the year. Rather than simply replacing the existing Readers, I'll try to make it configurable, offering a choice of SimpleXML or XMLReader.


Jul 17, 2012 at 2:49 AM

Mark,

PHPExcel plays an important role in an open source framework I'm working on. (thedatatank.com, https://github.com/iRail/The-DataTank)

A streaming feature would be of great use for us.

I don't know the xlsx format internals, but would it be possible to get a list of the available sheets and named ranges before you start reading, and then define a sheet, named range or cell range and start stream-reading the xlsx file based on that information?

Would this also be possible on the old xls format?

Currently we only use PHPExcel to read Excel files, but we need to include the full library in our project. Would it be possible to split off the Reader part as a separate library or download?

Thanks,

Lieven Janssen

Coordinator
Jul 17, 2012 at 6:47 AM

It is already possible to get the list of worksheets without loading the entire file, using the Readers' listWorksheetNames() or listWorksheetInfo() methods. listWorksheetNames() returns just an array of the actual worksheet names, while listWorksheetInfo() also returns the size of each worksheet. It is also possible to load a defined sheet (or sheets) rather than all worksheets using the Reader's setLoadSheetsOnly() method, and to read a range of cells (rather than all cells) using a Read Filter. These options are already available for all Readers (xlsx, xls, ods, csv, gnumeric, etc.).
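Combining those options looks roughly like this; the filename, sheet name, and cell range here are placeholders, not part of any real workbook:

```php
<?php
require_once 'Classes/PHPExcel/IOFactory.php';

/** A Read Filter restricting the load to cells A1:E100 */
class RangeReadFilter implements PHPExcel_Reader_IReadFilter
{
    public function readCell($column, $row, $worksheetName = '')
    {
        // Only cells in columns A-E of the first 100 rows will be loaded
        return ($row <= 100) && in_array($column, range('A', 'E'));
    }
}

$reader = PHPExcel_IOFactory::createReader('Excel2007');

// Inspect the workbook without loading it fully
var_dump($reader->listWorksheetNames('./bigBook.xlsx'));

// Load only one worksheet, and only the filtered cell range
$reader->setLoadSheetsOnly('Data Sheet #1');
$reader->setReadFilter(new RangeReadFilter());
$objPHPExcel = $reader->load('./bigBook.xlsx');
```

Because the Read Filter is consulted cell by cell during the load, cells outside the range never enter the in-memory workbook at all.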


Jul 17, 2012 at 7:07 AM

Woow thanks for the prompt reply!

Is it possible to get the named ranges as well without loading the entire file?

Is the file-streaming functionality still planned for version 1.7.9?

Coordinator
Jul 17, 2012 at 5:02 PM

The listWorksheetNames() and listWorksheetInfo() methods don't return anything more than I've indicated; and while named ranges would be possible, it's probably not a requirement that a majority of users would consider necessary (especially as it would slow down those methods). The main benefit would be that you could configure a read filter based on the value of the named range, so I can add it to the TODO list, but it won't be a high priority.

Streaming is still likely to be version 1.7.9 (probably somewhere around November/December). Because it's a trade-off between speed and memory, I need to work out how to make it configurable, so that people who want speed and aren't too worried about memory can still use the SimpleXML Readers, while those who need the memory savings and are less worried about speed can use the XMLReader-based Readers... without giving myself too much duplication to maintain between the two versions of the Readers.

I'm also starting work on some changes to the cell definition to handle array formulae, which entails deprecating some cell methods, so that will be staged across several releases (culminating in a complete rewrite of the calculation engine). That may be the one thing that delays the XMLReader alternative to SimpleXML, especially as I'll need to modify every single Reader/Writer that supports array formulae, and I want to get that change complete before I branch the XML-based Readers into SimpleXML/XMLReader variants. If my timing is right, both changes will be in 1.7.9, but it might push November into December.

Coordinator
Jan 14, 2013 at 9:11 PM
Edited Jan 14, 2013 at 9:21 PM

Some initial details of using XMLReader with file streaming based on its implementation for the listWorksheetNames() and listWorksheetInfo() methods.

Version 1.7.8:

| Format                 | listWorksheetNames() Time (s) | Memory after call (MB) | Peak memory (MB) | listWorksheetInfo() Time (s) | Memory after call (MB) | Peak memory (MB) |
|------------------------|-------------------------------|------------------------|------------------|------------------------------|------------------------|------------------|
| Excel 2007 .xlsx       | 0.0114                        | 1.25                   | 1.25             | 4.1998                       | 2.50                   | 8.75             |
| Open/Libre Office .ods | 0.1404                        | 0.50                   | 5.75             | 0.5296                       | 0.50                   | 5.75             |
| Gnumeric               | 3.4176                        | 1.00                   | 38.75            | 5.2841                       | 1.00                   | 38.75            |

Current development code:

| Format                 | listWorksheetNames() Time (s) | Memory after call (MB) | Peak memory (MB) | listWorksheetInfo() Time (s) | Memory after call (MB) | Peak memory (MB) |
|------------------------|-------------------------------|------------------------|------------------|------------------------------|------------------------|------------------|
| Excel 2007 .xlsx       | 0.0113                        | 1.25                   | 1.25             | 3.2238                       | 1.25                   | 1.25             |
| Open/Libre Office .ods | 0.1194                        | 0.50                   | 0.50             | 0.2543                       | 0.50                   | 0.50             |
| Gnumeric               | 0.0077                        | 0.75                   | 0.75             | 3.1676                       | 0.75                   | 0.75             |

Testing was done against a relatively small spreadsheet, comprising 2 worksheets, each with 16370 rows by 9 columns.

While the main load() code won't be as performant as these methods, I hope that the peak memory savings from avoiding loading the file itself into memory will be every bit as good.

I'm still hoping that I can get at least the Excel2007 Reader converted to working with this method in time for the 1.7.9 release.

Jan 15, 2013 at 7:48 AM

Really cool! Thanks!