Chunk Reader Problem

Apr 25, 2012 at 10:35 AM
Edited Apr 25, 2012 at 10:52 AM

I have been attempting to use the well-documented technique for reading a spreadsheet in chunks and I am having a hard time of it.  Clearly, I am missing something important in how to successfully implement this technique.  Basically, what I want to do is:

  1. Open the spreadsheet and load a chunk
  2. Read the chunk
  3. Process each row in the chunk by auditing specific cells.  As required, apply conditionals and insert notes
  4. At the end of each chunk:
    • Save the spreadsheet file,
    • disconnect the worksheets,
    • unset $objPHPExcel after processing all the rows in the chunk
  5. Go back to 1 and repeat until done.

Here is the code, starting with the well-documented for loop:

 

for ($startRow = 2; $startRow <= $highestRow; $startRow += $chunkSize) {
    // SET CHUNK READER
    $chunkFilter->setRows($startRow, $chunkSize);         /* Tell the Read Filter the limits on which rows we want to read this iteration */
    $objPHPExcel = $objReader->load($theDIRandFileName);  /* Load only the rows that match the chunk filter */
    echo '<<<<<<<<<<<<<<<<<<<reading chunk starting at row '.$startRow.'>>>>>>>>>>>>>>>>>>>>>>>>';

    // For a specific $chunkSize of rows, initially starting in row 2, read each of the returned rows in the chunk and audit
    foreach ($objPHPExcel->getActiveSheet()->getRowIterator() as $aRow) { // I assume this is bounded by the chunkSize
        $row = $aRow->getRowIndex();

        /* Audit the Business Name column */
        $getThis  = 'B'.$row;
        $thiscell = $objPHPExcel->getActiveSheet()->getCell($getThis)->getValue();
        if ($thiscell == '' || $thiscell == ' ') {
            $errCount += 1;
            $rowFlag[$row] = false;
            $objPHPExcel->getActiveSheet()->getComment($getThis)->setAuthor($noteAuthor);
            $objPHPExcel->getActiveSheet()->getComment($getThis)->setWidth('200');
            $objPHPExcel->getActiveSheet()->getComment($getThis)->setHeight('150');
            $objCommentRichText = $objPHPExcel->getActiveSheet()->getComment($getThis)->getText()->createTextRun($noteAuthor);
            $objCommentRichText->getFont()->setBold(true);
            $objPHPExcel->getActiveSheet()->getComment($getThis)->getText()->createTextRun("\r\n");
            $objPHPExcel->getActiveSheet()->getComment($getThis)->getText()->createTextRun('Business Name cannot be blank.  Make sure the Quickbooks Business Name field on the Customer Additional Info tab has been selected.');
            // Now apply the corresponding conditional to format the cell.
            $conditionalStyles = $objPHPExcel->getActiveSheet()->getStyle($getThis)->getConditionalStyles();
            array_push($conditionalStyles, $objConditional5);
            $objPHPExcel->getActiveSheet()->getStyle($getThis)->setConditionalStyles($conditionalStyles);
        } // end of audit Business Name

        /* lots more columns here.  They are all generally like the "B" column, above. */

        echo 'inside foreach. chunkSize=['.$chunkSize.'] Row=['.$row.'] startrow=['.$startRow.'] and Business Name=['.$thiscell.']<br>';

    } // End of foreach loop for this chunk of rows
    echo 'End of Chunk. <br>'; // <<<<<<<<<<<<<<<<<<<< This echo never fires.  Never gets out of the foreach loop.
    $objWriter->save(str_replace('.php', '.xls', $theDIRandFileName)); // write as a simple Excel file, not latest version
    $objPHPExcel->disconnectWorksheets();
    unset($objPHPExcel);
} // End of Chunk Loop.
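
In case it matters, $chunkFilter is (as far as I can tell) the standard read filter from the documented chunking examples, so it looks roughly like this and gets attached to the reader before the loop:

class chunkReadFilter implements PHPExcel_Reader_IReadFilter
{
    private $_startRow = 0;
    private $_endRow   = 0;

    /* Set the window of rows we want to read on this iteration */
    public function setRows($startRow, $chunkSize) {
        $this->_startRow = $startRow;
        $this->_endRow   = $startRow + $chunkSize;
    }

    /* Only the heading row and the rows inside the current window are loaded */
    public function readCell($column, $row, $worksheetName = '') {
        return ($row == 1) || ($row >= $this->_startRow && $row < $this->_endRow);
    }
}

$chunkFilter = new chunkReadFilter();
$objReader->setReadFilter($chunkFilter);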

My initial problem with the above is that the foreach loop is not bounded by the chunk size and goes forth and processes all the rows in the spreadsheet until it runs out of memory (evidence: the 'End of Chunk' never displays).  It's like the chunk size makes no difference to the processing of the foreach loop.

I have looked at this until my eyes went cross-eyed.  Can't see it.  What am I doing wrong?

Apr 27, 2012 at 7:01 PM
Edited Apr 27, 2012 at 7:03 PM

Here's a sample of how I do this:

Notice that I break out of the loop when the $RowId is greater than the $ChunkSize.

I think this is the solution you are looking for.

 

for ($StartRow = 1; $StartRow <= $WorksheetInfo['totalRows']; $StartRow += $ChunkSize) {

  $ReadFilter->setStartRow($StartRow);

  $SpreadsheetObj = $SpreadsheetReaderObj->load($FileName);
  $SpreadsheetObj->setActiveSheetIndex($WorksheetIndex);

  $WorksheetObj = $SpreadsheetObj->getActiveSheet();

  foreach ($WorksheetObj->getRowIterator() as $RowObj) {

    $RowId = $RowObj->getRowIndex();

    if ($RowId > $ChunkSize) {
      break;
    }

    $SheetRowId = $StartRow + $RowId - 1;

    if ($SheetRowId > $WorksheetInfo['totalRows']) {
      break;
    }

    $RecordId = "{$WorksheetId}-{$SheetRowId}";

    $this->data["{$RecordId}"] = array(); // array() represents column data.
  }
}
Apr 28, 2012 at 2:14 PM
Edited Apr 30, 2012 at 9:37 AM

Thanks for the code!

But my ultimate problem was (is) memory.  (bleh) This is what drove me to using the technique that Mark Baker has documented in a bazillion places, like this one: http://phpexcel.codeplex.com/discussions/242712.  Read a chunk, then process that chunk, then free memory and go back to read more chunks...

In my quest for a solution I tried something like what you documented but had various problems.  I actually have the chunk reader working now (code below), but now have a STINKY resultant problem:  dealing with interim saves of the initially read file:  double-blehhhh. 

The code below loops just fine and reads all the rows in the spreadsheet (evidence: the echo statements display what is expected).

My problem is (a) I need to save the applied conditionals and cell notes to the cells, and (b) it needs to be saved onto the server so that when the user downloads the spreadsheet they have oodles of information that they can use to update the source data.

My solution for the moment is, after each chunk, to unset($objPHPExcel) and then immediately execute--

    $objPHPExcel   = new PHPExcel();
    $inputFileType = PHPExcel_IOFactory::identify($theDIRandFileName);
    $objReader     = PHPExcel_IOFactory::createReader($inputFileType);

to reestablish the read environment.  Seems to do that OK, but here is the deal:

Once completely done, and all the annotated spreadsheet cells with colors and notes have been written and saved to the server directory and filename (the cleverly named variable '$theDIRandFileName' in the code below), the saved file is a completely blank spreadsheet.  Ohhh POOH.

By the way: the annotations based upon conditionals and the application of cell notes work fine in the non-chunk-but-runs-out-of-memory version.

Addendum 1 (30 April 2012): I now seem to be in a catch-22.

My ultimate goal is to read the spreadsheet, audit it, and then write the audited spreadsheet with colored cells indicating classes of problems and inserted cell notes explaining what the problem(s) are.  That part is working fine.  It's just the memory problem.

  • If I read in chunks, then before moving on to the next chunk of cells I need to save the revised cell contents.  But I can't do this because the chunk filter only makes available the rows specified for the chunk.
  • If I just do a start-to-finish read/audit/apply-conditionals pass over the spreadsheet rows in one for loop, I run out of memory before I get to the last row.
  • If I use discISAM caching, it appears to work, memory-wise, and I can read/audit/apply conditionals in one for loop (no chunks), BUT when I go to save the spreadsheet with all this neat stuff I get the error below (the caching call I use is shown after the write snippet):

Fatal error: Allowed memory size of 18874368 bytes exhausted (tried to allocate 238592 bytes) in ... /functions/utilities/PHPExcel/Shared/XMLWriter.php on line 100

Basic and standard write at the end of the for loop:

$objWriter = PHPExcel_IOFactory::createWriter($objPHPExcel, 'Excel2007');
$objWriter->save(str_replace('.php', '.xlsx', $theDIRandFileName));  
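
For what it's worth, the discISAM caching mentioned above is switched on before anything is loaded, pretty much straight from the cell-caching examples:

// Enable discISAM cell caching before the reader/workbook is created
$cacheMethod = PHPExcel_CachedObjectStorageFactory::cache_to_discISAM;
if (!PHPExcel_Settings::setCacheStorageMethod($cacheMethod)) {
    die($cacheMethod . ' caching method is not available' . PHP_EOL);
}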

Apr 30, 2012 at 6:27 PM
Edited Apr 30, 2012 at 6:28 PM

Well I can't help you with the writer portion of the issue. Fortunately my projects only have to deal with reading in the data of various spreadsheet files.

However, concerning the memory issue, I would read all the data into a database table (a temporary one, since you would need to create the structure based on the spreadsheet), then close down the reader and read the data back into the writer. Not sure if that would help all that much, but that's all I can offer at this point. A rough sketch of what I mean is below.
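
Something along these lines is what I have in mind; it is only a sketch, and the PDO connection, table name, and columns are just placeholders:

// Sketch only: stage the audited cell data in a throwaway table, release the
// reader, then rebuild a fresh workbook for the writer from the staged rows.
$db = new PDO('mysql:host=localhost;dbname=scratch', 'user', 'pass'); // placeholder credentials
$db->exec('CREATE TEMPORARY TABLE audit_rows (row_id INT, col CHAR(3), val TEXT, note TEXT)');
$insert = $db->prepare('INSERT INTO audit_rows (row_id, col, val, note) VALUES (?, ?, ?, ?)');

// 1) While reading (chunked or not), push each audited cell and any note into the table, e.g.:
//    $insert->execute(array($row, 'B', $thiscell, 'Business Name cannot be blank.'));

// 2) Shut the reader down completely to release its memory:
//    $objPHPExcel->disconnectWorksheets();
//    unset($objPHPExcel, $objReader);

// 3) Build a fresh workbook for the writer from the staged data.
$out   = new PHPExcel();
$sheet = $out->getActiveSheet();
foreach ($db->query('SELECT row_id, col, val, note FROM audit_rows ORDER BY row_id') as $r) {
    $sheet->setCellValue($r['col'].$r['row_id'], $r['val']);
    if ($r['note'] != '') {
        $sheet->getComment($r['col'].$r['row_id'])->getText()->createTextRun($r['note']);
    }
}
$objWriter = PHPExcel_IOFactory::createWriter($out, 'Excel2007');
$objWriter->save($theDIRandFileName);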

- Christopher Mullins