Saving Memory Reading File

Topics: User Forum
Mar 25, 2012 at 4:39 AM

I have a 25M xls file (22000 rows & 44 columns) that I am trying to convert into csv.  Just doing a simple read of the file is using in excess of 492M of memory and my server automatically kills the script.

First I attempted using the "chunk" method and I do not know if I set it up correctly or not but it seemed pretty useless.  I went as low as 25 rows for the chunk and the script was still being killed.

So since I only need 6 columns from the file, I thought I would try the code for specific rows and columns.  The only problem with this is the 6 columns I need are C, D, I, J, K, T.  As you can see, the columns are not consecutive.  Is it possible to retrieve columns that are not consecutive?

If I get this far, I am still going to need to save memory somewhere because if I just try the range('I','K'), it is still resource intensive and killing the script.

If someone could help with the various columns and get the chunk method working correctly, I would greatly appreciate it.

	require_once '/Classes/PHPExcel.php';
	
	fn_echo('Create new PHPExcel object..<br />');
	$objPHPExcel = new PHPExcel();

	fn_echo('Load temp.xls file..<br />');
	$inputFileName = '/temp.xls';
	$inputFileType = PHPExcel_IOFactory::identify($inputFileName);
	$sheetname = 'gsprods';
	
	class MyReadFilter implements PHPExcel_Reader_IReadFilter
	{
		private $_startRow = 0;
		private $_endRow   = 0;
		private $_columns  = array();
		
		public function __construct($startRow, $endRow, $columns) {
			$this->_startRow = $startRow;
			$this->_endRow   = $endRow;
			$this->_columns  = $columns;
		}
		
		public function readCell($column, $row, $worksheetName = '') {
			if ($row >= $this->_startRow && $row <= $this->_endRow) {
				if (in_array($column,$this->_columns)) {
					return true;
				}
			}
			return false;
		}
	}
		
	$filterSubset = new MyReadFilter(1,22000,range('I','K'));
	$objReader = PHPExcel_IOFactory::createReader($inputFileType);
	$objReader->setReadFilter($filterSubset);
	$objReader->setReadDataOnly(true);
	$objPHPExcel = $objReader->load($inputFileName);

	fn_echo('Write to CSV format..<br />');
	$objWriter = PHPExcel_IOFactory::createWriter($objPHPExcel, 'CSV');
	$objWriter->save(str_replace('.php', '.csv', DIR_COMPILED . 'gsprods.csv'));

Mar 26, 2012 at 3:58 PM
Edited Mar 27, 2012 at 11:23 PM

Here is how I implemented the ReadFilter so that I can pass it an array of just the Column Letters I want read. Not sure if this applicable to what you want though. The following will read just the columns specified and will read 2000 rows at a time until row 22000 is reached. The full code example is here but it involves some customized code of the readers. My full code example also has a routine for determining the chunk size based on the systems available memory which can be tweaked as needed.

$ReadFilter = new ReadFilter();
$SpreadsheetReaderObj->setReadFilter($ReadFilter);
$ReadFilter->setColumns(array('C', 'D', 'I', 'J', 'K', 'T'));
$ReadFilter->setEndRow(22000);
$ReadFilter->setChunkSize(2000);

//
// This class overrides the default reader in order to ignore extraneous data.
//
class ReadFilter implements PHPExcel_Reader_IReadFilter
{
  private $_worksheetName;
  private $_startRow;
  private $_endRow;
  private $_columns;
  private $_isChunkRead;
  private $_chunkStartRow;
  private $_chunkSize;

  public function __construct() {
    $this->_worksheetName = 'Worksheet';
    $this->_startRow      = 0;
    $this->_endRow        = 65530;
    $this->_columns       = $this->getColumnRange('A','XFD');
    $this->_isChunkRead   = false;
    $this->_chunkStartRow = 1;
    $this->_chunkSize     = 1000; // Number of rows read in each chunk
  }

  public function setWorksheetName($worksheetName) {
    $this->_worksheetName = (!empty($worksheetName)) ? $worksheetName : 'Worksheet';
  }

  public function setStartRow($startRow) {
    if(($startRow >= 0) && ($startRow <= $this->_endRow)) {
      $this->_startRow = $startRow;
    }
  }

  public function setEndRow($endRow) {
    if($endRow >= $this->_startRow)
    {
      $this->_endRow = $endRow;
    }
  }

  public function setColumns($columns) {
    $this->_columns = (is_array($columns)) ? $columns  : $this->getColumnRange('A','XFD');
  }

  public function setChunkRead() {
    $this->_isChunkRead = true;
  }

  public function setChunkSize($chunkSize = null) {
    if(!is_null($chunkSize)) {
      $this->_chunkSize = $chunkSize;
    }

    $this->_isChunkRead = true;
  }

  public function setChunkStartRow($chunkStartRow = null) {
    if(!is_null($chunkStartRow)) {
      $this->_chunkStartRow = $chunkStartRow;
    }
  }

  public function unsetChunkRead() {
    $this->_isChunkRead = false;
  }

  public function readCell($column, $row, $worksheetName = '') {
    if((!empty($worksheetName)) && ($worksheetName != $this->_worksheetName)) {
      return false;
    }

    if($this->_isChunkRead) {
      //  Read rows in chunks and only the columns specified
      if(($row >= $this->_chunkStartRow) && ($row < ($this->_chunkStartRow + $this->_chunkSize)) && ($row <= $this->_endRow)) {
        if(in_array($column, $this->_columns)) {
          return true;
        }
      }
    } else {
      //  Read all rows but only the columns specified
      if($row >= $this->_startRow && $row <= $this->_endRow) {
        if(in_array($column, $this->_columns)) {
          return true;
        }
      }
    }

    return false;
  }

  //
  // Returns array of column letters between ColumnLetter1 and ColumnLetter2 inclusively
  //
  public function getColumnRange($ColumnLetter1, $ColumnLetter2)
  {
    static $ColumnsReference = '';

    if (empty($ColumnsReference)) {
      $ColumnsReference = $this->initializeColumnsReference(3);
    }

    $ColumnRange = array();

    $Length1 = strlen($ColumnLetter1);
    $Length2 = strlen($ColumnLetter2);

    $Pos1 = strpos($ColumnsReference, "|{$ColumnLetter1}|");
    $Pos2 = strpos($ColumnsReference, "|{$ColumnLetter2}|");

    if ($Pos2 >= $Pos1) {
      $StartPos = $Pos1 + 1;
      $EndPos = $Pos2 + $Length2;

      $Length = $EndPos - $StartPos + 1;

      $RangeStr = substr($ColumnsReference, $StartPos, $Length);

      $ColumnRange = explode('|', $RangeStr);
    }

    return $ColumnRange;
  }

  //
  // Generate string that contains column letters for future reference.
  // initializeColumnsReference(1) = "|A|B|C|...|X|Y|Z|"
  // initializeColumnsReference(2) = "|A|B|C|...|ZX|ZY|ZZ|"
  // initializeColumnsReference(3) = "|A|B|C|...|ZZX|ZZY|ZZZ|"
  //
  private function initializeColumnsReference($Width = 1)
  {
    $ColumnsStr = '';
    $ColumnLetter = '';
    $LastColumnLetter = str_repeat('Z', $Width);
    $Tier = '';

    do {
      for ($ChrVal = 65; $ChrVal <= 90; $ChrVal++) {
        $ColumnLetter = $Tier . chr($ChrVal);
        $ColumnsStr .= "|{$ColumnLetter}";
      }

      if (empty($Tier)) {
        $Tier = 'A';
      } else {
        $IsEndTier = false;
        $IsFirst = true;
        $ZCount = 0;
        $NewTier = '';

        foreach (array_reverse(str_split($Tier)) as $TierChr) {
          if (($TierChr == 'Z') && ($IsEndTier || $IsFirst)) {
            $NewTierChr = 'A';
          } elseif ($IsEndTier || $IsFirst) {
            $NewTierChr = chr(ord($TierChr) + 1);
          } else {
            $NewTierChr = $TierChr;
          }

          $IsEndTier = ($TierChr == 'Z') ? true : false;
          $ZCount += ($TierChr == 'Z') ? 1 : 0;

          $NewTier = "{$NewTierChr}{$NewTier}";
          $IsFirst = false;
        }

        $Tier = $NewTier;
        $Tier .= ($ZCount == strlen($Tier)) ?  'A' : '';
      }
    } while ($ColumnLetter != $LastColumnLetter);

    $ColumnsStr .= "|";

    return $ColumnsStr;
  }
}

- Christopher Mullins

Mar 27, 2012 at 2:35 PM
Edited Mar 27, 2012 at 2:36 PM

Hello schir1964,

Thanks for the response.  I had a chance to implement your code however, I am getting an "Parse error: syntax error, unexpected T_PUBLIC" for:

public function getColumnRange($ColumnLetter1, $ColumnLetter2)

Does it need to be added to the function list?

Mar 27, 2012 at 11:18 PM
Edited Mar 28, 2012 at 12:24 AM

Oops. Sorry about that. If you move these last two functions inside the class, you should be fine.

The original example code had those two functions inside of a example read spreadsheet class that implemented the filter. But it should still work fine if you move the last two functions inside of the ReadFilter class.

I've change my post above to correct this for you.

- Christopher Mullins

Mar 28, 2012 at 10:19 PM

Chris,

Thanks again for the reply.  However, your code seems a little overkill for what I need and plus I keep getting a blank page when using it.

I have rewritten my code and successfully been able to pull just the columns I need and I also know I can read 500 rows without the server memory skyrocketing and killing the script.  However, I am still having trouble adding the chunk method into my code.  This is what I have and the script is still being killed even if I set the chunkSize at 250.

	class MyReadFilter implements PHPExcel_Reader_IReadFilter
	{
		private $_startRow = 1;
		private $_endRow   = 22000;
		private $_columns  = array('C', 'D', 'I', 'J', 'K', 'T');

		/**  Get the list of rows and columns to read  */
		public function __construct($startRow, $endRow, $columns) {
			$this->_startRow = $startRow;
			$this->_endRow   = $endRow;
			$this->_columns  = $columns;
		}

		public function setRows($startRow, $chunkSize) {
			$this->_startRow = $startRow;
			$this->_endRow   = $startRow + $chunkSize;
		}

		public function readCell($column, $row, $worksheetName = '') {
			//  Only read the rows and columns that were configured
			if (($row == 1) || ($row >= $this->_startRow && $row < $this->_endRow)) {
				if (in_array($column,$this->_columns)) {
					return true;
				}
			}
			return false;
		}
	}

	$filterSubset = new MyReadFilter(1,22000,array('C', 'D', 'I', 'J', 'K', 'T'));
	$chunkSize = 250;
	for ($startRow = 2; $startRow <= 65536; $startRow += $chunkSize) {
		$filterSubset->setRows($startRow,$chunkSize);
		$objReader = PHPExcel_IOFactory::createReader($inputFileType);
		$objReader->setReadFilter($filterSubset);
		$objReader->setReadDataOnly(true);
		$objPHPExcel = $objReader->load($inputFileName);
	}

Mar 29, 2012 at 5:53 PM
Edited Mar 29, 2012 at 6:27 PM

You are creating your reader inside the loop.

Try this:

 

$filterSubset = new MyReadFilter(1, 22000, array('C', 'D', 'I', 'J', 'K', 'T'));

$chunkSize = 250;

$objReader = PHPExcel_IOFactory::createReader($inputFileType);
$objReader->setReadFilter($filterSubset);
$objReader->setReadDataOnly(false);

for ($startRow = 2; $startRow <= 65536; $startRow += $chunkSize) {
  $filterSubset->setRows($startRow, $chunkSize);
  $objPHPExcel = $objReader->load($inputFileName); // Rows are now accessible after this command.

  /*
      Read cell data code goes here
  */
}
Mar 30, 2012 at 12:45 PM
Edited Mar 30, 2012 at 12:49 PM

Thanks again for your help Chris.  However, I just cannot get the chunk to work no matter what.

Just to recap for anyone browsing this, the following code will take an xls file and read only columns C, D, I, J, K, T and write it into a csv file.  The only drawback (on my server anyway) is that is takes too much memory to read the entire file and the most I could read is 500 rows without the server killing the script.  If anyone else can take this code and add the chunk method and make it work, then my hat is off to you.

 

require_once '/Classes/PHPExcel.php';
	
fn_echo('Create new PHPExcel object..<br />');
$objPHPExcel = new PHPExcel();

fn_echo('Load temp.xls file..<br />');
$inputFileName = 'temp.xls';
$inputFileType = PHPExcel_IOFactory::identify($inputFileName);
	
class MyReadFilter implements PHPExcel_Reader_IReadFilter
{
	private $_startRow = 1;
	private $_endRow   = 500;
	private $_columns  = array('C', 'D', 'I', 'J', 'K', 'T');

	public function __construct($startRow, $endRow, $columns) {
		$this->_startRow = $startRow;
		$this->_endRow   = $endRow;
		$this->_columns  = $columns;
	}

	public function readCell($column, $row, $worksheetName = '') {
		//  Only read the rows and columns that were configured
		if ($row >= $this->_startRow && $row <= $this->_endRow) {
			if (in_array($column,$this->_columns)) {
				return true;
			}
		}
		return false;
	}
}

$filterSubset = new MyReadFilter(1,500,array('C', 'D', 'I', 'J', 'K', 'T'));
$objReader = PHPExcel_IOFactory::createReader($inputFileType);
$objReader->setReadFilter($filterSubset);
$objReader->setReadDataOnly(true);
$objPHPExcel = $objReader->load($inputFileName);
	
fn_echo('Write to CSV format..<br />');
$objWriter = PHPExcel_IOFactory::createWriter($objPHPExcel, 'CSV');
$objWriter->save(str_replace('.php', '.csv', _FILE_ ));

 

Sample output:

 

"","","MfgNo","Quantity","","","","","Price","MAP","MSRP","","","","","","","","","MfgAbbv"
"","","12841","0","","","","","372.22","400","440","","","","","","","","","AIMP"
"","","81200","4","","","","","6.12","0","12.95","","","","","","","","","DT"
"","","MPO II","0","","","","","844.6799999999999","1069","1125","","","","","","","","","EOTE"
"","","2125KV","2","","","","","57.35","0","130","","","","","","","","","COLU"
"","","75204BK","6","","","","","119.05","0","199.99","","","","","","","","","BLAC"
"","","68000","13","","","","","199","299.99","374.99","","","","","","","","","LEUP"
"","","12172","12","","","","","674.4","774","860","","","","","","","","","AIMP"
"","","66740","3","","","","","976","1239.99","1549.99","","","","","","","","","LEUP"
"","","59275","5","","","","","691","799.99","1039.99","","","","","","","","","LEUP"
"","","59265","1","","","","","579","699.99","874.99","","","","","","","","","LEUP"
"","","TA31RCO-M4CP","0","","","","","1234","1398.25","1645","","","","","","","","","TRIJ"
"","","67870","1","","","","","875","1049.99","1314.99","","","","","","","","","LEUP"
"","","66610","2","","","","","810","1049.99","1314.99","","","","","","","","","LEUP"
"","","TA01","1","","","","","868","983.45","1157","","","","","","","","","TRIJ"
"","","66600","3","","","","","754","969.99","1214.99","","","","","","","","","LEUP"
Mar 30, 2012 at 4:48 PM

How much memory does your system have?

- Christopher Mullins

Mar 30, 2012 at 7:38 PM

I'm not sure but it really doesn't matter.  The server is set up to kill a script which exceeds a certain limit.   I am guessing that limit is somewhere around 512M since logs show the script using as much as 492M.

Jun 28, 2012 at 12:00 PM

Hi,

If you solved your problem, it can be helpful to me also.

I have a 55Mb XLS file to load and the loading fails due to memory limit.

I try to implement filtering but I have the feeling that it does even walk through the readCell method (writting debug information does'nt work or just returning false does'nt change the behaviour of my script)

 

Regards,

 

Olivier

Jun 28, 2012 at 4:07 PM

@ogerault; If you are willing to risk sending me your file, I can run it through my configuration to see if I can reproduce the issue and offer a solution.

 

- Christopher Mullins

Jul 6, 2012 at 11:49 AM

Hello

TexasTrophy thanks for your code. I worked with that and come along following solution current project. I copy chunk of code from their. So you will get an idea. 

Some of the variable have not declared here(that values come from other forms ) but it will work fine if u define all 

 

$dail_code= array();

$destination= array();

$rate=array();

$e_date=array();

class MyReadFilter implements PHPExcel_Reader_IReadFilter //come from IReadfilter.php

 

private $_columns  = array('A', 'B', 'C', 'D'); public function __construct($s_row, $e_row, $columns)

$this->_startRow = $s_row;  $this->_endRow   = $e_row;  $this->_columns  = $columns; 


public function readCell($column, $row, $worksheetName = '')  {

if ($row >= $this->_startRow && $row <= $this->_endRow)  { 

if (in_array($column,$this->_columns)) 

return true; } 

return false;

 
}// filter class end  

$filterSubset = new MyReadFilter($begin,$finish,array($_POST['dal_col'],$_POST['des_col'], $_POST['dat_col'],$_POST['rat_col']));

$objReader = PHPExcel_IOFactory::createReader($inputFileType);

$objPHPExcel = new PHPExcel();

$objReader->setReadFilter($filterSubset); 
$objPHPExcel = $objReader->load($target_path);

$objWorksheet = $objPHPExcel->setActiveSheetIndex($_POST['sheet_dail']);

foreach($objWorksheet->getRowIterator() as $row)  {  

        $cellIterator = $row->getCellIterator();

$cellIterator->setIterateOnlyExistingCells(true);

 $track=1;   foreach ($cellIterator as $cell) {   $d=$cell->getCalculatedValue();  
 if($track==1)   {         $dail_code[]=$d;  }

 elseif($track==2)   {    $destination[]=$d;    }

    elseif($track==3)   {    $rate[]=$d;  }

   elseif($track==4)   {   $e_date[]=$d;    }    

 $track++;

   }  $v=$v+1;
 }

 for($j=0;$j<count($rate);$j++) {

echo " ".$dail_code[$j]. " ".$destination[$j]. " ".$rate[$j]." ".$e_date[$j];  

}

Sincerely

Arenna