Memory size in CachedObjectStorage/Memory.php

Topics: Developer Forum, Project Management Forum, User Forum
Aug 2, 2012 at 8:04 PM

I am trying to read the excel and insert the record into mysql.

I am getting this error after it seem to have run successfully.  Any ideas what is the problem:

 

Fatal error: Allowed memory size of 536870912 bytes exhausted (tried to allocate 524288 bytes) in /var/www/html/PHPExcel/Classes/PHPExcel/CachedObjectStorage/Memory.php on line 47

Aug 2, 2012 at 9:15 PM

From FAQ:

 

Fatal error: Allowed memory size of xxx bytes exhausted (tried to allocate yyy bytes) in zzz on line aaa

PHPExcel holds an "in memory" representation of a spreadsheet, so it is susceptible to PHP's memory limitations. The memory made available to PHP can be increased by editing the value of the memorylimit directive in your php.ini file, or by using iniset('memory_limit', '128M') within your code (ISP permitting);

Some Readers and Writers are faster than others, and they also use differing amounts of memory. You can find some indication of the relative performance and memory usage for the different Readers and Writers, over the different versions of PHPExcel, here http://phpexcel.codeplex.com/Thread/View.aspx?ThreadId=234150

If you've already increased memory to a maximum, or can't change your memory limit, then this discussion on the board describes some of the methods that can be applied to reduce the memory usage of your scripts using PHPExcel http://phpexcel.codeplex.com/Thread/View.aspx?ThreadId=242712

Aug 2, 2012 at 10:30 PM

I tried to use memory caching and I am still getting the error.  I even try to disconnect and unset so that there is no memory leakage.

Fatal error: Allowed memory size of 536870912 bytes exhausted (tried to allocate 17030419 bytes) in /var/www/html/PHPExcel/Classes/PHPExcel/Reader/Excel2007.php on line 428

 

Here is my code:

//file type
$inputFileType = 'Excel2007';

//get file name
$inputFileName = "../../files/sample.xlsx";

/**  Define a Read Filter class implementing PHPExcel_Reader_IReadFilter  */
class chunkReadFilter implements PHPExcel_Reader_IReadFilter
{
    private $_startRow = 0;

    private $_endRow = 0;

    /**  Set the list of rows that we want to read  */
    public function setRows($startRow, $chunkSize) {
        $this->_startRow    = $startRow;
        $this->_endRow        = $startRow + $chunkSize;
    }

    public function readCell($column, $row, $worksheetName = '') {
        //  Only read the heading row, and the rows that are configured in $this->_startRow and $this->_endRow
        if (($row == 1) || ($row >= $this->_startRow && $row < $this->_endRow)) {
            return true;
        }
        return false;
    }
}

//check that file exists
if(file_exists($inputFileName))
{   
    //Create a new Reader of the type defined in $inputFileType
    $objReader = PHPExcel_IOFactory::createReader($inputFileType);
   
    //cache memory so that it does not use to much memory
    $cacheMethod = PHPExcel_CachedObjectStorageFactory::cache_in_memory_serialized;
    PHPExcel_Settings::setCacheStorageMethod($cacheMethod);
       
    //Advise the Reader to load all Worksheets
    $objReader->setLoadAllSheets();

    //Load $inputFileName to a PHPExcel Object
    $objPHPExcel = $objReader->load($inputFileName);


   
    //get the total number of rows
    $total_rows = $objSheet->getHighestRow();
   
   
    //Define how many rows we want to read for each "chunk"
    $chunkSize = 2048;

    // Create a new Instance of our Read Filter
    $chunkFilter = new chunkReadFilter();

    //Tell the Reader to use the Read Filter
    $objReader->setReadFilter($chunkFilter);
   
    //set variables
    $tableName = "prep_mysoft_wireless";
    $create_date = date("Y-m-d G:i:s");
    $start_time = date("Y-m-d G:i:s");
   
       
    //Loop to read our worksheet in "chunk size" blocks
    for ($startRow = 2; $startRow <= $total_rows; $startRow += $chunkSize)
    {
        //Tell the Read Filter, the limits on which rows we want to read this iteration
        $chunkFilter->setRows($startRow,$chunkSize);
       
        //  Load only the rows that match our filter from $inputFileName to a PHPExcel Object
        $objPHPExcel = $objReader->load($inputFileName);       
       
        //get only the first sheet
        $objPHPExcel->setActiveSheetIndex(0);
        $objWorksheet = $objPHPExcel->getActiveSheet();       
       
        foreach ($objWorksheet->getRowIterator(2) as $row)
        {   
            $cellIterator = $row->getCellIterator();
            $cellIterator->setIterateOnlyExistingCells(false);
           
            $sql = "INSERT INTO ".$tableName." (user_id, dirn, user_name, organization, create_date) VALUES (";
           
            $i = 0;
            foreach ($cellIterator as $cell)
            {       
                $cellValue = trim($cell->getValue());
                $cellValue = str_replace("\"","",$cellValue);
                $cellValue = str_replace("=","",$cellValue);
                $cellValue = str_replace(",","",$cellValue);
                $cellValue = $mysqli->real_escape_string($cellValue);
               
                if($i == 0)
                {
                    $cellValue = "'".$cellValue."'";
                    $rowData = ($cellValue != "") ? true : false;
                }
                else
                {
                    $cellValue = ($cellValue == "") ? ", NULL" : ", '".$cellValue."'";
                }
                $sql .= $cellValue;
                $i++;
            } //end foreach cellIterator
           
            $sql .= ", '".$create_date."')";
           
            if($rowData)
            {
                if(!$mysqli->query($sql))
                {
                    die("Error in inserting data for table $tableName: ".$sql." ==> ".$mysqli->error);
                }
            }
        } //end foreach getRowIterator                               
    } //end for loop chunk
   
   
    $objPHPExcel->disconnectWorksheets();
    unset($objPHPExcel);
}

Coordinator
Aug 3, 2012 at 1:04 PM

Well to start with, you're loading the entire workbook, and then setting a readFilter on the reader, which is pretty meaningless.... if you define a readFilter, then set it against the reader before loading

Aug 3, 2012 at 1:23 PM

I don't understand what you mean.  The workbook only have one active sheet.  But because we have to assume that we don't know what the first sheet is name, I can not just load that worksheet.  Can you example what you mean by setting the readFilter against the reader before loading.

I modified it a little to get the total row from listWorksheeetInfo.  Would this work better?  I had to remove the cache memory because for some reasons it is importing blank records.  I have a check that it should only import record if the first cell in the row have a value.  Yet with the cache memory, it is importing thousands of empty rows.  Any ideas why that might be?

//check that file exists
if(file_exists($inputFileName))
{   
    //Create a new Reader of the type defined in $inputFileType
    $objReader = PHPExcel_IOFactory::createReader($inputFileType);
   
    $fileinfo = $objReader->listWorksheetInfo($inputFileName);
    $totalRows = $fileinfo[0]['totalRows'];
     
    //Advise the Reader to load all Worksheets
    $objReader->setLoadAllSheets();

    //Define how many rows we want to read for each "chunk"
    $chunkSize = 2048;

    // Create a new Instance of our Read Filter
    $chunkFilter = new chunkReadFilter();

    //Tell the Reader to use the Read Filter
    $objReader->setReadFilter($chunkFilter);
   
    //set variables
    $tableName = "prep_mysoft_wireless";
    $create_date = date("Y-m-d G:i:s");
    $start_time = date("Y-m-d G:i:s");
       
    //Loop to read our worksheet in "chunk size" blocks
    for ($startRow = 2; $startRow <= $totalRows; $startRow += $chunkSize)
    {
        //Tell the Read Filter, the limits on which rows we want to read this iteration
        $chunkFilter->setRows($startRow,$chunkSize);
       
        //  Load only the rows that match our filter from $inputFileName to a PHPExcel Object
        $objPHPExcel = $objReader->load($inputFileName);
       
        $objWorksheet = $objPHPExcel->getActiveSheet();

       .....

      }

}

 

 

Coordinator
Aug 3, 2012 at 2:41 PM

    $fileinfo = $objReader->listWorksheetInfo($inputFileName);
    $totalRows = $fileinfo[0]['totalRows'];
     
    //Advise the Reader to load all Worksheets
    $objReader->setLoadAllSheets();

replace with

   // Advise the reader to load only the first worksheet

    $fileinfo = $objReader->listWorksheetInfo($inputFileName);
    $totalRows = $fileinfo[0]['totalRows'];
    $firstSheetName = $fileinfo[0]['worksheetName'];
      
    //Advise the Reader to load just the first worksheet
    $objReader->setLoadSheetsOnly($firstSheetName);

Get rid of the

    //Load $inputFileName to a PHPExcel Object
    $objPHPExcel = $objReader->load($inputFileName);

outside the loop. You're telling PHPExcel to load the entire workbook into memory before you even start looping

 

Replace

        foreach ($objWorksheet->getRowIterator(2) as $row)
        {   
with

        foreach ($objWorksheet->getRowIterator($startRow) as $row)
        {   
so that your rowIterator starts at the first row of the chunk when you loop, not at row 2 every time

Aug 3, 2012 at 4:55 PM

Mark:

I made all the changes you selected.  But now, after it retrieve the first 2048 records, afterward ... it seem to not be able to read the cell value because all the value after 2048, they are all set to NULL.  I checked the cell and they have value in them.  This is cause when I replace the

     foreach ($objWorksheet->getRowIterator(2) as $row)

with

     foreach ($objWorksheet->getRowIterator($startRow) as $row)

 

Here is my entire code:

//Define a Read Filter class implementing PHPExcel_Reader_IReadFilter
class chunkReadFilter implements PHPExcel_Reader_IReadFilter
{
    private $_startRow = 0;

    private $_endRow = 0;

    /**  Set the list of rows that we want to read  */
    public function setRows($startRow, $chunkSize) {
        $this->_startRow    = $startRow;
        $this->_endRow        = $startRow + $chunkSize;
    }

    public function readCell($column, $row, $worksheetName = '') {
        // Only read the heading row, and the rows that are configured in $this->_startRow and $this->_endRow
        if (($row == 1) || ($row >= $this->_startRow && $row < $this->_endRow)) {
            return true;
        }
        return false;
    }
}

//check that file exists
if(file_exists($inputFileName))
{
    //Create a new Reader of the type defined in $inputFileType
    $objReader = PHPExcel_IOFactory::createReader($inputFileType);
   
    $fileinfo = $objReader->listWorksheetInfo($inputFileName);
    $totalRows = $fileinfo[0]['totalRows'];
    $firstSheetName = $fileinfo[0]['worksheetName'];   
   
    // Load only first sheet.
    $objReader->setLoadSheetsOnly($firstSheetName);

    //Define how many rows we want to read for each "chunk"
    $chunkSize = 2048;

    // Create a new Instance of our Read Filter
    $chunkFilter = new chunkReadFilter();

    //Tell the Reader to use the Read Filter
    $objReader->setReadFilter($chunkFilter);
   
    //set variables
    $tableName = "prep_mysoft_wireless";
    $create_date = date("Y-m-d G:i:s");
   
    //Loop to read our worksheet in "chunk size" blocks
    for ($startRow = 2; $startRow <= $totalRows; $startRow += $chunkSize)
    {
        //Tell the Read Filter, the limits on which rows we want to read this iteration
        $chunkFilter->setRows($startRow,$chunkSize);
       
        //  Load only the rows that match our filter from $inputFileName to a PHPExcel Object
        $objPHPExcel = $objReader->load($inputFileName);
       
        $objWorksheet = $objPHPExcel->getActiveSheet();
       
        foreach ($objWorksheet->getRowIterator($startRow) as $row)
        {   
            $cellIterator = $row->getCellIterator();
            $cellIterator->setIterateOnlyExistingCells(false);
            $rowData = false;

            $sql = "INSERT INTO ".$tableName." (user_id, dirn, user_name, organization, create_date) VALUES (";
                   
            $i = 1;
            foreach ($cellIterator as $cell)
            {
                $cellValue = trim($cell->getValue());
                $cellValue = str_replace(","," - ",$cellValue);
                $cellValue = $mysqli->real_escape_string($cellValue);
               
                if($i == 0)
                {
                    $cellValue = "'".$cellValue."'";
                    if (($cellValue != "") && (!empty($cellValue))) $rowData = true;
                }
                else if ($i > 0)
                {
                    $cellValue = (($cellValue == "") || (empty($cellValue))) ? ", NULL" : ", '".$cellValue."'";
                }
                $sql .= $cellValue;
                $i++;   
            } //end foreach cellIterator
           
            $sql .= ", '".$create_date."')";
           

            if($rowData)
            {
                if(!$mysqli->query($sql))
                {
                    die("Error in inserting data for table $tableName: ".$sql." ==> ".$mysqli->error);
                }                   
            }
        } //end foreach getRowIterator
    } //end for loop chunk
   
    $objPHPExcel->disconnectWorksheets();
    unset($objPHPExcel);
}

Coordinator
Aug 3, 2012 at 8:37 PM

Have you tried putting any diagnostics in to display the address of each cell as it's processed?

First iteration of the for ($startRow = 2; $startRow <= $totalRows; $startRow += $chunkSize) loop, the filter should load rows 2 to 2049. The row iterator is set to start iterating from row 2 until it has read the highest loaded row (row 2049).

Second iteration of the for ($startRow = 2; $startRow <= $totalRows; $startRow += $chunkSize) loop, the filter should load rows 2050 to 4097. The row iterator should then be set to start iterating from row 2050 until again it has read the highest loaded row (row 4097).

Third iteration, the filter should load rows 4098 to 6145, the row iterator should be set to start iterating from row 4098, etc.

I have tested this principle and it's working correctly for me. Please can you try adding some diagnostics to show what cells are being accessed to try and help me identify what is going wrong for you

Aug 6, 2012 at 4:15 PM

Hi Mark,

The problem was with the chunksize.  If I set $chunkSize = 2048, I guess it is taking lots of memory to process that amount of records.  I decrease it to 100 then 200 then 500 and it seem to work fine.  I am fine with it processing 500 records/chunk.  For approx. 17,000 records, it is taking 6 minutes to process.  Since this process will be run at night and once a month, this should be ok.

Thank you for your trouble and assistance!!! 

Aug 6, 2012 at 6:14 PM

Here's some code that automatically calculates the optimal ChunkSize based on the available system memory.

//
// Estimate maximum chunk size based on system memory.
//
$MemoryLimit = getMemorySizeAsBytes();
$BufferFactor = 1; 
$RowMemory = (1024 * 1) * $WorksheetListInfo[0]['totalColumns'];
$ChunkSize = max(intval(($MemoryLimit / $BufferFactor) / $RowMemory), 100);

//
// Convert Memory Size To Bytes
//
function getMemorySizeAsBytes() {
  $memory_size = ini_get('memory_limit');

  switch (substr($memory_size, -1)) {
    case 'G':
    case 'g':
      return (int) $memory_size * 1073741824;
    case 'M':
    case 'm':
      return (int) $memory_size * 1048576;
    case 'K':
    case 'k':
      return (int) $memory_size * 1024;
  }

  return $memory_size;
}

Aug 7, 2012 at 8:51 PM

Oh wow!  Thank you.  I am going to try this out!  Thanks again!

Aug 8, 2012 at 3:57 AM

If you have any questions on integrating my code with yours, just post them here.

- Christopher Mullins