Really confused about memory usage with this project.

Topics: User Forum
Feb 25, 2011 at 12:53 PM
Edited Feb 25, 2011 at 12:54 PM

I am having major problem with memory issues and just can't seem to get around them.

Why is it that one spreadsheet, size: 4,890,624 bytes (5 columns, 35653 rows), works
with no problems but another file, size: 74,240 bytes (5 columns, 193 rows), causes:

Allowed memory size of 268435456 bytes exhausted (tried to allocate 80 bytes) in
PHPExcel/Classes/PHPExcel/CachedObjectStorage/DiscISAM.php on line 47


Here are the cache files created when trying to open the 74K file:

Size      Date         FileName
        0 Feb 25 05:34 PHPExcel.4394d67afd5891272.23106874.cache
139934791 Feb 25 05:34 PHPExcel.4394d67afd58efe77.58621720.cache
        0 Feb 25 05:34 PHPExcel.4394d67afd5b1da06.04285814.cache
        0 Feb 25 05:34 PHPExcel.4394d67afd5b8b7b4.42921373.cache

 


Both files are attached to this issue:
http://phpexcel.codeplex.com/workitem/15414

I'm having to manually convert all these files to tab-delimited in order for our process to import them. I can't take a chance on any of them throwing this error and causing the cache files  to be left, which is what happens when it crashes with that error.

It seems that your old adage that 1K per cell just doesn't make sense at all.

Coordinator
Feb 25, 2011 at 2:16 PM

A part of your problem is that the smaller spreadsheet: 74,240 bytes (5 columns, 193 rows), is actually (8 columns, 65,536 rows).... there's an awful lot of blank cells in there.

When you use the CSV writer, it creates a 2D array of all the cells, and then loops through that array a row at a time.... that means it's building a 65,536 x 8 array in memory... before writing it out to your tab-separated file. WIth the current 1.7.5 release, that also creates blank cells up to H65536.

With the latest additions to the worksheet's toArray() methods (namely the rangeToArray() method, and the fact that it doesn't create blank cells any more; I can modify the CSV Writer to process a row at a time, but you'll still end up with a lot of blank rows in your tsv unless you delete the spurious rows from your Excel file.

Feb 25, 2011 at 2:56 PM

1. I am not using the CSV writer to parse the files. I am just opening and reading them.

2. Why would you create and array element for blank cells? Kind of bogus to me.

If the row is empty, how do you delete them from your Excel file?

Feb 25, 2011 at 2:58 PM

Still, 8 columns, 65,536 rows is only 524288.

Coordinator
Feb 25, 2011 at 3:30 PM
mrbaseball34 wrote:

1. I am not using the CSV writer to parse the files. I am just opening and reading them.

When I open/read those files, without using any cell caching at all, I get:

42_KATECO-LESS PRICING-2-24-2011.xls  - Peak Memory Usage 8MB

493_legende-BFSTOCK_2_23_2011.xls  - Peak Memory Usage 236.25MB

So I don't know what you're doing with these files that might be exhausting memory. What are you doing with the files once you've loaded them?

mrbaseball34 wrote:

2. Why would you create and array element for blank cells? Kind of bogus to me.

 Yes, it is kind of bogus... which is why it's being changed.

Feb 25, 2011 at 3:47 PM
Edited Feb 25, 2011 at 3:48 PM

Well, the memory exhaustion actually occurs somewhere inside the cell iterator but I don't know which row, I am taking care to unset as many of the variables as I can:

 

    // This is preliminary code to parse an Excel spreadsheet
    set_include_path(get_include_path() . PATH_SEPARATOR . 'PHPExcel/Classes/'); 
    require_once 'PHPExcel/Classes/PHPExcel.php';
    require_once 'PHPExcel/Classes/PHPExcel/IOFactory.php';
    $cacheMethod = PHPExcel_CachedObjectStorageFactory::cache_to_discISAM;
    PHPExcel_Settings::setCacheStorageMethod($cacheMethod);
    $objReader = PHPExcel_IOFactory::createReaderForFile($file);
    $objReader->setReadDataOnly(true);
    $objPHPExcel  = $objReader->load($file);
    $objWorksheet = $objPHPExcel->getActiveSheet();

    // counter to skip the header rows
    $q = 1;
    $rowIterator = $objWorksheet->getRowIterator();
    /*
       $format is an array of numbers defining column ordering.
       we want stuff imported in the correct order so we provide something like this
       We want columns imported in this order:
       Part#, AltPart#, CC, Qty, Desc
       (default column format is 1,2,3,4,5,6,7)

       Most files are not configure in this order so say lets have one in this order:
       Qty, Desc, Mfg, Part#, CC

       The format would be like this:
       4,5,6,1,3,2,7
    */
    
    foreach ($rowIterator as $row) {
      // if file has heaer row(s), skip them
      if($q > $skip_rows) {
        $fields = array();
        $sql = '';
        $cellIterator = $row->getCellIterator();
        $cellIterator->setIterateOnlyExistingCells(false);
        $g = 1;
        foreach ($cellIterator as $cell) 
        {
          $cellval = trim(iconv("UTF-8","ISO-8859-1",$cell->getValue())," \t\n\r\0\x0B\xA0");
        } // foreach ($cellIterator as $cell)
        unset($cellIterator);
        // reset the array
        $fields_order = array();
        // arrange the array values as defined by the user's file format configuration 
        $y = 0;
        foreach($format as $index => $value ) {
          switch($value) {
          case 1:
            // if partno is blank, use altpartno
            if(trim($fields[$index]) == '') {
              $fields[$index] = $fields[intval($value)+1];
            }
            $fields_order[$value] = strtoupper($fields[$index]);
            $y = $value;
          case 3: //Default and Uppercase the condition code
            $allowed_cc = array('NE', 'NS', 'SV', 'OH', 'AR');
            if(trim($fields[$index]) == '') {
              $fields[$index] = 'NS';
            }
            $fields_order[$value] = strtoupper($fields[$index]);
            if(!in_array($fields_order[$value], $allowed_cc)) {
              $fields_order[$value] = 'NS';
            }
          break;
          case 5: // defailt description to PART
            if(trim($fields[$index]) == '') {
              $fields[$index] = 'PART';
            }
            $fields_order[$value] = $fields[$index];
          break;
          default:
            $fields_order[$value] = $fields[$index];
          } // if ($value == 3)
        } // foreach($format as $index => $value ) 

        // sort the array by key
        ksort( $fields_order );

        // Now just get the values, don't need the previous keys
        $fields_order = array_values( $fields_order );

        // remove all but the first 5 elements.
        $fields_order = array_slice($fields_order,0,5);

        // if it is the part number column
        $j = strpos(strtolower($fields_order[$y]), 'number');
        // or it is not empty
        $k = strlen(trim(str_replace($fields_order[$y], '"', '')));
        
        $ipn  = trim($this->stripIllegalChars($fields_order[0]));
        $iapn = trim($this->stripIllegalChars($fields_order[1]), false);
        $cc   = strtoupper($fields_order[2]);
        $qty  = $fields_order[3];
        $desc = addslashes($fields_order[4]);
        // build the insert statement from the values in the cells only if partno is not blank
        if(trim($ipn) != '') {
          $field_str = sprintf('"%s",  "%s",  "%s",  %d,  "%s"', $ipn,$iapn,$cc,$qty,$desc);
          $insertsql = "INSERT INTO inventory (user_id,inventory_part_number,inventory_alt_part_number,".
                                              "inventory_condition_code,inventory_quantity,inventory_description,".
                                              "last_update) ".
                                     "VALUES ($user_id, ".$field_str.", NOW())";
          $result = $this->db->query($insertsql) or die($insertsql."<BR>".mysql_error());
        } // if(trim($ipn) != '')
      } // if($q > $skip_rows)
      $q++;
    } // foreach ($objWorksheet->getRowIterator() as $row)
    unset($rowIterator);
    $objPHPExcel->disconnectWorksheets();
    unset($objPHPExcel);