removeRows on many rows

Topics: Developer Forum
Jun 5, 2009 at 7:20 PM
Edited Jun 5, 2009 at 7:44 PM

Whenever I remove the rows of a large excel file with about 1000+ rows, the last rows on the worksheet does not get removed.  Here is the issue.  Once I have read in an xls file with a reader, I run a function that traverses the entire worksheet and checks for certain values in a column.  Based on this value, the rows with those values are removed from the worksheet.  Now, because the worksheet is large, I don't call a removeRow on every row that is checked.  The function creates an array of values that contain a starting row and a row count that identifies which contiguous rows in the worksheet will be removed. ie.  array(array($start, $count));   There are mutliple sections of contiguous rows that get removed after the starting points and row counts are identified.  This reduces the amount of times the removeRow function is called significantly.  Calling the function on every row is unfeasible.   It takes way too long even with using the new memory and speed enhancements svn and runs out of memory on the production version(1.6.7).

The problem is that it always leaves the last section of row removal in the worksheet.  Another strange thing is that any row from the final section of removed rows has it's date column changed to a number representing the value of the date.  The dollar column from the section of rows that should have been removed also changes to regular numbers without $ signs.  My function removes row sections from the end of the worksheet first.

Here is the test excel file:
www.chisdev.net/test.xls

As you can see, the first column contains the values I check and based on it, select the contiguous rows to remove.  I need to check every row as I do include empty cells in my test case.  Here is my function:

protected function _removeAccTypes(){


            $_accTypes = array('BCS','DBT','APP','IVT','NTR','RDT','CR','HRT');  // accounts types to keep
            // initialize counting and recording.
            $max = $this->_excel->getNumRows();           // calls PHPExcel->getActiveSheet()->getHighestRow()
            $removes = array();
            //echo "maxrow=$max : ";
            $oldVal = 'APP';
            $newVal = '';
            $start = 1;
            $count = 1;
   
            for($row=1;$row<=$max;$row++){
                   
                $newVal = $this->_excel->getCellValue($row,0);
                if($newVal == ''){
                    if(in_array($oldVal,$_accTypes) == false){
                        $count++;
                    }
                }else{
                   
                    if($newVal == $oldVal){
                       
                        if(in_array($oldVal,$_accTypes) == false){
                            $count++;
                        }
                    }else{
                           
                        if(in_array($oldVal,$_accTypes) == false and in_array($newVal,$_accTypes)== true){
                           
                            array_unshift($removes, array($start,$count,$oldVal,$newVal));  // places start/count in the front of the array
                            $count=1;
                        }
                           
                        if(in_array($oldVal,$_accTypes) == true and in_array($newVal,$_accTypes)== false){
                            $count=1;
                            $start=$row;
                        }
                        if(in_array($oldVal,$_accTypes) == false and in_array($newVal,$_accTypes)== false){
                            $count++;
                        }
                        $oldVal = $newVal;
                    }
                }
   
            }
            array_unshift($removes, array($start,$count,$oldVal,$newVal));   // add the final rows to be removed
            $this->_removeRows($removes);   // loops through and gets the values of the start/count, then calls removeRow($start,$count)
       }

This could also be a case of the number of rows to be removed causing an issue when the row count is large.  In some cases, I need to remove about 1000+ rows from the worksheet.  Any help regarding this would be greatly appreciated. 

 

Thanks.

Developer
Jun 6, 2009 at 6:14 AM

I am able to reproduce the error with your script. Will look into this today.

Developer
Jun 6, 2009 at 10:15 AM

>> The problem is that it always leaves the last section of row removal in the worksheet.
>> Another strange thing is that any row from the final section of removed rows has it's date
>> column changed to a number representing the value of the date.


There was a bug in PHPExcel 1.6.7. It is fixed now.
http://phpexcel.codeplex.com/WorkItem/View.aspx?WorkItemId=10098

You may try next days source code release from here:
http://phpexcel.codeplex.com/SourceControl/ListDownloadableCommits.aspx


>> Calling the function on every row is unfeasible.   It takes way too long even
>> with using the new memory and speed enhancements svn and runs out of memory
>> on the production version(1.6.7).


The method you are using is a good one. The fewer calls to removeRow() the faster. In a future version we could perhaps also allow deleting non-contiguous ranges of rows/columns in one function call to allow for even faster operation.

With that said, as an experiment you may try to comment the following line in PHPExcel/ReferenceHelper.php as follows and see if it speeds up things.

// $pSheet->garbageCollect();


(It will not make any difference to the generated workbook because garbageCollect() is anyway called immediately before writing the workbook.)

Jun 9, 2009 at 2:08 AM

Thank you so much.  It's working very well now.  

Dec 1, 2014 at 11:00 AM
Hi fivefeet8,

Can you please tell me what solved your problem as i am facing the same issue.
Dec 1, 2014 at 11:01 AM
Can you please tell me what solved your problem as i am facing the same issue.
Dec 1, 2014 at 11:03 AM
HI koyama,

I am using PHPEXCEL 1.8 but still i am unable to use multiple conditions in if loop to remove multiple rows.