Cache files not getting removed

Topics: User Forum
Apr 20, 2011 at 6:23 PM

Listing of /tmp dir when script running:
listinve@listinventory.com [/tmp]# ls  -s -t P*.*
378730 PHPExcel.185814daf21facff063.26741214.cache
     0 PHPExcel.187544daf22ad31b524.80122883.cache
     0 PHPExcel.187544daf22d39c83b5.55911410.cache
     0 PHPExcel.187544daf22ab769029.92446900.cache
     0 PHPExcel.185814daf222a8d59f9.86390666.cache
     0 PHPExcel.185814daf221e33bc22.62313752.cache
     0 PHPExcel.185814daf21f9add224.16743158.cache
    
    
listing of /tmp dir after script finished execution:    
listinve@listinventory.com [/tmp]# ls  -s -t P*.*
0 PHPExcel.187544daf22ad31b524.80122883.cache 
0 PHPExcel.187544daf22ab769029.92446900.cache
0 PHPExcel.187544daf22d39c83b5.55911410.cache


listing of /tmp dir 5 min after script finishing
with no other scripts running:
listinve@listinventory.com [/tmp]# ls  -s -t P*.*
0 PHPExcel.187544daf22ad31b524.80122883.cache 
0 PHPExcel.187544daf22ab769029.92446900.cache
0 PHPExcel.187544daf22d39c83b5.55911410.cache

Is there a way to tell what created each particular cache file and, if so, why it is not being removed after script execution?
Also, when an exception occurs, like out of memory, why are the cache files not removed?
This can cause havoc with mySQL and other things running on a system that depend on the /tmp directory space.

Coordinator
Apr 20, 2011 at 6:48 PM
Edited Apr 20, 2011 at 7:00 PM
Mrbaseball34 wrote:

Is there a way to tell what created each particular cache file?

Not without modifying the cache coding to actually report on this. Each worksheet instance of each workbook generates a unique ID whenever a script executes. This value is used as part of the filename, which is maintained internally (privately) in the cache controller.

Mrbaseball34 wrote:

why it is not being removed after script execution?

I have absolutely no idea whatsoever. The destructor code:

public function __destruct() {
   if (!is_null($this->_fileHandle)) {
      fclose($this->_fileHandle);
      unlink($this->_fileName);
   }
   $this->_fileHandle = null;
} // function __destruct()

should always execute when the worksheets are unset, either by user code, or by PHP itself when the script terminates normally. 

Mrbaseball34 wrote:

Also, when an exception occurs, like out of memory, why are the cache files not removed?

 If a script terminates with an out of memory exception, then the code simply terminates execution, and destructors will not fire. There is no way of trapping for this within PHP itself: out of memory is a fatal error condition that terminates the script without allowing any shutdown code to run. In this case, PHP doesn't appear to execute destructor code, and has no way of deleting the cache files.

 

Apr 20, 2011 at 7:10 PM
Edited Apr 20, 2011 at 7:11 PM

Well, I have proof that it doesn't always happen. The script executes and terminates

this is called at the end of the script:

 

$objPHPExcel->disconnectWorksheets();
unset($objPHPExcel);

Or should I unset the worksheet objects, too?

If I'm calling getActiveSheet(), why is there more than one cache file if the cache files are per worksheet?

Is there a way to load only the first sheet and ignore the others?

I still don't understand the chunk methodology that I read about in some other posts, could you explain it?

Coordinator
Apr 20, 2011 at 8:24 PM
Edited Apr 20, 2011 at 8:33 PM
Mrbaseball34 wrote:

Well, I have proof that it doesn't always happen. The script executes and terminates

OK! you have proof that it can happen, but that doesn't help identify the cause of the problem, or provide a solution. I've explained how PHP is supposed to work, firing the destructor for each worksheet object as it is flushed from memory, which should delete the cache file; and I've confirmed that an out of memory error will leave residual files, and explained why that is. I can't think of any other circumstances that will leave residual files on normal termination of PHP code.

Mrbaseball34 wrote:
$objPHPExcel->disconnectWorksheets();
unset($objPHPExcel);

Or should I unset the worksheet objects, too?

Because the PHPExcel object contains references to each child worksheet, and each worksheet object contains a reference to the parent PHPExcel object, PHPExcel has cyclic references. Because of these cyclic references, using a straight unset() on either will not actually unset these objects... it's something that PHP has real problems with. The latest versions of PHP 5.3 do provide a mechanism to handle unsetting in this circumstance; but that is not available in PHP 5.2, so we provide the disconnectWorksheets() method to break the cyclic reference so that unsetting the workbook will unset it correctly, and unset all its worksheets as well (which should trigger the worksheet destructor to delete the cache files. On normal script termination, this should be unnecessary, as PHP should be able to do this itself, despite the cyclic references.

Mrbaseball34 wrote:

If I'm calling getActiveSheet(), why is there more than one cache file if the cache files are per worksheet?

 getActiveSheet() simply identifies the sheet that you are currently working with. Unless you tell PHPExcel only to load selected worksheets, it will read all worksheets in the workbook, and each of those worksheets will have its own cache file. Using getActiveSheet() is simply identifying which of those worksheets you want to manipulate. It selects the currently active worksheet, and it's collection of cells.

Mrbaseball34 wrote:

Is there a way to load only the first sheet and ignore the others?

You can load a selected subset of worksheets by name, as described in section 4.2 of the User Documentation entitled "Reading Only Named WorkSheets from a File". If you don't know the names of the worksheets, you can retrieve a list of these without reading the entire workbook as described in section 4.3 of the User Documentation entitled "Reading the List of Worksheets from the Spreadsheet". To read the first worksheet, even without knowing its name:

$inputFileType 'Excel5';
$inputFileName './sampleData/example1.xls';

/**  Create a new Reader of the type defined in $inputFileType  **/
$objReader PHPExcel_IOFactory::createReader($inputFileType);
/**  Read the list of Worksheet Names from the Workbook file  **/
$worksheetNames $objReader->listWorksheetNames($inputFileName);

/**  Advise the Reader of which WorkSheets we want to load  **/
$objReader->setLoadSheetsOnly($worksheetNames[0]);
/**  Load $inputFileName to a PHPExcel Object  **/
$objPHPExcel $objReader->load($inputFileName);

Mrbaseball34 wrote:

I still don't understand the chunk methodology that I read about in some other posts, could you explain it?

Chunking works by using a Read Filter to tell the Reader which cells to actually read. The loader will then read only those cells which match the readCell() rule defined in the Read Filter, ignoring all other cells. So, if you have a worksheet with values in the range A1:IV65535, but use a read filter that tells PHPExcel to load only cells A1:B2, then only the four cells in that range will be loaded when you execute the load()method. Instead of needing over 16GB of memory (assuming no caching) to load every cell, that worksheet will only need about 4kB to load just 4 cells.

Section 4.4 of the User Documentation entitled "Reading Only Specific Columns and Rows from a File (Read Filters)" describes this process in more detail.