Reading some corrupted file doesn't throw exception

Topics: Developer Forum, User Forum
Dec 7, 2010 at 10:02 AM

Hi!

I have to automatically read Excel files provided by clients who sometimes send corrupted ones. (They can't be read in Excel itself, but it's a mass treatment and I can't try to open all of them to know if they are OK.)
I tried to catch any exception that could have been thrown by the loader with this code :

$objReader = PHPExcel_IOFactory::createReader('Excel5');
try {
      $objPHPExcel = $objReader->load($this->getFilesPath ().DIRECTORY_SEPARATOR.$pFile);
} catch (Exception $e) {
      // throwing an explicit exception that will be caught later
}


With a test file (I took a .doc file, renamed it with .xls extension and suppressed the first lines in a text editor), it does work fine and throws this exception : ''The filename xxx is not recognised as an OLE file".

But with one of the client's corrupted file, this is not working, and I get a "maximum  execution time exceeded" (whatever the value of time_limit I put in my PHP code). It seems to cause an infinite loop in the read function of OLERead.php, or something like that. This is the error I get :

Array
(
[type] => 1
[message] => Maximum execution time of 30 seconds exceeded
[file] => ...\PHPExcel-1.7.4\PHPExcel\Shared\OLERead.php
[line] => 150
)

As you can see in the error details, I'm currently using PHPExcel v1.7.4. My PHP version is 5.2.10, and the problem seems to appear in both Windows and Linux environments.

Any idea of how I could know the file is unreadable before getting the maximum execution time error?

Thank you!

Coordinator
Dec 7, 2010 at 12:05 PM
Edited Dec 7, 2010 at 12:06 PM

LNA wrote:

Any idea of how I could know the file is unreadable before getting the maximum execution time error?

 

We can't trap for every conceivable corruption in a file without slowing performance to the point where PHPExcel is completely unusable; and if you simply load() the file, we don't do any additional checks, assuming that the files is a valid workbook of the appropriate type, only throwing an exception when there are very obvious problems.

However, we do provide the canRead() method in every Reader, returning a boolean true or false, that will identify some basic file errors.

$objReader = PHPExcel_IOFactory::createReader('Excel5');
try {
      if (!$objReader->canRead()) {
          throw new Exception('File is not a valid Excel5 file');
      }
      $objPHPExcel = $objReader->load($this->getFilesPath ().DIRECTORY_SEPARATOR.$pFile);
} catch (Exception $e) {
      // throwing an explicit exception that will be caught later
}

While this isn't guaranteed to trap for every eventuality, it may help identify a few corrupted files before you try to load them

Dec 7, 2010 at 1:34 PM
MarkBaker wrote:

However, we do provide the canRead() method in every Reader, returning a boolean true or false, that will identify some basic file errors.

[...]

While this isn't guaranteed to trap for every eventuality, it may help identify a few corrupted files before you try to load them

 

In my case, the same problem appear because the canRead() method calls the read() method of OLERead class where the loop problem due to my corrupted file is.

It's the one which begin line 146 of the 1.7.4 OLERead.php file : while ($sbdBlock != -2) { ... }

I saw that, with my corrupted file, the value of $sbdBlock goes to '' (empty string) after 2 times in the loop.

I wondered if this case could be tested by an if condition, assuming that this is always a problem to have $sbdBlock='' when we call $this->bigBlockChain[$sbdBlock]; with an empty string.

It seems that adding this test in the loop in my local library is solving my problem of time execution exceeded.

What's your opinion about this?

Coordinator
Dec 8, 2010 at 1:51 PM

The difficulty with trapping for this is the cost on performance, especially as this loop test would need replicating at several points in the code to identify other variations in the corruption. In this particular case, it seems simple, but other workbooks might execute that loop hundreds of times, where the overhead of the "corruption trap" might become more noticeable.

People are constantly criticising PHPExcel for its slow performance: having spent most of the last three months working on speeding up code execution to address that criticism, I'm reluctant to introduce a slowdown testing for something that shouldn't be there in the first place.

Dec 8, 2010 at 1:59 PM

I understand since I've been through the performances problems too. I'll have to fix that on my side then.

Looking forward to trying the 1.7.5 release!

Thanks again for your reply and your time.