Excel 2007 xlsx file being read as html?

Topics: Developer Forum
Jan 11, 2013 at 3:55 PM
Edited Jan 11, 2013 at 3:56 PM

I'm just starting to integrate PHPExcel 1.7.8 into a project of mine, and I'm seeing a weird issue: any file created in Excel 2007 and saved as .xlsx seems to instantiate a PHPExcel_Reader_HTML instead of the expected PHPExcel_Reader_Excel2007 reader (the same happens for .ods files created in LibreOffice). A standard .xls file generated in either Excel 2007 or LibreOffice works fine, though. The spreadsheet is fairly simple: two rows (one header, one data) with about 8 columns of strings.

This is what I'm using to read the file:

$path = "temporary://newimport_3.xlsx";
$xls_reader = PHPExcel_IOFactory::createReaderForFile($path);
$xls_reader->setReadDataOnly(TRUE);
$xls_data = $xls_reader->load($path);

And of course, because PHPExcel_Reader_HTML doesn't actually have a setReadDataOnly method, this gives a fatal error. If I take out the setReadDataOnly line, I get a bunch of other errors:

Warning: DOMDocument::loadHTMLFile(): Invalid char in CDATA 0x3 in temporary://newimport_3.xlsx, line: 1 in PHPExcel_Reader_HTML->loadIntoExisting() (line 458 of /mnt/nas/drupal-7.0/sites/all/libraries/PHPExcel/PHPExcel/Reader/HTML.php).
Warning: DOMDocument::loadHTMLFile(): Invalid char in CDATA 0x4 in temporary://newimport_3.xlsx, line: 1 in PHPExcel_Reader_HTML->loadIntoExisting() (line 458 of /mnt/nas/drupal-7.0/sites/all/libraries/PHPExcel/PHPExcel/Reader/HTML.php).
Warning: DOMDocument::loadHTMLFile(): Invalid char in CDATA 0x14 in temporary://newimport_3.xlsx, line: 1 in PHPExcel_Reader_HTML->loadIntoExisting() (line 458 of /mnt/nas/drupal-7.0/sites/all/libraries/PHPExcel/PHPExcel/Reader/HTML.php).
The import file was an unrecognized format and cannot be imported

Any thoughts or pointers? PHP seems to have the proper extensions enabled; I'm running PHP 5.3.3 on Red Hat EL6.

Coordinator
Jan 13, 2013 at 10:58 AM

I can't identify any path through the code that could lead to this for xlsx files generated by any version of MS Excel (tried with 2003, 2007 and 2010) or with Open or Libre Office.

The signatures for xlsx and html should be very different, and I can't understand how they could get confused.

Can you upload an example of a file that demonstrates this behaviour to http://phpexcel.codeplex.com/workitem/10749?ProjectName=phpexcel

Jan 14, 2013 at 10:31 PM

Actually, I just did some debugging and figured out the issue. Essentially, it's because the file I was trying to load was referenced through a stream wrapper, and PHPExcel doesn't play nicely with those. Specifically, the canRead() method can't read the file, so it fails to use the reader it detected and instead tries to use whatever it finds through the autoresolveclasses loop further down.

Coordinator
Jan 14, 2013 at 11:28 PM

Ah, no! PHPExcel explicitly requires direct access to open and close the file as it sees fit, so it doesn't work particularly well with stream wrappers. Glad you were able to figure it out.
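
For anyone hitting the same thing, one possible workaround is to copy the stream-wrapper file to a plain local path first and hand that path to PHPExcel. This is only a minimal sketch: the helper name copy_to_local_temp() is hypothetical (it is not part of PHPExcel), and the commented-out usage assumes PHPExcel is already loaded. The temporary file has no meaningful extension, so this presumably relies on the readers' own canRead() checks to pick the right reader.

```php
<?php
/**
 * Hypothetical helper (not part of PHPExcel): copy a file that is only
 * reachable through a stream wrapper to a real path on the local disk.
 */
function copy_to_local_temp($uri)
{
    // tempnam() creates an empty file and returns its filesystem path.
    $local = tempnam(sys_get_temp_dir(), 'xls');
    if ($local === false) {
        throw new RuntimeException('Could not create a temporary file');
    }
    // copy() accepts stream-wrapper URIs as the source, e.g. temporary://...
    if (!copy($uri, $local)) {
        unlink($local);
        throw new RuntimeException("Could not copy $uri to $local");
    }
    return $local;
}

// Usage, assuming PHPExcel is loaded and $path is the stream-wrapper URI:
// $local = copy_to_local_temp($path);
// $xls_reader = PHPExcel_IOFactory::createReaderForFile($local);
// $xls_data = $xls_reader->load($local);
// unlink($local); // clean up once the data has been read
```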

Aug 27, 2014 at 3:51 PM
I realize I'm very late to this discussion, but I've just encountered this problem, and I was hoping you might be able to help me understand how you managed to work around the issue. I'm downloading an XLS file using cURL in PHP, saving the file to disk, and then trying to parse it using the PHPExcel reader. Here's my basic implementation:
    $fp = fopen('output/myfile.xls', 'wb');
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, 'http://www.example.com/excelfileurl');
    // Session cookie value omitted from this example.
    curl_setopt($ch, CURLOPT_COOKIE, 'SESSION_COOKIE_OMITTED');
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_FILE, $fp);
    curl_exec($ch);
    curl_close($ch);
    fclose($fp);


    require_once './Classes/PHPExcel/IOFactory.php';
    $objPHPExcel = PHPExcel_IOFactory::load("output/myfile.xls");
So it seems like PHPExcel should have direct access to the file since it's sitting on the disk. But I'm seeing the following errors leading me to believe that the XLS file is being read as HTML instead of XLS. I'm able to open the XLS file with Excel and it opens fine.
Notice: DOMDocument::loadHTMLFile(): Namespace prefix ss is not defined in output/myfile.xls, line: 2 in /<full path>/Classes/PHPExcel/Reader/HTML.php on line 427
Warning: DOMDocument::loadHTMLFile(): Namespace prefix xmlns of attribute ss is not defined in output/myfile.xls, line: 2 in /<full path>/Classes/PHPExcel/Reader/HTML.php on line 427
Thanks for any suggestions you may have.
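
A quick way to narrow this kind of problem down is to look at the first bytes of the file rather than its extension. The sketch below is only illustrative: sniff_spreadsheet_signature() is a hypothetical helper, not part of PHPExcel, and it does not reproduce PHPExcel's own detection logic. It distinguishes a ZIP container (xlsx, ods), an OLE2 container (binary .xls), and text markup (XML or HTML saved with an .xls extension). The "Namespace prefix ss" messages above may indicate the last case. The 0x3, 0x4 and 0x14 characters reported in the first post match the bytes that follow "PK" at the start of a ZIP file.

```php
<?php
/**
 * Hypothetical helper (not part of PHPExcel): guess a file's container
 * format from its first bytes, ignoring the file extension.
 */
function sniff_spreadsheet_signature($filename)
{
    // Read only the first 16 bytes of the file.
    $head = file_get_contents($filename, false, null, 0, 16);
    if ($head === false) {
        return 'unreadable';
    }
    // ZIP local file header: xlsx and ods files are ZIP archives.
    if (strncmp($head, "PK\x03\x04", 4) === 0) {
        return 'zip';
    }
    // OLE2 compound document header: used by binary .xls files.
    if (strncmp($head, "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1", 8) === 0) {
        return 'ole2';
    }
    // Skip an optional UTF-8 byte order mark and leading whitespace.
    if (strncmp($head, "\xEF\xBB\xBF", 3) === 0) {
        $head = substr($head, 3);
    }
    // Anything that starts with "<" is text markup (XML or HTML).
    if (substr(ltrim($head), 0, 1) === '<') {
        return 'markup';
    }
    return 'unknown';
}

// Usage with the path from the post above:
// echo sniff_spreadsheet_signature('output/myfile.xls'), "\n";
```

If this reports markup for a file named .xls, the download is probably not a binary Excel file, whatever Excel itself is willing to open.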