phpexcel output to java excel reader issue

Feb 10, 2011 at 3:11 PM

Good Morning Mark;

I read in an xls file, modify a field, and save it. This file then gets handed to a java program that tries to read the file. The java program is choking on it. (yes, I can open the modified xls file in excel and using a excel reader)

First off, here is the php code snippet used to read/modify/save the xls file (pretty straightforward). Following the php snippet is 4 or 5 lines details the java complaint. It feels like some sort of  xml file structure issue.

Any help/guidance would be appreciated.

thanks,
Walter

===================================================================

$inputFileType = PHPExcel_IOFactory::identify($newXlsFilename);

echo "XLS file is of type ".$inputFileType."\n";

/** Create a new Reader of the type defined in $inputFileType **/

$objReader = PHPExcel_IOFactory::createReader($inputFileType);

/** Advise the Reader of which WorkSheets we want to load **/

//$objReader->setLoadSheetsOnly($sheetName);

/** Load $inputFileName to a PHPExcel Object **/

$objPHPExcel = $objReader->load($newXlsFilename);

//time to perform the xls modifications

foreach($numtimes->sheetMod as $sheetMod)

{

$operation = strtolower($sheetMod->sheetOp["value"]);

if ($operation == "chngtitle")

{

//cellPos has type SimpleXmlElement, so we

//need to type cast before using it since

//PHPExcel will take only a string

$cellPos = $sheetMod->cellPos["value"];

settype($cellPos,"string");

$curCellVal = $objPHPExcel->getActiveSheet()->getCell($cellPos)->getValue();

settype($curCellVal,"string");

$cellTxt = getUniqueCellVal($curCellVal);

$objPHPExcel->getActiveSheet()->setCellValue($cellPos, $cellTxt);

echo "change Title Field to:".$cellTxt."\n";

}

else if ($operation == "insertcol")

{

$cellPos = $sheetMod->cellPos["value"];

settype($cellPos,"string");

$objPHPExcel->getActiveSheet()->insertNewColumnBefore($cellPos,1);

echo "column insertion:".$cellPos."\n";

}

}//end foreach numtimes

//save the modified xls file

$objWriter = PHPExcel_IOFactory::createWriter($objPHPExcel, $inputFileType);

$objWriter->save($newXlsFilename);

//disengage the worksheet so that it can be used

$objPHPExcel->disconnectWorksheets();

unset($objPHPExcel);

==========================================================================

The java program registers the following complaint:

  [2/9/11 15:08:45:135 GMT+05:30] 000024f3 SystemErr     R [Fatal Error] :11:11: XML document structures must start and end within the same entity.

[2/9/11 15:09:19:613 GMT+05:30] 00000145 SystemErr     R java.io.IOException: block[ 0 ] already removed

[2/9/11 15:09:19:613 GMT+05:30] 00000145 SystemErr     R        at org.apache.poi.poifs.storage.BlockListImpl.remove(BlockListImpl.java:97)

[2/9/11 15:09:19:613 GMT+05:30] 00000145 SystemErr     R        at org.apache.poi.poifs.storage.BlockAllocationTableReader.fetchBlocks(BlockAllocationTableReader.java:190)

Feb 10, 2011 at 5:27 PM

A little more detail on this....I changed my code slightly to read in the xls, and simply write it out. No modification, no nothing.

The input .xls file is 12KB, the output .xls is 14KB.....Why would the file size be different ?? Is something be prepended/appended to the file ?

thanks,
Walter

Coordinator
Feb 10, 2011 at 11:04 PM

First off... you refer to this as an .xls file. Unless the file extension has been changed, then .xls files are the proprietary Microsoft BIFF format (using the Excel5 Reader and Writer), which is not XML-based. I might expect this error from an Excel2007 .xlsx file (which is a collection of XML documents), though I'd hope that our XML is all valid; but not from a BIFF8 .xls file. Your script actually echoes out the file type identified by the IOFactory, what value is it echoing for $inputFileType.

Secondly... can the java program read the original file?

wakelt wrote:

A little more detail on this....I changed my code slightly to read in the xls, and simply write it out. No modification, no nothing.

The input .xls file is 12KB, the output .xls is 14KB.....Why would the file size be different ?? Is something be prepended/appended to the file ?

 PHPExcel does not "edit files". It reads in the original file, converting it to its own internal representation of a "generic spreadsheet", then writes that "generic spreadsheet" in the appropriate format. There are certainly going to be some differences then between the input and output files.... for example, an input .xls file is likely to be codepage ASCII or codepage CP437, but will always be written as UTF-8. Features like VBA or macros are discarded by the readers (because they cannot be represented in the "generic spreadsheet", so they are not able to be written, while styles may not be as cleanly optimized when they are written. In the specific case of Excel2007 files, the zip methods may not be the same when written as they were in the original file. All this means that it is quite possible for a read file and a written file to be different sizes, even if you have made no changes and are saving them in the same format.

Feb 11, 2011 at 12:02 AM

The $inputFileType echo'd out is excel5.

Yes, the java can read the original file w/o issue.

Any tests/experiments I could try to get to the bottom

of this ??

thanks,

walter

p.s. Thanks for the filesize difference explanation !!


----- Original Message -----
From: "MarkBaker" <notifications@codeplex.com>
To: [email removed]
Sent: Thursday, February 10, 2011 6:04:23 PM
Subject: Re: phpexcel output to java excel reader issue [PHPExcel:245451]

From: MarkBaker

First off... you refer to this as an .xls file. Unless the file extension has been changed, then .xls files are the proprietary Microsoft BIFF format (using the Excel5 Reader and Writer), which is not XML-based. I might expect this error from an Excel2007 .xlsx file (which is a collection of XML documents), though I'd hope that our XML is all valid; but not from a BIFF8 .xls file. Your script actually echoes out the file type identified by the IOFactory, what value is it echoing for $inputFileType.

Secondly... can the java program read the original file?

wakelt wrote:

A little more detail on this....I changed my code slightly to read in the xls, and simply write it out. No modification, no nothing.

The input .xls file is 12KB, the output .xls is 14KB.....Why would the file size be different ?? Is something be prepended/appended to the file ?

PHPExcel does not "edit files". It reads in the original file, converting it to its own internal representation of a "generic spreadsheet", then writes that "generic spreadsheet" in the appropriate format. There are certainly going to be some differences then between the input and output files.... for example, an input .xls file is likely to be codepage ASCII or codepage CP437, but will always be written as UTF-8. Features like VBA or macros are discarded by the readers (because they cannot be represented in the "generic spreadsheet", so they are not able to be written, while styles may not be as cleanly optimized when they are written. In the specific case of Excel2007 files, the zip methods may not be the same when written as they were in the original file. All this means that it is quite possible for a read file and a written file to be different sizes, even if you have made no changes and are saving them in the same format.

Coordinator
Feb 11, 2011 at 12:36 AM
Edited Feb 11, 2011 at 12:36 AM

If the file is BIFF8, then the error you're getting is very confusing... BIFF8 is a pure binary format, that bears about as much resemblence to XML as a ziggurat does to an aardvark.

The only similar case I've come across (when reading a file using apache poi) was with an xml error message actually embedded within the xls binary... in that particular case, the error had been triggered when the file was being checked out from an SVN repository, and the xml was a dump of the SVN error. If you open the file in a text editor, can you locate any "human-readable" xml within the file? If so, that might give some clues.

Feb 11, 2011 at 4:28 PM

I have attached the orig file for you to look at. I can see readable text, can't easily

tell if its xml or not.

thanks,

walter


----- Original Message -----
From: "MarkBaker" <notifications@codeplex.com>
To: [email removed]
Sent: Thursday, February 10, 2011 7:36:09 PM
Subject: Re: phpexcel output to java excel reader issue [PHPExcel:245451]

From: MarkBaker

If the file is BIFF8, then the error you're getting is very confusing... BIFF8 is a pure binary format, that bears about as much resemblence to XML as a ziggurat does to an aardvark.

The only similar case I've come across was with an xml error message actually embedded within the xls binary... in that particular case, the error had been triggered when the file was being checked out from an SVN repository, and the xml was a dump of the SVN error. If you open the file in a text editor, can you locate any "human-readable" xml within the file? If so, that might give some clues.

Coordinator
Feb 11, 2011 at 10:49 PM

@walter

Can you send the file to:

define('XX','.');
define('YY','@');
echo strrev(str_rot13('xh').XX.str_rot13('bp').XX.str_rot13('abzrq').XX.str_rot13('rtany').YY.str_rot13('xenz'));
Feb 12, 2011 at 12:58 PM
hi Mark;

do you want me to put the input file thru this routine, or the output file? should it be done line by line?
i am curious as to why?

thanks,
walter

Sent from my iPad

On Feb 11, 2011, at 5:49 PM, "MarkBaker" <notifications@codeplex.com> wrote:

From: MarkBaker

@walter

Can you send the file to:

define('XX','.');
define('YY','@');
echo strrev(str_rot13('xh').XX.str_rot13('bp').XX.str_rot13('abzrq').XX.str_rot13('rtany').YY.str_rot13('xenz'));
Coordinator
Feb 12, 2011 at 1:04 PM

That little routine will (if run as a PHP script) simply echo my e-mail address (I'm not going to write it in plain text... I don't want automated email address harvesters to pick it up so I get even more spam in my mailbox).

Feb 17, 2011 at 4:09 PM
TWFyaywgSSBzZW50IHRoZSAyIGZpbGVzIHRvIHlvdXIgcHJpdmF0ZSBlbWFpbCBhZGRyZXNzLiAg
VGh4IHdhbHRlcg0KLS0tLS0tT3JpZ2luYWwgTWVzc2FnZS0tLS0tLQ0KRnJvbTogTWFya0Jha2Vy
DQpUbzogd2FrZWx0QGNvbWNhc3QubmV0DQpSZXBseVRvOiBQSFBFeGNlbEBkaXNjdXNzaW9ucy5j
b2RlcGxleC5jb20NClN1YmplY3Q6IFJlOiBwaHBleGNlbCBvdXRwdXQgdG8gamF2YSBleGNlbCBy
ZWFkZXIgaXNzdWUgW1BIUEV4Y2VsOjI0NTQ1MV0NClNlbnQ6IEZlYiAxMCwgMjAxMSA3OjM2IFBN
DQoNCkZyb206IE1hcmtCYWtlcg0KSWYgdGhlIGZpbGUgaXMgQklGRjgsIHRoZW4gdGhlIGVycm9y
IHlvdSdyZSBnZXR0aW5nIGlzIHZlcnkgY29uZnVzaW5nLi4uIEJJRkY4IGlzIGEgcHVyZSBiaW5h
cnkgZm9ybWF0LCB0aGF0IGJlYXJzIGFib3V0IGFzIG11Y2ggcmVzZW1ibGVuY2UgdG8gWE1MIGFz
IGEgemlnZ3VyYXQgZG9lcyB0byBhbiBhYXJkdmFyay4NClRoZSBvbmx5IHNpbWlsYXIgY2FzZSBJ
J3ZlIGNvbWUgYWNyb3NzIHdhcyB3aXRoIGFuIHhtbCBlcnJvciBtZXNzYWdlIGFjdHVhbGx5IGVt
YmVkZGVkIHdpdGhpbiB0aGUgeGxzIGJpbmFyeS4uLiBpbiB0aGF0IHBhcnRpY3VsYXIgY2FzZSwg
dGhlIGVycm9yIGhhZCBiZWVuIHRyaWdnZXJlZCB3aGVuIHRoZSBmaWxlIHdhcyBiZWluZyBjaGVj
a2VkIG91dCBmcm9tIGFuIFNWTiByZXBvc2l0b3J5LCBhbmQgdGhlIHhtbCB3YXMgYSBkdW1wIG9m
IHRoZSBTVk4gZXJyb3IuIElmIHlvdSBvcGVuIHRoZSBmaWxlIGluIGEgdGV4dCBlZGl0b3IsIGNh
biB5b3UgbG9jYXRlIGFueSAiaHVtYW4tcmVhZGFibGUiIHhtbCB3aXRoaW4gdGhlIGZpbGU/IElm
IHNvLCB0aGF0IG1pZ2h0IGdpdmUgc29tZSBjbHVlcy4NClJlYWQgdGhlIGZ1bGwgZGlzY3Vzc2lv
biBvbmxpbmUuDQpUbyBhZGQgYSBwb3N0IHRvIHRoaXMgZGlzY3Vzc2lvbiwgcmVwbHkgdG8gdGhp
cyBlbWFpbCAoUEhQRXhjZWxAZGlzY3Vzc2lvbnMuY29kZXBsZXguY29tKQ0KVG8gc3RhcnQgYSBu
ZXcgZGlzY3Vzc2lvbiBmb3IgdGhpcyBwcm9qZWN0LCBlbWFpbCBQSFBFeGNlbEBkaXNjdXNzaW9u
cy5jb2RlcGxleC5jb20NCllvdSBhcmUgcmVjZWl2aW5nIHRoaXMgZW1haWwgYmVjYXVzZSB5b3Ug
c3Vic2NyaWJlZCB0byB0aGlzIGRpc2N1c3Npb24gb24gQ29kZVBsZXguIFlvdSBjYW4gdW5zdWJz
Y3JpYmUgb24gQ29kZVBsZXguY29tLg0KUGxlYXNlIG5vdGU6IEltYWdlcyBhbmQgYXR0YWNobWVu
dHMgd2lsbCBiZSByZW1vdmVkIGZyb20gZW1haWxzLiBBbnkgcG9zdHMgdG8gdGhpcyBkaXNjdXNz
aW9uIHdpbGwgYWxzbyBiZSBhdmFpbGFibGUgb25saW5lIGF0IENvZGVQbGV4LmNvbQ0KDQpTZW50
IHZpYSBCbGFja0JlcnJ5IGJ5IEFUJlQ=
Coordinator
Feb 18, 2011 at 12:01 AM

I can't see any problem with your file: there's certainly no XML messages embedded in it; so I don't understand why POI should be complaining about invalid XML. Is it possible to debug this from the java side?

Feb 18, 2011 at 4:18 PM

Hi Mark;

unfortunately, I have NO exposure to the actual java code.

I have done a little homework on the POI library.. POI supports BIFF8 and OOXML.

Perhaps I need to use a different PhpExcel writer so that my output file would be either

of these. Does PhpExcel have a writer that puts out either of these formats ?? Perhaps

the java implementation only supports OOXML ??

thanks,

walter


----- Original Message -----
From: "MarkBaker" <notifications@codeplex.com>
To: [email removed]
Sent: Thursday, February 17, 2011 7:01:28 PM
Subject: Re: phpexcel output to java excel reader issue [PHPExcel:245451]

From: MarkBaker

I can't see any problem with your file: there's certainly no XML messages embedded in it; so I don't understand why POI should be complaining about invalid XML. Is it possible to debug this from the java side?

Coordinator
Feb 18, 2011 at 4:31 PM

The Excel5 Writer generates a BIFF8 file by default (it could be configured to write BIFF5, but this has been disabled for some time). The Excel2007 Writer generates OOXML.

Feb 21, 2011 at 8:23 PM

Hi Mark;

I've decided to use the POI example code to read in the original xls final, and then

to read in the xls output file of PhpExcel. I have this up and running in eclipse.

The original file reads in fine...All rows and field values are printed out correctly.

No warnings, no Errors.

I then use the PHPexcel BIFF output file as a source file to the same java Excel reader.

I do indeed get errors that cause the reader to throw exceptions. The warning message

looks as follows:

Warning, incorrectly terminated empty data blocks in POIFS block listing (should end at -2, ended at 0)

java.lang.NullPointerException

at org.apache.poi.poifs.filesystem.DocumentInputStream.readUShort(DocumentInputStream.java:301)

at org.apache.poi.hssf.record.RecordInputStream$SimpleHeaderInput.readRecordSID(RecordInputStream.java:98)

at org.apache.poi.hssf.record.RecordInputStream.readNextSid(RecordInputStream.java:177)

at org.apache.poi.hssf.record.RecordInputStream.<init>(RecordInputStream.java:115)

at org.apache.poi.hssf.record.RecordInputStream.<init>(RecordInputStream.java:103)

at org.apache.poi.hssf.record.RecordFactoryInputStream.<init>(RecordFactoryInputStream.java:165)

at org.apache.poi.hssf.record.RecordFactory.createRecords(RecordFactory.java:439)

at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:263)

at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:188)

at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:305)

at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:286)

at excel.ExcelReadWrite.readFile(ExcelReadWrite.java:33)

at excel.ExcelReadWrite.main(ExcelReadWrite.java:140)

At this point, we can debug in the Java world since I have the code up and running.

However, I am not versed in the processing of Excel sheets, perhaps the warning

message is meaningful to you ??

-Walter


----- Original Message -----
From: "MarkBaker" <notifications@codeplex.com>
To: [email removed]
Sent: Friday, February 18, 2011 11:31:58 AM
Subject: Re: phpexcel output to java excel reader issue [PHPExcel:245451]

From: MarkBaker

The Excel5 Writer generates a BIFF8 file by default (it could be configured to write BIFF5, but this has been disabled for some time). The Excel2007 Writer generates OOXML.

Coordinator
Feb 21, 2011 at 10:11 PM

I'm going to have to accept that this has defeated me. I don't know what those messages mean.

Without spending time wading through the java poi code to find out how that works, I don't believe I can do much about this... I just don't have that time.

Feb 21, 2011 at 10:36 PM
no problem...i wish there was more i could do to help u get to the bottom of this. if there
is legwork you'd like me to perform, please let me know.

i really appreciate your efforts on this. now that i have the java poi library up and running, i can use it to make the spreadsheet mods. i've modified the example code to do what i need and things seem to look good.

thanks
walter

Sent from my iPad

On Feb 21, 2011, at 5:11 PM, "MarkBaker" <notifications@codeplex.com> wrote:

From: MarkBaker

I'm going to have to accept that this has defeated me. I don't know what those messages mean.

Without spending time wading through the java poi code to find out how that works, I don't believe I can do much about this... I just don't have that time.

Mar 23, 2011 at 12:47 PM

I found the same issue with a small sample program - The strange thing is that if I open the output file in MS Excel and save it that the file size doubles form 7 to 15 KB.

Also the latest java library has the issue - but now the error message is different.
Warning, incorrectly terminated empty data blocks in POIFS block listing (should end at -2, ended at 0) - may be this helps

Here the code if anybody wants to look at it:

<?php

require_once('PHPExcel.php');
require_once('PHPExcel/IOFactory.php');

// see also
// http://phpexcel.codeplex.com/discussions/209438
// http://phpexcel.codeplex.com/discussions/245451
// http://phpexcel.codeplex.com/discussions/235968

$objPHPExcel  = new PHPExcel();

$ExcelSheet = $objPHPExcel->getActiveSheet();

$ExcelSheet->setTitle("2002");

$ExcelSheet->setCellValue("A1", "Code");
$ExcelSheet->setCellValue("B1", "FooBar");

$objPHPExcel->setActiveSheetIndex(0);   

$objWriter = PHPExcel_IOFactory::createWriter($objPHPExcel, 'Excel5');  

$objWriter->save("Output.xls");

?>

Coordinator
Mar 23, 2011 at 1:39 PM
On 23/03/2011 11:48, pircher wrote:
I found the same issue with a small sample program - The strange thing is that if I open the output file in MS Excel and save it that the file size doubles form 7 to 15 KB.
The filesize change shouldn't be a worry: PHPExcel uses always certain optimisation techniques when writing the file, which aren't always used by Excel itself. These don't cause any problem in and of themselves. There are also certain blocks of data written by Excel itself that aren't supported by PHPExcel. Again, these should not cause any issues.

Also the latest java library has the issue - but now the error message is different.
Warning, incorrectly terminated empty data blocks in POIFS block listing (should end at -2, ended at 0) - may be this helps


I believe that this may be related to the following issue recorded on the MS support site:
http://support.microsoft.com/kb/2411912
which has been raised as Issue 15508 (http://phpexcel.codeplex.com/workitem/15508) in the PHPExcel Issue Tracker.

Basically, the workbook data is written as a series of numbered "blocks" of data, with an index that lists each block. Most Excel library software (and most versions of Excel itself) simply reads the list of blocks from that index, then reads each listed block in turn; but Apache POI (and MS Excel2007+ on Windows 7) do an additional validation check on the actual block numbers.

It looks as though Apache POI has updated its error handling for this... at least the error is more meaningful now.

I've traced this back to the underlying PHP_OLE library that actually writes the OLE directory/filestream structure for the Excel workbook substreams, but haven't yet managed to isolate exactly where in the code the fault lies, or how to fix it. However, this part of the PHPExcel xls Writer does need some refactoring(1) so that it can support multiple filestreams (e.g. the workbook properties streams, chart streams, etc)... the original PHP_OLE library only supported a single filestream, which is used for the actual workbook data.
I may find a solution to the block numbering soon: but if not, it will be fixed when I do that necessary rewrite of PHP_OLE.

---
Mark Baker

(1) For "some refactoring", read "a total rewrite from scratch".