XLS file size

Topics: User Forum
Nov 29, 2008 at 8:06 AM
First thanks for the awesome library!!!

A couple of other threads have touch on this issue but I have not been able to find a definitive answer to this question.  When I create XLS files they are coming out 2-10 times larger than they should.  To reproduce this:
- run the 14excel5.php test
- open the output file in Excel (2003 or 2007)
- slightly modify the spreadsheet
- save the changes
- note that the new file is about half the size of the original.

Can anyone offer any suggestions as to why this may be happening and how I may be able to solve this problem?

Thanks and regards

Nov 29, 2008 at 2:23 PM
There may be no definitive answer what you can do about this. Library may need improvements, but the "Save" step in MS Office Excel could be a solution for some users.

1. Images

The main file size problem with output from PHPExcel_Writer_Excel5 (based on PEAR Spreadsheet_Excel_Writer) is when you have images. Images are always written as uncompressed bitmaps just like in the old BIFF5 format, that is Excel 5.0 - Excel 95. These occupy tons of space.

In the BIFF8 format (Excel 97 - Excel 2003), there is more flexibility, you can store images as e.g. JPEG, PNG which are compressed formats.

The solution to this would be to alter PHPExcel_Writer_Excel5 and store images the way the are stored in BIFF8. This would allow images to be stored as e.g. PNG resulting in much lower file size.

Moreover, it may be pure luck that we even see the images when we open 14excel5.xls in MS Office Excel. It is a BIFF8 file with BIFF5 image records. I wouldn't call it a valid file. There is also the problem that this type of images are not supported by OpenOffice.org 3.

There is a work item that addresses these problems:

What complicates solving this is the poor documentation on the BIFF8 file format.

2. Strings
This may also have to do with file size being large in output produced by PHPExcel_Writer_Excel5. In Excel, strings are stored as UTF-16, but with a compression option for strings where all the high order bytes are 0. That would for example be strings with only English characters. This would allow for 50 percent reduction in space occupied by the strings.

I believe PHPExcel_Writer_Excel5 never uses this compression option.

3. "Save" in MS Office Excel after generation
For the above reasons, if your Excel files are intended for long term use, I recommend, after generating your spreadsheet using PHPExcel_Writer_Excel5, that you open it in MS Office Excel (preferably, version 2003) and do a plain save of the file. That is if you have the chance, of course.

There are also other problems that will be solved, e.g. problems with collapsed row height when opening in OpenOffice.org.

This will also safe-guard you against lack of forward compatibility with future applications for handling xls files. If you look at the past, files in circulation that were generated by PEAR Spreadsheet_Excel_Writer would open fine in MS Office Excel 2003, but would no longer open correctly in MS Office Excel 2003 SP3 and MS Office Excel 2007.