Unable to deal with big files

Topics: Developer Forum, User Forum
Nov 27, 2011 at 11:23 AM

Hi all,

I've read a ton of documentation and topics, but I'm still unable to deal with my huge files (we are talking multiple workbook with twenty-something sheets, 15.000 rows x 50 columns each).

I already tried both PHPExcel_Settings::setCacheStorageMethod(PHPExcel_CachedObjectStorageFactory::cache_to_discISAM) and ini_set('memory_limit', '3096M'); but it still isn't enough), and reading just a sheet per time is not really faseable, since I need to merge multiple of them.

 

Let's start easy: I need to delete a specific (by name, not by index) sheet: how can I do that without loading the whole workbook first?

Then: I need to merge some of these sheets and write the merge back, without loosing the other sheets: how to?

 

I've already tried to read one line per time, but without a stream writer...

 

Thanks!

Nov 29, 2011 at 9:43 PM
Edited Nov 29, 2011 at 9:44 PM

Hi Zane,

you wrote you need to delete a specific worksheet...
...why not only load the worksheets you need with objReader->setLoadSheetsOnly(...) ? Did you try that?

And...do you need to read all 50 columns? - If not, load and the read only the columns you need using a filter and getCellByColumnAndRow(2, 23).
To get all the columns of each worksheet each in an array, take a look at this thread: http://phpexcel.codeplex.com/discussions/259971

Unfortunately these are all the hints I may give you, also still experimenting with large files :-)

Yours Henri Schomäcker

Dec 3, 2011 at 11:43 AM

I am also having problems with a large Excel 2007 file.  The file has over 200,000 rows, and I get an error while loading it:

"Error loading file: Invalid cell coordinate A100000"

It didn't even get halfway to the bottom.  What can I do?  BTW, the file is narrow, with only 6 columns.

Thanks,

Mike

Dec 6, 2011 at 3:13 PM

BTW, when I split the sheet into multiple tabs, it was able to load the spreadsheet.  I just shouldn't have to do that.

Can anyone point me to where it physically loads the file into memory?  I can't find it in the "load()" function itself.

Thanks. 

Coordinator
Dec 6, 2011 at 4:49 PM
psyclo wrote:

Can anyone point me to where it physically loads the file into memory?  I can't find it in the "load()" function itself.

An Excel 2007 file is a zipped archive that contains a whole series of individual XML files. The load() method extracts each file from the ziparchive in turn, calling the _getFromZipArchive() for each constituent file that it needs. All the processing of the files is handled from within _load().... it is one large method (and not particularly good OO coding). SImpleXML is used for parsing these files: someday, I'll rewrite to use the XMLReader which will be more memory efficient for the load.

Dec 6, 2011 at 7:20 PM

Mark,

Thanks for responding so quickly.  I appreciate that a bunch.

Is there anything I can do right now to get PHPExcel to load a single sheet with more than 200,000 rows?

I looked for the _load() function but couldn't find it in the PHPExcel directory structure.  I did find the public load(), but nothing else.

Thanks again,

Mike

Coordinator
Dec 6, 2011 at 9:32 PM
psyclo wrote:

Is there anything I can do right now to get PHPExcel to load a single sheet with more than 200,000 rows?

I looked for the _load() function but couldn't find it in the PHPExcel directory structure.  I did find the public load(), but nothing else.

The load() method for an Excel 2007 workbook is in the Classes/PHPExcel/Reader/Excel2007.php file.

psyclo wrote:

Is there anything I can do right now to get PHPExcel to load a single sheet with more than 200,000 rows?

I looked for the _load() function but couldn't find it in the PHPExcel directory structure. I did find the public load(), but nothing else.

 One option is to use cell caching, which reduces the memory usage but at a cost in speed. There's a section in the documentation that describes the cache methods available - this thread gives some indication of the relative benefits/costs of the different caching methods. And this thread describes some of the other techniques available to reduce memory usage.

Dec 7, 2011 at 2:07 PM

Mark,

Thanks again for your help.

The problem is not about caching, and does not appear to be a memory problem, since I'm able to load the entire spreadsheet when I split up the rows into multiple tabs.  The maximum of 100,000 rows per tab appears to be an artificial constraint, and I can't find anyplace in the code where it sets that limit.  OK, so I found the public load() function in the Excel2007.php file, and I see on line 530 where it loads the sheet from the file using _getFromZipArchive().  I'm not sure I understand what ZipArchive is, or where getFromName() is defined to be used on line 274.

Thanks.

Mike

Dec 7, 2011 at 2:16 PM

It is not a limit of PHPExcel, I've already created sheets bigger than that. However, writing excel files in excess of 100K rows, takes a lot of cpu and memory (i've seen peaks of 24GB allocated). Are you sure there's no error, that's just not being reported? On the other hand, I haven't tried to read files that big.

 

Tip: when looking at the code, use a decent IDE; that way you can just click a function to find its definition.

Dec 7, 2011 at 7:09 PM

As I've said a few times, this is about loading/reading an existing Excel spreadsheet, not writing anything.  The error I'm getting is not about running out of memory.  I've seen that one and this isn't it.  I'm getting an error during the load about "Invalid cell coordinate A100000"which indicates to me that the load is stopping at row 99,999 for some reason.

Dec 7, 2011 at 7:25 PM

I found this entry on this site, but I don't know if it has any relevance to your issue directly.

http://phpexcel.codeplex.com/workitem/15029

- Christopher Mullins

Dec 7, 2011 at 8:50 PM

schir, That was close, but the referenced line of code has already been fixed in my version of PHPExcel, so it isn't limited to 5 digits.  I can't find any references to "[1-5]" in the code.  <sigh>

Thanks anyway.

Mike