setReadDataOnly and getMergeCells

Topics: Developer Forum, User Forum
Aug 10, 2012 at 11:42 AM

Hi all,

it seems to me that when i set the readDataOnly for my reader, i'm no longer able to get the merged cells array.

if i get merged cells with read data only to FALSE, i get the full array

if i get merged cells with read data only to TRUE, i get an empty array

Now, i'm trying to compare 2 excel files for content (i'm not caring of styles), but i'd like to "bypass" merged cells considering the same content for all the cells merged.

Is there a way to have ReadDataOnly AND the getMErgeCells method to work?

thanks a lot

bye

Coordinator
Aug 12, 2012 at 8:37 PM

You're quite correct - setting readDataOnly means read only the data in the cells.... it does not read read any additional information such as styling, merged cells, conditional formatting, etc.

I have actually opened an issue on github for discussion of this very topic, but so far nobody has expressed any opinion on the matter, specifically referencing merged cells as an example.

Aug 16, 2012 at 1:00 PM

Thank you mark.

in my opinion, merged cells are something related to "structure" and not to "style", that's why i was expecting to read them even in "read data only" mode.

Btw, my workaround at the moment is: walking the excel file with readdataonly FALSE and build an array of arrays (merged cells for each worksheet)

then, walking the same excel with readdataonly TRUE and finally read data using the merged cells array found before..

not the best efficiency, but at least it works :)

thank you again

Coordinator
Aug 16, 2012 at 4:54 PM

My thinking is largely in agreement: merging is structure rather than style, so should be read regardless... but I wish a few people would express an opinion on the matter before I'm forced to take a dictatorial approach and then get all the complaints from people because the behaviour has changed.

Aug 16, 2012 at 8:34 PM
MarkBaker wrote:

My thinking is largely in agreement: merging is structure rather than style, so should be read regardless... but I wish a few people would express an opinion on the matter before I'm forced to take a dictatorial approach and then get all the complaints from people because the behaviour has changed.


Yes Mark, merging is structure (later in this post another request :P)

But the date format, if it's up to me, is NOT structure.

We all know that a date in excel is saved as a number. and if i'm reading "data only" it's right to get that number.

And 23% if i fetch data only is 0,23, nothing else.

or maybe it's the case of setting up another level.

Now we have level "read everything" and level "read only data". It should be nice to have a "intermediate" level reading also cell formatting (and i agree, nothing regarding style)

but, i underline it, merged cells are structure and should be in the level "only data"! it's not the same level of cell (content) formatting!

 

Regarding merged cells, what about inserting a method for cells reading to get the "merged value"?  i mean, if my cell merged is A1:A5, fetching cell A5 should return A1 value.. Now i've my function to do this, but why don't integrate this functionality in phpexcel? it should be useful for everyone..

bye and thanks

Coordinator
Aug 16, 2012 at 9:32 PM

The issue with date is that I get so many complaints from people who use readDataOnly(TRUE) and then can't identify dates... at least several a week (nearly as many as "I'm using getValue() on a cell that contains '16-Aug-2012' but it returns 41137, is this a bug?". Perhaps multiple levels is the answer, especially now that I've added includeCharts() and am looking at includeAutoFilterCriteria(). But any such change will need a bit of time to warn people of pending changes.

 

Merged cells are more of a problem if getValue() returns the merged cell value for every cell in the merge range. If A1:A5 is merged and contains 2, then =SUM(A1:A5) will return 2 at present... if each cell returned 2, then =SUM(A1:A5) would return 10. I know I'm planning a complete rewrite of the calc engine, and could code it to handle this difference, but it would entail additional testing for every cell referenced in a formula to see if it was in a merge range, which is additional overhead.

Aug 17, 2012 at 11:44 AM
MarkBaker wrote:

The issue with date is that I get so many complaints from people who use readDataOnly(TRUE) and then can't identify dates... at least several a week (nearly as many as "I'm using getValue() on a cell that contains '16-Aug-2012' but it returns 41137, is this a bug?". Perhaps multiple levels is the answer, especially now that I've added includeCharts() and am looking at includeAutoFilterCriteria(). But any such change will need a bit of time to warn people of pending changes.

 

Merged cells are more of a problem if getValue() returns the merged cell value for every cell in the merge range. If A1:A5 is merged and contains 2, then =SUM(A1:A5) will return 2 at present... if each cell returned 2, then =SUM(A1:A5) would return 10. I know I'm planning a complete rewrite of the calc engine, and could code it to handle this difference, but it would entail additional testing for every cell referenced in a formula to see if it was in a merge range, which is additional overhead.

I understand what you say, and a RTFM is always a good thing :), but i assume that 95 times on 100 you are reading a file that YOU KNOW, column by column, what is containing. So you know that column A is reporting a date, and you know how to manage it.

If you really do not know how the file is (so what are you reading?) you can avoid the "read data only", isn't it?

 

Regarding merged cells, i was thinking more to a "GetMergedValue" than to overwrite the GetValue. You are, as always, right on the SUM case, i didn't think to it..