Reader returning unexpected results?

Topics: User Forum
Sep 5, 2012 at 5:47 PM
Edited Sep 5, 2012 at 5:49 PM

My code imports user provided spreadsheets that may well be in a particular format, but the data provided in some cells can "appear" to be numeric, text, or date/time.  In particular, firmware version column can have a variety of values.  And I cannot know ahead of time what that value may look like.  That is, I cannot enforce the type that the users will format the cell.  Although I have tried formatting the cells as TEXT only to still get the same problem.

Although the type is generally text, reader can interpret the results differently.

Currently, if the cell value is 00.01.02 then reader will tell me it's a date 2000-01-02.

How can I ensure that I get the value exactly as the user provided?

I have tried setReadDataOnly() to both TRUE and FALSE with no difference.

I have tried getValue() and getFormattedValue() which gives me either the date format or the numerical data value.

Any direction would be appreciated.

Sep 5, 2012 at 6:45 PM
Edited Sep 5, 2012 at 7:00 PM

Couple of notes:

setReadDataOnly(): Tells the reader to skip empty cells (only cells that contain data are read).

getValue(): Reads the raw data as stored by the spreadsheet (example: dates are stored as an integer by the spreadsheet but are displayed as defined by the format code).

getFormattedValue(): Returns the data after applying the spreadsheet format code to it.

Are you trying to normalize the data supplied by the user for processing?

I've written code for my company to specifically do this kind of thing.  My solution is to take a snapshot of the data (first 10 rows of the first worksheet) and display it to the user in spreadsheet fashion (with row and columns headers) and show a form where they select the column that contains the data you are trying to collect.

For example, I would have form field labeled 'Birthdate' with a drop down list beside it which lists all the column letters displayed in the sample data. The customer selects the column letter of the column that contains the birth dates.

This way the customer tells you which column contains each specific piece of data you are trying to collect.

From there, you can write your code to handle specific exceptions of formatting differences.

For example, sometimes the birth dates in the spreadsheet only contain month and year (text field usually) and others may only contain the day values (text or numeric field usually) for the current month. I wrote my code to look at the sample data and if there is a single integer in that column I then provide a Month selector (January - December) that the customer uses to specify with month those days belong to (for our application the Birth Year isn't necessary so I fill that in with current year if it is missing).

Side note: I also used an internal PHP Excel function to determine if the column is storing the date as an integer internally and apply a format code to it to normalize those dates to mm/dd/yyyy automatically.

Dates are the most work for normalizing, but I've been able to code things so the application is pretty robust for handling differently formatted types of data.

If you have more specific questions on how I handle things just ask.

- Christopher Mullins

Sep 5, 2012 at 7:39 PM

Thanks for the reply.

I'm not really "normalizing".  But the value is compared against a database for validity.  There are way too many cells that are being compared, and way too many possible values to load up a sheet with validation / drop down lists.

It seems that PHPExcel is seeing the value '00.02.01' (the sample version # in question from the source sheet) as a date.  GetValue() and GetFormattedValue() return '2000-02-01'.  Unless I change setReadDataOnly() = TRUE in which case I get the numeric data value for '2000-02-01'.

If I set the source spreadsheet column as TEXT and save the file (before importing it into my app), it doesn't help.

Since different vendors make their software version numbers in a wide variety of formats, I cannot really enforce the values entered in this column.

So I guess really what I am after is how to get PHPExcel to give me the value '00.02.00' instead of PHPExcel converting it into a date first?

 

- John Mullan

Sep 5, 2012 at 8:03 PM
Edited Sep 5, 2012 at 10:19 PM

I think PHP Excel is returning the value according to its Excel datatype (as stored in the Excel file itself).

If that is the case then this piece of code might help you find a solution:

if (PHPExcel_Shared_Date::isDateTime($cell)) {
  $data = PHPExcel_Style_NumberFormat::toFormattedString($cell->getValue(), "M/D/YYYY");
}

You can check to find out if the cell is marked as a date field (not sure how to unset this).

It's at least a starting point.

- Christopher Mullins

Addendum: I amended my statement above to what I was meaning which Mark stated much more clearly than I.

Coordinator
Sep 5, 2012 at 10:01 PM
Edited Sep 5, 2012 at 10:08 PM

PHPExcel will not automatically convert any cell content into a date unless you tell it to do so. If '00.02.01' is a string cell content in the workbook you are loading, then it will be stored in PHPExcel as a string with a value of '00.02.01'. It doesn't try to change anything, but stores the value according to its Excel datatype (as stored in the Excel file itself).

If PHPExcel is reading the value as a date, then something in the Excel file is explicitly telling it that the value is a date. I'm not aware of any exceptions to this with files that are genuinely spreadsheet files rather than CSV or equivalent (and even then, a value would only be converted to a date if you were applying the advanced value binder.

Note that if this was the case, then getValue() would be returning an Excel date/timestamp (numeric value) and getFormattedValue() would be returning a human-readable date. If it is returning the same formatted value for both getValue() and getFormattedValue() calls, then it is almost certain a string containing that value in the Excel workbook.

 

Can you please provide a workbook that demonstrates this issue

Sep 6, 2012 at 12:15 PM
Edited Sep 6, 2012 at 12:15 PM
schir1964 wrote:

I think PHP Excel is returning the value according to its Excel datatype (as stored in the Excel file itself).

If that is the case then this piece of code might help you find a solution:

if (PHPExcel_Shared_Date::isDateTime($cell)) {
  $data = PHPExcel_Style_NumberFormat::toFormattedString($cell->getValue(), "M/D/YYYY");
}

You can check to find out if the cell is marked as a date field (not sure how to unset this).

It's at least a starting point.

- Christopher Mullins

Addendum: I amended my statement above to what I was meaning which Mark stated much more clearly than I.

Actually I have tried ensuring the data type was set to TEXT.  It was defined as GENERAL before I did that.  Still same result.

Sep 6, 2012 at 12:17 PM
Edited Sep 6, 2012 at 12:26 PM
MarkBaker wrote:

PHPExcel will not automatically convert any cell content into a date unless you tell it to do so. If '00.02.01' is a string cell content in the workbook you are loading, then it will be stored in PHPExcel as a string with a value of '00.02.01'. It doesn't try to change anything, but stores the value according to its Excel datatype (as stored in the Excel file itself).

If PHPExcel is reading the value as a date, then something in the Excel file is explicitly telling it that the value is a date. I'm not aware of any exceptions to this with files that are genuinely spreadsheet files rather than CSV or equivalent (and even then, a value would only be converted to a date if you were applying the advanced value binder.

Note that if this was the case, then getValue() would be returning an Excel date/timestamp (numeric value) and getFormattedValue() would be returning a human-readable date. If it is returning the same formatted value for both getValue() and getFormattedValue() calls, then it is almost certain a string containing that value in the Excel workbook.

 

Can you please provide a workbook that demonstrates this issue

 

I will attempt to upload a sample of the workbook.  Can someone tell me how to?

Sep 6, 2012 at 12:36 PM
jmullan99 wrote:
schir1964 wrote:
if (PHPExcel_Shared_Date::isDateTime($cell)) { $data = PHPExcel_Style_NumberFormat::toFormattedString($cell->getValue(), "M/D/YYYY"); }

This indeed gives me $data = '2/1/2000'

The source spreadsheet cell(s) do begin with a single quote as well.

Still don't know how to attach a file here.

Coordinator
Sep 6, 2012 at 1:15 PM

There is a special "issue" here specifically for file uploads from discussion threads - scroll to the bottom of the page and you will find an upload option

Sep 6, 2012 at 1:40 PM

Okay, it is uploaded.  Sorry for the generic file name.  I called it sample.xls

Cheers.

Sep 6, 2012 at 3:45 PM

I have more information that may help.  Probably to show how "green" I am working with PHPExcel.

First, I read in the source workbook/spreadsheet and basically loop through the cells in a particular order and copy them into a new workbook, send the new workbook out to the user.

So what I did this morning is to also add the source spreadsheet to the output workbook via addExternalSheet().  When I open the resultant workbook the source sheet shows the original values ('0.02.01') but my processed sheet shows '2000-02-01'.

Obviously I am missing something here.  But I don't know what.  Is my newly created sheet "converting" things for me?  AdvancedValueBinder?

Cheers,

John

Coordinator
Sep 6, 2012 at 5:44 PM

I'll run some tests tonight against you sample.xls and try to identify exactly what is happening.

AdvancedValueBinder shouldn't be affecting this, especially if you haven't enabled it

Sep 7, 2012 at 12:50 AM

Well, actually, I did have it enabled.  I "disabled" it and now things are fine.

I originally activated it for a reason quite a while ago and can't remember why.  I suppose I'll find out at some point.

I am now rewriting my routine to copy the input worksheet into the workbook I will send out to the user, then analyze the cells in-place.  This will ultimately run faster and save on memory.

However, it would still be good to know how to beat the issue.

Cheers,

John