Read cells as String

Topics: Developer Forum, User Forum
Sep 17, 2009 at 9:10 AM

Hi There.

I have to do a simple question: I would like to read each single cell as a string without considering the specific cell properties.

 

I mean...if i have a 'Percentage typed' cell (1,85%) I would like to read the cell as "1,85%". At the moment when I get a cell content of type percentage I get a value encoded in this way "0,0185".

The same thing happens when I try to get date typed cell values.

I read the documentation but I didn't find any help. I read that is possible to encode a cell from string to a specific value when you write a file...but not the opposite thing.

I would appreciate any helps

 

Thanks in advance

 

R.

Developer
Sep 17, 2009 at 9:43 AM

Try something like this:

$objWorksheet = $objPHPExcel->getActiveSheet();
$cell = $objWorksheet->getCell('A1');

$cellData = PHPExcel_Style_NumberFormat::toFormattedString(
    $cell->getCalculatedValue(),
    $objPHPExcel->getCellXfByIndex( $cell->getXfIndex() )->getNumberFormat()->getFormatCode()
);

var_dump($cellData);

Agreed, we should try to make this simpler. I think there will be a simpler way in the future.

Sep 17, 2009 at 10:03 AM

I'm sorry but I forgot some importants information:

- I'm using PHPExcel_Reader_Excel5();

- I'm using this library inside a symfony plugin.

So the problem I have is that the function PHPExcel::getCellXfByIndex() is not defined for my object...

 

Thank you again

 

R.

Developer
Sep 17, 2009 at 10:22 AM

How about this?

$objWorksheet = $objPHPExcel->getActiveSheet();
$cell = $objWorksheet->getCell('A1');

$cellData = PHPExcel_Style_NumberFormat::toFormattedString(
$cell->getCalculatedValue(),
$objPHPExcel->getStyle('A1')->getNumberFormat()->getFormatCode()
);

var_dump($cellData);
Coordinator
Sep 17, 2009 at 11:16 AM

Erik

I know it's a slight memory overhead for each cell, but it might be worth adding a getFormattedValue() method to the PHPExcel_Cell class. I notice that Excel itself actually holds this in the according to the Excel object map

Developer
Sep 17, 2009 at 12:05 PM

@Mark

Ok, I better see if I can get my head into that Excel 2007 developer reference. What I am wondering about with $_formattedValue is how this will deal with color information in the number format, e.g. [Red]. If $_formattedValue should contain all information about styling then color information would have to be contained in it, I guess?

In connection with another work item (April 2009) we briefly discussed the possibility introducing getFormattedValue() vs. modifying the advanced value binder.

http://phpexcel.codeplex.com/WorkItem/View.aspx?WorkItemId=9830

We have had recent discussion "How to get datatype for calculated value?" somewhat related to this.

http://phpexcel.codeplex.com/WorkItem/View.aspx?WorkItemId=10607

Since then I have been thinking that it is perhaps best to let the value binder control in addition to setValue(), also getValue(), and even getDataType(). That way it would be possible to let the default value binder stay fully compatible with the way the library currently works. AdvancedValueBinder.php could then be modified so that getValue() returns what our current toFormattedString() returns. So when people ask how to get the formatted value we would just answer: "You probably want to use AdvancedValueBinder".

Once AdvancedValueBinder.php becomes the default value binder in PHPExcel we would have a library working more intuitively. I would really be happy if we could find some kind of solution to these issues.

Coordinator
Sep 17, 2009 at 12:23 PM
Edited Sep 17, 2009 at 12:26 PM

@Erik

I'm not sure we actually need to store the $_formattedValue, simply provide a simple method that returns the value as a formatted value

I'd certainly consider it something that could be tied in with the changes we're considering for getValue(), getFormula(), isFormula()/isCalculatedValue(), getFormattedValue().

The Excel formatted value doesn't seem to hold colour information, simply to hold numbers with the correct number of decimals, comma separators, fractions and date masking. It might be possible to pass an extra argument to the getFormattedValue() call identifying whether it should return an HTML span with colour information as well.

Developer
Sep 17, 2009 at 1:22 PM

Ok, here is something. There are many connected issues. What do you say about this solution?

Cell properties

 

PHPExcel 1.7.0

proposed

$_value

holds explicit value, or formula if the cell has a formula

holds explicit value if no formula, or calculated value if cell has formula

$_dataType

dataType of the raw value, or ‘f’ is cell has a formula

dataType of raw value, or, if cell has a formula, the dataType of the calculated value

$_calculatedValue

 

remove this property, store calculated value in $_value instead

$_formula

 

new proposed property, holds e.g. ‘A1+A2’

 

Cell method and value binders

 

AdvancedValueBinder

DefaultValueBinder

setValue()

 

 

getValue()

let it return the formatted, calculated value (like toFormattedString)

 

setDataType()

as it works now, but ensure that value is updated to match datatype

getDataType()

let it return the dataType of the calculated value. Never ‘f’

 

setValueExplicit()

deprecate setting formulas using this method. Instead recommend using setFormula()

getValueExplicit()

[new method]

returns the unformatted calculated value as contained in $_value property

setFormula()

[new method]

e.g. setFormula(‘A1+5’)

 

getFormula()

[new method]

returns e.g. ‘A1+5’

as contained in $_formula property

 

hasFormula()

[new method]

 

getFormattedValue()

[new method, needed?]

 

 

Coordinator
Sep 18, 2009 at 10:37 PM
Edited Sep 18, 2009 at 10:40 PM

That's basically what I was thinking, definitely as regards the cell properties... personally I'd use the getValue() method to return the raw, unformatted, calculated value (you're suggesting getValueExplicit() for that), and have the getFormattedValue to return the value formatted like toFormattedString, dropping the need for getValueExplicit().

The calculate() method should update the $_value and $_dataType properties with the result of the calculation, and setting a formula using setFormula() should automatically trigger the calculate() method as well.

At least, that's my take.

 

It might also be worthwhile having a method that will return both $_value and $_dataType in a single call, though that's not critical.

Developer
Sep 18, 2009 at 11:33 PM
MarkBaker wrote:

... personally I'd use the getValue() method to return the raw, unformatted, calculated value (you're suggesting getValueExplicit() for that), and have the getFormattedValue to return the value formatted like toFormattedString, dropping the need for getValueExplicit().


Yes, I understand that the proposal may seem weird. I thought so too. But I think the beauty is that it allows you to think of setValue() / getValue() as somewhat symmetric methods per value binder.

Example with the advance value binder:

$cell->setValue('2009-09-19');
echo $cell->getValue(); // '2009-09-19' (depends on the format mask)


Likewise with setValueExplicit() / getValueExplicit(). These would be symmetric methods too.

I don't think users will complain about getValue() returning the formatted, calculated value. That is how SourceForge PHP-ExcelReader works. In that library, if you want the unformatted, calculated value, you instead need to fetch the entry in the array called "raw" which would correspond to getValueExplicit().

Methods setValueExplicit() / getValueExplicit() / setFormula() / getFormula() have no direct counterpart in Excel. These may be considered as advanced methods provided by PHPExcel. They are not affected by any value binder. Advanced developers would always prefer those methods whenever high control or speed is required (being faster than setValue() / getValue())

P.S.: For the sake of the discussion I am trying to defend the proposal, but don't get me wrong. I am unsure about all the consequences. :)

Apr 7, 2010 at 9:25 PM

Hi,

Has anyone figured out a simple solution on this topic?

Thanks in advance,

Jordan

Developer
Apr 20, 2010 at 4:08 AM

@dancho413: Not yet, but we will return to this later.

Nov 18, 2010 at 5:26 AM

@koyama:

I have thought of another reason/argument for furthering this.

I am dealing with .ods spreadsheets with very old dates. OpenOffice, unlike Excel, does correctly handle pre1900 dates. So while the spreadsheet, in OOo Calc, correctly handles say 04/28/1675, and treats it as a date formatted column, reading it from PHP seems to drag it through the bad date conversion mud which I am assuming is assuming Excel's broken behaviour.

Now if I were able to read that string as it is formatted, I could strtotime() it in PHP and have an actual useable timestamp to do anything I needed with or reformat as necessary ie: 1675-04-28 for DB insertion. I read this thread: http://phpexcel.codeplex.com/Thread/View.aspx?ThreadId=215120 on the subject, and the date object has the same limitation, I suspect because of the Excel assumptions again because my PHP is quite happy dealing with dates in the 1600's even without using the date object.

Do you see a better solution than exposing more simply the formatted date?

Cheers,

CHgsd

Nov 18, 2010 at 5:33 AM

@koyama:

Actually scratch that. I just tried the longer example you gave of returning a formatted string and it breaks the date just as well. Should I start another thread about this because I am right now 100% dead in the water. So much for launching this tomorrow.

Cheers,

CHgsd

Coordinator
Nov 18, 2010 at 8:16 AM
chgsd wrote:

I am dealing with .ods spreadsheets with very old dates. OpenOffice, unlike Excel, does correctly handle pre1900 dates. So while the spreadsheet, in OOo Calc, correctly handles say 04/28/1675, and treats it as a date formatted column, reading it from PHP seems to drag it through the bad date conversion mud which I am assuming is assuming Excel's broken behaviour.

Now if I were able to read that string as it is formatted, I could strtotime() it in PHP and have an actual useable timestamp to do anything I needed with or reformat as necessary ie: 1675-04-28 for DB insertion. I read this thread: http://phpexcel.codeplex.com/Thread/View.aspx?ThreadId=215120 on the subject, and the date object has the same limitation, I suspect because of the Excel assumptions again because my PHP is quite happy dealing with dates in the 1600's even without using the date object.

Until recently, PHPExcel worked purely with Excel files (the OOCalc reader was only introduced in version 1.7.1, and Gnumeric hasn't yet been released) so all the date handling was based on Excel's date range. This caused problems enough at the time, because it covered a wider date range than PHP's integer date range on 32-bit servers (not to mention 1900 as a leap year), hence allowing date conversion functions to work with PHP DateTime objects, which do cover the full Excel range of dates. Internally, all dates are held as Excel date/time stamps. PHP DateTime objects are used for all conversions.

It would be possible to modify PHPExcel to work with the larger date range internally, extending the internal use of Excel date/time stamps to allow negative values for dates prior to the 1900 base (complicated a bit with the alternative 1904 calendar). 32-bit PHP servers would still be limited to the Unix minimum date (or 64-bit windows which seems to mask date functions at 32-bits), unless using DateTime objects. The biggest drawback would be extra validation in the Excel Writers to ensure that dates prior to 1900 threw an exception. I'll put it on the to-do list, but it isn't a priority: most users aren't even concerned with dates prior to 2000, let alone 1900.

Nov 18, 2010 at 1:56 PM

@MarkBaker,

First of all thank you for your detailed reply. I ended up discovering a bit after my post the source of the problem by reading in OOCalc.php how date fields were interpreted. I was unfortunately a little too hot with the time I invested in using this library and having to scrap many hours worth of work, to come back here and update my previous post. Fortunately a couple hours sleep has helped me see this from a little more perspective and I have momentarily patched OOCalc.php

It is funny in a sad way actually because I specifically chose this library because it supported both Excel and OpenDocument as although I am a strong supporter of OOo, I try to be fair to other people's choices. Little did I know that would come around and mess me over like it did. So I believe my ultimate solution is dropping Excel support for the time being due to my patch and bring in a different library which simply handles ODS and CSV when I can dedicate the time to it. Someday down the road if PHPExcel allows wider date ranges I will be able to re visit. I can live with the patch I made to it for the time being but do not wish to handle upgrades as the patch is a hack.

Now concerning the overall philosphy behind what you detailed, I will share the way I see things (my opinion) which will perhaps better explain my surprise when digging into OOCalc.php and discovering how dates were handled internally.

Because you now have the capability to handle a wider date range than Excel, I would move towards doing that, especially since it would entail handing the date management to the core language the library is written in. Even if Excel is the last credible spreadsheet program to be arrogant enough to ignore pre 1900/1904 dates, someday it will get fixed. Cursory google searches report that this has been a very long standing frustration for a lot of users. Naturally your decision to carry on the same limitation is not a fault, because at the rate bugs are fixed in their land, we will all grow old and grey bearded before it is fixed. That being said I find it bothersome that now that you are expanding to support other spreadsheet formats and it appears that you are trying to be the competent best reader of all spreadhseets, which is a very noble goal, that even though the other applications do not suffere from the problems of Excel, the library forces it on them regardless.

I am sure you understand the importance of our history and don't need to hear me talk about the importance of pre-1900 dates, but for anyone reading this down the road, I will cite a few examples:

  • If you are doing a spreadsheet with important dates for your country: When did your given country establish independence, foundation, liberty?
  • If you are doing a spreadsheet with important discoveries: When was the theory of radioactivity created?
  • If you are doing a spreadsheet with birth/death dates of important historical figures: When were the life events of the discoverer of the process of pasteurization?

What it comes down to I guess if I were to summarize what I am trying to say is that right now the handling of xls(x) files could probably be deemed lossless, because you are bound internally by the same limitations as Excel. Then handling of ODS files however is lossy, and I feel that should either carry a big warning or perhaps be fixed :) This reminded me of Lossy Zip but I can't find any results on that from google anymore. I guess some tech humor can fade away on the internet.

Thank you again and sorry for the frustration, I was just heartbroken and disappointed that I didn't do more testing going into this before spending a lot of time integrating. It is 100% my fault. 

Off to find some windmills,

CHgsd