Formula evaluation fails with Japanese sheet refs

Topics: Developer Forum
Nov 12, 2010 at 8:19 AM

Hello. I'm using PHPExcel in a tool for manipulating xls files, and it is a godsend. Thank you very much for all your hard work.

There's just one issue I'm having at the moment: My xls files often have Japanese sheet names, and all formulas containing references to cells on these sheets, no matter how simple the formula may be, will cause getCalculatedValue() to throw an exception: "Formula Error: An unexpected error occured".

I imagine this could be an encoding issue, or lack of consideration for multibyte sheet names? Is there anything I can do to get around this? (Renaming the sheets is not an option).

I've made a simple test file that demonstrates this problem. I can supply it to you if it would be helpful.

Thank you for any advice you can offer.

Coordinator
Nov 12, 2010 at 9:19 AM

I've only really started introducing support for non-ASCII values in formulae over the last couple of releases, and I'm in fairly uncharted territory with Japanese, but the regular expressions used in the calculation engine probably won't work with multi-byte characters in the worksheet name. It should be fairly straightforward for me to fix this (though I've a nagging feeling that it'll require a couple of other changes in the calculation engine code besides the one regular expression). I'd certainly appreciate a test file to work with.

Nov 12, 2010 at 9:46 AM
Edited Nov 12, 2010 at 9:52 AM

Thanks for your quick reply.

The file is available here: http://dl.dropbox.com/u/2459482/phpexcel-test.xls

As indicated, reading cell A2 on the sheet "Test" will cause getCalculatedValue() to throw an exception.

Note that setActiveSheetIndexByName() has no problem setting Japanese-named sheets, and reading values from such sheets is not problematic either. As far as I can tell it's really only calculating formulas with Japanese (non-ASCII?) cell references that is an issue.

I haven't looked at the regex used by the calculation engine, but if you're using one of the preg_ functions, it's usually sufficient to set the /u flag at the end of the search string. The mb_ereg functions are kind of crappy, so in general I think they're not recommended.

Also FYI, be aware that Microsoft has a hardon for Shift-JIS encoding in general for anything Japanese. So if you run into encoding issues (I haven't so far; PHPExcel provides and accepts sheet names in UTF-8, thank god) you might try converting to/from that. However, when using mb_convert_encoding, you should always use "SJIS-win" (not just "SJIS") as the encoding name, and you often have to suggest it as the source encoding (otherwise e.g. mb_detect_encoding will just return NULL or FALSE or whatever when trying to detect the encoding of a SJIS string).

Coordinator
Nov 12, 2010 at 9:28 PM
Edited Nov 12, 2010 at 9:37 PM

On 12/11/2010 09:46, amake wrote:

I haven't looked at the regex used by the calculation engine, but if you're using one of the preg_ functions, it's usually sufficient to set the /u flag at the end of the search string. The mb_ereg functions are kind of crappy, so in general I think they're not recommended.

The regexp used for parsing cell references in the calculation engine is

(((\w*)|(\'[^\']*\')|(\"[^\"]*\"))!)?\$?([a-z]{1,3})\$?(\d+) 

so the /u switch should work correctly for multibyte sheet names, as you suggest. It's using a preg_match()... I wouldn't use mb_ereg() as I've had problems with it in the past, and there's a likelihood that it will be deprecated (like the ereg functions) before long.

If you're feeling generous and helpful, can you also try using a multibyte named range, both defining it within code, and then using it within a formula? I've a suspicion that this might be a problem in formulae as well; and if I'm fixing multibyte worksheet names in the calc engine, I might as well check/fix multibyte named ranges at the same time. The regexp used for Named Ranges is

(((\w*)|(\'.*\')|(\".*\"))!)?([_A-Z][_A-Z0-9]*)

so this is likely to be slightly more awkward to fix than simply setting the /u flag; but should still be fairly straightforward.

Coordinator
Nov 12, 2010 at 9:40 PM
This discussion has been copied to a work item. Click here to go to the work item and continue the discussion.