RichText cyrillic problem

Topics: Developer Forum
Dec 8, 2012 at 7:26 PM

Hello.

When I try to save .xls file with Rich cyrillic Text in cells, I get the broken file. After Excel 2003 "repairing" all data is lost. I use utf-8 encoded text and haven't problem with cyrillic in other cases, also no errors or exceptions while file creating. I find that the problem in PHPExcel_Shared_String::UTF8toBIFF8UnicodeShort. If use PHPExcel_Shared_String::UTF8toBIFF8UnicodeLong instead of it, file working, but text formating is lost.

Here is an example of broken Excel file: https://rapidshare.com/files/175635924/loads_test.xls

Coordinator
Dec 9, 2012 at 11:00 AM

This is a known problem: Rich Text doesn't handle UTF-8 correctly in the Excel5 Writer - it is currently under investigation, but I haven't identified a solution yet

Dec 9, 2012 at 4:11 PM
Edited Dec 10, 2012 at 8:15 AM

Thanks for reply.

I think I found the solution:

1. In method PHPExcel_Shared_String::UTF8toBIFF8UnicodeShort we need to set also 0-bit, not only 3-bit (0x09) and to add UTF-8 to UTF-16 conversion:

$data = pack('vC', $ln, 0x09); // Line #447
...
$data .= self::ConvertEncoding($value, 'UTF-16LE', 'UTF-8'); // Line #450

2. While calculation of first formatted character in method PHPExcel_Writer_Excel5_Worksheet::close we need to consider the encoding:

$str_len = PHPExcel_Shared_String::CountCharacters($cVal->getPlainText(), 'UTF-8'); // Line #427
...
$str_pos += PHPExcel_Shared_String::CountCharacters($element->getText(), 'UTF-8'); // Line #440

Need to test, but it works on my file.

Also I found few little mistakes:

1. Is Named Ranges supported in source of List type in Data validation by Excel5_Writer? When I try to set it, I get the empty source in file.

2. When I want to set width of column, I get width less than I want on 0.71:

$objSheet->getColumnDimension('A')->setWidth( 10 ); // it will be 9.29

3. Method PHPExcel_Calculation::getInstance()->translateFormulaToEnglish returns wrong name when the local name of function is a part of other local name. For example, conversion of "ЕСЛИ" to "IF" has "ЕСЛAND" as result.

4. Also it's logical to inherit the cell font formatting by Rich-Text object.

And thanks a lot for your great project!

Coordinator
Dec 11, 2012 at 8:52 PM

Nice bit of detective work with UTF-8 in rich text... I've tested and it works without breaking anything else.

 

Named Range isn't supported in List Types for Excel 5... there's a Work Item for it, though I can't recall which one off the top of my head.

Excel column widths are adjusted by Excel itself to the nearest value that it can use based on a whole range of factors (including their own proprietary units, text font, etc).... it's not an algorithm that is published (as far as I'm aware).

My Christmas break will be spent completely rewriting the Calculation engine, including the locale formula handling, so I'll keep this in mind

Will also consider some reworking of cell font handling with rich text objects.