OOCalc and Rich Text

Aug 19, 2011 at 7:02 PM

Does the OOCalc reader support rich text?

It seems to be entirely omitting rich-text runs from the cells it reads, i.e. losing data rather than defaulting to plain text.

Aug 19, 2011 at 7:47 PM

Looking at the highly-styled Tests/OOCalcReader.php and its plain output XLSX, it seems that, for the most part, it does not.

But my issue is mainly on test row 28. There is some text with a few words in bold -- those bold words are totally missing in the output XLSX.

I can live without rich text for ODS, but it has to at least read all the text data.

Aug 19, 2011 at 10:01 PM

It seems to have something to do with how SimpleXML parses the text. If you have XML like this...

<table:table-cell office:value-type="string" table:style-name="ce1"><text:p>Download from <text:s/><text:span text:style-name="T8">http://www.phpexcel.net</text:span>.</text:p></table:table-cell>
...it just drops everything between and including the text:span tags.
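
A minimal sketch of the behaviour described above, outside of PHPExcel. The wrapper element name `cell` and the variable names are made up for illustration; only the `text:p` content is taken from the sample. Casting a SimpleXML element to a string returns just the text that sits directly inside it, so anything inside a child such as text:span is left out, whereas serialising the element and stripping the tags keeps it.

```php
<?php
// OpenDocument "text" namespace URI
$ns = 'urn:oasis:names:tc:opendocument:xmlns:text:1.0';

// Hypothetical wrapper element <cell>; the <text:p> content mirrors the sample above
$xml = '<cell xmlns:text="' . $ns . '">'
     . '<text:p>Download from <text:s/>'
     . '<text:span text:style-name="T8">http://www.phpexcel.net</text:span>.'
     . '</text:p></cell>';

$p = simplexml_load_string($xml)->children($ns)->p;

// The cast only sees text nodes that are direct children of <text:p>
$direct = (string) $p;

// Serialise the element, turn <text:s/> into a space, then drop all tags
$flat = strip_tags(str_replace('<text:s/>', ' ', $p->asXML()));

var_dump($direct); // does not contain the URL
var_dump($flat);   // contains the URL
```

Note that this only recovers the plain text; the style name on the span is discarded.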

Aug 26, 2011 at 5:53 PM
Edited Aug 26, 2011 at 5:54 PM

This should resolve it, so that at least all of the plain text (but no rich text) is there -- search for "case 'string'" in Reader/OOCalc.php and replace with this...

case 'string' :
	$type = PHPExcel_Cell_DataType::TYPE_STRING;
	$dataValue = $cellDataText->p;
	if (isset($dataValue->a)) {
		$dataValue = $dataValue->a;
		$cellXLinkAttributes = $dataValue->attributes($namespacesContent['xlink']);
		$hyperlink = $cellXLinkAttributes['href'];
	} elseif (isset($dataValue->span)) { // new
		$dataValue = (string)$cellDataText->p->asXml();
		$dataValue = str_replace("<text:s/>", " ", $dataValue);
		$dataValue = strip_tags($dataValue);
	}
	break;

Jun 8, 2012 at 2:20 PM

Bump: still an issue -- can we add the above to the latest?

Jun 8, 2012 at 2:55 PM

I made a partial change to this last night... not rich text yet, but just to handle multiple p blocks that could contain multiple spans. The next step will be to read the style information from those spans.

Sep 21, 2012 at 8:30 PM
Edited Sep 21, 2012 at 8:31 PM

I'm just now testing this with latest working dev from GitHub, and I don't think it's all working just yet.

I have a paragraph with a single italic word in it. Instead of returning the whole paragraph in plain text, only this single word is returned. Hmm...
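
One plausible cause, not verified against the dev branch: if the reader builds the cell value only from the span children of each paragraph, any text outside the spans is lost. The sketch below reproduces that symptom with a made-up paragraph and a hypothetical `cell` wrapper element, and compares it with flattening the whole paragraph.

```php
<?php
// OpenDocument "text" namespace URI
$ns = 'urn:oasis:names:tc:opendocument:xmlns:text:1.0';

// Hypothetical sample: one styled word inside an otherwise plain paragraph
$xml = '<cell xmlns:text="' . $ns . '">'
     . '<text:p>A paragraph with a '
     . '<text:span text:style-name="T1">single</text:span>'
     . ' italic word.</text:p></cell>';

$p = simplexml_load_string($xml)->children($ns)->p;

// Concatenating only the spans keeps the styled word and nothing else
$spansOnly = '';
foreach ($p->children($ns)->span as $span) {
    $spansOnly .= (string) $span;
}

// Flattening the serialised paragraph keeps all of the text
$whole = strip_tags(str_replace('<text:s/>', ' ', $p->asXML()));

var_dump($spansOnly);
var_dump($whole);
```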

Sep 21, 2012 at 8:57 PM

This makes it work again, near line 560...

foreach ($cellDataText->p as $pData) {
	if (isset($pData->span)) {
		// span sections do not newline, so we just create one large string here
		/* $spanSection = "";
		foreach ($pData->span as $spanData) {
			$spanSection .= $spanData;
		array_push($dataArray, $spanSection); */
		$dataValue = (string)$pData->asXml();
		$dataValue = str_replace("<text:s/>", " ", $dataValue);
		$dataValue = strip_tags($dataValue);
		array_push($dataArray, $dataValue);
	} else {
		array_push($dataArray, $pData);