[SOLVED] problems reading file

Topics: User Forum
Aug 17, 2010 at 7:27 PM
Edited Aug 17, 2010 at 7:31 PM
I am using the following code to read an XLS file.
    $objReader = PHPExcel_IOFactory::createReaderForFile($file);
    $objReader->setReadDataOnly(true);
    $objPHPExcel  = $objReader->load($file);
    $objWorksheet = $objPHPExcel->getActiveSheet();
    foreach ($objWorksheet->getRowIterator() as $row) {
      $cellIterator = $row->getCellIterator();
      $cellIterator->setIterateOnlyExistingCells(false);
      $values = array();
      foreach ($cellIterator as $cell) 
      {
        $cellval = trim($cell->getValue());
        iconv("UTF-8","CP1252","$cellval");
        if (is_numeric($cellval)) {
          array_push($values, $cellval);
        } else {
          array_push($values, '"'.$cellval.'"');
        } // if (is_numeric($cellval))
      } // foreach ($cellIterator as $cell)
      // build the insert statement from the values in the cells.
      $sql .= "INSERT INTO TABLE VALUES (".implode(',', $values).")/n
"; // insert into the DB here, I am just echoing it out to screen //$this->db->query(@sql); } //foreach ($objWorksheet->getRowIterator() as $row)
A whole lot of data is coming out with the  character in the first column. sample output:
INSERT INTO TABLE VALUES ("04-1604-1429 Â ","","NE",1,"")/n
INSERT INTO TABLE VALUES ("0724000-2","","NE",1,"")/n
INSERT INTO TABLE VALUES (724125,"","NE",1,"")/n
How do I strip out those chars?
Aug 17, 2010 at 7:29 PM
What happened to the formatting buttons to enter code? I used [code] tags but it still came out messed up.
Aug 20, 2010 at 12:38 PM

Doesn't anyone know the answer to this?

Coordinator
Aug 20, 2010 at 1:06 PM

Have you tried without the

iconv("UTF-8","CP1252","$cellval");

?

Aug 20, 2010 at 1:17 PM

Yes, and it is the same result.

Aug 20, 2010 at 1:18 PM

What is the  char, anyway?

Coordinator
Aug 20, 2010 at 1:31 PM
mrbaseball34 wrote:

What is the  char, anyway?

 Generally an indication that something is going fubar in the character set conversion. PHPExcel tries to use UTF-8 internally, so it is only normally necessary to use iconv or mbstring to do character conversion if you need the data in a character set other than UTF-8

Aug 20, 2010 at 1:41 PM

When I open the file in Excel, I don't see this char at all.

So it must be PHPExcel having the problem.

Want me to upload a sample?

Coordinator
Aug 20, 2010 at 1:48 PM
mrbaseball34 wrote:

When I open the file in Excel, I don't see this char at all.

So it must be PHPExcel having the problem.

Want me to upload a sample?

 Feel free to upload a sample, and I'll take a look at it. We haven't had any issues with charsets in PHPExcel for a long time, so I wouldn't be so sure that it is an issue. When you're echoing the INSERT statements out to your loader page, is the page definitely set as UTF-8?

Aug 20, 2010 at 1:57 PM

if I str_replace the char out of the value, I cannot trim the spaces off the end of some values.

go figure...

      foreach ($cellIterator as $cell) 
      {
        $cellval = trim(str_replace('Â', '', $cell->getValue()));
        if (is_numeric($cellval)) {
          array_push($values, intval($cellval));
        } else {
          array_push($values, '"'.trim($cellval).'"');
        } // if (is_numeric($cellval))
      } // foreach ($cellIterator as $cell)

Aug 20, 2010 at 2:00 PM

Sample file is uploaded in Issue:

http://phpexcel.codeplex.com/workitem/14050

Aug 20, 2010 at 2:07 PM

Seems the data may be encoded using ISO 8859-1, or at least that is where that char comes from.

Coordinator
Aug 20, 2010 at 2:56 PM

The spurious character in some of these cells is a non-breaking space character ( character 160, hex A0 in ISO-8859-1, haven't done a lookup for the equivalent character in UTF-8) which is not removed by trim().

You can get rid of it by converting from UTF-8 to ISO-8859-1, then explicitly identifying it in the list of characters that should be removed by trim(). The following is the standard list of characters that trim() will remove, plus the additional \xA0

$cellval = trim(iconv("UTF-8","ISO-8859-1",$cell->getValue())," \t\n\r\0\x0B\xA0");
Aug 20, 2010 at 3:05 PM

That worked great, thanks.