Special characters break row

Topics: Developer Forum, User Forum
Mar 28, 2011 at 1:41 PM

I have been given an excel file to read into a database that needs serious cross table referencing for validation. The client absolutely cannot be trusted, so I need a bulletproof system of checking for encoding of the supplied csv and encryption into the utf8 to match the database.


My problem:

Import the csv.

Check the csv encoding.

Change the encoding IF NECESSARY.

write to array

loop throught the array, validate the fields and write to screen grid showing validation errors. Rows that fail validation are able to be edited in the screen and submission is able to be tried again.


The problem is that I cannot get past the write toarray unless the file provided is UTF8, as the norwegian special characters seem to break the toArray function a little. Basically, everytime I reach a special character, the rest of the row is ignored.


So once I have loaded the CSV into a PHPObject worksheet, I need to know what encoding it uses and convert it without damaging it. Yes, they should just supply it in the utf8 format, but we all know that clients are not to be trusted, and they also can't control the actions of their staff.


What do you think? Is this easy, or do I have to open the file  with native php functions and convert the string before I give it to PHPExcel?


Much appreciate the reading, hope to hear from someone soon.



Mar 28, 2011 at 1:48 PM

Hi Plumper,


Why would oyu need PHPExcel? It seems to me, you might as well use the native csv reading functions of php. That will make your code faster and easier to maintain. Since csv is a plain text format (and thus has no metadata), there is no bulletproof way to determine encoding.  :(



Mar 28, 2011 at 1:55 PM

I see. Thanks for the quick reply.

I am using the PHPExcel library to future proof it against future requirements to upload any spreadsheet format to the application. I am starting with CSV functionality and then moving into the xls, xlsx formats, then OOcalc.

Anyway, I have a serious time constraint on this iteration now, so I think I will just take your advice for now and look at this again in the future.


Thanks again.