What is the most efficient way to remove duplicate rows from the object?

Topics: Developer Forum
Jun 3, 2013 at 3:15 PM
Edited Jun 3, 2013 at 3:25 PM
I am reading a spreadsheet and need to remove duplicate rows before writing the data to a MySQL database. What is the most efficient way of doing this? My thought so far is that I'll have to write a method to check the row data as I loop through it, but I'm looking for something faster. (I haven't found any methods built into PHPExcel that will do this, so I will be writing one.) The spreadsheets come from third parties and range from about 1,000 rows to 30,000 rows in some cases; the larger spreadsheets are more likely to contain a lot of duplicated data.

The options are as follows:
  1. Check for duplicates as I build each row and add only one copy to the mv array (see the first sketch below).
  2. Build the mv array from every row (with duplicates) and then remove the duplicates before uploading to the MySQL database.
  3. Upload every row into a temp table in the MySQL database and then remove duplicate rows in MySQL before writing to my production tables (not ideal, but it may be faster; see the second sketch below).
  4. Any other options which may be faster.
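For options 1 and 2, my current thinking is roughly the sketch below. It reads the sheet with PHPExcel's row and cell iterators and hashes each row, so a duplicate can be spotted with a single array lookup. The file name and the $mv / $seen variable names are just placeholders.

```php
<?php
// Rough sketch of options 1/2: hash each row as it is read and keep only the
// first copy. $mv, $seen and the file name are placeholders.
require_once 'PHPExcel/IOFactory.php';

$objPHPExcel  = PHPExcel_IOFactory::load('supplier_data.xlsx');
$objWorksheet = $objPHPExcel->getActiveSheet();

$mv   = array();   // de-duplicated rows, ready for the MySQL insert
$seen = array();   // md5-of-row => true, for rows already added

foreach ($objWorksheet->getRowIterator() as $row) {
    $cellIterator = $row->getCellIterator();
    $cellIterator->setIterateOnlyExistingCells(false); // include blank cells so hashes line up

    $rowData = array();
    foreach ($cellIterator as $cell) {
        $rowData[] = $cell->getValue();
    }

    // Identical rows serialize to identical strings, so they share one hash key
    $hash = md5(serialize($rowData));
    if (!isset($seen[$hash])) {
        $seen[$hash] = true;
        $mv[] = $rowData;
    }
}
```

The isset() test on an array key is effectively constant time, so I'd expect this to scale to 30,000 rows far better than comparing each new row against every row already stored.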
Any suggestions on which route is fastest? And are there any methods for duplicate-row checking in PHPExcel? (I haven't come across any in the API so far.)
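For option 3, this is roughly what I had in mind: let MySQL drop the duplicates via a UNIQUE key and INSERT IGNORE. The table and column names (staging_rows, production_rows, col_a..col_c) and the $allRows variable are placeholders; hashing the row keeps the unique index short instead of spanning every column.

```php
<?php
// Rough sketch of option 3: push every row (duplicates included) into a temp
// table keyed on a hash of the row, let INSERT IGNORE silently drop the
// duplicates, then copy the survivors into the production table.
// staging_rows, production_rows, col_a..col_c and $allRows are placeholders.
$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$pdo->exec("
    CREATE TEMPORARY TABLE staging_rows (
        row_hash CHAR(32) NOT NULL,
        col_a VARCHAR(255),
        col_b VARCHAR(255),
        col_c VARCHAR(255),
        UNIQUE KEY uniq_row (row_hash)
    )
");

// $allRows would hold every spreadsheet row, duplicates and all
$stmt = $pdo->prepare(
    "INSERT IGNORE INTO staging_rows (row_hash, col_a, col_b, col_c) VALUES (?, ?, ?, ?)"
);
foreach ($allRows as $rowData) {
    // Duplicate rows share a hash, so the UNIQUE key makes MySQL skip them
    $stmt->execute(array_merge(array(md5(serialize($rowData))), $rowData));
}

// Copy the de-duplicated rows into the real table
$pdo->exec("
    INSERT INTO production_rows (col_a, col_b, col_c)
    SELECT col_a, col_b, col_c FROM staging_rows
");
```

Whether this beats de-duplicating in PHP probably depends on how the rows are inserted (row-by-row prepared statements versus a multi-row insert), so I'd likely have to benchmark it against option 1 on one of the 30,000-row files.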