I am reading a spreadsheet and need to remove duplicate rows before writing the data to a MySQL database. What is the most efficient way of doing this? My thought so far is that I'll have to write a method to check the row data as I loop through it, but I'm looking for something faster (I haven't found any methods built into PHPExcel that will do this, so I will be writing my own). The spreadsheets come from third parties and range from about 1,000 rows to 30,000 rows in some cases, and the larger spreadsheets are more likely to contain a lot of duplicated data.
The options are as follows:
- Check for duplicates as I build each row and only add one copy of each to the mv array (rough sketch of what I mean below).
- Build the mv array from every row (duplicates included) and then strip the duplicates before uploading to the MySQL database.
- Upload every row into a temporary MySQL table and remove the duplicate rows with SQL before writing to my production tables (not ideal, but it may be faster; second sketch below).
- Any other option that may be faster.
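
For reference, this is roughly what I mean by options 1 and 2. It is only a sketch: I'm assuming the rows come back from the worksheet's toArray() method and that two rows only count as duplicates when every cell matches; the file name and variable names are made up.

```php
<?php
require_once 'PHPExcel/IOFactory.php';

// Load the workbook and pull the active sheet into a plain PHP array.
// 'thirdparty.xlsx' is just a placeholder file name.
$workbook = PHPExcel_IOFactory::load('thirdparty.xlsx');
$rows     = $workbook->getActiveSheet()->toArray(null, true, true, false);

$seen   = array();  // hashes of rows already kept
$unique = array();  // de-duplicated rows to insert

foreach ($rows as $row) {
    // Hash the serialized row so the lookup key stays small on wide sheets;
    // isset() on an array key makes the duplicate check cheap per row.
    $key = md5(serialize($row));
    if (!isset($seen[$key])) {
        $seen[$key] = true;
        $unique[]   = $row;
    }
}

// $unique now holds one copy of each distinct row, ready for the INSERTs.
```

Option 2 would be much the same except the check happens after the loop; as far as I know, array_unique($rows, SORT_REGULAR) will also drop duplicate sub-arrays in a single call, though it has to compare the whole array after it has been built.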
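
And a rough sketch of option 3, staging everything and letting MySQL throw away the duplicates. The connection details, table and column names below are all placeholders for whatever the real schema ends up being.

```php
<?php
// Placeholder connection details.
$pdo = new PDO('mysql:host=localhost;dbname=mydb;charset=utf8', 'user', 'pass');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Stage the spreadsheet, duplicates and all, in a temporary table
// that mirrors the production table's structure.
$pdo->exec('CREATE TEMPORARY TABLE staging LIKE products');

// ... bulk-insert every spreadsheet row into `staging` here ...

// Then copy a single instance of each distinct row into production.
$pdo->exec('INSERT INTO products (sku, description, price)
            SELECT DISTINCT sku, description, price
            FROM staging');
```

An alternative on the MySQL side would be a UNIQUE key on the staging table combined with INSERT IGNORE, which skips duplicates as they arrive rather than afterwards.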
Any suggestions on which route is fastest, and whether there are any methods for duplicate row checking in PHPExcel? (I haven't come across any in the API so far.)