What's the most efficient way to read large XLSX/XLS/CSV files?

Topics: Developer Forum
May 4, 2012 at 8:19 PM
Edited May 7, 2012 at 2:34 PM

I have read a lot about the performance issues with large spreadsheets, so I know there will be overhead, but my question is:

What is the most efficient way to READ large files? CPU load is out of control.

The file can potentially be any type of supported spreadsheet. I don't need formatting; I'm only interested in each cell's calculated value. I want to make sure I'm following as many best practices as possible and disabling as many unneeded features as possible.
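
For example, this is the kind of reader-level setup I've been trying (setReadDataOnly() and the phpTemp cell cache are documented PHPExcel options; the 8MB cache size is just a number I picked, and I'm not sure setReadDataOnly() has any effect on the CSV reader):

/**  Cache cells to php://temp instead of holding every cell object in memory  **/
$cacheMethod   = PHPExcel_CachedObjectStorageFactory::cache_to_phpTemp;
$cacheSettings = array('memoryCacheSize' => '8MB');
PHPExcel_Settings::setCacheStorageMethod($cacheMethod, $cacheSettings);

$inputFileType = PHPExcel_IOFactory::identify($file_path);
$objReader     = PHPExcel_IOFactory::createReader($inputFileType);

/**  Skip styles/formatting; I only want the calculated values  **/
$objReader->setReadDataOnly(true);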

Thanks in advance for your help. Mark, you have done an amazing job; keep up the great work!

**EDIT 5.7.12: I'm running into extremely high CPU load (90%+) when trying to import a large XLSX doc. Memory usage is very low, around 10-15%. I didn't have this issue with previous versions of PHPExcel. Any ideas?

--------------------

Here is my current code:
- Detects the file type
- Loads the file in chunks
- Imports into the DB (omitted)
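
The chunkReadFilter class used below is basically the one from the PHPExcel chunked-reading example (reproduced here so the snippet is complete):

class chunkReadFilter implements PHPExcel_Reader_IReadFilter
{
    private $_startRow = 0;
    private $_endRow   = 0;

    /**  Set the range of rows we want to read for the current chunk  **/
    public function setRows($startRow, $chunkSize) {
        $this->_startRow = $startRow;
        $this->_endRow   = $startRow + $chunkSize;
    }

    /**  Only read the heading row plus the rows inside the current chunk  **/
    public function readCell($column, $row, $worksheetName = '') {
        if (($row == 1) || ($row >= $this->_startRow && $row < $this->_endRow)) {
            return true;
        }
        return false;
    }
}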

$inputFileType = PHPExcel_IOFactory::identify($file_path);
/**  Create a new Reader of the type that has been identified  **/
$objReader = PHPExcel_IOFactory::createReader($inputFileType);

/**  Upper bound on the number of rows to scan  **/
$total_sheet_rows = 500000;

/**  Define how many rows we want to read for each "chunk"  **/
$chunkSize = 1000;
/**  Create a new Instance of our Read Filter  **/
$chunkFilter = new chunkReadFilter();

/**  Tell the Reader that we want to use the Read Filter  **/
$objReader->setReadFilter($chunkFilter);

$row_count = 0;

/**  Loop to read our worksheet in "chunk size" blocks  **/
for ($startRow = 1; $startRow <= $total_sheet_rows; $startRow += $chunkSize) {
    /**  Tell the Read Filter which rows we want this iteration  **/
    $chunkFilter->setRows($startRow, $chunkSize);

    /**  Load only the rows that match our filter  **/
    $objPHPExcel = $objReader->load($file_path);

    $objWorksheet = $objPHPExcel->getActiveSheet();

    /////////////////////////////////////////////
    ///////////////DO PROCESSING/////////////////

    foreach ($objWorksheet->getRowIterator() as $row) {

        $cellIterator = $row->getCellIterator();
        $cellIterator->setIterateOnlyExistingCells(false);
        $col_count = 0;
        $row_data_flag = 0;
        $cells_array = array();

        // move through columns
        foreach ($cellIterator as $cell) {
            $col_count++;
            $row_data_flag = 1;

            /**  Only the calculated value is needed (formatting is ignored); the DB import that uses it is omitted  **/
            $temp_value = returnAscii($cell->getCalculatedValue());

        } // end of foreach going through columns

    } // end of foreach going through rows

    /**  Free up some of the memory ($query_part_lookup comes from the DB code omitted above)  **/
    mysql_free_result($query_part_lookup);
    $objPHPExcel->disconnectWorksheets();
    unset($objPHPExcel, $objWorksheet, $cellIterator);

    ///////////////DO PROCESSING/////////////////
    /////////////////////////////////////////////

} // end of for going through chunks

May 10, 2012 at 3:41 PM

I solved my issue: sessions were enabled on this script, which killed performance. Turning them off reduced the CPU usage dramatically.
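
If your framework starts the session for you and you can't turn it off completely, releasing the session lock before the import loop should have a similar effect. A rough sketch (the session key is just a placeholder for whatever your script actually needs):

session_start();
$user_id = $_SESSION['user_id'];   // pull anything you need out of the session first

/**  Release the session lock before the long-running import  **/
session_write_close();

// ...run the chunked PHPExcel import here...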

Sep 23, 2014 at 6:42 PM
Edited Sep 23, 2014 at 6:43 PM
It is reading row by row; how do I insert it into the database?
I want to insert half of each row into one table and the other half into another.
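
For example, something like this is what I'm trying to end up with inside the row loop (the table/column names, the PDO connection, and the four-column split are just placeholders for my real schema):

// $cells_array holds the calculated values for the current row
$first_half  = array_slice($cells_array, 0, 4);   // e.g. columns A-D
$second_half = array_slice($cells_array, 4, 3);   // e.g. columns E-G

$stmt1 = $pdo->prepare("INSERT INTO table_one (col_a, col_b, col_c, col_d) VALUES (?, ?, ?, ?)");
$stmt1->execute($first_half);

$stmt2 = $pdo->prepare("INSERT INTO table_two (col_e, col_f, col_g) VALUES (?, ?, ?)");
$stmt2->execute($second_half);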