We've been having problems with our app where it runs out of memory when producing a CSV file, specifically on big exports with more than 10k rows. We are using Spring Boot 2.0.8 and SuperCSV 2.4.0.
What would be the correct approach to handle these cases, so that our Spring MVC API does not crash with an OutOfMemoryError?
Could SuperCSV be the cause of this problem? I'd imagine it's not, but I'm asking just in case.
I have been reading about @Async; would it be a good idea to use it on this method, so that the export runs on a separate thread?
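From what I've read, it would look something like this (ExportService is just a hypothetical name to illustrate what I mean):

import org.springframework.scheduling.annotation.Async;
import org.springframework.stereotype.Service;

@Service
public class ExportService {

    // @Async makes Spring run this method on a separate thread
    // (it needs @EnableAsync on a configuration class to take effect)
    @Async
    public void exportData() {
        // ... fetch the data and write the CSV here ...
    }
}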
Suppose I have the following method in a controller:
@RequestMapping(value = "/export", method = RequestMethod.GET)
public void downloadData(HttpServletRequest request, HttpServletResponse response)
        throws SQLException, ManualException, IOException, NoSuchMethodException,
        InvocationTargetException, IllegalAccessException {

    List<?> data = dataFetchService.getData();

    ICsvBeanWriter csvWriter = new CsvBeanWriter(response.getWriter(), CsvPreference.STANDARD_PREFERENCE);

    // these next lines handle the header
    String[] header = getHeaders(data.get(0).getClass());
    String[] headerLocale = new String[header.length];
    for (int i = 0; i < header.length; i++) {
        headerLocale[i] = localeService.getLabel(this.language, header[i]);
    }

    // fix for Excel not opening CSV files with "ID" in the first cell
    if (headerLocale[0].equals("ID")) {
        // adding a space before ID, as " ID", also helps
        headerLocale[0] = headerLocale[0].toLowerCase();
    }
    csvWriter.writeHeader(headerLocale);

    // the next lines handle the content
    for (Object line : data) {
        csvWriter.write(line, header);
    }
    csvWriter.close();
    response.getWriter().flush();
    response.getWriter().close();
}
The code:
List<?> data = dataFetchService.getData();
looks like it may consume a lot of memory. This list may contain millions of records, and if many users export at the same time, this will result in memory issues.
Since dataFetchService is backed by a Spring Data repository, you should first get the number of records it will return and then fetch the data one Pageable at a time.
For example: if there are 20,000 rows in the table, fetch 1,000 rows at a time, 20 times, and build up your CSV incrementally.
You should also request your data in some defined order, or your CSV may end up in random order.
Look at extending PagingAndSortingRepository in your repository, as in the example application below.
Example Application
Product.java
import javax.persistence.Entity;
import javax.persistence.Id;
import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;

@Entity
@Data
@NoArgsConstructor
@AllArgsConstructor
public class Product {

    @Id
    private long id;

    private String name;
}
ProductRepository.java
import org.springframework.data.repository.PagingAndSortingRepository;

// the ID type must match Product's @Id field (long), hence Long
public interface ProductRepository extends PagingAndSortingRepository<Product, Long> {
}
MyRest.java
import java.io.IOException;
import java.util.List;
import javax.servlet.http.HttpServletResponse;
import lombok.RequiredArgsConstructor;
import org.springframework.data.domain.Page;
import org.springframework.data.domain.PageRequest;
import org.springframework.data.domain.Pageable;
import org.springframework.data.domain.Sort;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;
import org.supercsv.io.CsvBeanWriter;
import org.supercsv.io.ICsvBeanWriter;
import org.supercsv.prefs.CsvPreference;

@RestController
@RequiredArgsConstructor
public class MyRest {

    // injected via the constructor that @RequiredArgsConstructor generates
    // for final fields, so no @Autowired field injection is needed
    private final ProductRepository repo;

    private static final int PAGE_SIZE = 1000;

    @RequestMapping("/")
    public String loadData() {
        for (int record = 0; record < 10_000; record += 1) {
            repo.save(new Product(record, "Product " + record));
        }
        return "Loaded Data";
    }

    @RequestMapping("/csv")
    public void downloadData(HttpServletResponse response) throws IOException {
        response.setContentType("text/csv");
        String[] header = {"id", "name"};
        ICsvBeanWriter csvWriter = new CsvBeanWriter(response.getWriter(), CsvPreference.STANDARD_PREFERENCE);
        csvWriter.writeHeader(header);

        long numberRecords = repo.count();
        // PageRequest.of takes a zero-based page index, not a row offset,
        // so iterate over page numbers rather than record offsets
        for (int page = 0; (long) page * PAGE_SIZE < numberRecords; page++) {
            Pageable sortedByName = PageRequest.of(page, PAGE_SIZE, Sort.by("name"));
            Page<Product> pageData = repo.findAll(sortedByName);
            writeToCsv(header, csvWriter, pageData.getContent());
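            // Optional: flush after each page so the rows are streamed to the
            // client instead of building up in the response buffer
            csvWriter.flush();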
        }
        csvWriter.close();
        response.getWriter().flush();
        response.getWriter().close();
    }

    private void writeToCsv(String[] header, ICsvBeanWriter csvWriter, List<Product> pageData) throws IOException {
        for (Product line : pageData) {
            csvWriter.write(line, header);
        }
    }
}
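Note that PageRequest.of(page, size) takes a zero-based page index rather than a row offset, so the loop above advances one page per iteration; keeping the page size moderate (1,000 here) bounds how many rows are held in memory at any one time.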
1) Load the data:
curl http://localhost:8080
2) Download the CSV:
curl http://localhost:8080/csv
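For reference, with the sample data above the download should start like this (Sort.by("name") compares the names as strings, which is why the IDs are not in numeric order):
id,name
0,Product 0
1,Product 1
10,Product 10
100,Product 100
1000,Product 1000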