as I mentioned, I have huge ArrayLists in this format:
List<List<String>> alist;
I get the Lists out of some .CSV which I get over a extern databasesystem. (I wrote a Visual Objects script to export the data we need for some calculations)
After I have the .CSV, I load the content into my List like this:
String line;
alist = new ArrayList<List<String>>();
int i=0;
// Datei laden, und anschließend die Zeilen der CSV in eine ArrayList speichern
try {
br = new BufferedReader(new FileReader(path));
while((line = br.readLine()) != null)
{
line = line.replace(",", ".");
if(line.endsWith(delimitter))
line = line + " ";
alist.add(Arrays.asList(line.split(delimitter)));
i++;
if(i==10000000)
break;
}
}
It takes me like ~15900ms to store the Data in my ArrayList (~1,1 million rows and 11 columns). Pretty okay I guess. Now, that I have the List in the format I need, I would like to create a Insert out of it, so we can import the data in our database. I'm creating the insert like this:
public String getInsertString()
{
// Tabelle ergibt sich aus dem Dateinamen, ohne das .csv
String insert="REPLACE INTO "+tablename + " (";
// Spaltennamen aus Array auslesen, immer die erste Zeile des CSV
for(int i=0; i< alist.get(0).size();i++)
{
if(i==0)
insert = insert + alist.get(0).get(i).trim();
else
insert = insert + " ,"+ alist.get(0).get(i).trim();
}
insert= insert + ") \rVALUES";
// Values der SPalten in den Insert schreiben + korreckte Syntax des Bfehels sicherstellen usw.
for(int i=1;i < alist.size();i++) // Size nach "unten" in der 2D Liste
{
insert= insert +"(";
for(int j=0; j < alist.get(0).size();j++) // Size nach "rechts" in der 2D Liste
{
// bei dem ersten ohne "," starten, damit die Syntax stimmt
// Sollte der aktuelle Wert eine Zahl oder "null" sein, keine "'" setzen. Ansonsten "'" setzen fuer den Insert in die DB
if(j==0)
{
if((StringUtils.isStrictlyNumeric(alist.get(i).get(j).trim())) || alist.get(i).get(j).trim().contains("null"))
insert = insert + alist.get(i).get(j).trim();
else
insert = insert + "'" + alist.get(i).get(j).trim() + "'";
}
else
{
if(((StringUtils.isStrictlyNumeric(alist.get(i).get(j).trim()))) || (alist.get(i).get(j).trim().contains("null")))
insert = insert +","+ alist.get(i).get(j).trim();
else
insert = insert + ",'" + alist.get(i).get(j).trim() + "'";
}
}
if(i < alist.size()-1)
insert= insert +"),";
else
insert= insert +")";
insert = insert +"\r";
}
//System.out.println(insert);
return insert;
}
Here I go over the whole List and add the values to a string so that I can use the string for a insert. I use the filenames as table names and the first row of a file for the columns in the insert. All other rows are the values.
After this step is done, I get a string like this "REPLACE INTO tablename (column1,column2 ... columnx) VALUES(value1, value2 ... valuex), (value1, value2 ... valuex), ...."
Now I would execute my InsertInDb class with that string and yeah, that's it.
But the second step takes way too much time. (I wait for like a hour now) Is there a smarter way to do what I want to do? (Insert all CSV automatically in our Database)
Would a BigList maybe improve the speed? https://dzone.com/articles/biglist-scalable-high (can't test it right now)
Edit: What I did to solve the problem:
I created a class for my SQL statement -->
public class BuildInsert {
private String insertString;
public String getINSERTSTRING()
{
return insertString;
}
BuildInsert(String tablename, List<String> alist )
{
int size = alist.size();
this.insertString = "REPLACE INTO " + tablename + "(";
// Insert "Header"
for(int j=1; j <= size ;j++)
{
if(j < size)
this.insertString = this.insertString + alist.get(j-1)+",";
else
this.insertString = this.insertString + alist.get(j-1)+")\n";
}
this.insertString = this.insertString +"VALUES(";
// Insert values
for(int j=1; j <= size ;j++)
{
if(j < size)
this.insertString = this.insertString + "?,";
else
this.insertString = this.insertString + "?)";
}
}
}
and tock this String for a batch insert / prepared statement as Mike mentioned:
Connection con;
Statement stmt;
final int batchSize = 1000;
int count = 0;
int sizeH = alist.size();
int sizeL = alist.get(0).size();
try {
// Connection oeffnen und prepared statment vorbereiten
System.out.println("Connecting to database...");
con = DriverManager.getConnection(DB_URL,USER,PASS);
con.setAutoCommit(false);
ps = con.prepareStatement(insertString);
stmt = con.createStatement();
//< alist.size()
for(int i=1;i < sizeH ;i++) // Size nach "unten"
{
for(int j=0; j < sizeL;j++) // Size nach "rechts"
{
ps.setString(j+1, alist.get(i).get(j));
}
ps.addBatch();
if(++count % batchSize == 0){
ps.executeBatch();
con.commit();
}
}
ps.executeBatch();
con.commit();
}
The insert is much faster now. (~230 seconds for 1,1 mil rows)
Thank you guys
Now, that I have the List in the format I need, I would like to create a Insert out of it, so we can import the data in our database.
Don't do that! Do a JDBC batch insert instead. (See this on how to do a PreparedStatement which will save you a lot of trouble on formatting the insert statements)