javasqlarraylistjdb

Data duplicates in Arraylist


I want to fill my ArrayList with data selected from my database but it duplicates all the rows i don't get it.

        Connection conn = DriverManager.getConnection(url, userid, password);
        String strSql = "SELECT * FROM events, eventStats" ;
        PreparedStatement Stmt = conn.prepareStatement(strSql);
        ResultSet rs = Stmt.executeQuery(strSql);

        while(rs.next()){
            Events user = new Events();
            user.seteName(rs.getString(2));
            user.setePlace(rs.getString(4));
            user.seteDate(rs.getString(5));
            user.seteInscrit(String.valueOf(rs.getInt(8)));
            user.seteTotal(String.valueOf(rs.getFloat(10)));
            data.add(user);
        }

Thanks for your help !


Solution

  • You're not actually seeing 'duplicate' rows in the strictest sense of the word.

    What you're seeing is a Cartesian Product, which is every row in one table joined to every single row in another table. You're getting this because you've got a cartesian join (or a cross-join if you prefer) in your SQL statement:

    SELECT * FROM events, eventStats
    --            ^^^^^^^^^^^^^^^^^^
    

    This means that for each row in the events table, you will get a number of results equal to the number of rows in the eventStats table. Add that all together and what looks like a lot of duplicates is actually a potentially huge number of unique combinations of all of the rows from both tables lumped together.

    There are a select number of scenarios where a cartesian join is desirable, however it is very often an indicator of a faulty query, and can have a huge performance impact on your system if one of the tables is particularly large.1

    To avoid it you will need to specify which column in each of your tables to use to join the two tables together, either through an explicit join:

    SELECT *
      FROM events v
           [INNER|LEFT OUTER|RIGHT OUTER] JOIN eventStats s
             ON v.SomeColumn = s.SomeMatchingColumn
    

    or an implicit one:

    SELECT *
      FROM events v, eventStats s
     WHERE v.SomeColumn = s.SomeMatchingColumn
    

    It's hard to tell you more than that without seeing the table structure, but there is probably a foreign key relationship between those two tables, and that's the column that you want to join on.

    1 The number of rows that a cartesian join will return is equal to the the product of the sizes of the tables involved in the join. As an example, say the events table contains 50,000 rows and the eventStats table also has 50,000 rows, one for each event. With a cartesian join on those tables your query will return a whopping 2,500,000,000, yes, that's 2.5 billion, rows of data. Now imagine reading all of that into an ArrayList... except you can't, because Java arrays can't get that large! Your application will crash with either a java.lang.OutOfMemoryError: Requested array size exceeds VM limit or a java.lang.OutOfMemoryError: Java heap space before you can read all the data in.