java postgresql oop jdbc

Using SQL joins vs. using separate methods in Java


I'm working on a Java project using JDBC and I'm looking for best practices for fetching related data from multiple tables. Given the following class diagram:

I have User, Project, Estimate, Labor, and Material entities with the following relationships:

A User can have multiple Projects. A Project can have multiple Estimates. A Project also has one-to-many relationships with Labor and Material.

Using SQL joins: fetch all related data in a single query with joins. This lets me populate an object's values in a single block of code.

Separate methods: fetch the User data first, then use separate methods to fetch the related Projects, and within those methods fetch the Estimates and components (Labor and Material).

Here is what I want to do in detail:

    public User getUser(int id) {
        // Write an SQL query to fetch the user with the given id.
        // The user has a list of projects, so I need to return the user with its projects too.
        // Each project has one or more estimates, so I need to load the estimates of every
        // project before returning the user.
        // The same goes for labor and materials: each project contains one list of labor
        // and one list of materials.
    }

What are the pros and cons of each approach, and which one is recommended for maintainability and performance in a typical Java application?

I've tried to use separate methods so that each class handles one job (single responsibility principle), and to keep my code clean and well separated. I know I may lose some performance, but I don't know whether I should sacrifice cleanliness for performance.


Solution

  • Joining two tables denormalizes the data. If you join the "Project" and "Estimate" tables, the "Project" values are repeated for every "Estimate" of that project. The problem grows with each table you join: joining several one-to-many children of the same parent (for example Estimate, Labor, and Material under Project) multiplies their row counts, producing a Cartesian product in the result set. If you join everything together, the result set can become very large, which hurts performance. The logic to rebuild an object tree from that Cartesian product is also quite complex.

    On the other hand, you want to avoid running a query per record, since each query requires a request/response round trip to the database server, which also performs badly. Be careful to avoid the N+1 selects problem.

    Typically it's best to run one query per table:

    -- "user" is a reserved word in PostgreSQL, so the table name must be double-quoted
    select * from "user" where user_id = ?;
    
    select * from project where user_id = ?;
    
    select * from estimate where project_id in (...);
    
    select * from labor where project_id in (...);
    
    select * from material where project_id in (...);
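To make the Cartesian-product problem described above concrete, the sketch below simulates joining one project to its estimates, labor, and materials with plain Java lists instead of a database. The ids and counts are made up for illustration, and it assumes (as in the question) that estimate, labor, and material each reference the project directly. Because the three child tables are related to each other only through the project, every estimate is paired with every labor row and every material row.

```java
import java.util.ArrayList;
import java.util.List;

// Simulates "project JOIN estimate JOIN labor JOIN material" for a single project,
// where estimate, labor, and material each reference the project independently.
final class CartesianDemo {

    // One row of the flattened join result: the project id repeated with one child of each kind.
    record JoinedRow(int projectId, int estimateId, int laborId, int materialId) {}

    static List<JoinedRow> join(int projectId, List<Integer> estimates,
                                List<Integer> labor, List<Integer> materials) {
        List<JoinedRow> rows = new ArrayList<>();
        // Nested loops mirror what the database produces: each estimate is paired
        // with every labor row, and each of those pairs with every material row.
        for (int e : estimates) {
            for (int l : labor) {
                for (int m : materials) {
                    rows.add(new JoinedRow(projectId, e, l, m));
                }
            }
        }
        return rows;
    }

    // One row for the project plus one row per child, each table fetched by its own query.
    static int separateQueryRowCount(List<Integer> estimates, List<Integer> labor,
                                     List<Integer> materials) {
        return 1 + estimates.size() + labor.size() + materials.size();
    }
}
```

With 3 estimates, 4 labor rows, and 5 material rows, the join returns 3 × 4 × 5 = 60 rows, each repeating the project's columns, whereas one query per table returns 1 + 3 + 4 + 5 = 13 rows in total.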
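The N+1 selects problem mentioned above is easiest to see by counting round trips. This sketch replaces the database with a hypothetical in-memory `EstimateSource` that counts calls (`EstimateRow`, `CountingSource`, and `EstimateLoader` are illustrative names, not part of any library): loading estimates project by project costs one query per project, while passing all project ids at once costs a single query, and both produce the same grouping.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical estimate row: only the ids matter for this illustration.
record EstimateRow(int id, int projectId) {}

// Hypothetical data access interface; each method call stands for one database round trip.
interface EstimateSource {
    List<EstimateRow> forProject(int projectId);             // ... where project_id = ?
    List<EstimateRow> forProjects(List<Integer> projectIds); // ... where project_id in (...)
}

// In-memory fake that counts how many "queries" were issued instead of hitting a database.
final class CountingSource implements EstimateSource {
    private final List<EstimateRow> table;
    int queries = 0;

    CountingSource(List<EstimateRow> table) { this.table = table; }

    @Override
    public List<EstimateRow> forProject(int projectId) {
        queries++;
        return table.stream().filter(r -> r.projectId() == projectId).collect(Collectors.toList());
    }

    @Override
    public List<EstimateRow> forProjects(List<Integer> projectIds) {
        queries++;
        return table.stream().filter(r -> projectIds.contains(r.projectId())).collect(Collectors.toList());
    }
}

final class EstimateLoader {
    // N+1 style: one query per project, on top of the query that loaded the projects.
    static Map<Integer, List<EstimateRow>> loadPerProject(EstimateSource src, List<Integer> projectIds) {
        Map<Integer, List<EstimateRow>> result = new HashMap<>();
        for (int id : projectIds) {
            result.put(id, src.forProject(id));
        }
        return result;
    }

    // Batched style: a single query for all projects, then grouping in memory.
    static Map<Integer, List<EstimateRow>> loadBatched(EstimateSource src, List<Integer> projectIds) {
        Map<Integer, List<EstimateRow>> result = src.forProjects(projectIds).stream()
                .collect(Collectors.groupingBy(EstimateRow::projectId, HashMap::new, Collectors.toList()));
        // Projects without estimates still get an empty list, matching the per-project loader.
        for (int id : projectIds) {
            result.putIfAbsent(id, new ArrayList<>());
        }
        return result;
    }
}
```

With 100 projects, the per-project loader would issue 100 estimate queries (101 counting the project query), while the batched loader still issues one.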
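A minimal sketch of the one-query-per-table approach with plain JDBC, assuming PostgreSQL and its JDBC driver. The table and column names (`"user"`, `project`, `estimate`, `labor`, `material`, `id`, `name`, `user_id`, `project_id`) and the trimmed-down entity classes are hypothetical, and, following the question, labor and material are assumed to reference the project. Instead of building an `in (...)` list by hand, it binds all project ids as one PostgreSQL array via `Connection.createArrayOf` and filters with `= any(?)`, so each child query has exactly one parameter however many projects there are. The hypothetical `assemble` method then attaches the child rows to their projects in memory.

```java
import java.sql.Array;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

// Hypothetical, trimmed-down entities: only ids and foreign keys are kept.
record Estimate(int id, int projectId) {}
record Labor(int id, int projectId) {}
record Material(int id, int projectId) {}

final class Project {
    final int id;
    final List<Estimate> estimates = new ArrayList<>();
    final List<Labor> labor = new ArrayList<>();
    final List<Material> materials = new ArrayList<>();

    Project(int id) { this.id = id; }
}

final class User {
    final int id;
    final String name;
    final List<Project> projects = new ArrayList<>();

    User(int id, String name) { this.id = id; this.name = name; }
}

final class UserRepository {
    private final Connection conn;

    UserRepository(Connection conn) { this.conn = conn; }

    // Loads the whole tree with at most five queries; returns null if no such user exists.
    User getUser(int userId) throws SQLException {
        User user;
        // "user" is a reserved word in PostgreSQL, so the table name is double-quoted.
        try (PreparedStatement ps = conn.prepareStatement("select id, name from \"user\" where id = ?")) {
            ps.setInt(1, userId);
            try (ResultSet rs = ps.executeQuery()) {
                if (!rs.next()) return null;
                user = new User(rs.getInt("id"), rs.getString("name"));
            }
        }
        try (PreparedStatement ps = conn.prepareStatement("select id from project where user_id = ?")) {
            ps.setInt(1, userId);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) user.projects.add(new Project(rs.getInt("id")));
            }
        }
        if (user.projects.isEmpty()) return user;

        // Bind all project ids as a single PostgreSQL integer[] parameter for "= any(?)".
        Array ids = conn.createArrayOf("integer", user.projects.stream().map(p -> p.id).toArray());
        assemble(user,
                childRows("estimate", ids, Estimate::new),
                childRows("labor", ids, Labor::new),
                childRows("material", ids, Material::new));
        return user;
    }

    // One query per child table. The table name is one of the constants above,
    // never user input, so concatenating it into the SQL is safe here.
    private <T> List<T> childRows(String table, Array projectIds,
                                  BiFunction<Integer, Integer, T> make) throws SQLException {
        List<T> rows = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(
                "select id, project_id from " + table + " where project_id = any(?)")) {
            ps.setArray(1, projectIds);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) rows.add(make.apply(rs.getInt("id"), rs.getInt("project_id")));
            }
        }
        return rows;
    }

    // Attaches already-loaded child rows to their parent projects; no database access.
    static void assemble(User user, List<Estimate> estimates, List<Labor> labor, List<Material> materials) {
        Map<Integer, Project> byId = new HashMap<>();
        for (Project p : user.projects) byId.put(p.id, p);
        for (Estimate e : estimates) byId.get(e.projectId()).estimates.add(e);
        for (Labor l : labor) byId.get(l.projectId()).labor.add(l);
        for (Material m : materials) byId.get(m.projectId()).materials.add(m);
    }
}
```

This still fits the separate-methods design from the question: each child query can live in its own DAO method, as long as it takes the whole collection of parent ids instead of a single id, which keeps the number of queries fixed rather than growing with the number of projects.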