javaazureapache-poiazure-blob-storageazure-sdk-for-java

Reading excel file (.xlsx) from Azure Blob using Java and Azure SDK without downloading it locally


I have created Azure blob with an excel file(.xlsx). I want to read the same using Java(Apache-poi) and Azure SDK without downloading it in locally. I can download the blob/file to a temp location and read it, but I should be able directly read the file from the blob without downloading the blob as per my requirement. Is there any way as such?


Solution

  • Reading excel file (.xlsx) from Azure Blob using Java and Azure SDK without downloading it locally.

    You can use the below code to read the .xlsx file from Azure Blob storage using Azure Java SDK without downloading locally.

    Code:

    import com.azure.storage.blob.BlobClient;
    import com.azure.storage.blob.BlobClientBuilder;
    import org.apache.poi.ss.usermodel.*;
    import org.apache.poi.xssf.usermodel.XSSFWorkbook;
    import java.io.InputStream;
    
    public class App {
        public static void main(String[] args) {
            String connectionString = "xxxx";
            String containerName = "test";
            String blobName = "sample5000.xlsx";
    
            BlobClient blobClient = new BlobClientBuilder()
                    .connectionString(connectionString)
                    .containerName(containerName)
                    .blobName(blobName)
                    .buildClient();
    
            try (InputStream inputStream = blobClient.openInputStream()) {
                Workbook workbook = new XSSFWorkbook(inputStream);
                Sheet sheet = workbook.getSheetAt(0);
    
                // Iterate through the rows
                for (Row row : sheet) {
                    boolean isRowEmpty = true;  // Flag to check if the row is empty
    
                    // Iterate through each cell in the row
                    for (Cell cell : row) {
                        if (cell.getCellType() != CellType.BLANK) {
                            isRowEmpty = false;  // If we find a non-blank cell, the row is not empty
                        }
    
                        // Handle different cell types
                        switch (cell.getCellType()) {
                            case STRING:
                                System.out.print(cell.getStringCellValue() + "\t");
                                break;
                            case NUMERIC:
                                // Handle dates if the cell contains a date
                                if (DateUtil.isCellDateFormatted(cell)) {
                                    System.out.print(cell.getDateCellValue() + "\t");
                                } else {
                                    System.out.print(cell.getNumericCellValue() + "\t");
                                }
                                break;
                            case BOOLEAN:
                                System.out.print(cell.getBooleanCellValue() + "\t");
                                break;
                            default:
                                System.out.print("Unknown Type\t");
                        }
                    }
    
                    // Only print a new line if the row is not empty
                    if (!isRowEmpty) {
                        System.out.println();
                    }
                }
                workbook.close();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
    

    The above code used BlobClient, opens an Excel file (.xlsx) as stream from the specified blob, and reads the first sheet. It iterates through each row and cell, printing the cell contents based on their type (string, numeric, date, or boolean). Once all the rows and cells are processed, the workbook is closed to finish up.

    In my environment, the file contains 5000 rows and the above executed and successfully retrieved from Azure Blob Storage without downloading locally.

    Output:

    4992.0  Dorcas  Darity  Female  United States   37.0    21/05/2015      8765.0
    4993.0  Angel   Sanor   Male    France  24.0    15/10/2017      3259.0
    4994.0  Willodean       Harn    Female  United States   39.0    16/08/2016      3567.0
    4995.0  Weston  Martina Male    United States   26.0    21/05/2015      6540.0
    4993.0  Angel   Sanor   Male    France  24.0    15/10/2017      3259.0
    4994.0  Willodean       Harn    Female  United States   39.0    16/08/2016      3567.0
    4995.0  Weston  Martina Male    United States   26.0    21/05/2015      6540.0
    4995.0  Weston  Martina Male    United States   26.0    21/05/2015      6540.0
    4996.0  Roma    Lafollette      Female  United States   34.0    15/10/2017      2654.0
    4997.0  Felisa  Cail    Female  United States   28.0    16/08/2016      6525.0
    4998.0  Demetria        Abbey   Female  United States   32.0    21/05/2015      3265.0
    4999.0  Jeromy  Danz    Male    United States   39.0    15/10/2017      3265.0
    5000.0  Rasheeda        Alkire  Female  United States   29.0    16/08/2016      6125.0
    

    enter image description here

    Reference: Azure Storage Blob client library for Java | Microsoft Learn