node.js express architecture

How to organize my backend if its only job is to scrape and return API data?


I'm working on a Node.js + Express application that just does the following for each endpoint ('/dataset-type1', '/dataset-type2', '/dataset-type3', ...):

  1. Make calls to several different third party APIs
  2. Parse them into a dataset of type corresponding to endpoint
  3. Return said dataset

There are also a few endpoints for saving and retrieving these datasets from a DB.

The current file structure is questionable, since it looks like this:

src
|- /routers
|    |- a bunch of express routers (e.g. for /dataset-type1, /dataset-type2, ...)
|- /utils
|    |- get-apiA-data.js
|    |- get-apiB-data.js
|    |- get-apiC-data.js
|    |- get-apiD-data.js
|    |- axios-instances.js
|    |- standardize-apidata-format-util.js
|    |- handle-missing-data-util.js
|    |- helper-function-for-specific-edge-case.js
|    |- parse-dataset-type1.js
|    |- dataset-type1-helpers.js
|    |- parse-dataset-type2.js
|    |- parse-dataset-type3.js
|    |- dataset-type3-helpers.js
|    |- lots of other similar files
|- /db
|    |- a bunch of SQL queries wrapped in javascript functions
|- app.js 

Aside from /utils being a misnomer, since it's really just holding all the dataset-building logic, the main concern here is that it's getting very unwieldy and it's becoming hard to tell which scripts belong to which features.

I was looking into refactoring options online, and everything recommended the layered approach with something like a routes/controller/service/data-access layer, which I feel would just make my code harder to follow by separating things into unnecessary layers. There really isn't much business logic, request handling logic, or advanced DB usage, so the scheme I pictured was like this:

src
|- /routers
|   |- contents of existing routers folder, except logic in each router handler is moved to a controller
|- /controllers 
|   |- controller files that look like the old router files
|- /services 
|   |- contents of utils folder, probably refactored to look more service-like
|- /data
|- app.js

Here I would have the exact same issue I currently do, except that the unwieldy folder is named 'services' instead of 'utils' and I have a bunch of extra controller files. Really, the only involved part of this app is the data scraping and parsing, and I just want a scheme for organizing the files for that.

Ultimately my questions are:


Solution

  • You need to combine elements of layering and functional slicing (Vertical Slice). Using layers is usually wise, but you always have to evaluate which layers you actually need.

    It looks like you need to identify a pattern that "solves" how a given data source (3rd party API) is called. This pattern will probably include elements of:

    1. Technology-based utilities (JSON parsers, REST Request builders, etc).
    2. Data Source specific logic.
    3. Data Source specific data types.
    4. Application logic that forms the runtime core of your app.
    5. Common data types used by your application (API config, etc).

    The pattern will be a combination of common code that is written once and reused, and data source specific code that is unique for each data source.
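
As a rough sketch of that split, assume each data source module exposes a fetch step and a normalize step. The names `createDataSource`, `fetchRaw`, `normalize`, and the payload fields below are hypothetical and do not come from the question:

```javascript
// Common code (written once, e.g. in /Common): wraps any data source in the
// same interface so the rest of the app never sees source-specific details.
// All names here are illustrative assumptions.
function createDataSource({ name, fetchRaw, normalize }) {
  return {
    name,
    // Fetch the raw payload, then convert it to the app's common record shape.
    async load(params) {
      const raw = await fetchRaw(params);
      return normalize(raw);
    },
  };
}

// Data-source-specific code (e.g. in /DataSources/API_A): only this module
// knows what API A's payload looks like. The payload shape is made up.
function normalizeApiA(raw) {
  return raw.items.map((item) => ({
    id: String(item.item_id),
    value: item.val ?? null, // missing values become null in the common shape
    source: "apiA",
  }));
}

// fetchRaw is injected, so a real HTTP client (the question uses axios) can be
// swapped for a stub like this one in tests.
const apiA = createDataSource({
  name: "apiA",
  fetchRaw: async () => ({ items: [{ item_id: 1, val: 10 }, { item_id: 2 }] }),
  normalize: normalizeApiA,
});

apiA.load().then((records) => console.log(records));
```

Adding a new third-party API then means adding one new module with its own `fetchRaw` and `normalize`, without touching the common code.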

    This would give you 3 logical layers:

    In terms of the vertical slices you could say you had N+1, where:

    Robert C. Martin talks about "Screaming Architecture". I have not read a lot about this, but the basic idea is that when someone looks at your code, its purpose should "scream" at you. For example, if your code is arranged in folders like "controllers", it's not clear what the purpose is, but if it includes folders like "ChartOfAccount", "Billing", and "AccountsPayable", it is clear that this is a financial management application.

    So in your case you might have something like:

     - /Common
     - /Common/DTO
     - /DataSources/API_A
     - /DataSources/API_B
     - /DataSources/API_X
     - /MainAppCode
     - /Utilities
     - /DataAccess
     - /DataAccess/Sqlite
    

    I'm not sure about the name MainAppCode specifically but hopefully you get the general idea.
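
To make the layout more concrete, here is a minimal sketch of what a module under `/MainAppCode` might contain for one endpoint, assuming data source modules that each expose an async `load()` returning records in a common shape. `buildDatasetType1`, `makeHandler`, the file name, and the stub sources are hypothetical; the handler relies only on Express's `(req, res)` signature, `res.status()`, and `res.json()`.

```javascript
// /MainAppCode/dataset-type1.js (hypothetical file name)
// Application logic: combine records from several data sources into one dataset.
async function buildDatasetType1(sources, params) {
  // Call all data sources in parallel; each resolves to an array of records.
  const results = await Promise.all(sources.map((s) => s.load(params)));
  return { type: "dataset-type1", records: results.flat() };
}

// Thin Express-style handler: no parsing logic lives in the router.
function makeHandler(buildDataset, sources) {
  return async (req, res) => {
    try {
      res.json(await buildDataset(sources, req.query));
    } catch (err) {
      // A failing third-party API is reported as an upstream error.
      res.status(502).json({ error: err.message });
    }
  };
}

// Stub data sources standing in for /DataSources/API_A and /DataSources/API_B.
const stubSources = [
  { name: "apiA", load: async () => [{ id: "1", source: "apiA" }] },
  { name: "apiB", load: async () => [{ id: "7", source: "apiB" }] },
];

const handler = makeHandler(buildDatasetType1, stubSources);
// With Express this would be wired as: router.get("/dataset-type1", handler);
```

With this arrangement, everything needed to understand `/dataset-type1` sits in one file plus the data source modules it lists, which addresses the "which scripts are for which features" problem.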

    Data access here means your backend database. You should consider whether it's appropriate to abstract this out, and how. The purist approach would mean having DTOs that represent the physical database in your data access code and nowhere else; your main app logic would use database-agnostic entities. An alternative is to use one set of DTOs that includes both objects straight out of the database and database-independent objects. This is less complicated, but it is a choice with trade-offs.
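
A minimal sketch of the purist approach, assuming a hypothetical `datasets` table with `id`, `dataset_type`, `payload_json`, and `created_at` columns (none of these names come from the question). The row shape stays inside the data access module, and the rest of the app only sees the database-agnostic entity:

```javascript
// /DataAccess/Sqlite/dataset-mapper.js (hypothetical file name)

// Row DTO -> entity: hides column names and the JSON-in-a-column detail.
function rowToDataset(row) {
  return {
    id: row.id,
    type: row.dataset_type,
    records: JSON.parse(row.payload_json),
    createdAt: new Date(row.created_at),
  };
}

// Entity -> row DTO: the inverse mapping, used when saving.
function datasetToRow(dataset) {
  return {
    id: dataset.id,
    dataset_type: dataset.type,
    payload_json: JSON.stringify(dataset.records),
    created_at: dataset.createdAt.toISOString(),
  };
}

// Example row as it might come back from a query (made-up data).
const row = {
  id: 1,
  dataset_type: "dataset-type1",
  payload_json: '[{"id":"1","value":10}]',
  created_at: "2024-01-01T00:00:00.000Z",
};
const entity = rowToDataset(row);
console.log(entity.type, entity.records.length);
```

The single-DTO alternative would skip these mappers and pass `row` objects around directly, which is less code but ties the main app logic to the column names.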