neo4j

Why doesn’t my Neo4j database size increase after adding data?


I’m using Neo4j to work with a graph, and I need to measure the database size to understand how much space my nodes and relationships occupy after adding new data. To do this, I go to the root folder data/databases and check the size of the data folders. However, after adding a large amount of data, I notice that the folder sizes remain the same.

What could be causing this issue? Is Neo4j caching data or using some kind of optimization that prevents these changes from being reflected in the file system? Or is there an action I need to take to update the data on disk?

Any hints or advice would be greatly appreciated.


Solution

  • Neo4j's Space reuse documentation says:

    Neo4j uses logical deletes to remove data from the database to achieve maximum performance and scalability. A logical delete means that all relevant records are marked as deleted, but the space they occupy is not immediately returned to the operating system. Instead, it is subsequently reused by the transactions creating data.

    and also:

    The store files [...] do not shrink when data is deleted. The space that the deleted records take up is kept in the store files. Until the space is reused, the store files are sparse and fragmented, but the performance impact of this is usually minimal.

    So the data you added may have been written into previously freed space in sparsely populated store files, which would leave the file sizes unchanged.

    Also, Neo4j does not immediately persist writes to the data store files. Instead, it keeps them in memory and records them in the transaction logs, periodically performing a checkpoint to flush the latest writes to the store files and reduce the size of the transaction log files. So, to see the final effect of recent changes on the persisted data, you should do a checkpoint first (which actually writes the data to the store files and reduces the transaction log overhead). You can use the Cypher statement CALL db.checkpoint() to force a checkpoint.
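
To compare sizes before and after a checkpoint, it can help to script the measurement instead of inspecting folders by hand. The following is a minimal sketch using only the Python standard library; the helper name `dir_size_bytes` and the example path are assumptions for illustration, not part of Neo4j. It does not talk to Neo4j itself, so run CALL db.checkpoint() separately before measuring.

```python
import os


def dir_size_bytes(path):
    """Return the total apparent size, in bytes, of all regular files under path.

    Hypothetical helper for illustration; symbolic links are skipped so
    that nothing is counted twice.
    """
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full_path = os.path.join(root, name)
            if os.path.islink(full_path):
                continue  # do not follow or count symlinks
            total += os.path.getsize(full_path)
    return total


# Example usage (the path is an assumption; adjust it to your installation):
# print(dir_size_bytes("data/databases/neo4j"))
```

Measuring once before loading data and once after a forced checkpoint gives a like-for-like comparison. Even then the number may not grow if the new records fit into space freed by earlier deletes.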