androidiosdata-sharing

Sync large amounts of data between mobile app and webserver


The Setup

I have native iOS and Android apps which sync data to and from my webserver. A requirement of the apps is that they work offline so data is stored on the apps in sqlite databases.

The apps communicate with the server with a series of REST calls which send JSON from the server for the apps to store in their databases.

My Problem

The scale of this data is very large, some tables can have a million records, and the final size of the phone databases can approach 100mb.

The REST endpoints must limit their data and have to be called many times with different offsets for a whole sync to be achieved.

So I'm looking for ways to improve the efficiency of this process.

My Idea

An idea I had was to create a script which would run on the server which would create an sqlite file from the servers database, compress it and put it somewhere for the apps to download. Effectively creating a snapshot of the server's current data.

The apps would download this snapshot but still have to call their REST methods in case something had changed since the snapshot happened.

The Question

This would add another level of complexity to my webapp and I'm wondering if this is the right approach. Are there other techniques that people use when syncing large amounts of data?


Solution

  • This is a complex question, as the answer should depend on your constraints:

    1. How often will data change? If it is too often, then the snapshot will get out of date really fast, thus apps will be effectively updating data a lot. Also, with the big volume of data, an application will waste CPU time on synchronization (even if user is not actively using all of that data!), or may become quickly out of sync with the server - this is especially true for iOS where Applications have very limited background capabilities (only small window, which is throttled) compared to Android apps.

    2. Is that DB read-only? Are you sending updates to the server? If so, then you need to prepare conflict resolution techniques and cover cases, in which data is modified, but not immediately posted to the server.

    3. You need to support cases when DB scheme changes. Effectively in your approach, you need to have multiple (initial) databases ready for different versions of your application.

    Your idea is good in case there are not too many updates done to the database and regular means of download are not efficient (which is what you generally described: sending millions of records through multiple REST calls is quite a pain).

    But, beware of hitting a wall: in case data changes a lot, and you are forced to update tens/hundreds of thousands of records every day, on every device, then you probably need to consider a completely different approach: one that may require your application to support only partial offline mode (for most recent/important items) or hybrid approach to data model (so live requests performed for most recent data in case user wants to edit something).