Efficient SQLite backup on Android
This is a post about backing up the SQLite database on Android
In a previous article we’ve seen that the Backup Manager on Android can be a viable option for automagically backing up data. Although it is a bit of a black box, it is easy to use and should meet most needs. The platform offers helper classes for handling Shared Preferences and files, but there is no out of the box support for data stored in the SQLite database. In this post, I’ll cover an efficient way to back up and restore data, and I’ll also provide some timing and size measurements.
The database schema
Let’s use a simple database schema! Basically there are Locations that can contain zero or more Stuffs and each Locations and Stuff has a name:
It is a simple schema, but it provides a variable amount of entities and also a relation between them.
The backup architecture
The main idea behind the backup is simple: We’ll use a Cursor to fetch all the entities from the database one by one, convert them to JSON by a streaming architecture using GSON and finally GZIP the whole stream. Except the last step, the whole process is streaming, so it does not require a proportional amount of RAM to the number of entities. Only the GZIPed data must fit in the memory, but we’ll see that it won’t be a problem either (also, as the Backup Manager has a practical limit of 2MB, producing larger dump would make the saving problematic).
First, let’s define a helper method. I needed to iterate over Cursors, and this introduced bloat and reduced readability. So, use this method to simplify this:
Next, we need an OutputStream we write the data to, and a writer:
Now, everything we write into writer will get GZIPed and will end up in bufStream.
Next, we’ll define the methods that actually write the data. The resulting JSON will be like this:
The export should be separated by objects/arrays, as it makes the streaming more readable and also help catch unclosed object/arrays more easily.
Note: You might be using an ORM, it does not make any difference. Just read the data and serialize it somehow into JSON.
And thats it, it can be written into the BackupDataOutput:
The restore is very similar to the backup, just the opposite direction. First, we construct the reader:
Then we read the entities one by one, and save them to the database:
Note: Notice the explicit beginTransaction() and endTransaction(). This greatly increases the speed of the inserts. See this blog post about this.
… And so on, same for the Stuffs.
And we construct the dataBuf like this in the onRestore() method:
Also, do not forget to clear all the previous data from the database!
Checking the correctness of backup
Running a backup and then a restore indeed worked fine. To check it, I’ve introduced a database hash which is a consistently ordered read of all entity name hashed with SHA-1. This produces a string that can be used to check the data without manually examining the database.
I’ve ran the backup:
Then, I’ve modified the database, effectively changed the hash, then ran a restore:
The hash is the same, so the data is the same.
Note: The hashing algorithm I use does not take the IDs, only the names. It is probable that the autogenerated IDs are different after the restore than before the backup.
Measuring timing and size requirements
I’ve ran multiple tests with different amount of entities ranging from a small database to a relatively big one. I’ve also ran a couple of tests for each run:
- Db population: How much time was needed to populate the db with the desired amount of entities
- DB read: How much time was needed to read the whole database, without writing anything
- Backup size without GZIP: The amount of bytes the raw JSON takes
- JSON write without GZIP: How long the export took without GZIP
- Backup size with GZIP: The amount of bytes the full-featured backup takes
- JSON write with GZIP: How long it was
- Restore: How long a full restore is
I’ve made the following runs:
- 10 Locations, 10 Stuffs each (110 total)
- 100 Locations, 10 Stuffs each (1100 total)
- 100 Locations, 100 Stuffs each (10100 total)
- 1000 Locations, 100 Stuffs each (100100 total)
The findings are summarized in this table:
We can conclude several things from the table above. First, restoring from the GZIPed stream did not have any overhead as generating the entities randomly (in fact, it was faster. It may happen because of increased heap size when the last step is running, or some caching, or the random is slow). Also, GZIPing the stream also had no effect on the running time. All the tests shown <1sec differences to the raw stream. But the size requirements are significantly lower, with GZIPing we’ll see unacceptably long backups/restores before reaching the 2MB limit.
On the other side, serializing into JSON does add a considerable overhead. On smaller data sets it is still a little (4 sec vs 2 sec), on larger sets it is more considerable. For the largest set I’ve tested it added a 16% computation to the raw read.
The method described above is a valid solution to backing up SQLite databases on Android. It can be used for a sizeable amount data, and is also compatible with the Backup Manager. It is expected that the 2MB limit will be enough for most apps. Also, the JSON serializing gives the freedom to freely choose the schema, and it can be also processed with other apps easily. Also, with dump versioning, apps can be made to restore any previously made backups, no matter the concrete database schema.