SQLite for production? - El blog de Felipe

I have been around databases for a while: dBase III, QuickSilver, and Clipper in the 80s, DB/2 (half a database as I jokingly called it back then), SQL Server (later MSSQL), Oracle, Paradox, FoxPro, rBase, and Lotus Notes as a database in the 90s, MS Access (with MDE) and back to MSSQL, and back again to DB2 on z/OS and then entered the NoSQL era with MongoDB, SQLite for local testing, Redis (a non-RDBMS memory caching database), and back to SQL with MySQL. I even wrote my own dBase III compatible database engine in C++ (with full reverse engineering of the tree index format and the file locking system for writing) and hold a patent on using databases for logging and debugging applications.

So, have I heard of SQLite? Oh yes, it is a tiny, single-file database engine that can be used for mobile applications, or to run some unit tests on your local machine and it is so flexible that you can spin up a database during your CI build unit tests and tear it down at the end. You could also set it up as a local instance of your database, so you don’t have to be connected while developing your application. But wait, do you really think that it can also be used in production? As the database backend to my precious application? With redundancy? And good performance? And DR?

DevOps

Well, yes, that is exactly what I am saying. Over the years we have created an abstraction around databases that hides some of its complexity. We “request” databases, and expect someone else to figure out the storage, rights, backups, performance tuning, scaling, growth, etc., but isn’t that going in the opposite direction of DevOps? In the DevOps discipline, we want every component of our application to be under our (DevOps) control. So, in order to achieve DevOps of databases, we should either automate the provisioning of the managed databases or maybe we could rethink our posture about databases.

DBAs

Historically, databases needed to be performance-tuned, and that was a science by itself, so we hired a group of experts and called them DBAs (DataBase Administrators). Over the years, that nice separation also made it possible to throw more and more things over the DBA’s fence, so that “application” developers could concentrate on developing the application’s functionality. But, this world where the developer didn’t need to care about database issues, also became a problem in itself, because now developers wanted to use these easy database services for everything, even for things that don’t really belong to managed databases, while expecting DBAs to figure out the problems and performance.

As with most things, the right balance is to learn how to pick the right tool for every job. And, yes, there are very good reasons to request a managed database, but a large number of applications do not consider the qualifications, just request a managed database out of convenience. When the TV salesperson asks what are we looking for, the answer always comes back in upper limits of size and money, and it is the same thing with databases, of course, we would like the best thing we can get in iops and transfer rates, and maximum column count, when, in reality, most of those numbers will make very little, if any, difference in the performance of our application.

Performance tuning

Why do we need to performance-tune a query? Obviously, because it is not performing well, right? Well, you’ll be surprised how many times I have seen developers (and even myself) overengineering the performance tuning of some query before we have even considered how much it will affect the running application. And nowadays, there are many ways to improve the performance of a query, like buy faster hardware, use cache, or co-locate; any of those mechanisms might solve the performance issue, but fixing it before we know it is a problem is not a good use of our time. Please note, that I didn’t say that we should not put a sensible amount of thought into creating solid schemas, a good index or two might make a world of a difference; I am just pointing out that it is not the only tool in our belt.

Caching

Speaking of tools, key-value pair in-memory databases, like redis and etcd, should be used for caching instead of your main database. Those tools have been fine-tuned already for the purpose and we shouldn’t reinvent that wheel trying to do it in our database just because we can.

Self-managed

Although there are many reasons to choose and go with a managed database, it is not our only choice, and, in a growing number of cases, in my opinion, it is not the best choice. Why go with self-managed databases?

Lifecycle. The database gets instantiated, updated, upgraded, and decommissioned with the application. OS patching (nowadays called container rebuild), database version updates, and all other routine reasons to coordinate with the DBAs are now totally under the control of the application team and can be synchronized with the application’s lifecycle.
Autonomy. Not only is the application now independent from other applications that cannot upgrade databases at the same speed, but also the ability to move the application to other/better/more flexible locations (on-prem or different clouds) without having to worry about the data paths between the new location of the application and the old location of the database.
Network travel. The advantage of co-locating the database with the application server is not just the autonomy, but also the performance improvement you get from simply having the data packets travel a shorter route, which would now give you the freedom to make multiple calls to the database (free to be chatty) without the urgent need to build a huge but potentially non-performing SQL call to get all the data in one swoop.

But, SQLite?

It is a toy database, isn’t it? No, it is not. It is a full-featured database system that has been used for applications in production for many years. Several of the applications that you run every day on your cell phone are probably storing all their data in SQLite. It has been there, it has proven its toughness and reliability, the only thing that is changing is the previous perception that it should only be used as an embedded database for a tool or mobile application to a newer appreciation of this mature tool as an all-you-need database for a larger percentage of applications.

Compatibility

SQLite implements all the common SQL features with very few exceptions, like “RIGHT OUTER JOIN”, but, to be honest, my applications have survived the last decade without those. Of course, I had not been considering SQLite as a serious production alternative for most of my applications until recently, but, I have been using SQLite for most of my unit and local testing, which goes to show that it has all the features that I AM ACTUALLY USING from the bigger alternatives.

Simplicity

If you have been developing applications with the support of DBAs, you probably have been kept unaware of the complexity of the database system. Even if you have already been convinced by the DevOps culture and have started deploying and updating your database from your code, you have probably made sure that it stays separate from your application, via containers, and might not have paid much attention to the persistent objects managed by the database system. But it would be hard to ignore the simplicity of the SQLite system since the only way to use it is to pass the actual filename to SQLite, you will quickly notice that the whole database is contained in this single file. Does that mean that there are physical limitations? You can read about the limitations here, but we will probably run into the limitations of our computing power before we get close to some of those limits. But, the simplicity of the single-file database cannot be overstated. Backup, migration, and data transfer are made simpler by the single-file approach.

Portability

This might be the biggest reason to give SQLite a second look as a production database. Wouldn’t it be wonderful to have your data in the same exact format regardless of where your application is running? Your traditional RDBMS works well on one or two major platforms, but you wouldn’t put it on your cell phone or appliances… the opposite used to be true also, SQLite works extremely well on your cell phone, TVs, and appliances but you wouldn’t put it as the heart of your web application, but now, that is changing. The same engine and the same file format are not just available, but portable across every device where you might think of needing a database.

Enterprise level

There are some utilities that are being brought forward to complement the simplicity of SQLite with features that you would expect only from an “Enterprise” level solution. DqLite, for example, “(distributed SQLite) extends SQLite across a cluster of machines, with automatic failover and high-availability to keep your application running“. Also, the one I am experimenting with now, the very promising LiteStream, “Safely run your application on a single server. Fully-replicated database with no pain and little cost.” And, I just heard of LiteFS, which “is a FUSE-based file system for replicating SQLite databases across a cluster of machines“. All of these initiatives help answer the distributed, availability, and backups needs of an enterprise solution.

Looking back at the applications I have developed (or participated in) for the last 10 years that required a database to store its own data, and excluding the ones that required access to external databases, I found that SQLite could have been extremely appropriate for the great majority of them. Actually, I could just switch several of them very easily since 1) they were already tested with SQLite and 2) I typically use Sequelize (nodejs ORM) and my own easy-sequelize (shameless plug) which abstracts the engine selection from its consumption.

Appropriateness

Should we all switch our production databases to SQLite? Of course not, that is not the intention of this article, not by a long shot. But this list might help you decide:

Client/Server. SQLite does not have a server component, if your application architecture requires that the client application sends SQL queries to a server engine for processing, SQLite is not your answer.
Multi-user. If your scenario requires that multiple clients read and write from the database at the same time, SQLite is not your best choice, although, for almost two decades I have slowly moved away from such implementations and created an interface to the data (API) where the multiple clients do not talk SQL queries over the wire, but REST APIs, and then the API implementation is the only user of the database. Most times the API approach provides a better architecture, regardless of using SQLite or a traditional RDBMS in the backend, but I acknowledge that there are good use cases for the opposite.
Concurrency. If your application needs require a large number of clients hitting the database concurrently, SQLite is not your best choice. The same is true for high volume or huge datasets, highly optimized RDBMSs were designed for those scenarios and should be preferred.

Conclusion

As usual, there is no one answer for this type of dilemma, and you should always pick the most appropriate tool for the job. But, it is important to know that, if you have been using SQLite for your unit tests, and love its simplicity, more and more opportunities are opening up, including the possibility of using SQLite as the backend for your production application.