Strongly Emergent

What comes from combining humans, computers, and narrative

How to Make SQLAlchemy Pass Connection Flags to a SQLite Database via File:// URI Query Parameters

(for just the code promised by the title, see this GitHub gist.)

Making software is more work than just sitting down and writing code. For the code you’ve written to matter, you have to make it available to others. Like prose or visual art, there is no “done” with software projects: there is only Good Enough. Unlike prose or visual art, a software project can raise its standard of Good Enough over time by releasing new versions of the code with bug-fixes and new features. This is where the “make it available to others” part starts being very difficult. Active projects, by releasing new versions, always end up in a state of heterogeneity where not all of the project’s users are using the same version of the project. When active projects are incorporated into larger projects, it exacerbates the problem. It’s very common for active projects to advance very far as a standalone project, but to lag very far behind that advancement as part of a larger project. Sometimes this is very difficult for users. But it is not any one person’s fault: it is, overwhelmingly, the emergent result of how projects interact with each other. Today I’m going to show how this process leads to the preĆ«minent database/ORM library in the Python world, SQLAlchemy, being unable to take advantage of a nearly decade-old feature of SQLite, a widely-used database.

Let’s start with SQLite. Like most database systems, SQLite lets you provide connection flags (extra information) when you connect to a database. By sending connection flags (e.g. SQLITE_OPEN_READONLY or SQLITE_OPEN_CREATE) you can have the database itself enforce restrictions on your interactions with it. Being able to send connection flags is very helpful for programmers. Having the database enforce the restrictions that the connection flags signify means you don’t need to write your own enforcement code. It also eliminates the chance of making mistakes while writing enforcement code. You gain productivity because the time that writing and checking enforcement code would take, you can instead spend on writing other code.

SQLite added connection flags in version 3.5.0 (Fall 2007). However, SQLite is a C project, not a Python project. The connection flags are concepts that exist in SQLite’s C code. For them to exist in other languages, those languages (or their ecosystems) must provide a way of interacting with SQLite that permits specifying “please send the following connection flags when you connect to the SQLite database.”

Plenty of other languages already had tools for interacting with SQLite in 2007, based on a function named sqlite3_open(). Because there was already plenty of software using sqlite3_open() and relying on its existing behavior (SQLite’s first public release was Spring 2000), SQLite 3.5.0 also introduced a new function, sqlite3_open_v2(), that understood connection flags. This allowed users to keep using sqlite3_open() until they were ready to change their code to use sqlite3_open_v2(). Once they began using the new function, they’d be able to use the new features. In version 3.7.7 (Summer 2011), SQLite made it easier still to use the new features by teaching both the old and new versions of sqlite3_open() to, with a little coaxing, understand file:// URLs1 as well as file paths. File paths are specific to a particular operating system or family of them, but file:// URLs are OS-independent. This made life slightly easier in general, but using file:// URLs had a more important benefit as well. Using them made it much easier to send connection flags, because SQLite permitted users to put connection flags in the file:// URL. Effectively, both versions of sqlite3_open() could now understand connection flags. SQLite also added some connection flags that could only be used by embedding them in a file:// URL.

If you were writing Python instead of C, though, you couldn’t count on having access to SQLite’s improvements. Python’s support for SQLite comes from periodically incorporating code from the independent pysqlite project. The sqlite3 module in Python’s standard library (introduced with Python 2.5’s release, Fall 2006) is a lightly modified version of pysqlite. Python 2.7 (Summer 2010) contained version 2.6.0 of pysqlite (Spring 2010). This version remains the core of sqlite3 as of Python 2.7.11 (Winter 2015) and Python 3.5.2 (Summer 2016). There does not yet exist a version of Python where the following code works2:

import sqlite3
sqlite3.connect("file:///path/to/my/database.db")

There are workarounds, but they show how challenging it can be to get new versions of software projects into users' hands. Fundamentally, the Python code above fails because SQLite, in the process of teaching the sqlite3_open() functions to understand file:// URLs, chose to make the new feature opt-in (similar to how they distinguished between sqlite3_open() and sqlite_v2_open). There are three times when you can opt into having SQLite understand file:// URIs: during its compilation, when it launches, and when you call it. The sqlite3 module, in its pysqlite version 2.6.0 incarnation, avails itself of none of them. It also provides no way for users to opt in.3 As an independent project, pysqlite released version 2.8.2 (Spring 2015), which added a way for users to send any connection flags SQLite understands.4 This version is not part of Python, however, and is only available for use as a standalone module when using 2.x versions of Python. Early versions of Python 3 were also stuck with the no-URIs behavior. Python 3.4 (Spring 2014) introduced a way to tell sqlite3.connect() that it should treat its input as a URL.5 Unlike pysqlite’s improved version, the Python 3.4 change didn’t add a general way to send flags (though it did open up the “send flags as part of a file:// URL” path). Still, by mid-2015, if you were using sqlite3, you had a fairly good chance of being able to use connection flags.

There are a lot of people using SQLite who aren’t using sqlite3, though, at least not directly. Because of how easy it is to create bugs, some of which will be disastrous security holes, and because of how tedious it can be to write raw SQL queries, the overwhelming (and correct) consensus of the Python community is that you should use SQLAlchemy to interact with your database. SQLAlchemy also connects to databases via URLs, but given that its decision to use URLs predates SQLite’s by years (SQLAlchemy version 0.1.0, Winter 2005-06), it should be unsurprising that the two usages clash. SQLAlchemy wants users to identify databases by URLs with the database name as the schema. So the database in our example above would be sqlite:///path/to/my/database.db. SQLAlchemy’s database-URL schemas can have extra information (query parameters) in them, like SQLite’s file:// URLs, which tell it how to connect to the database. The connection isn’t done by SQLAlchemy, though, it’s done by an external library. SQLAlchemy is a layer on top of modules like sqlite3 that understand how to directly interact with databases. Under the hood, SQLAlchemy extracts a file path from the database URL and hands that file path to the underlying database module. This structure, though, eliminates the possibility of asking SQLite to open a file:// URI! It can only send file paths to SQLite, and so the extra capabilities that SQLite activates when it sees the right prompts in a file:// URL cannot be activated through SQLite. SQLAlchemy does try to pass on extra arguments that it finds in the database URL, but it passes those on to underlying database modules like frozen-in-amber-since-2010 sqlite3.6 Such extra arguments change the details of sqlite3’s behavior, but do not change the way it tries to connect to SQLite. On older Python versions, pysqlite 2.8.2 or later can be substituted for the built-in sqlite3, but because pysqlite is not available on modern Python versions, this is not a satisfactory solution.

We are in a situation, nearly 10 years after SQLite introduced its connection flags and file:// URLs, where taking advantage of those features from Python code is impossible to accomplish with the tools provided by the latest version of Python’s best database library, running on the latest version of Python itself. It’s important to note that none of this is malfeasance or incompetence on the part of library or language maintainers. Projects like the Python language, SQLAlchemy, and SQLite, prize stability very, very highly. They are infrastructure projects: they want to build durably and to provide durable materials for others to build atop, and they are not wrong in how often they value this above convenience. The power of defaults is very important here, too: although many OSes ship with Python and/or SQLite built in, those projects in turn have their own release cycles and their own stability concerns. The first version of OS X that shipped with a SQLite version able to understand file:// URIs was summer 2012’s OS X 10.8 “Mountain Lion” (Summer 2012). Debian-stable didn’t ship with such a SQLite until midway through wheezy’s patch sequence (2014). Ubuntu picked it up faster, incorporating SQLite 3.7.7 in their Oneiric Ocelot release (Fall 2011). All of these infrastructure projects, reasonably enough, tend to defer building support for any particular thing until they are sure that their users can and want to use it. Frustratingly, they can unintentionally enable each other in delaying support. But there is no archfiend actively obstructing the uptake of new versions, just a collection of overworked engineers trying to build things that won’t fall apart too easily.

Fortunately, individual programmers writing brand-new projects have no old versions to be bound to. We can, by investing a little work, make different decisions about stability than project maintainers. This brings us around to the promise in this post’s title. Python, sqlite3, and SQLAlchemy were all written by clever people with an interest in flexibility. The tools that they’re eager to give us, the defaults, are not the only tools they can give us: there are others. Let’s use those others.

The code below follows a fairly straightforward strategy. Given a file path and some arguments to pass to SQLite, it begins with some basic plausibility checks. It ignores the arguments and uses only the path if the caller has an ancient version of SQLite or wants SQLite’s :memory: database. Otherwise, it turns the file path and the arguments into a file:// URL, then tries to connect to that URL. First it tries in the way that Python 3.4+ allows, with a uri=True parameter. If trying to connect that way is met with “I know no such parameter as uri”, we know we’re on an earlier version of Python. But since we know that SQLite and sqlite3 are available, we use ctypes to reach into the Python/C communication channel that the sqlite3 library has already set up. We prod the variable in C code that tells SQLite we’re opting into our input being treated as a URL, then connect again with our URL. Finally, we create a SQLAlchemy engine. We tell it that we’re connecting to an empty URL7, but we also tell it “when you go to connect to that URL, use this object we’re handing you in order to establish that connection.” The object we hand it is the SQLite connection we just established, which SQLAlchemy doesn’t know how to create by itself.

This strategy has some limitations: it definitely won’t work as-is on PyPy or Jython, and it’s superfluous if you know your project will run only on Python 3.4+. However, if you want your project to run on multiple versions of Python and to have access to nice SQLite features on all of them, this function will, I hope, get you to that point. I should also note that I drew inspiration from this GitHub issue and this Gist: the digging is all me, however, as is the unit test. I’m firmly of the opinion that if you tell someone you have code that can solve their problem, the code you provide should be tested.

import os.path
import re
import sqlite3
from sqlalchemy import create_engine

def create_sqlite_engine(sqlite_path, read_only=None, create_db=None, **kwargs): """ Open SQLite with appropriate use of connection flags.

This is harder than it sounds.
"""
# Requirements: sqlite_path must be either ':memory:' or the output of
# os.path.abspath(). In this context, the SQLitePathType class handles that
# for us.
log.info("Trying to connect to SQLite ({!r})".format(sqlite_path))

sqlite_prefix = 'sqlite:///'

# For an in-memory DB, ignore the flags
if sqlite_path == ':memory:':
    log.info("Fallback: :memory: DB.")
    return sqlalchemy.engine.create_engine(sqlite_prefix)

# For atrociously old versions of SQLite, ignore the flags
if (sqlite3.sqlite_version_info < (3, 7, 7)):
    log.warning(("Found old SQLite version: "
                 "using it, but have to ignore {!r}").format(kwargs))
    return sqlalchemy.engine.create_engine(sqlite_prefix + sqlite_path)

kwargs.pop('mode', None)
if create_db:
    kwargs['mode'] = 'rwc'
else:
    kwargs['mode'] = 'rw'
# Because the read_only and create_db modes are mutually exclusive, we give
# priority to the more restrictive mode.
if read_only:
    kwargs['mode'] = 'ro'

# Normalize sqlite_path from a platform-bound file path to a
# platform-agnostic file:// URI (see https://www.sqlite.org/uri.html for
# details of SQLite's file:// URI requirements).
sqlite_path = re.sub(r'\?', r'%3f', sqlite_path)
sqlite_path = re.sub('#', r'%23', sqlite_path)
drive, sqlite_path = os.path.splitdrive(sqlite_path)

path_segments = []
path_head, path_tail = os.path.split(sqlite_path)
while path_tail != '':
    path_segments = [path_tail] + path_segments
    path_head, path_tail = os.path.split(path_head)
if drive != '':
    path_segments = [drive] + path_segments

try:
    # Python 3
    query_string = urlparse.urlencode(kwargs)
except AttributeError:
    # Python 2
    query_string = urllib.urlencode(kwargs)

file_uri = urlparse.urlunparse(
    ("file",  # Scheme
     '',  # Netloc (localhost)
     '/'.join(path_segments),  # Path
     "",  # Params
     query_string,  # Query
     "",  # Fragment
     ))

log.debug("Database URI: {!r}".format(file_uri))

def _sqlite_conn_creator():
    try:
        log.debug("Trying the easy way...")
        # In Python 3.4+, we can do this the easy way.
        conn = sqlite3.connect(file_uri, uri=True)
    except TypeError:
        log.debug("Trying ctypes hackage...")
        # The ctypes way is not the easy way (nor is it the "compatible
        # with anything but CPython" way).
        import ctypes
        import ctypes.util
        sqlite = ctypes.CDLL(ctypes.util.find_library('sqlite3'))

        # https://www.sqlite.org/c3ref/c_config_covering_index_scan.html
        SQLITE_CONFIG_URI = 17
        sqlite.sqlite3_config(SQLITE_CONFIG_URI, 1)
        conn = sqlite3.connect(file_uri)

    return conn

# Dust our hands off, give the caller what they asked for, pretend we
# didn't do anything disreputable just now.
return sqlalchemy.engine.create_engine(sqlite_prefix,
                                       creator=_sqlite_conn_creator)


  1. URIs, if you’re picky.

  2. Specifically, you’ll get a sqlite3.OperationalError that signifies a C-level SQLITE_CANTOPEN return code.

  3. To be fair, Python can’t (and shouldn’t!) compile SQLite for you.

  4. Mostly by switching from sqlite3_open() to sqlite3_open_v2().

  5. Also by switching from sqlite3_open() to sqlite3_open_v2().

  6. See sqlalchemy.dialects.pysqlite.SQLiteDialect_pysqlite.create_connect_args() for the implementation.

  7. Normally this gets you connected to the :memory: DB.