Every seasoned developer has been there: whether it’s an urgent requirement change from your business leader or a faulty assumption revealing itself after a production deployment, your data needs to change, and fast.
Maybe a newly-passed tariff law means recalculation of the tax on every product in your retail catalog (and you sell everything). Maybe a user complains that her blog post is timestamped to the year 56634, and you realize you’ve been writing milliseconds, not seconds, as your epoch time for who knows how long. Or maybe Pluto has just been reclassified and your `favorite_planet` column urgently needs rectification across millions of astrological enthusiast rows.
Now you’re between a rock and a hard place. Is downtime acceptable while you take the database offline and whip it into shape? That’s a hard “no.” If you’re using SQL, you might be able to express your changes in your database’s arcane API, but even then, you’re left with the laborious job of coordinating your migration with your application deployment (and hopefully you’ve understood the relevant concurrency and locking semantics). If you’re running on NoSQL, you might as well commence with the stages of database grief: denial of severe migration restrictions, bargaining with third-party tools, and finally acceptance that there’s no hope at all. The solutions left to you all rhyme with “tech debt.”
But what if there were a better way?
Today we’re releasing Rama’s new “instant PState migration” feature. For those unfamiliar with Rama, PStates are like databases: they’re durable indexes that are replicated and potentially sharded, and they are structured as arbitrary combinations of maps, sets and lists.
Instant PState migrations are a major leap forward compared to schema migration functionality available in databases: use your own programming language to implement arbitrary schema transformations, deploy them worry-free with a single CLI command, and then watch as the data in your PStates, no matter how large, is instantly migrated in its entirety.
If you want to go straight to the nitty gritty, you can jump to the public documentation or the example in the rama-demo-gallery. Otherwise, let’s take a look at the status quo before diving into a demonstration.
Status quo
SQL
SQL needs no introduction – it’s a tried-and-true tool with built-in support for schema evolution.
SQL (Structured Query Language) is composed of sub-languages, two of which are the Data Definition Language (DDL) and the Data Manipulation Language (DML).
Via DDL, you can specify a table’s schema:
```sql
CREATE TABLE golfers (
  golfer_id SERIAL PRIMARY KEY,
  full_name VARCHAR(100),
  handicap_index DECIMAL(4, 2),
  total_rounds_played INTEGER
);
```
Then, maybe months later, you can modify it:
```sql
ALTER TABLE golfers
  ALTER COLUMN full_name TYPE TEXT,
  ADD COLUMN is_experienced BOOLEAN,
  ADD COLUMN skill_level VARCHAR(20);
```
Via DML, you can manipulate the data in your table:
```sql
UPDATE golfers
SET
  is_experienced = total_rounds_played >= 10,
  skill_level = CASE
    WHEN total_rounds_played < 10 THEN 'Beginner'
    WHEN handicap_index <= 5.0 THEN 'Advanced'
    WHEN handicap_index <= 20.0 THEN 'Intermediate'
    ELSE 'Beginner'
  END;
```
In this example, an internet amateur golfer database is making some changes:
- Change `full_name` to a `TEXT` field (perhaps uber-long names have become fashionable)
- Precompute a golfer’s experience indicator and skill level (say, to shave off some milliseconds at render time)
To actually update the production database, they’ll need to wrap the changes in a transaction so that a failure can’t leave the table with unpopulated new columns:
```sql
BEGIN;

ALTER TABLE golfers ...;

UPDATE golfers ...;

COMMIT;
```
Taken together, this demonstrates some powerful functionality:
- New attributes can be derived from existing ones
- In some cases, a column’s type can be altered “for free”, without reading a single row from disk, as would happen if the only modification was to change `full_name`‘s type from `VARCHAR(100)` to `TEXT`
- SQL is sufficiently expressive to describe changes to multiple columns in a single operation, and smart enough to apply them in a single full-table scan. Doing so should offer significant speed-up compared to doing multiple, separate full-table scans.
However, there are some areas that could use improvement:
- Changes must be specified using nothing but SQL. This will likely mean re-implementation of code and duplication of business logic that’s already been expressed in the application programming language. For example, the 10-round experience threshold and skill level tiers above would be duplicated in both SQL and whichever programming language the application uses.
- Deployment of the migration will take hand-holding and coordination. If the table is massive, then scanning it may take hours or days, during which the old schema must still be assumed by application code. If there’s an unexpected fault (say, power outage), the transaction may fail and require manual re-attempt.
- Some migrations may require locking entire tables for the duration of the migration, inducing downtime as reads and writes are blocked. While there may be third-party tools available that minimize downtime, these generally work by providing a phased rollout of the new schema, which may still involve an extended period of backfilling during which the old schema must be used, as is the case with the pgroll plugin for PostgreSQL.
- Under the hood, the SQL database must always retain all state necessary to perform a rollback while in the middle of a commit; in practice, this could mean holding on to duplicate data for every single migrated row until the commit goes through.
- If the database is sharded across multiple nodes, then deployment becomes immensely trickier, requiring careful thought and attention to ensure the migration succeeds on all shards in a coordinated fashion.
NoSQL
The category of “NoSQL” databases is vast and varied, but we’ll try to summarize the landscape with respect to schema and data migrations.
In general, NoSQL databases eschew the relative power of SQL in order to gain horizontal scalability. The schema migration capabilities SQL does provide are likewise mostly thrown out with the bathwater.
Some NoSQL databases retain a distinctly SQL-ish interface, as exemplified by the column-oriented Apache Cassandra’s ALTER TABLE command. This command enables immediate addition or logical deletion of a column, but little else (its support for making even very limited changes to a column’s type was removed). A search for “Cassandra schema migration” yields primarily links to third-party tools.
Indeed, the general theme across NoSQL databases is a total lack of built-in support for anything resembling schema migrations. This might seem sensible for the category of document databases, which are often referred to as schemaless or as having dynamic schemas. These databases are lax about the shape of the data stored. Each record is a collection of key-value attributes; the attributes are an open set, and the only one required is the all-important one used as the key for lookup and partitioning. For example, the CLI command to define the `golfers` table in DynamoDB might look like:
```bash
aws dynamodb create-table \
  --table-name golfers \
  --attribute-definitions AttributeName=golfer_id,AttributeType=S \
  --key-schema AttributeName=golfer_id,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST
```
Notice that Dynamo isn’t told what other attributes the golfers will have; it’s got no idea that it will ultimately be storing fields like `full_name` and `total_rounds_played`.
But what happens when changes must be made to the data’s shape and contents? The answer from document databases is: you’re on your own, kid. One option is to roll your own migration system by writing code that scans an entire dataset and rewrites everything, but this is tedious, non-transactional, and error-prone. The other options boil down to variants of migrate-on-read, wherein the tier of the codebase which reads from the database is updated to tolerate different versions of the data at read time. This might mean deserializing records as instances of either `GolferV1`, `GolferV2`, etc. When a record is updated, it’s written to the database using the new schema. Optionally, additional code may be written to perform a more eager write-on-first-read wherein the record is immediately written back to the database the first time it happens to be read following deployment of a new schema.
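Here’s a minimal, hypothetical sketch of migrate-on-read in Java, using the golfer fields from earlier and a plain `Map` standing in for whatever record type the database client returns:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the migrate-on-read pattern, not tied to any particular database.
// The "document" is a plain Map standing in for whatever the database client deserializes.
public class MigrateOnRead {

  // Tolerate both old- and new-shape records at read time, upgrading old ones in memory.
  static Map<String, Object> readGolfer(Map<String, Object> doc) {
    if (doc.containsKey("skillLevel")) {
      return doc; // already the new shape
    }
    Map<String, Object> upgraded = new HashMap<>(doc);
    long rounds = ((Number) doc.getOrDefault("totalRoundsPlayed", 0L)).longValue();
    double handicap = ((Number) doc.getOrDefault("handicapIndex", 0.0)).doubleValue();
    boolean experienced = rounds >= 10;
    upgraded.put("isExperienced", experienced);
    if (!experienced) upgraded.put("skillLevel", "beginner");
    else if (handicap < 5.0) upgraded.put("skillLevel", "advanced");
    else if (handicap < 20.0) upgraded.put("skillLevel", "intermediate");
    else upgraded.put("skillLevel", "beginner");
    // A write-on-first-read variant would now write `upgraded` back to the database;
    // that write is exactly where the race conditions discussed below come from.
    return upgraded;
  }
}
```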
The migrate-on-read approach comes with lots of baggage. It requires tedious, imperative code to be written and deployed to the database access tier. Since many NoSQL databases provide little in the way of locking, this code may need to explicitly handle race conditions inherent to reading and re-writing a record that might have been updated in the interim. Worse, this code can never be removed unless you are certain that every single record has been re-written to the database, which can only be determined by carefully scanning the entire dataset. This might mean incurring a significant performance penalty on every read, forever.
Many NoSQL databases have ecosystems of third-party tools around them, some of which build out support for schema-migration capabilities. Mongock is one such tool, a Java library that supports code-first migrations for MongoDB and DynamoDB. While such tools will inevitably appear as godsends to developers in tight spots, they’ll never offer the ease-of-use and efficiency achievable via first-party support.
NewSQL
We should note that there is a class of “NewSQL” databases which attempt to bring NoSQL’s horizontal scalability to SQL. Schema migrations with these databases are mostly the same as SQL’s, except that they may provide assistance with coordinating changes across multiple partitions. For example, CockroachDB’s online schema changes actually enable background migration of partitioned tables, followed by a coordinated “switch-over” to the new schema on all nodes. While this is a commendable effort, it still suffers from the same limitations and expressivity issues that hamstring standard SQL schema migrations, and it’s far from instantaneous. We feel that an entirely new paradigm is necessary.
Schema evolution in Rama
Rama was built from the ground up to enable rapid iteration on software backends.
With this in mind, let’s take a quick look at Rama’s existing support for schema evolution. Then, we’ll take a detailed dive into today’s newly-released feature, instant PState migrations.
Existing support
Rama has had built-in support for schema evolution since day one.
Unlike systems built with SQL or document databases, systems built with Rama use an event sourcing architecture which separates raw facts, i.e. depot entries, from the indexes (or “views”) built from them, i.e. PStates.
This design wipes out an entire class of problems in traditional databases: by recording data in terms of irrevocable facts rather than overwriting fields in a database record, no fact once learned is ever lost to time.
With Rama, when your requirements change, you can materialize new PStates using the entirety of your depot data. For example, continuing with the above golf scenario, suppose a change must be made as to how a golfer’s handicap is computed. Thankfully, the event sourcing architecture means that the raw facts required are available: a depot record for each golf round completed by a golfer, e.g. `GolfRound(golferId, finishedAt, score)`.
Even if the handicap calculation requires examining every golf round ever played by a golfer, Rama happily enables its calculation via use of the “start from beginning” option on a depot subscription. Here’s how it’s done with Rama’s Java API:
```java
setup.declareDepot("*rounds-depot", Depot.hashBy(ExtractGolferId.class));

StreamTopology golfRounds = topologies.stream("golf-rounds");
golfRounds.pstate("$$handicaps",
                  PState.mapSchema(Long.class,   // golfer-id
                                   Double.class  // handicap
                                   ));

golfRounds.source("*rounds-depot", StreamSourceOptions.startFromBeginning()).out("*round")
          .each((Round round) -> round.golferId, "*round").out("*golfer-id")
          .localSelect("$$handicaps", Path.key("*golfer-id")).out("*handicap")
          // updatedHandicap performs the actual arithmetic to calculate the new handicap
          .each(GolfModule::updatedHandicap, "*handicap", "*round").out("*new-handicap")
          .localTransform("$$handicaps", Path.key("*golfer-id").termVal("*new-handicap"));
```
And here’s the equivalent code expressed in the Clojure API:
```clojure
(let [golf-rounds (stream-topology topologies "golf-rounds")]
  (declare-pstate golf-rounds
                  $$handicaps
                  (map-schema Long   ; golfer-id
                              Double ; handicap
                              ))
  (<<sources golf-rounds
    (source> *rounds-depot
             {:start-from :beginning}
             :> {:as *round :keys [*golfer-id]})
    (local-select> [(keypath *golfer-id)] $$handicaps :> *handicap)
    ;; updated-handicap performs the arithmetic to calculate the new handicap
    (updated-handicap *handicap *round :> *new-handicap)
    (local-transform> [(keypath *golfer-id) (termval *new-handicap)] $$handicaps)))
```
Having the ability to easily compute new indexes based on the entirety of the raw data is immensely powerful, but there are some scenarios where it might be infeasible or impossible to compute the desired view in this manner:
- If you’ve enabled depot trimming to cut down on storage costs, then you won’t have access to each and every historical depot record.
- If your existing PStates have data that was non-deterministically generated, you might find that you need to describe your change in terms of existing views rather than in terms of your depot records.
- Scanning millions of depot records might be egregiously inefficient – for example, if your depot records describe many repeated updates to a given entity, and you already have a PState view on the “current” state of the entity, then it might mean lots of wasted effort to examine all of the obviated depot entries corresponding to that entity.
In these scenarios, Rama’s new instant PState migration feature is here to help.
New: instant PState migrations
Just as Rama reifies decades of the industry’s collective learnings into a cohesive set of abstractions, our new instant PState migration feature draws from SQL’s expressivity and NoSQL’s scalability.
In Rama, PState migrations are:
- Expressive – just as Rama PStates support infinite, arbitrary combinations of elemental data structures, so do migrations support arbitrary transformations expressed in the programming language you’re already using.
- Instant – after a quick deployment, all PState reads will immediately return migrated data, regardless of the volume of data.
- Durable and fault-tolerant – in the background, Rama takes care of durably persisting your changes in a consistent, fault-tolerant manner.
Rama achieves this via a simple, easy-to-reason-about design. On every PState read until the PState is durably migrated, Rama automatically applies the user-supplied migration function before returning the data to the client. In the background, Rama works on durably migrating the PState; it does so unobtrusively on the task thread, as part of the same stream batches and microbatches your application is already processing.
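As a rough conceptual sketch (this is not Rama’s internal code), the read-path contract amounts to the following:

```java
import java.util.function.UnaryOperator;

// Conceptual sketch only; it restates the contract described above. Until a PState has been
// durably migrated in the background, every value read from a migrated location passes
// through the migration function before being returned to the reader.
public class ReadPathSketch {
  static Object readMigratedLocation(Object storedValue,
                                     boolean pstateDurablyMigrated,
                                     UnaryOperator<Object> migrationFn) {
    // After background migration completes, the stored data is already in the new shape,
    // so the function no longer needs to be applied on reads.
    return pstateDurablyMigrated ? storedValue : migrationFn.apply(storedValue);
  }
}
```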
Let’s take a detailed look at each facet of migration.
Expressive
PState migrations are specified as code, and the heart of each migration is a function written in your programming language of choice. Specifying your migration as an arbitrary function is tremendously powerful. Rather than being confined to a limited, predefined set of operations, as is often the case with SQL migrations, with PState migrations you have the Turing-complete power of your language, your entire codebase and all its dependencies available to you.
When you declare a PState, you provide a schema describing the shape of the data it contains. At certain locations within the schema, you may now specify a migration.
Continuing with the golf example, the golfers PState schema expressed via the Java API might look like this:
```java
PState.mapSchema(String.class,        // golfer-id
    PState.fixedKeysSchema(
        "fullName", String.class,
        "handicapIndex", Double.class,
        "totalRoundsPlayed", Long.class));
```
Or, using the Clojure API:
```clojure
(map-schema String ; golfer-id
            (fixed-keys-schema {:full-name String
                                :handicap-index Double
                                :total-rounds-played Long}))
```
When it comes time to add a golfer’s experience indicator and skill level, you can specify a migration using code you already have. Here it is with the Java API:
```java
private static Object enrichGolfer(Object o) {
  Map m = (Map)o;
  if (m.get("skillLevel") == null) {
    Map n = new HashMap();
    n.putAll(m);
    Boolean isExperienced = (Long)m.get("totalRoundsPlayed") >= 10;
    n.put("isExperienced", isExperienced);
    Double handicapIndex = (Double)m.get("handicapIndex");
    if (!isExperienced) {
      n.put("skillLevel", "beginner");
    } else if (handicapIndex < 5.0) {
      n.put("skillLevel", "advanced");
    } else if (handicapIndex < 20.0) {
      n.put("skillLevel", "intermediate");
    } else {
      n.put("skillLevel", "beginner");
    }
    return n;
  } else {
    return o;
  }
}

PState.mapSchema(String.class,
    PState.migrated(
        PState.fixedKeysSchema(
            "fullName", String.class,
            "handicapIndex", Double.class,
            "totalRoundsPlayed", Long.class,
            "isExperienced", Boolean.class,
            "skillLevel", String.class),
        "precompute-experience-and-skill",
        GolfModule::enrichGolfer));
```
And the equivalent Clojure code:
```clojure
(defn is-experienced? [{:keys [total-rounds-played]}]
  (>= total-rounds-played 10))

(defn skill-level [{:as golfer :keys [handicap-index]}]
  (cond
    (not (is-experienced? golfer)) "beginner"
    (< handicap-index 5.0)         "advanced"
    (< handicap-index 20.0)        "intermediate"
    :else                          "beginner"))

(defn enrich-golfer [golfer]
  (-> golfer
      (update :is-experienced #(or % (is-experienced? golfer)))
      (update :skill-level #(or % (skill-level golfer)))))

(map-schema String ; golfer-id
            (migrated (fixed-keys-schema {:full-name String
                                          :handicap-index Double
                                          :total-rounds-played Long
                                          :is-experienced Boolean
                                          :skill-level String})
                      "precompute-experience-and-skill"
                      enrich-golfer
                      [(fixed-key-additions #{:is-experienced :skill-level})]))
```
The new API addition demonstrated here is the `migrated` function. It takes three or four arguments:
- the new PState schema
- a migration ID string
- a function from old data to new data
- optionally, some options describing the migration
The migration function used here is `enrich-golfer`, a function from golfer to golfer which calculates the `:is-experienced` and `:skill-level` keys unless they’re already set.
It’s important to note that the migration function must be idempotent. Rama will invoke the migration function on every read of a migrated location until the PState has been completely and durably migrated in the background, whether or not a particular entry has already been migrated. This means that the migration function may run against both yet-to-be-migrated and already-migrated inputs. This design choice gives total control to the user: rather than adding definite storage and computational overhead to the implementation, e.g. state for every single PState entry indicating whether it has been migrated, the user’s migration function may switch on state which is already present, e.g. the migrated entity’s type.
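To illustrate with the Java example above, here’s a hypothetical snippet that could sit alongside `enrichGolfer` inside `GolfModule` (the sample values are made up). Applying the function a second time returns its input untouched, because `skillLevel` is already set:

```java
// Sanity check that the migration function is idempotent: applying it twice produces the
// same result as applying it once, for both un-migrated and already-migrated inputs.
public static void main(String[] args) {
  Map oldGolfer = new HashMap();
  oldGolfer.put("fullName", "Old Tom Morris");
  oldGolfer.put("handicapIndex", 3.4);
  oldGolfer.put("totalRoundsPlayed", 250L);

  Object once  = enrichGolfer(oldGolfer);  // derives isExperienced and skillLevel
  Object twice = enrichGolfer(once);       // sees skillLevel already set, returns input as-is
  System.out.println(once.equals(twice));  // prints true
}
```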
The migration ID is used to determine whether successive migrations to the same PState are the same or different. It is only relevant when you perform a module update while a PState is undergoing migration. In such cases, Rama will look at the migration IDs in the PState’s schema and restart the migration from scratch if any of them has changed; otherwise, it continues where it left off. For example, consider the following cases:
- You’ve deployed a module update with a migration on your massive `$$golfers` PState which will take several days to complete. However, in the midst of migration an unrelated hot-fix must be made to some other topology. Another module update may safely be made with the `$$golfers` migration left untouched, and the background migration will resume where it left off.
- Or, suppose you’ve deployed a migration on the `$$golfers` PState, but while it’s running you realize there’s a bug in your migration function that’s somehow made it through your staging environment testing. In this case you don’t have to wait for background migration to complete – you can fix your migration function, alter the migration’s ID, and do another module update immediately (sketched below). Background migration will immediately be restarted from scratch.
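For that second scenario, the redeployment might look like this with the Java API shown earlier; the “-v2” suffix and `enrichGolferFixed` are hypothetical names, and the schema itself is unchanged:

```java
// Redeploying with a corrected migration function: any change to the migration ID string
// causes background migration to restart from scratch after the module update.
PState.mapSchema(String.class,
    PState.migrated(
        PState.fixedKeysSchema(
            "fullName", String.class,
            "handicapIndex", Double.class,
            "totalRoundsPlayed", Long.class,
            "isExperienced", Boolean.class,
            "skillLevel", String.class),
        "precompute-experience-and-skill-v2",  // changed ID
        GolfModule::enrichGolferFixed));       // hypothetical corrected function
```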
There are also some options available for making certain kinds of structural changes to your schema; see the docs for more details.
Instant
With the migrated schema in place, committed to version control and built into a jar, all that’s left to do is deploy it with a single command:
```bash
rama deploy \
  --action update \
  --jar golf-application-0.0.1.jar \
  --module 'com.mycompany.GolfModule'
```
This is the same command used for any ordinary module update, and this will do the same thing as any other module update: spin up new workers running the new code and gracefully hand over writes and reads before shutting down the old workers. It will take no longer than if there were no migrations specified in the new code.
Once the module update concludes, every read of the migrated location will return migrated data, whether made via a distributed query, a select on a foreign PState, or a topology read. Rama automatically applies the migration function at read time. This means that your topology code and client queries can immediately expect to see the migrated data, without ever having to worry about handling the old schema or content.
Durable and Fault Tolerant
After deploying a migration, Rama begins iterating over your migrated PStates and re-writing migrated data back to disk. Like all PState reads and writes, this happens on the task thread, so there are no races. Rama does migration work as part of the streaming event batches and microbatches that are already occurring, so the additional overhead of background migration is minimal.
The rate of migration is tuned primarily via four dynamic options, two apiece for streaming and microbatching:
- `topology.stream.migration.max.paths.per.second`
- `topology.microbatch.migration.max.paths.per.second`
- `topology.stream.migration.max.paths.per.batch`
- `topology.microbatch.migration.max.paths.per.batch`
With these options, you may tune the target number of paths for Rama to migrate each second, and limit the amount of migration work done in each batch. In our testing with the default dynamic option values, background migration work added about 15% and 7% task group load for streaming and microbatch topologies respectively, with one million paths per partition migrated in about 3 hours 15 minutes and 2 hours 45 minutes respectively (but this will depend on your hardware, append rate, and other configuration). If your Rama cluster has 128 partitions, this comes out to about 40M and 46M paths migrated per hour respectively.
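Those hourly figures follow directly from the test numbers (one million paths per partition, 128 partitions); here’s a quick back-of-the-envelope check:

```java
// Back-of-the-envelope check of the throughput figures quoted above.
public class MigrationThroughput {
  public static void main(String[] args) {
    int partitions = 128;
    double pathsPerPartition = 1_000_000;
    double streamingHours  = 3.25;  // 3 h 15 min
    double microbatchHours = 2.75;  // 2 h 45 min
    // prints ~39.4M paths/hour for streaming and ~46.5M for microbatching
    System.out.printf("streaming:  %.1fM paths/hour%n",
        pathsPerPartition / streamingHours * partitions / 1e6);
    System.out.printf("microbatch: %.1fM paths/hour%n",
        pathsPerPartition / microbatchHours * partitions / 1e6);
  }
}
```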
Remember, Rama applications can be scaled up or down with a single CLI command, so if you need a little extra CPU to perform a migration or want to increase its rate, it’s trivial to do.
Migrations are done in a fault tolerant manner; they will progress and eventually complete even in the face of leader switches, worker death, and network disconnection issues, with no intervention from a cluster operator required.
Migration status details are visible in the UI, at the top-level modules page down through to the individual PState pages. If the monitoring module is deployed, detailed migration progress metrics are also available.
These three screenshots taken from the cluster UI of one of our test clusters show how migration status is surfaced at the module, module instance, and PState levels:
[Screenshot: migration status on the modules page]
[Screenshot: migration status on a module instance page]
[Screenshot: migration status on a PState page]
On an individual PState’s page, the PState’s schema, migration status, and collection of tasks undergoing migration are displayed:
[Screenshot: an individual PState’s page showing its schema, migration status, and the tasks undergoing migration]
If the monitoring module is deployed, then migration progress metrics are also available per-PState:
[Screenshot: per-PState migration progress metrics from the monitoring module]
Once your migration completes, you are free to remove the migration from your source code and forget it ever happened.
Conclusion
Schema evolution is an inevitable part of application development. Existing databases have varied levels of support for it: none at the low end, but even at the high end, SQL databases leave much to be desired in terms of expressivity, operational ease, and fault tolerance.
Rama was built with schema evolution in mind: with event-sourcing at its core, you’ll never “forget” anything once known, and you’ll always have the ability to derive new PState views from existing depot data.
With Rama’s new instant PState migration feature, the story gets even better: you now have the power to update your PStates’ schemas and data in-place, via the powerful programming language you’re already using, instantly and without any operational pain.
As always, we’re excited to see what kinds of novel applications are unlocked by this new leap forward in development ease.