I keep having this thought and I’m not sure I’ve written it down.

Way, way back in 2008 or something, I used to work at Microsoft. What I worked on there was totally irrelevant and ridiculous; I wasted a year of my life. But, more importantly, I had time for some ideas. One was about synchronization.

Synchronization was and is the thing that powers our new Internet, because our new Internet is a bunch of different screens: laptops, phones, tablets, and other screens that aren’t selling in bulk yet. But it’s also what runs behind the scenes. The problem of scaling databases is, to me, one big synchronization problem.

I put some thought into synchronization and came to a simple conclusion: we can’t keep storing and using data the way we have been. When all your stuff is on one computer in front of you, you can be as messy as you want. But as soon as you have to deal with other people and other computers, it’s time to get your act together.

Kind of like threading and memory safety.

No one wants that, though. Everyone wants to be as messy as they like. So I knew it would be years of hacks before we got our act together. Hacks like most NoSQL databases. Hacks like Dropbox. They don’t work; they just “work” until they break and you’re stuck keeping both versions.

I’ve got a point, and I’m coming to it now.

People confuse synchronization with backup. Hell, I did for a long while. But they’re different. Synchronization is active; backup is passive. Synchronization is for data in motion, and that data has context. Backup is for data at rest. It’s homogeneous.

Prediction:

Our systems will settle on a two-part strategy: differential snapshot backups and CRDT (conflict-free replicated data type) synchronization.

Basically, those are the only two things that can’t be done wrong.
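To make the “can’t be done wrong” idea a bit more concrete, here is a minimal sketch of one of the simplest CRDTs, a grow-only counter (G-Counter), in Python. It’s an illustration under stated assumptions, not a description of any particular product: the class name `GCounter`, its methods, and the replica names `laptop` and `phone` are all hypothetical. The part that matters is `merge`. Taking the element-wise max is commutative, associative, and idempotent, so replicas end up agreeing no matter what order merges happen in or how many times they’re repeated.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class GCounter:
    """Hypothetical grow-only counter: each replica only increments its own slot."""

    replica_id: str
    counts: dict[str, int] = field(default_factory=dict)

    def increment(self, amount: int = 1) -> None:
        # Only non-negative increments are allowed; that is what keeps merge safe.
        if amount < 0:
            raise ValueError("a G-Counter can only grow")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self) -> int:
        # The counter's value is the sum of every replica's contribution.
        return sum(self.counts.values())

    def merge(self, other: GCounter) -> None:
        # Element-wise max over replica slots: order and repetition don't matter.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)


# Two devices change things independently, then sync in either direction.
laptop = GCounter("laptop")
phone = GCounter("phone")
laptop.increment(3)
phone.increment(2)

laptop.merge(phone)
phone.merge(laptop)
laptop.merge(phone)  # a repeated merge changes nothing

print(laptop.value(), phone.value())  # prints: 5 5
```

A real system needs more than this (deletions, for one, which a grow-only counter can’t express), but the convergence property is the piece that makes it hard to get wrong: there is no “conflicted copy” to resolve.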