存储架构

feeds justification

微信扫一扫,分享到朋友圈

feeds justification
0

I realized I’ve left out a major part from mysequence ofprevious feed
related
posts: a justification for why we should bother with a separate feed_sync_id
.
So let’s give it a shot!
The fundamental problem is:

AUTO_INCREMENT
ids are assigned
in insertion
order, but become visible to other threads in commit
order.

To see how this causes a problem, consider the interactions and visibilities between three transactions to the same database:

t0: TRX0: BEGIN; INSERT INTO kv (ns, k, v) VALUES ("-", "k0", "v0"); COMMIT;
t1: TRX1: BEGIN; INSERT INTO kv (ns, k, v) VALUES ("-", "k1", "v1");
t2: TRX2: BEGIN; INSERT INTO kv (ns, k, v) VALUES ("-", "k2", "v2");
t3: TRX0: SELECT MAX(id) FROM kv;
t4: TRX2: COMMIT;
t5: TRX0: SELECT MAX(id) FROM kv;
t6: TRX1: COMMIT;
t7: TRX0: SELECT MAX(id) FROM kv;

Here, we have two transactions that both insert a new kv
record.
The database has
to assign an id
value to each of those records, because we might be creating other associations to those records in our application code.
But other threads — TRX0
in this case — shouldn’t be able to see those records until we COMMIT
, and so indeed the SELECT
at t=t3
might return 1
.

Next, we commit TRX2
.
This was the third
insert the database saw, so we expect MAX(id)
to be 3
, and indeed, it is, as verified at t5
.

Finally, we commit TRX1
.
Consumers will now be able to see an id=2
record — hooray!

Except: if we applied a feed-style consumption based on id
instead of feed_sync_id
, we have introduced the possibility of skipping records.
To see how this happens, consider a feed-style fetch w/ id
at t=t5
above.
We will run a query like

SELECT *
FROM kv
WHERE id > :cursor
ORDER BY id
LIMIT :limit

and fetch

KV(id=1, ns=-, k=k0, v=v0)
KV(id=3, ns=-, k=k2, v=v2)

Note that there is no KV(id=2)
record committed yet, so this response is correct!
The problem comes from subsequent fetches: In particular, we will record 3
as the cursor to use for the next fetch.
This will prevent the subsequent fetch (with cursor:=3
) from ever observing the KV(id=2)
record!

This is pretty bad, and that’s why we pay the penalty of storing a separately assigned feed_sync_id
column.

schema

After all the updates, I figure it’d be helpful to give a reference for the current schema we’ve built up over all the blog posts mentioned above:

CREATE TABLE `kv` (
   `id` bigint(22) NOT NULL AUTO_INCREMENT,
   `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
   `updated_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
   `feed_sync_id` bigint(22) DEFAULT NULL,
   `shard` int(11) DEFAULT '0',
   `ns` varchar(255) COLLATE utf8_bin NOT NULL DEFAULT '',
   `k` varchar(255) COLLATE utf8_bin NOT NULL,
   `v` longblob NOT NULL,
   PRIMARY KEY (`id`),
   UNIQUE KEY `u_ns_k` (`ns`,`k`),
   UNIQUE KEY `u_fsi` (`feed_sync_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;

阅读原文...


Avatar

Microsoft Patch Tuesday — June 2019: Vulnerability disclosures and Snort coverage

上一篇

Solving for Endpoint Compliance in a Cloud-First Landscape

下一篇

您也可能喜欢

评论已经被关闭。

插入图片
feeds justification

长按储存图像,分享给朋友