Database Consistency And Jepsen Testing: Mike Hall Interviews Kyle Kingsbury | GOTO Conference 2015

UGtastic Archive
Kyle Kingsbury dives into the complexities of database consistency and the importance of thorough testing. Learn about issues like stale reads and dirty reads in MongoDB, and the evolving nature of database reliability over time. https://just3ws.github.io/interviews/kyle-kingsbury-goto-conference-2015
The Interviewer

Mike Hall

Interviewer, UGtastic

The Guest

Kyle Kingsbury

database consistency and Jepsen testing

The Conversation


Mike Hall Interviewer, UGtastic
Hi, it's Mike with UGtastic. I'm here at GOTO Conference 2015 and I'm sitting here with Kyle Kingsbury, who gave the opening keynote, and its tagline was Hope Springs Eternal. Did I say that correctly? Okay. Well, thank you very much for taking the time to speak with me. Why Hope Springs Eternal? What were you trying to implant as a message for your keynote?
Kyle Kingsbury database consistency and Jepsen testing
The message is basically that everything is broken and we should be crying in the corner. I came to deliver fire and ashes, but we're wrapping it in a pretty happy package. It's like we're smiling. We're all doomed. So in the previous Jepsen talks, I'd gone through a number of databases and had found inconsistencies or cases of data loss. And in this talk, I wanted to come back to some of these databases because people are optimistic that they've fixed certain problems. And I wanted to measure whether or not those problems have been resolved and then see if there were new ones.
Mike Hall Interviewer, UGtastic
Okay. So, like the notorious Mongo missed-write problem and things like that. So what were some of... Can you tell me your findings?
Kyle Kingsbury database consistency and Jepsen testing
So MongoDB in particular fixed their majority write issue, where if you would write at the strongest level of consistency, it could occasionally say that it had written data successfully and then lose it.
Mike Hall Interviewer, UGtastic
Oh, okay.
Kyle Kingsbury database consistency and Jepsen testing
So that was fixed quickly and it was confirmed to be safe in the latest tests.
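Editor's note: the failure Kyle describes here, a write that is acknowledged as successful and later lost, can be checked by comparing the set of acknowledged writes against what a final read returns. The following is a minimal Python sketch of that idea, not Jepsen's actual checker; the function name and the sample values are hypothetical.

```python
def lost_acknowledged_writes(acknowledged, final_read):
    """Return the acknowledged values that are missing from the final read.

    acknowledged: values the database reported as successfully written.
    final_read:   values observed when reading everything back at the end.
    """
    return sorted(set(acknowledged) - set(final_read))


# Hypothetical test run: five writes were acknowledged, but the value 3
# is missing when the data is read back after the network heals.
acknowledged = [1, 2, 3, 4, 5]
final_read = [1, 2, 4, 5]

lost = lost_acknowledged_writes(acknowledged, final_read)
print("lost acknowledged writes:", lost)
```

An empty result means every acknowledged write survived; any value in the list is a write the database confirmed and then dropped.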
Mike Hall Interviewer, UGtastic
Okay.
Kyle Kingsbury database consistency and Jepsen testing
I found a new issue, however, which is that it will allow you to read values from the past. So you can write something and then not see it anymore, and at a later time it might show up. Well, I guess that's the eventual part of eventually readable instead of eventually consistent. But it'll show up. So there were just some latency issues where, if you wrote and then read too fast, you might get stale data. Or you could also read garbage data. You could make a write that should not have succeeded. It would fail, but you might not know it failed. And then it would be visible for reads to that node. So that's called read uncommitted. So both stale reads and dirty reads are sort of two complementary sides of the same phenomenon in Mongo.
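Editor's note: one way to make the difference between a stale read and a dirty read concrete is to classify each read in a recorded history of a single register. The sketch below is a toy model that assumes logical timestamps and a unique value per write; the `Write`, `Read`, and `classify_read` names are hypothetical, and this is not how Jepsen or MongoDB represent histories.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Write:
    value: int
    acked_at: Optional[int]  # logical time of commit; None = never committed


@dataclass(frozen=True)
class Read:
    value: int
    started_at: int  # logical time at which the read began


def classify_read(read, writes):
    """Label a read as 'ok', 'stale', 'dirty', or 'unknown'."""
    by_value = {w.value: w for w in writes}
    source = by_value.get(read.value)
    if source is None:
        return "unknown"  # value was never written in this history
    if source.acked_at is None:
        return "dirty"  # read observed a write that never committed
    # Newest commit that had already completed when the read began.
    newest = max(
        (w.acked_at for w in writes
         if w.acked_at is not None and w.acked_at <= read.started_at),
        default=None,
    )
    if newest is not None and newest > source.acked_at:
        return "stale"  # a newer committed value existed, but an old one came back
    return "ok"


# Hypothetical history: writes 1 and 2 commit, write 3 fails to commit.
writes = [Write(1, acked_at=10), Write(2, acked_at=20), Write(3, acked_at=None)]

for r in [Read(1, started_at=25), Read(2, started_at=25), Read(3, started_at=25)]:
    print(r.value, classify_read(r, writes))
```

In this model a stale read returns a value that really was committed but has since been superseded, while a dirty read returns a value that was never committed at all.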
Mike Hall Interviewer, UGtastic
Okay. And what were some of the other databases? Did you look at MySQL and Postgres and Couch and Riak and all the other databases that usually come up in these conversations?
Kyle Kingsbury database consistency and Jepsen testing
So I haven't gotten to test MySQL or Postgres replication. I did a very simple single-node Postgres analysis, but it doesn't show very much interesting beyond two generals. I did come back to look at Elasticsearch in this test, which previously lost data when a partition happened that left a node connecting two different components in the network. That particular behavior was known to be an issue, and the ticket was fixed. And in fact, it had reduced the window of data loss significantly. But there's still some time when you can lose writes.
Mike Hall Interviewer, UGtastic
Were you the person who authored those blog posts doing the partition analysis on... Or was it Elasticsearch I'm thinking of? There was one about missing data due to... some latency inside of certain database scenarios.
Kyle Kingsbury database consistency and Jepsen testing
That could have been me, yeah.
Mike Hall Interviewer, UGtastic
Okay.
Kyle Kingsbury database consistency and Jepsen testing
The Jepsen series has been going on for a couple years now. And there's a bunch of blog posts, and there's a paper in ACM Queue, and there's the talks as well.
Mike Hall Interviewer, UGtastic
Okay. And are there any databases that you're just like, yeah, this database is good? Or is it always the...
Kyle Kingsbury database consistency and Jepsen testing
Well, it depends. You know, I think depending on what you're building, you're going to need various guarantees and various performance characteristics and various availability characteristics. So for a consistent metadata store, you might pick something like Zookeeper. Maybe etcd, although that's a little newer, so I figure it'll take a while to iron out its bugs. In fact, Zookeeper, you know, over its, what, 10-year history has been ironing out bugs gradually. There was a really nice article from PagerDuty recently that discovered a particular confluence of unchecked checksums for TCP packets and some weird Xen anomalies that resulted in Zookeeper...
Mike Hall Interviewer, UGtastic
So it's like if you have an elbow here and this elbow here, it won't write.
Kyle Kingsbury database consistency and Jepsen testing
Yeah, it's really amazing. These are very difficult to get correct.
Mike Hall Interviewer, UGtastic
Yeah, yeah.
Kyle Kingsbury database consistency and Jepsen testing
But Zookeeper, by and large, I think has hammered out a lot of the bugs.
Mike Hall Interviewer, UGtastic
Well, and that's one of the things. I heard an interview with one of the Postgres internals DB engine developers, and they basically described that, especially with open source development on these tools, it's very hard to get people to work on them who can actually work effectively with these problems, because they're hard problems that they're solving.
Kyle Kingsbury database consistency and Jepsen testing
Oh, yes.
Mike Hall Interviewer, UGtastic
I mean, so when you're looking at databases, sometimes it feels like the older the better. Does that seem like reasonable wisdom? Like if it's been around 20 years, it's probably pretty solid.
Kyle Kingsbury database consistency and Jepsen testing
Yeah, and the more use cases it's seen, and as it gets to larger deployments, the more bugs you'll run into that were painful enough that somebody would go and fix them. But conversely, you know, old doesn't necessarily mean safe. There are well-established pieces of software with plenty of bugs in them.
Mike Hall Interviewer, UGtastic
Yeah.
Kyle Kingsbury database consistency and Jepsen testing
Not a panacea.
Mike Hall Interviewer, UGtastic
So just the word Jepsen, what does that mean? What is Jepsen? And how did, you know, you said it's a series of articles, but it just seems like such a random word. What does that mean?
Kyle Kingsbury database consistency and Jepsen testing
Carly Rae Jepsen is a Canadian pop star.
Mike Hall Interviewer, UGtastic
Oh, okay.
Kyle Kingsbury database consistency and Jepsen testing
Who had this famous song, Call Me Maybe. It's all about miscommunication and not knowing if the boy likes you or not. And to me, this speaks to distributed systems, where you're sending your operations into the... and hoping that messages come back, that they understood you, that they want to meet.
Mike Hall Interviewer, UGtastic
That's almost as subtle a pun as UGtastic. UGtastic. User group's UGtastic. But, yeah, it's one of those where, when you finally hear the definition, you're like, that's awesome. Call me maybe. And it makes total sense when you hear it.
Kyle Kingsbury database consistency and Jepsen testing
Well, the blog posts are Call Me Maybe: Jepsen, Call Me Maybe: Elasticsearch.
Mike Hall Interviewer, UGtastic
Oh, okay. And then the Hope Springs Eternal. That one was a little hard.
Kyle Kingsbury database consistency and Jepsen testing
Yeah, in the previous talk, there was a star.
Mike Hall Interviewer, UGtastic
Yeah.
Kyle Kingsbury database consistency and Jepsen testing
There was a reference about building a Death Star, and so this is sort of playing on that.
Mike Hall Interviewer, UGtastic
Okay. And did you happen to look at any other, like, we have several database providers here, like Neo4j. Do your tests look at a variety of types of database stores, or are you mostly focused on NoSQL document stores, or is there a specific vertical style of database that you are focusing on with your Jepsen series?
Kyle Kingsbury database consistency and Jepsen testing
I would like to, and I think the tools are capable of analyzing all sorts of different things. Looking for SQL serializability anomalies is very tricky to do. The analyzer is slow, so it's going to take more work, I think, before I can really do tests on things like Postgres. But basic tests about insert safety and update safety for single rows, I think those should be amenable. And I'm actually hoping to do that in Postgres RDS next.
Mike Hall Interviewer, UGtastic
Okay, great.
Kyle Kingsbury database consistency and Jepsen testing
So far, I've done things from consensus services like Consul and etcd to sort of horizontally scalable key-value stores, like Cassandra and Riak, and then some SQL-style databases like NuoDB.
Mike Hall Interviewer, UGtastic
Okay.
Kyle Kingsbury database consistency and Jepsen testing
So there's a whole gamut.
Mike Hall Interviewer, UGtastic
Great. Well, thank you very much for taking the time to speak with me. I appreciate it. Thank you.

Critical Insights


durable
"Database consistency is a complex issue with no one-size-fits-all solution."
durable
"Older, well-established databases have often had more of their bugs found and fixed through extensive use and larger deployments, but age alone does not guarantee safety."
durable
"Testing is crucial to uncovering hidden issues and ensuring the reliability of distributed systems."