SYS-CON MEDIA Authors: Pat Romanski, Liz McMillan, Yeshim Deniz, Elizabeth White, Courtney Abud

MongoDB Write Concern: 3 Must-Know Caveats

In this post, we discuss 3 gotchas when using MongoDB write concern.

'Write concern' in MongoDB describes the level of write acknowledgment you can expect from the database. It is an important setting for your write operations, and its behavior is worth understanding, especially in distributed MongoDB deployments (i.e., replica sets and sharded clusters).

MongoDB Write Concern

MongoDB's documentation defines write concern as "the level of acknowledgment requested from MongoDB for write operations to a standalone mongod or to replica sets or to sharded clusters."

Simply put, a write concern is an indication of 'durability' passed along with write operations to MongoDB. To clarify, let us look at the syntax:

{ w: <value>, j: <boolean>, wtimeout: <number> }

Where*:
 w can be an integer, "majority", or a tag set name. An integer is the number of members that must acknowledge the write. The default value is 1.
 j requests that a write be acknowledged only after it has been written to the on-disk journal, as opposed to just system memory. Unspecified by default.
 wtimeout specifies a time limit, in milliseconds, for satisfying the write concern. Unspecified by default.

* You can find the detailed syntax in the Write Concern Specification documentation.

* Learn more about the different "tags" you can use for common write concern values in our Understanding Durability & Write Safety in MongoDB blog.

Example:

db.inventory.insert(
    { sku: "abcdxyz", qty : 100, category: "Clothing" },
    { writeConcern: { w: 2, j: true, wtimeout: 5000 } }
)

The above insert's write concern can be read as follows: 'acknowledge this write once at least 2 members of the replica set have written it to their journals within 5000 msecs, or return an error'. Another possible value for the w option is "majority", which "requests acknowledgment that write operations have propagated to the majority of voting nodes, including the primary."

The importance of write concern is apparent. Increasing the value of w increases write latency while decreasing the probability of a write being lost. Choosing the right write concern depends on the latency and durability requirements of the writes being performed.

With that as the background on what a write concern is, let's move on to the three caveats to remember when using write concern.

CAVEAT 1: Setting write concern on replica sets without a wtimeout can cause writes to block indefinitely

The "majority" definition above (applicable from MongoDB 3.0 onwards) states that acknowledgment is requested from a majority of the "voting nodes". Note that "If you do not specify the wtimeout option and the level of write concern is unachievable, the write operation will block indefinitely."

This can have unexpected consequences. For example, consider a 2+1 replica set (i.e., a primary, a secondary, and an arbiter). It has 3 voting members, so the majority is 2, but the arbiter holds no data and cannot acknowledge writes. If your sole secondary goes down, all writes with a w option of "majority" will block indefinitely. The same will happen if the w option is set to 2. A more extreme example is a 3+2 replica set (a primary, 2 secondaries, and 2 arbiters; not a recommended configuration). All "majority" writes will block if even a single data node is unavailable, because the majority in this case is 3 and only 2 data-bearing nodes remain.
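The arithmetic behind both examples can be sketched in a few lines of plain JavaScript. This is a simplified model rather than MongoDB's actual implementation: it assumes every member votes, that the majority is a simple majority of voting members, and that only data-bearing members can acknowledge a write. The function names are made up for illustration.

```javascript
// Simplified model: arbiters vote but hold no data,
// so they can never acknowledge a write.
function majorityOf(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// dataNodes: primary + secondaries; arbiters: vote-only members;
// dataNodesDown: how many data-bearing members are currently unavailable.
function canSatisfyMajority(dataNodes, arbiters, dataNodesDown) {
  const needed = majorityOf(dataNodes + arbiters);
  const available = dataNodes - dataNodesDown;
  return available >= needed;
}

console.log(canSatisfyMajority(2, 1, 0)); // true:  2+1 set, all members up
console.log(canSatisfyMajority(2, 1, 1)); // false: 2+1 set, secondary down
console.log(canSatisfyMajority(3, 2, 1)); // false: 3+2 set, one data node down
```

Whenever this check comes out false and no wtimeout is set, a "majority" write would be expected to block.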

The simplest way to alleviate this issue is to always specify a wtimeout value, so that the operation can time out if the write concern can't be satisfied. However, when such a timeout error occurs, MongoDB doesn't undo writes that had already succeeded on some of the members before the timeout.
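Because a timeout does not roll anything back, application code should not treat a write concern error as "nothing was written". The sketch below illustrates that distinction in plain JavaScript. The describeWrite helper is hypothetical, and the result object only mimics the nInserted and writeConcernError fields that the mongo shell reports for a write; the sample values are assumptions for illustration, so check the objects your driver or shell version actually returns.

```javascript
// Hypothetical helper: `result` is a plain object shaped like a shell
// write result (nInserted, writeConcernError). The shape is an assumption.
function describeWrite(result) {
  const err = result.writeConcernError;
  if (!err) {
    return "acknowledged";
  }
  // The write concern was not satisfied in time, but the document may
  // already exist on the primary (and possibly on some secondaries).
  const applied = (result.nInserted || 0) > 0;
  return applied ? "applied-but-unconfirmed" : "not-applied";
}

// Sample objects (illustrative values, not captured from a real server).
const timedOut = {
  nInserted: 1,
  writeConcernError: { code: 64, errmsg: "waiting for replication timed out" }
};
const clean = { nInserted: 1 };

console.log(describeWrite(timedOut));
console.log(describeWrite(clean));
```

An "applied-but-unconfirmed" outcome is the case to design for: retrying blindly could insert a duplicate, so an idempotent write or a follow-up read is usually the safer reaction.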

There is also currently no setting to ensure that a write reaches the majority of the nodes that are currently reachable, so choose the value of w carefully, based on the topology, desired durability, and availability.

CAVEAT 2: You might lose data even with w: majority

It seems intuitive that once a write has been acknowledged by the majority of voting members, its durability is guaranteed. However, that isn't the case! Remember that when the j option is unspecified, a write is acknowledged right after it has been written to memory.

Such a write can therefore be lost if a freak power outage takes out the majority of the nodes to which the write had propagated before it could be flushed to disk (i.e., within syncPeriodSecs).

To ensure the durability of writes, it's best to keep journaling enabled on your database and to set the j option to true. In fact, starting with MongoDB 3.6, the --nojournal flag has been deprecated for replica set members using the WiredTiger storage engine.
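One way to avoid forgetting either option is to build the write options in a single place. The following is a minimal sketch; durableWriteOptions is a hypothetical helper name, and the timeout value is only an example.

```javascript
// Hypothetical helper that builds the options argument for a durable write:
// majority acknowledgment, journaled, and bounded by a timeout.
function durableWriteOptions(timeoutMs) {
  if (!Number.isInteger(timeoutMs) || timeoutMs <= 0) {
    throw new RangeError("timeoutMs must be a positive integer");
  }
  return { writeConcern: { w: "majority", j: true, wtimeout: timeoutMs } };
}

// In the mongo shell, the result could be passed straight to an insert:
//   db.inventory.insert({ sku: "abcdxyz", qty: 100 }, durableWriteOptions(5000))
console.log(JSON.stringify(durableWriteOptions(5000)));
```

Rejecting a missing or non-positive timeout up front keeps the indefinite blocking described in Caveat 1 from creeping back in.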

With a w value of "majority" and the j option unspecified on a replica set, the exact durability behavior depends on the value of the replica set configuration setting writeConcernMajorityJournalDefault. When it is set to true (and journaling is enabled), MongoDB acknowledges writes only after they have been written to the journals of a majority of voting members.

Aside: Even with journaling turned on, your writes might still get lost on the MMAPv1 storage engine if an outage occurs within the commitIntervalMs window. The WiredTiger storage engine, on the other hand, forces a sync of the journal files when it receives a write with the j option set to true. And even with j set to false, an acknowledged "majority" write to a recent WiredTiger-based deployment can be lost only when a majority of the data nodes crash simultaneously.

CAVEAT 3: w: 0 while setting j: true doesn't improve write performance

This is easy enough to reason about once you think of it, but equally easy to forget. Setting the w option to 0 is usually done to write to the database in a "fire-and-forget" fashion, when you have a fair amount of confidence in the database infrastructure and care more about latency than about the durability of every write. However, if you set the j option to true, your w option will effectively be overridden, as the database will make sure that the write has reached the on-disk journal before returning.
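A small lint-style check can catch this combination, along with the missing wtimeout from Caveat 1, before a write concern reaches the database. This is a sketch in plain JavaScript: writeConcernWarnings is a hypothetical helper, and its rules simply encode the caveats described in this post.

```javascript
// Hypothetical helper: returns human-readable warnings for a write concern
// document such as { w: 0, j: true } or { w: "majority" }.
function writeConcernWarnings(wc) {
  const warnings = [];
  const w = wc.w === undefined ? 1 : wc.w; // w defaults to 1

  // Numbers above 1, "majority", and tag set names all wait on other members.
  const waitsForOthers =
    typeof w === "string" || (typeof w === "number" && w > 1);
  if (waitsForOthers && wc.wtimeout === undefined) {
    warnings.push("no wtimeout: the write may block indefinitely");
  }

  if (w === 0 && wc.j === true) {
    warnings.push("w: 0 with j: true still waits for the journal");
  }
  return warnings;
}

console.log(writeConcernWarnings({ w: 0, j: true }));
console.log(writeConcernWarnings({ w: "majority" }));
console.log(writeConcernWarnings({ w: 2, j: true, wtimeout: 5000 }));
```

The first two calls each report one warning, while the last write concern, which matches the insert example earlier in this post, passes cleanly.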

If you're using write concerns to guarantee the success of your write operations, make sure that you remember these three crucial caveats! We're here to help, so feel free to connect with any questions through Twitter or by email.

More Stories By Vaibhaw Pandey

Vaibhaw Pandey is a Software Developer with interests in Distributed Systems, Databases and Web-scale technologies.
