SYS-CON MEDIA Authors: Pat Romanski, Gary Arora, Zakia Bouachraoui, Yeshim Deniz, Liz McMillan

MongoDB Write Concern: 3 Must-Know Caveats

In this post, we discuss 3 gotchas when using MongoDB write concern.

'Write concern' in MongoDB describes the level of write acknowledgment you can expect from it. It's an important setting to keep in mind for your write operations, and its behavior is worth understanding, especially in distributed MongoDB deployments (i.e. replica sets and sharded clusters).

MongoDB Write Concern

MongoDB's documentation defines write concern as "the level of acknowledgment requested from MongoDB for write operations to a standalone mongod or to replica sets or to sharded clusters."

Simply put, a write concern is an indication of 'durability' passed along with write operations to MongoDB. To clarify, let us look at the syntax:

{ w: <value>, j: <boolean>, wtimeout: <number> }
Where*,
w can be an integer | "majority" | <tag set>. It represents the number of members that must acknowledge the write. Default value is 1.
j requests that a write be acknowledged only after it has been written to the on-disk journal, as opposed to just system memory. Unspecified by default.
wtimeout specifies a time limit, in milliseconds, for satisfying the write concern. Unspecified by default.

* You can find the detailed syntax in the Write Concern Specification documentation.

* Learn more about the different "tags" you can use for common write concern values in our Understanding Durability & Write Safety in MongoDB blog.

Example:

db.inventory.insert(
    { sku: "abcdxyz", qty : 100, category: "Clothing" },
    { writeConcern: { w: 2, j: true, wtimeout: 5000 } }
)

The above insert's write concern can be read as follows: 'acknowledge this write once at least 2 members of the replica set have written it to their journals within 5000 msecs, or return an error'. Another notable value for the w option is "majority", which "requests acknowledgment that write operations have propagated to the majority of voting nodes, including the primary."

The importance of write concern is apparent. Increasing the value of w increases the latency of writes while decreasing the probability of losing them. Choosing the correct write concern depends on the latency and durability requirements of the writes being performed.

With that as the background on what a write concern is, let's move on to the three caveats to remember when using write concern.

CAVEAT 1: Setting write concern on replica sets without a wtimeout can cause writes to block indefinitely

The majority definition above (applicable from MongoDB 3.0 onwards) states that acknowledgment is requested from a majority of the "voting nodes". Note that "If you do not specify the wtimeout option and the level of write concern is unachievable, the write operation will block indefinitely."

This can have unexpected consequences. For example, consider a 2+1 replica set (i.e. a primary, a secondary and an arbiter). If your sole secondary goes down, all writes with a write concern w option of "majority" will block indefinitely, because the arbiter votes but holds no data and cannot acknowledge a write. The same will happen if the w option is set to 2. Another extreme example is a 3+2 replica set (a primary, 2 secondaries and 2 arbiters, which is not a recommended configuration). All "majority" writes will block if even a single data node is unavailable, as the majority number in this case is 3.
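The arithmetic behind both examples can be sketched in a few lines of JavaScript. This is only an illustrative model, not MongoDB code: the function names and the topology fields are hypothetical, and it assumes that every data-bearing member is a voting member and that the primary is among the reachable ones.

```javascript
// Illustrative model only; names and fields are hypothetical.

// A majority is more than half of the voting members.
function majorityOf(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// Can a write with the given w value ever be acknowledged?
// Arbiters vote but hold no data, so only reachable data-bearing
// members can acknowledge a write.
function isAchievable(w, topology) {
  const voting = topology.dataMembers + topology.arbiters;
  const required = w === "majority" ? majorityOf(voting) : w;
  return topology.dataMembersUp >= required;
}

// 2+1 replica set with the sole secondary down.
const twoPlusOne = { dataMembers: 2, arbiters: 1, dataMembersUp: 1 };
// 3+2 replica set with a single data node down.
const threePlusTwo = { dataMembers: 3, arbiters: 2, dataMembersUp: 2 };

console.log(isAchievable("majority", twoPlusOne));   // false
console.log(isAchievable(2, twoPlusOne));            // false
console.log(isAchievable(1, twoPlusOne));            // true
console.log(isAchievable("majority", threePlusTwo)); // false
```

In both degraded topologies the required count exceeds the number of members that can actually acknowledge, which is why such writes wait until the wtimeout expires, or forever if none is set.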

The simplest way to alleviate this issue is to always specify a wtimeout value so that the operation can time out if the write concern can't be satisfied. However, when such a timeout error occurs, MongoDB doesn't undo writes that had already succeeded on some of the members before the timeout. A timeout therefore means the write is unconfirmed, not that it failed.
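One way to make the wtimeout habit hard to forget is to route every write concern through a small helper, and to treat a timeout as "unconfirmed" rather than "failed". The sketch below is an assumption-laden illustration: both helpers are hypothetical (they are not part of MongoDB or its drivers), and the sample result object is modeled on the WriteResult printed by the legacy mongo shell, so verify the exact shape against the shell or driver version you use.

```javascript
// Hypothetical helpers; neither is part of MongoDB or its drivers.

// Returns a copy of the write concern that always carries a wtimeout.
// A missing or zero wtimeout is replaced with the supplied default.
function withTimeout(writeConcern, defaultTimeoutMs) {
  if (writeConcern.wtimeout > 0) {
    return { ...writeConcern };
  }
  return { ...writeConcern, wtimeout: defaultTimeoutMs };
}

// Classifies a result object (assumed shape, see the lead-in above).
function classifyWriteResult(result) {
  const wcError = result.writeConcernError;
  if (!wcError) {
    return "acknowledged";
  }
  if (wcError.errInfo && wcError.errInfo.wtimeout === true) {
    // The write may already exist on the primary; it is not rolled back.
    return result.nInserted > 0 ? "applied-but-unconfirmed" : "timed-out";
  }
  return "write-concern-error";
}

// Assumed example of a timed-out insert result.
const timedOut = {
  nInserted: 1,
  writeConcernError: {
    code: 64,
    errmsg: "waiting for replication timed out",
    errInfo: { wtimeout: true },
  },
};

console.log(withTimeout({ w: "majority" }, 5000));
console.log(classifyWriteResult(timedOut));

// In the shell, the helper would be used along these lines:
// db.inventory.insert(doc, { writeConcern: withTimeout({ w: "majority" }, 5000) })
```

An "applied-but-unconfirmed" result is the case to handle with care: retrying blindly may insert the document twice, so the application should check for the document or use an idempotent write.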

There is also currently no setting that ensures a write reaches a majority of only those nodes that are currently reachable, so choose the value of the write concern w option carefully, taking into account your topology, desired durability, and availability.

CAVEAT 2: You might lose data even with w: majority

It seems intuitive that once a write has been acknowledged by the majority of voting members, its durability is guaranteed. However, that isn't the case! Remember that when the j option is unspecified, a write is acknowledged right after it has been written to memory.

Such a write can therefore be lost if a freak power outage takes out the majority of the nodes to which the write had propagated before it could be flushed to disk (i.e. within syncPeriodSecs).

To ensure the durability of writes, it's best to leave journaling enabled on your database and to set the j option to true. In fact, starting with MongoDB 3.6, the --nojournal flag has been deprecated for replica set members using the WiredTiger storage engine.

With a w value of "majority" and the j option unspecified on a replica set, the exact durability behavior depends on the value of the replica set configuration setting writeConcernMajorityJournalDefault. When it is set to true (and journaling is enabled), MongoDB acknowledges such writes only after they have been written to the journals of a majority of voting members.
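The interplay between w, j and writeConcernMajorityJournalDefault described in this section can be summarized as a small decision function. This is a simplified sketch of the rules as stated here, not MongoDB's implementation: the function name is hypothetical, and it assumes journaling is enabled on every member.

```javascript
// Hypothetical helper modeling the rules described in this section.
// Assumes journaling is enabled on all members.
// Returns "journal" if the acknowledgment waits for the on-disk journal,
// or "memory" if the write is acknowledged once applied in memory.
function acknowledgedAfter(writeConcern, majorityJournalDefault) {
  if (writeConcern.j === true) {
    return "journal";
  }
  if (writeConcern.j === false) {
    return "memory";
  }
  // j is unspecified: only "majority" writes consult the
  // writeConcernMajorityJournalDefault replica set setting.
  if (writeConcern.w === "majority" && majorityJournalDefault === true) {
    return "journal";
  }
  return "memory";
}

console.log(acknowledgedAfter({ w: "majority" }, true));          // journal
console.log(acknowledgedAfter({ w: "majority" }, false));         // memory
console.log(acknowledgedAfter({ w: 2 }, true));                   // memory
console.log(acknowledgedAfter({ w: "majority", j: true }, false)); // journal
```

Setting j: true explicitly removes the dependence on the replica set configuration, which is why it is the safer choice when durability matters.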

Aside: Even with journaling turned on, your writes might still get lost on the MMAPv1 storage engine if an outage occurs within the commitIntervalMs window. The WiredTiger storage engine, on the other hand, forces a sync of the journal files when it receives a write with the j option set to true. And even with j set to false, an acknowledged "majority" write to a recent WiredTiger-based deployment can be lost only when a majority of the data nodes crash simultaneously.

CAVEAT 3: Setting w: 0 together with j: true doesn't improve write performance

This is easy enough to reason out once you think about it, but equally easy to forget. Setting the w option to 0 is usually done to write to the database in a "fire-and-forget" fashion, when you have a fair amount of confidence in the database infrastructure and care more about latency than about the durability of every write. However, if you also set the j option to true, your w option will effectively be overridden, as the database will ensure that the write has been written to the on-disk journal before returning.
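A quick guard can catch this combination before it reaches the database. The check below is a minimal sketch with a hypothetical function name; it only encodes the rule stated above, that j: true takes precedence over w: 0.

```javascript
// Hypothetical check: does this write concern make the client wait
// for the server before the write call returns?
function waitsForServer(writeConcern) {
  if (writeConcern.j === true) {
    return true; // the journal acknowledgment overrides w: 0
  }
  return writeConcern.w !== 0;
}

console.log(waitsForServer({ w: 0 }));          // false: fire-and-forget
console.log(waitsForServer({ w: 0, j: true })); // true: no latency gain
console.log(waitsForServer({ w: 1 }));          // true
```

If the second case shows up in your code, decide which behavior you actually want: drop j: true for a genuine fire-and-forget write, or raise w to make the durability intent explicit.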

If you're using write concerns to guarantee the success of your write operations, make sure that you remember these three crucial caveats! We're here to help, so feel free to connect with any questions through Twitter or by email.

More Stories By Vaibhaw Pandey

Vaibhaw Pandey is a Software Developer with interests in Distributed Systems, Databases and Web-scale technologies.
