
MongoDB Write Concern: 3 Must-Know Caveats

In this post, we discuss 3 gotchas when using MongoDB write concern.

'Write concern' in MongoDB describes the level of write acknowledgment you can expect from the database. It is an important setting to keep in mind for your write operations, and its behavior is worth understanding, especially in distributed MongoDB deployments (i.e. replica sets and sharded clusters).

MongoDB Write Concern

MongoDB's documentation defines write concern as "the level of acknowledgment requested from MongoDB for write operations to a standalone mongod or to replica sets or to sharded clusters."

Simply put, a write concern is an indication of 'durability' passed along with write operations to MongoDB. To clarify, let us look at the syntax:

{ w: <value>, j: <boolean>, wtimeout: <number> }
Where*,
 w can be an integer | "majority" | <tag set>. It represents the number of members that must acknowledge the write. The default value is 1.
 j requests that a write be acknowledged only after it is written to the on-disk journal, as opposed to just system memory. Unspecified by default.
 wtimeout specifies a time limit, in milliseconds, for the write concern to be satisfied. Unspecified by default.

* You can find the detailed syntax in the Write Concern Specification documentation.

* Learn more about the different "tags" you can use for common write concern values in our Understanding Durability & Write Safety in MongoDB blog.

Example:

db.inventory.insert(
    { sku: "abcdxyz", qty : 100, category: "Clothing" },
    { writeConcern: { w: 2, j: true, wtimeout: 5000 } }
)

The above insert's write concern can be read as follows: acknowledge this write once at least 2 members of the replica set have written it to their journals within 5000 ms; otherwise return an error. Another common value for the w option is "majority", which "requests acknowledgment that write operations have propagated to the majority of voting nodes, including the primary."
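To make the reading above concrete, here is a minimal sketch in plain JavaScript that turns a write concern document into an English sentence. The helper name describeWriteConcern is hypothetical (it is not part of MongoDB or its shell), and the wording for an unspecified j is an assumption; the defaults mirror the ones listed in the syntax section.

```javascript
// Hypothetical helper (not a MongoDB API): spell out a write concern in words.
function describeWriteConcern({ w = 1, j, wtimeout } = {}) {
  let who;
  if (w === 0) {
    who = "no acknowledgment is requested";
  } else if (w === "majority") {
    who = "a majority of voting members must acknowledge";
  } else if (typeof w === "number") {
    who = `at least ${w} member(s) must acknowledge`;
  } else {
    // Any other string is treated here as a tag set name.
    who = `members matching tag set "${w}" must acknowledge`;
  }

  let where;
  if (j === true) {
    where = "after writing to the on-disk journal";
  } else if (j === false) {
    where = "after writing to memory";
  } else {
    where = "with journaling left unspecified";
  }

  // An unspecified (or zero) wtimeout means the write can wait forever.
  const limit = wtimeout > 0
    ? `within ${wtimeout} ms or return an error`
    : "with no time limit";

  return `${who}, ${where}, ${limit}`;
}

console.log(describeWriteConcern({ w: 2, j: true, wtimeout: 5000 }));
```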

The importance of write concern is apparent. Increasing the value of w increases the latency of writes while decreasing the probability of their getting lost. Choosing the correct values for write concern depends on the latency and durability requirements of the writes being performed.

With that as the background on what a write concern is, let's move on to the three caveats to remember when using write concern.

CAVEAT 1: Setting write concern on replica sets without a wtimeout can cause writes to block indefinitely

The definition of "majority" above (applicable from MongoDB 3.0 onwards) states that acknowledgment is requested from a majority of the "voting nodes". Note the documentation's warning: "If you do not specify the wtimeout option and the level of write concern is unachievable, the write operation will block indefinitely."

This can have unexpected consequences. For example, consider a 2+1 replica set (i.e. a primary, a secondary and an arbiter). If the sole secondary goes down, all writes with a w option of "majority" will block indefinitely: the majority of 3 voting members is 2, but only one data-bearing member (the primary) is left to acknowledge the write, since arbiters hold no data. The same will happen if the w option is set to 2. A more extreme example is a 3+2 replica set (a primary, 2 secondaries and 2 arbiters, not a recommended configuration). All "majority" writes will block if even a single data node is unavailable, as the majority in this case is 3.
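The arithmetic behind these two examples can be sketched in a few lines of plain JavaScript. This is a simplified model, not MongoDB's actual implementation: the function names are hypothetical, every member is assumed to be a voting member, and arbiters are assumed never to acknowledge writes because they hold no data.

```javascript
// Majority of n voting members: floor(n / 2) + 1.
function majorityOf(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// Simplified model: a write concern is achievable only if enough
// data-bearing members are up to acknowledge it. Arbiters count toward
// the voting total but can never acknowledge a write.
function isAchievable(w, { dataNodesTotal, arbiters, dataNodesUp }) {
  const required = w === "majority"
    ? majorityOf(dataNodesTotal + arbiters)
    : w;
  return dataNodesUp >= required;
}

// 2+1 replica set with the secondary down: only the primary can acknowledge.
const twoPlusOne = { dataNodesTotal: 2, arbiters: 1, dataNodesUp: 1 };
console.log(isAchievable("majority", twoPlusOne)); // false
console.log(isAchievable(2, twoPlusOne));          // false

// 3+2 replica set with one data node down: majority of 5 voters is 3.
const threePlusTwo = { dataNodesTotal: 3, arbiters: 2, dataNodesUp: 2 };
console.log(isAchievable("majority", threePlusTwo)); // false
```

By contrast, a three-member set with no arbiters and one data node down still has 2 data nodes up against a required majority of 2, so the same check returns true.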

The simplest way to alleviate this issue is to always specify a wtimeout value so that the write can time out if the write concern can't be satisfied. However, when such a timeout error occurs, MongoDB doesn't undo writes that had already succeeded on some of the members before the timeout.
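A minimal sketch of this pattern follows. It assumes a synchronous, shell-style collection object exposing insertOne(doc, options), such as db.inventory in the mongo shell; with an asynchronous driver the call would need to be awaited instead. The wrapper name insertWithTimeout and the stub collection are hypothetical, and the stub only simulates a timeout so the snippet runs without a server.

```javascript
// Sketch: issue a "majority" write with a time limit and report the outcome
// instead of blocking forever. `collection` is assumed to be a shell-style
// object with a synchronous insertOne(doc, options).
function insertWithTimeout(collection, doc, wtimeoutMs = 5000) {
  const writeConcern = { w: "majority", wtimeout: wtimeoutMs };
  try {
    const result = collection.insertOne(doc, { writeConcern });
    return { acknowledged: true, result };
  } catch (err) {
    // A write concern timeout does not roll back the write: the document
    // may already exist on the primary and on some secondaries.
    return { acknowledged: false, error: err };
  }
}

// Stand-in collection that simulates a write concern timeout (no server needed).
const stubCollection = {
  insertOne() {
    throw new Error("simulated write concern timeout");
  },
};

const outcome = insertWithTimeout(stubCollection, { sku: "abcdxyz", qty: 100 }, 100);
console.log(outcome.acknowledged); // false
```

Because the write may have been applied on some members despite the error, callers should treat an unacknowledged result as "unknown" rather than "failed", for example by making retries idempotent.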

There is also currently no setting to ensure that a write reaches a majority of only the nodes that are currently reachable, so choose the value of w carefully, based on your topology and on the durability and availability you need.

CAVEAT 2: You might lose data even with w: majority

It seems intuitive that once a write has been acknowledged by the majority of voting members, its durability is guaranteed. However, that isn't the case! Remember that when the j option is unspecified, a write is acknowledged right after it has been written to memory.

So, such a write can be lost if a freak power outage takes out the majority of the nodes to which the write had propagated before it could be flushed to disk (i.e. within syncPeriodSecs of the write).

In order to ensure the durability of writes, it's best to keep journaling enabled on your database and to set the j option to true. In fact, starting with MongoDB 3.6, the --nojournal flag has been deprecated for replica set members using the WiredTiger storage engine.

With a w value of "majority" and the j option unspecified on a replica set, the exact durability behavior depends on the value of the replica set configuration setting writeConcernMajorityJournalDefault. When it is set to true (and journaling is enabled), MongoDB acknowledges such writes only after they have been written to the journals of a majority of voting members. When it is set to false, they are acknowledged once a majority of voting members have applied them in memory.
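The decision described in this caveat can be summarized as a small function. This is a sketch of the rules as stated in this post, not of MongoDB's internals: the function name acknowledgedFrom is hypothetical, and journaling is assumed to be enabled on every member.

```javascript
// Sketch: where must a write have landed before it is acknowledged?
// Returns "journal" or "memory". Assumes journaling is enabled.
function acknowledgedFrom({ w = 1, j } = {}, writeConcernMajorityJournalDefault) {
  if (j === true) return "journal";  // explicitly requested
  if (j === false) return "memory";  // explicitly declined
  // j unspecified: only "majority" writes consult the replica set setting.
  if (w === "majority" && writeConcernMajorityJournalDefault === true) {
    return "journal";
  }
  return "memory";
}

console.log(acknowledgedFrom({ w: "majority" }, true));          // "journal"
console.log(acknowledgedFrom({ w: "majority" }, false));         // "memory"
console.log(acknowledgedFrom({ w: 2 }, true));                   // "memory"
console.log(acknowledgedFrom({ w: "majority", j: true }, false)); // "journal"
```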

Aside: Even with journaling turned on, your writes might still get lost on the MMAPv1 storage engine if an outage occurs within the commitIntervalMs window. The WiredTiger storage engine, on the other hand, forces a sync of the journal files when it receives a write with the j option set to true. And even with j set to false, an acknowledged "majority" write to a recent WiredTiger-based deployment can be lost only when a majority of the data nodes crash simultaneously.

CAVEAT 3: Setting w: 0 together with j: true doesn't improve write performance

This is easy enough to reason about once you think it through, but equally easy to forget. Setting the w option to 0 is usually done to write to the database in a "fire-and-forget" fashion, when you have a fair amount of confidence in the database infrastructure and care more about latency than about the durability of every write. However, if you also set the j option to true, your w option is effectively overridden, as the database will ensure that the write is written to the on-disk journal before returning.
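As a recap, the three caveats can be folded into a small lint-style check that flags risky write concern documents before they are used. This is only an illustrative sketch: the function lintWriteConcern and its warning texts are hypothetical, and the rules are deliberately conservative (for example, caveat 2 is flagged whenever j is not explicitly true, even though writeConcernMajorityJournalDefault may already cover it).

```javascript
// Hypothetical lint check for the three caveats discussed in this post.
function lintWriteConcern({ w = 1, j, wtimeout } = {}) {
  const warnings = [];

  // Anything other than w: 0 or w: 1 waits on members besides the primary.
  const waitsOnOthers = w !== 0 && w !== 1;
  if (waitsOnOthers && !(wtimeout > 0)) {
    warnings.push("caveat 1: no wtimeout, so the write may block indefinitely");
  }
  if (w === "majority" && j !== true) {
    warnings.push("caveat 2: majority without j: true may be acknowledged from memory");
  }
  if (w === 0 && j === true) {
    warnings.push("caveat 3: j: true overrides w: 0, so the write is not fire-and-forget");
  }
  return warnings;
}

console.log(lintWriteConcern({ w: "majority" }).length);                          // 2
console.log(lintWriteConcern({ w: 0, j: true }).length);                          // 1
console.log(lintWriteConcern({ w: "majority", j: true, wtimeout: 5000 }).length); // 0
```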

If you're using write concerns to guarantee the success of your write operations, make sure that you remember these three crucial caveats! We're here to help, so feel free to connect with any questions through Twitter or by email.

More Stories By Vaibhaw Pandey

Vaibhaw Pandey is a Software Developer with interests in Distributed Systems, Databases and Web-scale technologies.
