SYS-CON MEDIA Authors: Yeshim Deniz, Elizabeth White, Pat Romanski, Carmen Gonzalez, Zakia Bouachraoui

Related Topics: Containers Expo Blog, Java IoT, Microservices Expo, Linux Containers, Open Source Cloud, SDN Journal

Containers Expo Blog: Article

Preventing Performance Bottlenecks with Inline Deduplication

In a typical enterprise storage system the bottleneck to performance is in media bandwidth or computational overhead

Implementing high performance in enterprise storage is a constant battle to find and eliminate the next system bottleneck. Normally this alternates between limits of the underlying media and the computational overhead of metadata management, but choosing the wrong approach to deduplication can introduce a third performance challenge that can be impossible to overcome. Storage that implements a multi-pass approach to data optimization, such as staged or post-process deduplication, becomes inherently at a disadvantage for both computational and media overhead.

Common Performance Bottlenecks
In a typical enterprise storage system the bottleneck to performance is in one of two places: media bandwidth or computational overhead.

For the storage system designer, media overhead is the simplest to address - add more or faster media. In a hard disk-based storage system, this means adding faster drives, more drives, and larger drive sets. In a flash storage system bandwidth is increased by using SLC flash, adding more independent modules, allocating more over-provisioned space, and improving the flash translation layer.

Reducing computational overhead is a greater challenge. Once you have enough media bandwidth the problem becomes shuffling data to and from the storage initiators. Identifying the bottlenecks to performance here can be devilishly complex, as the designer must be concerned about matters such as system memory bandwidth, number of data copies and, especially in today's multi-core world, synchronization between multiple requests. There's no silver bullet here, so the only solution is having a very talented team of software engineers designing and optimizing the storage platform.

Adding deduplication introduces complexity directly into this most challenging area for performance improvement. Any deduplication implementation must interact directly with the storage metadata that is so critical to performance, since I/O requests are being redirected or eliminated based on the system's knowledge of duplicate data. Unless the deduplication technology has been designed and implemented in an inline, multi-core scalable, and low memory overhead way, system architects often try to separate deduplication into a separate layer and a second pass through the data. This is a mistake that harms storage performance in a way that cannot be repaired.

The Impossible Challenge of Multi-Pass Deduplication
This second pass through the data commonly occurs in two possible places: on the final storage media or when transferring data from a staging area to the final storage media.

The first case is always called post-process deduplication. Data are written to their resting media location and a separate process later reads them back, as time and bandwidth allow, determining if any portions are duplicates. If there are duplicates then storage metadata is updated to note this and space is freed for reuse. I've written extensively in the past about the risks of post-process deduplication; since it always requires additional media bandwidth and computational overhead it severely harms performance, and since there are no guarantees about when deduplication will occur it does not meet the requirements for high-change-rate use cases such as VDI.

Post-process Deduplication

The second case, where data is deduplicated as it is being moved from a staging location to a final media location, is often erroneously called inline - as it is inline with that destaging process - but is really just a modified form of post-process deduplication. As with conventional post-process deduplication, another round of data read and processing must occur. Additionally, now both the staging media and final media must provide the full system level of performance or either can become the bottleneck.

For example, some flash storage systems stage all data to a small arena of SLC flash prior to deduplication. This design doubles the number of possible performance bottlenecks in the architecture: performance writing to the staging area, front-end data ingestion, final media storage performance, and the deduplication and de-staging process itself. This sort of multi-pass deduplication process retains all of the negative performance aspects of a traditional post-process implementation.

Staged Post-process Deduplication

High Performance Requires Inline Deduplication
Any form of multi-pass deduplication introduces new bottlenecks that prevent an enterprise storage system from delivering the highest levels of performance. Post-process deduplication, whether on the final media or during a destaging process, creates additional overhead in both media access and data processing. For flash storage platforms requiring the highest levels of performance, only tightly integrated inline deduplication can meet all system requirements.

More Stories By Jered Floyd

Jered Floyd, Chief Technology Officer and Founder of Permabit Technology Corporation, is responsible for exploring strategic future directions for Permabit’s products, and providing thought leadership to guide the company’s data optimization initiatives. He has previously deployed Permabit’s effective software development methodologies and was responsible for developing Permabit product’s core protocol and initial server and system architectures.

Prior to Permabit, Floyd was a Research Scientist on the Microbial Engineering project at the MIT Artificial Intelligence Laboratory, working to bridge the gap between biological and computational systems. Earlier at Turbine, he developed a robust integration language for managing active objects in a massively distributed online virtual environment. Floyd holds Bachelor’s and Master’s degrees in Electrical Engineering and Computer Science from the Massachusetts Institute of Technology.

Comments (0)

Share your thoughts on this story.

Add your comment
You must be signed in to add a comment. Sign-in | Register

In accordance with our Comment Policy, we encourage comments that are on topic, relevant and to-the-point. We will remove comments that include profanity, personal attacks, racial slurs, threats of violence, or other inappropriate material that violates our Terms and Conditions, and will block users who make repeated violations. We ask all readers to expect diversity of opinion and to treat one another with dignity and respect.


Latest Stories
The Blockchain Benchmark asks and answers the questions many people want to know about the state of Blockchain: What are the biggest barriers? What was your motivation to get involved? When will it mainstream? Who are the true influencers? What are its top use cases? Who will win over the next 5 years? How will the future unfold? And 20+ other valuable questions.
In addition to 22 Keynotes and General Sessions, attend all FinTechEXPO Blockchain "education sessions" plus 40 in two tracks: (1) Enterprise Cloud (2) Digital Transformation. PRICE EXPIRES AUGUST 31, 2018. Ticket prices: ($295-Aug 31) ($395-Oct 31) ($495-Nov 12) ($995-Walk-in) Does NOT include lunch.
SAP is the world leader in enterprise applications in terms of software and software-related service revenue. Based on market capitalization, we are the world's third largest independent software manufacturer. Harness the power of your data and accelerate trusted outcome-driven innovation by developing intelligent and live solutions for real-time decisions and actions on a single data copy. Support next-generation transactional and analytical processing with a broad set of advanced analytics - r...
DXWorldEXPO LLC announced today that Kevin Jackson joined the faculty of CloudEXPO's "10-Year Anniversary Event" which will take place on November 11-13, 2018 in New York City. Kevin L. Jackson is a globally recognized cloud computing expert and Founder/Author of the award winning "Cloud Musings" blog. Mr. Jackson has also been recognized as a "Top 100 Cybersecurity Influencer and Brand" by Onalytica (2015), a Huffington Post "Top 100 Cloud Computing Experts on Twitter" (2013) and a "Top 50 C...
Provide an overview of the capabilities of Azure Stack allowing you or your customers to adopt truly consistent Hybrid Cloud capabilities to deliver greater productivity in your cloud world. Ultan Kinahan is on a member of the Global Black Belt team at Microsoft with a focus on Azure Stack Hybrid Cloud. Ultan has been in the Azure team since the beginning, Has held roles in Engineering, Sales and now consults with both small to medium size business and the worlds largest organizations on how to ...
As the fourth industrial revolution continues to march forward, key questions remain related to the protection of software, cloud, AI, and automation intellectual property. Recent developments in Supreme Court and lower court case law will be reviewed to explain the intricacies of what inventions are eligible for patent protection, how copyright law may be used to protect application programming interfaces (APIs), and the extent to which trademark and trade secret law may have expanded relev...
Early Bird Registration Discount Expires on August 31, 2018 Conference Registration Link ▸ HERE. Pick from all 200 sessions in all 10 tracks, plus 22 Keynotes & General Sessions! Lunch is served two days. EXPIRES AUGUST 31, 2018. Ticket prices: ($1,295-Aug 31) ($1,495-Oct 31) ($1,995-Nov 12) ($2,500-Walk-in)
Andrew Keys is Co-Founder of ConsenSys Enterprise. He comes to ConsenSys Enterprise with capital markets, technology and entrepreneurial experience. Previously, he worked for UBS investment bank in equities analysis. Later, he was responsible for the creation and distribution of life settlement products to hedge funds and investment banks. After, he co-founded a revenue cycle management company where he learned about Bitcoin and eventually Ethereal. Andrew's role at ConsenSys Enterprise is a mul...
Contino is a global technical consultancy that helps highly-regulated enterprises transform faster, modernizing their way of working through DevOps and cloud computing. They focus on building capability and assisting our clients to in-source strategic technology capability so they get to market quickly and build their own innovation engine.
Blockchain is a new buzzword that promises to revolutionize the way we manage data. If the data is stored in a blockchain there is no need for a middleman - the distributed database is stored on multiple and there is no need to have a centralized server that will ensure that the transactions can be trusted. The best way to understand how a blockchain works is to build one. During this presentation, we'll start with covering the basics (hash, nounce, block, smart contracts) and then we'll create ...
DevOpsSUMMIT at CloudEXPO will expand the DevOps community, enable a wide sharing of knowledge, and educate delegates and technology providers alike. Recent research has shown that DevOps dramatically reduces development time, the amount of enterprise IT professionals put out fires, and support time generally. Time spent on infrastructure development is significantly increased, and DevOps practitioners report more software releases and higher quality. Sponsors of DevOpsSUMMIT at CloudEXPO will b...
FinTech Is Now Part of the CloudEXPO New York Program. Financial enterprises in New York City, London, Singapore, and other world financial capitals are embracing a new generation of smart, automated FinTech that eliminates many cumbersome, slow, and expensive intermediate processes from their businesses. Accordingly, attendees at the upcoming 22nd CloudEXPO | DXWorldEXPO November 12-13, 2018 in New York City will find fresh new content in two new tracks called: FinTechEXPO New York Blockchain E...
In addition to 22 Keynotes and General Sessions, pick from 40 technical sessions in two tracks: (1) DevOpsSUMMIT (2) Cloud-Native & Serverless. EXPIRES AUGUST 31, 2018. Ticket prices: ($295-Aug 31) ($395-Oct 31) ($495-Nov 12) ($595-Walk-in) Does NOT include lunch. DevOps Institue Certification DevOps Institute Two-Day DevOps Certification Program EXPIRES AUGUST 31, 2018. Ticket prices: ($995-Aug 31) ($1,095-Oct 31) ($1,195-Nov 12) ($1,395-Walk-in)
Cloud adoption is a core component of digital transformation. Scaling the IT environment, making it resilient, and reducing costs are what organizations want. Hear from the author of the best selling Packtbook "Architecting Cloud Computing Solutions" as he presents and explains critical Cloud solution design considerations and technology decisions required to choose and deploy the right Cloud service and deployment models, that are aligned to your business and technology service requirements. T...
Wasabi is the hot cloud storage company delivering low-cost, fast, and reliable cloud storage. Wasabi is 80% cheaper and 6x faster than Amazon S3, with 100% data immutability protection and no data egress fees. Created by Carbonite co-founders and cloud storage pioneers David Friend and Jeff Flowers, Wasabi is on a mission to commoditize the storage industry. Wasabi is a privately held company based in Boston, MA. Follow and connect with Wasabi on Twitter, Facebook, Instagram and the Wasabi blog...