SYS-CON MEDIA Authors: Pat Romanski, Elizabeth White, Zakia Bouachraoui, Liz McMillan, William Schmarzo

Blog Feed Post

Is the Facebook DC Architecture right for you?

A few weeks ago Facebook announced its new datacenter architecture in a post on its network engineering blog. Facebook is one of the few large web-scale companies that is fairly open about its network architecture and designs, which gives many others the opportunity to see how a network can be scaled, even though that scale is well beyond what most will need in the foreseeable future, if ever.

In the post, Alexey walks through some of the thought process behind the architecture, which is ultimately the most important part of any architecture and design. Too often we simply build whatever seems to be popular or common, or whatever is mandated or pushed by a specific vendor. The network, however, is a product, a deliverable, and it has requirements like just about anything else we produce.

Facebook’s and the other web properties’ scale is at a different order of magnitude from most everyone else, but their requirements should sound pretty familiar to many:

  • Intra DC traffic is significantly higher than inter DC or DC to Internet traffic
    • “machine to machine traffic – is several orders of magnitude larger than what goes out to the Internet”
  • Build for growth: the network is not a static entity
    • “ability to move fast and support rapid growth is at the core of our infrastructure design philosophy”
  • Simple design, easy to operate and maintain
    • “keep our networking infrastructure simple enough that small, highly efficient teams of engineers can manage it”
    • “Our goal is to make deploying and operating our networks easier and faster over time”

Anyone with a decent-sized datacenter infrastructure should recognize these same basic requirements in their own network needs.

With the requirements in hand (and a few more, I am sure), Facebook created clusters of racks with servers and supporting networking equipment and then built a hierarchy of network equipment on top. Each rack in a cluster contains a regular ToR switch with four 40GbE uplinks to the first spine layer. While not explicitly stated, these ToRs likely support 48 to 56 server-side 10GbE ports (this could be as high as 80 when using 96-port switches). That makes a rack somewhere between 3:1 and 5:1 oversubscribed to the fabric.
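The oversubscription range follows directly from the port counts. A minimal sketch of the arithmetic, assuming every server port runs at 10GbE and all four 40GbE uplinks are in use (the function name is only illustrative):

```python
def oversubscription(server_ports: int, server_gbps: int = 10,
                     uplinks: int = 4, uplink_gbps: int = 40) -> float:
    """Ratio of server-facing to fabric-facing bandwidth on one ToR switch."""
    return (server_ports * server_gbps) / (uplinks * uplink_gbps)


# Port counts taken from the estimates in the text: 48, 56, and 80 server ports.
for ports in (48, 56, 80):
    print(f"{ports} x 10GbE down, 4 x 40GbE up -> {oversubscription(ports):.1f}:1")
```

With 48 server ports the ratio is 3:1, with 56 it is 3.5:1, and with 80 it reaches 5:1.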

From these ToR switches, each of the 40GbE uplinks is connected to a fabric switch. With 48 ToR switches in a cluster or pod, these fabric switches support 48x40GbE towards the ToR layer. As stated, these switches have the ability to support the same amount of bandwidth up to the next spine layer. (I guess Facebook differentiates them in name by calling them fabric switches rather than spine switches, even though the fabric switches act as the spine for the ToR switches.)

This means that each of these pod spine switches needs to support up to 96x40GbE (48 ports down and 48 up), which makes them mid-sized modular switches with an internal fabric. You cannot make a switch of that size without some form of internal fabric to connect multiple Ethernet ASICs to each other. With simplicity and ease of maintenance in mind, I am sure Facebook picked systems that have an internal Clos fabric built out of the same Ethernet ASICs used for the ToR switches. This also means there is not a very large amount of buffer memory available in the fabric and spine layers, contrary to what many believe is required (we are not among them). Similarly for latency: this is not a low-latency fabric by current standards, which may be fine for Facebook’s requirements. Server-to-server traffic between different server pods may take up to 11 Ethernet ASIC hops, some of which are not cut-through switched. This may add up to close to 10 microseconds.
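One way to arrive at the 11-hop figure is sketched below. It rests on two assumptions that the Facebook post does not spell out: a ToR is a single ASIC, and each modular switch is a three-stage internal Clos (ingress line card, fabric card, egress line card). The names are purely illustrative.

```python
# Assumed ASIC traversals per device type (not confirmed by Facebook).
ASICS_PER_DEVICE = {
    "tor": 1,     # fixed switch, single ASIC
    "fabric": 3,  # modular: line card -> fabric card -> line card
    "spine": 3,   # same modular platform as the fabric switch
}

# Worst-case path between servers in two different pods.
INTER_POD_PATH = ["tor", "fabric", "spine", "fabric", "tor"]


def asic_hops(path):
    """Total number of Ethernet ASICs a packet crosses along the path."""
    return sum(ASICS_PER_DEVICE[device] for device in path)


hops = asic_hops(INTER_POD_PATH)
print(f"ASIC hops between pods: {hops}")

# The text estimates close to 10 microseconds end to end, which implies
# an average per-ASIC latency of a bit under one microsecond.
print(f"Implied average per hop: {10 / hops:.2f} us")
```

Traffic that stays inside a pod would skip the spine and the second fabric switch, so under the same assumptions it crosses only five ASICs.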

The spine plane that connects each of the clusters together is created using the same switch as the cluster spine. It has the ability to scale to essentially a few hundred pods. And that’s big. Bigger than 99% of the rest of the world will need.

This design is very modular: it can grow inside a pod and by attaching more pods together with the fabric switches. The challenge, however, is that the cabling is not trivial unless you get to start fresh and lay out enough fiber for the maximum configuration. Facebook has the luxury of regularly building new datacenters. Most enterprises are adding to existing infrastructure, in existing buildings where recabling is neither easy nor cheap. Grow-as-you-go with this design only works if the cabling is provided for the maximum configuration. So while the network is designed for easy expansion and growth, the foundational physical infrastructure has to be planned and executed at maximum size.
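To get a feel for the cabling problem, here is a rough count of fiber runs. It infers four fabric switches per pod (one per ToR uplink) and assumes every fabric switch populates all 48 of its uplinks to the spine layer. The pod counts in the loop are hypothetical sizes, not Facebook numbers.

```python
TORS_PER_POD = 48
UPLINKS_PER_TOR = 4
# Inference from the text: one fabric switch per ToR uplink.
FABRIC_SWITCHES_PER_POD = UPLINKS_PER_TOR
# Assumption: uplink capacity is fully populated (48 up to match 48 down).
UPLINKS_PER_FABRIC_SWITCH = 48


def fiber_runs(pods: int) -> dict:
    """Count 40GbE links inside pods and from pods up to the spine layer."""
    return {
        "tor_to_fabric": pods * TORS_PER_POD * UPLINKS_PER_TOR,
        "fabric_to_spine": pods * FABRIC_SWITCHES_PER_POD * UPLINKS_PER_FABRIC_SWITCH,
    }


for pods in (1, 10, 100):  # hypothetical deployment sizes
    print(pods, fiber_runs(pods))
```

The ToR-to-fabric links stay within a pod, but the fabric-to-spine links run across the building, and those are the ones that must be planned for the largest number of pods the site will ever hold.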

Ultimately the Facebook design is a 3-tier hierarchical network, but the top 2 tiers act as a fabric for the ToR switches. Facebook decided to implement the fabric as its own spine and leaf network. Our solution to a similar set of requirements would build a Plexxi fabric connecting ToR switches. ToR switches would connect to only a few Plexxi switches (for redundancy purposes), and the Plexxi switches would connect to each other to provide a fully programmable fabric. A Plexxi fabric extends by simply adding more switches with only local cabling.

Because all the switches use the same underlying ASIC technology, there is a common set of limitations to worry about. It is known exactly how large each of the required tables is, and those can be carefully engineered. The BGP engineering portion of the Facebook design is not insignificant. The ASICs used are limited in some of their table sizes, which means that IP address schemes need to be carefully designed, again with maximum size in mind.
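As an illustration of what designing the address scheme for maximum size can mean, the sketch below carves one hypothetical pod block into per-rack subnets so that the pod can be advertised upstream as a single aggregate. The 10.0.0.0/18 block and the /24 per rack are assumptions made for the example. This is not Facebook's actual addressing plan.

```python
import ipaddress

RACKS_PER_POD = 48
# Hypothetical pod aggregate: a /18 holds 64 /24s, enough for 48 racks plus spares.
POD_BLOCK = ipaddress.ip_network("10.0.0.0/18")

# One /24 per rack (assumed size), all inside the pod aggregate.
rack_subnets = list(POD_BLOCK.subnets(new_prefix=24))[:RACKS_PER_POD]


def spine_routes(pods: int, aggregated: bool) -> int:
    """Rack-reachability routes the spine layer must hold for `pods` pods."""
    return pods if aggregated else pods * RACKS_PER_POD


print(rack_subnets[0], "...", rack_subnets[-1])
for pods in (10, 100):  # hypothetical pod counts
    print(pods, "pods:",
          spine_routes(pods, aggregated=False), "routes without aggregation,",
          spine_routes(pods, aggregated=True), "with")
```

The point is the ratio: sizing the pod block up front keeps the spine table growing per pod instead of per rack, which matters when the ASIC tables are small.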

The network is engineered as a full L3 network; there is no L2 connectivity outside of a rack. For Facebook this works because they own every piece of their application suite. Like it or not, there are many (legacy) enterprise applications and services that either require L2 connectivity or work more simply in an L2 environment.

I have not touched on a key aspect of the Facebook design: “distributed control with centralized override”. This Facebook variation of SDN has extremely similar foundational thoughts to how we at Plexxi approach the programmability of the network. That will be a blog post in and of itself.

I am sure many will take the Facebook design as the new way to design datacenter networks. But please apply your own scaling, extensibility and physical limitation requirements. There are some rather large luxuries a company like Facebook can afford that most others cannot.

The post Is the Facebook DC Architecture right for you? appeared first on Plexxi.


More Stories By Michael Bushong

The best marketing efforts combine deep technology understanding with a highly approachable means of communicating. Plexxi's Vice President of Marketing Michael Bushong acquired these skills over 12 years at Juniper Networks, where he led product management, product strategy and product marketing organizations for Juniper's flagship operating system, Junos. Michael spent the last several years at Juniper leading their SDN efforts across both service provider and enterprise markets. Prior to Juniper, Michael spent time at database supplier Sybase, and ASIC design tool companies Synopsys and Magma Design Automation. Michael's undergraduate work at the University of California Berkeley in advanced fluid mechanics and heat transfer lends new meaning to the marketing phrase "This isn't rocket science."
