SYS-CON MEDIA Authors: Pat Romanski, Elizabeth White, Liz McMillan, Zakia Bouachraoui, Yeshim Deniz


Is the Facebook DC Architecture right for you?

A few weeks ago Facebook announced its new datacenter architecture in a post on its network engineering blog. Facebook is one of the few large web-scale companies that is fairly open about its network architecture and designs, which gives many others the opportunity to see how a network can be scaled, even though that scale is well beyond what most will need in the foreseeable future, if ever.

In the post, Alexey walks through some of the thought process behind the architecture, which is ultimately the most important part of any architecture and design. Too often we simply build whatever seems to be popular or common, or whatever is mandated or pushed by a specific vendor. The network, however, is a product, a deliverable, and it has requirements like just about anything else we produce.

Facebook’s and the other web properties’ scale is at a different order of magnitude from most everyone else, but their requirements should sound pretty familiar to many:

  • Intra DC traffic is significantly higher than inter DC or DC to Internet traffic
    • “machine to machine traffic – is several orders of magnitude larger than what goes out to the Internet”
  • Build for growth, the network is not a static entity
    • “ability to move fast and support rapid growth is at the core of our infrastructure design philosophy”
  • Simple Design, easy to operate and maintain
    • “keep our networking infrastructure simple enough that small, highly efficient teams of engineers can manage it”
    • “Our goal is to make deploying and operating our networks easier and faster over time”

Anyone with a decent-sized datacenter infrastructure should recognize these same basic requirements in their own network needs.

With the requirements in hand (and a few more, I am sure), Facebook created clusters of racks with servers and supporting networking equipment and then built a hierarchy of network equipment on top. Each rack in a cluster contains a regular ToR switch with 4x40GbE uplinks to the first spine layer. While not explicitly stated, these ToRs likely support 48 to 56 server-side 10GbE ports (this could be as high as 80 when using 96-port switches). With 160 Gbit/s of uplink capacity, that makes a rack somewhere between 3:1 and 5:1 oversubscribed to the fabric.
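The oversubscription range above follows directly from the port counts. The short sketch below reproduces the arithmetic; the function name and the three server port counts are illustrative choices taken from the estimates in this post, not figures published by Facebook.

```python
def oversubscription(server_ports: int, server_gbps: float,
                     uplinks: int, uplink_gbps: float) -> float:
    """Ratio of server-facing bandwidth to fabric-facing bandwidth on one ToR."""
    downlink = server_ports * server_gbps   # total bandwidth towards the servers
    uplink = uplinks * uplink_gbps          # total bandwidth towards the fabric
    return downlink / uplink


# Port counts are this post's estimates (48, 56, or up to 80 server ports).
for ports in (48, 56, 80):
    ratio = oversubscription(ports, 10, 4, 40)
    print(f"{ports} x 10GbE down, 4 x 40GbE up -> {ratio:.2f}:1")
```

With 48 server ports the ToR is 3:1 oversubscribed, with 56 it is 3.5:1, and with 80 it reaches 5:1.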

From these ToR switches, each of the 40GbE uplinks is connected to a fabric switch. With 48 ToR switches in a cluster or pod, these fabric switches support 48x40GbE towards the ToR layer. As stated, these switches have the ability to support the same amount of bandwidth up to the next spine layer. (I guess Facebook differentiates them in name by calling them fabric switches rather than spine switches, even though the fabric switches act as the spine for the ToR switches.)

This means that each of these pod spine switches needs to support up to 96x40GbE (48 ports down to the ToRs and 48 up to the spine), which makes them mid-sized modular switches with an internal fabric. You cannot build a switch of that size without some form of internal fabric to connect multiple Ethernet ASICs to each other. With simplicity and ease of maintenance in mind, I am sure Facebook picked systems that have an internal Clos fabric built out of the same Ethernet ASICs used for the ToR switches. This also means there is not a very large amount of buffer memory available in the fabric and spine layers, contrary to what many believe is required (we are not among them). Similarly for latency: this is not a low-latency fabric by new standards, which may be fine for Facebook’s requirements. Server-to-server traffic between different server pods may take up to 11 Ethernet ASIC hops, some of which are not cut-through switched. This may add up to close to 10 microseconds.
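The port count and the 11-hop figure can be reconstructed with a little arithmetic. This is a minimal sketch under two assumptions that are not stated in the Facebook post: each ToR contains a single ASIC, and each modular switch is a 3-stage internal Clos (line card, fabric card, line card), so a packet crosses three ASICs inside it. All names are hypothetical.

```python
TORS_PER_POD = 48        # from the post: 48 ToR switches per pod
ASICS_PER_TOR = 1        # assumption: a ToR is a single-ASIC switch
ASICS_PER_MODULAR = 3    # assumption: 3-stage internal Clos per modular switch


def fabric_switch_ports(tors_per_pod: int) -> int:
    """40GbE ports on one fabric switch: one per ToR down, the same number up."""
    return tors_per_pod * 2


def worst_case_asic_hops(modular_switches_in_path: int) -> int:
    """ASICs crossed between two servers: both ToRs plus every modular switch."""
    return 2 * ASICS_PER_TOR + modular_switches_in_path * ASICS_PER_MODULAR


# Pod-to-pod path: ToR -> fabric switch -> spine switch -> fabric switch -> ToR
print(fabric_switch_ports(TORS_PER_POD))  # 96
print(worst_case_asic_hops(3))            # 11
```

Under these assumptions the worst-case path is 1 + 3 + 3 + 3 + 1 = 11 ASICs, which matches the hop count quoted above.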

The spine plane that connects each of the clusters together is created using the same switch as the cluster spine. It has the ability to scale to essentially a few hundred pods. And that’s big. Bigger than 99% of the rest of the world will need.

This design is very modular and can grow both inside a pod and by attaching more pods together with the fabric switches. The challenge, however, is that the cabling is not trivial unless you get to start fresh and lay out enough fiber for the maximum configuration. Facebook has the luxury of regularly building new datacenters. Most enterprises are adding to existing infrastructure in existing buildings, where recabling is neither easy nor cheap. Grow-as-you-go with this design only works if the cabling is provided for the maximum configuration. So while the network is designed for easy expansion and growth, the foundational physical infrastructure has to be planned and executed at maximum size.

Ultimately the Facebook design is a 3-tier hierarchical network, but the top 2 tiers act as a fabric for the ToR switches. Facebook decided to implement the fabric as its own spine and leaf network. Our solution to a similar set of requirements would build a Plexxi fabric connecting ToR switches. ToR switches would connect to only a few Plexxi switches (for redundancy purposes), and the Plexxi switches would connect to each other to provide a fully programmable fabric. A Plexxi fabric extends by simply adding more switches with only local cabling.

Because all switches use the same underlying ASIC technology, there is a common set of limitations to worry about. The size of each of the required tables is known exactly, and the network can be carefully engineered around those limits. The BGP engineering portion of the Facebook design is not insignificant. The ASICs used are limited in some of their table sizes, which means that IP address schemes need to be carefully designed, again with maximum size in mind.
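To illustrate what designing an address scheme "with maximum size in mind" can look like, here is a hedged sketch using Python's standard `ipaddress` module. It is not Facebook's actual addressing plan: the supernet, the prefix lengths and the function name are all hypothetical. The idea is to reserve one aggregate per pod up front, so that switches outside a pod need a single route for it instead of one route per rack.

```python
import ipaddress


def plan_pod_prefixes(supernet: str, pod_prefix_len: int, rack_prefix_len: int,
                      max_pods: int, racks_per_pod: int) -> dict:
    """Carve a supernet into one aggregate per pod and one subnet per rack."""
    net = ipaddress.ip_network(supernet)
    # Check capacity against the MAXIMUM build-out before allocating anything.
    if max_pods > 2 ** (pod_prefix_len - net.prefixlen):
        raise ValueError("supernet too small for the maximum pod count")
    if racks_per_pod > 2 ** (rack_prefix_len - pod_prefix_len):
        raise ValueError("pod aggregate too small for the rack count")
    plan = {}
    for pod in list(net.subnets(new_prefix=pod_prefix_len))[:max_pods]:
        racks = list(pod.subnets(new_prefix=rack_prefix_len))[:racks_per_pod]
        plan[pod] = racks
    return plan


# Hypothetical numbers: 4 pods of 48 racks, a /24 per rack, a /18 per pod.
plan = plan_pod_prefixes("10.0.0.0/8", 18, 24, max_pods=4, racks_per_pod=48)
rack_routes = sum(len(racks) for racks in plan.values())
print(f"{rack_routes} rack routes summarized as {len(plan)} pod aggregates")
```

In this example 192 rack subnets collapse into 4 pod aggregates, which is the kind of reduction that keeps routing tables within the limits of the switch ASICs.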

The network is engineered as a full L3 network; there is no L2 connectivity outside of a rack. For Facebook this works because they own every piece of their application suite. Like it or not, there are many (legacy) enterprise applications and services that either require L2 connectivity or are simpler to run in an L2 environment.

I have not touched on a key aspect of the Facebook design: “distributed control with centralized override”. This Facebook variation of SDN has extremely similar foundational thoughts to how we at Plexxi approach the programmability of the network. That will be a blog post in and of itself.

I am sure many will take the Facebook design as the new way to design datacenter networks. But please apply your own scaling, extensibility and physical limitation requirements. There are some rather large luxuries a company like Facebook can afford that most others cannot.

The post Is the Facebook DC Architecture right for you? appeared first on Plexxi.


More Stories By Michael Bushong

The best marketing efforts leverage deep technology understanding with a highly approachable means of communicating. Plexxi's Vice President of Marketing Michael Bushong has acquired these skills having spent 12 years at Juniper Networks, where he led product management, product strategy and product marketing organizations for Juniper's flagship operating system, Junos. Michael spent the last several years at Juniper leading their SDN efforts across both service provider and enterprise markets. Prior to Juniper, Michael spent time at database supplier Sybase and at ASIC design tool companies Synopsys and Magma Design Automation. Michael's undergraduate work at the University of California Berkeley in advanced fluid mechanics and heat transfer lends new meaning to the marketing phrase "This isn't rocket science."
