SYS-CON MEDIA Authors: Yeshim Deniz, Carmen Gonzalez, Liz McMillan, Elizabeth White, Pat Romanski

Hadoop Will Not Mow Your Lawn


"The best minds of my generation are thinking about how to make people click ads." - Jeff Hammerbacher, ex-Facebook architect

It turns out that when you have a lot of "best minds" working on the same problem, you come up with some pretty interesting technology - no matter how inane that problem may be.

The technology that those "best minds" at Yahoo came up with to target ads to users is called Hadoop. 

Hadoop is a powerful technology and, like most new IT solutions, is being touted as able to solve a vast number of technical ills. When companies discover that Hadoop will not in fact cure male pattern baldness, they will fall into the inevitable trough of disillusionment.

Here are some thoughts about what Hadoop can and cannot do:

1. RDBMSs are for business data, Hadoop is for web data

Almost all traditional business data fits well into the relational model, including data about customers (CRM), products (ERP) and employees (HR). This data should continue to live in relational databases, where it is much easier to manage and access than in Hadoop.

Almost all web data fits well into the Hadoop model, including log files, email and social media. This data would be almost impossible to store in a relational database, not just because of the volume, but because of the inherently nested quality of the data (threaded email conversations, web site directory structures, social media graphs).
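To make the "inherently nested" point concrete, here is a minimal Python sketch. The thread contents and field names are invented for illustration, and nothing here is Hadoop-specific. It keeps a threaded email conversation as one nested record, then flattens it into the parent-pointer rows a relational table would need.

```python
# Hypothetical threaded email conversation kept as one nested record.
thread = {
    "id": 1,
    "subject": "Q3 budget",
    "replies": [
        {"id": 2, "subject": "Re: Q3 budget", "replies": [
            {"id": 4, "subject": "Re: Re: Q3 budget", "replies": []},
        ]},
        {"id": 3, "subject": "Re: Q3 budget", "replies": []},
    ],
}


def flatten(message, parent_id=None):
    """Yield (id, parent_id, subject) rows, the shape a relational table needs."""
    yield (message["id"], parent_id, message["subject"])
    for reply in message["replies"]:
        yield from flatten(reply, message["id"])


def depth(message):
    """Length of the longest reply chain, read directly from the nested form."""
    return 1 + max((depth(reply) for reply in message["replies"]), default=0)


rows = list(flatten(thread))
```

The nested form answers "how deep does this thread go?" with a short recursive walk. Once the thread is flattened into rows, the same question means following parent pointers back up, typically with a self-join per level or a recursive query.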

2. Hadoop is really good at analyzing web data

Hadoop is incredibly good at looking at huge amounts of web data and figuring out why people clicked on the blue button instead of the red one. This can be generalized to a few other computer log formats, but the list is relatively small.

How many other data types look like click streams? Not very many. How many other real-world problems lend themselves to analysis using web data analytic techniques? Also not as many as you might think.
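For a feel of what the blue-button question looks like in code, here is a small sketch in plain Python. It is not the Hadoop API; it only imitates the map-then-reduce style that Hadoop's MapReduce applies across many machines. The log format and the sample lines are assumptions made up for this example.

```python
from collections import defaultdict

# Hypothetical click log: one "user_id<TAB>button_color" line per click.
log_lines = [
    "u1\tblue",
    "u2\tred",
    "u3\tblue",
    "u1\tblue",
    "u4\tred",
]


def map_click(line):
    """Map step: emit a (button_color, 1) pair for one click line."""
    _user, color = line.split("\t")
    return (color, 1)


def reduce_counts(pairs):
    """Reduce step: sum the emitted counts for each key."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


clicks_per_button = reduce_counts(map_click(line) for line in log_lines)
```

Every line is handled independently in the map step, which is what lets this kind of job spread over a large cluster. Data that cannot be cut into independent records like this is a much poorer fit.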

This is not to take anything away from the Hadoop market opportunity - as more of the world interacts via web applications and devices, more of the world's data will be reducible to click-stream-like formats.

The big data craze has taken over the tech media world much like the cloud craze. Most people know it is important but they don't know why. Many vendors get caught up in the hype cycle and start to believe that their technology has some sort of manifest destiny that will allow it to do much more than it can reasonably be expected to do.

3. Hadoop is a "pay me later" technology

Traditional data warehouses work on a "pay me now" basis. To get data into the data warehouse - even data that may not end up being useful in any way - you have to massage the data into a formal relational model. This is expensive and the data normalization process itself may make it impossible to get at the data in exactly the way you want to.

In contrast, Hadoop works on a "pay me later" basis. Data can be shoved into the Hadoop file system any old way. It is not until someone wants to analyze the data that you have to worry about how to connect all the pieces. The gotcha is that the price you pay in this "pay me later" model is much higher, requiring extensive programming in order to ask each question. 
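The two payment plans can be sketched side by side in Python. This is an illustration under assumptions, not a Hadoop program: an in-memory SQLite table stands in for the data warehouse, a plain list of raw JSON lines stands in for files dumped into the Hadoop file system, and the event fields are invented.

```python
import json
import sqlite3

# Hypothetical raw events, as they might arrive from different sources.
raw_events = [
    '{"user_id": "u1", "action": "click", "button": "blue"}',
    '{"user_id": "u2", "action": "view"}',
    '{"user_id": "u3", "action": "click", "button": "red", "referrer": "ad"}',
]

# "Pay me now": declare a schema first, then fit every event to it on load.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE events (user_id TEXT, action TEXT, button TEXT)")
for line in raw_events:
    event = json.loads(line)
    db.execute(
        "INSERT INTO events VALUES (?, ?, ?)",
        (event["user_id"], event["action"], event.get("button")),
    )

# After the up-front work, each new question is one short statement.
sql_clicks = db.execute(
    "SELECT COUNT(*) FROM events WHERE action = 'click'"
).fetchone()[0]


# "Pay me later": keep the raw lines untouched and write code per question.
def count_clicks(lines):
    """Parse every stored line at query time and count the click events."""
    return sum(1 for line in lines if json.loads(line).get("action") == "click")


raw_clicks = count_clicks(raw_events)
```

Both routes give the same count. The difference is where the effort lands: the table needed a schema before any data went in, and it silently dropped the `referrer` field that did not fit, while the raw lines kept everything but need a new function for each new question.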

In addition, because the normalization process wasn't done up front, you may not discover until later that you were missing crucial pieces of information all along. It therefore pays to think up front about what sort of data to store in Hadoop and what kinds of questions you might want to answer about that data in the future.
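One cheap way to do some of that thinking early is to profile what the raw records actually contain before relying on them. The sketch below is a suggestion rather than anything Hadoop provides, and the sample records and field names are hypothetical. It reports the fraction of records that carry each field.

```python
import json
from collections import Counter

# Hypothetical raw records stored without any up-front normalization.
raw_events = [
    '{"user_id": "u1", "action": "click", "button": "blue"}',
    '{"user_id": "u2", "action": "view"}',
    '{"user_id": "u3", "action": "click", "button": "red", "referrer": "ad"}',
    '{"user_id": "u4", "action": "view"}',
]


def field_coverage(lines):
    """Return the fraction of records in which each field appears."""
    counts = Counter()
    total = 0
    for line in lines:
        total += 1
        counts.update(json.loads(line).keys())
    return {field: counts[field] / total for field in counts}


coverage = field_coverage(raw_events)
```

If a future question depends on `referrer` and only a quarter of the records have it, that is far better learned on day one than after years of collection.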

Realistically, most businesses that implement Hadoop will need several years to figure out whether all the data they are dumping into it produces real value out the back end, just as it took several years for companies to get a payout from their investments in relational data warehouses.

4. Use the right tool for the right job

Back in my - very brief - high school shop days, we learned that the trick to making a really nice-looking ashtray is picking the right tool for the right job.

  • Hadoop is a web data query engine that requires a high level of effort for each new query.
  • A relational database is a business data query engine that requires a high level of effort to format and load data into the datastore.

The fastest way for companies to get into trouble with Hadoop is to try to use it as a one-size-fits-all data warehouse. Much of the news in the Hadoop world today has to do with SQL parsers that run on top of Hadoop data. This is a powerful and valuable technology, but it does not mean that you can throw out your data warehouse and replace it with Hadoop just yet.



More Stories By Christopher Keene

Christopher Keene is Chairman and CEO of WaveMaker (formerly ActiveGrid). He was the founder, in 1991, of Persistence Software, a San Mateo, CA-based company that created a new approach for managing data in high-transaction banking and communications systems. Persistence Software investors included Cisco, Intel, Reuters and Sun Microsystems. The company went public in 1999 on the NASDAQ exchange and was sold in 2004 to Progress Software.

After leaving Persistence Software in 2005, Chris spent a year in France as chairman of Reportive Software, a Paris-based maker of business-intelligence tools, and as an adjunct professor and entrepreneur-in-residence at INSEAD, a leading graduate business school.
