SYS-CON MEDIA Authors: Zakia Bouachraoui, Liz McMillan, Carmen Gonzalez, Roger Strukhoff, David Linthicum

Blog Feed Post

Hadoop Will Not Mow Your Lawn


"The best minds of my generation are thinking about how to make people click ads." Jeff Hammerbacher ex- Facebook Architect

It turns out that when you have a lot of "best minds" working on the same problem, you come up with some pretty interesting technology - no matter how inane that problem may be.

The technology that those "best minds" at Yahoo came up with to target ads to users is called Hadoop. 

Hadoop is a powerful technology and like most new IT solutions is being touted at being able to solve a vast number of technical ills. When companies discover that Hadoop will not in fact cure male pattern balding, they will fall into the inevitable trough of disillusionment

Here are some thoughts about what Hadoop can and cannot do:

1. RDBS are for business data, Hadoop is for web data

Almost all traditional business data fits well into the relational model, including data about customers (CRM), products (ERP) and employees (HR). This data should continue to live in relational databases, where it is much easier to manage and access than in Hadoop.

Almost all web data fits well into the Hadoop model, including log files, email and social media. This data would be almost impossible to store in a relational database, not just because of the volume, but because of the inherently nested quality of the data (threaded email conversations, web site directory structures, social media graphs).

2. Hadoop is really good at analyzing web data

Hadoop is incredibly good at looking at huge amounts of web data and figuring out why people clicked on the blue button instead of the red one. This can be generated to a few other computer log formats, but the list is relatively small, including:
How many other data types look like click streams? Not very many. How many other real world problems lend themselves to analysis using web data analytic techniques? Also not as many as you might think.

This is not to take anything from the Hadoop market opportunity - as more of the world interacts with each other via web applications and devices, more of the world's data will be reducible to click-stream-like formats. 

The big data craze has taken over the tech media world much like the cloud craze. Most people know it is important but they don't know why. Many vendors get caught up in the hype cycle and start to believe that their technology has some sort of manifest destiny that will allow it to do much more than it can reasonably be expected to do.

3. Hadoop is a Pay Me Later Technology

Traditional data warehouses work on a "pay me now" basis. To get data into the data warehouse - even data that may not end up being useful in any way - you have to massage the data into a formal relational model. This is expensive and the data normalization process itself may make it impossible to get at the data in exactly the way you want to.

In contrast, Hadoop works on a "pay me later" basis. Data can be shoved into the Hadoop file system any old way. It is not until someone wants to analyze the data that you have to worry about how to connect all the pieces. The gotcha is that the price you pay in this "pay me later" model is much higher, requiring extensive programming in order to ask each question. 

In addition, because the normalization process wasn't done up front, it won't be until later that you may discover that you were missing crucial pieces of information all along. Thus it does bear some thinking up front on what sort of data to store in your Hadoop database and what kinds of questions you might want to be able to answer about that data in the future.  

Realistically, it will take most businesses who implement several years to figure out whether all the data they are dumping into Hadoop produces real value out the back end, just as it was several years before companies started to get a payout from their investments in relational data warehouses.

4. Use the right tool for the right job

Back in my - very brief - high school shop days, we learned that the trick to making a really nice looking ash tray is picking the right tool for the right job.
  • Hadoop is web data query engine that requires a high level of effort for each new query. 
  • Relational is a business data query engine that requires a high level of effort to format and load data into the datastore.
The fastest way for companies to get into trouble with Hadoop is to try to use it as a one-size-fits-all data warehouse. Much of the news in the Hadoop world today has to do with SQL parsers that run on top of Hadoop data. This is a powerful and valuable technology, but does not mean that you can throw out your data warehouse and replace it with Hadoop just yet.



Read the original blog entry...

More Stories By Christopher Keene

Christopher Keene is Chairman and CEO of WaveMaker (formerly ActiveGrid). He was the founder, in 1991, of Persistence Software, a San Mateo, CA-based company that created a new approach for managing data in high-transaction banking and communications systems. Persistence Software investors included Cisco, Intel, Reuters and Sun Microsystems. The company went public in 1999 on the NASDAQ exchange and was sold in 2004 to Progress software.

After leaving Persistence Software in 2005, Chris spent a year in France as chairman of Reportive Software, a Paris-based maker of business-intelligence tools, and as an adjunct professor and entrepreneur-in-residence at INSEAD, a leading graduate business school.

Latest Stories
Moroccanoil®, the global leader in oil-infused beauty, is thrilled to announce the NEW Moroccanoil Color Depositing Masks, a collection of dual-benefit hair masks that deposit pure pigments while providing the treatment benefits of a deep conditioning mask. The collection consists of seven curated shades for commitment-free, beautifully-colored hair that looks and feels healthy.
The textured-hair category is inarguably the hottest in the haircare space today. This has been driven by the proliferation of founder brands started by curly and coily consumers and savvy consumers who increasingly want products specifically for their texture type. This trend is underscored by the latest insights from NaturallyCurly's 2018 TextureTrends report, released today. According to the 2018 TextureTrends Report, more than 80 percent of women with curly and coily hair say they purcha...
The textured-hair category is inarguably the hottest in the haircare space today. This has been driven by the proliferation of founder brands started by curly and coily consumers and savvy consumers who increasingly want products specifically for their texture type. This trend is underscored by the latest insights from NaturallyCurly's 2018 TextureTrends report, released today. According to the 2018 TextureTrends Report, more than 80 percent of women with curly and coily hair say they purcha...
We all love the many benefits of natural plant oils, used as a deap treatment before shampooing, at home or at the beach, but is there an all-in-one solution for everyday intensive nutrition and modern styling?I am passionate about the benefits of natural extracts with tried-and-tested results, which I have used to develop my own brand (lemon for its acid ph, wheat germ for its fortifying action…). I wanted a product which combined caring and styling effects, and which could be used after shampo...
The precious oil is extracted from the seeds of prickly pear cactus plant. After taking out the seeds from the fruits, they are adequately dried and then cold pressed to obtain the oil. Indeed, the prickly seed oil is quite expensive. Well, that is understandable when you consider the fact that the seeds are really tiny and each seed contain only about 5% of oil in it at most, plus the seeds are usually handpicked from the fruits. This means it will take tons of these seeds to produce just one b...
Steaz, the nation's top-selling organic and fair trade green-tea-based beverage company, announces its 2017 "Mind. Body. Soul." tour, which will bring authentic experiences inspired by the brand's signature Mind. Body. Soul. tagline to life across the country. The tour will inform, educate, inspire and entertain through events, digital activations and partner-curated experiences developed to support the three pillars of complete health and wellness.
The platform combines the strengths of Singtel's extensive, intelligent network capabilities with Microsoft's cloud expertise to create a unique solution that sets new standards for IoT applications," said Mr Diomedes Kastanis, Head of IoT at Singtel. "Our solution provides speed, transparency and flexibility, paving the way for a more pervasive use of IoT to accelerate enterprises' digitalisation efforts. AI-powered intelligent connectivity over Microsoft Azure will be the fastest connected pat...
There are many examples of disruption in consumer space – Uber disrupting the cab industry, Airbnb disrupting the hospitality industry and so on; but have you wondered who is disrupting support and operations? AISERA helps make businesses and customers successful by offering consumer-like user experience for support and operations. We have built the world’s first AI-driven IT / HR / Cloud / Customer Support and Operations solution.
ScaleMP is presenting at CloudEXPO 2019, held June 24-26 in Santa Clara, and we’d love to see you there. At the conference, we’ll demonstrate how ScaleMP is solving one of the most vexing challenges for cloud — memory cost and limit of scale — and how our innovative vSMP MemoryONE solution provides affordable larger server memory for the private and public cloud. Please visit us at Booth No. 519 to connect with our experts and learn more about vSMP MemoryONE and how it is already serving some of...
Darktrace is the world's leading AI company for cyber security. Created by mathematicians from the University of Cambridge, Darktrace's Enterprise Immune System is the first non-consumer application of machine learning to work at scale, across all network types, from physical, virtualized, and cloud, through to IoT and industrial control systems. Installed as a self-configuring cyber defense platform, Darktrace continuously learns what is ‘normal' for all devices and users, updating its understa...
Codete accelerates their clients growth through technological expertise and experience. Codite team works with organizations to meet the challenges that digitalization presents. Their clients include digital start-ups as well as established enterprises in the IT industry. To stay competitive in a highly innovative IT industry, strong R&D departments and bold spin-off initiatives is a must. Codete Data Science and Software Architects teams help corporate clients to stay up to date with the mod...
As you know, enterprise IT conversation over the past year have often centered upon the open-source Kubernetes container orchestration system. In fact, Kubernetes has emerged as the key technology -- and even primary platform -- of cloud migrations for a wide variety of organizations. Kubernetes is critical to forward-looking enterprises that continue to push their IT infrastructures toward maximum functionality, scalability, and flexibility. As they do so, IT professionals are also embr...
Platform9, the leader in SaaS-managed hybrid cloud, has announced it will present five sessions at four upcoming industry conferences in June: BCS in London, DevOpsCon in Berlin, HPE Discover and Cloud Computing Expo 2019.
At CloudEXPO Silicon Valley, June 24-26, 2019, Digital Transformation (DX) is a major focus with expanded DevOpsSUMMIT and FinTechEXPO programs within the DXWorldEXPO agenda. Successful transformation requires a laser focus on being data-driven and on using all the tools available that enable transformation if they plan to survive over the long term. A total of 88% of Fortune 500 companies from a generation ago are now out of business. Only 12% still survive. Similar percentages are found throug...
When you're operating multiple services in production, building out forensics tools such as monitoring and observability becomes essential. Unfortunately, it is a real challenge balancing priorities between building new features and tools to help pinpoint root causes. Linkerd provides many of the tools you need to tame the chaos of operating microservices in a cloud native world. Because Linkerd is a transparent proxy that runs alongside your application, there are no code changes required. I...