SYS-CON MEDIA Authors: Pat Romanski, Gary Arora, Zakia Bouachraoui, Yeshim Deniz, Liz McMillan

Blog Feed Post

It’s Time to Kill the Elephant

Google started using MapReduce about 10 years ago.  Somewhere between there and now, Doug Cutting decided that he could copy it while at Yahoo and Hadoop was born.  Doug now works at a company named Cloudera who bills themselves as providing the “only solution that manages Apache Hadoop across the enterprise.”  Hadoop has been around for so long that even leading analyst firms are covering it, claiming that if your organization is an early adopter, you need to be looking at Hadoop.  Hear that Luddites?  Time to get moving.

Hadoop Is Picking Up Speed

MAYBE THERE’S A REASON FOR THAT

Recently, Google announced their move away from batch based MapReduce to something a little more real time.  Seams like it was taking days to update search results with something that you might be interested in.  Google never open sourced their implementation of MapReduce, which is said to be at least one or two orders of magnitude faster than Hadoop.  But still not fast enough.

EVEN YAHOO IS GETTING INTO THE ACT

Yahoo used to have a substantial relationship with Cloudera, at least according to Cloudera.  But now even Yahoo have started a company to distribute and support Hadoop.  Yahoo calls their company hortonworks.

WHAT THIS MEANS TO YOU

Without getting into things like how much data and corresponding analysis you need to do before Hadoop makes any sense to use at all (most companies are not going to see any benefit at all), let’s recognize something.  All of these recent shifts from companies like Google, Yahoo, and others no longer see a competitive advantage in batch based MapReduce.  The future has arrived, let’s look at some evidence.

REAL TIME HADOOP

MapReduce

There have been more than a handful of releases in this space – like S4 from Yahoo, HStreaming, Storm, and several NoSQL databases now supporting this, it means that for competitive advantage, you’d best be getting some real-time.  And getting it soon.

WHAT IS REAL-TIME?

Database vendors like DataStax, who support Cassandra, claim to be real-time.  They’re not.  They say that they’re real time because as soon as you commit data to the database, it’s available for query.  That’s supported by just about every database and hardly a new and exciting feature of NoSQL.  Even one of their big shots left to start a real time company named Platfora.

CONTINUOUS QUERY OR EVENT-DRIVEN

Rather than thinking about what real-time is or is not, let’s worry about event-driven.  Let’s use an example:

I’m a manager, and I want to know when the average time on my website dips below 2 minutes.  Using the ‘my database is real time because the data I send to it can be queried after I write it’ means that I would have to run this query repeatedly at regular intervals to catch this mounting exodus from my web properties.

THERE’S GOT TO BE A BETTER WAY

And there is, it’s called continuous query.  I ask the same question as above, and there’s some process somewhere that’s sessionizing data from my web logs and injecting that into that server – the same server that I sent the query above to.  And when that process finds a web session that lasted less than 2 minutes, it sends another ‘row’ to the program that submitted that query.

ABRACADABRA

Waiting for Hadoop Query

And then I’ve got it on my dashboard, and can switch out the really badly designed page the marketing department A/B’d this morning.  That’s continuous query, or event-driven.  The term real-time didn’t even need to be mentioned.  If I was running batch based Hadoop, that notification could have taken hours, or days.  How much money would your company lose if that happened to you?

BACK TO MAP/REDUCE

I am Node of Cluster...

So if I can do the above, why do I need MapReduce?  MapReduce is an algorithm for splitting work up, distributing the work out to nodes where the data lives that needs to be analyzed, and then gathering the results.  If you’re problem is big enough, MapReduce might help you get it done faster than using just one machine.

BUT EITHER WAY

If you’re running batch processes, like some well known web properties are and think that Hadoop holds an answer to your ever dwindling ad revenue, you’re mistaken.  And if you’re that CIO, the other thing you need to be working on is most likely your resume.

GET YOURSELF SOME CONTINUOUS QUERY, AND GET COMPETITIVE!

and thanks for reading!

Read the original blog entry...

More Stories By Colin Clark

Colin Clark is the CTO for Cloud Event Processing, Inc. and is widely regarded as a thought leader and pioneer in both Complex Event Processing and its application within Capital Markets.

Follow Colin on Twitter at http:\\twitter.com\EventCloudPro to learn more about cloud based event processing using map/reduce, complex event processing, and event driven pattern matching agents. You can also send topic suggestions or questions to [email protected]

Latest Stories
Every organization is facing their own Digital Transformation as they attempt to stay ahead of the competition, or worse, just keep up. Each new opportunity, whether embracing machine learning, IoT, or a cloud migration, seems to bring new development, deployment, and management models. The results are more diverse and federated computing models than any time in our history.
On-premise or off, you have powerful tools available to maximize the value of your infrastructure and you demand more visibility and operational control. Fortunately, data center management tools keep a vigil on memory contestation, power, thermal consumption, server health, and utilization, allowing better control no matter your cloud's shape. In this session, learn how Intel software tools enable real-time monitoring and precise management to lower operational costs and optimize infrastructure...
"Calligo is a cloud service provider with data privacy at the heart of what we do. We are a typical Infrastructure as a Service cloud provider but it's been designed around data privacy," explained Julian Box, CEO and co-founder of Calligo, in this SYS-CON.tv interview at 21st Cloud Expo, held Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA.
Isomorphic Software is the global leader in high-end, web-based business applications. We develop, market, and support the SmartClient & Smart GWT HTML5/Ajax platform, combining the productivity and performance of traditional desktop software with the simplicity and reach of the open web. With staff in 10 timezones, Isomorphic provides a global network of services related to our technology, with offerings ranging from turnkey application development to SLA-backed enterprise support. Leadin...
While a hybrid cloud can ease that transition, designing and deploy that hybrid cloud still offers challenges for organizations concerned about lack of available cloud skillsets within their organization. Managed service providers offer a unique opportunity to fill those gaps and get organizations of all sizes on a hybrid cloud that meets their comfort level, while delivering enhanced benefits for cost, efficiency, agility, mobility, and elasticity.
DevOps has long focused on reinventing the SDLC (e.g. with CI/CD, ARA, pipeline automation etc.), while reinvention of IT Ops has lagged. However, new approaches like Site Reliability Engineering, Observability, Containerization, Operations Analytics, and ML/AI are driving a resurgence of IT Ops. In this session our expert panel will focus on how these new ideas are [putting the Ops back in DevOps orbringing modern IT Ops to DevOps].
Darktrace is the world's leading AI company for cyber security. Created by mathematicians from the University of Cambridge, Darktrace's Enterprise Immune System is the first non-consumer application of machine learning to work at scale, across all network types, from physical, virtualized, and cloud, through to IoT and industrial control systems. Installed as a self-configuring cyber defense platform, Darktrace continuously learns what is ‘normal' for all devices and users, updating its understa...
Enterprises are striving to become digital businesses for differentiated innovation and customer-centricity. Traditionally, they focused on digitizing processes and paper workflow. To be a disruptor and compete against new players, they need to gain insight into business data and innovate at scale. Cloud and cognitive technologies can help them leverage hidden data in SAP/ERP systems to fuel their businesses to accelerate digital transformation success.
Most organizations are awash today in data and IT systems, yet they're still struggling mightily to use these invaluable assets to meet the rising demand for new digital solutions and customer experiences that drive innovation and growth. What's lacking are potent and effective ways to rapidly combine together on-premises IT and the numerous commercial clouds that the average organization has in place today into effective new business solutions.
Concerns about security, downtime and latency, budgets, and general unfamiliarity with cloud technologies continue to create hesitation for many organizations that truly need to be developing a cloud strategy. Hybrid cloud solutions are helping to elevate those concerns by enabling the combination or orchestration of two or more platforms, including on-premise infrastructure, private clouds and/or third-party, public cloud services. This gives organizations more comfort to begin their digital tr...
Keeping an application running at scale can be a daunting task. When do you need to add more capacity? Larger databases? Additional servers? These questions get harder as the complexity of your application grows. Microservice based architectures and cloud-based dynamic infrastructures are technologies that help you keep your application running with high availability, even during times of extreme scaling. But real cloud success, at scale, requires much more than a basic lift-and-shift migrati...
David Friend is the co-founder and CEO of Wasabi, the hot cloud storage company that delivers fast, low-cost, and reliable cloud storage. Prior to Wasabi, David co-founded Carbonite, one of the world's leading cloud backup companies. A successful tech entrepreneur for more than 30 years, David got his start at ARP Instruments, a manufacturer of synthesizers for rock bands, where he worked with leading musicians of the day like Stevie Wonder, Pete Townsend of The Who, and Led Zeppelin. David has ...
Darktrace is the world's leading AI company for cyber security. Created by mathematicians from the University of Cambridge, Darktrace's Enterprise Immune System is the first non-consumer application of machine learning to work at scale, across all network types, from physical, virtualized, and cloud, through to IoT and industrial control systems. Installed as a self-configuring cyber defense platform, Darktrace continuously learns what is ‘normal' for all devices and users, updating its understa...
Dion Hinchcliffe is an internationally recognized digital expert, bestselling book author, frequent keynote speaker, analyst, futurist, and transformation expert based in Washington, DC. He is currently Chief Strategy Officer at the industry-leading digital strategy and online community solutions firm, 7Summits.
Addteq is a leader in providing business solutions to Enterprise clients. Addteq has been in the business for more than 10 years. Through the use of DevOps automation, Addteq strives on creating innovative solutions to solve business processes. Clients depend on Addteq to modernize the software delivery process by providing Atlassian solutions, create custom add-ons, conduct training, offer hosting, perform DevOps services, and provide overall support services.