@CloudExpo: Blog Feed Post

The Data Era – Moving from Big Data 1.0 to Big Data 2.0

Do you think they truly understood just how fast the data infrastructure marketplace was going to change? That is the question that comes to mind when I think about Donald Feinberg and Mark Beyer at Gartner, who wrote last year about how the data warehouse market is undergoing a transformation. Did they, or anyone for that matter, understand the significant change under way in the data center?



I describe it as Big Data 1.0 versus Big Data 2.0.

Big Data 1.0

Enterprise Data Warehouse (EDW) framework

I was recently talking to friends at one of our largest banks about their Big Data projects under way. In less than one year, their Hadoop cluster has already far exceeded their Teradata enterprise data warehouse in size.

Is that a surprise? Not really. When you think about it, a traditionally large data warehouse is always in the terabytes, not petabytes (well, unless you are eBay).

With the current “Enterprise Data Warehouse” (EDW) framework (shown here) we will always see the high-value structured data in the well-hardened, highly available and secure EDW RDBMS (aka Teradata).

In fact, Gartner defines a large EDW as starting at 20TB. This is why I’ve held back from making comments like, “Teradata should be renamed to Yottadata.” After all, Teradata is my “alma mater”: I spent 10 years there learning Big Data 1.0. I highly respect the Teradata technology and, more importantly, the people.

Big Data 2.0

So with over two zettabytes of information being generated in 2012 alone, we can expect more “Big Data” systems to be stood up, new breakthroughs in large dataset analytics, and many more data-centric applications being developed for businesses.

However, many of the “new systems” will be driven by “Big Data 2.0” technology. The enterprise data warehouse framework itself doesn’t change much, but many new players, mostly open source, have entered the scene.

Examples include:

* Talend for ETL
* Cloudera, Hortonworks, MapR for Hadoop
* SymmetricDS for replication
* Storm, S4 for real-time stream processing
* HBase, Cassandra, Redis, Riak, Elasticsearch, etc. for NoSQL / NewSQL data stores
* R, Mahout, Weka, etc. for machine learning / analytics
* Tableau, Jaspersoft, Pentaho, Datameer, Karmasphere, etc. for BI

There are so many new and disruptive technologies, each contributing to the evolution of the enterprise’s data infrastructure.

I haven’t yet mentioned one of the more controversial statements made in the adjacent graphic: Teradata is becoming a source alongside the new pool of unstructured data. Both the new and the old data are being aggregated into the “Big Data Warehouse”.

We may also see much of what Hadoop does in ETL feeding back into the EDW. But I suspect that this will become less significant compared to the new analytics architecture with Hadoop + NoSQL/NewSQL data stores at the core of the framework, especially as this new architecture becomes more hardened and enterprise-class.

Infochimps’ Big Data Warehouse Framework

This leads us to why I’m much more than a fan of Infochimps (see disclosure below) and why I believe the company is so well positioned to make a significant impact within the marketplace.

By leveraging four years of experience and technology development in cloud-based big data infrastructure, the company is now offering a suite of products that contribute to each part of the Big Data Warehouse Framework for enterprise customers.

DDS: With Infochimps’ Data Delivery Services (DDS), our customers’ application developers do not rely on sophisticated ETL tools. Instead, they can manipulate data streams of any volume or velocity using DDS through a simple, developer-friendly language called Wukong. Wukong turns application developers into data scientists.

Ingress and egress can be handled directly by the application developer, uniquely bridging the gap between them and their data.

Wukong: Wukong is much more than a data-centric domain-specific language (DSL). With standardized connectors to analytics from R, Mahout, Weka, and others, it makes data manipulation easy and also simplifies the integration of sophisticated analytics with the most complicated data sources.

Hadoop & NoSQL/NewSQL Data Stores: At the center of the framework is not only an elastic and cloud-based Hadoop stack, but a selection of NoSQL/NewSQL data stores as well. This uniquely positions Infochimps to address both decision-support-like workloads, which are complex and batch in nature, and OLTP or more real-time workloads. The complexities of standing up, configuring, scaling, and managing these data stores are all automated.

Dashpot: The application developer is typically left out by many of the business intelligence tools offered today. This is because most tools are extremely powerful and built for special groups of business users / analysts. Infochimps has taken a slightly different approach, staying focused on the application developer. Dashpot is a reporting and analytics dashboard built for the developer, enabling quick iteration and insights into the data prior to production and prior to the deployment of more sophisticated BI tools.

Ironfan and Homebase: As the underpinning of the Infochimps solution, Ironfan and Homebase are the two solutions that essentially abstract any and all hardware and software deployment, configuration, and management. Ironfan is used to deploy the entire system into production. Homebase is used by application developers to create their end-to-end data flows and applications locally on their laptops or desktops before they are deployed into QA, staging, and/or production.

All in all, Infochimps has taken a very innovative approach to enabling application developers with Big Data 2.0 technologies in a way that is not only comprehensive, but fast, simple, extensible, and safe.

Our vision for Infochimps leverages the power of Big Data, Cloud Computing, Open Source, and Platform as a Service, all extremely disruptive technology forces. We’re excited to be helping our customers address their mission-critical questions with high-impact answers. And I personally look forward to executing on our vision to provide the simplest yet most powerful cloud-based and completely managed big data service for our enterprise customers.

Disclosure: I’ve just taken the role of CEO at Infochimps. I’m really looking forward to working with Joe, Flip, Dhruv, and the team.

More Stories By Jim Kaskade

Jim Kaskade currently leads Janrain, the category creator of Consumer Identity & Access Management (CIAM). We believe that your identity is the most important thing you own, and that your identity should not only be easy to use, but it should be safe to use when accessing your digital world. Janrain is an Identity Cloud servicing Global 3000 enterprises providing a consistent, seamless, and safe experience for end-users when they access their digital applications (web, mobile, or IoT).

Prior to Janrain, Jim was the VP & GM of Digital Applications at CSC. This line of business was over $1B in commercial revenue, including both consulting and delivery organizations and is focused on serving Fortune 1000 companies in the United States, Canada, Mexico, Peru, Chile, Argentina, and Brazil. Prior to this, Jim was the VP & GM of Big Data & Analytics at CSC. In his role, he led the fastest growing business at CSC, overseeing the development and implementation of innovative offerings that help clients convert data into revenue. Jim was also the CEO of Infochimps; Entrepreneur-in-Residence at PARC, a Xerox company; SVP, General Manager and Chief of Cloud at SIOS Technology; CEO at StackIQ; CEO of Eyespot; CEO of Integral Semi; and CEO of INCEP Technologies. Jim started his career at Teradata where he spent ten years in enterprise data warehousing, analytical applications, and business intelligence services designed to maximize the intrinsic value of data, servicing Fortune 1000 companies in telecom, retail, and financial markets.
