A Review Of @SnapLogic By @TheEbizWizard | @CloudExpo [#BigData]

SnapLogic: From ETL to VVV

Squeezing value out of data in the enterprise has always pushed the limits of the available technology. Furthermore, when business needs exceed available capabilities, vendors push to innovate within the processor, storage, and network constraints of the day.

This inherent stress between enterprise demands and vendor innovation gave rise to the Extract, Transform, and Load (ETL) marketplace over twenty years ago. Businesses realized that running complex, ad hoc SQL queries against increasingly large databases would grind those systems to a halt, thus requiring an alternate approach to gaining essential business intelligence.

The best solution given the hardware limitations of the time required controlled, pre-planned extraction of data from various databases of record, followed by complex, time-consuming transformation steps, and then loading the transformed data into separate reporting data stores (dubbed data warehouses and data marts) specially optimized for a range of analytical queries.
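
To make this concrete, here is a minimal sketch of the classic ETL pattern. The table names, connection strings, and fields are hypothetical, and SQLite stands in for both the database of record and the data warehouse:

```python
# A minimal ETL sketch: extract from an operational database, transform
# in application code, then load into a separate reporting store.
# All table and file names here are hypothetical.
import sqlite3

source = sqlite3.connect("orders.db")        # operational database of record
warehouse = sqlite3.connect("warehouse.db")  # reporting store (data warehouse)

# Extract: pull raw rows in a controlled, pre-planned step
rows = source.execute("SELECT customer_id, amount FROM orders").fetchall()

# Transform: aggregate outside the source system, so analytical work
# never slows down the database of record
totals = {}
for customer_id, amount in rows:
    totals[customer_id] = totals.get(customer_id, 0) + amount

# Load: write the pre-computed results into a table optimized for reporting
warehouse.execute(
    "CREATE TABLE IF NOT EXISTS customer_totals (customer_id, total)")
warehouse.executemany(
    "INSERT INTO customer_totals VALUES (?, ?)", totals.items())
warehouse.commit()
```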

As available storage and memory ramped up, ad hoc data transformations became increasingly practical, allowing for the transform step to take place as needed, subsequent to the load step – and Extract, Load, and Transform (ELT) became a popular alternative to ETL.
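
The same hypothetical workload, restated as ELT, simply moves the transformation to the end: the raw rows are copied into the reporting store first, and the warehouse's own engine reshapes them on demand:

```python
# The ETL sketch above restated as ELT: load raw data first, transform
# later inside the reporting store. Names remain hypothetical.
import sqlite3

source = sqlite3.connect("orders.db")
warehouse = sqlite3.connect("warehouse.db")

# Extract and Load: copy the raw rows across without reshaping them
rows = source.execute("SELECT customer_id, amount FROM orders").fetchall()
warehouse.execute(
    "CREATE TABLE IF NOT EXISTS raw_orders (customer_id, amount)")
warehouse.executemany("INSERT INTO raw_orders VALUES (?, ?)", rows)
warehouse.commit()

# Transform: run ad hoc, as needed, using the warehouse's own engine
query = "SELECT customer_id, SUM(amount) FROM raw_orders GROUP BY customer_id"
for customer_id, total in warehouse.execute(query):
    print(customer_id, total)
```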

The transition from ETL to ELT represents an important stepping stone to real-time data analysis. ELT still wasn’t truly real-time, as businesses had to extract and load their data ahead of time, but much of the analysis work depended on the now-accelerated and increasingly flexible transformation step.

Hadoop and ELT

Today, all the buzz is about Big Data and the most important technology innovation on the Big Data analysis scene: Hadoop. In spite of all the commotion around Hadoop, this open source platform and its various add-ons are little more than the next generation of the transform capability of ELT, albeit at cloud scale.

The core motivations that drove the Hadoop community to create this tool were the increasing size of data sets (leading to the awkward Big Data terminology), as well as the need to process data of diverse levels of structure – in particular, a mix of unstructured (content-centric), semi-structured (generally XML-formatted), and structured (relational) information.

In other words, traditional ETL and ELT tools weren’t up to the challenge of dealing with the volume and variety of data that enterprises increasingly produced and wished to analyze. Hadoop addressed these challenges with a horizontally scalable, highly redundant file system (the Hadoop Distributed File System, or HDFS), as well as MapReduce, an algorithmic approach to analyzing data appropriate for processing the necessary volumes of data on HDFS.
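
The MapReduce idea itself is simple enough to sketch. The following is a local simulation in plain Python, not Hadoop code: it mimics the map, shuffle, and reduce phases that Hadoop distributes across the nodes holding HDFS blocks, using a hypothetical sample of log records:

```python
# A local simulation of the MapReduce pattern: map emits key/value
# pairs, shuffle groups them by key, reduce aggregates each group.
# Hadoop performs the same three phases across a cluster; the sample
# records below are hypothetical.
from itertools import groupby

records = ["error disk", "ok", "error net", "ok", "error disk"]

# Map: emit (word, 1) for every word in every record
mapped = [(word, 1) for record in records for word in record.split()]

# Shuffle: bring identical keys together, as Hadoop does between phases
mapped.sort(key=lambda pair: pair[0])

# Reduce: collapse each key's values into a single aggregate
for key, group in groupby(mapped, key=lambda pair: pair[0]):
    print(key, sum(count for _, count in group))
```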

The first version of Hadoop, however, was essentially a batch analytics platform. Data analysts had to surmount the significant challenges of extracting data from its source locations and loading it properly into HDFS, only to run arcane MapReduce jobs to produce useful results. As a result, the hype surrounding Hadoop 1.0 exceeded its actual usefulness for most organizations brave enough to implement it.

As an open source project, however, Hadoop had enough backing from the community to drive the development of version 2, which offers a general-purpose cluster resource negotiator dubbed YARN (Yet Another Resource Negotiator) as well as fledgling real-time processing capabilities. Today, real-time Hadoop is at the cutting edge, as various tools in an expanding Hadoop ecosystem mature to address the velocity requirements for real-time data analytics.

Hadoop’s Missing Pieces

In terms of the maturation of ETL technologies, therefore, the current version of Hadoop can be thought of as a modern transformation engine running on a horizontally scalable file system that in theory offers the “three V’s” of Big Data: volume, variety, and velocity. In practice, however, many capabilities are missing from the open source distribution.

As a result, other open source projects as well as commercial software providers have an opportunity to fill in the gaps that Hadoop leaves in the areas of enterprise data integration in the context of modern enterprise infrastructures. Today, such integration scenarios typically fall within hybrid cloud environments that combine on-premise and cloud-based capabilities.

In the enterprise context, the extract and load steps of ELT require organizations to leverage diverse data sources both on-premise and in the cloud. Those data sources may be a mix of relational, hierarchical, and content-centric systems. Furthermore, the business may require real-time (or near real-time) analysis of data from such diverse data sources.

To address these challenges, SnapLogic has built a data and application integration platform that resolves many of Hadoop’s shortcomings. As I wrote about in a previous BrainBlog post, SnapLogic separates its technology into a Control Plane and a Data Plane. The Control Plane resides in the cloud and contains the Designer, Manager, and Dashboard subcomponents, which manage the Data Plane; the Data Plane, in turn, acts as a cloud-friendly abstraction of the data flows, or Pipelines, that users create with the SnapLogic Designer.

The data integrations themselves run as Pipelines, which are sequences of atomic integration steps that SnapLogic calls Snaps – because people literally snap them together. Snaps support the full gamut of data types and levels of structure, facilitating the ability to send the full variety of enterprise data to Hadoop.
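
SnapLogic’s actual Snap implementation is proprietary, but the snap-together idea can be roughly illustrated with Python generators, where each step consumes and yields a stream of records. The step and field names below are illustrative inventions, not SnapLogic’s API:

```python
# A rough illustration of snapping atomic steps into a pipeline (not
# SnapLogic's actual API): each step streams records to the next, so
# steps compose in any order. All names here are hypothetical.
def read_source(rows):
    for row in rows:                # acquisition step
        yield dict(row)

def filter_step(records, field, value):
    for record in records:          # atomic transformation step
        if record.get(field) == value:
            yield record

def write_target(records):
    for record in records:          # delivery step
        print(record)

# "Snapping" three steps together into one pipeline
rows = [{"region": "east", "amount": 10}, {"region": "west", "amount": 7}]
write_target(filter_step(read_source(rows), "region", "east"))
```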

SnapLogic has also recently rolled out Hadooplexes, which are Snaplexes (data processing components) that run as YARN apps in Hadoop, as well as SnapReduce, SnapLogic’s support for Big Data integrations that leverage Hadoop to process large amounts of data across large clusters.

SnapReduce enables Pipelines to generate MapReduce jobs and scale them across multiple nodes in a Hadoop cluster. Each Hadooplex then delegates MapReduce-based analytic operations automatically across all Hadoop nodes, thus abstracting the horizontally distributed nature of the Hadoop environment from the user.

The result is an elastic, horizontally scalable integration fabric that provides the extract and load capabilities that Hadoop lacks. Each data integration can be run manually, on a preset schedule, or via a trigger – and SnapLogic exposes such triggers as URLs (either on the Internet or a private network), allowing any authorized piece of software to kick off the integration.
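
Kicking off such a triggered integration then amounts to a single authenticated HTTP call. The endpoint URL and token below are hypothetical placeholders, not a documented SnapLogic address:

```python
# Firing a trigger URL to start an integration. The URL and token are
# hypothetical placeholders; any authorized software could make this call.
import urllib.request

request = urllib.request.Request(
    "https://integration.example.com/trigger/my-pipeline",  # hypothetical
    method="POST",
    headers={"Authorization": "Bearer <token>"},  # placeholder credential
)
with urllib.request.urlopen(request) as response:
    print(response.status)  # 200 would indicate the pipeline was started
```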

End-to-End Modern ELT

In summary, SnapLogic modernizes each element of ELT for today’s cloud-centric, Big Data world. Instead of traditional extraction of structured data, SnapLogic allows for diverse queries across the full variety of data types and structures by streaming all data as JSON documents. Instead of simplistic, point-to-point loading of data, SnapLogic offers elastic, horizontally scalable Pipelines that hide the underlying complexity of data integration from the user. And within Hadoop, Hadooplexes simplify the distribution of YARN-based MapReduce algorithms, allowing users to treat the Hadoop environment as though it were a traditional reporting database.
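
The “stream everything as JSON” idea is worth a brief sketch of its own. The sample inputs below are hypothetical, but they show how structured, semi-structured, and unstructured records all reduce to the same document form:

```python
# Normalizing heterogeneous inputs into one stream of JSON documents.
# The sample row, XML fragment, and log line are all hypothetical.
import json
import xml.etree.ElementTree as ET

row = ("c42", 99.5)                                          # structured
xml_fragment = "<order id='7'><total>12.0</total></order>"   # semi-structured
log_line = "free-form application log text"                  # unstructured

documents = [
    {"customer_id": row[0], "amount": row[1]},
    {"order_id": ET.fromstring(xml_fragment).get("id")},
    {"text": log_line},
]
for document in documents:
    print(json.dumps(document))  # one JSON document per record
```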

Furthermore, SnapLogic can perform each of these steps in real time, in those situations where the business requires real-time analytics. Each Pipeline simply streams the data from the acquisition point to the delivery point, handling the appropriate operations statelessly along the way. The end result is a user-friendly data integration and analysis tool that adroitly hides an extraordinary level of complexity behind the scenes – opening up the power of Big Data to an increasingly broad user base.

SnapLogic is an Intellyx client. At the time of writing, no other organizations mentioned in this article are Intellyx clients. Intellyx retains full editorial control over the content of this article.

More Stories By Jason Bloomberg

Jason Bloomberg is a leading IT industry analyst, Forbes contributor, keynote speaker, and globally recognized expert on multiple disruptive trends in enterprise technology and digital transformation. He is ranked #5 on Onalytica’s list of top Digital Transformation influencers for 2018 and #15 on Jax’s list of top DevOps influencers for 2017, the only person to appear on both lists.

As founder and president of Agile Digital Transformation analyst firm Intellyx, he advises, writes, and speaks on a diverse set of topics, including digital transformation, artificial intelligence, cloud computing, devops, big data/analytics, cybersecurity, blockchain/bitcoin/cryptocurrency, no-code/low-code platforms and tools, organizational transformation, internet of things, enterprise architecture, SD-WAN/SDX, mainframes, hybrid IT, and legacy transformation, among other topics.

Mr. Bloomberg’s articles in Forbes are often viewed by more than 100,000 readers. During his career, he has published over 1,200 articles (over 200 for Forbes alone), spoken at over 400 conferences and webinars, and been quoted in the press and blogosphere over 2,000 times.

Mr. Bloomberg is the author or coauthor of four books: The Agile Architecture Revolution (Wiley, 2013), Service Orient or Be Doomed! How Service Orientation Will Change Your Business (Wiley, 2006), XML and Web Services Unleashed (SAMS Publishing, 2002), and Web Page Scripting Techniques (Hayden Books, 1996). His next book, Agile Digital Transformation, is due within the next year.

At SOA-focused industry analyst firm ZapThink from 2001 to 2013, Mr. Bloomberg created and delivered the Licensed ZapThink Architect (LZA) Service-Oriented Architecture (SOA) course and associated credential, certifying over 1,700 professionals worldwide. He is one of the original Managing Partners of ZapThink LLC, which was acquired by Dovel Technologies in 2011.

Prior to ZapThink, Mr. Bloomberg built a diverse background in eBusiness technology management and industry analysis, including serving as a senior analyst in IDC’s eBusiness Advisory group, as well as holding eBusiness management positions at USWeb/CKS (later marchFIRST) and WaveBend Solutions (now Hitachi Consulting), and several software and web development positions.
