

SnapLogic: From ETL to VVV

Squeezing value out of data in the enterprise has always pushed the limits of the available technology. Furthermore, when business needs exceed available capabilities, vendors push to innovate within the processor, storage, and network constraints of the day.

This inherent stress between enterprise demands and vendor innovation gave rise to the Extract, Transform, and Load (ETL) marketplace over twenty years ago. Businesses realized that running complex, ad hoc SQL queries against increasingly large databases would grind those databases to a halt, requiring an alternate approach to gaining essential business intelligence.

The best solution given the hardware limitations of the time required controlled, pre-planned extraction of data from various databases of record, followed by complex, time-consuming transformation steps, and then loading the transformed data into separate reporting data stores (dubbed data warehouses and data marts) specially optimized for a range of analytical queries.
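To make the classic ETL pattern concrete, here is a minimal sketch in Python. The connection strings, tables, and columns are hypothetical, chosen only to illustrate the extract, transform, and load steps; they do not reflect any particular product.

```python
# Minimal ETL sketch: extract from an operational database, transform in
# application code, and load the result into a separate reporting store.
# Connection strings, table names, and columns are illustrative assumptions.
import sqlalchemy as sa
import pandas as pd

source = sa.create_engine("postgresql://user:pass@oltp-host/orders_db")
warehouse = sa.create_engine("postgresql://user:pass@dw-host/warehouse")

# Extract: pull orders from the system of record.
orders = pd.read_sql(
    "SELECT order_id, customer_id, amount, order_date FROM orders", source
)

# Transform: aggregate into the shape the reporting queries expect.
daily_revenue = (
    orders.groupby("order_date", as_index=False)["amount"]
    .sum()
    .rename(columns={"amount": "total_revenue"})
)

# Load: write the pre-computed aggregate into the data warehouse.
daily_revenue.to_sql("fact_daily_revenue", warehouse, if_exists="append", index=False)
```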

As available storage and memory ramped up, ad hoc data transformations became increasingly practical, allowing for the transform step to take place as needed, subsequent to the load step – and Extract, Load, and Transform (ELT) became a popular alternative to ETL.
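By contrast, an ELT flow loads the raw data first and defers transformation to the target platform. A hedged sketch of the same scenario, again with hypothetical names, pushes the transform into the warehouse itself as SQL run on demand:

```python
# Minimal ELT sketch: load raw rows as-is, then transform inside the
# warehouse whenever the analysis requires it. Names are illustrative only.
import sqlalchemy as sa
import pandas as pd

source = sa.create_engine("postgresql://user:pass@oltp-host/orders_db")
warehouse = sa.create_engine("postgresql://user:pass@dw-host/warehouse")

# Extract and Load: copy the raw table into a staging area untouched.
raw = pd.read_sql("SELECT * FROM orders", source)
raw.to_sql("stg_orders", warehouse, if_exists="replace", index=False)

# Transform: run ad hoc, as needed, using the warehouse's own engine.
with warehouse.begin() as conn:
    conn.execute(sa.text("""
        CREATE TABLE IF NOT EXISTS fact_daily_revenue AS
        SELECT order_date, SUM(amount) AS total_revenue
        FROM stg_orders
        GROUP BY order_date
    """))
```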

The transition from ETL to ELT represents an important stepping stone to real-time data analysis. ELT still wasn’t truly real-time, as businesses had to extract and load their data ahead of time, but much of the analysis work depended on the now-accelerated and increasingly flexible transformation step.

Hadoop and ELT

Today, all the buzz is about Big Data and the most important technology innovation on the Big Data analysis scene: Hadoop. In spite of all the commotion around Hadoop, this open source platform and its various add-ons are little more than the next generation of the transform capability of ELT, albeit at cloud scale.

The core motivations that drove the Hadoop community to create this tool were the increasing size of data sets (leading to the awkward Big Data terminology), as well as the need to process data of diverse levels of structure – in particular, a mix of unstructured (content-centric) and semi-structured (generally XML-formatted), as well as structured (relational) information.

In other words, traditional ETL and ELT tools weren’t up to the challenge of dealing with the volume and variety of data that enterprises increasingly produced and wished to analyze. Hadoop addressed these challenges with a horizontally scalable, highly redundant file system (the Hadoop Distributed File System, or HDFS), as well as MapReduce, an algorithmic approach to analyzing data appropriate for processing the necessary volumes of data on HDFS.
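The MapReduce model itself is simple: a mapper emits key/value pairs, the framework sorts and groups them by key, and a reducer aggregates each group. The sketch below shows the pattern in Hadoop Streaming style as a single Python script; the input layout (customer ID and amount, tab-separated) is an assumption made purely for illustration.

```python
# MapReduce pattern in miniature (Hadoop Streaming style): the mapper emits
# key/value pairs, the framework shuffles/sorts by key, the reducer aggregates.
# Input layout (customer_id \t amount) is an illustrative assumption.
import sys
from itertools import groupby

def mapper(lines):
    for line in lines:
        customer_id, amount = line.rstrip("\n").split("\t")
        yield customer_id, float(amount)

def reducer(pairs):
    # Pairs arrive grouped by key after the shuffle/sort phase.
    for customer_id, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield customer_id, sum(amount for _, amount in group)

if __name__ == "__main__":
    # Local simulation of the full job; on a cluster the mapper and reducer
    # run as separate tasks distributed across the HDFS data blocks.
    for customer_id, total in reducer(mapper(sys.stdin)):
        print(f"{customer_id}\t{total}")
```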

The first version of Hadoop, however, was essentially a batch analytics platform. Data analysts had to surmount the significant challenges of extracting data from their source locations and loading them properly into HDFS, only to run arcane MapReduce jobs to produce useful results. As a result, the hype surrounding Hadoop 1.0 exceeded its actual usefulness for most organizations brave enough to implement it.

As an open source project, however, Hadoop had enough backing from the community to drive the development of version 2, which offers a resource negotiator for MapReduce tasks dubbed YARN as well as fledgling real-time processing capabilities. Today, real-time Hadoop is at the cutting edge, as various tools in an expanding Hadoop ecosystem mature to address the velocity requirements for real-time data analytics.

Hadoop’s Missing Pieces

In terms of the maturation of ETL technologies, therefore, the current version of Hadoop can be thought of as a modern transformation engine running on a horizontally scalable file system that in theory offers the “three V’s” of Big Data: volume, variety, and velocity. In practice, however, many capabilities are missing from the open source distribution.

As a result, other open source projects as well as commercial software providers have an opportunity to fill the gaps Hadoop leaves in enterprise data integration across modern enterprise infrastructures. Today, such integration scenarios typically fall within hybrid cloud environments that combine on-premise and cloud-based capabilities.

In the enterprise context, the extract and load steps of ELT require organizations to leverage diverse data sources both on-premise and in the cloud. Those data sources may be a mix of relational, hierarchical, or content-centric. Furthermore, the business may require real-time (or near real-time) analysis of data from such diverse data sources.

To address these challenges, SnapLogic has built a data and application integration platform that resolves many of Hadoop’s shortcomings. As I wrote in a previous BrainBlog post, SnapLogic separates its technology into a Control Plane and a Data Plane. The Control Plane resides in the cloud and contains the Designer, Manager, and Dashboard subcomponents, which manage the Data Plane, allowing the Data Plane to act as a cloud-friendly abstraction of the data flows, or Pipelines, that users create with the SnapLogic Designer.

The data integrations themselves run as Pipelines, which are sequences of atomic integration steps that SnapLogic calls Snaps – because people literally snap them together. Snaps support the full gamut of data types and levels of structure, facilitating the ability to send the full variety of enterprise data to Hadoop.
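The snap-together idea is easiest to see in code. The sketch below is purely conceptual and is not SnapLogic’s actual API: each step is a small function over a stream of JSON-like documents, and a pipeline is simply the composition of those steps. Step names and record fields are hypothetical.

```python
# Conceptual illustration of snapping atomic steps into a pipeline.
# This is NOT SnapLogic's API; step names and record fields are hypothetical.
from functools import reduce

def read_records(_):
    # Source step: in practice this might read a database, file, or API.
    yield {"customer": "acme", "amount": "250.00", "region": "EMEA"}
    yield {"customer": "globex", "amount": "975.50", "region": "APAC"}

def cast_amount(records):
    for rec in records:
        yield {**rec, "amount": float(rec["amount"])}

def filter_large(records):
    for rec in records:
        if rec["amount"] > 500:
            yield rec

def pipeline(*steps):
    # Compose the steps; each consumes and produces a stream of documents.
    return lambda stream: reduce(lambda s, step: step(s), steps, stream)

run = pipeline(read_records, cast_amount, filter_large)
for record in run(None):
    print(record)
```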

SnapLogic has also recently rolled out Hadooplexes, which are Snaplexes (data processing components) that run as YARN apps in Hadoop, as well as SnapReduce, SnapLogic’s support for Big Data integrations that leverage Hadoop to process large amounts of data across large clusters.

SnapReduce enables Pipelines to generate MapReduce jobs and scale them across multiple nodes in a Hadoop cluster. Each Hadooplex then delegates MapReduce-based analytic operations automatically across all Hadoop nodes, thus abstracting the horizontally distributed nature of the Hadoop environment from the user.

The result is an elastic, horizontally scalable integration fabric that provides the extract and load capabilities that Hadoop lacks. Each data integration can be run manually, on a preset schedule, or via a trigger – and SnapLogic exposes such triggers as URLs (either on the Internet or a private network), allowing any authorized piece of software to kick off the integration.
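As a rough illustration of triggered execution (the URL, auth header, and payload here are hypothetical, not SnapLogic’s actual endpoint format), any authorized script can kick off an integration with a single HTTP call:

```python
# Hypothetical example of starting an integration run via its trigger URL.
# The endpoint, auth token, and payload are illustrative assumptions.
import requests

TRIGGER_URL = "https://integration.example.com/pipelines/daily-revenue/run"

response = requests.post(
    TRIGGER_URL,
    headers={"Authorization": "Bearer <api-token>"},
    json={"run_date": "2015-03-01"},  # optional parameters for the run
    timeout=30,
)
response.raise_for_status()
print("Pipeline run accepted:", response.json())
```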

End-to-End Modern ELT

In summary, SnapLogic modernizes each element of ELT for today’s cloud-centric, Big Data world. Instead of traditional extraction of structured data, SnapLogic allows for diverse queries across the full variety of data types and structures by streaming all data as JSON documents. Instead of simplistic, point-to-point loading of data, SnapLogic offers elastic, horizontally scalable Pipelines that hide the underlying complexity of data integration from the user. And within Hadoop, Hadooplexes simplify the distribution of YARN-based MapReduce algorithms, allowing users to treat the Hadoop environment as though it were a traditional reporting database.

Furthermore, SnapLogic can perform each of these steps in real-time, in those situations where the business requires real-time analytics. Each Pipeline simply streams the data from the acquisition point to the delivery point, handling the appropriate operations statelessly along the way. The end result is a user-friendly data integration and analysis tool that adroitly hides an extraordinary level of complexity behind the scenes – opening up the power of Big Data to an increasingly broad user base.

SnapLogic is an Intellyx client. At the time of writing, no other organizations mentioned in this article are Intellyx clients. Intellyx retains full editorial control over the content of this article.


More Stories By Jason Bloomberg

Jason Bloomberg is a leading IT industry analyst, Forbes contributor, keynote speaker, and globally recognized expert on multiple disruptive trends in enterprise technology and digital transformation. He is ranked #5 on Onalytica’s list of top Digital Transformation influencers for 2018 and #15 on Jax’s list of top DevOps influencers for 2017, the only person to appear on both lists.

As founder and president of Agile Digital Transformation analyst firm Intellyx, he advises, writes, and speaks on a diverse set of topics, including digital transformation, artificial intelligence, cloud computing, devops, big data/analytics, cybersecurity, blockchain/bitcoin/cryptocurrency, no-code/low-code platforms and tools, organizational transformation, internet of things, enterprise architecture, SD-WAN/SDX, mainframes, hybrid IT, and legacy transformation, among other topics.

Mr. Bloomberg’s articles in Forbes are often viewed by more than 100,000 readers. During his career, he has published over 1,200 articles (over 200 for Forbes alone), spoken at over 400 conferences and webinars, and he has been quoted in the press and blogosphere over 2,000 times.

Mr. Bloomberg is the author or coauthor of four books: The Agile Architecture Revolution (Wiley, 2013), Service Orient or Be Doomed! How Service Orientation Will Change Your Business (Wiley, 2006), XML and Web Services Unleashed (SAMS Publishing, 2002), and Web Page Scripting Techniques (Hayden Books, 1996). His next book, Agile Digital Transformation, is due within the next year.

At SOA-focused industry analyst firm ZapThink from 2001 to 2013, Mr. Bloomberg created and delivered the Licensed ZapThink Architect (LZA) Service-Oriented Architecture (SOA) course and associated credential, certifying over 1,700 professionals worldwide. He is one of the original Managing Partners of ZapThink LLC, which was acquired by Dovel Technologies in 2011.

Prior to ZapThink, Mr. Bloomberg built a diverse background in eBusiness technology management and industry analysis, including serving as a senior analyst in IDC’s eBusiness Advisory group, as well as holding eBusiness management positions at USWeb/CKS (later marchFIRST) and WaveBend Solutions (now Hitachi Consulting), and several software and web development positions.
