By Lynn de la Torre
February 23, 2005
Data warehouse implementations represent one of the most challenging types of deployments for the enterprise. Several factors contribute to the challenge of deploying a successful data warehouse. Among these are large-scale and complex system configurations, sophisticated data modeling and analysis tools, and high visibility in a broad range of important business functions within the company.
Data warehouse workloads can serve as a litmus test to determine the enterprise readiness of a given deployment platform. For this reason it's interesting to determine how well Linux can support such challenging workloads. To that end I began a study, examining two interrelated aspects of enterprise readiness for a data warehouse on Linux:
- Is the solution stack supported on Linux?
- Are end-user companies actively deploying the stack to support their business needs?
Data Warehouse Solution Participants
The survey examined three types of participants in the data warehouse solution or ecosystem:
- Independent software vendors (ISV)
- Independent hardware vendors (IHV)
- End-user company deployments
I used Ralph Kimball's "High Level Warehouse Technical Architecture" as a reference and to provide common terminology for analyzing the solution stack. I broke down the list of vendors into "front room" and "back room" categories, based upon Kimball's architecture.
The study involved a total of 18 vendors. It's important to note that this roster was not hand-picked to illustrate Linux usage. Rather, the list represented the dominant vendors, chosen based on experience with deployments at a number of large companies.
Study Results - Data Warehouse Trends
The study found reasonable overall support for Linux among the ISVs that make up the data warehouse solution, with 14 of 18 vendors offering some level of support for the open source OS. Within Kimball's technical architecture, the vendors supplying "front room" products predominantly hosted their offerings on client platforms, and they had weaker overall support for Linux than the "back room" vendors with products in areas such as extract, transform, and load (ETL) and database. Specifically, the ETL vendors tended to support one particular Linux distribution very well, while database vendors tended to support multiple Linux distributions.
The study further examined motivators and other issues driving (and inhibiting) Linux adoption and support by ISVs, with the following findings.
Motivators
- Market demand for the Linux platform
Issues
- How many and which distributions to support
- Differences in packages across distributions
- Lack of standardization among maintenance tools and lack of usability features
In examining end-user company deployments, the study focused on companies whose data warehouse and/or data mart implementations would be considered medium-sized to large (i.e., a total implementation data size of at least one terabyte), with a typical configuration of around 60 terabytes. These types of configurations shared some common themes:
- Overall configuration elements - medium to large data warehouse:
- SAN disk - use of failover
- Employ NFS
- Use multiple file systems as well as raw disk partitions
- Employ large file systems
- Multi-CPU large servers dominant - use of partitioning
Of the seven companies surveyed, the responses broke down as shown in Table 1.
The following is a summary of the issues and motivators for the three groups above.
Group 1
- While there are some potential motivators for cost consolidation, there are significant inhibitors in terms of the internal infrastructure to support Linux and the perceived immaturity in the platform.
Group 2
- Flexibility in choice of hardware platforms drove decisions to build a development environment as a first step toward evolving a mature support infrastructure for Linux.
- The primary inhibitor to moving to production was the lack of adequate support infrastructure within key ISVs for solutions on Linux.
Group 3
- Migration to Linux represented a strategic move to take advantage of the flexibility of deploying the hardware and software solutions that Linux provides.
- The primary production issue for IT infrastructure teams was providing systems integration services to ensure the success of such a demanding workload, such as the need to build customized monitoring scripts for the environment.
Group 1 reported the following motivations and issues in detail.
Motivators
- Cost consolidation
- H/W platform flexibility
- Low-cost clustering
- Consolidation of system administration skills
Issues
- Weak internal support for Linux infrastructure
- Lack of maturity of data warehouse solutions on Linux
- Maturity defined: Referenceable and in production for at least one year
- Lack of acceptance of Linux within DW
- Acceptance defined: Deployments within Fortune 100 companies
Conclusion
The overall conclusion drawn from this survey of the data warehouse and Linux was that the solution stack is sufficient to support the workload on Linux. However, the Linux support infrastructure is often not mature enough for the large, complex configurations and demanding workloads of data warehouses.
End-User Highlights
Some very specific findings emerged from the study with regard to end-user deployment:
- The majority of companies in Group 1 (no plans in the near future to migrate to Linux) will eventually move into Group 2 (development on Linux with a longer-term move to production). They fell into Group 1 because complexity, reliability, and scalability requirements proved too demanding for current deployments on Linux. Staffing and support issues were key inhibitors as well.
- Groups 2 and 3 featured early adopters who leveraged the availability of H/W, database, and ETL server solutions to enable successful deployment.
Similarly, salient ISV data emerged from the study:
- Market adoption of Linux in "back room" solutions is healthy and growing.
- Market adoption of Linux in "front room" solutions is measured, due to limitations in current ISV offerings and challenges for ISVs to support multiple Linux distributions.
- Opportunities exist for standardization across distributions, e.g., tools, packages, etc., to support the ISV community.
Copyright © 2005 SYS-CON Media, Inc. — All Rights Reserved.
Lynn de la Torre is a member of OSDL and coordinates the activities of the DCL Working Group. Lynn has thirty years of experience in the data center, and has worked in operations, system administration, database administration, and software development. Prior to joining OSDL, Lynn was a project manager for a large data warehouse implementation.

