Enterprise information integration (EII) is getting a lot of hype these days, and vendors are offering compelling reasons why you need an enterprise data access layer built on their EII products. However, many of the scenarios and case studies they present are narrowly targeted, or are simple demonstrations of how the product works that don't delve into the complexities of a real-world environment. This article presents some of those complexities and shows where some EII products may not provide adequate functionality to handle them.
1. "It's probably a lot harder to implement an EII solution in your enterprise than our examples and case studies would tell you."
The quality of an enterprise's data is usually much worse than most people expect. Incomplete entries, invalid values, and data inconsistencies are rampant. Large enterprises also commonly store the same information redundantly in multiple systems. When you try to provide centralized access to your data through an EII solution, these issues make the implementation that much more difficult. For example, if the same data is stored in multiple sources, which source do you map to when you define the entities in your EII solution's composite view? Even without redundant data, mapping a composite-view entity that spans multiple sources may not be as easy as it seems. Within a single database, foreign key relationships link related records together. No such linking mechanism exists across separate database instances, so you have to rely on identifying attributes (such as a customer number or email address) that aren't always consistent from one database to the next. An EII product won't help you much with these types of problems.
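To make the linking problem concrete, here is a minimal sketch, assuming two hypothetical source systems (a sales system and an order management system) that share no foreign key and can only be linked on a normalized email address. All record layouts and function names are illustrative assumptions, not part of any EII product. The point is that even after normalization some records stay unmatched (missing identifier) and some matches conflict (names disagree), and someone still has to decide what to do with them.

```python
# Hypothetical records from two source systems that share no foreign key.
sales = [
    {"id": "S1", "name": "Ann Lee", "email": "Ann.Lee@Example.com"},
    {"id": "S2", "name": "Bob Diaz", "email": None},
    {"id": "S3", "name": "Cara Wu", "email": "cara@example.com"},
]
orders = [
    {"cust_no": 101, "name": "ann  lee", "email": "ann.lee@example.com "},
    {"cust_no": 102, "name": "Carla Wu", "email": "cara@example.com"},
]


def normalize_email(email):
    """Lower-case and trim so trivial formatting differences still match."""
    return email.strip().lower() if email else None


def normalize_name(name):
    """Collapse whitespace and case before comparing names."""
    return " ".join(name.split()).lower()


def match_customers(sales_rows, order_rows):
    """Link sales and order records on normalized email.

    Returns (matches, conflicts, unmatched): conflicts are linked pairs whose
    names still disagree; unmatched rows have a missing or ambiguous key.
    """
    index = {}
    for row in order_rows:
        key = normalize_email(row.get("email"))
        if key is not None:
            index.setdefault(key, []).append(row)

    matches, conflicts, unmatched = [], [], []
    for row in sales_rows:
        key = normalize_email(row.get("email"))
        candidates = index.get(key, []) if key is not None else []
        if len(candidates) != 1:
            unmatched.append(row)  # needs a human or a better identifier
            continue
        other = candidates[0]
        if normalize_name(row["name"]) == normalize_name(other["name"]):
            matches.append((row, other))
        else:
            conflicts.append((row, other))
    return matches, conflicts, unmatched


matches, conflicts, unmatched = match_customers(sales, orders)
print(len(matches), len(conflicts), len(unmatched))  # 1 1 1
```

The choice of email as the matching key is itself an assumption; in practice you may need several attributes and manual review, which is exactly the work an EII layer does not do for you.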
2. "We isolate you from all of the complexities of accessing multiple data sources... including the ability to debug."
No doubt, writing code that accesses multiple data sources is not easy, so it's nice to have something that handles this for you. However, this can be a double-edged sword: hiding the complexities of accessing multiple data sources also hides the causes of problems. When you are debugging, you want to see all the gory details. When an EII product receives a query, it breaks it down into smaller queries that it sends to the constituent data sources. To ensure that you can adequately debug your data access layer, make sure your EII product lets you trace the execution of the queries against each constituent source.
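The kind of per-source trace worth asking for can be sketched with a toy federated layer. This is a minimal sketch under stated assumptions: two in-memory SQLite databases stand in for the sales and order management systems, both are assumed to share a customer `id` (which, as noted above, real sources often don't), and `traced_query` and `get_customer` are hypothetical names, not any vendor's API. Each sub-query is logged with its source, SQL, parameters, row count, and timing.

```python
import logging
import sqlite3
import time

logging.basicConfig(level=logging.DEBUG, format="%(name)s: %(message)s")
log = logging.getLogger("eii.trace")
trace = []  # every sub-query issued, in order, for later inspection


def traced_query(source, conn, sql, params=()):
    """Run one sub-query against one source and record what happened."""
    start = time.perf_counter()
    rows = conn.execute(sql, params).fetchall()
    elapsed_ms = (time.perf_counter() - start) * 1000
    trace.append({"source": source, "sql": sql, "params": params, "rows": len(rows)})
    log.debug("source=%s rows=%d time=%.2fms sql=%s params=%r",
              source, len(rows), elapsed_ms, sql, params)
    return rows


def get_customer(sales_db, orders_db, customer_id):
    """Answer one composite-view query by splitting it into per-source queries."""
    person = traced_query("sales", sales_db,
                          "SELECT name, birth_date FROM customer WHERE id = ?",
                          (customer_id,))
    address = traced_query("orders", orders_db,
                           "SELECT address FROM customer WHERE id = ?",
                           (customer_id,))
    if not person or not address:
        return None  # the trace shows which source came back empty
    name, birth_date = person[0]
    return {"id": customer_id, "name": name,
            "birth_date": birth_date, "address": address[0][0]}


sales_db = sqlite3.connect(":memory:")
sales_db.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, birth_date TEXT)")
sales_db.execute("INSERT INTO customer VALUES (1, 'Ann Lee', '1970-01-01')")
orders_db = sqlite3.connect(":memory:")
orders_db.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, address TEXT)")
orders_db.execute("INSERT INTO customer VALUES (1, '1 Main St')")

print(get_customer(sales_db, orders_db, 1))
print(get_customer(sales_db, orders_db, 2))  # None; trace shows both sources returned 0 rows
```

Without the trace, the caller of `get_customer` only sees `None` and cannot tell which constituent source failed to return data.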
3. "Be prepared to lose some platform-specific functionality and flexibility."
When you access a database directly, you can take advantage of that particular RDBMS's native functionality and extensions. For example, one database vendor provides a JDBC extension, a ROWID type, that exposes the unique identifier of a row in a table. With the ROWID type, you can specify the row directly in your query to improve performance. Because an EII product has to provide a common interface across multiple platforms, it cannot include such platform-specific mechanisms in that interface. So keep in mind that if you use an EII solution, you may lose some of the platform-specific features you've grown to love.
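The article's example is a vendor's JDBC extension; as an analogous illustration in a different database, SQLite also exposes an implicit `rowid` for ordinary tables. The sketch below contrasts that platform-specific access path with a hypothetical common interface (`portable_lookup`, `COMMON_MODEL_COLUMNS`, both assumptions for illustration) that only accepts columns from a shared data model, so the rowid shortcut is no longer available.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (name TEXT, city TEXT)")
cur = conn.execute("INSERT INTO customer (name, city) VALUES (?, ?)",
                   ("Ann Lee", "Reston"))
row_id = cur.lastrowid  # SQLite's implicit rowid for the new row

# Platform-specific path: address the row directly by its rowid.
direct = conn.execute("SELECT name, city FROM customer WHERE rowid = ?",
                      (row_id,)).fetchone()
print(direct)  # ('Ann Lee', 'Reston')

# Portable path a hypothetical common interface might expose: only columns
# from the shared data model can be used, so the rowid shortcut is gone.
COMMON_MODEL_COLUMNS = {"name", "city"}


def portable_lookup(conn, column, value):
    if column not in COMMON_MODEL_COLUMNS:
        raise ValueError(f"{column!r} is not part of the common data model")
    # Safe to interpolate: column was checked against a fixed whitelist.
    return conn.execute(f"SELECT name, city FROM customer WHERE {column} = ?",
                        (value,)).fetchall()


print(portable_lookup(conn, "name", "Ann Lee"))  # [('Ann Lee', 'Reston')]
```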
4. "Sure we support transactions, but only if they go through a single data source."
A usable data access layer must let you do more than just read data; you must also be able to create, update, and delete data through it. A single update operation from the client application's perspective may in fact update multiple sources. That's the whole point of having this data access layer: it isolates the client applications from the complexities of interfacing with multiple data sources. This means the EII product must be able to coordinate these operations across the multiple sources as a single transaction, which is the classic problem of distributed transaction management. This is not a new problem. Products such as transaction processing monitors have solved it, and specifications such as XA define a standard way to handle it. Although it's not a new problem, it is still not an easy one to solve, and not all EII products are built on top of transaction processing platforms. Complicating matters, not all databases and their drivers support distributed transactions in a standard manner, if at all. Throw in other data sources that EII vendors claim to support, such as Web services, and this can get complicated quickly. My advice is to make sure you understand what type of distributed transaction support your enterprise will need and to go with EII vendors that have proven experience and expertise in building transaction management products.
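The coordination problem described here is usually solved with a two-phase commit protocol, which is what XA-style transaction managers use. The following is a deliberately simplified, in-memory sketch: the `Participant` class is a hypothetical stand-in for a data source, and there is no durable logging, timeout handling, or crash recovery. It only shows the core rule that every source must vote to commit in phase 1 before any source commits in phase 2.

```python
from enum import Enum


class Vote(Enum):
    COMMIT = "commit"
    ABORT = "abort"


class Participant:
    """Hypothetical stand-in for one data source; not a real driver API."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        # Phase 1: a real source would durably record pending changes here.
        if self.can_commit:
            self.state = "prepared"
            return Vote.COMMIT
        self.state = "aborted"
        return Vote.ABORT

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Commit all participants only if every one of them votes COMMIT."""
    # Phase 1: collect votes (this sketch asks everyone, even after an ABORT).
    votes = [p.prepare() for p in participants]
    # Phase 2: unanimous COMMIT means commit everywhere; otherwise roll back.
    if all(v is Vote.COMMIT for v in votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.rollback()
    return "rolled back"


sources = [Participant("orders"), Participant("sales")]
print(two_phase_commit(sources))  # committed

mixed = [Participant("orders"), Participant("web-service", can_commit=False)]
print(two_phase_commit(mixed))  # rolled back
print([p.state for p in mixed])  # ['aborted', 'aborted']
```

The hard part in practice is the `prepare` step: a source whose driver (or, for a Web service, whose protocol) has no way to prepare and hold changes cannot participate in this scheme at all.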
5. "Of course we support write operations, but you may not actually be able to use them."
When you create a data access layer with an EII product, you often define a composite view of an entity whose attributes come from multiple source systems. This creates complexities when you try to execute a write operation on that composite view (in addition to the distributed transaction issues mentioned earlier). For example, suppose you've created a composite view of Customer that takes the customer address from an order management system and the customer name and birth date from a sales system. Now another application tries to create a new instance of Customer through this data access layer, supplying all of the information defined in the composite view of Customer. However, the order management and sales systems may hold other Customer attributes that are not defined in the composite view, so that data is not going to be available. If those other attributes are critical to the functionality of the order management and sales systems, their corresponding database columns may have NOT NULL constraints. In this scenario, the create operation will fail when it is propagated to the order management and sales systems. It is not uncommon to find these and other integrity constraints in your source systems, and they can complicate what seems like a simple write operation. The bottom line is that complex relationships in your enterprise data may make it impractical to perform write operations through an EII layer that hides these application-specific relationships.
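The Customer scenario can be reproduced in a few lines. This minimal sketch assumes two in-memory SQLite databases as the sales and order management systems; the extra required columns (`sales_region`, `credit_terms`) and the `create_composite_customer` function are hypothetical. The composite view supplies every attribute it defines, yet the insert still fails on a NOT NULL column the view never exposed.

```python
import sqlite3

# Two hypothetical source systems, each with its own Customer table.
orders_db = sqlite3.connect(":memory:")
orders_db.execute(
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, address TEXT,"
    " credit_terms TEXT NOT NULL)"  # required by order management, not in the view
)
sales_db = sqlite3.connect(":memory:")
sales_db.execute(
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, birth_date TEXT,"
    " sales_region TEXT NOT NULL)"  # required by sales, not in the view
)


def create_composite_customer(customer):
    """Naively split a composite-view Customer into per-source inserts."""
    sales_db.execute(
        "INSERT INTO customer (id, name, birth_date) VALUES (?, ?, ?)",
        (customer["id"], customer["name"], customer["birth_date"]),
    )
    # Never reached here: the sales insert above already raised.
    orders_db.execute(
        "INSERT INTO customer (id, address) VALUES (?, ?)",
        (customer["id"], customer["address"]),
    )


new_customer = {"id": 1, "name": "Ann Lee",
                "birth_date": "1970-01-01", "address": "1 Main St"}
try:
    create_composite_customer(new_customer)
    outcome = "created"
except sqlite3.IntegrityError as exc:
    outcome = f"failed: {exc}"
print(outcome)  # a "failed: ..." message naming the NOT NULL violation
```

Had the sales insert succeeded and only the order management insert failed, the sources would be left inconsistent, which is where the distributed transaction support from the previous section comes back into play.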
6. "Our product can be clustered, but that doesn't guarantee you'll have proper backups."
When you use an EII product to implement an enterprise data access layer, one of your concerns will be availability, since all of your applications now depend on it. Most EII products offer some type of clustering solution to address availability; however, the sophistication of the clustering solution varies greatly from vendor to vendor. In a clustered deployment with backup instances running, the challenge is migrating state to a backup instance when the main one goes down. In a typical application, we think of state as the contextual information maintained across multiple client requests. The typical usage scenario for an EII product won't generally require support for maintaining this type of stateful information; that is usually managed in the application logic layer that calls the EII solution. However, to support distributed transactions, the EII product has to maintain state that tracks the steps in the execution of a transaction, so that it knows when to commit or roll back the distributed data sources. In a clustered setup, if the instance coordinating the transaction fails, that information needs to be migrated to the backup instance so that it can correctly complete the transaction. Support for these scenarios is where you start to see the varying levels of sophistication in each vendor's clustering solution. As mentioned earlier, most vendors offer clustering, but not all of them can transparently migrate state from one instance to another. If you operate in a high-volume environment with transactions that span multiple sources, this kind of functionality is critical to ensuring the integrity of your data. When you're evaluating EII products, make sure you ask the vendors about these details of their clustering solutions.
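One common way to make coordinator state survive a failover is to write each transaction's progress, and above all its commit decision, to storage that every cluster node can read before acting on it. The sketch below is a simplified illustration under loud assumptions: `SharedTxLog` is a hypothetical in-memory stand-in for replicated or shared durable storage, and `Source` and `Coordinator` are hypothetical classes, not any vendor's clustering API. A backup coordinator finishes a transaction that the primary abandoned after recording its decision, using the common rule that a transaction with no recorded commit decision is rolled back.

```python
class SharedTxLog:
    """Hypothetical stand-in for transaction state both nodes can reach."""

    def __init__(self):
        self.records = {}

    def write(self, tx_id, status, participants):
        self.records[tx_id] = {"status": status, "participants": list(participants)}

    def read(self, tx_id):
        return self.records.get(tx_id)


class Source:
    """Hypothetical data source that always votes to commit."""

    def __init__(self, name):
        self.name = name
        self.state = "active"

    def prepare(self):
        self.state = "prepared"
        return True

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"


class Coordinator:
    def __init__(self, node, log, sources):
        self.node = node
        self.log = log
        self.sources = sources  # name -> Source, shared by all nodes here

    def run(self, tx_id, names, crash_after_decision=False):
        self.log.write(tx_id, "preparing", names)
        if all(self.sources[n].prepare() for n in names):
            self.log.write(tx_id, "commit", names)  # decision recorded before phase 2
            if crash_after_decision:
                raise RuntimeError(f"{self.node} failed before phase 2")
            for n in names:
                self.sources[n].commit()
        else:
            self.log.write(tx_id, "abort", names)
            for n in names:
                self.sources[n].rollback()
        self.log.write(tx_id, "done", names)

    def recover(self, tx_id):
        """Finish a transaction another node left incomplete."""
        record = self.log.read(tx_id)
        if record is None or record["status"] == "done":
            return
        for n in record["participants"]:
            if record["status"] == "commit":
                self.sources[n].commit()
            else:  # "preparing" or "abort": no commit decision, so roll back
                self.sources[n].rollback()
        self.log.write(tx_id, "done", record["participants"])


log = SharedTxLog()
sources = {"orders": Source("orders"), "sales": Source("sales")}
try:
    Coordinator("node-a", log, sources).run("tx-1", ["orders", "sales"],
                                            crash_after_decision=True)
except RuntimeError:
    pass  # primary is gone; both sources are stuck in "prepared"

Coordinator("node-b", log, sources).recover("tx-1")
print({n: s.state for n, s in sources.items()})
```

If the log lived only in the primary's memory, the backup in this sketch would have no way to know that a commit had been decided, which is the gap to probe when vendors describe their clustering.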
7. "Tuning this thing for performance can be a nightmare."
Different applications have different data access and usage patterns. Some applications produce a lot of transactions but access only a small amount of data in each one, while in another application the transaction throughput may be small but the volume of data accessed is very large. The ways you'd tune a product for these two types of applications are very different. When you use an EII solution to provide centralized access to your enterprise data sources, you have to accommodate the access and usage patterns of every application integrated with it. Tuning your infrastructure to support a single application's performance requirements is tricky enough; tuning it to support multiple patterns of usage and access is even harder. Frequently there will be conflicting configurations, where something that optimizes the performance of one application degrades the performance of another. My advice is to make sure you understand the access and usage patterns of the applications that will be integrated with the EII solution, and to ask the vendors whether they can adequately support these patterns in the same deployment. Of course, they will say it depends on your performance requirements, so make sure you have well-defined performance criteria for each of those scenarios. Finally, don't just take their word for it; plan enough time to performance test your EII solution with simulations that reflect the access and usage patterns common in your environment.
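Writing the per-pattern criteria down as data makes them testable. The following is a minimal sketch of such a harness, assuming a hypothetical `fake_eii_query` stand-in for a call into the EII layer and placeholder thresholds you would replace with your own criteria. It runs calls sequentially and reports 95th-percentile latency per profile; a realistic test would also need concurrent clients and would run against the actual deployment.

```python
import statistics
import time
from dataclasses import dataclass


@dataclass
class WorkloadProfile:
    name: str
    requests: int          # number of calls in the run (must be >= 2)
    rows_per_request: int  # volume of data each call touches
    max_p95_ms: float      # the performance criterion agreed for this pattern


def run_profile(profile, query_fn):
    """Issue the profile's calls sequentially and compare p95 latency to its target."""
    latencies = []
    for _ in range(profile.requests):
        start = time.perf_counter()
        query_fn(profile.rows_per_request)
        latencies.append((time.perf_counter() - start) * 1000)
    # 19 cut points for n=20; the last one is the 95th percentile.
    p95 = statistics.quantiles(latencies, n=20, method="inclusive")[-1]
    return {"profile": profile.name, "p95_ms": p95,
            "passed": p95 <= profile.max_p95_ms}


def fake_eii_query(rows):
    """Hypothetical stand-in for a call into the EII layer."""
    return [(i, f"row-{i}") for i in range(rows)]


# Placeholder criteria; substitute the targets defined for your own applications.
profiles = [
    WorkloadProfile("order-entry", requests=500, rows_per_request=10, max_p95_ms=5.0),
    WorkloadProfile("reporting", requests=5, rows_per_request=100_000, max_p95_ms=500.0),
]

results = [run_profile(p, fake_eii_query) for p in profiles]
for r in results:
    print(r)
```

Keeping both profiles in one run mirrors the article's concern: a configuration change that helps one profile's result can be checked immediately against the other's criterion.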
About Tieu Luu
Tieu Luu is an Associate with Booz Allen Hamilton, where he works on architectures and strategies for large enterprise systems. Prior to Booz Allen Hamilton, Tieu held lead engineering positions at companies including Grand Central Communications, Mercator Software, and Aether Systems, where he worked on the development of integration and mobile computing platforms. You can read more of Tieu’s writing at his blog at http://thluu.blogspot.com.