The i-Technology Media!
Register | Log in
   
 
.NET  ·  AJAX  ·  CLOUD  ·  ECLIPSE  ·  FLEX  ·  OPEN WEB  ·  iPHONE  ·  JAVA  ·  LINUX  ·  OPEN SOURCE  ·  ORACLE  ·  PBDJ  ·  SEARCH  ·  SILVERLIGHT  ·  SOA  ·  VIRTUALIZATION  ·  WEB 2.0  ·  WIRELESS  ·  XML
Comments
Drool, Britannia? Is the UK Failing the Cloud?
By Roger Strukhoff
Richard Davies wrote: The UK has a good crop of technology pioneers in cloud computing - for example ElasticHosts, FlexiScale, Flexiant, OnApp - and also some strong government initiatives such as G-Cloud. We will have to see whether this kind of technical leadership converts into swift mass-market adoption or not.
Jan. 8, 2012 11:38 AM EST
read more & respond »
Cloud Expo on Google News
Did you read today's front page stories & breaking news?

Cloud Expo & Virtualization 2011 West
Keynotes
Oracle
Opening Keynote | An Enterprise Cloud for Business-Critical Applications
Abiquo
Day 2 Keynote | The Enterprise Cloud Tightrope - Balancing for Success
Akamai
Day 3 Keynote | The DNA of an Enterprise Cloud
DIAMOND SPONSOR:
Oracle
Many Clouds, Many Choices'Cloud
PLATINUM PLUS SPONSORS:
Abiquo
Enterprise Cloud Best Practices - Town Hall - Join the discussion…
PLATINUM SPONSORS:
Intel
Progressing Toward the Federated, Automated and Client-Aware Cloud
New Relic
How to build an app with Twitter-like throughput
Rackspace
Computing in the Cloud Era
GOLD SPONSORS:
Gale Technologies
Practical Cloud Migration
IBM
Re-think IT. Re-inventing Business.
Intel/McAfee
Identity Driven Security in the Cloud
PerspecSys
Hackers Hackers Everywhere, Is My Public Cloud That Safe?
Red Hat
Unlock the Value of the Cloud
SHI
Mission Critical Applications and the Cloud - Myth or Reality?
SoftLayer
Not Your Grandpa's Cloud
Terremark
Integrating Enterprise Clouds
VMware
Upgrade to a vCloud
POWER PANELS:
Cloud Expo Silicon Valley: CTO Power Panel
Cloud Expo Silicon Valley: CEO Power Panel
Cloud Expo Silicon Valley: Cloud SuperStars Panel
Cloud Expo Silicon Valley: CloudNOW Panel
Click For 2010 West
Event Webcasts
Cloud Expo & Virtualization 2011 East
DIAMOND SPONSOR:
Dell
Dell & VMware Deliver the Enterprise Hybrid Cloud
PLATINUM PLUS SPONSORS:
Abiquo
Are Financial Services Organizations Risking Security by Avoiding Cloud Computing?
Oracle
From Consolidation to Enterprise Private PaaS
PLATINUM SPONSORS:
Intel
Driving the Transformation to Next Generation Cloud Data Centers
Rackspace
The Inevitability of an Open Cloud
GOLD SPONSORS:
CA Technologies
Follow YOUR path to Cloud Computing
Interxion
Who Keeps the Cloud in the Air?
Microsoft
Patterns for Cloud Computing
PerspecSys
War in the Clouds: Are you ready?
ServiceMesh
The Big Win: Stop Playing Small-Ball with Your Cloud Strategy
Terremark
Evaluating Enterprise Clouds
Xiotech
Cloud Storage: Myths and Realities
POWER PANELS:
Cloud Expo New York: CTO Power Panel
Cloud Expo New York: CEO Power Panel
Cloud Expo New York: CMO Power Panel
Cloud Expo New York: Wrap-Up Power Panel
Click For 2010 West
Event Webcasts
Live Google News by SYS-CON!
Top Three Links You Must Click On


Feature
Leveraging IT Assets
Leveraging IT Assets

By: Anupam Singh
Oct. 3, 2003 11:19 AM

XML is establishing itself as the standard for exchange of information across enterprises. However, the technology that allows enterprise-class applications to deal with XML processing is still not clearly formulated. This causes most enterprise customers to implement their own architecture. Additionally, their software implementations try to deal with the same set of basic XML processing questions in different layers of the enterprise, rather than as a whole.

Consider a fictitious company that has been using plain text documents or spreadsheets to report its financials. In the late 1990s, the company is asked by regulatory authorities to report more detailed financial information in a new format. The new data format is XML and the standard is XBRL (Extensible Business Reporting Language), an emerging XML-based open source specification for exchanging and processing financial information.

Despite reluctance due to the high-end costs of technology adoption, the CEO agrees to adopt the standard throughout the enterprise. An IT project is begun to foster the adoption of this particular format as the lingua franca for exchange of financial information between different parts of the company.

An IT team rolls up their sleeves and gets to work. Before the team can create the necessary IT infrastructure that allows existing applications to communicate with each other using the new data format, it stumbles upon the following questions:

  • Should the IT department create a new application that talks to all other applications?
  • Will the IT department purchase new and expensive software to make the new format pervasive?
  • What happens to legacy systems - older and reliable systems still widely used in key business functions like billing?
  • Given that financial reporting is such a key part of the enterprise, how does the company ensure that their software solutions based upon the new format are reliable?

    The IT department decides to design an application to generate one XBRL balance sheet for the entire company. As it sets out to develop this application, it needs to consider how to utilize the various databases throughout the enterprise that house important financial data. The obvious candidate step is the storage of XML as a Binary Large Object (BLOB). However, the IT department must examine additional steps and explore the possibility of pushing most, if not all, work into a relational database.

    The IT department decides to generate XBRL balance sheets from "raw" data for every division of the company. Raw data could be paper-based data, data in spreadsheets, or data in relational databases.

    The XBRL balance sheets generated by each division are further consolidated, analyzed, and verified to form one single XBRL balance sheet for the company. It is then issued to regulatory authorities as a set of XBRL documents.

    As any application development team would do, the IT department tries to identify the core requirements of this application. The main requirements are identified as XML consumption from many sources, XML storage in an efficient and reliable manner, searching XML to perform analysis and verification, and transforming XML to generate other XML formats (see Figure 1).

    Using the XBRL balance sheet example, IT developers decide to prototype an XML processing engine inside the application as a simple Java program that uses existing (and free) software components. The prototype uses simple Java programs to map specific tables to specific XML formats; the file system as a simple XML storage mechanism; and the XSLT processor to run transformations over the XML. The architecture looks something like Figure 2.

    Almost immediately after putting together the prototype, the IT development team realizes the basic inefficiencies in it. The mapping of SQL to XML seems to create a two-step processing of SQL result sets to XML. Additionally, there is the cost of developing transformation programs that create main memory representations of XML documents. And last but not least, the notion of the file system as a database is extremely antiquated.

    The key lesson learned from the prototype is that XML processing cannot be solved as a single problem. Instead, there are different pieces to the XML processing puzzle (see Figure 3).

  • Shredding/mapping of incoming XBRL: When the information is published in XML from the divisions, the application might want to map it back to relational data for some legacy application.
  • Storage of incoming XBRL: If the application does not need to map it to its relational format, the application developer might choose to store the whole XML document into the database.
  • Indexing of incoming XBRL: Though not an immediate and obvious requirement of our application, it is important to create the right indexes to make transformation and searching of the XML as painless as possible.
  • Query-stored XML: To run analysis queries, the application developer will need to run queries across the XBRL documents. For example, "Show me all of the divisions whose net assets are less than the net assets from the same quarter last year."
  • Combine XML streams: In many cases, the application might have the requirement of "pulling" in some other information from a Web service, file system, or spreadsheet. In this case, the application needs to add logic to pull data from these sources and combine this information from "local" XML.
  • Transform XML: Even though we are focused on XBRL as a sample scenario, the same application might need to support another set of XML formats. Instead of writing a new application for each new XML format, the application might need a declarative interface to transform one form of XML to another.

    After identifying the above requirements, the IT department has a choice to build the whole application, buy separate software to fulfill these needs, or evaluate existing pieces of software that provide these functionalities. The IT department knows that the solution has to be robust in terms of availability, concurrency, recovery, and transactions. These characteristics bring the relational database to mind.

    However, a natural question arises, "Why do we want to work the XML processing functionality into a relational database?" The answer is simple. Enterprise IT departments have invested heavily in relational databases because of scalability, high availability, reliability, and manageability. These requirements are the soul of any IT project. Special XML storage and customized XML processing engines could not have automatically guaranteed the above features without the years of research and development that have already gone into relational databases.

    The relational database performs well for storage, indexing, and querying, but mostly with structured data. For semistructured data like XML, what are the features or solutions that the IT department needs from the database? How does one gain the functionality without loosing the robustness?

    The challenge for database vendors is to deal with these new requirements without sacrificing the primary characteristics of a database. Luckily, each XML requirement described above translates to a specific feature in the database. For now, let's continue to solve the problem from the point of view of an IT developer.

    In the next few sections, the IT developers classify the solutions into "immediate" and "desired" solutions in the context of a relational database. Immediate solutions can be seen in many commercial databases in one form or the other. (It should be noted that we might use nonstandard SQL extensions for illustration purposes.) The desired solutions are longer-term solutions, which try to create interoperable solutions between W3C XML standards and the familiar SQL-oriented interface in relational databases.

    The W3C has issued initial drafts of an XML Query Language called XQuery. Without going into the details of the language, XQuery provides excellent syntactical support for query and transformation of XML data. We will refer to this standard throughout our detailed analysis of database requirements.

    Shred
    Immediate solution

    Shredding or mapping of an incoming XML document is the stage of mapping a hierarchical, tree-structured XML document into a relational table. The simplest mapping tool would be an embedded XPath processor in SQL. A set of XPath queries will identify the nodes that are to be mapped into a table. Furthermore, the XPath queries will use XML Schema information to map from an XML Schema type to the table.

    The advantage of this solution is that the data mapping is taking place in the database and the IT developer does not maintain a set of XPath programs outside of the database. Of course, this assumes that the relational database has XPath extensions to the SQL language that look something like the following:

    insert into assets_table
    select xmlextract("/group/ci:statements.balanceSheet/
    ci:Assets.currentAssets[@numericContext='C0101'
    ]/text()", balancesheets)
    from xmltable

    It should be noted that the above example is not standard ANSI-SQL but an illustration of extending the SQL language with XPath queries. In the statement above, the SQL language is used to extract the currentAssets XML element from all company balance sheets stored in the balancesheets table and insert it into a relational column in assets_table.

    Desired solution
    A more declarative solution could be an XQuery-based engine that maps incoming XML documents into SQL tables. The most standard mapping available between XML and SQL is SQLX.

    Using XQuery queries, the incoming document could be translated to one or more SQLX documents. These documents can be automatically mapped into the database. Of course, this assumes that the SQL interface in the relational database has been augmented by an XQuery interface and allows "automatic" insertion from SQLX documents into SQL tables.

    Store
    Immediate solution

    Sometimes shredding might not be the best solution because the IT department is required to store the whole XML document to maintain the "original" copy of transactions. The best examples are financial data like mortgage applications, trades, analyst reports, and balance sheets.

    Storage of the complete XML document is closely tied with the extent of support for a native XML data type in an RDBMS type. In the simplest solution, the XML document can be stored as a BLOB. But, the BLOB storage does not help the application developer. The storage needs to be complemented with an indexing scheme that allows searches on these documents to run faster. Some of these indexing schemes are described in greater detail in the indexing section.

    Desired solution
    The RDBMS should identify XML documents as separate, distinct types within the SQL system. As in any other type, the application developer should be able to validate the incoming data against user-defined constraints. In the case of XML, the XML should be parsed for validity against the user-defined XML Schema. For documents without well-defined XML Schema, the document should be at least valid XML. Of course, the insertion, deletion, and updates of XML documents should be transactionally consistent.

    Another aspect is updates of parts of the XML documents. There should be a standard language (a la SQL) that allows the user to express updates to an XML document. In time, XQuery is anticipated to have updates added as part of the standard syntax.

    Index
    Immediate solution

    The short-term solution for indexing has been extraction of the data into a base SQL-type column. Conventional relational indexes like B-Trees are then constructed on this base. Whenever XML Query is run, the index is used to qualify the XML document.

    There are many problems with this short-term solution. Instead of qualifying a particular element within the document, the whole XML document is qualified. For example, in an XBRL document, if you wanted the total assets only when the total liabilities were greater than $1 million, this solution would only qualify the document without giving the exact part of the XML document that the application needed.

    Another case of "non-native" indexing is when all information inside XML documents is shredded and indexed. Even though all values are now indexed using powerful SQL indexes, the fidelity of the XML document is lost, the contextual nature of information is all but destroyed, and the re-creation of the original XML document becomes impossible.

    Desired solution
    As can be seen, all immediate/short-term solutions are focused on reusing relational indexes that will require a very sophisticated two-way application-level mapping between XML and relational data. This mapping defeats the whole purpose of using the relational engine to reduce the code needed in the application layer.

    Therefore, the desired solution is for the database to have native XML indexes. There are many types of native XML indexes:

  • Path-based indexes: These are indexes that allow paths to be resolved without going through the whole document. Thus, the context-based queries can be immediately resolved using the path indexes.
  • Value indexes: These are indexes that allow fast retrieval of XML fragments based on the value within the elements. Thus, all predicated queries can be immediately resolved using the value indexes.
  • Link indexes: These are indexes that allow for faster navigation between fragments of XML data. In many cases, link indexes are useful when re-creating the XML fragment that the application needs.

    When all three indexes are available, entire XPath queries can be resolved using the indexes. This avoids the cost of parsing or traversing the entire XML document for each and every XPath query. An example is shown below.

    "/transaction[transactionData/tradeId >=10 and
    transactionData/tradeId <=20]//trade/tradeHeader"

    Consider the above XPath query to illustrate the usage of native XML indexes. The initial path (in blue) "/transaction" is resolved using a path index. The predicates (in green) "transactionData/tradeId >=10" and "transactionData/ tradeId >=10" are resolved using value indexes. The output fragment (in red) "trade//tradeHeader" is recreated using a link index.

    Query
    Immediate solution

    Where there is data, there are queries. When storage of XML documents has been added to the database, the query language for the database should be modified for searching within these XML documents. Currently, many, if not all, databases support XPath queries embedded inside SQL queries.

    As the IT developer knows, there are many XPath implementations available, so why would one choose the XPath processor inside the database? Apart from performance, is there any other gain in performing XPath queries inside the database?

    The answer lies in SQL-XML interoperability. In many cases, the application developer needs to qualify XML data based on SQL data or vice versa. For example, consider an application that qualifies an employee's XML entry based on some relational information. We will go back to using our XPath extensions to SQL

    select manager, ssn
    from employeetab, t1
    where
    "/hr/employee[manager='John Cox']
    [contacts/phone/mobile][name/@ssn =
    " + t1.ssn + "]/name/@ssn" xmltest xmlcol

    In the above query written using nonstandard XPath extensions to SQL, we use information stored in the relational database to qualify an XML document. There are many semantic problems with the above example - most of them caused by mixing of the relational data model with the hierarchical XML data model. That brings us to the desired solution.

    Desired solution
    Even if the XPath queries are embedded in SQL, they are not fully interoperable. Therefore, a fully functional solution to XML querying inside relational databases would:

  • Provide complete duality between the SQL as a query language and XML as relational data, i.e., SQL column values can be used to qualify XML data and vice versa.
  • Provide XQuery support for duality between XML Query Languages and relational/XML data stored in the RDBMS.
  • Provide a high-performance engine that uses age-old, relational optimization techniques to use the indexes described above.

    Combine
    Immediate solution

    In many cases, the basic assumption of data processing in a database implies local storage. But, in the XBRL example, the application developer has to fetch data from different divisions. This data could be transported to the databases as XML documents returned from a Web service, a plain text XML file in the file system, data available at a Web site, or plain relational data sent as a comma-separated file.

    The short-term solution for the application developer is to import all the external data inside the database. But this could create multiple copies of the data as well as create an issue of synchronization between the data sources. That brings us to the desired solution.

    Desired Solution
    Even though we used the colloquial term of "combine" to express data from heterogeneous sources, the better technical term is "federation." Most relational databases have architectures to federate sources of relational data. This infrastructure needs to be extended to streams of XML information available through wrappers, Web services, and Web sites. On the API side, given that these sources are hierarchical in nature, queries across these sources are best expressed in a language like XQuery rather than SQL.

    Transform
    Immediate solution

    Currently there are very few solutions for transformation of XML from one format to another. For example, what if an analyst wanted to use the XBRL document to create a report in RIML formats for a trader? What if the trader wanted to use this information to create an FpML (Financial Products Markup Language)-based trade? The current solutions lean very much toward additional clauses in SQL that let you generate different types of XML. However, these syntactical extensions to SQL are very restrictive as they provide a limited range of formats, are not declarative, and do not let users specify a series of nested XML transformations from one to another and so on.

    select * from balancesheets for xml

    In the above example, we transform relational data stored in the table balancesheets into XML data using a simple syntactical construct called "for xml." The results of this query are now generated in XML rather than the base types that application developers are accustomed to.

    Desired solution
    XQuery provides excellent syntactical support for generation of XML documents. A series of nested XQuery functions and queries could provide the infrastructure for transforming XML documents from one format to another.

    Apart from the ability to query XML data (a la SQL), the return clause in XQuery can be used to create XML documents of different formats. The next logical step would be to give complete interoperability between XQuery across relational and XML data.

    Using this detailed analysis, the IT department decides to use some immediate solutions around the database features described above. The focus of the IT project shifts from developing an XML infrastructure to using the robust XML infrastructure provided in the database. In some cases, the application developer might have to create a solution through another software piece or through homegrown solutions. But it's better to evaluate and use the infrastructure provided by existing relational database technology that has the credibility earned after years of research, development, and real-life customer usage.

    Conclusion
    As can be seen from each of the examples, there are many parts to the XML processing puzzle. Many of these parts fit quite elegantly with the roles that traditional databases have played in enterprise-class applications for many decades. For other parts of the puzzle, the relational databases will have to add new features and the relational database users will have a learning curve for these new features. Even though there are many obstacles, for most real-life XML usage, interoperability between relational and XML data cannot be ignored. Traditional characteristics like robustness and reliability cannot be sacrificed. In time, most relational databases will have features that will seamlessly bridge the gap across these different forms of data.

    Published Oct. 3, 2003— Reads 11,743
    Copyright © 2003 SYS-CON Media, Inc. — All Rights Reserved.
    Syndicated stories and blog feeds, all rights reserved by the author.
    About Anupam Singh
    Anupam Singh is a staff software engineer with Sybase, Inc. His work
    has focused on querying and indexing semi-structured data within an
    RDBMS engine. Mr. Singh is also a founding member and project lead of
    Sybase's XML Query Engine Development Team, which works within the
    Database Server Technology Group at Sybase. Because of the Query
    Engine Team's work in XML storage, indexing and querying, Sybase
    Adaptive Server Enterprise is the prominent selection of companies
    looking for an efficient system to incorporate XML into their IT
    systems.
    © Copyright 2003 Sybase, Inc. All Rights Reserved.

  • Add Your Feedback

    In order to post a comment you need to be registered and logged in.

    Register | Sign-in

    Reader Feedback: Page 1 of 1

    Subscribe to the World's Most Powerful Newsletters
    Subscribe to Our Rss Feeds & Get Your SYS-CON News Live!
    Click to Add our RSS Feeds to the Service of Your Choice:
    Google Reader or Homepage Add to My Yahoo! Subscribe with Bloglines Subscribe in NewsGator Online
    myFeedster Add to My AOL Subscribe in Rojo Add 'Hugg' to Newsburst from CNET News.com Kinja Digest View Additional SYS-CON Feeds
    Publish Your Article! Please send it to editorial(at)sys-con.com!

    Advertise on this site! Contact advertising(at)sys-con.com! 201 802-3021

    SYS-CON Featured Whitepapers

    ADS BY GOOGLE

    Breaking Java News
    AEGON UK CEO Adrian Grace Appointed to Management Board
    AEGON UK CEO Adrian Grace Appointed to Management Board
    SES: Recurring Revenue and EBITDA Growth of 2.8% and 3.1% Respectively
    Autism Care UK Launches Inspirational New Mission Statement for 2012 and Beyond
    Minera IRL Announces Commencement of Ollachea Exploration Tunnel
    Chiltern Wins Two Pharma Awards at the Prestigious Annual Pharmatimes Awards Ceremony
    Higher Gold Price Contributes to Improved Earnings
    OctoPlus announces 2011 annual results
    AEGON Maintains Strong Capital Position

    ADVERTISE   |   MAGAZINE SUBSCRIPTIONS   |   FREE BREAKING-NEWSLETTERS!   |   SYS-CON.TV   |   BLOG-N-PLAY!   |   WEBCAST   |   EDUCATION   |   RESEARCH

    .NET Developer's Journal - .NETDJ   |   ColdFusion Developer's Journal - CFDJ   |   Eclipse Developer's Journal - EDJ   |   Enterprise Open Source Magazine - EOS
    Open Web Developer's Journal - OPENWEB   |   iPhone Developer's Journal - iPHONE   |   Virtualization - Virtualization   |   Java Developer's Journal - JDJ   |   Linux.SYS-CON.com
    PowerBuilder Developer's Journal - PBDJ   |   SEO / SEM Journal - SJ   |   SOAWorld Magazine - SOAWM   |   IT Solutions Guide - ITSG   |   Symbian Developer's Journal - SDJ
    WebLogic Developer's Journal - WLDJ   |   WebSphere Journal - WJ   |   Wireless Business & Technology - WBT   |   XML-Journal - XMLJ   |   Internet Video - iTV
    Flex Developer's Journal - Flex   |   AJAXWorld Magazine - AWM   |   Silverlight Developer's Journal - SLDJ   |   PHP.SYS-CON.com   |   Web 2.0 Journal - WEB2
    Apache   |   CMS   |   CRM   |   HP   |   Oracle Journal   |   Perl   |   Python   |   Red Hat   |   Ruby on Rails   |   SAP   |   SaaS

    SYS-CON MEDIA:   ABOUT US   |   CONTACT US   |   COMPANY NEWS   |   CAREERS   |   SITE MAP
    SYS-CON EVENTS:   |  AJAXWorld Conference & Expo  |  iPhone Developer Summit  |  Cloud Computing Conference & Expo  |  SOA World Conference & Expo  |  Virtualization Conference & Expo
    INTERNATIONAL SITES:   India  |  U.K.  |  Canada  |  Germany  |  France  |  Australia  |  Italy  |  Spain  |  Netherlands  |  Brazil  |  Belgium
     Terms of Use & Our Privacy Statement     About Newsfeeds / Video Feeds
    Copyright ©1994-2008 SYS-CON Publications, Inc. All Rights Reserved. All marks are trademarks of SYS-CON Media.
    Reproduction in whole or in part in any form or medium without express written permission of SYS-CON Publications, Inc. is prohibited.
     
    close this window