High-Performance Batch Processing with Java Enterprise Edition
The benefits

By: Colin Hendricks
Nov. 14, 2007 07:45 AM

Enterprise software developers and corporate IT architects have established the Java Enterprise Edition (JEE) platform as a leading choice for building enterprise software applications. The platform is widely used for everything from eCommerce Websites to back office data aggregation systems. Its versatility and reliability as an enterprise computing platform are well established.

But this wasn't always so. Sun initially trumpeted Java as a desktop platform that would bring rich content to Web applications in the form of Java applets that run locally in a user's Web browser. It was also touted as a thick-client desktop application development tool that would be widely used to build applications that could run on any computer (remember write once, run anywhere?).

Sometime in the late nineties, Java application development took a sharp turn and ended up producing software that runs mostly on corporate servers rather than on corporate workstations. Today, a substantial portion of Web applications are delivered on the JEE platform.

Despite the "Enterprise" in its name, the JEE platform was principally designed for handling HTTP requests from Web browsers and performing some business logic in response to each request. It now includes many other technologies, but most of them are related to this mission.

However, as the complexity and variety of Web applications have grown, users and designers of these systems have found many uses for JEE beyond just responding to requests from a browser. Many of these uses involve common enterprise back office tasks such as batch processing of large volumes of data, and while the JEE platform was not originally designed for such purposes, it is versatile enough to provide viable solutions to these problems.

What Is a Batch?
Batch scenarios arise often in business software applications because of a conflict between the enterprise's desire to respond immediately to customer requests and its need to analyze the resulting transactions. Resolving this conflict requires the speedy capture of the initial transaction with no analysis, followed by a later batch process that aggregates or optimizes the data for reporting, analysis, archiving or some other large-volume task. It is a safe assumption that nearly every business does some kind of batch processing on its data.

The characteristics of the typical batch process include:
  • A long-running process that must occur on a regularly scheduled basis.
  • The volume of data to be processed is high, usually on the order of thousands to millions of database rows.
  • There may be complex logic or calculations to perform on the data.
  • The process may require a large set of data from some other system that is delivered in bulk at a specific time.
  • The process is run asynchronously from user interactions. It's not part of a user session in an online system. A user does not start it and is not waiting on it to complete.

Why Do Batch Processing in JEE?
The JEE specification was designed for online Web applications and has several limitations with respect to batch processing. For instance, JEE containers are required to manage the life cycle of Enterprise JavaBeans (EJB) and as such might limit the ability to create threads from within these classes.

However, this limitation can be overcome in a couple of ways. First, while most JEE containers discourage developers from creating and managing their own threads, they do not prohibit the practice, especially outside the bounds of EJB classes. Therefore, the batch process can do its own threading using the java.util.concurrent package (available as of Java 5), and on most JEE platforms this causes no trouble. This package provides user-friendly thread pool classes and thread management facilities that make it easier than ever to create multi-threaded applications in Java.
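As a minimal sketch of the first approach, the following uses a fixed-size thread pool from java.util.concurrent to process a list of records in parallel. The class name BatchPoolSketch, the processRecord method and the doubling "business logic" are assumptions made purely for illustration; they are not taken from the article. The lambda and diamond syntax also assume a newer Java version than the Java 5 release mentioned above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative sketch only; class and method names are hypothetical.
final class BatchPoolSketch {

    // Stand-in for the real per-record business logic (an assumption).
    static int processRecord(int record) {
        return record * 2;
    }

    // Runs processRecord on a fixed-size pool and returns results in input order.
    static List<Integer> processAll(List<Integer> records, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (Integer record : records) {
                Callable<Integer> task = () -> processRecord(record);
                futures.add(pool.submit(task));
            }
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> future : futures) {
                results.add(future.get()); // blocks until that task finishes
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("batch interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("record failed", e.getCause());
        } finally {
            pool.shutdown(); // always release the worker threads
        }
    }

    public static void main(String[] args) {
        System.out.println(processAll(Arrays.asList(1, 2, 3, 4), 2)); // prints [2, 4, 6, 8]
    }
}
```

Shutting the pool down in a finally block matters inside an application server, where leaked threads outlive the batch run.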

Second, a more spec-compliant approach to multithreading is to use Java Message Service (JMS) messages to create worker threads within the JEE context. This approach is a little more complex to implement but provides the benefits of complying with the JEE specification while also allowing the batch process to span multiple Java Virtual Machine (JVM) instances in a clustering situation. This will be discussed in more detail below.
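The message flow behind this second approach can be simulated outside a container. In the sketch below, an in-JVM BlockingQueue stands in for the JMS queue and plain threads stand in for the container-managed message consumers. That substitution is an assumption made so the example runs without a JMS provider: in a real JEE deployment the container, not the application, would supply the threads. All names (QueueWorkerSketch, dispatch, the STOP marker) are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Simulation only: a BlockingQueue replaces the JMS queue, threads replace container workers.
final class QueueWorkerSketch {

    // Sentinel "message" telling a worker to stop; compared by reference.
    private static final int[] STOP = new int[0];

    // Sends each chunk as a message, lets the workers sum the values, returns the grand total.
    static long dispatch(List<int[]> chunks, int workers) {
        BlockingQueue<int[]> queue = new LinkedBlockingQueue<>();
        AtomicLong total = new AtomicLong();
        Thread[] threads = new Thread[workers];
        for (int i = 0; i < workers; i++) {
            threads[i] = new Thread(() -> {
                try {
                    while (true) {
                        int[] chunk = queue.take(); // wait for the next message
                        if (chunk == STOP) {
                            return;
                        }
                        long sum = 0;
                        for (int value : chunk) {
                            sum += value; // stand-in for real work on the chunk
                        }
                        total.addAndGet(sum);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads[i].start();
        }
        queue.addAll(chunks);              // the dispatcher publishes one message per chunk
        for (int i = 0; i < workers; i++) {
            queue.add(STOP);               // one stop marker per worker
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("dispatcher interrupted", e);
            }
        }
        return total.get();
    }

    public static void main(String[] args) {
        List<int[]> chunks = Arrays.asList(new int[] {1, 2}, new int[] {3}, new int[] {4, 5, 6});
        System.out.println(dispatch(chunks, 2)); // prints 21
    }
}
```

Because each message carries a self-contained chunk of work, the same shape extends to consumers running in other JVMs, which is the clustering benefit described above.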

Another issue with batch processing on the JEE platform is that by default the container manages transactions and session timeouts. The JEE container is inclined to limit how long resources such as database connections, transactions and beans can be monopolized. This is meant to guarantee a high level of service to all users within an online application, but can be problematic for a long-running batch process.

This issue can be addressed by correctly configuring a batch process not to require JEE transactions and to avoid the use of entity beans and stateful session beans that might have timeout or locking problems. Also, be sure to use the pooled resources such as database connections judiciously, releasing them back to the pool when not in use.
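The advice about releasing pooled resources can be sketched as a borrow-per-chunk loop. The Pool and Lease classes below are hypothetical stand-ins for a container-managed connection pool and a borrowed connection; they are not a real JDBC or JEE API. The point is only the shape: hold the resource for the short write, then hand it back before the next chunk is prepared.

```java
import java.util.concurrent.Semaphore;

// Hypothetical stand-in types; a real batch would borrow connections from a container DataSource.
final class LeaseSketch {

    static final class Pool {
        private final Semaphore permits;

        Pool(int size) {
            permits = new Semaphore(size);
        }

        // Blocks until a resource is free, then hands out a lease on it.
        Lease borrow() {
            permits.acquireUninterruptibly();
            return new Lease(this);
        }

        int available() {
            return permits.availablePermits();
        }
    }

    static final class Lease implements AutoCloseable {
        private final Pool pool;
        private boolean closed;

        Lease(Pool pool) {
            this.pool = pool;
        }

        @Override
        public void close() {
            if (!closed) { // closing twice must not return two permits
                closed = true;
                pool.permits.release();
            }
        }
    }

    // Processes chunks one at a time, holding a lease only during the simulated write.
    static int processChunks(Pool pool, int chunkCount) {
        int written = 0;
        for (int i = 0; i < chunkCount; i++) {
            // Any expensive computation for the chunk would happen here, without a lease.
            try (Lease lease = pool.borrow()) {
                written++; // stand-in for a short database write
            } // the lease is returned to the pool here, even if the write throws
        }
        return written;
    }

    public static void main(String[] args) {
        Pool pool = new Pool(2);
        System.out.println(processChunks(pool, 5) + " chunks, " + pool.available() + " free");
    }
}
```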

In addition to these limitations there is a performance question. Other methods can achieve higher performance than the JEE platform. Batch processing typically involves operations on large volumes of rows stored in a relational database, and a stored procedure implemented directly in the database might offer the fastest performance for most applications. However, there are legitimate reasons to implement the logic in JEE instead.

  • Stored procedures are typically implemented in the version of SQL specific to the database platform and are not portable to other databases. This may not matter for a departmental application but is usually not acceptable for an enterprise software product that must be supported on many different databases.
  • The JEE platform provides complementary technology such as JCA connections to other systems, Web service calls to other services and other features that might be useful.
  • Logic implemented in Java can reuse other application logic that is also present in the business layer tier of the application.
  • Well-written Java code is usually easier to understand, maintain, and enhance than a collection of stored procedures.
  • JEE servers usually include clustering capabilities that provide the ability to federate multiple, cheap, commodity servers to improve batch processing performance.

These benefits will often outweigh any performance gain that might be achieved using stored procedures. Furthermore, the difference in performance between a Java solution and a database stored procedure solution can be minimized using the techniques described below.

Techniques for High-Performance Batch Processing on JEE
Now that we've covered the limitations and the alternatives, let's discuss how to architect a batch process on the JEE platform for maximum performance. Batch problems are clear candidates for multi-threaded solutions because the objective is to complete as much work as possible in the shortest time possible and no human interaction is necessary. Parallel processing using multiple threads is necessary to bring all available computing resources to bear on the problem. Today's multi-core, multi-CPU servers are especially well suited for multi-threaded processing.
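One way to bring every core to bear is to split the work into contiguous ranges, one per available processor, and run the ranges concurrently. The sketch below assumes the rows can be addressed by a numeric id from 0 to rowCount - 1 and uses summing the ids as a placeholder for real work. The names PartitionedBatchSketch, partition, processRange and runBatch are hypothetical, and this is not necessarily the design the article goes on to describe.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative sketch only; names and the id-range assumption are hypothetical.
final class PartitionedBatchSketch {

    // Splits [0, rowCount) into at most `parts` contiguous ranges {from, to}; parts must be > 0.
    static List<long[]> partition(long rowCount, int parts) {
        List<long[]> ranges = new ArrayList<>();
        long size = (rowCount + parts - 1) / parts; // ceiling division
        for (long start = 0; start < rowCount; start += size) {
            ranges.add(new long[] {start, Math.min(start + size, rowCount)});
        }
        return ranges;
    }

    // Placeholder for real per-row work: sums the row ids in [from, to).
    static long processRange(long from, long to) {
        long sum = 0;
        for (long id = from; id < to; id++) {
            sum += id;
        }
        return sum;
    }

    // Runs one task per range on a pool sized to the number of available processors.
    static long runBatch(long rowCount) {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        try {
            List<Callable<Long>> tasks = new ArrayList<>();
            for (long[] range : partition(rowCount, cores)) {
                tasks.add(() -> processRange(range[0], range[1]));
            }
            long total = 0;
            for (Future<Long> result : pool.invokeAll(tasks)) { // waits for all tasks
                total += result.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("batch interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("range failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runBatch(1000)); // prints 499500
    }
}
```

Sizing the pool to the processor count suits CPU-bound work; a batch that spends most of its time waiting on the database may benefit from a different size, which is something to measure rather than assume.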


Published Nov. 14, 2007— Reads 38,297 — Feedback 3
Copyright © 2007 SYS-CON Media, Inc. — All Rights Reserved.
Syndicated stories and blog feeds, all rights reserved by the author.
About Colin Hendricks
Colin Hendricks is CTO of Rome Corp. He has worked as a software developer and consultant on high-performance, server-side Java systems for the past 10 years.


Reader Feedback: Page 1 of 1

#3
Snehal Antani commented on 27 Jul 2008

Kalyan, to answer your questions:

"what are the hiccups?": a key issue with batch processing using Java and application servers relates to JDBC cursors, transactions, and holding cursors across transactions. Checkpointing - committing work periodically so you can restart the job if needed - is important in batch. Checkpointing is achieved by using transactions, JTA transactions specifically. Unfortunately, if you use a Type-4 JDBC driver with XA, you're not able to keep cursors open across transactions, so you cannot easily run a "select account from table1" type of query that retrieves all of the accounts to process and then apply some checkpoint strategy as you process those records. There are a few approaches to getting around this: first, using a stateful session bean (SFSB) pattern that we've built, where reads from the DB are done in a local transaction and writes to the database are done in the global transaction; second, executing smaller queries that are bounded by the checkpoint intervals instead of one very large query; third, if you are on z/OS and your data is in DB2 z/OS, using the Type-2 JDBC driver, which allows you to hold cursors across transactions; fourth, using Last Participant Support (LPS), which is the ability to use a single 1-PC resource in a 2-PC (XA) transaction. This problem will plague *every* java-batch solution and is a pain due to limitations in XA. The WebSphere XD Compute Grid (aka WebSphere Batch) forum has some posts on this topic, please feel free to ask more questions there: http://www-128.ibm.com/developerworks/forums/forum.jspa?forumID=1240&sta.... Within Compute Grid, we've built the SFSB pattern as part of our Batch Datastream Framework (BDS Framework) to make it simpler to leverage. Using LPS or Type-2 drivers is pretty straightforward in WebSphere.

Another important gotcha is workload management and ensuring your batch processing doesn't negatively impact your online transaction (OLTP) workloads (and vice versa). The only way to have a good solution in this area is to use a software stack that integrates with the database and the workload manager. Basically, you need an integrated batch and OLTP platform, not just a batch container.

"app's performance would depend on database specifics": yes, of course, but this is business-as-usual. DB vendors have their own knobs and runtime behaviors that will differ, therefore each has to be optimized in its own way.

"what sort of frameworks have you worked with": I've found Hibernate to not be very good for batch processing. You can read more about why here: http://forum.hibernate.org/viewtopic.php?t=988575&view=next&sid=0aada757.... I've seen customers use IBatis, OpenJPA, raw JDBC, Pure Query, and SQLJ/Static SQL. As the article mentions, getting down to the raw SQL query for Batch can be crucial for performance. I tend to stick to raw JDBC and I use the Batch Data Stream Framework (BDS Framework) to manage the connections, prepared statements, restarting, etc. You can read more about this at: http://www-128.ibm.com/developerworks/forums/thread.jspa?threadID=190623...

#2
Kalyan commented on 13 Nov 2007

This article looks pretty good in its content. Couple of questions though:

# Have you used this architecture on any of the systems that you have implemented? If so, what are the hiccups that you have come across?

# Though you discourage using stored procs despite their performance advantage, you say that one can tweak some database configuration to get better performance. Wouldn't this make the app's performance (though not its logic) dependent on database specifics?

Interacting with databases is the most important part of any batch processing application that has to save data to the persistent store. It'd be interesting to see what sort of frameworks (Hibernate, iBATIS, etc.) you have worked with in this kind of architecture.

#1
Snehal Antani commented on 13 Aug 2007

Interesting article. I recently published an article describing your Dispatcher-Worker pattern for highly parallel batch jobs in the context of WebSphere XD Compute Grid.

http://www.ibm.com/developerworks/websphere/techjournal/0707_antani/0707...

An interesting extension to your description is depicted in figure 6 of my article: establishing endpoint affinity, which enables new caching opportunities.

The minus with using straight JEE5 multi-threading packages versus building on an existing enterprise Java batch framework like Compute Grid: the developer would have to manage threading which, for enterprise adopters composed of large development teams, could be more trouble than it's worth.

