GemStone in Mainframe Migration

GemFire and Mainframe Migration

Maintaining a legacy mainframe in the 21st century has become both expensive and unnecessary. Expensive because rising transaction rates and data volumes have dramatically driven up costs, while the lack of cost-efficient, real-time data processing cycles prevents customers from detecting perishable opportunities, compliance issues, or fraud. Unnecessary because commodity distributed hardware, coupled with a distributed data-management system like GemFire, can offload mainframe data processing at a fraction of the economic and opportunity cost. A GemFire-based distributed computing environment offers:

  • Linear scalability at much lower cost
  • The ability to take advantage of cheaper hardware
  • Easier migration of applications to modern languages
  • Easier maintenance of applications and data

How do we do it? First, let's look at how legacy mainframe data forms can be brought into GemFire. Then we'll walk through a use case in which a batch process was migrated from the mainframe to a GemFire-enabled distributed environment. Finally, we'll review the stumbling blocks people typically associate with migrating off the mainframe.

Data Forms

Data exists in two primary forms on the mainframe: either ISAM files or DB2.

  • DB2 data can be extracted using standard ETL tools or by directly connecting GemFire loaders via JDBC.
  • ISAM files can be exported to a distributed world via FTP.
  • Any other data forms can be exported via CICS-based products like SOLA (Service Oriented Architecture).
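As an illustration of the FTP path, here is a hedged sketch of parsing one fixed-width record from an ISAM file export into a Java object. The field layout (account id, balance, record type) and its offsets are hypothetical; in a real migration the COBOL copybook defines the layout, and EBCDIC-to-ASCII conversion happens during or after the transfer.

```java
// Sketch: parse one fixed-width record from an ISAM file exported via FTP.
// The layout (account id at 0-9, balance at 10-21, type at 22-25) is a
// hypothetical example; real offsets come from the COBOL copybook.
public class IsamRecordParser {
    public static final class AccountRecord {
        public final String accountId;
        public final long balanceCents;
        public final String recordType;
        AccountRecord(String accountId, long balanceCents, String recordType) {
            this.accountId = accountId;
            this.balanceCents = balanceCents;
            this.recordType = recordType;
        }
    }

    public static AccountRecord parse(String line) {
        String accountId = line.substring(0, 10).trim();
        long balanceCents = Long.parseLong(line.substring(10, 22).trim());
        String recordType = line.substring(22, 26).trim();
        return new AccountRecord(accountId, balanceCents, recordType);
    }

    public static void main(String[] args) {
        AccountRecord r = parse("ACCT000042  0000001050FILL");
        System.out.println(r.accountId + " " + r.balanceCents + " " + r.recordType);
        // prints: ACCT000042 1050 FILL
    }
}
```

Once parsed, each record can be put into a GemFire region keyed by account id.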

It is often thought that compute grids will overwhelm the mainframe's I/O channel, but GemFire mitigates this concern by limiting the number of nodes that attach to the I/O channel.

Data Processing

Data processing takes two main forms in the mainframe environment.

  • OLTP – typically CICS-based
    • For most classes of applications, distributed processing handles transactions better than the mainframe does.
  • Batch – file in, crunch records, file out
    • Many of these are embarrassingly parallel tasks, ideally suited to distributed processing, where they can run more than 100x faster at a fraction of the cost.

Example of a batch process: Account Master File Maintenance.

This class of applications consists of three steps:

  • Loop over each account
  • Process each type of trade record (orders, fills, credit trades, options, futures, etc.)
  • Update account data (balance, portfolios, etc.)
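The three steps above form a classic embarrassingly parallel workload: each account can be processed independently. Here is a minimal single-machine sketch of the pattern; the `Trade` shape and balance rule are hypothetical, and in a GemFire deployment the account data would be partitioned across nodes with the same per-account function executed against each node's local partition.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: per-account batch maintenance as an embarrassingly parallel job.
// Each account is independent, so accounts are processed in parallel; a
// GemFire grid would partition the region by account id and run the same
// function on every node against its local data.
public class AccountBatch {
    // Hypothetical trade record: (accountId, signed amount in cents).
    record Trade(long accountId, long amountCents) {}

    // Steps 2 and 3 for one account: fold its trades into a new balance.
    static long applyTrades(long startingBalance, List<Trade> trades) {
        long balance = startingBalance;
        for (Trade t : trades) {
            balance += t.amountCents;  // each trade type would have its own rule
        }
        return balance;
    }

    // Step 1: loop over accounts, in parallel.
    static Map<Long, Long> runBatch(Map<Long, Long> balances,
                                    Map<Long, List<Trade>> tradesByAccount) {
        Map<Long, Long> updated = new ConcurrentHashMap<>();
        balances.keySet().parallelStream().forEach(accountId -> {
            List<Trade> trades = tradesByAccount.getOrDefault(accountId, List.of());
            updated.put(accountId, applyTrades(balances.get(accountId), trades));
        });
        return updated;
    }

    public static void main(String[] args) {
        Map<Long, Long> balances = Map.of(1L, 1000L, 2L, 500L);
        Map<Long, List<Trade>> trades = Map.of(
            1L, List.of(new Trade(1L, 250L), new Trade(1L, -100L)),
            2L, List.of(new Trade(2L, 50L)));
        Map<Long, Long> result = runBatch(balances, trades);
        System.out.println(result.get(1L) + " " + result.get(2L)); // 1150 550
    }
}
```

Because no account's result depends on any other account, the speedup scales with the number of workers, which is what makes this class of batch job such a good offload candidate.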

This example scales indefinitely by account (account counts can reach into the millions) and often by trade type, so it benefits greatly from moving to a scalable, distributed environment. In a recent use case, a batch that typically takes 120 minutes on a mainframe was reduced to 1 minute on a GemFire-enabled distributed compute grid comprised of:

  • 25 servers with 2 Quad Core Xeon 5570 2.93 GHz processors per server
  • Red Hat EL 5
  • 2 Gigabit aggregate Ethernet capacity
  • 4x 1 Gigabit switches
  • 16 JVMs per server with 8 GB heap per JVM

There are three primary approaches to offloading mainframe data processing:

  • Export the COBOL code as-is via tools like Micro Focus COBOL and rewrite only the file-handler I/O layer to take advantage of GemFire
  • Re-write the compute-intensive parts of the application in a modern language such as Java
  • Re-write entire applications (or CICS transactions) in a modern language
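The first approach hinges on isolating file I/O behind a seam so the ported business logic never changes. A hedged sketch of what that seam might look like follows; the `RecordStore` interface and both implementations are hypothetical, and a real GemFire-backed implementation would delegate to a partitioned region rather than the in-memory map used here.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: isolate the file-handler I/O layer behind an interface so that
// only the storage implementation changes when moving off the mainframe.
public class IoSeamDemo {
    // Hypothetical seam: ported business logic reads/writes only through this.
    interface RecordStore {
        String read(String key);
        void write(String key, String record);
    }

    // In-memory stand-in; a GemFire-backed implementation would delegate
    // these two calls to a partitioned region instead.
    static final class InMemoryStore implements RecordStore {
        private final Map<String, String> data = new HashMap<>();
        public String read(String key) { return data.get(key); }
        public void write(String key, String record) { data.put(key, record); }
    }

    // Business logic stays identical no matter which store is plugged in.
    static void postTrade(RecordStore store, String account, String trade) {
        String existing = store.read(account);
        store.write(account, existing == null ? trade : existing + ";" + trade);
    }

    public static void main(String[] args) {
        RecordStore store = new InMemoryStore();
        postTrade(store, "ACCT1", "FILL:100");
        postTrade(store, "ACCT1", "ORDER:50");
        System.out.println(store.read("ACCT1")); // FILL:100;ORDER:50
    }
}
```

With this structure, swapping the mainframe file handler for a GemFire region is a change to one class, not to the exported COBOL logic.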

What are the stumbling blocks that get in the way of offloading mainframe processing to the distributed world?

  • Reliability – there is a perception that the mainframe never fails, and that distributed processing systems fail frequently
    • The reality is that given modern compute and data-grid architectures, distributed systems can actually be more reliable than the mainframe
  • COBOL code – lack of developer talent that can port COBOL code to another language
    • This problem will only get worse over time. There will be even fewer COBOL programmers available in the next decade.
  • Assembler code – lack of developer talent that can port the Assembler code that handles I/O, etc., off the mainframe
    • This problem may not be as bad as it sounds. The same Assembler code was likely re-used again and again, so it may only need to be re-written once; and since the I/O patterns are very different, it is probably a pure re-write rather than a "port".

Aside from drastically reducing the economic and opportunity cost of mainframe data processing, what else does GemFire help with?

  • Reliability of the data-management layer
  • Horizontal partitioning of large data-sets
  • Co-location of related data
  • Load-balancing of data processing across multiple servers with locality of reference (i.e. data affinity)
  • Data off-boarding
    • GemFire can centralize the data extract process such that only one member of the data grid actually connects to DB2
    • GemFire provides lazy-load semantics on a cache "miss", though this requires either DB2 or SOLA on the mainframe
  • Data re-boarding
    • Once the processing is complete, GemFire can lazily re-load the data back to the mainframe if necessary.
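The lazy-load behavior above follows the standard read-through cache pattern: on a miss, the cache invokes a loader, stores the result, and serves later reads from memory. A minimal sketch of the pattern in plain Java, with the DB2/SOLA fetch simulated by an ordinary function (GemFire exposes this hook through its cache-loader callback):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Sketch: read-through (lazy-load) caching, the pattern GemFire applies on
// a cache miss. The "loader" stands in for a fetch from DB2 (or SOLA) on
// the mainframe; here it is simulated with a plain function.
public class LazyLoadCache<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader;  // invoked only on a miss

    public LazyLoadCache(Function<K, V> loader) {
        this.loader = loader;
    }

    public V get(K key) {
        // computeIfAbsent calls the loader once per missing key.
        return cache.computeIfAbsent(key, loader);
    }

    public static void main(String[] args) {
        // Simulated mainframe lookup: account id -> balance.
        LazyLoadCache<String, Long> cache =
            new LazyLoadCache<>(accountId -> {
                System.out.println("miss: loading " + accountId + " from DB2");
                return 1000L;  // pretend result of the DB2 query
            });
        System.out.println(cache.get("ACCT1")); // triggers the loader
        System.out.println(cache.get("ACCT1")); // served from the cache
    }
}
```

Re-boarding is the mirror image: a write-behind hook that pushes completed results back toward the mainframe, which GemFire likewise supports through its event callbacks.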

Migrating mainframe processing to a GemFire-enabled distributed environment is not only far cheaper than maintaining a legacy mainframe; in the use case above, it also cut batch run-time by a factor of 120. GemFire thus drastically reduces infrastructure costs while freeing up processing power, sharply lowering opportunity cost.