Apache®, Spark, and the flame logo are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by The Apache Software Foundation is implied by the use of these marks.
This article describes an end-to-end modernization plan for mainframe and midrange data sources.
Architecture
Download a Visio file of this architecture.
Dataflow
The following dataflow outlines a process for modernizing a mainframe data tier. It corresponds to the preceding diagram.
Mainframe and midrange systems store data in data sources, like file systems (VSAM, flat file, LTFS), relational databases (Db2 for z/OS, Db2 for IBM i, Db2 for Linux UNIX and Windows), or nonrelational databases (IMS, ADABAS, IDMS).
The object conversion process extracts object definitions from source objects. The definitions are then converted into corresponding objects in the target data store.
- SQL Server Migration Assistant (SSMA) for Db2 migrates schemas and data from IBM Db2 databases to Azure databases.
- Managed Data Provider for Host Files converts objects by:
- Parsing COBOL and RPG record layouts, or copybooks.
- Mapping the copybooks to C# objects that .NET applications use.
- Use a custom tool to convert database objects from Db2 to Azure Database for PostgreSQL. Note that you might have to request access to this tool.
- Third-party tools perform automated object conversion on nonrelational databases, file systems, and other data stores.
Data is ingested and transformed. Mainframe and midrange systems store their file system data in EBCDIC-encoded format in file formats like:
COBOL, PL/I, and assembly language copybooks define the data structure of these files.
a. FTP transfers mainframe and midrange file system datasets with single layouts and unpacked fields in binary format and corresponding copybook to Azure.
b. Data is converted. The Azure Data Factory custom connector is a solution developed by using the Host File client component of Host Integration Server to convert mainframe datasets.
Host Integration Server integrates existing IBM host systems, programs, messages, and data with Azure applications. Host Integration Server is a Host File client component that you can use to develop a custom solution for dataset conversion.
The Azure Data Factory custom connector is based on the open-source Spark framework, and it runs on Azure Synapse Analytics. Like other solutions, it can parse the copybook and convert data. Manage the service for data conversion by using the Azure Logic Apps Parse Host File Contents connector.
c. Relational database data is migrated.
IBM mainframe and midrange systems store data in relational databases like these:
These services migrate the database data:
- Data Factory uses a Db2 connector to extract and integrate data from the databases.
- SQL Server Integration Services handles various data ETL tasks.
d. Nonrelational database data is migrated.
IBM mainframe and midrange systems store data in nonrelational databases like these:
- IDMS, a network model database management system (DBMS)
- IMS, a hierarchical model DBMS
- Adabas
- Datacom
Third-party products integrate data from these databases.
Azure services like Data Factory and AzCopy load data into Azure databases and Azure data storage. You can also use third-party solutions and custom loading solutions to load data.
Azure provides many managed data storage solutions:
Azure services use the modernized data tier for computing, analytics, storage, and networking.
Client applications also use the modernized data tier.
Components
Data storage
- SQL Database is part of the Azure SQL family. It's built for the cloud and provides all the benefits of a fully managed and evergreen platform as a service. SQL Database also provides AI-powered automated features that optimize performance and durability. Serverless compute and Hyperscale storage options automatically scale resources on demand.
- Azure Database for PostgreSQL is a fully managed relational database service that's based on the community edition of the open-source PostgreSQL database engine.
- Azure Cosmos DB is a globally distributed multimodel NoSQL database.
- Azure Database for MySQL is a fully managed relational database service that's based on the community edition of the open-source MySQL database engine.
- SQL Managed Instance is an intelligent, scalable cloud database service that offers all the benefits of a fully managed and evergreen platform as a service. SQL Managed Instance has near-100% compatibility with the latest SQL Server Enterprise edition database engine. It also provides a native virtual network implementation that addresses common security concerns.
- Azure Data Lake Storage is a storage repository that holds large amounts of data in its native, raw format. Data lake stores are optimized for scaling to terabytes and petabytes of data. The data typically comes from multiple heterogeneous sources. It can be structured, semi-structured, or unstructured.
Compute
- Data Factory integrates data across different network environments by using an integration runtime (IR), which is a compute infrastructure. Data Factory copies data between cloud data stores and data stores in on-premises networks by using self-hosted IRs.
- Azure Virtual Machines provides on-demand, scalable computing resources. An Azure virtual machine (VM) provides the flexibility of virtualization but eliminates the maintenance demands of physical hardware. Azure VMs offer a choice of operating systems, including Windows and Linux.
Data integrators
- Azure Data Factory is a hybrid data integration service. In this solution, an Azure Data Factory custom connector uses the Host File client component of Host Integration Server to convert mainframe datasets. With minimal setup, you can use a custom connector to convert your mainframe dataset just as you'd use any other Azure Data Factory connector.
- AzCopy is a command-line utility that moves blobs or files into and out of storage accounts.
- SQL Server Integration Services is a platform for creating enterprise-level data integration and transformation solutions. You can use it to solve complex business problems by:
- Copying or downloading files.
- Loading data warehouses.
- Cleansing and mining data.
- Managing SQL Server objects and data.
- Host Integration Server technologies and tools enable you to integrate existing IBM host systems, programs, messages, and data with Azure applications. The Host File client component provides flexibility for data that's converted from EBCDIC to ASCII. For example, you can generate JSON/XML from the data that's converted.
- Azure Synapse Analytics brings together data integration, enterprise data warehousing, and big data analytics. The Azure Synapse conversion solution used in this architecture is based on Apache Spark and is a good candidate for large mainframe-dataset workload conversion. It supports a wide range of mainframe data structures and targets and requires minimal coding effort.
Other tools
- SQL Server Migration Assistant for Db2 automates migration from Db2 to Microsoft database services. When it runs on a VM, this tool converts Db2 database objects into SQL Server database objects and creates those objects in SQL Server.
- Data Provider for Host Files is a component of Host Integration Server that uses offline, SNA, or TCP/IP connections.
- With offline connections, Data Provider reads and writes records in a local binary file.
- With SNA and TCP/IP connections, Data Provider reads and writes records stored in remote z/OS (IBM Z Series Mainframe) datasets or remote i5/OS (IBM AS/400 and iSeries systems) physical files. Only i5/OS systems use TCP/IP.
- Azure services provide environments, tools, and processes for developing and scaling new applications in the public cloud.
Scenario details
Modern data storage solutions like the Azure data platform provide better scalability and performance than mainframe and midrange systems. By modernizing your systems, you can take advantage of these benefits. However, updating technology, infrastructure, and practices is complex. The process involves an exhaustive investigation of business and engineering activities. Data management is one consideration when you modernize your systems. You also need to look at data visualization and integration.
Successful modernizations use a data-first strategy. When you use this approach, you focus on the data rather than the new system. Data management is no longer just an item on the modernization checklist. Instead, the data is the centerpiece. Coordinated, quality-oriented data solutions replace fragmented, poorly governed ones.
This solution uses Azure data platform components in a data-first approach. Specifically, the solution involves:
- Object conversion. Converting object definitions from the source data store to corresponding objects in the target data store.
- Data ingestion. Connecting to the source data store and extracting data.
- Data transformation. Transforming extracted data into appropriate target data store structures.
- Data storage. Loading data from the source data store to the target data store, both initially and continually.
Potential use cases
Organizations that use mainframe and midrange systems can benefit from this solution, especially when they want to achieve these goals:
- Modernize mission-critical workloads.
- Acquire business intelligence to improve operations and gain a competitive advantage.
- Remove the high costs and rigidity that are associated with mainframe and midrange data stores.
Considerations
These considerations implement the pillars of the Azure Well-Architected Framework, which is a set of guiding tenets that can be used to improve the quality of a workload. For more information, see Microsoft Azure Well-Architected Framework.
Security
Security provides assurances against deliberate attacks and the abuse of your valuable data and systems. For more information, see Design review checklist for Security.
- Be aware of the differences between on-premises client identities and client identities in Azure. You need to compensate for any differences.
- Use managed identities for component-to-component data flows.
- When you use Data Provider for Host Files to convert data, follow the recommendations in Data Providers for Host Files security and protection.
Cost Optimization
Cost Optimization is about looking at ways to reduce unnecessary expenses and improve operational efficiencies. For more information, see Design review checklist for Cost Optimization.
- SQL Server Migration Assistant is a free, supported tool that simplifies database migration from Db2 to SQL Server, SQL Database, and SQL Managed Instance. SQL Server Migration Assistant automates all aspects of migration, including migration assessment analysis, schema and SQL statement conversion, and data migration.
- The Azure Synapse Spark-based solution is built from open-source libraries. It eliminates the financial burden of licensing conversion tools.
- Use the Azure pricing calculator to estimate the cost of implementing this solution.
Performance Efficiency
Performance Efficiency is the ability of your workload to scale to meet the demands placed on it by users in an efficient manner. For more information, see Design review checklist for Performance Efficiency.
- The key pillars of performance efficiency are performance management, capacity planning, scalability, and choosing an appropriate performance pattern.
- You can scale out the self-hosted IR by associating the logical instance with multiple on-premises machines in active-active mode.
- Azure SQL Database offers the ability to dynamically scale your databases. In a serverless tier, it can automatically scale the compute resources. Elastic Pool, which allows databases to share resources in a pool, can only be scaled manually.
When you use the Data Provider for Host Files client to convert data, turn on connection pooling to reduce the connection startup time. When you use Data Factory to extract data, tune the performance of the copy activity.
Contributors
This article is maintained by Microsoft. It was originally written by the following contributors.
Principal author:
- Ashish Khandelwal | Principal Engineering Architect Manager
Other contributors:
- Nithish Aruldoss | Engineering Architect
To see non-public LinkedIn profiles, sign in to LinkedIn.
Next steps
Review the Azure Database Migration Guides. Contact Azure Data Engineering - Mainframe & Midrange Modernization for more information.
See the following articles:
- IBM workloads on Azure
- Mainframe rehosting on Azure virtual machines
- Mainframe workloads supported on Azure
- Move mainframe compute to Azure