From: beall

IBM Software Group Presentation Template 

 

 
 
Tags:  software management  db2 zos  db2 connect  db2 performance tuning 
Views:  117
Published:  April 13, 2012
 
 
 
 
Notes:
 
Slide 1: Information Management. Data Warehousing with DB2 for z/OS. Paul Wirth, wirthp@us.ibm.com, IBM Software Group. © 2005 IBM Corporation
Slide 2: Information Management. Agenda: refocus on Business Intelligence and Data Warehousing; when DB2 for z/OS?; reference architecture and components; best practices / benchmarking.
Slide 3: Information Management. The terminology as used in this presentation:
- Business Intelligence (BI) and Data Warehousing (DWH) are sometimes used interchangeably. BI includes end-user tools for query, reporting, analysis, dashboarding, etc. Both concepts depend on each other.
- BI almost always assumes a Data Warehouse (DWH), Operational Data Store (ODS), or Data Mart (DM) exists with timely, trusted information.
  - An ODS is a subject-oriented database organized by business area; it is up to date (vs. historical) and detailed (vs. summarized).
  - A DM is a database designed to support the analysis of a particular business subject area; its data has usually been transformed and aggregated from the source DWH or operational system, and it can be relational, multidimensional (OLAP), or statistical in nature.
- A DWH depends on end-user tools that turn data into information.
- Both terms (DWH and BI) address the desire for timely, accurate, available data delivered when, where, and how the end users want it.
Slide 4: Information Management. Traditional topology (diagram): operational systems feed an operational data store and the enterprise data warehouse via extract, transform, load (ETL); dependent, independent, and line-of-business data marts are fed from the warehouse; metadata spans the ETL flow.
Slide 5: Information Management. DB2 for z/OS: "Back into the ring". The July 2006 announcement (www.ibm.com/common/ssi/rep_ca/1/897/ENUS206-181/ENUS206-181.PDF) announces availability of products to support the mission. The time is right; customer requirements have shifted toward "real-time" data access, Service Level Agreements that match operational ones, and operational BI and embedded analytics: deliver BI to customer-facing people and applications (a broad audience), integrate BI components with operational systems and information portals, handle a massive number of queries, and query semi-aggregated data, i.e. data aggregated at a low level. (Chart: the BI audience grew from fewer than 50 executives, managers, and analysts in 1985 to more than 10,000 customer-service users and customers by 2007.)
Slide 6: Information Management Operational Business Intelligence and Embedded Content – Customer Service View Operational Content (red) 6
Slide 7: Information Management Operational Business Intelligence and Embedded Content – a Customer Service View Embedded BI Object (blue) 7
Slide 8: Information Management Operational Business Intelligence and Embedded Content – a Customer Self Service View Operational Content alone delivers just data to information consumers. Operational Content (red) 8
Slide 9: Information Management Operational Business Intelligence and Embedded Content – a Customer Self Service View Business Intelligence content puts the data into perspective. Embedded BI Object (blue) 9
Slide 10: Information Management Dynamic Warehousing A New Approach to Leveraging Information Information On Demand to Optimize Real-Time Processes Dynamic Warehousing OLAP & Data Mining to Understand Why and Recommend Future Action Traditional Data Warehousing Query & Reporting to Understand What Happened 10
Slide 11: Information Management. DB2 for z/OS features that support DWHing:
- 64-bit addressability
- 2000-byte index keys
- MQTs
- Multi-row operations
- 225-way table joins
- In-memory workfiles
- Automatic space allocation
- Partition by growth
- Index compression
- Dynamic index ANDing
- New internal row structure for faster VARCHAR processing
- Fast delete of all the rows in a partition
- Deleting first n rows
- Non-uniform distribution statistics on non-indexed columns
- Skipping uncommitted inserted/updated qualifying rows
- Parallel sorting
- Data Partitioned Secondary Indexes
- 2 MB SQL statements
- Etc.
A whitepaper can be downloaded from: http://www.ibm.com/software/swlibrary/en_US/detail/A016040Z53841K98.html
Slide 12: Information Management. When should I consider a data warehouse on DB2 for z/OS (zDWH)?
- Data requires the highest levels of availability, resiliency, security, and recovery
- Need a true real-time Operational Data Store (ODS): the operational data is on System z and the ODS must be virtually in sync with it
- Embedded analytics and operational Business Intelligence with OLTP data on System z
- Keep existing data marts/warehouses on System z, consolidate data marts/warehouses on System z, or implement an enterprise data warehouse (EDW)
- SAP Business Warehouse when SAP R/3 is on System z
- Want to leverage and optimize existing System z skills and investment to service the mixed workload environment
Slide 13: Information Management. DWH solution architecture using DB2 for z/OS (diagram): a DB2 data sharing group with OLTP members (A, B) and DWH members (C, D) spread across two CECs with a coupling facility (CF). Within a data sharing environment, the data warehouse can reside in the same group as the transactional data.
Slide 14: Information Management. DWH solution architecture using DB2 for z/OS: transactional and warehouse data centralized and consolidated in one system. Transactional applications reach OLTP subsystems (members A, B, C) via an OLTP location alias; analytical applications reach OLAP subsystems (members D, E) via an OLAP location alias. All members see all data, but each member is optimized for a particular workload, with location aliases used for transaction routing. Shared resources are managed by Workload Manager (WLM) and the Intelligent Resource Director (IRD). A single-subsystem option exists for non-data-sharing environments.
Slide 15: Information Management. Initial load of the data warehouse. The data is extracted from the OLTP data sharing group (members A–D across two CECs with a CF), transformed by DataStage, and then loaded into the data warehouse tables. Distributed data sources (DB2 UDB for LUW, SQL Server, Oracle) are integrated directly through the IBM Information Server via JDBC/ODBC. Legacy data sources (VSAM, IMS, IAM, sequential files, Software AG Adabas, CA Datacom, CA IDMS) are integrated through the Classic Federation Server on z/OS; this way the data is extracted directly out of sources like IMS and VSAM. ETL is done using an ETL Accelerator, which in its current implementation is represented by DataStage (the IBM Information Server parallel engine) on a secondary system. Until the end of 2007, pSeries and xSeries are the only supported systems; later the ETL Accelerator will move to zLinux, as soon as the Information Server is available there.
Slide 16: Information Management. Access to data outside of DB2 for z/OS: WebSphere Classic Federation. All data access appears relational: the relational query processor accepts SQL via JDBC/ODBC, and connectors reach IMS, VSAM, ADABAS, and IDMS. Mapping information from the legacy data sources to the relational model is maintained with Classic Data Architect and updated through the DataServer. To integrate data into the DWH, the ETL server (DataStage) does a read access on the legacy data sources via SQL, just as with any other relational database, through the ODBC driver; the data is then stored in staging tables, or already aggregated, in the DB2 warehouse database.
Slide 17: Information Management. Incremental updates at runtime: WebSphere Classic Data Event Publisher. The Change Capture Agent (a logger exit) detects changes made by any application updating the legacy data source (IMS, VSAM, IDMS, ADABAS, ...). The Correlation Service maps the data, and the Distribution Service and Publisher send it to the ETL server via WebSphere MQ. More than one queue is set up for distributing the events; the Event Publisher also uses an administrative queue for synchronization and a recovery queue in case the logger exit fails. DataStage reads the change events directly from MQ and stores the changes in the DB2 data warehouse with low latency; optionally, changes can be stacked up in staging tables for a batch window.
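The change-capture flow the slide describes can be sketched as a publish/apply loop. This is a pattern illustration only: a Python stdlib queue and a dictionary stand in for WebSphere MQ and the DB2 warehouse table, and all names below are hypothetical.

```python
import queue

# Pattern sketch only: WebSphere MQ and DataStage are stood in for by a
# stdlib queue and a plain function. The point is the flow the slide
# describes: change events are published to a queue, and a consumer
# applies them to the warehouse with low latency.
events = queue.Queue()
warehouse = {}  # key -> row, standing in for the DB2 warehouse table

def publish(change):
    # Change Capture Agent + Distribution Service + Publisher side
    events.put(change)

def apply_changes():
    # DataStage side: drain the queue and apply each change event
    while not events.empty():
        op, key, row = events.get()
        if op == "DELETE":
            warehouse.pop(key, None)
        else:  # INSERT or UPDATE
            warehouse[key] = row

publish(("INSERT", 1, {"acct": 1, "bal": 100}))
publish(("UPDATE", 1, {"acct": 1, "bal": 250}))
apply_changes()
print(warehouse)  # → {1: {'acct': 1, 'bal': 250}}
```

The same consumer could instead stack changes into a staging table and apply them in a batch window, as the slide notes.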
Slide 18: Information Management. In-database ETL is triggered by DataStage. Wherever possible, "in database" transformations (ELT) are used to spare the transport of the data to the ETL Accelerator (pSeries, xSeries, zLinux); but the SQL is still sent from the ETL Accelerator to the database, to have one place of documentation for all ETL steps. This can also be used to shift the data up the hierarchy within the Layered Data Architecture. Simple example:

-- Aggregate salary by department into AGGRSALARY
INSERT INTO AGGRSALARY ( DEPTCODE, AVGBAND, AVGSALARY )
SELECT DEPTCODE, AVG( BAND ) AS AVGBAND, AVG( SALARY ) AS AVGSALARY
FROM STAFF
GROUP BY DEPTCODE
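The slide's ELT step can be tried end to end with any SQL engine. A minimal sketch, using SQLite as a stand-in for DB2 for z/OS; the STAFF and AGGRSALARY names come from the slide, while the sample rows are invented:

```python
import sqlite3

# Illustrative sketch of the slide's "in database" ELT aggregation,
# run against SQLite as a stand-in for DB2 for z/OS.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE STAFF (ID INTEGER, DEPTCODE TEXT, BAND INTEGER, SALARY REAL);
    CREATE TABLE AGGRSALARY (DEPTCODE TEXT, AVGBAND REAL, AVGSALARY REAL);
    INSERT INTO STAFF VALUES
        (1, 'A01', 5, 50000), (2, 'A01', 7, 70000),
        (3, 'B02', 4, 40000), (4, 'B02', 6, 60000);
""")
# The transformation runs inside the database engine (ELT); only the
# SQL statement travels from the ETL accelerator to the database.
con.execute("""
    INSERT INTO AGGRSALARY (DEPTCODE, AVGBAND, AVGSALARY)
    SELECT DEPTCODE, AVG(BAND), AVG(SALARY) FROM STAFF GROUP BY DEPTCODE
""")
print(con.execute("SELECT * FROM AGGRSALARY ORDER BY DEPTCODE").fetchall())
# → [('A01', 6.0, 60000.0), ('B02', 5.0, 50000.0)]
```

Because the rows never leave the engine, no data crosses the wire to the accelerator, which is exactly the transport saving the slide claims.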
Slide 19: Information Management. Information Server: data transformation and cleansing. QualityStage™ cleanses and DataStage® extracts/transforms; all data can be deployed as SOA services and fed to the Information Server.
Slide 20: Information Management. Enterprise data warehouse using DB2 for z/OS, the complete solution architecture (diagram): on z/OS, the DB2 data sharing group hosts the BI query workload, data warehouse, transformation, and deployment; legacy data sources are reached via WebSphere Classic Federation (ODBC, federation) and WebSphere Classic Data Event Publisher (change capture) feeding WebSphere MQ. On zLinux/xLinux/AIX, the Information Server (DataStage, QualityStage, metadata, SOA server) handles insert/update and extract/query, with analytics and reporting on top.
Slide 21: Information Management. Enterprise data warehouse using DB2 for z/OS, the complete solution architecture, zoomed out. This slide combines the pieces of the preceding slides: transactional applications routed via the OLTP location alias to OLTP subsystems (members A–C) and analytical applications (AlphaBlox, DataQuant, QMF) routed via the OLAP location alias to OLAP subsystems (members D, E), with WLM/IRD managing shared resources; the Information Server (DataStage, QualityStage, SOA server) for insert/update and extract/query; WebSphere Classic Federation for relational SQL access (JDBC/ODBC) to legacy sources (IMS, VSAM, ADABAS, IDMS), with mappings maintained in Classic Data Architect and updated through the DataServer; and WebSphere Classic Data Event Publisher, whose Change Capture Agent (logger exit), Correlation Service, Distribution Service, and Publisher feed change events through WebSphere MQ to the DataStage ETL server for low-latency updates of the DB2 data warehouse.
Slide 22: Information Management Alternative Architectures (Reporting & Analytics)  “Pure” System z BI Solution from a Data Perspective  ODS, DWH, DMs in DB2 z/OS  End User Tools (e.g. QMF, DataQuant, Business Objects, Cognos) access DB2 z/OS directly (fat client implementation) or via browser (web server implementation)  Reporting solution may run on distributed WAS, e.g. Alphablox, QMF, DataQuant, Cognos ReportNet, Business Objects Server  “Hybrid” BI Solution from a Data Perspective  ODS & Data Warehouse in DB2 z/OS  Relational, Multidimensional (OLAP) and Statistical Datamarts on System p and/or System x supporting End User Tools, e.g. DB2 DWE, Hyperion Essbase, Cognos PowerPlay 22
Slide 23: Information Management. Best practices: accurate requirements for solution right-fit; demonstration and POC systems at the Boblingen Lab BI CoC (Teraplex Center equivalent); papers in progress; work with IBM.
Slide 24: Information Management Capacity Planning Critical Elements Number of Users Amount of Data Size and Complexity of Query Workload Growth in Maintenance Workload Critical System Resources for a Balanced System CPU Central Storage I/O Channels Controllers (storage directors, cache, non-volatile storage) and disk Parallel Sysplex / Coupling Facility Resources (links, CPU, storage). 24
Slide 25: Information Management. DB2 sizing tool: a model based on workloads run in IBM lab environments, and continually refined. Classify queries into 5 categories (e.g. trivial, online, complex online, complex ad hoc) using the provided query category definitions; specify the percentage of data "touched" in each category and the size of the data warehouse. The tool computes Large Systems Performance Reference (LSPR) ratios for the data warehouse workload and the required capacity, including the zIIP offload percentage. An alternate method is to build a prototype and profile your own workload. Consider starting small and growing incrementally (a benefit of the System z DWH environment).
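The shape of that sizing arithmetic can be sketched in a few lines. This is not IBM's model: the category names, per-query costs, arrival rates, and the 55% offload figure below are all illustrative assumptions, used only to show how classified query volumes and a zIIP offload percentage combine into a capacity estimate.

```python
# Hypothetical sketch of the sizing arithmetic the slide describes:
# classify queries into categories, apply a per-category CPU cost, and
# split the total between general CPs and zIIP using an assumed offload.
workload = {
    # category: (queries per hour, CPU-seconds per query) -- assumed values
    "trivial":        (10000, 0.05),
    "small":          (2000, 1.0),
    "medium":         (500, 10.0),
    "complex_online": (100, 60.0),
    "complex_adhoc":  (20, 300.0),
}
ziip_offload = 0.55  # assumed fraction of work redirected to zIIP

total_cpu_sec = sum(n * cost for n, cost in workload.values())
ziip_sec = total_cpu_sec * ziip_offload
gcp_sec = total_cpu_sec - ziip_sec
print(f"total={total_cpu_sec:.0f}s/h  GCP={gcp_sec:.0f}s/h  zIIP={ziip_sec:.0f}s/h")
```

The real tool additionally folds in LSPR ratios and the percentage of data touched per category; profiling a prototype, as the slide suggests, replaces the assumed costs with measured ones.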
Slide 26: Information Management IBM zIIP leveraged by DWH workloads 1. Business Intelligence applications via DRDA® over a TCP/IP connection 2. Complex Parallel Queries  Star schema parallel queries (available June 30, 2006)  All other parallel queries (available July 31, 2006) 3. DB2 Utilities for Index Maintenance zIIP Specialty Engine http://www.ibm.com/systems/z/ziip 26
Slide 27: Information Management. What is a star schema? A star schema is a relational database schema for representing multidimensional data, sometimes graphically represented as a "star": data is stored in a central fact table, surrounded by additional dimension tables holding information on each perspective of the data. Example: store the "facts" of a sale (units sold, price, ...) with product, time, customer, and store keys in a central fact table, and store the full descriptive detail for each key in the surrounding dimension tables. This avoids redundantly storing information (such as the product description) for each individual transaction. Complex star schema parallel queries include joining several dimensions of a star schema data set (like promotion vs. product). If the workload uses DB2 for z/OS V8+ to join star schemas, then portions of that DB2 workload will be eligible to be redirected to the zIIP.
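A minimal star schema matching the slide's sales example can be built and queried in any SQL engine. The sketch below uses SQLite as a stand-in for DB2; the table names, columns, and sample rows are invented for illustration:

```python
import sqlite3

# Minimal star schema sketch: a central SALES fact table holding compact
# keys and measures, with descriptive detail stored once in PRODUCT and
# STORE dimension tables, as the slide describes.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE PRODUCT (PROD_KEY INTEGER PRIMARY KEY, DESCRIPTION TEXT);
    CREATE TABLE STORE   (STORE_KEY INTEGER PRIMARY KEY, CITY TEXT);
    CREATE TABLE SALES   (PROD_KEY INTEGER, STORE_KEY INTEGER,
                          UNITS INTEGER, PRICE REAL);
    INSERT INTO PRODUCT VALUES (1, 'widget'), (2, 'gadget');
    INSERT INTO STORE   VALUES (10, 'Austin'), (11, 'Boston');
    INSERT INTO SALES   VALUES (1, 10, 3, 9.99), (2, 10, 1, 19.99),
                               (1, 11, 2, 9.99);
""")
# A typical star join: the fact table is joined to several dimensions at
# once; it is queries of this shape that become zIIP-eligible on DB2.
rows = con.execute("""
    SELECT p.DESCRIPTION, s.CITY, SUM(f.UNITS) AS UNITS
    FROM SALES f
    JOIN PRODUCT p ON p.PROD_KEY = f.PROD_KEY
    JOIN STORE   s ON s.STORE_KEY = f.STORE_KEY
    GROUP BY p.DESCRIPTION, s.CITY
    ORDER BY p.DESCRIPTION, s.CITY
""").fetchall()
print(rows)
# → [('gadget', 'Austin', 1), ('widget', 'Austin', 3), ('widget', 'Boston', 2)]
```

Note how 'widget' and 'Austin' are stored once each in the dimensions, not repeated per transaction, which is the redundancy saving the slide calls out.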
Slide 28: Information Management zIIP Exploitation – DB2 Complex Parallel Query Activity Extend beyond the Star Schema… 28
Slide 29: Information Management. Focus on star schema: star schema workloads may benefit from two redirected tasks. 1. The 'main' task is the DRDA request: if the request comes in via DRDA over TCP/IP, it can take advantage of the DRDA use of zIIP, just like any other network-attached enterprise application. 2. The 'child' tasks are the star schema parallel queries: if the business intelligence and data warehousing application uses star schemas, a significant amount of this (star schema) processing is eligible to be redirected to the zIIP. The child (star schema) and main (DRDA via TCP/IP) tasks are additive: combining them is expected to yield a larger redirect than DRDA via TCP/IP alone. Longer-running queries see higher benefit. Benefits to a data warehousing application may vary significantly depending on the details of that application.
Slide 30: Information Management. BI distributed with parallel complex query (diagram): complex star schema parallel queries arriving via DRDA over a TCP/IP connection (from a BI application, over the network or HiperSockets™) have portions of their work directed to the zIIP. A highly utilized CP running DB2/DRDA, star schema, and batch work sees reduced utilization once portions of the eligible DB2 enclave SRB workload execute on the zIIP. For illustrative purposes only; single application only. Actual workload redirects may vary depending on how long the queries run, how much parallelism is used, and the number of zIIPs and CPs employed.
Slide 31: Information Management. Before the zIIP PTFs: 1. a query enters the system; 2. it is sliced into parallel tasks, classified via WLM; 3. each parallel task accumulates Service Units (SUs), with the total aggregated across all tasks to determine when to invoke period switch. Example: the query requires 50,000 SUs to complete, and the WLM definition is Period 1, Importance 2, 1000 SUs; Period 2, Importance 4, 4000 SUs; Period 3, Importance 5, "the rest". Assuming even distribution/usage across all 5 GCPs: after 200 SUs on each CP (total 1000 for the query), tasks move to Period 2; after 800 more SUs on each CP (overall total of 5000 for the query), tasks move to Period 3.
Slide 32: Information Management. After the PTFs, zIIP redirect execution: 1. a query enters the system; 2. it is sliced into parallel tasks, classified via WLM; 3. each parallel task accumulates Service Units, with period switch determined for each task, not the overall query. Same WLM definition; the query requires 50,000 SUs to complete. After a task gets 1000 SUs on CP/zIIP, it moves to Period 2 (a total of 5000 SUs for the overall query); after 4000 more SUs on CP/zIIP, the task moves to Period 3 (an accumulated total of 25,000 SUs for the overall query). If the query uses 5 parallel tasks and we assume no zIIP-on-CP time: 2680 SUs run on CP and 2320 on zIIP in Period 1; 4000 SUs run on CP and 16,000 on zIIP in Period 2; 5000 SUs run on CP and 20,000 on zIIP in "the rest".
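The per-task accounting the slide walks through can be checked with a short simulation. A sketch under the slide's stated assumptions (a 50,000-SU query split evenly over 5 parallel tasks, with per-task period boundaries at 1,000 and 1,000 + 4,000 SUs); the function name is invented:

```python
# Sketch of the post-PTF accounting: each parallel task ages through WLM
# periods on its own accumulated service units (SUs), rather than the
# whole query aging on the aggregate total.
def su_by_period(query_sus, n_tasks, boundaries=(1000, 5000)):
    """Return total SUs consumed in each of the three periods,
    assuming the query's SUs are spread evenly over its tasks."""
    per_task = query_sus / n_tasks
    totals = [0.0, 0.0, 0.0]
    for _ in range(n_tasks):
        totals[0] += min(per_task, boundaries[0])
        totals[1] += min(max(per_task - boundaries[0], 0),
                         boundaries[1] - boundaries[0])
        totals[2] += max(per_task - boundaries[1], 0)
    return totals

print(su_by_period(50000, 5))  # → [5000.0, 20000.0, 25000.0]
```

This reproduces the slide's milestones: the tasks leave Period 1 after 5,000 SUs of the overall query and enter Period 3 at an accumulated 25,000 SUs; how each period's SUs then split between CP and zIIP depends on the redirect fraction.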
Slide 33: Information Management. zIIP redirect: zIIP processors offer significant hardware and software savings. The number of zIIP processors cannot exceed the number of general processors in a physical server; however, an LPAR can be configured to contain more zIIPs than general processors. A percentage of parallel task activity is eligible to run on zIIP; the actual offload percentage depends on the ratio of parallel to non-parallel work, thresholds, available zIIP capacity, and the z/OS dispatcher algorithm. RMF and OMEGAMON reports provide a projection of the offload percentage prior to installing zIIP processors, but the actual offload will probably be slightly lower. Benchmark and internal workloads indicate an offload between 50% and 80% with a typical mix of queries.
Slide 34: Information Management. zIIP experiences (charts). The first bar represents the query processing execution time without parallel processing (2 Gen CPs); the second bar represents the same query workload when using parallelism on general processors; the last bar represents the query execution time when leveraging a zIIP engine (2 CPs / 1 zIIP). The dark bars represent the processing cycles consumed by the query workload on the general processing engines, and the blue bars reflect the processing that was redirected to the available zIIP engine.
Slide 35: Information Management Speaking of parallelism… … is DB2 for z/OS a “Parallel Database”?  Query I/O parallelism  Manages concurrent I/O requests for a single query, fetching pages into the buffer pool in parallel. This processing can significantly improve the performance of I/O-bound queries.  Query CP parallelism  Enables true multitasking within a query. A large query can be broken into multiple smaller queries. These smaller queries run simultaneously on multiple processors accessing data in parallel reducing the elapsed time for each query. Starting with DB2 V8, the parallel queries exploit zIIPs when they are available on the system thus reducing the costs.  Sysplex query parallelism  To further expand the processing capacity available for processor-intensive queries, DB2 can split a large query across different DB2 members in a data sharing group, known as Sysplex query parallelism. 35
Slide 36: Information Management. DB2 for z/OS parallelism, another graphic. V3: query I/O parallelism, a single execution unit with multiple I/O streams. V4: CP parallelism, multiple execution units (a TCB originating task plus SRB parallel tasks), each with a single I/O stream. V5: Sysplex query parallelism, parallel tasks spread across a Parallel Sysplex. (Each stage draws on processor, memory, and I/O channel resources.)
Slide 37: Information Management. Parallel degree determination: DB2 for z/OS has the flexibility to choose the degree of parallelism. The degree determination engine considers the number of CPs, processor speed, the number of I/Os that qualify, and data skew to arrive at an optimal degree (e.g. parallel tasks #1–#3). Degree determination is done by the DBMS, not the DBA, using statistics and the costs of the query to provide the optimal access path with low overhead, taking data skew into consideration.
Slide 38: Information Management. Trust with limits: set the maximum degree of parallelism. Set the maximum degree between the number of CPs and the number of partitions: for CPU-intensive queries, closer to the number of CPs; for I/O-intensive queries, closer to the number of partitions. Data skew can reduce the number of degrees.
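That rule of thumb can be written down as a simple interpolation. A sketch only: the linear weighting below is an illustrative assumption, not a DB2 formula, and the function name is invented.

```python
# Rule-of-thumb sketch of the slide's guidance: choose the maximum degree
# of parallelism between the CP count and the partition count, leaning
# toward CPs for CPU-bound queries and toward partitions for I/O-bound
# queries. The linear interpolation is an illustrative assumption.
def suggested_max_degree(n_cps, n_partitions, io_intensity):
    """io_intensity: 0.0 for purely CPU-bound, 1.0 for purely I/O-bound.
    Assumes n_partitions >= n_cps, the usual warehousing case."""
    return round(n_cps + (n_partitions - n_cps) * io_intensity)

print(suggested_max_degree(8, 64, 0.0))  # CPU-intensive → 8
print(suggested_max_degree(8, 64, 1.0))  # I/O-intensive → 64
print(suggested_max_degree(8, 64, 0.5))  # mixed → 36
```

As the slide notes, data skew would then pull the effective degree below whatever maximum is set this way.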
Slide 39: Information Management. DB2 for z/OS partitioning (range). Partitioning in DB2 for z/OS V8 and beyond is defined at the table level, with a maximum of 4096 partitions; DB2 can generate a maximum of 254 parallel operations. You can effectively cluster by two dimensions. Partition by growth is a DB2 9 feature that relates (in a way) to hash partitioning. Secondary indexes can be partitioned like the underlying data (DPSI) or non-partitioned (NPSI). (Diagram: a partitioned table of rows keyed by month and state, with a DPSI and an NPSI built over it.)
Slide 40: Information Management. Partition by time: each partition holds data for a certain period (days, weeks, months, years, etc.), with possible data skew due to seasonal factors. Ease of operations: enables rolling off old data at regular intervals, backing up only the latest data, and coexistence of data refresh and queries (load into the current period). (Diagram: partitions for Jan 2007 through Dec 2007, plus an empty partition.)
Slide 41: Information Management. Partition by time, advantages: queries consume fewer resources, since DB2 uses partition scanning rather than scanning the entire table; query response times stay consistent over time, because adding history to the database does not affect query elapsed time; potentially smaller degrees of parallelism. Data rolling: ALTER TABLE ... ROTATE PARTITION FIRST TO LAST. (Diagram: partitions 2003–2007; 2003 is archived, and after data rolling the partitions hold 2004–2008 plus an empty partition.)
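The rotate operation the slide shows can be pictured as a list rotation. A minimal sketch, with a plain Python list standing in for the table's partition map; the function name is invented:

```python
# Sketch of "ALTER TABLE ... ROTATE PARTITION FIRST TO LAST": the oldest
# period's partition is rolled off (archived) and its slot is reused as
# the new, empty current partition.
def rotate_first_to_last(partitions):
    """partitions: oldest-first list of period labels.
    Returns (archived_label, rotated_partition_list)."""
    archived = partitions[0]
    rotated = partitions[1:] + ["empty"]
    return archived, rotated

parts = ["2003", "2004", "2005", "2006", "2007"]
archived, parts = rotate_first_to_last(parts)
print(archived, parts)
# → 2003 ['2004', '2005', '2006', '2007', 'empty']
```

This mirrors the slide's diagram: 2003 goes to the archive, and the table ends up holding 2004 through the new empty current partition without ever growing its partition count.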
Slide 42: Information Management Data Compression  DB2 compression should be strongly considered in a data warehousing environment.  Savings is generally 50% or more.  Data warehousing queries are dominated by sequential prefetch accesses, which benefit from DB2 compression.  Newer generation of z processors implement instructions directly in circuits, yielding low single digits of overhead. Index access only - no overhead.  More tables can be pinned in buffer pools.  Index compression supported in V9 42
Slide 43: Information Management. Hardware-assisted data compression (charts): compression ratios achieved ranged from 46% to 61% (bars at 53%, 46%, 61%, and 60%), and elapsed-time bars compare compressed vs. non-compressed runs for I/O-intensive and CPU-intensive workloads, broken into I/O wait, CPU, and compress CPU overhead (recoverable chart values: 283, 354, 412, 281, 200, and 157 seconds).
Slide 44: Information Management. DB2 compression study: 5.3 TB of uncompressed raw data plus 1.6 TB of indexes, system, work, etc. totaled 6.9 TB; with DB2 hardware compression the raw data shrank to 1.3 TB (75+% data space savings), for a total of 1.3 TB + 1.6 TB = 2.9 TB (58% disk savings).
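The study's two savings figures follow directly from its sizes, and the arithmetic is worth seeing in one place: the raw data compresses, while the index/system/work space is unchanged in this study, which is why the disk savings (58%) is lower than the data space savings (75%).

```python
# Checking the slide's compression arithmetic: 5.3 TB of uncompressed raw
# data shrinks to 1.3 TB, while the 1.6 TB of indexes, system, and work
# space stays the same size in this study.
raw, raw_compressed, other = 5.3, 1.3, 1.6

data_savings = 1 - raw_compressed / raw            # raw data only
disk_savings = 1 - (raw_compressed + other) / (raw + other)  # whole footprint
print(f"data space savings: {data_savings:.0%}, disk savings: {disk_savings:.0%}")
# → data space savings: 75%, disk savings: 58%
```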
Slide 45: Information Management Workload Management  Traditional workload management approach: Screen queries before they start execution  Time consuming for DBAs.  Not always possible. Some large queries slip through the crack.  Running these queries degrades system performance.  Cancellation of the queries wastes CPU cycles. 45
Slide 46: Information Management. Workload management, think about this: large-query submitters expect their answer later; short-query submitters expect their answer now. Who is impacted more in real time? The ideal workload manager policy for data warehousing is consistent favoring of shorter-running work through WLM period aging (keep them short, with no need to pre-identify short queries), combined with selective favoring of critical business users through explicit WLM prioritization. (Chart: priority by query type, trading off period aging against business importance from high to low.)
Slide 47: Information Management. Query monopolization (diagram): end users feed work queues that compete for the processors.
Slide 48: Information Management. Inconsistent response times for short-running queries (chart): depending on workload activity, the same short query can take 5 seconds on Monday and 5 minutes on Tuesday.
Slide 49: Information Management. Workload Manager overview: business objectives are expressed as WLM rules that map work to service classes (e.g. Critical, Ad hoc, IWEB, DB2, DDF, JES2, OMVS, STC, Intelligent Miner, Warehouse Refresh) and, for monitoring, to report classes (e.g. Marketing, Sales, Headquarters, Test).
Slide 50: Information Management. Service classification:

Period  Duration (SUs)  Performance Goal  Importance
1       5,000           Velocity = 80     2
2       50,000          Velocity = 60     2
3       1,000,000       Velocity = 40     3
4       30,000,000      Velocity = 30     4
5       (the rest)      Discretionary     -
Slide 51: Information Management. Query monopolization, solution: runaway queries cannot monopolize system resources. Test with 50 concurrent users, then 4 added "killer" queries, which are aged to a low-priority class. Period definitions: period 1, duration 5,000, velocity 80, importance 2; period 2, duration 50,000, velocity 60, importance 2; period 3, duration 1,000,000, velocity 40, importance 3; period 4, duration 10,000,000, velocity 30, importance 4; period 5, discretionary. Average response time in seconds (base vs. +4 killer queries): trivial 12 vs. 10; small 174 vs. 181; medium 456 vs. 461; large 1468 vs. 1726.
Slide 52: Information Management. Consistent response time for short-running work (chart): average query elapsed time in seconds for trivial, small, medium, and large queries remains consistent as concurrency grows from 20 to 50, 100, and 200 users.
Slide 53: Information Management. High-priority queries, service classification. Service Class = QUERY: period 1, duration 5,000, Velocity = 80, importance 1; period 2, duration 20,000, Velocity = 70, importance 2; period 3, duration 1,000,000, Velocity = 50, importance 3; period 4, Velocity = 30. Service Class = CRITICAL: a single period, Velocity = 90, importance 1.
Slide 54: Information Management. Dynamic warehouses, service classification: two WLM policies that differ in how refresh work is treated. WLM POLICY = BASE: service class QUERY, periods 1–4 at importance 2/3/4/5 with Velocity = 80/60/40/20, then discretionary; service class Insert/UTIL, importance 5, Velocity = 10. WLM POLICY = REFRESH: service class QUERY, the same periods (importance 2/3/4/5, Velocity = 80/60/40/20, then discretionary); service class Insert/UTIL, importance 1, Velocity = 80.
Slide 55: Information Management Planning for DASD Storage and I/O Bandwidth Rules of Thumb offered by IBM include allowances for: Indexes Tables Free space Work files Active and archive logs DB2 directory and catalog Temporary tables MQTs Balance bandwidth with available processing power ROTs available based on current processor ratings 55
Slide 56: Information Management. Implementation: carefully plan DB2 data set placement; balance I/O activity among different volumes, control units, and channels to minimize I/O elapsed time and I/O queuing; use DFSMS. Sort work files should be large, many, and spread: the example shows sort workfile distribution for a 4-member data sharing group (members A–D across two CECs/LPARs with a CF), with each member's work files (SWF A.1–A.5 through D.1–D.5) spread across volumes Vol1–Vol5. Use Dynamic Parallel Access Volumes (PAV), which give multiple addresses / unit control blocks rather than just multiple paths/channels; the Modified Indirect Access Word (MIDAW) facility for increased channel throughput; and VSAM data striping.
Slide 57: Balanced I/O Configuration

- IBM ran tests to show how fast a zSeries processor can scan data with different degrees of SQL complexity.
  - The most recent tests were based on z900 processors.
  - System z9 scan rates are projected from the LSPR ratio.
  - System z9 scan rates will be confirmed from benchmark results (analysis not yet done).
- IBM Storage provides tools to your IBM team to project the bandwidth of DS8000 subsystems.
- Balance CPU and I/O configurations by matching up scan rates.
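The last bullet, matching scan rates, reduces to one comparison: the aggregate rate at which the engines can scan data is weighed against the bandwidth the storage subsystem can sustain. A sketch with made-up numbers (actual per-engine scan rates and DS8000 bandwidth projections must come from the measurements and IBM tools mentioned above):

```python
def io_bound(engines, scan_rate_mb_per_engine, storage_bandwidth_mb):
    """Return True if the storage subsystem, not CPU, would limit scan throughput."""
    cpu_demand = engines * scan_rate_mb_per_engine  # MB/s the engines can consume
    return cpu_demand > storage_bandwidth_mb

# Illustrative numbers only - not actual z9 or DS8000 ratings.
print(io_bound(engines=8, scan_rate_mb_per_engine=60, storage_bandwidth_mb=400))  # True: add I/O bandwidth
print(io_bound(engines=4, scan_rate_mb_per_engine=60, storage_bandwidth_mb=400))  # False: configuration is balanced
```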
Slide 58: SMS Implementation

- Use DB2 storage groups (to define and manage data sets) in conjunction with SMS storage classes / groups (for data set allocation)
- Spread tables, indexes, and work files
- Use simple ACS routines with high-level qualifiers (HLQs) to direct data sets to the appropriate storage class / storage group
- Storage groups requiring maximum concurrent bandwidth should consist of addresses spread across the I/O configuration
- For storage classes, set the Initial Access Response Time (IART) parameter to a nonzero value so SMS will use its internal randomization algorithm for final volume selection
Slide 59: Planning for Central Storage (see also the Installation Guide)

Virtual storage categories, with factors for estimating real storage (real ≈ your virtual size × factor):
  Category                        Default [KB]  Factor  Your size
  EDM pool storage                33,600        1.0
  Buffer pool storage             104,000       1.0
  Sort pool                       2,000         0.5
  RID pool                        8,000         0.5
  Data set control block storage  17,928        0.6
  Code storage                    30,000        0.5
  Working storage                 55,800        1.0
  Total main storage (above 16-MB line): 252,328
  Region size (below 16-MB line, assuming SWA above the line): 1,160
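The worksheet boils down to one multiplication per row (estimated real storage = virtual size × factor) plus a sum for the above-the-line total. A small sketch using the default sizes and factors as printed on the slide (note that the printed defaults sum to 251,328 KB, slightly off the slide's printed total of 252,328 KB, so one printed value is likely a transcription slip):

```python
# (default virtual size in KB, real-storage factor) per category, from the slide.
CATEGORIES = {
    "EDM pool":                (33_600, 1.0),
    "Buffer pools":            (104_000, 1.0),
    "Sort pool":               (2_000, 0.5),
    "RID pool":                (8_000, 0.5),
    "Data set control blocks": (17_928, 0.6),
    "Code":                    (30_000, 0.5),
    "Working storage":         (55_800, 1.0),
}

virtual_total = sum(size for size, _ in CATEGORIES.values())
real_total = sum(size * factor for size, factor in CATEGORIES.values())
print(f"Virtual (above 16-MB line): {virtual_total} KB")
print(f"Estimated real storage:     {real_total:.0f} KB")
```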
Slide 60: Large Memory Configurations

- Up to 512 GB of central memory on a single z9 server
- Benchmark workload testing shows improved performance for certain queries:
  - Higher buffer pool hit ratios
  - Fewer I/Os
  - Reduced CPU consumption
- Optimal performance requires a good understanding of the query workload and database design
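The "higher buffer pool hit ratios" bullet is usually quantified as the fraction of getpage requests satisfied without a synchronous read, i.e. (getpages − sync reads) / getpages. A small sketch with made-up counter values:

```python
def hit_ratio(getpages, sync_reads):
    """Fraction of getpage requests satisfied from the buffer pool."""
    return (getpages - sync_reads) / getpages

# Made-up counters: growing the pool cuts synchronous reads.
print(f"{hit_ratio(1_000_000, 200_000):.0%}")  # 80%
print(f"{hit_ratio(1_000_000, 50_000):.0%}")   # 95%
```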
Slide 61: Noteworthy zPARMs in a Data Warehouse Environment

  zPARM     Recommendation for DWH  Comments
  CDSSRDEF  ANY                     Allow parallelism for DW (ANY: parallelism; 1: no parallelism)
  CONTSTOR  NO                      For best performance, specify NO; specify YES to resolve storage constraints in the DBM1 address space (see also MINSTOR)
  DSVCI     YES                     The DB2-managed data set has a VSAM control interval that corresponds to the buffer pool used for the table space
  MGEXTSZ   YES                     Secondary extent allocations for DB2-managed data sets are sized according to a sliding scale
  MINSTOR   NO                      For best performance, specify NO; specify YES to resolve storage constraints in the DBM1 address space (see also CONTSTOR)
  OPTCCOS4  ON                      Enables fix PK26760 (inefficient access plan)
  OPTIXIO   ON                      Provides stable I/O costing with significantly less sensitivity to buffer pool sizes (the new default and recommended setting)
  OPTIORC   ON                      (explanation pending)
  PARAMDEG  X                       #processors <= X <= 2 × #processors; if the concurrency level is low, the ratio can be higher
  SRTPOOL   8000                    Means an 8-MB sort pool
  STARJOIN  DISABLE                 Unless SAP BW on z/OS is used
  MXQDC     15                      Default is 15; if TABLES_JOINED_THRESHOLD is changed, set MXQDC = TABLES_JOINED_THRESHOLD × (2**N) − 1
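These parameters are ultimately assembled in the DSN6SPRM macro of the DSNTIJUZ installation job. The fragment below is only a sketch of how the recommended values might look there: it is not column-accurate assembler, the PARAMDEG value of 16 is merely an example for an 8-way (2 × #processors), and every parameter should be verified against the installation guide for your DB2 version.

```
DSN6SPRM CDSSRDEF=ANY,        allow query parallelism
         CONTSTOR=NO,         best performance (YES only under DBM1 pressure)
         DSVCI=YES,           VSAM CI size matches buffer pool page size
         MGEXTSZ=YES,         sliding-scale secondary extents
         MINSTOR=NO,          best performance (YES only under DBM1 pressure)
         OPTIXIO=ON,          stable I/O costing
         PARAMDEG=16,         example only: 2 x 8 processors
         SRTPOOL=8000,        8-MB sort pool
         STARJOIN=DISABLE     unless SAP BW on z/OS is used
```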
Slide 62: Why DWH on System z?

- Superior qualities of service
  - Availability
  - Security and regulatory compliance
  - Scalability
  - Backup and recovery
- Positioned for the future
  - Web-based applications
  - XML support
  - Service-Oriented Architecture (SOA)
- Better leverage of System z skills and investment
- Keeping operational data and the ODS together means:
  - Reduced complexity
  - Reduced cost
  - Shared processes, tools, and procedures
  - Streamlined compliance and security
- The zIIP specialty engine improves TCO
Slide 63: Suggested reading list (The Classics)

- DB2 for OS/390 Capacity Planning, SG24-2244
- Capacity Planning for Business Intelligence Applications: Approaches and Methodologies, SG24-5689
- Building VLDB for BI Applications on OS/390: Case Study Experiences, SG24-5609
- Business Intelligence Architecture on S/390 Presentation Guide, SG24-5641
- e-Business Intelligence: Data Mart Solutions with DB2 for Linux on zSeries, SG24-6294
- www.redbooks.ibm.com
Slide 64: New (sort of, in some cases) Releases

- Best Practices for SAP Business Information Warehouse on DB2 for z/OS V8, SG24-6489
- DB2 UDB for z/OS: Design Guidelines for High Performance and Availability, SG24-7134
- Business Performance Management . . . Meets Business Intelligence, SG24-6340
- Preparing for DB2 Near-Realtime Business Intelligence, SG24-6071
- Disk storage access with DB2 for z/OS, REDP-4187
- How does the MIDAW facility improve the performance of FICON channels using DB2 and other workloads?, REDP-4201
- Index Compression with DB2 9 for z/OS, REDP-4345
- System Programmer's Guide To: Workload Manager, SG24-6472
- www.redbooks.ibm.com
Slide 65: What is DataQuant?

- DataQuant provides a comprehensive query, reporting, and data visualization platform for both web and workstation-based environments.
- It offers a wide variety of powerful business intelligence capabilities, from executive dashboards and interactive visual applications to information-rich graphical reports and ad hoc querying and data analysis.
- DataQuant provides two components:
  - DataQuant for Workstation: an Eclipse-based environment for developing query, report, and dashboard solutions
  - DataQuant for WebSphere: a runtime environment capable of displaying DataQuant content using a "thin client" model
Slide 66: What is Alphablox?

- A platform for customized analytic applications and inline analytics
- Pre-built components (Blox) for analytic functionality
- Lets you create customized analytic components that are embedded into existing business processes and web applications
