
Friday, June 24, 2011

DAC Reporting in OBIEE

The purpose of this blog is to show how to consume data in the DAC repository via OBIEE reports.

What is DAC?

Oracle Business Intelligence Data Warehouse Administration Console (DAC) provides a centralized console for schema management, configuration, administration, loading, and monitoring of the Oracle Business Analytics Warehouse. DAC is used to create and schedule Informatica ETL workflows and manages the following data warehouse load processes:
  • Dynamic generation of subject areas and execution plans
  • Dynamic settings for parallelism and load balancing
  • Intelligent task queue engine based on user-defined and computed scores
  • Index management for ETL and query performance
  • Embedded high-performance OLTP change capture techniques
  • Restart from any point of failure
DAC / ETL Statistics
The metadata for ETL is stored in the DAC repository that is composed of approximately 120 tables. These tables store the ETL metadata such as tasks, task steps, execution plan details, connection details, run history, etc.
In this exercise we are only interested in the run history table, W_ETL_DEFN_RUN. This table holds the ETL execution details for every execution plan run through DAC (see screenshot below).
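Before building the RPD, it can help to sanity-check the run history by querying W_ETL_DEFN_RUN directly. The sketch below uses an in-memory SQLite stand-in for the DAC repository; the column names (NAME, START_TS, END_TS, STATUS) and sample rows are assumptions for illustration, so verify them against the actual W_ETL_DEFN_RUN definition in your DAC version.

```python
import sqlite3

# In-memory stand-in for the DAC repository; in practice you would connect
# to the repository schema itself. Column names below are illustrative --
# check your DAC version's W_ETL_DEFN_RUN definition.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE W_ETL_DEFN_RUN (
        NAME     TEXT,   -- execution plan name
        START_TS TEXT,   -- run start timestamp
        END_TS   TEXT,   -- run end timestamp
        STATUS   TEXT    -- e.g. 'Completed', 'Failed'
    )""")
conn.executemany(
    "INSERT INTO W_ETL_DEFN_RUN VALUES (?, ?, ?, ?)",
    [("Nightly Load", "2011-06-20 01:00:00", "2011-06-20 02:30:00", "Completed"),
     ("Nightly Load", "2011-06-21 01:00:00", "2011-06-21 01:10:00", "Failed")])

# Run duration in minutes per execution, newest first
rows = conn.execute("""
    SELECT NAME, START_TS, STATUS,
           (strftime('%s', END_TS) - strftime('%s', START_TS)) / 60
               AS DURATION_MIN
    FROM W_ETL_DEFN_RUN
    ORDER BY START_TS DESC""").fetchall()
for name, start, status, minutes in rows:
    print(f"{start}  {name:<14} {status:<10} {minutes} min")
```

The same SELECT (against the real repository connection) is essentially what the OBIEE reports below compute for you.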

We will walk you through the steps for creating a simple solution that allows you to analyze the metadata in W_ETL_DEFN_RUN in OBIEE.
RPD
Below is a simple RPD model that is based on the single W_ETL_DEFN_RUN table from the DAC repository.
Reports
Using the subject area (Statistics – ETL) shown above, you can build useful reports to analyze the execution history (for each execution plan type).
1. Current month summary view with duration in minutes & hours.
2. Snapshot of run duration for each month.
Notice the failed steps in orange below.
3. Report showing Error / Status description with number of steps – Total, Failed, and Success.
4. Compare the ETL for large duration (2010 and 2011).
By adding other repository tables into your RPD you will be able to develop more sophisticated reports. You can even have error reports delivered to you with iBots whenever ETL errors occur.

Friday, June 17, 2011

Informatica - Beyond ETL

Informatica has long been one of the business intelligence / data warehouse industry leaders in data integration. Its PowerCenter and PowerExchange products have consistently been positioned in the Leaders quadrant for data integration by Gartner. However, Informatica is not resting on its laurels. Since 2009, Informatica has been on an acquisition spree, buying up best-of-breed companies with similar data-centric DNA while continuing to innovate and broaden its base products. The Informatica Platform now enables you to integrate data within the traditional enterprise (Enterprise Data Integration), control data in the cloud (Cloud Data Integration), share data with your partners (B2B Data Exchange), archive data for efficiency and eDiscovery (Application ILM), and deliver data at zero latency (Ultra Messaging). To unlock the business value of all this data, the platform enables you to cleanse data and make it trustworthy (Data Quality), to detect and act on operational intelligence (Complex Event Processing), and to gain a 360-degree view of your strategic, authoritative master data assets (Master Data Management).

Informatica’s CEO stated that “only Informatica delivers the comprehensive, unified and open data integration platform to empower the data-centric enterprise.”


Clearly, Informatica has moved beyond ETL to embrace these key data-centric technologies.

Acquisitions since January 2009:
  • Applimation (Jan. 2009) – Leader in Application Information Lifecycle Management. Helping customers manage their data assets from cradle to grave:

  • Data Archive for storing inactive and rarely accessed data on lower cost storage medium.
  • Data Privacy for masking personally identifiable data to stay compliant with today’s stringent personal information and privacy laws.
  • Data Subset for creating purpose-built data sets for environment (DEV/QA/SYSTEM) or functional (CRM/ERP/FINANCE/HR) based testing.
  • Address Doctor (Mid-2009) – Leader in Worldwide Name and Address Validation for Over 200 Countries. Providing customers with the highest-quality content for name and address standardization and validation.
  • Agent Logic (Oct. 2009) – Leader in Complex Event Processing and Operational Intelligence. Helping Customers institutionalize event detection patterns and human analysis patterns into automated opportunistic action and response alerts.
  • Siperian (Jan. 2010) – Leader in Master Data Management. Helping Customers achieve mastery of a single view of ‘X’ across all mission critical data domains.

Friday, June 3, 2011

Upgrading to Informatica PowerCenter 9.1 from Version 8.6

We have completed the upgrade from Informatica PowerCenter 8.6 to PowerCenter 9.1. The Informatica Upgrade Guide for PowerCenter 8.5.x and 8.6.x describes an in-place upgrade of an existing hardware and software environment. However, as we wanted to upgrade our hardware at the same time, our plan was to set up a brand-new Informatica PowerCenter 9.1 environment and import the code from our legacy PowerCenter 8.6 server.

Installing Informatica PowerCenter 9.1 on a Windows 64-bit server was straightforward, but we ran into the following two issues:
  1. There seems to be a bug in the Pre-Installation System Check Tool (i9Pi), whose purpose is to verify that the server meets all installation requirements. The tool kept raising the error “Make sure that open cursors are at least set to 1000” when validating the repository connection to our Oracle 11g database. Even after our DBA had set this value to 1500 and restarted the Oracle 11g database service, the tool continued to raise the error. This appears to be a bug in i9Pi, because the installation completed successfully regardless.
  2. If you are planning on running Informatica PowerCenter 9.1 on a Windows platform, please be aware that Informatica has released an Advisory for Windows 32-bit Server and 64-bit Server Platforms. Informatica has identified problematic behavior of the PowerCenter Server, which a user would observe when monitoring multiple concurrent tasks from PowerCenter Workflow Monitor. This behavior is tracked as CR 252994 within Informatica’s bug tracking system. Informatica has released an emergency fix that can be downloaded at the following location:
After installing Informatica PowerCenter 9.1 and starting the service, one has to log into the Administration console (http://server-name:6008) and create the Repository Service and the Integration Service. Once done, install the Informatica PowerCenter 9.1 Client on a client workstation and re-create all Connection information in Workflow Manager. This is a critical task because dependent objects cannot be validated during import without having all corresponding source and target connections defined.

Export all Informatica content from the legacy 8.6 environment into an XML file and import it into the Informatica PowerCenter 9.1 environment with Workflow Manager (Repository ==> Import Objects). After this, we were able to run the imported code in version 9.1 without problems.
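For scripted or repeatable migrations, the import step can also be driven from the command line with pmrep instead of the Workflow Manager UI. The helper below only assembles the objectimport command line; the -i (export XML) and -c (import control file) flags should be verified against the pmrep reference for your PowerCenter version, and the file names are hypothetical.

```python
import subprocess

def build_pmrep_import_cmd(xml_file: str, control_file: str) -> list:
    """Assemble a pmrep objectimport command line.

    Flag names (-i for the exported XML, -c for the import control file)
    are assumptions to be checked against your PowerCenter version's
    pmrep command reference.
    """
    return ["pmrep", "objectimport", "-i", xml_file, "-c", control_file]

cmd = build_pmrep_import_cmd("legacy_86_export.xml", "import_control.xml")
print(" ".join(cmd))
# subprocess.run(cmd, check=True)  # uncomment on a host with pmrep installed
```

Note that pmrep requires a prior connect to the repository in the same session, so in practice this call would follow a scripted pmrep connect.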
Please let us know about your experiences with upgrading to PowerCenter 9.1.

Friday, April 29, 2011

Parallelization of ETL Workflows

One of the most often overlooked ETL architecture design techniques is ensuring that all available hardware resources, in particular CPUs, are utilized. Consider the sample ETL workflow in the figure below, in which all ETL tasks are scheduled to execute in series, i.e., one after another. In general, an ETL task utilizes only one CPU (with some exceptions). Thus, executing tasks one after another utilizes only a single CPU, no matter how many CPUs are available.
Thus, in order to utilize all available CPUs on a server, ETL tasks should be scheduled to execute in parallel. Scheduling a highly parallelized ETL workflow requires some planning as all ETL task dependencies need to be understood first. For example, if DimTable1 and DimTable2 are dimension tables for FactTable1, then DimTable1 and DimTable2 need to be loaded first before FactTable1 can be loaded. In addition, if there are no dependencies between DimTable1 and DimTable2, then we can execute their specific ETL processes in parallel.
The output of this dependency analysis is an ETL Task Execution Order Map, as outlined in the figure below. The ETL Task Execution Order Map will then be the template for the ETL workflow layout. As each task will utilize an available CPU, this approach will be able to utilize multiple (if not all) CPUs and provide better performance.
One word of caution, though: The number of parallel scheduled tasks should not significantly exceed the number of CPUs on your server. If there are many more concurrent tasks than CPUs, then the overhead of the OS scheduler switching jobs in and out of CPU time may actually decrease performance.
To avoid this problem, many ETL tools (such as Informatica PowerCenter) provide a global setting that limits the number of concurrently processed tasks, regardless of how many tasks are scheduled to run in parallel. Let us call this setting N. As a general rule, N should be set to the number of CPU cores of the server. One can then schedule as many tasks in parallel as the ETL Task Execution Order Map allows. This option is very useful in environments with different numbers of CPUs in the development, test, and production hardware, as we do not have to create workflows based on the underlying hardware constraints; we simply set N in each environment to the respective number of CPUs.
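The scheduling idea above can be sketched in a few lines: derive waves of independent tasks from a dependency map and cap concurrency at N workers. The table names mirror the DimTable1/DimTable2/FactTable1 example from the text; load_table is a stand-in for a real ETL task.

```python
from concurrent.futures import ThreadPoolExecutor
import os
import time

# Dependency map from the example: the fact table waits for its dimensions.
DEPENDS_ON = {
    "DimTable1": [],
    "DimTable2": [],
    "FactTable1": ["DimTable1", "DimTable2"],
}

def load_table(name):
    time.sleep(0.01)  # stand-in for the real ETL task
    return name

def run_plan(depends_on, n_workers):
    """Run tasks in dependency order, at most n_workers at a time --
    n_workers plays the role of the global setting N discussed above."""
    done, order = set(), []
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        while len(done) < len(depends_on):
            # Tasks whose prerequisites are all complete and not yet run
            ready = [t for t, deps in depends_on.items()
                     if t not in done and all(d in done for d in deps)]
            if not ready:
                raise ValueError("cyclic dependency in ETL plan")
            # Independent tasks in one wave execute in parallel
            for finished in pool.map(load_table, ready):
                done.add(finished)
                order.append(finished)
    return order

order = run_plan(DEPENDS_ON, n_workers=os.cpu_count() or 2)
print(order)  # FactTable1 always runs after both dimension loads
```

This wave-by-wave scheme is a simplification (a real scheduler starts a task the moment its own prerequisites finish, rather than waiting for the whole wave), but it shows how the Task Execution Order Map and the concurrency cap N interact.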

Tuesday, February 15, 2011

OBIEE Dashboard for Informatica Metadata

The metadata that Informatica PowerCenter 8 retains in its repository can be exposed via OBIEE reports and dashboards. This metadata includes ETL execution statistics, which can provide valuable information to BI developers/administrators and BI users alike. ETL execution metadata can expose performance bottlenecks, error details, debugging information, and the last successful ETL load timestamp. The following steps demonstrate how to create an ETL Monitor Dashboard to expose Informatica metadata in OBIEE.

In this example, we will use the following four views from the Informatica schema:
  • REP_SESS_LOG
  • REP_SESS_TBL_LOG
  • REP_SUBJECT
  • REP_WFLOW_RUN
Steps to build the Informatica Metadata Reports using OBIEE:

Import the four views listed above into the OBIEE repository (RPD) and create the physical joins as shown below on SUBJECT_ID. Build the corresponding business model layer and presentation layer.

Now you will be able to create ETL reports using OBIEE Answers. In this example, we have built reports to display the following information:
    • Last ETL completion Time Stamp
    • ETL Execution statistics
    • Number of Errors in the latest ETL Execution
    • Latest ETL Execution log (which drills down to the individual interface log and also the error detail)
    • Error log (Table level)
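The first of these reports boils down to a single aggregate over REP_WFLOW_RUN. The sketch below mimics that view with an in-memory SQLite table; only the END_TIME column comes from the report definition discussed here, while the other columns and sample rows are illustrative stand-ins for the real repository view.

```python
import sqlite3

# Stand-in for the Informatica repository view; in production you would
# query the repository database directly. Columns other than END_TIME are
# illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE REP_WFLOW_RUN (
        WORKFLOW_NAME TEXT,
        START_TIME    TEXT,
        END_TIME      TEXT,
        RUN_ERR_CODE  INTEGER
    )""")
conn.executemany(
    "INSERT INTO REP_WFLOW_RUN VALUES (?, ?, ?, ?)",
    [("wf_load_dw", "2011-02-14 01:00:00", "2011-02-14 02:05:00", 0),
     ("wf_load_dw", "2011-02-15 01:00:00", "2011-02-15 01:58:00", 0)])

# "Data Current as of" = latest workflow end time
(last_etl,) = conn.execute(
    "SELECT MAX(END_TIME) FROM REP_WFLOW_RUN").fetchone()
print("Data Current as of:", last_etl)
```

In OBIEE this is simply a MAX aggregation on END_TIME in the business model, surfaced as a narrative or ticker view on the dashboard.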
The dashboard screenshot below shows our implementation of these reports.

Last ETL Completion Time Stamp: Data Current as of [MAX (REP_WFLOW_RUN.END_TIME)]. This report shows the latest ETL execution completion time stamp.

ETL Execution Statistics: This report shows the processing time for the last n ETL executions. The graph is plotted by the date when the ETL ran and the duration of each ETL run (in minutes).

Number of Errors in the Latest ETL Execution: This report gives the number of errors encountered in the last n ETL executions.

Last ETL Execution Log: This report shows the ETL execution log of the last ETL run. We have designed this report to include several drill-down options as shown below.

Error Log report:
We can also use Informatica Metadata tables for Impact Analysis & Security Analysis using the following views / tables:
Impact Analysis:
  • REP_TASK_ATTR
  • REP_SUBJECT
  • REP_SESSION_INSTANCES
  • REP_ALL_TASKS
  • REP_WORKFLOWS
  • OPB_MAPPING
  • OPB_WIDGET_INST
Security Analysis:
  • REP_VERSION_PROPS
  • REP_USERS
  • REP_SUBJECT


Click here to see how to expose Oracle Data Integrator (ODI) ETL metadata.


If you have questions or comments, leave a comment below.