Friday, September 6, 2013

Informatica - An Overview


Informatica:

Informatica is used as an ETL tool for extracting the source data and loading it into the target after applying the necessary transformations. It has a simple visual interface like forms in visual basic. Most of the work in Informatica includes just drag and drop with the mouse in the Designer.

It is self-documenting in the sense that it follows a graphical approach. Almost all the major databases can communicate with Informatica to transform data of any size. Informatica comes in two forms:

Informatica can do joins on tables of different databases on different servers, using Informatica Server and Server Manager is the client application which interacts with the server.

We need to design process flows for data extraction, transformation and load. These flows are known as mappings and they can be kept and scheduled to run anytime in the future as per the requirements.

 


 
ETL stands for:

         E-extract

        T-transform

        L- load

These are the three database functions that are combined into one tool to pull data out of one database and place it into another database.

  • Extract -- the process of reading data from a database.
  • Transform -- the process of converting the extracted data from its previous form into the form it needs to be in so that it can be placed into another database. Transformation occurs by using rules or lookup tables or by combining the data with other data.
  • Load -- the process of writing the data into the target database.

ETL is used to migrate data from one database to another, to form data marts and data warehouses and also to convert databases from one format or type to another.

 

Informatica tool has 3 basic components:

 

  1. Informatica PowerCenter Client Tools:

                 These are the development tools that are installed at the developer end. These are:

    • Used to design processes and is a client application (Designer)
    • Define run-time properties for a mapping, known as sessions (Workflow Manager)
    • Monitor execution of sessions (Workflow Monitor)
    • Used to work with the repository (Repository Manager)
    • Report Metadata (Metadata Reporter)

 

  1. Informatica PowerCenter Repository:

 

 The Repository contains all the metadata for the application. It actually is the database of Informatica and contains all the data and mappings etc.

 

 

  1. Informatica PowerCenter Server:

 

All the executions take place at the Server. It connects to the source, extracts the data, performs transformations and mappings and then loads the data to the target system.

I. Designer:

It consists of several tools:

Source Analyzer - (choose Sources menu) to import or create source definitions for flat file, COBOL, ERP, and Relational Databases. Note - double-click on the title-bar opens a pop-up to edit definitions.

  • Warehouse Designer - to import or create target definitions (choose Targets - Generate/Execute SQL, or Targets-Create to create manually).
  • Transformation Developer - to create reusable transformations.
  • Mapplet Designer - to create mapplets
  • Mapping Designer - to create mappings.

II. Server Manager:

Server Manager is a client application used to create and manage sessions and batches, and to configure session connections. You can monitor multiple Informatica Servers, navigate through folders and repositories. Here is what you can do in Server Manager:

  • Monitor, add, edit, delete Informatica Server info in the repository
  • Stop the Informatica Server
  • Configure database, external loader, and FTP connections
  • Manage sessions and batches - create, edit, delete, copy/move within a folder, start/stop, abort sessions, view session logs, details, and session performance details.


To make a mapping:

  • Switch to the Mapping Designer
  • Choose Mapping-Create - and enter a new name
  • Drag a source table from the navigator to the work space. Note, that the designer will also automatically create and show a Source Qualifier transformation.
  • Drag a target table to the work space
  • Drag one-by-one fields from source to target - thus creating graphical connections. Note - you can also delete connection by selecting it - and pressing DEL button.
  • Choose Layout-Arrange

 

Let’s see the explanation some of the major components:

 

  • Repository Manager: It has the following functions-

       Manage repository:

           Connections

           Folders

           Objects

           Users and groups

 

  • Repository: This is where all the metadata information is stored in the Informatica suite. The Power Center Client and the Repository Server would access this repository to retrieve, store and manage metadata.
  • Power Center Client: Informatica client is used for managing users, identifying source and target systems definitions, creating mapping and mapplets, creating sessions and run workflows etc.
  • Repository Server: This repository server takes care of all the connections between the repository and the Power Center Client.
  • Power Center Server: Power Center server does the extraction from source and then loading data into targets.
  • Designer: Source Analyzer, Mapping Designer and Warehouse Designer are tools reside within the Designer wizard. Source Analyzer is used for extracting metadata from source systems.
    Mapping Designer is used to create mapping between sources and targets. Mapping is a pictorial representation about the flow of data from source to target.
    Warehouse Designer is used for extracting metadata from target systems or metadata can be created in the Designer itself.
  • Data Cleansing: The Power Center's data cleansing technology improves data quality by validating, correctly naming and standardization of address data. A person's address may not be same in all source systems because of typos and postal code, city name may not match with address. These errors can be corrected by using data cleansing process and standardized data can be loaded in target systems (data warehouse).
  • Transformation: Transformations help to transform the source data according to the requirements of target system. Sorting, Filtering, Aggregation, Joining are some of the examples of transformation. Transformations ensure the quality of the data being loaded into target and this is done during the mapping process from source to target.
  • Workflow Manager: Workflow helps to load the data from source to target in a sequential manner. For example, if the fact tables are loaded before the lookup tables, then the target system will pop up an error message since the fact table is violating the foreign key validation. To avoid this, workflows can be created to ensure the correct flow of data from source to target.
  • Workflow Monitor: This monitor is helpful in monitoring and tracking the workflows created in each Power Center Server.
  • Power Center Connect: This component helps to extract data and metadata from ERP systems like IBM's MQSeries, PeopleSoft, SAP, Siebel etc. and other third party applications.
  • Power Center Exchange: This component helps to extract data and metadata from ERP systems like IBM's MQSeries, PeopleSoft, SAP, Siebel etc. and other third party applications.
  • Rep Server Administration Console: It is used to-

             Administer repositories on a Repository Server:

           Create/upgrade/delete

           Configuration

           Start/stop

           Backup/restore

 

1.    Code page: Informatica has a code page that contains encoding used to specify characters in a set of one or more languages. The code page is selected based on source of the data. For example if source contains Chinese text then the code page should be selected to support Chinese text. This influences the way that application stores, receives, and sends character data.

2.     Power Center Server on Windows can connect to following databases:

  • IBM DB2
  • Informix
  • Microsoft Access
  • Microsoft Excel
  • Microsoft SQL Server
  • Oracle
  • Sybase
  • Teradata
Mapplet: A mapplet is a reusable object that is created using mapplet designer. The mapplet contains set of transformations and it allows us to reuse that transformation logic in multiple mappings.

1 comment: