Ab Initio Coach

Ab Initio tutorials, examples, interview questions, and component explanations

Types of Partition Components In Ab Initio

The Ab Initio Component Organizer provides several partitioning components that divide data in different ways:
1. Function partition: divides data according to a DML expression.
2. Hash partition: groups data by a key, like dealing cards into piles according to their suit.
3. Load-level partition: balances load dynamically. More data goes to CPUs that are less busy and less data to CPUs that are busier, which maximizes throughput.
4. Percentage partition: distributes data so that the output is proportional to fractions of 100.
5. Range partition: divides data evenly among nodes, based on a key and a set of partitioning ranges.
6. Round-robin partition: distributes data evenly, in block-size chunks, across the output partitions.
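
To make the difference between three of these strategies concrete, here is a minimal Python sketch. It is not Ab Initio code and does not use any Ab Initio API; the function names, the `block_size` parameter, and the choice of CRC32 as the hash are assumptions made only for illustration.

```python
import zlib
from bisect import bisect_left


def partition_round_robin(records, n, block_size=1):
    """Deal records to n partitions in turn, block_size records at a time."""
    parts = [[] for _ in range(n)]
    for i, rec in enumerate(records):
        parts[(i // block_size) % n].append(rec)
    return parts


def partition_by_hash(records, n, key):
    """Send every record with the same key value to the same partition."""
    parts = [[] for _ in range(n)]
    for rec in records:
        # CRC32 is stable across runs; Python's built-in hash() of a str is not.
        bucket = zlib.crc32(str(key(rec)).encode()) % n
        parts[bucket].append(rec)
    return parts


def partition_by_range(records, splitters, key):
    """Treat each sorted splitter value as the upper bound of one partition."""
    parts = [[] for _ in range(len(splitters) + 1)]
    for rec in records:
        parts[bisect_left(splitters, key(rec))].append(rec)
    return parts


cards = [("hearts", 1), ("spades", 2), ("hearts", 3), ("clubs", 4)]
print(partition_round_robin(cards, 2))
print(partition_by_hash(cards, 3, key=lambda c: c[0]))
print(partition_by_range(cards, [1, 3], key=lambda c: c[1]))
```

Round-robin gives the most even spread but ignores the key, while the hash version keeps all records of one suit together, which is what key-based components downstream need.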

Types of Parallelism in Ab Initio

Parallel computing is the simultaneous performance of multiple operations. Ab Initio uses three main types of parallelism:
1. Component parallelism: an application with multiple components running simultaneously on separate data uses component parallelism.
2. Pipeline parallelism: an application with multiple components running simultaneously on the same data uses pipeline parallelism.
3. Data parallelism: an application with data divided into segments that operates on each segment simultaneously uses data parallelism.

It is common for applications to take advantage of all three types at the same time.

Components That Need a Sort Before Them in Ab Initio

Some components require presorted data because they group adjacent records by a common key. If your data is not sorted, use the Sort component before any of the following components:
  • Aggregate
  • Merge
  • Dedup Sorted
  • Denormalize or Accumulate
  • Match Merge
  • Merge Join
  • Rollup
  • Scan

Sorting the data before these components is also a performance tuning tip in Ab Initio.
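
The reason these components need sorted input can be shown with a small Python sketch. This is an analogy, not Ab Initio code: `itertools.groupby` also groups only adjacent records, so it behaves like a component that expects presorted data. The function name `sum_adjacent` and the sample records are made up for illustration.

```python
from itertools import groupby
from operator import itemgetter


def sum_adjacent(records):
    """Sum the amounts over each run of adjacent records sharing a key."""
    return [
        (key, sum(amount for _, amount in group))
        for key, group in groupby(records, key=itemgetter(0))
    ]


unsorted_records = [("a", 1), ("b", 2), ("a", 3)]

# Without a sort, key "a" shows up as two separate groups.
print(sum_adjacent(unsorted_records))

# With a sort on the key first, each key yields exactly one group.
print(sum_adjacent(sorted(unsorted_records, key=itemgetter(0))))
```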

    File Extensions in Ab Initio

    There are a few file extensions you should know before you start learning Ab Initio:
    .mp means graph files
    .dbc means database configuration files
    .dat means data files
    .dml means record format files (Data Manipulation Language)
    .xfr means transform function files
    .mpc means program template files or custom program components
    .mdc means dataset template files or custom dataset components
    .ksh means shell script files; this is the file we share with the client

    Runtime Status of a Graph in Ab Initio

    The GDE (Graphical Development Environment) displays round colored indicators to show the status of each component while a graph runs:
    White means the component has not started.
    Green means the component is running.
    Red means the component has an error.
    Blue means the component has finished successfully.

    Difference Between Aggregate and Rollup

    Aggregate and Rollup are both used to summarize data.

    - Rollup is more capable and more convenient to use.

    - Rollup can perform additional functions, such as input filtering and output filtering of records.

    - Aggregate does not display the intermediate results in main memory, whereas Rollup can.

    - Analyzing a particular summarization is much simpler with Rollup than with Aggregate.

    These are the main differences between Aggregate and Rollup in Ab Initio.

    Ab Initio DML Sample

    Here is an example data file together with a DML record format that describes it.

    The data looks like this:

    102,Damodar Reddy,M,Madanapalle,90000,
    103,Prasad Kumar,M,Anantapur,50000,
    104,Jim Tpt,M,Tirupati,40000,
    105,Chandra Mouli,M,Bangalore,100000,
    106,Mohan Narayana,M,Vizag,90000,
    107,Mahesh Reddy,M,Kadapa,34000,
    108,Jyothi Kumari,F,Hyderabad,40000,

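    A DML record format for this data looks like this:

    record
      decimal(",") emp_id;
      string(" ") first_name;      /* the data separates first and last name with a space */
      string(",") last_name;
      string(",") gender;
      string(",") place;
      decimal(",\r\n") salary;     /* trailing comma, then a Windows line ending */
    end;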
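
As a cross-check on how the sample lines break into fields, here is a small Python sketch that splits one line the same way: comma-delimited fields, with the name split on its space into a first and last name. It is only an illustration outside Ab Initio; the function name `parse_employee` is hypothetical.

```python
def parse_employee(line):
    """Split one sample line into named fields."""
    # Each line ends with a trailing comma, so split() yields one empty item at the end.
    emp_id, name, gender, place, salary, _ = line.rstrip("\r\n").split(",")
    first_name, last_name = name.split(" ", 1)
    return {
        "emp_id": int(emp_id),
        "first_name": first_name,
        "last_name": last_name,
        "gender": gender,
        "place": place,
        "salary": int(salary),
    }


print(parse_employee("102,Damodar Reddy,M,Madanapalle,90000,\r\n"))
```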




    If you have any questions, please comment below and I will respond as early as possible.

    Ab Initio Important Questions Frequently Asked by Companies [2016]

    1) What is the difference between dbc and cfg? When do you use these two?
    A .dbc file is a database configuration file. It provides the GDE with the information it needs to connect to a database. A configuration file contains the following information:
    1. The name and version number of the database to which you want to connect
    2. The name of the computer on which the database instance or server to which you want to connect runs, or on which the database remote access software is installed
    3. The name of the database instance, server, or provider to which you want to connect

    A .cfg file is a database table configuration file for use with 2.1 database components. The 2.1 database components are deprecated components provided for compatibility with Version 2.1 and lower of the Co>Operating System. They should not be used for new development. See the components' individual help topics for more information.

    2) What are the compilation errors you came across while executing your graphs?

    3) What is depth_error?

    4) During the execution of graph, let us say you lost the network connection, would you have to start the process all over again or does it start from where it stopped?

    5) Types of partitions and scenarios.
    The divisions of the data and the copies of the program components that create data parallelism are called partitions, and a component partitioned in this way is called a parallel component. If each partition of a parallel program component runs on a separate processor, the increase in the speed of processing is almost directly proportional to the number of partitions.
    For example, suppose you wanted to build a graph to sort a file of customer records according to a key. To speed the processing, you could use a PARTITION BY ROUND-ROBIN component to divide the file of unsorted records evenly into three partitions. Then three corresponding partitions of a SORT component could sort all three partitions of the data at the same time. Finally, a MERGE component could combine the three sorted partitions into one sorted flow. If each partition of the SORT component ran on a separate processor, the sorting process would take roughly a third of the time it would take if the processing employed no data parallelism.
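
The partition, sort, and merge steps in this example can be sketched in Python as follows. This only imitates the data flow and is not Ab Initio code; the function name `partition_sort_merge` is an assumption, and the three sorts run one after another here, whereas in a real graph each SORT partition would run on its own processor.

```python
import heapq


def partition_sort_merge(records, n=3, key=None):
    """Round-robin partition, sort each partition, then merge the sorted results."""
    # Round-robin: record i goes to partition i % n.
    partitions = [records[i::n] for i in range(n)]
    # Each partition is sorted independently (sequentially in this sketch).
    sorted_partitions = [sorted(part, key=key) for part in partitions]
    # Merge combines the already-sorted partitions into one sorted flow.
    return list(heapq.merge(*sorted_partitions, key=key))


print(partition_sort_merge([5, 3, 9, 1, 7, 2, 8]))
```

Because the merge step only needs each partition to be sorted, it does not matter that round-robin scatters records with the same key across partitions.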




    Flow partitions
    When you divide a component into partitions, you divide the flows that connect to it as well. These divisions are called flow partitions. The three partitions of the SORT component in "Data parallelism example" have three corresponding flow partitions of the input and output flows that connect them to PARTITION BY ROUND-ROBIN and MERGE.
    Port partitions
    The port to which a partitioned flow connects is partitioned as well, with the same number of port partitions as the flow connected to it. In the diagram in "Data parallelism example", the output port of PARTITION BY ROUND-ROBIN, the input and output ports of SORT, and the input port of MERGE each have three port partitions.
    Depth of parallelism
    The number of partitions of a component, flow, port, graph, or section of a graph determines its depth of parallelism. In the diagram in "Data parallelism example", from the output port of PARTITION BY ROUND-ROBIN to the input port of MERGE, all parts of the process are parallel and have three partitions, so you can say that section of the graph has a depth of parallelism of three, or is three-ways parallel.

    6) What does unused port in join component do?
    The JOIN component also includes unused ports; the number of these ports matches the number of input ports. The records that flow out of the unused ports are the records with key values that did not match the key values of records on the other inputs.

    7) Define Multi file system. Can you create multifile system on the same server? Also, if you have a table that has Name, Address, Status, Position attributes, can Name and Address be on one partition and Status and Position in the other partition?
    An Ab Initio multifile system consists of multiple replications of a directory tree structure containing multidirectories and multifiles — these are the partitions of the multifile system.
    All but one of the partitions contains a subset of the data stored in the multifile system; the additional partition contains control information. The partitions containing data are the data partitions of the system, and the additional partition is the control partition. The control partition contains no user data — only the information the Co>Operating System needs to manage the multifile system.
    Visualize a directory tree containing subdirectories and files. Now imagine n identical copies of the same tree located on several disks, and number them 0 to n-1 (the Co>Operating System numbers partitions starting at 0). These are the data partitions of the multifile system. Then add one more copy of the tree to serve as the control partition. This is a multifile system (for an example, see "Sample multifile system").
    You can place the control and data partitions of a multifile system on any computer that has the Co>Operating System installed on it and to which the run host can connect.





    8) What is a sandbox? Did the co-operating system version 2.8 have sandbox, if not how would you store the respective files?

    9) How did you do version control? Which tool did you use?

    10) How do you troubleshoot performance issues in graph?

    11) What are the usual errors that you encounter during ETL process apart from compilation process?

    12) Were you involved in production support? What were the different kinds of problems that you encountered?

    13) Please give us insight on Enterprise Meta Environment, and some possible questions on that.

    14) What are a delta table and a master table?

    15) What error would you get when you use Partition by Round Robin and Join?
    Depth not equal

    16) In which scenarios would you use Partition by Key and in which Partition by Round Robin, and what are the differences between the two?

    17) What are the different dimension tables that you used and some columns in the fact table?

    18) How do you count the number of records in a flat file?

    19) How do you count the number of records in a multifile system without using GDE?

    20) What do the Scan and Rollup components do? Give a scenario where you used them.

    21) Did you ever use user-defined functions or packages? If yes, give a scenario.

    22) What value do you have to give the Record Required parameter for a natural join?

    23) When do you use Partition by Expression?

    24) What is Adhoc File System? Give me a scenario where you used it.

    25) What are the different commands that you used when writing wrappers?

    26) What do the hidden files in a sandbox represent and what does start.ksh represent?

    27) What are different things that you have to consider when loading data into a table?

    28) What is difference between Redefine Format and Reformat components?

    29) Sometimes you have to use dynamic length strings. Can you give me one circumstance where you need it?

    30) If you have a flat file as follows:

    20 General Manager Chris
    30 Divisional Manager Harry
    20 General Manager Mary
    30 Divisional Manager Dravid
    How do you count the number of records that have 20 in the first column, and likewise for 30.
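
To show the expected result for this question, here is a short Python sketch that counts records by the value in the first column. It is an illustration outside Ab Initio; the function name `count_by_first_column` is hypothetical. Inside a graph, a Rollup keyed on the first field is one plausible way to get the same counts.

```python
from collections import Counter


def count_by_first_column(lines):
    """Count how many non-empty lines start with each first-column value."""
    return Counter(line.split()[0] for line in lines if line.strip())


sample = [
    "20 General Manager Chris",
    "30 Divisional Manager Harry",
    "20 General Manager Mary",
    "30 Divisional Manager Dravid",
]

counts = count_by_first_column(sample)
print(counts["20"], counts["30"])  # prints: 2 2
```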




    Conditional DML With Example Explanation

    Conditional DML is a record format in which the fields that follow depend on a condition, usually the value of an earlier field.

    We can write conditional DML where the data contains header and trailer records.
    The data format is like this:

    """
    H,12/10/16,a.dat,
    D,121,damu,1000,
    D,131,raju,2000,
    D,141,reddy,3000,
    T,3,

    """

    The conditional DML record format is like this:

    """
    record
      string(",") id;
      if (id == "H")
      begin
        date("DD/MM/YY")(",") date1;   /* date format assumed from the sample data */
        string(",\n") file_name;       /* last field also consumes the line ending */
      end
      else if (id == "D")
      begin
        decimal(",") emp_id;
        string(",") emp_name;
        string(",\n") salary;
      end
      else if (id == "T")
      begin
        decimal(",\n") count;
      end
    end;
    """





    Conditional DML record formats like this occur mostly in banking, health care, and trading projects.
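
The same header, detail, and trailer logic can be sketched in Python to show how the first field decides which fields follow. This is not Ab Initio code; the function names are hypothetical, and reading the trailer value as the number of detail records is an assumption based on the sample data (three D lines and `T,3`).

```python
def parse_record(line):
    """Parse one line; the first field decides which fields follow."""
    fields = line.rstrip("\r\n").split(",")
    rec_id = fields[0]
    if rec_id == "H":
        return {"id": rec_id, "date1": fields[1], "file_name": fields[2]}
    if rec_id == "D":
        return {"id": rec_id, "emp_id": int(fields[1]),
                "emp_name": fields[2], "salary": fields[3]}
    if rec_id == "T":
        return {"id": rec_id, "count": int(fields[1])}
    raise ValueError(f"unknown record type: {rec_id!r}")


def trailer_matches(lines):
    """Check that the trailer count equals the number of detail records (assumed meaning)."""
    records = [parse_record(line) for line in lines if line.strip()]
    detail_count = sum(1 for rec in records if rec["id"] == "D")
    trailer = next(rec for rec in records if rec["id"] == "T")
    return detail_count == trailer["count"]


sample = [
    "H,12/10/16,a.dat,",
    "D,121,damu,1000,",
    "D,131,raju,2000,",
    "D,141,reddy,3000,",
    "T,3,",
]
print(trailer_matches(sample))
```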

    Important String and Validation Functions in Ab Initio

    A few string and validation functions are used regularly when building graphs in Ab Initio.
    Important string functions:
  • string_split
  • string_replace
  • string_prefix
  • string_index
  • string_suffix
  • string_rindex
  • string_like
  • string_substring
  • string_length

    These are the string functions used most often in transformations when building graphs.

    Important Validation Functions:
    The following validation functions are used regularly for data validation:
  • is_defined
  • is_error
  • is_null
  • is_valid
  • is_blank

    If you know of other important or frequently used string or validation functions, or you have any questions, please comment below and we can suggest solutions for real-time scenarios.