Ab Initio Tutorials, Examples, and Interview Questions: Ab Initio Components Explanation

Types of Partition Components In Ab Initio

The Ab Initio Component Organizer provides several partitioning components that divide data in different ways:
1. Function partition (Partition by Expression): divides data according to a DML expression.
2. Hash partition (Partition by Key): groups data by a key, like dealing cards into piles according to their suit.
3. Load-level partition (Partition with Load Balance): balances load dynamically. More data goes to CPUs that are less busy and less data to CPUs that are busier, which maximizes throughput.
4. Percentage partition (Partition by Percentage): distributes data so that the output is proportional to fractions of 100.
5. Range partition (Partition by Range): divides data evenly among nodes, based on a key and a set of partitioning ranges.
6. Round-robin partition (Partition by Round-robin): distributes data evenly, in block-size chunks, across the output partitions.
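
To make the difference between two of these strategies concrete, here is a minimal Python sketch (not Ab Initio code). The function names and the `block_size` argument are made up for illustration, and `crc32` is only a stand-in hash; the hash that Ab Initio actually uses is not described here.

```python
from zlib import crc32

def partition_round_robin(records, n, block_size=1):
    """Deal records to n partitions in turn, block_size records at a time."""
    parts = [[] for _ in range(n)]
    for i, rec in enumerate(records):
        parts[(i // block_size) % n].append(rec)
    return parts

def partition_by_key(records, n, key):
    """Send every record with the same key value to the same partition."""
    parts = [[] for _ in range(n)]
    for rec in records:
        # crc32 is a stable stand-in hash chosen for this sketch only.
        parts[crc32(str(key(rec)).encode()) % n].append(rec)
    return parts

cards = [("hearts", 2), ("spades", 5), ("hearts", 9),
         ("clubs", 3), ("spades", 1), ("clubs", 7)]

rr = partition_round_robin(cards, 3)                      # even sizes, suits mixed
by_suit = partition_by_key(cards, 3, key=lambda c: c[0])  # each suit stays together
```

Round-robin guarantees evenly sized partitions but scatters each suit. Partitioning by key keeps each suit in one partition, but the partition sizes depend on the data.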

Types of Parallelism in Ab Initio

Parallel computing is the simultaneous performance of multiple operations. Ab Initio uses three main types of parallelism:
1. Component parallelism: An application with multiple components running simultaneously on separate data uses component parallelism.
2. Pipeline parallelism: An application with multiple components running simultaneously on the same data uses pipeline parallelism.
3. Data parallelism: An application with data divided into segments that operates on each segment simultaneously uses data parallelism.
It is common for an Ab Initio application to take advantage of all three types at the same time.

Components That Need a Sort Component Before Them in Ab Initio

Some components require presorted data because they group adjacent records by a common key. If your data is not sorted, use the Sort component before any of the following components:
  • Aggregate

  • Merge

  • Dedup sorted

  • Denormalize or accumulate

  • Match Merge

  • Merge Join

  • Rollup

  • Scan

Sorting the data before these components also speeds up processing, so it doubles as a performance-tuning tip in Ab Initio.
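
Why grouping adjacent records needs sorted input can be seen in a small Python sketch. It uses `itertools.groupby`, which also groups only adjacent items. The function name `sum_adjacent` and the sample data are invented for illustration and are not Ab Initio code.

```python
from itertools import groupby
from operator import itemgetter

def sum_adjacent(records):
    """Sum amounts over runs of adjacent records that share a key."""
    return [(k, sum(amount for _, amount in grp))
            for k, grp in groupby(records, key=itemgetter(0))]

unsorted_records = [("a", 1), ("b", 2), ("a", 3)]

# Without a sort, key "a" is split into two groups.
print(sum_adjacent(unsorted_records))   # [('a', 1), ('b', 2), ('a', 3)]

# After a sort on the key, each key forms exactly one group.
print(sum_adjacent(sorted(unsorted_records, key=itemgetter(0))))   # [('a', 4), ('b', 2)]
```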

    File Extensions in Abinitio

    There are a few file extensions you should know before you start learning Ab Initio:
    .mp: graphs
    .dbc: database configuration files
    .dat: data files
    .dml: record format files (Data Manipulation Language)
    .xfr: transform function files
    .mpc: program template files or custom program components
    .mdc: dataset template files or custom dataset components
    .ksh: shell script files, used for sharing with clients

    Runtime Status of graph in abinitio

    The GDE (Graphical Development Environment) displays round colored indicators to show the status of each component while a graph runs:
    White: not started
    Green: running
    Red: error
    Blue: done (success)

    Difference Between Aggregation And Rollup

    Aggregate and Rollup are both used to summarize data. The main differences are:

    - Rollup is more convenient to use.

    - Rollup offers additional functionality, such as input filtering and output filtering of records.

    - Aggregate does not expose the intermediate results held in main memory, whereas Rollup can.

    - Analyzing a particular summarization is much simpler with Rollup than with Aggregate.

    Ab Initio DML Sample

    Here is an example data file and a matching DML record format.

    The data looks like this:
    102,Damodar Reddy,M,Madanapalle,90000,
    103,Prasad Kumar,M,Anantapur,50000,
    104,Jim Tpt,M,Tirupati,40000,
    105,Chandra Mouli,M,Bangalore,100000,
    106,Mohan Narayana,M,Vizag,90000,
    107,Mahesh Reddy,M,Kadapa,34000,
    108,Jyothi Kumari,F,Hyderabad,40000,

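    The DML record format looks like this:
    record
    decimal(",") emp_id;
    string(" ") first_name;  /* the name column holds first and last name separated by a space */
    string(",") last_name;
    string(",") gender;
    string(",") place;
    decimal(",\r\n") salary;  /* assumes each line ends with a comma and a Windows-style line break */
    end;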
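
For readers without the software, the following Python sketch reads one line of the sample data into named fields. It is not DML. It assumes that each line ends with a trailing comma and that the name column holds a first and last name separated by one space. The function name `parse_employee` is made up for this example.

```python
def parse_employee(line):
    """Split one comma-delimited sample line into named, typed fields."""
    # Drop the line break and the trailing comma before splitting.
    emp_id, name, gender, place, salary = line.rstrip("\r\n").rstrip(",").split(",")
    first_name, last_name = name.split(" ", 1)
    return {
        "emp_id": int(emp_id),
        "first_name": first_name,
        "last_name": last_name,
        "gender": gender,
        "place": place,
        "salary": int(salary),
    }

rec = parse_employee("102,Damodar Reddy,M,Madanapalle,90000,\r\n")
print(rec["first_name"], rec["salary"])   # Damodar 90000
```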

    Ab Initio Important Questions Frequently Asked by Companies [2016]

    1) What is the difference between dbc and cfg? When do you use these two?
    A .dbc file is a database configuration file. It provides the GDE with the information it needs to connect to a database. A configuration file contains the following information:
    1. The name and version number of the database to which you want to connect
    2. The name of the computer on which the database instance or server to which you want to connect runs, or on which the database remote access software is installed
    3. The name of the database instance, server, or provider to which you want to connect
    A .cfg file is a database table configuration file for use with the 2.1 database components. The 2.1 database components are deprecated components provided for compatibility with Version 2.1 and lower of the Co>Operating System. They should not be used for new development. See the components' individual help topics for more information.

    2) What are the compilation errors you came across while executing your graphs?

    3) What is depth_error?

    4) During the execution of graph, let us say you lost the network connection, would you have to start the process all over again or does it start from where it stopped?

    5) Types of partitions and scenarios.
    The divisions of the data and the copies of the program components that create data parallelism are called partitions, and a component partitioned in this way is called a parallel component. If each partition of a parallel program component runs on a separate processor, the increase in the speed of processing is almost directly proportional to the number of partitions.
    For example, suppose you wanted to build a graph to sort a file of customer records according to a key. To speed the processing, you could use a PARTITION BY ROUND-ROBIN component to divide the file of unsorted records into three partitions. Then three corresponding partitions of a SORT component could sort all three partitions of the data at the same time. Finally, a MERGE component could combine the three sorted partitions into one sorted flow. If each partition of the SORT component ran on a separate processor, the sorting process would take only about a third of the time it would take if the processing employed no data parallelism.
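
The partition, sort, and merge flow in this example can be imitated in a few lines of Python. This is a sequential sketch of the data movement only; it does not run anything in parallel. The customer records and the helper name `round_robin` are invented for illustration.

```python
import heapq

def round_robin(records, n):
    """Deal records to n partitions in turn, one record at a time."""
    return [records[i::n] for i in range(n)]

def customer_key(rec):
    return rec[0]

customers = [(42, "Ravi"), (7, "Asha"), (19, "Kiran"),
             (3, "Latha"), (88, "Mohan"), (25, "Priya")]

partitions = round_robin(customers, 3)                        # PARTITION BY ROUND-ROBIN
sorted_parts = [sorted(p, key=customer_key) for p in partitions]   # one SORT per partition
merged = list(heapq.merge(*sorted_parts, key=customer_key))   # MERGE into one sorted flow

print([key for key, _ in merged])   # [3, 7, 19, 25, 42, 88]
```

Each partition is sorted on its own, and the merge step only has to interleave flows that are already sorted.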

    Flow partitions
    When you divide a component into partitions, you divide the flows that connect to it as well. These divisions are called flow partitions. The three partitions of the SORT component in "Data parallelism example" have three corresponding flow partitions of the input and output flows that connect them to PARTITION BY ROUND-ROBIN and MERGE.
    Port partitions
    The port to which a partitioned flow connects is partitioned as well, with the same number of port partitions as the flow connected to it. In the diagram in "Data parallelism example", the output port of PARTITION BY ROUND-ROBIN, the input and output ports of SORT, and the input port of MERGE each have three port partitions.
    Depth of parallelism
    The number of partitions of a component, flow, port, graph, or section of a graph determines its depth of parallelism. In the diagram in "Data parallelism example", from the output port of PARTITION BY ROUND-ROBIN to the input port of MERGE, all parts of the process are parallel and have three partitions, so you can say that section of the graph has a depth of parallelism of three, or is three-ways parallel.

    6) What does unused port in join component do?
    The JOIN component also includes unused ports; the number of these ports matches the number of input ports. The records that flow out of the unused ports are the records with key values that did not match the key values of records on the other inputs.

    7) Define Multi file system. Can you create multifile system on the same server? Also, if you have a table that has Name, Address, Status, Position attributes, can Name and Address be on one partition and Status and Position in the other partition?
    An Ab Initio multifile system consists of multiple replications of a directory tree structure containing multidirectories and multifiles — these are the partitions of the multifile system.
    All but one of the partitions contains a subset of the data stored in the multifile system; the additional partition contains control information. The partitions containing data are the data partitions of the system, and the additional partition is the control partition. The control partition contains no user data — only the information the Co>Operating System needs to manage the multifile system.
    Visualize a directory tree containing subdirectories and files. Now imagine n identical copies of the same tree located on several disks, and number them 0 to n-1 (the Co>Operating System numbers partitions starting at 0). These are the data partitions of the multifile system. Then add one more copy of the tree to serve as the control partition. This is a multifile system (for an example, see "Sample multifile system").
    You can place the control and data partitions of a multifile system on any computer that has the Co>Operating System installed on it and to which the run host can connect.

    8) What is a sandbox? Did Co>Operating System version 2.8 have a sandbox? If not, how would you store the respective files?

    9) How did you do version control? Which tool did you use?

    10) How do you troubleshoot performance issues in graph?

    11) What are the usual errors that you encounter during ETL process apart from compilation process?

    12) Were you involved in production support? What were the different kinds of problems that you encountered?

    13) Please give us insight on Enterprise Meta Environment, and some possible questions on that.

    14) What are delta table and master table?

    15) What error would you get when you use Partition by Round Robin and Join?
    Depth not equal

    16) In which scenarios would you use Partition by Key and Partition by Round Robin, and what are the differences between the two?

    17) What are the different dimension tables that you used and some columns in the fact table?

    18) How do you count the number of records in a flat file?

    19) How do you count the number of records in a multifile system without using GDE?

    20) What does Scan and Rollup component do and give a scenario where you used them?

    21) Did you ever use user-defined functions or packages? If yes, give a scenario.

    22) What value do you have to give the Record Required parameter for a natural join?

    23) When do you use Partition by Expression?

    24) What is Adhoc File System? Give me a scenario where you used it.

    25) What are the different commands that you used when writing wrappers?

    26) What do the hidden files in a sandbox represent and what does start.ksh represent?

    27) What are different things that you have to consider when loading data into a table?

    28) What is difference between Redefine Format and Reformat components?

    29) Sometimes you have to use dynamic length strings. Can you give me one circumstance where you need it?

    30) If you have a flat file as follows:

    20 General Manager Chris
    30 Divisional Manager Harry
    20 General Manager Mary
    30 Divisional Manager Dravid
    How do you count the number of records that have 20 in the first column, and likewise for 30?
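
The logic behind question 30 is a count per key. The Python sketch below shows that logic outside Ab Initio, assuming the first column is separated from the rest by whitespace. In a graph, the same result would come from a count grouped on the first field.

```python
from collections import Counter

lines = [
    "20 General Manager Chris",
    "30 Divisional Manager Harry",
    "20 General Manager Mary",
    "30 Divisional Manager Dravid",
]

# Count records by the value in the first whitespace-separated column.
counts = Counter(line.split()[0] for line in lines if line.strip())

print(counts["20"], counts["30"])   # 2 2
```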

    Conditional DML With Example Explanation

    Conditional DML is a record format in which the fields that follow depend on a condition.

    We can write conditional DML for files that contain header and trailer records.
    The data format looks like this:

    """
    H,12/10/16,a.dat,
    D,121,damu,1000,
    D,131,raju,2000,
    D,141,reddy,3000,
    T,3,

    """

    The conditional DML record format for this data looks like this:

    """
    record
    string(",") id;
    if (id == "H")
    begin
    date(",") date1;
    string(",") file_name;
    end
    else if (id == "D")
    begin
    decimal(",") emp_id;
    string(",") emp_name;
    string(",") salary;
    end
    else if (id == "T")
    begin
    decimal(",") count;
    end
    end;
    """

    Conditional record formats like this one come up mostly in banking, health care, and trading projects.
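
The same idea, with a first field that decides how the rest of the record is read, can be sketched in Python. This is an illustration of the header/detail/trailer layout above and not DML. The function name `parse_record` is made up, and the date is kept as text because the data does not say whether it is day-first or month-first.

```python
def parse_record(line):
    """Read one line; the id field decides which fields follow."""
    fields = line.rstrip("\r\n").rstrip(",").split(",")
    rec_id = fields[0]
    if rec_id == "H":      # header: date and file name
        return {"id": "H", "date1": fields[1], "file_name": fields[2]}
    if rec_id == "D":      # detail: one employee
        return {"id": "D", "emp_id": int(fields[1]),
                "emp_name": fields[2], "salary": fields[3]}
    if rec_id == "T":      # trailer: number of detail records
        return {"id": "T", "count": int(fields[1])}
    raise ValueError(f"unknown record id: {rec_id!r}")

lines = ["H,12/10/16,a.dat,", "D,121,damu,1000,", "D,131,raju,2000,",
         "D,141,reddy,3000,", "T,3,"]
records = [parse_record(line) for line in lines]

details = [r for r in records if r["id"] == "D"]
trailer = records[-1]
print(len(details) == trailer["count"])   # True
```

Comparing the trailer count with the number of detail records is a common sanity check for files of this shape.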

    Important String and Validation Functions in Ab Initio

    A few string and validation functions are used regularly when building graphs in Ab Initio.
    Important string functions:
  • string_split


  • string_replace


  • string_prefix


  • string_index


  • string_suffix


  • string_rindex


  • string_like


  • string_substring


  • string_length



    These are the string functions most frequently used in transformations when building graphs in Ab Initio.

    Important validation functions:
    For data validation, the following functions are used regularly:
  • is_defined


  • is_error


  • is_null


  • is_valid


  • is_blank


    These are the most frequently used data validation functions in Ab Initio.


    Reformat Detailed Explanation Ab Initio Component

    Reformat Component in Ab Initio::
    Reformat changes the record format of your data by dropping fields, or by using DML expressions to add fields, combine fields, or modify the data.

    Input port

    In general, the input is a .dat file.

    Reformat parameters
  • count



  • transform



  • select



  • reject threshold



  • logging







    Output port
    Save the .dat file and the graph in your sandbox or on the EME server.

    Partition Components

    Ab Initio has a few useful partition components, listed below.
    Partition component types::
  • Broadcast

  • Partition By Expression



  • Partition By Key



  • Partition By Percentage



  • Partition By Range



  • Partition By Round-robin



  • Partition with Load Balance


    Filter By Expression component

    Purpose
    Filter by Expression filters records according to a DML expression or transform function, which specifies the selection criteria.
    Filter by Expression is sometimes used to create a subset, or sample, of the data. For example, you can configure Filter by Expression to select a certain percentage of records, or to select every third (or fourth, or fifth, and so on) record. Note that if you need a random sample of a specific size, you should use the sample component.
    FILTER BY EXPRESSION supports implicit reformat. For more information, see “Implicit reformat”.
    Recommendation
    Component folding can enhance the performance of this component. If this feature is enabled, the Co>Operating System folds this component by default. See “Component folding” for more information.
    Location in the Component Organizer
    Transform folder
    Runtime behavior of FILTER BY EXPRESSION
    Filter by Expression does the following:
  • Reads data records from the in port.

  • If the use_package parameter is false, applies the expression in the select_expr parameter to each record. It routes records as follows, based on how the expression evaluates:

    • For a non-0 value, Filter by Expression writes the record to the out port.
    • For 0, Filter by Expression writes the record to the deselect port. If you do not connect a flow to the deselect port, Filter by Expression discards the records.
    • For NULL, Filter by Expression writes the record to the reject port and a descriptive error message to the error port.

  • If the use_package parameter is true, executes the functions defined in the package.

  • If output_for_error or make_error is defined, executes them whenever an error event occurs. If log_error is defined and logging of rejects is turned on, executes log_error.
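
The routing rules for the case where use_package is false can be sketched in Python as follows. This is an illustration rather than Ab Initio code: Python's `None` stands in for NULL, the four lists stand in for the out, deselect, reject, and error ports, and the function name and sample records are invented.

```python
def filter_by_expression(records, select_expr):
    """Route each record by the value of select_expr: non-0, 0, or NULL (None)."""
    out, deselect, reject, error = [], [], [], []
    for rec in records:
        result = select_expr(rec)
        if result is None:                 # NULL: reject port plus an error message
            reject.append(rec)
            error.append(f"select_expr evaluated to NULL for {rec!r}")
        elif result == 0:                  # 0: deselect port
            deselect.append(rec)
        else:                              # non-0: out port
            out.append(rec)
    return out, deselect, reject, error

employees = [{"name": "damu", "salary": 90000},
             {"name": "raju", "salary": 40000},
             {"name": "reddy", "salary": None}]

def high_salary(rec):
    # A missing salary makes the whole expression NULL.
    return None if rec["salary"] is None else int(rec["salary"] > 50000)

out, deselect, reject, error = filter_by_expression(employees, high_salary)
print(len(out), len(deselect), len(reject))   # 1 1 1
```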



    Reformat Component

    Purpose
    Reformat changes the format of records by dropping fields, or by using DML expressions to add fields, combine fields, or transform the data in the records.

    Recommendation
    Component folding can enhance the performance of this component. If this feature is enabled, the Co>Operating System folds this component by default. See “Component folding” for more information.

    Location in the Component Organizer

    Transform folder
    Runtime behavior of REFORMAT
  • The component reads records from the in port.



  • If you specify an expression for the select parameter, the expression filters the records on the in port:



    • If the expression evaluates to 0 for a particular record, Reformat does not process the record, which means that the record does not appear on any output port.
    • If the expression produces NULL for any record, Reformat writes a descriptive error message and stops execution of the graph.



    • If the expression evaluates to anything other than 0 or NULL for a particular record, Reformat processes the record.


  • If you do not specify an expression for the select parameter, Reformat processes all the records on the in port.
  • If you specify a value for either output-index or output-indexes, Reformat passes the records to the transform functions, calling the transform function on each port in order, depending on the value of output-index or output-indexes, for each record, beginning with out port 0 and progressing through out port count – 1.
  • The evaluation of the transform functions takes place within each partition of a Reformat running in parallel, which means that evaluations of later transform functions can depend on the results of the evaluations of earlier transform functions, such as modification of global variables or use of functions such as next_in_sequence.
  • If you do not specify a transform function for a particular out port, Reformat uses default record assignment. (For more information, see “Default record assignment”.) You can use default record assignment to eliminate fields from a record format.
  • Reformat writes the valid records to the out ports.
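
A simplified Python sketch of this behavior is given below. It models only the select filter and one transform function per out port, called in port order. It ignores output-index, output-indexes, and default record assignment, and it raises an exception where Reformat would stop the graph. All names and the sample data are made up for illustration.

```python
def reformat(records, transforms, select=None):
    """Apply an optional select filter, then one transform per out port."""
    outs = [[] for _ in transforms]
    for rec in records:
        if select is not None:
            keep = select(rec)
            if keep is None:               # NULL stops execution
                raise ValueError(f"select evaluated to NULL for {rec!r}")
            if keep == 0:                  # 0: record appears on no out port
                continue
        for port, transform in enumerate(transforms):   # out port 0, 1, ...
            outs[port].append(transform(rec))
    return outs

people = [{"first": "Jim", "last": "Tpt", "salary": 40000},
          {"first": "Mohan", "last": "Narayana", "salary": 90000}]

out0, out1 = reformat(
    people,
    transforms=[
        lambda r: {"name": r["first"] + " " + r["last"]},   # combine fields
        lambda r: {"last": r["last"]},                      # drop fields
    ],
    select=lambda r: int(r["salary"] > 50000),
)
print(out0)   # [{'name': 'Mohan Narayana'}]
```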

    Dedup Sorted Component
    Purpose
    Dedup Sorted separates one specified record in each group of records from the rest of the records in the group.
    Requirement
    Dedup Sorted requires grouped input.

    Recommendation
    Component folding can enhance the performance of this component. If this feature is enabled, the Co>Operating System folds this component by default. See “Component folding” for more information.

    Location in the Component Organizer
    Transform folder
    Runtime behavior of DEDUP SORTED with parameters

    If you have access to the software, set the Dedup Sorted parameters yourself and learn the component by example.



    Dedup Sorted does the following:

    • Reads a grouped flow of records from the in port. If your records are not already grouped, use SORT to group them.
    • Does one of the following:
      • If you have supplied an expression for the select parameter, applies the expression to the records to filter them.
      • If you do not supply an expression for the select parameter, processes all records on the in port.
    • Processes groups of records as follows:
      • Considers any consecutive records with the same key value to be in the same group.
      • If a group consists of one record, writes that record to the out port.
      • If a group consists of more than one record, uses the value of the keep parameter to determine which record (if any) to write to the out port, and which record or records to write to the dup port.
      • If you have chosen unique-only for the keep parameter, does not write records to the out port from any groups consisting of more than one record.
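
The group handling can be sketched in Python as shown below. Only unique-only is named in the text above; the keep values "first" and "last" are assumptions made for this sketch, as is sending every record of a multi-record group to the dup list under unique-only. The function name and sample data are invented.

```python
from itertools import groupby

def dedup_sorted(records, key, keep="first"):
    """Split grouped records into (out, dup) according to the keep setting."""
    out, dup = [], []
    for _, grp in groupby(records, key=key):   # consecutive equal keys form a group
        group = list(grp)
        if len(group) == 1:
            out.append(group[0])
        elif keep == "first":                  # assumed value
            out.append(group[0])
            dup.extend(group[1:])
        elif keep == "last":                   # assumed value
            out.append(group[-1])
            dup.extend(group[:-1])
        elif keep == "unique-only":
            dup.extend(group)                  # assumption: the whole group goes to dup
        else:
            raise ValueError(f"unknown keep value: {keep!r}")
    return out, dup

rows = [(1, "a"), (1, "b"), (2, "c"), (3, "d"), (3, "e")]
out, dup = dedup_sorted(rows, key=lambda r: r[0], keep="first")
print(out)   # [(1, 'a'), (2, 'c'), (3, 'd')]
```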

    Normalize Component

    Purpose
    Normalize generates multiple output records from each of its input records. You can directly specify the number of output records for each input record, or you can make the number of output records dependent on a calculation.
    In contrast, to consolidate groups of related records into a single record with a vector field for each group — the inverse of NORMALIZE — you would use the accumulation function of the ROLLUP component.

    Recommendations
  • Always clean and validate data before normalizing it. Because Normalize uses a multistage transform, it follows computation rules that may cause unexpected or incorrect results in the presence of dirty data (NULLs or invalid values). Furthermore, the results will be hard to trace, particularly if the reject-threshold parameter is set to Never abort. Several factors — including the data type, the DML expression used to perform the normalization, and the value of the sorted-input parameter — may affect where the problems occur. It is safest to avoid normalizing dirty data.


  • Component folding can enhance the performance of this component. If this feature is enabled, the Co>Operating System folds this component by default. See “Component folding” for more information.

    Location in the Component Organizer
    Transform folder

    Runtime behavior of NORMALIZE
  • Reads the input record.

  • Performs temporary initialization.

  • Performs iterations of the normalize transform function. NORMALIZE determines the number of iterations to perform using either the finished or the length function, whichever is defined.

  • Sends the output record to the out port.
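
The length-driven case can be sketched in Python as follows. The sketch covers only the variant where a length function fixes the number of iterations; the finished variant and temporary initialization are left out. The Python function names and the order data are made up for illustration.

```python
def normalize(records, length, transform):
    """Emit length(rec) output records for each input record."""
    out = []
    for rec in records:
        for index in range(length(rec)):       # one iteration per output record
            out.append(transform(rec, index))
    return out

orders = [{"customer": "A", "items": ["pen", "ink"]},
          {"customer": "B", "items": []},
          {"customer": "C", "items": ["pad"]}]

rows = normalize(
    orders,
    length=lambda r: len(r["items"]),
    transform=lambda r, i: {"customer": r["customer"], "item": r["items"][i]},
)
print(len(rows))   # 3
```

An input record with an empty vector produces no output records, which is why customer B does not appear in `rows`.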



    Scan Component

    Purpose
    For every input record, Scan generates an output record that consists of a running cumulative summary for the group to which the input record belongs, up to and including the current record. For example, the output records might include successive year-to-date totals for groups of records.
    Recommendations
  • If you want one summary record for a group, use ROLLUP.

  • The behavior of SCAN varies in the presence of dirty data (NULLs or invalid values), according to whether you use the aggregation functions for the scan:

    • Without aggregation functions, you can use SCAN normally.
    • With aggregation functions, always clean and validate data before scanning it. Because the aggregation functions use a multistage transform, SCAN follows computation rules that may cause unexpected or even incorrect results in the presence of dirty data (NULLs or invalid values). Furthermore, the results will be hard to trace, particularly if the reject-threshold parameter is set to Never abort. Several factors (including the data type, the DML expression used to perform the scan, and the value of the sorted-input parameter) may affect where the problems occur. It is safest to clean and validate the data before using the aggregation functions in SCAN.

  • Component folding can enhance the performance of this component. If this feature is enabled, the Co>Operating System folds this component by default. See “Component folding” for more information.

    Location in the Component Organizer
    Transform folder

    At runtime, Scan does the following:
  • Input selection
  • Temporary initialization
  • Computation
  • Finalization
  • Output selection
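
A running total per group, as described in the Purpose section, can be sketched in Python like this. It models temporary initialization and computation only, assumes the input is already grouped by key, and uses invented names and data.

```python
from itertools import groupby

def scan(records, key, value):
    """Emit one output per input record with the running total for its group."""
    out = []
    for k, grp in groupby(records, key=key):
        running = 0                            # temporary initialization per group
        for rec in grp:
            running += value(rec)              # computation
            out.append((k, value(rec), running))
    return out

sales = [("2015", 10), ("2015", 20), ("2016", 5), ("2016", 7)]
totals = scan(sales, key=lambda r: r[0], value=lambda r: r[1])
print(totals)
# [('2015', 10, 10), ('2015', 20, 30), ('2016', 5, 5), ('2016', 7, 12)]
```

Every input record yields an output record, and the total restarts when the key changes.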



    ROLLUP Component

    Purpose
    Rollup evaluates a group of input records that have the same key, and then generates records that either summarize each group or select certain information from each group.
    Location in the Component Organizer
    Transform folder
    Recommendations
  • For new development, use Rollup rather than AGGREGATE. Rollup provides more control over record selection, grouping, and aggregation.

  • The behavior of ROLLUP varies in the presence of dirty data (NULLs or invalid values), according to whether you use the aggregation functions for the rollup:

    • Without aggregation functions, you can use ROLLUP normally.
    • With aggregation functions, always clean and validate data before rolling it up. Because the aggregation functions use a multistage transform, ROLLUP follows computation rules that may cause unexpected or even incorrect results in the presence of dirty data (NULLs or invalid values). Furthermore, the results will be hard to trace, particularly if the reject-threshold parameter is set to Never abort. Several factors (including the data type, the DML expression used to perform the rollup, and the value of the sorted-input parameter) may affect where the problems occur. It is safest to clean and validate the data before using the aggregation functions in ROLLUP.

  • Component folding can enhance the performance of this component. If this feature is enabled, the Co>Operating System folds this component by default. See “Component folding” for more information.


    At runtime, ROLLUP executes the following steps for each group of records:
  • Temporary initialization
  • Computation
  • Finalization
  • Output selection
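
The four steps can be mirrored one-to-one in a Python sketch. The callback names (`initialize`, `compute`, `finalize`, `output_select`) are chosen here to echo the step names and are not claimed to be the component's own function names. The input is assumed to be grouped by key already, and the sample data is invented.

```python
from itertools import groupby

def rollup(records, key, initialize, compute, finalize, output_select=None):
    """Produce at most one summary record per group of same-key records."""
    out = []
    for k, grp in groupby(records, key=key):
        temp = initialize()                    # temporary initialization
        for rec in grp:
            temp = compute(temp, rec)          # computation
        result = finalize(k, temp)             # finalization
        if output_select is None or output_select(result):   # output selection
            out.append(result)
    return out

staff = [("Kadapa", 34000), ("Tirupati", 40000), ("Tirupati", 60000)]

summary = rollup(
    staff,
    key=lambda r: r[0],
    initialize=lambda: {"count": 0, "total": 0},
    compute=lambda t, r: {"count": t["count"] + 1, "total": t["total"] + r[1]},
    finalize=lambda k, t: {"place": k, **t},
    output_select=lambda s: s["count"] > 1,    # keep only groups with several records
)
print(summary)   # [{'place': 'Tirupati', 'count': 2, 'total': 100000}]
```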




    Ab Initio Course Content

    Ab Initio is a Latin term meaning "from the beginning".


    Introduction to Ab Initio::
    • Data warehousing Concepts
    • Introduction to Ab initio
    • Ab initio Architecture
    • Graph Programming
    • Introduction to .dat and .dml files

    Partition Components::
    • Broadcast
    • Partition by Expression
    • Partition by Range
    • Graph Programming
    • Partition by community
    • Partition by Percentage
    • Partition by Round Robin

    Departition Components::
    • Concatenate
    • Gather
    • Interleave
    • Merge

    • Multifile System (MFS)
    • Types of parallelism (very important in Ab Initio)
    • Layouts

    Sort Components::
    • Sort
    • Sort within Groups
    • Sample
    • Partition by Key and Sort

    Transform Components::
    • Filter by Expression
    • Aggregate
    • Scan
    • Rollup
    • Denormalize Sorted
    • Normalize
    • Reformat
    • Match Sorted
    • Dedup Sorted

    Working with Databases::
    • Database components
    • Run SQL
    • Input Table
    • Output Table
    • Truncate Table
    • Update Table

    • Phase and Checkpoints

    Miscellaneous Components::
    • Gather Logs
    • Run Program
    • Redefine Format
    • Trash
    • Replicate

    Dataset Components::
    • Input File
    • Output File
    • Lookup File
    • Intermediate File

    Compress Components::
    • Compress
    • Uncompress
    • Gzip
    • Gunzip

    Validate Components::
    • Check Order
    • Generate Records
    • Generate Random Bytes
    • Compare Records
    • Compute Checksum
    • Compare Checksum

    • Project and Sandbox
    • Performance Tuning
    Copyright © Ab Initio Coach. All rights reserved.