
Types of Partition Components In Ab Initio

The Ab Initio Component Organizer provides several partitioning components that divide data in different ways. They are listed below:
1. Function Partition (Partition by Expression)
2. Hash Partition (Partition by Key)
3. Load Level Partition (Partition by Load Balance)
4. Percentage Partition (Partition by Percentage)
5. Range Partition (Partition by Range)
6. Round-robin Partition (Partition by Round-robin)

1. Function Partition: divides the data according to a DML expression.
2. Hash Partition: groups the data by a key, like dealing cards into piles according to their suit.
3. Load Level Partition: dynamic load balancing. More data goes to CPUs that are less busy and less data to CPUs that are busier, which maximizes throughput.
4. Percentage Partition: distributes the data so that the output is proportional to fractions of 100.
5. Range Partition: divides the data evenly among nodes, based on a key and a set of partitioning ranges.
6. Round-robin Partition: distributes the data evenly, in block-size chunks, across the output partitions.
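To make the differences concrete, here is a minimal Python sketch of how four of these strategies decide which partition a record goes to. It is a conceptual model under stated assumptions, not Ab Initio code: the function names are hypothetical, crc32 stands in for whatever hash the Co>Operating System actually uses, and range boundaries are assumed to be inclusive upper bounds. Load Level and Percentage partitioning are left out because their exact distribution rules are not described here.

```python
import bisect
import zlib
from typing import Any, Callable, Dict, List, Sequence

Record = Dict[str, Any]
Partitions = List[List[Record]]


def partition_by_expression(records: Sequence[Record], expr: Callable[[Record], int], n: int) -> Partitions:
    """Function partition: expr returns the partition index (0..n-1) for each record."""
    parts: Partitions = [[] for _ in range(n)]
    for rec in records:
        idx = expr(rec)
        if not 0 <= idx < n:
            raise ValueError(f"expression returned {idx}, expected 0..{n - 1}")
        parts[idx].append(rec)
    return parts


def partition_by_key(records: Sequence[Record], key: str, n: int) -> Partitions:
    """Hash partition: all records with the same key value land in the same partition."""
    parts: Partitions = [[] for _ in range(n)]
    for rec in records:
        # crc32 is stable across runs, unlike Python's salted built-in hash() for str.
        h = zlib.crc32(str(rec[key]).encode("utf-8"))
        parts[h % n].append(rec)
    return parts


def partition_by_round_robin(records: Sequence[Record], n: int, block_size: int = 1) -> Partitions:
    """Round-robin partition: deal block_size records at a time to partitions 0, 1, ..., n-1, 0, ..."""
    parts: Partitions = [[] for _ in range(n)]
    for i, rec in enumerate(records):
        parts[(i // block_size) % n].append(rec)
    return parts


def partition_by_range(records: Sequence[Record], key: str, splitters: Sequence[Any]) -> Partitions:
    """Range partition: splitters are sorted upper bounds; partition i gets keys <= splitters[i],
    and the last partition gets keys above every splitter."""
    parts: Partitions = [[] for _ in range(len(splitters) + 1)]
    for rec in records:
        parts[bisect.bisect_left(splitters, rec[key])].append(rec)
    return parts


if __name__ == "__main__":
    cards = [{"id": i, "suit": s} for i, s in enumerate("HSDCHSDCHS")]
    print([[r["id"] for r in p] for p in partition_by_round_robin(cards, 3)])
    # [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
    print([[r["id"] for r in p] for p in partition_by_range(cards, "id", [3, 6])])
    # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
    print([sorted({r["suit"] for r in p}) for p in partition_by_key(cards, "suit", 2)])
```

Round-robin balances partition sizes without looking at the data, while hash and range partitioning keep equal keys together, which is what key-based components downstream need.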

Types of Parallelism in Ab Initio

Ab Initio has three types of parallelism.
Parallel computing is the simultaneous performance of multiple operations. Ab Initio uses three main types of parallel computing:
1. Component Level Parallelism
2. Pipeline Parallelism
3. Data Parallelism

1. Component-level parallelism: an application with multiple components running simultaneously on separate data uses component parallelism.
2. Pipeline parallelism: an application with multiple components running simultaneously on the same data uses pipeline parallelism.
3. Data parallelism: an application with data divided into segments that operates on each segment simultaneously uses data parallelism.
It is common for Ab Initio applications to take advantage of all three types at the same time.
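Data and pipeline parallelism can be sketched in plain Python to show their structure. This is only an analogy, not how the Co>Operating System schedules work: the function names are hypothetical, and because CPython threads share one interpreter lock, a real CPU-bound speedup would need processes (for example `ProcessPoolExecutor`) rather than threads.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, List, Sequence

_END = object()  # sentinel marking the end of the flow between pipeline stages


def split_into_segments(data: Sequence[Any], n: int) -> List[List[Any]]:
    """Divide data into n contiguous segments whose sizes differ by at most one."""
    size, extra = divmod(len(data), n)
    segments: List[List[Any]] = []
    start = 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        segments.append(list(data[start:end]))
        start = end
    return segments


def data_parallel_sum(data: Sequence[int], n: int) -> int:
    """Data parallelism: the same work (a sum) runs on every segment at once."""
    segments = split_into_segments(data, n)
    with ThreadPoolExecutor(max_workers=n) as pool:
        partial_sums = list(pool.map(sum, segments))
    return sum(partial_sums)  # combine the per-segment results


def two_stage_pipeline(records: Sequence[Any], stage1: Callable[[Any], Any], stage2: Callable[[Any], Any]) -> List[Any]:
    """Pipeline parallelism: stage2 works on earlier records while stage1 handles later ones."""
    flow: "queue.Queue[Any]" = queue.Queue(maxsize=8)  # bounded buffer between the two stages
    results: List[Any] = []

    def upstream() -> None:
        for rec in records:
            flow.put(stage1(rec))
        flow.put(_END)

    def downstream() -> None:
        while True:
            item = flow.get()
            if item is _END:
                break
            results.append(stage2(item))

    workers = [threading.Thread(target=upstream), threading.Thread(target=downstream)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results


if __name__ == "__main__":
    print(data_parallel_sum(list(range(101)), 4))  # 5050
    print(two_stage_pipeline([1, 2, 3], lambda x: x * 2, lambda x: x + 1))  # [3, 5, 7]
```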

Use the Sort Component Before These Components in Ab Initio

Some components require presorted data because they group adjacent records by a common key. If your data is not sorted, use the Sort component before any of the following components:

Aggregate

Merge

Dedup Sorted

Denormalize or accumulate

Match Merge

Merge Join

Rollup

Scan

Sorted input lets these components process one group at a time as its records arrive, which is why sorting first is also treated as a performance tuning tip in Ab Initio.
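A small Python sketch shows why adjacency matters. Python's `itertools.groupby`, like these components, only groups records that sit next to each other, so the same key can produce several groups when the input is unsorted. The function name and sample rows are made up for illustration.

```python
from itertools import groupby
from operator import itemgetter
from typing import List, Tuple

Row = Tuple[str, int]


def adjacent_group_totals(rows: List[Row]) -> List[Row]:
    """Total the second field for each run of adjacent rows that share a key."""
    return [(k, sum(v for _, v in grp)) for k, grp in groupby(rows, key=itemgetter(0))]


rows = [("A", 10), ("B", 5), ("A", 7), ("B", 1)]

if __name__ == "__main__":
    print(adjacent_group_totals(rows))  # [('A', 10), ('B', 5), ('A', 7), ('B', 1)]: no grouping happened
    print(adjacent_group_totals(sorted(rows, key=itemgetter(0))))  # [('A', 17), ('B', 6)]
```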

File Extensions in Ab Initio

There are a few file extensions you should know before you start learning Ab Initio.

They are listed below:
.mp: graph files
.dbc: database configuration files
.dat: data files
.dml: record format files (Data Manipulation Language)
.xfr: transform function files
.mpc: program template files (custom program components)
.mdc: dataset template files (custom dataset components)
.ksh: Korn shell script files (for example, a graph deployed as a script that can be shared and run outside the GDE)

Runtime Status of a Graph in Ab Initio

During a run, the GDE (Graphical Development Environment) displays round colored indicators that show the status of each component:

White: the component has not started.
Green: the component is running.
Red: the component has failed with an error.
Blue: the component has finished successfully.

Difference Between Aggregate and Rollup

Aggregate and Rollup are both used to summarize data.

- Rollup is more flexible and more convenient to use.

- Rollup provides additional functionality, such as input filtering and output filtering of records.

- Aggregate does not display intermediate results in main memory, whereas Rollup can.

- Analyzing a particular summarization is much simpler with Rollup than with Aggregate.

These are the main differences between Aggregate and Rollup in Ab Initio.

Ab Initio DML Sample

Here is an example data file and the matching DML record format in Ab Initio.

The data looks like this:
102,Damodar Reddy,M,Madanapalle,90000,
103,Prasad Kumar,M,Anantapur,50000,
104,Jim Tpt,M,Tirupati,40000,
105,Chandra Mouli,M,Bangalore,100000,
106,Mohan Narayana,M,Vizag,90000,
107,Mahesh Reddy,M,Kadapa,34000,
108,Jyothi Kumari,F,Hyderabad,40000,

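The DML record format for this data looks like this. The name is split on the space between first and last name, and the salary delimiter ",\r\n" assumes the file has Windows (CRLF) line endings; for a file with Unix line endings, use ",\n" instead.
record
decimal(",") emp_id;
string(" ") first_name;
string(",") last_name;
string(",") gender;
string(",") place;
decimal(",\r\n") salary;
end;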
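To see how delimited fields consume the input, here is a rough Python model of reading this file: each field ends at its own delimiter, and reading continues right after it. The layout list and `parse_delimited` are hypothetical illustrations, not Ab Initio APIs; they assume first_name ends at a space, that decimal fields convert cleanly to integers, and CRLF line endings as noted above.

```python
from typing import Any, Callable, Dict, List, Tuple

# (field name, delimiter, converter); decimal fields become int, string fields stay str.
Layout = List[Tuple[str, str, Callable[[str], Any]]]

EMPLOYEE_LAYOUT: Layout = [
    ("emp_id", ",", int),
    ("first_name", " ", str),
    ("last_name", ",", str),
    ("gender", ",", str),
    ("place", ",", str),
    ("salary", ",\r\n", int),
]


def parse_delimited(text: str, layout: Layout) -> List[Dict[str, Any]]:
    """Read fields in order, each one ending at its own delimiter."""
    records: List[Dict[str, Any]] = []
    pos = 0
    while pos < len(text):
        rec: Dict[str, Any] = {}
        for name, delim, convert in layout:
            end = text.find(delim, pos)
            if end == -1:
                raise ValueError(f"delimiter {delim!r} for field {name!r} not found at offset {pos}")
            rec[name] = convert(text[pos:end])
            pos = end + len(delim)  # skip past the delimiter
        records.append(rec)
    return records


if __name__ == "__main__":
    data = "102,Damodar Reddy,M,Madanapalle,90000,\r\n103,Prasad Kumar,M,Anantapur,50000,\r\n"
    for rec in parse_delimited(data, EMPLOYEE_LAYOUT):
        print(rec)
```

The same model shows why the delimiters must match the data exactly: the same file with Unix "\n" line endings fails, because the ",\r\n" delimiter never appears.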
If you have any queries, please comment below and I will respond as soon as possible.

Ab Initio Important Questions Frequently Asked by Companies [2016]

1) What is the difference between .dbc and .cfg files? When do you use each?
A .dbc file is a database configuration file: it provides the GDE with the information it needs to connect to a database. A database configuration file contains the following information:
1. The name and version number of the database to which you want to connect
2. The name of the computer on which the database instance or server to which you want to connect runs, or on which the database remote access software is installed
3. The name of the database instance, server, or provider to which you want to connect
.cfg file is Database table configuration file for use with 2.1 database components. The 2.1 database components are deprecated components provided for compatibility with Version 2.1 and lower of the Co>Operating System. They should not be used for new development. See the components' individual help topics for more information.

2) What are the compilation errors you came across while executing your graphs?

3) What is depth_error?

4) During the execution of graph, let us say you lost the network connection, would you have to start the process all over again or does it start from where it stopped?

5) Types of partitions and scenarios.
The divisions of the data and the copies of the program components that create data parallelism are called partitions, and a component partitioned in this way is called a parallel component. If each partition of a parallel program component runs on a separate processor, the increase in the speed of processing is almost directly proportional to the number of partitions.
For example, suppose you wanted to build a graph to sort a file of customer records according to a key. To speed up the processing, you could use a PARTITION BY ROUND-ROBIN component to divide the file of unsorted records into three partitions. Then three corresponding partitions of a SORT component could sort all three partitions of the data at the same time. Finally, a MERGE component could combine the three sorted partitions into one sorted flow. If each partition of the SORT component ran on a separate processor, the sorting step would take only about a third of the time it would take if the processing employed no data parallelism.
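The round-robin, sort, and merge pattern in this example can be modeled in a few lines of Python. This is a sequential sketch of the data flow only (the three sorts would run in parallel in a real graph); `parallel_sort` and the `cust_id` field are hypothetical names. It works because merging already-sorted streams by key yields a fully sorted flow, regardless of which partition each record was dealt to.

```python
import heapq
from typing import Any, Callable, Dict, List, Sequence

Record = Dict[str, Any]


def round_robin(records: Sequence[Record], n: int) -> List[List[Record]]:
    """Deal records to n partitions in turn."""
    parts: List[List[Record]] = [[] for _ in range(n)]
    for i, rec in enumerate(records):
        parts[i % n].append(rec)
    return parts


def parallel_sort(records: Sequence[Record], key: Callable[[Record], Any], n: int = 3) -> List[Record]:
    partitions = round_robin(records, n)
    # Each of these sorts could run on its own processor.
    sorted_partitions = [sorted(p, key=key) for p in partitions]
    # MERGE: combine the sorted partitions into one sorted flow.
    return list(heapq.merge(*sorted_partitions, key=key))


if __name__ == "__main__":
    customers = [{"cust_id": c} for c in [42, 7, 19, 3, 88, 7, 56]]
    print([r["cust_id"] for r in parallel_sort(customers, key=lambda r: r["cust_id"])])
    # [3, 7, 7, 19, 42, 56, 88]
```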

Flow partitions
When you divide a component into partitions, you divide the flows that connect to it as well. These divisions are called flow partitions. The three partitions of the SORT component in "Data parallelism example" have three corresponding flow partitions of the input and output flows that connect them to PARTITION BY ROUND-ROBIN and MERGE.
Port partitions
The port to which a partitioned flow connects is partitioned as well, with the same number of port partitions as the flow connected to it. In the diagram in "Data parallelism example", the output port of PARTITION BY ROUND-ROBIN, the input and output ports of SORT, and the input port of MERGE each have three port partitions.
Depth of parallelism
The number of partitions of a component, flow, port, graph, or section of a graph determines its depth of parallelism. In the diagram in "Data parallelism example", from the output port of PARTITION BY ROUND-ROBIN to the input port of MERGE, all parts of the process are parallel and have three partitions, so you can say that section of the graph has a depth of parallelism of three, or is three-ways parallel.

6) What does unused port in join component do?
The JOIN component also includes unused ports; the number of these ports matches the number of input ports. The records that flow out of the unused ports are the records with key values that did not match the key values of records on the other inputs.
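As a rough illustration, the Python sketch below models an inner join on two inputs that also returns the contents of each unused port. The function and field names are hypothetical; it only mirrors the idea described above, that unused ports carry the records whose key found no match on the other input.

```python
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]


def join_with_unused(in0: List[Record], in1: List[Record], key: str) -> Tuple[List[Tuple[Record, Record]], List[Record], List[Record]]:
    """Return (joined pairs, unused0, unused1) for an inner join on key."""
    by_key1: Dict[Any, List[Record]] = {}
    for r1 in in1:
        by_key1.setdefault(r1[key], []).append(r1)
    keys0 = {r0[key] for r0 in in0}
    joined = [(r0, r1) for r0 in in0 for r1 in by_key1.get(r0[key], [])]
    unused0 = [r0 for r0 in in0 if r0[key] not in by_key1]  # in0 records with no match in in1
    unused1 = [r1 for r1 in in1 if r1[key] not in keys0]    # in1 records with no match in in0
    return joined, unused0, unused1


if __name__ == "__main__":
    customers = [{"id": 1}, {"id": 2}, {"id": 3}]
    orders = [{"id": 1, "amt": 10}, {"id": 1, "amt": 5}, {"id": 4, "amt": 7}]
    joined, unused0, unused1 = join_with_unused(customers, orders, "id")
    print(len(joined), unused0, unused1)  # 2 [{'id': 2}, {'id': 3}] [{'id': 4, 'amt': 7}]
```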

7) Define Multi file system. Can you create multifile system on the same server? Also, if you have a table that has Name, Address, Status, Position attributes, can Name and Address be on one partition and Status and Position in the other partition?
An Ab Initio multifile system consists of multiple replications of a directory tree structure containing multidirectories and multifiles — these are the partitions of the multifile system.
All but one of the partitions contains a subset of the data stored in the multifile system; the additional partition contains control information. The partitions containing data are the data partitions of the system, and the additional partition is the control partition. The control partition contains no user data — only the information the Co>Operating System needs to manage the multifile system.
Visualize a directory tree containing subdirectories and files. Now imagine n identical copies of the same tree located on several disks, and number them 0 to n-1 (the Co>Operating System numbers partitions starting at 0). These are the data partitions of the multifile system. Then add one more copy of the tree to serve as the control partition. This is a multifile system (for an example, see "Sample multifile system").
You can place the control and data partitions of a multifile system on any computer that has the Co>Operating System installed on it and to which the run host can connect.

8) What is a sandbox? Did Co>Operating System version 2.8 have sandboxes? If not, how would you store the respective files?

9) How did you do version control? Which tool did you use?

10) How do you troubleshoot performance issues in a graph?

11) What are the usual errors that you encounter during the ETL process, apart from compilation errors?

12) Were you involved in production support? What kinds of problems did you encounter?

13) Give an overview of the Enterprise Meta>Environment (EME), and some possible questions on it.

14) What are a delta table and a master table?

15) What error would you get when you use Partition by Round-robin and Join?
A "depth not equal" error.

16) In which scenarios would you use Partition by Key, and in which Partition by Round-robin? What are the differences between the two?

17) What are the different dimension tables that you used and some columns in the fact table?

18) How do you count the number of records in a flat file?

19) How do you count the number of records in a multifile system without using GDE?

20) What do the Scan and Rollup components do? Give a scenario where you used them.

21) Have you ever used user-defined functions or packages? If yes, give a scenario.

22) What value do you have to give the Record Required parameter for a natural join?

23) When do you use Partition by Expression?

24) What is an ad hoc file system? Give a scenario where you used it.

25) What are the different commands that you used when writing wrappers?

26) What do the hidden files in a sandbox represent, and what does start.ksh represent?

27) What are the different things that you have to consider when loading data into a table?

28) What is the difference between the Redefine Format and Reformat components?

29) Sometimes you have to use dynamic length strings. Can you give me one circumstance where you need it?

30) If you have a flat file as follows:

20 General Manager Chris
30 Divisional Manager Harry
20 General Manager Mary
30 Divisional Manager Dravid
How do you count the number of records that have 20 in the first column, and likewise for 30.
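One way to reason about question 30 is shown in the Python sketch below: group the records by the value in the first column and count each group. In a graph, the usual equivalent would be a Rollup keyed on that column; the Python function here is only a hypothetical illustration of the logic.

```python
from collections import Counter
from typing import Iterable


def count_by_first_column(lines: Iterable[str]) -> Counter:
    """Count records per value of the first whitespace-separated column."""
    return Counter(line.split()[0] for line in lines if line.strip())


if __name__ == "__main__":
    lines = [
        "20 General Manager Chris",
        "30 Divisional Manager Harry",
        "20 General Manager Mary",
        "30 Divisional Manager Dravid",
    ]
    print(count_by_first_column(lines))  # Counter({'20': 2, '30': 2})
```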

Conditional DML With Example Explanation

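Conditional DML is DML in which the fields that make up a record depend on a condition, typically a test on a field read earlier in the same record.

We can use conditional DML when a file mixes header, detail, and trailer records.
The data format looks like this: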

"""
H,12/10/16,a.dat,
D,121,damu,1000,
D,131,raju,2000,
D,141,reddy,3000,
T,3,

"""

The conditional DML record format for this data looks like this:

"""

record
string(",") id;
if (id == "H")
begin
date(",") date1;
string(",") file_name;
end
else if (id == "D")
begin
decimal(",") emp_id;
string(",") emp_name;
string(",") salary;
end
else if (id == "T")
begin
decimal(",") count;
end
/* consumes the line terminator that ends every record */
string("\n") newline;
end;

"""

Conditional DML record formats like this are common in Ab Initio projects in banking, healthcare, and trading.
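The same header/detail/trailer logic can be sketched in Python to show what the condition on the first field does: it picks the field layout for the rest of the line. The function name is hypothetical, and the final check that the trailer count matches the number of detail records is an added, commonly used validation rather than something the DML itself performs.

```python
from typing import Any, Dict, List, Optional, Tuple

Record = Dict[str, Any]


def parse_header_detail_trailer(lines: List[str]) -> Tuple[Optional[Record], List[Record], Optional[int]]:
    """Pick the field layout of each line from its first field (H, D, or T)."""
    header: Optional[Record] = None
    details: List[Record] = []
    trailer_count: Optional[int] = None
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        fields = line.rstrip("\r\n").split(",")
        rec_type = fields[0]
        if rec_type == "H":
            header = {"date1": fields[1], "file_name": fields[2]}
        elif rec_type == "D":
            details.append({"emp_id": int(fields[1]), "emp_name": fields[2], "salary": fields[3]})
        elif rec_type == "T":
            trailer_count = int(fields[1])
        else:
            raise ValueError(f"unknown record type {rec_type!r}")
    # Validation: the trailer's count should equal the number of detail records.
    if trailer_count is not None and trailer_count != len(details):
        raise ValueError(f"trailer says {trailer_count} records, found {len(details)}")
    return header, details, trailer_count


if __name__ == "__main__":
    data = ["H,12/10/16,a.dat,", "D,121,damu,1000,", "D,131,raju,2000,", "D,141,reddy,3000,", "T,3,"]
    header, details, count = parse_header_detail_trailer(data)
    print(header["file_name"], len(details), count)  # a.dat 3 3
```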

Important String and Validation Functions in Ab Initio

In Ab Initio, a few string and validation functions are used regularly when building graphs.
Important String Functions:

string_split

string_replace

string_prefix

string_index

string_suffix

string_rindex

string_like

string_substring

string_length

These are the string functions used most frequently in Ab Initio transforms.

Important Validation Functions:
The following validation functions are used regularly for data validation:

is_defined

is_error

is_null

is_valid

is_blank

These are frequently used data validation functions in ab initio.

If there are other important or frequently used string or validation functions you would like covered, or you have questions about a real-time scenario, please comment below.

Reformat Detailed Explanation Ab Initio Component

Reformat Component in Ab Initio:
Reformat changes the record format of your data by dropping fields or by using DML expressions to add fields, combine fields, or modify the data.

Input port:

In general, the input is a .dat file.

Reformat parameters:

count

transform

select

reject-threshold

logging

Output port:
Save the output .dat file and the graph in your sandbox or on the EME server.

Partition Components

Ab Initio has a few useful partition components, listed below.

Partition Component Types:

Broadcast

Partition By Expression

Partition By Key

Partition By Percentage

Partition By Range

Partition By Round-robin

Partition By Load Balance
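Broadcast differs from the other components in this list: rather than dividing records among partitions, it writes a copy of every record to each output partition. A one-function Python sketch (hypothetical name) makes the contrast clear; with n partitions, the total record count grows n-fold instead of staying the same.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def broadcast(records: Sequence[T], n: int) -> List[List[T]]:
    """Every one of the n output partitions receives a full copy of the input."""
    return [list(records) for _ in range(n)]


if __name__ == "__main__":
    parts = broadcast(["r1", "r2", "r3"], 3)
    print(parts)  # [['r1', 'r2', 'r3'], ['r1', 'r2', 'r3'], ['r1', 'r2', 'r3']]
    print(sum(len(p) for p in parts))  # 9
```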


Filter By Expression component

Filter By Expression component in ab initio with example and real time parameters

Purpose
Filter by Expression filters records according to a DML expression or transform function, which specifies the selection criteria.
Filter by Expression is sometimes used to create a subset, or sample, of the data. For example, you can configure Filter by Expression to select a certain percentage of records, or to select every third (or fourth, or fifth, and so on) record. Note that if you need a random sample of a specific size, you should use the sample component.
FILTER BY EXPRESSION supports implicit reformat. For more information, see “Implicit reformat”.
Recommendation
Component folding can enhance the performance of this component. If this feature is enabled, the Co>Operating System folds this component by default. See “Component folding” for more information.
Location in the Component Organizer
Transform folder
Runtime behavior of FILTER BY EXPRESSION
Filter by Expression does the following:

Reads data records from the in port.

If the use_package parameter is false, applies the expression in the select_expr parameter to each record. It routes records as follows, based on how the expression evaluates:

For a non-0 value, Filter by Expression writes the record to the out port.

For 0, Filter by Expression writes the record to the deselect port. If you do not connect a flow to the deselect port, Filter by Expression discards the records.

For NULL, Filter by Expression writes the record to the reject port and a descriptive error message to the error port.

If the use_package parameter is true, executes the functions defined in the package.

If output_for_error or make_error is defined, executes them whenever an error event occurs. If log_error is defined and logging of rejects is turned on, executes log_error.
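The routing rules above (when use_package is false) can be modeled in Python, with `None` standing in for a DML NULL. This is an illustrative sketch, not the component's implementation: the function name and the returned dictionary of "ports" are hypothetical, and package functions and error logging are not modeled.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


def filter_by_expression(records: List[Record], select_expr: Callable[[Record], Any]) -> Dict[str, List[Any]]:
    """Route each record by the value of select_expr: non-0 -> out, 0 -> deselect, NULL -> reject."""
    ports: Dict[str, List[Any]] = {"out": [], "deselect": [], "reject": [], "error": []}
    for rec in records:
        value = select_expr(rec)
        if value is None:  # NULL: reject the record and explain why
            ports["reject"].append(rec)
            ports["error"].append(f"select_expr evaluated to NULL for record {rec!r}")
        elif value == 0:  # 0 (False also equals 0 in Python): deselect
            ports["deselect"].append(rec)
        else:  # any non-0 value: select
            ports["out"].append(rec)
    return ports


if __name__ == "__main__":
    emps = [{"id": 1, "salary": 90000}, {"id": 2, "salary": 30000}, {"id": 3, "salary": None}]
    ports = filter_by_expression(emps, lambda r: None if r["salary"] is None else r["salary"] > 40000)
    print({name: len(recs) for name, recs in ports.items()})  # {'out': 1, 'deselect': 1, 'reject': 1, 'error': 1}
```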


Reformat Component

Reformat Component in ab initio with example and real time environment

Purpose
Reformat changes the format of records by dropping fields, or by using DML expressions to add fields, combine fields, or transform the data in the records.

Recommendation
Component folding can enhance the performance of this component. If this feature is enabled, the Co>Operating System folds this component by default. See “Component folding” for more information.

Location in the Component Organizer
Transform folder
Runtime behavior of REFORMAT

The component reads records from the in port.

If you specify an expression for the select parameter, the expression filters the records on the in port:

If the expression evaluates to 0 for a particular record, Reformat does not process the record, which means that the record does not appear on any output port.
If the expression produces NULL for any record, Reformat writes a descriptive error message and stops execution of the graph.

If the expression evaluates to anything other than 0 or NULL for a particular record, Reformat processes the record.

If you do not specify an expression for the select parameter, Reformat processes all the records on the in port.

Reformat then passes each record to the transform functions. If you specify a value for either output-index or output-indexes, that value determines which out ports' transform functions are called for the record; otherwise, Reformat calls the transform function for each port in order, beginning with out port 0 and progressing through out port count - 1.

The evaluation of the transform functions takes place within each partition of a Reformat running in parallel, which means that evaluations of later transform functions can depend on the results of the evaluations of earlier transform functions, such as modification of global variables or use of functions such as next_in_sequence.

If you do not specify a transform function for a particular out port, Reformat uses default record assignment. (For more information, see “Default record assignment”.) You can use default record assignment to eliminate fields from a record format.

Reformat writes the valid records to the out ports.
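The select behavior and default record assignment described above can be sketched in Python for a single out port. The names are hypothetical, `None` stands in for NULL, and default record assignment is simplified to "copy fields whose names appear in the output format and drop the rest"; multiple out ports and output-index are not modeled.

```python
from typing import Any, Callable, Dict, List, Optional

Record = Dict[str, Any]


def reformat(
    records: List[Record],
    out_fields: List[str],
    select: Optional[Callable[[Record], Any]] = None,
    transform: Optional[Callable[[Record], Record]] = None,
) -> List[Record]:
    out: List[Record] = []
    for rec in records:
        if select is not None:
            verdict = select(rec)
            if verdict is None:  # NULL: stop with a descriptive error
                raise RuntimeError(f"select evaluated to NULL for record {rec!r}; stopping")
            if verdict == 0:
                continue  # not processed; appears on no output port
        if transform is not None:
            out.append(transform(rec))
        else:
            # Simplified default record assignment: copy same-named fields, drop the rest.
            out.append({f: rec[f] for f in out_fields if f in rec})
    return out


if __name__ == "__main__":
    emps = [
        {"emp_id": 102, "first_name": "Damodar", "last_name": "Reddy", "salary": 90000},
        {"emp_id": 107, "first_name": "Mahesh", "last_name": "Reddy", "salary": 34000},
    ]
    print(reformat(emps, ["emp_id", "salary"]))  # names dropped
    print(reformat(emps, [], transform=lambda r: {"emp_id": r["emp_id"], "full_name": f"{r['first_name']} {r['last_name']}"}))
```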

Dedup Sorted Component

Dedup Sorted Component in Ab Initio: Explanation and Parameters

Purpose
Dedup Sorted separates one specified record in each group of records from the rest of the records in the group.
Requirement
Dedup Sorted requires grouped input.
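A Python sketch helps show what "separates one specified record in each group" means. The keep options modeled here (first, last, unique-only) are assumptions about the component's choices of which record to keep, and the function and port names are hypothetical. Like the component, the sketch relies on grouped input, because `itertools.groupby` only groups adjacent records.

```python
from itertools import groupby
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]


def dedup_sorted(records: List[Record], key: Callable[[Record], Any], keep: str = "first") -> Tuple[List[Record], List[Record]]:
    """Return (out, dup): the kept record(s) of each group, and the rest."""
    out: List[Record] = []
    dup: List[Record] = []
    for _, grp in groupby(records, key=key):
        group = list(grp)
        if keep == "first":
            out.append(group[0])
            dup.extend(group[1:])
        elif keep == "last":
            out.append(group[-1])
            dup.extend(group[:-1])
        elif keep == "unique_only":
            # Keep only keys that occur exactly once; whole duplicate groups go to dup.
            (out if len(group) == 1 else dup).extend(group)
        else:
            raise ValueError(f"unknown keep option {keep!r}")
    return out, dup


if __name__ == "__main__":
    rows = [{"id": 1, "v": "a"}, {"id": 1, "v": "b"}, {"id": 2, "v": "c"}, {"id": 3, "v": "d"}, {"id": 3, "v": "e"}]
    out, dup = dedup_sorted(rows, key=lambda r: r["id"], keep="first")
    print([r["v"] for r in out], [r["v"] for r in dup])  # ['a', 'c', 'd'] ['b', 'e']
```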

Recommendation
Component folding can enhance the performance of this component. If this feature is enabled, the Co>Operating System folds this component by default. See “Component folding” for more information.

Location in the Component Organizer
Transform folder
Runtime behavior of DEDUP SORTED with parameters

If you have access to the software, fill in these parameters on a Dedup Sorted component and run it on sample data to learn how the component behaves.

Dedup Sorted does the following:

Reads a grouped flow of records from the in port.
Does one of the following:
Processes groups of records as follows:

Normalize Component

Normalize is also one of the multistage components in Ab Initio.

Purpose
Normalize generates multiple output records from each of its input records. You can directly specify the number of output records for each input record, or you can make the number of output records dependent on a calculation.
In contrast, to consolidate groups of related records into a single record with a vector field for each group — the inverse of NORMALIZE — you would use the accumulation function of the ROLLUP component.

Recommendations

Always clean and validate data before normalizing it. Because Normalize uses a multistage transform, it follows computation rules that may cause unexpected or incorrect results in the presence of dirty data (NULLs or invalid values). Furthermore, the results will be hard to trace, particularly if the reject-threshold parameter is set to Never abort. Several factors — including the data type, the DML expression used to perform the normalization, and the value of the sorted-input parameter — may affect where the problems occur. It is safest to avoid normalizing dirty data.

Component folding can enhance the performance of this component. If this feature is enabled, the Co>Operating System folds this component by default. See “Component folding” for more information.

Location in the Component Organizer
Transform folder

Runtime Behavior of NORMALIZE in Ab Initio

Reads the input record.

Performs temporary initialization.

Performs iterations of the normalize transform function. NORMALIZE determines the number of iterations to perform using either the finished or the length function, whichever is defined:

Sends the output record to the out port.
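The length-driven form of Normalize can be modeled in Python: a length function decides how many output records each input produces, and a normalize function builds output record number `index`. The function names and the customer/phone data are hypothetical, and the sketch does not model temporary initialization or the finished function.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


def normalize(
    records: List[Record],
    length: Callable[[Record], int],
    normalize_fn: Callable[[Record, int], Record],
) -> List[Record]:
    """For each input record, call normalize_fn once per index 0..length(record)-1."""
    out: List[Record] = []
    for rec in records:
        for index in range(length(rec)):
            out.append(normalize_fn(rec, index))
    return out


if __name__ == "__main__":
    customers = [{"cust_id": 1, "phones": ["111", "222"]}, {"cust_id": 2, "phones": ["333"]}, {"cust_id": 3, "phones": []}]
    rows = normalize(customers, lambda r: len(r["phones"]), lambda r, i: {"cust_id": r["cust_id"], "phone": r["phones"][i]})
    print(rows)  # one output record per phone; customer 3 produces none
```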


Scan Component

Scan Component with real time behavior of SCAN Component

Purpose
For every input record, Scan generates an output record that consists of a running cumulative summary for the group to which the input record belongs, up to and including the current record. For example, the output records might include successive year-to-date totals for groups of records.
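A running year-to-date total is the classic Scan example. The Python sketch below emits one output record per input record, carrying the cumulative total for that record's group. The names and sales data are hypothetical; the dictionary of totals lets this sketch accept unsorted input, whereas the component's behavior depends on its sorted-input setting.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


def scan(records: List[Record], key: Callable[[Record], Any], value: Callable[[Record], float]) -> List[Record]:
    """Emit each input record plus the running total of its group up to and including it."""
    totals: Dict[Any, float] = {}
    out: List[Record] = []
    for rec in records:
        k = key(rec)
        totals[k] = totals.get(k, 0) + value(rec)
        out.append({**rec, "running_total": totals[k]})
    return out


if __name__ == "__main__":
    sales = [
        {"year": 2015, "month": 1, "amt": 10},
        {"year": 2015, "month": 2, "amt": 5},
        {"year": 2016, "month": 1, "amt": 7},
        {"year": 2016, "month": 2, "amt": 3},
    ]
    print([r["running_total"] for r in scan(sales, lambda r: r["year"], lambda r: r["amt"])])  # [10, 15, 7, 10]
```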
Recommendations

If you want one summary record for a group, use ROLLUP.

The behavior of SCAN varies in the presence of dirty data (NULLs or invalid values), according to whether you use the aggregation functions for the scan:

Without aggregation functions, you can use SCAN normally.

With aggregation functions, always clean and validate data before scanning it. Because the aggregation functions use a multistage transform, SCAN follows computation rules that may cause unexpected or even incorrect results in the presence of dirty data (NULLs or invalid values). Furthermore, the results will be hard to trace, particularly if the reject-threshold parameter is set to Never abort. Several factors — including the data type, the DML expression used to perform the scan, and the value of the sorted-input parameter — may affect where the problems occur. It is safest to clean and validate the data before using the aggregation functions in SCAN.

Component folding can enhance the performance of this component. If this feature is enabled, the Co>Operating System folds this component by default. See “Component folding” for more information.

Location in the Component Organizer
Transform folder
At runtime, Scan does the following:

Input selection:

Temporary initialization:

Computation:

Finalization:

Output selection:


ROLLUP Component

ab initio Rollup component with examples and detail explanation

Purpose
Rollup evaluates a group of input records that have the same key, and then generates records that either summarize each group or select certain information from each group.
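For comparison with Scan, a Python sketch of a summarizing rollup produces one record per key group instead of one per input record. The function name and the `count`/`total` output fields are hypothetical choices for illustration, and the sketch assumes input already grouped by key.

```python
from itertools import groupby
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


def rollup(records: List[Record], key: Callable[[Record], Any], value: Callable[[Record], float]) -> List[Record]:
    """One summary record (count and total) per group of adjacent records with the same key."""
    summaries: List[Record] = []
    for k, grp in groupby(records, key=key):
        group = list(grp)
        summaries.append({"key": k, "count": len(group), "total": sum(value(r) for r in group)})
    return summaries


if __name__ == "__main__":
    sales = [
        {"year": 2015, "amt": 10},
        {"year": 2015, "amt": 5},
        {"year": 2016, "amt": 7},
        {"year": 2016, "amt": 3},
    ]
    print(rollup(sales, lambda r: r["year"], lambda r: r["amt"]))
    # [{'key': 2015, 'count': 2, 'total': 15}, {'key': 2016, 'count': 2, 'total': 10}]
```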
Location in the Component Organizer
Transform folder
Recommendations

For new development, use Rollup rather than AGGREGATE. Rollup provides more control over record selection, grouping, and aggregation.

The behavior of ROLLUP varies in the presence of dirty data (NULLs or invalid values), according to whether you use the aggregation functions for the rollup:

Without aggregation functions, you can use ROLLUP normally.

With aggregation functions, always clean and validate data before rolling it up. Because the aggregation functions use a multistage transform, ROLLUP follows computation rules that may cause unexpected or even incorrect results in the presence of dirty data (NULLs or invalid values). Furthermore, the results will be hard to trace, particularly if the reject-threshold parameter is set to Never abort. Several factors — including the data type, the DML expression used to perform the rollup, and the value of the sorted-input parameter — may affect where the problems occur. It is safest to clean and validate the data before using the aggregation functions in ROLLUP.

Component folding can enhance the performance of this component. If this feature is enabled, the Co>Operating System folds this component by default. See “Component folding” for more information.

Then ROLLUP executes the following steps for each group of records:

Temporary initialization.

Computation.

Finalization.

Output selection.


Ab Initio Course Content


Ab Initio is a Latin term meaning "from the beginning".


Introduction to Ab Initio::

Data warehousing Concepts
Introduction to Ab initio
Ab initio Architecture
Graph Programming
Introduction to .dat and .dml files

Partition Components::

Broadcast
Partition by Expression
Partition by range
Graph Programming
Partition by community
Partition by percentage
Partition by Round Robin

Departition Components::

Concatenate
Gather
Interleave
Merge
Multifile System (MFS)
Types of parallelism (very important in ab initio)

Layouts
Sort Components
Sort
Sort within Groups
Sample
Partition by key and sort

Transform Components::

Filter by expression
Aggregate
Scan
Rollup
Denormalize Sorted
Normalize
Reformat
Match sorted
Dedup sorted
Working with Databases

Database components
Run SQL
Input Table
Output Table
Truncate Table
Update table

Phases and Checkpoints

Miscellaneous Components

Gather logs
Run program
Redefine format
Trash
Replicate
Dataset Components
Input File
Output File
Lookup File
Intermediate File
Compress Components
Compress
Uncompress
Gzip
Gunzip
Validate Components

Check Order
Generate Records
Generate Random Bytes
Compare Records
Compute Checksum
Compare Checksums

Project and Sandbox
Performance Tuning


Ab Initio Coach
