1) What is the difference between .dbc and .cfg files? When do you use each?
A .dbc file is a database configuration file, which provides the GDE with the information it needs to connect to a database. It contains the following information:
1. The name and version number of the database to which you want to connect
2. The name of the computer on which the database instance or server to which you want to connect runs, or on which the database remote access software is installed
3. The name of the database instance, server, or provider to which you want to connect
A .cfg file is a database table configuration file for use with the 2.1 database components. These components are deprecated and are provided only for compatibility with Version 2.1 and lower of the Co>Operating System; they should not be used for new development. See the components' individual help topics for more information.
2) What are the compilation errors you came across while executing your graphs?
3) What is depth_error?
4) During the execution of graph, let us say you lost the network connection, would you have to start the process all over again or does it start from where it stopped?
5) Types of partitions and scenarios.
The divisions of the data and the copies of the program components that create data parallelism are called partitions, and a component partitioned in this way is called a parallel component. If each partition of a parallel program component runs on a separate processor, the increase in the speed of processing is almost directly proportional to the number of partitions.
For example, suppose you wanted to build a graph to sort a file of customer records according to a key. To speed the processing, you could use a PARTITION BY ROUND-ROBIN component to divide the file of unsorted records into three partitions, dealing the records out to the partitions in turn. Then three corresponding partitions of a SORT component could sort all three partitions of the data at the same time. Finally, a MERGE component could combine the three sorted partitions into one sorted flow. Because MERGE only needs each of its inputs to be sorted, records with the same key do not have to land in the same partition. If each partition of the SORT component ran on a separate processor, the sorting process would take roughly a third of the time it would take if the processing employed no data parallelism.
Flow partitions
When you divide a component into partitions, you divide the flows that connect to it as well. These divisions are called flow partitions. The three partitions of the SORT component in "Data parallelism example" have three corresponding flow partitions of the input and output flows that connect them to PARTITION BY ROUND-ROBIN and MERGE.
Port partitions
The port to which a partitioned flow connects is partitioned as well, with the same number of port partitions as the flow connected to it. In the diagram in "Data parallelism example", the output port of PARTITION BY ROUND-ROBIN, the input and output ports of SORT, and the input port of MERGE each have three port partitions.
Depth of parallelism
The number of partitions of a component, flow, port, graph, or section of a graph determines its depth of parallelism. In the diagram in "Data parallelism example", from the output port of PARTITION BY ROUND-ROBIN to the input port of MERGE, all parts of the process are parallel and have three partitions, so you can say that section of the graph has a depth of parallelism of three, or is three-way parallel.
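The round-robin sort example can be simulated in a few lines of Python to show how the pieces fit together. This is a minimal single-process sketch, not Ab Initio code: the function names `partition_round_robin` and `parallel_sort` are hypothetical, each Python list stands in for one flow partition, and the three slices are sorted one after another rather than on separate processors.

```python
import heapq
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def partition_round_robin(records: Iterable[T], n: int) -> List[List[T]]:
    """Deal records to n partitions in turn, ignoring their keys."""
    partitions: List[List[T]] = [[] for _ in range(n)]
    for i, rec in enumerate(records):
        partitions[i % n].append(rec)
    return partitions


def parallel_sort(records: Iterable[T], n: int, key: Callable[[T], object]) -> List[T]:
    """Simulate PARTITION BY ROUND-ROBIN -> n-way SORT -> MERGE."""
    # Fan-out: one list per flow partition.
    parts = partition_round_robin(records, n)
    # Each SORT partition sorts only its own slice. Here the slices are
    # sorted sequentially; in a graph they would run concurrently.
    sorted_parts = [sorted(part, key=key) for part in parts]
    # Fan-in: merge the n already-sorted streams into one sorted flow.
    return list(heapq.merge(*sorted_parts, key=key))
```

`heapq.merge` is used because it combines inputs that are each already sorted, which is the role MERGE plays in the graph; it never needs records with equal keys to share a partition.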
6) What does the unused port in the JOIN component do?
The JOIN component also includes unused ports; the number of these ports matches the number of input ports. The records that flow out of the unused ports are the records with key values that did not match the key values of records on the other inputs.
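A small Python sketch of this behavior for two inputs, assuming inner-join semantics: `in0`/`in1` play the role of the input ports and `unused0`/`unused1` the role of the unused ports. The names are hypothetical, and the real component's behavior also depends on its parameter settings, which this sketch does not model.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]


def join_with_unused(
    in0: List[Record], in1: List[Record], key: str
) -> Tuple[List[Record], List[Record], List[Record]]:
    """Inner-join in0 and in1 on `key`.

    Returns (joined, unused0, unused1): unused0 holds in0 records whose key
    has no match in in1, and unused1 holds in1 records with no match in in0.
    """
    # Index the second input by key so each lookup is O(1).
    index1: Dict[Any, List[Record]] = defaultdict(list)
    for rec in in1:
        index1[rec[key]].append(rec)
    keys0 = {rec[key] for rec in in0}

    joined: List[Record] = []
    unused0: List[Record] = []
    for rec in in0:
        matches = index1.get(rec[key], [])
        if not matches:
            unused0.append(rec)
        for other in matches:
            # Combine fields; on a name clash the in1 value wins.
            joined.append({**rec, **other})

    unused1 = [rec for rec in in1 if rec[key] not in keys0]
    return joined, unused0, unused1
```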
7) Define a multifile system. Can you create a multifile system on a single server? Also, if you have a table with Name, Address, Status, and Position attributes, can Name and Address be in one partition and Status and Position in the other?
An Ab Initio multifile system consists of multiple replications of a directory tree structure containing multidirectories and multifiles — these are the partitions of the multifile system.
All but one of the partitions contain a subset of the data stored in the multifile system; the additional partition contains control information. The partitions containing data are the data partitions of the system, and the additional partition is the control partition. The control partition contains no user data, only the information the Co>Operating System needs to manage the multifile system.
Visualize a directory tree containing subdirectories and files. Now imagine n identical copies of the same tree located on several disks, and number them 0 to n-1 (the Co>Operating System numbers partitions starting at 0). These are the data partitions of the multifile system. Then add one more copy of the tree to serve as the control partition. This is a multifile system (for an example, see "Sample multifile system").
You can place the control and data partitions of a multifile system on any computer that has the Co>Operating System installed on it and to which the run host can connect.
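The layout described above (one control tree plus n identical data trees numbered from 0) can be modeled as a toy Python class. This is an illustration only, not an Ab Initio API: the class name `MultifileSystem` and the example paths are hypothetical. It shows how one relative path inside the replicated tree names one piece of a multifile in every data partition.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import List, Tuple


@dataclass(frozen=True)
class MultifileSystem:
    """Toy model of a multifile system's layout (not an Ab Initio API).

    control_root: root of the control partition (metadata only, no user data).
    data_roots:   roots of the data partitions; position i is partition i.
    """

    control_root: PurePosixPath
    data_roots: Tuple[PurePosixPath, ...]

    @property
    def depth(self) -> int:
        # Number of data partitions; the control partition is not counted.
        return len(self.data_roots)

    def control_path(self, rel_path: str) -> PurePosixPath:
        return self.control_root / rel_path

    def data_paths(self, rel_path: str) -> List[PurePosixPath]:
        # Every partition replicates the same directory tree, so one relative
        # path names a file in each data partition: the pieces of one multifile.
        return [root / rel_path for root in self.data_roots]
```

For example, `MultifileSystem(PurePosixPath("/mfs/ctrl"), tuple(PurePosixPath(f"/disk{i}/mfs") for i in range(3)))` describes a three-way system whose data partitions sit on three hypothetical disks.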
8) What is a sandbox? Did version 2.8 of the Co>Operating System have sandboxes? If not, how would you store the respective files?
9) How did you do version control? Which tool did you use?
10) How do you troubleshoot performance issues in a graph?
11) What are the usual errors that you encounter during the ETL process apart from compilation errors?
12) Were you involved in production support? What were the different kinds of problems that you encountered?
13) Please give us some insight into the Enterprise Meta Environment (EME), and some possible questions on it.
14) What are a delta table and a master table?
15) What error would you get when you use Partition by Round Robin and Join?
Depth not equal
16) In which scenarios would you use Partition by Key and Partition by Round Robin, and what are the differences between the two?
17) What are the different dimension tables that you used, and what are some of the columns in the fact table?
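The core difference behind question 16 can be illustrated in Python: key-based partitioning sends every record with a given key to the same partition, which keyed processing such as a parallel join relies on, while round-robin balances record counts but scatters keys. This is a sketch under stated assumptions: the function names are hypothetical, and `zlib.crc32` is only a stand-in for whatever hash Ab Initio actually uses.

```python
import zlib
from typing import Dict, List, Set, Tuple

Record = Tuple[str, str]  # (key, payload)


def partition_by_key(records: List[Record], n: int) -> List[List[Record]]:
    """Send each record to partition hash(key) % n."""
    parts: List[List[Record]] = [[] for _ in range(n)]
    for rec in records:
        # crc32 is deterministic across runs, unlike Python's built-in str hash.
        parts[zlib.crc32(rec[0].encode("utf-8")) % n].append(rec)
    return parts


def partition_round_robin(records: List[Record], n: int) -> List[List[Record]]:
    """Partition i gets records i, i+n, i+2n, ... regardless of key."""
    return [records[i::n] for i in range(n)]


def keys_split_across_partitions(parts: List[List[Record]]) -> Set[str]:
    """Return the keys that occur in more than one partition."""
    first_seen: Dict[str, int] = {}
    split: Set[str] = set()
    for i, part in enumerate(parts):
        for key, _payload in part:
            if first_seen.setdefault(key, i) != i:
                split.add(key)
    return split
```

With four records keyed `c1, c2, c1, c2` and three partitions, `partition_by_key` leaves no key split, whereas `partition_round_robin` spreads both keys over two partitions each.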
18) How do you count the number of records in a flat file?
19) How do you count the number of records in a multifile system without using GDE?
20) What do the Scan and Rollup components do? Give a scenario where you used them.
21) Have you ever used user-defined functions or packages? If yes, give a scenario.
22) What value do you have to give the Record Required parameter for a natural join?
23) When do you use Partition by Expression?
24) What is an ad hoc file system? Give a scenario where you used it.
25) What are the different commands that you used when writing wrappers?
26) What do the hidden files in a sandbox represent and what does start.ksh represent?
27) What are the different things you have to consider when loading data into a table?
28) What is the difference between the Redefine Format and Reformat components?
29) Sometimes you have to use dynamic-length strings. Can you give one circumstance where you need them?
30) If you have a flat file as follows:
20 General Manager Chris
30 Divisional Manager Harry
20 General Manager Mary
30 Divisional Manager Dravid
How do you count the number of records that have 20 in the first column, and likewise those that have 30?
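The logic the question asks for is a group-by count on the first field; inside a graph this would typically be done with a Rollup keyed on that field. The Python sketch below shows the same logic outside Ab Initio, assuming whitespace-delimited fields; the function name `count_by_first_field` is hypothetical.

```python
from collections import Counter
from typing import Iterable


def count_by_first_field(lines: Iterable[str]) -> Counter:
    """Count records grouped by their first whitespace-separated field."""
    counts: Counter = Counter()
    for line in lines:
        fields = line.split()
        if fields:  # skip blank lines
            counts[fields[0]] += 1
    return counts


sample = [
    "20 General Manager Chris",
    "30 Divisional Manager Harry",
    "20 General Manager Mary",
    "30 Divisional Manager Dravid",
]
counts = count_by_first_field(sample)
```

For a real file, pass an open file object instead of the list, since iterating over it yields one line at a time.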
Can anyone explain multifiles and 4-way to 8-way conversion?