Sunday 15 October 2017

Re-blogging/Sharing : Talend: Setting up database logging for a project

Here is a very good article on how to set log tables for for a project using Talend ETL.

http://diethardsteiner.blogspot.in/2012/02/talend-setting-up-database-logging-for.html





Thank you diethar for wonderful article.


Tuesday 25 July 2017

Scenario: Catching flow metrics from a Job in JETL or Talend

Catching flow metrics from a Job in JETL or Talend 

tMySQLInput : select * from film
tFlowMeter
 tMap
tFlowMeter
 tLogRow
 tFlowMeterCatcher
 tFileOutputDelimited

CSV output



Logging in JETL or Talend ETL

Logging in Talend ETL

Family : Logs & Errors

"Jaspersoft ETL"

Version: 5.5.1
Build id: r118616-20140618-0118

tStatCatcher
Operates as a log function triggered by the StatsCatcher Statistics check box
of individual components, and collects and transfers this log data to the output
defined.

tFlowMeter
The number of rows is then meant to be caught by the tFlowMeterCatcher
for logging purpose.

tFlowMeterCatcher
Operates as a log function triggered by the use of a tFlowMeter component
in the Job.

tLogCatcher:
Operates as a log function triggered by one of the three: Java
exception/PerlDie, tDie or tWarn, to collect and transfer log data.


tLogCatcher :  amclogs
CREATE TABLE `amclogs` (
   `moment` datetime DEFAULT NULL,
   `pid` varchar(20) DEFAULT NULL,
   `root_pid` varchar(20) DEFAULT NULL,
   `father_pid` varchar(20) DEFAULT NULL,
   `project` varchar(50) DEFAULT NULL,
   `job` varchar(255) DEFAULT NULL,
   `context` varchar(50) DEFAULT NULL,
   `priority` int(3) DEFAULT NULL,
   `type` varchar(255) DEFAULT NULL,
   `origin` varchar(255) DEFAULT NULL,
   `message` varchar(255) DEFAULT NULL,
   `code` int(3) DEFAULT NULL
 )

tFlowMeterCatcher: amcmeter
CREATE TABLE `amcmeter` (
   `moment` datetime DEFAULT NULL,
   `pid` varchar(20) DEFAULT NULL,
   `father_pid` varchar(20) DEFAULT NULL,
   `root_pid` varchar(20) DEFAULT NULL,
   `system_pid` bigint(8) DEFAULT NULL,
   `project` varchar(50) DEFAULT NULL,
   `job` varchar(50) DEFAULT NULL,
   `job_repository_id` varchar(255) DEFAULT NULL,
   `job_version` varchar(255) DEFAULT NULL,
   `context` varchar(50) DEFAULT NULL,
   `origin` varchar(255) DEFAULT NULL,
   `label` varchar(255) DEFAULT NULL,
   `count` int(3) DEFAULT NULL,
   `reference` int(3) DEFAULT NULL,
   `thresholds` varchar(255) DEFAULT NULL
 )

tStatCatcher: amcstats
CREATE TABLE `amcstats` (
   `moment` datetime DEFAULT NULL,
   `pid` varchar(20) DEFAULT NULL,
   `father_pid` varchar(20) DEFAULT NULL,
   `root_pid` varchar(20) DEFAULT NULL,
   `system_pid` bigint(8) DEFAULT NULL,
   `project` varchar(50) DEFAULT NULL,
   `job` varchar(50) DEFAULT NULL,
   `job_repository_id` varchar(255) DEFAULT NULL,
   `job_version` varchar(255) DEFAULT NULL,
   `context` varchar(50) DEFAULT NULL,
   `origin` varchar(255) DEFAULT NULL,
   `message_type` varchar(255) DEFAULT NULL,
   `message` varchar(255) DEFAULT NULL,
   `duration` bigint(8) DEFAULT NULL
 )

tFlowMeter: meter
CREATE TABLE `meter` (
   `moment` datetime DEFAULT NULL,
   `pid` varchar(20) DEFAULT NULL,
   `root_pid` varchar(20) DEFAULT NULL,
   `father_pid` varchar(20) DEFAULT NULL,
   `project` varchar(50) DEFAULT NULL,
   `job` varchar(255) DEFAULT NULL,
   `context` varchar(50) DEFAULT NULL,
   `priority` int(3) DEFAULT NULL,
   `type` varchar(255) DEFAULT NULL,
   `origin` varchar(255) DEFAULT NULL,
   `message` varchar(255) DEFAULT NULL,
   `code` int(3) DEFAULT NULL
 )

Difference between tFlowMeter and tFlowMeterCatcher
tFlowMeter : It counts the number of rows 
tFlowMeterCatcher : It catches the number of rows counted by tFlowMeter



-- flowmeter
# moment, pid, root_pid, father_pid, project, job, context, priority, type, origin, message, code
'2017-07-25 15:30:01', 'OuupyP', 'OuupyP', 'OuupyP', 'TALENDETLSCENARIOS', 'test1', 'Default', '4', 'tWarn', 'tWarn_1', 'this is a warning', '42'
'2017-07-25 15:30:01', 'OuupyP', 'OuupyP', 'OuupyP', 'TALENDETLSCENARIOS', 'test1', 'Default', '4', 'tWarn', 'tWarn_1', 'this is a warning', '42'

-- flowmetercatcher
# moment, pid, father_pid, root_pid, system_pid, project, job, job_repository_id, job_version, context, origin, label, count, reference, thresholds
'2017-07-25 15:30:01', 'OuupyP', 'OuupyP', 'OuupyP', '9880', 'TALENDETLSCENARIOS', 'test1', '_sUBX4HEXEeeLF_5AItd_KA', '0.1', 'Default', 'tFlowMeter_6', 'row7', '1', NULL, ''
'2017-07-25 15:30:01', 'OuupyP', 'OuupyP', 'OuupyP', '9880', 'TALENDETLSCENARIOS', 'test1', '_sUBX4HEXEeeLF_5AItd_KA', '0.1', 'Default', 'tFlowMeter_6', 'row7', '1', NULL, ''



Download : Click me


References: 
https://www.rilhia.com/tutorials/using-talend-job-create-amc-database-schema

Tuesday 16 May 2017

MySQL - different joins on multiple same value of field from two tables

create table logic_test(id int);

insert into  logic_test values(1);
insert into  logic_test values(1);

select * from logic_test;
/*
# id
'1'
'1'
*/

create table logic_test2(id int);
insert into  logic_test2 values(1);
insert into  logic_test2 values(1);

insert into  logic_test2 values(1);
insert into  logic_test2 values(1);

insert into  logic_test2 values(1);
insert into  logic_test2 values(1);

select * from logic_test2;
/*
# id
'1'
'1'
'1'
'1'
'1'
'1'
*/

select lt.id AS idLeft, lt2.id idRight from logic_test lt
left join logic_test2 lt2 on lt.id=lt2.id;
-- 12 rows returned
/*
# idLeft, idRight
'1', '1'
'1', '1'
'1', '1'
'1', '1'
'1', '1'
'1', '1'
'1', '1'
'1', '1'
'1', '1'
'1', '1'
'1', '1'
'1', '1'
*/
select lt.id AS idLeft, lt2.id idRight from logic_test lt
right join logic_test2 lt2 on lt.id=lt2.id;
-- 12 rows returned
/*
# idLeft, idRight
'1', '1'
'1', '1'
'1', '1'
'1', '1'
'1', '1'
'1', '1'
'1', '1'
'1', '1'
'1', '1'
'1', '1'
'1', '1'
'1', '1'
*/
select lt.id AS idLeft, lt2.id idRight from logic_test lt
 join logic_test2 lt2 on lt.id=lt2.id ;
 -- 12 rows returned
 /*
 1 1
1 1
1 1
1 1
1 1
1 1
1 1
1 1
1 1
1 1
1 1
1 1
 */
-- There is no full outer join mysql instead use below logic
/* http://stackoverflow.com/questions/4796872/full-outer-join-in-mysql*/
/*
SELECT * FROM t1
LEFT JOIN t2 ON t1.id = t2.id
UNION
SELECT * FROM t1
RIGHT JOIN t2 ON t1.id = t2.id
 */

SELECT * FROM logic_test lt
LEFT JOIN logic_test2 lt2 ON lt.id = lt2.id
UNION
 SELECT * FROM logic_test lt
RIGHT JOIN logic_test2 lt2 ON lt.id = lt2.id


This is FAQ in DWH--BI. 

Monday 24 April 2017

Incorrect String value issue (\xC4 \x90 .... ) while loading data from MySQL source to MySQL target (utf-8, latin1 character set usecases)

In this post, you will learn how to fix special character issue while loading data from source MySQL to destination MySQL table.

You may see below kinda of error during the ETL execution.

Starting job Test at 18:17 24/04/2017.

[statistics] connecting to socket on port 3885
[statistics] connected
Incorrect string value: '\xC4\x90in\xC4\x91.......' for column 'ProjectDetails' at row 56
Incorrect string value: '\xC4\x90in\xC4\x91......' for column 'ProjectDetails' at row 76
Incorrect string value: '\xC4\x90in\xC4\x91.......' for column 'ProjectDetails' at row 98
Incorrect string value: \xC4\x90in\xC4\x91.......' for column 'ProjectDetails' at row 11
[statistics] disconnected
Job Test ended at 18:20 24/04/2017. [exit code=0]


Sample job design : 
source(MySQL input) --> target(MySQL output)

When you try to load text data from mysql source to mysql target you may come across issues as shown in above error message. 

In those cases, setting -Dfile.encoding=utf-8 in advanced settings of Arguments may not solve problem. Also, removing here and keeping it  in Advanced settings of component Additional JDBC parameters also not solves the problem. 

The actual solution worked for me below: 
In component advanced properties : 
For Additional JDBC parameters give value as "useUnicode=true&useOldUTF8Behavior=true"

This will solve the problem and source data will be loaded with out any special character issues. 

NOTE : MySQL default character set is utf-8.

References : 


Hope this helps some one in community :-)



Wednesday 22 March 2017

Connecting to MySQL Sakila DB in Talend (or) tMysqlConnection component example


In this post you can learn how to connect to mysql "sakila" db in Talend Open Studio. 

1) From Palette navigate to Databases -> MySQL and drag and drop "tMysqlConnection" component
    to designer. 
2) Select the component, then in its Components pane provide database access details  as shown in below image. 
3) Add to Message boxes for Success or Failure messages of db connection.  These message boxes should be connected for OK or Error on component execution from tMysqlConnection component. 


Creating a new Project in Talend Open Studio 6.1


1) Launch TOS by clicking TOS_DI-win-x86_64.exe in windows-10/7

2) Below pop-up window will be opened

3) Select "Create a new project" and then click on "Create".

4) The created "Test" project will be available in "Select an existing project" list. Select the project as shown in below image and click on "Finish" button to open the TOS console. 

Talend Designer for jobs : 

This way one can create a new project in Talend Open Studio in Windows-10