Today, I’ll show couple of very useful functions or logic implemented in Teradata using It’s Regular Expression.
There is two very popular demand comes from most of the Developer across different databases regarding the following two cases –
1. How to Split Comma Separated Values in each rows
2. How to bind separate values in 1 row (Just opposite of Step 1)
2nd Options are very demanding as Cross platform database professional specially Oracle Developers looking for these kind of implementation as Oracle has directly built-in functions to do the same. Those functions are Listagg, wm_concat, group_concat.
Let’s check the solution –
Case 1,
Let’s create the table & prepare some data –
1
2
3
4
5
6
7
|
CREATE MULTISET TABLE ETL_DATA.PARSE_STR
(
SEQ_NO INTEGER,
SRC_STR VARCHAR(70)
);
CREATE TABLE completed. 0 rows processed. Elapsed Time = 00:00:01.864
|
Let’s insert some data –
1
2
3
|
INSERT INTO ETL_DATA.PARSE_STR VALUES(1,'RAM,TRIDIB,ANUPAM,BIRESWAR,SUJAY')
;INSERT INTO ETL_DATA.PARSE_STR VALUES(2,'TUNKAI,SAYAN,BABU,PAPU')
;INSERT INTO ETL_DATA.PARSE_STR VALUES(3,'IK,ATBIS,SAPMUNDA');
|
Let’s check the value –
1
2
3
4
5
|
SEQ_NO SRC_STR
------ ----------------------------------
1 RAM,TRIDIB,ANUPAM,BIRESWAR,SUJAY
2 TUNKAI,SAYAN,BABU,PAPU
3 IK,ATBIS,SAPMUNDA
|
Fine, Now our objective will be split these comma separated values in each lines.
1
2
3
4
5
6
|
SELECT b.SEQ_NO,
regexp_substr(b.SRC_STR,'[^,]+',1,day_of_calendar) AS SRC_STR
FROM sys_calendar.calendar ,
PARSE_STR b
WHERE day_of_calendar BETWEEN 1 AND (LENGTH(b.SRC_STR) - LENGTH(regexp_replace(b.SRC_STR,'[^A-Z]+','',1,0,'i'))+1 )
ORDER BY 1,2;
|
And, let’s check the output –
1
2
3
4
5
6
7
8
9
10
11
12
13
14
|
SEQ_NO SRC_STR
----- ----------------------
1 ANUPAM
1 BIRESWAR
1 RAM
1 SUJAY
1 TRIDIB
2 BABU
2 PAPU
2 SAYAN
2 TUNKAI
3 ATBIS
3 IK
3 SAPMUNDA
|
Gr8! I guess, result is coming as per my expectation. 🙂
Case 2(Subsitute Of Listagg, wm_concat, group_concat in Oracle),
This we’ve to do it in Two small Steps for better understanding & performance.
First, let us create another table –
1
2
3
4
5
6
7
|
CREATE MULTISET TABLE ETL_DATA.WM_CONCAT_TAB
(
SEQ_NO INTEGER,
SRC_STR VARCHAR(20)
);
CREATE TABLE completed. 0 rows processed. Elapsed Time = 00:00:01.230
|
Good. Now we’ll populate some data into this table. We’ll populate data from Step 1 as this will provide the exact data that we’re expecting as input test data for Case 2.
Let’s insert those data –
1
2
3
4
5
6
|
INSERT INTO ETL_DATA.WM_CONCAT_TAB
SELECT b.SEQ_NO,
regexp_substr(b.SRC_STR,'[^,]+',1,day_of_calendar) AS SRC_STR
FROM sys_calendar.calendar ,
PARSE_STR b
WHERE day_of_calendar BETWEEN 1 AND (LENGTH(b.SRC_STR) - LENGTH(regexp_replace(b.SRC_STR,'[^A-Z]+','',1,0,'i'))+1 );
|
Let’s check the data –
1
2
3
4
5
6
7
8
9
10
11
12
13
14
|
SEQ_NO SRC_STR
------ --------------------
1 ANUPAM
1 BIRESWAR
1 RAM
1 SUJAY
1 TRIDIB
2 BABU
2 PAPU
2 SAYAN
2 TUNKAI
3 ATBIS
3 IK
3 SAPMUNDA
|
As you know in TD we’ve significant restcriction regarding Hirarchical Queries & Recursive Queries. So, In this step we’ll build one relationship like employee & manager in popular employee table. So, if we have that kind of relation then we can easily establish & fit that in TD model.
Let’s create this intermediate table. In this case we’ll go for mapping between current rows with next rows. This is also very useful process. In Oracle, they have LEAD or LAG functions to achieve the same. But, here we’ve to work a little bit more to achive the same.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
|
CREATE MULTISET VOLATILE TABLE VT_SRC_ARRNG
AS
(
SELECT SEQ_NO,
SRC_STR,
MAX(SRC_STR) OVER(
PARTITION BY SEQ_NO
ORDER BY SEQ_NO, SRC_STR
ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING
) AS PREV_SRC_STR,
COUNT(*) OVER(
PARTITION BY SEQ_NO
) AS MAX_RECUR_CNT
FROM WM_CONCAT_TAB
)
WITH DATA
ON COMMIT
PRESERVE ROWS;
CREATE TABLE completed. 0 rows processed. Elapsed Time = 00:00:01.102
|
Let’s look the output –
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
|
SELECT *
FROM VT_SRC_ARRNG
ORDER BY 1,2;
SEQ_NO SRC_STR PREV_SRC_STR MAX_RECUR_CNT
----- ------- --------------- ---------------------
1 ANUPAM BIRESWAR 5
1 BIRESWAR RAM 5
1 RAM SUJAY 5
1 SUJAY TRIDIB 5
1 TRIDIB ? 5
2 BABU PAPU 4
2 PAPU SAYAN 4
2 SAYAN TUNKAI 4
2 TUNKAI ? 4
3 ATBIS IK 3
3 IK SAPMUNDA 3
3 SAPMUNDA ? 3
|
Fine. From the above VT we can see every Source String has one Previous Source String. Also, we’ve noted down that in each window of SEQ_NO how many levels are there by MAX_RECUR_CNT. We’ll use this column later.
Let’s move to the 2nd & final part –
Let’s aggregate the values based on SEQ_NO & club them with comma –
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
|
WITH RECURSIVE WM_CONCAT(SEQ_NO, SRC_STR, PREV_SRC_STR, MAX_RECUR_CNT, LVL, COMMA_SEP_STR)
AS
(
SELECT SEQ_NO,
SRC_STR,
PREV_SRC_STR,
MAX_RECUR_CNT,
1 AS LVL,
CAST( '' AS VARCHAR(100)) AS COMMA_SEP_STR
FROM VT_SRC_ARRNG
WHERE PREV_SRC_STR IS NULL
UNION ALL
SELECT b.SEQ_NO,
b.SRC_STR,
b.PREV_SRC_STR,
b.MAX_RECUR_CNT,
c.LVL+1 AS LVL,
c.COMMA_SEP_STR||b.SRC_STR||',' AS COMMA_SEP_STR
FROM VT_SRC_ARRNG b,
WM_CONCAT c
WHERE c.SRC_STR = b.PREV_SRC_STR
)
SELECT k.SEQ_NO,
k.AGGR_STR
FROM (
SELECT SEQ_NO,
SRC_STR,
LVL,
MAX_RECUR_CNT,
MIN(CASE
WHEN LVL = 1 THEN
SRC_STR
ELSE
'ZZZZZ'
END ) OVER(
PARTITION BY SEQ_NO
ORDER BY LVL ASC
) ROOT_SRC_STR,
COMMA_SEP_STR||ROOT_SRC_STR AS AGGR_STR
FROM WM_CONCAT
) k
WHERE k.LVL = k.MAX_RECUR_CNT
ORDER BY 1,2;
|
Let’s check the output –
1
2
3
4
5
|
SEQ_NO AGGR_STR
------- ---------------------------
1 SUJAY,RAM,BIRESWAR,ANUPAM,TRIDIB
2 SAYAN,PAPU,BABU,TUNKAI
3 IK,ATBIS,SAPMUNDA
|
I guess, We’ve done it. 😀
So, You can achieve the same without writing any UDF.
Like this:
Like Loading...
You must be logged in to post a comment.