String Manipulation Advanced Using Teradata 14.0 Regular Expression

Today, I’ll show couple of very useful functions or logic implemented in Teradata using It’s Regular Expression.

There is two very popular demand comes from most of the Developer across different databases regarding the following two cases –

1. How to Split Comma Separated Values in each rows 

2. How to bind separate values in 1 row (Just opposite of Step 1)

2nd Options are very demanding as Cross platform database professional specially Oracle Developers looking for these kind of implementation as Oracle has directly built-in functions to do the same. Those functions are Listagg, wm_concat, group_concat.

Let’s check the solution –

Case 1,

Let’s create the table & prepare some data –

 

1
2
3
4
5
6
7
CREATE MULTISET TABLE ETL_DATA.PARSE_STR
  (
     SEQ_NO       INTEGER,
     SRC_STR     VARCHAR(70)
  );
 
CREATE TABLE completed. 0 rows processed. Elapsed Time =  00:00:01.864

 

Let’s insert some data –

 

1
2
3
INSERT INTO ETL_DATA.PARSE_STR VALUES(1,'RAM,TRIDIB,ANUPAM,BIRESWAR,SUJAY')
;INSERT INTO ETL_DATA.PARSE_STR VALUES(2,'TUNKAI,SAYAN,BABU,PAPU')
;INSERT INTO ETL_DATA.PARSE_STR VALUES(3,'IK,ATBIS,SAPMUNDA');

 

Let’s check the value –

 

1
2
3
4
5
SEQ_NO          SRC_STR
------  ----------------------------------
    1   RAM,TRIDIB,ANUPAM,BIRESWAR,SUJAY
    2   TUNKAI,SAYAN,BABU,PAPU
    3   IK,ATBIS,SAPMUNDA

 

Fine, Now our objective will be split these comma separated values in each lines.

 

1
2
3
4
5
6
SELECT b.SEQ_NO,
       regexp_substr(b.SRC_STR,'[^,]+',1,day_of_calendar) AS SRC_STR
FROM sys_calendar.calendar ,
     PARSE_STR b
WHERE day_of_calendar BETWEEN 1 AND  (LENGTH(b.SRC_STR) - LENGTH(regexp_replace(b.SRC_STR,'[^A-Z]+','',1,0,'i'))+1 )
ORDER BY 1,2;

 

And, let’s check the output –

 

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
SEQ_NO  SRC_STR
-----   ----------------------
1       ANUPAM
1       BIRESWAR
1       RAM
1       SUJAY
1       TRIDIB
2       BABU
2       PAPU
2       SAYAN
2       TUNKAI
3       ATBIS
3       IK
3       SAPMUNDA

 

Gr8! I guess, result is coming as per my expectation. 🙂

 

Case 2(Subsitute Of Listagg, wm_concat, group_concat in Oracle),

This we’ve to do it in Two small Steps for better understanding & performance.

First, let us create another table –

 

1
2
3
4
5
6
7
CREATE MULTISET TABLE ETL_DATA.WM_CONCAT_TAB
   (
      SEQ_NO   INTEGER,
      SRC_STR VARCHAR(20)
   );
    
CREATE TABLE completed. 0 rows processed. Elapsed Time =  00:00:01.230

 

Good. Now we’ll populate some data into this table. We’ll populate data from Step 1 as this will provide the exact data that we’re expecting as input test data for Case 2.

Let’s insert those data –

 

1
2
3
4
5
6
INSERT INTO ETL_DATA.WM_CONCAT_TAB
SELECT b.SEQ_NO,
       regexp_substr(b.SRC_STR,'[^,]+',1,day_of_calendar) AS SRC_STR
FROM sys_calendar.calendar ,
     PARSE_STR b
WHERE day_of_calendar BETWEEN 1 AND  (LENGTH(b.SRC_STR) - LENGTH(regexp_replace(b.SRC_STR,'[^A-Z]+','',1,0,'i'))+1 );

 

Let’s check the data –

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
SEQ_NO  SRC_STR
------  --------------------
1       ANUPAM
1       BIRESWAR
1       RAM
1       SUJAY
1       TRIDIB
2       BABU
2       PAPU
2       SAYAN
2       TUNKAI
3       ATBIS
3       IK
3       SAPMUNDA

 

As you know in TD we’ve significant restcriction regarding Hirarchical Queries & Recursive Queries. So, In this step we’ll build one relationship like employee & manager in popular employee table. So, if we have that kind of relation then we can easily establish & fit that in TD model.

Let’s create this intermediate table. In this case we’ll go for mapping between current rows with next rows. This is also very useful process. In Oracle, they have LEAD or LAG functions to achieve the same. But, here we’ve to work a little bit more to achive the same.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
CREATE MULTISET VOLATILE TABLE VT_SRC_ARRNG
AS
     (
            SELECT SEQ_NO,
                   SRC_STR,
                   MAX(SRC_STR) OVER(
                                        PARTITION BY SEQ_NO
                                        ORDER BY SEQ_NO, SRC_STR
                                        ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING 
                                    ) AS PREV_SRC_STR,
                   COUNT(*)  OVER(
                                    PARTITION BY SEQ_NO
                                 ) AS MAX_RECUR_CNT
            FROM WM_CONCAT_TAB
      )
WITH DATA
ON COMMIT
PRESERVE ROWS;
 
CREATE TABLE completed. 0 rows processed. Elapsed Time =  00:00:01.102

 

Let’s look the output –

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
SELECT *
FROM VT_SRC_ARRNG
ORDER BY 1,2;
 
 
 
 
SEQ_NO  SRC_STR  PREV_SRC_STR    MAX_RECUR_CNT
-----   -------  --------------- ---------------------
1       ANUPAM      BIRESWAR     5
1       BIRESWAR    RAM          5
1       RAM         SUJAY        5
1       SUJAY       TRIDIB       5
1       TRIDIB      ?            5
2       BABU        PAPU         4
2       PAPU        SAYAN        4
2       SAYAN       TUNKAI       4
2       TUNKAI      ?            4
3       ATBIS       IK           3
3       IK          SAPMUNDA     3
3       SAPMUNDA    ?            3

 

Fine. From the above VT we can see every Source String has one Previous Source String. Also, we’ve noted down that in each window of SEQ_NO how many levels are there by MAX_RECUR_CNT. We’ll use this column later.

Let’s move to the 2nd & final part –

Let’s aggregate the values based on SEQ_NO & club them with comma –

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
WITH RECURSIVE WM_CONCAT(SEQ_NO, SRC_STR, PREV_SRC_STR, MAX_RECUR_CNT, LVL,  COMMA_SEP_STR)
AS
     (
        SELECT SEQ_NO,
               SRC_STR,
               PREV_SRC_STR,
               MAX_RECUR_CNT,
               1 AS LVL,
               CAST( '' AS VARCHAR(100)) AS COMMA_SEP_STR
       FROM VT_SRC_ARRNG
       WHERE  PREV_SRC_STR IS NULL
       UNION ALL
       SELECT  b.SEQ_NO,
               b.SRC_STR,
               b.PREV_SRC_STR,
               b.MAX_RECUR_CNT,
               c.LVL+1 AS LVL,
               c.COMMA_SEP_STR||b.SRC_STR||',' AS COMMA_SEP_STR
       FROM VT_SRC_ARRNG b,
               WM_CONCAT c
       WHERE c.SRC_STR =  b.PREV_SRC_STR
     )
SELECT k.SEQ_NO,
       k.AGGR_STR
FROM (               
    SELECT SEQ_NO,
           SRC_STR,
           LVL,
           MAX_RECUR_CNT,
           MIN(CASE
                 WHEN LVL = 1 THEN
                    SRC_STR
               ELSE
                  'ZZZZZ'
               END   ) OVER(
                                 PARTITION BY SEQ_NO
                                 ORDER BY LVL ASC
                           ) ROOT_SRC_STR,
           COMMA_SEP_STR||ROOT_SRC_STR AS AGGR_STR
    FROM WM_CONCAT
    )  k
WHERE k.LVL = k.MAX_RECUR_CNT
ORDER BY 1,2;

 

Let’s check the output –

1
2
3
4
5
SEQ_NO  AGGR_STR
------- ---------------------------
1       SUJAY,RAM,BIRESWAR,ANUPAM,TRIDIB
2       SAYAN,PAPU,BABU,TUNKAI
3       IK,ATBIS,SAPMUNDA

 

I guess, We’ve done it. 😀

So, You can achieve the same without writing any UDF.

 

Magic SQL

Few years before in OTN one of the user is looking for a solutions, which we think might not possible to provide in a single SQL solution. At that time Michael came to rescue that and for the first time he showed some interesting XML Kung-Fu to all of us and earned a great reputation for providing magic solution to others. I personally love to call them as Magic SQL.

The following SQL calculates number of rows in each table in a specific schema without visiting any dba views. This particular script is my 2nd personal favourite.

scott@ORCL>
scott@ORCL>select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - Production
PL/SQL Release 11.1.0.6.0 - Production
CORE 11.1.0.6.0 Production
TNS for 32-bit Windows: Version 11.1.0.6.0 - Production
NLSRTL Version 11.1.0.6.0 - Production

Elapsed: 00:00:00.01
scott@ORCL>
scott@ORCL>
scott@ORCL>select table_name,
2 DBMS_XMLGEN.getxmltype(
3 'select count(*) c from '||table_name
4 ).extract('//text()').getnumberval() tot_rows
5 from user_tables
6 where iot_type is null
7 or iot_type != 'IOT_OVERFLOW';

TABLE_NAME TOT_ROWS
------------------------------ ----------
DEPT 4
EMP 14
BONUS 0
SALGRADE 5
EMP_DETAILS 3
T_EMP 0
AUDIT_T 0
C_C 4
TRAIN_DET 2
TEST_COM_IMP 2
TIME_P 1

TABLE_NAME TOT_ROWS
------------------------------ ----------
PRI_UQ 4
TEST_CHK 0
ANSHUMANSAHAISGENIOUS 1
XEUROPE 2
D_D 8
PUBLICTRANSPORT 4
XASIA 2
TF1 0
TEST_HH 14
TEST_SWAP 4
XGMT 1

TABLE_NAME TOT_ROWS
------------------------------ ----------
CUSTOMER_DET 1
FOURWHEELER 2
SPOOL_LOG 13
CITYTRANSPORT 8
T1 2
T2 2
A_A 1
B_B 2
AUTOMOBILE 1
XDUAL 1
S_TEMP 0

33 rows selected.

Elapsed: 00:00:00.42
scott@ORCL>

But, this particular script has some limitations on Index-Organized-Table. But, overall this will work brilliantly without even touching all the dba views.

Grouped information in comma-separated values.

scott@ORCL>with tt
2 as
3 (
4 select 1 id, 'saty' name from dual
5 union all
6 select 1, 'anup' from dual
7 union all
8 select 1, 'sas' from dual
9 union all
10 select 2, 'rajib' from dual
11 union all
12 select 2, 'amit' from dual
13 )
14 select id,
15 cast(wmsys.wm_concat(name) as varchar2(100)) src
16 from tt
17 group by id;

ID SRC
---------- ------------------------------------------------------
1 saty,anup,sas
2 rajib,amit

Elapsed: 00:00:01.62
scott@ORCL>
scott@ORCL>

Function wm_concat is undocumented. So, you cannot use it in production environment. Even if you use – you won’t get any technical support from oracle if you have any production issue due to this. So, better not to use this function. But, certainly this reduces lots of our effort and provides a solution using single SQL. I’m still waiting to see it as documented function by Oracle. Till then, you have to go for your custom tailored solution.

The following sqls parse comma-separated values into rows. But, this cannot be applicable into any column of existing tables.

scott@ORCL>
scott@ORCL>SELECT cast(column_value as varchar2(40)) res
2 FROM TABLE(SYS.dbms_debug_vc2coll (24, 34, 25));

RES
----------------------------------------
24
34
25

Elapsed: 00:00:00.03
scott@ORCL>
scott@ORCL>
scott@ORCL>SELECT cast(column_value as varchar2(40)) res
2 FROM TABLE(SYS.dbms_debug_vc2coll ('A', 'B', 'C'));

RES
----------------------------------------
A
B
C

Elapsed: 00:00:00.05
scott@ORCL>
scott@ORCL>SELECT cast(column_value as varchar2(40)) res
2 FROM TABLE(SYS.dbms_debug_vc2coll (24, 'B', '@'));

RES
----------------------------------------
24
B
@

Elapsed: 00:00:00.14
scott@ORCL>

This can handle alpha numeric data type without declaring any custom type for it.

Following SQL is which i prefer most as one of the brilliant features introduced in Oracle and can surprise many developer. Write a select query to retrieve any column information without using the select clause. I know – it sounds crazy. But, you really can do that.

That is what i consider is the leading contender of my Magic SQL category.

scott@ORCL>
scott@ORCL>
scott@ORCL>xquery for $i in ora:view("emp")/ROW/ENAME return $i/text()
2 /

Result Sequence
----------------------------------------------------------------------------
SMITH
ALLEN
WARD
JONES
MARTIN
BLAKE
CLARK
SCOTT
KING
TURNER
ADAMS

Result Sequence
----------------------------------------------------------------------------
JAMES
FORD
MILLER

14 item(s) selected.

Elapsed: 00:00:00.14
scott@ORCL>
scott@ORCL>

Hope you liked this edition.

Regards.