Password Validation Using Regular Expression In Teradata 14 & 15

Today, we’ll be checking one new area where we can implement regular expression to achieve the password validation without involving any kind of Macro, Stored-Proc.

 

Let’s consider the following conditions to be implemented –

 

1. Password should contain characters between 6 & 10.

2. One character should be digit.

3. One character should be in upper case letter.

4. There should be at least one special character.

 

Let’s check the Query & Output –

 

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
select seq_no,
       passwd,
       regexp_similar(passwd,'^(?=^([[:graph:]]{6,10})$)(?=.*([[:upper:]]{1,}))(?=.*([[:digit:]]{1,})).*$') as reg_test
from scott.login_det
order by 1;


SEQ_NO	PASSWD	 REG_TEST
-----   -------  --------------
1	hoti         0
2	hotimla	     0
3	hotImla	     0
4	hot@imla     0
5	hoT@imla     0
6	hoT@iml9a    1
7	hoT@iml9a66  0

 

Similarly, you can add condition of lower case character if you want to make it more complex.

 

Hope, this will give you another way – to implement the same logic. 🙂

String Manipulation Advanced Using Teradata 14.0 Regular Expression

Today, I’ll show couple of very useful functions or logic implemented in Teradata using It’s Regular Expression.

There is two very popular demand comes from most of the Developer across different databases regarding the following two cases –

1. How to Split Comma Separated Values in each rows 

2. How to bind separate values in 1 row (Just opposite of Step 1)

2nd Options are very demanding as Cross platform database professional specially Oracle Developers looking for these kind of implementation as Oracle has directly built-in functions to do the same. Those functions are Listagg, wm_concat, group_concat.

Let’s check the solution –

Case 1,

Let’s create the table & prepare some data –

 

1
2
3
4
5
6
7
CREATE MULTISET TABLE ETL_DATA.PARSE_STR
  (
     SEQ_NO       INTEGER,
     SRC_STR     VARCHAR(70)
  );
 
CREATE TABLE completed. 0 rows processed. Elapsed Time =  00:00:01.864

 

Let’s insert some data –

 

1
2
3
INSERT INTO ETL_DATA.PARSE_STR VALUES(1,'RAM,TRIDIB,ANUPAM,BIRESWAR,SUJAY')
;INSERT INTO ETL_DATA.PARSE_STR VALUES(2,'TUNKAI,SAYAN,BABU,PAPU')
;INSERT INTO ETL_DATA.PARSE_STR VALUES(3,'IK,ATBIS,SAPMUNDA');

 

Let’s check the value –

 

1
2
3
4
5
SEQ_NO          SRC_STR
------  ----------------------------------
    1   RAM,TRIDIB,ANUPAM,BIRESWAR,SUJAY
    2   TUNKAI,SAYAN,BABU,PAPU
    3   IK,ATBIS,SAPMUNDA

 

Fine, Now our objective will be split these comma separated values in each lines.

 

1
2
3
4
5
6
SELECT b.SEQ_NO,
       regexp_substr(b.SRC_STR,'[^,]+',1,day_of_calendar) AS SRC_STR
FROM sys_calendar.calendar ,
     PARSE_STR b
WHERE day_of_calendar BETWEEN 1 AND  (LENGTH(b.SRC_STR) - LENGTH(regexp_replace(b.SRC_STR,'[^A-Z]+','',1,0,'i'))+1 )
ORDER BY 1,2;

 

And, let’s check the output –

 

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
SEQ_NO  SRC_STR
-----   ----------------------
1       ANUPAM
1       BIRESWAR
1       RAM
1       SUJAY
1       TRIDIB
2       BABU
2       PAPU
2       SAYAN
2       TUNKAI
3       ATBIS
3       IK
3       SAPMUNDA

 

Gr8! I guess, result is coming as per my expectation. 🙂

 

Case 2(Subsitute Of Listagg, wm_concat, group_concat in Oracle),

This we’ve to do it in Two small Steps for better understanding & performance.

First, let us create another table –

 

1
2
3
4
5
6
7
CREATE MULTISET TABLE ETL_DATA.WM_CONCAT_TAB
   (
      SEQ_NO   INTEGER,
      SRC_STR VARCHAR(20)
   );
    
CREATE TABLE completed. 0 rows processed. Elapsed Time =  00:00:01.230

 

Good. Now we’ll populate some data into this table. We’ll populate data from Step 1 as this will provide the exact data that we’re expecting as input test data for Case 2.

Let’s insert those data –

 

1
2
3
4
5
6
INSERT INTO ETL_DATA.WM_CONCAT_TAB
SELECT b.SEQ_NO,
       regexp_substr(b.SRC_STR,'[^,]+',1,day_of_calendar) AS SRC_STR
FROM sys_calendar.calendar ,
     PARSE_STR b
WHERE day_of_calendar BETWEEN 1 AND  (LENGTH(b.SRC_STR) - LENGTH(regexp_replace(b.SRC_STR,'[^A-Z]+','',1,0,'i'))+1 );

 

Let’s check the data –

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
SEQ_NO  SRC_STR
------  --------------------
1       ANUPAM
1       BIRESWAR
1       RAM
1       SUJAY
1       TRIDIB
2       BABU
2       PAPU
2       SAYAN
2       TUNKAI
3       ATBIS
3       IK
3       SAPMUNDA

 

As you know in TD we’ve significant restcriction regarding Hirarchical Queries & Recursive Queries. So, In this step we’ll build one relationship like employee & manager in popular employee table. So, if we have that kind of relation then we can easily establish & fit that in TD model.

Let’s create this intermediate table. In this case we’ll go for mapping between current rows with next rows. This is also very useful process. In Oracle, they have LEAD or LAG functions to achieve the same. But, here we’ve to work a little bit more to achive the same.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
CREATE MULTISET VOLATILE TABLE VT_SRC_ARRNG
AS
     (
            SELECT SEQ_NO,
                   SRC_STR,
                   MAX(SRC_STR) OVER(
                                        PARTITION BY SEQ_NO
                                        ORDER BY SEQ_NO, SRC_STR
                                        ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING 
                                    ) AS PREV_SRC_STR,
                   COUNT(*)  OVER(
                                    PARTITION BY SEQ_NO
                                 ) AS MAX_RECUR_CNT
            FROM WM_CONCAT_TAB
      )
WITH DATA
ON COMMIT
PRESERVE ROWS;
 
CREATE TABLE completed. 0 rows processed. Elapsed Time =  00:00:01.102

 

Let’s look the output –

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
SELECT *
FROM VT_SRC_ARRNG
ORDER BY 1,2;
 
 
 
 
SEQ_NO  SRC_STR  PREV_SRC_STR    MAX_RECUR_CNT
-----   -------  --------------- ---------------------
1       ANUPAM      BIRESWAR     5
1       BIRESWAR    RAM          5
1       RAM         SUJAY        5
1       SUJAY       TRIDIB       5
1       TRIDIB      ?            5
2       BABU        PAPU         4
2       PAPU        SAYAN        4
2       SAYAN       TUNKAI       4
2       TUNKAI      ?            4
3       ATBIS       IK           3
3       IK          SAPMUNDA     3
3       SAPMUNDA    ?            3

 

Fine. From the above VT we can see every Source String has one Previous Source String. Also, we’ve noted down that in each window of SEQ_NO how many levels are there by MAX_RECUR_CNT. We’ll use this column later.

Let’s move to the 2nd & final part –

Let’s aggregate the values based on SEQ_NO & club them with comma –

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
WITH RECURSIVE WM_CONCAT(SEQ_NO, SRC_STR, PREV_SRC_STR, MAX_RECUR_CNT, LVL,  COMMA_SEP_STR)
AS
     (
        SELECT SEQ_NO,
               SRC_STR,
               PREV_SRC_STR,
               MAX_RECUR_CNT,
               1 AS LVL,
               CAST( '' AS VARCHAR(100)) AS COMMA_SEP_STR
       FROM VT_SRC_ARRNG
       WHERE  PREV_SRC_STR IS NULL
       UNION ALL
       SELECT  b.SEQ_NO,
               b.SRC_STR,
               b.PREV_SRC_STR,
               b.MAX_RECUR_CNT,
               c.LVL+1 AS LVL,
               c.COMMA_SEP_STR||b.SRC_STR||',' AS COMMA_SEP_STR
       FROM VT_SRC_ARRNG b,
               WM_CONCAT c
       WHERE c.SRC_STR =  b.PREV_SRC_STR
     )
SELECT k.SEQ_NO,
       k.AGGR_STR
FROM (               
    SELECT SEQ_NO,
           SRC_STR,
           LVL,
           MAX_RECUR_CNT,
           MIN(CASE
                 WHEN LVL = 1 THEN
                    SRC_STR
               ELSE
                  'ZZZZZ'
               END   ) OVER(
                                 PARTITION BY SEQ_NO
                                 ORDER BY LVL ASC
                           ) ROOT_SRC_STR,
           COMMA_SEP_STR||ROOT_SRC_STR AS AGGR_STR
    FROM WM_CONCAT
    )  k
WHERE k.LVL = k.MAX_RECUR_CNT
ORDER BY 1,2;

 

Let’s check the output –

1
2
3
4
5
SEQ_NO  AGGR_STR
------- ---------------------------
1       SUJAY,RAM,BIRESWAR,ANUPAM,TRIDIB
2       SAYAN,PAPU,BABU,TUNKAI
3       IK,ATBIS,SAPMUNDA

 

I guess, We’ve done it. 😀

So, You can achieve the same without writing any UDF.

 

Regular Expression on Teradata 14.0

I’ve been working for more than 8 years in Oracle 10g, 11g & worked significant queries on Regular expressions in various scenario using SQL. It is real handy if you know how to use it & can reduce lots of pain with single SQL. And, the performance will be better compared to the total effort to achieve the same functionalists by using multiple SQL queries or PL/SQL Procedures.

Last couple of years, I’m working on Teradata. And, on some occasion – I was expecting features like these, where I can easily manipulate data with regular expression. I’m pretty excited when I heard that Teradata also introduced Regular Expression from Version 14.0.


As a result, I tried all those features that I think can be handy & useful for various scenarios & followings are the successful queries that I get. There are two occasion, where Teradata partially able to manipulate those strings. I’ve checked the latest Teradata Manual. However, unable to find those solution. So, I’m expecting other forum members can contribute here in order to make this thread useful for every one of us. And, I’ll post here as soon as I get some answers on these partial conversions.

For better understanding, I’ve provided the actual column value & after transformation value of that column in the output. That will help us to grasp it easily – I guess. 🙂


Case 1,

1
2
3
4
5
SELECT regexp_replace('SatyakiDe','([[:lower:]]{1,})([[:upper:]]{1,})','\1 \2') AS COL_VAL;

COLA COL_VAL
---------------- ----------------------------------------
SatyakiDe Satyaki De


Case 2,

1
2
3
4
5
select regexp_replace('919047242526','^([[:digit:]]{2})([[:digit:]]{10})','+\1 \2') COL_VAL;

COLA COL_VAL
------------ ---------------
919047255555 +91 9047255555



Case 3,

1
2
3
4
5
select regexp_replace('+++C','^([[:punct:]]{2})([[:punct:]]{1})(.*)$','\1\3') COL_VAL;

COLA COL_VAL
---- -----
+++C ++C



Case 4,

1
2
3
4
5
select initcap(regexp_replace(regexp_substr(' satyaki.de@mail.com','[^@]+'),'(.*)(\.)(.*)','\1 \3')) COL_VAL;

COLA COL_VAL
-------------------------------- --------------------------------------------------
satyaki.de@mail.com Satyaki De



Case 5,

1
2
3
4
5
select regexp_replace('100011001','([[:digit:]]{3})([[:digit:]]{2})([[:digit:]]{4})','XXX-XX-\3') as COL_VAL;

COLA COL_VAL
---------------- --------------------
100011001 XXX-XX-1001



Case 6,

1
2
3
4
5
select regexp_replace('123456789','([[:digit:]]{3})([[:digit:]]{3})([[:digit:]]{3})','\3.\2.\1') as COL_VAL;

COLA COL_VAL
--------- ---------------
123456789 789.456.123



Case 7,

1
2
3
4
5
SELECT regexp_replace('satyaki9de0loves3to8work2on2sql0and2bi6tools1','[^0-9]+','',1,0,'i') AS DER_VAL;

COLA DER_VAL
--------------------------------------------- ----------
satyaki1de0loves3to8work2on2sql0and2bi4tools1 1038220241




As you can see, all the characters have filtered out from the string & only numbers are kept here. These sorts of queries are very useful in lots of different business scenarios as well.

So, any extra space may not produce desired result. And, needs to pay attention into these small details. 

And, I’ve tested all these queries in the following two versions –

 1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
select * from dbcinfo;

InfoKey InfoData
-------- ------------------------
1 VERSION 14.10.00.02
2 RELEASE 14.10.00.02
3 LANGUAGE SUPPORT MODE Standard


select * from dbcinfo;

InfoKey InfoData
-------- ------------------------
1 VERSION 14.10.01.05
2 RELEASE 14.10.01.04
3 LANGUAGE SUPPORT MODE Standard



Hope, this will give you much more clarity. 🙂

One more thing, I would like to clarify here – my intention is to describe more features about these regexp_(similar/substr/instr/replace) functions.

I’ve received one question whether these regexp functions available in TD 13 or not in Teradata forum while posting the same article over there.

And, here is my answer to that question –  

Regarding version 13,

Let us check whether they have these regexp functions or not –

 1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
select * from dbcinfo;

InfoKey InfoData
-------- ------------------------
1 VERSION 13.00.00.15
2 RELEASE 13.00.00.15
3 LANGUAGE SUPPORT MODE Standard


select * from dbcinfo;

InfoKey InfoData
-------- ------------------------
1 VERSION 13.10.07.12
2 RELEASE 13.10.07.12
3 LANGUAGE SUPPORT MODE Standard


1
2
3
4
5
6
7
8
9
select regexp_replace('SatyakiDe','^(.*)([[:upper:]]{1,})(.*) $','\1 \2\3') AS COL_VAL;

select regexp_replace('SatyakiDe','^(.*)([[:upper:]]{1,})(.*) $','\1 \2\3') AS COL_VAL;

select regexp_replace('SatyakiDe','^(.*)([[:upper:]]{1,})(.*) $','\1 \2\3') AS COL_VAL;
$
*** Failure 3706 Syntax error: expected something between '(' and the string 'S' keyword.
Statement# 1, Info =35
*** Total elapsed time was 1 second.



Hope this will give adequate clarity to the answer of that above question.

Now, Lets see some other functionality.

REGEXP_SIMILAR has similar functionality like REGEXP_LIKE in Oracle.

Let’s see couple of such cases –

Lets prepare the table with some dummy data –


 1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
SELECT * FROM dbc.dbcinfo;

InfoKey InfoData
-------- -----------------------
1 VERSION 14.10.01.05
2 RELEASE 14.10.01.04
3 LANGUAGE SUPPORT MODE Standard


CREATE MULTISET VOLATILE TABLE TEST_T1
(
COL1 VARCHAR(10)
)
ON COMMIT
PRESERVE ROWS;

INSERT INTO TEST_T1 VALUES('456')
;INSERT INTO TEST_T1 VALUES('123x')
;INSERT INTO TEST_T1 VALUES('x123')
;INSERT INTO TEST_T1 VALUES('y')
;INSERT INTO TEST_T1 VALUES('+789')
;INSERT INTO TEST_T1 VALUES('-789')
;INSERT INTO TEST_T1 VALUES('159-')
;INSERT INTO TEST_T1 VALUES('-1-');


Lets check the data now –

 1
2
3
4
5
6
7
8
9
10
11
12
SELECT *
FROM TEST_T1;

COL1
1 123x
2 456
3 x123
4 +789
5 -789
6 y
7 159-
8 -1-



Let’s look into the various scenarios now –


Case 1 (Returns Mixed Numbers, Signed Numbers & Non Numbers),

 1
2
3
4
5
6
7
8
9
10
11
12
13
SELECT *
FROM TEST_T1
WHERE REGEXP_SIMILAR(COL1,'^[0-9]+$','c')=0;

COL1
-----
1 123x
2 x123
3 +789
4 -789
5 y
6 159-
7 -1-




Case 2 (Returns Only Unsigned Positive Numbers),

1
2
3
4
5
6
7
SELECT *
FROM TEST_T1
WHERE REGEXP_SIMILAR(COL1,'^[0-9]+$','c')=1;

COL1
-----
456



Case 3 (Returns All Numbers including Positive, Negative & unsigned),

 1
2
3
4
5
6
7
8
9
10
11
SELECT *
FROM TEST_T1
WHERE REGEXP_SIMILAR(COL1,'^[+-]?[0-9]+[+-]?$','c')=1;

COL1
-----
456
+789
-789
159-
-1-



Case 4 (Returns Only Non Numbers i.e. Characters),

1
2
3
4
5
6
7
SELECT *
FROM TEST_T1
WHERE REGEXP_SIMILAR(COL1,'[^0-9]+','c')=1;

COL1
----
y



Hope this will give you some additional idea. 🙂

My objective is to provide basic information to my friends. So, that they can write better SQL in TD while migrating from other popular databases or new developer in TD can get a flavor of this powerful feature & exploit them in all the positive aspect & apply them properly. 😀

Really appreciate your time to read this post.

Regards.

Satyaki De.