Regular Expression on Teradata 14.0

I have been working with Oracle 10g and 11g for more than 8 years and have written a significant number of SQL queries using regular expressions in various scenarios. Regular expressions are really handy if you know how to use them, and a single SQL statement can save you a lot of pain. The performance is also often better than achieving the same functionality with multiple SQL queries or PL/SQL procedures.

For the last couple of years, I've been working on Teradata, and on some occasions I was expecting features like these, where I could easily manipulate data with regular expressions. So I was pretty excited when I heard that Teradata had introduced regular expressions in version 14.0.

As a result, I tried all the features that I think can be handy and useful in various scenarios, and the following are the successful queries that I got. There are two cases where Teradata was only partially able to manipulate those strings. I've checked the latest Teradata manual but was unable to find a solution. So, I hope other forum members can contribute here to make this thread useful for every one of us, and I'll post here as soon as I get some answers on these partial conversions.

For better understanding, I've provided the original column value and the transformed value of that column in the output. That should make it easier to grasp, I guess. 🙂

Case 1,

SELECT 'SatyakiDe' AS COLA, regexp_replace('SatyakiDe','([[:lower:]]{1,})([[:upper:]]{1,})','\1 \2') AS COL_VAL;
 
COLA             COL_VAL
---------------- ----------------------------------------
SatyakiDe        Satyaki De
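To check the pattern logic outside Teradata, the same substitution can be sketched with Python's re module. This is a rough mirror, not Teradata's own regex engine: Python's re has no POSIX bracket classes such as [[:lower:]], so the sketch assumes ASCII letters and uses [a-z] and [A-Z] instead. The name split_camel_case is a hypothetical helper.

```python
import re

def split_camel_case(value: str) -> str:
    # [a-z]+ stands in for [[:lower:]]{1,} and [A-Z]+ for [[:upper:]]{1,} (ASCII only).
    # \1 and \2 are the two captured groups; a space is inserted between them.
    # re.sub replaces every non-overlapping match, not just the first one.
    return re.sub(r'([a-z]+)([A-Z]+)', r'\1 \2', value)

print(split_camel_case('SatyakiDe'))  # Satyaki De
```

Because re.sub replaces every match, a longer value such as 'SatyakiDeRoy' comes back as 'Satyaki De Roy'.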

Case 2,

select '919047255555' AS COLA, regexp_replace('919047255555','^([[:digit:]]{2})([[:digit:]]{10})','+\1 \2') COL_VAL;
 
COLA         COL_VAL
------------ ---------------
919047255555 +91 9047255555
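A Python sketch of the same prefixing logic, assuming a 12-digit value whose first two digits are the country code. The name format_phone is hypothetical, and [0-9] replaces [[:digit:]] because Python's re has no POSIX classes (and \d would also match non-ASCII digits).

```python
import re

def format_phone(value: str) -> str:
    # ^ anchors at the start; the first group captures 2 digits, the second 10 digits.
    # The replacement puts a '+' before group 1 and a space between the groups.
    return re.sub(r'^([0-9]{2})([0-9]{10})', r'+\1 \2', value)

print(format_phone('919047255555'))  # +91 9047255555
```

The pattern is anchored only at the start, so extra trailing digits are kept: format_phone('9190472555551') gives '+91 90472555551'. Adding $ after the second group would leave such input unchanged instead. Values shorter than 12 digits do not match and come back unchanged.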

Case 3,

select '+++C' AS COLA, regexp_replace('+++C','^([[:punct:]]{2})([[:punct:]]{1})(.*)$','\1\3') COL_VAL;
 
COLA COL_VAL
---- -----
+++C ++C
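Python's re has no [[:punct:]] class, so this sketch builds an equivalent class from string.punctuation. That covers ASCII punctuation only, which is an assumption; Teradata's class may include more characters. The name drop_third_punct is hypothetical.

```python
import re
import string

# Character class equivalent to ASCII [[:punct:]].
# re.escape protects characters such as ], \, ^ and - inside the brackets.
PUNCT = '[' + re.escape(string.punctuation) + ']'

def drop_third_punct(value: str) -> str:
    # Group 1: first two punctuation marks, group 2: the third one, group 3: the rest.
    # Only groups 1 and 3 are kept, so the third punctuation mark is removed.
    pattern = '^(' + PUNCT + '{2})(' + PUNCT + ')(.*)$'
    return re.sub(pattern, r'\1\3', value)

print(drop_third_punct('+++C'))  # ++C
```

Values that do not start with three punctuation marks do not match and are returned unchanged.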

Case 4,

select 'satyaki.de@mail.com' AS COLA, initcap(regexp_replace(regexp_substr('satyaki.de@mail.com','[^@]+'),'(.*)(\.)(.*)','\1 \3')) COL_VAL;
 
COLA                             COL_VAL
-------------------------------- --------------------------------------------------
satyaki.de@mail.com              Satyaki De
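The nested calls in Case 4 can be mirrored step by step in Python. This is a sketch under assumptions: str.title() stands in for INITCAP (the two can differ on details such as apostrophes), and email_to_name is a hypothetical helper.

```python
import re

def email_to_name(email: str) -> str:
    # Like regexp_substr(email, '[^@]+'): the first run of non-@ characters.
    local_part = re.search(r'[^@]+', email).group(0)
    # Like regexp_replace(..., '(.*)(\.)(.*)', '\1 \3'): because (.*) is greedy,
    # only the LAST dot in the local part is replaced by a space.
    spaced = re.sub(r'(.*)(\.)(.*)', r'\1 \3', local_part)
    # Rough stand-in for INITCAP: capitalize the first letter of each word.
    return spaced.title()

print(email_to_name('satyaki.de@mail.com'))  # Satyaki De
```

Only the last dot becomes a space, so 'a.b.c@x.com' gives 'A.B C'. The sketch also shows why stray spaces matter: with a leading space in the input, the result keeps it (' Satyaki De'), so trim the input first if that is not wanted.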

Case 5,

select '100011001' AS COLA, regexp_replace('100011001','([[:digit:]]{3})([[:digit:]]{2})([[:digit:]]{4})','XXX-XX-\3') as COL_VAL;
 
COLA             COL_VAL
---------------- --------------------
100011001        XXX-XX-1001
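The masking can be sketched the same way in Python (mask_digits is a hypothetical name, and [0-9] stands in for [[:digit:]]). Note that the pattern is not anchored with ^ and $, so in a longer digit string the first nine digits are rewritten and the rest are kept.

```python
import re

def mask_digits(value: str) -> str:
    # Three groups of 3, 2 and 4 digits; only the last group (\3) survives.
    # The first two groups are replaced by the literal text XXX-XX-.
    return re.sub(r'([0-9]{3})([0-9]{2})([0-9]{4})', r'XXX-XX-\3', value)

print(mask_digits('100011001'))  # XXX-XX-1001
```

For example, mask_digits('1000110019') returns 'XXX-XX-10019'; wrap the pattern in ^...$ if only exact 9-digit values should be masked.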

Case 6,

select '123456789' AS COLA, regexp_replace('123456789','([[:digit:]]{3})([[:digit:]]{3})([[:digit:]]{3})','\3.\2.\1') as COL_VAL;
 
COLA      COL_VAL
--------- ---------------
123456789 789.456.123
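Reordering captured groups works the same way in Python; reverse_triplets is a hypothetical name and [0-9] stands in for [[:digit:]].

```python
import re

def reverse_triplets(value: str) -> str:
    # Capture three groups of three digits and emit them in reverse order, dot-separated.
    return re.sub(r'([0-9]{3})([0-9]{3})([0-9]{3})', r'\3.\2.\1', value)

print(reverse_triplets('123456789'))  # 789.456.123
```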

Case 7,

SELECT 'satyaki1de0loves3to8work2on2sql0and2bi4tools1' AS COLA, regexp_replace('satyaki1de0loves3to8work2on2sql0and2bi4tools1','[^0-9]+','',1,0,'i') AS DER_VAL;

COLA                                            DER_VAL
---------------------------------------------   ----------
satyaki1de0loves3to8work2on2sql0and2bi4tools1   1038220241

As you can see, all the non-digit characters have been filtered out of the string and only the digits are kept. These sorts of queries are very useful in lots of different business scenarios as well.
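The digit filter in Case 7 corresponds to a one-line re.sub in Python (keep_digits is a hypothetical name). The 'i' (case-insensitive) flag in the SQL has no counterpart here because [^0-9] does not depend on letter case.

```python
import re

def keep_digits(value: str) -> str:
    # Replace every run of non-digit characters with an empty string.
    return re.sub(r'[^0-9]+', '', value)

print(keep_digits('satyaki1de0loves3to8work2on2sql0and2bi4tools1'))  # 1038220241
```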

Note that any extra space, whether in a pattern or in a string literal, may keep you from getting the desired result, so pay attention to these small details.

I've tested all these queries on the following two versions –

select * from dbc.dbcinfo;

    InfoKey                InfoData
    ---------------------  ------------
1   VERSION                14.10.00.02
2   RELEASE                14.10.00.02
3   LANGUAGE SUPPORT MODE  Standard


select * from dbc.dbcinfo;

    InfoKey                InfoData
    ---------------------  ------------
1   VERSION                14.10.01.05
2   RELEASE                14.10.01.04
3   LANGUAGE SUPPORT MODE  Standard

Hope this gives you more clarity. 🙂

One more thing I would like to clarify here: my intention is to describe more features of these regexp_(similar/substr/instr/replace) functions.

While posting the same article on the Teradata forum, I received a question asking whether these regexp functions are available in TD 13.

Here is my answer to that question –

Regarding version 13,

Let us check whether it has these regexp functions or not –

select * from dbc.dbcinfo;

    InfoKey                InfoData
    ---------------------  ------------
1   VERSION                13.00.00.15
2   RELEASE                13.00.00.15
3   LANGUAGE SUPPORT MODE  Standard


select * from dbc.dbcinfo;

    InfoKey                InfoData
    ---------------------  ------------
1   VERSION                13.10.07.12
2   RELEASE                13.10.07.12
3   LANGUAGE SUPPORT MODE  Standard

select regexp_replace('SatyakiDe','^(.*)([[:upper:]]{1,})(.*) $','\1 \2\3') AS COL_VAL;
 
select regexp_replace('SatyakiDe','^(.*)([[:upper:]]{1,})(.*) $','\1 \2\3') AS COL_VAL;
 
select regexp_replace('SatyakiDe','^(.*)([[:upper:]]{1,})(.*) $','\1 \2\3') AS COL_VAL;
                                   $
  *** Failure 3706 Syntax error:  expected something between '(' and the string 'S' keyword.
                      Statement# 1, Info =35
 *** Total elapsed time was 1 second.

As you can see, the statement fails with a syntax error on TD 13, so these regexp functions are not available in that version. Hope this answers that question adequately.

Now, let's look at some other functionality.

REGEXP_SIMILAR provides functionality similar to REGEXP_LIKE in Oracle.
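One difference worth keeping in mind: the Case 4 output further down (only 'y' comes back for '[^0-9]+') suggests that REGEXP_SIMILAR returns 1 only when the whole value matches. Oracle's REGEXP_LIKE, by contrast, succeeds when the pattern matches any substring. In Python terms, that is the difference between re.fullmatch and re.search. The sketch below is only an analogy, and both helper names are hypothetical.

```python
import re

def regexp_similar_sketch(value: str, pattern: str) -> int:
    # Whole-string match, mirroring how REGEXP_SIMILAR appears to behave (returns 1 or 0).
    return 1 if re.fullmatch(pattern, value) else 0

def regexp_like_sketch(value: str, pattern: str) -> int:
    # Substring match anywhere in the value, mirroring Oracle's REGEXP_LIKE.
    return 1 if re.search(pattern, value) else 0

print(regexp_similar_sketch('123x', '[^0-9]+'), regexp_like_sketch('123x', '[^0-9]+'))  # 0 1
```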

Let's see a couple of such cases –

Let's prepare a table with some dummy data –

SELECT * FROM dbc.dbcinfo;

    InfoKey                InfoData
    ---------------------  ------------
1   VERSION                14.10.01.05
2   RELEASE                14.10.01.04
3   LANGUAGE SUPPORT MODE  Standard
 
 
CREATE MULTISET VOLATILE TABLE TEST_T1
  (
            COL1  VARCHAR(10)
  ) 
ON COMMIT 
PRESERVE ROWS;
 
  INSERT INTO TEST_T1 VALUES('456')
  ;INSERT INTO TEST_T1 VALUES('123x')
  ;INSERT INTO TEST_T1 VALUES('x123')
  ;INSERT INTO TEST_T1 VALUES('y')
  ;INSERT INTO TEST_T1 VALUES('+789')
  ;INSERT INTO TEST_T1 VALUES('-789')
  ;INSERT INTO TEST_T1 VALUES('159-')
  ;INSERT INTO TEST_T1 VALUES('-1-');

Let's check the data now –

SELECT *
FROM TEST_T1;
 
    COL1
1   123x
2   456
3   x123
4   +789
5   -789
6   y
7   159-
8   -1-

Now, let's look into the various scenarios –

Case 1 (Returns Mixed Alphanumeric Values, Signed Numbers & Non Numbers),

SELECT *
 FROM TEST_T1
 WHERE REGEXP_SIMILAR(COL1,'^[0-9]+$','c')=0;
 
    COL1
    -----
1   123x
2   x123
3   +789
4   -789
5   y
6   159-
7   -1-

Case 2 (Returns Only Unsigned Positive Numbers),

SELECT *
 FROM TEST_T1
 WHERE REGEXP_SIMILAR(COL1,'^[0-9]+$','c')=1;
 
COL1
-----
456

Case 3 (Returns All Numbers, Unsigned or with a Leading and/or Trailing + or - Sign),

SELECT *
FROM TEST_T1
WHERE REGEXP_SIMILAR(COL1,'^[+-]?[0-9]+[+-]?$','c')=1;
 
COL1
-----
456
+789
-789
159-
-1-

Case 4 (Returns Only Values Without Any Digits, i.e. Pure Characters),

SELECT *
FROM TEST_T1
WHERE REGEXP_SIMILAR(COL1,'[^0-9]+','c')=1;

COL1
----
y
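The four REGEXP_SIMILAR cases can be replayed against the same eight TEST_T1 values in Python. The sketch assumes, as the Case 4 output suggests, that REGEXP_SIMILAR returns 1 only when the whole value matches, so it uses re.fullmatch. The 'c' (case-sensitive) argument needs no counterpart because Python matching is case-sensitive by default. ROWS, regexp_similar and select_where are hypothetical names.

```python
import re

# The values inserted into TEST_T1 above, in insert order.
ROWS = ['456', '123x', 'x123', 'y', '+789', '-789', '159-', '-1-']

def regexp_similar(value: str, pattern: str) -> int:
    # Assumption: 1 only if the entire value matches the pattern, else 0.
    return 1 if re.fullmatch(pattern, value) else 0

def select_where(pattern: str, wanted: int) -> list:
    # Like: SELECT * FROM TEST_T1 WHERE REGEXP_SIMILAR(COL1, pattern, 'c') = wanted
    return [row for row in ROWS if regexp_similar(row, pattern) == wanted]

for label, pattern, wanted in [
    ('Case 1', '^[0-9]+$', 0),
    ('Case 2', '^[0-9]+$', 1),
    ('Case 3', '^[+-]?[0-9]+[+-]?$', 1),
    ('Case 4', '[^0-9]+', 1),
]:
    print(label, select_where(pattern, wanted))
```

Note that '-1-' passes Case 3 because both optional signs match. If only a leading sign should be allowed, '^[+-]?[0-9]+$' keeps 456, +789 and -789 and drops 159- and -1-.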

Hope this gives you some additional ideas. 🙂

My objective is to provide basic information to my friends, so that they can write better SQL in TD while migrating from other popular databases, and so that developers new to TD can get a flavor of this powerful feature, exploit it in every positive way and apply it properly. 😀

I really appreciate your taking the time to read this post.

Regards.

Satyaki De.

9 thoughts on “Regular Expression on Teradata 14.0”

Sachen says:

August 5, 2014 at 7:10 pm

Dear Satyakide,
I am a newbie to Teradata. Could you please help me write the Teradata query for the scenarios below? I appreciate your time and patience.
1. For each source first, middle, last, title, and full name field, replace all occurrences of a period or a comma with a space.

2. For each source first, middle, last, title, and full name field, parse the field into individual words using one or more spaces as a delimiter.

3. For each word in each parsed source field, convert all but the first character to lowercase.

4. For each word in each parsed source field, if the word is a single alphabetic character then add a period to the end of the word.

5. For each word in each parsed source field, capitalize the next letter following any occurrence of a dash ‘-’ or a slash ‘/’.

6. For each word in each parsed source field, if the second character is an apostrophe and the third character is not a space and is alphabetic, capitalize the third character. Using this rule, “O’connell” would become “O’Connell”.

7. For each word in each parsed source field, if the word begins with either “Mc” or “Mac” and has at least 2 non-space characters after those values, then capitalize the next letter in the word. Using these rules, “Macconnelly” would become “MacConnelly”, but “Mack” would remain as “Mack”.

P.S.: Here first, middle, last and title are pulled from the BASE_TAB.PRACTN table, and full name is pulled from the _tab.PROV table.


SatyakiDe says:

August 8, 2014 at 6:06 am

Sachen, whenever you post a problem, always include some sample input data and the required output data in a formatted manner, so that anyone can understand the problem. You should also always mention your TD version details.

Now, I'm addressing the 2nd part of your post. Always post the relevant query where you are stuck. It is always better to give it a first shot yourself and share the query that you have tried. That gives us an idea of how you approach the solution, and based on that, we can provide a solution that you will understand better.

Remember one thing: if you get something easily, you'll tend to forget it quickly. 🙂 But if you spend lots of time on it, you'll never forget it. 😀

Just give it a try and let me know. I'll be happy to provide a couple of solutions for these queries.

Thanks for taking the time to read my article. I hope it gives you some basic idea about regular expressions in TD 14.0.


Kevin says:

November 25, 2014 at 10:53 pm

Yuyudhana, listen, your article has helped me immensely. For years I’ve needed regular expressions, and your essay and examples pried open the door.

Kudos also to Teradata for implementing them, and for also doing the EditDistance() function.


1. SatyakiDe says:
  
  November 26, 2014 at 5:34 am
  
  Thanks Kevin. The purpose of this blog is to share my knowledge and technical snippets with others, so that everyone can benefit from them. Also, this is a completely new area in TD, so I'm encouraging TD developers to use it rather than writing lengthy, complex and multiple SQLs to achieve the same result.
  
  
koyel says:

February 18, 2015 at 12:02 pm

Hello Satyaki,

Case 1 (Returns Mixed Numbers, Signed Numbers & Non Numbers),
Does this work only for the VARCHAR type? What if the datatype is CHAR?
I could find this pattern-matching solution only on your site and forum. Thanks for such a page with examples.

Regards
Koyel


1. SatyakiDe says:
  
  February 18, 2015 at 4:10 pm
  
  Hi Koyel,
  
  Ideally, it should work for both VARCHAR and CHAR, though I haven't specifically tested that yet. Why don't you test it and let me know?
  
  Regards.
  
  
Name says:

March 12, 2015 at 6:15 pm

It's throwing an error while executing the same query above. Error:
Column datatype does not match a Defined Type name


1. SatyakiDe says:
  
  March 13, 2015 at 8:12 am
  
  Which query are you talking about?
  
  Please post the query here, along with your Teradata version details. Did you check which version you are using?
  
  
shrishti kejriwal says:

September 16, 2016 at 12:33 pm

Hi SatyakiDe,

I'm using TD version 14; however, I'm receiving an error while executing the REGEXP_SIMILAR function.
Error: 5589. REGEXP_SIMILAR function not found.

This is my query.

select * from shristi WHERE REGEXP_SIMILAR(Val, 'A')=1;

Could you please throw some light on this.



Published by SatyakiDe

Discover more from Satyaki De's Blog