在 Oracle 中将字符串拆分为多行

我知道 PHP 和 MYSQL 在某种程度上已经回答了这个问题,但是我想知道是否有人可以教我在 Oracle 10g (最好是)和11g 中将字符串(以逗号分隔)分割成多行的最简单的方法。

下表如下:

Name | Project | Error
108    test      Err1, Err2, Err3
109    test2     Err1

我想创建以下内容:

Name | Project | Error
108    Test      Err1
108    Test      Err2
108    Test      Err3
109    Test2     Err1

I've seen a few potential solutions around stack, however they only accounted for a single column (being the comma delimited string). Any help would be greatly appreciated.

329241 次浏览

正则表达式是一个奇妙的东西:)

with temp as  (
select 108 Name, 'test' Project, 'Err1, Err2, Err3' Error  from dual
union all
select 109, 'test2', 'Err1' from dual
)


SELECT distinct Name, Project, trim(regexp_substr(str, '[^,]+', 1, level)) str
FROM (SELECT Name, Project, Error str FROM temp) t
CONNECT BY instr(str, ',', 1, level - 1) > 0
order by Name

还有几个同样的例子:

SELECT trim(regexp_substr('Err1, Err2, Err3', '[^,]+', 1, LEVEL)) str_2_tab
FROM dual
CONNECT BY LEVEL <= regexp_count('Err1, Err2, Err3', ',')+1
/


SELECT trim(regexp_substr('Err1, Err2, Err3', '[^,]+', 1, LEVEL)) str_2_tab
FROM dual
CONNECT BY LEVEL <= length('Err1, Err2, Err3') - length(REPLACE('Err1, Err2, Err3', ',', ''))+1
/

另外,可以使用 DBMS _ UTILITY. comma _ to _ table & table _ to _ comma: Http://www.oracle-base.com/articles/9i/useful-procedures-and-functions-9i.php#dbms_utility.comma_to_table

REGEXP _ COUNT 直到 Oracle 11i 才被添加。

SELECT trim(regexp_substr('Err1, Err2, Err3', '[^,]+', 1, LEVEL)) str_2_tab
FROM dual
CONNECT BY LEVEL <=
LENGTH('Err1, Err2, Err3')
- LENGTH(REPLACE('Err1, Err2, Err3', ',', ''))
+ 1;

我已经使用了 DBMS _ UTILITY. comma _ to _ table 函数实际上它的工作原理 代码如下

declare
l_tablen  BINARY_INTEGER;
l_tab     DBMS_UTILITY.uncl_array;
cursor cur is select * from qwer;
rec cur%rowtype;
begin
open cur;
loop
fetch cur into rec;
exit when cur%notfound;
DBMS_UTILITY.comma_to_table (
list   => rec.val,
tablen => l_tablen,
tab    => l_tab);
FOR i IN 1 .. l_tablen LOOP
DBMS_OUTPUT.put_line(i || ' : ' || l_tab(i));
END LOOP;
end loop;
close cur;
end;

我使用了自己的表名和列名

这可能是一种改进的方法(也包括 regexp 和 connect by) :

with temp as
(
select 108 Name, 'test' Project, 'Err1, Err2, Err3' Error  from dual
union all
select 109, 'test2', 'Err1' from dual
)
select distinct
t.name, t.project,
trim(regexp_substr(t.error, '[^,]+', 1, levels.column_value))  as error
from
temp t,
table(cast(multiset(select level from dual connect by  level <= length (regexp_replace(t.error, '[^,]+'))  + 1) as sys.OdciNumberList)) levels
order by name

编辑: Here is a simple (as in, "not in depth") explanation of the query.

  1. length (regexp_replace(t.error, '[^,]+')) + 1使用 regexp_replace擦除任何不是分隔符(本例中为逗号)的内容,并使用 length +1获取有多少元素(错误)。
  2. select level from dual connect by level <= (...)使用 hierarchical query创建一个列,该列具有越来越多的匹配项,从1到错误总数。

    预览:

    select level, length (regexp_replace('Err1, Err2, Err3', '[^,]+'))  + 1 as max
    from dual connect by level <= length (regexp_replace('Err1, Err2, Err3', '[^,]+'))  + 1
    
  3. table(cast(multiset(.....) as sys.OdciNumberList)) does some casting of oracle types.
    • The cast(multiset(.....)) as sys.OdciNumberList transforms multiple collections (one collection for each row in the original data set) into a single collection of numbers, OdciNumberList.
    • The table() function transforms a collection into a resultset.
  4. FROM without a join creates a cross join between your dataset and the multiset. As a result, a row in the data set with 4 matches will repeat 4 times (with an increasing number in the column named "column_value").

    Preview:

    select * from
    temp t,
    table(cast(multiset(select level from dual connect by  level <= length (regexp_replace(t.error, '[^,]+'))  + 1) as sys.OdciNumberList)) levels
    
  5. trim(regexp_substr(t.error, '[^,]+', 1, levels.column_value)) uses the column_value as the nth_appearance/ocurrence parameter for regexp_substr.
  6. You can add some other columns from your data set (t.name, t.project as an example) for easy visualization.

Some references to Oracle docs:

There is a huge difference between the below two:

  • 分隔单个分隔字符串
  • 分隔表中多行的分隔字符串。

如果不限制行,那么 CONNECT BY子句将生成 多行,并且不会提供所需的输出。

除了 正则表达式之外,还有其他一些替代方案:

  • XMLTable
  • MODEL 子句

设置

SQL> CREATE TABLE t (
2    ID          NUMBER GENERATED ALWAYS AS IDENTITY,
3    text        VARCHAR2(100)
4  );


Table created.


SQL>
SQL> INSERT INTO t (text) VALUES ('word1, word2, word3');


1 row created.


SQL> INSERT INTO t (text) VALUES ('word4, word5, word6');


1 row created.


SQL> INSERT INTO t (text) VALUES ('word7, word8, word9');


1 row created.


SQL> COMMIT;


Commit complete.


SQL>
SQL> SELECT * FROM t;


ID TEXT
---------- ----------------------------------------------
1 word1, word2, word3
2 word4, word5, word6
3 word7, word8, word9


SQL>

Using XMLTABLE:

SQL> SELECT id,
2         trim(COLUMN_VALUE) text
3  FROM t,
4    xmltable(('"'
5    || REPLACE(text, ',', '","')
6    || '"'))
7  /


ID TEXT
---------- ------------------------
1 word1
1 word2
1 word3
2 word4
2 word5
2 word6
3 word7
3 word8
3 word9


9 rows selected.


SQL>

使用 模特条款:

SQL> WITH
2  model_param AS
3     (
4            SELECT id,
5                      text AS orig_str ,
6                   ','
7                          || text
8                          || ','                                 AS mod_str ,
9                   1                                             AS start_pos ,
10                   Length(text)                                   AS end_pos ,
11                   (Length(text) - Length(Replace(text, ','))) + 1 AS element_count ,
12                   0                                             AS element_no ,
13                   ROWNUM                                        AS rn
14            FROM   t )
15     SELECT   id,
16              trim(Substr(mod_str, start_pos, end_pos-start_pos)) text
17     FROM     (
18                     SELECT *
19                     FROM   model_param MODEL PARTITION BY (id, rn, orig_str, mod_str)
20                     DIMENSION BY (element_no)
21                     MEASURES (start_pos, end_pos, element_count)
22                     RULES ITERATE (2000)
23                     UNTIL (ITERATION_NUMBER+1 = element_count[0])
24                     ( start_pos[ITERATION_NUMBER+1] = instr(cv(mod_str), ',', 1, cv(element_no)) + 1,
25                     end_pos[iteration_number+1] = instr(cv(mod_str), ',', 1, cv(element_no) + 1) )
26                 )
27     WHERE    element_no != 0
28     ORDER BY mod_str ,
29           element_no
30  /


ID TEXT
---------- --------------------------------------------------
1 word1
1 word2
1 word3
2 word4
2 word5
2 word6
3 word7
3 word8
3 word9


9 rows selected.


SQL>

我想添加另一个方法。这个使用递归查询,这是我在其他答案中没有看到的。Oracle 从11gR2开始就支持它。

with cte0 as (
select phone_number x
from hr.employees
), cte1(xstr,xrest,xremoved) as (
select x, x, null
from cte0
union all
select xstr,
case when instr(xrest,'.') = 0 then null else substr(xrest,instr(xrest,'.')+1) end,
case when instr(xrest,'.') = 0 then xrest else substr(xrest,1,instr(xrest,'.') - 1) end
from cte1
where xrest is not null
)
select xstr, xremoved from cte1
where xremoved is not null
order by xstr

It is quite flexible with the splitting character. Simply change it in the INSTR calls.

不使用 连接Regexp:

    with mytable as (
select 108 name, 'test' project, 'Err1,Err2,Err3' error from dual
union all
select 109, 'test2', 'Err1' from dual
)
,x as (
select name
,project
,','||error||',' error
from mytable
)
,iter as (SELECT rownum AS pos
FROM all_objects
)
select x.name,x.project
,SUBSTR(x.error
,INSTR(x.error, ',', 1, iter.pos) + 1
,INSTR(x.error, ',', 1, iter.pos + 1)-INSTR(x.error, ',', 1, iter.pos)-1
) error
from x, iter
where iter.pos < = (LENGTH(x.error) - LENGTH(REPLACE(x.error, ','))) - 1;

我想提出一种使用 PIPELINED 表函数的不同方法。它有点类似于 XMLTABLE 的技术,只是提供了自己的自定义函数来分割字符串:

-- Create a collection type to hold the results
CREATE OR REPLACE TYPE typ_str2tbl_nst AS TABLE OF VARCHAR2(30);
/


-- Split the string according to the specified delimiter
CREATE OR REPLACE FUNCTION str2tbl (
p_string    VARCHAR2,
p_delimiter CHAR DEFAULT ','
)
RETURN typ_str2tbl_nst PIPELINED
AS
l_tmp VARCHAR2(32000) := p_string || p_delimiter;
l_pos NUMBER;
BEGIN
LOOP
l_pos := INSTR( l_tmp, p_delimiter );
EXIT WHEN NVL( l_pos, 0 ) = 0;
PIPE ROW ( RTRIM( LTRIM( SUBSTR( l_tmp, 1, l_pos-1) ) ) );
l_tmp := SUBSTR( l_tmp, l_pos+1 );
END LOOP;
END str2tbl;
/


-- The problem solution
SELECT name,
project,
TRIM(COLUMN_VALUE) error
FROM t, TABLE(str2tbl(error));

结果:

      NAME PROJECT    ERROR
---------- ---------- --------------------
108 test       Err1
108 test       Err2
108 test       Err3
109 test2      Err1

这种方法的问题在于,优化器通常不知道表函数的基数,因此必须进行猜测。这可能对您的执行计划有潜在的危害,因此可以扩展此解决方案以为优化器提供执行统计信息。

通过对上面的查询运行 EXPLAIN PLAN,您可以看到这个优化器的估计:

Execution Plan
----------------------------------------------------------
Plan hash value: 2402555806


----------------------------------------------------------------------------------------------
| Id  | Operation                          | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                   |         | 16336 |   366K|    59   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                      |         | 16336 |   366K|    59   (0)| 00:00:01 |
|   2 |   TABLE ACCESS FULL                | T       |     2 |    42 |     3   (0)| 00:00:01 |
|   3 |   COLLECTION ITERATOR PICKLER FETCH| STR2TBL |  8168 | 16336 |    28   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Even though the collection has only 3 values, the optimizer estimated 8168 rows for it (default value). This may seem irrelevant at first, but it may be enough for the optimizer to decide for a sub-optimal plan.

解决方案是使用优化器扩展来为集合提供统计信息:

-- Create the optimizer interface to the str2tbl function
CREATE OR REPLACE TYPE typ_str2tbl_stats AS OBJECT (
dummy NUMBER,


STATIC FUNCTION ODCIGetInterfaces ( p_interfaces OUT SYS.ODCIObjectList )
RETURN NUMBER,


STATIC FUNCTION ODCIStatsTableFunction ( p_function  IN  SYS.ODCIFuncInfo,
p_stats     OUT SYS.ODCITabFuncStats,
p_args      IN  SYS.ODCIArgDescList,
p_string    IN  VARCHAR2,
p_delimiter IN  CHAR DEFAULT ',' )
RETURN NUMBER
);
/


-- Optimizer interface implementation
CREATE OR REPLACE TYPE BODY typ_str2tbl_stats
AS
STATIC FUNCTION ODCIGetInterfaces ( p_interfaces OUT SYS.ODCIObjectList )
RETURN NUMBER
AS
BEGIN
p_interfaces := SYS.ODCIObjectList ( SYS.ODCIObject ('SYS', 'ODCISTATS2') );
RETURN ODCIConst.SUCCESS;
END ODCIGetInterfaces;


-- This function is responsible for returning the cardinality estimate
STATIC FUNCTION ODCIStatsTableFunction ( p_function  IN  SYS.ODCIFuncInfo,
p_stats     OUT SYS.ODCITabFuncStats,
p_args      IN  SYS.ODCIArgDescList,
p_string    IN  VARCHAR2,
p_delimiter IN  CHAR DEFAULT ',' )
RETURN NUMBER
AS
BEGIN
-- I'm using basically half the string lenght as an estimator for its cardinality
p_stats := SYS.ODCITabFuncStats( CEIL( LENGTH( p_string ) / 2 ) );
RETURN ODCIConst.SUCCESS;
END ODCIStatsTableFunction;


END;
/


-- Associate our optimizer extension with the PIPELINED function
ASSOCIATE STATISTICS WITH FUNCTIONS str2tbl USING typ_str2tbl_stats;

测试结果执行计划:

Execution Plan
----------------------------------------------------------
Plan hash value: 2402555806


----------------------------------------------------------------------------------------------
| Id  | Operation                          | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                   |         |     1 |    23 |    59   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                      |         |     1 |    23 |    59   (0)| 00:00:01 |
|   2 |   TABLE ACCESS FULL                | T       |     2 |    42 |     3   (0)| 00:00:01 |
|   3 |   COLLECTION ITERATOR PICKLER FETCH| STR2TBL |     1 |     2 |    28   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

正如您所看到的,上面计划中的基数不再是8196的猜测值。这仍然是不正确的,因为我们正在向函数传递一个列而不是字符串文字。

在这种特殊情况下,对函数代码进行一些调整是必要的,以便给出一个更接近的估计,但是我认为这里已经很好地解释了整个概念。

这个答案中使用的 str2tbl 函数最初是由 Tom Kyte 开发的: Https://asktom.oracle.com/pls/asktom/f?p=100:11:0::::p11_question_id:110612348061

通过阅读本文,可以进一步探讨将统计数据与对象类型相关联的概念: Http://www.oracle-developer.net/display.php?id=427

这里描述的技术工作在10g 以上。

下面是一个使用 XMLTABLE 的替代实现,它允许将数据类型强制转换为不同的数据类型:

select
xmltab.txt
from xmltable(
'for $text in tokenize("a,b,c", ",") return $text'
columns
txt varchar2(4000) path '.'
) xmltab
;

... 或者如果分隔的字符串存储在一个表的一行或多行中:

select
xmltab.txt
from (
select 'a;b;c' inpt from dual union all
select 'd;e;f' from dual
) base
inner join xmltable(
'for $text in tokenize($input, ";") return $text'
passing base.inpt as "input"
columns
txt varchar2(4000) path '.'
) xmltab
on 1=1
;

我也有同样的问题,xmltable 帮了我:

选择 id,饰(COLUMN _ VALUE)文本 FROM t,xmltable ((’”’| | REPLACE (text,’,’,’”,”’) | |’”’))

从 Oracle 12c 开始,你可以使用 JSON_TABLEJSON_ARRAY:

CREATE TABLE tab(Name, Project, Error) AS
SELECT 108,'test' ,'Err1, Err2, Err3' FROM dual UNION
SELECT 109,'test2','Err1'             FROM dual;

还有一个问题:

SELECT *
FROM tab t
OUTER APPLY (SELECT TRIM(p) AS p
FROM JSON_TABLE(REPLACE(JSON_ARRAY(t.Error), ',', '","'),
'$[*]' COLUMNS (p VARCHAR2(4000) PATH '$'))) s;

产出:

┌──────┬─────────┬──────────────────┬──────┐
│ Name │ Project │      Error       │  P   │
├──────┼─────────┼──────────────────┼──────┤
│  108 │ test    │ Err1, Err2, Err3 │ Err1 │
│  108 │ test    │ Err1, Err2, Err3 │ Err2 │
│  108 │ test    │ Err1, Err2, Err3 │ Err3 │
│  109 │ test2   │ Err1             │ Err1 │
└──────┴─────────┴──────────────────┴──────┘

Db < > 小提琴演奏

在 Oracle 11g 及更高版本中,可以使用递归子查询和简单的字符串函数(这可能比正则表达式和相关的分层子查询更快) :

Oracle 安装程序 :

CREATE TABLE table_name ( name, project, error ) as
select 108, 'test',  'Err1, Err2, Err3' from dual union all
select 109, 'test2', 'Err1'             from dual;

查询 :

WITH table_name_error_bounds ( name, project, error, start_pos, end_pos ) AS (
SELECT name,
project,
error,
1,
INSTR( error, ', ', 1 )
FROM   table_name
UNION ALL
SELECT name,
project,
error,
end_pos + 2,
INSTR( error, ', ', end_pos + 2 )
FROM   table_name_error_bounds
WHERE  end_pos > 0
)
SELECT name,
project,
CASE end_pos
WHEN 0
THEN SUBSTR( error, start_pos )
ELSE SUBSTR( error, start_pos, end_pos - start_pos )
END AS error
FROM   table_name_error_bounds

产出 :

NAME | PROJECT | ERROR
---: | :------ | :----
108 | test    | Err1
109 | test2   | Err1
108 | test    | Err2
108 | test    | Err3

分贝 < > 小提琴 给你

If you have Oracle APEX 5.1 or later installed, you can use the convenient APEX_STRING.split function, e.g.:

select q.Name, q.Project, s.column_value as Error
from mytable q,
APEX_STRING.split(q.Error, ',') s

第二个参数是分隔符字符串。它还接受第三个参数来限制您希望它执行的拆分次数。

Https://docs.oracle.com/en/database/oracle/application-express/20.1/aeapi/split-function-signature-1.html#guid-3be7ff37-e54f-4503-91b8-94f374e243e6