I was looking into my query library the other day and the existing query I had for historical temp tablespace usage was not satisfying my needs when searching for a top temp abuser. It turned out to be a bug caused by DBMS_FEATURE_USAGE_INTERNAL.
SELECT h.sql_id,
h.TOP_LEVEL_SQL_ID,
MAX(round(temp_space_allocated/power(1024,3),2)) MAX_USER_TEMP_GB ,
NVL(TO_CHAR(SUBSTR(s.sql_Text, 0,3900)),'Check the TOP_LEVEL_SQL_ID: select * from SYS.DBA_HIST_SQLTEXT where sql_id='''||TOP_LEVEL_SQL_ID||'''; -- OR -- no sql_text was captured') SQL_TEXT_INFO
FROM DBA_HIST_ACTIVE_SESS_HISTORY h
LEFT JOIN SYS.DBA_HIST_SQLTEXT s
ON h.sql_id=s.sql_id
WHERE h.sample_time > sysdate-1
and TEMP_SPACE_ALLOCATED > (10*power(1024,3))
GROUP BY h.sql_id,
h.TOP_LEVEL_SQL_ID,
TO_CHAR(SUBSTR(s.sql_Text, 0,3900))
ORDER BY 3 DESC nulls last;
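Outside the database, the shape of this aggregation (filter on the 10 GB threshold, then take the per-sql_id maximum, worst offender first) can be sketched in Python with made-up ASH-style samples:

```python
# Illustrative only: mimics the MAX(temp_space_allocated) per sql_id
# aggregation of the AWR query above, using made-up sample rows.
GB = 1024 ** 3  # power(1024,3) in the SQL

# (sql_id, temp_space_allocated in bytes) -- hypothetical samples
samples = [
    ("abc123", 12 * GB),
    ("abc123", 15 * GB),
    ("def456", 11 * GB),
    ("ghi789", 2 * GB),   # below the 10 GB filter, dropped
]

threshold = 10 * GB
peaks = {}
for sql_id, temp in samples:
    if temp > threshold:                      # WHERE temp_space_allocated > 10 GB
        peaks[sql_id] = max(peaks.get(sql_id, 0), temp)

# ORDER BY 3 DESC: worst offender first, reported in GB
top = sorted(peaks.items(), key=lambda kv: kv[1], reverse=True)
for sql_id, temp in top:
    print(sql_id, round(temp / GB, 2))
```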
Wednesday, 5 April 2017
Friday, 3 March 2017
Can historical table access be determined based on DBA_HIST_SEG_STAT statistics?
The following topic came up the other day: can you check whether a table was accessed in the past by using the DBA_HIST_SEG_STAT view (Diagnostics Pack licence required)?
The idea is that you can get a more or less accurate answer by checking the logical I/O columns of the view. My concern was that the accuracy of this information can vary depending on the internal maintenance jobs (statistics gathering, segment advisor).
Statistics gathering is fine, since if the table's content/size hasn't changed recently the table will be skipped. But what about the segment advisor?
Let's run a quick test (12.1.0.2):
create table segadv_lio
as
select *
from all_objects, (select level l from dual connect by level <= 2);
Table created.
BALAZS@TESTDB
SQL> exec dbms_workload_repository.create_snapshot;
PL/SQL procedure successfully completed.
create or replace procedure run_sa
authid current_user
as
obj_id number;
begin
dbms_advisor.create_task (
advisor_name => 'Segment Advisor',
task_name => 'Manual_T' );
dbms_advisor.create_object (
task_name => 'Manual_T',
object_type => 'TABLE',
attr1 => user,
attr2 => 'SEGADV_LIO',
attr3 => NULL,
attr4 => NULL,
attr5 => NULL,
object_id => obj_id);
dbms_advisor.set_task_parameter(
task_name => 'Manual_T',
parameter => 'recommend_all',
value => 'TRUE');
dbms_advisor.execute_task('Manual_T');
dbms_advisor.delete_task('Manual_T');
end;
/
Procedure created.
SQL> exec dbms_workload_repository.create_snapshot;
PL/SQL procedure successfully completed.
col object_name for a30
col owner for a30
select obj.object_name,obj.owner,sg.LOGICAL_READS_DELTA,sg.snap_id
from DBA_HIST_SEG_STAT sg
join dba_objects obj
on sg.obj#=obj.object_id
where obj.owner='BALAZS'
order by snap_id;
OBJECT_NAME OWNER LOGICAL_READS_DELTA SNAP_ID
--------------- ------------- ------------------- ----------
SEGADV_LIO BALAZS 704 620
This is the starting position; the table will now be analysed with the segment advisor.
SQL> exec run_sa
PL/SQL procedure successfully completed.
SQL> exec dbms_workload_repository.create_snapshot;
select obj.object_name,obj.owner,sg.LOGICAL_READS_DELTA,sg.snap_id
from DBA_HIST_SEG_STAT sg
join dba_objects obj
on sg.obj#=obj.object_id
where obj.owner='BALAZS'
order by snap_id;
OBJECT_NAME OWNER LOGICAL_READS_DELTA SNAP_ID
---------------- --------------- ------------------- ----------
SEGADV_LIO BALAZS 704 620
SEGADV_LIO BALAZS 6480 623
2 rows selected.
Trying to force physical reads
SQL> alter system flush buffer_cache;
System altered.
SQL> exec dbms_workload_repository.create_snapshot;
PL/SQL procedure successfully completed.
SQL> exec run_sa
PL/SQL procedure successfully completed.
SQL> exec dbms_workload_repository.create_snapshot;
PL/SQL procedure successfully completed.
select obj.object_name,obj.owner,sg.LOGICAL_READS_DELTA,sg.snap_id
from DBA_HIST_SEG_STAT sg
join dba_objects obj
on sg.obj#=obj.object_id
where obj.owner='BALAZS'
order by snap_id;
OBJECT_NAME OWNER LOGICAL_READS_DELTA SNAP_ID
------------------- ---------------- ------------------- ----------
SEGADV_LIO BALAZS 704 620
SEGADV_LIO BALAZS 6480 623
SEGADV_LIO BALAZS 6480 625
The question remains open as to how the segment advisor algorithm works: whether it checks when the table was last analysed, and whether, if the table hasn't been touched for a while, it omits the table from the segment advisor job.
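For what it's worth, the decision logic can be sketched in plain Python (the column name mirrors DBA_HIST_SEG_STAT; the deltas are the ones from the test above, and the caveat stands that part of a delta may come from maintenance jobs rather than application access):

```python
# Hypothetical sketch: decide from DBA_HIST_SEG_STAT-style rows whether a
# segment shows activity in a snapshot range. Data matches the test output
# above; note snap 623/625 deltas were generated by the segment advisor.
rows = [
    # (snap_id, logical_reads_delta)
    (620, 704),     # baseline load
    (623, 6480),    # segment advisor run -- inflates the delta
    (625, 6480),    # advisor run after flushing the buffer cache
]

def touched_in_range(rows, lo, hi):
    """True if any snapshot in [lo, hi] recorded logical reads."""
    return any(delta > 0 for snap, delta in rows if lo <= snap <= hi)

print(touched_in_range(rows, 621, 625))  # True, but possibly only advisor I/O
```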
Wednesday, 11 May 2016
Add a new column with a value to a very large table
Many times we need to add new columns (with data) to data warehouse tables that have a couple of hundred million or even billions of rows, and a simple ALTER TABLE table_name ADD column_name would take ages.
After refining the process this is the quickest way I could come up with:
0. Generate the DDL script for the existing table, including all the indexes, with a preferred tool or method (e.g. select dbms_metadata.get_ddl('OBJECT_TYPE','OBJECT_NAME','OWNER') from dual; and do the same for the corresponding indexes).
1. Create a new table including the new column based on the existing table:
CREATE /*+ PARALLEL(48) */ TABLE D_SHIPMENT_NEW
AS SELECT S.SK_SHIPMENT_POSITION
,S.SK_SHIPMENT
,S.SHIPPING_NUMBER
,S.SK_ARTICLE
,S.PAY_ID
,D_PM.PM_NAME AS PAYMENT_METHOD -- the new column, populated via the join (column name illustrative)
FROM D_SHIPMENT S
LEFT JOIN PAY D_PM
ON (S.PAY_ID = NVL(D_PM.PM_ID,-1));
2. I usually do a quick comparison just to make sure the tables have the same number of rows:
SELECT COUNT(1) FROM D_SHIPMENT
UNION ALL
SELECT COUNT(1) FROM D_SHIPMENT_NEW;
3. If the numbers match then we are safe to carry on.
DROP TABLE D_SHIPMENT purge;
4. Rename the table
RENAME D_SHIPMENT_NEW TO D_SHIPMENT;
5. Gather stats on the table so Oracle can have some better estimates for the index creation.
BEGIN
DBMS_STATS.GATHER_TABLE_STATS(USER,'D_SHIPMENT');
END;
/
6. Recreate the indexes, primary key
CREATE UNIQUE INDEX PK_D_SHIPMENT_POS ON D_SHIPMENT(SK_SHIPMENT_POSITION) PARALLEL 48;
ALTER TABLE D_SHIPMENT ADD CONSTRAINT PK_D_SHIPMENT_POS PRIMARY KEY (SK_SHIPMENT_POSITION) USING INDEX;
ALTER INDEX PK_D_SHIPMENT_POS NOPARALLEL;
CREATE INDEX IDX_D_SHIP_NR ON D_SHIPMENT(SHIPPING_NUMBER) PARALLEL 48;
ALTER INDEX IDX_D_SHIP_NR NOPARALLEL;
And here you go - you have a new table with the added column including current statistics.
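The step sequence above can be sketched as a small statement generator; a hypothetical Python helper (the table name, select list, and PARALLEL degree are placeholders, not part of the post):

```python
# Illustrative helper (not from the post): emits the statement sequence of
# the recipe above for a given table. Names are placeholders.
def add_column_steps(table, select_list):
    new = f"{table}_NEW"
    return [
        # 1. CTAS with the new column already in the select list
        f"CREATE /*+ PARALLEL(48) */ TABLE {new} AS SELECT {select_list} FROM {table}",
        # 2. row-count comparison before dropping anything
        f"SELECT COUNT(1) FROM {table} UNION ALL SELECT COUNT(1) FROM {new}",
        # 3./4. swap the tables
        f"DROP TABLE {table} PURGE",
        f"RENAME {new} TO {table}",
        # 5. fresh stats before recreating the indexes
        f"BEGIN DBMS_STATS.GATHER_TABLE_STATS(USER,'{table}'); END;",
    ]

for stmt in add_column_steps("D_SHIPMENT", "s.*"):
    print(stmt + ";")
```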
Friday, 29 April 2016
Hash-partitioning in a data warehouse and some side effects
In our data warehouse we were struggling for quite a while to find a suitable solution for some daily OLTP-like loads, which back in the day relied on an Advanced Queuing solution, leading to a massive amount of row-by-row data processing and a very high number of IOPS.
Since we are loading data throughout the day, it became necessary to switch to some kind of bulk load mechanism. Our developers came up with the idea of hash-partitioned tables, which enable partition-wise joins when the tables share the same partitioning key. The merges now run pretty fast, since the statements use the parallel options Oracle (11.2.0.4) provides:
-parallel DML
-parallel query
On the flip side this takes up a lot of database resources in terms of CPU and I/O, since the statements still end up doing full table scans, only split at the partition level.
One of the tables stores core information, so as time goes by more and more processes were built around that table, using hash partitioning as well.
The point the developers made was: you have to have the same number of partitions on the tables to achieve better performance.
I was interested in whether it is possible to use fewer partitions, or even no partitioning at all, for the newly created tables without much impact on performance. I ran some tests to demonstrate the results.
I created some tables, all using order_id as the partitioning key, with a varying number of partitions:
create table HASH_TEST
(
customer_id NUMBER,
order_id NUMBER
)
partition by hash (ORDER_ID)
partitions 4;
create table HASH_TEST_BIG
(
customer_id NUMBER,
order_id NUMBER
)
partition by hash (ORDER_ID)
partitions 2;
create table HASH_TEST_BIG_SAME
(
customer_id NUMBER,
order_id NUMBER
)
partition by hash (ORDER_ID)
partitions 4;
insert into hash_test select customer_id,order_id from ods_acc where rownum<=100000;
insert into hash_test_big select customer_id,order_id from ods_acc where rownum<=10000000;
insert into hash_test_big_same select customer_id,order_id from ods_acc where rownum<=10000000;
commit;
begin
dbms_stats.gather_table_stats(USER,'HASH_TEST');
dbms_stats.gather_table_stats(USER,'HASH_TEST_BIG');
dbms_stats.gather_table_stats(USER,'HASH_TEST_BIG_SAME');
end;
/
Now let's run selects to see whether the number of partitions really makes a difference:
--# of partitions same
select s.customer_id from hash_test s
join hash_test_big_same b
on s.order_id=b.order_id
where b.order_id=19422274;
Execution Plan
----------------------------------------------------------
Plan hash value: 261338677
---------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 116 | 2088 | 1517 (1)| 00:00:01 | | |
|* 1 | HASH JOIN | | 116 | 2088 | 1517 (1)| 00:00:01 | | |
| 2 | PART JOIN FILTER CREATE| :BF0000 | 3 | 36 | 46 (0)| 00:00:01 | | |
| 3 | PARTITION HASH SINGLE | | 3 | 36 | 46 (0)| 00:00:01 | 1 | 1 |
|* 4 | TABLE ACCESS FULL | HASH_TEST | 3 | 36 | 46 (0)| 00:00:01 | 1 | 1 |
| 5 | PARTITION HASH SINGLE | | 40 | 240 | 1471 (1)| 00:00:01 |KEY(AP)|KEY(AP)|
|* 6 | TABLE ACCESS FULL | HASH_TEST_BIG_SAME | 40 | 240 | 1471 (1)| 00:00:01 | 1 | 1 |
---------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("S"."ORDER_ID"="B"."ORDER_ID")
4 - filter("S"."ORDER_ID"=19422274)
6 - filter("B"."ORDER_ID"=19422274)
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
2094 consistent gets
0 physical reads
0 redo size
4273 bytes sent via SQL*Net to client
656 bytes received via SQL*Net from client
14 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
188 rows processed
--# of partitions not the same: 4 for the small table, 2 for the big table
select s.customer_id from hash_test s
join hash_test_big b
on s.order_id=b.order_id
where b.order_id=19422274;
Elapsed: 00:00:00.18
Execution Plan
----------------------------------------------------------
Plan hash value: 3648893118
----------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 110 | 1980 | 3670 (1)| 00:00:02 | | |
|* 1 | HASH JOIN | | 110 | 1980 | 3670 (1)| 00:00:02 | | |
| 2 | PART JOIN FILTER CREATE| :BF0000 | 3 | 36 | 46 (0)| 00:00:01 | | |
| 3 | PARTITION HASH SINGLE | | 3 | 36 | 46 (0)| 00:00:01 | 1 | 1 |
|* 4 | TABLE ACCESS FULL | HASH_TEST | 3 | 36 | 46 (0)| 00:00:01 | 1 | 1 |
| 5 | PARTITION HASH SINGLE | | 37 | 222 | 3623 (1)| 00:00:02 |KEY(AP)|KEY(AP)|
|* 6 | TABLE ACCESS FULL | HASH_TEST_BIG | 37 | 222 | 3623 (1)| 00:00:02 | 1 | 1 |
----------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("S"."ORDER_ID"="B"."ORDER_ID")
4 - filter("S"."ORDER_ID"=19422274)
6 - filter("B"."ORDER_ID"=19422274)
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
4822 consistent gets
0 physical reads
0 redo size
4273 bytes sent via SQL*Net to client
656 bytes received via SQL*Net from client
14 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
188 rows processed
In the first case both tables had 4 partitions, while in the second case the larger table had only 2. Looking at the cost and the number of consistent gets shows a very significant difference, so it seems that if the number of partitions is the same, joining the two tables consumes fewer resources.
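The partition-wise join idea behind this can be sketched in plain Python (the modulo bucketing is only a stand-in for Oracle's internal hash function, and the data is made up): with equal partition counts, each partition of the small table only ever needs to be joined against the matching partition of the big table.

```python
# Conceptual sketch of a partition-wise join: rows hash to the same bucket
# index in both tables only when the bucket (partition) counts match.
def bucket(order_id, n_parts):
    return order_id % n_parts   # stand-in for Oracle's internal hash

# (customer_id, order_id) -- made-up rows
small = [(1, 10), (2, 11), (3, 12)]
big   = [(9, 10), (8, 12), (7, 13)]

def partition_wise_join(small, big, n):
    # With equal n, each small-table partition is joined only against the
    # matching big-table partition, never against all of them.
    out = []
    for p in range(n):
        s_part = [r for r in small if bucket(r[1], n) == p]
        b_part = [r for r in big if bucket(r[1], n) == p]
        out += [(s[0], b[0], s[1]) for s in s_part for b in b_part if s[1] == b[1]]
    return sorted(out)

print(partition_wise_join(small, big, 4))
```

With mismatched partition counts the bucketing no longer lines up, so a partition on one side must be probed against several partitions on the other, which is what the higher consistent-get figures above reflect.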
My point: I am fine with using the same number of partitions for every table involved in this process, but could we just lower the number of partitions and thereby reduce table fragmentation? Following this line of thinking, here is another test to show how much influence the number of partitions has on performance.
I created 2 tables with a higher number of partitions:
create table HASH_TEST_LOT
(
customer_id NUMBER,
order_id NUMBER
)
partition by hash (ORDER_ID)
partitions 16;
create table HASH_TEST_BIG_SAME_LOT
(
customer_id NUMBER,
order_id NUMBER
)
partition by hash (ORDER_ID)
partitions 32;
insert into hash_test_lot select customer_id,order_id from ods_acc where rownum<=100000;
insert into hash_test_big_same_lot select customer_id,order_id from ods_acc where rownum<=10000000;
begin
dbms_stats.gather_table_stats(USER,'HASH_TEST_LOT');
dbms_stats.gather_table_stats(USER,'HASH_TEST_BIG_SAME_LOT');
end;
/
Let's run the same select as before:
select s.customer_id from hash_test_lot s
join hash_test_big_same_lot b
on s.order_id=b.order_id
where b.order_id=19422274;
188 rows selected.
Elapsed: 00:00:00.18
Execution Plan
----------------------------------------------------------
Plan hash value: 2985387547
-------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
-------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 89 | 3471 | 1833 (18)| 00:00:01 | | |
|* 1 | HASH JOIN | | 89 | 3471 | 1833 (18)| 00:00:01 | | |
| 2 | PART JOIN FILTER CREATE| :BF0000 | 4 | 104 | 45 (0)| 00:00:01 | | |
| 3 | PARTITION HASH SINGLE | | 4 | 104 | 45 (0)| 00:00:01 | 1 | 1 |
|* 4 | TABLE ACCESS FULL | HASH_TEST_LOT | 4 | 104 | 45 (0)| 00:00:01 | 1 | 1 |
| 5 | PARTITION HASH SINGLE | | 22 | 286 | 1788 (19)| 00:00:01 |KEY(AP)|KEY(AP)|
|* 6 | TABLE ACCESS FULL | HASH_TEST_BIG_SAME_LOT | 22 | 286 | 1788 (19)| 00:00:01 | 1 | 1 |
-------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("S"."ORDER_ID"="B"."ORDER_ID")
4 - filter("S"."ORDER_ID"=19422274)
6 - filter("B"."ORDER_ID"=19422274)
Note
-----
- dynamic sampling used for this statement (level=4)
Statistics
----------------------------------------------------------
72 recursive calls
2 db block gets
2432 consistent gets
0 physical reads
264 redo size
4273 bytes sent via SQL*Net to client
656 bytes received via SQL*Net from client
14 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
188 rows processed
As you can see the number of consistent gets is slightly higher, but I wouldn't consider it really significant (it's a big box), so in my opinion we could go for a smaller number of hash partitions, keeping the number of partitions equal across the tables.
As I mentioned previously, another option I wanted to examine is having the small table with no partitioning at all:
create table hash_no_part
(
customer_id NUMBER,
order_id NUMBER
)
;
insert into hash_no_part select customer_id,order_id from ods_acc where rownum<=100000;
begin
dbms_stats.gather_table_stats(USER,'HASH_NO_PART');
end;
/
select s.customer_id from hash_no_part s
join hash_test_big_same b
on s.order_id=b.order_id
where b.order_id=19422274;
188 rows selected.
Elapsed: 00:00:00.08
Execution Plan
----------------------------------------------------------
Plan hash value: 2379155543
---------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 32 | 1474 (1)| 00:00:01 | | |
|* 1 | HASH JOIN | | 1 | 32 | 1474 (1)| 00:00:01 | | |
| 2 | PART JOIN FILTER CREATE| :BF0000 | 1 | 26 | 3 (0)| 00:00:01 | | |
|* 3 | TABLE ACCESS FULL | HASH_NO_PART | 1 | 26 | 3 (0)| 00:00:01 | | |
| 4 | PARTITION HASH SINGLE | | 40 | 240 | 1471 (1)| 00:00:01 |KEY(AP)|KEY(AP)|
|* 5 | TABLE ACCESS FULL | HASH_TEST_BIG_SAME | 40 | 240 | 1471 (1)| 00:00:01 | 1 | 1 |
---------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("S"."ORDER_ID"="B"."ORDER_ID")
3 - filter("S"."ORDER_ID"=19422274)
5 - filter("B"."ORDER_ID"=19422274)
Statistics
----------------------------------------------------------
33 recursive calls
1 db block gets
2118 consistent gets
0 physical reads
132 redo size
4273 bytes sent via SQL*Net to client
656 bytes received via SQL*Net from client
14 SQL*Net roundtrips to/from client
5 sorts (memory)
0 sorts (disk)
188 rows processed
Pretty much the same as with partitioning, so there is the possibility of using a simple, non-partitioned table here.
The last point I was interested in was having a global index on the large hash-partitioned table's partition key (order_id), which might eliminate the all-partition hash scans:
create index glob_index on HASH_TEST_BIG_SAME_LOT(order_id) global;
select s.customer_id from hash_test_lot s
join hash_test_big_same_lot b
on s.order_id=b.order_id
where b.order_id=19422274;
188 rows selected.
Elapsed: 00:00:00.14
Execution Plan
----------------------------------------------------------
Plan hash value: 3234924659
--------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 188 | 7332 | 48 (0)| 00:00:01 | | |
|* 1 | HASH JOIN | | 188 | 7332 | 48 (0)| 00:00:01 | | |
| 2 | PARTITION HASH SINGLE| | 4 | 104 | 45 (0)| 00:00:01 | 1 | 1 |
|* 3 | TABLE ACCESS FULL | HASH_TEST_LOT | 4 | 104 | 45 (0)| 00:00:01 | 1 | 1 |
|* 4 | INDEX RANGE SCAN | GLOB_INDEX | 47 | 611 | 3 (0)| 00:00:01 | | |
--------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("S"."ORDER_ID"="B"."ORDER_ID")
3 - filter("S"."ORDER_ID"=19422274)
4 - access("B"."ORDER_ID"=19422274)
Note
-----
- dynamic sampling used for this statement (level=4)
Statistics
----------------------------------------------------------
13 recursive calls
0 db block gets
226 consistent gets
2 physical reads
0 redo size
4273 bytes sent via SQL*Net to client
656 bytes received via SQL*Net from client
14 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
188 rows processed
As you see the number of consistent gets is slightly higher but I wouldn't consider it really significant (it's a big box) so in my opinion we could go for smaller number of hash partitions in the tables, keeping the number of partitions equal.
As I mentioned previously, another option I wanted to examine is leaving the small table unpartitioned altogether:
create table hash_no_part
(
customer_id NUMBER,
order_id    NUMBER
);
insert into hash_no_part select customer_id, order_id from ods_acc where rownum <= 100000;
commit;
exec dbms_stats.gather_table_stats(USER,'HASH_NO_PART');
select s.customer_id from hash_no_part s
join hash_test_big_same b
on s.order_id=b.order_id
where b.order_id=19422274;
188 rows selected.
Elapsed: 00:00:00.08
Execution Plan
----------------------------------------------------------
Plan hash value: 2379155543
---------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 32 | 1474 (1)| 00:00:01 | | |
|* 1 | HASH JOIN | | 1 | 32 | 1474 (1)| 00:00:01 | | |
| 2 | PART JOIN FILTER CREATE| :BF0000 | 1 | 26 | 3 (0)| 00:00:01 | | |
|* 3 | TABLE ACCESS FULL | HASH_NO_PART | 1 | 26 | 3 (0)| 00:00:01 | | |
| 4 | PARTITION HASH SINGLE | | 40 | 240 | 1471 (1)| 00:00:01 |KEY(AP)|KEY(AP)|
|* 5 | TABLE ACCESS FULL | HASH_TEST_BIG_SAME | 40 | 240 | 1471 (1)| 00:00:01 | 1 | 1 |
---------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("S"."ORDER_ID"="B"."ORDER_ID")
3 - filter("S"."ORDER_ID"=19422274)
5 - filter("B"."ORDER_ID"=19422274)
Statistics
----------------------------------------------------------
33 recursive calls
1 db block gets
2118 consistent gets
0 physical reads
132 redo size
4273 bytes sent via SQL*Net to client
656 bytes received via SQL*Net from client
14 SQL*Net roundtrips to/from client
5 sorts (memory)
0 sorts (disk)
188 rows processed
Pretty much the same as with partitioning, so using a simple, unpartitioned table is a viable option here.
The last thing I was interested in was a global index on the large hash-partitioned table's partition key (order_id), which might eliminate the all-partition hash scans:
create index glob_index on HASH_TEST_BIG_SAME_LOT(order_id) global;
select s.customer_id from hash_test_lot s
join hash_test_big_same_lot b
on s.order_id=b.order_id
where b.order_id=19422274;
188 rows selected.
Elapsed: 00:00:00.14
Execution Plan
----------------------------------------------------------
Plan hash value: 3234924659
--------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 188 | 7332 | 48 (0)| 00:00:01 | | |
|* 1 | HASH JOIN | | 188 | 7332 | 48 (0)| 00:00:01 | | |
| 2 | PARTITION HASH SINGLE| | 4 | 104 | 45 (0)| 00:00:01 | 1 | 1 |
|* 3 | TABLE ACCESS FULL | HASH_TEST_LOT | 4 | 104 | 45 (0)| 00:00:01 | 1 | 1 |
|* 4 | INDEX RANGE SCAN | GLOB_INDEX | 47 | 611 | 3 (0)| 00:00:01 | | |
--------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("S"."ORDER_ID"="B"."ORDER_ID")
3 - filter("S"."ORDER_ID"=19422274)
4 - access("B"."ORDER_ID"=19422274)
Note
-----
- dynamic sampling used for this statement (level=4)
Statistics
----------------------------------------------------------
13 recursive calls
0 db block gets
226 consistent gets
2 physical reads
0 redo size
4273 bytes sent via SQL*Net to client
656 bytes received via SQL*Net from client
14 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
188 rows processed
This looks very promising, but keep in mind that a global index may cause noticeable performance degradation when inserting into the table; that is perhaps a topic for a later post. Another issue that was pointed out is contention on the global index when inserting into multiple partitions at the same time, so the loading strategy would have to be redesigned as well.
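One common way to reduce that insert contention, sketched here under the assumption that the index key stays order_id, is to hash-partition the global index itself, so concurrent inserts spread across several index segments instead of hammering one B-tree:

```sql
-- Hash-partitioned global index: spreads insert activity across 8 index
-- segments (the index name and partition count are illustrative assumptions).
create index glob_index_hp on hash_test_big_same_lot(order_id)
global partition by hash (order_id) partitions 8;
```

Since the index is partitioned on its own leading column, equality lookups on order_id are still pruned to a single index partition, so the range scan in the plan above should look much the same.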