Many times we need to add new columns (with data) to data warehouse tables that have a few hundred million or even billions of rows, and a simple ALTER TABLE table_name ADD column_name would take ages.
After refining the process, this is the quickest way I could come up with:
0. Generate the DDL script for the existing table, including all the indexes, with a preferred tool or method, e.g.:
select dbms_metadata.get_ddl('OBJECT_TYPE','OBJECT_NAME','OWNER') from dual;
Do the same for the corresponding indexes.
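DBMS_METADATA can also return all the index DDL for a table in a single call via GET_DEPENDENT_DDL; using the table from this example (the owner DWH is a made-up placeholder):
SELECT DBMS_METADATA.GET_DDL('TABLE','D_SHIPMENT','DWH') FROM dual;
SELECT DBMS_METADATA.GET_DEPENDENT_DDL('INDEX','D_SHIPMENT','DWH') FROM dual;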
1. Create a new table, including the new column, based on the existing table (SK_PM below is an illustrative name for the new column sourced from the joined PAY table):
CREATE /*+ PARALLEL(48) */ TABLE D_SHIPMENT_NEW
AS SELECT S.SK_SHIPMENT_POSITION
,S.SK_SHIPMENT
,S.SHIPPING_NUMBER
,S.SK_ARTICLE
,S.PAY_ID
,D_PM.SK_PM -- the new column, populated via the join (illustrative name)
FROM D_SHIPMENT S
LEFT JOIN PAY D_PM
ON (S.PAY_ID = NVL(D_PM.PM_ID,-1));
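If your recovery strategy allows it, the same CTAS can be run with NOLOGGING to cut redo generation considerably - a variant worth considering, not part of the recipe above; take a fresh backup afterwards, since the new segment is not recoverable from the archived logs:
CREATE /*+ PARALLEL(48) */ TABLE D_SHIPMENT_NEW NOLOGGING
AS SELECT ... -- same SELECT as in step 1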
2. I usually do a quick comparison just to make sure the tables have the same number of rows:
SELECT COUNT(1) FROM D_SHIPMENT
UNION ALL
SELECT COUNT(1) FROM D_SHIPMENT_NEW;
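If row counts alone are not reassuring enough, a key-level check is cheap as well (a sketch using the primary key column; 0 means no keys were lost):
SELECT COUNT(*)
FROM (SELECT SK_SHIPMENT_POSITION FROM D_SHIPMENT
MINUS
SELECT SK_SHIPMENT_POSITION FROM D_SHIPMENT_NEW);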
3. If the numbers match, we are safe to carry on and drop the old table:
DROP TABLE D_SHIPMENT PURGE;
4. Rename the table
RENAME D_SHIPMENT_NEW TO D_SHIPMENT;
5. Gather stats on the table so Oracle has better estimates for the index creation.
BEGIN
DBMS_STATS.GATHER_TABLE_STATS(USER,'D_SHIPMENT');
END;
/
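On tables this size the stats gathering itself can be parallelized too (DEGREE is a standard DBMS_STATS parameter; 48 simply mirrors the hint used above):
BEGIN
DBMS_STATS.GATHER_TABLE_STATS(USER,'D_SHIPMENT',DEGREE=>48);
END;
/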
6. Recreate the indexes and the primary key:
CREATE UNIQUE INDEX PK_D_SHIPMENT_POS ON D_SHIPMENT(SK_SHIPMENT_POSITION) PARALLEL 48;
ALTER TABLE D_SHIPMENT ADD CONSTRAINT PK_D_SHIPMENT_POS PRIMARY KEY (SK_SHIPMENT_POSITION) USING INDEX;
ALTER INDEX PK_D_SHIPMENT_POS NOPARALLEL;
CREATE INDEX IDX_D_SHIP_NR ON D_SHIPMENT(SHIPPING_NUMBER) PARALLEL 48;
ALTER INDEX IDX_D_SHIP_NR NOPARALLEL;
And here you go - you have a new table with the added column including current statistics.
Friday, 29 April 2016
Hash-partitioning in a data warehouse and some side effects
In our data warehouse we were struggling for quite a while to find a suitable solution for some daily OLTP-like loads, which back in the day relied on an Advanced Queuing solution, leading to a massive amount of row-by-row data processing and a very high number of IOPS.
Since we are loading data throughout the day, it became necessary to switch to some kind of bulk load mechanism. Our developers came up with the idea of hash-partitioned tables, which enable partition-wise joins when the tables share the same partitioning key. Now the merges run pretty fast, since the statements use the parallel options Oracle (11.2.0.4) provides (a sketch of such a merge follows the list):
- parallel DML
- parallel query
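To illustrate, a minimal sketch of such a bulk merge (table and column names are made up for the example, not our production ones; in 11.2 parallel DML has to be enabled per session):
alter session enable parallel dml;
merge /*+ parallel(16) */ into orders_target t
using orders_stage s
on (t.order_id = s.order_id)
when matched then update set t.customer_id = s.customer_id
when not matched then insert (order_id, customer_id)
values (s.order_id, s.customer_id);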
On the flip side, this takes up a lot of database resources in terms of CPU and I/O, since the statements still end up in full table scans, just split at the partition level.
One of the tables stores core information, so as time went by more and more processes were built around it, also using hash partitioning.
The point the developers made was: the tables have to have the same number of partitions to achieve better performance.
I was interested in whether it is possible to use fewer partitions, or even no partitioning at all, for the newly created tables without much impact on performance, so I put together some tests to demonstrate the results.
I created several tables, all of them using order_id as the partitioning key, with a varying number of partitions:
create table HASH_TEST
(
customer_id NUMBER,
order_id NUMBER
)
partition by hash (ORDER_ID)
partitions 4;
create table HASH_TEST_BIG
(
customer_id NUMBER,
order_id NUMBER
)
partition by hash (ORDER_ID)
partitions 2;
create table HASH_TEST_BIG_SAME
(
customer_id NUMBER,
order_id NUMBER
)
partition by hash (ORDER_ID)
partitions 4;
insert into hash_test select customer_id,order_id from ods_acc where rownum<=100000;
insert into hash_test_big select customer_id,order_id from ods_acc where rownum<=10000000;
insert into hash_test_big_same select customer_id,order_id from ods_acc where rownum<=10000000;
commit;
begin
dbms_stats.gather_table_stats(USER,'HASH_TEST');
dbms_stats.gather_table_stats(USER,'HASH_TEST_BIG');
dbms_stats.gather_table_stats(USER,'HASH_TEST_BIG_SAME');
end;
/
Now let's run selects to see whether the number of partitions really makes a difference:
--# of partitions same
select s.customer_id from hash_test s
join hash_test_big_same b
on s.order_id=b.order_id
where b.order_id=19422274;
Execution Plan
----------------------------------------------------------
Plan hash value: 261338677
---------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 116 | 2088 | 1517 (1)| 00:00:01 | | |
|* 1 | HASH JOIN | | 116 | 2088 | 1517 (1)| 00:00:01 | | |
| 2 | PART JOIN FILTER CREATE| :BF0000 | 3 | 36 | 46 (0)| 00:00:01 | | |
| 3 | PARTITION HASH SINGLE | | 3 | 36 | 46 (0)| 00:00:01 | 1 | 1 |
|* 4 | TABLE ACCESS FULL | HASH_TEST | 3 | 36 | 46 (0)| 00:00:01 | 1 | 1 |
| 5 | PARTITION HASH SINGLE | | 40 | 240 | 1471 (1)| 00:00:01 |KEY(AP)|KEY(AP)|
|* 6 | TABLE ACCESS FULL | HASH_TEST_BIG_SAME | 40 | 240 | 1471 (1)| 00:00:01 | 1 | 1 |
---------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("S"."ORDER_ID"="B"."ORDER_ID")
4 - filter("S"."ORDER_ID"=19422274)
6 - filter("B"."ORDER_ID"=19422274)
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
2094 consistent gets
0 physical reads
0 redo size
4273 bytes sent via SQL*Net to client
656 bytes received via SQL*Net from client
14 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
188 rows processed
--# of partitions not the same: 4 for the small table, 2 for the big table
select s.customer_id from hash_test s
join hash_test_big b
on s.order_id=b.order_id
where b.order_id=19422274;
Elapsed: 00:00:00.18
Execution Plan
----------------------------------------------------------
Plan hash value: 3648893118
----------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 110 | 1980 | 3670 (1)| 00:00:02 | | |
|* 1 | HASH JOIN | | 110 | 1980 | 3670 (1)| 00:00:02 | | |
| 2 | PART JOIN FILTER CREATE| :BF0000 | 3 | 36 | 46 (0)| 00:00:01 | | |
| 3 | PARTITION HASH SINGLE | | 3 | 36 | 46 (0)| 00:00:01 | 1 | 1 |
|* 4 | TABLE ACCESS FULL | HASH_TEST | 3 | 36 | 46 (0)| 00:00:01 | 1 | 1 |
| 5 | PARTITION HASH SINGLE | | 37 | 222 | 3623 (1)| 00:00:02 |KEY(AP)|KEY(AP)|
|* 6 | TABLE ACCESS FULL | HASH_TEST_BIG | 37 | 222 | 3623 (1)| 00:00:02 | 1 | 1 |
----------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("S"."ORDER_ID"="B"."ORDER_ID")
4 - filter("S"."ORDER_ID"=19422274)
6 - filter("B"."ORDER_ID"=19422274)
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
4822 consistent gets
0 physical reads
0 redo size
4273 bytes sent via SQL*Net to client
656 bytes received via SQL*Net from client
14 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
188 rows processed
In the first case both tables had 4 partitions; in the second case the larger table had only 2. Looking at the cost and the number of consistent gets shows a very significant difference, so it seems that when the number of partitions is the same, joining the two tables consumes considerably fewer resources.
My point: I am fine with using the same number of partitions for every table involved in this process, but could we simply lower the number of partitions, thereby reducing table fragmentation? Following that thought, here is another test to show how much influence the number of partitions has on performance.
I created 2 tables with a higher number of partitions:
create table HASH_TEST_LOT
(
customer_id NUMBER,
order_id NUMBER
)
partition by hash (ORDER_ID)
partitions 16;
create table HASH_TEST_BIG_SAME_LOT
(
customer_id NUMBER,
order_id NUMBER
)
partition by hash (ORDER_ID)
partitions 32;
insert into hash_test_lot select customer_id,order_id from ods_acc where rownum<=100000;
insert into hash_test_big_same_lot select customer_id,order_id from ods_acc where rownum<=10000000;
begin
dbms_stats.gather_table_stats(USER,'HASH_TEST_LOT');
dbms_stats.gather_table_stats(USER,'HASH_TEST_BIG_SAME_LOT');
end;
/
Let's run the same select as before:
select s.customer_id from hash_test_lot s
join hash_test_big_same_lot b
on s.order_id=b.order_id
where b.order_id=19422274;
188 rows selected.
Elapsed: 00:00:00.18
Execution Plan
----------------------------------------------------------
Plan hash value: 2985387547
-------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
-------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 89 | 3471 | 1833 (18)| 00:00:01 | | |
|* 1 | HASH JOIN | | 89 | 3471 | 1833 (18)| 00:00:01 | | |
| 2 | PART JOIN FILTER CREATE| :BF0000 | 4 | 104 | 45 (0)| 00:00:01 | | |
| 3 | PARTITION HASH SINGLE | | 4 | 104 | 45 (0)| 00:00:01 | 1 | 1 |
|* 4 | TABLE ACCESS FULL | HASH_TEST_LOT | 4 | 104 | 45 (0)| 00:00:01 | 1 | 1 |
| 5 | PARTITION HASH SINGLE | | 22 | 286 | 1788 (19)| 00:00:01 |KEY(AP)|KEY(AP)|
|* 6 | TABLE ACCESS FULL | HASH_TEST_BIG_SAME_LOT | 22 | 286 | 1788 (19)| 00:00:01 | 1 | 1 |
-------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("S"."ORDER_ID"="B"."ORDER_ID")
4 - filter("S"."ORDER_ID"=19422274)
6 - filter("B"."ORDER_ID"=19422274)
Note
-----
- dynamic sampling used for this statement (level=4)
Statistics
----------------------------------------------------------
72 recursive calls
2 db block gets
2432 consistent gets
0 physical reads
264 redo size
4273 bytes sent via SQL*Net to client
656 bytes received via SQL*Net from client
14 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
188 rows processed
As you see, the number of consistent gets is slightly higher, but I wouldn't consider it really significant (it's a big box), so in my opinion we could go for a smaller number of hash partitions, keeping the number of partitions equal across the tables.
As I mentioned previously, another option I wanted to examine is having the small table with no partitioning at all:
create table hash_no_part
(
customer_id NUMBER,
order_id NUMBER
);
insert into hash_no_part select customer_id,order_id from ods_acc where rownum<=100000;
begin
dbms_stats.gather_table_stats(USER,'HASH_NO_PART');
end;
/
Let's run the same join with the non-partitioned table:
select s.customer_id from hash_no_part s
join hash_test_big_same b
on s.order_id=b.order_id
where b.order_id=19422274;
188 rows selected.
Elapsed: 00:00:00.08
Execution Plan
----------------------------------------------------------
Plan hash value: 2379155543
---------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 32 | 1474 (1)| 00:00:01 | | |
|* 1 | HASH JOIN | | 1 | 32 | 1474 (1)| 00:00:01 | | |
| 2 | PART JOIN FILTER CREATE| :BF0000 | 1 | 26 | 3 (0)| 00:00:01 | | |
|* 3 | TABLE ACCESS FULL | HASH_NO_PART | 1 | 26 | 3 (0)| 00:00:01 | | |
| 4 | PARTITION HASH SINGLE | | 40 | 240 | 1471 (1)| 00:00:01 |KEY(AP)|KEY(AP)|
|* 5 | TABLE ACCESS FULL | HASH_TEST_BIG_SAME | 40 | 240 | 1471 (1)| 00:00:01 | 1 | 1 |
---------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("S"."ORDER_ID"="B"."ORDER_ID")
3 - filter("S"."ORDER_ID"=19422274)
5 - filter("B"."ORDER_ID"=19422274)
Statistics
----------------------------------------------------------
33 recursive calls
1 db block gets
2118 consistent gets
0 physical reads
132 redo size
4273 bytes sent via SQL*Net to client
656 bytes received via SQL*Net from client
14 SQL*Net roundtrips to/from client
5 sorts (memory)
0 sorts (disk)
188 rows processed
Pretty much the same as with partitioning, so a plain, non-partitioned table is a viable option here.
The last thing I was interested in was a global index on the large hash-partitioned table's partitioning key (order_id), which might eliminate the all-partition hash scans:
create index glob_index on HASH_TEST_BIG_SAME_LOT(order_id) global;
select s.customer_id from hash_test_lot s
join hash_test_big_same_lot b
on s.order_id=b.order_id
where b.order_id=19422274;
188 rows selected.
Elapsed: 00:00:00.14
Execution Plan
----------------------------------------------------------
Plan hash value: 3234924659
--------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 188 | 7332 | 48 (0)| 00:00:01 | | |
|* 1 | HASH JOIN | | 188 | 7332 | 48 (0)| 00:00:01 | | |
| 2 | PARTITION HASH SINGLE| | 4 | 104 | 45 (0)| 00:00:01 | 1 | 1 |
|* 3 | TABLE ACCESS FULL | HASH_TEST_LOT | 4 | 104 | 45 (0)| 00:00:01 | 1 | 1 |
|* 4 | INDEX RANGE SCAN | GLOB_INDEX | 47 | 611 | 3 (0)| 00:00:01 | | |
--------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("S"."ORDER_ID"="B"."ORDER_ID")
3 - filter("S"."ORDER_ID"=19422274)
4 - access("B"."ORDER_ID"=19422274)
Note
-----
- dynamic sampling used for this statement (level=4)
Statistics
----------------------------------------------------------
13 recursive calls
0 db block gets
226 consistent gets
2 physical reads
0 redo size
4273 bytes sent via SQL*Net to client
656 bytes received via SQL*Net from client
14 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
188 rows processed
This looks very promising, but you also have to keep in mind that the global index may cause noticeable performance degradation when inserting into the table - that is perhaps a topic for another post. Another issue that was pointed out is concurrency on the global index when inserting into several partitions at the same time, so the loading strategy would have to be redesigned as well.
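For comparison, the index could also be created as a local one, which avoids the cross-partition index maintenance during loads (a sketch; since order_id is also the partitioning key, an equality predicate can still prune to a single index partition):
create index loc_index on HASH_TEST_BIG_SAME_LOT(order_id) local;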