I was looking into my query library the other day and the existing query I had for historical temp tablespace usage was not satisfying my needs when searching for a top temp abuser. It turned out to be a bug caused by DBMS_FEATURE_USAGE_INTERNAL.
SELECT h.sql_id,
h.TOP_LEVEL_SQL_ID,
MAX(round(temp_space_allocated/power(1024,3),2)) MAX_USER_TEMP_GB ,
NVL(TO_CHAR(SUBSTR(s.sql_Text, 0,3900)),'Check the TOP_LEVEL_SQL_ID: select * from SYS.DBA_HIST_SQLTEXT where sql_id='''||TOP_LEVEL_SQL_ID||'''; -- OR -- no sql_text was captured') SQL_TEXT_INFO
FROM DBA_HIST_ACTIVE_SESS_HISTORY h
LEFT JOIN SYS.DBA_HIST_SQLTEXT s
ON h.sql_id=s.sql_id
WHERE h.sample_time > sysdate-1
and TEMP_SPACE_ALLOCATED > (10*power(1024,3))
GROUP BY h.sql_id,
h.TOP_LEVEL_SQL_ID,
TO_CHAR(SUBSTR(s.sql_Text, 0,3900))
ORDER BY 3 DESC nulls last;
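Outside the database, the shape of this aggregation (filter on the 10 GB threshold, then take the per-sql_id maximum, worst offender first) can be sketched in Python with made-up ASH-style samples:

```python
# Illustrative only: mimics the MAX(temp_space_allocated) per sql_id
# aggregation of the AWR query above, using made-up sample rows.
GB = 1024 ** 3  # power(1024,3) in the SQL

# (sql_id, temp_space_allocated in bytes) -- hypothetical samples
samples = [
    ("abc123", 12 * GB),
    ("abc123", 15 * GB),
    ("def456", 11 * GB),
    ("ghi789", 2 * GB),   # below the 10 GB filter, dropped
]

threshold = 10 * GB
peaks = {}
for sql_id, temp in samples:
    if temp > threshold:                      # WHERE temp_space_allocated > 10 GB
        peaks[sql_id] = max(peaks.get(sql_id, 0), temp)

# ORDER BY 3 DESC: worst offender first, reported in GB
top = sorted(peaks.items(), key=lambda kv: kv[1], reverse=True)
for sql_id, temp in top:
    print(sql_id, round(temp / GB, 2))
```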
Wednesday, 5 April 2017
Friday, 3 March 2017
Can historical table access be determined based on DBA_HIST_SEG_STAT statistics?
The following topic came up the other day: can you check whether a table was accessed in the past by using the DBA_HIST_SEG_STAT view (Diagnostics Pack licence required)?
The idea is that you can get a more or less accurate answer by checking the logical I/O columns of the view. My concern was that the accuracy of this information can vary depending on the internal maintenance jobs (statistics gathering, segment advisor).
Statistics gathering is fine, since if the table's content/size hasn't changed recently the table will be skipped. But what about the segment advisor?
Let's run a quick test (12.1.0.2):
create table segadv_lio
as
select *
from all_objects, (select level l from dual connect by level <= 2);
Table created.
BALAZS@TESTDB
SQL> exec dbms_workload_repository.create_snapshot;
PL/SQL procedure successfully completed.
create or replace procedure run_sa
authid current_user
as
obj_id number;
begin
dbms_advisor.create_task (
advisor_name => 'Segment Advisor',
task_name => 'Manual_T' );
dbms_advisor.create_object (
task_name => 'Manual_T',
object_type => 'TABLE',
attr1 => user,
attr2 => 'SEGADV_LIO',
attr3 => NULL,
attr4 => NULL,
attr5 => NULL,
object_id => obj_id);
dbms_advisor.set_task_parameter(
task_name => 'Manual_T',
parameter => 'recommend_all',
value => 'TRUE');
dbms_advisor.execute_task('Manual_T');
dbms_advisor.delete_task('Manual_T');
end;
/
Procedure created.
SQL> exec dbms_workload_repository.create_snapshot;
PL/SQL procedure successfully completed.
col object_name for a30
col owner for a30
select obj.object_name,obj.owner,sg.LOGICAL_READS_DELTA,sg.snap_id
from DBA_HIST_SEG_STAT sg
join dba_objects obj
on sg.obj#=obj.object_id
where obj.owner='BALAZS'
order by snap_id;
OBJECT_NAME OWNER LOGICAL_READS_DELTA SNAP_ID
--------------- ------------- ------------------- ----------
SEGADV_LIO BALAZS 704 620
This is the starting position; the table will now be analysed with the segment advisor.
SQL> exec run_sa
PL/SQL procedure successfully completed.
SQL> exec dbms_workload_repository.create_snapshot;
select obj.object_name,obj.owner,sg.LOGICAL_READS_DELTA,sg.snap_id
from DBA_HIST_SEG_STAT sg
join dba_objects obj
on sg.obj#=obj.object_id
where obj.owner='BALAZS'
order by snap_id;
OBJECT_NAME OWNER LOGICAL_READS_DELTA SNAP_ID
---------------- --------------- ------------------- ----------
SEGADV_LIO BALAZS 704 620
SEGADV_LIO BALAZS 6480 623
2 rows selected.
Trying to force physical reads
SQL> alter system flush buffer_cache;
System altered.
SQL> exec dbms_workload_repository.create_snapshot;
PL/SQL procedure successfully completed.
SQL> exec run_sa
PL/SQL procedure successfully completed.
SQL> exec dbms_workload_repository.create_snapshot;
PL/SQL procedure successfully completed.
select obj.object_name,obj.owner,sg.LOGICAL_READS_DELTA,sg.snap_id
from DBA_HIST_SEG_STAT sg
join dba_objects obj
on sg.obj#=obj.object_id
where obj.owner='BALAZS'
order by snap_id;
OBJECT_NAME OWNER LOGICAL_READS_DELTA SNAP_ID
------------------- ---------------- ------------------- ----------
SEGADV_LIO BALAZS 704 620
SEGADV_LIO BALAZS 6480 623
SEGADV_LIO BALAZS 6480 625
The question remains open as to how the segment advisor algorithm works: whether it checks when the table was last analysed, and whether, if the table hasn't been touched for a while, it omits the table from the segment advisor job.
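For what it's worth, the decision logic can be sketched in plain Python (the column name mirrors DBA_HIST_SEG_STAT; the deltas are the ones from the test above, and the caveat stands that part of a delta may come from maintenance jobs rather than application access):

```python
# Hypothetical sketch: decide from DBA_HIST_SEG_STAT-style rows whether a
# segment shows activity in a snapshot range. Data matches the test output
# above; note snap 623/625 deltas were generated by the segment advisor.
rows = [
    # (snap_id, logical_reads_delta)
    (620, 704),     # baseline load
    (623, 6480),    # segment advisor run -- inflates the delta
    (625, 6480),    # advisor run after flushing the buffer cache
]

def touched_in_range(rows, lo, hi):
    """True if any snapshot in [lo, hi] recorded logical reads."""
    return any(delta > 0 for snap, delta in rows if lo <= snap <= hi)

print(touched_in_range(rows, 621, 625))  # True, but possibly only advisor I/O
```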
Wednesday, 11 May 2016
Add a new column with a value to a very large table
Many times we need to add new columns (with data) to data warehouse tables that have a couple of hundred million or even billions of rows, and a simple ALTER TABLE table_name ADD column_name would take ages.
After refining the process this is the quickest way I could come up with:
0. Generate the DDL script for the existing table, including all the indexes, with a preferred tool or method (e.g. select dbms_metadata.get_ddl('OBJECT_TYPE','OBJECT_NAME','OWNER') from dual; and do the same for the corresponding indexes).
1. Create a new table including the new column based on the existing table:
CREATE /*+ PARALLEL(48) */ TABLE D_SHIPMENT_NEW
AS SELECT S.SK_SHIPMENT_POSITION
,S.SK_SHIPMENT
,S.SHIPPING_NUMBER
,S.SK_ARTICLE
,S.PAY_ID
,D_PM.PM_NAME AS PAYMENT_METHOD -- the new column, populated via the join (column name illustrative)
FROM D_SHIPMENT S
LEFT JOIN PAY D_PM
ON (S.PAY_ID = NVL(D_PM.PM_ID,-1));
2. I usually do a quick comparison just to make sure the tables have the same number of rows:
SELECT COUNT(1) FROM D_SHIPMENT
UNION ALL
SELECT COUNT(1) FROM D_SHIPMENT_NEW;
3. If the numbers match then we are safe to carry on.
DROP TABLE D_SHIPMENT purge;
4. Rename the table
RENAME D_SHIPMENT_NEW TO D_SHIPMENT;
5. Gather stats on the table so Oracle can have some better estimates for the index creation.
BEGIN
DBMS_STATS.GATHER_TABLE_STATS(USER,'D_SHIPMENT');
END;
/
6. Recreate the indexes, primary key
CREATE UNIQUE INDEX PK_D_SHIPMENT_POS ON D_SHIPMENT(SK_SHIPMENT_POSITION) PARALLEL 48;
ALTER TABLE D_SHIPMENT ADD CONSTRAINT PK_D_SHIPMENT_POS PRIMARY KEY (SK_SHIPMENT_POSITION) USING INDEX;
ALTER INDEX PK_D_SHIPMENT_POS NOPARALLEL;
CREATE INDEX IDX_D_SHIP_NR ON D_SHIPMENT(SHIPPING_NUMBER) PARALLEL 48;
ALTER INDEX IDX_D_SHIP_NR NOPARALLEL;
And here you go - you have a new table with the added column including current statistics.
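The step sequence above can be sketched as a small statement generator; a hypothetical Python helper (the table name, select list, and PARALLEL degree are placeholders, not part of the post):

```python
# Illustrative helper (not from the post): emits the statement sequence of
# the recipe above for a given table. Names are placeholders.
def add_column_steps(table, select_list):
    new = f"{table}_NEW"
    return [
        # 1. CTAS with the new column already in the select list
        f"CREATE /*+ PARALLEL(48) */ TABLE {new} AS SELECT {select_list} FROM {table}",
        # 2. row-count comparison before dropping anything
        f"SELECT COUNT(1) FROM {table} UNION ALL SELECT COUNT(1) FROM {new}",
        # 3./4. swap the tables
        f"DROP TABLE {table} PURGE",
        f"RENAME {new} TO {table}",
        # 5. fresh stats before recreating the indexes
        f"BEGIN DBMS_STATS.GATHER_TABLE_STATS(USER,'{table}'); END;",
    ]

for stmt in add_column_steps("D_SHIPMENT", "s.*"):
    print(stmt + ";")
```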
Friday, 29 April 2016
Hash-partitioning in a data warehouse and some side effects
In our data warehouse we were struggling for quite a while to find a suitable solution for some daily OLTP-like loads, which back in the day relied on an Advanced Queuing solution, leading to a massive amount of row-by-row data processing and a very high number of IOPS.
Since we are loading data throughout the day, it became necessary to switch to some kind of bulk load mechanism. Our developers came up with the idea of hash-partitioned tables, which enable partition-wise joins when the tables share the same partitioning key. The merges now run pretty fast, since the statements use the parallel options Oracle (11.2.0.4) provides:
-parallel DML
-parallel query
On the flip side this takes up a lot of database resources in terms of CPU and I/O, since the statements still end up doing full table scans, only split at the partition level.
One of the tables stores core information, so as time goes by more and more processes were built around that table, using hash partitioning as well.
The point the developers made was: you have to have the same number of partitions on the tables to achieve better performance.
I was interested in whether it is possible to use fewer partitions, or even no partitioning at all, for the newly created tables without much impact on performance. I ran some tests to demonstrate the results.
I created some tables, all using order_id as the partitioning key, with a varying number of partitions:
create table HASH_TEST
(
customer_id NUMBER,
order_id NUMBER
)
partition by hash (ORDER_ID)
partitions 4;
create table HASH_TEST_BIG
(
customer_id NUMBER,
order_id NUMBER
)
partition by hash (ORDER_ID)
partitions 2;
create table HASH_TEST_BIG_SAME
(
customer_id NUMBER,
order_id NUMBER
)
partition by hash (ORDER_ID)
partitions 4;
insert into hash_test select customer_id,order_id from ods_acc where rownum<=100000;
insert into hash_test_big select customer_id,order_id from ods_acc where rownum<=10000000;
insert into hash_test_big_same select customer_id,order_id from ods_acc where rownum<=10000000;
commit;
begin
dbms_stats.gather_table_stats(USER,'HASH_TEST');
dbms_stats.gather_table_stats(USER,'HASH_TEST_BIG');
dbms_stats.gather_table_stats(USER,'HASH_TEST_BIG_SAME');
end;
/
Now let's run selects to see whether the number of partitions really makes a difference:
--# of partitions same
select s.customer_id from hash_test s
join hash_test_big_same b
on s.order_id=b.order_id
where b.order_id=19422274;
Execution Plan
----------------------------------------------------------
Plan hash value: 261338677
---------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 116 | 2088 | 1517 (1)| 00:00:01 | | |
|* 1 | HASH JOIN | | 116 | 2088 | 1517 (1)| 00:00:01 | | |
| 2 | PART JOIN FILTER CREATE| :BF0000 | 3 | 36 | 46 (0)| 00:00:01 | | |
| 3 | PARTITION HASH SINGLE | | 3 | 36 | 46 (0)| 00:00:01 | 1 | 1 |
|* 4 | TABLE ACCESS FULL | HASH_TEST | 3 | 36 | 46 (0)| 00:00:01 | 1 | 1 |
| 5 | PARTITION HASH SINGLE | | 40 | 240 | 1471 (1)| 00:00:01 |KEY(AP)|KEY(AP)|
|* 6 | TABLE ACCESS FULL | HASH_TEST_BIG_SAME | 40 | 240 | 1471 (1)| 00:00:01 | 1 | 1 |
---------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("S"."ORDER_ID"="B"."ORDER_ID")
4 - filter("S"."ORDER_ID"=19422274)
6 - filter("B"."ORDER_ID"=19422274)
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
2094 consistent gets
0 physical reads
0 redo size
4273 bytes sent via SQL*Net to client
656 bytes received via SQL*Net from client
14 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
188 rows processed
--# of partitions not the same: 4 for the small table, 2 for the big table
select s.customer_id from hash_test s
join hash_test_big b
on s.order_id=b.order_id
where b.order_id=19422274;
Elapsed: 00:00:00.18
Execution Plan
----------------------------------------------------------
Plan hash value: 3648893118
----------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 110 | 1980 | 3670 (1)| 00:00:02 | | |
|* 1 | HASH JOIN | | 110 | 1980 | 3670 (1)| 00:00:02 | | |
| 2 | PART JOIN FILTER CREATE| :BF0000 | 3 | 36 | 46 (0)| 00:00:01 | | |
| 3 | PARTITION HASH SINGLE | | 3 | 36 | 46 (0)| 00:00:01 | 1 | 1 |
|* 4 | TABLE ACCESS FULL | HASH_TEST | 3 | 36 | 46 (0)| 00:00:01 | 1 | 1 |
| 5 | PARTITION HASH SINGLE | | 37 | 222 | 3623 (1)| 00:00:02 |KEY(AP)|KEY(AP)|
|* 6 | TABLE ACCESS FULL | HASH_TEST_BIG | 37 | 222 | 3623 (1)| 00:00:02 | 1 | 1 |
----------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("S"."ORDER_ID"="B"."ORDER_ID")
4 - filter("S"."ORDER_ID"=19422274)
6 - filter("B"."ORDER_ID"=19422274)
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
4822 consistent gets
0 physical reads
0 redo size
4273 bytes sent via SQL*Net to client
656 bytes received via SQL*Net from client
14 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
188 rows processed
In the first case both tables had 4 partitions, while in the second case the larger table had only 2. Looking at the cost and the number of consistent gets shows a very significant difference, so it seems that if the number of partitions is the same, joining the two tables consumes fewer resources.
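The partition-wise join idea behind this can be sketched in plain Python (the modulo bucketing is only a stand-in for Oracle's internal hash function, and the data is made up): with equal partition counts, each partition of the small table only ever needs to be joined against the matching partition of the big table.

```python
# Conceptual sketch of a partition-wise join: rows hash to the same bucket
# index in both tables only when the bucket (partition) counts match.
def bucket(order_id, n_parts):
    return order_id % n_parts   # stand-in for Oracle's internal hash

# (customer_id, order_id) -- made-up rows
small = [(1, 10), (2, 11), (3, 12)]
big   = [(9, 10), (8, 12), (7, 13)]

def partition_wise_join(small, big, n):
    # With equal n, each small-table partition is joined only against the
    # matching big-table partition, never against all of them.
    out = []
    for p in range(n):
        s_part = [r for r in small if bucket(r[1], n) == p]
        b_part = [r for r in big if bucket(r[1], n) == p]
        out += [(s[0], b[0], s[1]) for s in s_part for b in b_part if s[1] == b[1]]
    return sorted(out)

print(partition_wise_join(small, big, 4))
```

With mismatched partition counts the bucketing no longer lines up, so a partition on one side must be probed against several partitions on the other, which is what the higher consistent-get figures above reflect.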
My point: I am fine with using the same number of partitions for every table involved in this process, but could we just lower the number of partitions and thereby reduce table fragmentation? Following this line of thinking, here is another test to show how much influence the number of partitions has on performance.
I created 2 tables with a higher number of partitions:
create table HASH_TEST_LOT
(
customer_id NUMBER,
order_id NUMBER
)
partition by hash (ORDER_ID)
partitions 16;
create table HASH_TEST_BIG_SAME_LOT
(
customer_id NUMBER,
order_id NUMBER
)
partition by hash (ORDER_ID)
partitions 32;
insert into hash_test_lot select customer_id,order_id from ods_acc where rownum<=100000;
insert into hash_test_big_same_lot select customer_id,order_id from ods_acc where rownum<=10000000;
begin
dbms_stats.gather_table_stats(USER,'HASH_TEST_LOT');
dbms_stats.gather_table_stats(USER,'HASH_TEST_BIG_SAME_LOT');
end;
/
Let's run the same select as before:
select s.customer_id from hash_test_lot s
join hash_test_big_same_lot b
on s.order_id=b.order_id
where b.order_id=19422274;
188 rows selected.
Elapsed: 00:00:00.18
Execution Plan
----------------------------------------------------------
Plan hash value: 2985387547
-------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
-------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 89 | 3471 | 1833 (18)| 00:00:01 | | |
|* 1 | HASH JOIN | | 89 | 3471 | 1833 (18)| 00:00:01 | | |
| 2 | PART JOIN FILTER CREATE| :BF0000 | 4 | 104 | 45 (0)| 00:00:01 | | |
| 3 | PARTITION HASH SINGLE | | 4 | 104 | 45 (0)| 00:00:01 | 1 | 1 |
|* 4 | TABLE ACCESS FULL | HASH_TEST_LOT | 4 | 104 | 45 (0)| 00:00:01 | 1 | 1 |
| 5 | PARTITION HASH SINGLE | | 22 | 286 | 1788 (19)| 00:00:01 |KEY(AP)|KEY(AP)|
|* 6 | TABLE ACCESS FULL | HASH_TEST_BIG_SAME_LOT | 22 | 286 | 1788 (19)| 00:00:01 | 1 | 1 |
-------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("S"."ORDER_ID"="B"."ORDER_ID")
4 - filter("S"."ORDER_ID"=19422274)
6 - filter("B"."ORDER_ID"=19422274)
Note
-----
- dynamic sampling used for this statement (level=4)
Statistics
----------------------------------------------------------
72 recursive calls
2 db block gets
2432 consistent gets
0 physical reads
264 redo size
4273 bytes sent via SQL*Net to client
656 bytes received via SQL*Net from client
14 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
188 rows processed
As you can see the number of consistent gets is slightly higher, but I wouldn't consider it really significant (it's a big box), so in my opinion we could go for a smaller number of hash partitions, keeping the number of partitions equal across the tables.
As I mentioned previously, another option I wanted to examine is having the small table with no partitioning at all:
create table hash_no_part
(
customer_id NUMBER,
order_id NUMBER
)
;
insert into hash_no_part select customer_id,order_id from ods_acc where rownum<=100000;
begin
dbms_stats.gather_table_stats(USER,'HASH_NO_PART');
end;
/
select s.customer_id from hash_no_part s
join hash_test_big_same b
on s.order_id=b.order_id
where b.order_id=19422274;
188 rows selected.
Elapsed: 00:00:00.08
Execution Plan
----------------------------------------------------------
Plan hash value: 2379155543
---------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 32 | 1474 (1)| 00:00:01 | | |
|* 1 | HASH JOIN | | 1 | 32 | 1474 (1)| 00:00:01 | | |
| 2 | PART JOIN FILTER CREATE| :BF0000 | 1 | 26 | 3 (0)| 00:00:01 | | |
|* 3 | TABLE ACCESS FULL | HASH_NO_PART | 1 | 26 | 3 (0)| 00:00:01 | | |
| 4 | PARTITION HASH SINGLE | | 40 | 240 | 1471 (1)| 00:00:01 |KEY(AP)|KEY(AP)|
|* 5 | TABLE ACCESS FULL | HASH_TEST_BIG_SAME | 40 | 240 | 1471 (1)| 00:00:01 | 1 | 1 |
---------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("S"."ORDER_ID"="B"."ORDER_ID")
3 - filter("S"."ORDER_ID"=19422274)
5 - filter("B"."ORDER_ID"=19422274)
Statistics
----------------------------------------------------------
33 recursive calls
1 db block gets
2118 consistent gets
0 physical reads
132 redo size
4273 bytes sent via SQL*Net to client
656 bytes received via SQL*Net from client
14 SQL*Net roundtrips to/from client
5 sorts (memory)
0 sorts (disk)
188 rows processed
Pretty much the same as with partitioning, so there is the possibility of using a simple, non-partitioned table here.
The last point I was interested in was having a global index on the large hash-partitioned table's partition key (order_id), which might eliminate the all-partition hash scans:
create index glob_index on HASH_TEST_BIG_SAME_LOT(order_id) global;
select s.customer_id from hash_test_lot s
join hash_test_big_same_lot b
on s.order_id=b.order_id
where b.order_id=19422274;
188 rows selected.
Elapsed: 00:00:00.14
Execution Plan
----------------------------------------------------------
Plan hash value: 3234924659
--------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 188 | 7332 | 48 (0)| 00:00:01 | | |
|* 1 | HASH JOIN | | 188 | 7332 | 48 (0)| 00:00:01 | | |
| 2 | PARTITION HASH SINGLE| | 4 | 104 | 45 (0)| 00:00:01 | 1 | 1 |
|* 3 | TABLE ACCESS FULL | HASH_TEST_LOT | 4 | 104 | 45 (0)| 00:00:01 | 1 | 1 |
|* 4 | INDEX RANGE SCAN | GLOB_INDEX | 47 | 611 | 3 (0)| 00:00:01 | | |
--------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("S"."ORDER_ID"="B"."ORDER_ID")
3 - filter("S"."ORDER_ID"=19422274)
4 - access("B"."ORDER_ID"=19422274)
Note
-----
- dynamic sampling used for this statement (level=4)
Statistics
----------------------------------------------------------
13 recursive calls
0 db block gets
226 consistent gets
2 physical reads
0 redo size
4273 bytes sent via SQL*Net to client
656 bytes received via SQL*Net from client
14 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
188 rows processed
As you see the number of consistent gets is slightly higher but I wouldn't consider it really significant (it's a big box) so in my opinion we could go for smaller number of hash partitions in the tables, keeping the number of partitions equal.
As I mentioned previously, another option I wanted to examine is leaving the small table unpartitioned altogether:
create table hash_no_part
(
customer_id NUMBER,
order_id    NUMBER
);
insert into hash_no_part select customer_id, order_id from ods_acc where rownum <= 100000;
commit;
exec dbms_stats.gather_table_stats(USER,'HASH_NO_PART');
select s.customer_id from hash_no_part s
join hash_test_big_same b
on s.order_id=b.order_id
where b.order_id=19422274;
188 rows selected.
Elapsed: 00:00:00.08
Execution Plan
----------------------------------------------------------
Plan hash value: 2379155543
---------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 32 | 1474 (1)| 00:00:01 | | |
|* 1 | HASH JOIN | | 1 | 32 | 1474 (1)| 00:00:01 | | |
| 2 | PART JOIN FILTER CREATE| :BF0000 | 1 | 26 | 3 (0)| 00:00:01 | | |
|* 3 | TABLE ACCESS FULL | HASH_NO_PART | 1 | 26 | 3 (0)| 00:00:01 | | |
| 4 | PARTITION HASH SINGLE | | 40 | 240 | 1471 (1)| 00:00:01 |KEY(AP)|KEY(AP)|
|* 5 | TABLE ACCESS FULL | HASH_TEST_BIG_SAME | 40 | 240 | 1471 (1)| 00:00:01 | 1 | 1 |
---------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("S"."ORDER_ID"="B"."ORDER_ID")
3 - filter("S"."ORDER_ID"=19422274)
5 - filter("B"."ORDER_ID"=19422274)
Statistics
----------------------------------------------------------
33 recursive calls
1 db block gets
2118 consistent gets
0 physical reads
132 redo size
4273 bytes sent via SQL*Net to client
656 bytes received via SQL*Net from client
14 SQL*Net roundtrips to/from client
5 sorts (memory)
0 sorts (disk)
188 rows processed
Pretty much the same as with partitioning, so using a simple, unpartitioned table is a viable option here.
The last thing I was interested in was a global index on the large hash-partitioned table's partition key (order_id), which might eliminate the all-partition hash scans:
create index glob_index on HASH_TEST_BIG_SAME_LOT(order_id) global;
select s.customer_id from hash_test_lot s
join hash_test_big_same_lot b
on s.order_id=b.order_id
where b.order_id=19422274;
188 rows selected.
Elapsed: 00:00:00.14
Execution Plan
----------------------------------------------------------
Plan hash value: 3234924659
--------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 188 | 7332 | 48 (0)| 00:00:01 | | |
|* 1 | HASH JOIN | | 188 | 7332 | 48 (0)| 00:00:01 | | |
| 2 | PARTITION HASH SINGLE| | 4 | 104 | 45 (0)| 00:00:01 | 1 | 1 |
|* 3 | TABLE ACCESS FULL | HASH_TEST_LOT | 4 | 104 | 45 (0)| 00:00:01 | 1 | 1 |
|* 4 | INDEX RANGE SCAN | GLOB_INDEX | 47 | 611 | 3 (0)| 00:00:01 | | |
--------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("S"."ORDER_ID"="B"."ORDER_ID")
3 - filter("S"."ORDER_ID"=19422274)
4 - access("B"."ORDER_ID"=19422274)
Note
-----
- dynamic sampling used for this statement (level=4)
Statistics
----------------------------------------------------------
13 recursive calls
0 db block gets
226 consistent gets
2 physical reads
0 redo size
4273 bytes sent via SQL*Net to client
656 bytes received via SQL*Net from client
14 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
188 rows processed
This looks very promising, but keep in mind that a global index may cause noticeable performance degradation when inserting into the table; that is perhaps a topic for a later post. Another issue that was pointed out is contention on the global index when inserting into multiple partitions at the same time, so the loading strategy would have to be redesigned as well.
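One common way to reduce that insert contention, sketched here under the assumption that the index key stays order_id, is to hash-partition the global index itself, so concurrent inserts spread across several index segments instead of hammering one B-tree:

```sql
-- Hash-partitioned global index: spreads insert activity across 8 index
-- segments (the index name and partition count are illustrative assumptions).
create index glob_index_hp on hash_test_big_same_lot(order_id)
global partition by hash (order_id) partitions 8;
```

Since the index is partitioned on its own leading column, equality lookups on order_id are still pruned to a single index partition, so the range scan in the plan above should look much the same.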