Oracle Streams Synchronization Not Applying Changes

Product: Oracle Server – Enterprise Edition
Product Version: 10.2.0.4
Operating System: Oracle Solaris on SPARC (64-bit), OS Version 10
Problem Description: Data changes are not being applied on the destination site
V$STREAMS_CAPTURE state ==> PAUSED FOR FLOW CONTROL

COLUMN queue_schema FORMAT A12
COLUMN queue_name FORMAT A20
SELECT queue_schema, queue_name, unbrowsed_msgs, overspilled_msgs, memory_usage, publisher_state
FROM V$BUFFERED_PUBLISHERS;

QUEUE_SCHEMA QUEUE_NAME   UNBROWSED_MSGS OVERSPILLED_MSGS MEMORY_USAGE PUBLISHER_STATE
------------ ------------ -------------- ---------------- ------------ --------------------------------------------
STRMADMIN    CAPTURE_SRC           14060                0           80 IN FLOW CONTROL: TOO MANY UNBROWSED MESSAGES
STRMADMIN    APPLY_SRC              3682                0           80 PUBLISHING MESSAGES
STRMADMIN    APPLY_SRC                 0                0           80 PUBLISHING MESSAGES

This is a bidirectional Streams configuration. We have two production database sites, each running a 2-node Oracle 10g RAC cluster, and Oracle Streams is configured between the two sites. Streams had been working fine for more than 8 months; the problem started early this morning.

Data-change activity is performed only on the primary site and is still going on, but table data is not replicating to the destination site. This is causing loss of business continuity and a growing number of customer escalations.

We have attached the Streams health check reports for both sites:
streams_hc_10GR2_orcld_05092012.html
streams_hc_10GR2_orclp_05092012.html

How can we resolve this issue?
Issue Clarification-1

Data changes are not being applied on the destination site; V$STREAMS_CAPTURE shows the capture state as PAUSED FOR FLOW CONTROL.
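As a quick first check (a minimal sketch, assuming the standard 10gR2 GV$STREAMS_CAPTURE columns rather than anything taken from the original report), the capture state and progress can be confirmed directly on the source database:

-- Check capture state and progress on the source site (run as the Streams administrator)
COLUMN capture_name FORMAT A20
COLUMN state        FORMAT A40
SELECT capture_name,
       state,                          -- e.g. PAUSED FOR FLOW CONTROL / CAPTURING CHANGES
       total_messages_captured,        -- LCRs captured since the capture last started
       capture_time,                   -- time of the most recently captured change
       capture_message_create_time     -- redo timestamp of that change (shows capture lag)
FROM   gv$streams_capture;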

Report Analysis-1
++ OPEN STREAMS CAPTURE TRANSACTIONS ++ (Source)

+** Count **+

Streams Name Open Transactions Total LCRs
CAPTURES_SRC 1 8573662

+** Detail **+

INST_ID: 1, Streams Name: CAPTURES_SRC, Streams Type: CAPTURE
XID (USN.SLT.SQN): 76.0.79680
CUMULATIVE_MESSAGE_COUNT: 8573662, TOTAL_MESSAGE_COUNT: 0
FIRST_MESSAGE_TIME: 22:02:18 09/04/12, FIRST_MESSAGE_NUMBER: 8.2413E+12
LAST_MESSAGE_TIME: 00:10:09 09/05/12, LAST_MESSAGE_NUMBER: 8.2414E+12

++ APPLY SPILLED TRANSACTIONS ++ (Destination)

Apply Name   Transaction ID  SCN of First Msg in Txn  First Msg Creation Time  Msgs in Txn  Spill Creation Time
APPLYS_DEST  76.0.79680      8.2413E+12               22:02:18 09/04/12        2513933      22:03:37 09/04/12

++ APPLY Reader Statistics ++  (Destination)

Apply Name   Captured or User-Enqueued  Process  State             Total Msgs Dequeued  Total Msgs Spilled  SGA Used  Oldest SCN  Oldest Transaction
APPLYS_DEST  Captured LCRS              P000     DEQUEUE MESSAGES  33600091             4232671             0         8.2413E+12  76.0.79680

Apply Name   Apply Reader Latency (Seconds)  Dequeued Msg Creation Timestamp  Dequeue Timestamp  Last Dequeued Msg Number  Last Browsed Msg Number
APPLYS_DEST  48743                           00:13:51 09/05                   13:46:14 09/05     8.2414E+12                0

Apply Name   Total Msgs Dequeued  Total Msgs Spilled  Elapsed Dequeue (centisecs)  Elapsed Schedule (centisecs)  Elapsed Spill (centisecs)
APPLYS_DEST  33600106             4232671             472603319                    5325073                      746028

++ OPEN STREAMS CAPTURE TRANSACTIONS ++ (Destination)

+** Count **+

Streams Name Open Transactions Total LCRs
CAPTURES_DEST 1 0

+** Detail **+

INST_ID: 1, Streams Name: CAPTURES_DEST, Streams Type: CAPTURE
XID (USN.SLT.SQN): 10.22.4184725
CUMULATIVE_MESSAGE_COUNT: 0, TOTAL_MESSAGE_COUNT: 0
FIRST_MESSAGE_TIME: 13:46:08 09/05/12, FIRST_MESSAGE_NUMBER: 8.2422E+12, LAST_MESSAGE_NUMBER: 0

++ MESSAGES IN BUFFER QUEUE ++ (Destination)

Queue Name                   Startup Time       Current Outstanding Msgs  Current Spilled Msgs  Cumulative Msgs  Cumulative Spilled Msgs  Current Expired Msgs
STRMADMIN.CAPTURE_DEST       02:59:05 05/16/12                         0                     0             3682                       13                     0
STRMADMIN.APPLY_DEST         02:56:39 05/16/12                      4369                     0         33604301                     1549                     0
SYS.KUPC$S_1_20120516020419  02:04:22 05/16/12                         0                     0               16                        0                     0

1. There is an open capture transaction, and a corresponding apply transaction, with a very large number of LCRs.
2. There is also a large number of spilled messages in the apply queue.

This indicates that either the apply process is not working fast enough, or a huge transaction has caused the apply process to slow down until it finishes. As a result, the capture process has been paused to control the flow of messages from source to target.

Action Plan-1

To verify the above conclusion, please generate another set of health check reports from both sides. I will compare the statistics between the old and new sets and let you know the results. If my conclusion is correct, we will have only two options: wait for the large transaction to finish, or instruct the capture and apply processes to ignore it (not recommended, for data-consistency reasons).

Please generate two new health check reports, one from each database, and provide them.
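Between full health check reports, a lightweight spot-check of the apply reader can show whether the destination is still dequeuing messages (a minimal sketch, assuming the standard 10gR2 GV$STREAMS_APPLY_READER columns; it is not part of the health check script itself):

-- Run on the destination site to see whether the apply reader is making progress
SELECT apply_name,
       state,
       total_messages_dequeued,        -- should keep increasing between checks
       total_messages_spilled,
       dequeue_time,                   -- time of the last dequeue
       dequeued_message_create_time    -- source creation time of that message (shows apply lag)
FROM   gv$streams_apply_reader;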

Please find the required Streams health check reports:

streams_hc_10GR2_orcld_05092012_2.html
streams_hc_10GR2_orclp_05092012_2.html

Report Analysis-2

++ APPLY Reader Statistics ++ (Destination)

Apply Name   Captured or User-Enqueued  Process  State             Total Msgs Dequeued  Total Msgs Spilled  SGA Used  Oldest SCN  Oldest Transaction
APPLYS_DEST  Captured LCRS              P000     DEQUEUE MESSAGES  33895603             4232671             0         8.2413E+12  76.0.79680

Apply Name   Apply Reader Latency (Seconds)  Dequeued Msg Creation Timestamp  Dequeue Timestamp  Last Dequeued Msg Number  Last Browsed Msg Number
APPLYS_DEST  54081                           00:15:44 09/05                   15:17:05 09/05     8.2414E+12                0

Apply Name   Total Msgs Dequeued  Total Msgs Spilled  Elapsed Dequeue (centisecs)  Elapsed Schedule (centisecs)  Elapsed Spill (centisecs)
APPLYS_DEST  33895609             4232671             472610296                    5866722                      746028

++ OPEN STREAMS CAPTURE TRANSACTIONS ++ (Source)

+** Count **+

Streams Name Open Transactions Total LCRs
CAPTURES_SRC 1 918398

+** Detail **+

INST_ID: 1, Streams Name: CAPTURES_SRC, Streams Type: CAPTURE
XID (USN.SLT.SQN): 76.0.79680
CUMULATIVE_MESSAGE_COUNT: 918398, TOTAL_MESSAGE_COUNT: 0
FIRST_MESSAGE_NUMBER: 8.2414E+12, LAST_MESSAGE_TIME: 00:15:50 09/05/12, LAST_MESSAGE_NUMBER: 8.2414E+12

++ MESSAGES IN BUFFER QUEUE ++ (Destination)

Queue Name                   Startup Time       Current Outstanding Msgs  Current Spilled Msgs  Cumulative Msgs  Cumulative Spilled Msgs  Current Expired Msgs
STRMADMIN.CAPTURE_DEST       02:59:05 05/16/12                         0                     0             3682                       13                     0
STRMADMIN.APPLY_DEST         02:56:39 05/16/12                      4634                     0         33900201                     1549                     0
SYS.KUPC$S_1_20120516020419  02:04:22 05/16/12                         0                     0               16                        0                     0

Issue Clarification-2

I see the capture status has changed from "PAUSED FOR FLOW CONTROL" to "CAPTURING CHANGES". This means the capture process is working now. However, it might take some time to catch up, as the large transaction I identified before is still open:

Apply reader total messages dequeued: old value 33600091, new value 33895603 (the apply process is working well).

OPEN STREAMS CAPTURE TRANSACTIONS:

XID 76.0.79680, number of LCRs: old value 8573662, new value 918398

As you can see, the total number of LCRs for this transaction has changed, and the new value is lower than the old one. This indicates that the capture is re-capturing the transaction from the beginning. Further evidence:

Capture name: CAPTURES_SRC, session ID: old value 1031, new value 938
Last status timestamp: old 10:51:45 09/05/12, new 15:15:06 09/05/12

So it seems the capture process has been restarted with a new session ID; I am not sure whether that happened automatically or whether you restarted it. In any case, I recommend monitoring the capture for a couple of hours. I think it will catch up after some time, since it is no longer paused. Please monitor the system for a couple of hours and let me know whether new changes are being replicated. If there is still a problem, please generate new health check reports and send them; the next action in that case might be to ignore that transaction.
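While monitoring, the progress of the open transaction can be tracked with the same view the health check script uses (a minimal sketch, assuming GV$STREAMS_TRANSACTION as queried elsewhere in this case):

-- Run on the source site; CUMULATIVE_MESSAGE_COUNT should keep growing while the
-- capture re-mines transaction 76.0.79680
SELECT streams_name,
       xidusn || '.' || xidslt || '.' || xidsqn AS txn_id,
       cumulative_message_count,
       first_message_time,
       last_message_time
FROM   gv$streams_transaction
WHERE  streams_type = 'CAPTURE';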

Reply from client

SQL> SELECT CAPTURE_NAME,QUEUE_NAME,STATUS,ERROR_NUMBER,ERROR_MESSAGE FROM DBA_CAPTURE;

CAPTURE_NAME   QUEUE_NAME    STATUS    ERROR_NUMBER  ERROR_MESSAGE
-------------- ------------- --------- ------------- -------------
CAPTURES_SRC   CAPTURE_SRC   ENABLED

SQL> SELECT CAPTURE_NAME,STATE FROM V$STREAMS_CAPTURE;

CAPTURE_NAME   STATE
-------------- ------------------------------
CAPTURES_SRC   PAUSED FOR FLOW CONTROL

SQL> SELECT PROPAGATION_NAME,STATUS,ERROR_MESSAGE,ERROR_DATE FROM DBA_PROPAGATION;

PROPAGATION_NAME   STATUS    ERROR_MESSAGE   ERROR_DATE
------------------ --------- --------------- ----------
PROP_SRC_TO_DEST   ENABLED

SQL> SELECT APPLY_NAME,STATUS,ERROR_NUMBER,ERROR_MESSAGE FROM DBA_APPLY;

APPLY_NAME   STATUS    ERROR_NUMBER  ERROR_MESSAGE
------------ --------- ------------- -------------
APPLYS_SRC   ENABLED

We want a fast resolution or workaround, as our business services are continuously affected.
What are the pros and cons of ignoring that transaction?
Can we find out which user/schema or SQL statement is making such a long-running change to the table?
C001: long running txn detected, xid: 0x004c.000.00013740 ==> this appears in the primary site's alert log.

If we ignore that transaction, will subsequent transactions still be applied?
Can we see the SQL statements of the long transaction you suggest ignoring?
Why are there lots of log switches, but no LogMiner log messages in the alert log?

Action Plan-2

1. To find the SQL of the transaction, you can use the LogMiner utility in ad-hoc mode. Please follow the instructions in the following article (a hedged LogMiner sketch also follows this list):

311620.1 How to Use LogMiner to Determine Problem SQL in a Streams Environment

2. Yes, if you ignore that transaction and other transactions depend on it, this may cause apply errors. We cannot judge that without knowing the SQL commands in the transaction; for example, if it is an UPDATE statement affecting only one table, then only subsequent transactions against that specific table would be affected. Even so, ignoring the transaction might be the only solution.
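The following is a minimal ad-hoc LogMiner sketch, not taken from the note above: the archived log file names are illustrative placeholders, and the XID components are the ones from this case (76.0.79680). Add every archived log that covers the transaction before starting the session.

BEGIN
  -- Register the archived logs that cover the long-running transaction (placeholder paths)
  DBMS_LOGMNR.ADD_LOGFILE(logfilename => '/arch/orclp_1_1234.arc', options => DBMS_LOGMNR.NEW);
  DBMS_LOGMNR.ADD_LOGFILE(logfilename => '/arch/orclp_1_1235.arc', options => DBMS_LOGMNR.ADDFILE);
  -- Use the online dictionary so object names are resolved
  DBMS_LOGMNR.START_LOGMNR(options => DBMS_LOGMNR.DICT_FROM_ONLINE_CATALOG);
END;
/

-- Show the redo SQL belonging to transaction 76.0.79680
SELECT seg_owner, seg_name, operation, sql_redo
FROM   v$logmnr_contents
WHERE  xidusn = 76 AND xidslt = 0 AND xidsqn = 79680;

EXECUTE DBMS_LOGMNR.END_LOGMNR;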

If you decide to ignore that transaction, the following are the required steps.

Ignore/skip the transaction on both the capture and the apply process:

dbms_capture_adm.set_parameter('CAPTURE_NAME','_ignore_transaction','TXN_ID');
dbms_apply_adm.set_parameter('APPLY_NAME','_ignore_transaction','TXN_ID');

At the destination site:
-- Stop the apply process
exec dbms_apply_adm.stop_apply('APPLY_NAME_HERE');
-- Set the parameter to ignore the transaction
exec dbms_apply_adm.set_parameter('apply_name_here','_ignore_transaction','5.87.14127');
-- Restart the apply process
exec dbms_apply_adm.start_apply('APPLY_NAME_HERE');

Do the same for your capture process; it will also need to be restarted.
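A hedged sketch of the equivalent capture-side steps, using the capture name and XID from this case (an assumption based on the apply-side steps above, not text from the original plan; verify both values before running):

-- On the source site, as the Streams administrator
exec dbms_capture_adm.stop_capture('CAPTURES_SRC');
exec dbms_capture_adm.set_parameter('CAPTURES_SRC','_ignore_transaction','76.0.79680');
exec dbms_capture_adm.start_capture('CAPTURES_SRC');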

Please let me know if you have any further questions.

Reply from client

Can we get information on the progress? I mean, how many transactions are pending, and how long will it take?

Action Plan-3

It will not be easy to estimate, as the transaction is still open on the capture side, meaning the capture will need to mine more logs for the same transaction after it resumes (it is currently paused for flow control). You can generate another set of health check reports and I will let you know the progress so far. However, please DO NOT restart the capture under any circumstances: every time you restart it, it will mine this transaction from scratch, and I think that has already happened once before.
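To verify between reports that the capture session has not been restarted again, the session identifiers and startup time can be watched (a minimal sketch, assuming the standard 10gR2 GV$STREAMS_CAPTURE columns; a change in SID/SERIAL# would indicate another restart):

-- Run on the source site; note SID, SERIAL# and STARTUP_TIME across checks
SELECT capture_name,
       sid,
       serial#,
       state,
       startup_time,
       total_messages_captured
FROM   gv$streams_capture;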

Please find the required reports:
streams_hc_10GR2_orcld_05092012_3.html
streams_hc_10GR2_orclp_05092012_3.html

Report Analysis-3

++ APPLY Reader Statistics ++ (Destination)

Apply Name   Captured or User-Enqueued  Process  State             Total Msgs Dequeued  Total Msgs Spilled  SGA Used  Oldest SCN  Oldest Transaction
APPLYS_DEST  Captured LCRS              P000     DEQUEUE MESSAGES  34569811             4232671             0         8.2413E+12  76.0.79680

Apply Name   Apply Reader Latency (Seconds)  Dequeued Msg Creation Timestamp  Dequeue Timestamp  Last Dequeued Msg Number  Last Browsed Msg Number
APPLYS_DEST  67009                           00:20:05 09/05                   18:56:54 09/05     8.2414E+12                0

Apply Name   Total Msgs Dequeued  Total Msgs Spilled  Elapsed Dequeue (centisecs)  Elapsed Schedule (centisecs)  Elapsed Spill (centisecs)
APPLYS_DEST  34569814             4232671             472626432                    7177295                      746028

++ OPEN STREAMS CAPTURE TRANSACTIONS ++  (Source)

+** Count **+

Streams Name Open Transactions Total LCRs
CAPTURES_SRC 1 1590994

+** Detail **+

INST_ID: 1, Streams Name: CAPTURES_SRC, Streams Type: CAPTURE
XID (USN.SLT.SQN): 76.0.79680
CUMULATIVE_MESSAGE_COUNT: 1590994, TOTAL_MESSAGE_COUNT: 0
FIRST_MESSAGE_NUMBER: 8.2414E+12, LAST_MESSAGE_TIME: 00:20:13 09/05/12, LAST_MESSAGE_NUMBER: 8.2414E+12

++ APPLY SPILLED TRANSACTIONS ++ (Destination)

Apply Name   Transaction ID  SCN of First Msg in Txn  First Msg Creation Time  Msgs in Txn  Spill Creation Time
APPLYS_DEST  76.0.79680      8.2413E+12               22:02:18 09/04/12        2190807      22:03:37 09/04/12

++ MESSAGES IN BUFFER QUEUE ++  (Destination)

Queue Name                   Startup Time       Current Outstanding Msgs  Current Spilled Msgs  Cumulative Msgs  Cumulative Spilled Msgs  Current Expired Msgs
STRMADMIN.CAPTURE_DEST       02:59:05 05/16/12                         0                     0             3690                       13                     0
STRMADMIN.APPLY_DEST         02:56:39 05/16/12                      4323                     0         34574101                     1549                     0
SYS.KUPC$S_1_20120516020419  02:04:22 05/16/12                         0                     0               16                        0                     0

Issue Clarification-3

++ APPLY Reader Statistics ++

Total Messages Dequeued:

Report1: 33600091
Report2: 33895603
Report3: 34569811

++ MESSAGES IN BUFFER QUEUE ++

Outstanding messages (queue STRMADMIN.APPLY_DEST):

Report1: 4369
Report2: 4634
Report3: 4323

Cumulative number of messages:

Report1: 33604301
Report2: 33900201
Report3: 34574101

RCA

XID 76.0.79680

On target (apply): number of LCRs: 2190807
On source (capture): number of LCRs: 1590994

This is normal, because the capture process was restarted at some point. By then, the apply had already received around 2 million of the messages in that transaction. When the capture was restarted, it started mining the same transaction from the beginning, so the apply is waiting until it again reaches the LCR it had already dequeued (LCR number 2190807).

Anyway, I reviewed the oldest health check report, which was generated before the capture process was restarted; the number of LCRs for this transaction according to that report was 8573662 (more than 8 million).

That means at least 6 million LCRs of the same transaction still need to be mined on the source, which is a lot. I cannot say exactly how much time it will take, but the following can serve as an indicator:

The capture last started at: 12:16:47 09/05/12
The capture current timestamp (according to the last report): 18:54:17 09/05/12

That means the capture has processed around 1590994 LCRs (about one and a half million) in roughly 4 hours. If it keeps working at the same rate, it needs around 2.75 hours per million LCRs, which leads us to expect around 14-15 hours for the capture to work through the roughly 8 million LCRs of this transaction. Remember that this is the capture time only; the apply might work at the same rate (hopefully) or might be slower.

These estimates may not be accurate, and the rate may change over time, but do not expect a big improvement.

I know these estimates are not pleasant and you will not like them, but I have to put all the facts on the table so you can make the correct decision.

For example, if you decide to ignore transaction 76.0.79680, that might lead to data inconsistency; yes, that is possible. But in that case, how much time would you need to re-synchronize the data using EXP/IMP, for example? Is that less than or more than the roughly 15 hours estimated for the capture to catch up (given that this number is not final and may increase)?

We cannot wait for the long transaction to finish; as you know, we are losing business services.
So, as you suggested in an earlier response, we will use Oracle Data Pump and reconfigure the Streams solution.
In our case expdp/impdp needs less time, as we have a single table with about 4 GB of data in it.

1. De-configure Oracle Streams at both sites (see the sketch after this list).
2. Re-configure Oracle Streams, instantiating the data with expdp/impdp.
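A hedged sketch of the de-configuration step only (an assumption based on the standard 10gR2 Streams administration packages, not a procedure stated in the original exchange; the process names are the ones used in this case). It stops the Streams components and removes the Streams configuration from each database, leaving the replicated tables in place:

-- On the source site, as the Streams administrator
exec dbms_capture_adm.stop_capture('CAPTURES_SRC');
exec dbms_streams_adm.remove_streams_configuration;

-- On the destination site, as the Streams administrator
exec dbms_apply_adm.stop_apply('APPLYS_DEST');
exec dbms_streams_adm.remove_streams_configuration;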

Troubleshooting Commands

Oracle Streams health check script

streams_hc_10GR2.sql

prompt ============================================================================================
prompt
prompt ++ OPEN STREAMS CAPTURE TRANSACTIONS ++
prompt
prompt +** Count **+
select streams_name, count(*) "Open Transactions", sum(cumulative_message_count) "Total LCRs"
  from gv$streams_transaction
 where streams_type = 'CAPTURE'
 group by streams_name;
prompt
prompt +** Detail **+
select *
  from gv$streams_transaction
 where streams_type = 'CAPTURE'
 order by streams_name, first_message_number;

prompt ============================================================================================
prompt
prompt ++ APPLY SPILLED TRANSACTIONS ++

col APPLY_NAME                head 'Apply Name'
col txn_id                    head 'Transaction|ID'
col FIRST_SCN                 head 'SCN of First|Message in Txn'
col MESSAGE_COUNT             head 'Count of|Messages in Txn'
col FIRST_MESSAGE_CREATE_TIME head 'First Message|Creation Time'
col SPILL_CREATION_TIME       head 'Spill|Creation Time'

select apply_name,
       xidusn || '.' || xidslt || '.' || xidsqn txn_id,
       first_scn,
       first_message_create_time,
       message_count,
       spill_creation_time
  from dba_apply_spill_txn;

prompt ============================================================================================
prompt
prompt ++ MESSAGES IN BUFFER QUEUE ++

col QUEUE                format a50 wrap
col "Message Count"      format 9999999999999999 heading 'Current Number of|Outstanding|Messages|in Queue'
col "Spilled Msgs"       format 9999999999999999 heading 'Current Number of|Spilled|Messages|in Queue'
col "Total Messages"     format 9999999999999999 heading 'Cumulative|Number|of Messages|in Queue'
col "Total Spilled Msgs" format 9999999999999999 heading 'Cumulative Number|of Spilled|Messages|in Queue'
col expired_msgs         heading 'Current Number of|Expired|Messages|in Queue'

select queue_schema || '.' || queue_name queue,
       startup_time,
       num_msgs     "Message Count",
       spill_msgs   "Spilled Msgs",
       cnum_msgs    "Total Messages",
       cspill_msgs  "Total Spilled Msgs",
       expired_msgs
  from gv$buffered_queues;

prompt ============================================================================================
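A possible usage sketch (an assumption about how the script might be invoked, not part of the script itself; the spool file name is illustrative): run it as the Streams administrator on each site and keep the spooled output so successive runs can be compared.

-- From SQL*Plus, connected as STRMADMIN on each database
spool streams_hc_orclp.txt
@streams_hc_10GR2.sql
spool off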
