Yaping's Weblog

August 30, 2008

ARCH relevant

Filed under: Oracle — Yaping @ 4:29 am

Before Oracle 10g, when the ARCH process needs to archive a redo log, it first builds a list of archive destinations. Once the list is complete, ARCH reads a 1 MB chunk of the redo log to be archived. The chunk size is controlled by the hidden parameter _log_archive_buffer_size. Its default value is 2048 redo log blocks, and since a redo log block is 512 bytes, the chunk is exactly 1 MB. ARCH sends this chunk to the first destination in the list and, after that write completes, sends the same chunk to the second destination, and so on until the chunk has been written to every destination. ARCH then reads the next 1 MB chunk and repeats these steps until the whole redo log has been written to all destinations. Archiving is therefore only as fast as the slowest destination.
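The chunk arithmetic and the serial write loop described above can be sketched as follows. This is only a rough model, not Oracle's implementation: the function name archive_time_ms and the per-chunk write times are hypothetical values chosen for illustration.

```python
REDO_BLOCK_SIZE = 512        # bytes per redo log block, as stated above
BUFFER_BLOCKS = 2048         # default _log_archive_buffer_size, in blocks
CHUNK_BYTES = REDO_BLOCK_SIZE * BUFFER_BLOCKS  # 1,048,576 bytes = 1 MB


def archive_time_ms(log_bytes, dest_ms_per_chunk):
    """Estimate the time to archive one redo log with the serial loop.

    dest_ms_per_chunk holds a hypothetical write time (milliseconds per
    chunk) for each destination. Every chunk is written to each
    destination in turn, so the cost of one chunk is the sum of all
    destination write times.
    """
    chunks = -(-log_bytes // CHUNK_BYTES)  # ceiling division
    return chunks * sum(dest_ms_per_chunk)


# A 100 MB redo log: one fast local disk, then the same disk plus a slow
# remote destination (both timings are made up).
log_size = 100 * 1024 * 1024
print(CHUNK_BYTES)
print(archive_time_ms(log_size, [10]))
print(archive_time_ms(log_size, [10, 500]))
```

In this model the slow destination dominates the total, which is the sense in which archiving is only as fast as the slowest destination.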

 

So what happens if writing to a destination is slow, or the destination is dead?

When the network is slow or the server is under heavy load, archiving can become very slow. If a redo log has not been completely archived, the database can still switch to the next redo log file as long as other redo log groups are available. However, the log writer process may cycle through all available online redo log groups and try to reuse a redo log file that has not yet been completely archived. At that point the database is suspended until archiving of that log completes.
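A toy model can make the wrap-around concrete. It assumes a fixed interval between log switches and a single ARCH process that archives one log at a time; the function name first_stalled_log and all timings are hypothetical, and real workloads are far less regular.

```python
def first_stalled_log(groups, switch_interval, archive_duration, max_logs=1000):
    """Return the sequence of the first log that blocks LGWR, or None.

    Log number seq fills up at time (seq + 1) * switch_interval and is
    then handed to ARCH. LGWR needs the same group again at time
    (seq + groups) * switch_interval; if archiving has not finished by
    then, the database waits.
    """
    finished = 0  # time at which ARCH finished the previous log
    for seq in range(max_logs):
        ready = (seq + 1) * switch_interval
        finished = max(finished, ready) + archive_duration
        reuse_at = (seq + groups) * switch_interval
        if finished > reuse_at:
            return seq
    return None


# Three groups, a switch every 10 seconds.
print(first_stalled_log(groups=3, switch_interval=10, archive_duration=5))
print(first_stalled_log(groups=3, switch_interval=10, archive_duration=15))
```

When archiving a log takes longer than filling one, ARCH falls further behind with every switch, and extra redo log groups only postpone the stall.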

If the ARCH process gets no response from an archive destination, for example because the network or the remote server is down, the outcome differs from the slow case. The unavailable destination is simply closed for archiving, and ARCH continues to archive to the destinations that are still available. The OPTIONAL and MANDATORY attributes only control whether an online redo log file can be reused when it has not been completely archived to that destination.
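The reuse rule can be written down as a small predicate. This is a simplified sketch: the function name can_reuse_online_log and the tuple layout are hypothetical, and it deliberately ignores other settings that affect reuse, such as LOG_ARCHIVE_MIN_SUCCEED_DEST.

```python
def can_reuse_online_log(destinations):
    """Decide whether an online redo log may be overwritten.

    destinations is a list of (attribute, archived_ok) tuples, where
    attribute is "MANDATORY" or "OPTIONAL". In this simplified model a
    failed OPTIONAL destination never blocks reuse, while every
    MANDATORY destination must hold a complete copy.
    """
    return all(ok for attribute, ok in destinations if attribute == "MANDATORY")


# Local disk is MANDATORY and done; an OPTIONAL standby is unreachable.
print(can_reuse_online_log([("MANDATORY", True), ("OPTIONAL", False)]))

# The MANDATORY destination has not received the log yet.
print(can_reuse_online_log([("MANDATORY", False), ("OPTIONAL", True)]))
```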

 

Archive destinations reached over the network (NFS or a standby) deserve extra attention. If the data is transmitted over a slow network, no error is raised and the destination is not closed. Transmission continues, but very slowly. Eventually the database is suspended for lack of available redo log groups.

 

Oracle 9.2.0.5 introduced a hidden parameter, _LOG_ARCHIVE_CALLOUT, that lets the DBA change this default behavior:

_LOG_ARCHIVE_CALLOUT='LOCAL_FIRST=TRUE'

If this parameter is set and the standby destination uses the ARCH process for redo transport, ARCH archives to the local destination first. Once the redo log has been completely and successfully archived to at least one local destination, it is then transmitted to the remote destination. This is the default behavior since Oracle 10g Release 1.
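To see why the ordering matters, the two behaviors can be compared with a small calculation. It assumes one local and one remote destination with the local one listed first; the function name local_copy_done_ms and the timings are hypothetical.

```python
def local_copy_done_ms(chunks, local_ms, remote_ms, local_first):
    """Time until the local archived copy of a redo log is complete.

    With local_first the whole log is written locally before anything is
    shipped. Otherwise each chunk goes to the local and then the remote
    destination, so every local write after the first one waits behind
    a remote write.
    """
    if local_first:
        return chunks * local_ms
    return chunks * local_ms + (chunks - 1) * remote_ms


# 100 chunks, 10 ms per local write, 500 ms per remote write (made-up values).
print(local_copy_done_ms(100, 10, 500, local_first=True))
print(local_copy_done_ms(100, 10, 500, local_first=False))
```

Under these assumptions the local copy no longer depends on the speed of the remote link once LOCAL_FIRST is in effect.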

 

 

@>@getPar
Enter value for parameter: _log_archive_buffer_size
old   6: ksppinm like lower('%&parameter%')
new   6: ksppinm like lower('%_log_archive_buffer_size%')

NAME                                     VALUE                                    DESCRIPTION
---------------------------------------- ---------------------------------------- ----------------------------------------
_log_archive_buffer_size                 2048                                     Size of each archival buffer in log file
                                                                                  Blocks

The maximum value is 2048 in Oracle 9i on Linux.

 

[oracle@chen ~]$ ps -ef|grep arc
oracle   12148     1  0 03:25 ?        00:00:00 ora_arc0_chen
oracle   12150     1  0 03:25 ?        00:00:00 ora_arc1_chen
oracle   12529 12495  1 04:34 pts/9    00:00:00 grep arc

[oracle@chen ~]$ strace -p 12148 -o ora_arc0_chen.log &
[1] 12532

[oracle@chen ~]$ strace -p 12150 -o ora_arc1_chen.log &
[2] 12559

@>alter system switch logfile;
System altered.

 

Records similar to the following appear in the trace files:

… …
open("/u03/oradata/9208/chen/redo02.log", O_RDONLY|O_DIRECT|O_LARGEFILE) = 16
… …
open("/u03/oradata/arch/1_329.dbf", O_RDWR|O_SYNC|O_DIRECT|O_LARGEFILE) = 18
… …
pread(16, "I\1\1c\241\201&\270\326 \t \t\200"..., 1048576, 512) = 1048576
… …
pwrite(18, "I\1\1\10M\241\201&D\212\233                "..., 1048576, 1049088) = 1048576
… …
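The length and offset arguments in the pread and pwrite calls above line up with the 2048-block buffer. The short calculation below converts them to redo block units; the helper name describe is hypothetical, and reading the 512-byte starting offset as a skipped header block is an assumption rather than something the trace proves.

```python
REDO_BLOCK_SIZE = 512  # bytes, as stated earlier in the post


def describe(length, offset):
    """Convert a byte length and file offset into redo block units."""
    return length // REDO_BLOCK_SIZE, offset // REDO_BLOCK_SIZE


# (length, offset) pairs copied from the trace lines above.
calls = {"pread": (1048576, 512), "pwrite": (1048576, 1049088)}

for name, (length, offset) in calls.items():
    blocks, start_block = describe(length, offset)
    print(f"{name}: {blocks} blocks starting at block {start_block}")
```

Each call moves exactly 2048 blocks, matching _log_archive_buffer_size. The pread starts at block 1 and the pwrite at block 2049, so the two excerpts belong to consecutive 1 MB chunks rather than to the same one.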

 

 

@>@getPar
Enter value for parameter: _log_archive_callout
old   6: ksppinm like lower('%&parameter%')
new   6: ksppinm like lower('%_log_archive_callout%')

NAME                                     VALUE                                    DESCRIPTION
---------------------------------------- ---------------------------------------- ----------------------------------------
_log_archive_callout

 

 

Log gap resolution

Since Oracle 9i Release 1, automatic gap resolution has been part of log transport processing. When the LGWR or ARCH process begins to send redo to the standby, the sequence number of the log being archived is compared with the last sequence received by the RFS process on the standby. If RFS detects that the sequence number of the incoming archived log is greater than the last sequence received plus one, it piggybacks a request to the primary to send the missing archived logs.
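The comparison RFS performs can be illustrated with a few lines of code. This is only a sketch of the rule as described above; the function name missing_sequences and the sequence numbers are hypothetical.

```python
def missing_sequences(last_received, incoming):
    """Return the log sequence numbers that form the gap, if any.

    A gap exists when the incoming sequence is greater than the last
    received sequence plus one.
    """
    if incoming > last_received + 1:
        return list(range(last_received + 1, incoming))
    return []


# The standby last received sequence 325 and sequence 329 now arrives.
print(missing_sequences(325, 329))

# The next log in order arrives, so nothing has to be requested.
print(missing_sequences(328, 329))
```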

 

Starting in Oracle 9i Release 2, automatic gap resolution was enhanced. In addition to the above, the ARCH process on the primary database polls all standby databases every minute to check for a gap in the sequence of archived redo logs. If a gap is detected, ARCH sends the missing archived redo log files to the standby databases that reported it. Once the gap is resolved, the LGWR process is notified that the site is up to date.

 

If the MRP process finds that an archived log is missing or corrupt on the standby, FAL is called to resolve the gap or obtain a new copy. Since MRP has no direct communication link with the primary, it must use the FAL_SERVER and FAL_CLIENT initialization parameters to resolve the gap. Both parameters must be set on the standby site. They are defined as:

 

FAL_SERVER: An Oracle Net service name that exists in the standby tnsnames.ora file and points to the primary database listener.

FAL_CLIENT: An Oracle Net service name that exists in the primary tnsnames.ora file and points to the standby database listener.

 

When MRP needs to resolve a gap, it uses the value of FAL_SERVER to call the primary database. Once communication with the primary has been established, MRP passes the FAL_CLIENT value to the primary ARCH process. The primary ARCH process locates the remote archive destination with the corresponding service name and ships the missing archived redo logs.

 

 

Archive process

When we take a database backup, we generally archive the current redo log first and, after backing up the datafiles, archive the current redo log again. We can do this with either alter system switch logfile or alter system archive log current, but the two commands differ. The switch command triggers a background process to archive the log and returns to the command line immediately, so we can move on to the next task even though archiving may not have completed. The archive command makes the user process itself archive the log and does not return to the command line until archiving is complete. Backup scripts should therefore use the archive command to archive the current log.

 

We can use the following experiment to confirm it.

 

@>@myid
Wrote file /tmp/myvar.sql
sid:10 serial:7 pid:11 spid:12691

[oracle@chen ~]$ strace -p 12691 -o 12691.log &
[3] 12693

@>alter system archive log current;
System altered.
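The difference between the two commands resembles the difference between starting work in a background thread and doing it in the calling thread. The analogy below is only a sketch: the function names are hypothetical, and the 0.2 second sleep stands in for the time needed to copy a redo log.

```python
import threading
import time


def archive_log(done):
    """Pretend to archive a redo log, then flag completion."""
    time.sleep(0.2)  # stand-in for copying the log to its destinations
    done.set()


def switch_logfile():
    """Like 'alter system switch logfile': hand off and return at once."""
    done = threading.Event()
    threading.Thread(target=archive_log, args=(done,)).start()
    return done


def archive_log_current():
    """Like 'alter system archive log current': the caller does the work."""
    done = threading.Event()
    archive_log(done)  # blocks until archiving has finished
    return done


background = switch_logfile()
print("switch returned, archived yet?", background.is_set())

foreground = archive_log_current()
print("archive returned, archived yet?", foreground.is_set())

background.wait()  # let the background work finish before exiting
```

A backup script that continues straight after the first call cannot assume the archived log exists yet, whereas after the second call it can.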

 


1 Comment »

  1. Excellent. I was looking for this missing information for DG.

    Comment by Asif Jafri — May 11, 2009 @ 4:00 pm | Reply

