Wednesday, September 21, 2011

log file sync" wait is topping AWR



From the AWR report, we can see that the "log file sync" average wait time was 24 ms.

We normally expect the average wait time for "log file sync" to be less than 5 ms.

So, first, check whether disk I/O to the redo logs is slow.

High 'log file sync' waits could be caused by various reasons, some of these are:
1. slower writes to redo logs
2. very high number of COMMITs
3. insufficient redo log buffers
4. checkpoint
5. process post issues

The five reasons listed above are only examples; we need more information to determine the actual cause of the wait before we can make recommendations.
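
For causes 2 and 3 in the list above, two quick indicators are the 'user commits' rate and the 'redo buffer allocation retries' statistic (a steadily growing retries value suggests the log buffer is too small). A minimal sketch, as a rough first check only:

-- Sketch: statistics relevant to causes 2 and 3 above (cumulative since startup)
SELECT name, value
FROM   v$sysstat
WHERE  name IN ('user commits', 'user rollbacks', 'redo buffer allocation retries')
ORDER  BY name;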

Please also download and run the script from note 1064487.1.

When a user session (foreground process) COMMITs (or rolls back), the session's redo information needs to be flushed to the redo logfile. The user session posts LGWR to write all required redo from the log buffer to the redo log file. When LGWR has finished, it posts the user session.

The user session waits on this wait event while waiting for LGWR to post it back to confirm all redo changes are safely on disk.

This may be described further as the time the user session (foreground process) spends waiting for redo to be flushed to make the commit durable.

Therefore, we may think of these waits as commit latency from the foreground process (or commit client generally).
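
If you want a quick system-wide comparison of the two waits outside of AWR, a minimal sketch against V$SYSTEM_EVENT (figures are cumulative since instance startup) is usually enough:

-- Sketch: compare 'log file sync' with the underlying redo write
SELECT event,
       total_waits,
       ROUND(time_waited_micro / 1000)                             AS time_waited_ms,
       ROUND(time_waited_micro / NULLIF(total_waits, 0) / 1000, 2) AS avg_wait_ms
FROM   v$system_event
WHERE  event IN ('log file sync', 'log file parallel write')
ORDER  BY event;

If 'log file parallel write' accounts for most of the 'log file sync' average, the time is going into the redo write itself (i.e. I/O) rather than into posting.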

For further information about "log file sync", please refer to MOS note 34592.1 and note 857576.1 "How to Tune Log File Sync".


1. For analysis, check the AWR report

1.1 "log file sync" wait is topping AWR:

Top 5 Timed Foreground Events

Event          Waits   Time(s)  Avg wait (ms)  % DB time  Wait Class
log file sync  59,558  3,275    55             54.53      Commit     <===== average wait time 55 ms over the 8-hr period. Very high.
DB CPU                 1,402                   23.34

1.2. Most of the "log file sync" time is contributed by "log file parallel write":

Event                    Waits    %Time-outs  Total Wait Time (s)  Avg wait (ms)  Waits/txn  % bg time
db file parallel write   136,730  0           3,957                29             2.44       42.37
log file parallel write  61,635   0           2,810                46             1.10       30.09      <===== 46 ms, or ~84% of the "log file sync" time
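
If you prefer to pull the same figures straight from the AWR repository rather than the formatted report (Diagnostics Pack license assumed), a sketch against DBA_HIST_SYSTEM_EVENT could look like this; the counters are cumulative, so per-snapshot deltas are taken with LAG:

-- Sketch: per-snapshot waits and average wait time from the AWR repository
-- (counters reset when the instance restarts, which shows up as negative deltas)
SELECT snap_id, instance_number, event_name,
       total_waits - LAG(total_waits)
           OVER (PARTITION BY instance_number, event_name ORDER BY snap_id) AS waits_in_snap,
       ROUND( (time_waited_micro - LAG(time_waited_micro)
                  OVER (PARTITION BY instance_number, event_name ORDER BY snap_id))
            / NULLIF(total_waits - LAG(total_waits)
                  OVER (PARTITION BY instance_number, event_name ORDER BY snap_id), 0)
            / 1000, 2)                                                      AS avg_wait_ms
FROM   dba_hist_system_event
WHERE  event_name IN ('log file sync', 'log file parallel write')
ORDER  BY snap_id, instance_number, event_name;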

2. Output of the script from note 1064487.1

2.1. The wait histogram shows the same pattern:
INST_ID EVENT WAIT_TIME_MILLI WAIT_COUNT
---------- ------------------------------------- -------------------------- ----------------------
1 log file sync 1 176
1 log file parallel write 1 117
1 LGWR wait for redo copy 1 150

1 log file sync 2 3013
1 log file parallel write 2 3433
1 LGWR wait for redo copy 2 8

1 log file sync 4 15254
1 log file parallel write 4 18064
1 LGWR wait for redo copy 4 6

1 log file sync 8 44676
1 log file parallel write 8 51155
1 LGWR wait for redo copy 8 9
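
The histogram above maps directly to GV$EVENT_HISTOGRAM, so if you cannot run the script, the same breakdown can be pulled with a simple query (WAIT_TIME_MILLI is the upper bound of each bucket, i.e. "waits of up to N ms"):

-- Sketch: wait-time histogram for the redo-related events
SELECT inst_id, event, wait_time_milli, wait_count
FROM   gv$event_histogram
WHERE  event IN ('log file sync', 'log file parallel write', 'LGWR wait for redo copy')
ORDER  BY inst_id, event, wait_time_milli;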

2.2. Spikes of log file sync are really bad (hundreds of ms):

APPROACH: These are the minutes where the avg log file sync time
was the highest (in milliseconds).
MINUTE INST_ID EVENT
--------------------------------------------------------------------------- ---------- ------------------------------
TOTAL_WAIT_TIME WAITS AVG_TIME_WAITED
----------------- ---------- -----------------
Aug18_0311 1 log file sync
48224.771 58 831.462

Aug18_0312 1 log file sync
100701.614 117 860.698

Aug18_0313 1 log file sync
20914.552 33 633.774

Aug18_0613 1 log file sync
84651.481 93 910.231

Aug18_0614 1 log file sync
139550.663 139 1003.962

2.3. User commits averaged 2/sec, which is low. (Caveat: this number is an 8-hr average.)
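
The commit rate can be sanity-checked from V$SYSSTAT. Like the AWR figure, this is an average (here, since instance startup), so it can hide short bursts of commit activity:

-- Sketch: average user commits per second since instance startup
SELECT s.value                                                  AS user_commits,
       ROUND(s.value / ((SYSDATE - i.startup_time) * 86400), 2) AS commits_per_sec
FROM   v$sysstat s, v$instance i
WHERE  s.name = 'user commits';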


3. OSWatcher data, to pinpoint where the I/O bottleneck is


From the OSWatcher data,

I see extreme I/O contention across almost all of the disks between 4 and 7 AM. Disk utilization is between 80-100%, with the worst I/O response times over 200-300 ms. The busiest disks are sdc and sdd, with utilization constantly above 80%. Example:

avg-cpu: %user %nice %system %iowait %steal %idle
         69.62 1.48  13.46   15.44   0.00   0.00

Device:  rrqm/s  wrqm/s     r/s   w/s   rsec/s  wsec/s  avgrq-sz  avgqu-sz   await  svctm   %util
sdc        5.21    0.65  119.54  7.49  8932.90  834.85     76.89     33.03  240.00   7.87  100.00  <=== This disk is saturated
sdc1       5.21    0.65  119.54  7.49  8932.90  834.85     76.89     33.03  240.00   7.87  100.00

sdd        4.89    0.33  119.87  7.82  9386.32  103.58     74.32      6.64   52.81   7.68   98.05  <=== Same with this one
sdd1       4.89    0.33  119.87  7.82  9386.32  103.58     74.32      6.64   52.81   7.68   98.05

sde        0.00    0.33    0.65  8.47     5.21  232.57     26.07      1.79  201.43  92.86   84.69  <=== High utilization @85%
sde1       0.00    0.33    0.65  8.47     5.21  232.57     26.07      1.79  201.43  92.86   84.69

I also noticed that there were many DBMS jobs running from ~7 different databases at the same time. CPU utilization is ~80%.

zzz ***Sun Aug 21 06:02:00 EDT 2011
top - 06:02:04 up 18 days, 13:51, 6 users, load average: 16.87, 8.64, 4.56
Tasks: 1042 total, 22 running, 1019 sleeping, 0 stopped, 1 zombie
Cpu(s): 69.9%us, 10.2%sy, 0.3%ni, 0.0%id, 18.8%wa, 0.2%hi, 0.6%si, 0.0%st
Mem: 49458756k total, 49155380k used, 303376k free, 951836k buffers
Swap: 4194296k total, 1548904k used, 2645392k free, 39128504k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
7680 oracle 15 0 2255m 90m 80m S 12.9 0.2 0:00.80 ora_j002_oracl1 <=== db 1
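
To tie the OS picture back to the database, it is also worth checking how the cumulative I/O is spread across the ASM disks. A minimal sketch against V$ASM_DISK (V$ASM_DISK_STAT returns the same figures without re-running disk discovery); a heavy skew toward the disks backing sdc/sdd would confirm the hot-disk picture:

-- Sketch: cumulative I/O per ASM disk, to see whether the load is skewed to a few disks
SELECT group_number, disk_number, path, reads, writes, bytes_read, bytes_written
FROM   v$asm_disk
ORDER  BY reads + writes DESC;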


4. Trace the LGWR process while running a dummy DELETE transaction.

4.1) Create a dummy user table with, say, 1 million rows (a SQL sketch follows these steps).
4.2) At the OS prompt, run:
strace -fo lgwr_strace.out -r -p <LGWR_PID>
4.3) From a different window, in a SQL*Plus session:
DELETE FROM <dummy_table>;
4.4) COMMIT; -- IMPORTANT: this is what triggers the log flush.
4.5) CTRL-C the strace and upload the lgwr_strace.out file.
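
A minimal SQL sketch for steps 4.1, 4.3 and 4.4, using a hypothetical throwaway table name LGWR_SYNC_TEST (any dummy table of roughly this size will do):

-- Step 4.1: hypothetical dummy table with ~1 million rows
CREATE TABLE lgwr_sync_test AS
SELECT level AS id, RPAD('x', 100, 'x') AS padding
FROM   dual
CONNECT BY level <= 1000000;

-- Steps 4.3 / 4.4: generate redo and force LGWR to flush it (run while strace is attached)
DELETE FROM lgwr_sync_test;
COMMIT;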


Conclusion
All this indicates an I/O capacity / hot-disk issue. You are over-leveraging the server in terms of I/O capacity, especially on the two busiest disks, sdc and sdd.

You would experience the same slow I/O in your other databases if those two disks are shared across them.

Action Plan

1. Review the ASM diskgroup configuration. Make sure the diskgroups are spread across all available disks.
Example: avoid creating a DATA diskgroup on a couple of disks that are shared among ALL your databases.
2. Consider using EXTERNAL redundancy for your DEV databases (especially for the REDO logs) to reduce the I/O load, provided that the reduced protection is acceptable there.
3. Stagger the DBMS job schedules across the multiple DBs so that the workload is more evenly distributed throughout the day,
especially if the jobs are of a similar nature and all run in, say, the 4-7 AM window (a sketch for listing the job schedules follows below).
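
For item 3, a quick way to see what is scheduled where is to list the enabled jobs in each database. A sketch against DBA_SCHEDULER_JOBS (older DBMS_JOB jobs would be in DBA_JOBS instead):

-- Sketch: enabled scheduler jobs and when they are next due to run (run in each DB)
SELECT owner, job_name, enabled, next_run_date, repeat_interval
FROM   dba_scheduler_jobs
WHERE  enabled = 'TRUE'
ORDER  BY next_run_date;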

Hope this helps! Rupam