ACFS filesystem hangs

Roshan
Oracle DB 19.18
standalone
GRID 19c

Hello Erman,

Could you please advise why the mount hangs?

total 0
brwxrwx--- 1 root asmadmin 251, 221697 Jun  1 09:46 interface01-433
brwxrwx--- 1 root asmadmin 251,   1025 Jun  1 10:32 osashare01-2
[grid@T24R18DBDEV asm]$

/bin/mount -t acfs  /dev/asm/interface01-433 /osashare

There are no errors in the mount trace file:

[grid@T24R18DBDEV trace]$ cat mount_36970.trc
Trace file /grid/base/diag/crs/t24r18dbdev/crs/trace/mount_36970.trc
Oracle Database 19c Clusterware Release 19.0.0.0.0 - Production
Version 19.18.0.0.0 Copyright 1996, 2023 Oracle. All rights reserved.
2023-06-01 11:02:55.573 :MOUNT.ACFS:1055947392: [36970] Start: /sbin/mount.acfs.bin /dev/asm/osashare01-2 /interface -o rw
[grid@T24R18DBDEV trace]$ cat crsctl_36926.trc
Trace file /grid/base/diag/crs/t24r18dbdev/crs/trace/crsctl_36926.trc
Oracle Database 19c Clusterware Release 19.0.0.0.0 - Production
Version 19.18.0.0.0 Copyright 1996, 2023 Oracle. All rights reserved.
2023-06-01 11:00:14.408 :  CRSCTL:1703599872: crsctl_main: crsctl command failed with status 1
2023-06-01 11:00:14.408 :  CRSCTL:1703599872: ./crsctl.bin stop crs

Thanks,
Roshan

Re: ACFS filesystem hangs

Roshan
Jun  1 10:18:50 T24R18DBDEV kernel: R13: 00000000000001ed R14: 0000000000000004 R15: 0000000000000004
Jun  1 10:18:50 T24R18DBDEV kernel: INFO: task mount.acfs.bin:34014 blocked for more than 368 seconds.
Jun  1 10:18:50 T24R18DBDEV kernel:      Tainted: P           O      5.4.17-2136.314.6.3.el8uek.x86_64 #2
Jun  1 10:18:50 T24R18DBDEV kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun  1 10:18:50 T24R18DBDEV kernel: mount.acfs.bin  D    0 34014      1 0x00000084
Jun  1 10:18:50 T24R18DBDEV kernel: Call Trace:
Jun  1 10:18:50 T24R18DBDEV kernel: __schedule+0x29f/0x5f7
Jun  1 10:18:50 T24R18DBDEV kernel: schedule+0x44/0xb3
Jun  1 10:18:50 T24R18DBDEV kernel: schedule_preempt_disabled+0xe/0x14
Jun  1 10:18:50 T24R18DBDEV kernel: __mutex_lock.isra.8+0x287/0x465
Jun  1 10:18:50 T24R18DBDEV kernel: __mutex_lock_slowpath+0x13/0x19
Jun  1 10:18:50 T24R18DBDEV kernel: mutex_lock+0x2c/0x33
Jun  1 10:18:50 T24R18DBDEV kernel: get_gendisk+0x5f/0x157
Jun  1 10:18:50 T24R18DBDEV kernel: __blkdev_get+0x154/0x581
Jun  1 10:18:50 T24R18DBDEV kernel: blkdev_get+0xef/0x153
Jun  1 10:18:50 T24R18DBDEV kernel: ? blkdev_get_by_dev+0x50/0x4d
Jun  1 10:18:50 T24R18DBDEV kernel: blkdev_open+0x87/0x97
Jun  1 10:18:50 T24R18DBDEV kernel: do_dentry_open+0x143/0x3a3
Jun  1 10:18:50 T24R18DBDEV kernel: vfs_open+0x2d/0x33
Jun  1 10:18:50 T24R18DBDEV kernel: path_openat+0x334/0x161d
Jun  1 10:18:50 T24R18DBDEV kernel: ? KsCacheTagFree+0x16/0xb0 [oracleoks]
Jun  1 10:18:50 T24R18DBDEV kernel: ? KsLHInfoReleCb+0x15a/0x190 [oracleoks]
Jun  1 10:18:50 T24R18DBDEV kernel: ? KsReleaseFastMutex_debug+0x40/0x70 [oracleoks]
Jun  1 10:18:50 T24R18DBDEV kernel: ? KsLHHMRecFnRemoveInt+0x2b4/0x330 [oracleoks]
Jun  1 10:18:50 T24R18DBDEV kernel: do_filp_open+0x93/0xfa
Jun  1 10:18:50 T24R18DBDEV kernel: ? audit_alloc_name+0x8f/0xe3
Jun  1 10:18:50 T24R18DBDEV kernel: ? __alloc_fd+0x46/0x172

Re: ACFS filesystem hangs

ErmanArslansOracleBlog
Administrator
What do you have in the ACFS logs?
(The file names and the path should be similar to the following:
<Grid Infrastructure Oracle Home>/log/<hostname>/acfs/kernel/acfs.log.0)

Re: ACFS filesystem hangs

Roshan
Thanks for the update.

Unfortunately the acfs log is missing. Should I enable it or is it enabled by default?
total 16
drwxr-xr-x 2 grid oinstall 4096 Dec 29 09:59 security
drwxr-x--- 2 grid oinstall 4096 Dec 29 09:59 replication
drwxr-x--- 2 grid oinstall 4096 Dec 29 09:59 resources
drwxr-x--- 2 grid oinstall 4096 Dec 29 09:59 replicationroot
[grid@T24R18DBDEV acfs]$ cd resources/
[grid@T24R18DBDEV resources]$ ls
[grid@T24R18DBDEV resources]$ cd ..
[grid@T24R18DBDEV acfs]$ pwd
/grid/app/product/19.0.0/gridhome_1/log/t24r18dbdev/acfs
[grid@T24R18DBDEV acfs]$ find -name *acfs* .

I checked the strace output; there is a connection error.
strace_mount.out

Re: ACFS filesystem hangs

ErmanArslansOracleBlog
Administrator
Check the following MOS note.
It suits your case (maybe not 100%, because it is about formatting the file system, but the waits look similar: D state, "Connection timed out" errors and so on, and the solution may be applicable). Just check it carefully.

Create a file system with ACFS hang at mkfs -t acfs (Doc ID 2331497.1)

Your mount process is waiting (hanging) in D state,
so it is uninterruptible. This is probably related to I/O.
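
To confirm which processes are currently in D state, something like the following may help. It is only a minimal sketch, assuming a Linux host with the procps version of ps; the function name list_d_state is made up for illustration and is not part of any Oracle tooling. It filters ps-style rows read from stdin, so it also works on saved output.

```shell
# list_d_state: hypothetical helper (illustration only).
# Reads rows in the layout "PID STAT WCHAN COMMAND" on stdin and prints
# only those whose state column starts with D (uninterruptible sleep).
# The header row is dropped because "STAT" does not start with D.
list_d_state() {
    awk '$2 ~ /^D/'
}

# Live usage, ideally as root so every process is visible:
#   ps -eo pid,stat,wchan:32,comm | list_d_state
# On kernels that expose it, the kernel stack of a stuck pid can be read with:
#   cat /proc/<pid>/stack
```

The wchan column and /proc/<pid>/stack show where in the kernel the process is sleeping, which can be compared with the call trace in /var/log/messages.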

That log may be somewhere else (the location can change with your version and environment).
You can also use the "find" command to locate it, for instance: cd / ; find . -name 'acfs*'
The strace output also reports the trace file location: /grid/base/diag/crs/t...... So check that directory as well.

"/grid/base/diag/crs/t24r18dbdev/crs/trace/mount_48167.trc"
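
A small wrapper around find can make that search repeatable. This is a sketch only: find_acfs_logs is a hypothetical name, and the directories in the usage comments are simply the ones that already appear in this thread.

```shell
# find_acfs_logs: hypothetical helper name, not an Oracle utility.
# Prints regular files whose name starts with "acfs" under the given
# directory (default: /). The pattern is quoted so the shell hands it to
# find unexpanded, and permission errors are discarded.
find_acfs_logs() {
    find "${1:-/}" -type f -name 'acfs*' 2>/dev/null
}

# Example calls (paths taken from this thread):
#   find_acfs_logs /grid/app/product/19.0.0/gridhome_1/log
#   find_acfs_logs /grid/base/diag/crs
```

Without the quotes, the shell may expand acfs* against the current directory before find ever sees it, which can silently change what is searched for.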

You can also use the ACFS utilities to get information about it:
acfsutil log: Retrieves memory diagnostic log files and manages debug settings.

Note that strace, acfs.log and dmesg are your friends for diagnosing this.

Strace findings:

48167 08:21:05.977112 openat(AT_FDCWD, "/dev/asm/interface01-433", O_RDWR|O_SYNC <unfinished ...>
48170 08:21:05.999392 <... futex resumed>) = -1 ETIMEDOUT (Connection timed out)


48169 08:21:06.307714 futex(0x1795a80, FUTEX_WAKE_PRIVATE, 1) = 0
48169 08:21:06.307815 futex(0x17a615c, FUTEX_WAIT_PRIVATE, 0, {tv_sec=0, tv_nsec=24999675} <unfinished ...>
48170 08:21:06.329436 <... futex resumed>) = -1 ETIMEDOUT (Connection timed out)
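
To pick the stuck call out of a long strace file, one option is to list the system calls that were started but never resumed. The sketch below assumes the output format shown above (strace with -f and timestamps, one "PID TIME call" per line); pending_syscalls is a made-up name. Note that "Connection timed out" is just the generic text for ETIMEDOUT: on a futex wait with a timeout it means the wait expired, which is common for polling threads and is not necessarily a network error.

```shell
# pending_syscalls: hypothetical helper (illustration only).
# Reads strace output on stdin. For each PID it remembers the last call
# marked "<unfinished ...>" and forgets it again when a matching
# "<... name resumed>" line arrives. Whatever is left at the end was
# still blocked when the trace stopped.
pending_syscalls() {
    awk '
        /<unfinished \.\.\.>/                 { pending[$1] = $0; next }
        /<\.\.\. [A-Za-z_0-9]+ resumed>/      { delete pending[$1] }
        END { for (pid in pending) print pending[pid] }
    ' | sort
}

# Usage:
#   pending_syscalls < strace_mount.out
```

Run against the excerpt above, this leaves the openat() on /dev/asm/interface01-433 from PID 48167 and the last futex wait from PID 48169. The openat() is the call that matches the blkdev_open path in the kernel call trace.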

Re: ACFS filesystem hangs

Roshan
Thanks for the update. Please find attached.
acfs.0

Re: ACFS filesystem hangs

ErmanArslansOracleBlog
Administrator
HM: WARNING syscall appears hung pid:239740  
S 239740   tid:0xffff8d2fdd8496c0 tsd:0xffff8d257f23c3a0 l_OfsVolNum:-1  
S 239740   name:mount.acfs.bin   current_record_seq_cnt:20
S 239740   total_elapsed_secs:187224    hang_elapsed_secs:187224  
S 239740   record_cnt:8   lock_cnt:3   func_cnt:5  
S 239740   last_rec_rem:LH_WaitEventGeneric
S 239740   0  L LH_WaitEventGeneric          type:375
S 239740   0  L   asmutil.c                  line:4078  
S 239740   0  L   LHV:LH_WaitEventGeneric:375 LHV2:LH_WaitEventGeneric:375
S 239740   0  L   id1:0xffffada001da7738 id2:0x000000016723282d
S 239740   0  L   mode:0x200:LH_WAIT_EVENT  
S 239740   0  L   state:blocked              prev_tid:0x0000000000000000
S 239740   0  L   blocked_ticks:187214291    held_ticks:0
S 239740   0  L   elapsed_secs:187214        seq_num:20                
K 6212538.880 KsLHHMThread[238743] Hang Analyzer: Spawning a callback to handle hung thread.
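
If the log contains many of these hang records, a summary per pid may be easier to read. The following is a rough sketch that assumes the record layout shown above ("S <pid>" lines carrying name: and hang_elapsed_secs: fields); the layout is inferred only from this excerpt, and hung_syscalls is a hypothetical name.

```shell
# hung_syscalls: hypothetical helper (illustration only).
# Reads an ACFS kernel log on stdin and, for each pid that has a
# hang_elapsed_secs field on an "S <pid>" line, prints the pid, the
# process name and how long it has been hung.
hung_syscalls() {
    awk '
        $1 == "S" {
            pid = $2
            for (i = 3; i <= NF; i++) {
                if ($i ~ /^name:/)              name[pid] = substr($i, 6)
                if ($i ~ /^hang_elapsed_secs:/) secs[pid] = substr($i, 19)
            }
        }
        END {
            for (pid in secs)
                printf "%s %s %d secs (%.1f days)\n", pid, name[pid], secs[pid], secs[pid] / 86400
        }
    '
}

# Usage (acfs.0 is the attachment name used in this thread):
#   hung_syscalls < acfs.0
```

For the record above, 187224 seconds is a bit more than two days, so this mount.acfs.bin (pid 239740) has been blocked far longer than the mount attempts traced earlier in the thread.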

Re: ACFS filesystem hangs

ErmanArslansOracleBlog
Administrator
These again point me to the MOS note: Create a file system with ACFS hang at mkfs -t acfs (Doc ID 2331497.1)

Did you check the note?