The problem is that a CORBA invocation neither returns nor throws an exception when the CORBA server is stopped. In my case there is a single multi-threaded CORBA proxy (on Windows) monitoring one backend CORBA server. The IDL of the CORBA server is:
void run();
void echo();
The proxy checks the backend's health via an echo heartbeat invocation. The proxy classifies the backend as DOWN if echo throws a CORBA exception. This works most of the time, but not always when the backend is shut down:
1) If I shut down the backend machine, echo throws an exception immediately.
2) If I stop the backend CORBA process, the echo invocation hangs: it neither returns nor throws an exception on the client side. The client cannot proceed any further.
No run invocation occurs in either of these two cases.
The log with 'ORBDebugLevel 10' shows that the proxy finishes sending the echo request, and netstat shows there is still one TCP connection between the proxy and the backend machine even though the backend CORBA server process is stopped (I admit the backend server is misbehaving or badly programmed). But how can the proxy avoid being blocked by a single failed invocation if it neither returns nor throws an exception?
Below are two logs. With the default strategy:
TAO (276|1592) - Invocation_Adapter::invoke_i, making a TAO_CS_REMOTE_STRATEGY i
nvocation
TAO (276|1592) - Transport_Cache_Manager_T::is_entry_available_i[828], true, sta
te is ENTRY_IDLE_AND_PURGABLE
TAO (276|1592) - Cache_IntId_T::recycle_state, ENTRY_IDLE_AND_PURGABLE->ENTRY_BU
SY Transport[828] IntId=00A64ABC
TAO (276|1592) - Transport_Cache_Manager_T::find_i, Found available Transport[82
8] @hash:index {-1062676757:0}
TAO (276|1592) - Transport_Connector::connect, got an existing connected Transpo
rt[828] in role TAO_CLIENT_ROLE
TAO (276|1592) - Muxed_TMS[828]::request_id, <4>
TAO (276|1592) - GIOP_Message_Base::dump_msg, send GIOP message v1.2, 60 data by
tes, my endian, Type Request[4]
GIOP message - HEXDUMP 72 bytes
47 49 4f 50 01 02 01 00 3c 00 00 00 04 00 00 00 GIOP....<.......
03 00 00 00 00 00 cd cd 1b 00 00 00 14 01 0f 00 ................
52 53 54 00 00 00 6c 00 06 9c b5 00 00 00 00 00 RST...l.........
00 00 01 00 00 00 01 cd 05 00 00 00 65 63 68 6f ............echo
00 cd cd cd 00 00 00 00 ........
TAO (276|1592) - Transport[828]::drain_queue_helper, sending 1 buffers
TAO (276|1592) - Transport[828]::drain_queue_helper, buffer 0/1 has 72 bytes
TAO - Transport[828]::drain_queue_helper (0/72) - HEXDUMP 72 bytes
47 49 4f 50 01 02 01 00 3c 00 00 00 04 00 00 00 GIOP....<.......
03 00 00 00 00 00 cd cd 1b 00 00 00 14 01 0f 00 ................
52 53 54 00 00 00 6c 00 06 9c b5 00 00 00 00 00 RST...l.........
00 00 01 00 00 00 01 cd 05 00 00 00 65 63 68 6f ............echo
00 cd cd cd 00 00 00 00 ........
TAO (276|1592) - Transport[828]::drain_queue_helper, end of data
TAO (276|1592) - Transport[828]::cleanup_queue, byte_count = 72
TAO (276|1592) - Transport[828]::cleanup_queue, after transfer, bc = 0, all_sent
= 1, ml = 0
TAO (276|1592) - Transport[828]::drain_queue_helper, byte_count = 72, head_is_em
pty = 1
TAO (276|1592) - Transport[828]::drain_queue_i, helper retval = 1
TAO (276|1592) - Transport[828]::make_idle
TAO (276|1592) - Cache_IntId_T::recycle_state, ENTRY_BUSY->ENTRY_IDLE_AND_PURGAB
LE Transport[828] IntId=00A64ABC
TAO (276|1592) - Leader_Follower[828]::wait_for_event, (follower), cond <00B10DD
8>
With static Client_Strategy_Factory "-ORBTransportMuxStrategy EXCLUSIVE":
2014-Sep-03 16:34:26.143024
TAO (6664|5612) - Invocation_Adapter::invoke_i, making a TAO_CS_REMOTE_STRATEGY
invocation
TAO (6664|5612) - Transport_Cache_Manager_T::is_entry_available_i[824], true, st
ate is ENTRY_IDLE_AND_PURGABLE
TAO (6664|5612) - Cache_IntId_T::recycle_state, ENTRY_IDLE_AND_PURGABLE->ENTRY_B
USY Transport[824] IntId=00854ABC
TAO (6664|5612) - Transport_Cache_Manager_T::find_i, Found available Transport[8
24] @hash:index {-1062667171:0}
TAO (6664|5612) - Transport_Connector::connect, got an existing connected Transp
ort[824] in role TAO_CLIENT_ROLE
TAO (6664|5612) - Exclusive_TMS::request_id - <3>
TAO (6664|5612) - GIOP_Message_Base::dump_msg, send GIOP message v1.2, 60 data b
ytes, my endian, Type Request[3]
GIOP message - HEXDUMP 72 bytes
47 49 4f 50 01 02 01 00 3c 00 00 00 03 00 00 00 GIOP....<.......
03 00 00 00 00 00 cd cd 1b 00 00 00 14 01 0f 00 ................
52 53 54 00 00 00 55 00 0d 7a 85 00 00 00 00 00 RST...U..z......
00 00 01 00 00 00 01 cd 05 00 00 00 65 63 68 6f ............echo
00 cd cd cd 00 00 00 00 ........
TAO (6664|5612) - Transport[824]::drain_queue_helper, sending 1 buffers
TAO (6664|5612) - Transport[824]::drain_queue_helper, buffer 0/1 has 72 bytes
TAO - Transport[824]::drain_queue_helper (0/72) - HEXDUMP 72 bytes
47 49 4f 50 01 02 01 00 3c 00 00 00 03 00 00 00 GIOP....<.......
03 00 00 00 00 00 cd cd 1b 00 00 00 14 01 0f 00 ................
52 53 54 00 00 00 55 00 0d 7a 85 00 00 00 00 00 RST...U..z......
00 00 01 00 00 00 01 cd 05 00 00 00 65 63 68 6f ............echo
00 cd cd cd 00 00 00 00 ........
TAO (6664|5612) - Transport[824]::drain_queue_helper, end of data
TAO (6664|5612) - Transport[824]::cleanup_queue, byte_count = 72
TAO (6664|5612) - Transport[824]::cleanup_queue, after transfer, bc = 0, all_sen
t = 1, ml = 0
TAO (6664|5612) - Transport[824]::drain_queue_helper, byte_count = 72, head_is_e
mpty = 1
TAO (6664|5612) - Transport[824]::drain_queue_i, helper retval = 1
TAO (6664|5612) - Leader_Follower[824]::wait_for_event, (follower), cond <009009
10>
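As a reading aid for the hexdumps in both logs, the first 12 bytes of each dump are the GIOP message header: the magic "GIOP", the version (01 02), a flags byte whose lowest bit marks little-endian encoding, the message type (00 is Request), and the body size (3c 00 00 00, i.e. 60 bytes, matching the "60 data bytes" and "72 bytes" in the log). The sketch below decodes just that header; `GiopHeader` and `parse_giop_header` are hypothetical names for illustration, not part of TAO.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical illustration type, not a TAO class.
struct GiopHeader {
    std::uint8_t major;
    std::uint8_t minor;
    bool little_endian;         // lowest bit of the flags byte (GIOP 1.1 and later)
    std::uint8_t message_type;  // 0 = Request, 1 = Reply, ...
    std::uint32_t body_size;    // bytes that follow the 12-byte header
};

// Decodes the fixed 12-byte GIOP header; returns nullopt if the buffer is
// too short or does not start with the "GIOP" magic.
std::optional<GiopHeader> parse_giop_header(const std::uint8_t* data, std::size_t len)
{
    if (len < 12 || data[0] != 'G' || data[1] != 'I' || data[2] != 'O' || data[3] != 'P') {
        return std::nullopt;
    }
    GiopHeader h{};
    h.major = data[4];
    h.minor = data[5];
    h.little_endian = (data[6] & 0x01) != 0;
    h.message_type = data[7];
    // The size field is encoded in the byte order announced by the flags byte.
    const std::uint32_t b0 = data[8], b1 = data[9], b2 = data[10], b3 = data[11];
    h.body_size = h.little_endian
        ? (b0 | (b1 << 8) | (b2 << 16) | (b3 << 24))
        : ((b0 << 24) | (b1 << 16) | (b2 << 8) | b3);
    return h;
}
```

Fed the first 12 bytes of either dump, this yields version 1.2, little-endian, type Request, body size 60, which agrees with what TAO prints in the dump_msg line.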
I understand this might be a threading and ORB model problem. I tried a different client strategy:
static Client_Strategy_Factory "-ORBTransportMuxStrategy EXCLUSIVE -ORBClientConnectionHandler RW"
This reduces how often the problem occurs, but does not resolve it completely.
This is similar to something I experienced 6 years ago. In that case, an invocation was sent from one client thread. Before the response arrived, that thread was reused to send another CORBA request because of the reactor pattern. That seems different from the case posted here, since there is only one CORBA invocation. My recollection of the thread stack is roughly:
server.anotherInvocation() //the thread is used for another invocation.
...
server::echo() //send 1st corba invocation
....
orb->run()
The problem is that it is OS dependent when the network stack detects that the server has disconnected; sometimes it never does. A much better and safer approach is to set a RELATIVE_RT_TIMEOUT_POLICY_TYPE policy to force a timeout on the invocation; see ACE_wrappers/TAO/tests/Timeout for an example of how to do this.
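To illustrate the idea of bounding the heartbeat, here is an application-level sketch in standard C++ only. It is not the TAO policy mechanism and uses no TAO API: `BackendState`, `check_backend` and the `echo` callable are hypothetical names standing in for the proxy's real heartbeat call. The call runs on a worker thread, and the caller waits for at most a deadline, treating both an exception and a missing reply as DOWN.

```cpp
#include <chrono>
#include <exception>
#include <functional>
#include <future>
#include <memory>
#include <thread>
#include <utility>

enum class BackendState { Up, Down };

// Hypothetical helper: `echo` stands in for the blocking heartbeat invocation.
BackendState check_backend(std::function<void()> echo,
                           std::chrono::milliseconds deadline)
{
    // The promise is shared so it outlives this function if the call hangs.
    auto done = std::make_shared<std::promise<void>>();
    std::future<void> result = done->get_future();

    std::thread([done, echo = std::move(echo)]() {
        try {
            echo();
            done->set_value();
        } catch (...) {
            done->set_exception(std::current_exception());
        }
    }).detach();

    // No reply within the deadline: classify as DOWN instead of blocking.
    if (result.wait_for(deadline) != std::future_status::ready) {
        return BackendState::Down;
    }
    try {
        result.get();  // rethrows whatever echo threw
        return BackendState::Up;
    } catch (...) {
        return BackendState::Down;
    }
}
```

The worker is detached, so a call that truly never returns still leaks that thread and its connection. That is one reason the ORB-level policy is preferable in TAO: with RELATIVE_RT_TIMEOUT_POLICY_TYPE set, the invocation itself gives up and raises CORBA::TIMEOUT in the calling thread, which the existing exception-based DOWN classification already handles.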