Tags: hyperledger-fabric, hyperledger, ibm-blockchain, hyperledger-fabric-orderer

Orderers not responding after upgrading from v2.4.4 to v2.5 in a Hyperledger Fabric network


I encountered an issue after upgrading my network's orderers from v2.4.4 to v2.5. The network has a total of 5 orderers, and their endpoints were updated in the channel config before the version change. However, since the upgrade, none of the orderers respond to any requests, resulting in a loss of quorum.
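One quick sanity check is to probe each node's operations /healthz endpoint to tell whether the orderer processes are reachable at all, as opposed to running but failing to form consensus. A minimal sketch in Go; the operations port 8443 and the relaxed TLS handling are assumptions to adjust to each node's actual operations listen address and TLS settings:

    package main

    import (
        "crypto/tls"
        "fmt"
        "net/http"
        "time"
    )

    func main() {
        // Assumed operations endpoints: adjust hosts and ports to each node's
        // ORDERER_OPERATIONS_LISTENADDRESS.
        orderers := []string{
            "orderer00.domain.xyz:8443",
            "orderer01.domain.xyz:8443",
            "orderer02.domain.xyz:8443",
            "orderer03.domain.xyz:8443",
            "orderer04.domain.xyz:8443",
        }

        client := &http.Client{
            Timeout: 5 * time.Second,
            // Skipping server verification is only acceptable for a quick liveness
            // probe; use the proper TLS CA if operations TLS is enabled.
            Transport: &http.Transport{TLSClientConfig: &tls.Config{InsecureSkipVerify: true}},
        }

        for _, o := range orderers {
            resp, err := client.Get("https://" + o + "/healthz")
            if err != nil {
                fmt.Printf("%s: unreachable: %v\n", o, err)
                continue
            }
            fmt.Printf("%s: /healthz returned %s\n", o, resp.Status)
            resp.Body.Close()
        }
    }

If /healthz answers on every node while Broadcast/Deliver calls still hang, the problem is in consensus rather than basic connectivity, which matches the logs below.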

Here are some relevant logs for reference:

2023-05-22 22:12:15.626 UTC 0036 INFO [orderer.consensus.etcdraft] confirmSuspicion -> Suspecting our own eviction from the channel for 10m0.03452979s channel=publicchannel node=6
2023-05-22 22:12:15.655 UTC 0037 INFO [orderer.common.cluster.puller] fetchLastBlockSeq -> orderer01.domain.xyz:7050 is at block sequence of 34 channel=publicchannel
2023-05-22 22:12:15.655 UTC 0038 INFO [orderer.common.cluster.puller] fetchLastBlockSeq -> orderer02.domain.xyz:7050 is at block sequence of 34 channel=publicchannel
2023-05-22 22:12:15.656 UTC 0039 INFO [orderer.common.cluster.puller] fetchLastBlockSeq -> orderer03.domain.xyz:7050 is at block sequence of 34 channel=publicchannel
2023-05-22 22:12:15.657 UTC 003a INFO [orderer.common.cluster.puller] fetchLastBlockSeq -> orderer00.domain.xyz:7050 is at block sequence of 34 channel=publicchannel
2023-05-22 22:12:15.657 UTC 003b INFO [orderer.common.cluster.puller] fetchLastBlockSeq -> orderer04.domain.xyz:7050 is at block sequence of 34 channel=publicchannel
2023-05-22 22:12:15.657 UTC 003c INFO [orderer.common.cluster.puller] HeightsByEndpoints -> Returning the heights of OSNs mapped by endpoints map[orderer00.domain.xyz:7050:35 orderer01.domain.xyz:7050:35 orderer02.domain.xyz:7050:35 orderer03.domain.xyz:7050:35 orderer04.domain.xyz:7050:35] channel=publicchannel
2023-05-22 22:12:15.658 UTC 003d INFO [orderer.consensus.etcdraft] confirmSuspicion -> Last config block was found to be block [34] channel=publicchannel node=6
2023-05-22 22:12:15.659 UTC 003e INFO [comm.grpc.server] 1 -> streaming call completed grpc.service=orderer.AtomicBroadcast grpc.method=Deliver grpc.peer_address=10.60.22.184:49650 grpc.peer_subject="CN=orderer00.domain.xyz,OU=orderer,O=Hyperledger,ST=North Carolina,C=US" error="context finished before block retrieved: context canceled" grpc.code=Unknown grpc.call_duration=9.036094ms
2023-05-22 22:12:15.661 UTC 003f INFO [orderer.consensus.etcdraft] confirmSuspicion -> Cannot confirm our own eviction from the channel, our certificate was found in config block with sequence 34 channel=publicchannel node=6
2023-05-22 22:12:16.069 UTC 0040 INFO [comm.grpc.server] 1 -> streaming call completed grpc.service=orderer.AtomicBroadcast grpc.method=Deliver grpc.peer_address=10.60.27.163:60880 grpc.peer_subject="CN=orderer02.domain.xyz,OU=orderer,O=Hyperledger,ST=North Carolina,C=US" error="context finished before block retrieved: context canceled" grpc.code=Unknown grpc.call_duration=9.513544ms
2023-05-22 22:12:16.705 UTC 0041 INFO [comm.grpc.server] 1 -> streaming call completed grpc.service=orderer.AtomicBroadcast grpc.method=Deliver grpc.peer_address=10.60.27.163:60884 grpc.peer_subject="CN=orderer01.domain.xyz,OU=orderer,O=Hyperledger,ST=North Carolina,C=US" error="context finished before block retrieved: context canceled" grpc.code=Unknown grpc.call_duration=13.161555ms
2023-05-22 22:12:17.412 UTC 0042 INFO [comm.grpc.server] 1 -> streaming call completed grpc.service=orderer.AtomicBroadcast grpc.method=Deliver grpc.peer_address=10.60.27.163:60900 grpc.peer_subject="CN=orderer03.domain.xyz,OU=orderer,O=Hyperledger,ST=North Carolina,C=US" error="context finished before block retrieved: context canceled" grpc.code=Unknown grpc.call_duration=10.398404ms

I have been able to reproduce this issue consistently by upgrading the orderer version to v2.5 after updating the endpoints in the channel config. Has anyone else encountered this problem, or does anyone know how to resolve it? Any help or insights would be greatly appreciated.
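Since the repro hinges on the endpoint update, it is also worth double-checking what that update actually wrote into the channel config the orderers are now suspecting against. Below is a rough sketch that decodes a config block (for example one fetched with `peer channel fetch config`) and prints each orderer org's endpoints; it assumes the fabric-protos-go module, and the file name is a placeholder:

    package main

    import (
        "fmt"
        "log"
        "os"

        "github.com/golang/protobuf/proto"
        "github.com/hyperledger/fabric-protos-go/common"
    )

    func main() {
        // Placeholder file: a config block fetched with `peer channel fetch config`.
        raw, err := os.ReadFile("config_block.pb")
        if err != nil {
            log.Fatal(err)
        }

        block := &common.Block{}
        if err := proto.Unmarshal(raw, block); err != nil {
            log.Fatal(err)
        }

        // A config block carries one envelope whose payload wraps a ConfigEnvelope.
        env := &common.Envelope{}
        if err := proto.Unmarshal(block.Data.Data[0], env); err != nil {
            log.Fatal(err)
        }
        payload := &common.Payload{}
        if err := proto.Unmarshal(env.Payload, payload); err != nil {
            log.Fatal(err)
        }
        cfgEnv := &common.ConfigEnvelope{}
        if err := proto.Unmarshal(payload.Data, cfgEnv); err != nil {
            log.Fatal(err)
        }

        // Per-org orderer endpoints live under Orderer -> <org> -> Values["Endpoints"].
        ordererGroup, ok := cfgEnv.Config.ChannelGroup.Groups["Orderer"]
        if !ok {
            log.Fatal("no Orderer group: is this really a config block?")
        }
        for orgName, org := range ordererGroup.Groups {
            val, ok := org.Values["Endpoints"]
            if !ok {
                fmt.Printf("%s: no per-org Endpoints value\n", orgName)
                continue
            }
            addrs := &common.OrdererAddresses{}
            if err := proto.Unmarshal(val.Value, addrs); err != nil {
                log.Fatal(err)
            }
            fmt.Printf("%s: %v\n", orgName, addrs.Addresses)
        }
    }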


Solution

  • It seems I hit a bug in Hyperledger Fabric v2.5 that has already been reported in the official GitHub repository: GitHub Issue #4199. That bug is what caused my orderers to stop responding after the endpoint update and version upgrade.

    Fortunately, a Hyperledger Fabric maintainer has opened a pull request to address the issue: Pull Request #4201. The fix in that pull request resolves the problem I was facing.

    To validate the fix, I built a new Docker image using the code from the pull request. I have published and tested the image, which can be found at akshaysood112/fabric-orderer:2.5.1-snapshot-ea9b9e6fa.

    I would like to share this with the community, as it may help others hitting the same bug. Please feel free to test the provided image and confirm whether the issue is resolved for you as well; a small sketch for checking each orderer's channel status after swapping the image follows below.
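    One way to confirm the issue is gone after swapping in the rebuilt image is to ask each orderer about the channel through the channel participation API. A rough sketch, assuming the admin endpoint is enabled on port 7053 and protected by mutual TLS as in the Fabric samples; the host, certificate paths, and channel name are placeholders:

        package main

        import (
            "crypto/tls"
            "crypto/x509"
            "fmt"
            "io"
            "log"
            "net/http"
            "os"
            "time"
        )

        func main() {
            // Placeholder host and file names: the orderer admin (channel participation)
            // endpoint usually requires mutual TLS, so a client cert/key and the
            // orderer's TLS CA are loaded here.
            const adminEndpoint = "orderer00.domain.xyz:7053"

            cert, err := tls.LoadX509KeyPair("admin-client.crt", "admin-client.key")
            if err != nil {
                log.Fatal(err)
            }
            caPEM, err := os.ReadFile("tls-ca.crt")
            if err != nil {
                log.Fatal(err)
            }
            pool := x509.NewCertPool()
            pool.AppendCertsFromPEM(caPEM)

            client := &http.Client{
                Timeout: 5 * time.Second,
                Transport: &http.Transport{TLSClientConfig: &tls.Config{
                    Certificates: []tls.Certificate{cert},
                    RootCAs:      pool,
                }},
            }

            // Ask this orderer what it knows about the channel; the response includes
            // the node's current height and its relation to the channel.
            resp, err := client.Get("https://" + adminEndpoint + "/participation/v1/channels/publicchannel")
            if err != nil {
                log.Fatal(err)
            }
            defer resp.Body.Close()
            body, _ := io.ReadAll(resp.Body)
            fmt.Printf("%s: %s\n", resp.Status, body)
        }

    A healthy node should report the channel as active with a height that keeps advancing once quorum is restored.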