c++performancenetwork-programmingboostbeast

Performance drop on port from Beast.1.0.0-b66 to Boost.1.67.0.Beast


I have observed a precipitous drop in performance (and dramatic increase in CPU consumption) on migrating from Beast.1.0.0-b66 (using Boost.1.64.0) to Boost.1.67.0.Beast (i.e. with Beast integrated into Boost). No doubt I've done something wrong, but I cannot imagine what.

What was:

typedef beast::http::request<beast::http::string_body> BeastHttpRequest;

is now:

namespace http = boost::beast::http;    
typedef http::request<http::string_body> BeastHttpRequest;

and what was:

beast::http::prepare(req);
beast::http::write(stream, req);

is now:

req.prepare_payload();
http::write(stream, req);

Of course, I also had to make numerous API changes. For example:

req.fields.replace(hdrName, hdrValue);

is now:

req.set(hdrName, hdrValue);

The application works correctly -- including both SSL handshaking and proxy negotiation -- but I have to fix the spike in CPU consumption and commensurate drop in performance. I wonder if anyone knows of something obvious that I've overlooked.

EDIT: I should have mentioned that I'm using a flat_buffer for the SSL stream.

I've had a chance to profile both the "before port" and "after port" performance data. Here is the "before port" (i.e. good performance) call chain:

- 34.61% HttpRequest::send
   - 32.02% beast::http::write<boost::asio::ssl::stream<boost::asio::basic_stream_socket<boost::asio::ip::tcp, boost::asio::stream_socket_service<boost::asio::ip::tcp> >&>, true, beast::http::string_body,
      - 31.90% beast::http::write<boost::asio::ssl::stream<boost::asio::basic_stream_socket<boost::asio::ip::tcp, boost::asio::stream_socket_service<boost::asio::ip::tcp> >&>, true, beast::http::string_bo
         - 20.77% beast::http::detail::write_preparation<true, beast::http::string_body, beast::http::basic_fields<std::allocator<char> > >::init
            - 18.25% beast::http::detail::write_fields<beast::basic_streambuf<std::allocator<char> >, beast::http::basic_fields<std::allocator<char> > >
               - 10.38% beast::write<beast::basic_streambuf<std::allocator<char> >, boost::basic_string_ref<char, std::char_traits<char> > >
                  - 10.20% beast::detail::write_dynabuf<beast::basic_streambuf<std::allocator<char> >, boost::basic_string_ref<char, std::char_traits<char> > >
                     + 3.66% beast::basic_streambuf<std::allocator<char> >::prepare
                     + 2.90% beast::basic_streambuf<std::allocator<char> >::commit
                     + 1.81% boost::lexical_cast<std::string, boost::basic_string_ref<char, std::char_traits<char> > >
                     + 1.47% boost::asio::buffer_copy<beast::basic_streambuf<std::allocator<char> >::mutable_buffers_type>
               - 7.51% beast::write<beast::basic_streambuf<std::allocator<char> >, char [3]>
                  - 7.47% beast::detail::write_dynabuf<beast::basic_streambuf<std::allocator<char> >, 3ul>
                     + 3.14% beast::basic_streambuf<std::allocator<char> >::prepare
                     + 2.84% beast::basic_streambuf<std::allocator<char> >::commit
                     + 1.32% boost::asio::buffer_copy<beast::basic_streambuf<std::allocator<char> >::mutable_buffers_type>
            + 2.04% beast::http::detail::write_start_line<beast::basic_streambuf<std::allocator<char> >, beast::http::basic_fields<std::allocator<char> > >
         - 9.60% beast::http::string_body::writer::write<beast::http::detail::writef0_lambda<boost::asio::ssl::stream<boost::asio::basic_stream_socket<boost::asio::ip::tcp, boost::asio::stream_socket_serv
            - 9.57% beast::http::detail::writef0_lambda<boost::asio::ssl::stream<boost::asio::basic_stream_socket<boost::asio::ip::tcp, boost::asio::stream_socket_service<boost::asio::ip::tcp> >&>, beast:
               - 9.54% boost::asio::write<boost::asio::ssl::stream<boost::asio::basic_stream_socket<boost::asio::ip::tcp, boost::asio::stream_socket_service<boost::asio::ip::tcp> >&>, beast::detail::buffe
                  - 9.48% boost::asio::write<boost::asio::ssl::stream<boost::asio::basic_stream_socket<boost::asio::ip::tcp, boost::asio::stream_socket_service<boost::asio::ip::tcp> >&>, beast::detail::bu
                     - 8.55% boost::asio::ssl::stream<boost::asio::basic_stream_socket<boost::asio::ip::tcp, boost::asio::stream_socket_service<boost::asio::ip::tcp> >&>::write_some<boost::asio::detail::c
                        - 7.56% boost::asio::ssl::detail::io<boost::asio::basic_stream_socket<boost::asio::ip::tcp, boost::asio::stream_socket_service<boost::asio::ip::tcp> >, boost::asio::ssl::detail::wr
                           - 6.98% boost::asio::write<boost::asio::basic_stream_socket<boost::asio::ip::tcp, boost::asio::stream_socket_service<boost::asio::ip::tcp> >, boost::asio::mutable_buffers_1>
                              - 6.93% boost::asio::write<boost::asio::basic_stream_socket<boost::asio::ip::tcp, boost::asio::stream_socket_service<boost::asio::ip::tcp> >, boost::asio::mutable_buffers_1,
                                 - 6.69% boost::asio::basic_stream_socket<boost::asio::ip::tcp, boost::asio::stream_socket_service<boost::asio::ip::tcp> >::write_some<boost::asio::detail::consuming_buffer
                                    - 6.69% boost::asio::stream_socket_service<boost::asio::ip::tcp>::send<boost::asio::detail::consuming_buffers<boost::asio::const_buffer, boost::asio::mutable_buffers_1>
                                       - 6.67% boost::asio::detail::reactive_socket_service_base::send<boost::asio::detail::consuming_buffers<boost::asio::const_buffer, boost::asio::mutable_buffers_1> >
                                          - 6.51% boost::asio::detail::socket_ops::sync_send
                                             - 6.32% 0xec6d
                                                - 6.08% system_call_fastpath
                                                   - 6.07% sys_sendmsg
                                                      - 6.06% __sys_sendmsg
                                                         - 5.94% ___sys_sendmsg
                                                            + 5.84% sock_send

and here is the "after port" (i.e. poor performance) call chain:

- 53.77% HttpRequest::send
   - 53.32% boost::beast::http::write<boost::asio::ssl::stream<boost::asio::basic_stream_socket<boost::asio::ip::tcp>&>, true, boost::beast::http::basic_string_body<char, std::char_traits<char>, std::allo
      - 53.30% boost::beast::http::write<boost::asio::ssl::stream<boost::asio::basic_stream_socket<boost::asio::ip::tcp>&>, true, boost::beast::http::basic_string_body<char, std::char_traits<char>, std::a
         - 53.14% boost::beast::http::write<boost::asio::ssl::stream<boost::asio::basic_stream_socket<boost::asio::ip::tcp>&>, true, boost::beast::http::basic_string_body<char, std::char_traits<char>, std
            - 53.03% boost::beast::http::write_some<boost::asio::ssl::stream<boost::asio::basic_stream_socket<boost::asio::ip::tcp>&>, true, boost::beast::http::basic_string_body<char, std::char_traits<ch
               - 52.95% boost::beast::http::detail::write_some_impl<boost::asio::ssl::stream<boost::asio::basic_stream_socket<boost::asio::ip::tcp>&>, true, boost::beast::http::basic_string_body<char, std
                  - 36.58% boost::beast::http::serializer<true, boost::beast::http::basic_string_body<char, std::char_traits<char>, std::allocator<char> >, boost::beast::http::basic_fields<std::allocator<
                     - 35.28% boost::beast::http::serializer<true, boost::beast::http::basic_string_body<char, std::char_traits<char>, std::allocator<char> >, boost::beast::http::basic_fields<std::allocat
                        - 25.93% boost::beast::http::detail::write_some_lambda<boost::asio::ssl::stream<boost::asio::basic_stream_socket<boost::asio::ip::tcp>&> >::operator()<boost::beast::detail::buffers
                           - 25.82% boost::asio::ssl::stream<boost::asio::basic_stream_socket<boost::asio::ip::tcp>&>::write_some<boost::beast::detail::buffers_ref<boost::beast::buffers_prefix_view<boost:
                              - 25.69% boost::asio::ssl::detail::io<boost::asio::basic_stream_socket<boost::asio::ip::tcp>, boost::asio::ssl::detail::write_op<boost::beast::detail::buffers_ref<boost::beas
                                 - 22.20% boost::asio::write<boost::asio::basic_stream_socket<boost::asio::ip::tcp>, boost::asio::mutable_buffer>
                                    - 22.13% boost::asio::write<boost::asio::basic_stream_socket<boost::asio::ip::tcp>, boost::asio::mutable_buffer, boost::asio::detail::transfer_all_t>
                                       - 21.96% boost::asio::detail::write_buffer_sequence<boost::asio::basic_stream_socket<boost::asio::ip::tcp>, boost::asio::mutable_buffer, boost::asio::mutable_buffer
                                          - 21.47% boost::asio::basic_stream_socket<boost::asio::ip::tcp>::write_some<boost::asio::const_buffers_1>
                                             - 21.34% boost::asio::detail::reactive_socket_service_base::send<boost::asio::const_buffers_1>
                                                - 21.17% boost::asio::detail::socket_ops::sync_send
                                                   - 20.45% 0xec6d
                                                      - 19.63% system_call_fastpath
                                                         - 19.57% sys_sendmsg
                                                            - 19.52% __sys_sendmsg
                                                               - 19.16% ___sys_sendmsg
                                                                  + 18.90% sock_sendmsg

Solution

  • It should be faster, not slower, as the HTTP algorithms were optimized. I think I know what is going on though. It would be helpful if you could compare the performance of the two versions when using regular sockets and not SSL. boost::asio::ssl::stream has a shortcoming which is that when writing buffer sequences of length greater than one, it writes to the socket for each buffer in the sequence instead of coalescing those encrypted buffers into a single write. This can have significant performance impact.

    This really needs to be fixed in Boost.Asio but a workaround is that you can write your own stream wrapper whose write algorithm, when presented with buffer sequences whose length is greater than one, creates a new sequence of length one by using memcpy and dynamic allocation. I would also open an issue here: https://github.com/boostorg/asio/issues

    Why the difference in the two versions of Beast? The older version of Beast allocated memory during serialization to hold a linear version of the message. The current version uses an algorithm which requires no memory allocations at all. It is faster in the normal case, but as you have discovered when working with ssl::stream types it can be slower.

    See https://github.com/boostorg/beast/issues/1108