Tags: clickhouse, clickhouse-client

Connection reset by peer while uploading a large CSV to ClickHouse


Getting this error while uploading a CSV to ClickHouse with a row count > 2.5M and a number of columns > 90.

code: 210. DB::NetException: Connection reset by peer, while writing to socket (10.107.146.25:9000): data for INSERT was parsed from stdin

Here is the CREATE TABLE statement for the table:

CREATE TABLE table_names
(
    {column_names and types}
)
ENGINE = MergeTree
ORDER BY tuple()
SETTINGS index_granularity = 8192,
         allow_nullable_key = 1

Here is the command I am running to insert the CSV:

cat {filepath}.csv | sudo docker run -i --rm yandex/clickhouse-client -m --host {host}  -u {user} --input_format_allow_errors_num=10 --input_format_allow_errors_ratio=0.1  --max_memory_usage=15000000000 --format_csv_allow_single_quotes 0 --input_format_skip_unknown_fields 1 --query='INSERT INTO table_name FORMAT CSVWithNames'

This is the error logged in the query_log system table in ClickHouse:

Code: 33, e.displayText() = DB::Exception: Cannot read all data. Bytes read: 65735. Bytes expected: 134377. (version 21.6.5.37 (official build))

Here is the stack trace (again from the query_log table):

0. DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int, bool) @ 0x8b6cbba in /usr/bin/clickhouse
1. DB::ReadBuffer::readStrict(char*, unsigned long) @ 0x8ba7c4d in /usr/bin/clickhouse
2. DB::CompressedReadBufferBase::readCompressedData(unsigned long&, unsigned long&, bool) @ 0xf2347bc in /usr/bin/clickhouse
3. DB::CompressedReadBuffer::nextImpl() @ 0xf233f27 in /usr/bin/clickhouse
4. void DB::readVarUIntImpl<false>(unsigned long&, DB::ReadBuffer&) @ 0x8ba7eac in /usr/bin/clickhouse
5. ? @ 0xf40843b in /usr/bin/clickhouse
6. DB::SerializationString::deserializeBinaryBulk(DB::IColumn&, DB::ReadBuffer&, unsigned long, double) const @ 0xf40723b in /usr/bin/clickhouse
7. DB::ISerialization::deserializeBinaryBulkWithMultipleStreams(COW<DB::IColumn>::immutable_ptr<DB::IColumn>&, unsigned long, DB::ISerialization::DeserializeBinaryBulkSettings&, std::__1::shared_ptr<DB::ISerialization::DeserializeBinaryBulkState>&, std::__1::unordered_map<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, COW<DB::IColumn>::immutable_ptr<DB::IColumn>, std::__1::hash<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::equal_to<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const, COW<DB::IColumn>::immutable_ptr<DB::IColumn> > > >*) const @ 0xf3d4dd5 in /usr/bin/clickhouse
8. DB::SerializationNullable::deserializeBinaryBulkWithMultipleStreams(COW<DB::IColumn>::immutable_ptr<DB::IColumn>&, unsigned long, DB::ISerialization::DeserializeBinaryBulkSettings&, std::__1::shared_ptr<DB::ISerialization::DeserializeBinaryBulkState>&, std::__1::unordered_map<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, COW<DB::IColumn>::immutable_ptr<DB::IColumn>, std::__1::hash<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::equal_to<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const, COW<DB::IColumn>::immutable_ptr<DB::IColumn> > > >*) const @ 0xf3f550f in /usr/bin/clickhouse
9. DB::NativeBlockInputStream::readData(DB::IDataType const&, COW<DB::IColumn>::immutable_ptr<DB::IColumn>&, DB::ReadBuffer&, unsigned long, double) @ 0xfa9f8f5 in /usr/bin/clickhouse
10. DB::NativeBlockInputStream::readImpl() @ 0xfaa07b3 in /usr/bin/clickhouse
11. DB::IBlockInputStream::read() @ 0xf30f452 in /usr/bin/clickhouse
12. DB::TCPHandler::receiveData(bool) @ 0x104403c4 in /usr/bin/clickhouse
13. DB::TCPHandler::receivePacket() @ 0x10435bec in /usr/bin/clickhouse
14. DB::TCPHandler::readDataNext(unsigned long, long) @ 0x10437e5f in /usr/bin/clickhouse
15. DB::TCPHandler::processInsertQuery(DB::Settings const&) @ 0x1043625e in /usr/bin/clickhouse
16. DB::TCPHandler::runImpl() @ 0x1042eb09 in /usr/bin/clickhouse
17. DB::TCPHandler::run() @ 0x10441839 in /usr/bin/clickhouse
18. Poco::Net::TCPServerConnection::start() @ 0x12a3fd4f in /usr/bin/clickhouse
19. Poco::Net::TCPServerDispatcher::run() @ 0x12a417da in /usr/bin/clickhouse
20. Poco::PooledThread::run() @ 0x12b7ab39 in /usr/bin/clickhouse
21. Poco::ThreadImpl::runnableEntry(void*) @ 0x12b76b2a in /usr/bin/clickhouse
22. start_thread @ 0x9609 in /usr/lib/x86_64-linux-gnu/libpthread-2.31.so
23. __clone @ 0x122293 in /usr/lib/x86_64-linux-gnu/libc-2.31.so
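
For reference, entries like the ones above can be pulled from the system.query_log table with a one-off clickhouse-client call along these lines (same docker image and {host}/{user} placeholders as the insert command; the filter on type is just one way to narrow it down):

# Sketch: fetch the most recent failed query's exception and stack trace
sudo docker run --rm yandex/clickhouse-client --host {host} -u {user} \
  --query="SELECT event_time, exception_code, exception, stack_trace
           FROM system.query_log
           WHERE type = 'ExceptionWhileProcessing'
           ORDER BY event_time DESC
           LIMIT 1
           FORMAT Vertical"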

I initially thought this was due to the partitioning and order key, so I removed both, but the same issue still occurs, and the inserted row count falls short of the source file by more than 1M rows.


Solution

  • The error DB::Exception: Cannot read all data. means the data you're trying to insert is somehow corrupted. Most likely some of your 2.5M rows are missing fields, or the content of some rows doesn't match the column types.

    I would suggest inserting in smaller batches so you can narrow down where the problem in your data is: the batches before the corrupted row will succeed, and the first failing batch tells you roughly where to look. A sketch of that approach is below.
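
A minimal sketch of that batching idea, assuming a bash shell and the same docker image as in the question; the file name data.csv, the chunk size, and the chunk_ prefix are illustrative, and the {host}/{user} placeholders still need to be filled in. It first does a rough per-row field count to spot truncated rows, then splits the file and inserts each chunk separately so the first failing batch points to a narrow range of rows.

#!/usr/bin/env bash
set -u

FILE=data.csv        # illustrative path; replace with your CSV
CHUNK_ROWS=100000    # rows per batch; pick whatever is convenient

# 1) Rough sanity check: compare each row's field count with the header's.
#    Naive (breaks on quoted fields containing commas), but catches
#    grossly truncated rows quickly.
WANT=$(head -1 "$FILE" | awk -F',' '{print NF}')
awk -F',' -v want="$WANT" 'NR > 1 && NF != want {print "row " NR ": " NF " fields"}' "$FILE" | head

# 2) Insert in chunks: re-attach the header to every chunk so CSVWithNames
#    still works, and stop on the first failing batch.
HEADER=$(head -1 "$FILE")
tail -n +2 "$FILE" | split -l "$CHUNK_ROWS" - chunk_

for f in chunk_*; do
  { echo "$HEADER"; cat "$f"; } | \
    sudo docker run -i --rm yandex/clickhouse-client \
      --host {host} -u {user} --format_csv_allow_single_quotes 0 \
      --query='INSERT INTO table_name FORMAT CSVWithNames' \
    || { echo "batch $f failed"; break; }
done

The error-tolerance flags (input_format_allow_errors_num / input_format_allow_errors_ratio) from the original command are deliberately left out here, so a corrupted batch fails loudly instead of being partially absorbed.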