mysql, ruby, net-ssh, net-sftp

Table files transferred between servers flag table as crashed


Work has a web site that uses large data sets, load balanced between two MySQL 5.6.16-64.2 servers using MyISAM, running on Linux (2.6.32-358.el6.x86_64 GNU/Linux). The data is updated hourly from a text-based file set received from an MS-SQL database. To avoid disrupting reads on the web site, and at the same time make sure the updates don't take too long, the following process was put in place:

Have the data on a third Linux box (used only for data update processing), update the different data tables as needed, copy the physical table files to the production servers under a temporary name, and then swap the tables with a MySQL RENAME TABLE (a sketch of the swap step is shown below).
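For reference, this is roughly what the swap step looks like, as a minimal sketch using the classic mysql gem available for Ruby 1.8.7; the table names (big_data for the live table, big_data_tmp for the freshly copied files) and the connection variables are placeholders, not the actual schema or script:

require 'mysql'

# RENAME TABLE changes both names in one atomic statement, so web readers
# never see the table missing during the swap. Names are placeholders.
dbh = Mysql.real_connect(host_server, db_user, db_pwd, db_name)
begin
  dbh.query("RENAME TABLE big_data TO big_data_old, big_data_tmp TO big_data")
  # Discard the previous copy (optional, depends on the cleanup policy).
  dbh.query("DROP TABLE big_data_old")
ensure
  dbh.close
end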

But every time, the table (under the temporary name) is seen by the destination MySQL servers as crashed and requires repair. The repair takes too long, so forcing a repair before doing the table swap is not an option.

The processing is programmed in Ruby 1.8.7, with one thread per server (as an FYI, the problem also occurs when running against a single server without threads).
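A rough illustration of the per-server threading; servers and copy_files_to are placeholders standing in for the real host list and for the SFTP/SSH steps shown below:

require 'thread'

# One worker thread per destination server; each runs the full upload and
# move sequence for its host, and the main thread waits for all of them.
threads = servers.map do |server|
  Thread.new(server) { |srv| copy_files_to(srv) }
end
threads.each { |t| t.join }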

The steps to perform the file copy are as follows:

Use Net::SFTP to transfer the files to a destination folder that is not the database folder (done due to permissions). Below is a code example of the file transfer for the main table files (if a table also has partition files, they are transferred separately and rcpFile is assigned differently to match the temporary name). For speed, the files are uploaded in parallel:

require 'net/sftp'

Net::SFTP.start(host_server, host_user, :password => host_pwd) do |sftp|
  # Queue all uploads asynchronously; each file gets the temporary name.
  uploads = fileList.map { |f|
    rcpFile = File.basename(f, File.extname(f)) + prcExt + File.extname(f)
    sftp.upload(f, "/home/#{host_user}/#{rcpFile}")
  }
  # Block until every transfer has finished before closing the session.
  uploads.each { |u| u.wait }
end

Then change the owner and group of the files to the mysql user and move the files to the MySQL database folder, using Net::SSH to execute sudo shell commands:

require 'net/ssh'

Net::SSH.start(host_server, host_user, :port => host_port.to_i, :password => host_pwd) do |ssh|
  # Hand the copied files over to the mysql user, then move them into the data folder.
  doSSHCommand(ssh, "sudo sh -c 'chown mysql /home/#{host_user}/#{prcLocalFiles}'", host_pwd)
  doSSHCommand(ssh, "sudo sh -c 'chgrp mysql /home/#{host_user}/#{prcLocalFiles}'", host_pwd)
  doSSHCommand(ssh, "sudo sh -c 'mv /home/#{host_user}/#{prcLocalFiles} #{host_path}'", host_pwd)
end

The doSSHCommand method:

def doSSHCommand(ssh, cmd, pwd)
  result = ""
  ssh.open_channel do |channel|
    # Request a pty so sudo is able to prompt for the password on this channel.
    channel.request_pty do |c, success|
      raise "could not request pty" unless success

      channel.exec "#{cmd}" do |c, success|
        raise "could not execute command '#{cmd}'" unless success

        channel.on_data do |c, data|
          # Answer the sudo/password prompt; collect all other output.
          if (data[/\[sudo\]|Password/i]) then
            channel.send_data "#{pwd}\n"
          else
            result += data unless data.nil?
          end
        end
      end
    end
  end
  ssh.loop

  result
end

When the files are moved over manually with scp, followed by the same owner/group changes and the move into the database folder, the table is never flagged as crashed. Comparing file sizes between the scp and Net::SFTP copies shows no difference.
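The comparison above is by size only; as a further sanity check, here is a minimal sketch of comparing MD5 checksums of the local files against the uploaded copies, run right after the upload step. It assumes md5sum is available on the remote host and reuses the rcpFile naming convention and the doSSHCommand helper from above:

require 'net/ssh'
require 'digest/md5'

Net::SSH.start(host_server, host_user, :port => host_port.to_i, :password => host_pwd) do |ssh|
  fileList.each do |f|
    # Local checksum, read in chunks so the large table files are not loaded at once.
    md5 = Digest::MD5.new
    File.open(f, 'rb') { |io| md5.update(io.read(1024 * 1024)) until io.eof? }

    # Remote checksum of the uploaded copy under its temporary name.
    rcpFile = File.basename(f, File.extname(f)) + prcExt + File.extname(f)
    remote_md5 = doSSHCommand(ssh, "md5sum /home/#{host_user}/#{rcpFile}", host_pwd).split.first

    puts "Checksum mismatch for #{f}" unless md5.hexdigest == remote_md5
  end
end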

Other process methods have been tried, but in our experience they take too long compared to the method described above. Does anyone have an idea why the tables are being marked as crashed, and whether there is a way to avoid the crash without having to do a table repair?


Solution

  • The issue was found and solved:

    The process server's database had its table files copied from one of the production databases; the tables did not show as crashed on the process server, and there were no issues when querying or updating the data.

    While searching the web, the following SO answer was found: MySQL table is marked as crashed

    The guess was that when the tables were copied from production to the process server, the header info stayed the same and might interfere when the files were copied back to the production servers during processing. So the tables were repaired on the process server, and a few tests were run on our staging environment, where the issue was also experienced. Sure enough, that corrected the issue.

    So the final solution was to repair the tables once on the process server before having the process script run hourly.
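    For completeness, a minimal sketch of that one-off repair on the process server, again using the classic mysql gem; the connection details and table names (big_data, big_data_part1) are placeholders, not the actual schema:

    require 'mysql'

    # One-off REPAIR TABLE on the process server so the MyISAM files are
    # rebuilt locally before the hourly copy job runs. Names are placeholders.
    dbh = Mysql.real_connect('localhost', db_user, db_pwd, db_name)
    begin
      ['big_data', 'big_data_part1'].each do |tbl|
        dbh.query("REPAIR TABLE #{tbl}")
      end
    ensure
      dbh.close
    end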