c++ gpu opencl eigen viennacl

Performance Discrepancy between GPU and CPU for Matrix Multiplication: Eigen vs. ViennaCL


I'm facing a performance issue when performing matrix multiplication operations using the Eigen and ViennaCL libraries in C++. I'm comparing the performance between executing these operations on the integrated GPU of my system and on the CPU.

My system has an integrated Intel GPU, and I'm running the code on an eighth-generation Intel Core i5. To my surprise, I found that matrix multiplication takes about 200 seconds when executed on the GPU using ViennaCL, while it takes only about 20 seconds when executed on the CPU using Eigen.

I'm puzzled by this discrepancy and would like to better understand the reason behind it. Can an integrated GPU really perform that much worse than the CPU for matrix multiplication?

I'm using premake as my build system:

workspace "Project"
    configurations { "Debug", "Release" }
    location "build"

project "Project"
    kind "ConsoleApp"
    language "C++"
    targetdir "build/bin/%{cfg.buildcfg}"
    objdir "build/obj/%{cfg.buildcfg}"

    files { "src/*.cpp", "include/*.hpp" }
    includedirs { "include", "vendor/*" }

    filter "configurations:Debug"
        symbols "On"
        optimize "On"

    filter "configurations:Release"
        symbols "Off"
        optimize "On"

    filter {}
@:~/repos/cpp-projct$ tree -L 1
.
├── build
├── cr.sh
├── include
├── premake.lua
├── src
└── vendor (eigen and viennaCL here, just $ wget and $ tar)

Edit:

I've edited the code following the suggestions; the results are now written to files (list1.txt and list2.txt). I didn't test the suggestion to use integers or another element type, because I couldn't simply swap the type.

#include <Eigen/Dense>
#include <chrono>
#include <cstdlib>  // rand, RAND_MAX
#include <fstream>
#include <iostream>
#include <string>
#include <vector>
#include <viennacl/linalg/prod.hpp>  // viennacl::linalg::prod
#include <viennacl/matrix.hpp>

int main() {
  const int size = 1500;  // Size of the matrices
  std::string fileName1 = "./list1.txt";
  std::string fileName2 = "./list2.txt";

  // Creating two large matrices using ViennaCL
  viennacl::matrix<float> matrix1_viennacl(size, size);
  viennacl::matrix<float> matrix2_viennacl(size, size);

  // Creating two large matrices using Eigen
  Eigen::MatrixXf matrix1_eigen(size, size);
  Eigen::MatrixXf matrix2_eigen(size, size);

  // Initializing the matrices with random values
  for (int i = 0; i < size; ++i) {
    for (int j = 0; j < size; ++j) {
      matrix1_viennacl(i, j) = rand() / static_cast<float>(RAND_MAX);
      matrix2_viennacl(i, j) = rand() / static_cast<float>(RAND_MAX);
    }
  }

  // Initializing the matrices with the same random values
  for (int i = 0; i < size; ++i) {
    for (int j = 0; j < size; ++j) {
      matrix1_eigen(i, j) = matrix1_viennacl(i, j);
      matrix2_eigen(i, j) = matrix2_viennacl(i, j);
    }
  }

  //==============================================================

  // Performing computation with the matrices using ViennaCL and
  // measuring the execution time
  auto start_viennacl = std::chrono::steady_clock::now();
  viennacl::matrix<float> result_viennacl =
      viennacl::linalg::prod(matrix1_viennacl, matrix2_viennacl);
  auto end_viennacl = std::chrono::steady_clock::now();
  std::chrono::duration<double> time_viennacl = end_viennacl - start_viennacl;

  std::ofstream file1(fileName1);
  if (!file1.is_open()) {
    std::cout << "Err file 1" << std::endl;
    return 1;
  }
  for (int i = 0; i < size; ++i) {
    for (int j = 0; j < size; ++j) {
      file1 << result_viennacl(i, j) << std::endl;
    }
  }

  // Printing the execution time with ViennaCL
  std::cout << "Execution time with ViennaCL: " << time_viennacl.count()
            << " seconds" << std::endl;

  //=================================================================

  // Performing computation with the matrices using Eigen and
  // measuring the execution time
  auto start_eigen = std::chrono::steady_clock::now();
  Eigen::MatrixXf result_eigen = matrix1_eigen * matrix2_eigen;
  auto end_eigen = std::chrono::steady_clock::now();
  std::chrono::duration<double> time_eigen = end_eigen - start_eigen;

  std::ofstream file2(fileName2);
  if (!file2.is_open()) {
    std::cout << "Err file 2" << std::endl;
    return 1;
  }
  for (int i = 0; i < size; ++i) {
    for (int j = 0; j < size; ++j) {
      file2 << result_eigen(i, j) << std::endl;
    }
  }

  // Printing the execution time with Eigen
  std::cout << "Execution time with Eigen: " << time_eigen.count() << " seconds"
            << std::endl;

  return 0;
}
@:~/repos/cpp-template$ bash cr.sh
Execution time with ViennaCL: 9.81101 seconds
Execution time with Eigen: 0.68594 seconds
@:~/repos/cpp-template$ 

Solution

  • I believe I've figured it out!

    I wasn't configuring ViennaCL with OpenCL correctly. ViennaCL does not use OpenCL by default: unless VIENNACL_WITH_OPENCL is defined, it runs on its host (CPU) backend, so OpenCL has to be enabled explicitly.

    workspace "MeuProjeto"
        configurations { "Debug", "Release" }
        location "build"
    
    project "MeuProjeto"
        kind "ConsoleApp"
        language "C++"
        targetdir "build/bin/%{cfg.buildcfg}"
        objdir "build/obj/%{cfg.buildcfg}"
    
        files { "src/*.cpp", "include/*.hpp" }
        includedirs { "include", "vendor/eigen", "vendor/viennaCL", "vendor/viennaCL/CL" }
    
        defines { "VIENNACL_WITH_OPENCL" }
    
        filter "configurations:Debug"
            symbols "On"
            optimize "Off"
    
        filter "configurations:Release"
            symbols "Off"
            optimize "On"
    
        filter {}
    
        links { "OpenCL" }
    

    For reference, the equivalent compilation command with g++ would look like this:

    g++ -I./vendor/eigen -I./vendor/viennaCL -I./vendor/viennaCL/CL -DVIENNACL_WITH_OPENCL -O3 src/main.cpp -o my_program -lOpenCL
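    Since the whole fix hinges on one preprocessor define, it can be worth checking that the define really reaches the compiler. The following is only a minimal sketch: the function name `viennacl_backend_label` is hypothetical, and it reports nothing more than whether `VIENNACL_WITH_OPENCL` was defined when the file was compiled. It does not query ViennaCL or the OpenCL runtime.

```cpp
#include <string>

// Hypothetical helper (not part of ViennaCL): reports which backend the
// build configuration selected, based only on the preprocessor define.
std::string viennacl_backend_label() {
#ifdef VIENNACL_WITH_OPENCL
  return "OpenCL";      // -DVIENNACL_WITH_OPENCL was passed to the compiler
#else
  return "host (CPU)";  // define missing: the build uses the CPU backend
#endif
}
```

    Printing the returned label at the start of `main` makes it obvious in the program output whether the benchmark was built with the define or without it.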

    Additionally, I installed some OpenCL dependencies, such as the Intel SDK and intel-opencl-icd, although I'm not sure whether all of them were necessary.

    I had also mistakenly removed a loop that repeated the calculation multiple times, and with a single run the GPU looked slow. Reinstating the loop showed the clear advantage of using the GPU.

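      // Time the first product plus 100 repetitions with ViennaCL
      auto start_viennacl = std::chrono::steady_clock::now();
      viennacl::matrix<float> result_viennacl =
            viennacl::linalg::prod(matrix1_viennacl, matrix2_viennacl);
      for (int k = 0; k < 100; ++k) {
        result_viennacl =
            viennacl::linalg::prod(matrix1_viennacl, matrix2_viennacl);
      }
      auto end_viennacl = std::chrono::steady_clock::now();
    
      // Time 100 repetitions with Eigen
      auto start_eigen = std::chrono::steady_clock::now();
      for (int k = 0; k < 100; ++k) {
        Eigen::MatrixXf result_eigen = matrix1_eigen * matrix2_eigen;
      }
      auto end_eigen = std::chrono::steady_clock::now();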
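    The repeat-and-time pattern in the snippet above can be wrapped in a small reusable helper. This is a sketch under stated assumptions, not what I actually ran: `mean_seconds` is a hypothetical name, and unlike the snippet above it keeps the warm-up run outside the timed region. The `sync` callback is there because a GPU backend may queue work asynchronously. For ViennaCL I assume `viennacl::backend::finish()` could be passed for that purpose, while a no-op is enough for Eigen.

```cpp
#include <chrono>
#include <functional>

// Hypothetical helper: runs `work` once untimed as a warm-up, then times
// `repetitions` further calls and returns the mean seconds per call.
// `sync` is invoked before the clock is started and before it is stopped;
// for an asynchronous backend it should block until queued work is done
// (assumption: viennacl::backend::finish() for ViennaCL), otherwise a no-op.
double mean_seconds(const std::function<void()>& work,
                    const std::function<void()>& sync,
                    int repetitions) {
  work();  // warm-up run, absorbs one-time setup costs
  sync();
  const auto start = std::chrono::steady_clock::now();
  for (int k = 0; k < repetitions; ++k) {
    work();
  }
  sync();  // make sure all timed work has actually completed
  const auto end = std::chrono::steady_clock::now();
  const std::chrono::duration<double> elapsed = end - start;
  return elapsed.count() / repetitions;
}
```

    Both libraries can then be measured with the same call, passing a lambda that performs one matrix product as `work`, so the two numbers are collected in exactly the same way.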
    

    Result:

    @:~/repos/cpp-proj$ bash cr.sh (using release / optimize = true)
    Execution time with ViennaCL: 14.5453 seconds
    Execution time with Eigen: 99.242 seconds
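    To put these two times on a common scale, they can be converted to throughput. The sketch below assumes the matrices are still 1500 x 1500 as in the question, that one product costs about 2*n^3 floating-point operations, and that the ViennaCL timing covers 101 products (the first one plus the loop of 100) while the Eigen timing covers 100. The function names are hypothetical.

```cpp
// Approximate floating-point operations for one n x n by n x n product.
double matmul_flops(int n) {
  return 2.0 * n * n * n;
}

// Throughput in GFLOP/s for `products` multiplications that together
// took `seconds` of wall-clock time.
double gflops(int n, int products, double seconds) {
  return matmul_flops(n) * products / seconds / 1e9;
}
```

    Under those assumptions, `gflops(1500, 101, 14.5453)` comes out at roughly 47 GFLOP/s for ViennaCL and `gflops(1500, 100, 99.242)` at roughly 6.8 GFLOP/s for Eigen, so the GPU run is about seven times faster in this measurement.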