My goal is to compute two 4K-point FFTs at the same time. To do this, I created two kernel objects from the same kernel and enqueued both tasks back to back, i.e. without any buffer reads/writes or callbacks in between. However, I found that they do not run concurrently: the two executions happen one after the other, and there is also some idle time between them. Can someone please explain why?
I was expecting both of them to run in parallel because my FPGA still seems to have spare area: only about 38 percent of it is used.
I found this question that kind of answers my doubts. It can be found here.