Optimize outer loop with OpenMP and a reduction

Question

I struggle a bit with a function. The calculation is wrong if I try to parallelize the outer loop with a

#pragma omp parallel reduction(+:det).

Can someone show me how to solve it and why it is failing?

// template<class T> using vector2D = std::vector<std::vector<T>>;

float Det(vector2DF &a, int n)
{
  vector2DF m(n - 1, vector1DF(n - 1, 0));

  if (n == 1) return a[0][0];
  if (n == 2) return a[0][0] * a[1][1] - a[1][0] * a[0][1];

  float det = 0;
  for (int i = 0; i < n; i++)
  {
    int l = 0;
#pragma omp parallel for private(l)
    for (int j = 1; j < n; j++)
    {
      l = 0;
      for (int k = 0; k < n; k++)
      {
        if (k == i) continue;
        m[j - 1][l] = a[j][k];
        l++;
      }
    }
    det += std::pow(-1.0, 1.0 + i + 1.0) * a[0][i] * Det(m, n - 1);
  }

  return det;
}

The calculation is wrong And, what is expected output? What is actual output? What steps did you take to try and resolve the problem yourself? — Algirdas Preidžius
– Algirdas Preidžius, Commented May 19, 2017 at 13:50
All information, necessary to answer your question (or solve your problem) should be present in the question itself, and not some external links, that may expire at an unspecified time. For that reason - I refuse to follow external links. — Algirdas Preidžius
– Algirdas Preidžius, Commented May 19, 2017 at 14:03
Don't write std::pow(-1.0, 1.0 + i + 1.0) * ... which is btw the same as std::pow(-1.0, i) * .... Do this with something like (i%2 == 0 ? 1.0 : -1.0) * ... or even so: (i%2 == 0 ? det += ... : det -= ...). — BlameTheBits
– BlameTheBits, Commented May 19, 2017 at 14:16

Community · Accepted Answer · 2017-05-23 11:33:26Z

If you parallelize the outer loop, there is a race condition on this line:

m[j - 1][l] = a[j][k];

Also you likely want a parallel for reduction instead of just a parallel reduction.

The issue is, that m is shared, even though that wouldn't be necessary given that it is completely overwritten in the inner loop. Always declare variables as locally as possible, this avoids issues with wrongly shared variables, e.g.:

float Det(vector2DF &a, int n)
{
  if (n == 1) return a[0][0];
  if (n == 2) return a[0][0] * a[1][1] - a[1][0] * a[0][1];

  float det = 0;
  #pragma omp parallel reduction(+:det)
  for (int i = 0; i < n; i++)
  {
    vector2DF m(n - 1, vector1DF(n - 1, 0));
    for (int j = 1; j < n; j++)
    {
      int l = 0;
      for (int k = 0; k < n; k++)
      {
        if (k == i) continue;
        m[j - 1][l] = a[j][k];
        l++;
      }
    }
    det += std::pow(-1.0, 1.0 + i + 1.0) * a[0][i] * Det(m, n - 1);
  }
  return det;
}

Now that is correct, but since m can be expensive to allocate, performance could benefit from not doing it in each and every iteration. This can be done by splitting parallel and for directives as such:

float Det(vector2DF &a, int n)
{
  if (n == 1) return a[0][0];
  if (n == 2) return a[0][0] * a[1][1] - a[1][0] * a[0][1];

  float det = 0;
  #pragma omp parallel reduction(+:det)
  {
    vector2DF m(n - 1, vector1DF(n - 1, 0));
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
    {
      for (int j = 1; j < n; j++)
      {
        int l = 0;
        for (int k = 0; k < n; k++)
        {
          if (k == i) continue;
          m[j - 1][l] = a[j][k];
          l++;
        }
      }
      det += std::pow(-1.0, 1.0 + i + 1.0) * a[0][i] * Det(m, n - 1);
    }
  }
  return det;
}

Now you could also just declare m as firstprivate, but that would assume that the copy constructor makes a completely independent deep-copy and thus make the code more difficult to reason about.

Please be aware that you should always include expected output, actual output and a minimal complete and verifiable example.

Collectives™ on Stack Overflow

Optimize outer loop with OpenMP and a reduction

1 Answer 1

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Related