2

I'm having issues with trying to recode a nested for loop in order to parallelize it:

for(i=0; i<n; i++)
{
    for(j=0; j<n; j++)
    {
        if(asubsref(struct1,j) > 0)
            asubsref(struct2,j) = asubsref(struct3,j) + 1;
    }
    for(j=0; j<n; j++)
        asubsref(struct1,j) = asubsref(struct2,j) - asubsref(struct3,i);
}

Struct1/struct2 are two structs with a width/height/int-float array respectively. struct3 is a float struct.

My attempt so far was to make them into two different loops but alas, it didn't work as I'd get a lot of incorrect results:

#pragma omp parallel
{
#pragma omp for private(j)
   for(i=0; i<n; i++)
   {
     for(j=0; j<n; j++)
     {
       if(asubsref(struct1,j) > 0)
         asubsref(struct2,j) += 1;
     }
   }
#pragma omp for private(j)
   for(i=0; i<n; i++)
   {
     k = asubsref(struct3,i);
     for (j=0; j<n; j++)
     {
       asubsref(struct1,j) -= k;
     }
   }
}

I'm not looking for an answer but some guidance to help me thinking how to go about this/tips toward the answer and the like.

5
  • Also, are you sure that: asubsref(bin,j) = asubsref(bin,j) + 1; and asubsref(bin,j) += 1; are identical statements? If asubsref has side effects, this may not be true. Commented May 23, 2012 at 10:26
  • What I mean is: Are the results 'correct' when you compile the second version of your code without OpenMP? Commented May 23, 2012 at 10:27
  • 1
    is there a specific reason why you want to parallelize the nested loop and not the outer-most one? Commented May 23, 2012 at 11:03
  • @AntonPegushin - It is the outermost loop that is being parallelized. But it has been split into two loops. This of course, is questionable (depends on side effect of all the calls to asubsref) Commented May 23, 2012 at 11:53
  • Sorry, as it turns out my code in the second half was completely wrong, should've checked whether it worked without the pragmas first! Also, I wanted to parallelize the outer loops, not the inner ones. Thanks for the help! Commented May 23, 2012 at 17:54

1 Answer 1

4

What I see in this code is three arrays:

array1: asubsref(seed,0) ... asubsref(seed,n-1)
array2: asubsref(bin,0) ... asubsref(bin,n-1)
array3: asubsref(w,0) ... asubsref(w,n-1)

If this assumption is correct and asubsref does not produce any side effects, the following invariant can be derived:

After the end of the execution of the loop, array2[j] is incremented by the number x which is the largest number such that sum of array3[i] for i from 0 to x is smaller than array1[j].

Here is what you can do. First, you can merge the two innermost loops since (under our assumption) their iterations are independent:

for(i=0; i<n; i++)
{
    for(j=0; j<n; j++)
    {
        if(asubsref(seed,j) > 0)
            asubsref(bin,j) = asubsref(bin,j) + 1;
        asubsref(seed,j) = asubsref(seed,j) - asubsref(w,i);
    }
}

Then interchange the innermost and outermost loops

for(j=0; j<n; j++)
{
   for(i=0; i<n; i++)
   {
        if(asubsref(seed,j) > 0)
            asubsref(bin,j) = asubsref(bin,j) + 1;
        asubsref(seed,j) = asubsref(seed,j) - asubsref(w,i);
   }
}

Now it is clear that the following code should work

#pragma omp parallel for (private i)
for(j=0; j<n; j++)
{
   for(i=0; i<n; i++)
   {
        if(asubsref(seed,j) > 0)
            asubsref(bin,j) = asubsref(bin,j) + 1;
        asubsref(seed,j) = asubsref(seed,j) - asubsref(w,i);
   }
}

while splitting the loops clearly breaks the invariant.

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.