
I'm trying to search through my code, replacing all old-style PHP array()s with the shorthand [] style. However, I'm having some trouble creating a working, reliable regex...

What I currently have: (^|[\s])array\((['"](\s\S)['"]|[^)])*\)

// Match All
array('array()')

array('key' => 'value');
array(
    'key'  => 'value',
    'key2' => '(value2)'
);
    array()
  array()
array()

// Match Specific Parts
function (array $var = array()) {}
$this->in_array(array('something', 'something'));

// Don't match
toArray()
array_merge()
in_array();

I've created a Regex101 for it...
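To see why the attempt above is unreliable, here is a small PHP sketch that runs it, unchanged, against two of the test cases (the variable names are illustrative). Because [^)] cannot step over a ), the match ends at the first closing parenthesis it meets, even one inside a quoted string:

```php
<?php
// The regex from the question, unchanged (only the quotes are escaped for PHP).
$naive = '~(^|[\s])array\(([\'"](\s\S)[\'"]|[^)])*\)~';

$samples = [
    "array('array()')",
    "array('key2' => '(value2)');",
];

$matches = [];
foreach ($samples as $sample) {
    // preg_match() returns 1 on a match and stores the matched text in $m[0].
    if (preg_match($naive, $sample, $m) === 1) {
        $matches[] = $m[0];
        echo $sample, '  =>  matched: ', $m[0], PHP_EOL;
    }
}
```

Both matches stop inside the string literal ("array('array()" and "array('key2' => '(value2)"), so replacing them would produce broken code.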

EDIT: This isn't an answer to the question, but one alternative is to use PhpStorm's "Traditional syntax array literal detected" inspection.

How to:

  • Open the Code menu
  • Click Run inspection by name... (Ctrl + Alt + Shift + I)
  • Type Traditional syntax array literal detected
  • Press <Enter>
  • Specify the scope you wish to run it on
  • Press <Enter>
  • Review/Apply the changes in the Inspection window.
  • You can't make a regex that will work 100% reliably for this. And really, there is no practical reason to do it at all. Commented Nov 7, 2014 at 14:12
  • Consistency is enough of a practical reason for me, especially with an automated process. Commented Nov 8, 2014 at 5:12
  • I get why this appeals to you, but I agree with @Jon in this specific case. To illustrate: if you changed all integers in your code to strings, you would be consistent but not practical. Commented Nov 8, 2014 at 6:01
  • Changing integers to strings is counterproductive and will cause errors; it's not a change in syntax. Completely different. Regardless, the question is not whether or not it's practical; the question is about the regular expression. Commented Nov 8, 2014 at 6:34

1 Answer

It is possible, but not trivial, since you need to fully describe two parts of the PHP syntax (strings and comments) to prevent parentheses inside them from being interpreted. Here is a way to do it with PHP itself:

$pattern = <<<'EOD'
~
(?(DEFINE)
    (?<quotes> (["']) (?: [^"'\\]+ | \\. | (?!\g{-1})["'] )*+ (?:\g{-1}|\z) )
    (?<heredoc> <<< (["']?) ([a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*) \g{-2}\R
                (?>\N*\R)*?
                (?:\g{-1} ;? (?:\R | \z) | \N*\z)
    )
    (?<string> \g<quotes> | \g<heredoc> )

    (?<inlinecom> (?:// |\# ) \N* $ )
    (?<multicom> /\*+ (?:[^*]+|\*+(?!/))*+ (?:\*/|\z))
    (?<com> \g<multicom> | \g<inlinecom> )

    (?<nestedpar> \( (?: [^()"'<]+ | \g<com> | \g<string> | < | \g<nestedpar>)*+ \) )
)

(?:\g<com> | \g<string> ) (*SKIP)(*FAIL)
|
(?<![-$])\barray\s*\( ((?:[^"'()/\#]+|\g<com>|/|\g<string>|\g<nestedpar>)*+) \)
~xsm
EOD;

// $code holds the PHP source to convert; each pass converts the outermost remaining declarations
do {
    $code = preg_replace($pattern, '[${11}]', $code, -1, $count);
} while ($count);
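A minimal end-to-end run of the pattern and loop above, as a sketch: the pattern is copied verbatim, while the sample source in $code (including the in_array() call and the comment) is made up for illustration, and $passes is a hypothetical counter added to show how many times the loop runs:

```php
<?php
$pattern = <<<'EOD'
~
(?(DEFINE)
    (?<quotes> (["']) (?: [^"'\\]+ | \\. | (?!\g{-1})["'] )*+ (?:\g{-1}|\z) )
    (?<heredoc> <<< (["']?) ([a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*) \g{-2}\R
                (?>\N*\R)*?
                (?:\g{-1} ;? (?:\R | \z) | \N*\z)
    )
    (?<string> \g<quotes> | \g<heredoc> )

    (?<inlinecom> (?:// |\# ) \N* $ )
    (?<multicom> /\*+ (?:[^*]+|\*+(?!/))*+ (?:\*/|\z))
    (?<com> \g<multicom> | \g<inlinecom> )

    (?<nestedpar> \( (?: [^()"'<]+ | \g<com> | \g<string> | < | \g<nestedpar>)*+ \) )
)

(?:\g<com> | \g<string> ) (*SKIP)(*FAIL)
|
(?<![-$])\barray\s*\( ((?:[^"'()/\#]+|\g<com>|/|\g<string>|\g<nestedpar>)*+) \)
~xsm
EOD;

// Hypothetical input: "array(" inside a string and a comment must survive.
$code = <<<'SRC'
<?php
$a = array('key' => 'value', 'key2' => '(value2)');
$b = array(1, array(2, 3));
// array() in a comment stays as it is
$c = in_array('array(', $a);
SRC;

$passes = 0;
do {
    $code = preg_replace($pattern, '[${11}]', $code, -1, $count);
    $passes++;
} while ($count);

echo $code, PHP_EOL;
echo 'passes: ', $passes, PHP_EOL;
```

The first pass converts both outer declarations, the second converts the nested array(2, 3), and the third replaces nothing, which ends the loop. The array( in the string literal and the array() in the comment are left untouched.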

The pattern contains two parts, the first is a definition part and the second is the main pattern.

The definition part is enclosed between (?(DEFINE)...) and contains named subpattern definitions for different useful elements (in particular "string", "com" and "nestedpar"). These subpatterns are used later in the main pattern.
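To illustrate the (?(DEFINE)...) mechanism on its own, here is a deliberately small sketch (the subpattern name str and the sample subject are made up, and str only handles single-quoted strings without escapes). The DEFINE block matches nothing by itself; \g<str> calls the subpattern by name, much like a function:

```php
<?php
// (?(DEFINE)...) never matches by itself: it only declares the named
// subpattern "str" (a single-quoted string without escapes), which is
// then called twice with \g<str>.
$pattern = '~(?(DEFINE)(?<str>\'[^\']*\'))\g<str>\s*=>\s*\g<str>~';

$subject = "array('key' => 'value');";

if (preg_match($pattern, $subject, $m) === 1) {
    echo $m[0], PHP_EOL; // prints: 'key' => 'value'
}
```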

The idea is never to look for the closing parenthesis of an array declaration inside a comment, a string, or a nested pair of parentheses.
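Here is the nested-parentheses part of that idea reduced to a sketch. The pattern below is a hypothetical cut-down of nestedpar that deliberately ignores strings and comments: a recursive subpattern consumes each inner (...) as a whole, so its ) is never taken for the end of the declaration, whereas a plain [^)]* stops too early:

```php
<?php
// Hypothetical cut-down pattern: "par" matches one balanced (...) group,
// recursing into itself for deeper levels; group 2 captures the array body.
$balanced = '~(?(DEFINE)(?<par>\((?:[^()]++|\g<par>)*+\)))\barray\(((?:[^()]++|\g<par>)*+)\)~';

// Naive version: the body is "anything but )", so it stops at the first ).
$naive = '~\barray\(([^)]*)\)~';

preg_match($balanced, 'array(1, f(2), 3)', $m1);
preg_match($naive, 'array(1, f(2), 3)', $m2);
preg_match($balanced, "array(')')", $m3);

echo $m1[2], PHP_EOL; // 1, f(2), 3
echo $m2[1], PHP_EOL; // 1, f(2
echo $m3[2], PHP_EOL; // a lone quote: the ) inside ')' ended the match
```

The last case shows why the full pattern also has to describe strings: without them, the ) inside ')' is taken as the closing parenthesis.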

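The first line, (?:\g<com> | \g<string> ) (*SKIP)(*FAIL), skips all comments and strings met before the next array declaration (or before the end of the string): the comment or string is matched, (*FAIL) forces the match to fail, and (*SKIP) makes the search resume right after it, so an array( inside a comment or a string is never matched.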
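The (*SKIP)(*FAIL) trick can be seen in isolation with a cut-down pattern that only knows about single-quoted strings without escapes (a sketch; the subject string and variable names are made up):

```php
<?php
$subject = "\$x = 'array(' . array(1);";

// Naive search: also finds the "array(" inside the string literal.
$plain = preg_match_all('~\barray\(~', $subject, $m1, PREG_OFFSET_CAPTURE);

// Cut-down version of the answer's first line: a single-quoted string
// (no escapes) is matched, then abandoned with (*SKIP)(*FAIL), so the
// search resumes right after its closing quote.
$skipping = preg_match_all('~\'[^\']*\'(*SKIP)(*FAIL)|\barray\(~', $subject, $m2, PREG_OFFSET_CAPTURE);

echo $plain, ' vs ', $skipping, PHP_EOL; // 2 vs 1
echo $m2[0][0][1], PHP_EOL;              // 16: offset of the real array( call
```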

The last line describes the array declaration itself. In detail:

(?<![-$])\b        # check if "array" is not a part of a variable or function name
array \s*\(
(                   # capture group 11
    (?:             # describe the possible content
        [^"'()/\#]+ # anything that is not a quote, a round bracket, a slash or a hash
      |             # OR
        \g<com>     # a comment
      |
        /           # a slash that is not a part of a comment
      |
        \g<string>  # a string
      |
        \g<nestedpar> # nested round brackets
    )*+
)
\)
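A quick check of the start of the array pattern, (?<![-$])\barray\s*\(, against the question's test cases plus a few extra ones (the extra cases and variable names are made up for illustration):

```php
<?php
// Start of the answer's array pattern: not preceded by "-" or "$",
// starting at a word boundary, optional whitespace, then "(".
$start = '~(?<![-$])\barray\s*\(~';

$cases = [
    'array()',
    'array ()',
    'toArray()',
    'array_merge()',
    'in_array();',
    '$array()',
    'function (array $var = array()) {}',
    '$obj->array()',
];

$results = [];
foreach ($cases as $case) {
    $results[$case] = preg_match($start, $case);
    printf("%-36s %s\n", $case, $results[$case] === 1 ? 'match' : 'no match');
}
```

Note that the lookbehind only inspects the single character before array: in $obj->array() that character is >, not -, so that call is not excluded.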

About nested array declarations:

The present pattern is only able to find the outermost array declaration when a block of nested array declarations is found.

The do...while loop is used to deal with nested array declarations, because it is not possible to replace several nesting levels in one pass (there is a way with preg_replace_callback, but it isn't very handy). To stop the loop, the last parameter of preg_replace, $count, is used: it receives the number of replacements performed on the target string, so the loop ends after the first pass that replaces nothing.
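A sketch of why the loop is needed, using a simplified pattern (balanced parentheses only, no strings or comments, so not a drop-in replacement for the full one). Once a pass matches an outer declaration, the inner array( lies inside that match, and preg_replace does not revisit text it has already matched; each pass therefore converts one more level. The $history array is a hypothetical addition that records the result of each pass:

```php
<?php
// Simplified pattern: group 1 is the array body, group 2 a balanced
// (...) group that recurses into itself via (?2). No strings or comments.
$pattern = '~\barray\(((?:[^()]++|(\((?:[^()]++|(?2))*+\)))*+)\)~';

$code = 'array(1, array(2, array(3)))';
$history = [];
do {
    $code = preg_replace($pattern, '[$1]', $code, -1, $count);
    if ($count > 0) {
        $history[] = $code; // result after each pass that changed something
    }
} while ($count);

print_r($history);
```

The three recorded passes are [1, array(2, array(3))], then [1, [2, array(3)]], then [1, [2, [3]]]; the fourth pass replaces nothing and stops the loop.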
