Accessing a string's bytes by index is faster by orders of magnitude. Why? PHP most likely maps each index directly to the memory location where that byte is stored, so it jumps straight to that location, reads one byte, and is done. Note that unless every character is single-byte, indexing into the string this way will not necessarily give you a usable character.
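To make the "usable character" caveat concrete, here is a minimal sketch. The sample string is my own choice, and it assumes UTF-8 input and that the mbstring extension is loaded:

```php
<?php
// "héllo" in UTF-8: the "é" is stored as two bytes (0xC3 0xA9).
$str = "h\xC3\xA9llo";

$byteCount = strlen($str);              // counts bytes
$charCount = mb_strlen($str, 'UTF-8');  // counts characters

$byte = $str[1];                        // only the first byte of "é"
$char = mb_substr($str, 1, 1, 'UTF-8'); // the complete "é"

echo $byteCount, ' bytes, ', $charCount, " characters\n";
echo bin2hex($byte), ' vs ', bin2hex($char), "\n";
```

Index 1 of the byte array holds a lone lead byte, which is not valid UTF-8 on its own, while mb_substr returns both bytes of the character.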
When accessing a character in a potentially multi-byte string (via mb_substr), a number of additional steps are needed: determine whether the character is longer than one byte, work out how many bytes it occupies, then read each of those bytes and return the individual (possibly multi-byte) character. Because characters vary in width, the byte offset of the Nth character generally cannot be computed directly, so the string has to be walked to find it.
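The extra steps can be sketched by hand. This is not how mbstring is actually implemented; utf8_char_at is a hypothetical helper I made up for illustration, and it assumes the input is valid UTF-8:

```php
<?php
// Hypothetical helper, for illustration only: returns the character at
// character index $index of a valid UTF-8 string, or null if out of range.
function utf8_char_at(string $str, int $index): ?string
{
    $bytes = strlen($str);
    $pos = 0; // current byte offset
    for ($n = 0; $pos < $bytes; $n++) {
        $lead = ord($str[$pos]);
        // The high bits of the lead byte give the length of the sequence.
        if ($lead < 0x80) {
            $width = 1;
        } elseif (($lead & 0xE0) === 0xC0) {
            $width = 2;
        } elseif (($lead & 0xF0) === 0xE0) {
            $width = 3;
        } else {
            $width = 4; // assumption: input is valid, so this is a 4-byte lead
        }
        if ($n === $index) {
            return substr($str, $pos, $width);
        }
        $pos += $width; // skip to the next character's lead byte
    }
    return null;
}

var_dump(utf8_char_at("a\xC3\xA9\xE2\x82\xACz", 2) === "\xE2\x82\xAC");
```

Every lookup starts again from byte 0, so fetching characters one index at a time in a loop repeats most of that scanning on each call.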
So I put together a simple test to show that byte-array access is orders of magnitude faster (but will not give you a usable character if a multi-byte character sits at the given byte index). I grabbed the random-string function from here (Optimal function to create a random UTF-8 string in PHP? (letter characters only)), then added the following:
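// Build the random UTF-8 test string (rand_str comes from the linked question).
$str = rand_str( 5000000, 5000000 );
// unpack('C*') yields one array element per byte, so count() gives the byte length.
$bStr = unpack('C*', $str);
$len = count($bStr)-1;

// Time byte access by index. Because of $i++ in the condition, the body sees indexes 1 through $len.
$i = 0;
$startTime = microtime(true);
while($i++<$len) {
    $char = $str[$i];
}
$endTime = microtime(true);
echo '<pre>Array access: ' . $len . ' items: ', $endTime-$startTime, ' seconds</pre>';

// Time character access via mb_substr, stopping after 100,000 characters.
$i = 0;
$len = mb_strlen($str)-1;
$startTime = microtime(true);
while($i++<$len) {
    $char = mb_substr($str, $i, 1);
    if( $i >= 100000 ) {
        break;
    }
}
$endTime = microtime(true);
echo '<pre>Substring access: ' . ($len+1) . ' (limited to ' . $i . ') items: ', $endTime-$startTime, ' seconds</pre>';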
You will notice that I have restricted the mb_substr loop to 100,000 characters. Why? It just takes too darn long to run through all 5,000,000 characters!
What were my results?
Array access: 12670380 items: 0.4850001335144 seconds
Substring access: 5000000 (limited to 100000) items: 17.00200009346 seconds
Notice that string array access got through all 12,670,380 bytes (yep, 12.6 MILLION bytes from 5 MILLION characters, many of them multi-byte) in about half a second, while mb_substr, limited to 100,000 characters, took 17 seconds! That works out to roughly 38 nanoseconds per byte versus about 170 microseconds per character, a difference of over 4,000 times.
str_split will split into bytes as well.
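If you need whole characters but want to avoid calling mb_substr once per index, one option is to split the string a single time and loop over the result. A minimal sketch, assuming valid UTF-8 input; the preg_split call with the u modifier is my suggestion and was not part of the benchmark above, so I have no timings for it:

```php
<?php
$str = "a\xC3\xA9\xE2\x82\xAC"; // "a", "é", "€" in UTF-8

// str_split: one array element per byte.
$bytes = str_split($str);

// preg_split with the u modifier: one array element per character.
$chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);

echo count($bytes), ' bytes, ', count($chars), " characters\n";
foreach ($chars as $i => $char) {
    echo $i, ': ', bin2hex($char), "\n";
}
```

The trade-off is memory: the whole string is held as an array of characters, which may be a lot for a string of several million characters.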