Speed Comparison of strlen() VS empty() in PHP Shows empty() is Faster

Thursday, October 8th, 2009

While working on a PHP project, I found myself occasionally switching between using empty() or !strlen() when I was dealing with strings. In PHP, there are often multiple solutions for the same problem and this was one of those cases. I didn’t really have a reason to use one over the other and, as most PHP developers know, you should be careful when using empty() for strings because you can run into this problem:

<?php
    // Here's a string that isn't "empty", it's zero
    $string = '0';
    if (empty($string)) {
        echo 'Oops!';
    }
?>
 
Oops!
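
If the '0' case matters to you, a strict comparison sidesteps it. Here is a minimal sketch, assuming you want null, false and '' to count as blank while '0' counts as a real value. The name is_blank_string() is a hypothetical helper of mine, not a PHP built-in:

```php
<?php
// Hypothetical helper: treats null, false and '' as blank,
// but keeps '0' (and any other non-empty string) as a real value.
function is_blank_string($value) {
    return $value === null || $value === false || $value === '';
}

var_dump(is_blank_string('0')); // bool(false)
var_dump(is_blank_string(''));  // bool(true)
var_dump(empty('0'));           // bool(true), the surprise shown above
```

Passing a variable that was never set into the helper still raises a notice, so a variable that may be unset would need an isset() check first.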

Since I felt capable of being careful with empty(), I wanted to know whether there was any speed advantage to using it over strlen(). I wrote a little function and was very surprised by the results:

Checking "if (!strlen($string))" VS "if (empty($string))"

unset($string);
Average speed gain using empty() instead of strlen(): 84.62%

$string === false;
Average speed gain using empty() instead of strlen(): 57.25%

$string = '';
Average speed gain using empty() instead of strlen(): 47.02%

Repeating the test yields similar results every time. Keep in mind, we’re talking fractions of seconds here, but it all adds up. I was expecting there to be a much smaller difference in speed between the two.

Here’s the test I wrote. It’s nothing spectacular and there may be a better way to do it but it gets the job done:

<?php
function test_strlen_vs_empty($string) {
    /* We occasionally have oddities so I wanted to make them observable and count them properly so our avg is accurate */
 
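    $oddities = 0;
    /* Running total of the percentage gains; initialized so the first += does not hit an undefined variable */
    $slowers_total = 0;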
 
    /* We want to be able to test an unset variable */
    if ($string == 'unset') {
        unset($string);
    }   
 
    /* We'll do 100 tests */
    for ($i = 0; $i < 100; $i++) {
        /* Start our first timer */
        $start_time = microtime(true);
 
        /* We want to check each function 100 times */
        for ($ii = 0; $ii < 100; $ii++) {
            /* Check if the string is empty() */
            if (empty($string)) {
            }   
        }
        /* End our first timer */
        $end_time = microtime(true);
 
        /* Calculate how long it took */
        $time_for_empty = $end_time - $start_time;
 
        /* Start our second timer */
        $start_time = microtime(true);
        for ($ii = 0; $ii < 100; $ii++) {
            /* Check if strlen() returns something equal to false */
            if (!strlen($string)) {
            }   
        }
 
        /* End our second timer */
        $end_time = microtime(true);
 
        /* Calculate how long it took */
        $time_for_strlen = $end_time - $start_time;
 
        /* If strlen() was faster (which doesn't happen often) we want to record it */
        if ($time_for_strlen < $time_for_empty) {
            $oddities++;
        } else {
            /* Otherwise, it's as normal. We want to grab a nice percentage of how much faster empty was than strlen */
            $slowers_total += round(100 - (($time_for_empty / $time_for_strlen) * 100), 2); 
        }   
    }   
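    /* Output the results with an average taking into account any oddities that may have occurred */
    /* Guard against dividing by zero in the unlikely case that every run was an oddity */
    $normal_runs = 100 - $oddities;
    echo "\n" . 'Average speed gain using empty() instead of strlen(): ' . ($normal_runs ? round($slowers_total / $normal_runs, 2) : 0) . '%';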
    if ($oddities) {
        echo "\n" . 'Number of times where strlen() was actually faster: ' . $oddities;
    }   
}
 
echo "\n\n" . 'Checking "if (!strlen($string))" VS "if (empty($string))"' . "\n\n";
echo 'unset($string);';
test_strlen_vs_empty('unset');
echo "\n\n" . '$string === false;';
test_strlen_vs_empty(false);
echo "\n\n" . '$string = \'\';';
test_strlen_vs_empty('');
?>

This works by timing how long it takes to run a check on the string 100 times with each function. It compares the two times and works out, as a percentage, how much faster one is than the other. This is performed 100 times. At the end, the percentages are averaged to find how much faster empty() was than strlen(). We check this for three different kinds of empty string:

  1. When the $string variable is not set
  2. When the $string variable is set to false
  3. When the $string variable is set to a string with no characters ('')
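
To make the percentage concrete, here is the same formula the test uses, pulled out into a small function. The name speed_gain_percent() and the timings are made up for illustration; they are not measured results:

```php
<?php
// Percentage by which $fast_time undercuts $slow_time (same unit for both).
function speed_gain_percent($fast_time, $slow_time) {
    return round(100 - (($fast_time / $slow_time) * 100), 2);
}

// Made-up timings: 20 microseconds for empty(), 50 for strlen().
echo speed_gain_percent(20, 50), "%\n"; // 60%
```

So a "60% gain" means empty() needed 40% of the time strlen() took for the same 100 checks.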

If you look at the code, you’ll see I added handling for “oddities.” Occasionally, strlen() would actually perform faster than empty(). It was random and didn’t happen on every test but it was something that I wanted to observe and properly handle so it’s in there.

So, you may be wondering why empty() seems to be so much faster. Here is a simple explanation for you. empty() is faster…

“…because empty() is a language construct built into the Zend engine, while strlen is implemented as a standard extension function.”

Conclusion: if you are careful enough with empty() you should probably try to use it over strlen() if you’re trying to squeeze all the speed you can out of your application.

If you liked this post, be sure to subscribe to my feed.

Improved AJAX, XHTML, and RSS Safe Function for Converting Entities to ASCII in PHP

Monday, August 17th, 2009

In March, I reported on a particular problem that may or may not have arisen for developers. In short: WebKit and, consequently, Google Chrome might have introduced some problems for developers who weren’t properly converting their variables to ensure they were safe for XHTML, RSS and AJAX by converting their characters to the proper ASCII format.

Here’s the solution that I came up with in March:

function clean_string_for_valid_xml($string) {
    $entities_array = array();
    foreach (get_html_translation_table(HTML_ENTITIES, ENT_QUOTES) as $character => $entity) {
        $entities_array[$entity] = '&#' . ord($character) . ';';
    }
    return str_replace(array_keys($entities_array), $entities_array, $string);
}

While this takes care of the most common special entities, get_html_translation_table, the built-in PHP function intended for this type of operation, doesn’t provide a comprehensive list of the entities one may encounter when working with the aforementioned standards, where ASCII conversion is necessary. In fact, quite a few entities are missed.

After doing a bit of searching on the internet, I feel I have compiled a fairly complete list of entities that do not appear in get_html_translation_table that may assist people relying on the PHP built-ins. This will handle characters ranging from “ to Ω and all of your ∗ and ≅ characters too.

So, here you go:

function clean_string_for_xhtml($string) {
    $entities_array = array();
    foreach (get_html_translation_table(HTML_ENTITIES, ENT_QUOTES) as $character => $entity) {
        $entities_array[$entity] = '&#' . ord($character) . ';';
    }
    $entities_array += array ('&apos;'=>'&#39;', '&minus;'=>'&#45;', '&circ;'=>'&#94;', '&tilde;'=>'&#126;', '&Scaron;'=>'&#138;', '&lsaquo;'=>'&#139;', '&OElig;'=>'&#140;', '&lsquo;'=>'&#145;' , '&rsquo;'=>'&#146;', '&ldquo;'=>'&#147;', '&rdquo;'=>'&#148;', '&bull;'=>'&#149;', '&ndash;'=>'&#150;', '&mdash;'=>'&#151;', '&tilde;'=>'&#152;', '&trade;'=>'&#153;', '&scaron;'=>'&#154;', '&rsaquo;'=>'&#155;', '&oelig;'=>'&#156;', '&Yuml;'=>'&#159;', '&yuml;'=>'&#255;', '&OElig;'=>'&#338;', '&oelig;'=>'&#339;', '&Scaron;'=>'&#352;', '&scaron;'=>'&#353;', '&Yuml;'=>'&#376;', '&fnof;'=>'&#402;', '&circ;'=>'&#710;', '&tilde;'=>'&#732;', '&Alpha;'=>'&#913;', '&Beta;'=>'&#914;', '&Gamma;'=>'&#915;', '&Delta;'=>'&#916;', '&Epsilon;'=>'&#917;', '&Zeta;'=>'&#918;', '&Eta;'=>'&#919;', '&Theta;'=>'&#920;', '&Iota;'=>'&#921;', '&Kappa;'=>'&#922;', '&Lambda;'=>'&#923;', '&Mu;'=>'&#924;', '&Nu;'=>'&#925;', '&Xi;'=>'&#926;', '&Omicron;'=>'&#927;', '&Pi;'=>'&#928;', '&Rho;'=>'&#929;', '&Sigma;'=>'&#931;', '&Tau;'=>'&#932;', '&Upsilon;'=>'&#933;', '&Phi;'=>'&#934;', '&Chi;'=>'&#935;', '&Psi;'=>'&#936;', '&Omega;'=>'&#937;', '&alpha;'=>'&#945;', '&beta;'=>'&#946;', '&gamma;'=>'&#947;', '&delta;'=>'&#948;', '&epsilon;'=>'&#949;', '&zeta;'=>'&#950;', '&eta;'=>'&#951;', '&theta;'=>'&#952;', '&iota;'=>'&#953;', '&kappa;'=>'&#954;', '&lambda;'=>'&#955;', '&mu;'=>'&#956;', '&nu;'=>'&#957;', '&xi;'=>'&#958;', '&omicron;'=>'&#959;', '&pi;'=>'&#960;', '&rho;'=>'&#961;', '&sigmaf;'=>'&#962;', '&sigma;'=>'&#963;', '&tau;'=>'&#964;', '&upsilon;'=>'&#965;', '&phi;'=>'&#966;', '&chi;' =>'&#967;', '&psi;'=>'&#968;', '&omega;'=>'&#969;', '&thetasym;'=>'&#977;', '&upsih;'=>'&#978;', '&piv;'=>'&#982;', '&ensp;'=>'&#8194;', '&emsp;'=>'&#8195;', '&thinsp;'=>'&#8201;', '&zwnj;'=>'&#8204;', '&zwj;'=>'&#8205;', '&lrm;'=>'&#8206;', '&rlm;'=>'&#8207;', '&ndash;'=>'&#8211;', '&mdash;'=>'&#8212;', '&lsquo;'=>'&#8216;', '&rsquo;'=>'&#8217;', '&sbquo;'=>'&#8218;', '&ldquo;'=>'&#8220;',     '&rdquo;'=>'&#8221;', 
        '&bdquo;'=>'&#8222;', '&dagger;'=>'&#8224;', '&Dagger;'=>'&#8225;', '&bull;'=>'&#8226;', '&hellip;'=>'&#8230;', '&permil;'=>'&#8240;', '&prime;'=>'&#8242;', '&Prime;'=>'&#8243;', '&lsaquo;'=>'&#8249;', '&rsaquo;'=>'&#8250;', '&oline;'=>'&#8254;', '&frasl;'=>'&#8260;', '&euro;'=>'&#8364;', '&image;'=>'&#8465;', '&weierp;'=>'&#8472;', '&real;'=>'&#8476;', '&trade;'=>'&#8482;', '&alefsym;'=>'&#8501;', '&larr;'=>'&#8592;', '&uarr;'=>'&#8593;', '&rarr;'=>'&#8594;', '&darr;'=>'&#8595;', '&harr;'=>'&#8596;', '&crarr;'=>'&#8629;', '&lArr;'=>'&#8656;', '&uArr;'=>'&#8657;', '&rArr;'=>'&#8658;', '&dArr;'=>'&#8659;', '&hArr;'=>'&#8660;', '&forall;'=>'&#8704;', '&part;'=>'&#8706;', '&exist;'=>'&#8707;', '&empty;'=>'&#8709;', '&nabla;'=>'&#8711;', '&isin;'=>'&#8712;', '&notin;'=>'&#8713;', '&ni;'=>'&#8715;', '&prod;'=>'&#8719;', '&sum;'=>'&#8721;', '&minus;'=>'&#8722;', '&lowast;'=>'&#8727;', '&radic;'=>'&#8730;', '&prop;'=>'&#8733;', '&infin;'=>'&#8734;', '&ang;'=>'&#8736;', '&and;'=>'&#8743;', '&or;'=>'&#8744;', '&cap;'=>'&#8745;', '&cup;'=>'&#8746;', '&int;'=>'&#8747;', '&there4;'=>'&#8756;', '&sim;'=>'&#8764;', '&cong;'=>'&#8773;', '&asymp;'=>'&#8776;', '&ne;'=>'&#8800;', '&equiv;'=>'&#8801;', '&le;'=>'&#8804;', '&ge;'=>'&#8805;', '&sub;'=>'&#8834;', '&sup;'=>'&#8835;', '&nsub;'=>'&#8836;', '&sube;'=>'&#8838;', '&supe;'=>'&#8839;', '&oplus;'=>'&#8853;', '&otimes;'=>'&#8855;', '&perp;'=>'&#8869;', '&sdot;'=>'&#8901;', '&lceil;'=>'&#8968;', '&rceil;'=>'&#8969;', '&lfloor;'=>'&#8970;', '&rfloor;'=>'&#8971;', '&lang;'=>'&#9001;', '&rang;'=>'&#9002;', '&loz;'=>'&#9674;', '&spades;'=>'&#9824;', '&clubs;'=>'&#9827;', '&hearts;'=>'&#9829;', '&diams;'=>'&#9830;');
 
    return str_replace(array_keys($entities_array), $entities_array, $string);
}
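
For a quick feel of what the conversion does, here is a trimmed-down sketch with a hand-picked three-entry map. It swaps in strtr() for str_replace(), and convert_named_entities() is a hypothetical name; neither is part of the function above:

```php
<?php
// Trimmed-down, hand-picked map for illustration only.
$entity_map = array(
    '&hearts;' => '&#9829;',
    '&Omega;'  => '&#937;',
    '&amp;'    => '&#38;',
);

// strtr() scans the input once and never re-replaces its own output.
function convert_named_entities($string, array $map) {
    return strtr($string, $map);
}

echo convert_named_entities('I &hearts; &Omega; &amp; more', $entity_map), "\n";
// I &#9829; &#937; &#38; more
```

With str_replace() and arrays, replacements are applied one after another, so an earlier replacement's output can in principle be matched by a later search string. The numeric references produced here never look like named entities, so this should be mostly a matter of taste for this particular map.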

Hope that helps others. Any comments or fixes to the code? Please post them below.


Amazon AWS API REST Authentication for PHP 5

Monday, July 27th, 2009

Since Amazon decided all of their requests needed to be authenticated, developers have been scrambling to convert their existing code to work with their new authentication architecture.

Here’s an excerpt of the email you probably received:

“… signatures will be necessary to authenticate each call to the Product Advertising API. This requirement will be phased in starting May 11, 2009, and by August 15, 2009, all calls to the Product Advertising API must be authenticated or they will not be processed. For pointers on how you can easily authenticate requests to the Product Advertising API, please refer to the developer guide, available here.”

You can find the developer guide documentation they are referring to here:
Product Advertising API

The documentation is very thorough and complete, but doesn’t get to the gist of the problem: what do I need to do in order to make my existing request still work?

After some research, I came across the work of a couple of individuals, made some slight modifications, and have come up with a function that should help you. Here are the works I am referring to:
Amazon Product Advertising API Signature – PHP
REST Authentication for PHP4

If you’re using PHP and the REST architecture, then you’ve probably got something in your script that creates your URI and fires it off with a file_get_contents and simplexml_load_string.
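
For reference, that fetch-and-parse step might look roughly like the sketch below. fetch_xml() is a hypothetical helper name, it assumes allow_url_fopen is enabled, and the sample response is made up and heavily simplified so the parsing part can run offline:

```php
<?php
// Hypothetical helper: fetch a REST URI and parse the body as XML.
// Returns a SimpleXMLElement, or false if the fetch or the parse fails.
function fetch_xml($uri) {
    $body = @file_get_contents($uri);
    if ($body === false) {
        return false;
    }
    return simplexml_load_string($body);
}

// Made-up, heavily simplified response body; a real one has many more elements.
$sample = '<ItemSearchResponse><Items><Item><ASIN>EXAMPLE0001</ASIN></Item></Items></ItemSearchResponse>';
$xml = simplexml_load_string($sample);
foreach ($xml->Items->Item as $item) {
    echo $item->ASIN, "\n"; // EXAMPLE0001
}
```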

In this case, you’ve probably already got everything together to create your nice URI. It may look something like this:

$uri = "http://webservices.amazon.com/onca/xml?" .
       "Service=AWSECommerceService" . 
       "&Operation=ItemSearch" . 
       "&MerchantId=All" . 
       "&Condition=All" . 
       "&Availability=Available" . 
       "&Sort={$sort_by}" . 
       "&Version={$amazon_version}" . 
       "&SubscriptionId={$amazon_subscription_id}" . 
       "&AssociateTag={$amazon_associate_tag}" . 
       "&{$amazon_search_type}={$search_title}" . 
       "&SearchIndex={$search_database}" . 
       "&ResponseGroup={$information_requested}";

The new requirements allow you to still use your old code, but there are modifications necessary. A date needs to be added, a signature needs to be created and the keys/URI have to be ordered/formatted correctly with all the necessary character replacements in order for everything to work. It seems like it could be quite a task to write something that works very differently from your previous code.
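
To see the sorting, encoding and signing steps in isolation, here is a small sketch with hypothetical parameters and a fake secret key. canonical_query() is a name I made up, and the Timestamp is left out on purpose so the output stays the same on every run:

```php
<?php
// Hypothetical parameters and secret key, for illustration only.
$parameters = array(
    'Service'   => 'AWSECommerceService',
    'Operation' => 'ItemSearch',
    'Keywords'  => 'harry potter',
);
$secret_key = 'not-a-real-secret';

// Sort by parameter name, then percent-encode each name and value.
function canonical_query(array $parameters) {
    ksort($parameters);
    $pairs = array();
    foreach ($parameters as $name => $value) {
        $pairs[] = rawurlencode($name) . '=' . rawurlencode($value);
    }
    return implode('&', $pairs);
}

$query = canonical_query($parameters);
echo $query, "\n";
// Keywords=harry%20potter&Operation=ItemSearch&Service=AWSECommerceService

// Sign "verb, host, path, query" with HMAC-SHA256 and base64-encode the raw digest.
$string_to_sign = "GET\nwebservices.amazon.com\n/onca/xml\n" . $query;
$signature = base64_encode(hash_hmac('sha256', $string_to_sign, $secret_key, true));
echo strlen($signature), "\n"; // 44: a 32-byte digest in base64
```

The signature itself still has to be URL-encoded before it is appended to the request, since base64 output can contain +, / and = characters.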

In the end, I was able to take my original URI, feed it into a function, and let the code continue along its merry way. Here’s what I came up with:

/**
  * This function will take an existing Amazon request and change it so that it will be usable 
  * with the new authentication.
  *
  * @param string $secret_key - your Amazon AWS secret key
  * @param string $request - your existing request URI
  * @param string $access_key - your Amazon AWS access key
  * @param string $version - (optional) the version of the service you are using
  * @return string - the signed request URI
  */
function getRequest($secret_key, $request, $access_key = false, $version = '2009-03-01') {
    // Get a nice array of elements to work with
    $uri_elements = parse_url($request);
 
    // Grab our request elements
    $request = $uri_elements['query'];
 
    // Throw them into an array
    parse_str($request, $parameters);
 
    // Add the new required parameters
    $parameters['Timestamp'] = gmdate("Y-m-d\TH:i:s\Z");
    $parameters['Version'] = $version;
    if (strlen($access_key) > 0) {
        $parameters['AWSAccessKeyId'] = $access_key;
    }   
 
    // The new authentication requirements need the keys to be sorted
    ksort($parameters);
 
    // Create our new request
    $request_array = array();
    foreach ($parameters as $parameter => $value) {
        // We need to be sure we properly encode the name and value of our parameter
        $parameter = str_replace("%7E", "~", rawurlencode($parameter));
        $value = str_replace("%7E", "~", rawurlencode($value));
        $request_array[] = $parameter . '=' . $value;
    }   
 
    // Join our request variables into a single string, separated by & symbols
    $new_request = implode('&', $request_array);
 
    // Create our signature string
    $signature_string = "GET\n{$uri_elements['host']}\n{$uri_elements['path']}\n{$new_request}";
 
    // Create our signature using hash_hmac
    $signature = urlencode(base64_encode(hash_hmac('sha256', $signature_string, $secret_key, true)));
 
    // Return our new request
    return "http://{$uri_elements['host']}{$uri_elements['path']}?{$new_request}&Signature={$signature}";
}

I hope others find this helpful. If you have any comments or changes to the code, feel free to submit them below.
