It is surely not news to you that PHP is awful: there is a thriving sub-genre of tech blog posts about how very, very bad PHP is. It should tell you something about what a clown rodeo PHP is that even in the presence of Eevee’s magisterial “PHP: A Fractal Of Bad Design” article, so many of us feel compelled to contribute to the vast body of PHP-criticizing literature anyhow. For my part, even after acknowledging excellent works like “Fractal Of Bad Design,” man-of-mystery Pi’s “PHP is a low-level programming language at the wrong level,” and Watts Martin’s pungently-named “PHP is not an acceptable COBOL,” all written by pals who are smarter than I am, I still am called to add to the discourse. There are plenty of languages that one may dislike, and there are plenty of warts on any language one does like—and yet, PHP is sui generis in its terribleness.
Before starting in on my own complaints, I’m going to stop and acknowledge another rant: during Leonard Pierce’s delightfully funny and acerbic chronicle of hating Billy Joel, there is an aside that answers very well the question of why people hate PHP in a way that people almost never hate JavaScript, C++, or Visual Basic, deeply flawed languages all of them.
Just as one can argue that there were better World Series teams than the 1927 New York Yankees, one can argue that various performers have written worse songs than those produced from the depressingly fertile mind of Billy Joel. “Yummy Yummy Yummy,” to cite an older example, is certainly the product of a nightmarish hatred of all humanity, and arguably worse than any Billy Joel song. To cite a more recent example, “My Humps” offends the soul and mind in ways to which only the most cretinous songs by the Man from Hicksville can aspire. But while there are those who can honestly contend that the ‘27 Bronx Bombers were not the greatest of all World Series teams, no one—not even those who hate the Yankees with a soul-scorching fire, as do all right-thinking humans—can argue that they are not the best baseball franchise ever. The numbers simply speak for themselves. No other team has even remotely come close to topping their total number of world championships. Similarly, no other performer or group has ever had so many horrible songs become so successful on the charts as has Billy Joel. Others have been worse; others have been bigger. But no one has been bigger and badder at the same time than Billy Joel.
No one has been bigger and badder at the same time than PHP.
The things that make me most unhappy about PHP center around the fact that software development does not happen in a vacuum. Our choices as developers affect others, including our future selves. Most of the substance of the writeups that I’ve cited is devoted to PHP’s legion of failings as an abstract tool for computation: they’ve got that well-covered. What I want to focus on is PHP’s catastrophic badness as a participant in a living ecosystem of software projects, tools, and traffic (this is where I think Jeff Atwood deeply misjudged PHP). There are three broad areas of failure here:
- PHP is the avatar of technical debt
- The flagship PHP projects are bad examples
- PHP’s flaws grievously pollute the commons
In all three of these areas, my beef is the effects that PHP-the-language and PHP projects have on the world around them. The ways in which PHP is broken as a tool for computation lead directly to the ways in which the software written in PHP is broken as a tool for improving people’s lives.
The Avatar Of Technical Debt
One of the most visible ripple effects of PHP is the effect it has on maintenance programmers and
on “consumers” of code. Even stipulating that the newest versions of PHP are tremendous
improvements on PHP4, legacy code is still a thing in general, and PHP’s specific shape makes it
more of a problem for PHP than in the general case. Because of the tremendous obstacles to setting
up something like Python’s virtualenv or Ruby’s rbenv in the PHP world, the uptake of new
versions of PHP is disproportionately impaired by the most out-of-date libraries or language
features being used. “Past technical decisions making it harder to make the right technical
decisions in the present” is pretty much the definition of technical debt.
The obstacles don’t have to be insurmountable to keep people from upgrading. There just has to be
friction somewhere in the process. The difference between “easy enough that people can do it” and
“easy enough that people do do it” is significant. PHP is on the wrong side of that difference:
breaking changes have been introduced in minor versions, for example. And the bigger the PHP
codebase you’re maintaining, the slower your upgrades will be, because you have a ton of surface
area to deal with and you can’t use something like virtualenv or rbenv to attempt an
incremental update of either PHP itself or of any library or C module you may happen to be
using. The cost of upgrading rises faster than with other languages because of this: you have fewer
ways to escape dependency traps.
On top of this, PHP has enough gotchas, pitfalls, and required boilerplate that reading it
carefully is difficult. You don’t need to read code like this all the time, but there are times
when you absolutely will need to read code like this—the most benign example is when you pick up
your own code that you haven’t worked with for a while, but you’ll also read code this way when
you’re reviewing it in an appsec-centric state of mind. PHP makes this task harder, and so
artificially increases the amount of technical debt you take on, whether you’re working with your
own code or others’. Which version of PHP was this file written against? Does this function require
a particular ini_set() invocation that could be clobbered elsewhere? Does this if-block behave
correctly when the result of an expression is 0 instead of FALSE? Does all the code use ===
instead of ==? Does this function behave acceptably if one of its variables gets clobbered by a
global?
Maintaining PHP code has too much friction for people to do so diligently in practice. This is why the “you can write FORTRAN in any language” rebuttal to criticisms of PHP is so utterly bankrupt: PHP does not just make it possible to write bad code, its quirks actively make it harder to write good code. One of Perl’s slogans is that the language “makes easy things easy and hard things possible.” PHP, as though coming from a mirror universe with a sinister goatee, makes easy things hard (behold the absolute train wreck of PHP’s comparison operators) and hard things impossible (largely because PHP is meant to die). In a perversion of good language design, a concise and readable piece of PHP is more likely to have bugs, not less. I call PHP the avatar of technical debt because using PHP at all is incurring tremendous technical debt that comes from this friction with libraries and ensuring good code. You can write good code in PHP—but it’s harder to know that you’ve written good code, the good code will be longer, and the cost of updating the language and libraries makes it harder to get good code into production.
Critically, this technical debt is almost always an externality: it is a cost that the person writing the code doesn’t have to pay. Instead, the cost is borne by future engineers and users who might as well be strangers. Beware of externalities! If you are not paying the real, full costs of your decisions, you will be led to make bad decisions. Because PHP fails to make the right thing easy, it tends to make the wrong thing the default—and the costs of dealing with the wrong thing are all too often externalized, whether that’s from a coder to their future self, from an engineer to a sysadmin, or from a vendor to the users of software.
Don’t Follow The Leader
When the question of PHP’s awfulness comes up, inevitably someone tries to use Wikipedia, Facebook, and WordPress as examples of PHP’s success. Even if you leave aside how that’s like saying that Harvard is an average American university, Wikipedia, Facebook, and WordPress all have significant problems that are directly attributable to their decision to use PHP! If you are not prepared to deal with those problems, then you had better not use PHP. The fact that Wikipedia, Facebook, and WordPress are all built on PHP is insufficient to demonstrate that you personally should use PHP for anything: you have to know more about how those projects work and the tradeoffs they made, to know if their use of PHP recommends PHP for your application.
Wikipedia is the easiest example to pick on here, because they provide all the damning evidence themselves. Go and check out a copy of the MediaWiki source code (I’m going to treat “Wikipedia” and “MediaWiki” as mostly synonymous here) and take a look at it. Reflect on how many engineer-hours it took to get the project to that state, and how many more hours are being requested. Reflect on the contents of their “Annoying Large Bugs” and “Annoying Little Bugs” pages. If you want to use Wikipedia as a role model, being blind to Wikipedia’s flaws is a terrible idea.
Because Wikipedia is such a high-profile target (huge PageRank points, huge repository of user-generated content, huge mindshare) there’s a steady record of vulnerabilities with MediaWiki. If you get into the plumbing of Wikipedia, get under the layer that just presents pages to visitors, get familiar with the greasy-handed wiki-gnomes, you’ll find all kinds of interesting infrastructure designed to cope with this. I wouldn’t like to see anyone argue that Wikipedia is a bad project. It’s a triumph of the open-source ethos, and an incalculably valuable community resource for the entire Internet-using population. But as an engineering project, you should be very careful about emulating it. You should make sure that you can invest proportionate engineer-hours into security and maintenance—and that you account for how a PHP-based project needs more of those hours.
Speaking of gigantic quantities of engineer-hours, there’s Facebook. Facebook is an even worse
choice as an example of PHP’s success, because Facebook has effectively forked PHP. Look at their
HipHop PHP project: it’s replacing the default PHP interpreter wholesale, and
replacing Apache/mod_php as well. You shouldn’t use Facebook as evidence that your project should
use PHP, because the way you use PHP is not like the way that Facebook uses PHP. Facebook basically
ended up writing their own entire PHP toolchain! This is probably not the way you want to go for
your project.
On top of that, there are ways in which Facebook’s usage of PHP is dubious, or at least suggests that they would rather not be using PHP. Before the current version of HipHop, which is a VM that executes PHP, they were cross-compiling to C++. When “cross-compile to C++” makes your project less painful, that’s a bad sign. This loops back to the “avatar of technical debt” thing. Facebook at this point is trapped in PHP and making the best of it: they have a bigger PHP codebase to maintain, so they’re more trapped. They’re up to the point where they’re compiling PHP and doing static-analysis optimization on it—which is to say, they are doing original computer-science research, because PHP’s internals are that much of a mess.
WordPress is also not a good PHP role model: it’s gotten better over time, but the direction of its evolution is away from “blog” and towards “maximalist content management system,” which means expanding the attack surface. WordPress is actually less of a bad PHP role model than Wikipedia and Facebook: rather than being a giant application hosted and administered by Someone Else, WordPress is a PHP application that you can download, install, and investigate for yourself. They’ve invested a lot of effort in making that part easy.
Unfortunately, “easy PHP” is pretty much always “insecure PHP.” So WordPress has a
long track record of nasty vulnerabilities. It’s also saddled with a
reputation for spam—being the easy choice, a platform that you can set up yourself with no
gatekeeper (compare to Movable Type, professionally hosted WordPress installations, or Blogger
instances), it’s become the choice for people who want to automatically deploy large numbers of
WordPress instances targeted to specific content/keyword niches. Then there’s the architecture
matter: maybe this is just taste, but I find things like rewind_posts() inherently suspect (and
there are unproven allegations of grotesque features lurking in the codebase). More
substantially, there’s mutable global state lurking all over the place (on top of the distressing
action-at-a-distance issues PHP inherently has—see Eevee’s writeup for more about that), the app
buys into the “sanitize input” voodoo, and like most PHP apps, it requires a bunch of
read-and-write access to its environment that a better app wouldn’t. This is part of the ongoing
security problem WordPress faces—which leads us to talking about PHP’s role in software platforms
and ecosystems.
The Superfund Site Of Programming Languages
At this point we need to get abstract for a minute. Part of why infosec is important is that we do not create, modify, or use software in isolation. We interact with software in a social context, in a technological context, in a networked context. Similar to how herd immunity in medicine means that the chance of catching a particular disease is unevenly distributed, software vulnerabilities are dangerous even to people who aren’t running the affected software. The most common thing that an attacker might do with a compromised machine is suborn its resources, using it to propagate further attacks (e.g. having it join a botnet). This is why it’s particularly dismaying that PHP is so big and so bad: even if I don’t run anything based on PHP myself (or based on its co-conspirator in suckitude, MySQL), PHP is still a problem for me.
The problem is severe, too. A security researcher finds that of sites vulnerable to password dumps, most are built on PHP. A remote-code-execution vulnerability in two of the most popular WordPress plugins is discovered, and the subsequent patches have an utterly dismal uptake rate. There is a multitude of PHP-based server control panels that have deeply disturbing security problems of grave severity. A search on GitHub reveals a multitude of PHP projects open to a trivial SQL injection attack. A bug turns up in URL parsing (surely an action that should be a core competency for a “web language”!), which turns out to be implemented in the shoddiest way.
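To make the "trivial SQL injection" point concrete, here is a minimal sketch of the bug class. It is written in Python with the standard sqlite3 module rather than PHP, so treat it as an illustration of the pattern and not as code from any project mentioned here. The table, the function names (`lookup_unsafe`, `lookup_safe`), and the payload are all invented for the example.

```python
import sqlite3


def make_db():
    """Build a throwaway in-memory database with two hypothetical users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "a-secret"), ("bob", "b-secret")],
    )
    return conn


def lookup_unsafe(conn, name):
    # The query text is assembled by string concatenation, so anything in
    # `name` is parsed as SQL. This is the injectable pattern.
    query = "SELECT secret FROM users WHERE name = '" + name + "'"
    return [row[0] for row in conn.execute(query)]


def lookup_safe(conn, name):
    # The value travels as a bound parameter; the driver never parses it
    # as SQL, so no escaping or "sanitizing" step is involved.
    query = "SELECT secret FROM users WHERE name = ?"
    return [row[0] for row in conn.execute(query, (name,))]


conn = make_db()
payload = "nobody' OR '1'='1"  # a classic always-true injection string

unsafe_result = lookup_unsafe(conn, payload)  # matches every row
safe_result = lookup_safe(conn, payload)      # matches no row

print("unsafe:", unsafe_result)
print("safe:  ", safe_result)
```

The unsafe version hands back every secret in the table, because the payload turns the WHERE clause into a condition that is always true. The parameterized version treats the same string as a (nonexistent) user name and returns nothing. The fix is structural: the data never gets a chance to be interpreted as code, which is a different thing from trying to scrub dangerous characters out of it first.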
The popularity of apps like WordPress exacerbates the security problems: WordPress has become a platform as much as it is an app, and it has had trouble with the difficult transition from app to platform. When you run WordPress, the potential problems are determined not by the worst code you wrote nor the worst code that the WP team wrote, but by the union of that code and the worst code written by the authors of any plugin or theme you use. There are a huge number of WordPress themes and plugins, and they can effectively run anything: my favorite example is RePress, which staples a web proxy onto the side of your blog for the use of folks in locales where services like Google and Wikipedia are blocked. WordPress is a particularly problematic example because its target audience is non-engineering users. Someone who sets up an instance of MediaWiki, Joomla, or Drupal faces a higher barrier to entry than a WordPress user, who is the beneficiary of vigorous and successful efforts to make WordPress accessible to a wide audience.
Unfortunately, that experience of easy-to-install software ends up recapitulating the experience of the Windows 9x era: it’s easy to install something that creates an opportunity for attackers, and not easy to tell ahead of time which things you install belong in that category. In WordPress’ case, some of its most high-profile plugins, like the TimThumb image resizer and the caching plugins discussed earlier (and by the way it speaks very poorly of WordPress that two of the most popular plugins are caching hacks attempting to work around WordPress’ run-time inefficiency), have seen remote-code-execution vulnerabilities that can be exploited at scale, by botnets—and which are particularly likely to succeed against users of WordPress whose blogs and their upkeep are not an every-waking-moment concern.
I worked with Magento professionally for a while, and one thing that gave me massive creepy-crawlies about it was that it has the same problem as WordPress in the form of a wild and problematic plugin ecosystem, but centered around an app that’s meant to be handling people’s credit-card information. “All the security of WordPress, but handles credit-card data!” does not inspire confidence. At least with eBay now running the show, there’s a good chance that Magento will have the budget to shape up security-wise.
If the problems I’ve been talking about only affected the people actually running that software, I’d care far less. It’s important that people have to some extent the right to make their own dang mistakes. But these things don’t happen in a vacuum. Every unpatched MediaWiki install sitting around, every forgotten WordPress instance, every homebrew app quietly chugging away, is susceptible to becoming part of a botnet and worsening the state of the entire Internet. Every machine that gets rooted is another machine conducting attacks of one kind or another, and even if I run my own servers on the magical free-ponies language of sparkles and no security vulnerabilities ever, a legion of zombie PHP-running boxes can still just throw denial-of-service attacks my way until it doesn’t matter what I’m running.
This is why it matters that PHP is both big and bad: by being both ubiquitous and insecure, it pollutes the commons. It adds unnecessary cost and friction to any project we undertake that’s connected to the Internet, which is to say nearly everything. Every server that connects to the Internet has its attack surface artificially enlarged because PHP’s own attack surface is so vast. Programming doesn’t happen in a vacuum; it happens in an ecosystem, one that PHP-based systems have a long and terrifying track record of dumping nuclear waste into.
