Strongly Emergent

What comes from combining humans, computers, and narrative

The Prisoner of Zend.php

It is surely not news to you that PHP is awful: there is a thriving sub-genre of tech blog posts about how very, very bad PHP is. It should tell you something about what a horrid clown rodeo PHP is that even in the presence of Eevee’s magisterial “PHP: A Fractal Of Bad Design” article, so many of us feel compelled to contribute to the vast body of PHP-criticizing literature anyhow. For my part, even after acknowledging excellent works (written by pals smarter than me) like Fractal Of Bad Design, man-of-mystery Pi’s “PHP is a low-level programming language at the wrong level,” and Watts Martin’s trenchant “PHP is not an acceptable COBOL,” I still think there’s more that needs saying. There are plenty of languages that one may dislike, and there are plenty of warts on any language one does like — and yet, PHP is sui generis in its terribleness.

Before starting in on my own complaints, I’m going to cite a rant from outside the programming world. During Leonard Pierce’s massively acerbic chronicle of hating Billy Joel there is an aside that I’m gonna use to answer the question “why do people hate PHP in a way that people almost never hate JavaScript, C++, or Visual Basic, deeply flawed languages all?”

Just as one can argue that there were better World Series teams than the 1927 New York Yankees, one can argue that various performers have written worse songs than those produced from the depressingly fertile mind of Billy Joel. […] But while there are those who can honestly contend that the ‘27 Bronx Bombers were not the greatest of all World Series teams, no one — not even those who hate the Yankees with a soul-scorching fire, as do all right-thinking humans — can argue that they are not the best baseball franchise ever. The numbers simply speak for themselves. No other team has even remotely come close to topping their total number of world championships. Similarly, no other performer or group has ever had so many horrible songs become so successful on the charts as has Billy Joel. Others have been worse; others have been bigger. But no one has been bigger and badder at the same time than Billy Joel.

No one has been bigger and badder at the same time than PHP. That’s why.

To expand lightly on the criteria Fractal uses, a programming language is a tool for thinking about a problem space and for expressing solutions to particular problems in that space. The writeups that I’ve cited do great work on talking about this, but I think there’s a little more that needs to be said. We usually take this for granted, but a tool for task X should, as the very least, most basic requirement, help you accomplish X more often than it hinders you in trying to accomplish X. PHP fails at this. Additionally, software engineering does not happen in a vacuum. Choices we engineers make affect others, including our future selves. Software inherently has a social context, and how it interacts with that context, matters (this is where I think Jeff Atwood deeply misjudged PHP). So here’s what I want to add to the conversation:

  • PHP is not just a sub-optimal or distasteful tool, it’s a treacherous one
  • In addition to being treacherous for those writing it, PHP pollutes the commons
  • Because PHP is a treacherous tool whose use pollutes the commons, it should be torn out and demolished like an unsafe bridge or building

A Tool That Fits No Hand

Part of why “Fractal Of Bad Design” commands attention is the sheer volume of issues with PHP it collects and contextualizes. The gotchas, pitfalls, and boilerplate-chunks in PHP combine to produce an environment where simple, easy-to-read code is often wrong. In turn, this means that code written with diligence and caution in PHP, is harder to give a close reading to. You don’t need to intensely scrutinize code every time you read it, but when you pick up your own code that you haven’t worked with for a while, when you’re reviewing code in a security-focused state of mind, or when you’re deciding whether external changes require altering the code in front of you, giving the code a close reading is extremely important. But when the simple way is often wrong, a close reading is far harder than it should be. You will ask yourself many questions, individually small and not particularly difficult, but enormous in number and potential consequences.

  • Which specific major, minor, and patch version of PHP was this file written against?
  • Does this function require a particular ini_set() invocation that could be clobbered elsewhere?
  • Does this if-block behave correctly when the result of an expression is 0 instead of FALSE?
  • Does all the code use === instead of ==?
  • Does this function behave acceptably if one of its variables gets clobbered by a global?
  • Does this block handle sleep()’s many possible return values correctly?
  • Does this library perpetrate asynchronous atrocities behind my back?

The burden of dealing with these questions means that PHP does not just make it possible write bad code, but that its quirks actively make it harder to write good code and more likely that you will write bad code. You can write good code in PHP, but the path is a fearful one. Compared to other languages, you will write more lines code to do the same tasks, it’s harder to know or prove that the code you’ve written is good, and the language ecosystem is so burdened with dubious code that good code cannot be quickly brought into projects of any significant age. One of Perl’s design goals is to “make easy things easy and hard things possible.” PHP, as though coming from a mirror universe with a sinister goatee, makes easy things hard and hard things impossible. In a total inversion of good language design, a concise and readable piece of PHP is more likely to have bugs, not less. This is what pushes PHP from “a tool that I have distaste for” to “a tool that is bad” — when I say that it is “treacherous,” I’m talking about this property where simple code is prone not just to being wrong, but to being wrong in a way that tends to fail silently and to fail with extremely dangerous effects.

We have a specific term for “past technical decisions are making it harder to make the right technical decisions in the present”: technical debt. Languages change over time. Production environments often achieve stability specifically by slowing down their update cycle. This much is normal. However, the volume of PHP’s technical debt makes updates much more of a problem for PHP than in the general case. Because something like Python’s virtualenv or Ruby’s rbenv doesn’t exist in the PHP world as of 2013, incremental updates (either of PHP itself or of any library or C module you may happen to be using) are very difficult: the difficulty of using new versions of PHP is dominated by the most out-of-date libraries or language features a project uses. Because of how hard it is to make them incremental, updates are risky: it is extremely difficult to fully understand and accurately predict their effects, especially in judging security and stability issues. One of the ways that PHP fails as a tool is that when improvements in the language or in libraries come along, it makes it hard to take advantage of those improvements.

When the question of PHP’s quality comes up, inevitably someone tries to use Wikipedia, Facebook, and WordPress as examples of PHP’s success. Even if you leave aside how that’s like saying that most American universities are Harvard, it ignores that Wikipedia, Facebook, and WordPress all have significant problems that are directly attributable to their decision to use PHP! If you are not prepared to deal with those problems, then you had better not use PHP. To argue that PHP is a good tool because these large, successful projects have been built with PHP while ignoring that all of these projects had to make extraordinary investments in technical infrastructure, is to advocate that other people waste tremendous quantities of time and money. More precisely, the fact that Wikipedia, Facebook, and WordPress all used PHP is insufficient to demonstrate that you personally should use PHP for anything: you must know how those projects work and what tradeoffs they made in order to to know whether their use of PHP means it’s a good idea to use PHP for your application.

Wikipedia is the easiest example to pick on here, because they provide all the evidence themselves. Go and check out a copy of the MediaWiki source code (I’m going to treat “Wikipedia” and “MediaWiki” as synonymous) and take a look at it. Reflect on how many engineer-hours it took to get the project to that state, and how many more hours are being requested. Reflect on the contents of their “Annoying Large Bugs” and “Annoying Little Bugs” pages. If you want to use Wikipedia as a role model, being blind to Wikipedia’s flaws is a terrible idea.

Because Wikipedia is such a high-profile target (huge PageRank points, huge repository of user-generated content, huge mindshare) there’s a steady record of vulnerabilities with MediaWiki. If you get into the plumbing of Wikipedia, get under the layer that just presents pages to visitors, get familiar with the greasy-handed wiki-gnomes, you’ll find all kinds of interesting infrastructure designed to cope with this. As a social project, Wikipedia is not a bad project: it’s an amazingly good one. It’s a triumph of the cooperative open-source ethos and an incalculably valuable community resource. But as an engineering project, you should be very careful about emulating it. You should make sure that you can invest proportionate engineer-hours into security and maintenance — and that you account for how a PHP-based project needs far more of those hours than other kinds of project.

Speaking of gigantic quantities of engineer-hours, there’s Facebook. Facebook is an even worse choice as an example of PHP’s success, because Facebook has effectively re-built PHP from the ground up. Look at their HipHop PHP project: it’s replacing the default PHP interpreter wholesale and replacing Apache’s mod_php as well. You shouldn’t use Facebook as evidence that your project should use PHP, because the way you use PHP is not like the way that Facebook uses PHP. Facebook ended up writing not just their own PHP toolchain, but their own entire PHP runtime. This is probably not the way you want to go for your project: it’s expensive and optimizes for solving problems that you don’t have.

On top of that, there are ways in which Facebook’s usage of PHP is dubious, or at least suggests that they would rather not be using PHP. Before the current version of HipHop, which is a VM that executes PHP, they were cross-compiling to C++. When “cross-compile to C++” makes your project less painful, that’s a bad sign. This emphasizes the earlier point about technical debt: Facebook at this point is trapped in PHP and making the best of it. They’re up to the point where they’re custom-compiling PHP and doing static-analysis optimization on it — which is to say, they are doing original compsci research, because PHP’s internals are that much of a mess.

Nor is WordPress a good PHP role model. It’s gotten better over time, but the direction of its evolution is away from “blog” and towards “maximalist content management system,” which massively expands the number of things that can go wrong. WordPress has a huge difference from Wikipedia and Facebook: rather than being a giant application hosted and administered by someone else, WordPress is a PHP application that you can download, install, and investigate for yourself. They’ve invested a lot of effort in making that part easy. Unfortunately, “easy PHP” is pretty much always “insecure PHP.” So WordPress has a long track record of nasty vulnerabilities. It also has a well-earned reputation as a tool spammers love. Because it’s a platform that you can set up yourself with no gatekeeper (compare to Movable Type, professionally hosted WordPress installations, or Blogger instances), it’s become the best choice for spammers (who want to programmatically deploy large numbers of WordPress instances). Then there’s the architecture matter: maybe this is just taste, but I find things like rewind_posts() inherently suspect (and there are unproven allegations of grotesque features lurking in the codebase). More substantially, there’s mutable global state lurking all over the place (on top of the distressing action-at-a-distance issues PHP inherently has — see Eevee’s writeup for more about that), the app buys into the “sanitize input” voodoo, and like most PHP apps, it requires a bunch of read-and-write access to its environment that other language ecosystems . Wordpress' engineering problems lead to persistent and near-intractable security problems, and those problems affect more than just the people running WordPress blogs.

The Superfund Site Of Programming Languages

Because of the friction discussed earlier, problems fixed or mitigated in new versions of PHP (tremendous improvements on versions like PHP4) have a very long half-life before they’re no longer found in the wild. Obstacles to upgrading software don’t have to be insurmountable to keep users on old versions, they just have to exist. There’s a big difference between “easy enough that people can do it” and “easy enough that people actually do it,” and PHP is on the wrong side of that difference. The design & usability world has known for a long time that if the right thing and the easy thing are different, your users will almost never do the right thing. PHP’s legacy of technical debt means that maintaining PHP code has far too much friction for maintainers to always do the right thing. I throw the epithet “avatar of technical debt” at PHP sometimes, because this dynamic means that to use PHP at all is to incur a wallop of technical debt. Worse, this technical debt is almost always an externality, a cost that the person writing the code doesn’t have to pay. Instead, the cost is borne by unknown future engineers and users. Beware of externalities! If you are not paying the real, full costs of your decisions, you will be led to make worse decisions. Because PHP fails so hugely at making the right thing easy, it tends to make the wrong thing the default — and the costs of dealing with the wrong thing are all too often externalized, whether that’s from today’s coder to the same person tomorrow, from an engineer to a sysadmin, or from the vendor to the users of a piece of software.

That it’s hard to update PHP projects wouldn’t matter if those projects were only relevant to their creators and users. This is not the case: those projects are relevant to the public good. As programmers, do not create, modify, or use software in isolation. We interact with software in a social context, in a technological context, and in a networked context. Similar to how herd immunity in medicine means that the chance of catching a particular disease is unevenly distributed, software vulnerabilities are dangerous even to people who aren’t running the affected software. The most common thing that an attacker might do with a compromised machine is suborn its resources, using it to propagate further attacks (e.g. having it join a botnet). This is why it matters that PHP is so big and so bad: even if I don’t write any PHP code and don’t operate anything based on PHP (or on MySQL, its co-conspirator in suckitude), PHP is still a severe and frequent problem for me!

In the recent past:

Returning to WordPress in particular, WordPress' popularity exacerbates these security problems: WordPress has become a platform as much as it is an app. Going from app to platform is both difficult in general and difficult particularly in the security context. A WordPress setup’s susceptibility to attack comes not just from problems in code its users write nor just from problems in code that WordPress' creators write, but by those potential problems multiplied by the worst code in any plugin or theme being used. There are a huge number of WordPress themes and plugins, and they can do anything they like. For example, there’s RePress, which staples a web proxy onto the side of your blog for the use of folks in locales where services like Google and Wikipedia are blocked. Whatever one thinks of RePress, it’s only possible for it to exist because WordPress just picks up plugin code and lets it do whatever it asks. WordPress is a particularly acute example because its target audience is non-engineering users. Someone who sets up an instance of MediaWiki, Joomla, or Drupal faces a higher barrier to entry than a WordPress user, who is the beneficiary of vigorous and successful efforts to make WordPress accessible to a wide audience. Unfortunately, that experience of easy-to-install software ends up re-enacting the Windows 9x era: it’s very easy to install things that create opportunities for attackers, and almost impossible to tell ahead of time which things are safe to install. In WordPress' case, some of its most high-profile plugins, like the TimThumb image resizer and the popular caching plugins, have seen remote-code-execution vulnerabilities that can be exploited at scale, by botnets — and which are particularly likely to succeed against users of WordPress whose blogs and their upkeep are not an every-waking-moment concern.

I worked with Magento professionally for a while, and one thing that gave me massive creepy-crawlies about it was that it has the same kind of wild and problematic plugin ecosystem as WordPress, but centered around an app that’s meant to be handling people’s credit-card information. “All the security of WordPress, also people use it to handle money!” does not inspire confidence (though with eBay now running the show, there’s a good chance that Magento will have the budget to shape up security-wise).

If the problems I’ve been talking about only affected the people actually running that software, I’d care far less. It’s important for people have to the right to make their own dang mistakes. But these things don’t happen in a vacuum. Facebook is the ultimate example: a steady trickle of facebook vulnerabilities make their way to light over time, and there are over a billion Facebook users who can be very directly affected by them. Every unpatched MediaWiki install sitting around, every forgotten WordPress instance, every homebrew app quietly chugging away, is susceptible to becoming part of a botnet and worsening the state of the entire Internet. Every machine that gets rooted, is another machine conducting attacks of one kind or another — and even all of my own servers run on an imaginary free-ponies-with-awesome-sparkles-and-no-security-vulnerabilities-ever language, a legion of zombie PHP-running boxes can still just throw denial-of-service attacks my way until it doesn’t matter what I’m running.

This is why it matters that PHP is both big and bad: by being both ubiquitous and insecure, it pollutes the commons. It adds unncessary cost and friction to any project we undertake that’s connected to the Internet — which is to say, to everything. Every server that connects to the Internet has its attack surface artificially enlarged because PHP’s own attack surface is so vast. Programming doesn’t happen in a vacuum, it happens in an ecosystem — an ecosystem that PHP-based systems have a long and terrifying track record of dumping nuclear waste into.

Public Hazard

Its being sub-optimal, distateful to me, or outright poorly designed, wouldn’t remotely justify my spending time and heartache on telling people not to use PHP. Likewise, I don’t think that PHP will make you a worse programmer except in the extremely boring sense that it’ll waste a lot of your time and thus make it harder to rack up the quantities of deliberate-focused-practice time that one needs for mastery, which is absolutely not a sense worth picking a fight over: everyone has a right to their own yak-shaving. There are plenty of people out there being total jackasses “in defense of” PHP, but those people are freely deciding to be jackasses: their social deficiencies are very much separate from their choice of programming language (plus, Ruby is an amazing language and its community has no shortage of tremendous jackasses). As a programmer who cares about craft and tools, I think other languages will reward your time & effort far better than PHP, but if you don’t use those, oh well. I have zero interest in picking a fight over PHP on that basis. As someone who cares about the Internet being safe and functional enough for me to buy music, check my credit card balance, and communicate with my friends, I want you to stop using PHP and replace existing PHP code — like, yesterday — and I think you should be restrained from using PHP for new projects. I’m willing to pick a fight about PHP on the basis of its decade-and-counting track record of design problems that cause security problems that cause “you don’t write or use PHP but this is going to mess up your day anyhow!” problems.

Software can’t be isolated from its social context any more than it can be isolated from its technological context. The social and technological context of modern software is the Internet. With a large enough userbase, any software project is de facto infrastructure (especially if it participates in the Internet). As builders of infrastructure, we have a moral responsibility to not build hazardous, shoddy infrastructure because doing so hurts everyone who uses or depends on that infrastructure, even indirectly. PHP’s track record demonstrates that it is a grossly deficient tool for building infrastructure. When you undertake to build or maintain infrastructure, you take on a responsibility to everyone affected by the quality & functionality of that infrastructure. Choosing to use grossly deficient tools like PHP is irresponsible and unethical for builders of infrastructure, especially if it’s justified in terms of ease or of being able to build a thing swiftly. By definition, infrastructure projects require that you prioritize durability and certainty over ease and swiftness! Nor is there an argument to be had on a “the other tools are also flawed” basis: none of the other tools have PHP’s decade-long track record of massive deficiency, nor do their maintainers have the indifference towards fixing deficiencies that PHP’s maintainers display. It is only by combining its track record of problems with the long reach that those problems have, that PHP crosses the threshold of “should people be restrained from using this tool?” No-one should lower the requirements for that kind of thing: we should be very, very wary of doing so. PHP has met those requirements: there are no other tools in such wide use whose problems are so many, are inflicted on so many people beyond the tool’s users, have gone unfixed for so long, have so few virtues to excuse them, and are the responsibility of maintainers who have done so little to fix them. Nothing short of that should prompt the programming community to say “no, this tool is not okay to use, stop.”

How to attain the elimination of PHP is a question I don’t have a good answer for, and it’s obvious that the programming community as a whole hasn’t yet come up with a good answer. It’s especially important to demand that a scheme for reducing & eliminating PHP not make it more difficult to get into programming. PHP offers “you can just write code and see it work!” and that’s a hugely, hugely important feature for making programming accessible — the problem is that PHP offers this feature at a ruinously high cost and smudges the ink on the metaphorical price tag. I also think it’s going to be very difficult in general: I’ve been comparing PHP to pollution-causing industrial tactics here, but America has not done at all a good job of holding people who cause pollution responsible for its harmful effects.

I look forward to a future where we’ve invested the collective effort in building tools that fit our hands gracefully and that don’t sabotage our efforts to build durable, predictable, world-improving infrastructure. Software has both a social and a technological context: this means that the apparently-social problem of eliminating PHP also is a technical problem. The technical problem is “how do we build something better than PHP?” and the tremendous numbers of beautiful & useful solutions we’ve already come up with for that problem, give me every confidence that we can handle that part. Now let’s work on the social part.

Note: this post was updated in summer 2016.