Exterminating the Non-Breaking Space Bug

O layout mutilator! O blogger humiliator!

Among the most dramatic results of last Monday’s hearing on President of the United States Donald J Trump’s Twitter habits and related matters was the appearance in the virtual pages of Lawfareblog – among the majorest of major minor blogs of this post-blog epoch – of the Phantom Non-Breaking Space Bug.

Chrome Inspection reveals a major minor infestation in Lawfareland:

Exhibit 1

As of this writing, Chrome and Brave will insert the unwanted, layout-endangering character code via interaction with “TinyMCE,” a key component in text editing applications around the web, at a virtual level beneath or deeper than the HTML that we see in the lower section of the image above. In short, when the user, operating in a certain common way, replaces selected text with new text, an adjacent white space gets converted into a non-breaking space or facsimile, which can disrupt word-wrapping, since the non-breaking space is, perhaps unsurprisingly, a space that does not break: The affected browser treats the &nbsp; as a single letter within a single, not to be broken word. On occasion, the presence of an unwanted non-breaking space may also interrupt other processes or applications (some embedders, footnoters, and the like) that look for an empty space at the particular spot, not a pseudo-empty one.
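
For anyone who would rather hunt the invisible character with a script than with an inspector, here is a minimal JavaScript sketch. The function name findNbsp and the sample string are made up for illustration; the only assumption is that a stray non-breaking space shows up either as the literal character U+00A0 or as the HTML entity &nbsp;.

```javascript
// Hypothetical helper: report where non-breaking spaces sit in a chunk of post HTML.
// Matches both the literal character U+00A0 and the HTML entity "&nbsp;".
function findNbsp(html) {
  const pattern = /\u00A0|&nbsp;/g;
  const hits = [];
  let match;
  while ((match = pattern.exec(html)) !== null) {
    const start = Math.max(0, match.index - 10);
    const end = match.index + match[0].length + 10;
    hits.push({
      index: match.index,
      form: match[0] === '\u00A0' ? 'character' : 'entity',
      // A little surrounding text, with the invisible character made visible.
      context: html.slice(start, end).replace(/\u00A0/g, '[NBSP]'),
    });
  }
  return hits;
}

// Invented sample: one literal U+00A0 and one entity.
const sample = 'Trump\u00A0Twitter habits and&nbsp;related matters';
console.log(findNbsp(sample));
```

Run against the saved HTML of a post, an empty result would suggest the post is clean; anything else points at the spots worth inspecting.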

Note that the bug does not recur during many common types of revision, but mainly during a peculiar type of text highlighting and replacement: The typical writer behavior that produces the problem is, to be more precise, a kind of confined over-writing (in the operational, not the stylistic sense), not usually simple deletion and replacement. Note also that it occurs only while working in the “Visual” editor (the preference of, I believe, the vast majority of WordPress users).1

Unfortunately, the problematic approach to revising text is quite common and in some circumstances and for some writers will be second nature: see a word, want a different word, over-write it, move on: I do it, and I would be willing to bet that Mr. Wittes at Lawfare was doing it when finishing off his latest excellent piece on matters legal and Trump. On a heavily revised post, the bugs may accumulate, and, to make things even worse, these potential layout-mutilators, blogger-humiliators will remain invisible to you when you are looking at your apparently quite beautiful handiwork in un-augmented WordPress Visual or Text editing panes. If you’re in a bit of a hurry and neglect to check your preview carefully, it may be only the published post that suddenly reveals the typographical horror that you have unleashed upon your readership.

As of this writing, the bug still affects Chrome and Brave, which are both “WebKit”-derived browsers, but not (at least on Windows 10) Safari, even though Safari happens to be the WebKit grandparent. Comparisons across browsers and machines may turn up other incidences or ways to non-breakingly break your layouts, similar to others reported over the years. In any event, a particular problem on Chrome – est. over 50% of all web usage as of December 2016 – affecting WordPressers (ca. 25% of the web last I checked) is already a widespread problem, even if in itself it may seem to some to be a small one.2

OK, so how about fixing it?

Because the bug occurs “upstream” from WordPress itself, we cannot fully eradicate it from within WordPress, but we can prevent it from being saved to the database and actually affecting display when posts are loaded.

First, I’ll present a way to stop the dang thing from happening, then I’ll deal with some choices for dealing with archived posts already littered with the unwanted characters.

Using PHP and WordPress filters to get rid of the bad characters

The most straightforward PHP solutions will utilize either the content_save_pre or the wp_insert_post_data WordPress filter. All post content passes through each of these filters before getting saved. wp_insert_post_data also handles a lot of things other than post content, so it qualifies as both more powerful and more complicated than content_save_pre.

First, for a simple, global fix, here’s the content_save_pre version.

The main peculiarity to note in the above concerns the characters that WordPress, or, to be more precise, the interaction of browser and database encoding, interprets as &nbsp;, but which lurk in the “UTF-8”-encoded WordPress MySQL database as \xc2\xa0. It can be confusing and frustrating if you were not already aware of this factor – and that’s all I really wanted to say here, except that I also have to note the other peculiarity of the code, which is that it addresses both the HTML &nbsp; and the UTF-8 \xc2\xa0. The UTF-8 version was supplied by WordPress maven “Rarst” at StackExchange as per the link in the code, but Rarst’s solution would not work at, for instance, this site, or any other of the 1 million WordPress sites that use Next Generation Gallery (NGG), or at any site where a plug-in in use happens to treat “the content” with script similar to the one that, as I discovered, NGG uses.3
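
To make the two guises of the character concrete, here is a small JavaScript sketch (an illustration only, not the PHP filter itself). It shows that the single character U+00A0 becomes the two bytes \xc2\xa0 when encoded as UTF-8, and why a cleaner that knows only one form leaves the other behind. The function names stripCharOnly and stripBoth are invented for the example.

```javascript
// U+00A0 (NO-BREAK SPACE) is one character, but two bytes in UTF-8.
const nbspChar = '\u00A0';
const bytes = Array.from(new TextEncoder().encode(nbspChar));
const hex = bytes.map((b) => '\\x' + b.toString(16).padStart(2, '0')).join('');
console.log(hex); // \xc2\xa0

// In HTML source, the same thing may instead be spelled as a six-character entity.
const nbspEntity = '&nbsp;';

// A cleaner that targets only the raw character misses the entity...
function stripCharOnly(text) {
  return text.replace(/\u00A0/g, ' ');
}

// ...so a more defensive cleaner targets both spellings.
function stripBoth(text) {
  return text.replace(/\u00A0|&nbsp;/g, ' ');
}

const mixed = 'one' + nbspChar + 'two' + nbspEntity + 'three';
console.log(stripCharOnly(mixed)); // entity survives
console.log(stripBoth(mixed));     // both replaced by ordinary spaces
```

Which spelling a given filter actually sees depends on what has touched the content before it gets there, which is the argument for handling both.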

For the vast majority of users, the above will be as good as a true fix, though there are times and places where, in the course of human blog-editing events, someone might like to keep an &nbsp;, or even a bunch of them. For those who need to allow for exceptions, we can add one of WordPress’s many built-in conditional tags to isolate certain types of post or post content from the function.

For the example, we’ll shift to wp_insert_post_data:

You could write something that gave users the ability to exclude other types of posts, or even to quarantine or reverse-quarantine particular passages, though I think that the vast majority of users will be satisfied with never using &nbsp;’s in post content at all, as in the first example.

There is a third PHP version, or actually something of a hybrid, that also works, but not quite as well, in my opinion. It was offered by a user at Make WordPress Core, on one of several somewhat redundant threads on this subject. He or she neglected to include a full working version of it, so I’ll do that here. Note that you may have to save twice to get it to work after installation.

The advantage of this hybrid might be that it will leave hard-coded &nbsp;’s intact if entered and saved under the “Text” pane. However, solutions that depend on users remembering to avoid the Visual pane are prone to failure. Having a category or other taxonomy, or perhaps a format or Custom Post Type, set aside for leaving desired non-breaking spaces alone would be more user-proof. Though with some further work we could introduce warnings or other workarounds, I’ll stick with the other alternatives.

But what about all the posts already saved with all those extra &nbsp;’s?

There are two ways to handle the problem of already-infected legacy posts. Which one you choose will depend on how much you care about curing your database permanently, and how comfortable you are dealing with the database directly. If you actually like or need, or liked or needed, to use non-breaking spaces for certain purposes, meaning your database includes a sprinkling or more of non-breaking spaces that you’d like to preserve, that might also figure into your calculations.

jQuery-style

Without touching your database, and independently of whether or not you employ a version of the PHP functions, you can apply a jQuery solution that will at least preserve good layout. Note that all of these examples assume your theme is using standard WordPress post classes: If you’re using a specialized theme with its own unique approach, the code might have to be adjusted, perhaps as substantially as your theme differs from vanilla WordPress.

For the front end, the following script will do – but it needs to be enqueued carefully. A safe general-purpose alternative will be supplied further below:

Enqueue it only for single-post pages – as below – or you will likely end up with destroyed front pages or archive pages:

The same script will work on the Admin side, but with body substituted for .post-entry.
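
As a rough illustration of the kind of script under discussion, here is a minimal jQuery sketch. It is my own reconstruction under stated assumptions rather than a canonical listing: the helper names replaceNbsp and cleanNbsp are invented, and .post-entry is assumed to be the wrapper your theme puts around post content. Note that jQuery’s .html() hands back the browser’s serialized markup, in which a non-breaking space appears as the entity, so the pattern covers both spellings.

```javascript
// Pure string step: swap each non-breaking space (entity or raw character) for a plain space.
function replaceNbsp(html) {
  return html.replace(/&nbsp;|\u00A0/g, ' ');
}

// DOM step: clean every element matching `selector`, one element at a time,
// so that one post's markup is never copied into another.
function cleanNbsp($, selector) {
  $(selector).each(function () {
    const $el = $(this);
    $el.html(replaceNbsp($el.html()));
  });
}

// Run only where jQuery is actually loaded (i.e. in the browser).
if (typeof jQuery !== 'undefined') {
  jQuery(function ($) {
    // Assumed wrapper class; on the Admin side, pass 'body' instead.
    cleanNbsp($, '.post-entry');
  });
}
```

One design caveat: re-setting an element’s HTML discards any event handlers bound to its descendants, so a script like this is best suited to plain post content.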

How exactly you’ll want to add the script to your set-up may vary with features of your theme, as well as with your preferences for script consolidation. For details on how to add jQuery scripts to WordPress themes, refer to the WordPress Codex or any of countless tutorials, but here’s a basic format for adding the script, just for “singular” posts/pages, excluding the ‘Silly App’ category, and post ID# 64513. We’re locating the script in the anti_nbsp.js file in our theme’s js folder:

*See notes.4

The above code sets the script to work for single post pages only. If your theme uses “the_excerpt” and is set to include posts as saved up to a “read_more” link (fairly standard among themes, but by no means universal), you may still end up with some pesky NBSPs on archive pages (“archive pages” in this usage includes typical blog front pages as well as typical author pages, search pages, category pages, and so on).

To cover that alternative, you might want to try the following script, enqueued without the exclusions. Or you can use both scripts, with the following one enqueued separately for ! is_singular() – i.e., for archive pages rather than single-post pages:

The above script enqueued without exclusions would work for posts whether on single post pages or in archives – so, if you’re not really worried about keeping some &nbsp;’s, it might be preferable.
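
One way to write a script that is safe on both single posts and archives is to leave the markup alone and touch only the text inside it. The following is a hedged sketch of that idea, not a definitive implementation: cleanTextNodes is an invented name, and .hentry is assumed to be present on each post wrapper, as it is in themes that use the standard WordPress post classes.

```javascript
const TEXT_NODE = 3; // value of Node.TEXT_NODE in the DOM

// Walk the tree under `node` and replace U+00A0 inside text nodes only.
// Returns the number of text nodes that were changed.
function cleanTextNodes(node) {
  let changed = 0;
  const children = node.childNodes || [];
  for (let i = 0; i < children.length; i++) {
    const child = children[i];
    if (child.nodeType === TEXT_NODE) {
      if (child.nodeValue.indexOf('\u00A0') !== -1) {
        child.nodeValue = child.nodeValue.replace(/\u00A0/g, ' ');
        changed++;
      }
    } else {
      changed += cleanTextNodes(child);
    }
  }
  return changed;
}

// Run only where jQuery is actually loaded (i.e. in the browser).
if (typeof jQuery !== 'undefined') {
  jQuery(function ($) {
    // Each post on the page is handled on its own, so archives stay intact.
    $('.hentry').each(function () {
      cleanTextNodes(this);
    });
  });
}
```

Because the elements themselves are never rebuilt, links, embeds, and any handlers attached to them are left as they were; inside the DOM the character is always the raw U+00A0, so there is no entity to match.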

Working out more precisely targeted exclusions for particular posts or post-types with hard-coded &nbsp;’s that serve some purpose for you or your plug-ins, and that you want to be allowed in archive pages, too, while not enqueueing the script separately, is certainly possible. Almost anything is possible text-manipulation-wise, but the precise method will probably need to be tailored to your installation: For instance, if we knew that your app always produced a certain CSS class, we could target the exclusion – maybe $('.hentry').not('.category-silly-app').each(//etc. – or some such.5
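
Spelled out a little further, a class-based exclusion might look like the sketch below. It assumes your theme prints the standard WordPress post classes (category-{slug}, post-{ID}) on each .hentry wrapper; the slug silly-app and the ID 64513 are carried over from the enqueueing example, and the names EXCLUDED_CLASSES and isExcluded are invented for illustration.

```javascript
// Hypothetical exclusion list, written as the class names WordPress would print.
const EXCLUDED_CLASSES = ['category-silly-app', 'post-64513'];

// True if the element's class attribute contains any excluded class (whole-name match).
function isExcluded(classAttr, excluded) {
  const classes = (classAttr || '').split(/\s+/);
  return excluded.some(function (name) {
    return classes.includes(name);
  });
}

// Run only where jQuery is actually loaded (i.e. in the browser).
if (typeof jQuery !== 'undefined') {
  jQuery(function ($) {
    $('.hentry')
      .filter(function () {
        return !isExcluded(this.className, EXCLUDED_CLASSES);
      })
      .each(function () {
        const $post = $(this);
        $post.html($post.html().replace(/&nbsp;|\u00A0/g, ' '));
      });
  });
}
```

Matching whole class names rather than substrings keeps post-64513 from accidentally shielding, say, post-645130.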

Curing the db

Finally, if you want to cure the database – on general principle or because that’s what your client wants – you can do it with WordPress commands: WordPress is already a database manager, although it’s user-friendlier for some purposes than for others.

Your code might look something like the following:

The catch with the above is that the operation turns out to be resource-intensive: Even on an only moderately large database of posts – ca. 1,000 or so – you may get timeouts, and either need to adjust your PHP configuration, or slice up the selection into pieces using WP_Query parameters, which get_posts() does accept, and which you can also use to exclude certain posts or types of post.

For a large database, if you’re not comfortable with MySQL database operations, you may be better off 1) using phpMyAdmin or some other tool to export the posts table, then 2) using a text editor to run a search and replace on the table (for “\xc2\xa0”), then 3) re-importing it in place of the original. All usual warnings about making backups and backups of backups apply here, especially if you’re not used to working with the db. More complex operations will require good MySQL skills.

If you do need to slice and dice the table – specifically excluding or including different categories or posts or post-types, including some time periods but not others, etc. – you may be better off going back to WP_Query arguments, which, not to put a fine point on it, were designed by people whose MySQL skills are likely better than yours and certainly superior to mine.

Or…

You could also try never editing posts on affected browsers. Or you could be very careful when editing posts on those browsers, including by picking through the text using an inspection tool, and manually deleting &nbsp;’s. …Or by never using the Visual Editor. Or you could live with occasionally messed-up layouts, while waiting for an upstream fix that may never come…

…or that may turn up unexpectedly any day, rendering this entire discussion obsolete…

What I do:

In my case, since I used Firefox for almost everything for years, and tried to correct bad layouts when they cropped up, I’m not worried about the database being heavily infected. Since I now use Chrome as a baseline and Brave as a backup, however, I’ll be sticking with the first function for new posts, and the jQuery solution to keep the Front End tidy.

Notes:

  1. If you wanted to reproduce the bug for your own edification, you could open a WordPress post for editing, target some text for inspection, highlight just the text you want to replace, and then hit the backspace key once, and you’ll see the &nbsp; appear. A lot of the time in such situations, a user would proceed by hitting backspace a second time, and in the process clear the &nbsp;, but if, instead, you carefully highlight only the text you want to replace, then over-write it with your intended improvement, then skim over to some other location or just hit “update,” the &nbsp; will remain behind.
  2. It very likely affects other writers on other platforms, but they’ll just have to fend for themselves, I guess.
  3. I had persuaded myself – and had misinformed some users in a WordPress Support Forum and at Make WordPress Core – that the same code except with &nbsp; only, not with the UTF-8 encoding, would solve their problems. However, when I happened to test my work with NGG de-activated, the fix failed: It just so happens that, deeply buried in NGG (one of the more complex of widely used WP plug-ins), an Ajax process is initiated that makes the character-code accessible to the function as HTML. With NGG active, the UTF-8 version doesn’t work. So, I’ll recommend using code with both versions.
  4. I like to add a version number to scripts during testing, so that when I update them I don’t have to re-set the browser cache to see the results. For production sites, you may wish to leave these off for the sake of the (slight) benefit to page speed scores.
  5. By the way, .hentry has been around forever. What does or did it stand for, I wonder. “Headline entry” maybe?

WordPresser

Writing since ancient times, blogging, e-commercing, and site installing-designing-maintaining since 2001; WordPress theme and plugin configuring and developing since 2004 or so; a lifelong freelancer, not associated nor to be associated with any company, publication, party, university, church, or other institution. 
