It’s an API, do I really need to escape anything?

Escaping output is often overlooked in APIs, but it’s crucial for preventing security vulnerabilities like XSS attacks. Even when returning JSON, unsafe characters can lead to risks if not properly escaped. This article explores why output escaping is essential and how to secure your API responses.

25 days ago   •   8 min read

By Stephen Rees-Carter

Let’s talk about escaping output with APIs, as I’ve found this is an area that’s often overlooked and may come back to bite you. I once found a vulnerability in a popular open-source project that made the unfortunate assumption that API output didn’t need to be escaped, and I’ll tell you that story shortly - but first, let’s look at what escaping is.

What is escaping output?

If you’re a web developer, the first thing you think of when you hear “escaping output” will probably be something along the lines of escaping user input when it’s displayed within HTML - usually by translating special characters into HTML entities. This instructs the browser to render these characters as text, rather than interpret them as HTML.

For example, if we take the following string:

"Hello, World!" > "foobar"

It’s a perfectly legitimate string that a user may submit in a form field. However, if we try to put that value back into the form field without escaping it, we’ll have a problem:

<input type="text" value=""Hello, World!" > "foobar"">

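In the browser, the value attribute ends at the first double quote, so the field is rendered empty and the rest of the string is treated as markup rather than as the field’s value.

While this example is harmless, it’s trivial to exploit the same behaviour to inject malicious code into the page.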

Consider this input:

"><script>alert('Boom! XSS!')</script>

Which produces this HTML:

<input type="text" value=""><script>alert('Boom! XSS!')</script>">

Which gives us a JavaScript alert box in the browser.

The issue here is that the double quote (") within the string breaks out of the HTML value attribute, and the angle brackets (< >) close the input tag and open a new script tag.

The solution here is to use output escaping - specifically, swapping out those special characters for their HTML entity equivalents:

& → &amp;

" → &quot;

' → &#039;

< → &lt;

> → &gt;

We can do this easily in PHP with the htmlspecialchars() function:

> htmlspecialchars("\"><script>alert('Boom! XSS!')</script>");

&quot;&gt;&lt;script&gt;alert(&#039;Boom! XSS!&#039;)&lt;/script&gt;

The resulting string is safe to use within HTML - it won’t break out of any HTML attributes, or introduce new HTML tags.
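As a rough sketch of how this might be wrapped up for reuse, here is a small pair of helpers. The names escape_html() and render_text_input() are hypothetical and not part of PHP or any framework; only htmlspecialchars() and its flags are real. The flags are passed explicitly so the behaviour doesn’t depend on the PHP version’s defaults.

```php
<?php
// Hypothetical helper (not from the article): escape a value for use in
// HTML text or inside a quoted HTML attribute.
function escape_html(string $value): string
{
    // ENT_QUOTES escapes both double and single quotes; ENT_SUBSTITUTE
    // replaces invalid UTF-8 sequences instead of returning an empty string.
    return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// Hypothetical helper: build a text input whose name and value are
// always escaped before being placed inside the attributes.
function render_text_input(string $name, string $value): string
{
    return '<input type="text" name="' . escape_html($name)
        . '" value="' . escape_html($value) . '">';
}
```

Passing the malicious string from earlier as the value should leave it sitting harmlessly inside the value attribute, with the quote and angle brackets converted to entities.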

But all of that relates to HTML, right?

Do I Need To Escape Non-HTML Output?

It’s fairly common for APIs to return JSON and not HTML, and you’ll usually be building your JSON using a converter, so do you need to worry about escaping user values?

Let’s take a look.

Consider this PHP:

<?php
$output = [
    'output' => "<img src=x onerror=alert('Boom!')>",
];
echo json_encode($output);

We would expect the JSON output to look something like this:

{
    "output": "<img src=x onerror=alert('Boom!')>"
}

Which looks like safe JSON, right?

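Loading it in the browser tells a different story: PHP sends a Content-Type of text/html by default, so the browser parses the response as HTML, fails to load the image, and runs the onerror handler - giving us an alert box.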

Ok, so you could argue that this is a content type issue, and it is really easily solved by adding in the following header:

header('Content-Type: application/json');

However, this relies on your application returning the right content type in the header, and the browser actually honouring it. Unfortunately, the browser tries to be helpful, and if things get corrupted or broken, there is always the possibility that some HTML will be executed. You’d also have to be loading the API’s JSON page directly for it to execute - although this could be done inside an iframe under the right conditions.
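One header that is often paired with the content type, although it isn’t covered above, is X-Content-Type-Options: nosniff, which asks the browser not to guess a different content type. The sketch below shows one way the two might be sent together; json_response_headers() and send_json() are hypothetical names, and this reduces the risk rather than removing the need for escaping.

```php
<?php
// Hypothetical helper: the headers a JSON endpoint might send.
function json_response_headers(): array
{
    return [
        'Content-Type: application/json; charset=utf-8',
        // Asks the browser not to "sniff" a different content type.
        'X-Content-Type-Options: nosniff',
    ];
}

// Hypothetical helper: send the headers, then the already-encoded body.
// Must be called before any other output has been sent.
function send_json(string $body): void
{
    foreach (json_response_headers() as $line) {
        header($line);
    }
    echo $body;
}
```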

If you load this in Chrome, you’ll notice weird \u003C and \u003E sequences in place of the angle brackets. Chrome does that automatically when rendering the JSON - I’m not entirely sure why, but I assume it’s a security escaping thing.

Regardless, here’s the raw output that was sent to the browser:

{
    "output": "<img src=x onerror=alert('Boom!')>"
}

However, Chrome’s escaping gives us a direct hint as to what we can do to make this JSON safer - and it’s something we looked at earlier!

We can escape the special characters!

We can’t use the HTML entities we used above, but we can use those \uXXXX hex escape sequences, which represent the Unicode code points of the special characters.

This can be done with PHP’s json_encode() function using these flags:

JSON_HEX_TAG: Converts < and > to \u003C and \u003E.

JSON_HEX_AMP: Converts & to \u0026.

JSON_HEX_APOS: Converts ' (single quote) to \u0027.

JSON_HEX_QUOT: Converts " (double quote) to \u0022.

Here’s our new code:

<?php
header('Content-Type: application/json');
$output = [
    'output' => "<img src=x onerror=alert('Boom!')>",
];
echo json_encode($output, JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT);

Which gives us this output:

{
    "output": "\u003Cimg src=x onerror=alert(\u0027Boom!\u0027)\u003E"
}

Now that the brackets and quotes are escaped, removing the content type header from the code no longer matters: the output contains no raw HTML characters, so there is nothing for the browser to execute even if it treats the response as HTML.

Wrap up the function and flags in a helper function, use it on all of your JSON outputs, and you’ll have some solid protections in place.
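A minimal sketch of such a helper might look like this. The name safe_json_encode() is hypothetical, and adding JSON_THROW_ON_ERROR (which assumes PHP 7.3 or later) is my own choice so that encoding failures raise an exception rather than silently returning false.

```php
<?php
// Hypothetical helper name; the four JSON_HEX_* flags are the ones listed
// above. JSON_THROW_ON_ERROR is an added assumption (PHP 7.3+).
function safe_json_encode($data): string
{
    $flags = JSON_HEX_TAG      // < and > become \u003C and \u003E
        | JSON_HEX_AMP         // & becomes \u0026
        | JSON_HEX_APOS        // ' becomes \u0027
        | JSON_HEX_QUOT        // " becomes \u0022
        | JSON_THROW_ON_ERROR; // throw JsonException on failure

    return json_encode($data, $flags);
}
```

Any standard JSON parser decodes these sequences back to the original characters, so API consumers shouldn’t need to change anything.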

What About Consuming APIs?

Unescaped output inside JSON is a common finding in my penetration tests, and Burp Suite always gives me a bunch of these reports after a scan of an SPA.

However, it’s worth pointing out the last line of that report:

“However, the issue might be indirectly exploitable if a client-side script processes the response and embeds it into an HTML context.”

There are two sides to this:

First, if you’re consuming data from your own API and rendering it on the page, you need to be aware of the data you’re sending to your front end and how you’re rendering it. Second, if you’re consuming APIs from other providers, how are you handling the data they’ve given you?

Consuming Your Own APIs

When you’re consuming your own APIs within something like an SPA, it’s tempting to put all of your escaping on the server side, and have the front end blindly render whatever it’s given. Maybe you’re doing complex manipulations of the data to build specific HTML blocks or inject some markup? Or maybe it’s just more consistent to do everything on the server side and have the front end just handle rendering the template?

While it’s not a terrible solution, it does require you to be consistent across your application in how escaping is handled, and it’s easy to overlook it if you’re working on the backend and know you’ll be sending the output to the view in JSON - and not HTML.

I’ve come across a number of vulnerabilities where HTML was being constructed in the backend and sent through to the browser - with neither side escaping the output! One of my recent ones involved search results, where the search terms were being highlighted in the results through some HTML.

The backend just did a string-replace of the search term to add a <span> tag wrapper to highlight it, but didn’t escape the search term - which was user input. The front end rendered the output raw because of the injected <span> tags, and Cross-Site Scripting (XSS) was quite easy to inject and abuse.

There were two ways to fix this:

Either escape the search term on the backend before wrapping it in the <span> - which comes down to consistently escaping everything the API sends to the browser - or do the search term highlighting in the front end, and escape the search term there.

Both are valid solutions, but you need to be consistent.
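For the backend option, a sketch of the idea might look like the following. The function name highlight_term() and the "highlight" class are hypothetical, and this is not the code from the application I tested. It splits the raw text around each match, escapes every piece, and only then adds the <span> wrapper, so the only unescaped markup in the result is the markup the backend added itself.

```php
<?php
// Hypothetical helper: highlight $term inside $text and return HTML in
// which everything except our own <span> wrapper has been escaped.
function highlight_term(string $text, string $term): string
{
    $escape = function (string $s): string {
        return htmlspecialchars($s, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
    };

    if ($term === '') {
        return $escape($text);
    }

    // Split the raw text around each case-insensitive match, keeping the
    // matches themselves. preg_quote() stops the term acting as a regex.
    $pattern = '/(' . preg_quote($term, '/') . ')/iu';
    $parts = preg_split($pattern, $text, -1, PREG_SPLIT_DELIM_CAPTURE);
    if ($parts === false) {
        // For example, invalid UTF-8 input: fall back to escaping only.
        return $escape($text);
    }

    $html = '';
    foreach ($parts as $i => $part) {
        // With a single capture group, odd indexes hold the matched terms.
        $html .= ($i % 2 === 1)
            ? '<span class="highlight">' . $escape($part) . '</span>'
            : $escape($part);
    }

    return $html;
}
```

Escaping after splitting, rather than before, avoids matching a search term against the entities that escaping introduces (a search for “amp”, for instance).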

Consuming Other APIs

I teased it at the start, and it’s time to look into the vulnerability I found in Mastodon! 

I was procrastinating one day when a post caught my eye: the text was clearly truncated, but it was showing the preview for a link (to my website) that wasn’t present in the text I could see.

Confused, I clicked “Show Original”. The original message contained a lot more text (in German), including the link I was expecting to see. While I don’t know any German, it was pretty clear that the truncation lined up with a <script> tag on the second line, which got me thinking… 🤔😈

I checked the source and sure enough, the translation was being injected onto the page - without escaping! 

So naturally, I had to try this myself. I composed a post in German and viewed its translation.

The Content Security Policy (CSP) enforced by the browser blocked the attack, but I had a successful XSS vector on Mastodon - an open source project used by a huge number of people…

I dutifully went off to report it (responsibly) to Mastodon, who resolved the issue quickly and rolled out an update.

The cause of the issue was simple: Mastodon was trusting the API output of the translation service to be safe and wasn’t escaping it.

The fix was pretty straightforward: they had to escape the output from the translation API before rendering it.
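To illustrate the principle in PHP (this is not Mastodon’s code, which isn’t written in PHP), here is a sketch that treats a third-party response as untrusted. The render_translation() name and the {"content": "..."} response shape are assumptions made purely for the example.

```php
<?php
// Hypothetical example: the {"content": "..."} response shape is an
// assumption for illustration, not any real translation API's format.
function render_translation(string $apiResponseJson): string
{
    $data = json_decode($apiResponseJson, true);

    // Fall back to an empty string if the response isn't what we expect.
    $text = (is_array($data) && is_string($data['content'] ?? null))
        ? $data['content']
        : '';

    // Escape at the point of rendering, no matter who supplied the text.
    return '<p class="translation">'
        . htmlspecialchars($text, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8')
        . '</p>';
}
```

The point is that the escaping happens where the value enters the HTML, so it doesn’t matter whether the upstream service is trustworthy or has been handed hostile input.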

Summary

As a security person, I feel like I say this a lot, but I need to repeat it here again:

Don’t forget about output escaping!

Don’t forget about output escaping!

Don’t forget about output escaping!

Don’t forget about output escaping!

Don’t forget about output escaping!

It’s easy to overlook, and to forget where the boundaries are or that it applies to more than just raw HTML, but it’s critical that you’re always thinking about escaping. This is where a tool like Treblle’s API Security can help - it’ll monitor your APIs and let you know about content type issues and malicious-looking values.

