Reece Payne, Security Consultant | November 23, 2018
So, you've spent the better part of six months sweating lovingly over your app, feeding it, watering it, finding somewhere to host it, and maybe even somebody to buy it. Great, get ready to #ShipThatBadBoi! Except... no... wait. Is that a user? Oh no, oh goodness no! Look at the mess they've made! Who knew an errant semi-colon could have done that! And that was just a normal user. Imagine if it was somebody who was... how do I put it, looking for data that could make a GDPR enthusiast blush.
Maybe you should have spent some time testing your app to see if the user input was handled properly, making sure nothing except letters and numbers was allowed through the gauntlet of your front-end code... Maybe. Or maybe a better way would have been to incorporate some up-front principles that would have meant less work in the end. Nothing is more painful than grepping through code and changing input handling. Well... plenty of stuff is more painful, but you see my point, right?
We often think of security as a stage-gate, or a horizontal stratum in the phases of application development, and heck, nine times out of ten, if you develop like I do, security is the sort of afterthought that happens six months after you go live. Okay, I'm not nearly that bad, I'm just trying to fit in.
This particular piece is about user input in web applications and may even be a part of a series on application security. Before I jump into the principles, let's have a look at which parts of a request to a web server can be user-controlled input.
Hmmmn, what could be user-controllable on a web-page?
So maybe you already know what is and isn't user-controlled input. For the sake of making sure we're all on the same page, I'm going to take us through a little example. Here's a shoddy HTML form I threw together:
So, we have a few elements at play here:
- A textbox;
- A dropdown / selection list;
- A range slider; and
- A submit button.
I'll give you a second to think about which of these elements is able to be abused by a user.
Have you got it yet? Did you think it was the text box? Well yes, that's _one_ of the places a user could input something bad, and certainly the one where a user might accidentally stick the wrong thing.
But in reality, it's every. Damn. Element. On. The. Page. Sure, in the UI, somebody can't make the range slider have a value of `<script>alert('lol')</script>`, but if somebody captures that input in something like Burp or OWASP ZAP, they can change the values. Heck, you don't even need Burp or ZAP. If you know the input parameter names, you can use curl, Python, or the developer tools in any web browser to just change the value.
Depending on what you do for a living, this might make sense. Graphic designers and front-end guys might tend towards thinking that sliders, dropdowns and checkboxes aren't an issue. However, backend devs may have a better idea of what I'm on about. And the unicorn full-stack devs? Presumably they're off getting paid a motza.
Let's look into this in more detail. Be warned, this is either going to be technical, or university-level boredom-inducing, because I'm going to explain HTTP! Yay!
Here's what the web browser sends to the server when I submit the above form:
    POST http://localhost:8080/otherthing.html
    Host: localhost:8080
    Referer: http://localhost:8080/example.html
    Content-Type: application/x-www-form-urlencoded
    Content-Length: 31

    input1=test&cars=volvo&range=50
If I had submitted the form as a GET request, it would have looked something like this instead:
    GET /otherthing.html?input1=test&cars=volvo&range=50 HTTP/1.1
    Host: localhost:8080
    Referer: http://localhost:8080/example.html
As you can see, the input values are plain as day. A man-in-the-middle proxy like Burp or ZAP can change those values, or you can use a utility like curl to craft your own HTTP requests.
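To make the "plain as day" point concrete, here is a minimal Python sketch of what a server ends up with once either request is parsed. It only uses the standard library; the variable names are made up for illustration, and a real framework will have its own way of exposing the same data.

```python
from urllib.parse import parse_qsl, urlsplit

# The POST body and the GET request target from the two requests above.
post_body = "input1=test&cars=volvo&range=50"
get_target = "/otherthing.html?input1=test&cars=volvo&range=50"

# Both decode to the same name/value pairs, and every value is a plain string.
post_fields = dict(parse_qsl(post_body))
get_fields = dict(parse_qsl(urlsplit(get_target).query))

print(post_fields)
print(get_fields == post_fields)
```

Note that the slider value arrives as the string "50", not as a number: nothing about the request says it ever came from a slider.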
Let's see what changing things could look like in action:
    POST http://localhost:8080/otherthing.html
    Host: localhost:8080
    Referer: http://localhost:8080/example.html
    Content-Type: application/x-www-form-urlencoded
    Content-Length: 31

    //Notice that all three values from the form are just plain text? Any of those can be changed. For example, in the second line I've altered the range slider to something cheeky.
    input1=test&cars=volvo&range=50
    input1=test&cars=volvo&range=';DROP TABLE users;=
    //Or I can change the 'cars' dropdown:
    input1=test&cars=&range=50
Okay, so what about the GET request? Even simpler.
    GET /otherthing.html?input1=UH_OH&cars=malicious_string_here&range=XSS_goes_haer HTTP/1.1
    Host: localhost:8080
    Referer: http://localhost:8080/example.html
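You don't need a proxy to do this. As a rough sketch, assuming the example form's field names and the localhost URL from the requests above, a few lines of standard-library Python can build the same kind of tampered POST. The helper name `build_tampered_request` and the value used for `cars` are made up for illustration, and the request is only built here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_tampered_request(url: str, fields: dict) -> Request:
    """Build a form-encoded POST with whatever field values we like."""
    body = urlencode(fields).encode("ascii")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# None of these values could be produced through the form's UI.
tampered_fields = {
    "input1": "test",
    "cars": "not_in_the_dropdown",
    "range": "<script>alert('lol')</script>",
}

request = build_tampered_request(
    "http://localhost:8080/otherthing.html", tampered_fields
)
print(request.get_method(), request.full_url)
print(request.data.decode("ascii"))

# Passing `request` to urllib.request.urlopen() would actually send it.
# Only do that against an application you are allowed to test.
```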
But query parameters and form-encoded inputs aren't the only user-controllable input. Let's have a look at the request below.
    POST http://localhost:8080/otherthing.html
    //These are user controllable.
    Host: localhost:8080
    Referer: http://localhost:8080/example.html
    Content-Type: application/x-www-form-urlencoded
    Content-Length: 31
    X-Custom-Header: 1337
    //The cookies are also user controllable.
    Cookie: PHPSESSID=298zf09hf012fh2; csrftoken=u32t4o3tb3gg43; _gat=1;
    //The entire HTTP message body is also user controllable.

    input1=test&cars=volvo&range=50
We have a bit more going on here. Notice we have a bunch of headers, and a cookie. Now, changing these to anything else will most likely either make no difference, or the server will error, but there have been issues in the past with parsing headers, [like this one](https://blog.qualys.com/securitylabs/2018/08/23/detecting-apache-struts-2-namespace-rce-cve-2018-11776). Most of the time, the web server or framework you are using will handle all of the default header types and there's not much you can do except to make sure the framework or web server you are using is up to date. However, if you are using custom headers (like the X-Custom-Header I threw in), make sure you validate the value of the header before using it.
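What "validate the value of the header" looks like depends entirely on what the header is for. As a minimal sketch, assume `X-Custom-Header` is only ever supposed to carry a short number (an assumption for illustration, as is the function name `read_custom_header`), and that the framework hands over the headers as a mapping with the name already normalised:

```python
import re
from typing import Mapping, Optional

# Assumed rule: one to six ASCII digits and nothing else.
CUSTOM_HEADER_PATTERN = re.compile(r"[0-9]{1,6}")


def read_custom_header(headers: Mapping[str, str]) -> Optional[int]:
    """Return the header as an int, or None if it is missing or malformed."""
    raw = headers.get("X-Custom-Header")
    if raw is None or not CUSTOM_HEADER_PATTERN.fullmatch(raw):
        return None
    return int(raw)


print(read_custom_header({"X-Custom-Header": "1337"}))
print(read_custom_header({"X-Custom-Header": "1337; cat /etc/passwd"}))
```

The important detail is `fullmatch`: the whole value has to fit the pattern, not just the start of it.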
The same goes for cookies, whose values a user can easily change. This gets even worse when the value of a cookie is actual JSON, or a Python pickle, or a serialised Java object.
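Here is a small sketch of treating a cookie like any other input, using Python's standard `http.cookies` module and the Cookie header from the request above. The accepted session ID shape (8 to 64 letters and digits) and the function names are assumptions for illustration; use whatever shape your own session IDs actually have.

```python
import re
from http.cookies import CookieError, SimpleCookie
from typing import Optional

# Assumed shape of a session ID: 8 to 64 ASCII letters or digits.
SESSION_ID_PATTERN = re.compile(r"[A-Za-z0-9]{8,64}")


def parse_cookie_header(cookie_header: str) -> dict:
    """Turn a raw Cookie header into a plain name -> value dict."""
    jar = SimpleCookie()
    try:
        jar.load(cookie_header)
    except CookieError:
        return {}
    return {name: morsel.value for name, morsel in jar.items()}


def get_session_id(cookie_header: str) -> Optional[str]:
    """Return the session ID only if it looks like one, otherwise None."""
    value = parse_cookie_header(cookie_header).get("PHPSESSID", "")
    return value if SESSION_ID_PATTERN.fullmatch(value) else None


print(get_session_id("PHPSESSID=298zf09hf012fh2; csrftoken=u32t4o3tb3gg43; _gat=1;"))
print(get_session_id("PHPSESSID=../../etc/passwd"))
```

Note what is deliberately absent: nothing here deserialises the cookie value. Handing a user-supplied cookie to something like `pickle.loads` lets the user decide what gets executed.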
Okay, so what's the point I'm trying to make? Basically, everything is user-controllable input in an HTTP request. While your average user may just fat-finger something and cause a minor issue that's worthy of a laugh more than anything, you don't want to spit out your trove of GDPR-fodder just because some kid from a country with a low cost of living and nothing to lose is experimenting around with SQL injection.
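To show why the cheeky range value from earlier matters, here is a rough sketch using an in-memory SQLite database. The `cars` table, its columns and both function names are invented for illustration. The second function uses bound parameters, which is one common mitigation rather than something this post prescribes, so treat it as an illustration and not a complete defence.

```python
import sqlite3


def build_query_naively(range_value: str) -> str:
    # Deliberately unsafe: the user's value becomes part of the SQL text.
    return f"SELECT name FROM cars WHERE max_range = '{range_value}'"


def find_cars(conn: sqlite3.Connection, range_value: str) -> list:
    # The value is passed as a bound parameter, so it stays data.
    cursor = conn.execute(
        "SELECT name FROM cars WHERE max_range = ?", (range_value,)
    )
    return cursor.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cars (name TEXT, max_range TEXT)")
conn.execute("INSERT INTO cars VALUES ('volvo', '50')")

tampered = "';DROP TABLE users;="

# The naive version now contains a second statement the developer never wrote.
print(build_query_naively(tampered))

# The bound version just looks for a car with a very odd range and finds none.
print(find_cars(conn, tampered))
print(find_cars(conn, "50"))
```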
Onwards to Reece's handy principles for handling user input.
- While client side input validation is laudable (and output validation is a good idea), it is no substitute for server-side validation.
- Don't blacklist known-bad characters (or characters you think are bad). Whitelist known-good characters instead, for example, only allow a-z and 0-9.
- Use built-in sanitisation functions before writing your own. That said, if there are no built-in functions, use a library, and if there are no libraries, write your own. Doing something is better than nothing.
- Don't just assume that because your application can handle funny input, some other application that uses your data (or data input by your users) will also handle funny input properly.
- Convert characters to their HTML entity or URL-encoded versions before storage, rather than only when read from where you are storing the data.
- Base64 may be a benign format, but you are always better off transforming (converting to encoded or entities) the string than just base64 encoding it, otherwise you are shipping the problem down the stream to where you are decoding that data.
- In fact, minimise storing anything important in cookies, and whatever you do store in cookies, treat it like any other form of user input. Doesn't matter if you're storing the colour of the website, or the user's time zone, or a random, long session ID, validate it.
- An application should take care of its own input validation and equally should make sure any output it passes on is also clean. The number of times I've heard "our app only gets input from this other upstream app, so we don't need to validate" is too many times. All apps should equally sanitize their input and their output. Don't trust anything, but at the same time be a good citizen in your application environment and sanitize data that passes through.
- If you want users to be able to generate formatted content in your application (think a wiki or blog) use markdown, not HTML. Markdown is generally only used for formatting, whereas HTML mixes a whole bunch of concerns. Don't trust users with HTML.
- If you absolutely have to log user input (which is probably something to write a whole other blog post on), you may want to preserve what was originally entered (say to look for users trying to throw malicious things into forms), but you might want to convert it to a safe format beforehand so that it won't execute in whatever you look at logs in (Splunk had an issue with this a while back). This is where base64 encoding input may be a good idea.
- Understand that there is more than one goal somebody is trying to reach when abusing any place that accepts user input:
  - Cross-site scripting - To force other users to see lame memes or steal session cookies and auth tokens.
  - Remote code execution - If you're running PHP and the user can enter some more PHP in an input location and it executes, this is a *bad thing*.
  - Command execution - Like the above, except in the underlying operating system. This is especially prevalent on home routers where some of the network functions in the web interface actually just run operating system commands and get their output. If done wrong a user can use a ';' or '|' to execute additional commands.
  - Database injection - Tricking the database to be in a non-GDPR friendly state.
  - And more obscure things, don’t be surprised when they pop up.
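A few of the principles above can be sketched in Python using only built-in modules: server-side whitelisting of the three example form fields, entity-encoding with the built-in `html.escape`, and base64 for log lines. The field names come from the example form. The list of allowed cars, the slider bounds, the length limit and all the function names are assumptions for illustration.

```python
import base64
import html
import re

# Whitelists: describe what is allowed, not what is forbidden.
INPUT1_PATTERN = re.compile(r"[a-z0-9]{1,32}")  # assumed length limit
RANGE_PATTERN = re.compile(r"[0-9]{1,3}")
ALLOWED_CARS = {"volvo", "saab", "audi"}        # assumed dropdown options
RANGE_MIN, RANGE_MAX = 0, 100                   # assumed slider bounds


def validate_form(form: dict) -> list:
    """Return the names of the fields that fail validation."""
    errors = []
    if not INPUT1_PATTERN.fullmatch(form.get("input1", "")):
        errors.append("input1")
    if form.get("cars", "") not in ALLOWED_CARS:
        errors.append("cars")
    raw_range = form.get("range", "")
    if not RANGE_PATTERN.fullmatch(raw_range) or not (
        RANGE_MIN <= int(raw_range) <= RANGE_MAX
    ):
        errors.append("range")
    return errors


def encode_for_html(value: str) -> str:
    """Convert HTML-significant characters to entities (built-in function)."""
    return html.escape(value, quote=True)


def encode_for_log(value: str) -> str:
    """Keep the original bytes recoverable without letting them execute."""
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


print(validate_form({"input1": "test", "cars": "volvo", "range": "50"}))
print(validate_form({"input1": "test", "cars": "", "range": "';DROP TABLE users;="}))
print(encode_for_html("<script>alert('lol')</script>"))
print(encode_for_log("<script>alert('lol')</script>"))
```

The validation runs on the server regardless of what the browser did, so it makes no difference whether the request came from the form, curl or a proxy.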
Okay, so that list might not be exhaustive, but you get the point. Remember these things at the start of developing an application and they'll be easy to incorporate, test and manage. Adding them in after the fact would really not be a fun time.