[Tutorial] Prevent SQL Injection!
So, a few days ago I asked about how to prevent mysql injection. Someone suggested me to use only a mysql_real_escape_string
but that's not enough because someone can use shit like ORDER BY
and others that I will not tell you here. I found that replacing bad characters with nothing in my scripts will make it impossible for someone to perform an injection.
$post = mysql_real_escape_string($_GET['post']); $post = preg_replace('/[a-zA-Z _()-.,@]/', '', $post); $posts = mysql_query("SELECT * FROM posts WHERE id=$post");
what that does is in
line 1: replace characters like ' with \'
line 2: replace bad characters and letters (i only use numbers) with space. if you use letters too just erase the "a-zA-Z"
line 3: make a safe query.
Comments
It appears that you're still using data directly in your query. Do NOT attempt to self-sanitize your queries. Sanitizing input is to clean up data, not create/increase security. There will be criteria that you will not anticipate (people are crafty), meanwhile, you will think that you're secure.
http://www.slideshare.net/billkarwin/sql-injection-myths-and-fallacies
http://stackoverflow.com/questions/129677/whats-the-best-method-for-sanitizing-user-input-with-php
http://bobby-tables.com/ - Paragraph titled "How to avoid Bobby Tables"
@Damian sorry but my native language isn't english. could you explain that?
Your doing it a crazy way. just cast it to an int. for strings use thwarting proper quotes then mysql escape which will work. using or and stuff can't work then. Php devs are much more experienced than you use their tools don't make your own
If you're only using numbers, you can just use $post = intval($_GET['post']); ...
@OneTwo: Use parameterized mysql. No need to worry.
To effect this:
If you're writing object-oriented, or "class"-based code, use PDO: http://www.php.net/manual/en/intro.pdo.php
If you're writing not-object oriented, or "flat" code, it may be easier for you to use mysqli: http://us2.php.net/manual/en/book.mysqli.php
I believe mysql_real_escape_string is enough to filter out things but what the hell, I use yiiframework.com as the PHP framework and it does all for me
For the purpose of this response I will ignore the existence of PDO and friends.
Using a regex to sanitize your queries is a horrible idea, as they're rather expensive to parse.
The situation you mentioned where things without quotes (ORDER BY, etc) can be abused, can be very easily defended against by:
1. Always putting values in apostrophes in a query (even numeric values)
2. Checking for a value being numeric.
Example code using a shorthand if statement:
The above example also implements the practice of prefixing variables with sanitation status as explained in http://www.joelonsoftware.com/articles/Wrong.html. It also makes use of {} around variable names to improve readability and consistency - which, in turn, makes it less likely for you to make errors.
btw that could work also:
$post = preg_replace('/[^0-9]/', '', $post);
use
$post = intval($_GET['post']);
or
$post =(int) ($_GET['post']);
nothing else ....
Inserting data in the database without stripping html code is generally a bad idea.
Why you say that? HTML doesn't hurt anyone. It is best to store the entities though.
HTML was deadly in the 90s.
Assuming you'll be listing names from the database somewhere (something like <?php echo $name; ?>, I can be a bit evil and break your site layout or I can be more evil and put javascript code, and you'll get nice XSS.
Yay, someone else actually uses this!
@OneTwo there is a famous saying about regex,
Why reinvent the wheel, when you have what @Damian said above. You don't even need OOP, just use the normal functions of mysqli_*
It is true, however sometimes people will escape the output from the database, before outputting it to the user. But in general, it is a bad idea to allow HTML into the database.
Uhm, no. It's actually the prefered way of doing things. The only sanitation you should do when inserting something in a database, is making it ready for a query; otherwise, the data should stay unchanged as you cannot predict where you will output the data later on. It is an extremely bad practice to further modify data before storing it.
The correct moment to use something like htmlspecialchars() is after database retrieval, just before outputting it into a HTML page.
EDIT: Or strip_tags(), if you have more specific needs.
@joepie91 is dah master. Period
xD
Thanks and more thanks to all. But I don't want to mess your signatures n_n
so does any cast ?
http://www.php.net/manual/en/language.types.string.php#language.types.string.conversion
Seems pointless adding it in
Unless you're expecting html input, you should strip it. Why would you let users enter any code to your database if you don't expect it? Give me one good reason why would you let someone put html code in the name input field (for example). It will bounce off your head sooner or later, you are only inserting that data once, and you may display it multiple times and someone will forget to strip_tags or htmlspecialchars sooner or later and then you're in trouble.
Prevent SQL injection? Don't use SQL
Like a boss!
Let's say you run a website that has a 'company directory' where people can look up local businesses. One certain company has, literally,
<Arrow> Communications
as their company name, and fill that in in the appropriate field. Now say you run htmlspecialchars over all your input by default. Guess what? The entry in your database is now<Arrow> Communications
. Now let's say you sell your database to a company that is going to make an offline desktop application, distributed on CD, with all company data in it. Their application - which doesn't have a clue what HTML even is - now has to deal with the possibility of<
and>
in the database fields.All this could have been avoided by simply storing the input as provided (only escaping to prepare it for a query), and only using htmlspecialchars() before outputting it. Interestingly, that does not have any drawbacks, so there is no real reason not to do it.
HTML doesn't do anything to a database. If it doesn't have to be filtered, don't. Always keep stored data as close to the original input as possible.
EDIT: For the record, htmlspecialchars() could care less what is between those < and >. If it looks like a HTML tag and it smells like a HTML tag, then to htmlspecialchars() it's a HTML tag and it gets replaced.
"could care less" doesn't make sense
The drawback is what I said before. Sooner or later someone will forget or just be too lazy to escape every single string coming from the database, I've seen it happen before.
Your highly hypothetical and marginal case will still need to be exported, converted or whatever before adding it to CD. Running htmlspecialchars_decode or similar during that is not really that difficult. And let's say you're going to sell your database to the company that's going to make a web application? How do you know they'll escape everything? Do you prefer having
<
on a CD or malicious code on their site?No, it doesn't have to be filtered. It's a good idea to filter it tho. Don't store malicious code in your database, just don't, it's a bad idea.
And that is the exact same situation as you could get when escaping before inserting into the database, so in that sense it makes no difference at all whether you do it before or after insertion. To get an idea, look at the amount of people that forgets to escape their SQL queries.
Yes, because noone ever uses a database for more than one purpose.
Which can then be done when it is exported, in a manner that appropriately prepares it for the medium it is going to be displayed on.
Which means you're trying to un-malform data that you unnecessarily malformed before. Not to mention introducing an extra 'link' in the chain where things could go wrong (what if there's a bug in the decoding function?)
You don't know, that's their responsibility; not yours. You're a dataset provider, not their application programmer.
And I think we've just seen why it's a bad idea to filter it.
HTML is not malicious code. HTML is HTML, and in the context of a database, it cannot possibly be malicious, only in the context of the place where you output it, which is why you take appropriate measures at that point.
This discussion is getting ridiculous. Do you actually have even one valid point for escaping before inserting into a database, considering I already brought up a valid point for not doing it?
As I said before, you're inserting it once and it's easier to escape it once then every time you display it. Also, you are already preparing data for insert and you can do it in one go.
I said that?
Isn't that pretty much the same what I said?
I'm pretty sure they don't want to buy a javascript keylogger from me (for example)
You want to keep the data intact at the risk of possible mistake by you or someone else in the future with possible catastrophic consequences, that's your point.
I want to be safe and escape everything. A mistake in the future will only lead to several characters displayed incorrectly, that's my point.
I can live with some characters displayed incorrectly. You don't have to agree with me.
And if you have several submission forms and one overview, the situation is the exact opposite. Your point?
This is ridiculous. It's just as easy to "forget one function" as it is to forget sanitizing entirely.
No, but your claim that my example was marginal and "highly hypothetical" certainly very strongly implied that.
No, as you assumed malformed data and I didn't.
As long as it's in the database, it's not a "Javascript keylogger", it's a bunch of characters. Only when shown on a page in a browser does it become a Javascript keylogger, and at that point you sanitize it.
I want to be safe and escape everything.
Yeah, only no. As I already pointed out several times the exact same risks exist in your example, just at a different point in time.
And a mistake in the past will lead to someone getting his account stolen. Right.
Then I do hope I never have to work with an application that you wrote, if integrity of data is apparently that unimportant to you.
And now I can say I hope I never have to work with an application you wrote if safety of users is apparently unimportant to you (which isn't true I'm sure the same way as the above statement isn't). But ok, no point in arguing with you, when you start developing for real you'll see how dangerous your approach can be.
You may want to actually look into my past projects before you make claims like that.
By "for real" I meant doing it as your primary occupation, 8+ hours per day 5 days per week and living from it. Sorry, didn't want to insult your work in any way, your VPS Comparision Table looks quite good actually!