Webmasters: Review external code!

August 19th, 2007

This morning I walked my parents through cleaning their computer, which had gotten infested with everything you can think of. Eventually we found the culprit: a program that touted itself as a greeting card e-mailer.

The hours spent cleaning the computer could have been saved had the file been scanned before being installed, but who wouldn't trust such an innocent-sounding program?

People who use computers often of course know not to trust something like that, especially when, in this case, it comes in an email from an unknown address. But I was thinking afterwards how many times I've seen people put things on websites with the same blind trust.

Typically by the time your average webmaster sticks something on their website it's been reviewed by tons of other people, but that isn't something to rely on. People miss things, new bugs are introduced, and some of the programs are intentionally flawed.

I've run into both. Both times it was code that had been installed by others without looking at the source at all. I remember one, a Sudoku program, had source code similar to this:



One glance, and any decent programmer would have known that using this was a terrible idea. As a side note, I decided to go through with this, echoing the code. It sent me through a long loop of more of the same before eventually getting to the actual intent of the original programmer. I found it funny that they went through as many steps as they did, as anyone trying to see the end code wasn't going to be fooled by the extra steps.

Another instance wasn't necessarily malicious code, but had the potential to be. I came upon a site that allowed people to use its JavaScript code on their own sites. The JavaScript code had a check at the beginning to see which URL it was being accessed with. If the URL was anything but the original domain, it wouldn't work. The code in the file was harmless, but because it lived on another server, it could easily be switched out for more malicious code that stole cookies, etc.

Accidental security errors


These are the most common, but can be the most damaging. Almost every large program you can get for your site has had a few, and some have had a lot. This category includes phpBB, WordPress, etc. These programs aren't necessarily programmed worse than something you could create, but because they're much better known, once a bug is found, script kiddies start hitting Google searching for "WordPress" (or whatever) and find your site.

This happened recently with a site in our company. The error was simple: a folder that people could upload images into. The installed script chmod'ed the directory to 777 and didn't check what was being uploaded. Several people noticed this flaw in the software and started uploading their own PHP files onto sites and wreaking havoc. The site in our company was lucky and had other security measures in place, so nothing could be permanently deleted and no information could be stolen (nothing meaningful, that is), but typically sites aren't this lucky.

If you're going to use external code, make sure that at the least you follow these steps:

If Server Side Code
  • Go through the code looking for the common input problems: input that isn't stripped (or cleaned) of special characters, quotes, HTML tags, etc.
  • Go through the code looking for file uploads that aren't carefully checked. (Note: don't trust the MIME type sent by the browser, since it can be faked. Use a MIME-checking function on the server, or check some other way.)
  • Look out for global variables.
  • Look for values that should be kept encrypted that aren't (user information, passwords, etc.).
  • Look for possibly intentionally malicious code. This doesn't happen often, but it's out there. If you find obfuscated code, code pulled in from somewhere else, or anything that sends information outside your domain, ditch it.
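
To make the upload bullet above concrete, here's a minimal sketch of checking a file by its contents instead of by the MIME type the browser claims. It's written in JavaScript (it would run under something like Node), and the names `SIGNATURES`, `sniffImageType`, and `isAcceptableUpload` are made up for this example. It only knows three image formats, and a signature check by itself isn't a complete defense, so treat it as a starting point.

```javascript
// Leading bytes ("magic numbers") for a few common image formats.
const SIGNATURES = [
  { type: 'image/png',  bytes: [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a] },
  { type: 'image/jpeg', bytes: [0xff, 0xd8, 0xff] },
  { type: 'image/gif',  bytes: [0x47, 0x49, 0x46, 0x38] }, // "GIF8"
];

// Returns the image type the content actually starts with, or null.
// `data` is a Uint8Array (or Node Buffer) holding the uploaded file.
function sniffImageType(data) {
  for (const { type, bytes } of SIGNATURES) {
    if (data.length >= bytes.length && bytes.every((b, i) => data[i] === b)) {
      return type;
    }
  }
  return null;
}

// Hypothetical policy: accept the upload only when the content looks like
// an allowed image AND agrees with the type the browser claimed.
function isAcceptableUpload(data, claimedType) {
  const actual = sniffImageType(data);
  return actual !== null && actual === claimedType;
}
```

With this in place, a PHP file uploaded with a claimed type of `image/png` is rejected, because its first bytes are `<?php` and not the PNG signature.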


If Client Side code
  • Make sure that you can run it on your own server. Don't point to an external JavaScript file.
  • Make sure that the JavaScript file doesn't include other files from outside your server. There are some pretty clever ways to do this now, so make sure you check code that doesn't seem to make much sense.
  • If there's any recording of cookie information, make sure that it needs to be there.
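
A rough first pass over a client-side file can be automated. The sketch below flags lines worth reading by hand; the pattern list and the name `flagSuspiciousLines` are my own assumptions, and well-obfuscated code will slip past simple patterns like these, so it doesn't replace actually reading the source.

```javascript
// Patterns that deserve a closer look in third-party client-side code.
const SUSPICIOUS_PATTERNS = [
  { label: 'dynamic code execution', pattern: /\beval\s*\(|\bnew\s+Function\s*\(/ },
  { label: 'script injected at runtime', pattern: /document\.write\s*\(|createElement\s*\(\s*['"]script['"]\s*\)/ },
  { label: 'cookie access', pattern: /document\.cookie/ },
  { label: 'absolute URL', pattern: /https?:\/\/[^\s'"]+/ },
];

// Returns one finding per (line, pattern) match, with 1-based line numbers.
function flagSuspiciousLines(source) {
  const findings = [];
  source.split('\n').forEach((text, i) => {
    for (const { label, pattern } of SUSPICIOUS_PATTERNS) {
      if (pattern.test(text)) {
        findings.push({ line: i + 1, label, text: text.trim() });
      }
    }
  });
  return findings;
}
```

Anything it flags isn't automatically bad (plenty of honest scripts read a cookie), but each hit is a line you should be able to explain before the file goes on your site.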

If you can't follow these steps, then don't use others' code. I know that's harsh, but most of these programs collect personal information. If I use your forum, comment on your blog, etc., I shouldn't have to worry about the safety of your code.

If you must must must use something without reviewing it, put a warning on your site, so I know to stay away.

I'm fairly certain I've forgotten other things you should look for. If so, let me know in the comments.

Why you should use preprogrammed functions

August 14th, 2007

Several years ago, when I was just learning JavaScript, I ran into a situation where I had to convert a string to its monetary equivalent:

10 = 10.00
or 15.320000 = 15.32

Rather than spend the time searching for a function to do it for me I wrote out a quick little function to solve the problem. A few days later a friend of mine, who at the time was a much better coder than me, saw it, and reacted like I'd just run over his mom.

Friend: (mouth agape) Why are you doing this?
Me: I didn't want to look for a function.
Friend: There has to be something that does what you want.
Me: I'm sure, but, this does it too.
Friend: (look that suggests friend is questioning his friendship) Always use the functions that are already in the language.

He didn't explain why, and I didn't ask him. It wasn't until a few years later that I learned the reason, and it was so obvious, that I couldn't believe it'd taken me that long to realize it.

The language you're using was built using another language, one closer to the machine. This means that for every instruction you're giving the language you're programming in, that language is giving several instructions to the language it was programmed in, and so on and so forth until the code reaches the machine. Prebuilt functions are one level closer, which can be much, much faster.

Besides the speed benefit, there are other benefits as well. I knew beforehand what I'd have to do to write my own version of the toFixed method needed above. There was nothing learned in that. Now that wasn't a terrible waste because the code was easy, and small. But say someone gets in the habit of doing it. A friend of mine was recently working on a program that at first glance seemed easy, but as he got deeper and deeper into it, more and more obstacles popped up. He spent a few days on this, with nothing to show for it. He asked me to take a look at it, and I briefly looked online, almost immediately finding a function that did exactly what he wanted, that was part of the language.
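
For the conversion at the top of this post, the built-in route looks roughly like this. `toFixed` is a standard method on JavaScript numbers; the wrapper name `toMoney` is just something I made up for the example.

```javascript
// Convert a numeric string to a two-decimal money string using built-ins only.
function toMoney(value) {
  const n = Number(value);
  if (!Number.isFinite(n)) {
    throw new TypeError('not a number: ' + value);
  }
  return n.toFixed(2); // toFixed returns a string
}

console.log(toMoney('10'));        // "10.00"
console.log(toMoney('15.320000')); // "15.32"
```

One thing to keep in mind: the numbers are binary floating point, so a value like 1.005 comes out as "1.00" and not "1.01". For real currency math you'd want to work in whole cents.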

By getting in the habit of looking for what you want, you can in the long run save yourself time. Even if something seems like an uncommon operation look for it. You'd be surprised how many times I've found what I've wanted when I was looking for something I thought was going to be unique to my project.

On a related note: say you're working on a project with a bottleneck, and you can't figure out how to make it faster. A lot of languages have a way to write in the language they were built on, although it's semi-hackish. Take PHP for example: if you're willing and able to recompile the core, you can edit it yourself. In C you can write in Assembly, and in XUL you can write in a host of languages (C, Python, and others, though others will require more work). Put some of the code in one of the lower languages, and get your speed boost. Depending on the cause of the bottleneck you can get up to 7 times the speed. (As a warning, don't overdo this. Maintenance will be a pain, and quite often the bottleneck will be related to bad database queries or just user lag time.)

Basically, just spend a few minutes checking when faced with something you haven't done before. In the long run it will save you time, but more importantly it can save your users time.

How important is the cleanliness of client side code?

August 9th, 2007

A couple of weeks ago I was working on a server-side DOM parser for imperfect markup, and I realized while doing it that I could extend it to display HTML sent to the user with more organized markup. At first, this sounded like a good idea, but after thinking about it for a little longer, I decided against it.

As a programmer, organized code is very important to me. Which is not to say that I'm a style nazi. I don't care how many spaces you indent blocks with (but it better be at least one), I don't care if a brace comes at the end of a statement or on the next line, and I don't typically care if you use the right text case for variables. What I do care about, however, is that you're consistent. I can pick up a style pretty quickly from looking at something you've coded, and adapt to it. But if you're switching back and forth, you're going to drive me crazy.

The problem is, when HTML is hand coded, consistency is easy. When HTML is dynamically generated, you can waste a lot of time trying to get it to fit in with the rest of the document. When I see any code, even code I'm not going to work with, if it's organized poorly, it will bug me. But, 99.999 percent of people who see a website won't care at all how the code is organized. They want a site to load fast. The people who will be working with your code, will most likely have the server side copy, and rarely if ever look at the client side generated code. (Not to mention, for those that will, there are extensions for Firefox that can do it for you, such as Firebug)

However, I believe there is one (at least) exception. If you're working on a program that will be used by others to generate code (such as a WYSIWYG editor), you should try and ensure that editing the output of that is as easy as you can make it. I've worked with both good and bad WYSIWYG editors, and so far every single one has had bugs, but the ones that generated a clean, readable output have been the easiest to fix.

To summarize, if you're spending time trying to organize dynamically generated code (or worse, spending users' time in increased load times) to the benefit of only a sliver of the people who will see your site, you've made the process more important than the goal, and you need to reevaluate.

(As a sidenote, this can also apply to comments. Don't make users download your notes if you can help it. When you can, stick them in server side comment tags.)

Using the browser find command with Flash

August 7th, 2007

I've seen a lot of blog posts recently about usability in Flash, most of which are very negative, written by people who seem to have used earlier versions of Flash, and decided that nothing has changed. However, there is one thing that has kept me away from doing projects in Flash with a lot of text.

That one thing is the browser's find function. I use the find function a lot. Often I'm only looking for a portion of a page, and so I'll do a search on a key term. With Flash (and others, like OpenLaszlo, XUL, etc.) the find won't work. Even when an OpenLaszlo program is converted to DHTML, find won't work on some items, because they're converted to images.

However, I think I've got a solution. I'd like to note first though, that this is currently a concept, and I haven't put in the testing legwork yet, though I plan to.

First, when the Flash file is generated and the non-Flash version is created for those without Flash, add some JavaScript that locates every character, word, etc., and sticks them into an object.

Now, add a listener, that checks at a frequent interval for highlighted text.



'getPageSelection' being the id of the Object holding the Flash file.
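
Here's a minimal sketch of what such a polling listener could look like. This is a guess at the shape of the idea, not tested against a real Flash movie. `createSelectionWatcher` is a name I made up, and `setSearchText` is a hypothetical callback the Flash file would have to expose (for example through ExternalInterface). `window.getSelection()` and `setInterval` are standard browser calls.

```javascript
// Builds a poll function that reports the selected text only when it changes.
// `readSelection` returns the current selection as a string;
// `onChange` receives each new non-empty selection.
function createSelectionWatcher(readSelection, onChange) {
  let last = '';
  return function poll() {
    const current = String(readSelection() || '');
    if (current !== last) {
      last = current;
      if (current !== '') onChange(current);
    }
  };
}

// Browser wiring (skipped when there is no window/document).
if (typeof window !== 'undefined' && typeof document !== 'undefined') {
  const flash = document.getElementById('getPageSelection');
  const poll = createSelectionWatcher(
    () => window.getSelection().toString(),
    (text) => {
      // Hypothetical: the Flash movie must register this callback itself.
      if (flash && typeof flash.setSearchText === 'function') {
        flash.setSearchText(text);
      }
    }
  );
  setInterval(poll, 250); // check four times a second
}
```

Keeping the change detection separate from the browser wiring means the Flash side only hears about a selection once, instead of four times a second.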

You'd then create a function to send this information to Flash (or have something set up to have Flash pull it in). In Flash you'd then create an object holding the text in the file (which, I believe, could be done dynamically for most of the text in a file).

This object would probably need to be set up so that each letter would exist at a different spot in a multidimensional array. In this example, this would hold the information to find "f-i-n-d":



Probably not something you'd like to do by hand, but, it seems to me possible to generate dynamically, and in the realm of possibility for Flash. When I get time to give it a shot, I'll flesh this out some more.
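
To show the kind of structure I mean, here's a sketch of generating the letter-by-letter array from a block of text and searching it. It's in JavaScript for convenience (the ActionScript version would need adjusting), and `buildLetterIndex` and `findTerm` are names invented for this example. It only matches a term inside a single word, which is a simplification.

```javascript
// Builds a two-dimensional array: one row per word, one cell per letter.
function buildLetterIndex(text) {
  return text
    .split(/\s+/)
    .filter(Boolean)
    .map((word) => word.split(''));
}

// Returns the position of every word containing `term`, as
// { word: row index, letter: column index where the match starts }.
function findTerm(index, term) {
  const needle = term.toLowerCase();
  const hits = [];
  index.forEach((letters, wordPos) => {
    const at = letters.join('').toLowerCase().indexOf(needle);
    if (at !== -1) hits.push({ word: wordPos, letter: at });
  });
  return hits;
}

const index = buildLetterIndex('use find here');
// index[1] is ['f', 'i', 'n', 'd']
console.log(findTerm(index, 'find')); // [ { word: 1, letter: 0 } ]
```

From the word and letter positions, the Flash side could work out which text field to highlight.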

XUL is not HTML, don't treat it like it is

August 5th, 2007

This morning, I read a post on the XUL User Group from someone confused as to why some of the CSS properties he was familiar with weren't working with XUL.

This is something I've seen pretty often with the XML-based RIA languages (Flex, XAML, XUL). People have gotten so used to HTML that they've decided it's the standard to judge other languages against.

The truth is, even with CSS, HTML is terrible. I know, it has a bright past, and it will have a bright future, but compared to what can be done with other markup languages, HTML isn't even in the same ballpark.

I'm not saying that HTML is at fault. If anything, HTML has done pretty well with the hand it's been dealt. Most languages probably would have thrown themselves in front of a train if they had to deal with what HTML has dealt with.

  • Originally created to display just text
  • Used by millions, so it has to accept a very wide range of user input and attempt to fix errors caused by past specifications or user mistakes
  • Because of those errors, different companies are able to set their own rules on some items, separate from the others
  • Has to be lightweight enough to be sent quickly over the internet, to all sorts of connections, but readable enough for anyone to learn


RIA languages don't have these problems (except, in certain instances, the last one). Take Flex, for example.

  • Built to display new media: video, vectors, audio. Someday it will likely not fit what people want, but that's probably a long way off
  • XML has a strict set of rules that are pretty well known by anyone who will be creating a project in one of these languages, so they no longer have to account for user error or worry about code written 10 years ago
  • Only one company is developing it, so there won't be multiple implementations, except for versioning, and because developers don't need to upgrade, new versions don't require as much backwards compatibility
  • Virtual Machines can contain a lot of the items for you, and with Apollo users don't need to repeatedly download the images or code


HTML and CSS are very useful for the web, and will probably have a place for a long time. But, when it comes to rich applications, they can no longer compare. I understand the desire to have something work the way you're familiar with, but trust me, the future is bright.

© Matthew Minix 2008