Brad Rees

Archive for July, 2011

Rules Engine for HTTP Requests

by on Jul.10, 2011, under Security, Web Development

Inspired by the power of Fiddler, I wanted to create a user-friendly interface and rules engine to monitor and filter the requests made by my computer when browsing the web. Fiddler itself is a great tool, but it is definitely aimed at technical people. Luckily Eric Lawrence, the author of Fiddler, offers the core engine as a separate library, ready for other developers to customise and extend as they see fit.

Last year I began work on a Windows-based application that monitors the requests a computer makes over HTTP and applies a rules engine to modify those requests before they are sent to the server. It can also modify the responses before they are passed on to the web browser. Based on FiddlerCore and WPF, the application sits in the system tray, displaying information on recent HTTP requests and any rules that have been applied. Rules can be enabled and disabled via the user interface and customised through a simple XML file.

The application sets itself as the system proxy, so all browsers that are set to use the system proxy will start issuing requests via the application. A word of warning: if the process is forcefully closed it will not have an opportunity to clean up. This may result in all web connections being blocked, because the system proxy will still point to an address that is no longer listening for requests. A quick fix is to restart the application and close it cleanly via the menu.
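To illustrate the failure mode described above, here is a minimal sketch of a check for whether anything is still accepting connections at the configured proxy address. It is not part of the application; the function name `proxy_is_listening` is hypothetical, and the host and port used below are assumptions, since the post does not say which address the application listens on.

```python
import socket


def proxy_is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections at host:port."""
    try:
        # create_connection raises OSError (e.g. connection refused, timeout)
        # when nothing is listening at the address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Hypothetical proxy address; substitute the one set as the system proxy.
    if not proxy_is_listening("127.0.0.1", 8888):
        print("System proxy address is not listening; web requests will fail.")
```

If the check fails while the system proxy is still enabled, the situation matches the warning above, and restarting the application and closing it cleanly is the fix.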

Screenshots

Log of matched rules and the URL that triggered each rule

A list of the requests that have been issued by the computer

A list of the rules and the status of each rule

Provided rules

  • Block request – Each request can be blocked from being sent to the server.
  • AdBlock Plus implementation – A cut-down version of AdBlock Plus for blocking advertising and tracking content. See below for more information.
  • HTTPS Everywhere implementation – Send requests over HTTPS instead of HTTP for popular sites.
  • Python script – Run custom code for each request. This is provided to give extensible functionality.
  • Modify header – Modify a header before it is sent to the server or returned to the browser, including removing it.
  • Modify cookie – Similar to the modify header rule, with support for individual cookie values.
  • Break action – Prevent any other rules from running for the request or response.
  • Save file – Save request content to the disk if matching a pattern.
  • Age filter – Implementation of my proposed header for restricting content that is inappropriate for minors.
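The way rules like these could be chained is sketched below. This is an illustrative sketch only, not the application's code (which is built on FiddlerCore and .NET); every class and function name here is hypothetical, and it assumes each rule is selected by a regular expression on the URL. Real HTTP header names are case-insensitive, which this sketch ignores.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Request:
    url: str
    headers: dict = field(default_factory=dict)
    blocked: bool = False


class Rule:
    """Base class: a rule applies when its URL pattern matches."""

    def __init__(self, pattern: str):
        self.pattern = re.compile(pattern)

    def matches(self, request: Request) -> bool:
        return self.pattern.search(request.url) is not None

    def apply(self, request: Request) -> bool:
        """Apply the rule; return False to stop later rules from running."""
        raise NotImplementedError


class BlockRule(Rule):
    def apply(self, request: Request) -> bool:
        request.blocked = True
        return True


class ModifyHeaderRule(Rule):
    def __init__(self, pattern: str, name: str, value=None):
        super().__init__(pattern)
        self.name, self.value = name, value

    def apply(self, request: Request) -> bool:
        if self.value is None:
            request.headers.pop(self.name, None)  # None means "remove it"
        else:
            request.headers[self.name] = self.value
        return True


class BreakRule(Rule):
    def apply(self, request: Request) -> bool:
        return False  # prevent any later rule from running


def run_rules(rules, request: Request) -> list:
    """Run matching rules in order; return the names of the rules applied."""
    applied = []
    for rule in rules:
        if not rule.matches(request):
            continue
        applied.append(type(rule).__name__)
        if not rule.apply(request):
            break
    return applied
```

In this sketch the order of the rules matters: a break rule placed ahead of a block rule stops the block from ever being applied to a matching request.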

AdBlock Plus rule

Included is a version of AdBlock Plus that can potentially give support to Internet Explorer, as well as other applications that issue HTTP requests via the system proxy. Since it does not run inside a browser, only rules that are based on URL patterns are supported; rules that work by hiding HTML elements will not run.
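As a rough sketch of that distinction, a filter list can be split into URL-pattern rules and element-hiding rules before anything is loaded. It assumes the usual AdBlock Plus filter list conventions: comment lines start with `!`, the list header starts with `[`, and element-hiding rules contain `##` (or `#@#` for exceptions). The function name `split_filters` is hypothetical and this is not the application's own loader.

```python
def split_filters(lines):
    """Split filter list lines into (url_rules, element_hiding_rules)."""
    url_rules, element_hiding = [], []
    for line in lines:
        line = line.strip()
        # Skip blank lines, comments and the "[Adblock Plus x.y]" header.
        if not line or line.startswith("!") or line.startswith("["):
            continue
        if "##" in line or "#@#" in line:
            # Needs access to the page's HTML, so a proxy cannot apply it.
            element_hiding.append(line)
        else:
            url_rules.append(line)
    return url_rules, element_hiding
```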

I’m not advocating the use of an ad blocker, as most websites are funded by advertising revenue, resulting in much of what we read on the internet being ‘free’. This was built mainly as a technical exercise for me to see any false positives that are preventing my pages from rendering properly when running the real AdBlock extension. It is extremely useful in that role, so I thought other web developers would benefit too.

I also built this as a way to write my own basic regular expression implementation, and as such it may not perform as well as a version based on the optimised Regex classes within the .NET framework. I was curious to see how hard it would be to write a simple regular expression parser and matcher, and the opportunity presented itself quite nicely in the form of the AdBlock Plus rules engine.

The Firefox plugin, on which this is based, converts each rule to a regular expression, then runs it using the optimised engine within Firefox. I wanted to circumvent this step and see if I could directly parse and interpret each rule, as this would give me an insight into how a regular expression engine works. While I am very pleased with the results, it is still not up to the performance of the standard regular expression engine, and all the optimisations that have been added over the years. I may work on optimising my engine in the future, but only if time permits.
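To give a flavour of what interpreting a rule directly can look like, here is a small backtracking matcher written for illustration. It is not the engine from the application. It assumes only a subset of the AdBlock Plus pattern syntax: `*` as a wildcard, `^` as a separator placeholder (any character other than a letter, digit, `_`, `-`, `.` or `%`, or the end of the address), and a single `|` at the start or end as an anchor. The `||` domain anchor and `$` options are deliberately not handled, and matching is made case-insensitive by lowercasing both sides.

```python
SEPARATOR_SAFE = set("_-.%")


def _is_separator(ch: str) -> bool:
    return not (ch.isalnum() or ch in SEPARATOR_SAFE)


def _match_here(pattern, p, url, u, anchor_end):
    """Match pattern[p:] against url starting exactly at position u."""
    while p < len(pattern):
        c = pattern[p]
        if c == "*":
            # Wildcard: try every possible amount of skipped text.
            return any(
                _match_here(pattern, p + 1, url, k, anchor_end)
                for k in range(u, len(url) + 1)
            )
        if c == "^":
            if u == len(url):
                p += 1  # a separator also matches the end of the address
                continue
            if not _is_separator(url[u]):
                return False
        elif u >= len(url) or url[u] != c:
            return False
        p += 1
        u += 1
    return not anchor_end or u == len(url)


def matches(rule: str, url: str) -> bool:
    """Return True if the URL-pattern rule matches anywhere in the URL."""
    rule, url = rule.lower(), url.lower()
    anchor_start = rule.startswith("|")
    anchor_end = len(rule) > 1 and rule.endswith("|")
    body = rule[(1 if anchor_start else 0):len(rule) - (1 if anchor_end else 0)]
    starts = [0] if anchor_start else range(len(url) + 1)
    return any(_match_here(body, 0, url, s, anchor_end) for s in starts)
```

A naive matcher like this retries from every start position and backtracks on each `*`, which is one reason a hand-written interpreter can lag behind an optimised regular expression engine on large rule sets.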

Performance

To put it bluntly, this will not speed up your system; quite the opposite, in fact. On my Core i5 system the overhead is not great, but my Core 2 Duo laptop has a noticeable delay before requests are sent. Generally, the large sets of URL patterns used by the AdBlock Plus rule are the culprit, so if performance is an issue try disabling that first. Additionally, the HTTPS Everywhere rule will cause a significant slowdown on sites that are forced to use HTTPS, due to the additional overhead of using a secure connection.

Download

I have just added the application to GitHub as an open source project. You can download the installer here.


Baby name recommendations using the names of 170 million Facebook users

by on Jul.08, 2011, under Web Development

I had a little spare time a few weekends ago, and had an idea to write a recommendation engine for finding names that are related to each other. I got off to a great start with a fantastic data source: the list of 170 million Facebook users provided by a security researcher.

Starting with this, I wrote an engine that finds related names, using common surnames as a base. For each name, it finds all the related names, then ranks them using 4 separate methods. The main problem that I faced was that the most popular names are the most related names for everyone – most people seem to have a Michael in their family. Each ranking method tries to find the names that are more common in the related name set than in the global list of names, and which are hence the most related.
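One plausible way to express that idea, shown purely as a sketch and not as any of the four methods actually used, is to score each candidate by its share of the related set divided by its share of the global list. The function name `related_names` and the `(first_name, surname)` input format are assumptions made for the example.

```python
from collections import Counter


def related_names(people, target):
    """Rank first names that share a surname with `target`.

    `people` is an iterable of (first_name, surname) pairs. A name scores
    highly when it is more common among the target's surnames than it is
    in the population as a whole.
    """
    people = list(people)
    global_counts = Counter(first for first, _ in people)
    total = sum(global_counts.values())

    # Surnames that the target name appears with.
    surnames = {surname for first, surname in people if first == target}
    related = Counter(
        first for first, surname in people
        if surname in surnames and first != target
    )
    related_total = sum(related.values())

    scores = {}
    for name, count in related.items():
        share_related = count / related_total
        share_global = global_counts[name] / total
        scores[name] = share_related / share_global

    # Highest score first; ties broken alphabetically.
    return sorted(scores, key=lambda n: (-scores[n], n))


people = [
    ("Brad", "Rees"), ("Brett", "Rees"), ("Michael", "Rees"),
    ("Michael", "Smith"), ("Michael", "Jones"), ("Michael", "Brown"),
    ("Anna", "Smith"), ("Anna", "Jones"),
]
print(related_names(people, "Brad"))
```

In this made-up data Brett and Michael each share a surname with Brad once, but Michael is four times as common globally, so the ratio pushes Brett ahead of him.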

I found that the results were initially a little inconsistent; for example, the name Brad would be related to Brett, but the inverse wasn’t true. To counter this, I added a reverse lookup table and factored that into the ranking algorithm, which helped clean up the data and remove the odd combinations. This ensures that both names need to be related to each other, and removes names that are only related in one direction.
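The strictest form of that check can be sketched as a filter that keeps a pair only when each name lists the other. This is a simplification for illustration: the post describes factoring the reverse lookup into the ranking rather than using it as a hard filter, and the function name `mutual_only` is hypothetical.

```python
def mutual_only(related):
    """Keep only relations that hold in both directions.

    `related` maps each name to its ranked list of related names. The
    dictionary itself doubles as the reverse lookup table.
    """
    return {
        name: [other for other in others if name in related.get(other, ())]
        for name, others in related.items()
    }


related = {
    "Brad": ["Brett", "Michael"],
    "Brett": ["Brad"],
    "Michael": ["John"],
}
print(mutual_only(related))
```

Here Brad keeps Brett because Brett also lists Brad, while Michael is dropped from Brad's list because the relation only holds in one direction.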

The results were better than I had hoped for, with over 25,000 of the most common names being computed and stored. I can see there are clear trends amongst regions, religions, similar sounding names, and even real world items, such as colours, fruits and emotions. I hope someone finds this useful, especially for trying to find odd or obscure baby names.

http://www.namedrop.it

