Lemon Blog
Wednesday, 25 November 2009
Wednesday, 18 November 2009
Microsoft gets involved with HTML 5
As the web has evolved from a collection of “pages” to a collection of “applications”, RIA technologies like Flash have grown in prominence because the functionality and user experiences required to create increasingly sophisticated internet applications surpass what can be done with basic HTML. HTML 5 is being designed to change that and is expected to provide new capabilities, including:
- The native display of audio and video content through a standard interface.
- A “canvas” that supports 2D drawing on a web page.
- Drag-and-drop support.
- Support for running scripts in the background.
- Local data storage permitting applications to “work offline”.
- New form controls for common elements such as dates, times, emails and URLs.
Of course, to accomplish this, all of the browser makers will need to play along. Earlier this month, Microsoft signaled that it’s taking internet standards more seriously: a posting it made to the W3C mailing list indicated that the Internet Explorer team is reviewing the HTML 5 specification and would “share...feedback and discuss this in the working group”.
Whether Microsoft’s participation in the HTML 5 working group truly evidences a willingness to work for standards remains to be seen. Indeed, Microsoft’s posting noted that “At this stage we have more questions than answers”.
Right now, HTML 5 is years away and therein lies the problem. By the time the HTML 5 specification has been finalized, it’s almost certain that the market will have evolved even further.
Already, proprietary technologies are entrenched. Companies have made significant investments in RIAs like Flash and Silverlight and in the case of Flash in particular, penetration is so high as to make the technology ubiquitous.
Because of this, the question for consumers, developers and technology companies is whether the HTML 5 specification really matters. While its virtues are very appealing in theory, the slow speed at which the HTML 5 spec is being hammered out demonstrates that building a specification and doing it with broad-based consensus is a time-consuming process that really can’t keep up with the commercial needs of the web.
While we can hope for the best with HTML 5, the reality is that business will go on as usual and proprietary technologies will continue to be developed and adopted because the individuals and companies that use the internet can’t wait around.
Real-time on the web: addressing performance, scalability and availability - 1 of 4
Today's focus on real-time services is a reflection of the evolution of interactivity on the internet, but applications that are heavy on interaction present unique challenges for developers when it comes to performance, scalability and availability. While simple applications that primarily pull content from a database and display it to users can be made highly efficient using techniques such as caching, interactive applications that are designed to be used in real-time can be much more difficult to maintain and scale.
Twitter, arguably the purest example of a popular 'real-time' internet service, is the perfect example of that. It has been plagued by performance and downtime issues for some time now and it's not hard to see why: at any given moment, there are thousands upon thousands of Twitter users posting and pulling content, with constant polling to the web servers for updates from Twitter clients, the website and through the API. This means lots of database reads and writes and a mountain of HTTP traffic. It's a developer's worst nightmare: a steady flow of resource-intensive database writes coupled with an almost never-ending flurry of database reads.
When it comes to dealing with performance, scalability and availability for real-time web applications, developers now need to think about the following key issues (amongst others no doubt, but these are the ones I am focusing on) when designing applications:
- HTTP polling and providing a real-time experience for users will increase the number of requests to the server unless a connection can remain open. Traditional web servers don't provide a solution for this, and opening socket connections is not really a viable solution as firewalls will typically block this from within corporate networks.
- Database read and write performance and avoiding the locks that will ensue as a result of the high volume of writes to tables. Typical RDBMS databases are simply not ideal storage solutions when high volumes of read and write requests are required.
- CPU or IO intensive operations that need to be queued and processed separately to ensure the web servers remain responsive to "normal" requests.
- Autoscaling to support unexpected loads in a cost effective manner.
HTTP Push - Polling, Streaming and Sockets
The reasons web servers struggle with real-time lie primarily with the ageing HTTP protocol, a legacy we're unfortunately going to be stuck with for some time. Fortunately, Google and others are looking at solutions. Google has recently published a proposal for a new protocol called SPDY, which seems to address some of HTTP's biggest shortcomings for today's web applications. With HTTP as it is commonly used, there is no lasting conversation between browser and server (every interaction requires a new request, new request headers, new response headers, authentication etc.) and most of the communication is typically uncompressed. SPDY sets about addressing these two key issues, with the persistent connection being the one relevant to HTTP polling.
Currently, when a user visits a web page which provides a real-time experience, what is typically happening is that the web browser is in fact polling the web server every second or so to say "is there an update?". These requests are pretty lightweight, but immediately present a massive problem when thousands of visitors use the website at the same time. Assuming you poll the server every 2 seconds, and you have 1,000 visitors on your site at any one time, the server(s) would receive approximately 500 requests per second asking "is there any update?"
Whilst 500 requests per second is a digestible amount, increasing the number of visitors to 50,000 during a peak period would mean 25,000 requests per second, which would quickly bring down any small web farm. The issue again lies with the fact that HTTP does not allow data to be pushed back to the browser, so the browser has no option but to keep polling and overloading the server with unnecessary requests.
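The arithmetic above generalises: the request rate is roughly the number of concurrent visitors divided by the polling interval. Here is a minimal sketch of that calculation; the function name `pollingRequestsPerSecond` is made up for illustration and it assumes every visitor polls at the same fixed interval.

```php
<?php
// Hypothetical helper (not from any library): estimates the load that
// fixed-interval polling puts on the server(s).
function pollingRequestsPerSecond($visitors, $intervalSeconds)
{
    if ($intervalSeconds <= 0) {
        throw new InvalidArgumentException('Polling interval must be positive.');
    }
    // Each visitor sends one request per interval.
    return $visitors / $intervalSeconds;
}

echo pollingRequestsPerSecond(1000, 2), "\n";   // 500
echo pollingRequestsPerSecond(50000, 2), "\n";  // 25000
```

Doubling the interval halves the load, but it also doubles the worst-case delay before a visitor sees an update, which is the trade-off that makes plain polling a poor fit for real-time.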
A number of approaches have been taken to solve this problem, with Google's recent suggestion being the most sensible way of fixing this without "hacking" the HTTP protocol. However, SPDY is a long way away and is not something we can rely on. The common ways to work around the HTTP issues are as follows:
- Long Polling allows the browser to open a connection to a web server and keep the connection open for an extended period of time, waiting for data to be sent to the browser. As soon as data is sent, a new Long Polling request is opened to the server, waiting for the next event to be sent from the server.
Tornado is a web server built specifically to provide this type of long polling functionality; it was built by FriendFeed and is now released as an open source server. For those of you using Nginx, you can configure Nginx as a Comet server using this beta plugin. The plugin allows your standard web application to pull and push content, and lets the plugin do all the hard work of distributing data to the clients.
- Streaming allows the browser to open the connection to the web server and keep it open for as long as the user is on the website. This solution has numerous problems around browser support and the inability to detect the state of the connection. Whilst this is an option using the iFrame or XMLHttpRequest method, we do not recommend this approach.
- Socket Connections are achieved through the use of a plugin such as the common Adobe Flash. Flash has complete support for raw socket connections, providing a facility for your application to open a bi-directional asynchronous connection to a server; however, this is not done over HTTP. Because this is a raw socket connection, users behind strict corporate firewalls will often not be able to connect, which means a socket connection is probably only viable for consumers using the application at home. Server solutions commonly used by Flash based game developers include ElectroServer and SmartFoxServer (which is based on Red5 which is the open source alternative to Adobe Flash Media Server).
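To make the long polling idea concrete, here is a rough sketch of what the server does for a single long-poll request: hold the request until there is something to send or a timeout passes. The function name `waitForUpdate` and the `$checkForUpdate` callable are hypothetical, and the closure in the usage part assumes PHP 5.3 or later. Bear in mind that holding requests open like this in a typical PHP setup keeps a worker busy for each waiting visitor, which is part of why purpose-built servers such as Tornado exist.

```php
<?php
// Sketch of the server side of one long-poll request. $checkForUpdate is a
// hypothetical callable supplied by your application; it returns the new
// data, or null when there is nothing to send yet.
function waitForUpdate($checkForUpdate, $timeoutSeconds = 30, $sleepMicroseconds = 250000)
{
    $deadline = microtime(true) + $timeoutSeconds;
    do {
        $update = call_user_func($checkForUpdate);
        if ($update !== null) {
            return $update;          // respond at once; the browser then re-polls
        }
        usleep($sleepMicroseconds);  // nothing yet: wait a little and check again
    } while (microtime(true) < $deadline);

    return null;                     // timed out: empty response, browser reconnects
}

// Usage: a stand-in check that only has data on its third call.
$calls = 0;
$check = function () use (&$calls) {
    $calls++;
    return $calls >= 3 ? array('message' => 'hello') : null;
};
$result = waitForUpdate($check, 5, 1000);
echo $result['message'], "\n"; // hello
```

Compared with polling every 2 seconds, the browser makes one request per event (or per timeout) rather than one request per interval.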
Concurrency
Once you've figured out your solution to support push from the server to the browser, your next challenge may very well be how to give your website visitors a truly real-time experience in which they interact with other visitors. A great example of this is Google Spreadsheets, which allows you to work with your Google Doc in real-time, updating the document and receiving updates as they happen (e.g. if you update a formula, you see other cells update once Google pushes the changes back to you), and also to chat to other users editing your document at the same time. Providing an application which allows this type of message queuing and dispatching between users can be extremely difficult and is typically addressed with languages that are more adept at handling concurrency.
Erlang, a language developed by Ericsson way back when to help them with virtually unlimited scale and concurrency, is designed from the ground up to deal with concurrency. There are no shared variables, so there are none of the locking issues that are the typical hell developers deal with when trying to write applications that handle concurrency gracefully. Erlang avoids locking issues through message passing: each process simply passes messages to other processes. With this message passing and queuing system built into the language, applications can scale by adding more servers capable of receiving and dispatching messages, and the locking issues described above do not materialise.
However, saying all this, Erlang is not ideal as a web server, and typically Erlang based solutions use a proxy server to handle HTTP requests and push HTTP responses back to the browser. See Alexey's post on how he is trying to build a system which can cope with a million long poll requests using Erlang and Nginx. You'll also need to learn a new language and syntax so have fun ;)
Facebook use Erlang to drive their chat along with a Comet solution; if it works for them I'm sure it will work for you.
Scala is another functional language which is also well suited to concurrent programming, and has been employed by Twitter to help them deal with their concurrency issues at scale.
Summary
Whilst there are numerous ways to address the inadequacies of the HTTP protocol and the inevitable and complex concurrency issues with most programming languages, building true real-time solutions is not easy and it's not likely to be easy for some time to come. New methods to address all of these issues are constantly being discovered and suggested, and even Google, who are clearly fed up with the restrictions imposed upon them by the protocol, are trying to find an alternative solution. Fortunately, if anyone is driven to make it work, they are, with their entire business relying on an increasingly usable web experience.
Further reading
- JQuery polling plugin
- ReverseHTTP solution which provides a "push-like" solution for web servers
- Web Socket API proposal as part of HTML5 which will solve all our problems (well not quite)
- Comet programming - a good description of what it is and how it provides real-time experiences for users
Tuesday, 8 September 2009
Building Twitter Apps with PHP and the Twitter API
Twitter's popularity can be attributed to a number of factors. One of those factors is Twitter's open API. From desktop clients to web-based management tools for brands, Twitter's open API gives every developer the ability to develop cool and useful applications that enhance the Twitter user experience and extend Twitter's utility.
One of the nicest things about Twitter's API is that, like Twitter itself, it's pretty darn simple. In this post, we'll discuss how you can get started developing for Twitter using PHP and a handy PHP class.
Requirements
Before you begin developing, you'll need a few things:
- A Twitter account.
- A development server.
- Knowledge of a programming language.
Given Twitter's popularity, it's not surprising that there are many client libraries available that eliminate the need for developers to reinvent the wheel. From ActionScript to Ruby, The Twitter API Wiki lists libraries for a variety of programming languages.
For the purposes of this post, we'll be working with PHP and the easy-to-use PHP Twitter class. To use this class, your server will need PHP 5.2 and the lib_curl PHP module installed.
Getting Started
Once you've downloaded the PHP Twitter class, upload it to a directory on your server. Since I like to place classes in their own directory, we'll assume that the PHP Twitter PHP file has been uploaded to a classes folder in your HTTP root.
When creating a PHP script that interacts with Twitter, the first thing you'll need to do is create an instance of the PHP Twitter class. Here's the code for that:
require_once('classes/class.twitter.php');
$t = new Twitter;
$t->username = 'twitterusername';
$t->password = 'twitterpassword';
Now that you're officially instantiated, the Twitter API is your oyster. Let's look at some of the things you'll probably want to do.
Retrieving Tweets
The following code will retrieve and echo your own tweets:
$tweets = $t->userTimeline();
foreach ($tweets as $tweet) {
    echo $tweet->text . "<br />";
}
Sample output:
Working with the Twitter API!
This is amazing!
Not all that interested in yourself? Find out what your friends are up to by retrieving your friends' tweets:
$tweets = $t->friendsTimeline();
foreach ($tweets as $tweet) {
    echo $tweet->user->screen_name . " tweeted: " . $tweet->text . "<br />";
}
Sample output:
someuser tweeted: Going to the beach!
anotheruser tweeted: Working late (again).
Each of the methods used above (userTimeline and friendsTimeline) allows you to pass in a number of parameters. Be sure to check out the Twitter API documentation for information on these.
Sending Tweets
Listening is important on Twitter but tweeting is probably more fun. Using the Twitter API, it's easy to send tweets:
$tweet = $t->update("If a tre falls in the forest and...");
The above code will post a tweet stating "If a tre falls in the forest and..." to the timeline of the authenticated user.
Of course, there's a typo in the above tweet. Fortunately, there's a method for deleting tweets. So let's go ahead and delete our tweet before anybody sees our typo. First, we have to know the ID of the tweet we just sent. The update method returns the newly posted tweet, including its ID, so we can retrieve the ID of our typo tweet with the following:
$tweet_id = $tweet->id;
Now that we have the ID of the tweet to delete, we can get rid of it using the destroy method, which is called as follows with Twitter PHP:
$tweet_delete = $t->deleteStatus($tweet_id);
Putting It All Together
To put it all together, let's write some simple code that will take the most recent tweet from the authenticated user's friends' timeline and retweet it.
$tweets = $t->friendsTimeline('', '', '', '1');
foreach ($tweets as $tweet) {
    $retweet_string = "RT @" . $tweet->user->screen_name . " " . $tweet->text;
    $retweet = $t->update($retweet_string);
    break; // only the most recent tweet
}
Obviously, it's always good to do some error checking. Here, for instance, you may want to know beforehand whether your tweet is going to exceed 140 characters and deal with it if it is.
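As a rough illustration of that kind of check, the sketch below counts the characters in a tweet and shortens it when it is too long. The helper names `tweetLength` and `fitTweet` are invented for this example and are not part of the PHP Twitter class; the code assumes the text is UTF-8 and makes no claim about how Twitter itself counts characters.

```php
<?php
// Hypothetical helpers, not part of the PHP Twitter class.

// Counts characters rather than bytes, assuming the text is valid UTF-8.
function tweetLength($text)
{
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    // preg_split() returns false on invalid UTF-8; fall back to the byte count.
    return $chars === false ? strlen($text) : count($chars);
}

// Shortens the text to fit the limit, ending with "..." when it was cut.
function fitTweet($text, $limit = 140)
{
    if (tweetLength($text) <= $limit) {
        return $text;
    }
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return substr($text, 0, $limit - 3) . '...';
    }
    return implode('', array_slice($chars, 0, $limit - 3)) . '...';
}

$retweet_string = fitTweet("RT @someuser " . str_repeat('a', 200));
echo tweetLength($retweet_string), "\n"; // 140
```

Truncating is only one way to deal with an over-long retweet; you could just as well skip it or ask the user to edit it.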
Taking It to the Next Level
As you can see from these simple examples, it's quite easy to retrieve and send tweets with Twitter's API using the PHP Twitter class. Tasks such as sending direct messages, retrieving information about users, pulling followers and accessing Twitter's search functionality are equally easy to perform using other API methods Twitter provides. These generally work in the same fashion as the methods discussed here.
For a complete list of the API methods Twitter offers and more detailed information about the API, be sure to check out the Twitter API Wiki. You'll find that the most important methods are accessible through the PHP Twitter class.
By combining these straightforward methods in creative ways, building fun and useful Twitter applications can be done in very short order.
Labels: api, php, technical, twitter, twitterapi
Monday, 6 July 2009
History of Graphic Design
Sometimes Twitter can be really inspirational. I've just read a tweet about this fantastic website - a visual history of Graphic Design. So I thought I'd share it on this blog. Check it out: http://www.designhistory.org/
It is missing a few things but I think the idea behind it is just great!
Wednesday, 24 June 2009
Please support fixoutlook.org
Microsoft has indicated they would listen to feedback. So please help the cause by joining the rally and tweeting your misgivings (if any).
Remember to link to fixoutlook.org in your tweet.
Wednesday, 17 June 2009
Zend Framework and AMF
Serving AMF data from ZF is plain easy, and it's also very simple to integrate into your existing website. As ZF is a component based framework you could simply pick out the Zend_Amf and related classes and use them. But obviously the integration is better if you have a full ZF stack.
For renault.tv we had set up an AMF API. Since we wanted to serve both HTML and AMF from the same backend we opted to set up the AMF server as a controller under ApiController.php:
class ApiController extends Zend_Controller_Action
{
    public function preDispatch()
    {
        // AMF responses are binary, so skip the layout and view rendering
        $this->_helper->layout()->disableLayout();
        $this->_helper->viewRenderer->setNoRender(true);
    }

    public function indexAction()
    {
        // instantiate server
        $server = new Zend_Amf_Server();
        // production mode suppresses debug messages; false here for development, set to true on a live site
        $server->setProduction(false);
        // handle request
        $response = $server->handle();
        echo($response);
    }
}
We started with Matthew Weier O'Phinney's pastebin application which we found to be an excellent starting point for the structure of our bootstrap and initialization code.
During development we quickly found the AMF response times to be slow, especially as the data size returned grew larger. The following .htaccess conditions helped improve performance drastically:
# Set some default PHP values
php_flag zlib.output_compression 1
php_value zlib.output_compression_level 2
# Gzip CSS, JS and AMF
AddOutputFilterByType DEFLATE text/css application/x-javascript application/x-amf
Here are a few best practices we picked up during development:
- All API functions receive an object and pass back an object - this proved to be a very extensible approach and seemed to work well with Flash.
- Unit test all code. We set up unit tests for both PHP and Flash. As you may soon realise, debugging AMF is very hazardous. To help ease the pain we use jQuery's QUnit test suite to mirror AMF calls issued from Flash. More on this in a future post.
- Use an HTTP proxy to inspect your AMF output. Charles is highly recommended if you're on a Mac.
- Make sure no trailing spaces are left in your code output - editors tend to do this a lot. A good way to avoid this is to not close your PHP tags.
- We found associative arrays very problematic as the keys get lost in translation. So try to avoid them. Passing back objects is a good way to avoid this issue.
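The "object in, object out" practice can be sketched in plain PHP as below. All of the names (VideoRequest, VideoResponse, VideoService) are hypothetical and are not taken from the renault.tv code. The service method deliberately does not type-hint its argument, on the assumption that without a class mapping the incoming AMF object may arrive as a generic object.

```php
<?php
// All names here (VideoRequest, VideoResponse, VideoService) are hypothetical.

// Request object: public properties instead of associative array keys.
class VideoRequest
{
    public $videoId = 0;
}

// Response object handed back to the caller.
class VideoResponse
{
    public $videoId = 0;
    public $title = '';
    public $success = false;
}

// A service class of the kind that could be registered with
// $server->setClass('VideoService') before $server->handle() is called.
class VideoService
{
    public function getVideo($request)
    {
        $response = new VideoResponse();
        $response->videoId = (int) $request->videoId;
        $response->title   = 'Video #' . $response->videoId; // stand-in for a real lookup
        $response->success = true;
        return $response;
    }
}

// Exercising the service directly, without Flash or AMF involved.
$request = new VideoRequest();
$request->videoId = 42;
$service = new VideoService();
$response = $service->getVideo($request);
echo $response->title, "\n"; // Video #42
```

Because both sides exchange whole objects, a new field can be added to the request or the response without changing any method signature, which is where the extensibility comes from. The same service class can also be called from a unit test exactly as in the last few lines.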
Go crazy!
Labels: amf, php, technical, zendframework
