Amazon S3 as a Content Delivery Network

Amazon S3 seems well placed to act as a content delivery network, competing with the likes of Akamai, as long as you don't mind getting your hands dirty.

S3 will serve files under your own domain name when that name is set up as a CNAME pointing at a bucket of the same name, but you have to manually synchronize the files it hosts.

On the cost side, other providers such as SimpleCDN boast one-off fees per file hosted, with no additional bandwidth or storage charges. Amazon S3 comes a close second with its low fees, but it still doesn't support some features we need.

Automatic Sync

The big feature S3 lacks is automatic syncing of image files from your website to S3, at least in a fully transparent, web-hosting-friendly way.

Tools such as S3Sync exist to find modified files and upload them to S3. However, these usually require more access than the majority of web hosts provide, and aren't available on Windows or <your J2EE container of choice>.

SimpleCDN provide 'AutoCDN': prefix your image URLs with http://dl<your-account-id>.simplecdn.com/autocdn/<your-autocdn-folder>/ and their servers will cache your images and serve them via their distributed CDN without any further intervention.

To add insult to injury, Amazon S3 doesn't support FTP upload. This capability could be added relatively easily with third-party software (it's a project I'm considering), but Amazon seem to be targeting developers far more than anyone else, which narrows the pool of people who might want to adopt it.
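
As a rough illustration of the "find modified files" half of such a tool, here is a minimal sketch in PHP. It is not how S3Sync itself works; the function name findModifiedFiles and the idea of comparing against a stored timestamp are assumptions made for the example, and the upload step is left out entirely.

```php
<?php
// Hypothetical helper (not part of S3Sync): list the files under $dir whose
// modification time is later than $since, a Unix timestamp.
function findModifiedFiles($dir, $since)
{
    $modified = array();
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($iterator as $file) {
        if ($file->isFile() && $file->getMTime() > $since) {
            $modified[] = $file->getPathname();
        }
    }
    sort($modified);
    return $modified;
}

// Example: files changed in the last 24 hours under the current directory.
foreach (findModifiedFiles('.', time() - 86400) as $path) {
    echo $path, "\n"; // the upload to S3 would happen here
}
```

Even this much needs filesystem access and somewhere to run on a schedule, which is exactly what many shared hosts don't offer.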

Pricing

The figures below are taken directly from SimpleCDN's pricing page, for a single 85 KiB image serving 50,000 hits a day for a year.

CDN         Bandwidth   Storage   Upload BW   Service Fees   Total       Compared
SimpleCDN   $0.00       $0.00     $0.00       $0.10          $0.10       -
Amazon S3   $192.32     $0.00     $0.00       $0.00          $192.32     + $192.22
Nirvanix    $266.29     $0.00     $0.00       $0.00          $266.29     + $266.19
Limelight   $547.37     $0.00     $0.00       $0.00          $547.37     + $547.27
Akamai      $1,449.80   $0.00     $0.00       $0.00          $1,449.80   + $1,449.70
CacheFly    $269.85     $0.49     $0.00       $3,588.00      $3,958.33   + $3,958.23

Although this comes from the provider's own data and is presumably at least a little biased, their pricing structure for small or even large images really blows the competition away. (Amazon have since reduced the cost of outbound bandwidth to $0.10 per GiB, reducing the S3 total to $148.)
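
The bandwidth figures are easy to check by hand. The sketch below redoes the arithmetic for the 85 KiB image at 50,000 hits a day; the $0.10 rate is the one quoted above, while $0.13 per GiB is only my inference from the table's $192.32 figure, not a quoted price. The function names are made up for the example.

```php
<?php
// Worked version of the bandwidth arithmetic for the example image.
// $0.10 per GiB is quoted in the post; $0.13 is inferred from the table
// and is an assumption.

/** Yearly transfer in GiB for one file of $sizeKiB served $hitsPerDay times a day. */
function yearlyTransferGiB($sizeKiB, $hitsPerDay, $days = 365)
{
    return ($sizeKiB * $hitsPerDay * $days) / (1024 * 1024);
}

/** Bandwidth cost in dollars, rounded to cents. */
function bandwidthCost($gib, $ratePerGiB)
{
    return round($gib * $ratePerGiB, 2);
}

$gib = yearlyTransferGiB(85, 50000);

printf("Transfer: %.2f GiB per year\n", $gib);                     // 1479.39
printf("At \$0.13 per GiB: \$%.2f\n", bandwidthCost($gib, 0.13));  // 192.32
printf("At \$0.10 per GiB: \$%.2f\n", bandwidthCost($gib, 0.10));  // 147.94
```

So roughly 1,479 GiB a year leaves the bucket, which is where both the table's S3 figure and the reduced total of about $148 come from.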

Disadvantages

Personally I wouldn't consider Amazon ready for mainstream content distribution, especially if you have a large customer base in Australia or Asia generally.

  • It's not transparent to developers
  • No Asia or Australia coverage
  • No video encoding services

With the exception of additional data centers, which are largely in Amazon's hands, you can implement the others yourself, but it requires developer time and additional costs (your own servers or Amazon EC2) for video conversion and/or hosting.


Gcc and libstdc++ compatibility

I've recently been dealing with a C++ library to interface with our stock market data provider for real-time information. The wire protocol is open and documented, but this library is closed source and unmaintained, with the only alternative being to read the hard-to-understand protocol docs and experiment a lot.

We needed to be able to run and test the application locally, which resulted in my colleague building Gcc 3.x from scratch so that we could compile our app and link against the library.

Obviously the real solution here is to provide the source code for the library, or to provide a C library (with an inherently more stable ABI), but that's not going to happen any time soon, probably for political reasons.

Once you have the right version of gcc or a backdated version of libstdc++ to link against there are two roads you can go down. One is to link both versions of libstdc++, version 5 for the library and version 6 for your application. The other is to compile your application against the same version of libstdc++ as the library.

Linking both versions

Not only does Gcc 4.1 use a different version of libstdc++ by default, it also uses a different C++ ABI version, introducing even more incompatibilities. The symptom is that any symbols defined in the old library you're linking against come up as undefined.

The way to solve this is to force Gcc to output symbols with the old ABI naming style, which can be done with:

-fabi-version=1

Now the C++ symbols in libstdc++ v5 which are referenced by the old library will not be found, so you have to link libstdc++.so.5 in manually, usually by passing it directly on the link line:

$(LINK) -o $@ $(LDFLAGS) /usr/lib/libstdc++.so.5 $+

This introduces yet another problem: objects from either version of libstdc++ leaking into the other, which will result in your program crashing and burning. Luckily, in our case the library was self-contained and only took parameters as regular C types or references to objects defined in its own library. I didn't manage to do much testing on this though, so it may still have crashed at some point, and I recommend using the alternative approach.

Linking correctly

I'm sure there's a way to make Gcc 4.1 link against the standard library provided by Gcc 3.2, but I haven't been able to work it out so far. So I installed the backported Gcc 3.2 package provided by Ubuntu - which is probably cheating a bit and wouldn't be suitable if you had code which depended on Gcc 4.x features.

So that solves the linking issues. To go one step further and make the executable run on more systems without having to resolve stupid dependency problems, I decided to statically link libstdc++, meaning all remaining unresolved external symbols are in libc, libm etc., which use the C ABI.

Johan Petersson provides a good reference on how this can be done, which essentially comes down to specifying -static-libgcc and linking libstdc++.a directly. More information can be found in his article on linking libstdc++ statically.

After linking, the dependencies look like:

        linux-gate.so.1 =>  (0xffffe000)
        libmysqlclient_r.so.15 => /usr/lib/libmysqlclient_r.so.15 (0xb7dbc000)
        libz.so.1 => /usr/lib/libz.so.1 (0xb7da7000)
        libm.so.6 => /lib/tls/i686/cmov/libm.so.6 (0xb7d81000)
        libc.so.6 => /lib/tls/i686/cmov/libc.so.6 (0xb7c37000)
        libpthread.so.0 => /lib/tls/i686/cmov/libpthread.so.0 (0xb7c1f000)
        libcrypt.so.1 => /lib/tls/i686/cmov/libcrypt.so.1 (0xb7bf1000)
        libnsl.so.1 => /lib/tls/i686/cmov/libnsl.so.1 (0xb7bd9000)
        /lib/ld-linux.so.2 (0xb7fbb000)

Which should be compatible with all current, previous & future versions of LSB where MySQL is provided.


Abusing Type Hinting in PHP

PHP can type hint function arguments so that they only accept a given class or interface, or derivatives of it. This can lead to better interface design and much less runtime checking (as calling a type hinted function implicitly requires the parameter to be of the correct type).

Can we go one step further though, bolting on strict typing at the function level while keeping the benefits of weak typing within?

We do this by defining classes for all of the basic types (String, Number, Resource, Boolean) so they can be type hinted. For example:

class String {
  public $value;
  public function __construct( $value ) {
    $this->value = $value;
  }
}

And then repeat for the other basic types. What we end up with is functions which look like:

function myFunc( String $var1, Boolean $var2 ) {
  $var1 = $var1->value;
  $var2 = $var2->value;
  // And so on
}

The memory overhead of this approach might not be justifiable, considering that objects take up much more memory than plain strings or booleans alone.

For strings, as an example, this paves the way for String base classes which provide a consistent interface for string operations with uniform names à la Java or C#, or any other fully object-oriented language (even JavaScript). Looking at how naming fluctuates for string functions in PHP's core, this might be a distinct advantage.

See the differences:

  • str_ireplace
  • nl2br
  • htmlspecialchars
  • chunk_split
  • quoted_printable_decode
  • stripcslashes
  • strip_tags

And so on...

Advantages

Using strict typing would allow for better error handling (the function cannot be called unless you give it the correct variable type) rather than failing horribly in the middle of a function because a resource was given instead of a string or number.
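
To make that concrete, here is a minimal sketch of the failure mode. It assumes PHP 7 or later, where a type mismatch throws a TypeError (on PHP 5 it is a catchable fatal error instead). The wrappers are named StringValue and BooleanValue purely for this example, because PHP 7 and later reserve string as a class name.

```php
<?php
// Hypothetical wrapper names for this sketch; "String" cannot be used as a
// class name on PHP 7+.
class StringValue
{
    public $value;
    public function __construct($value) { $this->value = $value; }
}

class BooleanValue
{
    public $value;
    public function __construct($value) { $this->value = $value; }
}

function myFunc(StringValue $var1, BooleanValue $var2)
{
    // Inside the function we are back to plain, weakly typed values
    return $var2->value ? strtoupper($var1->value) : $var1->value;
}

// Correct wrappers: the call goes through.
echo myFunc(new StringValue('hello'), new BooleanValue(true)), "\n";

// Wrong argument type: rejected at the call, before the body runs.
try {
    myFunc('hello', new BooleanValue(true));
} catch (TypeError $e) {
    echo "Rejected: wrong argument type\n";
}
```

The error surfaces at the call site rather than somewhere in the middle of the function body.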

Disadvantages

You have to wrap your values with these special classes before you can call the function - this requires code modification and will not work with existing code.

$var1 = new String("Hello World");
$var2 = new Boolean(FALSE);
myFunc($var1, $var2);

The lack of operator overloading will also hinder its usefulness: the only type conversion hook which can be used is `__toString`, so the following code will not work (which is obviously a major hindrance).

$var1 = new Number(10);
$var2 = new Number(3);
$var3 = $var1 / $var2; // Error!
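
By contrast, string contexts do work once `__toString` is defined, and arithmetic can only be done by unwrapping the values by hand. A small sketch, with a wrapper called NumberValue that is just a name picked for the example:

```php
<?php
// Hypothetical wrapper with the one conversion hook PHP offers.
class NumberValue
{
    public $value;
    public function __construct($value) { $this->value = $value; }
    public function __toString() { return (string) $this->value; }
}

$var1 = new NumberValue(10);
$var2 = new NumberValue(3);

// String contexts call __toString automatically.
echo "Total: " . $var1 . "\n";

// Arithmetic has no such hook, so the values must be unwrapped and re-wrapped.
$var3 = new NumberValue($var1->value / $var2->value);
echo round($var3->value, 2), "\n";
```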

Alternatives

I can pretty much presume that this is not the best solution and will probably introduce more problems than it solves. One possible alternative is to use input contracts via asserts (which should be done anyway) to ensure that data types are valid before any processing is done.

The previous function `myFunc` would translate into:

function myFunc( $var1, $var2 ) {
  assert( is_string($var1) );
  assert( is_bool($var2) );
  // Function logic here
}

It's not as declarative as the method described above, but it leads to better checks & hopefully better code quality in the long run.
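
One caveat is that assert() can be switched off through PHP's configuration, at which point the checks silently disappear. If the contract has to hold in production too, a sketch along these lines throws an exception instead. The helper name requireType and its small type table are invented for the example.

```php
<?php
// Hypothetical helper: explicit checks that still run when assertions are off.
function requireType($value, $type, $name)
{
    $checks = array(
        'string' => 'is_string',
        'bool'   => 'is_bool',
        'int'    => 'is_int',
    );
    if (!isset($checks[$type])) {
        throw new InvalidArgumentException("Unknown type '$type'");
    }
    if (!call_user_func($checks[$type], $value)) {
        throw new InvalidArgumentException(
            "$name must be of type $type, " . gettype($value) . " given"
        );
    }
}

function myFunc($var1, $var2)
{
    requireType($var1, 'string', '$var1');
    requireType($var2, 'bool', '$var2');
    // Function logic here
    return $var2 ? strtoupper($var1) : $var1;
}

try {
    myFunc(123, true);
} catch (InvalidArgumentException $e) {
    echo $e->getMessage(), "\n";
}
```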


Manipulating PHP arrays with SQL

With toolkits like LINQ for .NET and the subsequent PHPLinq (thanks Nick), we're always looking for more power to manipulate data in memory, rather than writing one-off algorithms for whatever commonly used sorting, ordering & manipulation you need.

Trent Richardson created a very small and simple JsonSQL library for JavaScript which allows you to run an extremely limited subset of SQL against a JSON array/object.

I quickly ported it over to PHP 5 and it works like a charm. The syntax for the WHERE clause isn't exactly the same, but the rest ported across properly.

Be warned, this is in no way performance conscious, as eval() gets called in a loop to do the filtering based on your WHERE conditions. For the same reason, never build the WHERE clause from untrusted input: it is executed as PHP code.

Download PhpSQL source code

Using PhpSQL

$testdata = array(
	array('username' => 'hello123', 'id' => 1),
	
	array('username' => 'harry', 'id' => 5),
	array('username' => 'test', 'id' => 6),
	array('username' => 'blah', 'id' => 7),
	array('username' => 'whatever', 'id' => 8),
	
	array('username' => 'hello123', 'id' => 20),
);


// Return the entries with id > 5 in reverse order: 20, 8, 7, 6
print_r( PhpSql::query( 'SELECT * FROM data WHERE ($id > 5) ORDER BY id DESC LIMIT 5', $testdata ) );

// Return only array( array('username' => 'blah') )
print_r( PhpSql::query( 'SELECT username FROM data WHERE ( $id == 7 )', $testdata) );

The 'FROM' clause has no effect and isn't interpreted at the moment, however in future something interesting could probably be done with it.
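
For comparison, the first query can be written with PHP's built-in array functions and no eval() at all. This is only a sketch of the equivalent steps (it needs PHP 5.3 or later for closures), not a replacement for the class:

```php
<?php
$testdata = array(
    array('username' => 'hello123', 'id' => 1),
    array('username' => 'harry', 'id' => 5),
    array('username' => 'test', 'id' => 6),
    array('username' => 'blah', 'id' => 7),
    array('username' => 'whatever', 'id' => 8),
    array('username' => 'hello123', 'id' => 20),
);

// WHERE ($id > 5)
$rows = array_filter($testdata, function ($row) {
    return $row['id'] > 5;
});

// ORDER BY id DESC
usort($rows, function ($a, $b) {
    if ($a['id'] == $b['id']) {
        return 0;
    }
    return $a['id'] < $b['id'] ? 1 : -1;
});

// LIMIT 5
$rows = array_slice($rows, 0, 5);

// Print just the ids to keep the output short
$ids = array();
foreach ($rows as $row) {
    $ids[] = $row['id'];
}
echo implode(', ', $ids), "\n"; // 20, 8, 7, 6
```

It is more verbose than the one-line query, which is really the trade-off PhpSql is making.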

Source Code

/*
 * JsonSQL
 * By: Trent Richardson [http://trentrichardson.com]
 * Version 0.1
 * Last Modified: 1/1/2008
 * 
 * Copyright 2008 Trent Richardson
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

/**
 * PHP port of JsonSQL
 * 
 * @author Harry Roberts
 */
class PhpSql
{
	protected $fields;
	protected $from;
	protected $where = 'true';
	protected $orderby = array();
	protected $order = 'asc';
	protected $limit = array();
	protected $result = array();
	public $data = array();
	
	public static function query( $sql, array $data )
	{
		$sql = new self( $sql, $data );
		return $sql->result;
	}
	
	public function __construct( $sql, array $data )
	{
		assert( ! empty($sql) );
		
		// The longer keywords are listed first so that "ascnum" is not cut short to "asc"
		$rs = preg_match('/^(select)\s+([a-z0-9_\,\.\s\*]+)\s+from\s+([a-z0-9_\.]+)(?: where\s+\((.+)\))?\s*(?:order\sby\s+([a-z0-9_\,]+))?\s*(ascnum|descnum|asc|desc)?\s*(?:limit\s+([0-9_\,]+))?/i', $sql, $returnfields);
		if( $rs == FALSE )
		{
			throw new Exception( "Unable to match SQL statement" );
		}
		
		// Optional groups which did not match are either missing or empty strings
		$returnfields = array_pad( $returnfields, 8, '' );
		
		$this->fields = explode(',',str_replace(' ','',$returnfields[2]));
		$this->from = str_replace(' ', '', $returnfields[3]);
		$this->where = ( $returnfields[4] === '' ) ? "true" : $returnfields[4];
		$this->orderby = ( $returnfields[5] === '' ) ? array() : explode(',',str_replace(' ', '', $returnfields[5]));
		$this->order = ( $returnfields[6] === '' ) ? 'asc' : $returnfields[6];
		$this->limit = ( $returnfields[7] === '' ) ? array() : explode(',',str_replace(' ', '', $returnfields[7]));
		
		$this->result = array();
		$this->data = $data; 
		
		$this->returnFilter( );
		$this->returnOrderBy( );
		$this->returnLimit( );
	}
	
	protected function returnFilter( )
	{
		if( empty($this->where) )
		{
			$this->where = 'true';
		}
				
		foreach( $this->data AS $__ROWKEY => $__ROWDATA )
		{
			// Expose each column of the row as a local variable ($id, $username, ...)
			extract( $__ROWDATA, EXTR_OVERWRITE );
			
			// Ewww - horrible port directly from JavaScript!
			// The WHERE clause runs as PHP code, so never pass untrusted input
			eval( '$__ROWSTATUS = (' . $this->where . ');' );
			if( $__ROWSTATUS )
			{
				$this->result[] = $this->returnFields($__ROWDATA);
			}
		}
	}
	
	protected function returnFields( array $scope )
	{
		if( ! count($this->fields) OR $this->fields[0] == '*' )
		{
			return $scope;
		}
		
		$returnobj = array();
		
		foreach( $this->fields AS $field_name )
		{
			$returnobj[$field_name] = $scope[$field_name];
		}
		
		return $returnobj;
	}
	
	protected function sortCallback( $a, $b )
	{
		// Only the first ORDER BY field is used
		$key = $this->orderby[0];
		
		switch( strtolower($this->order) )
		{
		case 'desc':
			if( $a[$key] == $b[$key] ) return 0;
			return $a[$key] < $b[$key] ? 1 : -1;
			
		case 'descnum':
			// Numeric descending: larger values first
			$diff = $b[$key] - $a[$key];
			return $diff < 0 ? -1 : ( $diff > 0 ? 1 : 0 );
			
		case 'ascnum':
			// Numeric ascending: smaller values first
			$diff = $a[$key] - $b[$key];
			return $diff < 0 ? -1 : ( $diff > 0 ? 1 : 0 );
				
		case 'asc':
		default:
			if( $a[$key] == $b[$key] ) return 0;
			return $a[$key] > $b[$key] ? 1 : -1;
		}
	}
	
	protected function returnOrderBy( )
	{
		// Nothing to sort by when the query has no ORDER BY clause
		if( empty($this->orderby) )
		{
			return;
		}
		
		// Sorting runs on the already selected fields, so the ORDER BY
		// field has to be part of the SELECT list
		usort( $this->result, array($this,'sortCallback') );
	}
	
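	protected function returnLimit( )
	{
		switch( count($this->limit) )
		{
		case 1:
			// LIMIT count
			$this->result = array_slice($this->result, 0, (int) $this->limit[0], TRUE); 
			break;
			
		case 2:
			// LIMIT offset,count - the offset is treated as 1-based here,
			// unlike MySQL where it starts at 0
			$this->result = array_slice($this->result, max(0, (int) $this->limit[0] - 1), (int) $this->limit[1], TRUE); 
			break;
		}
	}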
}

About

Harry is a professional developer and sysadmin from London, UK.

He's an atheist, employed at PixelMags LLC, a socialist and has a pragmatic outlook on life, love and religion.
