PostgreSQL - A Thank you from Nautoguide

by Dave Barter

Here’s a rhetorical question for you: “do you use open source software?” I don’t even need to wait for your answer because it is 100% going to be “yes”. Open source software is a simple concept: a bunch of programmers write some stuff and allow anyone to use it for free. What’s more, they allow anyone to look at the code and modify it as they see fit.

I could write a million pages on the history, logistics and benefits of open source but stay with me as I’m not going to do that. I had it all spelled out to me by Richard Stallman many years ago during a fascinating lecture he gave in Sheffield, made even more entertaining by him eating his lunch mid-speech whilst fully microphoned up.

But I am going to write a few pages on one piece of open source software that has significantly changed the way I work. Not only has it done that, it’s allowed me to build a business upon it and consequently deliver services to customers a damn sight cheaper than I was able to do twenty years ago.

That bit of software is called PostgreSQL. It’s a database and it’s entirely open source. I don’t have to pay anyone to use it, no matter what the application, scope or number of users. PostgreSQL will never send me a bill for running it on multi-core processors and will never charge a fee for the latest version, upgrade or patch.

But even if they did I’d probably pay it. The reason being that PostgreSQL is stuffed full of brilliant stuff that we regularly use in Nautoguide. Indulge me for a few pages as I tell you all about them.

PostgreSQL Supports JSON

Just in case you didn’t know. JSON is brilliant. It’s a simple way of holding data that is very easy to read and extend. Here’s a little snippet of JSON for the uneducated:-

{
   "name" :    {"firstname" : "Dave", "surname" : "Barter"},
   "address" : {"street" : "19 Foobaa Avenue", "town" : "Devnull", "postcode" : "RS1 1ED"}
}

As you can see, it’s nice and readable, and it’s pretty clear what the data hierarchy is and how I get to the bits I need. In JavaScript this is really easy:-

var firstname = json.name.firstname;

In XML this would look like:-

<?xml version="1.0" encoding="UTF-8"?>
<person>
	<name>
		<forename>Dave</forename>
		<surname>Barter</surname>
	</name>
	<address>
		<street>19 Foobaa Avenue</street>
		<town>Devnull</town>
		<postcode>RS1 1ED</postcode>
	</address>
</person>

Urgghh, look at all those horrible closing tags, and the need for an outer tag just to make the document valid. And things get worse when you want to actually get at some data; if you don’t believe me, simply Google “xpath”.

The great thing about JSON is that it is so easy to extend your structure with new data. This helps us a lot, as we often get involved in projects where requirements are not clear from the outset. So at the start of the project the customer will ask us to store an entity called “person”, and a person will have a name and an address. So far so good, but at the last minute they own up that they’ve forgotten to tell us about the requirement to store a phone number as well.

If we’d gone down the traditional route we’d be sighing into our coffee and revisiting our database schema. No doubt this would cause issues in client-side code, and suddenly we’re heading into change control, which is no good for any of us.
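To illustrate (a hypothetical sketch, not our actual schema), because the record is just a JSON object the new requirement is one extra key rather than a migration:

```javascript
// Hypothetical "person" record held as JSON -- the shape from the snippet above
const person = {
  name: { firstname: "Dave", surname: "Barter" },
  address: { street: "19 Foobaa Avenue", town: "Devnull", postcode: "RS1 1ED" }
};

// The late-arriving phone number requirement is just another key;
// nothing already stored and nothing in the schema has to change
person.phone = "01234 567890"; // illustrative value
```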

Instead we like to use JSON and I’ll give you a glimpse into our geospatial data store. Our tables look broadly like this:-

CREATE TABLE features (
	feature_id BIGSERIAL,
	public_attributes JSONB,
	private_attributes JSONB,
	wkb_geometry GEOMETRY
);

This allows us to keep our database schema really clean by storing complex geospatially related data in just four columns. The JSON columns use JSONB, PostgreSQL’s binary JSON format, which stores the data in a decomposed form that is more efficient to process and, crucially, can be indexed.

We can easily add new attributes to our features as we see fit and the database does not even need to know about them. So from the client I can send:-

{
  "type" : "point_of_interest",
  "display_name" : "Really good pub",
  "other_stuff" : {"colour" : "red", "icon" : "pub.tif"},
  "tag" : "pub"
}

And I can keep adding to other_stuff as I see fit.
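On the client, that extensibility looks like this (a sketch; any attribute beyond the snippet above is invented for illustration):

```javascript
// The feature payload from above as a plain JS object
const feature = {
  type: "point_of_interest",
  display_name: "Really good pub",
  other_stuff: { colour: "red", icon: "pub.tif" },
  tag: "pub"
};

// Extending other_stuff needs no change to the features table --
// the JSONB column swallows whatever the client sends
feature.other_stuff.opening_hours = "11:00-23:00"; // hypothetical new attribute
```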

Adding an index to the JSONB column makes things lightning fast:-

CREATE INDEX features_public_attributes_idx ON features USING GIN(public_attributes jsonb_path_ops);

Now, using the PostgreSQL JSONB containment operator @>, I can very quickly find a feature that I’m interested in:-

SELECT feature_id FROM features WHERE public_attributes @> '{"tag" : "pub"}';

As you can see this is fantastic: ONE index lets me quickly navigate the entire JSON structure, because this query will use the same index:-

SELECT feature_id
FROM features
WHERE public_attributes @> '{"other_stuff" : {"colour" : "red"}}';

If I had put this in a separate column then I’d need to create a new index, and up goes the size of my database accordingly.

So as you can see JSON in PostgreSQL offers us a whole load of benefits:-

  • flexibility in our data structures
  • easier client/server code (especially when using web technologies)
  • reduced indexing overhead and data storage requirement

Let’s make things even more betterer

PostgreSQL Supports Javascript

Now I know that many IT zealots will be furiously hammering away at the keyboard right now in order to tell me that Javascript is bad and should be killed with fire.

Hammer away as it’s made a real difference to our business.

When we’re messing about with data we like to do it in the database. Our architecture says that data only comes out when it is ready to be viewed or passed on to something else. Up until that point the most efficient place to mangle it is where it lives. This architecture is not for everyone, but it’s allowed us to make stratospheric cost savings by moving to an entirely serverless operating model, so I’ll let the money talk.

The downside of messing with data in the database is that you inevitably have to do it with SQL. And SQL is lovely and horrible in equal measure. It’s designed specifically for getting data in and out of a database and is really good at its job. But sometimes it slips out of its comfort zone when doing stuff with that data, JSON being a great case in point.

In JavaScript, if I want to add a new attribute to a JSON object it’s pretty easy to do:-

var foo = {};
foo['bar'] = 'wibble';

In SQL (PostgreSQL variant) it’s a bit more horrible:-

DO
$$
DECLARE
	foo JSONB DEFAULT jsonb_build_object();
BEGIN
	foo := jsonb_set(foo, '{bar}', '"wibble"'::JSONB, true);
END;
$$ LANGUAGE PLPGSQL;

This caused us a bit of a problem: we have lots of tasks that need our internal JSON messed about with, but we did not want to retrain all of our programmers in the fine art of JSON-in-Postgres SQL hackery. It’s also hard to maintain. So we turned to JavaScript.

Adding JavaScript to a PostgreSQL database is incredibly easy; one line adds the V8 engine for us:-

CREATE EXTENSION plv8;

Now I can code in JavaScript:-

DO
$$
var foo = {};
foo['bar'] = 'wibble';
$$ LANGUAGE PLV8;

Nice and easy. We made things even easier for our guys by putting triggers on the tables of concern and letting them hack away in JavaScript at their content:-

CREATE FUNCTION post_process() RETURNS trigger AS
$$
	// TG_OP tells us what postgres did with the record
	// NEW is our new record
	if (TG_OP === "INSERT") {
		NEW.public_attributes['foo'] = 'baa';
	}

	// If we modify and return NEW the record will be modified in the database
	return NEW;
$$
LANGUAGE PLV8;

CREATE TRIGGER update_trigger
    BEFORE INSERT OR UPDATE
    ON features FOR EACH ROW
    EXECUTE PROCEDURE post_process();

This trigger allows our guys to think JavaScript. We’ve gone much further and created the ability for them to build libraries and shared code. They never need to understand the database schema or the intricacies of SQL and can focus on what they know best: JavaScript and JSON.
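As a flavour of what that shared code might look like (the names here are invented for illustration, not our actual library), a trigger body could reuse a plain JavaScript helper like this:

```javascript
// Hypothetical shared helper: normalise a tag before it is written into
// public_attributes, so "  Really Good Pub " and "really_good_pub" index identically
function normaliseTag(tag) {
  return String(tag).trim().toLowerCase().replace(/\s+/g, "_");
}

// Inside a PLV8 trigger a call might look like:
//   NEW.public_attributes.tag = normaliseTag(NEW.public_attributes.tag);
```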

PostgreSQL Supports even more stuff

Phew, that’s just two aspects we use, and I have a shed load more to tell you about: how we’ve written a database API in PL/pgSQL, how we’ve used schemas and inheritance to securely segment our customers’ data, and some of the stuff we’ve done in PostGIS that is simply awesome.

Stay tuned for my next PostgreSQL blog when I’ll dive deeper into each of those.
