We cover basic installation and usage of PostgreSQL and MADlib on OSX and Ubuntu. Instructions for other environments should be similar. PostgreSQL is an Open Source database with enterprise functionalities which often lack in MySQL. MADlib is an Open Source library which enhances a PostgreSQL or Greenplum database with functionalities for scalable in-database analytics.
PostgreSQL: installation
First, we need to have a PostgreSQL installation. On OSX, we will use Homebrew instead of compiling the source by ourselves. On Ubuntu, apt-get will do it. Note, Homebrew is expected to be present on your machine. At the time of this writing, the current proposed version is 9.0.1. The following commands will install and configure a new PostgreSQL installation. Additionally, it will be registered as a startup service.
OSX:
brew update
brew install postgres
initdb /usr/local/var/postgres
mkdir -p ~/Library/LaunchAgents
cp /usr/local/Cellar/postgresql/9.1.3/homebrew.mxcl.postgresql.plist ~/Library/LaunchAgents/
launchctl load -w ~/Library/LaunchAgents/homebrew.mxcl.postgresql.plist
In case your having trouble on OSX lion with the following message:
FATAL: could not create shared memory segment: Cannot allocate memory
DETAIL: Failed system call was shmget(key=1, size=1646592, 03600).
You should edit “/etc/sysctl.conf” or create it if it doesn’t exist and write:
kern.sysv.shmall=65536
kern.sysv.shmmax=16777216
Unbuntu:
apt-get update
apt-get install postgres
Make sure your postgres
command map to your new PostgreSQL installation, not to the default one. Executing which postgres
should print “/usr/local/bin/postgres”. If you see “/usr/bin/postgres” instead, you need to modify your path and place “/usr/local/bin” before “/usr/bin”. To do do, you may edit your ”/.bashrc”, ”/.profile” or “/etc/paths” file. Here’s an example:
echo "export PATH=/usr/local/bin:\$PATH" >> ~/.bashr
. ~/.profile
which postgres
Your PostgreSQL server should already be up and running. In the future, if you want to start/stop the server:
pg_ctl -D /usr/local/var/postgres -l /usr/local/var/postgres/server.log start
pg_ctl -D /usr/local/var/postgres stop -s -m fast
PostgreSQL: running
Note, the Homebrew formula configures the system with local “trust” authentication. Authentication is done against the current logged-in Unix user. Homebrew does not create a “postgres” user and the server is launch using the same user account under which you ran Homebrew.
You can test to connect to your new PostgreSQL server by issuing with psql
. For example, the command psql -d postgres -c "\l"
with open a connection to the “postgres” database (using your current username) and run the command \l
to list the databases.
Now, let’s please pgadmin by installing the Instrumentation functions not present by default.
psql -d postgres -c "CREATE EXTENSION "adminpack";"
psql -d postgres -c "select * from pg_extension"
psql -d postgres -c "select * from pg_available_extensions"
That’s pretty much with PostgreSQL, now let’s move on with installing the MADlib library.
MADlib: installation
We start by downloading the MADlib package and run the installation program.
Once installed, MADlib needs to be registered against a PostgreSQL database. We start by creating a database “mad” and then we register it:
psql -d postgres -c "CREATE DATABASE mad"
/usr/local/madlib/bin/madpack -p postgres -c $USER@$HOST/mad install
MADlib: testing
MADlib comes with a complete set of test which you can also use as a usage reference or simply as a source of inspiration. The script are present in “/usr/local/madlib/ports/postgres/modules//test/.sql”. To run the test suite:
/usr/local/madlib/bin/madpack -p postgres -c $USER@$HOST/mad install-check
Quick quantile example
We will test quantile function. For a simple definition, a quantile is the division of a total into equal subgroups.
As an example, the “.5” quantile of the serie ‘.25,.45,.5,.5,.55,.75’ is “.5”. If we substract “.1” to each element and use the same quantile, then the “.5” quantile of the serie ‘.15,.35,.4,.4,.45,.65’ is “.4”. We may interpret the result by saying that the serie is divided in two at “.4”. Finally, using the last serie, the “.75” quantile is “.425” meaning that one fourth of the values are after “.415”.
We will use the MADlib quantile test as a source of inspiration. It is located at “/usr/local/madlib/ports/postgres/modules/quantile/test/quantile.sql_in”.
Our simplified version will:
- Create a new table
- Insert some random values
- Compute the “.5” quantile
- Drop the table
CREATE TABLE TestQuantile ( val FLOAT );
INSERT INTO TestQuantile SELECT random()*100 FROM generate_series(1,1000);
SELECT MADLIB.quantile('TestQuantile', 'val', 0.5);
DROP TABLE TestQuantile;
Since our random()*100
function will generate a uniform representation between 0 and 100, we expect a result relatively close to 50.