The Kdb+ database, K and Q.

Everybody is, to a certain degree, familiar with basic database concepts: how they work, how to make simple queries, how to install this or that database, and so on. Probably every programmer has at some point installed MySQL or a similar DB, learned a bit of SQL, and perhaps made some professional use of that knowledge.

SQL is very nice and chubby, with a lot of fancy features covered by hundreds of books. There are also thousands of people ready to help you on Stack Overflow if you are in trouble with your favorite query.

MySQL-style databases are used by everybody, unless everybody really cares about performance (well, and a couple of additional oddities are welcome too). In that case they, the everybody, will probably shift to a more specialized type of database, like Kdb+: an in-memory, column-oriented database based on the concept of ordered lists, which comes along with the powerful query language Q.

Before going further, let’s disassemble the last statement in an attempt to understand what Kdb+ is:

  1. “Is an in-memory DB”: that means that it uses a lot of RAM... a whole lot of RAM! Most of the information and data handled by the Kdb+ engine is kept in memory, with very limited use of ordinary disks while querying (historical data can still be persisted to disk). Every common database at some point (at some very early point...) stores information on disk; nothing wrong here.

    The problem is that disks are slow, and storing/loading data from disk might be too slow for certain applications, such as market data analysis, heavy math/statistical data-oriented computation, sensor data storage and analysis, and so on. A great many financial institutions use Kdb+ for trading and market analysis. Considering how cheap RAM is today for those who have the money to buy it, it's not a problem for a DB to keep billions or trillions of entries in memory. Disks are for the poor!

  2. “Is column oriented”: canonical relational databases are row oriented, which means that the data stored in the tables is grouped in memory row by row; that is, the fields of each row are recorded in memory one after the other. Inserting data means that a new row has to be created and stored as a contiguous sequence of bytes; querying data means extracting a row from the table (yeah, spartan explanation, but you get the point, right?). In Kdb+ data is column oriented, which means that extracting data is much faster if you care about data series: the contents of a table are laid out in memory column after column.

    Columns are stored in memory sequentially, one by one. Operations like finding the average value, the min/max, or any other math/stat operation that comes to your mind are outrageously faster on column-oriented DBs like Kdb+. Column-oriented DBs benefit from the locality of data in memory to boost the performance of reading data series; clearly the flip side is that reading an entire row is going to be much slower compared to row-oriented DBs. But here's the point: Kdb+ is a powerful database for data series analysis, and it includes a powerful array processing language!

    Column oriented storage vs Row oriented storage, Kdb+ is a column-oriented DB.


  3. “based on the concept of ordered list”: the usual databases for mortals have no inherent concept of order. Oh yeah, you might force the DB to do some ordering, but that's not going to be natural because of the way the information is stored (row oriented). Kdb+ works on vectors, that is, ordered lists, which makes it very simple to run queries on data series.
  4. “…the query language Q”: oh yeah, this is the juicy part! Q is the built-in programming language used to program and query Kdb+. It's an interpreted, dynamically typed array processing language which at its root is just a wrapper for K.
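
To make points 2 and 3 a bit more concrete, here is a minimal Python sketch (Python rather than Q, only so that it can be read without knowing the language). Everything in it is an assumption made for illustration: the tiny trade table, the field names, and the helpers `avg_price_rows`, `avg_price_columns`, `row_at` and `deltas` are invented here and have nothing to do with the actual Kdb+ internals or API. It only shows the general idea of the two layouts and of an operation that relies on the order of a column.

```python
from array import array

# Made-up trade data, purely illustrative (not taken from Kdb+).
# Row-oriented layout: one record per trade, fields stored side by side.
rows = [
    ("09:30:00", "AAA", 100.0, 200),
    ("09:30:01", "BBB", 101.5, 100),
    ("09:30:02", "AAA", 101.0, 150),
    ("09:30:03", "BBB", 102.5, 300),
]

# Column-oriented layout: one vector per field. The numeric columns are
# contiguous typed arrays, roughly the shape a column store works with.
columns = {
    "time": [r[0] for r in rows],
    "sym": [r[1] for r in rows],
    "price": array("d", (r[2] for r in rows)),
    "size": array("q", (r[3] for r in rows)),
}


def avg_price_rows(table):
    # Walks every record and picks one field out of each of them.
    return sum(rec[2] for rec in table) / len(table)


def avg_price_columns(cols):
    # Scans a single contiguous vector; the other fields are never touched.
    return sum(cols["price"]) / len(cols["price"])


def row_at(cols, i):
    # Rebuilding a full row costs one lookup per column: the flip side.
    return tuple(cols[name][i] for name in cols)


def deltas(vec):
    # Order-dependent operation: difference between consecutive entries.
    # It is only meaningful because the column is an ordered list.
    return [b - a for a, b in zip(vec, vec[1:])]


print(avg_price_rows(rows), avg_price_columns(columns))  # 101.25 101.25
print(row_at(columns, 2))
print(deltas(columns["price"]))  # [1.5, -0.5, 1.5]
```

Both averages give the same number, of course; the difference is only in how much memory has to be walked to get there, which is the whole argument for the column layout when you mostly care about series.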

Why does the K programming language need a wrapper? Since K interacts directly with Kdb+, why not use K directly? Before writing a single additional word on this topic, let's see what a K program looks like:

Nice K solution of the sudoku game.

*{$[&/x;,x;,/.z.s'@[x;i;:;]'&27=x[,/p i:x?0]?!10]}@.:'*.z.x

This is the implementation of a sudoku solver. And no, I swear to God I'm not trying to obfuscate the code, this is actually what K looks like! Given the following sudoku board:

Graphical visualization of the input for the sudoku game.

The output is: 284375169639218457571964382152496873348752916796831245967143528813527694425689731 or:

Graphical visualization of the output of the sudoku game solution

You can run the sudoku program with the following command (assuming that the K code is stored in the file sudoku.k):

The image of the table is there for clarity only; there are no graphics capabilities involved here. The output is just a string of digits which represents the solution of the game.
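
If the K one-liner is a bit too dense, the following is a rough Python sketch of a brute-force search in the same spirit: find the first empty cell, try every digit not already used in its row, column or 3x3 box, and recurse. It is an analogue under my own assumptions, not a translation of the K code: the names `peers`, `solutions` and `solve` are invented here, and since the input board above was only shown as an image, the `PUZZLE` below is a made-up one obtained by blanking the diagonal of the solution string quoted above.

```python
# Solution string as printed in the text above.
SOLUTION = (
    "284375169639218457571964382152496873348752916"
    "796831245967143528813527694425689731"
)

# Hypothetical puzzle: the same board with the main diagonal blanked ('0').
HOLES = {0, 10, 20, 30, 40, 50, 60, 70, 80}
PUZZLE = "".join("0" if i in HOLES else c for i, c in enumerate(SOLUTION))


def peers(i):
    """Indices (0..80) sharing a row, column or 3x3 box with cell i."""
    r, c = divmod(i, 9)
    br, bc = 3 * (r // 3), 3 * (c // 3)
    same = {9 * r + k for k in range(9)}           # same row
    same |= {9 * k + c for k in range(9)}          # same column
    same |= {9 * (br + dr) + bc + dc               # same 3x3 box
             for dr in range(3) for dc in range(3)}
    return same


def solutions(board):
    """Yield every completed board reachable from `board` ('0' = empty)."""
    i = board.find("0")
    if i < 0:                  # no empty cell left: the board is complete
        yield board
        return
    used = {board[j] for j in peers(i)}
    for d in "123456789":
        if d not in used:      # try each digit still allowed in cell i
            yield from solutions(board[:i] + d + board[i + 1:])


def solve(board):
    """Return the first solution found, or None if there is none."""
    return next(solutions(board), None)


print(solve(PUZZLE))
```

Like the K version, the output is nothing more than a string of 81 digits. The Python takes a few dozen lines to say what K says in one, which is pretty much the point of this whole section.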

To be honest, K is not much more difficult or cryptic than many other languages out there: the equivalent Perl program (listed below) to solve the sudoku puzzle doesn't look much better if you really push the limits of the language for the purpose of reducing the size of the code.

And this is also true for most other languages. The point here is that in K you don't really have to obfuscate the code in order to get an almost unreadable listing of symbols which is actually a program. K programs are really intended to be written as short and as efficient as possible; there's no option to write clean code in K.

Ugly perl solution of the sudoku game.


If you want to have a peek at a slightly more complex piece of K code, then have a look at how Q is written in K by following this link. You remember what I wrote earlier, right? Q is just a wrapper around K, and the file that link points at is the implementation of Q in K.

But here we're going really, really too far from the premises of this post, which were to talk a bit about Kdb+ and perhaps write down a list of steps to install and run some Kdb+ stuff. Sadly I have been too verbose already, and I'll have to prepare a new post to cover those other topics.

But I know you're excited! I can smell your excitement right now from the basement where I'm writing this stuff! For this reason, before leaving you with the promise to be back on this topic soon, let's see how to actually install Kdb+ and run the sudoku code I've been showing you.

  1. Download Kdb+! The community version is free for non-commercial use and... is almost completely useless! You're limited to two gigs of address space in the free version, which is sufficient for learning purposes but not for any real use of the database.
  2. Unzip the file and store it in your favorite location, then export the QHOME environment variable; it should point to the location where the q.k file is stored.
  3. Congratulations, your installation is complete!

To run Kdb+ just execute the q binary from your installation directory; the first run will show you something like:

You're immediately dropped into the interactive console of the database. To run the sudoku solver, please have a look at my earlier code snippets; they include the command to run the code. Also, if for some reason you're asking yourself whether there are books about Q, then the answer is yes! This link points to a sadly very expensive book on Q (and therefore on K).

And that's all for today. I'll be back on this topic soon with some examples of how to use the database.

Thanks for reading
