I recently benchmarked my Haskell SQLite bindings package sqlite-simple. I was curious to know how sqlite-simple performance compares to native C, Python and other Haskell database bindings. Initial results for sqlite-simple were extremely poor but improved significantly after optimizations. This post will present the results of this benchmarking. It also discusses some of the optimizations that resulted from this performance analysis.

Initially sqlite-simple scored barely over 50K rows/s. Optimizations brought this up to 1.8M rows/s, a nice 34x improvement.

Here's the setup I used for running my experiments:

- 64-bit Debian running inside VirtualBox (with only a single core enabled).

The high-level operation of the benchmark is:

- Setup: Initialize a table testdata with 10000 rows of data.
- Measure the time it takes to query the first column of all these rows.

The schema consists of a single testdata table. The query being benchmarked is selectInts:

    selectInts :: S.Connection -> IO ()
    selectInts conn = do
      rows <- …
      … (… -> v + acc) 0 rows

Basically, it SELECTs all the rows from the testdata table, converts the result into a Haskell list of Ints and, as a sanity check, sums the resulting integers together.

Several variants of this function are benchmarked:

- direct-sqlite selectIntsDS: Lowest possible API level, least memory allocated.
- sqlite-simple selectIntsFold: Convert rows into Haskell data but fold over rows to avoid allocating memory for all rows.
- sqlite-simple selectInts: Convert rows into a Haskell list containing all the rows.

We'll focus mostly on selectInts when comparing against other implementations. You can find implementations of the same for various Haskell database libraries under db-bench/haskell. C and Python implementations are under db-bench/native and db-bench/python, respectively.

My benchmarking goal was to figure out how much overhead the sqlite-simple library adds on top of raw SQLite performance. Ideally, a query should spend all its time in native SQLite and zero time in Haskell bindings.

To better focus optimization work, I first set out to establish some reasonable performance targets to compare against. Establishing targets was straightforward. As sqlite-simple runs on top of direct-sqlite, sqlite-simple can only be as fast as direct-sqlite. As direct-sqlite runs on top of the native SQLite library, the fastest sqlite-simple and direct-sqlite can possibly go is as fast as SQLite.

To turn this into numbers, I implemented multiple versions of my query benchmark (in order of fastest to slowest):

- A native C benchmark on top of the SQLite library (source).
- A Haskell version using direct-sqlite (source, see function selectIntsDS).
- A Haskell version using sqlite-simple (source, see function selectInts).

The collected benchmark data was used to identify various performance improvement opportunities in sqlite-simple.

Original performance without optimizations was just barely over 50K rows/s. This was due to a performance bug I introduced when I forked sqlite-simple from postgresql-simple. The problem was in a function called stepStmt that should've been tail recursive but wasn't. Fixing this in d2b6f6a5 nicely bumped up the score from 53K to 750K rows/s.

The next couple of optimizations dealt mostly with cleanup to reduce allocation rate. With 3239d474f0 and 0ee050807d, the benchmark score went up from 750K to 764K rows/s.

At this point I ran out of low-hanging fruit in sqlite-simple and started to look elsewhere for optimizations. A low-level direct-sqlite benchmark was clocking around 2.43M rows/s, which seemed a little low when a C implementation of the same was processing rows at almost 6.9M rows/s. Even a Python reimplementation of my benchmark case was faster at 2.5M rows/s.

To highlight how large the performance delta between C and direct-sqlite was, it's helpful to turn the comparison into absolute clock cycles. On my 3.2GHz machine, 6.9M rows/s means SQLite spends roughly 460 clock cycles per row. Similarly, at 2.43M rows/s on direct-sqlite, each row cost roughly 1300 clock cycles, out of which 460 were spent in the native SQLite library. Somehow roughly 840 clock cycles per row were spent in the Haskell SQLite bindings.

The overhead of just calling into SQLite from Haskell was higher than the actual cost of computing the result set inside SQLite! Yet, there wasn't much going on in the wrapper library. Consider the inner loop of the direct-sqlite benchmark:

    sumRowsColumnInt64 :: DS.Statement -> Int64 -> IO Int64
    sumRowsColumnInt64 stmt !acc = do
      r <- DS.step stmt                        -- advance to the next row
      case r of
        DS.Row -> do
          i <- DS.columnInt64 stmt 0           -- read the first column
          sumRowsColumnInt64 stmt (acc + i)    -- tail call, strict accumulator
        DS.Done ->
          return acc

The direct-sqlite calls DS.step and DS.columnInt64 map quite directly to their native SQLite counterparts sqlite3_step and sqlite3_column_int64.
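
The per-row cycle counts quoted in this post come from dividing the clock frequency by the measured throughput. Here is a minimal sketch of that arithmetic, using the 3.2GHz clock and the rows/s figures from the benchmarks. The name cyclesPerRow is made up for illustration and is not part of sqlite-simple or direct-sqlite.

```haskell
module Main where

-- | Clock cycles spent per row, given a clock rate in Hz and a
-- throughput in rows per second. (Hypothetical helper, for illustration.)
cyclesPerRow :: Double -> Double -> Double
cyclesPerRow clockHz rowsPerSec = clockHz / rowsPerSec

main :: IO ()
main = do
  let clockHz   = 3.2e9                          -- 3.2GHz machine
      nativeC   = cyclesPerRow clockHz 6.9e6     -- C benchmark, 6.9M rows/s
      directSql = cyclesPerRow clockHz 2.43e6    -- direct-sqlite, 2.43M rows/s
      overhead  = directSql - nativeC            -- cycles spent outside native SQLite
  putStrLn ("native SQLite cycles/row:  " ++ show (round nativeC :: Int))
  putStrLn ("direct-sqlite cycles/row:  " ++ show (round directSql :: Int))
  putStrLn ("binding overhead per row:  " ++ show (round overhead :: Int))
```

With unrounded inputs this gives about 464 cycles per row for native SQLite and about 1317 for direct-sqlite, a gap of about 853, in line with the rounded 460, 1300 and 840 used in the text.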