Language phenotypes: SQL - Part 3

Thinking about the coding questions I have been given to solve with limited tools, I think I can bypass the tooling issue by being creative. CoderPad will let you use databases. It's a feature most people wouldn't reach for to solve iterative problems, but think functionally and you come down to sets, and then you can use (or abuse) the database like so.

Problem 1: Bucketize a list

Normally you would write a loop and adjust the values. Instead, try doing it as set operations like this (Postgres + C#):

  • Create a temp table
  • Load the starting buckets into the temp table
  • Update all the bucket values smaller than the minimum order size up to that size
  • Do some division and modulus logic to rebucket
  • Compare the item's original value to the rebucketed value, pick the larger of the two, and bam, you've got it (a plain-loop sketch of these steps follows right after this list)
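
For comparison, here is roughly what those steps look like as a plain in-memory loop in C#. This is just an illustrative sketch; the helper name BulkAdjustMinLoop is mine and is not part of the SQL solution below.

static List<int> BulkAdjustMinLoop(int[] currentQuantities, int minSize, int batchBucketSize)
{
    var results = new List<int>();
    foreach (var bucketValue in currentQuantities)
    {
        if (bucketValue >= minSize)
        {
            // Already at or above the minimum order size: keep it untouched
            results.Add(bucketValue);
            continue;
        }

        // Raise the value to the minimum order size
        var computed = minSize;

        // Division/modulus rebucketing: round up to the next batch boundary
        var bucketed = (computed / batchBucketSize) * batchBucketSize
                     + (computed % batchBucketSize > 0 ? batchBucketSize : 0);

        // Keep the larger of the original value and the rebucketed value
        results.Add(Math.Max(bucketValue, bucketed));
    }
    return results;
}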

Function prototype is

private static List<int> BulkAdjustMin(int[] currentQuantities, int minSize, int batchBucketSize)

Input buckets are:

[10, 16, 0, 17] with a minimum order size of 14 and a bucket bin size of 8

i.e. var results = BulkAdjustMin(new[] { 10, 16, 0, 17 }, 14, 8);

Should give you

Result
16
16
16
17

The SQL looks like this

DROP TABLE IF EXISTS inputs;
CREATE TEMP TABLE inputs (BucketValue INT NOT NULL, ComputedValue INT);
INSERT INTO inputs (BucketValue) VALUES (@item);
UPDATE inputs SET ComputedValue = @minInput WHERE BucketValue < @minInput;

SELECT CASE WHEN a.Bucketed > a.BucketValue THEN a.Bucketed ELSE a.BucketValue END Output
FROM
(
SELECT BucketValue,
        ((ComputedValue / @batchSize) * @batchSize) + CASE WHEN ComputedValue % @batchSize > 0 THEN @batchSize ELSE 0 END Bucketed
FROM inputs
) a;

The key here is that the new bucket size ComputedValue is integer-divided by the batch bucketing size, multiplied back by the batch size, and then rounded up by one extra batch when there is a remainder.

Example: 10

  1. 10 is below the minimum of 14, so it becomes 14
  2. Integer-divide the new size by the batch size: 14 / 8 = 1, then multiply back by 8 to get 8
  3. Check the remainder: 14 % 8 = 6, which is non-zero, so round up by one more bucket of 8
  4. Add the two together: 8 + 8 gives you the bucket value of 16
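
The same arithmetic can be sanity-checked in C#; this is just the SQL expression transliterated (integer division assumed), not anything extra the SQL version needs.

int computed = 14, batchSize = 8;
int bucketed = (computed / batchSize) * batchSize          // 14 / 8 = 1, times 8 = 8
             + (computed % batchSize > 0 ? batchSize : 0); // 14 % 8 = 6, non-zero, so add one more 8
// bucketed is now 16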

The particular bit of SQL doing this is ((ComputedValue / @batchSize) * @batchSize) + CASE WHEN ComputedValue % @batchSize > 0 THEN @batchSize ELSE 0 END Bucketed

Finally, pick the larger of the rebucketed value and the original value with this statement: CASE WHEN a.Bucketed > a.BucketValue THEN a.Bucketed ELSE a.BucketValue END Output. Note that if an item never fell below the minimum, ComputedValue (and therefore Bucketed) is NULL, the comparison is not true, and the original BucketValue comes back unchanged.
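
In C# terms that CASE behaves like a null-aware max; a tiny illustrative helper (PickOutput is my name, not part of the original code) makes the two branches explicit.

static int PickOutput(int? bucketed, int bucketValue) =>
    bucketed.HasValue && bucketed.Value > bucketValue ? bucketed.Value : bucketValue;

// PickOutput(16, 10)   == 16  -- the rebucketed value wins
// PickOutput(null, 17) == 17  -- item never fell below the minimum, original value kept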

Here is C# code, compatible with Postgres, that will run on CoderPad.io:

using System;
using System.Collections.Generic;
using System.Linq;
using Dapper;
using Npgsql;


class Solution
{
    static  List<int> BulkAdjustMin(int[] currentQuantities, int minSize, int batchBucketSize)
    {
        using (var conn = new NpgsqlConnection("Host=/tmp/postgresql/socket; Database=coderpad; Username=coderpad"))
        {
            conn.Open();
            conn.Execute("DROP TABLE IF EXISTS inputs");
            conn.Execute("CREATE TEMP TABLE  inputs (BucketValue INT NOT NULL, ComputedValue INT)");
            conn.Execute("INSERT INTO inputs (BucketValue) VALUES (@item);", currentQuantities.Select(x => new { item = x }));
            
            // Raise anything below the minimum order size, then rebucket and pick the final value per row
            var results = conn.Query<int>(@"
UPDATE inputs SET ComputedValue = @minInput WHERE BucketValue < @minInput;
SELECT CASE WHEN a.Bucketed > a.BucketValue THEN a.Bucketed ELSE a.BucketValue END Output
FROM
(
SELECT BucketValue,
        ((ComputedValue / @batchSize) * @batchSize) + CASE WHEN ComputedValue % @batchSize > 0 THEN @batchSize ELSE 0 END Bucketed
FROM inputs
) a", new
            {
                minInput = minSize,
                batchSize = batchBucketSize
            }).ToList();
            return results;
        }
    }
    
    static void Main(string[] args)
    {
        var expectedResults = new int[] { 16, 16, 16, 17};
        var results = BulkAdjustMin(new[] { 10, 16, 0, 17 }, 14, 8);
        Console.WriteLine($"Matched: {expectedResults.OrderBy(z => z).SequenceEqual(results.OrderBy(k => k))}");
    }
}

Why would you do this?  

It gets around the "you can't use external libraries" restriction CoderPad imposes, and it pushes you to think of this operation in set-theory terms, which is more functional, delivers fast performance, and is the way to make this code faster. Take that algorithm and try to beat it. SQL has far superior, well-tuned algorithms for doing this kind of matching, so why rewrite them for a dumb test? In production, moving the code as close to the data as you feel comfortable increases speed and reduces costs.

I apologize for nothing, and if I get another one of these questions I am going to pull this technique out and make someone's head explode with:

Them: You can't do that. He's doing it... wait, it works.

Me: Yeah, if you want to make stuff scream, you do this, with indexes etc., because the problem you asked is a data-matching problem.