Data Modeling: SQLite + Python - Part 3

This is a continuation of Sqlite in memory data structures used for general problem solving. As this is a continuation I am going to skip over some of the background details on why to use in memory sqlite and how to create and get to the harder problem that this solves elegantly.

Introduction

In Australian Football playoff teams are determined by wins and tiebreaker condition of match competitiveness. Match competitiveness is calculated by the formula of 100 * (points scored / points allowed) For example a win counts as 4 points, a tie 2 points and a loss 0 points. If multiple teams have the same match points then teams are ranked by the tiebreaker condition listed above.

Say two teams Team A, Team B with identical match points of 6 but one Team A has scored 100 points and allowed 50 in those matches while Team B scored 100 and allowed 99. In this situation the quality of Team A matches have been more in their favor than Team B by nature of scoring more points than allowing. This could happen as Team B luckily won or tied a match that there were losing up to a point whereas Team A won convincingly.

All teams will play each other with 1 team marked as home and the other as away. Therefore if you have 4 teams they will all play 6 (3 as home and 3 as away) matches to play against every other team in this format.

At the end of the season you could have a scenario like below. In this case the top 2 teams are advancing to the playoffs which requires no tiebreaker East Coast, Swans advance

team_name opponent_name score_for score_against home_result away_result
east coast essendon 102 42 east coast - win essendon - loss
east coast swans 112 81 east coast - win swans - loss
east coast tigers 81 36 east coast - win tigers - loss
essendon east coast 37 55 essendon - loss east coast - win
essendon swans 44 50 essendon - loss swans - win
essendon tigers 111 88 essendon - win tigers - loss
swans east coast 38 64 swans - loss east coast - win
swans essendon 72 39 swans - win essendon - loss
swans tigers 57 53 swans - win tigers - loss
tigers east coast 37 73 tigers - loss east coast - win
tigers essendon 46 65 tigers - loss essendon - win
tigers swans 95 62 tigers - win swans - loss

However if you take the match listed "d:b" Which is Tigers:East Coast and change the score from "37:73" from a East Coast win to a Tigers win instead of say "74:73" and you take the top 3 teams then it would go to a tiebreaker between Essendon, Tigers with Tigers advancing by having a higher competitiveness score

team_name point_scored points_allowed total_points
east coast 487 308 20
swans 360 407 12
tigers 392 449 8
essendon 338 413 8

338/413 = 81.84 to 392/449 = 87.30 for the Tigers

Again the data is presented as a series of string data that is broken apart into the following.

Team Identifiers

A = ["a:Essendon", "b:East Coast", "c:Swans", "d:Tigers"]

Match Opponents

B = ["a:b", "a:c", "a:d", "b:a", "b:c", "b:d", "c:a", "c:b", "c:d", "d:a", "d:b", "d:c"]

Match Scores

C = ["37:55", "44:50", "111:88", "102:42", "112:81", "81:36", "72:39", "38:64", "57:53", "46:65", "37:73", "95:62"]

Sqlite Database Plan

There is a need to create a table that holds the MatchDetail from the perspective of one team

From the above table I can calculate a running sum of the scored points for/against in each match as the first team listed and as opponent along with capturing the win score as the field points.

With this table you can find the match summaries with a query like this to see all the matches played (each team plays 3 matches and the opponent, points for and points against and with the final result)

SELECT team_name,
       opponent_name,
       score_for,
       score_against,
       concat(team_name, ' - ', CASE points_team WHEN 4 THEN "win" WHEN 2 THEN "tie" ELSE "loss" END) as home_result,
       concat(opponent_name, ' - ', CASE points_opponent WHEN 4 THEN "win" WHEN 2 THEN "tie" ELSE "loss" END) as away_result
FROM MatchDetail
order by team_name, opponent_name;
team_name opponent_name score_for score_against home_result away_result
east coast essendon 102 42 east coast - win essendon - loss
east coast swans 112 81 east coast - win swans - loss
east coast tigers 81 36 east coast - win tigers - loss
essendon east coast 37 55 essendon - loss east coast - win
essendon swans 44 50 essendon - loss swans - win
essendon tigers 111 88 essendon - win tigers - loss
swans east coast 38 64 swans - loss east coast - win
swans essendon 72 39 swans - win essendon - loss
swans tigers 57 53 swans - win tigers - loss
tigers east coast 37 73 tigers - loss east coast - win
tigers essendon 46 65 tigers - loss essendon - win
tigers swans 95 62 tigers - win swans - loss

Furthermore to get the playoff ranking you need to sum each record as home and away (team_name, opponent_name) to build up the competitiveness tiebreaker value

In this case using Common Table Expressions you can sum the points for a team and then join the two sets together to get that ranking like so

WITH points_for as (
    SELECT team_name, SUM(score_for) score_1, SUM(score_against) score_2, SUM(points_team) win_score
    FROM MatchDetail
    group by team_name
),
points_against as (
    SELECT opponent_name, SUM(score_against) score_1, SUM(score_for) score_2, SUM(points_opponent) win_score
    FROM MatchDetail
    group by opponent_name
)
SELECT pf.team_name, pf.score_1 + pa.score_1 point_scored, pf.score_2 + pa.score_2 points_allowed, pf.win_score + pa.win_score total_points
from points_for pf
inner join points_against pa
on pf.team_name = pa.opponent_name
ORDER BY pf.win_score + pa.win_score DESC, (100 * (CAST((pf.score_1 + pa.score_1) as float)) / CAST((pf.score_2 + pa.score_2) as float)) DESC
LIMIT 2;

The key point here is to group by team the perspective of the scores as the home team team_name and as the away team opponent_name to get the full picture. You need to consider the win_score from each teams perspective on the match to calculate the winning points.

Implementation

I need to do the following

  • Parse the first array of strings as the Team id and names
  • Parse the second array of strings as the team matches by their ids
  • Parse and match the match scores through the team matches back to their ids

In this case the method signature will look like this def get_playoff_teams(teams:list[str], matches:list[str], match_scores:list[str]) -> list[str]:

Returning the list of team names in order of playoff rankings

Method details

def get_playoff_teams(teams:list[str], matches:list[str], match_scores:list[str]) -> list[str]:
    try:
        all_teams = get_teams(teams)
        con = sqlite3.connect(':memory:')
        con.execute(
            "CREATE TABLE MatchDetail(team_name text, opponent_name text, score_for int, score_against int, points_team int, points_opponent int);")
        [insert_match_data(all_teams, con, match, score) for match, score in zip(matches, match_scores)]
        max_points = con.execute("""
                    WITH points_for as (
                        SELECT team_name, SUM(score_for) score_1, SUM(score_against) score_2, SUM(points_team) win_score
                        FROM MatchDetail
                        group by team_name
                    ),
                    points_against as (
                        SELECT opponent_name, SUM(score_against) score_1, SUM(score_for) score_2, SUM(points_opponent) win_score
                        FROM MatchDetail
                        group by opponent_name
                    )
                    SELECT pf.team_name, pf.score_1 + pa.score_1 point_scored, pf.score_2 + pa.score_2 points_allowed, pf.win_score + pa.win_score total_points
                    from points_for pf
                    inner join points_against pa
                    on pf.team_name = pa.opponent_name
                    ORDER BY pf.win_score + pa.win_score DESC, (100 * (CAST((pf.score_1 + pa.score_1) as float)) / CAST((pf.score_2 + pa.score_2) as float)) DESC
                    LIMIT 2;
                                 """)
        playoff_teams = [result[0] for result in max_points]
        return playoff_teams
    except Exception as e:
        ic(e)
        return ["", ""]

Details below of inner method calls

Parse the array of team ids and names

def get_teams(teams) -> dict[str, str]:
    all_teams = {}
    for input_team in teams:
        team_details = input_team.split(":")
        all_teams[team_details[0].lower().strip()] = team_details[1].lower().strip()
    return all_teams

Parse the team match score win,tie,loss

def match_points(score_against_team2: int, score_for_team1: int) -> int:
    team2_points = 2
    if score_against_team2 > score_for_team1:
        team2_points = 4
    if score_against_team2 < score_for_team1:
        team2_points = 0
    return team2_points

Parse and insert the team match points into the MatchDetails table

def insert_match_data(all_teams: dict[str, str], con: sqlite3.Connection, match: str, score: str) -> None:
    match_details = match.split(":")
    score_details = score.split(":")
    team1_name = all_teams[match_details[0].lower().strip()]
    team2_name = all_teams[match_details[1].lower().strip()]
    score_for_team1 = int(score_details[0])
    score_against_team2 = int(score_details[1])
    team1_points = match_points(score_for_team1, score_against_team2)
    team2_points = match_points(score_against_team2, score_for_team1)
    con.execute(
        "INSERT INTO MatchDetail (team_name, opponent_name, score_for, score_against, points_team, points_opponent) VALUES (?, ?, ?, ?, ?, ?)",
        ic((team1_name, team2_name, score_for_team1, score_against_team2, team1_points, team2_points)))

When called as such

if __name__ == '__main__':
    A = ["a:Essendon", "b:East Coast", "c:Swans", "d:Tigers"]
    B = ["a:b", "a:c", "a:d", "b:a", "b:c", "b:d", "c:a", "c:b", "c:d", "d:a", "d:b", "d:c"]
    C = ["37:55", "44:50", "111:88", "102:42", "112:81", "81:36", "72:39", "38:64", "57:53", "46:65", "37:73", "95:62"]
    ic(get_playoff_teams(A,B,C))

Returns the following ['east coast', 'swans']

Conclusion

Again the use of structured table data, here allows for introspection and manipulation of the data in multiple flows and approaches. The two way nature of each single match having a perspective of home/away team with points scored/against is deeply relational and avoids several complicated and unwieldy dictionary and list structures and customized key lookups. This solution is both performant, clear, and readily manipulatable according to SQL rules and set transformations. πŸ’―to me

πŸ’‘
Sqlite in memory tables should be your first choice when working with connected relational data.