Data Modeling: SQLite + Python - Part 3
This is a continuation of Sqlite in memory data structures used for general problem solving. As this is a continuation I am going to skip over some of the background details on why to use in memory sqlite and how to create and get to the harder problem that this solves elegantly.
Introduction
In Australian Football playoff teams are determined by wins and tiebreaker condition of match competitiveness. Match competitiveness is calculated by the formula of 100 * (points scored / points allowed)
For example a win counts as 4
points, a tie 2
points and a loss 0
points. If multiple teams have the same match points then teams are ranked by the tiebreaker condition listed above.
Say two teams Team A, Team B
with identical match points of 6
but one Team A
has scored 100 points and allowed 50 in those matches while Team B
scored 100 and allowed 99. In this situation the quality of Team A
matches have been more in their favor than Team B
by nature of scoring more points than allowing. This could happen as Team B
luckily won or tied a match that there were losing up to a point whereas Team A
won convincingly.
All teams will play each other with 1 team marked as home and the other as away. Therefore if you have 4 teams they will all play 6 (3 as home and 3 as away) matches to play against every other team in this format.
At the end of the season you could have a scenario like below. In this case the top 2 teams are advancing to the playoffs which requires no tiebreaker East Coast, Swans
advance
team_name | opponent_name | score_for | score_against | home_result | away_result |
---|---|---|---|---|---|
east coast | essendon | 102 | 42 | east coast - win | essendon - loss |
east coast | swans | 112 | 81 | east coast - win | swans - loss |
east coast | tigers | 81 | 36 | east coast - win | tigers - loss |
essendon | east coast | 37 | 55 | essendon - loss | east coast - win |
essendon | swans | 44 | 50 | essendon - loss | swans - win |
essendon | tigers | 111 | 88 | essendon - win | tigers - loss |
swans | east coast | 38 | 64 | swans - loss | east coast - win |
swans | essendon | 72 | 39 | swans - win | essendon - loss |
swans | tigers | 57 | 53 | swans - win | tigers - loss |
tigers | east coast | 37 | 73 | tigers - loss | east coast - win |
tigers | essendon | 46 | 65 | tigers - loss | essendon - win |
tigers | swans | 95 | 62 | tigers - win | swans - loss |
However if you take the match listed "d:b"
Which is Tigers:East Coast
and change the score from "37:73"
from a East Coast win to a Tigers win instead of say "74:73"
and you take the top 3 teams then it would go to a tiebreaker between Essendon, Tigers
with Tigers
advancing by having a higher competitiveness score
team_name | point_scored | points_allowed | total_points |
---|---|---|---|
east coast | 487 | 308 | 20 |
swans | 360 | 407 | 12 |
tigers | 392 | 449 | 8 |
essendon | 338 | 413 | 8 |
338/413
= 81.84
to 392/449
= 87.30
for the Tigers
Again the data is presented as a series of string data that is broken apart into the following.
Team Identifiers
A = ["a:Essendon", "b:East Coast", "c:Swans", "d:Tigers"]
Match Opponents
B = ["a:b", "a:c", "a:d", "b:a", "b:c", "b:d", "c:a", "c:b", "c:d", "d:a", "d:b", "d:c"]
Match Scores
C = ["37:55", "44:50", "111:88", "102:42", "112:81", "81:36", "72:39", "38:64", "57:53", "46:65", "37:73", "95:62"]
Sqlite Database Plan
There is a need to create a table that holds the MatchDetail
from the perspective of one team

From the above table I can calculate a running sum of the scored points for/against in each match as the first team listed and as opponent along with capturing the win score
as the field points.
With this table you can find the match summaries with a query like this to see all the matches played (each team plays 3 matches and the opponent, points for and points against and with the final result)
SELECT team_name,
opponent_name,
score_for,
score_against,
concat(team_name, ' - ', CASE points_team WHEN 4 THEN "win" WHEN 2 THEN "tie" ELSE "loss" END) as home_result,
concat(opponent_name, ' - ', CASE points_opponent WHEN 4 THEN "win" WHEN 2 THEN "tie" ELSE "loss" END) as away_result
FROM MatchDetail
order by team_name, opponent_name;
team_name | opponent_name | score_for | score_against | home_result | away_result |
---|---|---|---|---|---|
east coast | essendon | 102 | 42 | east coast - win | essendon - loss |
east coast | swans | 112 | 81 | east coast - win | swans - loss |
east coast | tigers | 81 | 36 | east coast - win | tigers - loss |
essendon | east coast | 37 | 55 | essendon - loss | east coast - win |
essendon | swans | 44 | 50 | essendon - loss | swans - win |
essendon | tigers | 111 | 88 | essendon - win | tigers - loss |
swans | east coast | 38 | 64 | swans - loss | east coast - win |
swans | essendon | 72 | 39 | swans - win | essendon - loss |
swans | tigers | 57 | 53 | swans - win | tigers - loss |
tigers | east coast | 37 | 73 | tigers - loss | east coast - win |
tigers | essendon | 46 | 65 | tigers - loss | essendon - win |
tigers | swans | 95 | 62 | tigers - win | swans - loss |
Furthermore to get the playoff ranking you need to sum each record as home and away (team_name, opponent_name
) to build up the competitiveness tiebreaker value
In this case using Common Table Expressions you can sum the points for a team and then join the two sets together to get that ranking like so
WITH points_for as (
SELECT team_name, SUM(score_for) score_1, SUM(score_against) score_2, SUM(points_team) win_score
FROM MatchDetail
group by team_name
),
points_against as (
SELECT opponent_name, SUM(score_against) score_1, SUM(score_for) score_2, SUM(points_opponent) win_score
FROM MatchDetail
group by opponent_name
)
SELECT pf.team_name, pf.score_1 + pa.score_1 point_scored, pf.score_2 + pa.score_2 points_allowed, pf.win_score + pa.win_score total_points
from points_for pf
inner join points_against pa
on pf.team_name = pa.opponent_name
ORDER BY pf.win_score + pa.win_score DESC, (100 * (CAST((pf.score_1 + pa.score_1) as float)) / CAST((pf.score_2 + pa.score_2) as float)) DESC
LIMIT 2;
The key point here is to group by team the perspective of the scores as the home team team_name
and as the away team opponent_name
to get the full picture. You need to consider the win_score
from each teams perspective on the match to calculate the winning points.
Implementation
I need to do the following
- Parse the first array of strings as the Team id and names
- Parse the second array of strings as the team matches by their ids
- Parse and match the match scores through the team matches back to their ids
In this case the method signature will look like this def get_playoff_teams(teams:list[str], matches:list[str], match_scores:list[str]) -> list[str]:
Returning the list of team names in order of playoff rankings
Method details
def get_playoff_teams(teams:list[str], matches:list[str], match_scores:list[str]) -> list[str]:
try:
all_teams = get_teams(teams)
con = sqlite3.connect(':memory:')
con.execute(
"CREATE TABLE MatchDetail(team_name text, opponent_name text, score_for int, score_against int, points_team int, points_opponent int);")
[insert_match_data(all_teams, con, match, score) for match, score in zip(matches, match_scores)]
max_points = con.execute("""
WITH points_for as (
SELECT team_name, SUM(score_for) score_1, SUM(score_against) score_2, SUM(points_team) win_score
FROM MatchDetail
group by team_name
),
points_against as (
SELECT opponent_name, SUM(score_against) score_1, SUM(score_for) score_2, SUM(points_opponent) win_score
FROM MatchDetail
group by opponent_name
)
SELECT pf.team_name, pf.score_1 + pa.score_1 point_scored, pf.score_2 + pa.score_2 points_allowed, pf.win_score + pa.win_score total_points
from points_for pf
inner join points_against pa
on pf.team_name = pa.opponent_name
ORDER BY pf.win_score + pa.win_score DESC, (100 * (CAST((pf.score_1 + pa.score_1) as float)) / CAST((pf.score_2 + pa.score_2) as float)) DESC
LIMIT 2;
""")
playoff_teams = [result[0] for result in max_points]
return playoff_teams
except Exception as e:
ic(e)
return ["", ""]
Details below of inner method calls
Parse the array of team ids and names
def get_teams(teams) -> dict[str, str]:
all_teams = {}
for input_team in teams:
team_details = input_team.split(":")
all_teams[team_details[0].lower().strip()] = team_details[1].lower().strip()
return all_teams
Parse the team match score win,tie,loss
def match_points(score_against_team2: int, score_for_team1: int) -> int:
team2_points = 2
if score_against_team2 > score_for_team1:
team2_points = 4
if score_against_team2 < score_for_team1:
team2_points = 0
return team2_points
Parse and insert the team match points into the MatchDetails
table
def insert_match_data(all_teams: dict[str, str], con: sqlite3.Connection, match: str, score: str) -> None:
match_details = match.split(":")
score_details = score.split(":")
team1_name = all_teams[match_details[0].lower().strip()]
team2_name = all_teams[match_details[1].lower().strip()]
score_for_team1 = int(score_details[0])
score_against_team2 = int(score_details[1])
team1_points = match_points(score_for_team1, score_against_team2)
team2_points = match_points(score_against_team2, score_for_team1)
con.execute(
"INSERT INTO MatchDetail (team_name, opponent_name, score_for, score_against, points_team, points_opponent) VALUES (?, ?, ?, ?, ?, ?)",
ic((team1_name, team2_name, score_for_team1, score_against_team2, team1_points, team2_points)))
When called as such
if __name__ == '__main__':
A = ["a:Essendon", "b:East Coast", "c:Swans", "d:Tigers"]
B = ["a:b", "a:c", "a:d", "b:a", "b:c", "b:d", "c:a", "c:b", "c:d", "d:a", "d:b", "d:c"]
C = ["37:55", "44:50", "111:88", "102:42", "112:81", "81:36", "72:39", "38:64", "57:53", "46:65", "37:73", "95:62"]
ic(get_playoff_teams(A,B,C))
Returns the following ['east coast', 'swans']
Conclusion
Again the use of structured table data, here allows for introspection and manipulation of the data in multiple flows and approaches. The two way nature of each single match having a perspective of home/away team with points scored/against is deeply relational and avoids several complicated and unwieldy dictionary and list structures and customized key lookups. This solution is both performant, clear, and readily manipulatable according to SQL rules and set transformations. π―to me