Code Notebooks
Resource Surveillance Code Notebooks (RSCNs) are a way to store and manage executable code, allowing SQL or other “polyglot” code (e.g., Python, R, AI prompt code) to be kept, tracked, and executed within a Resource Surveillance State Database (RSSD). The structure enables detailed control over code execution and maintains a record of execution history and state transitions.
In this system:
- Each notebook is split into individual cells.
- Each cell contains a single block of code or function and is treated as an independently executable unit.
- Notebooks can be associated with a kernel, which is a specific computational engine responsible for executing the code.
The RSCN structure allows:
- Storing code in modular, traceable units (cells).
- Associating each cell with a specific kernel for execution.
- Tracking execution states and results for stateful operations, which is particularly useful for tasks like SQL migrations.
- Managing versioning and execution control for each cell independently, making the system flexible, auditable, and suitable for complex workflows in a controlled environment.
This database design enables precise control over notebook-based code execution, particularly in environments where code execution history and state consistency are crucial.
Key Database Tables
There are three primary tables involved: code_notebook_kernel
,
code_notebook_cell
, and code_notebook_state
.
1. code_notebook_kernel
Table: Kernel Information
The code_notebook_kernel
table stores metadata about each kernel or
execution engine used for running notebook code. Each record represents a
unique kernel, including details like its name, description, MIME type, and
associated file extension. This table allows for different types of kernels,
depending on the language or environment needed to execute the notebook’s code.
- Kernel: An engine that executes code in a particular language or environment.
- Fields:
kernel_name
: The unique name of the kernel (e.g., “SQLite”, “Python”).mime_type
andfile_extn
: Describe the type of code the kernel handles (e.g.,application/sql
for SQL).- JSON fields (
elaboration
,governance
): Optional metadata about kernel configurations or governance policies. - Timestamps: Creation and update timestamps for versioning and auditing.
This table helps enforce that each kernel is uniquely defined by its
kernel_name
and ensures each notebook cell has a compatible execution engine.
2. code_notebook_cell
Table: Individual Notebook Cells
The code_notebook_cell
table contains the actual code for each notebook,
stored one code block per row. Each cell represents a unit of interpretable code
tied to a specific notebook and kernel. This table’s schema supports versioning
and tracking of individual cells.
-
Cell Structure:
notebook_name
: The name of the notebook the cell belongs to.cell_name
: A unique identifier for the cell within its notebook.interpretable_code
: The actual code or function that will be executed.interpretable_code_hash
: A hash of the code, allowing detection of code modifications.notebook_kernel_id
: A foreign key linking tocode_notebook_kernel
, specifying the kernel that executes this cell’s code.
-
Uniqueness Constraint:
- Cells are uniquely identified by a combination of
notebook_name
,cell_name
, andinterpretable_code_hash
, meaning cells with the samenotebook_name
andcell_name
can exist in multiple versions if their code (interpretable_code
) changes.
- Cells are uniquely identified by a combination of
-
JSON Metadata Fields:
arguments
: JSON object to define input arguments or parameters for code execution.cell_governance
: JSON field for policy data (e.g., who can modify or execute the cell).
This table organizes and manages each code unit, enabling efficient version control and traceability of modifications.
3. code_notebook_state
Table: Tracking Execution States
The code_notebook_state
table records the execution state and transitions
for each cell, enabling detailed tracking of whether a cell has been executed
and what the outcomes were. This is especially useful for stateful kernels
where execution results impact future operations.
-
State Tracking:
from_state
andto_state
: Indicate the cell’s state transition (e.g., fromPENDING
toEXECUTED
).- Transition Reasons and Results:
transition_reason
: Explanation for why the state changed (e.g., triggered by user, scheduled task).transition_result
: JSON-encoded result or outcome of the code execution (e.g., output data, success/failure message).
-
Stateful vs Stateless Execution:
- Some cells are stateful (e.g., a cell that creates a table) and should only be executed once, as re-running could have unwanted effects.
- Others are stateless (e.g., cells that compute values without side effects) and can be run multiple times without changing the system state.
The state-tracking mechanism here helps ensure that cells are executed in a controlled, traceable way, preventing unintended re-executions and supporting auditability.
Example of How These Tables Work Together
-
Kernel Definition:
- Suppose there’s a kernel with
kernel_name = 'SQL'
, which is responsible for executing SQL code. This entry exists incode_notebook_kernel
.
- Suppose there’s a kernel with
-
Notebook and Cell Definition:
- A SQL notebook named
"ConstructionSqlNotebook"
is created, and it’s split into individual cells, each containing a SQL statement for data processing, schema modification, etc. - Each cell is linked to the
SQLite
kernel by thenotebook_kernel_id
.
- A SQL notebook named
-
Cell Execution and State Tracking:
- Each cell’s execution is tracked in
code_notebook_state
, where it transitions from an initial state (likePENDING
) to a final state (likeEXECUTED
) once the code runs successfully. - If the notebook is used to manage SQL schema migrations, a cell in
"ConstructionSqlNotebook"
might be marked asEXECUTED
once it has created a table, preventing redundant re-executions.
- Each cell’s execution is tracked in
Special Case: Migratable SQL
For specific SQL notebooks, such as "ConstructionSqlNotebook"
, certain cells
are candidates for migration. Cells that have not yet been executed (i.e.,
without an EXECUTED
state in code_notebook_state
) are considered
unmigrated and represent SQL code that hasn’t yet been applied to the
database.
This setup allows the system to track which parts of a SQL migration notebook have been applied and which are still pending, ensuring orderly migration processes without redundant operations.
Code Notebooks Models and Schema
The code_notebook_kernel
table defines each kernel or execution engine
responsible for running code in the notebook cells. Each record represents a
unique kernel, specifying the programming language or environment it supports.
Column | Type | Description |
---|---|---|
code_notebook_kernel_id | VARCHAR | Primary key, uniquely identifies each kernel. |
kernel_name | TEXT | Unique name of the kernel (e.g., SQLite , Python ). |
description | TEXT | Description of the kernel. |
mime_type | TEXT | MIME type associated with the code language (e.g., application/sql for SQL). |
file_extn | TEXT | File extension associated with the kernel (e.g., .sql ). |
elaboration | JSON (nullable) | Optional metadata about kernel configurations. |
governance | JSON (nullable) | Optional governance policies related to the kernel. |
created_at | TIMESTAMPTZ | Timestamp when the kernel was created. |
created_by | TEXT | User who created the kernel. |
updated_at | TIMESTAMPTZ | Timestamp of the last update. |
updated_by | TEXT | User who last updated the kernel. |
deleted_at | TIMESTAMPTZ | Timestamp if the kernel is deleted (soft delete). |
deleted_by | TEXT | User who deleted the kernel. |
activity_log | TEXT | Log of kernel activities for auditing purposes. |
The code_notebook_cell
table stores the actual code for each notebook, with
one row per code block or cell. Each cell is associated with a specific notebook
and a kernel.
Column | Type | Description |
---|---|---|
code_notebook_cell_id | VARCHAR | Primary key, uniquely identifies each cell. |
notebook_kernel_id | VARCHAR | Foreign key linking to code_notebook_kernel , specifies the kernel used for execution. |
notebook_name | TEXT | Name of the notebook this cell belongs to. |
cell_name | TEXT | Unique name of the cell within the notebook. |
interpretable_code | TEXT | The actual code to be executed. |
interpretable_code_hash | TEXT | Hash of the code, used for detecting code modifications. |
description | TEXT | Description of the cell’s purpose. |
arguments | JSON (nullable) | JSON object containing input arguments for code execution. |
cell_governance | JSON (nullable) | Governance policies related to the cell. |
created_at | TIMESTAMPTZ | Timestamp when the cell was created. |
created_by | TEXT | User who created the cell. |
updated_at | TIMESTAMPTZ | Timestamp of the last update. |
updated_by | TEXT | User who last updated the cell. |
deleted_at | TIMESTAMPTZ | Timestamp if the cell is deleted (soft delete). |
deleted_by | TEXT | User who deleted the cell. |
activity_log | TEXT | Log of cell activities for auditing purposes. |
Uniqueness Constraint: Cells are uniquely identified by a combination of
notebook_name
, cell_name
, and interpretable_code_hash
. This enables
multiple versions of a cell with the same name in the same notebook as long as
the code differs.
The code_notebook_state
table records the state transitions for each cell,
allowing for tracking of code execution history, particularly for stateful
kernels.
Column | Type | Description |
---|---|---|
code_notebook_state_id | VARCHAR | Primary key, uniquely identifies each state record. |
code_notebook_cell_id | VARCHAR | Foreign key linking to code_notebook_cell , identifies the cell whose state is being tracked. |
from_state | TEXT | State before the transition (e.g., PENDING ). |
to_state | TEXT | State after the transition (e.g., EXECUTED ). |
transition_reason | TEXT | Reason for the state change (e.g., triggered by user action). |
transition_result | JSON (nullable) | Result of the transition (e.g., success, error messages). |
transitioned_at | TIMESTAMPTZ | Timestamp of the state transition. |
elaboration | JSON (nullable) | Additional metadata related to the state transition. |
created_at | TIMESTAMPTZ | Timestamp when the record was created. |
created_by | TEXT | User who created the record. |
updated_at | TIMESTAMPTZ | Timestamp of the last update. |
updated_by | TEXT | User who last updated the record. |
deleted_at | TIMESTAMPTZ | Timestamp if the record is deleted (soft delete). |
deleted_by | TEXT | User who deleted the record. |
activity_log | TEXT | Log of state activities for auditing purposes. |
Stateful vs Stateless Execution
- Stateful Cells: Cells that perform persistent operations (e.g., creating tables) should only be executed once.
- Stateless Cells: Cells that perform non-persistent computations can be executed multiple times without side effects.
Notebook Cells for Script Storage & Execution Tracking
The code_notebook_cell
table serves as a comprehensive repository to store and
manage SQL scripts of all kinds, including but not limited to:
SQL DDL
: For database structure modifications such asCREATE
,ALTER
, andDROP
.SQL DQL
: Scripts related to data querying likeSELECT
operations.SQL DML
: Scripts for data manipulation includingINSERT
,UPDATE
, andDELETE
.- … and other SQL operations.
While it’s versatile enough to manage various SQL tasks, code_notebook_cell
table’s primary advantage lies in storing SQL DDL scripts needed for migrations
Columns:
code_notebook_cell_id
: A unique identifier for each SQL script.notebook_name
: Broad categorization or project name. Especially useful for grouping related migration scripts.cell_name
: A descriptive name for the SQL operation or step.cell_governance
: Optional JSON field for any governance or compliance-related data.kernel
: The SQL dialect or interpreter the script targets, such as PostgreSQL, MySQL, etc. (might also support shebang-style for non SQL)interpretable_code
: The SQL script itself (or any other runtime).description
: A brief description or context regarding the purpose of the script.activity_log
: optional JSON which stores the history of the changes to thie notebook cell
Migration Scripts & Database Evolution
One of the best uses for code_notebook_cell
is to manage SQL DDL scripts
crucial for database migrations. As databases evolve, tracking structural
changes becomes vital. By cataloging these DDL scripts, one can maintain a clear
version history, ensuring that database evolution is orderly, reversible, and
auditable.
To maintain a clear audit trail of script execution, the code_notebook_state
table logs each execution’s status. And, migration scripts can use the state to
know whether something has already been executed.
Inserting a New Execution State:
To log the execution of a script, you can add an entry like so:
State Transitions:
With from_state
and to_state
, you can track a script’s lifecycle, from its
INITIAL
state to any subsequent states like ‘EXECUTED’, ‘ROLLED_BACK’, etc.
This provides a traceable path of script interactions.
CLI Use
One way to keep your code and the database in sync is to just use the database to get its SQL (instead of putting it into an app) and execute the SQL directly in the database.
You can pass in arguments using .parameter
or sql_parameters
table, like: