uniform_resource Table
A resource can be any entity that contains data or is used to facilitate system operations. In a typical information system, resources include files, devices, network components, web services, users, applications, and more. Effective resource management helps ensure security, efficiency, and compliance by allowing for proper monitoring, maintenance, and control of these entities.
One of the benefits of surveilr
is its heavily opinionated relational
database schema in general and design of the uniform_resource
table used in
particular. The uniform_resource
table is built to serve complex requirements
in cybersecurity, compliance, and resource management scenarios. We will also
cover its related tables and provide practical SQL examples to demonstrate their
use.
In an RSSD, uniform_resource
represents a central element designed for
append-only, immutable storage of entities in a file system. It is highly
opinionated, aiming to maintain a history of resources by linking back to
ingestion sessions, devices, and their original sources. This design ensures not
only traceability but also reliability in terms of historical audit and
compliance.
The table is structured to track metadata related to any given resource, such as files or data components. Key features include the ability to record unique identifiers, content digests for data integrity, and paths of origin, all of which are critical in compliance and cybersecurity use cases.
Let’s look at the schema for the uniform_resource
table to get a better
understanding:
Breaking Down the uniform_resource
Table
The uniform_resource
table is opinionated in its approach, ensuring that the
database maintains a rich historical record of any given resource. Let us dive
into some of the core attributes of the table:
uniform_resource_id
: This is the primary key that uniquely identifies a resource. This ID ensures that every resource in the system can be traced back.device_id
andingest_session_id
: These columns link a resource to a specific device and an ingestion session, thereby providing context on where and how the resource originated.uri
: The Uniform Resource Identifier (URI) field serves as a globally unique reference to the resource, capturing its address within the system.content_digest
: This is a checksum or hash representing the content of the resource. It is used to verify data integrity, making sure that the resource content hasn’t been tampered with.nature
: This field describes the type or nature of the resource, such as whether it is a file, a document, or any other type of entity.frontmatter
andcontent_fm_body_attrs
: These JSON fields are used to store metadata and attributes about the content, enhancing the resource’s descriptive capabilities.created_at
,updated_at
,deleted_at
: These timestamps are crucial for tracking the life cycle of a resource, enabling historical and compliance-based reporting.
Related Tables
The uniform_resource
table interacts with several other tables, forming a
comprehensive system to manage, audit, and secure resources. This constellation
of related tables understands the diverse sources, multiple modalities, and
various formats that a resource can have, whether it’s a file, an email message,
or an issue from a PLM system.
1. ur_ingest_session
The ur_ingest_session
table captures sessions during which resources are
ingested into the system. Each ingestion session is recorded with metadata,
providing a timestamp and a snapshot of the state at that point.
- Purpose: To record and track the ingestion sessions, providing context for when and how resources were added to the system.
- Key Columns:
ur_ingest_session_id
: Primary key for the session.ingest_started_at
andingest_finished_at
: Timestamps for tracking the duration of an ingestion session.device_id
: Links the session to a specific device.
Example SQL to list all resources ingested in a given session:
2. ur_ingest_session_fs_path
The ur_ingest_session_fs_path
table stores file system paths linked to an
ingestion session. It ensures traceability for the file system structure when
resources were ingested.
- Purpose: To track the root file system paths that were ingested as part of a session, allowing for reconstruction of the original file system structure.
- Key Columns:
ur_ingest_session_fs_path_id
: Primary key for the file system path entry.ingest_session_id
: Links the file system path to a specific ingestion session.root_path
: The root path of the ingested file system.
Example SQL to find the root paths of all resources for a given device:
3. ur_ingest_session_imap_account
The ur_ingest_session_imap_account
table records details about IMAP
accounts used during an ingestion session, particularly for email messages.
- Purpose: To manage email accounts that are used as sources for ingestion, allowing email messages to be treated as resources.
- Key Columns:
ur_ingest_session_imap_account_id
: Primary key for the IMAP account.ingest_session_id
: Links the account to a specific ingestion session.email
,host
: Details about the email account used.
4. ur_ingest_session_imap_acct_folder
The ur_ingest_session_imap_acct_folder
table captures details about the
folders within an IMAP account from which messages were ingested.
- Purpose: To track email folders involved in ingestion, such as inboxes or custom folders.
- Key Columns:
ur_ingest_session_imap_acct_folder_id
: Primary key for the IMAP folder.ingest_session_id
,ingest_account_id
: Links the folder to an IMAP account and session.folder_name
: Name of the folder being ingested.
5. ur_ingest_session_plm_acct_project
The ur_ingest_session_plm_acct_project
table stores information about
Product Lifecycle Management (PLM) projects, such as those from GitHub or Jira,
involved in an ingestion session.
- Purpose: To track PLM projects that are ingested as resources, enabling issues, tasks, and other project data to be part of the resource pool.
- Key Columns:
ur_ingest_session_plm_acct_project_id
: Primary key for the PLM project.name
,description
: Metadata about the PLM project.
6. ur_ingest_session_fs_path_entry
The ur_ingest_session_fs_path_entry
table records individual file entries
within a file system that are part of an ingestion session.
- Purpose: To provide detailed tracking of files that were ingested, allowing for version control and change tracking at the file level.
- Key Columns:
ur_ingest_session_fs_path_entry_id
: Primary key for the file system path entry.file_path_abs
,file_basename
: Details about the file path and name.uniform_resource_id
: Links the file entry to theuniform_resource
table.
Compliance and Security Aspects
The RSSD database is designed with a focus on compliance and security. Here’s
how uniform_resource
helps achieve this:
-
Immutability: All tables are append-only. Any modification is essentially a new version, which ensures historical integrity and auditability. This immutability is critical for compliance, ensuring there is no tampering without leaving traces.
-
Content Hashing: The
content_digest
field stores a hash of the resource’s content. This allows for integrity checks to confirm that data hasn’t been altered, which is fundamental for detecting tampering or corruption. -
Detailed Provenance Tracking: With links to ingestion sessions, file paths, and originating devices, the
uniform_resource
table can help reconstruct the history of a resource. This is crucial in cybersecurity for incident analysis and forensic purposes. -
Attributes as JSON: JSON fields like
frontmatter
andcontent_fm_body_attrs
store complex metadata about the resource. This flexibility allows the system to adapt to evolving compliance requirements without altering the table schema significantly.
Practical Example: How to Track Resource Changes
Suppose we want to track changes to a specific file across different ingestion sessions to see how it has evolved. Here’s a query to achieve that:
This SQL query helps us track how a file (security.log
) has been captured over
multiple ingestion sessions, providing a snapshot of its historical changes.
The uniform_resource
table is a robust solution for managing resources
with immutability and traceability in mind. By combining the resource with
associated devices, ingestion sessions, and file paths, the RSSD database is
built to cater to the high-demand needs of compliance and cybersecurity.
With its opinionated approach, the uniform_resource
table doesn’t just capture
the current state of resources—it also records the entire historical context,
making it suitable for audit, analysis, and compliance. This elaborate
infrastructure ensures data integrity, reliability, and full transparency into
each resource’s life cycle.