Using ParquetEphemeris
======================
``ParquetEphemeris`` reads pre-computed spacecraft state vectors from a
Parquet file via DuckDB, then resamples them to a uniform output grid
using Hermite interpolation, the same approach used by
:class:`~rust_ephem.OEMEphemeris` and :class:`~rust_ephem.FileEphemeris`.
It is designed for the increasingly common case where mission ephemerides
are produced as columnar Parquet datasets, often stored on S3 or another
object store, and need to feed the ``rust-ephem`` constraint-evaluation
pipeline directly without an intermediate text export.
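
The interpolation step can be illustrated with a two-point cubic Hermite
formula, which uses the position and velocity at both ends of an interval.
The sketch below is a minimal illustration in plain Python, not the library's
implementation: ``hermite_interp`` is a hypothetical helper, the sample values
are made up, and the number of neighbouring samples that ``rust-ephem``
actually uses is not stated here.

.. code-block:: python

    def hermite_interp(t, t0, p0, v0, t1, p1, v1):
        """Cubic Hermite interpolation of one coordinate between two samples.

        p0, p1 are positions (km) and v0, v1 velocities (km/s) at times
        t0 and t1 (seconds). Hypothetical helper, for illustration only.
        """
        h = t1 - t0                      # interval length in seconds
        s = (t - t0) / h                 # normalised time in [0, 1]
        h00 = 2 * s**3 - 3 * s**2 + 1    # basis weight for p0
        h10 = s**3 - 2 * s**2 + s        # basis weight for v0
        h01 = -2 * s**3 + 3 * s**2       # basis weight for p1
        h11 = s**3 - s**2                # basis weight for v1
        return h00 * p0 + h10 * h * v0 + h01 * p1 + h11 * h * v1


    # Two samples 60 s apart on a made-up track x(t) = 7000 + 7.5 t - 0.004 t^2,
    # whose velocity is v(t) = 7.5 - 0.008 t.
    x_mid = hermite_interp(30.0, 0.0, 7000.0, 7.5, 60.0, 7435.6, 7.02)
    print(f"x(30 s) = {x_mid:.3f} km")

A two-point cubic Hermite interpolant matches position and velocity at both
samples, so it reproduces any polynomial track up to degree three exactly
within the interval.
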
Supported sources
-----------------
The Parquet ``source`` argument is forwarded to DuckDB's
``read_parquet()``, so any URI that DuckDB understands works:

* **Local files**: ``"path/to/file.parquet"`` or globs like
  ``"data/sat_*.parquet"`` (DuckDB unions matching files automatically).
* **Amazon S3**: ``"s3://bucket/key.parquet"``.
* **DigitalOcean Spaces** and other S3-compatible services: any ``s3://``
  URI plus the ``s3_endpoint`` constructor kwarg
  (e.g. ``"nyc3.digitaloceanspaces.com"``).
* **Google Cloud Storage**: ``"gcs://bucket/key.parquet"``.
* **Cloudflare R2**: ``"r2://bucket/key.parquet"``.
* **HTTP(S)**: ``"https://host/path.parquet"`` for public buckets and CDNs.

Required schema
---------------
The Parquet file must contain at minimum **seven columns**:

* a timestamp column,
* three position columns (``x``, ``y``, ``z``),
* three velocity columns (``vx``, ``vy``, ``vz``).

The defaults are ``time`` for the timestamp and ``x/y/z`` + ``vx/vy/vz`` for
the state vectors. Override via the ``time_col``, ``pos_cols`` and
``vel_cols`` constructor kwargs.

The timestamp column must be castable by DuckDB to ``TIMESTAMP`` /
``TIMESTAMPTZ``. A column written by Arrow/pandas with timezone-aware
``datetime64`` values, or an ISO 8601 string column, both work out of the
box. Other columns in the file are ignored.
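
Before pointing ``ParquetEphemeris`` at a file, it can be useful to confirm
that its column list satisfies this schema. The following is a minimal sketch
in plain Python, assuming the column names have already been obtained with
your own Parquet tooling; ``missing_columns`` is a hypothetical helper and not
part of ``rust-ephem``.

.. code-block:: python

    def missing_columns(columns, time_col="time",
                        pos_cols=("x", "y", "z"),
                        vel_cols=("vx", "vy", "vz")):
        """Return the required column names that are absent from ``columns``."""
        required = [time_col, *pos_cols, *vel_cols]
        present = set(columns)
        return [name for name in required if name not in present]


    # Extra columns such as sat_id are fine; a missing vz is not.
    file_columns = ["time", "x", "y", "z", "vx", "vy", "sat_id"]
    print(missing_columns(file_columns))  # ['vz']

The keyword defaults mirror the default column names described above, so the
same overrides passed to the constructor can be passed to the check.
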
Authentication
--------------
Cloud access uses DuckDB's built-in ``credential_chain`` provider, which
transparently picks up the standard AWS environment variables listed below.
The credentials never pass through ``rust-ephem``.

======================================== ==================================================
Variable                                 Purpose
======================================== ==================================================
``AWS_ACCESS_KEY_ID``                    Access key (S3 or S3-compatible)
``AWS_SECRET_ACCESS_KEY``                Secret key
``AWS_SESSION_TOKEN``                    Session token (for STS / temporary creds)
``AWS_REGION`` / ``AWS_DEFAULT_REGION``  Region (override with ``s3_region`` kwarg)
======================================== ==================================================

For DigitalOcean Spaces or any non-AWS S3-compatible service, additionally
pass the host as ``s3_endpoint`` (without ``https://``):

.. code-block:: python

    re.ParquetEphemeris(
        "s3://my-spaces-bucket/sat42.parquet",
        begin=begin, end=end, step_size=60,
        s3_endpoint="nyc3.digitaloceanspaces.com",
    )

Time-range pre-filtering
------------------------
To keep memory and bandwidth down on large historical archives,
``ParquetEphemeris`` issues a ``WHERE`` clause on the DuckDB query that
restricts loaded rows to ``[begin − 1 h, end + 1 h]``. The 1 hour margin
ensures Hermite interpolation has neighbours on either side of the
requested window. Combined with Parquet's row-group statistics this gives
DuckDB enough information to skip irrelevant chunks of large files.
If a Parquet file is sampled more sparsely than once per minute, or if the
requested range sits very close to the file's earliest or latest sample,
you may need to pad the source data so that the margin is satisfied.
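
Whether a given file satisfies the margin can be checked ahead of time from
its first and last timestamps. The sketch below assumes the 1 hour margin
described above and uses only the standard library; ``padded_window`` and
``covers_margin`` are hypothetical helpers, and the test is deliberately
conservative because the exact number of neighbouring samples the
interpolator needs is not stated here.

.. code-block:: python

    from datetime import datetime, timedelta, timezone

    MARGIN = timedelta(hours=1)  # margin described in this section


    def padded_window(begin, end, margin=MARGIN):
        """Return the (start, stop) interval from which rows are loaded."""
        return begin - margin, end + margin


    def covers_margin(first_sample, last_sample, begin, end, margin=MARGIN):
        """True if the source samples span the whole padded window."""
        start, stop = padded_window(begin, end, margin)
        return first_sample <= start and last_sample >= stop


    begin = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)
    end = datetime(2024, 1, 1, 1, 0, tzinfo=timezone.utc)

    # A file whose first sample coincides with ``begin`` has no leading margin.
    first = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)
    last = datetime(2024, 1, 1, 6, 0, tzinfo=timezone.utc)
    print(covers_margin(first, last, begin, end))  # False

If the check fails, either shrink the requested range or extend the source
data so that samples exist on both sides of the window.
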
Optional dependency
-------------------
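DuckDB is an optional dependency. Install it together with ``rust-ephem``
via the ``parquet`` extra (quoted so that shells such as zsh do not expand
the brackets):

.. code-block:: bash

    pip install "rust-ephem[parquet]"

or install DuckDB on its own:

.. code-block:: bash

    pip install duckdb
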
Constructing one
----------------
.. code-block:: python

    from datetime import datetime, timezone

    import rust_ephem as re

    begin = datetime(2024, 1, 1, 0, 0, 0, tzinfo=timezone.utc)
    end = datetime(2024, 1, 1, 1, 0, 0, tzinfo=timezone.utc)

    eph = re.ParquetEphemeris(
        "spacecraft_states.parquet",
        begin=begin,
        end=end,
        step_size=60,  # resample to 1-minute output grid
    )

    print(f"Source frame : {eph.source_frame}")  # "GCRS" by default
    print(f"Source rows  : {eph.file_pv.position.shape[0]}")
    print(f"Output rows  : {eph.gcrs_pv.position.shape[0]}")
    print(f"Position[0]  : {eph.gcrs_pv.position[0]} km")
    print(f"Latitude[0]  : {eph.latitude_deg[0]:.4f} deg")

Custom column names
-------------------
If your schema uses non-default names:

.. code-block:: python

    eph = re.ParquetEphemeris(
        "ephemeris.parquet",
        begin=begin, end=end, step_size=60,
        time_col="epoch_utc",
        pos_cols=("rx", "ry", "rz"),
        vel_cols=("rdotx", "rdoty", "rdotz"),
    )

Column names may contain only ASCII letters, digits, and underscores
(``[A-Za-z0-9_]+``); names that begin with a digit are allowed. Names
with spaces or punctuation are rejected to prevent SQL injection, so rename
such columns in your data or query before ingestion.
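
The naming rule can be checked up front with the same pattern. This is a
minimal sketch using the standard library, not the validation code inside
``rust-ephem``; ``is_valid_column_name`` is a hypothetical helper, and
``fullmatch`` is imported directly so that the standard ``re`` module does not
clash with the ``re`` alias used for ``rust_ephem`` on this page.

.. code-block:: python

    from re import fullmatch


    def is_valid_column_name(name):
        """True if ``name`` has only ASCII letters, digits and underscores."""
        return fullmatch(r"[A-Za-z0-9_]+", name) is not None


    # The first two names pass; the last two contain rejected characters.
    for candidate in ("epoch_utc", "2nd_axis", "pos x", "x; DROP TABLE t"):
        print(f"{candidate!r}: {is_valid_column_name(candidate)}")

Using ``fullmatch`` rather than ``match`` matters here: the whole name must
fit the pattern, so a valid prefix followed by punctuation is still rejected.
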
Reading from S3
---------------
.. code-block:: bash

    export AWS_ACCESS_KEY_ID=AKIA...
    export AWS_SECRET_ACCESS_KEY=...
    export AWS_REGION=us-east-1

.. code-block:: python

    eph = re.ParquetEphemeris(
        "s3://my-bucket/ephemerides/sat42.parquet",
        begin=begin, end=end, step_size=60,
    )

Reading from DigitalOcean Spaces
--------------------------------
.. code-block:: bash

    # Spaces uses the same env-var convention as AWS S3
    export AWS_ACCESS_KEY_ID="DO00..."
    export AWS_SECRET_ACCESS_KEY="..."

.. code-block:: python

    eph = re.ParquetEphemeris(
        "s3://my-space/ephemerides/sat42.parquet",
        begin=begin, end=end, step_size=60,
        s3_endpoint="nyc3.digitaloceanspaces.com",
    )

Filtering by satellite ID
-------------------------
If a single Parquet file contains states for multiple spacecraft, narrow
the result with ``where_clause``:

.. code-block:: python

    eph = re.ParquetEphemeris(
        "s3://fleet/states.parquet",
        begin=begin, end=end, step_size=60,
        where_clause="sat_id = 42",
    )

The clause is ANDed onto the time-range filter, so DuckDB still gets to
push it down to Parquet row groups.
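
To make the combination concrete, the sketch below composes a predicate from
the padded time range and an optional extra clause. It is an illustration
only: the SQL that ``ParquetEphemeris`` actually generates is not shown in
this document, ``build_filter`` is a hypothetical helper, and the
``TIMESTAMPTZ`` literal form and the parentheses around the user clause are
assumptions.

.. code-block:: python

    from datetime import datetime, timedelta, timezone


    def build_filter(begin, end, time_col="time", where_clause=None,
                     margin=timedelta(hours=1)):
        """Compose a predicate: padded time range AND optional extra clause."""
        start = (begin - margin).isoformat()
        stop = (end + margin).isoformat()
        predicate = (
            f"{time_col} >= TIMESTAMPTZ '{start}' "
            f"AND {time_col} <= TIMESTAMPTZ '{stop}'"
        )
        if where_clause:
            # Parenthesise so an OR inside the user clause stays inside the AND.
            predicate += f" AND ({where_clause})"
        return predicate


    begin = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)
    end = datetime(2024, 1, 1, 1, 0, tzinfo=timezone.utc)
    print(build_filter(begin, end, where_clause="sat_id = 42"))

Because both conditions sit in a single ``WHERE`` predicate, a query engine
can use row-group statistics for the time column and for ``sat_id`` alike.
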
Unit overrides
--------------
By default ``ParquetEphemeris`` assumes position columns are in **km** and
velocity columns in **km/s**. Pass ``position_unit`` and/or ``velocity_unit``
to override either independently:
.. code-block:: python

    # Position and velocity both in metres / m/s
    eph = re.ParquetEphemeris(
        "states_meters.parquet",
        begin=begin, end=end,
        position_unit="m",    # position in m
        velocity_unit="m/s",  # velocity in m/s
    )

    # If only position_unit is given, velocity defaults to position_unit + "/s"
    eph = re.ParquetEphemeris(
        "states_meters.parquet",
        begin=begin, end=end,
        position_unit="m",  # implies velocity_unit="m/s"
    )

    # All standard properties (gcrs_pv, latitude_deg, …) are still in km / km/s.
    print(eph.source_position_unit)    # "m" (as supplied)
    print(eph.source_velocity_unit)    # "m/s" (as supplied or derived)
    print(eph.gcrs_pv.position_unit)   # "km" (internal representation)

Supported values: ``position_unit``: ``"km"`` (default), ``"m"``, ``"cm"``.
``velocity_unit``: ``"km/s"`` (default), ``"m/s"``, ``"cm/s"``.
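
The defaulting rule and the conversion to the internal km / km/s
representation amount to a small lookup. The sketch below illustrates that
arithmetic in plain Python under the unit list above; ``resolve_units`` and
``to_km`` are hypothetical helpers and do not reflect how ``rust-ephem``
implements the conversion internally.

.. code-block:: python

    # Number of source units per kilometre for each supported length unit.
    UNITS_PER_KM = {"km": 1.0, "m": 1_000.0, "cm": 100_000.0}


    def resolve_units(position_unit="km", velocity_unit=None):
        """Apply the defaulting rule: velocity unit follows the position unit."""
        if velocity_unit is None:
            velocity_unit = f"{position_unit}/s"
        return position_unit, velocity_unit


    def to_km(values, unit):
        """Convert lengths to km, or speeds to km/s if ``unit`` ends in '/s'."""
        length_unit = unit.split("/")[0]  # "m/s" -> "m", "m" -> "m"
        divisor = UNITS_PER_KM[length_unit]
        return [value / divisor for value in values]


    print(resolve_units("m"))                # ('m', 'm/s')
    print(to_km([7_000_000.0, 0.0], "m"))    # [7000.0, 0.0]
    print(to_km([7_500.0], "m/s"))           # [7.5]

Since the time base is seconds in every supported velocity unit, only the
length part of the unit needs scaling.
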
Earth-fixed (ITRS/ECEF) input
-----------------------------
If the data is in an Earth-fixed (ITRS/ECEF) frame, set ``frame``:

.. code-block:: python

    eph = re.ParquetEphemeris(
        "states_ecef.parquet",
        begin=begin, end=end,
        frame="ECEF",
    )

Inspecting raw data
-------------------
The raw, uninterpolated state vectors and timestamps are exposed via
``file_pv`` and ``file_timestamp``:

.. code-block:: python

    raw = eph.file_pv
    print(f"Pulled {raw.position.shape[0]} state vectors from Parquet")

    ts = eph.file_timestamp
    print(f"Earliest sample : {ts[0]}")
    print(f"Latest sample   : {ts[-1]}")

Evaluating constraints
----------------------
``ParquetEphemeris`` participates in the full constraint-evaluation
pipeline:

.. code-block:: python

    re.ensure_planetary_ephemeris()

    constraint = re.SunConstraint(min_angle=45.0) | re.MoonConstraint(min_angle=10.0)
    result = constraint.evaluate(eph, target_ra=83.63, target_dec=22.01)

    print(f"Visibility windows: {len(result.visibility)}")
    for window in result.visibility:
        # The loop body must be indented for the example to run.
        print(f"  {window.start_time} → {window.end_time}")

Ephemeris ABC subclass
----------------------
``ParquetEphemeris`` is a registered virtual subclass of
:class:`~rust_ephem.Ephemeris`:

.. code-block:: python

    assert isinstance(eph, re.Ephemeris)  # True