Using ParquetEphemeris
ParquetEphemeris reads pre-computed spacecraft state vectors from a
Parquet file via DuckDB, then resamples them to a uniform output grid
using Hermite interpolation, the same approach used by OEMEphemeris and
FileEphemeris.
It is designed for the increasingly common case where mission ephemerides
are produced as columnar Parquet datasets, often stored on S3 or another
object store, and need to feed the rust-ephem constraint-evaluation
pipeline directly without an intermediate text export.
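As an illustration of the resampling step, the following is a minimal sketch of cubic Hermite interpolation between two state-vector samples. It is not the rust-ephem implementation: the function name hermite_state and its signature are hypothetical, and the library may use a different Hermite order or more than two neighbouring samples.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, ...]


def hermite_state(
    t: float,
    t0: float, p0: Sequence[float], v0: Sequence[float],
    t1: float, p1: Sequence[float], v1: Sequence[float],
) -> Tuple[Vec3, Vec3]:
    """Cubic Hermite interpolation of position and velocity (hypothetical helper).

    Times are in seconds; positions and velocities must use consistent
    units, for example km and km/s.
    """
    h = t1 - t0
    if h <= 0:
        raise ValueError("t1 must be later than t0")
    s = (t - t0) / h  # normalised time, 0 at t0 and 1 at t1

    # Hermite basis functions for the position polynomial
    h00 = 2 * s**3 - 3 * s**2 + 1
    h10 = s**3 - 2 * s**2 + s
    h01 = -2 * s**3 + 3 * s**2
    h11 = s**3 - s**2

    # Derivatives of the basis functions with respect to s
    d00 = 6 * s**2 - 6 * s
    d10 = 3 * s**2 - 4 * s + 1
    d01 = -6 * s**2 + 6 * s
    d11 = 3 * s**2 - 2 * s

    pos = tuple(
        h00 * a + h10 * h * va + h01 * b + h11 * h * vb
        for a, va, b, vb in zip(p0, v0, p1, v1)
    )
    # Chain rule: d/dt = (1/h) d/ds
    vel = tuple(
        (d00 * a + d01 * b) / h + d10 * va + d11 * vb
        for a, va, b, vb in zip(p0, v0, p1, v1)
    )
    return pos, vel


# Two samples 10 s apart on a made-up constant-acceleration track along x
pos, vel = hermite_state(
    5.0,
    0.0, (1.0, 0.0, 0.0), (2.0, 0.0, 0.0),
    10.0, (71.0, 0.0, 0.0), (12.0, 0.0, 0.0),
)
print(pos, vel)
```

Because both position and velocity are matched at each sample, the interpolant passes through the stored states exactly and reproduces any trajectory that is polynomial of degree three or lower between them.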
Supported sources
The Parquet source argument is forwarded to DuckDB’s
read_parquet(), so any URI that DuckDB understands works:
Local files: "path/to/file.parquet" or globs like "data/sat_*.parquet" (DuckDB unions matching files automatically).
Amazon S3: "s3://bucket/key.parquet".
DigitalOcean Spaces and other S3-compatibles: any s3:// URI plus the s3_endpoint constructor kwarg (e.g. "nyc3.digitaloceanspaces.com").
Google Cloud Storage: "gcs://bucket/key.parquet".
Cloudflare R2: "r2://bucket/key.parquet".
HTTP(S): "https://host/path.parquet" for public buckets and CDNs.
Required schema
The Parquet file must contain at minimum seven columns:
a timestamp column,
three position columns (x, y, z),
three velocity columns (vx, vy, vz).
The defaults are time for the timestamp and x/y/z + vx/vy/vz for
the state vectors. Override via the time_col, pos_cols and
vel_cols constructor kwargs.
The timestamp column must be castable by DuckDB to TIMESTAMP /
TIMESTAMPTZ. A column written by Arrow/pandas with timezone-aware
datetime64 values, or an ISO 8601 string column, both work out of the
box. Other columns in the file are ignored.
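A pre-flight check of the column list can catch a schema mismatch before any data is read. The sketch below is a hypothetical helper, not part of rust-ephem; it takes the column names you already know (for example from your own metadata) and reports which of the seven required names are absent. It compares names exactly, which is an assumption, since DuckDB itself treats identifiers case-insensitively.

```python
from typing import Iterable, List, Sequence


def missing_columns(
    available: Iterable[str],
    time_col: str = "time",
    pos_cols: Sequence[str] = ("x", "y", "z"),
    vel_cols: Sequence[str] = ("vx", "vy", "vz"),
) -> List[str]:
    """Return the required column names that are not in `available`.

    The defaults mirror the documented defaults of the time_col, pos_cols
    and vel_cols constructor kwargs. Hypothetical helper for illustration.
    """
    required = [time_col, *pos_cols, *vel_cols]
    present = set(available)
    return [name for name in required if name not in present]


# Extra columns such as sat_id are fine; only the seven required ones matter.
print(missing_columns(["time", "x", "y", "z", "vx", "vy", "vz", "sat_id"]))

# A file that uses a different timestamp name needs time_col to be overridden.
print(missing_columns(["epoch_utc", "x", "y", "z", "vx", "vy", "vz"]))
print(missing_columns(["epoch_utc", "x", "y", "z", "vx", "vy", "vz"], time_col="epoch_utc"))
```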
Authentication
Cloud access uses DuckDB’s built-in
credential_chain provider, which transparently picks up the standard
AWS environment variables. None of the credentials ever pass through
rust-ephem:
| Variable | Purpose |
|---|---|
| AWS_ACCESS_KEY_ID | Access key (S3 or S3-compatible) |
| AWS_SECRET_ACCESS_KEY | Secret key |
| AWS_SESSION_TOKEN | Session token (for STS / temporary creds) |
| AWS_REGION | Region (override with …) |
For DigitalOcean Spaces or any non-AWS S3-compatible service, additionally
pass the host as s3_endpoint (without https://):
ParquetEphemeris(
"s3://my-spaces-bucket/sat42.parquet",
begin, end, step_size=60,
s3_endpoint="nyc3.digitaloceanspaces.com",
)
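Endpoint values are often copied from a provider console as full URLs. The following sketch shows one way to reduce such a value to the bare host before passing it as s3_endpoint. The helper name bare_endpoint is hypothetical and is not part of rust-ephem.

```python
from urllib.parse import urlsplit


def bare_endpoint(value: str) -> str:
    """Return only the host (and port, if any) of an endpoint string.

    Hypothetical helper: strips a leading scheme such as https:// and any
    trailing path so the result can be used as a plain host name.
    """
    value = value.strip()
    if "://" in value:
        value = urlsplit(value).netloc
    return value.rstrip("/")


print(bare_endpoint("https://nyc3.digitaloceanspaces.com/"))
print(bare_endpoint("nyc3.digitaloceanspaces.com"))
```

Both calls yield the same host string, so the helper is safe to apply whether or not the scheme is present.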
Time-range pre-filtering
To keep memory and bandwidth down on large historical archives,
ParquetEphemeris issues a WHERE clause on the DuckDB query that
restricts loaded rows to [begin − 1 h, end + 1 h]. The 1 hour margin
ensures Hermite interpolation has neighbours on either side of the
requested window. Combined with Parquet’s row-group statistics this gives
DuckDB enough information to skip irrelevant chunks of large files.
If a Parquet file’s sampling interval is longer than the 1-hour margin, or if the requested range sits very close to the file’s earliest or latest sample, you may need to pad the source data so that samples exist on both sides of the requested window.
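To make the margin concrete, the sketch below computes the padded window and checks whether a list of source timestamps leaves at least one sample on each side of the requested range. The helper names are hypothetical, and the exact neighbour requirement inside rust-ephem is an assumption here; only the 1 hour margin comes from the description above.

```python
from datetime import datetime, timedelta, timezone
from typing import Sequence, Tuple

MARGIN = timedelta(hours=1)  # margin described in this section


def padded_window(
    begin: datetime, end: datetime, margin: timedelta = MARGIN
) -> Tuple[datetime, datetime]:
    """Return the [begin - margin, end + margin] interval."""
    if end < begin:
        raise ValueError("end must not be earlier than begin")
    return begin - margin, end + margin


def has_neighbours(
    sample_times: Sequence[datetime],
    begin: datetime,
    end: datetime,
    margin: timedelta = MARGIN,
) -> bool:
    """True if the padded window holds a sample at or before `begin`
    and a sample at or after `end` (assumed requirement)."""
    lo, hi = padded_window(begin, end, margin)
    loaded = [t for t in sample_times if lo <= t <= hi]
    return any(t <= begin for t in loaded) and any(t >= end for t in loaded)


begin = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)
end = datetime(2024, 1, 1, 1, 0, tzinfo=timezone.utc)

# Made-up source sampled every 30 minutes, starting 22:00 the previous day
dense = [
    datetime(2023, 12, 31, 22, 0, tzinfo=timezone.utc) + timedelta(minutes=30 * i)
    for i in range(11)
]
# Made-up source sampled every 3.5 hours: the margin cannot reach a neighbour
sparse = [
    datetime(2023, 12, 31, 21, 0, tzinfo=timezone.utc) + timedelta(hours=3.5 * i)
    for i in range(3)
]

print(padded_window(begin, end))
print(has_neighbours(dense, begin, end))
print(has_neighbours(sparse, begin, end))
```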
Optional dependency
DuckDB is an optional dependency. Install with:
pip install rust-ephem[parquet]
or simply:
pip install duckdb
Constructing one
import rust_ephem as re
from datetime import datetime, timezone
begin = datetime(2024, 1, 1, 0, 0, 0, tzinfo=timezone.utc)
end = datetime(2024, 1, 1, 1, 0, 0, tzinfo=timezone.utc)
eph = re.ParquetEphemeris(
"spacecraft_states.parquet",
begin=begin,
end=end,
step_size=60, # resample to 1-minute output grid
)
print(f"Source frame : {eph.source_frame}") # "GCRS" by default
print(f"Source rows : {eph.file_pv.position.shape[0]}")
print(f"Output rows : {eph.gcrs_pv.position.shape[0]}")
print(f"Position[0] : {eph.gcrs_pv.position[0]} km")
print(f"Latitude[0] : {eph.latitude_deg[0]:.4f} deg")
Custom column names
If your schema uses non-default names:
eph = re.ParquetEphemeris(
"ephemeris.parquet",
begin=begin, end=end, step_size=60,
time_col="epoch_utc",
pos_cols=("rx", "ry", "rz"),
vel_cols=("rdotx", "rdoty", "rdotz"),
)
Column names may contain only ASCII letters, digits, and underscores
([A-Za-z0-9_]+). This includes names that begin with a digit. Names
with spaces or punctuation are rejected to prevent SQL injection; rename
them in your data or query before ingestion.
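The naming rule can be checked ahead of time with the same character class. This is a sketch using Python's standard regex module, not the validation code inside rust-ephem; the function name is hypothetical, and the module is imported under an alias so it does not collide with the `import rust_ephem as re` convention used in these examples.

```python
import re as stdlib_re  # aliased to avoid clashing with `import rust_ephem as re`

_SAFE_NAME = stdlib_re.compile(r"[A-Za-z0-9_]+")


def is_safe_column_name(name: str) -> bool:
    """True if `name` consists only of ASCII letters, digits and underscores."""
    return _SAFE_NAME.fullmatch(name) is not None


for candidate in ("epoch_utc", "2nd_x", "pos x", "x; DROP TABLE t", ""):
    print(repr(candidate), is_safe_column_name(candidate))
```

fullmatch is used rather than match so that a valid prefix followed by unsafe characters is still rejected.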
Reading from S3
export AWS_ACCESS_KEY_ID=AKIA...
export AWS_SECRET_ACCESS_KEY=...
export AWS_REGION=us-east-1
eph = re.ParquetEphemeris(
"s3://my-bucket/ephemerides/sat42.parquet",
begin=begin, end=end, step_size=60,
)
Reading from DigitalOcean Spaces
# Spaces uses the same env-var convention as AWS S3
export AWS_ACCESS_KEY_ID="DO00..."
export AWS_SECRET_ACCESS_KEY="..."
eph = re.ParquetEphemeris(
"s3://my-space/ephemerides/sat42.parquet",
begin=begin, end=end, step_size=60,
s3_endpoint="nyc3.digitaloceanspaces.com",
)
Filtering by satellite ID
If a single Parquet file contains states for multiple spacecraft, narrow
the result with where_clause:
eph = re.ParquetEphemeris(
"s3://fleet/states.parquet",
begin=begin, end=end, step_size=60,
where_clause="sat_id = 42",
)
The clause is ANDed onto the time-range filter, so DuckDB still gets to push it down to Parquet row groups.
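The sketch below illustrates what ANDing a user clause onto a time filter can look like. The SQL that ParquetEphemeris actually generates is not shown in this guide, so the function name, the placeholder time predicate and the parenthesisation are all assumptions made for illustration.

```python
from typing import Optional


def combine_filters(time_filter: str, where_clause: Optional[str] = None) -> str:
    """AND an optional user predicate onto a time-range predicate.

    Hypothetical helper. Both sides are parenthesised so that an OR inside
    the user clause cannot widen the time-range restriction.
    """
    if where_clause is None or not where_clause.strip():
        return time_filter
    return f"({time_filter}) AND ({where_clause.strip()})"


# Placeholder predicate; the real column name and bounds come from the call
time_filter = "time >= $lo AND time <= $hi"

print(combine_filters(time_filter))
print(combine_filters(time_filter, "sat_id = 42"))
print(combine_filters(time_filter, "sat_id = 42 OR sat_id = 43"))
```

The last call shows why grouping matters: without the parentheses, the OR branch would match rows for satellite 43 at any time.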
Unit overrides
By default ParquetEphemeris assumes position columns are in km and
velocity columns in km/s. Pass position_unit and/or velocity_unit
to override either independently:
# Position and velocity both in metres / m/s
eph = re.ParquetEphemeris(
"states_meters.parquet",
begin=begin, end=end,
position_unit="m", # position in m
velocity_unit="m/s", # velocity in m/s
)
# If only position_unit is given, velocity defaults to position_unit + "/s"
eph = re.ParquetEphemeris(
"states_meters.parquet",
begin=begin, end=end,
position_unit="m", # implies velocity_unit="m/s"
)
# All standard properties (gcrs_pv, latitude_deg, …) are still in km / km/s.
print(eph.source_position_unit) # "m" (as supplied)
print(eph.source_velocity_unit) # "m/s" (as supplied or derived)
print(eph.gcrs_pv.position_unit) # "km" (internal representation)
Supported values: position_unit: "km" (default), "m", "cm".
velocity_unit: "km/s" (default), "m/s", "cm/s".
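The defaulting rule and the conversion to the internal km and km/s representation can be sketched as follows. These helpers are hypothetical and only mirror the behaviour described above; they are not the rust-ephem implementation.

```python
from typing import Optional, Tuple

# Number of source units in one kilometre, for the documented unit names
_UNITS_PER_KM = {"km": 1.0, "m": 1_000.0, "cm": 100_000.0}


def resolve_units(
    position_unit: str = "km", velocity_unit: Optional[str] = None
) -> Tuple[str, str]:
    """Apply the documented default: velocity unit = position unit + "/s"."""
    if position_unit not in _UNITS_PER_KM:
        raise ValueError(f"unsupported position unit: {position_unit!r}")
    if velocity_unit is None:
        velocity_unit = position_unit + "/s"
    if not velocity_unit.endswith("/s") or velocity_unit[:-2] not in _UNITS_PER_KM:
        raise ValueError(f"unsupported velocity unit: {velocity_unit!r}")
    return position_unit, velocity_unit


def to_km(value: float, unit: str) -> float:
    """Convert a length in `unit` to kilometres."""
    return value / _UNITS_PER_KM[unit]


def to_km_per_s(value: float, unit: str) -> float:
    """Convert a speed in `unit` (for example "m/s") to km/s."""
    return value / _UNITS_PER_KM[unit[:-2]]


pos_unit, vel_unit = resolve_units("m")
print(pos_unit, vel_unit)
print(to_km(7_000_000.0, pos_unit), to_km_per_s(7_500.0, vel_unit))
```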
Earth-fixed (ITRS/ECEF) input
If the data is in an Earth-fixed (ITRS/ECEF) frame, set frame:
eph = re.ParquetEphemeris(
"states_ecef.parquet",
begin=begin, end=end,
frame="ECEF",
)
Inspecting raw data
The raw, uninterpolated state vectors and timestamps are exposed via
file_pv and file_timestamp:
raw = eph.file_pv
print(f"Pulled {raw.position.shape[0]} state vectors from Parquet")
ts = eph.file_timestamp
print(f"Earliest sample : {ts[0]}")
print(f"Latest sample : {ts[-1]}")
Evaluating constraints
ParquetEphemeris participates in the full constraint-evaluation
pipeline:
re.ensure_planetary_ephemeris()
constraint = re.SunConstraint(min_angle=45.0) | re.MoonConstraint(min_angle=10.0)
result = constraint.evaluate(eph, target_ra=83.63, target_dec=22.01)
print(f"Visibility windows: {len(result.visibility)}")
for window in result.visibility:
    print(f"  {window.start_time} → {window.end_time}")
Ephemeris ABC subclass
ParquetEphemeris is a registered virtual subclass of
Ephemeris:
assert isinstance(eph, re.Ephemeris) # True
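For readers unfamiliar with virtual subclasses, the sketch below shows the general mechanism in Python's abc module. The class names are hypothetical stand-ins and do not reflect how rust-ephem defines Ephemeris internally.

```python
from abc import ABC


class BaseEphemeris(ABC):
    """Hypothetical stand-in for an abstract ephemeris interface."""


class TableEphemeris:
    """Hypothetical concrete class that does not inherit from BaseEphemeris."""


# Registration makes isinstance/issubclass succeed without real inheritance.
BaseEphemeris.register(TableEphemeris)

eph = TableEphemeris()
print(isinstance(eph, BaseEphemeris))            # True
print(issubclass(TableEphemeris, BaseEphemeris))  # True
print(BaseEphemeris in TableEphemeris.__mro__)   # False
```

A registered class passes isinstance checks, but it inherits no methods from the base and the base does not appear in its method resolution order.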