About the Openflow Connector for Veeva Vault
Note
This connector is subject to the Snowflake Connector Terms.
The Openflow Connector for Veeva Vault replicates data from a Veeva Vault instance into Snowflake using the Veeva Direct Data API. The connector downloads Direct Data archive files, extracts the CSV data they contain, and loads that data into Snowflake tables using Snowpipe Streaming. It supports full snapshots, incremental updates, and optional audit log ingestion.
Use cases
The connector supports the following use cases:
Full data replication: Perform a one-time full snapshot of all Veeva Vault data objects into Snowflake for reporting, analytics, and compliance.
Incremental synchronization: After the initial snapshot, the connector polls for incremental Direct Data archives on a configurable schedule (default: every 15 minutes) to keep Snowflake tables up to date with changes in Veeva Vault.
Audit log ingestion: Optionally ingest Veeva Vault audit log archives alongside data archives, providing a complete audit trail in Snowflake.
Migration and analytics: Centralize Veeva Vault data in Snowflake for cross-system analytics, data science, and regulatory reporting.
The replication lifecycle
A table’s replication cycle begins with a full data snapshot and then transitions to incremental synchronization.
Snapshot phase: The connector downloads the latest full Direct Data archive from Veeva Vault. This archive is a tar.gz file containing one CSV per Vault data object. The connector unpacks the archive, creates a destination table in Snowflake for each object (if it does not already exist), loads the data through Snowpipe Streaming into a staging table, and merges the staging data into the final destination table.
Incremental phase: After the snapshot completes, the connector polls Veeva Vault for incremental Direct Data archives. Each incremental archive contains only the records that changed since the previous archive. The connector applies updates through the same staging-and-merge pipeline and processes deletes separately based on the configured delete strategy. Data freshness in Snowflake depends on how frequently Veeva Vault publishes Direct Data archives and the configured sync frequency of the connector.
Audit log phase (optional): When audit log ingestion is enabled, the connector also downloads log_directdata archives and loads them into Snowflake following the same pipeline.
The connector groups Direct Data archives by their reported time window and processes one window at a time to ensure each batch is handled atomically before moving to the next.
The connector tracks its progress using a persisted state that records the last processed timestamp. If the connector is stopped and restarted, it resumes from where it left off.
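The window-by-window processing and restart behavior described above can be pictured with a minimal Python sketch. It is not the connector's implementation: the Archive fields, the state dictionary, its last_processed key, and the process callback are all hypothetical names chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Archive:
    """Hypothetical descriptor for one Direct Data archive."""
    name: str
    start_time: str  # ISO-8601 UTC, so string order matches time order
    stop_time: str


def pending_windows(archives, last_processed):
    """Group archives by time window, skipping windows already processed."""
    windows = defaultdict(list)
    for archive in archives:
        if last_processed is None or archive.stop_time > last_processed:
            windows[(archive.start_time, archive.stop_time)].append(archive)
    # Oldest window first, so the saved timestamp only moves forward.
    return sorted(windows.items())


def run(archives, state, process):
    """Process one window at a time and record progress after each one."""
    for (_start, stop), batch in pending_windows(archives, state.get("last_processed")):
        process(batch)                  # load and merge the whole window
        state["last_processed"] = stop  # advance only after the window succeeds
    return state
```

Because the timestamp is advanced only after a window finishes, a run that is interrupted mid-window would pick that same window up again on restart.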
Ingestion modes
The connector supports three ingestion modes that control how Direct Data files are consumed:
- SNAPSHOT_AND_INCREMENTAL (default):
The connector first processes the latest full Direct Data archive (snapshot). Once the snapshot is complete, it transitions to polling for incremental archives. This is the recommended mode for most deployments.
- SNAPSHOT:
The connector continuously polls for the latest full Direct Data archive. Each time a new full archive becomes available, it is processed. Use this mode when you want to periodically replace all data in Snowflake with a fresh full export.
- INCREMENTAL:
The connector polls only for incremental Direct Data archives. No full snapshot is performed. Use this mode when a snapshot has already been loaded by other means or when only recent changes are needed. You can optionally specify a start time to control how far back incremental polling begins.
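As a rough illustration of how the three modes differ, the following Python sketch maps each mode to the kind of archive that would be polled next. The mode names come from the list above; the function name, the snapshot_done flag, and the "full" and "incremental" labels are assumptions made for this example.

```python
from enum import Enum


class IngestionMode(Enum):
    SNAPSHOT_AND_INCREMENTAL = "SNAPSHOT_AND_INCREMENTAL"
    SNAPSHOT = "SNAPSHOT"
    INCREMENTAL = "INCREMENTAL"


def next_archive_kind(mode, snapshot_done):
    """Return which kind of archive ("full" or "incremental") to poll for next."""
    if mode is IngestionMode.SNAPSHOT:
        return "full"         # always the latest full archive
    if mode is IngestionMode.INCREMENTAL:
        return "incremental"  # never takes a snapshot
    # SNAPSHOT_AND_INCREMENTAL: one snapshot first, then incremental archives.
    return "incremental" if snapshot_done else "full"
```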
Authentication
The connector authenticates with Veeva Vault using session-based authentication. You provide
a service account username and password, and the connector obtains a session identifier from
the Veeva auth API endpoint. This session is reused across requests and is automatically
refreshed when it expires.
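The reuse-and-refresh behavior can be sketched in Python as a small session cache. This is an illustration only: the VaultSession class, the injected login callable (standing in for the call to the Veeva auth endpoint), and the fixed ttl_seconds lifetime are hypothetical, and the real session lifetime is determined by Veeva Vault.

```python
import time


class VaultSession:
    """Caches a session ID and refreshes it when it is missing or expired."""

    def __init__(self, login, ttl_seconds, clock=time.monotonic):
        self._login = login      # callable that returns a new session ID
        self._ttl = ttl_seconds  # assumed lifetime, for illustration only
        self._clock = clock
        self._session_id = None
        self._expires_at = 0.0

    def get(self):
        """Return a usable session ID, logging in again if needed."""
        if self._session_id is None or self._clock() >= self._expires_at:
            self.refresh()
        return self._session_id

    def refresh(self):
        """Force a new login, for example after the API rejects the session."""
        self._session_id = self._login()
        self._expires_at = self._clock() + self._ttl
        return self._session_id
```

Injecting the login function and the clock keeps the sketch free of network calls, so the refresh logic can be exercised on its own.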
For Snowflake authentication, the connector supports two strategies:
- SNOWFLAKE_MANAGED (default):
Uses the Snowflake-managed token associated with the Openflow runtime role. This is the recommended strategy for both Openflow - Snowflake Deployments and Openflow - BYOC Deployments.
- KEY_PAIR:
Uses a user-provided RSA key pair for authentication. This strategy is available only on Openflow - BYOC Deployments and is intended for cross-account scenarios where the connector needs to write to a Snowflake account different from the one hosting the Openflow runtime.
How deletes are handled
When the connector receives a delete extract from Veeva Vault, it applies the deletes in Snowflake according to the configured delete strategy:
- Hard Delete (default):
Rows are permanently removed from the destination table using a DELETE statement.
- Soft Delete:
Rows are not removed. Instead, the connector sets a __SNOWFLAKE_DELETED column to TRUE and a __SNOWFLAKE_DELETED_AT column to the current timestamp. If these columns do not exist in the destination table, the connector adds them automatically.
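To make the difference between the two strategies concrete, the following Python sketch builds the kind of SQL each one implies. It is not the SQL the connector actually issues: the function name, the "HARD" and "SOFT" labels, the key column, the column data types, and the %s bind placeholder are assumptions made for this example.

```python
def delete_statements(table, key_column, strategy):
    """Return illustrative SQL for one delete strategy ("HARD" or "SOFT")."""
    if strategy == "HARD":
        # The row is removed permanently.
        return [f"DELETE FROM {table} WHERE {key_column} = %s"]
    if strategy == "SOFT":
        return [
            # Add the marker columns if the table does not have them yet.
            f"ALTER TABLE {table} ADD COLUMN IF NOT EXISTS __SNOWFLAKE_DELETED BOOLEAN",
            f"ALTER TABLE {table} ADD COLUMN IF NOT EXISTS __SNOWFLAKE_DELETED_AT TIMESTAMP_NTZ",
            # The row stays in place and is only flagged as deleted.
            f"UPDATE {table} SET __SNOWFLAKE_DELETED = TRUE, "
            f"__SNOWFLAKE_DELETED_AT = CURRENT_TIMESTAMP() WHERE {key_column} = %s",
        ]
    raise ValueError(f"unknown delete strategy: {strategy}")
```

With soft deletes, downstream queries typically need to filter out flagged rows themselves, for example with a predicate on __SNOWFLAKE_DELETED.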
Schema evolution
The connector supports schema evolution when the structure of Veeva Vault data objects changes between archives. When the connector detects new columns in an incoming archive, it automatically adds those columns to the destination and staging tables in Snowflake.
When a column is no longer present in the incoming archive, the connector applies the configured column removal strategy:
- Drop Column (default):
Drops the column from the Snowflake table.
- Rename Column:
Renames the column by appending a configurable suffix (default: __deleted). This preserves historical data in the table.
- Ignore Column:
Leaves the column as-is in the Snowflake table and stops populating it.
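The column comparison behind these strategies can be sketched in Python as a simple diff between the table's columns and the archive's columns. The function name, the "DROP", "RENAME", and "IGNORE" labels, and the tuple format of the plan are assumptions for illustration, not the connector's internals.

```python
def plan_schema_changes(table_columns, archive_columns, removal="DROP", suffix="__deleted"):
    """Return (action, column, new_name) tuples that align a table with an archive."""
    if removal not in {"DROP", "RENAME", "IGNORE"}:
        raise ValueError(f"unknown column removal strategy: {removal}")
    existing = set(table_columns)
    incoming = set(archive_columns)

    # Columns that appear only in the archive are always added.
    plan = [("ADD", col, None) for col in archive_columns if col not in existing]

    # Columns that disappeared from the archive follow the removal strategy.
    for col in table_columns:
        if col in incoming:
            continue
        if removal == "DROP":
            plan.append(("DROP", col, None))
        elif removal == "RENAME":
            plan.append(("RENAME", col, col + suffix))
        # "IGNORE" leaves the column untouched, so nothing is planned.
    return plan
```

This sketch compares a single pair of column lists. A fuller version would also need to recognize columns that were already renamed on an earlier run so they are not renamed again.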
Automatic retry handling
The connector automatically retries failed API calls using an exponential backoff strategy.
Retryable conditions include HTTP status codes 429 (rate-limited), 500, 502, 503, and 504,
as well as transient network errors. The connector honors the Retry-After header when
provided by the Veeva Vault API.
If a session expires or becomes invalid, the connector automatically re-authenticates and retries the request.
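A minimal Python sketch of exponential backoff that honors Retry-After is shown below. The status codes come from the description above. The base delay, the cap, the attempt limit, and the send callable are assumptions, and the sketch treats Retry-After as a number of seconds, although the header can also carry an HTTP date.

```python
import time

RETRYABLE_STATUS = {429, 500, 502, 503, 504}


def backoff_delay(attempt, retry_after=None, base=1.0, cap=60.0):
    """Seconds to wait before the next try; `attempt` is 0-based."""
    if retry_after is not None:
        return float(retry_after)         # the server's hint takes precedence
    return min(cap, base * 2 ** attempt)  # 1s, 2s, 4s, ... up to the cap


def call_with_retry(send, max_attempts=5, sleep=time.sleep):
    """Call `send()` until the status is not retryable or attempts run out.

    `send` is a hypothetical callable returning (status_code, headers, body).
    """
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status not in RETRYABLE_STATUS or attempt == max_attempts - 1:
            return status, body
        sleep(backoff_delay(attempt, headers.get("Retry-After")))
```

Production retry loops often add random jitter to the delay so that many clients do not retry at the same moment. It is left out here to keep the sketch deterministic.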
Limitations
Consider the following limitations when using the connector:
Veeva Vault Direct Data must be enabled on your Vault instance before using the connector. Contact your Veeva Vault administrator to enable this feature.
The connector authenticates using session-based username and password credentials. Other authentication methods (such as OAuth) are not yet supported.
The connector replicates structured data from Direct Data archives only. Document and attachment content (such as files stored in Veeva Vault) is not replicated.
Next steps
For information on how to set up the connector, see the following topic: