Export Data Marts to Parquet #338

Closed
coussens opened this issue Jan 14, 2025 Discussed in #335 · 1 comment · Fixed by coussens/sagerx#2 · May be fixed by #347
Labels
optimization Nice to have, but not critical

Comments

@coussens
Contributor

Discussed in #335

Originally posted by coussens January 10, 2025
Thanks again for creating such a valuable open-source project!

I have a minor feature suggestion: an option to export Data Marts as Parquet files. This would benefit both users of SageRx and those downloading the exported Data Marts from the CodeRx website:

  • Columns would have data types, so there's no risk of users parsing CSVs into the wrong data types (e.g., the dreaded accidental conversion of NDCs to integers!)
  • File sizes would be much smaller (leading to faster downloads and less storage required)
  • Better performance, integration with Apache Arrow, etc.
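
To make the NDC point concrete, here is a small illustration using only the Python standard library. The NDC values and the `product` column are made up for the example and are not taken from any SageRx mart. It shows what happens when a CSV reader guesses that an NDC column is an integer:

```python
import csv
import io

# Hypothetical two-row extract; NDC values and column names are placeholders.
csv_text = (
    "ndc,product\n"
    "00071015523,example product a\n"
    "50090000001,example product b\n"
)

rows = list(csv.DictReader(io.StringIO(csv_text)))
original = [row["ndc"] for row in rows]

# A CSV carries no column types. If a reader infers "integer" for this column,
# the leading zeros are dropped and cannot be recovered from the number alone.
as_int = [int(value) for value in original]
round_tripped = [str(value) for value in as_int]

print(original)
print(round_tripped)
```

The first value no longer matches after the round trip, while the second one (no leading zero) survives, which is what makes this kind of corruption easy to miss.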

This could be readily incorporated into the export_marts DAG in Python, or by adding the pg_parquet extension to Postgres.

Let me know what you think!

@coussens
Contributor Author

Once I can get the build and export marts DAGs to run successfully, I'll take a stab at writing a PR for this.

Projects
None yet
2 participants