Export formats #96
I would recommend doing the export to DSV after building sqlite. This is safer because, in theory, there could be post-processing logic in the build. For example, when chebi is built, we generate additional views. If you are not deriving DSVs from those views, then this is perhaps less relevant.
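A post-build export along these lines can be little more than a loop over `sqlite_master`, so that tables and any generated views are dumped the same way. A minimal sketch, using only the stdlib; the function name and output layout are illustrative, not part of semsql:

```python
import csv
import sqlite3
from pathlib import Path


def dump_all_relations(db_path: str, out_dir: str, delimiter: str = "\t") -> list:
    """Export every table and view in a built sqlite db to one DSV file each."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    con = sqlite3.connect(db_path)
    # Views created in post-processing are picked up alongside the base tables.
    names = [row[0] for row in con.execute(
        "SELECT name FROM sqlite_master WHERE type IN ('table', 'view')"
        " AND name NOT LIKE 'sqlite_%'")]
    written = []
    for name in names:
        cur = con.execute(f'SELECT * FROM "{name}"')
        path = out / f"{name}.tsv"
        with path.open("w", newline="") as fh:
            writer = csv.writer(fh, delimiter=delimiter)
            writer.writerow([col[0] for col in cur.description])  # header row
            writer.writerows(cur)
        written.append(str(path))
    con.close()
    return written
```

Because it reads whatever relations exist at dump time, this stays correct even if the build adds ontology-specific views later.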
I'm guessing this route is less attractive because bypassing sqlite would speed up the build and reduce dependencies? That's fair.
Of course, things would be more composable at the Python level if we separated the atomic operations:
- prefix loading
- turning RDF/OWL into tables
- relation-graph (RG) closures
- exporting to duckdb/sqlite/pg/parquet/dsv
Any thoughts on approaches here? dbt? Keep it simple, pure Python?
As an interim step, though, adding to the hairy Makefile may be easiest...
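For what it's worth, atomic operations like those listed above compose naturally as plain functions over a shared build context, which is roughly what the pure-Python option would look like. A toy sketch; every name here is hypothetical, not semsql's API, and the step bodies only record what a real step would do:

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable, List


@dataclass
class BuildContext:
    """Shared state handed from one pipeline step to the next (illustrative)."""
    owl_path: Path
    work_dir: Path
    log: List[str] = field(default_factory=list)


Step = Callable[[BuildContext], None]


def load_prefixes(ctx: BuildContext) -> None:
    ctx.log.append("prefixes")        # e.g. read prefix.csv into a prefix table


def rdf_to_tables(ctx: BuildContext) -> None:
    ctx.log.append("statements")      # e.g. rdftab or an equivalent triple loader


def rg_closure(ctx: BuildContext) -> None:
    ctx.log.append("entailed_edge")   # e.g. load the relation-graph TSV


def export(ctx: BuildContext) -> None:
    ctx.log.append("export")          # duckdb / sqlite / pg / parquet / dsv


def run_pipeline(ctx: BuildContext, steps: List[Step]) -> BuildContext:
    """Run each atomic operation in order over the shared context."""
    for step in steps:
        step(ctx)
    return ctx
```

The point of the shape is that an exporter becomes just another step, so a DSV-only build would simply omit the sqlite-specific steps from the list.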
On Fri, Feb 14, 2025 at 9:15 AM, i alarmed alien wrote:
As part of KBase's work with ontologies in semsql, it would be very handy
to be able to export as DSV directly, rather than going via the sqlite
database.
We could put together a file similar to build.Makefile to perform the
conversions, but I am wary of changes being made in the semsql sqlite build
process that we'd have to ensure that we kept up with. I was therefore
wondering about adapting the current build command to export either DSV or
sqlite, depending on what args it is given.
build.Makefile (src/semsql/builder/build.Makefile), line 51 onwards:
# -- MAIN TARGET --
# A db is constructed from
# (1) triples loaded using rdftab
# (2) A relation-graph TSV
%.db: %.owl %-$(RGSUFFIX).tsv $(PREFIX_CSV_PATH)
	rm -f $@.tmp && \
	cat $(THIS_DIR)/sql_schema/semsql.sql | sqlite3 $@.tmp && \
	echo .exit | sqlite3 -echo $@.tmp -cmd ".mode csv" -cmd ".import $(PREFIX_CSV_PATH) prefix" && \
	rdftab $@.tmp < $< && \
	sqlite3 $@.tmp -cmd '.separator "\t"' ".import $*-$(RGSUFFIX).tsv entailed_edge" && \
	gzip -f $*-$(RGSUFFIX).tsv && \
	cat $(THIS_DIR)/indexes/*.sql | sqlite3 $@.tmp && \
	echo "ALTER TABLE statements ADD COLUMN graph TEXT;" | sqlite3 $@.tmp && \
	(test -d views && find views -maxdepth 1 -name '$(notdir $*)*.sql' -type f -print0 | xargs -0 -I{} sh -c 'sqlite3 $@.tmp < "$$1"' sh {} || echo no views ) && \
	mv $@.tmp $@
.PRECIOUS: %.db
Alter this so that if the desired output is DSV, the rdftab command is run
but the sqlite db loading is omitted. Relevant files, including prefixes,
relation graph, and schema SQL could be moved to a directory together or
something like that.
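If the sqlite loading were skipped entirely, the statements DSV would have to come straight from a triple serialization rather than from rdftab's output table. A very rough sketch of that idea, assuming the OWL has already been converted to N-Triples; this regex-based reader is illustrative only and ignores many RDF subtleties (escaping, blank-node handling, multi-line literals):

```python
import re
from typing import Iterable, Iterator

# Matches a simple N-Triples line: <subject> <predicate> object .
NT_LINE = re.compile(r'^(\S+)\s+(\S+)\s+(.+?)\s*\.\s*$')


def ntriples_to_tsv_rows(lines: Iterable[str]) -> Iterator[str]:
    """Yield tab-separated subject/predicate/object rows from N-Triples text."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        match = NT_LINE.match(line)
        if match:
            yield "\t".join(match.groups())
```

A real implementation would want a proper RDF parser, but the shape shows why this route avoids the sqlite dependency at the cost of reimplementing what rdftab already does.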
*ialarmedalien* created an issue (INCATools/semantic-sql#96)
At the moment (and going forward), we load the semsql schema (tables and views) into the target DB, run the ontology merge / sqlite db generation code, then export from sqlite as DSV and load that into the target DB. In the grand scheme of things, the sqlite load/export isn't particularly time-consuming, especially compared with the previous steps (about an hour vs 9-10 hrs for the ontology build), but it would be nice to skip it and go directly to the target DB.

I think for now the KISS / hairy Makefile approach is best, since I don't know how many others are using the outputs of semsql as something other than a sqlite db.

On that topic, we're currently using a MySQL db and will be using an ANSI SQL interface, and there are a few edits that could be made to the current base semsql schema to make it more compliant with these other SQL syntaxes; see #95 for suggested initial tweaks. It's out of scope for semsql to provide the schema in every SQL dialect under the sun, but the suggested changes make it easy enough to fix the remaining inconsistencies with a regex. I'm investigating SQL dialect translators / transpilers, so I could add a recommendation for schema conversion if other people want to use semsql with a different DB engine.
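To illustrate the regex route (not a substitute for a real transpiler, and these particular substitutions are examples rather than the #95 list), a couple of mechanical sqlite-to-MySQL DDL tweaks might look like this:

```python
import re


def sqliteish_to_mysqlish(ddl: str) -> str:
    """Apply a few mechanical dialect tweaks to sqlite DDL (illustrative only)."""
    # sqlite quotes identifiers with double quotes; MySQL uses backticks by
    # default. (Naive: this would also rewrite double-quoted string literals.)
    ddl = re.sub(r'"([A-Za-z_][A-Za-z0-9_]*)"', r'`\1`', ddl)
    # The autoincrement keyword is spelled differently in the two dialects.
    ddl = ddl.replace("AUTOINCREMENT", "AUTO_INCREMENT")
    return ddl
```

Anything beyond tweaks of this kind is where a dedicated dialect transpiler would earn its keep.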