Skip to content

Export formats #96

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
ialarmedalien opened this issue Feb 14, 2025 · 2 comments
Open

Export formats #96

ialarmedalien opened this issue Feb 14, 2025 · 2 comments

Comments

@ialarmedalien
Copy link
Contributor

As part of KBase's work with ontologies in semsql, it would be very handy to be able to export as DSV directly, rather than going via the sqlite database.

We could put together a file similar to build.Makefile to perform the conversions, but I am wary of changes being made in the semsql sqlite build process that we'd have to ensure that we kept up with. I was therefore wondering about adapting the current build command to export either DSV or sqlite, depending on what args it is given.

build.Makefile line 51 onwards:

# -- MAIN TARGET --
# A db is constructed from
# (1) triples loaded using rdftab
# (2) A relation-graph TSV
%.db: %.owl %-$(RGSUFFIX).tsv $(PREFIX_CSV_PATH)
	rm -f [email protected] && \
	cat $(THIS_DIR)/sql_schema/semsql.sql | sqlite3 [email protected] && \
	echo .exit | sqlite3 -echo [email protected] -cmd ".mode csv" -cmd ".import $(PREFIX_CSV_PATH) prefix" && \
	rdftab [email protected] < $< && \
	sqlite3 [email protected] -cmd '.separator "\t"' ".import $*-$(RGSUFFIX).tsv entailed_edge" && \
	gzip -f $*-$(RGSUFFIX).tsv && \
	cat $(THIS_DIR)/indexes/*.sql | sqlite3 [email protected] && \
	echo "ALTER TABLE statements ADD COLUMN graph TEXT;" | sqlite3 [email protected] && \
	(test -d views && find views -maxdepth 1 -name '$(notdir $*)*.sql' -type f -print0 | xargs -0 -I{} sh -c 'sqlite3 [email protected]< "$$1"' sh {} || echo no views ) && \
	mv [email protected] $@
.PRECIOUS: %.db

Alter this so that if the desired output is DSV, the rdftab command is run but the sqlite db loading is omitted. Relevant files, including prefixes, relation graph, and schema SQL could be moved to a directory together or something like that.

@cmungall
Copy link
Collaborator

cmungall commented Feb 17, 2025 via email

@ialarmedalien
Copy link
Contributor Author

At the moment (and in the future), we load the semsql schema (tables+views) into the target DB, run the ontology merge/sqlite db generation code, export from sqlite as DSV and into the target DB. It's true that in the grand scheme of things, the sqlite load/export isn't particularly time-consuming, especially in comparison with the previous steps (an hour vs 9-10 hrs for the ontology build), but it would be nice to skip it and go directly to the target DB.

I think for now, the KISS / hairy Makefile approach is best, since I don't know how many others are using the outputs of semsql as something other than a sqlite db.

On that topic, we're currently using a MySQL db and will be using an ANSI SQL interface, and there are a few edits that could be made to the current base semsql schema to make it more compliant with these other SQL syntaxes; see #95 for suggested initial tweaks.

It's out of the scope of semsql to provide the schema in every SQL dialect under the sun, but the suggested changes do make it easy enough to fix other inconsistencies with a regex. I'm investigating SQL dialect translators / transpilers so could add a recommendation for schema conversion if other people want to use semsql with a different DB engine.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants