Skip to content

Latest commit

 

History

History

RubyLagoon

Folders and files

NameName
Last commit message
Last commit date

parent directory

..
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

RubyLagoon

Ruby client for lagoon.

Prerequisites for 0.02

To use Lagoon with https connections, please specify the SSL_CERT_FILE variable. e.g. export SSL_CERT_FILE=/path/to/cert.pem

Usage

This is an example usage. Please refer to the relevant sections for in-depth explanation.

To start with let's create a simple csv file that we will use for testing. We use the File module:

File.open('test_file.csv', 'w') {|f| f.write("foo,bar\n1,2\n5,6")}
File.read('test_file.csv').lines {|l| puts l}
# =>
#   foo,bar
#   1,2
#   5,6

Then let's load the gem and create a connection to the server. You can read more about this in the Installing the gem and Setting up a connection sections below.

irb(main):003:0> require 'lagoon'
irb(main):004:0> dlagoon = Lagoon.new

We can then ingest the file created above and name the source "Experiment 42". Please refer to Uploading a dataset for the function arguments.

src = dlagoon.ingest('test_file.csv', name: "Experiment 42")
# => #<Lagoon::Source ... name="Experiment 42", ... columns=[...]>

The ingest process might log the current progress:

Start: Starting ingest proper
Notice: Processed 2 records
Done: Starting ingest proper
Start: Creating indices for public.t195
Start: Creating primary key
Done: Creating primary key
Start: Creating index on column c1
...
Done: Creating index on column c2
Done: Creating indices for public.typed195
{
  "typed": [
    "typed195",
    "Experiment_42_v2_typed"
  ],
  "viewName": "Experiment_42_v2",
  "name": "Experiment 42",
  ...
  "columns": [
    {
      "inView": "foo",
      "name": "c1",
      "header": "foo",
      "type": "INTEGER"
    },
    {
      "inView": "bar",
      "name": "c2",
      "header": "bar",
      "type": "INTEGER"
    }
  ],
  ...
}

The returned object is the source, which can be turned into a DataFrame directly. Refer to the section Working with dataframes for more information.

df = src.to_df
# => #<Daru::DataFrame(2x3)>
#       ix foo bar
#    0   1   1   2
#    1   2   5   6

Pre-ingested sources can be retrieved as well. Here we first load all the sources from the server, and use the Enumerable function find to retrieve "Experiment 42". Refer to the section Working with sources for more information.

dlagoon.load
src = dlagoon.sources.find{ |s| s.name == "Experiment 42"}
# => #<Lagoon::Source ... name="Experiment 42", ... columns=[...]>
df = src.to_df
# => #<Daru::DataFrame(2x3)>
#       ix foo bar
#    0   1   1   2
#    1   2   5   6

Since it is not always convenient to load information about all sources first, we can specify some filtering to be performed by the server. Refer to the section Reading information from the server for more information.

dlagoon.load(nil, name: "Experiment 42")
src = dlagoon.sources.first
# => #<Lagoon::Source ... name="Experiment 42", ... columns=[...]>
irb(main):038:0> df = src.to_df
# => #<Daru::DataFrame(2x3)>
#       ix foo bar
#    0   1   1   2
#    1   2   5   6

Installing the gem

The lagoon Ruby code is provided as a Ruby gem. See the relevant documentation in order to get started with rubygems.

Rubygem server

If you have access to a Rubygem server where the lagoon gem is hosted, you can simply run

$ gem install lagoon

Gem file

If you have access to the lagoon gem file, run

$ gem install lagoon-x.x.x.gem

In order to obtain a lagoon-x.x.x.gem file from source please see the Packaging section below.

Gem dependency

If you are writing a gem, add the following to your .gemspec file:

Gem::Specification.new do |s|

    ...

  s.add_runtime_dependency 'lagoon'

    ...

end

Note: You need to make sure that the gem is available at runtime.

Once installed, the gem can then be used inside irb or inside your own programs:

require 'lagoon'

Setting up a connection

There are two important classes: Lagoon and Lagoon::Source. The Lagoon class is used for configuration. Assuming that lagoon-server is running locally on port 3001 create an object as follows:

lagoonserver_config = {host: 'localhost', port: 3001}
dlagoon = Lagoon.new(lagoonserver: lagoonserver_config)

The Lagoon constructor can also be configured by supplying a YAML file:

dlagoon = Lagoon.new(file: "config.yaml")

or through environment variables (when using environment variables, the configuration parameters can then be omitted entirely):

dlagoon = Lagoon.new

Authentication

The credentials can be provided when creating the server, through the configuration file or through environment variables. If credentials are provided through any of the means listed above, the server will try to authenticate. Upon successful authentication, all subsequent requests will be performed as the authenticated user.

dlagoon = Lagoon.new(user: "my-username", password: "my-password", verbose: true)
# => [INFO] Found credentials, authenticating
# => [INFO] Authentication successful for user "my-username"

If you do not which to authenticate (even if credentials are provided), you can specify authenticate: false when creating the server:

dlagoon = Lagoon.new(verbose: true, authenticate: false)
# => [INFO] Not authenticating

Configuration reference

The yaml configuration and/or environment variables should be set as follows:

environment variable yaml key description
LAGOON_HOST lagoonserver_host lagoon-server endpoint
LAGOON_PORT lagoonserver_port lagoon-server port
USER user lagoon-server username
PASSWORD password lagoon-server password

The arguments specified explicitely in the constructor have priority over the yaml values. The yaml values have priority over the environment variables.

Reading information from the server

When a Lagoon object is created no request to the server is made. The sources are empty:

dlagoon.sources
# => nil

The load method needs to be called for the data to be fetched from the database. Once this is done dlagoon will contain all the metadata present in the lagoon database:

dlagoon.load
dlagoon.sources.length
# => 163
dlagoon.sources[1..5].map(&:name)
# => ["My source #1", "My source #2", "My source #3", "My source #4", "My source #5"]

The load method will load all sources into memory. There are several ways to avoid loading all the sources. Either by specifying a Range as the first parameter:

dlagoon.load(1..3)
dlagoon.sources.length
# => 3
dlagoon.load(1...3)
dlagoon.sources.length
# => 2

Or by specifying filter attributes as a hash:

dlagoon.load(nil, name: "gene_protein.json")
dlagoon.sources.length
# => 1
dlagoon.sources.first.name
# => "gene_protein.json"
dlagoon.load(nil, created_after: Time.now)
dlagoon.sources.length
# => 0

Parameters for load:

Key Name Type Ingest equivalent
offset String --offset
limit Int --limit
search String --search
ix Int --ix
tags Array of Strings --tag <foo> --tag <bar>
description String --description
name String --name
user String --user
columns Array of Strings --column <foo> --column <bar
created_after Time or String --created-after
created_before Time or String --created-before
include_deprecated Boolean --include-deprecated

See the lagoon-server documentation for more information.

Working with sources

Most ingest source fields are available:

require 'date'

src = dlagoon.sources.first
src.created
# => "2016-11-10T11:46:40.42856Z"
Date.parse(src.created).strftime('%a %d %b %Y')
# => "Thu 10 Nov 2016"
src.columns.map(&:type)
# => ["BOOLEAN", "TEXT"]

The source's content can be accessed. It is not cached and will be downloaded every time the function get_contents is called:

dlagoon.sources.last.get_contents
# => "\"Foo\"\n1\n"

See the Source class documentation for more information.

Uploading a dataset

The method Lagoon#ingest is available for ingestion operations. It can be used either with a File or by specifying a filepath:

new_src = dlagoon.ingest 'my_source.csv'
new_src.columns.map(&:name)
# => ["c1", "c2", "c3"]

file = File.new('my_source.csv', 'r')
new_src = dlagoon.ingest file
new_src.columns.map(&:name)
# => ["c1", "c2", "c3"]

Additionally upload parameters can be specified:

new_src = dlagoon.ingest('my_source.csv', name: "Experiment 42")
new_src.name
# => "Experiment 42"

All parameters are available. Most parameters can be specified in a camel_case format:

new_src = dlagoon.ingest('my_source.csv', json_path: "{ length: [ _ ] }")

Tags can be specified as a list:

new_src = dlagoon.ingest('my_source.csv', tags: ["foo", "bar"])

Upload parameters:

Key name Type Ingest equivalent
input String (Name of the uploaded file)
name String --name
file_type String --comma/--tab/--json
peek_at Int --peek-at
decompress_method String --unzip
json_path String --json-path
encoding String --latin1, --utf8
description String --description
tags Array of Strings --tag <foo> --tag <bar>

See the lagoon-server documentation for more information.

Working with dataframes

RubyLagoon has basic support for daru.

dlagoon = Lagoon::Lagoon.new('localhost', 1234)
dlagoon.load
src = dlagoon.sources.first
df = src.to_df
df.nrows
# => 10004

The method to_df can also be passed a block that describes sequel server-side filtering. For instance:

df = src.to_df {|x| x.filter('ix > 10').filter('ix <= 25')}
df.nrows
# => 15

The value x is a Sequel::Dataset object equivalent to

x = DB.from(src.viewName)

All Sequel::Dataset operations are available.

Make sure then the gem command is available. Inside RubyLagoon run the following command:

$ gem build lagoon.gemspec

This will create a file named lagoon-x.x.x.gem. This file is self-contained and cross-platform. This file needs to be distributed to users for them to use the Ruby lagoon code.