Requirements and Install

Requirements

Requirements:

  • Python 3.5 or higher
  • PostgreSQL 10 or higher

Installation

Set up a venv and install requirements:

virtualenv -p python3 .ve
source .ve/bin/activate
pip install -r requirements.txt
pip install -e .

Database

Create a UTF8 PostgreSQL database and a user with write access to it.

Once the database exists, configure the tool to connect to it.

You can see one way of doing that in the example below, but for other options see Configuration.

You also have to run a command to create the tables in the database.

You can see the command in the example below, but for more on that see Command line tool - upgrade-database option.

Example of creating a database user and a database, and setting up the schema:

sudo -u postgres createuser ocdskingfisher --pwprompt
sudo -u postgres createdb ocdskingfisher -O ocdskingfisher --encoding UTF8 --template template0 --lc-collate en_US.UTF-8 --lc-ctype en_US.UTF-8
export DB_URI='postgres://ocdskingfisher:PASSWORD YOU CHOSE@localhost/ocdskingfisher'
python ocdskingfisher-cli upgrade-database
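If the password you chose contains characters such as spaces, "@" or "/", it generally has to be percent-encoded before it can sit inside a connection URI. The following is a minimal sketch of one way to build the value for DB_URI; the helper name build_db_uri is hypothetical and is not part of the tool.

```python
from urllib.parse import quote


def build_db_uri(user, password, host, database):
    """Build a postgres:// URI with the user and password percent-encoded.

    Hypothetical helper for illustration only; it is not part of the tool.
    """
    # safe="" makes quote() also encode "/" so it cannot be read as a path separator
    return "postgres://{}:{}@{}/{}".format(
        quote(user, safe=""),
        quote(password, safe=""),
        host,
        database,
    )


if __name__ == "__main__":
    # Example password containing characters that need encoding
    print(build_db_uri("ocdskingfisher", "p@ss word", "localhost", "ocdskingfisher"))
```

The printed value can then be used in the export DB_URI line shown above.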

The generated database will have these tables:

_images/kingfisher.png

Where:

  • Collection: this table contains the results of each ‘run’ that passed the store stage.
    • source_id: the name of the collection that was run, for example ‘canada_buyandsell’
    • data_version: the date and time when the run command was executed
    • store_start_at: the date and time when the store stage started
    • store_end_at: the date and time when the store stage ended
    • sample: a flag that indicates whether the collection holds only a sample of the available data (true) or all of it (false)
    • gather_start_at: the date and time when the gather stage started
    • gather_end_at: the date and time when the gather stage ended
    • fetch_start_at: the date and time when the fetch stage started
    • fetch_end_at: the date and time when the fetch stage ended
  • Collection File Status: this table contains the information about each file downloaded from a collection.
    • filename: the name of the file that was downloaded
    • store_start_at: the date and time when the store of this file started
    • store_end_at: the date and time when the store of this file ended
    • warnings: text describing any warnings raised while saving the file, for example encoding issues
  • Package Data: this table contains the metadata that is included in a release or record package.
    • hash_md5: an MD5 hash used to detect whether the data has changed
    • data: the metadata in jsonb format
  • Record: this table contains the ocid and relations with other tables from the downloaded records
  • Release: this table contains the ocid, release id and relations with other tables from the downloaded releases
  • Data: this table contains the actual data from release packages and record packages. Each row contains a hash column and a data column. The data column is a jsonb value holding a release, which comes either from a release package or from a record (its releases list or its compiledRelease field).
  • Record check and Release check: these tables contain the result of running the CoVe validation on a release or record package, stored in the cove_output column.

  • Record and Release check error: these tables register any errors that happened in the check stage.
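The hash columns described above let identical JSON content be recognised without comparing whole documents. The sketch below illustrates the general idea only. How the tool actually serialises the data before hashing is not stated here, so the sorted-key serialisation and the function name data_hash are assumptions.

```python
import hashlib
import json


def data_hash(data):
    """Return an MD5 hex digest of a JSON-serialisable object.

    Assumption: keys are sorted before hashing so that key order does not
    change the result. The tool itself may serialise differently.
    """
    serialized = json.dumps(data, sort_keys=True)
    return hashlib.md5(serialized.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Made-up values; both objects have the same content in a different key order
    a = {"ocid": "ocds-example-1", "tag": ["tender"]}
    b = {"tag": ["tender"], "ocid": "ocds-example-1"}
    print(data_hash(a) == data_hash(b))  # True: same content, same hash
```

With this approach, two rows with the same hash can be treated as holding the same data, and a changed hash signals that the content changed.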