Importing the UK Species Index into an Indicia list - full syncrhonisation

This page describes the full-synchronisation technique for importing the UK Species Index data into an Indicia species list ready for recording. Once synchronisation has been performed at least once, you can use the incremental updates technique for further updates.

There are 2 starting points for this process depending on whether you are going to generate a fresh export of data from the UKSI Microsoft Access database or use the extract provided for you the last time the sync tool was updated. The process is complex so it would be wise to backup your warehouse database first and to plan to do the task when the warehouse is not under high load or being used for anything important such as a demonstration.

The GitHub repository for the synchronisation process (https://github.com/Indicia-Team/support_files/tree/master/UKSI) contains the text files extracted from the UKSI Microsoft Access database the last time the process was run. You can either use the copies in this repository, or read the notes in the following section if you need to generate a fresh copy of the UKSI data.

Starting with the Microsoft Access database

You should have been provided with the original UKSI Microsoft Access database, as “curated” by London Natural History Museum.

In the list of tables, right click on the ORGANISM_MASTER table, select to copy it, then right click again and paste it, setting the new table name to ORGANISM_MASTER2.

To export the data into a text file format ready for the import, create the following queries in your database. To add each query in Microsoft Access, first, create the query in Design View by following these steps:

  • Go to the Create tab on the ribbon, then click the Query Design button.

  • Close the Show Table dialog.

  • Use the Views toolbutton to change to SQL View.

  • Then paste the query in and save the query with the appropriate title.

  • Repeat until all 10 queries are created.

Query 1 - title=repair_organism_tvks

This query ensures that organisms point to the accepted name TVKs, not synonyms, so we can make this assumption in later queries.

UPDATE ORGANISM_MASTER2
INNER JOIN NAMESERVER ON NAMESERVER.INPUT_TAXON_VERSION_KEY=ORGANISM_MASTER2.TAXON_VERSION_KEY
SET ORGANISM_MASTER2.TAXON_VERSION_KEY=NAMESERVER.RECOMMENDED_TAXON_VERSION_KEY
WHERE NAMESERVER.RECOMMENDED_TAXON_VERSION_KEY<>ORGANISM_MASTER2.TAXON_VERSION_KEY

Query 2 - title=all_designation_kinds

This query gets the list of terms used to populate the categories of taxon designations.

SELECT TAXON_DESIGNATION_TYPE_KIND.TAXON_DESIGNATION_TYPE_KIND_KEY,
TAXON_DESIGNATION_TYPE_KIND.KIND
FROM TAXON_DESIGNATION_TYPE_KIND;

Query 3 - title=all_names

This query gets all taxon names, excluding those which are explicitly marked as redundant or deleted. Contains basic name data and a link to the preferred name.

SELECT ORGANISM_MASTER2.ORGANISM_KEY, NAMESERVER.RECOMMENDED_TAXON_VERSION_KEY, NAMESERVER.INPUT_TAXON_VERSION_KEY,
TAXON.ITEM_NAME, TAXON.AUTHORITY, NAMESERVER.TAXON_VERSION_FORM, NAMESERVER.TAXON_VERSION_STATUS,
NAMESERVER.TAXON_TYPE, TAXON.LANGUAGE, TAXON_VERSION.OUTPUT_GROUP_KEY, TAXON_RANK.LONG_NAME,
TAXON_VERSION.ATTRIBUTE, TAXON_RANK.SHORT_NAME,
IIF(NAMESERVER.DELETED_DATE Is Null AND ORGANISM_MASTER2.DELETED_DATE Is Null
AND ORGANISM_MASTER2.REDUNDANT_FLAG Is Null AND TAXON_VERSION.DELETED_DATE Is Null
AND TAXON_VERSION.DATE_TO Is Null, 'N', 'Y') AS REDUNDANT,
IIF(NAMESERVER.DELETED_DATE Is Null AND ORGANISM_MASTER2.DELETED_DATE Is Null AND ORGANISM_MASTER2.REDUNDANT_FLAG Is Null, 'N', 'Y') AS ORGANISM_DEPRECATED,
IIF(TAXON_VERSION.DELETED_DATE Is Null AND TAXON_VERSION.DATE_TO Is Null, 'N', 'Y') AS NAME_DEPRECATED
FROM ((TAXON
INNER JOIN (NAMESERVER
INNER JOIN TAXON_VERSION ON NAMESERVER.INPUT_TAXON_VERSION_KEY = TAXON_VERSION.TAXON_VERSION_KEY)
ON TAXON.TAXON_KEY = TAXON_VERSION.TAXON_KEY)
INNER JOIN ORGANISM_MASTER2 ON NAMESERVER.RECOMMENDED_TAXON_VERSION_KEY = ORGANISM_MASTER2.TAXON_VERSION_KEY)
INNER JOIN TAXON_RANK ON TAXON_VERSION.TAXON_RANK_KEY = TAXON_RANK.TAXON_RANK_KEY;

Query 4 - title=preferred_names

This query gets a list of the preferred taxon names, excluding those which are explicitly marked as redundant or deleted.

Todo

Sort code (from taxon list item, recorder 3.3 list?)

SELECT DISTINCT ORGANISM_MASTER2.ORGANISM_KEY, ORGANISM_MASTER2.TAXON_VERSION_KEY, TAXON.ITEM_NAME,
TAXON.AUTHORITY, ORGANISM_MASTER2.PARENT_TVK, ORGANISM_MASTER2.PARENT_KEY, TAXON_VERSION.TAXON_RANK_KEY,
TAXON_RANK.SEQUENCE, TAXON_RANK.LONG_NAME, TAXON_RANK.SHORT_NAME, ORGANISM_MASTER2.MARINE_FLAG,
ORGANISM_MASTER2.FRESHWATER AS FRESHWATER_FLAG,
ORGANISM_MASTER2.TERRESTRIAL_FRESHWATER_FLAG AS TERRESTRIAL_FLAG, ORGANISM_MASTER2.NON_NATIVE_FLAG,
NULL AS SORT_CODE,
IIF(NAMESERVER.DELETED_DATE Is Null AND ORGANISM_MASTER2.DELETED_DATE Is Null
AND ORGANISM_MASTER2.REDUNDANT_FLAG Is Null AND TAXON_VERSION.DELETED_DATE Is Null
AND TAXON_VERSION.DATE_TO Is Null, 'N', 'Y') AS REDUNDANT
FROM (((TAXON_LIST_ITEM
INNER JOIN (TAXON
INNER JOIN (TAXON_VERSION
INNER JOIN (ORGANISM_MASTER2
INNER JOIN NAMESERVER ON NAMESERVER.INPUT_TAXON_VERSION_KEY=ORGANISM_MASTER2.TAXON_VERSION_KEY)
ON TAXON_VERSION.TAXON_VERSION_KEY=ORGANISM_MASTER2.TAXON_VERSION_KEY)
ON TAXON.TAXON_KEY=TAXON_VERSION.TAXON_KEY)
ON TAXON_LIST_ITEM.TAXON_VERSION_KEY=TAXON_VERSION.TAXON_VERSION_KEY)
INNER JOIN TAXON_LIST_VERSION ON TAXON_LIST_ITEM.TAXON_LIST_VERSION_KEY=TAXON_LIST_VERSION.TAXON_LIST_VERSION_KEY)
INNER JOIN TAXON_LIST ON TAXON_LIST_VERSION.TAXON_LIST_KEY=TAXON_LIST.TAXON_LIST_KEY)
INNER JOIN TAXON_RANK ON TAXON_VERSION.TAXON_RANK_KEY=TAXON_RANK.TAXON_RANK_KEY;

Query 5 - title=taxa_taxon_designations

Retrieves a list of the links between all taxon names and their designations.

SELECT TAXON_DESIGNATION_TYPE.SHORT_NAME, TAXON_DESIGNATION.DATE_FROM, TAXON_DESIGNATION.DATE_TO,
TAXON_DESIGNATION.STATUS_GEOGRAPHIC_AREA, TAXON_DESIGNATION.DETAIL, NAMESERVER.RECOMMENDED_TAXON_VERSION_KEY
FROM (TAXON_LIST_ITEM
INNER JOIN (TAXON_DESIGNATION
INNER JOIN TAXON_DESIGNATION_TYPE ON TAXON_DESIGNATION.TAXON_DESIGNATION_TYPE_KEY =
    TAXON_DESIGNATION_TYPE.TAXON_DESIGNATION_TYPE_KEY)
ON TAXON_LIST_ITEM.TAXON_LIST_ITEM_KEY = TAXON_DESIGNATION.TAXON_LIST_ITEM_KEY)
INNER JOIN NAMESERVER ON TAXON_LIST_ITEM.TAXON_VERSION_KEY = NAMESERVER.INPUT_TAXON_VERSION_KEY;

Query 6 - title=taxon_designations

Retrieves a list of all the available taxon designations that can be linked to taxon concepts.

SELECT TAXON_DESIGNATION_TYPE.TAXON_DESIGNATION_TYPE_KEY, TAXON_DESIGNATION_TYPE.SHORT_NAME,
TAXON_DESIGNATION_TYPE.LONG_NAME, TAXON_DESIGNATION_TYPE.DESCRIPTION, TAXON_DESIGNATION_TYPE.KIND,
TAXON_DESIGNATION_TYPE.Status_Abbreviation
FROM TAXON_DESIGNATION_TYPE;

Query 7 - title=taxon_groups

Retrieves a list of all the taxon groups (reporting categories).

SELECT DISTINCT tg.taxon_group_key, tg.taxon_group_name,
IIf(tg.input_level2_descriptor Is Null, tg.input_level1_descriptor, tg.input_level2_descriptor) AS description,
tg.parent
FROM (taxon_group_name AS tg LEFT JOIN taxon_group_name AS tg2 ON tg2.parent=tg.taxon_group_key)
LEFT JOIN taxon_version AS tv ON tv.output_group_key=tg.taxon_group_key
WHERE tg2.taxon_group_key IS NOT NULL OR tv.taxon_version_key IS NOT NULL;

Query 8 - title=taxon_ranks

Retrieves a list of all possible taxon ranks, e.g. Phylum, Family, Species.

SELECT TAXON_RANK.SEQUENCE, TAXON_RANK.SHORT_NAME, TAXON_RANK.LONG_NAME, TAXON_RANK.LIST_FONT_ITALIC
FROM TAXON_RANK;

Query 9 - title=tcn_duplicates

Where there are multiple common names and it is otherwise not possible to pick a single default one to use in reports, this table provides a link from the organims to a taxon record containing a common name to use.

SELECT ORGANISM_MASTER2.ORGANISM_KEY, TCN_DUPLICATE_FIX.TAXON_VERSION_KEY
FROM ORGANISM_MASTER2
INNER JOIN (TAXON_LIST_ITEM
INNER JOIN TCN_DUPLICATE_FIX ON TAXON_LIST_ITEM.TAXON_LIST_ITEM_KEY = TCN_DUPLICATE_FIX.TAXON_LIST_ITEM_KEY)
ON ORGANISM_MASTER2.TAXON_VERSION_KEY = TAXON_LIST_ITEM.TAXON_VERSION_KEY;

Query 10 - title=all_taxon_version_keys

Retrieves a list of all taxon version keys and the associated recommended key, including deleted and redundant names. Can be used to work out the context of any names which have now been removed from the online recording copy of UKSI.

SELECT INPUT_TAXON_VERSION_KEY, RECOMMENDED_TAXON_VERSION_KEY
FROM NAMESERVER;

Query 11 - title=recording_schemes

Retrieves all extant recording schemes - their keys and their titles.

SELECT SCHEME_KEY, SCHEME_NAME
FROM SCHEME
WHERE DELETED_DATE IS NULL;

Query 12 - title=recording_scheme_taxa

Retrieves the relationships between recording schemes and their taxa.

SELECT SCHEME_KEY, ORGANISM_KEY
FROM SCHEME_TAXA
WHERE DELETED_DATE IS NULL;

Now, run query 1 which fixes ORGANISM_MASTER records so they reliably point to an accepted name, never a synonym.

The next step is to export the query results for each of the queries from 2 through to 10 as a text file. Prepare a folder on your hard disk into which you will export the files (I used c:\tmp). These instructions are for Microsoft Access 2007 but the steps should be similar for other versions. For each query:

  1. Select the External Data ribbon tab.

  2. Under Export, choose the Text File option.

  3. Set the file name to export to in the folder you prepared earlier. The file name should be the query title with a .txt extension, e.g. all_names.txt.

  4. Click OK.

  5. On the Export Text Wizard select the Delimited text option then click Next.

  6. Set the delimiter to Comma and the Text Qualifier to a double quote character.

  7. Select the Advanced button and on the dialog that appears, change Code Page to Unicode (UTF-8) then click OK.

  8. Click Next then Finish to export the file.

Now that you have exported the files, follow through the steps in the next section “Starting with the exported text files” to complete the import. You will need to replace the files obtained with the UKSI sync tool with the files you have generated.

Starting with the exported text files

  1. If you don’t already have a species list on the warehouse ready to import the taxa into, then create one now. You can use the normal Warehouse user interface to do this (Taxonomy > Species lists). Make a note of the ID of the list.

  2. As the UKSI data is structurally complex you need to add a number of extension modules to add the necessary tables. To do this:

    1. Find the file application/config/config.php in your warehouse installation folder and open it in a text editor.

    2. Find the list of modules at the bottom of the page.

    3. Add an entry for several modules by adding the following lines into the list:

    MODPATH.'taxon_designations',
    MODPATH.'taxon_associations',
    MODPATH.'species_alerts',
    MODPATH.'data_cleaner_without_polygon',
    MODPATH.'data_cleaner_period_within_year'
    
    1. Log into your warehouse and visit the index.php/home/upgrade page to ensure that database upgrade scripts are run.

  3. Grab a copy of the files from https://github.com/Indicia-Team/support_files/tree/master/UKSI and follow the instructions in readme file. If you generated your own copy of the files from the Access database make sure you replace the downloaded copies with your versions. You will have to supply a user ID when you run the scripts (as described in the readme file). Look in the users table of your indicia schema to get the id. If you use the id of your main admin account the number is likely to be 1.