Importing the UK Species Index into an Indicia list - full syncrhonisation ========================================================================== This page describes the full-synchronisation technique for importing the UK Species Index data into an Indicia species list ready for recording. Once synchronisation has been performed at least once, you can use the :doc:`incremental updates ` technique for further updates. There are 2 starting points for this process depending on whether you are going to generate a fresh export of data from the UKSI Microsoft Access database or use the extract provided for you the last time the sync tool was updated. The process is complex so it would be wise to backup your warehouse database first and to plan to do the task when the warehouse is not under high load or being used for anything important such as a demonstration. The GitHub repository for the synchronisation process (https://github.com/Indicia-Team/support_files/tree/master/UKSI) contains the text files extracted from the UKSI Microsoft Access database the last time the process was run. You can either use the copies in this repository, or read the notes in the following section if you need to generate a fresh copy of the UKSI data. Starting with the Microsoft Access database ------------------------------------------- You should have been provided with the original UKSI Microsoft Access database, as "curated" by London Natural History Museum. In the list of tables, right click on the ORGANISM_MASTER table, select to copy it, then right click again and paste it, setting the new table name to ORGANISM_MASTER2. To export the data into a text file format ready for the import, create the following queries in your database. To add each query in Microsoft Access, first, create the query in Design View by following these steps: * Go to the Create tab on the ribbon, then click the Query Design button. * Close the Show Table dialog. * Use the Views toolbutton to change to SQL View. * Then paste the query in and save the query with the appropriate title. * Repeat until all 10 queries are created. **Query 1 - title=repair_organism_tvks** This query ensures that organisms point to the accepted name TVKs, not synonyms, so we can make this assumption in later queries. .. code-block:: sql UPDATE ORGANISM_MASTER2 INNER JOIN NAMESERVER ON NAMESERVER.INPUT_TAXON_VERSION_KEY=ORGANISM_MASTER2.TAXON_VERSION_KEY SET ORGANISM_MASTER2.TAXON_VERSION_KEY=NAMESERVER.RECOMMENDED_TAXON_VERSION_KEY WHERE NAMESERVER.RECOMMENDED_TAXON_VERSION_KEY<>ORGANISM_MASTER2.TAXON_VERSION_KEY **Query 2 - title=all_designation_kinds** This query gets the list of terms used to populate the categories of taxon designations. .. code-block:: sql SELECT TAXON_DESIGNATION_TYPE_KIND.TAXON_DESIGNATION_TYPE_KIND_KEY, TAXON_DESIGNATION_TYPE_KIND.KIND FROM TAXON_DESIGNATION_TYPE_KIND; **Query 3 - title=all_names** This query gets all taxon names, excluding those which are explicitly marked as redundant or deleted. Contains basic name data and a link to the preferred name. .. code-block:: sql SELECT ORGANISM_MASTER2.ORGANISM_KEY, NAMESERVER.RECOMMENDED_TAXON_VERSION_KEY, NAMESERVER.INPUT_TAXON_VERSION_KEY, TAXON.ITEM_NAME, TAXON.AUTHORITY, NAMESERVER.TAXON_VERSION_FORM, NAMESERVER.TAXON_VERSION_STATUS, NAMESERVER.TAXON_TYPE, TAXON.LANGUAGE, TAXON_VERSION.OUTPUT_GROUP_KEY, TAXON_RANK.LONG_NAME, TAXON_VERSION.ATTRIBUTE, TAXON_RANK.SHORT_NAME, IIF(NAMESERVER.DELETED_DATE Is Null AND ORGANISM_MASTER2.DELETED_DATE Is Null AND ORGANISM_MASTER2.REDUNDANT_FLAG Is Null AND TAXON_VERSION.DELETED_DATE Is Null AND TAXON_VERSION.DATE_TO Is Null, 'N', 'Y') AS REDUNDANT, IIF(NAMESERVER.DELETED_DATE Is Null AND ORGANISM_MASTER2.DELETED_DATE Is Null AND ORGANISM_MASTER2.REDUNDANT_FLAG Is Null, 'N', 'Y') AS ORGANISM_DEPRECATED, IIF(TAXON_VERSION.DELETED_DATE Is Null AND TAXON_VERSION.DATE_TO Is Null, 'N', 'Y') AS NAME_DEPRECATED FROM ((TAXON INNER JOIN (NAMESERVER INNER JOIN TAXON_VERSION ON NAMESERVER.INPUT_TAXON_VERSION_KEY = TAXON_VERSION.TAXON_VERSION_KEY) ON TAXON.TAXON_KEY = TAXON_VERSION.TAXON_KEY) INNER JOIN ORGANISM_MASTER2 ON NAMESERVER.RECOMMENDED_TAXON_VERSION_KEY = ORGANISM_MASTER2.TAXON_VERSION_KEY) INNER JOIN TAXON_RANK ON TAXON_VERSION.TAXON_RANK_KEY = TAXON_RANK.TAXON_RANK_KEY; **Query 4 - title=preferred_names** This query gets a list of the preferred taxon names, excluding those which are explicitly marked as redundant or deleted. .. todo:: Sort code (from taxon list item, recorder 3.3 list?) .. code-block:: sql SELECT DISTINCT ORGANISM_MASTER2.ORGANISM_KEY, ORGANISM_MASTER2.TAXON_VERSION_KEY, TAXON.ITEM_NAME, TAXON.AUTHORITY, ORGANISM_MASTER2.PARENT_TVK, ORGANISM_MASTER2.PARENT_KEY, TAXON_VERSION.TAXON_RANK_KEY, TAXON_RANK.SEQUENCE, TAXON_RANK.LONG_NAME, TAXON_RANK.SHORT_NAME, ORGANISM_MASTER2.MARINE_FLAG, ORGANISM_MASTER2.FRESHWATER AS FRESHWATER_FLAG, ORGANISM_MASTER2.TERRESTRIAL_FRESHWATER_FLAG AS TERRESTRIAL_FLAG, ORGANISM_MASTER2.NON_NATIVE_FLAG, NULL AS SORT_CODE, IIF(NAMESERVER.DELETED_DATE Is Null AND ORGANISM_MASTER2.DELETED_DATE Is Null AND ORGANISM_MASTER2.REDUNDANT_FLAG Is Null AND TAXON_VERSION.DELETED_DATE Is Null AND TAXON_VERSION.DATE_TO Is Null, 'N', 'Y') AS REDUNDANT FROM (((TAXON_LIST_ITEM INNER JOIN (TAXON INNER JOIN (TAXON_VERSION INNER JOIN (ORGANISM_MASTER2 INNER JOIN NAMESERVER ON NAMESERVER.INPUT_TAXON_VERSION_KEY=ORGANISM_MASTER2.TAXON_VERSION_KEY) ON TAXON_VERSION.TAXON_VERSION_KEY=ORGANISM_MASTER2.TAXON_VERSION_KEY) ON TAXON.TAXON_KEY=TAXON_VERSION.TAXON_KEY) ON TAXON_LIST_ITEM.TAXON_VERSION_KEY=TAXON_VERSION.TAXON_VERSION_KEY) INNER JOIN TAXON_LIST_VERSION ON TAXON_LIST_ITEM.TAXON_LIST_VERSION_KEY=TAXON_LIST_VERSION.TAXON_LIST_VERSION_KEY) INNER JOIN TAXON_LIST ON TAXON_LIST_VERSION.TAXON_LIST_KEY=TAXON_LIST.TAXON_LIST_KEY) INNER JOIN TAXON_RANK ON TAXON_VERSION.TAXON_RANK_KEY=TAXON_RANK.TAXON_RANK_KEY; **Query 5 - title=taxa_taxon_designations** Retrieves a list of the links between all taxon names and their designations. .. code-block:: sql SELECT TAXON_DESIGNATION_TYPE.SHORT_NAME, TAXON_DESIGNATION.DATE_FROM, TAXON_DESIGNATION.DATE_TO, TAXON_DESIGNATION.STATUS_GEOGRAPHIC_AREA, TAXON_DESIGNATION.DETAIL, NAMESERVER.RECOMMENDED_TAXON_VERSION_KEY FROM (TAXON_LIST_ITEM INNER JOIN (TAXON_DESIGNATION INNER JOIN TAXON_DESIGNATION_TYPE ON TAXON_DESIGNATION.TAXON_DESIGNATION_TYPE_KEY = TAXON_DESIGNATION_TYPE.TAXON_DESIGNATION_TYPE_KEY) ON TAXON_LIST_ITEM.TAXON_LIST_ITEM_KEY = TAXON_DESIGNATION.TAXON_LIST_ITEM_KEY) INNER JOIN NAMESERVER ON TAXON_LIST_ITEM.TAXON_VERSION_KEY = NAMESERVER.INPUT_TAXON_VERSION_KEY; **Query 6 - title=taxon_designations** Retrieves a list of all the available taxon designations that can be linked to taxon concepts. .. code-block:: sql SELECT TAXON_DESIGNATION_TYPE.TAXON_DESIGNATION_TYPE_KEY, TAXON_DESIGNATION_TYPE.SHORT_NAME, TAXON_DESIGNATION_TYPE.LONG_NAME, TAXON_DESIGNATION_TYPE.DESCRIPTION, TAXON_DESIGNATION_TYPE.KIND, TAXON_DESIGNATION_TYPE.Status_Abbreviation FROM TAXON_DESIGNATION_TYPE; **Query 7 - title=taxon_groups** Retrieves a list of all the taxon groups (reporting categories). .. code-block:: sql SELECT DISTINCT tg.taxon_group_key, tg.taxon_group_name, IIf(tg.input_level2_descriptor Is Null, tg.input_level1_descriptor, tg.input_level2_descriptor) AS description, tg.parent FROM (taxon_group_name AS tg LEFT JOIN taxon_group_name AS tg2 ON tg2.parent=tg.taxon_group_key) LEFT JOIN taxon_version AS tv ON tv.output_group_key=tg.taxon_group_key WHERE tg2.taxon_group_key IS NOT NULL OR tv.taxon_version_key IS NOT NULL; **Query 8 - title=taxon_ranks** Retrieves a list of all possible taxon ranks, e.g. Phylum, Family, Species. .. code-block:: sql SELECT TAXON_RANK.SEQUENCE, TAXON_RANK.SHORT_NAME, TAXON_RANK.LONG_NAME, TAXON_RANK.LIST_FONT_ITALIC FROM TAXON_RANK; **Query 9 - title=tcn_duplicates** Where there are multiple common names and it is otherwise not possible to pick a single default one to use in reports, this table provides a link from the organims to a taxon record containing a common name to use. .. code-block:: sql SELECT ORGANISM_MASTER2.ORGANISM_KEY, TCN_DUPLICATE_FIX.TAXON_VERSION_KEY FROM ORGANISM_MASTER2 INNER JOIN (TAXON_LIST_ITEM INNER JOIN TCN_DUPLICATE_FIX ON TAXON_LIST_ITEM.TAXON_LIST_ITEM_KEY = TCN_DUPLICATE_FIX.TAXON_LIST_ITEM_KEY) ON ORGANISM_MASTER2.TAXON_VERSION_KEY = TAXON_LIST_ITEM.TAXON_VERSION_KEY; **Query 10 - title=all_taxon_version_keys** Retrieves a list of all taxon version keys and the associated recommended key, including deleted and redundant names. Can be used to work out the context of any names which have now been removed from the online recording copy of UKSI. .. code-block:: sql SELECT INPUT_TAXON_VERSION_KEY, RECOMMENDED_TAXON_VERSION_KEY FROM NAMESERVER; **Query 11 - title=recording_schemes** Retrieves all extant recording schemes - their keys and their titles. .. code-block:: sql SELECT SCHEME_KEY, SCHEME_NAME FROM SCHEME WHERE DELETED_DATE IS NULL; **Query 12 - title=recording_scheme_taxa** Retrieves the relationships between recording schemes and their taxa. .. code-block:: sql SELECT SCHEME_KEY, ORGANISM_KEY FROM SCHEME_TAXA WHERE DELETED_DATE IS NULL; Now, run query 1 which fixes ORGANISM_MASTER records so they reliably point to an accepted name, never a synonym. The next step is to export the query results for each of the queries from 2 through to 10 as a text file. Prepare a folder on your hard disk into which you will export the files (I used ``c:\tmp``). These instructions are for Microsoft Access 2007 but the steps should be similar for other versions. For each query: #. Select the **External Data** ribbon tab. #. Under **Export**, choose the **Text File** option. #. Set the file name to export to in the folder you prepared earlier. The file name should be the query title with a ``.txt`` extension, e.g. ``all_names.txt``. #. Click OK. #. On the **Export Text Wizard** select the **Delimited** text option then click Next. #. Set the delimiter to **Comma** and the **Text Qualifier** to a double quote character. #. Select the Advanced button and on the dialog that appears, change **Code Page** to **Unicode (UTF-8)** then click **OK**. #. Click Next then Finish to export the file. Now that you have exported the files, follow through the steps in the next section "Starting with the exported text files" to complete the import. You will need to replace the files obtained with the UKSI sync tool with the files you have generated. Starting with the exported text files ------------------------------------- #. If you don't already have a species list on the warehouse ready to import the taxa into, then create one now. You can use the normal Warehouse user interface to do this (Taxonomy > Species lists). Make a note of the ID of the list. #. As the UKSI data is structurally complex you need to add a number of extension modules to add the necessary tables. To do this: #. Find the file ``application/config/config.php`` in your warehouse installation folder and open it in a text editor. #. Find the list of modules at the bottom of the page. #. Add an entry for several modules by adding the following lines into the list: .. code-block:: php MODPATH.'taxon_designations', MODPATH.'taxon_associations', MODPATH.'species_alerts', MODPATH.'data_cleaner_without_polygon', MODPATH.'data_cleaner_period_within_year' #. Log into your warehouse and visit the ``index.php/home/upgrade`` page to ensure that database upgrade scripts are run. #. Grab a copy of the files from https://github.com/Indicia-Team/support_files/tree/master/UKSI and follow the instructions in readme file. If you generated your own copy of the files from the Access database make sure you replace the downloaded copies with your versions. You will have to supply a user ID when you run the scripts (as described in the readme file). Look in the users table of your indicia schema to get the id. If you use the id of your main admin account the number is likely to be 1.