Bits & Bobs, Mushstools & Toadrooms

…Despite it being a chilly & wintery March up here in the White Mountains, there is no shortage of fun birds and exciting projects!

Merlin AI pipeline for Mushroom identification!

It’s happening, and it’s going to be awesome! Visit this project over here on GitHub:

git clone https://github.com/Jesssullivan/image-identifer/ && cd image-identifer

Overview:

  • Setup
  • Artifacts
  • Preprocess
  • Train
  • Structures
  • Notes

Setup:

venv:

python3 -m venv mushroomobserver_venv
source mushroomobserver_venv/bin/activate
pip3 install -r requirements.txt

Artifacts:

  • train.tgz
  • test.tgz
  • images.tgz
  • images.json
  • gbif.zip

Preprocess:

python3 preprocess

  • Fetches & saves the GBIF archive to ./static/
    • Checks the archive and tries loading it into memory, etc.
  • Fetches the Leaflet Annotator binary & licenses from JessSullivan/MerlinAI-Interpreters; still need to commit the annotator (as of 03/16/21), and still fussing with a version for Mushroom Observer
  • Generates an images.json file from the 500 assets selected by Joe & Nathan
  • Downloads & organizes the 500 selected assets from images.mushroomobserver.org into ./static/images/<category>/<id>.jpg
    • Writes out the images archive
  • More or less randomly divvies up the testing & training image sets
    • Writes out example testing/training archives (when training, it’ll probably be easier to resample directly from images.tgz within Keras)
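The split-and-archive step above could look roughly like this stdlib-only sketch. It assumes the images are already laid out as ./static/images/<category>/<id>.jpg; the function name `split_and_archive`, the 20% test fraction and the fixed seed are illustrative assumptions, not the project's actual preprocess code. Only the archive names come from the artifact list.

```python
import random
import tarfile
from pathlib import Path


def split_and_archive(images_dir, out_dir, test_fraction=0.2, seed=42):
    """Hypothetical helper: shuffle every <category>/<id>.jpg under
    images_dir, hold out test_fraction of them for testing, and write
    train.tgz / test.tgz into out_dir. Returns (train_paths, test_paths)."""
    images_dir = Path(images_dir)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Sort before shuffling so a given seed gives the same split on any
    # filesystem (glob order is not guaranteed).
    paths = sorted(images_dir.glob("*/*.jpg"))
    random.Random(seed).shuffle(paths)

    n_test = int(len(paths) * test_fraction)
    test_paths, train_paths = paths[:n_test], paths[n_test:]

    for name, subset in (("train.tgz", train_paths), ("test.tgz", test_paths)):
        with tarfile.open(str(out_dir / name), "w:gz") as tar:
            for path in subset:
                # Keep <category>/<id>.jpg inside the archive so labels can
                # still be inferred from directory names after extraction.
                tar.add(str(path), arcname=path.relative_to(images_dir).as_posix())
    return train_paths, test_paths
```

For example, `split_and_archive("./static/images", "./static")` would write train.tgz and test.tgz next to images.tgz. Sorting before the seeded shuffle keeps the split reproducible if the archives ever get regenerated.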

Train:

python3 train

  • Fetches, divvies & shuffles train / validation sets from within Keras, using the archive available at mo.columbari.us/static/images.tgz
  • As of 03/17/21, more or less running Google’s demo transfer learning training script in train/training_v1.py; still need to bring in training operations and whatnot from the merlin_ai/ repo, then experiment with the Danish Mycology Society’s ImageNet v4 notes
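Keras' `image_dataset_from_directory` infers integer labels from sub-directory names (in alphanumeric order) and, given `validation_split`, `subset` and a `seed`, carves training and validation subsets out of one shuffled file list. The stdlib-only sketch below imitates that bookkeeping on an already-extracted copy of images.tgz, so the label mapping can be inspected without TensorFlow. `index_directory` and its defaults are hypothetical, and it does not reproduce Keras' exact shuffle order.

```python
import random
from pathlib import Path


def index_directory(root, validation_split=0.2, seed=123):
    """Hypothetical sketch of directory-based labelling: class names are
    the sorted sub-directory names, each image gets the index of its
    class, and a seeded shuffle decides which images go to validation."""
    root = Path(root)
    class_names = sorted(p.name for p in root.iterdir() if p.is_dir())
    label_of = {name: i for i, name in enumerate(class_names)}

    samples = [
        (path, label_of[path.parent.name])
        for path in sorted(root.glob("*/*.jpg"))
    ]
    # Fixed seed so repeated runs give the same split (Keras needs the same
    # seed for its "training" and "validation" subsets for the same reason).
    random.Random(seed).shuffle(samples)

    # The last validation_split fraction of the shuffled list is held out.
    n_val = int(len(samples) * validation_split)
    split_at = len(samples) - n_val
    return class_names, samples[:split_at], samples[split_at:]
```

With only a few hundred images across many species, a plain random split like this can leave a rare category with no validation images at all; splitting per category would be the obvious refinement.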

Google Colab:

Jupyter:

  • One may also open and run notebooks locally like this:

    • a rendered PDF version is available over here

    • copy the IPython notebook out of its .bak file:

      cp train/notebook/training_v1.ipynb.bak train/notebook/training_v1.ipynb

    • launch jupyter:

      jupyter notebook

    • or without authentication:

      jupyter notebook --ip='*' --NotebookApp.token='' --NotebookApp.password=''

Structures:

  • Leaflet Annotator images.json structure:

    • id: taxonID, the MO taxon id
    • category_id: The binomen defined in ./static/sample_select_assets.csv; for directories and URIs this is converted to snake case.
    • url: Temporary elastic IP address this asset will be served from, to avoid excessive / redundant traffic to images.mushroomobserver.org
    • src: imageURL, the asset’s source URL from Mushroom Observer

      [{
        "id": "12326",
        "category_id": "Peltula euploca",
        "url": "https://mo.columbari.us/static/images/peltula_euploca/290214.jpg",
        "src": "https://images.mushroomobserver.org/640/290214.jpg"
      }]
  • Selected asset directory structure:

    ├── static
    │   ├── gbif.zip
    │   ├── images
    │   │   ├── …
    │   │   └── peltula_euploca
    │   │       ├── 290214.jpg
    │   │       ├── …
    │   │       └── 522128.jpg
    │   ├── …
    │   ├── images.json
    │   ├── images.tgz
    │   ├── js
    │   │   ├── leaflet.annotation.js
    │   │   └── leaflet.annotation.js.LICENSE.txt
    │   └── sample_select_assets.csv
    …
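Building one images.json entry per selected asset is mostly string formatting. A minimal sketch, assuming each selected asset is available as a (taxon id, image id, binomen) triple; the column layout of sample_select_assets.csv isn't shown here, so `make_entry`, `snake_case`, `local_path`, `write_images_json` and the triple format are all hypothetical. The URL patterns mirror the example entry above: the 640 px image on images.mushroomobserver.org as `src`, and the mirrored copy under mo.columbari.us/static/images/<snake_case binomen>/ as `url`.

```python
import json
import re

# URL patterns copied from the example images.json entry; the mirror host
# was described as temporary.
MO_IMAGE_BASE = "https://images.mushroomobserver.org/640"
MIRROR_BASE = "https://mo.columbari.us/static/images"


def snake_case(binomen):
    """'Peltula euploca' -> 'peltula_euploca': lower-case, with each run of
    non-alphanumeric characters collapsed to a single underscore."""
    return re.sub(r"[^a-z0-9]+", "_", binomen.strip().lower()).strip("_")


def make_entry(taxon_id, image_id, binomen):
    """Hypothetical helper: build one images.json record."""
    category_dir = snake_case(binomen)
    return {
        "id": str(taxon_id),
        "category_id": binomen,
        "url": f"{MIRROR_BASE}/{category_dir}/{image_id}.jpg",
        "src": f"{MO_IMAGE_BASE}/{image_id}.jpg",
    }


def local_path(binomen, image_id, root="./static/images"):
    """Local download target, following ./static/images/<category>/<id>.jpg."""
    return f"{root}/{snake_case(binomen)}/{image_id}.jpg"


def write_images_json(assets, path="./static/images.json"):
    """Write (taxon_id, image_id, binomen) triples out as images.json."""
    entries = [make_entry(*asset) for asset in assets]
    with open(path, "w", encoding="utf-8") as f:
        json.dump(entries, f, indent=2)
    return entries
```

`make_entry(12326, 290214, "Peltula euploca")` reproduces the example entry above, and `local_path` gives the matching spot in the directory tree.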

Notes:

Fiddling with the archive:

  • MODwca.gbif[1].id: Integer: This is the Mushroom Observer taxon id, e.g.

    • https://mushroomobserver.org/13
    • https://images.mushroomobserver.org/640/13.jpg
  • MODwca.gbif[1].data: Dictionary: DwC-A row data, e.g.

    • MODwca.gbif[1].data['http://rs.gbif.org/terms/1.0/gbifID'] = 13
    • MODwca.gbif[1].data['http://rs.tdwg.org/dwc/terms/recordedBy'] = Nathan Wilson
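Rows shaped like the ones above (an `.id` plus a `.data` dict keyed by full term URIs) can be pulled out of a Darwin Core Archive with the standard library alone, by reading the archive's meta.xml, which maps column indexes to term URIs. This is a hypothetical sketch, not the project's `MODwca` class (whose code isn't shown here): `read_core_rows` and `Row` are made-up names, it reads only the core file (no extensions), it assumes meta.xml uses the `http://rs.tdwg.org/dwc/text/` namespace, and every value, `.id` included, comes back as a string.

```python
import csv
import io
import zipfile
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# Namespace used by Darwin Core Archive meta.xml files (assumed here).
NS = "{http://rs.tdwg.org/dwc/text/}"


@dataclass
class Row:
    id: str  # value of the <id index="..."> column, left as a string
    data: dict = field(default_factory=dict)  # term URI -> value


def read_core_rows(zip_path):
    """Hypothetical stdlib reader for the core file of a DwC-A zip."""
    with zipfile.ZipFile(zip_path) as zf:
        core = ET.fromstring(zf.read("meta.xml")).find(f"{NS}core")
        location = core.find(f"{NS}files/{NS}location").text
        id_index = int(core.find(f"{NS}id").get("index"))
        # Map column index -> full term URI; fields without an index only
        # carry a constant default and are skipped in this sketch.
        terms = {
            int(f.get("index")): f.get("term")
            for f in core.findall(f"{NS}field")
            if f.get("index") is not None
        }
        # meta.xml stores separators escaped (e.g. the two characters "\t"),
        # so decode them; the tab fallback is this sketch's own choice.
        delimiter = core.get("fieldsTerminatedBy", "\\t").encode().decode("unicode_escape")
        quote = core.get("fieldsEnclosedBy", "")
        fmt = {"delimiter": delimiter}
        if quote:
            fmt["quotechar"] = quote
        else:
            fmt["quoting"] = csv.QUOTE_NONE
        skip = int(core.get("ignoreHeaderLines", "0"))
        encoding = core.get("encoding", "UTF-8")

        rows = []
        with io.TextIOWrapper(zf.open(location), encoding=encoding, newline="") as text:
            reader = csv.reader(text, **fmt)
            for _ in range(skip):
                next(reader, None)  # skip header line(s)
            for values in reader:
                if not values:
                    continue  # tolerate blank lines
                data = {t: values[i] for i, t in terms.items() if i < len(values)}
                rows.append(Row(id=values[id_index], data=data))
        return rows
```

Usage would be something like `rows = read_core_rows("./static/gbif.zip")`, then `rows[1].data["http://rs.tdwg.org/dwc/terms/recordedBy"]`. For anything beyond poking at the core file, the python-dwca package's `DwCAReader` is a more complete option.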

Originally published at www.transscendsurvival.org


© 2026 Jess Sullivan
