Bits & Bobs, Mushstools & Toadrooms
…Despite being a chilly & wintery March up here in the White Mountains, there is no shortage of fun birds and exciting projects!
So many Redpolls are keeping the Juncos company this year! Two pairs of Hooded Mergansers moved in just next door last week!
Local Sharp-shinned Hawks & Fluffy Red Foxes have been busy careening around town gobbling up prey left and right; they seem to know Spring is right around the corner!
Merlin AI pipeline for Mushroom identification!
It’s happening, and it’s going to be awesome! Visit the project over on GitHub:
git clone https://github.com/Jesssullivan/image-identifer/ && cd image-identifer
Overview:
- Setup
- Artifacts
- Preprocess
- Artifacts
- Train
- Structures
- Notes
Setup:
venv:
python3 -m venv mushroomobserver_venv
source mushroomobserver_venv/bin/activate
pip3 install -r requirements.txt
Artifacts:
- train.tgz
- test.tgz
- images.tgz
- images.json
- gbif.zip
Preprocess:
python3 preprocess
- Fetches & saves off gbif archive to ./static/
- Checks the archive, tries loading it into memory etc
- Fetches Leaflet Annotator binary & licenses from JessSullivan/MerlinAI-Interpreters; need to commit annotator (as of 03/16/21), still fussing with a version for Mushroom Observer
- Generates an images.json file from the 500 assets selected by Joe & Nathan
- Downloads, organizes the 500 selected assets from images.mushroomobserver.org at ./static/images/<category>/<id>.jpg
- Writes out images archive
- More or less randomly divvies up testing & training image sets
- Writes out example testing/training archives (while training it’ll probably be easier to resample directly from images.tgz from Keras)
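The "more or less randomly divvies up" step could look roughly like the sketch below. This is only an illustration, not the repo's actual logic: the helper name `split_by_category`, the 20% test share, the fixed seed and the per-category shuffling are all assumptions, and the toy records are made up.

```python
import random
from collections import defaultdict


def split_by_category(records, test_fraction=0.2, seed=0):
    """Shuffle records within each category, then split off a test share.

    Hypothetical helper: the real preprocess step may split differently.
    """
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    by_category = defaultdict(list)
    for rec in records:
        by_category[rec["category_id"]].append(rec)

    train, test = [], []
    for category in sorted(by_category):
        items = by_category[category]
        rng.shuffle(items)
        # Keep at least one test image per category when there is more than one image.
        n_test = max(1, round(len(items) * test_fraction)) if len(items) > 1 else 0
        test.extend(items[:n_test])
        train.extend(items[n_test:])
    return train, test


# Toy records; the ids and the second category name are invented for illustration.
records = [{"id": str(i), "category_id": "Peltula euploca"} for i in range(10)]
records += [{"id": str(i), "category_id": "Example species"} for i in range(10, 15)]

train, test = split_by_category(records)
print(len(train), "train /", len(test), "test")
```

Splitting inside each category (rather than over the whole pile) keeps small categories from landing entirely in one set, which matters with only 500 images.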
Train:
python3 train
- Fetches, divvies & shuffles train / validation sets from within Keras using archive available at mo.columbari.us/static/images.tgz
- More or less running Google’s demo transfer learning training script in train/training_v1.py as of 03/17/21; still need to bring in training operations and whatnot from merlin_ai/ repo -> experiment with Danish Mycology Society’s ImageNet v4 notes
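For a feel of what divvying up train / validation sets "from within Keras" can look like, here is a minimal sketch using `tf.keras.utils.image_dataset_from_directory`. It is not the contents of train/training_v1.py: the function name `load_datasets`, the 80/20 split, the image size and the seed are assumptions, and it expects images.tgz to already be unpacked into a directory with one subfolder per category.

```python
def load_datasets(image_dir, image_size=(160, 160), batch_size=32, seed=123):
    """Return (train_ds, val_ds) built from image_dir/<category>/<id>.jpg.

    Hypothetical helper; parameter defaults are illustrative assumptions.
    """
    # Imported lazily so this module still loads where TensorFlow is absent.
    import tensorflow as tf

    common = dict(
        validation_split=0.2,  # assumed 80/20 train/validation split
        seed=seed,             # the same seed in both calls keeps the subsets disjoint
        image_size=image_size,
        batch_size=batch_size,
    )
    train_ds = tf.keras.utils.image_dataset_from_directory(
        image_dir, subset="training", **common
    )
    val_ds = tf.keras.utils.image_dataset_from_directory(
        image_dir, subset="validation", **common
    )
    return train_ds, val_ds
```

With the archive unpacked locally, something like `train_ds, val_ds = load_datasets("./static/images")` should hand back two shuffled, batched datasets, with class names inferred from the category folder names.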
Google Colab:
- @gvanhorn38 pointed out that Google Colab’s neat Jupyter notebook service will train models for free if things are small enough. I have no idea what the limits are. Fiddle with their intro to image classification on Google Colab; it’s super cool!
Jupyter:
One may also open and run notebooks locally like this:
rename ipython notebook:
cp train/notebook/training_v1.ipynb.bak train/notebook/training_v1.ipynb
launch jupyter:
jupyter notebook
or without authentication:
jupyter notebook --ip='*' --NotebookApp.token='' --NotebookApp.password=''
Structures:
Leaflet Annotator images.json Structure:
- id : taxonID The MO taxon id
- category_id : The binomen defined in ./static/sample_select_assets.csv; for directories and URIs this is converted to snake case.
- url : Temporary elastic IP address this asset will be available from, just to reduce any excessive / redundant traffic to images.mushroomobserver.org
- src : imageURL The asset’s source URL from Mushroom Observer

[{
  "id": "12326",
  "category_id": "Peltula euploca",
  "url": "https://mo.columbari.us/static/images/peltula_euploca/290214.jpg",
  "src": "https://images.mushroomobserver.org/640/290214.jpg"
}]
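To make the field mapping concrete, here is a small sketch that rebuilds the example record from its parts. The helper names `to_snake_case` and `make_record` are hypothetical, and the two base URLs are simply copied from the example record rather than taken from the repo's code.

```python
import json
import re

# Base URLs copied from the example record; treat them as assumptions.
MIRROR_BASE = "https://mo.columbari.us/static/images"
SOURCE_BASE = "https://images.mushroomobserver.org/640"


def to_snake_case(binomen):
    """Lowercase a binomen and join its words with underscores."""
    return re.sub(r"\s+", "_", binomen.strip()).lower()


def make_record(taxon_id, binomen, image_id):
    """Build one images.json entry (hypothetical helper, not from the repo)."""
    category_dir = to_snake_case(binomen)
    return {
        "id": str(taxon_id),
        "category_id": binomen,  # stored as the plain binomen
        "url": f"{MIRROR_BASE}/{category_dir}/{image_id}.jpg",
        "src": f"{SOURCE_BASE}/{image_id}.jpg",
    }


record = make_record(12326, "Peltula euploca", 290214)
print(json.dumps([record], indent=2))
```

Note that only the directory and URL use the snake case form; category_id itself keeps the readable binomen.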
Selected asset directory structure:
├── static
    ├── gbif.zip
    ├── images
    │   …
    │   └── peltula_euploca
    │       ├── 290214.jpg
    │       …
    │       └── 522128.jpg
    │   …
    ├── images.json
    ├── images.tgz
    ├── js
    │   ├── leaflet.annotation.js
    │   └── leaflet.annotation.js.LICENSE.txt
    └── sample_select_assets.csv
…
Notes:
Fiddling with the archive:
- MODwca.gbif[1].id: Integer: This is the Mushroom Observer taxon id, e.g.
  - https://mushroomobserver.org/13
  - https://images.mushroomobserver.org/640/13.jpg
- MODwca.gbif[1].data: Dictionary: DWCA row data, e.g.
  - MODwca.gbif[1].data['http://rs.gbif.org/terms/1.0/gbifID'] = 13
  - MODwca.gbif[1].data['http://rs.tdwg.org/dwc/terms/recordedBy'] = Nathan Wilson
Originally published at www.transscendsurvival.org