Key Points
Introduction
Classifying T-cells
Evaluating a Model
Decision Trees, Random Forests, and Overfitting
Logistic Regression, Artificial Neural Networks, and Linear Separability
Conclusion and next steps
Glossary and other resources
The Google machine learning glossary and ML4Bio guides define common machine learning terms.
The scikit-learn tutorials provide a Python-based introduction to machine learning. There is also a third-party scikit-learn tutorial and a Carpentries lesson.
The book Python Machine Learning includes example machine learning code.
The Elements of AI course offers a general introduction to machine learning and related topics.
Galaxy ML provides access to classification and regression workflows through the Galaxy interface.
The workshop organizers track additional resources for beginners and intermediate users.
Training classifiers for a research project typically requires training many models and tuning their hyperparameters on a validation dataset. Writing scripts helps automate this process, document the training and tuning decisions, and improve reproducibility. Software Carpentry introduces strategies for script-driven research. A computing cluster helps train and evaluate many machine learning models in parallel.
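The script-driven approach described above can be sketched as follows. This is a minimal illustration, not the workshop's own code: the synthetic dataset, the 60/20/20 split, the choice of a decision tree, the candidate `max_depth` values, and the helper name `tune_max_depth` are all assumptions made for the example. It trains one model per hyperparameter value, scores each on a validation set, and touches the test set only once at the end.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

SEED = 0  # fixed seed so that rerunning the script reproduces the same results

# Synthetic stand-in data (an assumption; replace with your own features and labels).
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=4, random_state=SEED
)

# Hold out 20% as a test set, then split the rest into training (60% of all
# data) and validation (20% of all data).
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=SEED, stratify=y
)
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, random_state=SEED, stratify=y_trainval
)


def tune_max_depth(depths):
    """Hypothetical helper: train one tree per depth, return validation accuracies."""
    scores = {}
    for depth in depths:
        model = DecisionTreeClassifier(max_depth=depth, random_state=SEED)
        model.fit(X_train, y_train)
        scores[depth] = accuracy_score(y_val, model.predict(X_val))
    return scores


candidate_depths = [1, 2, 4, 8, 16]  # illustrative values only
val_scores = tune_max_depth(candidate_depths)

# Printing every result documents the tuning decision, not just the winner.
for depth, score in val_scores.items():
    print(f"max_depth={depth}: validation accuracy={score:.3f}")

# Select the depth with the highest validation accuracy.
best_depth = max(val_scores, key=val_scores.get)

# Refit with the selected depth and evaluate once on the untouched test set.
final_model = DecisionTreeClassifier(max_depth=best_depth, random_state=SEED)
final_model.fit(X_train, y_train)
test_accuracy = accuracy_score(y_test, final_model.predict(X_test))
print(f"selected max_depth={best_depth}, test accuracy={test_accuracy:.3f}")
```

Because each hyperparameter setting is trained independently inside the loop, the same script could be split into separate jobs, one per setting, when a computing cluster is available.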
Jupyter notebook example
You can run an example Jupyter notebook in Binder to see what a machine learning workflow looks like in Python code using scikit-learn. Binder will launch an executable Python environment in your web browser. After the notebook loads, you can inspect the code and output or rerun it yourself.
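The notebook's contents are not reproduced here. As a rough sketch of the general shape such a scikit-learn workflow might take, the example below loads data, splits it, trains a classifier, and evaluates it. The synthetic dataset, the logistic regression model, and the chosen metrics are assumptions for illustration and may differ from what the notebook actually does.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# 1. Load data. Synthetic binary classification data stands in for a real
#    dataset here (an assumption made for this sketch).
X, y = make_classification(n_samples=400, n_features=8, random_state=1)

# 2. Split into training and test sets so evaluation uses unseen examples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1, stratify=y
)

# 3. Build and train the model. The pipeline standardizes the features and
#    then fits a logistic regression classifier.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# 4. Evaluate on the test set with a confusion matrix and the area under
#    the ROC curve.
predicted_labels = model.predict(X_test)
predicted_scores = model.predict_proba(X_test)[:, 1]  # probability of class 1

cm = confusion_matrix(y_test, predicted_labels)
auc = roc_auc_score(y_test, predicted_scores)

print("confusion matrix (rows: true class, columns: predicted class):")
print(cm)
print(f"ROC AUC: {auc:.3f}")
```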