Getting Started

This library is primarily intended to automate the collection, post-processing and integration of biomedical data stored in public online databases. It is hoped that this effort will catalyze new insights and understanding by transforming multiple, distinct data repositories into unified datasets which are highly amenable to data analysis.

BioVida aims to curate a broad range of biomedical information. In areas such as diagnostics and genomics, this involves drawing on the work of others, such as the impressive work by the Disease Ontology and DisGeNET teams. In the case of image data, however, BioVida itself performs the ‘heavy lifting’ of collecting and processing raw data from its sources. This is made possible by combining traditional programmatic solutions with recent advances in machine learning, namely convolutional neural networks.

The guide below provides a brief introduction to getting started with BioVida.


Python Package Index:

$ pip install biovida

Latest Build:

$ pip install git+git://

Note: if you are using Python 3 on macOS or Linux, you may need to use pip3 install instead.
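
After installing, it can be useful to confirm which version ended up in the active environment, particularly when both pip and pip3 are present. The following is a minimal sketch, assuming Python 3.8 or later (for importlib.metadata) and that the distribution name matches the one used in the pip command above. The helper installed_version is hypothetical and is not part of BioVida.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # The distribution is not installed for this interpreter.
        return None


if __name__ == "__main__":
    version = installed_version("biovida")
    if version is None:
        print("biovida is not installed in this environment")
    else:
        print("biovida", version)
```

If this reports that the package is missing right after a successful install, the install most likely went to a different interpreter than the one running the script.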


BioVida requires: beautiful soup, h5py, keras, lxml, numpy, pandas, pillow, pydicom, requests, scikit-image, scipy, theano and tqdm.

The installer will automatically install all of these packages.
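
If an import error appears later, a quick check of which dependencies are visible to the interpreter can narrow things down. The sketch below is illustrative only: the import names are assumptions (for example, Beautiful Soup is assumed to import as bs4, pillow as PIL, and scikit-image as skimage; older pydicom releases used a different module name), and missing_packages is a hypothetical helper, not a BioVida function.

```python
from importlib import util

# Assumed mapping from the package names listed above to their import names.
IMPORT_NAMES = {
    "beautiful soup": "bs4",
    "h5py": "h5py",
    "keras": "keras",
    "lxml": "lxml",
    "numpy": "numpy",
    "pandas": "pandas",
    "pillow": "PIL",
    "pydicom": "pydicom",
    "requests": "requests",
    "scikit-image": "skimage",
    "scipy": "scipy",
    "theano": "theano",
    "tqdm": "tqdm",
}


def missing_packages(import_names):
    """Return the package names whose modules cannot be located."""
    missing = []
    for package, module in import_names.items():
        try:
            # find_spec locates the module without importing it.
            found = util.find_spec(module) is not None
        except (ImportError, ValueError):
            found = False
        if not found:
            missing.append(package)
    return missing


if __name__ == "__main__":
    absent = missing_packages(IMPORT_NAMES)
    if absent:
        print("Missing:", ", ".join(absent))
    else:
        print("All listed dependencies were found.")
```

Because find_spec only locates modules and does not import them, the check stays fast even for heavy packages such as keras or theano, though it cannot detect a package that is present but broken.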


  1. Keras is used to power the convolutional neural networks in this project.
  2. To use scipy on macOS (formerly OS X) you will need gcc, which can be obtained with homebrew via $ brew install gcc. If you do not have homebrew installed, it can be installed by following the instructions provided here.