SXSW Action - Statistics

In partnership with the Swiss Tropical and Public Health Institute, we are working on interactive tools to support malaria research.

About the Project

The project involves computationally intensive malaria models: a model takes a set of about 25 parameters as input and generates predictions about malaria epidemiology for a given setting over a period of time. At present, model predictions are being compared with historical data; the immediate task is to optimize the model parameters so that the predictions better match the historical data.

A few links of interest:


The Dataset

The dataset for this project consists of tens of thousands of records of individual model runs. Each record consists of (1) a set of about 25 parameter values (the model inputs; the exact number of parameters varies slightly between the models being fitted), and (2) a "score" that reflects how well that run's output matched the historical data.
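To make the record layout concrete, here is a minimal sketch of how one model run might be represented in code. The class name, the parameter names, and the score values are all hypothetical, and the sketch assumes that a lower score means a closer match to the historical data; the page does not say which direction is better.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ModelRun:
    """One record: the inputs to a model run and the resulting fit score."""
    parameters: Dict[str, float]  # about 25 entries in practice; names are hypothetical
    score: float                  # ASSUMPTION: lower = closer match to historical data


def best_runs(runs: List[ModelRun], n: int) -> List[ModelRun]:
    """Return the n runs with the lowest (assumed best) scores."""
    return sorted(runs, key=lambda run: run.score)[:n]


# Made-up example records, shortened to two parameters each for readability.
example_runs = [
    ModelRun({"param_a": 0.12, "param_b": 3.4}, score=81.5),
    ModelRun({"param_a": 0.20, "param_b": 2.9}, score=64.0),
    ModelRun({"param_a": 0.17, "param_b": 3.1}, score=70.2),
]
```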


The Goal

At present, evolutionary algorithms are used to generate and evaluate successive "generations" of parameter sets. The goal of this project is to improve the optimization process by reducing the number of iterations needed for convergence. Specifically, we seek to create a user interface that lets humans explore the model history (i.e., past runs and results) and, so informed, guide the selection of future parameters. In other words, the aim is to guide, and hopefully accelerate, the evolutionary algorithms: to “optimize the optimization process”, so to speak.
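For readers unfamiliar with evolutionary algorithms, the following is a generic sketch of one simple variant (keep the best few parameter sets, then mutate them to form the next generation). It is not the project's actual algorithm. The function names, the toy scoring function, and the settings are illustrative assumptions; in the real project, scoring a parameter set means running the malaria model, and lower scores are assumed here to be better.

```python
import random


def evolve(score, population, generations, keep=5, sigma=0.1, seed=0):
    """Run a simple keep-the-best-and-mutate loop.

    Returns the best parameter set found and the best score seen at the
    start of each generation.
    """
    rng = random.Random(seed)
    history = []
    for _ in range(generations):
        ranked = sorted(population, key=score)   # ASSUMPTION: lower score is better
        parents = ranked[:keep]                  # survivors are carried over unchanged
        history.append(score(parents[0]))
        # Each child is a copy of a random parent with Gaussian noise on every value.
        children = [
            [x + rng.gauss(0, sigma) for x in rng.choice(parents)]
            for _ in range(len(population) - keep)
        ]
        population = parents + children
    return min(population, key=score), history


# Toy stand-in for the real model: distance from a hypothetical "ideal" parameter set.
TARGET = [0.3] * 25


def toy_score(params):
    return sum((x - t) ** 2 for x, t in zip(params, TARGET))


if __name__ == "__main__":
    rng = random.Random(42)
    start = [[rng.uniform(0, 1) for _ in range(25)] for _ in range(20)]
    best, history = evolve(toy_score, start, generations=30)
    print(history[0], history[-1])
```

In a loop like this, the `parents` selection is the natural place where a human-guided choice could replace or supplement the automatic ranking.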


The Challenge

It is almost impossible to present a 25-dimensional dataset to a human. We therefore seek a method to reduce the 25 parameters to a smaller number for the user to interact with and, at the same time, to be able to "translate back" a user selection within this reduced space into a complete 25-variable parameter set that can be input to the model.

  • Ideally, code for this function would be delivered as a library that developers of the various data-visualization interfaces being built for the project data could call.
  • Code should be developed using open-source tools; it will need to be integrated into a web application, either on a server or in a browser-based application.
  • A recommended tool is R.
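As one possible starting point, the reduce-and-translate-back idea can be sketched with principal component analysis (PCA). PCA is only one candidate technique: it is linear, and nothing on this page says it is the right fit for the model. The sketch uses Python with only the standard library for illustration (the same steps carry over to R, and a numerical library would normally do the linear algebra). All function names are hypothetical, and the input is random stand-in data, not project data.

```python
import math
import random


def fit_components(rows, k, iters=200, seed=0):
    """Find k orthonormal principal directions of `rows` by power iteration."""
    rng = random.Random(seed)
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    centered = [[x - m for x, m in zip(row, means)] for row in rows]
    dim = len(means)
    components = []
    for _ in range(k):
        v = [rng.gauss(0, 1) for _ in range(dim)]
        for _ in range(iters):
            # Apply the (unnormalized) covariance matrix implicitly: w = X^T (X v).
            proj = [sum(a * b for a, b in zip(row, v)) for row in centered]
            w = [sum(p * row[j] for p, row in zip(proj, centered)) for j in range(dim)]
            # Remove any part lying along directions that were already found.
            for c in components:
                dot = sum(a * b for a, b in zip(w, c))
                w = [a - dot * b for a, b in zip(w, c)]
            norm = math.sqrt(sum(a * a for a in w))
            if norm < 1e-12:          # no variance left in the data
                v = [0.0] * dim
                break
            v = [a / norm for a in w]
        components.append(v)
    return means, components


def to_reduced(params, means, components):
    """Project a full parameter set onto the k reduced coordinates."""
    centered = [x - m for x, m in zip(params, means)]
    return [sum(a * b for a, b in zip(centered, c)) for c in components]


def to_full(coords, means, components):
    """Translate reduced coordinates back into a full parameter set."""
    full = list(means)
    for t, c in zip(coords, components):
        full = [f + t * b for f, b in zip(full, c)]
    return full


if __name__ == "__main__":
    rng = random.Random(1)
    past_runs = [[rng.uniform(0, 1) for _ in range(25)] for _ in range(100)]
    means, comps = fit_components(past_runs, k=2)
    point = to_reduced(past_runs[0], means, comps)  # 2 numbers a user could pick on a plot
    candidate = to_full(point, means, comps)        # 25 values to feed back to the model
    print(len(point), len(candidate))
```

Note that the translated-back parameter set is only an approximation: it always lies in the flat subspace spanned by the chosen components, so a real library would probably also need to check each value against the model's valid parameter ranges.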

Community Input

We need your help!

Discussions

  • Please join the general discussion here.

Brainstorming Document

  • A community document outlining possible approaches is here; please take a look and contribute.

Using this Documentation/Wiki

  • It's easy to edit and add content to these pages; here's how.

Development

  • And, of course, we hope you'll help develop libraries as described in "The Challenge" above...