Latest posts of: Aaron Schumacher
 Topic Name  Forum  Site  Date
 Re: SXSW Action - Stats Discussion  Development  Gridrepublic.org 02/13/12 05:07 am

https://plus.google.com/photos/112658546306232777448/albums/5708478458427255953

This is all those graphs but just for cases where the loss function is under 200. Things look even worse, as far as finding patterns. Even the parameter_21 thing disappears; it seems we just know that parameter_21 can reliably screw up the works if we put it at the wrong value, but it's hard to say where the best value of it is.

I think the next step will have to be some sort of cleverer search for patterns/relationships between larger sets of parameters, the question being, "what do the sets of parameters that generate low loss function values have in common, distinct from the sets of parameters that generate high loss function values?" I'm not sure if this search will necessarily be a visualization question. I do think it's interesting, but again, I don't know if I'll be able to spend time on it.
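As a rough first pass at that question, one could split the rows at a loss cutoff and rank the parameters by how far apart the two groups' means are. The sketch below is only an illustration: the column names, the `loss` key, the made-up rows, and the helper `group_differences` are all hypothetical, and the cutoff of 200 is borrowed from the filter mentioned above. It only looks at one parameter at a time, so it would not catch the multi-parameter interactions that seem to matter here.

```python
from statistics import mean, pstdev

def group_differences(rows, loss_key="loss", threshold=200.0):
    """Rank parameters by the gap between low-loss and high-loss group means.

    The gap is divided by the parameter's overall standard deviation so that
    parameters on different scales can be compared.
    """
    low = [r for r in rows if r[loss_key] < threshold]
    high = [r for r in rows if r[loss_key] >= threshold]
    if not low or not high:
        raise ValueError("both groups need at least one row")
    scores = {}
    for p in (k for k in rows[0] if k != loss_key):
        spread = pstdev([r[p] for r in rows])
        if spread == 0:
            scores[p] = 0.0  # constant parameter: nothing to compare
            continue
        diff = mean(r[p] for r in low) - mean(r[p] for r in high)
        scores[p] = diff / spread
    # Largest absolute standardized gap first.
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)

# Made-up rows for illustration only; not taken from the real data set.
rows = [
    {"parameter_5": 0.1, "parameter_21": 0.9, "loss": 120.0},
    {"parameter_5": 0.8, "parameter_21": 0.8, "loss": 150.0},
    {"parameter_5": 0.2, "parameter_21": 0.1, "loss": 900.0},
    {"parameter_5": 0.7, "parameter_21": 0.2, "loss": 850.0},
]
ranking = group_differences(rows)
```

On these invented rows `parameter_21` ranks first, simply because the example was built that way; the real data may show nothing so clean.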

...


 Re: SXSW Action - Stats Discussion  Development  Gridrepublic.org 02/13/12 04:32 am

Also: I put the sample data set on google docs. It's easy to sort and look at, and might be more convenient for some people:

https://docs.google.com/spreadsheet/ccc?key=0ArMdkg9O1RmMdGhwR2ZwMS00VVFuSF8wVFYxbTNybHc

I also went ahead and ran MIC (Maximal Information Coefficient, see exploredata.org) which shows all the pairwise correlations. The results are similar to those of the first poster, but as I've mentioned it looks like the interactions are more complex than being just pairwise. Still, I haven't looked through this output carefully and there could be something interesting that I'm missing:
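For anyone who wants to reproduce a pairwise scan without the MIC tooling, here is a minimal sketch that swaps in Spearman rank correlation instead. This is not MIC: Spearman only picks up monotone relationships, whereas MIC is meant to catch more general ones. The column names and values are made up for illustration, and the helper names are hypothetical.

```python
from itertools import combinations
from statistics import mean

def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def pearson(xs, ys):
    """Plain Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))

def pairwise_scores(columns):
    """Score every pair of named columns, strongest absolute value first."""
    pairs = [(a, b, spearman(columns[a], columns[b]))
             for a, b in combinations(columns, 2)]
    return sorted(pairs, key=lambda t: abs(t[2]), reverse=True)

# Made-up columns for illustration only.
columns = {
    "parameter_3": [1.0, 2.0, 3.0, 4.0, 5.0],
    "parameter_4": [5.0, 3.0, 4.0, 1.0, 2.0],
    "loss": [10.0, 40.0, 90.0, 160.0, 250.0],
}
scores = pairwise_scores(columns)
```

Like the MIC output, this only covers pairs, so it says nothing about interactions among three or more parameters.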

https://docs.google.com/spreadsheet/ccc?key=0ArMdkg9O1RmMdE5DUjZja1JrUGxLNmxidXBPbGhEWmc

...


 Re: SXSW Action - Stats Discussion  Development  Gridrepublic.org 02/13/12 04:27 am

I went ahead and made a bunch of graphs, available here:

https://plus.google.com/photos/112658546306232777448/albums/5708466103108764705

There are a lot of them, and they show what we already knew, which is that the relationships between the model parameters and the loss function are pretty complicated.

The one exception seems to be parameter 21, where it seems that all the really awful loss function values come from having that parameter set on the low end of its scale.
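One way to check that impression numerically rather than by eye is to count how many of the worst rows fall below the median of the parameter. The sketch below assumes hypothetical column names (`parameter_21`, `loss`), an arbitrary "awful" cutoff of 1000, and invented rows; none of these come from the actual data set.

```python
from statistics import median

def awful_rows_by_half(rows, param="parameter_21", loss_key="loss", awful=1000.0):
    """Count rows with loss >= `awful` below vs. at-or-above the parameter median."""
    mid = median(r[param] for r in rows)
    counts = {"low": 0, "high": 0}
    for r in rows:
        if r[loss_key] >= awful:
            counts["low" if r[param] < mid else "high"] += 1
    return counts

# Made-up rows for illustration only.
rows = [
    {"parameter_21": 0.1, "loss": 5000.0},
    {"parameter_21": 0.2, "loss": 3000.0},
    {"parameter_21": 0.8, "loss": 150.0},
    {"parameter_21": 0.9, "loss": 300.0},
]
counts = awful_rows_by_half(rows)
```

If nearly all of the awful rows land in the `"low"` bucket on the real data, that would back up what the graphs suggest.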

More patterns may come out of looking at just the better parameter combinations (not including really high loss function value rows). Perhaps I'll do that too.

From a theoretical visualization standpoint, I really don't know how to show more complex interactions among three or more parameters in a graphic in any convenient way... I made all the two-parameter graphs, which is already too many to have to visually inspect, really. (I mean I looked at them all, but I don't know how much I got from it.) I think there may be an interesting problem here, and I'm not sure if I'm missing good existing work on it. I'd like to see some. Not sure if I'll be able to do much myself right now.

...


 Re: SXSW Action - Stats Discussion  Development  Gridrepublic.org 02/03/12 10:18 pm

Cool, thanks for your responses! I'd just feel better looking at a complete data set, so that I don't feel like I could be chasing things that aren't really there.

More questions:

* The data set has parameters numbered 3, 4, 5, 7, 8, 9, 10, 11, 12, 13, 14, 15, 17, 18, 19, 20, 21, 24, 25, 26, 27, 28, 29, and 30. What's going on? Is this purely a strange numbering, or did someone already decide that parameters 1 and 2, for example, were no good?

* Do the iteration numbers accurately represent ordered runs of the evolutionary algorithm? In many cases (like parameter_5) the choices actually seem to be getting more widely spread over time, rather than narrowing in on an ideal value. Or is this just a hint that this parameter is not particularly important?

* I haven't read the linked papers - do they describe the particular model being evaluated here? It might be helpful to have some idea of how the parameters interact in the model.
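For reference on the first question, the gaps in the numbering can be listed directly. This assumes the numbering is meant to run from 1 up to the largest number present, which the data set itself does not confirm.

```python
# Parameter numbers present in the sample data set, as listed above.
listed = [3, 4, 5, 7, 8, 9, 10, 11, 12, 13, 14, 15,
          17, 18, 19, 20, 21, 24, 25, 26, 27, 28, 29, 30]

# Assumption: numbering starts at 1 and ends at the largest listed number.
missing = sorted(set(range(1, max(listed) + 1)) - set(listed))
print(missing)  # [1, 2, 6, 16, 22, 23]
```

So under that assumption 24 parameters are present and 6 are absent.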

...


 Re: SXSW Action - Stats Discussion  Development  Gridrepublic.org 02/03/12 10:10 pm

Yes, I think I've seen all the pages. When you click "comment" from the project description page, it says this:

"You've followed a link to a page that doesn't exist yet. To create the page, start typing in the box below. If you are here by mistake, just click your browser's back button."

Does that just mean that there aren't any comments yet?

And I've seen the page for possible approaches. I have two main thoughts:

1) It will be easier for people (at least for me) to think about possible approaches if the overall goal of the project is more clear. What are we trying to do? What would a successful approach lead to? (See my earlier post.)

2) Can we get a complete data set?

...


 Re: SXSW Action - Stats Discussion  Development  Gridrepublic.org 02/03/12 09:38 pm
Is there a discussion here? Are there existing posts? This interface is so strange; I was expecting it to work something like a wiki, I guess... Well, here's an email I just sent, maybe folks on this discussion thing can help too:

Hi! I'm just starting to look at this, and my first observation is that the web site isn't very conducive to collaboration. For example, clicking "comment" on the "SXSW Action - Statistics" page seems to fail. I was able to comment on another page, but other posters' comments were largely unreadable and I don't know if anyone will put up with the interface.

The comment I had wanted to put up was about the available data set being just 2074 records, when it says there are tens of thousands available. It seems like the whole thing could easily be released.

A more substantial comment is that I'm not sure I understand what the goal of the project is. We're just trying to visualize the logs of an evolutionary model optimizer? Why? Is there real hope that you can make a better model by presenting a bunch of unlabeled parameters and looking for input from "human intuition"?

What would success in this project look like? What is the goal? To make it a slightly clearer question, is the goal purely to create a visualization? Is the goal to create a system through which humans can specify a set of new parameter values to try?

It says that you hope people will produce some sort of software library - what would such a library actually do? What would be the features of such a library?

Thanks so much for helping me to understand the needs of the project,


- Aaron
...

