We are hiring a professor in insect systematics here at UVM. On top of UVM being a great university to work at, Burlington is an amazing and special place to live. In fact, I'm about to pack it in and go jump on my sailboat on this beautiful, windy August day. If it were winter, I'd be getting up earlier on Saturday than I ever do during the week to skin up the mountain for some fresh tracks.

So what are you waiting for insect systematists?

Update
Another reason to consider UVM and Burlington: this rainbow over the lake, which appeared as I went for my run down the lakeside city bike path.




Assistant Professor in Systematics


Department of Biology
University of Vermont
Burlington, Vermont

The Department of Biology of the University of Vermont seeks applications for a tenure-track Assistant Professor position in Systematics and Evolutionary Biology of arthropods, especially insects. The position will be open in the fall of 2012. The successful candidate will have expertise in classical and molecular systematics, including analysis of complex data sets. Candidates pursuing phylogenomics and innovative methods in bioinformatics in combination with taxonomy are especially encouraged to apply. Department information at: http://www.uvm.edu/~biology/.


All applicants are expected to: 1) hold a Ph.D. degree in relevant disciplines and have two or more years of postdoctoral experience; 2) develop a competitively funded research program; 3) teach undergraduate courses (chosen from among general biology, evolution, systematic entomology, and others in the candidate's expertise); 4) teach, mentor and advise undergraduate and graduate students; and 5) oversee a natural history collection of historic significance.

Candidates must apply online: http://www.uvmjobs.com/. On the left, see "Search Postings," then find "Biology" under "HCM Department," then posting 0040090 (first posting). Sorry, but we cannot supply the direct link because it will time out.

Attach a cover letter with a statement of research focus and teaching interests (one document), a curriculum vitae, representative publications, and the contact information of three references.

Review of applications will begin on September 15, 2011, and will continue until the position is filled. Questions and up to three additional publications may be directed to Dr. Jos. J. Schall: jschall@uvm.edu.

The University of Vermont recently identified several "Spires of Excellence" in which it will strategically focus institutional investments and growth over the next several years. One spire associated with the position is Complex Systems. Candidates whose research interests align with this spire are especially encouraged to apply: http://www.uvm.edu/~tri/.
The University seeks faculty who can contribute to the diversity and excellence of the academic community through their research, teaching, and/or service. Applicants are requested to include in their cover letter information about how they will further this goal. The University of Vermont is an Affirmative Action/Equal Opportunity employer. The Department is committed to increasing faculty diversity and welcomes applications from women, underrepresented ethnic, racial and cultural groups, and from people with disabilities.

  1. And that's all she wrote, folks. I've moved Distributed Ecology to GitHub; you can find all my new posts here: http://emhart.github.com/. I've imported all the old posts to the new blog, but much of the formatting was lost, so I'll leave these here as an archive. Be sure to update RSS readers, etc. I'm excited to start using Jekyll as a new blogging platform.

  2. Remember 1992? I had just turned 13 and was still a year away from high school, when my true descent into nerdom and the internet would begin. Back then it was a local BBS (Bulletin Board System) run by a guy named Charlie in a trailer park, reached over a 1200 baud modem on an old Compaq 386. But some intrepid ecologists were already forging ahead with what we take for granted 20 years later: Ecolog. Last September, Ecolog hit 13,000 subscribers. I took the data that David mentioned in his post, plotted it, and fitted a curve to it. The fit was based on an assumption of 100 subscribers when Ecolog first started. On Sunday he posted that there were now 15,000 subscribers, so I decided to see how good my fit was. Here's the original fit with the new data point and the prediction (in red). I fit this exponential growth equation:

    N_d = N_0 * exp(r * d), where r is the growth rate, d is the number of days since Sept. 1, 1992, and N_0 is the number of subscribers at the start.

    (Figure: the solid black dot is the true value; the red dot is the fitted value for the new data point.)

    What should that initial starting value be? Clearly it's not 100. Let's begin by assuming that Ecolog grew exponentially. We're missing a huge amount of data, so I'll say up front that any number of weird things could have happened between the start year of 1992 and the first year we have data, 2006. But exponential growth isn't an unreasonable assumption. We know it's not linear: a linear fit gives you roughly -15,000 subscribers at time 0! We could keep guessing values and eyeballing the fit, but the real problem is how to find the best-fitting value of N0. It's a perfect job for R's built-in optimizer, optim(). First we define a function to minimize: here we fit nls() with N0 fixed at a given value, return the deviance, and hand that function to optim().
    ## Objective function: fix N0 at param, estimate r with nls(), and return the deviance
    fp.est <- function(param, dat){
      z <- nls(dat[, 2] ~ param * exp(r * dat[, 1]), start = list(r = 0.001))
      return(deviance(z))
    }
    ## Minimize the deviance over N0, bounded between 1 and 6000
    tmp <- optim(1, fp.est, dat = dat, method = "L-BFGS-B", lower = 1, upper = 6000)
    
    
    That gives us a perfectly reasonable point estimate of 545, and vastly improves our fit as you can see.

    We can even get a little creative with the data and come up with a measure of standard deviation. We do this with a jackknife-style estimate of variance. Jackknife estimates are a type of resampling that works by calculating some statistic repeatedly while removing one or more of the data points. All we do is come up with all possible combinations of our data that keep 8 out of 9 points, and run our optimization routine on each. This yields a final estimate for the number of subscribers of Ecolog at the very beginning as:

    I have no idea if this is right, but it's a neat little example of using optim() and jackknifing to solve a problem where we have good intuition (that exponential growth is the right function) and not very much data. Here's a gist of all the code this post used, and a rough sketch of the jackknife step is below.
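    A minimal sketch of that jackknife step, assuming dat is the same nine-row, two-column object (days since Sept. 1, 1992, and subscriber count) and fp.est is the objective function defined above:
    ## Jackknife sketch: re-run the optimization on every leave-one-out subset
    jk.idx <- combn(nrow(dat), nrow(dat) - 1)      # all 8-of-9 row combinations
    jk.est <- apply(jk.idx, 2, function(keep){
      optim(1, fp.est, dat = dat[keep, ], method = "L-BFGS-B",
            lower = 1, upper = 6000)$par           # N0 estimate for this subset
    })
    mean(jk.est)  # point estimate of N0
    sd(jk.est)    # rough jackknife-style spread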

  3. So I came across this post by Cath Ennis (@enniscath) about Einstein's riddle. I had never heard of the riddle, but the apocryphal story that goes with it is that only 2% of people in the world can solve it. Well, never wanting to miss a chance to prove that I'm special, I sat down and decided I should solve it. Cath did it in 18 minutes. I didn't time myself, but I'm pretty sure I didn't do it in less than 18 minutes; I'd peg my time at around 25 minutes. It reminded me of doing Sudoku. In any case, here's my answer. You can see that I solved it in PowerPoint, of all things. I find it hard to believe that 98% of people couldn't solve this. It seemed pretty easy to me, just a lot of variables to keep track of. I'd reinterpret it as: only 2% of people in the world are really nerdy enough to give a shit about logic puzzles. Here's my solution!



  4. (Image: an old-timey dog treadmill.)
    Being both a late riser and on the West Coast means I get most of my morning news by streaming an NPR station in Hawaii to catch Morning Edition. During my ritual 10 am coffee and Twitter reading I came across mentions of a Morning Edition story on a paper that recently came out in the Journal of Experimental Biology (JEB) about humans and running. The thrust of the paper is that the "runner's high" humans get (actually stimulation of the endocannabinoid (eCB) receptors) was an adaptive reward for endurance activity. My first thought was that the logic of this is all wrong. Here's the intro to the NPR story.

    Endurance athletes sometimes say they're "addicted" to exercise. In fact, scientists have shown that rhythmic, continuous exercise — aerobic exercise — can in fact produce narcotic like chemicals in the body. Now researchers suggest that those chemicals may have helped turn humans, as well as other animals, into long-distance runners.

    At this point feel free to be skeptical. The phrase is carefully worded to only imply causality; the authors and NPR push the envelope a bit later in the piece.
    "Wired to run, meaning that our brains are probably, have been sort of rewired from an evolutionary sense to encourage these running and high aerobic-activity behaviors," Raichlen says....
    Meat is one payoff for runners. But Raichlen thinks there may have been another reward: a runner's high. He designed an experiment to test this idea.

    Now you should be waiting for some elegantly designed experiment to test all this. The experiment that comes, though, is not at all what one might expect. Essentially the authors took two mammals, dogs and ferrets, measured levels of eCBs after exercise, and did the same with humans. Dogs and humans are "cursorial" mammals, but ferrets aren't. Therefore, if dogs and humans run for the reward, they are expected to have elevated eCBs. Lo and behold, they do. The chain of causality is that eCBs make us feel good when we run, so we evolved to run, the reward being the runner's high. Anyone who's read Gould and Lewontin's "Spandrels of San Marco" paper ought to have caught the flaw in the reasoning here (I think it's still valid despite criticisms of Gould and Lewontin). Jeremy Yoder wrote a great post about how they didn't demonstrate anything to do with evolution at all. He of course has read Endler's "Natural Selection in the Wild", as he astutely points out on his blog:

    To really show that the runner's high is an adaptation, of course, we'd need data that showed (1) observed variation in the runner's high response has a genetic basis, and (2) people who get stronger runner's highs have more babies.

    The logic of Raichlen is a bit like saying birds fly because it feels good to be up so high in the sky. Birds are a good example here because flightlessness has evolved multiple times. Just because the physiological structures that allow a behavior (running, flight) exist does not mean the organism engages in that behavior because of that trait. Raichlen says in his conclusion:
    It is possible that neurobiological rewards induced by eCB signaling are an ancient human trait that evolved to encourage aerobic activity (Bramble and Lieberman, 2004; Carrier, 1984; Malina and Little, 2008), and that the rewards explain the evolution of differences in voluntary locomotor activity more broadly across mammals.


    I highly doubt that eCBs "evolved to encourage aerobic" activity. If we were to engage in "just-so" storytelling about this system, you'd say that humans who produced eCBs were more effective at capturing resources and could therefore have more children. Raichlen doesn't mention reproduction at all. But I'll leave the Endleresque criticism to Jeremy; I have another problem with this study.

    Raichlen chose just two taxa besides humans. The authors are very open about this, but what they totally neglect is the underlying phylogeny of their taxonomic choices. According to recent mammalian phylogenies, ferrets and dogs are actually sister taxa. So I adopted two possible pathways based on what I know about cursorial vs. non-cursorial mammals. We know already that mice have elevated eCBs during exercise. By the authors' own definition, let's assume that sedentary mammals like cows and sheep don't have elevated eCBs. By examining the tree, it can be seen that eCB stimulation was either lost at least twice or gained at least three times (red circles are losses and green ones gains; a rough sketch of this kind of trait mapping is below). What would be really interesting is to see where this trait maps across the phylogeny. Is it a conserved trait that was selected for in some ancestor? That would suggest it may have nothing to do with running. The authors are silent about phylogeny, but eCBs could alternatively be the ancestral character state, and then the interesting question is why ferrets lost it. On the other hand, maybe the trait evolved multiple times, and then it's really interesting to ask how that happened. Either phylogenetic scenario undermines the central thesis of Raichlen: either it evolved in a common ancestor, in which case it may have nothing to do with running, or it evolved independently in dogs and humans, meaning there may have been different selective pressures at work in each species (or the same, we don't really know).
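    As a rough, purely illustrative sketch (the tree topology and the 0/1 eCB states here are my own assumptions, not data or code from the paper), this kind of trait mapping can be eyeballed with the ape package:
    ## Hypothetical sketch: plot an assumed eCB-response state on a toy mammal tree
    library(ape)
    tree <- read.tree(text = "(((dog,ferret),(cow,sheep)),(human,mouse));")
    states <- c(dog = 1, ferret = 0, cow = 0, sheep = 0, human = 1, mouse = 1)  # 1 = elevated eCBs (assumed)
    plot(tree, label.offset = 0.2)
    tiplabels(pch = 21, bg = ifelse(states[tree$tip.label] == 1, "green", "red"))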

    Raichlen is an anthropologist and something of a media-darling scientist who studies a topic near and dear to many people. It seems irresponsible that he was able to make so many assertions about evolution with zero evidence for them, and assumptions about the conservation of traits across a phylogeny without even mentioning phylogeny. So what's the role of the media in all of this? The story was picked up by lots of outlets saying things like "Yet whether other mammals are also driven to exercise by endocannabinoids has remained a mystery" (from the Economist). Unfortunately, there's no way to undo that misplaced causality.


    Raichlen DA, Foster AD, Gerdeman GL, Seillier A, & Giuffrida A (2012). Wired to run: exercise-induced endocannabinoid signaling in humans and cursorial mammals with implications for the 'runner's high'. The Journal of experimental biology, 215 (Pt 8), 1331-6 PMID: 22442371

  5. (Image: Le Grand Casino of Monte Carlo.)
    On Monday I'm going to be leading a little stats workshop on randomization tests and null models. In preparation I wrote up code for some null model examples, and I wanted to write a post introducing the basics of these methods (null models, bootstrapping, jackknifing, etc.), which are all specific classes of a general approach known as Monte Carlo methods. Put simply, a Monte Carlo method is any approach that generates random numbers and then looks at how different fractions of them behave. It's a powerful method that can be used in a wide variety of situations, and it's commonly used for solving complex integrals, among other things.

    A simple integration example

    Let's start with a trivial example, integrating a function we use all the time as ecologists, the normal distribution. Maybe you want to integrate the normal probability density function (PDF) from -1 to 1, because you're curious about how likely an event within 1 standard deviation is. To get the area under the curve we simply integrate the PDF from -1 to 1.
    (Figure: MC integration of the normal PDF between -1 and 1.)
    In this case it's silly because we already have an analytical solution, but the approach becomes necessary for more complicated integrals. The simplest method is "hit or miss" integration, where we create an x,y box, sample randomly from it, and ask: "Is this random point under my curve or not?" To approximate the integral we multiply the fraction of our samples under the curve (fc) by the total area we sampled, A. Using R we can do both the actual integral and the Monte Carlo version easily. The actual answer is 0.682, and the approximate answer I got was 0.688, so pretty close. You can see the full code at this gist, and a minimal sketch is below.
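    A minimal sketch of the hit-or-miss version in base R (the gist above has the full code, including the figure):
    ## Hit-or-miss Monte Carlo integration of the standard normal PDF from -1 to 1
    set.seed(42)
    n  <- 1e5
    x  <- runif(n, -1, 1)            # random x over the interval
    y  <- runif(n, 0, dnorm(0))      # random y up to the peak of the PDF
    fc <- mean(y < dnorm(x))         # fraction of points that land under the curve
    A  <- 2 * dnorm(0)               # area of the sampling box
    fc * A                           # approximate integral; compare pnorm(1) - pnorm(-1)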

    A simple statistical example
    Another place we can use these methods is in statistical hypothesis testing. The simplest case is as an alternative to a t-test. Imagine you have a data set with measurements of plant height for the same species in shaded and unshaded conditions. Your data might look like this:
    Treatment    Height
    Shaded       13.3
    Shaded       12.1
    Shaded       14.7
    Shaded       12.8
    Unshaded     17.8
    Unshaded     19.4
    Unshaded     18.5
    Unshaded     18.5
    An easy way to test this is with a parametric t-test. The Monte Carlo approach involves several basic steps that form the architecture of any randomization test, no matter how complicated. First, calculate the observed value of your test statistic. In this case we are interested in whether there is a difference between the two groups, so we calculate the mean of the shaded and unshaded groups and take the difference. We want to know if this value is significantly different from 0. To determine this we need to construct a null distribution, and here is where the Monte Carlo method comes in. If there is no difference between the groups, then which group a given height is associated with shouldn't matter. So we reshuffle the group labels on the existing data, and each time we calculate and store our test statistic, the difference in the means. This builds up a distribution of test statistic values. We then compare the observed value to the null distribution. Usually the test is two-sided, so we check against the 95% interval of the null distribution. If the observed value falls within that interval, we fail to reject the null hypothesis that there is no difference between the two means. Here's a gist with all the relevant code and some pretty figures in ggplot2, and a bare-bones sketch is below.
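    A bare-bones sketch of that randomization test, using the eight measurements from the table above (the gist has the full version with figures):
    ## Permutation test: difference in mean height, shaded vs. unshaded
    height <- c(13.3, 12.1, 14.7, 12.8, 17.8, 19.4, 18.5, 18.5)
    group  <- rep(c("Shaded", "Unshaded"), each = 4)
    obs    <- diff(tapply(height, group, mean))   # observed difference in means
    null.dist <- replicate(9999, {
      g <- sample(group)                          # reshuffle the labels
      diff(tapply(height, g, mean))
    })
    quantile(null.dist, c(0.025, 0.975))          # 95% interval of the null distribution
    mean(abs(null.dist) >= abs(obs))              # two-sided permutation p-value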

    "All that code for a t-test?"

    That must be what you're thinking, and you're right: it's certainly unwieldy to write all that code for something so simple. But it's a good starting point for when we begin talking about null models. You may not have realized it, but in the previous example there are two assumptions behind our inference. The first is that some process or mechanism has caused a difference between our groups. The fact that plants grow to different heights in shaded and unshaded conditions says something about the way plants use light, or the way they compete for light, or maybe some other mechanism I haven't thought of. The second is that by randomizing our existing data, we can simulate a situation where the data were collected under completely stochastic conditions, i.e., the process determining plant height is random. So there are our two assumptions: A) our data set represents the outcome of some process, and B) by randomizing we can create a null model, free of that process, to test our own data against. Here's where things become a bit more gray. In the example above the null hypothesis is pretty clear and we can all agree on it, but problems arise with more complicated questions. Traditionally null models have been used to make inferences about community assembly rules. The use of these models was prevalent during the "battles" constituting what is tongue-in-cheek called the "null model wars". I won't take up any space rehashing the null model wars of the 70's and 80's, but links to good resources can be found on Jeremy Fox's Oikos blog and his post about refighting the null model wars. Suffice it to say, careful attention needs to be paid to the selection of the proper null model. Nick Gotelli has lots of good papers about null models if you peruse his work. I've worked up several examples from his 2000 and 2010 papers, and sometimes the algorithms can be challenging. I'll cover more advanced methods in a future post going over some methods from Nick's papers.

  6. I've been following with interest Bob O'Hara's post over at his blog, "Deep thoughts and silliness". In it he is highly critical of a piece that came out in Science earlier this year by Fernando Maestre, Nick Gotelli, and many others (full disclosure: I was Nick's graduate student up until last year and have known Fernando for several years). Bob's problems seem to be twofold, stemming from an assertion by Fernando that "significant relationships would indicate potentially strong effects of richness on multifunctionality".
    (Figure: red lines are fits with errors in x, blue lines with errors in y; as errors in x increase, slopes tend to 0, while as errors in y increase, slopes vary randomly. The true slope is the thick dashed black line.)
    Bob feels this is "utter rubbish", and has two reasons for the caustic remark. His first problem is that there are no measures of within-variable variability (an awkward phrase, I know). Doing a bit of simulation, he shows what has been known for a long time: in a regression, errors in y produce symmetric, unbiased estimates of the intercept and slope, whereas errors in x produce biased estimates.
    This is of course no surprise; it has certainly been published in ecology before. A quick search brought me to a 1994 paper by Steve Carpenter in Ecology, and I'd bet it came up before that. I would argue that the fundamental importance of Fernando's work is not the magnitude of the slope, but that the relationship is present and holds up across a huge observational data set. The true measure of interest is how much of the variation can be explained by species richness, and how do errors in variables affect that? So I quickly redid Bob's simulations as best I could, inferring his code from the text.

    (Figures: errors in y produce almost no bias in the slope, but errors in x always underestimate it; the black line is the true value. Errors in x and y produce similar bias in estimates of R2.)
    A few methods.

    You can see my full code over at my GitHub page; a rough sketch of the simulation is also below. I first created a multiple regression with four variables, all equal in their effects. I chose one variable at random and imagined that I had measured only my response and that one variable, yielding an R2 of ~25%. I then ran two simulations, one with measurement error in y and another with error in x, such that the correlation between the true and "corrupted" values of x and y ranged from 0 to 1, the same as Bob. It's clear that measurement error in x has a different bias than error in y. Bob's problem, as I understand it, is that when measuring species richness there is surely measurement error in that variable, and this error will necessarily bias the slope. I would argue that Fernando's central point was that species richness explained some of the variance. So when I ran my simulation, I asked how errors in variables affect values of R2. Plotting that reveals a non-linear increase for both error types as error decreases. So at worst they are underestimating the amount of variance explained in multifunctionality (which is quite low at 4%). Therefore if the quantity of interest is the slope, then errors in variables matter, but if the quantity of interest is R2, then the location of the error is less relevant. I've already made my case that it's the latter, so Bob's technical objections seem moot.
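    A rough sketch of that kind of errors-in-variables simulation (a simplified stand-in, not Bob's code or the exact script on my GitHub page):
    ## How measurement error in x vs. y changes the fitted slope and R2
    set.seed(1)
    n <- 500
    x <- rnorm(n)
    y <- 1 + 2 * x + rnorm(n)                        # true slope = 2
    for (err.sd in c(0.5, 1, 2)) {
      fit.x <- lm(y ~ I(x + rnorm(n, sd = err.sd)))  # error added to the predictor
      fit.y <- lm(I(y + rnorm(n, sd = err.sd)) ~ x)  # error added to the response
      cat(sprintf("error sd %.1f: slope %.2f (x error) vs %.2f (y error); R2 %.2f vs %.2f\n",
                  err.sd, coef(fit.x)[2], coef(fit.y)[2],
                  summary(fit.x)$r.squared, summary(fit.y)$r.squared))
    }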

    So what does it all mean?

    The next objection he raises is of a much more general nature about observational studies and I agree with it in part. One of his last sentences sums it up.
    If we estimate an effect of species richness (say), it could be that the true effect of biodiversity is larger and we are only measuring an indicator, or it could be that there is no effect, but both ecosystem function and species richness are affected by the same abiotic drivers.
    The former point is quite astute. Most ecological studies do measure some dependent variable of interest (multifunctionality in this case), and several independent variables that may be indicative of some process or mechanism but are not the actual mechanism. Bob's example is that an investigator may measure winter temperature and water bird abundance, but the true mechanism may be pond ice-out date. His point: mechanisms are complicated, and observational studies often miss these nuances. The second point falls into my bin of "pointless criticisms of *study type*". Yes, observational studies can never show causality with complete confidence, but does that mean we should dismiss them altogether? In ecology almost any study falls somewhere on the range between observational study and microcosm. Each experiment type, from large-scale macroecology studies to mesocosms to microcosms, is open to some sort of criticism along these lines. For instance, microcosms are often criticized as being unrealistic because of their small scale; should we stop doing them? (If you say yes, Jeremy Fox has an earful for you.)

    I think that most ecologists would agree that hypotheses within our field need a robust series of results from different experiment types. We need good theory and good mathematical underpinnings, but also evidence from large-scale surveys, manipulative field experiments, microcosms, etc., to have strong support for a given hypothesis. In the end, Bob's problem with errors in variables seems irrelevant given that the magnitude of the slope was not the quantity of interest. His point about mechanisms being complicated is valid, but he follows it up by essentially dismissing all observational studies as incapable of demonstrating causality. Is Fernando's paper the alpha and omega on biodiversity and ecosystem function? Of course not, but is it "utter rubbish" as Bob says? I'd hardly say the paper warrants the vitriol.

  7. (Image: Duke blew it this year.)
    Well, March Madness has come and gone again. If only Ohio had beaten Kansas, I think I would have won my bracket pool, but alas! I was curious, though: how can you tell if your bracket was good or bad? I decided to create a little null model of the tournament to find out. There are 63 possible wins in the tournament, so at first you might think random guesses would lead to a distribution with a mean of 31.5. That neglects the fact that the tournament has a tree-like structure, so errors early in the tournament propagate through it. A fancy way to say this is that it's a non-Markov process; a less fancy way is to say that outcomes at deeper nodes in the tournament depend on earlier outcomes. From a simulation perspective this is pretty easy to handle, albeit a bit of a pain to code (a stripped-down sketch is at the end of this post). Below you can see the null distribution. It's closely approximated by a Poisson distribution with lambda = 21.357. Therefore on average, if you just chose randomly, you would have gotten only about 21 correct picks.
    (Figure: null model results.)

    The vertical lines each represent a different number of correct picks. The dashed black line is the 95% confidence interval; it's drawn at 31, so if you had 31 or fewer picks correct on your bracket, it basically means a monkey could have done better than you, or really any animal, since you did no better than randomly choosing teams. The solid black line is 40, the number of correct picks you would have had if you simply followed the teams' seedings. In my pool most people did worse than this. My number of correct picks is the blue line, 43, achieved with the model you can see in my previous post. Finally, the red line is the best bracket from the ESPN bracket challenge (it's at 51). So there you have it: a way to statistically prove that your friends are no better than monkeys and that you are a genius when it comes to NCAA brackets!
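    A stripped-down sketch of the null model (not my original code; here the "true" bracket is itself just a simulated stand-in, which by symmetry gives the same null distribution for a random picker):
    ## Score random brackets against a fixed bracket to build the null distribution
    set.seed(2012)
    sim.bracket <- function(teams = 1:64){
      winners <- c()
      round <- teams
      while (length(round) > 1) {
        round <- sapply(seq(1, length(round), by = 2),
                        function(i) sample(round[i:(i + 1)], 1))  # pick each game at random
        winners <- c(winners, round)
      }
      winners                                    # the 63 slot winners, rounds 1 through 6
    }
    truth <- sim.bracket()                       # stand-in for the actual results
    null.dist <- replicate(10000, sum(sim.bracket() == truth))
    mean(null.dist)                              # about 21 correct picks by chance
    quantile(null.dist, c(0.025, 0.975))         # 95% interval of the null distribution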

  8. Well, it's come to that time of the year again, a moment that brings me back to the first brackets I filled out in 7th grade. Back then I didn't care much about college basketball, so I just made guesses. Not much has changed in 20 years: I still don't care much about college basketball, but how I guess has gotten much better. These days I use a pretty simple simulation model with a tried-and-true Bayesian core. I have an old post on this model from two years ago, but I think it's time for an update. Lots of people have fun modeling the tournament: Nate Silver at the New York Times has a good model on his 538 blog, as does a group over at Georgia Tech called LRMC, headed by Joel Sokol. My own model is probably the simplest of all of these, but it shares its DNA with Nate Silver's. Here's the overview.

    Model description

    The guts of my model are a simple Bayesian Beta-Binomial posterior distribution. This is composed of two parts: one is a Binomial distribution, which is simply the probability that team A beats team B; the other is the prior distribution. Most often in ecology we would use a flat prior, but because this is sports we have lots of background data. Already it should be clear we have a problem: team A has often never played team B, and certainly not enough times to have any data. So the first thing we need to do is get some data. My way around this is to use Ken Pomeroy's ratings. He gives each team a "Pythagorean" rating, and what's convenient about these ratings is that you can take any two teams and convert their ratings into a probability, call it p, that team A beats team B. We do this using a simple formula called the log5 rule. Here's how it's implemented in the code:
    ## log5 rule: probability that team A (rating a) beats team B (rating b)
    l5rule <- function(ratings){
      a <- ratings[1]
      b <- ratings[2]
      wpct <- (a - (a * b)) / ((a + b) - (2 * a * b))
      return(wpct)
    }
    
    Based on that win percentage p, I simulate a 34-game season between the teams. That's the Binomial part of my posterior. Next I want to combine prior information with that simulated data; otherwise there's no real need for simulation, since I could just use the log5 rule to calculate everything. I do this by creating a Beta distribution based on power rankings from two other experts, Jeff Sagarin and Sonny Moore, centered around the probability that team A beats team B. So now I have my "data" based on the Pomeroy ratings, and my prior. I combine these into my posterior and generate a single draw from that posterior distribution (a minimal sketch of this step is below). I work through the entire tournament this way, simulating it 10,000 times, and generate a table of probabilities that each team makes it to each stage of the tournament. You can see my results here: predictions are blue highlighted cells, and actual outcomes are red text. So far I've had about 80% accuracy, on par with the LRMC model. You can see all the code on GitHub.
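    A minimal sketch of that single-game step, with made-up ratings and a hypothetical prior strength k (the real code on GitHub differs in the details):
    ## One posterior draw for "team A beats team B"
    p.data  <- l5rule(c(0.85, 0.70))   # win probability from two made-up Pythagorean ratings
    wins    <- rbinom(1, 34, p.data)   # simulate a 34-game "season" at that probability
    p.prior <- 0.75                    # prior win probability from the other power rankings (made up)
    k       <- 20                      # hypothetical prior strength, in pseudo-games
    ## Beta prior centered on p.prior, updated by the simulated season (conjugate update)
    p.post  <- rbeta(1, p.prior * k + wins, (1 - p.prior) * k + (34 - wins))
    p.post                             # the draw used to decide this matchup in one simulation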

    Last thoughts

    My personal bracket in my pool is outperforming my model; I sort of cheated on myself when filling it out. One of the great things about the tournament is that you know there will be upsets... but which ones will they be? My model doesn't really account for that, but there is a model that does. Over at the Harvard Sports Analysis group they have a great model, and based on it I threw some upsets into my own picks. I'll be back later with more updates on how the model performs.
