# Posts from the ‘Lagniappe’ Category

## Online resource: The Analysis of Data

This came across my desk this morning: theanalysisofdata.com. The author is a former professor of Computer Science named Guy Lebanon, who is now, according to his website, the Director of Product Innovation at Netflix. The text appears to be very rigorous and is notable because he goes all the way from first principles to Limit Theorems that rely on Measure Theory. Here’s a link to his volume on Probability:

http://theanalysisofdata.com/probability/0_2.html

The following passage from the preface caught my eye:

Probability theory is a wide field. This book focuses on the parts of probability that are most relevant for statistics and machine learning ….  Probability textbooks are typically either elementary or advanced. This book strikes a balance by attempting to avoid measure theory where possible, but resorting to measure theory and other advanced material in a few places where they are essential ….

I am not aware of a single textbook that covers the material from probability theory that is necessary and sufficient for an in-depth understanding of statistics and machine learning. This book represents my best effort in that direction.

I often run into students who want to work in Data Science, but who say they are only interested in learning material that is “practical” or “useful.” What has always amazed me about probability and statistics (and mathematics in general) is that basic, seemingly elementary questions often have answers that require a substantial mathematical background. In fact, many topics that students now see as abstract theory were first developed to answer very practical questions. I would suggest that Lebanon’s inclusion of measure theory in his text ratifies this thought in the context of data science.

Just as musicians must spend 10,000 hours* practicing on their instruments and learning music theory before they can be creative and establish an individual voice, if you’re ambitious and want to be a “data science genius,” you had better dedicate at least 2,000 of your 10,000 hours to mathematics!

* Yes. I know the 10,000 hours is a matter of correlation rather than causation.  Generally speaking though, if you want to be good at something, you had better be prepared to spend some serious time on it!

## Stem Cell Paper Accepted

Congratulations are in order for Vin Cannataro today.  The research paper stemming from the second chapter of his dissertation was accepted today by Evolutionary Applications.

The Evolutionary Trade-off between Stem Cell Niche Size, Aging, and Tumorigenesis

Vincent L. Cannataro, Scott A. McKinley, Colette M. St. Mary

http://biorxiv.org/content/early/2016/06/15/059279

Many epithelial tissues within large multicellular organisms are continually replenished by small independent populations of stem cells. These stem cells divide within their niches and differentiate into the constituent cell types of the tissue, and are largely responsible for maintaining tissue homeostasis. Mutations can accumulate in stem cell niches and change the rate of stem cell division and differentiation, contributing to both aging and tumorigenesis. Here, we create a mathematical model of the intestinal stem cell niche, crypt system, and epithelium. We calculate the expected effect of fixed mutations in stem cell niches and their expected effect on tissue homeostasis throughout the intestinal epithelium over the lifetime of an organism. We find that, due to the small population size of stem cell niches, fixed mutations are expected to accumulate via genetic drift and decrease stem cell fitness, leading to niche and tissue attrition, and contributing to organismal aging. We also explore mutation accumulation at various stem cell niche sizes, and demonstrate that an evolutionary trade-off exists between niche size, tissue aging, and the risk of tumorigenesis; where niches exist at a size that minimizes the probability of tumorigenesis, at the expense of accumulating deleterious mutations due to genetic drift. Finally, we show that the probability of tumorigenesis and the extent of aging trade-off differently depending on whether mutational effects confer a selective advantage, or not, in the stem cell niche.

## In celebration of pi

If you happen to be in a bar and you need to quickly calculate $\pi$, head over to a dartboard.  Draw a square that perfectly encloses the dartboard. (The length of its sides should be equal to the dartboard’s diameter.)  Start throwing darts randomly at the square.

If you manage to distribute the darts uniformly at random over the square, then the fraction of darts that land in the dartboard will be approximately equal to the area of the dartboard divided by the area of the square. If the radius of the dartboard is $r$, then this ratio is

$\displaystyle \frac{\text{Area of Circle}}{\text{Area of Square}} = \frac{\pi r^2}{ (2r)^2} = \frac{\pi}{4}$.

Notice that the ratio is independent of the value of $r$!

Therefore if you take the number of darts that land in the dartboard (and not in the surrounding portions of the square), divide by the number of darts thrown and then multiply by four … you’ve roughly approximated the value of $\pi$!

(But remember … it only works if you’re _bad_ at darts.)
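For anyone without a dartboard handy, here is a minimal Python sketch of the same experiment. It assumes a dartboard of radius 1 centered at the origin, so the enclosing square is $[-1, 1] \times [-1, 1]$. The function name `estimate_pi` and its `seed` parameter are my own choices for illustration, not anything canonical.

```python
import random


def estimate_pi(num_darts, seed=None):
    """Estimate pi by throwing darts uniformly at random at a square.

    Assumes a dartboard of radius 1 centered at the origin, enclosed
    by the square [-1, 1] x [-1, 1].
    """
    if num_darts <= 0:
        raise ValueError("num_darts must be a positive integer")

    rng = random.Random(seed)  # private generator, so a seed makes runs repeatable
    hits = 0
    for _ in range(num_darts):
        # One dart: a uniformly random point in the square.
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        # The dart lands in the dartboard if it is within distance 1 of the center.
        if x * x + y * y <= 1.0:
            hits += 1

    # The fraction of hits approximates pi / 4, so multiply by four.
    return 4.0 * hits / num_darts


if __name__ == "__main__":
    for n in (100, 10_000, 1_000_000):
        print(n, estimate_pi(n))
```

Don't expect many digits: the error shrinks roughly like one over the square root of the number of darts, so each additional correct digit costs about a hundred times as many throws.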

While we’re at it, here’s an oldie but goodie. (Not sure of the original source.)

Note that the radius of the circle is $\frac{1}{2}$ so that the circumference comes out to $\pi$.

But, of course, nowadays I feel an obligation to nod toward the well-reasoned exhortations of the $2 \pi$ crowd:  The Tau Manifesto.

If you’re hungry, perhaps you would be interested in some crawfish pi?

—  A lo-fi page on formulas that compute $\pi$ using the Fibonacci numbers. (link)

— Some $\pi$ jokes, if you need them.

## Cosmos rebooted

Tonight I’ll nervously be watching the reboot of the television series, Cosmos, which was created and narrated by a science hero of mine, Carl Sagan. The visual effects and the soundtrack of the old series now come across as very dated, but the concepts and the writing are timeless. Much of what is presented is common knowledge to science majors in college, but the presentation is charming and Carl Sagan’s heartfelt enthusiasm makes it addictive to watch.

Here is a clip in which Sagan talks about atoms and the origin of the word “googol” while trying to convey how big numbers like a googolplex really are.  (And yes, it’s googol, not google!)

Here’s a link to the new TV show’s web page: Cosmos: A Spacetime Odyssey.

## Things I should have done during Spring Break …

UK-based photographer Andrew Whyte specializes in dramatic light art and long exposures of the night sky, but some of his most striking work involves helping an inch-high fellow photographer get a good shot. For over a year, Whyte has been shooting what he calls the “Legography” series, starring a Lego minifig with a bulky black camera and a penchant for exploration. The minifig travels with Whyte, waiting to be posed scaling a fence, watching the sunrise, or playing tourist in London.

## Baseball and the Bull City

Durham Bulls Athletic Park

Before I lived here in Gainesville, I spent four years in Durham, NC. I lived downtown in a converted tobacco warehouse and could walk to minor league baseball games on a whim.

“Capturing the Quiet Beauty of Baseball”

The idea for the project came a few years ago when Stephenson was sitting in the stands on the last game of the year. “I realized that the diversity of the crowd was extraordinary. Every type of person that lived within a 30 or 50 miles radius was represented there. There aren’t many places you can say that about,” he said.

## The Shape of Data

There is a talk coming up on Monday featuring one of the world’s leading thinkers about the mathematical structure of data.  I highly encourage checking this out, as the talk should be aimed at a general audience (though with some serious math sprinkled in).

The Shape of Data
Gunnar Carlsson, Department of Mathematics, Stanford University
Monday, Feb 24 at 4:05 pm in Little Hall Room 101

The problem of extracting knowledge and understanding from large and complex data sets is one of the fundamental intellectual challenges for the mathematical sciences. One approach to this is to use the notion of the “shape” of a data set, as encoded by a metric, as an organizing principle for data. Since topology is the mathematical discipline which concerns itself with the study of shape, it is only natural that methods from topology should be ported into the study of data. This transfer has in fact been taking place over the last 10-15 years, and we will discuss some of the ideas which have come up, with examples.

Carlsson Colloquium (pdf)

## Pale Blue Dot

Yesterday, the Cassini spacecraft turned its camera back toward Earth in an attempt to capture an image of our planet through Saturn’s vast rings. The shot is a tribute to Voyager 1’s famous photograph of Earth as a “pale blue dot” set against a vast and empty sky.

Of course, the term “pale blue dot” was coined by my favorite science writer, Carl Sagan. During the making of his epic series, Cosmos, he read an abridged version of that passage which has been gorgeously animated by this is ORDER.

From this distant vantage point, the Earth might not seem of any particular interest. But for us, it’s different. Consider again that dot. That’s here. That’s home. That’s us. On it everyone you love, everyone you know, everyone you ever heard of, every human being who ever was, lived out their lives. The aggregate of our joy and suffering, thousands of confident religions, ideologies, and economic doctrines, every hunter and forager, every hero and coward, every creator and destroyer of civilization, every king and peasant, every young couple in love, every mother and father, hopeful child, inventor and explorer, every teacher of morals, every corrupt politician, every “superstar,” every “supreme leader,” every saint and sinner in the history of our species lived there – on a mote of dust suspended in a sunbeam.

The Earth is a very small stage in a vast cosmic arena. Think of the rivers of blood spilled by all those generals and emperors so that in glory and triumph they could become the momentary masters of a fraction of a dot. Think of the endless cruelties visited by the inhabitants of one corner of this pixel on the scarcely distinguishable inhabitants of some other corner. How frequent their misunderstandings, how eager they are to kill one another, how fervent their hatreds. Our posturings, our imagined self-importance, the delusion that we have some privileged position in the universe, are challenged by this point of pale light. Our planet is a lonely speck in the great enveloping cosmic dark. In our obscurity – in all this vastness – there is no hint that help will come from elsewhere to save us from ourselves.

The Earth is the only world known, so far, to harbor life. There is nowhere else, at least in the near future, to which our species could migrate. Visit, yes. Settle, not yet. Like it or not, for the moment, the Earth is where we make our stand. It has been said that astronomy is a humbling and character-building experience. There is perhaps no better demonstration of the folly of human conceits than this distant image of our tiny world. To me, it underscores our responsibility to deal more kindly with one another and to preserve and cherish the pale blue dot, the only home we’ve ever known.

— Carl Sagan, Pale Blue Dot