Signal and Noise: Part I

[Note: Sorry for taking so long. Elli wrote the following post. We were going to post this earlier, but several long and fruitful arguments about signal and noise delayed it, and brought it to the point where it was best split up for size. We have more material ready on this, and we should be able to get back to posting more frequently. Enjoy. –Moshe]

We have given a lot of attention to the “shape” of rabbinic careers over time. Specifically, we have looked at R. Yaakov Ettlinger and R. Moshe Feinstein and tried to consider what may have affected that shape. We argued that factors like R. Ettlinger’s editorship of Der Treue Zionswächter, R. Feinstein’s presence in the Soviet Union during the years of Stalin’s religious purges, and his spike in writing and publishing in the late 1950s and early 1960s can help explain how their careers developed and how they related to their own writings.

More broadly speaking, however, the goal of HaMapah is not so much to explain these phenomena as to show that they exist. Let us illustrate this with a table that shows the number of published responsa written by Hatam Sofer in each year.

[Note: the data here is based on the Hebrew year minus 3760, not the actual Gregorian year. Our date-parsing tools are accurate to the day where possible. However, when a responsum carries only a year and no further data, choosing which Gregorian year to assign it to is an open question and a possible source of noise.]
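To make the note above concrete, here is a minimal sketch of the "Hebrew year minus 3760" approximation. The function names are hypothetical and are not taken from our actual date-parsing tools. The sketch only shows why a year-only date is ambiguous: a Hebrew year begins in the autumn, so it overlaps two Gregorian years.

```python
from typing import Tuple

# Offset used in the note above: Gregorian year ~ Hebrew year - 3760.
HEBREW_TO_GREGORIAN_OFFSET = 3760


def approximate_gregorian_year(hebrew_year: int) -> int:
    """Return the single Gregorian year assigned by the simple offset rule.

    This is the Gregorian year in which most of the Hebrew year falls
    (roughly January through September).
    """
    return hebrew_year - HEBREW_TO_GREGORIAN_OFFSET


def possible_gregorian_years(hebrew_year: int) -> Tuple[int, int]:
    """Return both Gregorian years that the Hebrew year overlaps.

    A Hebrew year starts in the autumn of (hebrew_year - 3761) and ends
    in the autumn of (hebrew_year - 3760), so a responsum dated by year
    alone could belong to either one.
    """
    later = approximate_gregorian_year(hebrew_year)
    return (later - 1, later)


if __name__ == "__main__":
    year = 5600  # illustrative Hebrew year
    print(year, approximate_gregorian_year(year), possible_gregorian_years(year))
```

A responsum written in the first months of a Hebrew year would be assigned one Gregorian year too late under the offset rule, which is the noise source mentioned above.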
Another map shows where the responsa written during those years were sent (for those with both dates and addresses). Ideally, open it in a new tab.

What we see here is very uneven. There is a general upward trend from the turn of the nineteenth century until the end of his life, but there is lots of variance from year to year. Before we ask how to account for that, we must ask what exactly needs to be accounted for. We should expect a certain degree of variance from year to year simply because that’s how life works. But what should we expect?

Let us discuss baseball for a moment. Unlike most other sports, much of baseball can be broken down into isolated events: pitcher versus batter. The outcome of any individual event is wildly uncertain, but over time patterns emerge. Certain features of a batter’s performance–the rate at which he strikes out, the rate at which he walks, the rate at which balls put in play result in hits–stabilize over time. There are also local environmental factors that come into play. Smaller parks tend to inflate offense and depress defense, while roomier parks have the opposite effect. Factors like wind, humidity, temperature, and altitude also affect performance. Strength of opponent is, of course, a significant factor. And, of course, there are factors in the personal lives of the players that can have an effect (usually detrimental): injury, illness, exhaustion, and grief, to name a few.

And then there is also simple, blind luck. There is only so much control that a batter can have over a ball hurtling toward him at speeds approaching (or exceeding) 100 miles per hour. Sometimes a well-struck ball finds the glove of a well-positioned fielder. Sometimes the weakest contact results in a base hit. That’s the way the ball bounces.

A basic idea of advanced statistical analysis is to try to isolate the relevant factors, the “underlying” performance of a player, that will give a better picture of who the player really is. It allows us to quantify who has been lucky and who unlucky, and it allows us to determine the specific skill at which a given player excels (or fails). We are able to separate the signal from the noise.

Because of all of the factors mentioned–the “noise”–there is a great deal of year-to-year variance in the actual results of a player’s performance. The overall trend is toward a late-20s peak followed by decline, but the number and rate of hits, home runs, doubles, etc. varies greatly from year to year. Advanced analytics develop different kinds of tools that “smooth” the jagged edges of the year-to-year variance by eliminating or accounting for more and more noise.

It is important to recognize that the “noise” itself has meaning. Poor performance is poor performance, even if it is not indicative of a player’s true talent level. A lucky win still goes in the W column. A player whose home run totals are inflated by Coors Field in Denver still has those home runs to his credit. When a batter faces a pitcher, he either will or won’t get on base. He either will or won’t strike out. This is what gives the game its drama: after all the analysis, the players must still go and play the game, whose outcome is far from certain. All advanced statistics can do is give a good idea of what to expect from a player–a better idea, in fact, than “traditional” statistics that count (noisy) results. They do not tell us what happened, and they do not predict with certainty what will happen, but they do predict it with substantially better accuracy than traditional statistics. For instance, a pitcher’s FIP (Fielding Independent Pitching) predicts his ERA (earned run average) for the following year better than his current ERA does.

Can some of these insights be applied to the study of responsa? Certainly, although there is a certain tension here between the historian and the statistician. For the historian, each responsum is a discrete historical event to be studied on its own. To the extent that the “noise” is part of the event and can be determined, the historian wishes to do so. They are interested in what actually happened.

Statisticians, on the other hand, want to isolate performance from all but the most directly relevant factors. As long as the number of responsa that a given posek wrote in a year is reasonably consistent with the expected year-to-year variance, it does not trouble them too much. If a spike or dip is within the typical noise pattern, then whatever may have inflated or depressed the number of responsa that year, even if it was not sheer luck, should be ignored when trying to determine the longer arc of the posek’s career. They want to see transition periods: when the posek breaks out (whether all at once or in stages), whether he declines at the end of his life due to health, and other larger trends, not blips and aberrations. They’ll want this:

Hatam Sofer & EMAs
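For readers who want to see what this kind of smoothing does, here is a minimal sketch, assuming that "EMA" stands for exponential moving average. The yearly counts below are invented for illustration and are not the Hatam Sofer data, and the `span` value is an arbitrary choice, not necessarily the one used for the chart.

```python
def ema(values, span):
    """Exponential moving average of a sequence of yearly counts.

    Uses the common convention alpha = 2 / (span + 1) and seeds the
    average with the first observation. Larger spans smooth more.
    """
    if span < 1:
        raise ValueError("span must be at least 1")
    alpha = 2.0 / (span + 1)
    smoothed = []
    for value in values:
        if not smoothed:
            # First year: nothing to average against yet.
            smoothed.append(float(value))
        else:
            # Blend this year's count with the running average.
            smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


# Hypothetical yearly counts: a rising trend with large year-to-year swings.
HYPOTHETICAL_COUNTS = [10, 30, 12, 28, 15, 35, 20, 40]

if __name__ == "__main__":
    for raw, smooth in zip(HYPOTHETICAL_COUNTS, ema(HYPOTHETICAL_COUNTS, span=3)):
        print(f"{raw:3d}  {smooth:6.2f}")
```

The smoothed series swings less from year to year than the raw counts while still following the upward trend, which is the statistician's view of the career described above. A longer span flattens the curve further, at the cost of reacting more slowly to a real change.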

We’ll get into more details soon. Stay tuned.
