Data
Ari Y. Kelman
Ari Y. Kelman is the Jim Joseph Professor of Education and Jewish Studies in the Stanford Graduate School of Education, where he directs the Berman Archive, a free and open repository of documents pertaining to American Jewish communal life.
For the past fifty years or so, the American Jewish community has collected data about itself through population studies and program evaluations. Population studies can range in scope from national (like the Pew Studies or the National Jewish Population Studies) to local, and they usually include a mix of demographic, behavioral, and attitudinal questions. Program evaluations tend to focus on the efficacy or “impact” of a given program on participants. If honestly and carefully applied, these two modalities can produce useful and reliable data about the behaviors and attitudes of American Jews. But alone, they are not quite keeping pace with 21st-century data collection.
Over the past two decades or so, the very nature of data has changed, and so have most people’s relationships to it. Every day, people willingly surrender previously unimaginable loads of data about themselves—their habits, their whereabouts, their preferences, and even their innermost thoughts—to large tech companies. Most people in the 21st century accept that handing over data is the price they have to pay for the convenience of next-day shipping, access to millions of hours of entertainment, or connecting with friends and family around the world.
As a result, some server somewhere holds data on how I spent my day yesterday. It knows the last time I set foot in a synagogue (as long as I had my phone with me—but when don’t I?), called my mother, or used a Hebrew keyboard. It knows that I've never watched a single episode of either “Fauda” or “The Marvelous Mrs. Maisel.” If responsibly gathered and analyzed, that data could produce a pretty good portrait of my life as an American Jew in 2024. It would likely be a more accurate portrait than one generated by a survey asking me to indicate how strongly I agree or disagree with a particular statement, or asking me to rank my sense of connection to the State of Israel on a scale of 1-7, with seven indicating, “It is central to who I am as a Jew and a human being,” and one indicating, “I never think about it.”
The ways that American Jews understand themselves through data have largely failed to keep pace with modern developments, and the problem is not just methodological; it is also conceptual and cultural. Although survey and interview data will always be useful, such approaches cannot reveal what Amazon or Google can about American Jews and their daily lives. What might American Jewish organizations—particularly the ones that sponsor population studies and program evaluations—glean if they were able to access the data that has already been collected by these companies? How would they come to know American Jews differently, and how would these data change how American Jews see or understand themselves?
Who Counts?
I have a friend who works for one of the big social media companies. We were chatting one day early in her employment there and, as social scientists do, we got to talking about how they approach research at her new company. We moved quickly from big theoretical questions about how to measure social ties and whether you can trust people’s online behaviors as adequate representations of who they are in the rest of their lives, to more picayune shop talk, like how long to run an experiment and sampling strategies. It was then that something she said stopped me cold. “We don't sample,” she told me. “We don't have to. We have so many users that we don’t have to sample or recruit specific pools of participants. Our numbers are so big that however many people respond to a survey or participate in an experiment, they basically represent everyone.”
With the number of its users in the billions, my friend’s employer can avoid the trouble of ensuring that the responses of a relatively few people can reasonably represent the larger population. At that scale, the challenges of representativeness nearly vanish. This fact also means that her employer can deflect criticisms of its studies by people suspicious that the respondents who participated are not, in fact, representative.
Those of us who wish to study human beings but do not have access to billions of potential research participants labor over questions and strategies of recruitment and participation. It is one of the cornerstones of any reliable piece of research: How do researchers ensure that they are asking the right people the right questions, so that they can gather reliable data about the population or phenomenon they are trying to better understand? Were the people who participated in the study selected randomly or purposefully? How were they recruited to participate? How do researchers know that they are not just getting responses from a small segment of a larger group, or that participation was not the result of some bias either on the researchers’ part or on the part of the respondents?
Sometimes, these questions are lobbed at researchers by people who simply do not wish to believe a set of findings. On some level, it is easier to critique how the pool of respondents was selected than to engage with the findings, and it is important to be critical of research methods. For example, American Jewish population studies regularly used the presence of “distinctive Jewish (sur)names” to estimate population sizes. The strategy was so prevalent that it was known by its initials: DJN. Despite its obvious problems, the reliability of this practice remained the subject of serious discussion among some scholars of American Jewry well into the 21st century. Approaches like DJN deserve scrutiny and criticism, and acknowledging their biases has led to methodological improvements.
But the real issue is not only methodological. When people ask about sampling or recruitment, about site selection or response rate, they are really asking if they can trust the data. Implicitly, they are also asking if they can rely on the researcher and the researcher’s sponsoring organization.
Data, More or Less
Social scientific research has made massive methodological advancements in an effort to keep pace with social and technological changes. The Pew Research Center used its 2020 study of American Jews to run an experiment that compared the responses of people who took the survey on the phone and those who took it online. (Full disclosure: I was an advisor to Pew’s 2020 study, but I did not advise on this experiment.)
The results of Pew’s experiment will enable its researchers to mitigate some biases in data collection, which will improve their research methods and make their results more reliable. But better methods do not necessarily translate into data that is either clear or meaningful. Take, for example, this finding from the 2020 Pew Report: 20% of American Jews reported that they find their “religious faith” provides them with “a great deal of meaning and fulfillment” (2020 Pew Report, page 69). Is that a lot of people or just a few people? Does it mean that Judaism, which the majority of American Jews claim as their religion, is not actually a popular source of fulfillment or meaning for American Jews? How does it compare to the situation a generation ago? Two generations ago? In the Middle Ages? Is 20% a spiritual trough or a peak? How do Jews compare with other Americans? How does the 20% compare to the 43% of American Jews who find spending time with their pets to be meaningful and fulfilling? How do you measure “meaning” anyhow? The Pew Report is probably the best national study of American Jews we have, and still, it contains some strangeness that can be more befuddling than illuminating.
I trust that Pew arrived at this finding about the faith of American Jews through methodologically sound means, but what now? What good is methodologically sound data that is confusing or inconclusive? Sure, it is interesting, but neither Federations nor educational service providers undertake evaluations just to find something “interesting.” Interesting is enough to drive what might be called “basic research,” but most of the research that circulates in and through Jewish organizations large and small, often thought of as “applied research,” is created to help people make decisions, as, in the parlance of the industry, such decisions should be “data informed” or even “data driven.”
But what does it mean to be data informed when the data are methodologically sound but opaque? What would it look like not to just keep up with the methodological Joneses, but to develop new strategies for gathering data—useful data—that is more responsive to social, cultural, and technological changes? In a world where humans create so much daily data, how might American Jewry do data differently?
The Data Deal
Just as the Distinctive Jewish Name strategy has outlived its utility, maybe it is time to reimagine the role that data can and should play in American Jewish life. We live in a new world that is overflowing with so much data that paper or phone surveys and impact evaluations might now actually get in the way of our efforts to produce reliable representations of American Jews. New modalities of data production mean new possibilities for representing Jewish life with even greater fidelity and predictive power. We could call it the “Age of Surveillance Judaism.”
My phrasing here is a nod to Shoshana Zuboff’s terrifying book, The Age of Surveillance Capitalism, which offers a chilling account of the ill effects of how Google has capitalized on user data. Cautiously, speculatively, I want to propose a peek into what the future of data on American Jewish life might look like if we were to harness the new capacities of data collection. This prospective and partial vision is meant as a provocation to my colleagues in both research and Jewish organizations to think anew about data: how it is produced, and what it means for the American Jewish community.
Today, as I described above, data can be understood as the currency of the deal that users strike with service providers. Users willingly allow their data to be collected by providers in exchange for the services promised. Often, these services come without additional financial cost to the users. But those are the terms of engagement: users surrender their data, providers capitalize on it, and the risks are usually considered (considered!) minimal enough to warrant them.
Meanwhile, people produce data everywhere, all day long. Your phone knows where you have been. If you use credit cards, check out library books, or buy anything online, your purchasing habits and patterns become data for the algorithms that govern online retailing. Social media posts serve up data for more algorithms and advertisers. Bots and cookies, test scores, diagnostic exams, keystrokes, logins, internet searches, preferences in streaming media, footsteps, and charitable donations all become data when collected systematically and fed into larger and ever-more powerful algorithms that are usually calibrated to deliver an advertisement for your next movie, vacation, grocery run, or book.
Under these conditions, it is possible to track nearly everything. Or rather, it is possible to track so much of life that one does not need to track everything to obtain a reasonable portrait of a person’s behaviors and actions. The power of this new world of data is that it only need be mostly accurate to be effective and, as modeled by my friend’s employer, what it lacks in depth it makes up for in scale.
Of course, such data could not reveal what percentage of American Jews say that they find their religious faith to be meaningful and fulfilling. But in documenting behaviors, it could produce a reasonable portrait of how people react to various stimuli, and that might be sufficient for helping Jewish organizations make decisions and direct resources. The algorithms that power social media platforms or online advertising do not need to be precise to the decimal point; they only need to understand that a person who likes brand X might also be interested in brand Y. One could imagine a synagogue tracking Amazon purchases to detect when a young family is expecting a child, or a summer camp collecting social media networks of children who attended in order to roll out a new outreach campaign.
If Jewish organizations—the ones that typically fund and rely on community studies and evaluations—wish to know about the lives of Jewish people, they might get a higher-resolution portrait of behaviors and attitudes if they follow the data that people willingly and steadily leave behind in their digital wakes. If Jewish communal professionals wish to know more about their communities to better serve them, then why not go where the data already is, to figure out where the people want to be?
There are obvious dangers, many of which Zuboff chronicled in her book. I am not sure I want my browsing history known to my local Federation, and I don’t know that I want a local synagogue approaching me for membership because I live near some of its members. But I also think that there is no going back to a time when surveys dominated the data landscape. As a result, the question facing American Jewish leaders is not whether they should make use of digital data, but how to do so in a way that is ethical, that honors people’s privacy, and that does not seem creepy or invasive. I am certainly not suggesting that data ought to transform Jewish life into a commercialized subsidiary of some large, multinational corporation or social media platform that is just trying to sell something. I am advocating for precisely the opposite.
I believe it might be possible to harness the power of data without succumbing to the temptations and pitfalls that plague commercial interests. The goal would not be to sell the data relating to American Jews, but to treat it as Jewish organizations strive to treat people: ethically, with care, and with scrupulous attention to the highest ideals of justice, fairness, and safety. If harnessed properly, the flow of digital data can identify new program areas, reveal habits and behaviors heretofore unknown to the leaders of American Jewish organizations, and offer productive and maybe even predictive insights into how we can make Jewish life better.
For example, instead of asking people if they attended synagogue on a given Saturday, it would be possible to create a reasonable statistical representation of those who did, just by tracking their phone’s location (assuming they brought their phones with them—which might exclude the halakhically observant Jewish population). It would be possible to know how much charity people gave (and to whom), what media they consumed, and whether they regularly attended a JCC, a Jewish Film Festival, and so on. Google searches or Amazon purchase records might reveal similar patterns about Jewish interests, needs, or even holiday preparations. If Amazon knows what I might be interested in before I do, imagine the possibilities for American Jewish organizations who always seem to be asking what people want and how to better serve them.
There are lots of directions this could go. One could imagine a synagogue or a JCC reaching out to a young person who had just moved to their area. A Federation could identify new donor streams. Schools could locate potential new teachers or board members. Jewish news outlets could use data on whether someone follows news about Israel from mainstream news sources and send invitations their way. If I have just returned from a vacation during which I visited an historical Jewish site, my local Jewish museum might send me a notice about an exhibition speaking to similar themes. The point is less about what specific kinds of information might be out there for Jewish communities to use, and more about the possibilities that new forms of data might hold for Jewish communal organizations that are still trying to plan by using survey and demographic data.
Practically speaking, it would not be hard to get consent from people to allow a Jewish data collection center to mine data that could, with appropriately skilled staff, proper levels of support, and necessary layers of security, track the lives of participants. People already give their data away everywhere, all day long, for free. Participating in a project like this would be less time consuming than responding to a survey like Pew’s. A team of researchers could offer small incentives to participants to share their digital data for a defined period (one year? Five years?), and then develop a set of questions to apply to the data as it emerged.
(Don’t) Trust the Data
At its core, the data deal is grounded in trust, responsibility, and a sense that the benefits outweigh the risks. The folks who sign up to participate in this project would have to trust the people and organizations that are collecting and analyzing their data. People must be assured that participating in this effort will not expose them to risk or harm in any way. Or, at least, they must be assured that participating in this effort will be worth the risk, just as they are when using any social media or online shopping company. But for this to work at a communal, rather than a commercial, level, members of American Jewish communities will have to trust that their data is being used to inform and improve their community.
That trust must be reciprocated through trustworthiness. An effort like this would require extraordinary care, consideration, transparency, and delicacy. It would require high standards of confidentiality, protection, and a commitment to anonymity at the individual level, even as it afforded representations of American Jews in the aggregate. Organizations at the heart of such an endeavor would need powerful firewalls against those who wish to steal or abuse user data, both online and off. Researchers would have to follow the highest levels of data protection and safety protocols and demonstrate that they deserve the trust of the people whose data they gather and analyze.
There is no doubt that the world of data has changed. The question on the table is: Can American Jews change to meet it? Whether a Jewish organization or Jewish community could cultivate an appropriate level of trust might well be the real test of data collection in an age of “surveillance Judaism.” But criticism of surveys and other traditionally collected data suggests that the trust between American Jewish organizations and American Jews is already straining. The world of data in the 21st century suggests a way forward, but not by refining survey questions or sampling strategies. Instead, it requires a reimagined relationship between American Jews, their organizations, and the data they share. Establishing a relationship grounded in trust around the care and keeping of data might generate a greater sense of collective responsibility, safety, and mutual respect. That would be nothing short of a real data-driven revolution in American Jewish life.