Newsjunkie.net is a resource guide for journalists. We show who's behind the news, and provide tools to help navigate the modern business of information.

1.5.2 Use of Data

The Marshall Project is a nonprofit investigative newsroom that covers the U.S. criminal justice system.
Black & Pink is a U.S.-based nonprofit centered on prison abolition, LGBTQ+ advocacy, and mutual aid.
The Vera Institute of Justice is a nonprofit research and advocacy organization focused on the U.S. criminal justice system and prison reform.
The Data Rescue Project (DRP) is a volunteer-driven coalition of data librarians, archivists, and researchers, founded in February 2025, working to preserve and provide access to at-risk public U.S. federal government data. (United States, virtual)

The Council on Criminal Justice, based in Washington, is a nonpartisan think tank and invitational membership organization that "advances understanding of criminal justice policy choices and builds consensus for solutions that enhance safety and justice for all."
Jill Castellano is a Pulitzer Prize-winning data reporter with The Marshall Project, a nonprofit news organization that covers the criminal justice system. Castellano contextualizes The Marshall Project's investigations with data-driven research, creates data visualizations, and writes methodologies to help other researchers conduct similar work.
“What can you conclude from a dataset that has so many missing values?”
I believe your most recent methodology was on investigating hospital drug testing at childbirth. Can you take us through that project—how it came about, and how you gathered your data sources?
I was brought in to help with a very robust area of reporting that my colleague, Shoshana Walter, had been doing groundbreaking work in for years: exposing the problem of very faulty drug testing at childbirth.
Women go into the hospital to give birth, and either it's mandatory drug testing of the baby or the mom, or in some cases it's discretionary, but you may be drug tested at childbirth. Those drug tests cast a really wide net. That means a lot of people get swept up into these tests that come back with false positives.
You get situations where people have taken totally legal substances that flag as illegal substances, like poppy seed bagels flagging as opiates and baby wash products flagging as THC, or drugs that women are prescribed or given during pregnancy or childbirth—fentanyl in their epidurals—and then testing positive for fentanyl. All sorts of scenarios [of] that test being misinterpreted as an illegal consumption of a controlled substance.
Shoshana had been doing incredible reporting for a long time on this issue. She was able to see that there were situations where women were being arrested, women had their babies taken away from them, sent to child welfare, interrogated, and that this was very bad for the babies and the moms. Our question was: How often is this scenario really happening? I was brought in to offer some data to contextualize the scope of this problem.
We were able to get data from the National Data Archive on Child Abuse and Neglect, which is a warehouse that stores data for the federal Children's Bureau. Child welfare agencies submit data to the federal Children's Bureau voluntarily, and a lot of times the data is wrong, which is what we learned—a big part of this process was going back to the child welfare agencies that submitted this data. There were about 20 states that submitted it, and about half of them said the data was wrong, and that we needed to not include the data, or update our numbers, or request new information.
Their own data that they submitted was wrong?
Yeah. I think it's a really good reminder that you have to triple check everything. Just because it's in an official source doesn't mean that it's right.
We were getting the data in a circuitous way, rather than going to child welfare directly. It was information they had submitted that we got elsewhere. So it was a good reminder: you have to go back to the original source and make sure they know what they've submitted and that you're interpreting it [correctly].
We went through this whole big process to get data from states all over the country, as much as we could, and we found that there were 70,000 cases where women were accused of using drugs during pregnancy and then referred to law enforcement, either police or prosecutors, and that data covered 21 states and a six-year period.
Did the states give you a reason why their data might be wrong? Where did those mistakes come from?
There was one state, they said they had accidentally submitted data every time a form had been filled out suggesting that they may notify law enforcement, but not the actual notifications themselves. The filling out of the form was mandatory, but the actual notifications to law enforcement were rare. So that was an error on their part.
In Texas, at first it looked like their child welfare agency refers very few cases to law enforcement over alleged substance use during pregnancy. Then when we talked to them, they said, “Oh no, that’s wrong. We refer every single case. It's mandatory under our state laws. We don't know why the data is so off, but it's probably just that we didn't do a very good job of collecting it.”
So you see that too—states where they're doing their best, but they don't have a mechanism to get information from a county child welfare agency, up to the state, to then report it to the federal level. There's just lots of missing information.
We spent a long time on this project, and I think we were very careful in any conclusions we were drawing from a dataset that had a lot of missing information or a lot that we knew was under-reported. What can you conclude from a dataset that has so many missing values?
Ultimately, we were happy with where we landed. If you go to The Marshall Project's website, there is data that you can download on how often allegations of substance use during pregnancy result in referrals to law enforcement for the 21 states that we have data for. We also have other data in there as well, pretty much anything we could get on this topic.
What were the ways that you addressed the holes—ways that you were able to work around the gaps, and analyze and draw conclusions from imperfect data?
One [gap] is that any values below 10 were suppressed in the data that we got for that project to protect the confidentiality of individuals, because these are children who may have been abused. Theoretically, that's why these cases get referred to law enforcement. We had some number of values that were missing, and we didn't know whether that value was a zero or a one or a nine—how does that impact what we're trying to say?
But more broadly, there were a lot of unknowns about—how good is the data collection in Ohio versus Texas, and what might be missing from those numbers, and how do we think about that?
We always try to be conservative in an analysis, draw the least extreme conclusions we can. If we think we might be exposing a problem—let's give this problem the benefit of the doubt, [check] if there are some unknowns, and then let's see whether the problem is still there. We try to use the minimal estimates.
If we wanted to, we could maybe try to come up with maximal estimates that there could be 100,000+ referrals to law enforcement, if you try to come up with a way to estimate how much missing data there is. But that's not really useful to people. We wanna say with confidence that we know at least 70,000 cases were referred and we know that it's an under-count. But we try to be definitive when we can, rather than estimating, and be conservative in order to provide people with useful information.
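To make the "minimal estimate" idea concrete, here is a rough sketch, not The Marshall Project's actual code, of how a floor can be computed when small cells are suppressed. The column names, the "<10" marker, and the numbers are all hypothetical.

```python
import pandas as pd

# Hypothetical example: referral counts by state, where the data provider
# suppressed any cell below 10 (marked "<10") to protect confidentiality.
df = pd.DataFrame({
    "state": ["A", "B", "C", "D"],
    "referrals": ["1200", "<10", "45", "<10"],
})

# Conservative (minimal) estimate: treat each suppressed cell as the smallest
# value it could hold. Zero can't be ruled out here, so the floor is 0; the
# reported total is an undercount by construction.
SUPPRESSED_FLOOR = 0
suppressed = df["referrals"].eq("<10")
lower_bound = (
    df.loc[~suppressed, "referrals"].astype(int).sum()
    + suppressed.sum() * SUPPRESSED_FLOOR
)

print(f"At least {lower_bound:,} referrals across {len(df)} states "
      f"({suppressed.sum()} suppressed cells; the true total is higher).")
```

Reporting the lower bound as "at least" keeps the published number defensible even if every suppressed cell turns out to be larger.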
At the Marshall Project, we're okay with saying what we don't know. We don't wanna exaggerate, and we wanna be transparent if there's information we don't have. So we published, along with that story, a very long methodology that detailed all of the caveats with the data—things that were missing. So people know there's transparency.
And it's also if somebody were to come to us and say, “Hey. Did you think about this?” Or, if they have a concern that maybe we were misleading people, we'll say, “No, look, everything's available. You can take a look at what we've got here and see for yourself how thoughtful we were in everything we did.”
So [the reasoning behind] the methodologies, besides helping other reporters, is showing your work. There's a lot of worry these days about retribution too. I don't know how well you can really protect yourself [from that] these days, but it's a good start.
It's a bit of a CYA [cover your ass], right? You're being very clear on what you did. And before publishing, I go back to whatever entity I got data from, or I'm drawing conclusions about, and share what we plan to publish. We don't believe in surprises or catching somebody off guard because they are represented in a certain light. That's really helpful. I think it allows for an ongoing dialogue, and it minimizes angry phone calls from people who are like, “How could you do this?” That's not the business we're in.
With data journalism, it's not a “gotcha” kind of situation. It's slower, it's working together with these agencies.
“These are stories of real people…or lives impacted by government policy. If it goes away, then we're missing a lot…”
We've been seeing a lot of staff cuts and budget cuts at federal data collection agencies. Is there a similar problem with these state and county agencies, where resources are so short that these mistakes are bound to happen?
It's hard to say. In some places, some states, it seems they never did a good job of collecting this information, because it's not a priority. But we know that more staffing would certainly help, and maybe rethinking the priority of what those staff members are doing.
We talked to several child welfare agencies that told us “We are totally swamped with these positive drug tests of pregnant people and we're being told to prioritize these cases as potential high risk situations for the mother and baby. And yet, if we're doing that, we're actually de-prioritizing other types of cases, like accusations of violence in the home and criminal conduct.”
So [mistakes happen] because they're getting mandates about what matters, and there's a really broad brush being cast here where every positive drug test goes to child welfare, and then child welfare is like, “Oh my God, we're overwhelmed. What do we do with this?” We definitely found that there was an issue of staff not having enough time with these cases, and with these families, and feeling like they couldn't do a good enough job.
I’ve seen a lot of panic, as I've been covering the way that the federal government is defunding all its data infrastructure, because—you can have the information, but if you don't have all of these different stop-points where it can be analyzed, where it can be put onto the next place correctly, the usefulness just falls to the side.
Absolutely. I think that's a huge issue and a lot of people—whether that's advocates, attorneys, journalists—are thinking about that. And there's a great resource, the Data Rescue Project, that I follow that's going through this effort to try to save some of these data sources so they don't disappear forever.
But if the data's no longer being collected, then we miss the opportunity to use it. It's not just data for data's sake, right? These are stories of real people, or of funding and how our money is being spent, or lives impacted by government policy. If it goes away, then we're missing a lot, and we're missing a lot of transparency about our government agencies.
You mentioned the National Data Archive on Child Abuse and Neglect and the federal Children's Bureau. Have you seen any impacts on collection from those agencies?
I haven't heard anything that suggests that their data collection has stopped or changed. I'm speculating, but perhaps because it's voluntarily submitted data by the states and [federal agencies] don't do much to process it, vet it, they just hang onto it.
Then it can be requested mostly by researchers, but they gave us the opportunity to request and get the data as well. That data I haven't seen affected. But it's interesting because it wasn't very accessible before we got it; it was really only made available to researchers.
We went through a long negotiation process trying to explain why we think we should take a look at this data and the kinds of analysis we would wanna run and why we think we can be responsible with this data. And we spent a year going back and forth with child welfare agencies in every single state in the country to, at least, know what their policies are—when does child welfare refer to law enforcement, under what circumstances—in the instances where we could not get the data. We really did everything we could to make the existing information better, and make it available to the public.
“It's not like we, as one institution, can change everything in the justice system. But by building a network…”
I noticed that Marshall Project really has that collaborative spirit, where you are making the data public, and you are writing these methodologies to help other journalists and researchers. Can you talk a little bit about the reasoning behind that and why you guys are willing to share your work in that sort of way?
I think that we operate on transparency. That's how we get information, so we're transparent in what we give back. We have a whole initiative called Investigate This! where we produce toolkits designed for other journalists. What we really care about is—not only are we able to do a certain amount of reporting, but if we can help other journalists report on the same thing in a responsible or thoughtful way, that is furthering our mission to expose what's happening in the justice system.
It's not like we, as one institution, can change everything in the justice system. But by building a network of journalists who are capable of doing good reporting in this space—I think it just makes [our work] stronger.
I really admire you guys and [your approach to] the co-ownership of truth and information. We all deserve to have this knowledge about how the criminal justice system works, and it's such a black box without this reporting.
And collaborative [work], maintaining relationships—that really interests me as part of the project that I'm working on, especially as federal data becomes less reliable. I think it's up to the independent sector to band together and help each other fill in gaps that are missing.
We do lots of partnership projects. Part of our model to get our reporting out to more people is co-reporting and co-publishing with other newsrooms. At a time when newsrooms are strapped for resources, you might as well work with other journalists across publications, if it means that you can divide and conquer. Maybe we can provide data skills and another newsroom can provide more on the ground reporting, or maybe there's another way to divide up the work that plays to each newsroom's strengths and allows for more ambitious journalism.
Can you talk about any of those partnerships that you've worked on?
I've only been at the Marshall Project for a year, so I feel like a lot of them are still in progress or in the early stages. But there's been some great work across the newsroom that has involved partnerships.
Last summer, when immigration really became a big issue in the news every single day because of the immigration raids, our newsroom was doing reporting in Florida. I was one of several reporters. We were speaking with somebody who had been incarcerated in Florida—he had gotten into a fender-bender—and I believe he was turned over by a sheriff's department to ICE, and he had been deported. We were able to speak with him. One of our reporters spoke Spanish, so we got him on the phone.
But this was a story we were gonna be publishing in English on our site, which is probably not a place where people impacted, or people who care about this issue, are going to be. So we worked with Univision and they were able to get that out in Spanish to their audience.
That's what matters too—reaching the right people and trying to get your work out into new spaces where people aren't automatically coming to you.
Yeah, absolutely. When I was talking to Kenna Barnes at Black & Pink, we talked about that. She was emphasizing it is really important for larger analytic research institutes to partner with these grassroots-level organizations that are plugged into the communities being studied, or [communities] you're trying to reach. I think that is really important to keep in mind, especially as the state-level data is harder to come by.
I have been looking for similar projects to Black & Pink, but for immigration detention data, especially because ICE data is so unreliable. Do you have any recommendations for groups who are on the ground and connected to immigrant communities, to detainees?
There's a group called the National Immigrant Justice Center. Vera Institute of Justice does some good work with immigration data. And then, a lot of this has been speaking with attorneys who represent immigrants in immigration court in places on the ground that you're reporting.
I was doing some reporting in Chicago about Operation Midway Blitz, and what happened to people who were swept up in that operation, and where they ended up. Partly because Illinois has a moratorium where they are not incarcerating people in immigration detention facilities there, and as a result, people get moved elsewhere. They get moved all over the country.
We were looking at that and we knew that we needed to actually talk to affected families. Speaking with advocates in communities that have detention centers and with immigration attorneys allowed us to talk to the family members of people who were in these facilities, or who had been in these facilities, and could tell us what it was really like—what the conditions were and how cold it was, or how there was an outbreak of an infection in one of the facilities and so on. That is good old-fashioned shoe-leather reporting.
“This is my philosophy as a data journalist: The data is just numbers. You have to find the stories of people who are impacted, or it's not even worth it.”
Did being able to talk to the families illuminate differences between that experience and what was being reported in the data?
It really supplements anything you can get from data. This is my philosophy as a data journalist: The data is just numbers. You have to find the stories of people who are impacted, or it's not even worth it. It's very hard to get people to care about just some numbers on a page.
We're not researchers, we're journalists, and it's important to explain to people why this matters—what is this data actually showing? And so if we're talking about the hundreds or thousands of people swept up in an immigration raid, I very much believe that it's important to talk to impacted individuals, and that's hard work.
Going back to the project you were doing on drug testing after births, were you able to talk to sources for that to contextualize the data?
Absolutely. And I really have to hand it to my colleague Shoshana, who has been really doing an outstanding job in that space. I was focused more on the data for that project, but she was deep in trying to find impacted people.
It's very hard for something that is extremely personal, and doesn't happen all that often. If you think about it, you have to find somebody who not only had a baby, but also was drug tested, then was referred to law enforcement, and is willing to talk to you about it. Not only willing to talk to you, but provide you with records and allow you to ask very intrusive and personal questions about that story. It's hard to do, and she spends a lot of time building trust with people.
Transparency is important, not only in data journalism, but in that kind of reporting, absolutely. Tell people exactly what it is that you're doing. Tell people that you can't guarantee that their participation is gonna change anything, but that this is what we do, and this is why we do it, and why we talk about these topics.
So [Shoshana] was able to find people who had been impacted. She spoke with a woman named Ayanna who was taking CBD gummies to help with very bad nausea. Ayanna was accused of using THC products, she was arrested, she was put in a jail cell, and she lost her ability to breastfeed because she had been separated from her newborn.
That's the story. I do the data—splash the number 70,000 on it—but that doesn't matter if Shoshana's not there and we're not talking about what this actually means.
I think making those connections, building trust, and the transparency that you've been emphasizing, that is really the solution for a lot of problems that are facing journalists right now. Not only the problems of the job, but also problems coming from the outside—being attacked, being questioned, “the press is the enemy” kind of thing. If we have these connections with the public, then it's a little bit harder of a narrative to sell.
One of the things I love about the Marshall Project is the very clear effort we make to not report about a community, but with and for a community. That's why we have writers who are incarcerated contributing stories to our website. And we continue to have a dialogue—we don't just ask members of the public, or sheriff's departments, what July 4th means to them. We ask incarcerated people what it means. That's really important. Those voices are missing a lot.
And, I think some journalists may stop at the point of talking to people in positions of authority because they're easier to access. But if you really wanna know what's going on, you're gonna get a richer story from people who've been there and can actually talk about it firsthand, rather than from an official who might have an incentive to say things are working well. It makes for way better reporting.
I think you can end up repeating the official line in that way, and reinforcing it.
Yeah. I talk about this a lot because I do a lot of work thinking about crime statistics. And there's a lot of news all the time about “crime is going up,” or “crime is going down in this area, or that area, or that part of the country,” and it is oftentimes repeating what you might see in a police report, rather than digging in deeper. It's my personal mission to help improve crime reporting, I really care about that, so I try to educate journalists on taking a closer look at the data to see whether the official narrative actually bears out.
Crime is way down across the whole country, and it has been going down and down for the last couple of years. But if you were to zoom in on any particular police agency, you might hear them say, “Well, crime is going down because we got a bunch of new equipment, or because we instituted a new workforce or task force that helped tackle a certain kind of crime.” Basically, police agencies very much like to talk about what they're doing well and how that's reducing crime. But if you look at the data, it's not just happening in one location, right? This is a trend happening all over the US. So if you put your thinking cap on, you might be able to suss out that one thing that one police agency did, or is trying to take credit for, isn't the whole story here.
You’ve got to be a little bit thoughtful about the data that you get and what it's showing, and what it's not showing, to be able to interpret it.
“You have to be nuanced and complicated where the liars get to be clean and simple.”
I feel like I hear all the time that crime data is so hard. There are so many factors that impact it. Just because crime is down right now, you can't say that one thing is causing it because it's so complicated. I imagine working with crime data all the time is maddening in that way.
It takes a lot of thought, and consulting with experts who know way more than I do. I totally recognize that journalists are on deadline and they have to do things quickly sometimes. But even in small ways, I think there are things that we can do to improve how we report on this topic.
Can you take us through those small ways?
I am in the process of working on a toolkit for our Investigate This! series that's gonna really talk about some of these things, in a lot of detail. But one example is checking to see if the trend is happening—if you're seeing crime go up or down, is it going up and down in other places or just in the place that you are?
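A rough sketch of that check, with made-up local and national totals rather than real figures:

```python
# Hypothetical counts: is a local jump part of a wider trend, or an outlier?
local = {"2024": 42, "2025": 51}               # one city's aggravated assaults
national = {"2024": 910_000, "2025": 880_000}  # country-wide total

local_change = (local["2025"] - local["2024"]) / local["2024"] * 100
national_change = (national["2025"] - national["2024"]) / national["2024"] * 100

print(f"Local change: {local_change:+.0f}%, national change: {national_change:+.0f}%")
# A local increase against a national decline is a different story than a
# local increase that simply mirrors a nationwide rise.
```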
Also, looking at several years of data is really helpful because crime is cyclical. If you're not very familiar with crime data, you may look at murders and [be surprised to] see that they went up from 10 in January to 20 in July. And I, as somebody with experience with crime data, would say that's not surprising at all. Murders go up in the summer. They just do—people are outside more—this is a known phenomenon. In order to factor that in, look at the last few years of data rather than just doing a year-to-date calculation.
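A small illustration of why comparing the same month across years is fairer than a January-to-July comparison; the counts are invented:

```python
# Hypothetical monthly murder counts. Comparing July to January within one
# year exaggerates a seasonal pattern; comparing July to prior Julys does not.
murders = {
    ("2022", "Jan"): 11, ("2022", "Jul"): 21,
    ("2023", "Jan"): 10, ("2023", "Jul"): 20,
    ("2024", "Jan"): 10, ("2024", "Jul"): 20,
    ("2025", "Jan"): 9,  ("2025", "Jul"): 19,
}

# Misleading within-year framing: "murders doubled from January to July."
print("Jan to Jul 2025:", murders[("2025", "Jan")], "->", murders[("2025", "Jul")])

# More honest framing: this July compared with the previous few Julys.
julys = [murders[(y, "Jul")] for y in ("2022", "2023", "2024", "2025")]
print("Julys 2022-2025:", julys)  # roughly flat, slightly declining
```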
Ask for the underlying crime data. If the police say “murders went down by this much,” look at the underlying data and just do a quick check to see whether that's actually right.
Think about whether you should be reporting percentages or raw numbers. If you're saying aggravated assaults went up from two to four, that's a hundred percent increase, they doubled. But we're talking about really small values, right? So you want to be as fair as possible. And in that case, it might be better to say they went up from two to four rather than doubled, because of the impression it leaves people.
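A minimal sketch of that judgment call; the "small baseline" threshold of 20 is an arbitrary illustration, not a newsroom rule:

```python
def describe_change(before: int, after: int, small_base: int = 20) -> str:
    """Prefer raw numbers over percentages when the baseline is small."""
    if before < small_base:
        return f"went from {before} to {after}"
    pct = (after - before) / before * 100
    return f"changed by {pct:+.0f}% (from {before:,} to {after:,})"

print(describe_change(2, 4))        # "went from 2 to 4", not "doubled"
print(describe_change(1200, 1380))  # "changed by +15% (from 1,200 to 1,380)"
```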
You might hear this come up from time to time, this metaphor: Journalists don't report on the sharks that don't bite people. When things go well, they don't really get as much attention.
There certainly is reporting on crime going down, but it's not nearly as much reporting [compared to] specific incidents of serious crime, or the narratives that police are saying about a crime going up. I think we can do a better job of looking at the whole picture of what we know about different crimes, and potentially do a better job of reflecting that in our storytelling.
It's harder to write about crime going down, and to talk to people about crime going down, and feel like that's a compelling story. But it really can be, if you start to think outside the box a little. You can talk to advocates. You can talk to people who are doing the really important work of violence interruption, or volunteer work for people suffering from domestic violence, or whatever it is. That's solutions-oriented journalism—talking about not just a problem, but the fact that there are people dedicated to addressing those problems.
I also think it's helpful to just be specific, and this would be for any topic but is true in crime reporting as well. I try not to actually say crime is going up or crime is going down despite me saying it a bunch of times here. When I write a story, I'm like, “Murders are down X percent from this time to this time,” offering a level of specificity. And if you're not talking about murders, what crime is it that you're talking about? Are we talking about assaults, are we talking about property crimes? Be specific, because when people hear “crime,” they have very different perceptions of what that actually means.
I think this stuff, even though it might seem small, is really important, because the public has this perception that crime is always going up. More than half of Americans think that crime is going up, even though we're seeing historic drops in crime. And journalism plays a real role in that. That makes us feel unsafe in our own backyards, when we think that we could be a victim of a crime at any time. I think it's very important to get this work right.
That is such a powerful tool to sway the public to your side, right? “Crime is going up. I will keep you safe by doing X, Y and Z.” This is something that I think about a lot—it's so hard to be heard over the noise, and especially over a really compelling, tight, clean narrative of like, “Immigrants are coming to steal your jobs. They're bringing crime. That's why we need this crazy deportation [scheme].” What are ways that journalists and data journalists can fight back against these narratives?
You have to take your time. You have to show your work. You have to be nuanced and complicated where the liars get to be clean and simple.
Debunking is really hard, and it takes a long time, and I don't know that we've figured out a solution that makes what we do as compelling and easily digestible as somebody who can point to a single egregious case and that sparks outrage.
Sure there are examples of immigrants who have committed violent crimes, but what we try to do is look at the big picture, look at the research, look at the data itself. You can look at the data that the government itself releases, that ICE publishes, and see that most of the people that are being swept up into the immigration raids now are people with no criminal records at all. You can try to use these things to help combat the official narratives.
A couple years ago, my colleague worked on a story about “The Myth of the Criminal Immigrant.” And what I liked about that piece was that it was very graphic-forward, rather than being a story with lots of writing that you had to digest. It was just like, “These three charts will tell you, people come into the country and their rates of committing violence are actually lower, or are actually neutral compared to the [average] citizen in America.”
Sometimes I think being creative about the way we present information could help as well. I remember that piece did well and people still come back to it. It's still a resource.
Yeah, that’s great. I feel like that's a really helpful role that journalists can play—make things accessible, and interesting, and exciting (when we can) for the public, as a way to [invite] them into civic participation.
“Working with any government data source… You really do have to be skeptical right now.”
I feel like an overarching theme of this administration is cutting off public access, cutting off public knowledge. Have you seen that in your reporting and your work? Is it getting harder for you to have that access, and to reach the sources that you were using before?
It is hard. In literally my first week at the Marshall Project, I wrote a newsletter—we have a weekly newsletter called Closing Argument, which is a synthesis or analysis of a justice topic—and it was about how Trump had already stripped a lot of information and changed what you could access on government websites, just a few months into his tenure. I wonder whether data collection is going to look the same around data sources that I've used over the course of my career for different projects.
I did a big project, and I know a lot of reporters have, out of data that came from the Department of Education's Office for Civil Rights data series. They produce really robust data on chronic absenteeism by race and other justice-related topics in schools—corporal punishment, things like that—all the way down to the school district level. And boy, is that a valuable resource! But every time I think about a government data source now, I wonder if it will be there the next time that it would normally be updated. I don't know.
Think about the census and how important the census data is. And how much fighting there has been over whether changes to the census can be made that might encourage or discourage participation of particular groups, or may or may not allow people to report their own gender identity.
Working with any government data source, it's very important to read the fine print and check to see whether the way this information was collected, or how they're defining certain variables, is the same as it was in prior years, before drawing conclusions using that data. You really do have to be skeptical right now.
Have you started gathering alternative sources? What are the ways that you're trying to prepare for the possibility of these sources being gone, or altered, or not as complete as they used to be?
Looking at non-government sources who have an expertise in the topic area that you're interested in is a great tool. And I know there's also resource hubs that are kept by institutions like Harvard that you could potentially go to that might have archived data, and might also have their own data.
Or maybe it's working with researchers, and using data that they've made available, or looking at survey data from places like Gallup that operate independently and can offer some context.
What I really like about Gallup is they have a social series of polls. Every month they focus on a different social topic, and one month is crime. So that's really good for me because that means every year they ask a series of questions about crime and perceptions of crime. But one of those months, every year, they ask about the economy. If you were an economics reporter, that would be a great resource. And these places that offer polling, if you talk to them and you're a journalist, oftentimes you can get underlying data as well, rather than just the report that was available online. And you can use that in your storytelling.
There's a series of different polls by, not only Gallup, but I think Pew, and Ipsos, and other places, to look at how people were perceiving the National Guard deployments in big cities last year. And you could just see how unpopular it was by looking at those data sources. And that's data that's totally usable even if you struggle to get other kinds of data on the same topic, like how many people are arrested in some of these incidents.
One more I will mention is the Council on Criminal Justice. If you're interested in crime statistics, they have reports and their own data collection to look at—recent trends in different types of crime. If you're looking for a non-government source that is probably going to have less of an incentive to say one thing or another thing about the crime data, but just provide a report on the crime data itself, I think that's a good place to look.
Is there anything that I missed, anything else you would like to add?
The only thing I'll say is that we offer consultations for journalists who are interested in learning more about stories that we've done, data sets we've made available. We offer webinars and we speak, not only with journalists, but with researchers and groups who wanna learn more about our reporting. That is available. And anybody can feel free to reach out.
Edited for sequencing and clarity.
Hero card on Prairie Fire home page credit: Jill Castellano and Shoshana Walter/The Marshall Project
Newsjunkie. Jill Castellano, interviewed by Morgan Kriesel, April 24, 2026.
The Marshall Project. Jill Castellano Joins The Marshall Project Covering Crime Data
© Newsjunkie.net 2026