Documentation & methodology
About the project
In 2019, the NYU Public Safety Lab began identifying publicly available online jail rosters and collecting daily individual-level records from them as part of our Jail Data Initiative. During 2020, the Public Safety Lab team began standardizing these data and developed our primary database of booking records from more than 1,000 jail facilities. As of 4/28/2025, the Jail Data Initiative database houses records for 2,085,295 unique jail bookings.
In 2023, the team began reexamining the architecture of the project. This work included incorporating updated information from the Bureau of Justice Statistics' 2019 Census of Jails; separating jail data collection from processing to extend the project's longevity; increasing the frequency of data collection and pulling historical records, where available; and rebuilding our website to make the project's data more accessible and useful for research and advocacy. The new website includes an interactive dashboard, in-depth jail facility profiles, detailed booking records and roster searches, and other tools to better understand the Jail Data Initiative data.
Our original documentation page is available here; below, we provide documentation pertaining to our recent project rebuild.
To apply for access to individual-level records, please complete our Data Use Agreement. For additional information, please contact us at questions@jaildatainitiative.org. The adjacent map highlights counties in which we have collected any jail records as of 4/28/2025.
Glossary of terms
Jail characteristics
Terms calculated using data collected by the Jail Data Initiative (JDI).
County detention capacity
Total county jail beds, as reported in the Bureau of Justice Statistics' 2013 Census of Jails. If a county has more than one jail, these capacities are summed.
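As an illustration only, the summing rule above can be sketched in Python. The function name, the input layout, and the county identifiers are hypothetical and are not taken from JDI's code or from the Census of Jails file format.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def county_capacity(jails: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Sum reported bed capacity across all jails in each county.

    `jails` is an iterable of (county_id, beds) pairs; both the pair
    layout and the identifiers are assumptions made for this sketch.
    """
    totals: Dict[str, int] = defaultdict(int)
    for county_id, beds in jails:
        totals[county_id] += beds
    return dict(totals)


# Two jails in county "A" and one in county "B" (made-up values).
example = county_capacity([("A", 100), ("A", 50), ("B", 200)])
```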
Daily jail population
The total number of individuals incarcerated in a jail facility on a given date.
Jail
A local correctional facility which generally houses individuals who have not been convicted (i.e. people incarcerated pretrial), those with sentences of 1 year or less, those waiting to be transferred to other facilities including prisons or mental health facilities, and/or those with community supervision or bail violations. Jails may also enter into contracts to detain individuals on behalf of other entities, e.g., federal and state governments, and agencies such as U.S. Immigration and Customs Enforcement (ICE).
Jail roster
A publicly available, online log of all individuals detained in a jail facility on a given date. Jail rosters may be updated in real-time, hourly, daily, etc. JDI only collects data from jail rosters that are updated at least daily. A single jail roster may contain information for multiple counties or facilities. For example, West Virginia provides a single online search portal for all jails in the state. Counties may also make more than one jail roster available online.
Length of stay
The number of days an individual is detained in a jail facility for a single booking. Since jail roster scraping may begin after an individual has already been incarcerated, JDI extracts a booking start date from any fields reported on a given roster. For example, if an individual appears in scraped data starting on January 10th, 2022, but their record indicates an admission date of January 1st, 2022, their length of stay is calculated from January 1st, 2022.
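The calculation described above might look like the following minimal Python sketch. It assumes that a reported admission date, when present, takes precedence over the first scrape date, and that the booking ends on the last date the individual was seen on the roster. The function and parameter names are hypothetical, and the end-date convention is an assumption rather than JDI's documented rule.

```python
from datetime import date
from typing import Optional


def length_of_stay(
    first_scraped: date,
    last_scraped: date,
    admission_date: Optional[date] = None,
) -> int:
    """Return the number of days between booking start and booking end.

    Assumption: the booking start is the roster's reported admission
    date when one exists, otherwise the first date the record was scraped.
    Assumption: the booking end is the last date the record was scraped.
    """
    start = admission_date if admission_date is not None else first_scraped
    return (last_scraped - start).days


# Mirrors the example in the text: first scraped on January 10th, 2022,
# with a reported admission date of January 1st, 2022.
stay = length_of_stay(date(2022, 1, 10), date(2022, 1, 15), date(2022, 1, 1))
```

With a last-seen date of January 15th, 2022 (a made-up value), the stay is counted from January 1st rather than from January 10th.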
Top charge
The most severe charge reported for an individual, according to the Criminal Justice Administrative Records System (CJARS) Text-based Offense Classification (TOC) algorithm. The charge categories, in order from most severe to least severe, are:
- Violent
- Property
- Drug
- Public Order
- DUI Offense
- Criminal traffic
Charges may not be reported on all jail rosters.
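Given the severity ordering listed above, selecting a top charge can be sketched as follows. This assumes each charge has already been assigned one of the six categories. The classification of raw charge text is performed by the CJARS TOC algorithm and is not reproduced here. The function name is hypothetical.

```python
from typing import Iterable, Optional

# Categories in the order listed above, most severe first.
SEVERITY_ORDER = [
    "Violent",
    "Property",
    "Drug",
    "Public Order",
    "DUI Offense",
    "Criminal traffic",
]
_RANK = {category: rank for rank, category in enumerate(SEVERITY_ORDER)}


def top_charge(categories: Iterable[str]) -> Optional[str]:
    """Return the most severe category among a booking's charges.

    Returns None when no recognized category is present, for example
    when a roster does not report charges.
    """
    recognized = [c for c in categories if c in _RANK]
    if not recognized:
        return None
    return min(recognized, key=lambda c: _RANK[c])
```

For a booking with one "Drug" charge and one "Property" charge, this sketch returns "Property".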
Weekly releases
The number of bookings that end during a given week. A booking is considered ended when an individual no longer appears on the roster. It is possible that individuals are transferred to another facility rather than released into the community.
Weekly admissions
The number of new bookings seen on a jail roster during a given week. A booking is considered new when an individual appears for the first time on the roster.
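The admission and release definitions above can be sketched from daily roster snapshots. This is an illustration under several assumptions that are not stated in the source: each booking has a stable identifier, weeks are ISO calendar weeks, and a booking's end is dated to the last day it appeared on the roster. All names are hypothetical.

```python
from collections import Counter
from datetime import date
from typing import Dict, Set, Tuple

Week = Tuple[int, int]  # (ISO year, ISO week number)


def iso_week(day: date) -> Week:
    cal = day.isocalendar()
    return (cal[0], cal[1])


def weekly_counts(
    snapshots: Dict[date, Set[str]],
) -> Tuple[Counter, Counter]:
    """Count weekly admissions and releases from daily roster snapshots.

    `snapshots` maps a scrape date to the set of booking IDs on the
    roster that day. A booking is new on the first date it appears and
    ended once it is absent from a later snapshot.
    """
    if not snapshots:
        return Counter(), Counter()

    first_seen: Dict[str, date] = {}
    last_seen: Dict[str, date] = {}
    for day in sorted(snapshots):
        for booking_id in snapshots[day]:
            first_seen.setdefault(booking_id, day)
            last_seen[booking_id] = day

    latest = max(snapshots)
    admissions = Counter(iso_week(d) for d in first_seen.values())
    # Bookings still present in the latest snapshot have not ended.
    releases = Counter(
        iso_week(d) for d in last_seen.values() if d < latest
    )
    return admissions, releases
```

Note that in this sketch every booking present on the first scraped date counts as an admission in that week, and a booking that leaves the roster counts as a release whether the individual was released or transferred.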
Demographic characteristics
Terms standardized by JDI from raw values reported on rosters. These fields may not be reported on every roster.
Age
Reported age of individual in jail, measured in years. This field takes on the following values:
- 15-24
- 25-44
- 45-64
- 65+
Individuals without a reported age, or with a reported age younger than 13 or older than 98, are excluded from reporting.
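The binning and exclusion rules above could be expressed as in the sketch below. The function name is hypothetical. The source does not say how ages 13 and 14 are treated, since they pass the exclusion rule but fall below the lowest listed group; this sketch leaves them unassigned, which is an assumption.

```python
from typing import Optional


def age_group(age: Optional[float]) -> Optional[str]:
    """Map a reported age in years to one of the listed age groups.

    Returns None for missing ages, for ages younger than 13 or older
    than 98, and (by assumption) for ages 13-14, which no listed group
    covers.
    """
    if age is None or age < 13 or age > 98:
        return None
    if age >= 65:
        return "65+"
    if age >= 45:
        return "45-64"
    if age >= 25:
        return "25-44"
    if age >= 15:
        return "15-24"
    return None  # ages 13-14: handling not specified in the source
```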
Race
Reported race and/or ethnicity of individual in jail. Corrections reporting often conflates race and ethnicity; as such, JDI combines these fields and reports a single race demographic field. This field takes on the following values:
- AAPI (Asian-American or Pacific Islander)
- Black
- Indigenous
- Other POC (including Hispanic, Latino, Hispanic White, Middle Eastern, Multiracial, etc.)
- White (non-Hispanic)
While Hispanic white individuals would fall into the Other POC group, ethnicity is not considered for other racial groups. Some jails may report race but not ethnicity, some may report race and ethnicity in one field, and some may report them in separate fields.
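One way to picture the combination of race and ethnicity described above is the sketch below. The raw values in the lookup table are invented for illustration; actual rosters use many different codes and spellings, and JDI's real mapping is not shown in this documentation. The sketch also assumes that a White individual with no reported ethnicity is assigned to the White (non-Hispanic) group.

```python
from typing import Optional

# Hypothetical raw roster values mapped to the groups listed above.
RAW_TO_GROUP = {
    "asian": "AAPI",
    "pacific islander": "AAPI",
    "black": "Black",
    "american indian": "Indigenous",
    "hispanic": "Other POC",
    "middle eastern": "Other POC",
    "multiracial": "Other POC",
    "white": "White (non-Hispanic)",
}
HISPANIC_VALUES = {"hispanic", "latino"}  # hypothetical ethnicity codes


def standardize_race(
    race: Optional[str], ethnicity: Optional[str] = None
) -> Optional[str]:
    """Combine raw race and ethnicity into a single race group.

    Ethnicity only changes the result for White individuals, who move
    to "Other POC" when reported as Hispanic. Unrecognized or missing
    race values return None.
    """
    if race is None:
        return None
    group = RAW_TO_GROUP.get(race.strip().lower())
    is_hispanic = (
        ethnicity is not None
        and ethnicity.strip().lower() in HISPANIC_VALUES
    )
    if group == "White (non-Hispanic)" and is_hispanic:
        return "Other POC"
    return group
```

Under these assumptions, a record reported as White and Hispanic maps to "Other POC", while a record reported as Black and Hispanic remains "Black".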
Gender
Reported sex and/or gender of individual in jail. Corrections reporting often conflates sex and gender; as such, JDI combines these fields and reports a single gender demographic field. This field takes on the following values:
- Female
- Male
- Nonbinary
- Trans
Some jails report individuals as male or female only and do not have information on nonbinary or trans gender identity.
Data architecture
The sections below contain detailed information about our data collection (also known as web data scraping), cleaning, aggregation, and other processes and architecture. If you are aware of any complete, publicly available, web-based jail rosters that are potential data sources from which we are not already collecting data, or if you observe any inconsistencies in our data that you suspect may originate in faulty logic or incorrect classification, please reach out to us at questions@jaildatainitiative.org.
We believe in transparency and humanizing language. We are currently scrubbing our GitHub code repository of sensitive information, with the intention of making it publicly available. If you need access to our code in the interim, please reach out. We try to avoid the use of words like "inmate". However, our code was a collaborative effort, and in some cases, variables in our code were labeled using this term, which may appear in the documentation below. We apologize, and we will attempt to remove these as part of our code cleaning process.
Web scraping metadata
The Jail Data Initiative currently uses Airtable to keep track of its web scraping activity. Below is an abridged view of our Airtable base, showing facilities, associated web scrapers and their associated metadata.