(AP) -- Census Bureau statisticians and outside experts are trying to unravel a mystery: Why were so many questions about households in the 2020 census left unanswered?
Residents did not respond to a multitude of questions about sex, race, Hispanic background, family relationships and age, even when providing a count of the number of people living in the home, according to documents released by the agency. Statisticians had to fill in the gaps.
Reflecting an early stage in the number crunching, the documents show that 10% to 20% of questions were not answered in the 2020 census, depending on the question and state. According to the Census Bureau, later phases of processing show the actual rates were lower.
Non-response rates have averaged 1% to 3% in 170 years of previous U.S. censuses, according to University of Minnesota demographer Steven Ruggles.
The information is important because data with demographic details will be used for drawing congressional and legislative districts. That data, which the Census Bureau will release Thursday, also is used to distribute $1.5 trillion in federal spending each year.
The documents, made public in response to an open records request from a Republican redistricting advocacy group, don't shed much light on why questions were left unanswered, though theories abound. Some observers say software used in the first census in which most Americans could respond online allowed people to skip questions. Others say the pandemic made it harder to reach people who didn't respond.
Confusion over some questions, including traditional uncertainty among Hispanics about how to answer the race question, may have been a factor, but some experts hint at a more sinister possibility. They say the Trump administration's attempt to end the count early and failed efforts to put a citizenship question on the form and exclude people who were in the U.S. illegally had a chilling effect.
"I think it's the pandemic and Trump. The very threat that citizenship was on the questionnaire, the very notion it might have been on it, may have deterred some Latinos from filling it out," said Andrew Beveridge, a sociologist at Queens College and the City University of New York Graduate School and University Center. "I think a lot of us are flabbergasted by it. It is a very high number."
Ruggles initially thought it had to do with the software used by people who answered online -- about two-thirds of U.S. households. Other countries such as Australia and Canada, which have used similar software for censuses, saw the number of unanswered questions drop to almost zero because respondents couldn't proceed if they didn't answer a question.
"I guess in the U.S. version they must just have accepted incomplete responses," Ruggles said. "If the non-response rate was consistently high across response mode, that is just strange."
Acting Census Bureau Director Ron Jarmin said recently in a blog post that the blank answers spanned all categories of questions and all modes of responding -- online, by paper, by phone or face-to-face interviews.
"These blank responses left holes in the data which we had to fill," Jarmin said.
In a statement last week to The Associated Press, Jarmin declined to go into details, saying only that the bureau would release updated rates later this month "based on the correct numbers."
To fill in the holes, Census Bureau statisticians searched other administrative records such as tax forms, Social Security card applications or previous censuses to find people's race, age, sex and Hispanic background.
If available records didn't turn up the information needed, they turned to a statistical technique called imputation, which the Census Bureau has used for 60 years. The technique has been challenged and upheld in courts after past censuses.
In some cases, statisticians took information reported for one member of a family, such as race, and applied it to another member whose answers were blank. Or they assigned a sex based on the respondent's first name. In other cases, when the entire household had no information, they filled it in using data from similar neighbors.
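For readers who want to see the logic, the three fallbacks described above can be sketched in a few lines of Python. This is an illustration only, not the Census Bureau's actual procedure: the function name `impute_household`, the field names, the tiny `NAME_TO_SEX` lookup and the way a "similar neighbor" is passed in as `donor` are all assumptions made for the example.

```python
from itertools import cycle

FIELDS = ("race", "sex")

# Hypothetical, deliberately tiny lookup; a real table would be far larger.
NAME_TO_SEX = {"maria": "F", "james": "M"}


def impute_household(household, donor=None):
    """Return a copy of `household` with blank fields filled where possible.

    `household` and `donor` are lists of dicts, one per person.
    `donor` stands in for a similar neighboring household (an assumption).
    """
    people = [dict(p) for p in household]  # copy so the input is not changed
    entirely_blank = all(not p.get(f) for p in people for f in FIELDS)

    # 1. Borrow race from a household member who reported it.
    known_race = next((p["race"] for p in people if p.get("race")), None)
    for p in people:
        if not p.get("race") and known_race:
            p["race"] = known_race
        # 2. Assign sex from the first name when the lookup has an entry.
        if not p.get("sex"):
            name = (p.get("first_name") or "").lower()
            if name in NAME_TO_SEX:
                p["sex"] = NAME_TO_SEX[name]

    # 3. Whole household blank: copy what is still missing from the neighbor.
    if entirely_blank and donor:
        for p, d in zip(people, cycle(donor)):
            for f in FIELDS:
                if not p.get(f):
                    p[f] = d.get(f)
    return people


home = [
    {"first_name": "Maria", "race": "White", "sex": None},
    {"first_name": "James", "race": None, "sex": None},
]
filled = impute_household(home)
```

In the sketch, the neighbor's data is used only when nobody in the household reported anything, mirroring the order in which the article describes the fallbacks.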
"Imputation has been shown to improve data quality and accuracy compared to leaving these fields blank, or without information from respondents," Census Bureau officials Roberto Ramirez and Christine Borman wrote recently in a blog post.
The Census Bureau in April released state population totals from the 2020 census. Those are used to divvy up congressional seats among the states during a once-a-decade process known as apportionment.
The agency released a slide deck presentation about the high rate of unanswered questions, along with group housing records and the first details about the rate of non-responses, in response to an open records request from Fair Lines America Foundation. The Republican advocacy group sued the Census Bureau for information about how the count was conducted in dorms, prisons, nursing homes and other places where people live in groups. Fair Lines says it's concerned about the accuracy of the group housing count and wants to make sure anomalies didn't affect the state population figures.
With the information showing high rates of imputation, some Republican-controlled states may try to leave college students out of redistricting data, claiming they were also counted at their parents' homes, to get a partisan edge, said Jeffrey Wice, a Democratic redistricting expert.
"That will be hard to prove but would inject more uncertainty and possible delay into redistricting," Wice said.