A new online tool from the Census Bureau will help stakeholders work with 2020 Census apportionment data once it is released this month.
The “Historical Apportionment Data Map” displays apportionment results for each census. While it currently just includes 1910 to 2010, the Bureau has said that the new apportionment data will be added to the map as soon as they become public.
The interactive map includes: Number of seats in the U.S. House of Representatives; Changes to each state’s number of seats in the U.S. House of Representatives; Population per representative for each state; Resident population of each state; Percentage change in resident population for each state; and Population density of each state. It will also have downloadable tables with source data and technical documentation, and be optimized for mobile devices.
Interested stakeholders would do well to familiarize themselves with its functionality in advance of the apportionment data release.
In a filing with a U.S. District Court in Alabama that was made public this week, the Census Bureau’s Chief Scientist, John Abowd, swore a declaration that amounts to a comprehensive history of the Census Bureau’s legal, statistical, and moral responsibility to keep respondent information confidential.
Abowd made the core point that every survey the government conducts relies on trust that the personal information respondents volunteer will remain confidential. “Though participation in the census is mandatory under 13 U.S. Code § 221, in practice, the Census Bureau must rely on the voluntary participation of each household in order to conduct a complete enumeration,” the chief scientist wrote. This ethic at the Bureau dates as far back as when Congress first established confidentiality protections for individual census responses in the Census Act of 1879.
The declaration amounts to an expansive history lesson on how privacy protections have evolved over the decades at the Census Bureau. It describes why privacy is vital to government surveys and censuses that support a wide array of critical government and societal functions at the federal, state, tribal, and local levels.
The declaration is part of the government’s response to a lawsuit by the State of Alabama and others seeking to block implementation of new disclosure avoidance methods that some believe will make data less accurate, especially for the upcoming process of redrawing federal, state and local political jurisdictions. Abowd describes for the court the public process Census has engaged in with stakeholders over many years to balance the need for privacy against the need for accuracy.
Abowd argues that while the Census Bureau’s confidentiality methodologies for the 2000 and 2010 censuses were considered sufficient at the time, “… advances in technology in the years since have reduced the confidentiality protection provided by data swapping.” He describes in detail a simulated “attack” Census itself conducted, which showed that using just 6 billion of the over 150 billion statistics released in 2010 would allow an attacker to accurately re-identify at least 52 million respondents and, with some third-party data, could re-identify around 179 million Americans, or around 58% of the population.
The Census Bureau is continuing with stakeholder engagement on their latest privacy protection effort, often described as “Differential Privacy.” In the coming weeks they will be releasing a new demonstration file for stakeholders to assess and comment upon.
On April 13, President Biden announced his intention to nominate Dr. Robert Santos to serve as the next Census Bureau Director. Currently, Dr. Santos is Vice President & Chief Methodologist at the Urban Institute, Washington, D.C. He is an expert in survey sampling, survey design and more generally in social science/policy research, with over 40 years of experience. Dr. Santos is also the current President of the American Statistical Association. He has served on numerous advisory committees, including the Census Advisory Committee for Professional Organizations (2001-2006), and the CDC National Center for Health Statistics’ Board of Scientific Counselors (2017-2020). The Census Bureau Director’s position requires confirmation by the U.S. Senate. If confirmed, Dr. Santos would be the first person of color to permanently head the agency.
A new working paper from the Georgetown Center on Poverty and Inequality found that “the 2020 Census likely will contain similar inaccuracies seen in past censuses.”
Authors Bill O’Hare and Jae June Lee analyzed “self-response rates as an early indicator,” albeit an imperfect one, “of differential census data quality (i.e. the gaps in census coverage between groups and geographic areas).” These kinds of “census process indicators… can provide early evidence about the likely differential quality of the census.” The paper examined “whether historically undercounted groups have relatively low self-response rates to the 2020 Census” to try to “uncover early evidence about whether historical patterns of unequal coverage in the census were likely repeated in the 2020 Census.”
On March 23, the Senate Homeland Security and Governmental Affairs (HSGAC) Committee held a hearing, “The 2020 Census and Current Activities of the U.S. Census Bureau.” Acting Census Bureau Director Ron Jarmin and officials from the Government Accountability Office (GAO), Christopher Mihm, Managing Director, Strategic Issues, and Nick Marinos, Director, Information Technology & Cybersecurity, testified.
The purpose of the hearing was to review the conduct and outcome, to date, of the 2020 Census. The hearing began on a positive note with HSGAC Chairman Senator Gary Peters (D-MI) stating, “There is no question that as the Census Bureau continues to process the data they have collected, and conduct robust data quality checks, their hardworking and dedicated employees not only deserve our gratitude, but the resources and time required to get it right.”
Acting Director Jarmin received numerous questions about the status of the Bureau’s current plans for releasing redistricting data. Senators Rob Portman (R-OH) and James Lankford (R-OK) expressed concerns about the implications of the delayed redistricting data release for their states. Acting Director Jarmin assured senators, “We’re trying to get the data to the states as quickly as we can.”
GAO officials noted that the Bureau has made significant progress, but still faces two challenges in completing the count—assessing concerns about data quality and finalizing plans to protect the data.
Here is the Census Project’s Fiscal Year 2022 funding recommendation for the U.S. Census Bureau. There are compelling reasons to provide the Bureau with a significant funding increase in FY 2022, and to justify deviating from the usual decennial census pattern in which the agency’s overall funding decreases between years ending in 1 and 2 after a decennial census, given the unique challenges and opportunities facing the Bureau.
The Senate Commerce Committee held a hearing for Don Graves, the Biden Administration’s nominee to be Deputy Secretary of Commerce, on March 10, 2021.
Noting the importance “that we have accurate data” from the 2020 Census, Chair Maria Cantwell (D-WA) worried that the delays in delivery of census data may create “challenges” for many states as they try to meet “their constitutional duties on redistricting.”
Cantwell asked Graves if he would “work to address these state issues” with a “truncated timeline” and “address the accuracy and timeliness of the census?”
Graves responded that he would “absolutely… work on that issue,” but more importantly, would “also listen to the experts — the career experts at the department — and not allow politics to impact the accuracy and timeliness of the census.”
A March report from the Government Accountability Office (GAO) highlighted the decennial census as one of five high-risk areas “requiring significant attention” that have regressed since 2019.
“The Census Bureau implemented new technologies and other innovations for the 2020 Census,” GAO noted, “but also made a series of late design changes, such as delaying operations in response to COVID-19, that put the quality of the census at risk.”
GAO determined that census leadership commitments, program capacity, the census action plan, monitoring, and demonstrated progress, were all “partially met.”
The report warned that, “in planning for 2030, the Bureau will not fully understand the quality of the data collected for 2020 until it completes all of its planned evaluations.”
One proposal for the post-2020 Census population estimates that the Census Bureau will produce is called a “blended base.” This reflects a new approach to post-census population estimates compared to the past few decades. In this paper, I review the blended base idea and explore the implications it has for young children.
In the past, the base for post-census population estimates has been the Decennial Census count. The base is the population used at the start of an estimates series. But the blended base idea would combine some data from the 2020 Decennial Census and some data from the Census Bureau’s on-going population estimates series, and possibly other data. Some of the material in this paper applies to the total population but the paper focuses on the situation for young children (under age 10).
The post-census population estimates produced by the Census Bureau are important for three key reasons. First, many of the federal funding formulas use the data from the population estimates as a basis for distributing $1.5 trillion in federal funding each year (Reamer 2020). If the estimates are incorrect, some jurisdictions will not receive as much money as they deserve based on their actual population.
Table 1 shows several large federal funding formula programs focused on children that use the Population Estimates. The table indicates that a total of nearly $80 billion was distributed to states and localities by these programs in FY 2016. The programs identified here were only those among the 55 largest federal programs (in terms of dollars) out of 316 programs. There are many other programs that were not examined here.
Second, the population estimates are used to weight Census Bureau surveys like the American Community Survey and the Current Population Survey. That is, the survey results are inflated to be consistent with the population estimates. If the population estimates are wrong, the survey estimates will be wrong. These surveys are used for a variety of purposes including use in federal funding formulas. The American Community Survey is particularly important for providing comparable subnational and substate information on the well-being of children (The Annie E. Casey Foundation 2020).
Third, the population estimates are also used by states and localities to monitor population change over time for planning schools, hospitals, and roads. The private sector also makes use of the post-census population estimates for many critical business decisions.
Because the Census Bureau has released very few details of how they would implement the blended base idea, much of what I write below is based on my assumptions about what they would do.
For decades, the U.S. Census Bureau (2020a) has produced yearly post-census population estimates. See Appendix A for a detailed description of the estimation methodology used by the Census Bureau.
U.S. Census Bureau provides yearly post-census population estimates for:
National, state, and county total resident population and demographic components of population change
National resident, household, resident plus Armed Forces overseas, civilian, and civilian noninstitutionalized populations by age, sex, race, and Hispanic origin
State and county resident population by age, sex, race, and Hispanic origin
Metropolitan and micropolitan statistical area total resident population and demographic components of population change (Note: metro and micro areas are composed of one or more whole counties or equivalent entities. Producing metro and micro area population estimates involves the aggregation of the appropriate county-level population estimates.)
City, town, and other subcounty area total resident population
National, state, and county housing units
In this paper I address only the estimates that provide data by age, that is, the national, state, and county estimates that provide data for children (ages 0 to 9). The blended base approach has implications for 2020 population estimation data, but probably more importantly for yearly data from 2020 to 2030.
The population estimates are produced using what demographers call a cohort-component method. The estimates start with a base population showing demographic details, then each component of population change (births, deaths, and net migration) is updated yearly for each cohort (people born in the same year). This method is widely used in demography (Bryan 2004). The Census Bureau (2020a) provides a detailed description of the population estimation methodology for national, state, and county estimates. A separate method is used for subcounty estimates but those estimates do not contain data for children (U.S. Census Bureau, no date).
This paper is about the base, not the estimation methodology. Once the base is determined, the Census Bureau will probably use the same cohort-component method to produce estimates over the decade as far as I can tell.
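As a rough illustration of the cohort-component logic described above, the following Python sketch ages a small base population forward by one year. The function name, the three-age-group layout, and every input number are hypothetical, and the sketch ignores deaths among newborns; the Census Bureau's actual methodology is considerably more detailed (see Appendix A).

```python
def age_forward(base, births, deaths, net_migration):
    """Advance a population by one year with a simplified cohort-component step.

    base, deaths, net_migration: lists indexed by age at the start of the year.
    births: births during the year, which become the new age-0 cohort.
    All inputs are hypothetical illustration values.
    """
    # Each cohort loses its deaths and gains (or loses) its net migration.
    survived = [p - d + m for p, d, m in zip(base, deaths, net_migration)]
    # Everyone moves up one age group; births enter at age 0.
    aged = [births] + survived[:-1]
    # The oldest group is open-ended, so its survivors stay in it.
    aged[-1] += survived[-1]
    return aged


base = [100, 98, 97]          # hypothetical counts for ages 0, 1, and 2+
deaths = [1, 0, 2]
net_migration = [0, 1, -1]
next_year = age_forward(base, births=105, deaths=deaths, net_migration=net_migration)
print(next_year)  # [105, 99, 193]
```

The point of the sketch is that any error in the base list is carried into every later year, which is why the choice of base matters.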
3. Understanding Errors in the Census
There will be two main types of errors in the 2020 Census data that are released to the public. First, data from the 2020 Census are likely to exhibit the net undercounts and net overcounts that we have seen in each past Census. Second, the new method (differential privacy) that the Census Bureau is planning to use to reduce the possibility of respondents being identified will inject errors into the reported data. Each of these kinds of errors is discussed in more detail below.
The usual types of census errors include omissions and erroneous enumerations (mostly double counting) as well as errors in characteristics such as age and race. In other words, some people will be left out of the count and some people will be counted more than once. In addition, some will have their characteristics mis-recorded in the Census. For example, someone who would self-identify as Black may somehow be miscoded as White, or someone who is really 13 years old may be coded as 23 years old. These kinds of errors have existed in every U.S. Census.
This source of error is not new, but there are many reasons to believe the errors in the 2020 Census will be larger than those in the 2010 Census (American Statistical Association 2020). Among other things, reasons to worry about the quality of the 2020 Census data include:
Underfunding of the Census throughout the decade which reduced testing,
Pandemic during data collection period,
Forced rush to finish data collection,
Increased fear of federal government in immigrant communities,
Political interference at the Census Bureau,
Nearly constant litigation related to the Census.
Moreover, the Census Bureau (2020e) recently announced that they encountered “anomalies” in processing the data collected in the 2020 Census. It is not clear how the anomalies encountered in processing the 2020 Census data are different from those encountered in the 2010 Census, but it is clear that the Census Bureau has been given less time to address and correct problems in processing the 2020 Census data than they have had in the last few Censuses.
The best measures of 2020 census accuracy will not be available until late in 2021 or 2022 for most groups (O’Hare et al. 2020). But collectively the factors listed above suggest the 2020 Census will not be as accurate as the 2010 Census. The likely increase in errors in the 2020 Census relative to the 2010 Census is an important point in considering the use of a blended base.
The second kind of error involves the Census Bureau’s plans to inject distortions into the 2020 Census data using a method called differential privacy. Differential privacy is meant to reduce the possibility of an individual respondent in the Census being identified by someone outside the Census Bureau. The Census Bureau (2020c) provides information about differential privacy on their website.
The injection of error into the Census counts is not new (U.S. Census Bureau 2018), but the new method is likely to inject much more error into the 2020 count than was done previously, and the complexity of differential privacy means the impact will probably be less clear to users. Differential privacy has little impact on the total population of the geographic units that are the focus of this study (states and counties) other than a few hundred smaller counties. However, differential privacy may have implications for smaller demographic groups in a state or county, like minorities or young children. The distortions injected by differential privacy are much more of a problem for smaller geographic units used in the Census.
4. Blended Base Approach
Very little information has been made available from the Census Bureau regarding the idea of a blended base. Below is information from a PowerPoint slide the Bureau shared with the Federal-State Cooperative Program for Population Estimates (FSCPE) in the fall of 2020.
The census typically forms the base of our population estimates.
It is unclear whether the 2020 Census will provide sufficient scope/quality for this purpose.
We have been exploring the idea of a ‘blended’ base.
Method: control Vintage 2020 April 1, 2020 to other sources to generate a plausible base.
Potential data sources:
State total population from the 2020 Census invariant populations
National age detail from 2020 Demographic Analysis
Modeling or external data sources
Initial tests are under review, and results seem promising.
Final method must be approved by the Data Stewardship Executive Policy Committee (DSEP).
As far as I can tell, this is the only information the Census Bureau has made available on this topic.
As stated earlier, in the past the Census Bureau has made population estimates by starting with the Decennial Census counts by age, sex, race/Hispanic origin for the nation, states and counties then aging the population forward each year until the next Decennial Census.
The traditional approach means errors in the Decennial Census are reflected in the post-census population estimates. Figure 1 shows net undercount rates in the 2010 Census for five-year age groups. Young children have much higher net undercount rates than any other age group. Using the Decennial Census as a base for population estimates is more detrimental for young children than for other age groups because young children have a larger net undercount than other age groups. Demographic Analysis shows the net undercount for young children was 4.6 percent in the 2010 Census. The net undercount for children ages 5 to 9 in 2010 was 2.2 percent. Also, comparing the Vintage 2010 Population Estimates to the 2010 Census count shows the net undercount of young children varied widely among the states and counties (O’Hare 2014; 2017).
Thus, the base for post-2010 estimates included large net undercounts for young children, which were carried forward in the Census Bureau’s post-census estimates. For example, the net undercount of children ages 0 to 4 in the 2010 Census led to an underestimate of the population ages 5 to 9 in 2015.
If the 2020 population estimates were used in place of the 2020 census results, the base would be more accurate for young children because the data for young children would come largely from birth certificates.
It appears that the blended base approach will combine some data from the 2020 Census count and some data from other sources, including the Vintage 2020 Census Bureau population estimates. In particular, the state population totals from the 2020 Decennial Census will be used in the base.
Using state totals from the 2020 Census will be more accurate than using demographic components, for several reasons. First, the state total population numbers in the 2020 Census will be handled differently than other 2020 Census data. The Census Bureau (2020b) announced that state population totals from the census will be ‘invariant.’ That means they will not have distortions from differential privacy applied. Second, errors in demographic components (like age and race groups) will balance out when combined for a state total population. For example, for the total population the high net undercounts of young children are balanced by net overcounts for older age groups, as shown in Figure 1. Third, the subnational postcensus estimates for children are only produced for states and counties. So, the more highly distorted data for small populations based on differential privacy (DP) will not be part of the base.
The state total population estimates from the Vintage 2020 estimates series will be adjusted to be consistent with the 2020 Census total state population counts. This will involve adjusting the state estimated populations by the ratio of the Census count to the estimates.
If I understand correctly, a likely approach to building a blended base will adjust the Vintage 2020 substate population estimates to sum to total population census counts from the 2020 Census for each state. The adjustments will be for substate geographic units such as counties, and for demographic groups (age/race-Hispanic Origin/sex) as well to produce internal consistency.
This operation is sometimes referred to as use of a “control total” and the process is sometimes referred to as “raking.” For example, if the total state population from the Decennial Census was 2 percent higher than the sum of county population estimates, each county would be increased by 2 percent to make the county data match the state total. Making components sum to a figure that is thought to be more accurate increases the accuracy of the components. Bryan (2004, page 527) states, “More accurate estimates can generally be made for total population than for demographic characteristics of the population of an area.”
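The control-total adjustment in the example above can be sketched in a few lines of Python. The county names and populations are hypothetical, chosen so the census state total is 2 percent higher than the sum of the county estimates; this is a minimal sketch of the general idea, not the Census Bureau's procedure.

```python
def control_to_total(county_estimates, state_census_total):
    """Scale county estimates proportionally so they sum to a state control total."""
    ratio = state_census_total / sum(county_estimates.values())
    return {name: value * ratio for name, value in county_estimates.items()}


# Hypothetical county estimates that sum to 1,000,000
counties = {"County A": 600_000, "County B": 300_000, "County C": 100_000}
# Hypothetical census state total, 2 percent higher than the estimates
adjusted = control_to_total(counties, state_census_total=1_020_000)
for name, value in adjusted.items():
    print(name, round(value))
```

Every county is multiplied by the same ratio, so each county's share of the state total is unchanged.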
A similar approach would be used for demographic groups (age, sex, race/Hispanic origin). Making all the components add up correctly may require multiple adjustments (raking), but such raking seldom makes big changes to the estimates.
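When estimates must match control totals in more than one dimension at once (for example, county totals and age-group totals), the repeated adjustment is commonly done with iterative proportional fitting. The sketch below is a generic version of that technique under assumed inputs; the table, the targets, and the function name are all hypothetical, and the source does not say which raking procedure the Census Bureau would use.

```python
def rake(table, row_targets, col_targets, iterations=50):
    """Iterative proportional fitting on a 2-D table given as a list of rows."""
    table = [row[:] for row in table]  # work on a copy
    for _ in range(iterations):
        # Scale each row so it sums to its target.
        for i, target in enumerate(row_targets):
            row_sum = sum(table[i])
            table[i] = [cell * target / row_sum for cell in table[i]]
        # Scale each column so it sums to its target.
        for j, target in enumerate(col_targets):
            col_sum = sum(row[j] for row in table)
            for row in table:
                row[j] *= target / col_sum
    return table


# Hypothetical estimates: rows are two counties, columns are two age groups
estimates = [[40.0, 60.0], [30.0, 70.0]]
county_targets = [102.0, 102.0]  # hypothetical county control totals
age_targets = [72.0, 132.0]      # hypothetical age-group control totals
raked = rake(estimates, county_targets, age_targets)
for row in raked:
    print([round(cell, 2) for cell in row])
```

The row and column targets must sum to the same grand total (204 here) for the procedure to settle on a table that satisfies both.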
Based on experience in the 2010 Census, 2020 total state populations from the population estimates are likely to be close to the 2020 census count so little adjustment will be necessary to make the 2020 population estimates consistent with the 2020 Census state total populations.
5. Illustration Using 2010 Data
The impact of a blended base approach can be illustrated with data from 2010. Table 1 shows how the adjustment would have worked for the population ages 0 to 4 in states in 2010. The first two data columns of Table 1 show the 2010 Census counts and the Census Bureau’s Vintage 2010 population estimates for total population in each state. The third data column shows the ratio of the census count to the population estimates. This is the ratio that must be applied to the population estimates to make them consistent with the census counts.
The second panel of Table 1 shows the results of applying the Census/Estimates ratio to the population age 0 to 4. The illustration only examines ages 0 to 4, but one would expect changes in a similar direction for ages 5 to 9, but at a muted level of change because the net undercount for ages 5 to 9 is lower than ages 0 to 4.
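The calculation behind the second panel can be sketched as follows. The state figures here are hypothetical and are not taken from Table 1; the sketch only shows the order of operations as I understand it: compute the Census/Estimates ratio from total population, apply it to the estimate for ages 0 to 4, and compare the result with the census count for that age group.

```python
def blended_young_children(census_total, estimate_total, estimate_age_0_4):
    """Adjust an ages 0-4 estimate by the state Census/Estimates ratio."""
    ratio = census_total / estimate_total
    return estimate_age_0_4 * ratio


# Hypothetical state (not a row from Table 1)
census_total = 5_050_000     # census count, total population
estimate_total = 5_000_000   # population estimate, total population
estimate_age_0_4 = 320_000   # population estimate, ages 0 to 4
census_age_0_4 = 305_000     # census count, ages 0 to 4

blended = blended_young_children(census_total, estimate_total, estimate_age_0_4)
print(round(blended))                   # 323200
print(round(blended - census_age_0_4))  # 18200
```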
Almost every state would have had a higher number of young children (ages 0 to 4) in 2010 if the blended base approach had been used. The biggest changes shown in Table 1 are for the states with the largest estimated net undercount of young children. In California, there would have been about 216,000 more young children than the census showed, in Texas about 158,000 more, and in Florida about 100,000 more. Only Vermont would have had a lower number based on the adjusted numbers, but it was only 72 children lower.
The use of a blended base would also have implications for the number of young children in counties. To illustrate the impact for counties, I look at how this method would have worked if it had been used in the 2010 Census for the 58 counties in California.
The results of applying the method to counties are shown in Table 2. The column headings in Table 2 are like those in Table 1.
Of the 58 counties in California, 44 (76 percent) showed a higher population of young children ages 0 to 4 using the blended base. Of the counties that had a smaller population of ages 0 to 4 using the blended base, all had relatively small decreases. The largest county loss was only 210 young children. Table 2 shows four counties in California (led by Los Angeles County with an increase of 70,275) would have had an increase of more than 10,000 young children if the blended base approach had been used in 2010.
For some groups (such as young children) there is reason to believe the estimates are more accurate than the Census counts. Many analysts, inside and outside the Census Bureau, have used the population estimates for ages 0 to 4 to evaluate the accuracy of the 2010 Census counts (O’Hare 2014 and 2017; Jensen et al. 2018; King et al. 2018; Konicki 2016; U.S. Census Bureau 2014). This suggests that population estimates for young children are deemed more accurate than the Census count. Given the problems associated with the 2020 Census, there is every reason to believe this may be the case in 2020 as well.
If the blended base approach outlined above had been used in 2010, it would have produced substantially more accurate data than the census alone for the young child populations of states and counties.
6. Advantages of a Blended Base for Young Children
The use of a blended base has a couple of methodological advantages for young children. First, the net undercount of young children in the U.S. Census has been high and growing over the past several decades (O’Hare 2015). There is no reason to believe the count of young children in the 2020 Census will be more accurate than in 2010, and many reasons to think the 2020 Census is likely to be less accurate than the 2010 Census. Based on changing demographics and methodologies, the Urban Institute (2019) projected that the net undercount for young children in the 2020 Census would range from 4.6 percent to 6.3 percent, and this was before the problems experienced in the data collection phase of the 2020 Census.
One of the big advantages of a blended base for young children is the fact that the population estimates for people under age 10 in 2020 do not include the flaws of the 2010 Census. For young children (ages 0 to 9) the Census Bureau’s Vintage 2020 estimates are based solely on births, deaths, and migration. Because there are relatively few deaths among young children and relatively little migration, the estimates are based almost entirely on births. In 2010, the components of the national Demographic Analysis (DA) population estimate for children under age 5 consisted of about 21 million births, about 145,000 deaths, and a net immigration of 240,000. In the 2020 DA estimates, the middle series estimate for ages 0 to 4 is composed of 19,250,000 births, 120,000 deaths, and 328,000 net immigration (U.S. Census Bureau, 2020d). Births account for the vast majority of the population estimates for young children in 2020.
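The dominance of births can be checked with the 2020 middle-series components quoted above. The sketch assumes the estimate is simply births minus deaths plus net immigration, which is the usual demographic accounting but is my simplification of the DA method; the function name is hypothetical.

```python
def da_estimate(births, deaths, net_immigration):
    """Population implied by births, deaths, and net immigration."""
    return births - deaths + net_immigration


# 2020 middle-series components for ages 0 to 4, as quoted in the text
births, deaths, net_immigration = 19_250_000, 120_000, 328_000
population = da_estimate(births, deaths, net_immigration)
print(population)                     # 19458000
print(round(births / population, 3))  # 0.989
```

Under this simplified accounting, births alone equal roughly 99 percent of the resulting estimate, so errors in the deaths or migration components can move the total only slightly.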
Heavy dependence on vital records is important because birth certificate data in the U.S. are widely seen as complete and accurate. The National Center for Health Statistics (2014, page 2) states, “A chief advantage of birth certificate data is that information is collected for essentially every birth occurring in the country each year…” After a thorough review of vital statistics prior to the 2010 Census, the U.S. Census Bureau (Devine et al. 2010, page 5) stated, “Birth registration has been 100 percent complete since 1985.”
It should be noted that there are likely to be errors in the race and Hispanic Origin categorization of children based on birth certificates. For more detail on this issue, see O’Hare (pages 20-22). In 2010, the DA methodology estimated 3,195,000 young children using Black Alone, and 3,905,000 using Black Alone or in combination (O’Hare 2015, Table 3.2).
The population estimates use births, deaths, and net migration to estimate the population. These are the same inputs used in the Demographic Analysis method that has been used for more than 50 years to assess census accuracy (Robinson 2010). Not surprisingly, the estimates for ages 0 to 4 from the Vintage 2010 and 2020 state total estimates are remarkably close to the DA estimates for those populations.
Table 3 shows a comparison of populations ages 0 to 4 and 5 to 9, based on the 2010 Census and on the Vintage 2010 Population Estimates. Table 3 shows that the sum of states from the Vintage 2010 population estimates was almost the same as the DA estimate for ages 0 to 4, and both were about 5 percent larger than the census count for this age group. The situation is similar for ages 5 to 9: the two sources agree with each other, but both are quite different from the Census count for young children.
When the 2010 and 2020 Census Bureau state population estimates for ages 0 to 4 are totaled, they are almost identical to the DA estimates from the Census Bureau for this age group.
Table 3 also shows that for 2020, the consistency of population estimates and DA for ages 0 to 4 and for ages 5 to 9 are similar to what was seen in 2010. At the national level there is only one tenth of a percent difference for ages 0 to 4 between the two sources for 2010. For 2020, the difference is also one tenth of a percent. We will not be able to compare Vintage 2020 estimates and DA figures to the 2020 Census results until Census counts are released later in 2021.
For most states, the figures from the 2010 Census and the Vintage 2010 Population Estimates are similar. Across all states, the mean absolute percent difference is about one percent. This contrasts sharply with the 4.6 percent difference between Vintage 2010 estimates and census counts for ages 0 to 4. It should be noted that the estimates and census counts in 2000 were not as consistent.
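For readers unfamiliar with the measure, a mean absolute percent difference can be computed as in the sketch below. The two states and their figures are hypothetical, and using the census count as the denominator is my assumption; the source does not say which figure was used as the base.

```python
def mean_absolute_percent_difference(census, estimates):
    """Average of |estimate - census| / census across areas, in percent."""
    diffs = [abs(estimates[k] - census[k]) / census[k] * 100 for k in census]
    return sum(diffs) / len(diffs)


census = {"State A": 1_000_000, "State B": 2_000_000}     # hypothetical
estimates = {"State A": 1_010_000, "State B": 1_980_000}  # hypothetical
result = mean_absolute_percent_difference(census, estimates)
print(round(result, 2))  # 1.0
```

Because absolute values are averaged, an overestimate in one state does not cancel an underestimate in another.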
The estimates for the population ages 0 to 9 will have to be adjusted so the sum of all age groups matches the Decennial Census total for the state. But this is likely to be a relatively minor adjustment. As shown earlier, 2010 state total population estimates are very similar to the census state counts. If the Vintage 2020 population estimates and the total Decennial Census count are nearly the same for a state, there will be little adjustment to the population estimates for young children.
Another reason the blended base approach is good for young children is that for the Census state counts, the net undercount for young children in the Census will be spread over the total population, so it will be a much smaller fraction of the total. For example, in the 2010 Census there was a net undercount of about 1 million young children (ages 0 to 4), and that amounted to a net undercount rate of 4.6 percent. But if that 1 million net undercount had been calculated based on the total population (308 million), the net undercount rate would be only about 0.3 percent.
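The arithmetic for the total-population case can be checked directly using the rounded figures in the text. The function name is hypothetical, and the rate is expressed simply as the net undercount divided by the reference population.

```python
def net_undercount_rate(net_undercount, population):
    """Net undercount as a percentage of the reference population."""
    return net_undercount / population * 100


undercount = 1_000_000          # approximate net undercount, ages 0 to 4
total_population = 308_000_000  # approximate 2010 total population
rate = net_undercount_rate(undercount, total_population)
print(round(rate, 2))  # 0.32
```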
Use of a blended base also eliminates the biggest problems caused by the distortions from differential privacy. There is no distortion for the total state population counts. There is some distortion in the counts for small counties, but population estimates will be used there. For states and counties, the distortions for small groups will be eliminated because population estimates will be used in the base.
The large DP distortions are clustered in smaller jurisdictions, not states or most counties. The most recent data available from the Census Bureau indicate that the errors differential privacy injects into data for young children are very problematic for small geographic areas (O’Hare 2020). The situation for small populations (for example, the population ages 0 to 4) in small counties is particularly problematic. For example, differential privacy injects substantial errors into the population ages 0 to 4 for counties of less than 10,000 total population, but those census counts will not be used in the base. Using the population estimates for these small counties negates the distortions that might be caused by differential privacy.
For assessing the blended base approach, it is useful to examine data for 2020 differently than data for 2021 to 2030 because there will be 2020 census data available for this one year.
For the geographic units for which the Decennial Census is the only source of information, data users will have to rely on the Census data from 2020 for the next decade. Note that this means that when census tracts are summed to a county total for 2020 estimates, that census total will be different than the total from the population estimates.
One potential drawback of the blended base approach is that the 2020 Census data and the Vintage 2020 Population Estimates are likely to differ for some geographic units. For example, when the populations of all census tracts in a county are summed to produce a county total, the Census total may differ from the county total provided by the Vintage 2020 Population Estimates using a blended base. In years other than 2020, this will not be an issue because there will not be any Decennial Census data to compare to the population estimates.
The inconsistencies between total population estimates and census counts for cities, towns, and other subcounty places will not involve counts of young children, because population estimates for young children will only be available for states and counties.
Data for young children (or any other age group) differ from total population estimates in this respect: estimates for young children will only be available for states and counties, but total population figures will be available for cities, towns, and other subcounty areas. The total population figures for cities, towns, and subcounty areas in the Census data are likely to be inconsistent with the corresponding Vintage 2020 Population Estimates that use a blended base.
It is important to recognize that the Census Bureau’s population estimates do not provide data for all the geographic units that have census-reported data. For example, the Decennial Census provides data for census blocks and census tracts, but the population estimates program does not.
It is important to recognize that the blended base approach means the data used to weight survey estimates will not be affected by differential privacy. This limits the inaccuracy due to DP to 2020 data only. The ACS data for young children are likely to benefit from the use of a blended base because the distortions from DP will not have much impact on the Population Estimates.
Initial analysis of the blended base idea shows that use of a blended base for post-census population estimates would be advantageous for young children, primarily because the 2020 data that would be used in the blended base rely heavily on birth data, which are extremely accurate. Also, the adjustments needed to make the estimates consistent with Census totals are likely to be small.
Using the post-census population estimates as part of the base for 2020 will greatly reduce the problem of census undercounts for young children because the data for the population under age 10 in 2020 will be based totally on births, deaths, and immigration since 2010, with a small adjustment based on the total population.
Data-users need more information from the Census Bureau to evaluate the blended base idea more completely. It would also be helpful to have a timetable for when a decision will be made by the Census Bureau. I suspect a decision will be needed by the summer of 2021.
We need more information about how the Vintage 2020 Population Estimates are used and how the Post 2020 Census estimates are used, particularly in the context of the distribution of public funds.
Appendix A – Estimation methodology
The Vintage 2010 State Population Estimates used here are taken from the Census Bureau’s file labeled “Annual state resident population estimates for 5 race groups (5 race alone or in combination groups) by age, sex, and Hispanic origin: April 1, 2000 to July 1, 2010.” The file is also denoted as “SC-EST2010-ALLDATA5.” The file was released in March 2012 and it is available on the Census Bureau’s website at http://www.census.gov/popest/research/eval-estimates/SC-EST2010-ALLDATA5.pdf.
These estimates include the results of special censuses and successful local challenges during the previous decade.
This file contains yearly estimates for 2000 through 2010, but only the estimates from April 1, 2010 are used in this study. Only the figures for the total population and population aged 0–4 are used here. The population aged 5 and older was derived by subtracting the population aged 0–4 from the total population. Data for the population aged 5 and older are provided as a point of comparison.
The data from the 2010 U.S. Decennial Census are taken from Table DP-1 in Summary File 1. The data were obtained through American FactFinder, available on the Census Bureau’s website. The data for the total population and for the population aged 0–4 were taken from this file. The population aged 5 and older was derived by subtracting the population aged 0–4 from the total population. Data provided in the next section of this document explain why it is useful to include data for the population aged 5 and older along with figures for the total population.
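The derivation and comparison described above can be sketched as follows. The figures are hypothetical, not taken from either file, and the sign convention (census count minus estimate, as a percentage of the estimate) is an assumption made for illustration.

```python
def percent_difference(census_count, estimate):
    """Census count minus estimate, as a percentage of the estimate.
    A negative value means the census count is below the estimate."""
    return 100.0 * (census_count - estimate) / estimate


# Hypothetical figures for one state.
estimate_total, estimate_0_4 = 5_000_000, 350_000
census_total, census_0_4 = 4_990_000, 335_000

# Population aged 5 and older is derived by subtraction in both sources.
estimate_5_plus = estimate_total - estimate_0_4
census_5_plus = census_total - census_0_4

diff_0_4 = percent_difference(census_0_4, estimate_0_4)
diff_5_plus = percent_difference(census_5_plus, estimate_5_plus)

print(f"Ages 0-4: {diff_0_4:.1f} percent")
print(f"Ages 5 and older: {diff_5_plus:.2f} percent")
```

Reporting the 5-and-older group alongside the 0–4 group shows whether a gap is specific to young children or reflects a difference in the state total.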
The District of Columbia was not included in this analysis for two reasons. First, the District of Columbia does not operate like a state in many ways. The concentration of hard-to-count populations in the District of Columbia, both in terms of racial minorities and living arrangements, sets it apart from states. In many respects, the District of Columbia is more like a large city. Second, the net undercount rate of young children for the District of Columbia is an outlier with respect to state undercount rates for the population aged 0–4. The net undercount rate for the District of Columbia was 16.2 percent, while the highest estimated net undercount rate for age 0–4 in any state was 10.2 percent in Arizona.
The state estimates for the population aged 0–4 are likely to contain some estimation error from at least two sources. One source of such error is the interstate migration estimates and another source is the estimation of births (and deaths) for 2019 and the first quarter of 2020. Each of these factors is discussed below.
The biggest difference between the national DA and the state population estimates is the inclusion of migration across states. Migration between states is captured in the Census Bureau administrative records technique that uses federal tax records to estimate such migration (U.S. Census Bureau 2012).
Most of the population aged 0–4 in a state is a product of births and deaths experienced in that state. For the population aged 0–4, data from the 2010 American Community Survey indicate that 89.3 percent of the population aged 0–4 was living in the same state where they were born (U.S. Census Bureau 2013a). Only two states (Wyoming and New Hampshire) had more than 20 percent of the population ages 1 to 4 who were born in another state. Therefore, the overwhelming majority of children ages 0 to 4 estimated in each state in 2010 come from births and deaths in that state. Moreover, many of the gross figures for children born in a different state cancel each other out, so net figures are likely to be much smaller.
The point is that estimates of net interstate migration of children aged 0–4 that are incorporated into the Vintage 2020 Population Estimates are likely to contain some error, and that error is likely to be distributed unevenly across states, but the evidence suggests that the errors are relatively small.
The heavy reliance on birth certificate data and the high quality of birth certificate data provide a strong foundation for relatively accurate state population estimates for the population ages 0 to 4. But the final data from the vital event systems for 2019 and the first quarter of 2020 will not be available in time to be used in the Vintage 2020 estimates. This is true for both births and deaths, but births are a much larger factor in population estimates for young children.
Ten years ago, the estimation of births just prior to the census was in error. When the final birth data from 2009 and the first quarter of 2010 were released, it became apparent that the Census Bureau had overestimated the number of births in 2009 and the first quarter of 2010. At the national level, there was a difference of about 90,000 in the births estimated for the Vintage 2010 Population Estimates and the actual number of births. To put this in perspective, 90,000 is about 2 percent of the estimated births in 2009 and the first quarter in 2010. While the use of projected births in the 2010 population estimates provides another possible source of estimation error for states, the amount of error is likely to be small.
Given the uncertainty of state population estimates because of migration assumptions and birth projections, small differences between the population estimates and the Census counts should be viewed cautiously because they may not reflect real differences. In addition, small differences between states, both in the estimated size of the population ages 0 to 4 and in the differences between estimates and census counts, should be viewed cautiously because they may reflect estimation error rather than real differences.
Jensen, E., Benetsky, M. and Knapp, A. (2018). “A Sensitivity Analysis of the Net Undercounts for Young Hispanic Children in the 2010 Census,” Poster at the 2018 Population Association of America Conference, Denver, Colorado, April 25-28. Downloaded May 5, 2018, at https://paa.confex.com/paa/2018/meetingapp.cgi/Paper/20826
King, H., Ihrke, D. and Jensen, E. (2018). “Subnational Estimates of Net Coverage Error for the Population Aged 0 to 4 in the 2010 Census,” Paper presented at the 2018 Population Association of America Conference, Denver, Colorado, April 25-28. Downloaded May 6, 2018, at https://paa.confex.com/paa/2018/meetingapp.cgi/Paper/21374
Konicki, S. (2016) “The Undercount of Young Children in the Decennial Census,” Presentation at Census Bureau Quarterly Program Management Review, April 5, slide 6.
National Center for Health Statistics (2014). Assessing the Quality of Medical and Health Data from the 2003 Birth Certificate Revision: Results from Two States, National Vital Statistics Reports, Volume 62, No. 2. U.S. Department of Health and Human Services, Centers for Disease Control and Prevention.
O’Hare, W.P. (2017). “Geographic Variation in 2010 U.S. Census Coverage Rates for Young Children: A Look at Counties,” International Journal of Social Science Studies, Vol. 5, No. 9, Sept. Redfame Publishing.
O’Hare, W.P. (2014). “State-Level 2010 Census Coverage Rates for Young Children,” Population Research and Policy Review, Vol. 33, No. 6, pages 797-816.
Robinson, G. (2010). “Coverage of Population in Census 2000 Based on Demographic Analysis: The History Behind the Numbers,” Prepared for the U.S. Census Bureau Workshop: 2010 Demographic Analysis Technical Review, U.S. Census Bureau, Suitland, MD
On February 12, the Senate Appropriations Committee announced subcommittee rosters and leadership for the 117th Congress. The Commerce, Justice, Science (CJS) Subcommittee, which funds the Census Bureau, will now be chaired by Senator Jeanne Shaheen (D-NH) with the former chair, Senator Jerry Moran (R-KS), serving as the Ranking Member.
The new Republican members of the subcommittee are Senator Bill Hagerty (R-TN) and Senator Mike Braun (R-IN) with Senator Rubio rotating off. The Democrats added Senator Jeff Merkley (D-OR) to their ranks. Below is a complete list of the Senate CJS members.
Senate Commerce, Justice, Science, and Related Agencies Subcommittee Roster