Matching CPS Observations to Create Crosswalks

Occupation data in the monthly Current Population Survey (CPS) can be very useful, particularly during a shake-up in labor markets. The establishment survey offers solid employment data for industries, but the CPS can tell us about people with similar skillsets, across industries. Plus, the CPS is timely and offers public use microdata.

The occupation data in the CPS microdata, however, are tricky to deal with when the survey moves from one set of occupation codes to a new set, as it did in January 2020. Re-sorting individual job titles into occupation categories is necessary every so often, because both the mix of jobs and the responsibilities within them change over time. But when this happens, it creates a break in the series of monthly data: the occupation groups before the change are not the same as the occupation groups after the change.

For example, before January 2020, all software developers were grouped under a single occupation code. After the change, that code split: about 90% of those workers mapped to a new “software developers” code, and about 10% were separated out as “software quality assurance analysts and testers.” If you’re tracking software employment across the break, you need to know about that split — and ideally, what the percentages are.

Breaks in the occupation series are a well-known problem. Typically, to cross these breaks you can use a “crosswalk” that either gives a modal mapping of occupation codes (tells you which new code received the most workers from an existing code), or gives a percentage breakdown of how workers are re-sorted between occupation categories. Alternatively, you can combine occupations in a way that preserves earlier job categories, while introducing a bit of coarseness to the data.
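To make the percentage-breakdown idea concrete, here is a minimal sketch of applying such a crosswalk to employment counts recorded under the old codes. The function name, the code labels (OLD_SOFT, NEW_DEV, and so on), and the counts are all made up for illustration; only the 90/10 split echoes the software developer example above.

```python
def apply_crosswalk(old_counts, shares):
    """Reallocate counts under old codes to new codes using percentage shares.

    old_counts: {old_code: employment count}
    shares:     {old_code: {new_code: share}}, shares for each old code sum to 1
    """
    new_counts = {}
    for old_code, count in old_counts.items():
        for new_code, share in shares[old_code].items():
            new_counts[new_code] = new_counts.get(new_code, 0.0) + count * share
    return new_counts


# Hypothetical labels and counts, not real CPS codes or estimates.
old_counts = {"OLD_SOFT": 1000.0, "OLD_OTHER": 500.0}
shares = {
    "OLD_SOFT": {"NEW_DEV": 0.9, "NEW_QA": 0.1},
    "OLD_OTHER": {"NEW_OTHER": 1.0},
}

new_counts = apply_crosswalk(old_counts, shares)
print(new_counts)
```

A modal crosswalk is the special case where each old code sends its entire count to a single new code, which is simpler but hides splits like the one above.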

I want to note another option: creating crosswalks from the CPS panel data. As an example, you can look at the same person in December 2019 and January 2020, confirm they have the same job, and see how their occupation code changed. Because the CPS has a panel design, with households interviewed multiple times, we can use the CPS data itself to build a crosswalk of occupation codes across a break.
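The matching step can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the exact procedure used here: it assumes you already have a stable person identifier across the two months and some way to flag people who kept the same job, it ignores survey weights, and the identifiers and code labels are invented.

```python
from collections import Counter, defaultdict


def build_crosswalk(before, after, same_job):
    """Build share and modal crosswalks from people observed in both months.

    before, after: {person_id: occupation code} for the two months
    same_job:      set of person_ids confirmed to hold the same job in both
    """
    counts = defaultdict(Counter)
    for pid, old_code in before.items():
        # Keep only people matched across months who did not change jobs.
        if pid in after and pid in same_job:
            counts[old_code][after[pid]] += 1

    shares, modal = {}, {}
    for old_code, ctr in counts.items():
        total = sum(ctr.values())
        shares[old_code] = {new: n / total for new, n in ctr.items()}
        # On a tie, most_common returns the code encountered first.
        modal[old_code] = ctr.most_common(1)[0][0]
    return shares, modal


# Hypothetical matched records: 12 people coded OLD_SOFT in the first month.
dec = {f"p{i}": "OLD_SOFT" for i in range(1, 13)}
jan = {f"p{i}": "NEW_DEV" for i in range(1, 10)}
jan["p10"] = "NEW_QA"
jan["p11"] = "NEW_OTHER"   # p11 changed jobs, so is excluded below
# p12 is missing from the second month and cannot be matched.
same_job = {f"p{i}" for i in range(1, 11)}

shares, modal = build_crosswalk(dec, jan, same_job)
print(shares["OLD_SOFT"], modal["OLD_SOFT"])
```

With real data, the counts would more likely be sums of person weights than simple tallies, and the same-job filter matters: without it, genuine job changes would be mistaken for recoding.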

Does this work? I tested it for the occupation code breaks in 1992, 2003, 2011, and 2020. The matched sample is fairly large: roughly 32,000 to 45,000 observations at each break. I then compared the results for the 2003 break to the huge dual-coded dataset (where Census assigned both old and new codes to every record) that was released to cover 2000 through 2002. The modal matches from the two techniques agreed in 98% of cases. Only four occupation codes had different modal matches, and all four were very close cases where the proper match is ambiguous.
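A comparison like this one boils down to checking, code by code, whether two modal crosswalks point to the same new code. The sketch below shows one way to compute an agreement rate; the function name and the tiny example mappings are hypothetical and are not the actual 2003 results.

```python
def modal_agreement(modal_a, modal_b):
    """Compare two modal crosswalks over the old codes they both cover.

    Returns (share of shared old codes with the same modal new code,
             sorted list of old codes where the two crosswalks disagree).
    """
    shared = sorted(set(modal_a) & set(modal_b))
    disagreements = [code for code in shared if modal_a[code] != modal_b[code]]
    if not shared:
        return float("nan"), disagreements
    rate = (len(shared) - len(disagreements)) / len(shared)
    return rate, disagreements


# Hypothetical modal mappings from two sources.
panel_modal = {"A": "X", "B": "Y", "C": "Z", "D": "W"}
dual_coded_modal = {"A": "X", "B": "Y", "C": "Z", "D": "V"}

rate, differing = modal_agreement(panel_modal, dual_coded_modal)
print(rate, differing)
```

Listing the disagreeing codes, not just the rate, is what lets you inspect whether they are close calls, as the four cases here were.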

Is it perfect? No. There might be seasonal issues from using only winter data. The sample sizes are large but some re-sorting of job titles results in only a few observations being shuffled between a specific pair of codes (though this shouldn’t affect modal matching). But it is an option. When you are trying to sort through the changes in codes, another tool in the toolkit might help.


Summary of CPS Occupation Code Breaks

Transition       Date       Panel Sample   Other Sources
OCC80 → OCC90    Jan 1992   ~45,000        Beard et al. (1980 Census)
OCC90 → OCC00    Jan 2003   ~44,000        Census dual-coded, BLS tables
OCC00 → OCC10    Jan 2011   ~37,500        ACS-based only
OCC10 → OCC18    Jan 2020   ~32,500        ACS-based only