Proposed by Mark A. Harris, Sociologist, Ph.D.,
Tony Glover, Senior Research Analyst,
Sylvia Jones, Statistical and Research Analyst,
Tom Gallagher, LMI Manager
8/29/02
Proposal for Wage Record Editing
Wage Records provide a quarter-by-quarter, near-universe count of employed persons by place of work. The data are extremely valuable for conducting applied and basic research on employment, wages, and turnover at several levels of analysis (e.g., individual, firm, industry, county, region, state). When Wage Records are combined with other administrative data sources (driver’s license data, BLS ES202 data, etc.), additional individual- and firm-level characteristics can be studied (e.g., gender, age, program participation, firm size), which greatly expands the array of theoretical and applied questions that can be addressed with Wage Records data.
Along with Wyoming, a number of other states have been developing their Wage Records programs by refining and adding to their databases and sharing the data among participating states. These efforts have led to a number of publications and a growing interest in developing a national Wage Records program. To further this general effort, the Wyoming Department of Employment, Research & Planning (R&P) is proposing to conduct research on Wage Records editing. Specifically, we seek to explore methodologies that will enhance our Wage Record edit policy and procedure. Doing so will help us improve the validity and reliability of our Wage Record data. The work will follow three paths: I) the analysis of missing data, II) the analysis of out-of-scope wage reports, and III) the analysis of invalid SSNs. We will then publish our research results in Wyoming Labor Force Trends and share them at the LMI Institute’s annual Wage Records Conference. Sharing our methodologies and findings will help other states improve the reliability and validity of their Wage Record data and will provide information on which to build editing policy for a national Wage Record program. Other states may be interested in obtaining the database code we develop, particularly those participating in the seven-state joint turnover project.
I. Missing Data
One of the difficulties R&P encounters in using our Wage Records data is determining whether data are missing (e.g., a firm’s failure to report SSNs and the associated wages). This is particularly relevant in a longitudinal database such as Wage Records: tracking individual SSNs over time and into and out of different firms becomes problematic when data are missing. Missing data diminish the validity and reliability of a longitudinal database. Practically, we simply lose track of individuals along the historical timeline, and missing data become confounded with “exit” and “entry” behavior.
B. Context
Missing data, in the context of administrative databases, takes on a different meaning than missing data in survey research. In particular, the collection of Wage Record data is legislatively mandated under Unemployment Insurance (UI) statute. Thus, it is possible to seek and obtain missing data in a way that is not possible to do using voluntary survey research methodologies.
There are several causes of missing data. These include the failure of a firm to report all or a portion of its SSNs and the failure of administrative staff to enter the reported data in a timely manner (i.e., before quarterly downloads to researchers). In the following narrative we propose four methods for the identification of missing data. Our overall strategy is to study each and then to determine which, alone or in combination, forms the most effective strategy for identifying missing Wage Records data. Once missing data are identified, UI administrators can seek to obtain them and/or implement better methods for obtaining the data in a timely manner. Because UI administration staff are funded from a tax on employers, the cost of following up to collect Wage Record or other tax information from delinquent employers is borne by employers who report in a timely and accurate manner (in other words, delinquent reporting externalizes costs). One practical application of identifying missing information is that the scale of the problem could be used to make the case for state legislation to sanction employers for non-compliance or poor reporting practices.
Please note, each strategy will be evaluated not only from a practical editing standpoint, but also from methodological as well as theoretical and applied perspectives.
1) Computing the mean and standard deviation of the number of SSNs occurring within a given UI account (i.e., firm) over the entire history of the firm should facilitate the identification of accounts that have significantly fewer SSNs at any given time point than is expected from the historical trend (e.g., those that fall two standard deviations above or below the mean). Firms with large deviations can then be flagged and investigated. The difficulty with this method is that many states do not have a sufficiently long time series on which to base mean and standard deviation calculations. Research benefits from this strategy fall into the category of monitoring and understanding the life cycle of firms and their behavior over the course of the business cycle.
2) An alternative or companion method we will explore for the identification of missing data is to compare Wage Records data with ES202 employment. Doing this allows us to ascertain whether the firm employment totals reported in the ES202 data correspond with the aggregated individual SSNs reported in Wage Records for the same firm in the same time period. We propose to use direct comparisons as well as the mean and standard deviation method introduced above. The main complication we foresee with this method is that the ES202 data lag the Wage Records download by about a month. Thus, comparisons will be delayed. Research questions relating to this area have to do with the within-industry and size-class distribution of turnover behavior and its correlation with other firm and worker characteristics (e.g., firm age, demographic mix).
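As a rough illustration of the two-standard-deviation rule described above, the following Python sketch flags quarters in one account's history. The function name, the threshold default, and the sample counts are assumptions made for illustration only; they are not R&P's production edit code.

```python
from statistics import mean, stdev

def flag_quarters(ssn_counts, threshold=2.0):
    """Return the positions of quarters whose SSN count lies more than
    `threshold` sample standard deviations from the account's own mean.

    ssn_counts: list of distinct-SSN counts, one per quarter, oldest first.
    """
    if len(ssn_counts) < 3:
        return []  # history too short to estimate a spread
    m = mean(ssn_counts)
    s = stdev(ssn_counts)
    if s == 0:
        return []  # perfectly flat history, nothing deviates
    return [i for i, c in enumerate(ssn_counts) if abs(c - m) > threshold * s]

# Hypothetical account: about 50 workers each quarter, one sharp drop.
history = [50, 52, 49, 51, 50, 53, 48, 12, 51, 50]
print(flag_quarters(history))  # the quarter with 12 SSNs is flagged
```

Note that the suspect quarter is itself included in the mean and standard deviation, so a very short history can mask a real gap. That is one reason the length of the available time series matters for this method.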
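The direct comparison could look something like the sketch below. It assumes, purely for illustration, that both sources have already been reduced to one figure per account and quarter, and that a 10 percent shortfall is worth a look. The record layout, account identifiers, and tolerance are hypothetical.

```python
from collections import defaultdict

def count_ssns(records):
    """Count distinct SSNs per (account, quarter) from Wage Record rows.

    records: iterable of (account, quarter, ssn) tuples.
    """
    seen = defaultdict(set)
    for account, quarter, ssn in records:
        seen[(account, quarter)].add(ssn)
    return {key: len(ssns) for key, ssns in seen.items()}

def shortfalls(wr_counts, es202_counts, tolerance=0.10):
    """List (key, wr_count, es202_count) where the Wage Records count falls
    more than `tolerance` below the ES202 employment figure."""
    flagged = []
    for key, es_count in sorted(es202_counts.items()):
        wr_count = wr_counts.get(key, 0)  # account absent from Wage Records
        if es_count > 0 and wr_count < (1 - tolerance) * es_count:
            flagged.append((key, wr_count, es_count))
    return flagged

# Made-up data: A001 roughly agrees, B002 is short, C003 is absent.
records = [("A001", "2002Q1", f"id{i}") for i in range(40)]
records += [("B002", "2002Q1", f"id{i}") for i in range(100, 115)]
es202 = {("A001", "2002Q1"): 42, ("B002", "2002Q1"): 60, ("C003", "2002Q1"): 8}

for key, wr, es in shortfalls(count_ssns(records), es202):
    print(key, "Wage Records:", wr, "ES202:", es)
```

How the ES202 figure should be aligned with a quarterly SSN count is left open here; the sketch simply takes whatever single number is supplied for the quarter.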
Level/Flow Technique
3) A third method will utilize turnover calculations for a given firm as the comparison tool. Exit and entry rates are excellent tools for determining whether an account has missing data. For example, a UI account that has 100 percent exits in a given quarter likely failed to report or went out of business. This can be compared to the next quarter, and if the firm then has 100 percent entries it can be flagged for investigation. We are aware that some employers actually do close down during certain seasons, so this method will likely work best in combination with the mean and standard deviation procedures previously introduced. Theoretical questions facilitated here include the level of association between flow rates and changes in the level of employment. Specifically, can hire, exit, or net flow rates be predicted by employment level changes?
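The 100 percent exit followed by 100 percent entry pattern can be sketched as below. The rate definitions used here (share of a quarter's SSNs absent from the adjacent quarter) and the extra overlap measure are assumptions for illustration; the proposal does not fix exact formulas.

```python
def exit_rate(current, following):
    """Share of SSNs in `current` that do not appear in `following`."""
    return len(current - following) / len(current) if current else 0.0

def entry_rate(current, previous):
    """Share of SSNs in `current` that did not appear in `previous`."""
    return len(current - previous) / len(current) if current else 0.0

def suspect_gaps(quarters):
    """Find quarters bracketed by 100% exits and 100% entries.

    quarters: list of sets of SSNs for one account, oldest first.
    Returns (position, overlap) pairs, where overlap is the share of the
    earlier workforce that reappears after the suspect quarter.
    """
    flagged = []
    for t in range(1, len(quarters) - 1):
        before, gap, after = quarters[t - 1], quarters[t], quarters[t + 1]
        if not before or not after:
            continue
        if exit_rate(before, gap) == 1.0 and entry_rate(after, gap) == 1.0:
            overlap = len(before & after) / len(before)
            flagged.append((t, overlap))
    return flagged

# Hypothetical account that reports nothing in its second quarter.
account = [{"a", "b", "c", "d"}, set(), {"a", "b", "c", "e"}, {"a", "b", "c", "e"}]
print(suspect_gaps(account))  # [(1, 0.75)]
```

A high overlap says only that much the same workforce returned. That is consistent with a missed report and equally with a seasonal shutdown, so the flag is a prompt for investigation rather than a finding.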
Check the Neighbors
4) We also plan to check missing SSNs against earnings in neighboring states. Some workers are simply moved from establishment to establishment within the same firm across state lines.
A critical limitation of all these approaches is that we have no means, other than verifying late reports and monitoring UI claims activity, of validating missing wage information. The accuracy of any attempt to impute missing wage and employment information needs to be validated, especially because Wage Records are used both for program performance measures (the experimental group) and for control groups when conducting evaluations. Claimants who file, but whose employers are delinquent in wage record reporting, can be identified. However, claimants do not represent a random sample, so it is difficult to generalize the findings.
Another difficulty we encounter in the use of Wage Records is wage data that are out-of-scope (i.e., reported wage data are inaccurate or they reflect large bonuses). Out-of-scope wage data cause problems when calculating average wages for individuals, firms, industries, etc. Again, this is damaging to the validity and reliability of a longitudinal database. The issue is of particular importance to R&P in hypothesis testing when conducting program evaluation research (e.g., WIA, Wagner-Peyser) that examines wage differences between participant and control groups.
B. Proposed Methods of Identification
Methods of identification for wage data that are out-of-scope will be similar to the general methods used to identify missing data.
1) Computing the mean and standard deviation of wages occurring within a given UI account (i.e., firm) over the entire history of the firm should facilitate the identification of accounts that have significantly higher or lower wages at any given time point than is expected from the historical trend (e.g., those that fall two standard deviations above or below the mean). Firms that are out-of-scope can then be flagged and investigated to determine whether inaccurate data were provided, data entry errors occurred, or a large bonus was paid. Although the payment of large bonuses is not an inaccuracy per se, uncorrected bonus data can heavily skew average wages for an industry or geographic level of aggregation and should be adjusted. Given that a few firms can make up the bulk of an entire industry in Wyoming, this is not an unlikely situation. Thus, it becomes important for R&P to develop the methodology to correct this problem.
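To show how much a single bonus quarter can move an average, the sketch below flags out-of-scope quarters with the same two-standard-deviation rule and then compares the mean with and without them. Dropping the flagged quarter is only one possible adjustment, chosen here for simplicity. The proposal leaves the correction method open, and the wage figures are invented.

```python
from statistics import mean, stdev

def flag_wages(avg_wages, threshold=2.0):
    """Positions of quarters whose average wage lies more than `threshold`
    sample standard deviations from the account's historical mean."""
    if len(avg_wages) < 3:
        return []  # history too short to estimate a spread
    m, s = mean(avg_wages), stdev(avg_wages)
    if s == 0:
        return []
    return [i for i, w in enumerate(avg_wages) if abs(w - m) > threshold * s]

def mean_with_and_without(avg_wages, threshold=2.0):
    """Return (flagged positions, raw mean, mean excluding flagged quarters)."""
    flagged = flag_wages(avg_wages, threshold)
    kept = [w for i, w in enumerate(avg_wages) if i not in flagged]
    return flagged, mean(avg_wages), mean(kept)

# Invented quarterly average wages for one firm; one quarter carries a bonus.
wages = [9000, 9100, 8900, 9050, 9000, 8950, 9100, 9000, 21000, 9050]
flagged, raw, adjusted = mean_with_and_without(wages)
print(flagged, raw, round(adjusted, 2))
```

In this example one quarter out of ten raises the firm's average by more than a thousand dollars, which illustrates why a dominant firm's bonus can distort an industry average.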
S$ vs. 202 Total
2) An alternative or companion method we will explore for the identification of out-of-scope wage data is to compare Wage Records data with ES202 wages. Doing this allows us to ascertain whether the firm wage totals reported in the ES202 data correspond with the aggregated individual wages reported in Wage Records for the same firm in the same time period. We propose to use direct comparisons as well as the mean and standard deviation method introduced above.
Another difficulty we encounter in the use of Wage Records is that of inaccurate SSN’s. In terms of consequences, as with missing data, inaccurate SSN’s cause us to lose track of individuals and their transactions with firms over time.
B. Technical
At present we are more limited in the identification and correction of invalid SSNs than with other problems, and we offer the ideas in this section as possibilities that are not yet feasible. Furthermore, earlier pilot research in this area indicates that the magnitude of the problem is not that large. If an employer reports an incorrect SSN (e.g., XXX-XX-1454 instead of XXX-XX-1453) in Wage Records, we must assume that this is a new employee in the quarter and that the old employee has exited. Unfortunately, we have no way of knowing whether the “new” SSN is valid and actually belongs to the intended person at a given firm.
1) One method of validation is to collect names along with SSNs as part of Wage Records (names are not currently captured) and then match them against a master list of names and SSNs provided by the Social Security Administration (SSA). Once matched, we could verify whether the Wage Record SSN is valid and whether it is associated with the correct name. Incorrect SSNs could be flagged and sent to UI administrators for correction by the firm. Undoubtedly, name changes (e.g., maiden to married names) would have to be dealt with under this methodology.
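Although an SSN cannot be validated from Wage Records alone, a screening heuristic could at least surface candidates like the XXX-XX-1454 example: within one account, pair each SSN that exits with any SSN that enters in the next quarter and differs in exactly one digit. This is an illustrative assumption, not a method the proposal commits to, and the nine-digit strings below are made-up placeholders.

```python
def one_digit_apart(a, b):
    """True when two equal-length digit strings differ in exactly one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1

def candidate_misreports(previous, current):
    """Pair SSNs that exit an account with entering SSNs one digit away.

    previous, current: sets of SSN strings for consecutive quarters of the
    same account. Returns a sorted list of (exiting, entering) pairs.
    """
    exits = previous - current
    entries = current - previous
    return sorted(
        (old, new)
        for old in exits
        for new in entries
        if one_digit_apart(old, new)
    )

# Placeholder identifiers, not real SSNs.
q1 = {"000001453", "000002210", "000003377"}
q2 = {"000001454", "000002210", "000009999"}
print(candidate_misreports(q1, q2))  # [('000001453', '000001454')]
```

A pair returned here is only a lead for follow-up with the employer. Two different people can hold SSNs one digit apart, so the heuristic cannot confirm which number is correct.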
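The matching step might be sketched as follows, assuming a master list is available as a simple SSN-to-name lookup. The data structure, status labels, and sample names are hypothetical; no actual SSA file format or interface is implied.

```python
def check_against_master(reported, master):
    """Classify each reported (ssn, name) pair against a master list.

    reported: list of (ssn, name) tuples from a Wage Record filing.
    master: dict mapping ssn -> name on the master list.
    Returns a dict mapping ssn -> "ok", "name_mismatch", or "unknown_ssn".
    """
    results = {}
    for ssn, name in reported:
        if ssn not in master:
            results[ssn] = "unknown_ssn"
        elif master[ssn].strip().lower() != name.strip().lower():
            # Could be a wrong SSN or a legitimate name change.
            results[ssn] = "name_mismatch"
        else:
            results[ssn] = "ok"
    return results

# Placeholder identifiers and names for illustration.
master = {"000001453": "Pat Smith", "000002210": "Lee Jones"}
filing = [("000001453", "pat smith"), ("000002210", "Lee Brown"), ("000001454", "Pat Smith")]
print(check_against_master(filing, master))
```

A "name_mismatch" result would still need human review, since a name change produces the same signal as an SSN reported for the wrong person.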
Employer Validation
2) Another alternative is the development of a Wage Record reporting system that provides the employer a method of validating its current list of SSNs and employee names against the previous month’s list of SSNs and names. The employer could be required to validate the list case by case before it is submitted. Obviously, this would be best accomplished in an interactive web-based reporting system. Potentially, this could be established in a test mode to correspond with the implementation of Wyoming’s new web-based UI reporting system.