Evaluating and Improving Data Integrity
Friday, July 29th
9:00 a.m. - 10:30 a.m.
New frontiers in preventing data falsification in international surveys
Michael Robbins, Arab Barometer (Presenter)
Noble Kuriakose, SurveyMonkey
Concerns about data falsification in survey research are as old as the field itself. Cheating will continue as long as it seems less costly to interviewers, supervisors, or the survey firm than faithfully carrying out the survey. In recent years, tests designed to evaluate whether data fabrication has occurred suggest that cheating remains a significant problem, especially in international contexts where data collection often occurs face-to-face. We provide an overview of new approaches for detecting potential fraud, arguing that these checks should become standard practice for identifying and investigating cases of likely falsification. We also make the case that further steps, including investments in newer data collection technologies and closer on-the-ground monitoring, are needed to identify fraud in real time, when steps can still be taken to correct the survey. Finally, we argue that organizations engaged in international survey research must be much more transparent and collaborative. Transparency and collaboration will give local firms an incentive to improve the quality of their research practices and to take steps that discourage cheating at the local level.
Michael Robbins is the director of the Arab Barometer. His work on Arab public opinion, political Islam and political parties has been published in Comparative Political Studies, the Journal of Conflict Resolution and the Journal of Democracy, and his research on data fabrication has been covered by Science. He is a regular contributor to the Washington Post’s Monkey Cage blog and his analysis has appeared in Foreign Policy and The Conversation. He received his Ph.D. in political science from the University of Michigan and received the American Political Science Association Aaron Wildavsky Award for the Best Dissertation in the field of Religion and Politics. Previously, he has served as a research fellow at Harvard’s Belfer Center for Science and International Affairs and a research associate at the Pew Research Center.
Preventing interview fabrication
Patty Maher, University of Michigan (Presenter)
Jennifer Kelley, University of Michigan
Beth-Ellen Pennell, University of Michigan
Gina-Qian Cheung, University of Michigan
This panel presentation will explore innovative approaches to preventing interview fabrication. The diffusion of new technologies in survey research has led to new and innovative approaches to quality control. These procedures are being implemented not only to address overall data quality but also to make it more difficult to fabricate all or part of an interview. In this presentation, we draw on examples from large-scale, complex surveys in a variety of challenging international contexts. The methods described go well beyond traditional call-back verification of a sample of interviewers’ cases. These new approaches include interviewer supervision models; analysis of tailored reports that combine rich paradata with survey data in ‘real time’; comparisons with previous waves of data collection (where available); extensive use of paradata and audio-recorded interviews to prioritize evaluation and verification of cases; use of biometrics such as fingerprints; digital photography; and use of GPS, including live tracking of interviewer travel in and among sampled segments.
Patricia Maher is the Director of Survey Research Operations (SRO), Survey Research Center (SRC), Institute for Social Research (ISR), University of Michigan. With nearly 30 years of experience at ISR, she plays a leading role in developing and managing survey and related data collections in all modes, using a variety of technical platforms. She leads an operations unit of approximately 150 professional staff -- survey statisticians, technologists, and survey managers -- and 750 interviewers working by telephone and in person. She has a special focus on interviewer effects, training, and quality control. She is also Vice-President of AASRO – the Association of Academic Survey Research Organizations.
Evaluating data quality in international surveys: A multi-dimensional approach
Katie Simmons, Pew Research Center (Presenter)
Steve Schwarzer, Pew Research Center
Gijs van Houten, Pew Research Center
Courtney Kennedy, Pew Research Center
Ensuring data quality in face-to-face surveys is a challenge given limited ability to monitor interviewer and supervisor activity in the field. This difficulty is compounded for international surveys when those who commission the research may not even be in the same country as the local vendor and are thus one more step removed from fieldwork. Computer-assisted personal interviewing (CAPI) presents a promising improvement for oversight in face-to-face surveys, but also comes with its own difficulties. Despite a vendor’s best efforts to ensure accurate data collection through these devices, it is still possible to encounter unreliable time measurement, inconsistent connectivity, and inaccurate measurement of geolocations, among other problems. Given these challenges, we explore a multi-dimensional approach to evaluating data quality in international face-to-face surveys that relies on analysis of both substantive data (such as duplicate responses, item non-response patterns, straightlining, etc.) and available paradata (such as time of interview, time between interviews, interviewer workload, etc.). Using a set of nationally representative international surveys, we employ this ex-post approach to identify potential problems with the data, ranging from basic human error to poor interviewing practices to suspected falsification. We validate the approach by evaluating other data quality measures in the survey as well as patterns of response on substantive questions. The paper discusses the sensitivity of using a single measure to identify suspicious cases, the robustness of the multi-dimensional approach, and the limitations of relying on imperfect paradata. The paper proposes some possibilities for future research on evaluating data quality in international surveys.
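As an illustration of the kind of multi-dimensional screening the abstract describes, the sketch below flags interviews on three of the indicators mentioned: duplicate response strings, straightlining, and implausibly short interview durations drawn from paradata. The function name, data layout, and the 10-minute threshold are hypothetical choices for this example, not details taken from the paper.

```python
# Illustrative sketch (not the authors' actual code): flag suspicious
# interviews on several dimensions at once. A case is flagged if it
# duplicates another case's responses, straightlines, or was completed
# implausibly fast. All thresholds here are hypothetical examples.

from collections import Counter

def flag_cases(cases, min_minutes=10.0):
    """Return the set of case ids flagged on at least one indicator.

    Each case is a dict with 'id', 'responses' (list of answer codes),
    and 'minutes' (interview duration from paradata).
    """
    flags = set()

    # 1. Duplicate detection: identical response strings across cases.
    signatures = Counter(tuple(c["responses"]) for c in cases)
    for c in cases:
        if signatures[tuple(c["responses"])] > 1:
            flags.add(c["id"])

    for c in cases:
        # 2. Straightlining: every item answered with the same code.
        if len(set(c["responses"])) == 1:
            flags.add(c["id"])
        # 3. Paradata check: implausibly short interview duration.
        if c["minutes"] < min_minutes:
            flags.add(c["id"])

    return flags

interviews = [
    {"id": "a1", "responses": [1, 2, 3, 2, 4], "minutes": 25.0},
    {"id": "a2", "responses": [1, 2, 3, 2, 4], "minutes": 24.0},  # duplicate of a1
    {"id": "a3", "responses": [3, 3, 3, 3, 3], "minutes": 22.0},  # straightliner
    {"id": "a4", "responses": [2, 1, 4, 3, 2], "minutes": 6.0},   # too fast
    {"id": "a5", "responses": [4, 1, 2, 5, 3], "minutes": 30.0},  # clean
]

print(sorted(flag_cases(interviews)))  # ['a1', 'a2', 'a3', 'a4']
```

As the abstract cautions, any single indicator is sensitive to false positives (a fast but legitimate interview, for instance), which is why flagged cases would be investigated rather than deleted outright.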
Katie Simmons is associate director of research at Pew Research Center. She is an expert in survey methodology and conducts research on international public opinion on a variety of topics, including U.S. foreign policy, the global economy, democracy and terrorism. Simmons helps to coordinate the Center-wide international research agenda and serves as a methodology consultant on all international projects at the Center. She is also involved in all aspects of the research process, such as managing survey projects, developing questionnaires, analyzing data and writing reports. Prior to joining Pew Research Center, Simmons worked as a research analyst for non-profit clients at Belden Russonello Strategists. She earned her doctorate in political science from the University of Michigan. Simmons is an author of reports on the crisis in Ukraine, global economic conditions, life satisfaction and economic reform in Mexico. Simmons speaks about findings from Pew Research Center studies to a variety of audiences, including government agencies, academic groups and domestic and international media.
Interviewers’ deviations in face-to-face surveys: Investigations from experimental studies
Natalja Menold, GESIS
In face-to-face surveys the interviewer is a key actor who may affect the quality of survey data. Some studies have addressed interviewers’ impact, for example by evaluating interviewer variance in secondary analyses. However, little is known about how falsifications may affect interview data, or about which working conditions influence the results of interviewers’ work. Within research conducted in collaboration with Prof. Peter Winker (University of Giessen) and funded by the German Research Foundation, we investigated the identification of interviewers’ falsifications and the possible effects of interviewers’ work organization on the accuracy of their results. In the first experimental study, indicators based on specific properties of falsified interviews were developed, tested, and used as a central part of a multivariate method for identifying falsified data. The results show that differences in response sets and specific patterns of response behavior, in particular, can be used effectively to identify falsifications. In the second experimental study, the impact of payment scheme, instructions, and task difficulty on the accuracy of interviewers’ work was analyzed. There were fewer deviations when interviewers were paid by time, while variation in the instructions had no impact. In addition, interviewers deviated more in the case of break-offs, which were associated with high task difficulty. The results are discussed with respect to the prevention and detection of interviewers’ deviations.
Natalja Menold completed a Master’s degree in psychology at the University of Tuebingen in 2000. She received her doctorate from the University of Dortmund in 2006. Since 2007 she has been working at GESIS – Leibniz Institute for the Social Sciences in Mannheim, Germany. Her current position is scientific leader of the team “Survey Instruments” in the department “Survey Design and Methodology”. Natalja Menold has been conducting research and providing scientific services in the area of social science research and survey methodology. Her work in survey methodology includes consulting on survey research projects (federal, EU, and university projects) concerning survey design, questionnaire development, and measurement of latent variables. In addition, she has been conducting research on survey methodology, including survey non-response, falsifications in survey data, construction of response scale formats, measurement equivalence, measurement quality assessment, and cross-cultural comparability of survey data. For her research she obtained grants from the German Research Foundation (DFG).
Prevailing Issues and the Future of Comparative Surveys
Friday, July 29th
11:00 a.m. - 12:30 p.m.
Lars Lyberg, Stockholm University (Presenter)
Lilli Japec, Statistics Sweden
Can Tongur, Statistics Sweden
The interest in comparisons between countries, regions, and cultures (3MC) has increased during the last 20 years due to globalization. This is manifested by the increasing number of surveys that are 3MC and comparative in nature. Comparisons are made in many areas, including, for instance, official statistics, assessment of skills, and social, opinion, and market research. The ambitions and research cultures vary greatly between surveys. Even though the lifecycle of a comparative survey is quite elaborate, with many process steps, we notice that many organizations in charge of 3MC surveys seem keen on covering as many populations (often countries) as possible, which leaves less room for handling all these steps. In extreme cases only a source questionnaire is developed, and survey organizations in participating countries are asked to conduct the remaining steps in the lifecycle with little or no guidance. There is a great risk that this approach generates estimates that are not comparable, and it is important to inform stakeholders about this problem.
At the other end of the spectrum we have surveys such as the European Social Survey, the World Mental Health Survey, the Health, Ageing and Retirement Survey, and the Program for the International Assessment of Adult Competencies. These and some other surveys have strong central teams leading the efforts and assisting countries using a set of process requirements and follow-up procedures. Site visits and other meetings are also common. The idea of a strong central team has gradually evolved during the last decades. Previously it was often assumed that countries were able to follow instructions without much guidance or explanations. In the aftermath of the 1994 International Adult Literacy Survey it became obvious that this assumption was overly optimistic. It turned out that many different circumstances had made participating survey organizations deviate from prescribed implementation instructions. Since then the idea of a strong survey infrastructure and central leadership has been refined. The challenge is to sell this idea widely and to give examples of efficient infrastructures and their cost-benefits. In this chapter we give some examples.
The user situation needs clarification. Often there are conflicting national and comparative interests, which is confusing. Important decisions are made regarding policies, but perhaps more at national than at international levels. The comparative aspects are, with the exception of official statistics, often dominated by league tables, when in fact the real benefit for a nation would be to investigate subgroups across nations and analyze the causes of differences. The outreach of 3MC survey results should be more extensive, and the results discussed in more detail by the public and decision-makers. PISA, the psychometric assessment of 15-year-old students, has succeeded with its outreach, even though the league-table aspect dominates media reporting. The distance between users and producers is typically greater in 3MC surveys than in one-population, or mono, surveys. This distance ought to shrink through improved reporting of results and improved analyses. Researchers are often important users, and they use the data to develop new theories and methods in social science, sometimes without taking limitations in the data into account.
The planning and implementation of comparative surveys is a huge undertaking. All problems experienced in a mono setting are magnified, and new problems are added. The difficulties associated with developing concepts, questions, and psychometric items that convey the same meaning across cultures and nations tend to be underrated but are absolutely crucial to comparability. For instance, translation of survey materials is not an easy task, and the common perception that word-for-word translation and back translation ensure comparability lingers. The quality issues in comparative surveys are indeed complicated. First, the various design steps are associated with error risks that vary across countries. Second, risk perception and the management of risks also differ across countries. For instance, some error sources might not be considered because of a belief that they do not seriously affect comparability. There might also not be enough resources to handle error sources even when they are known to be problematic. Various models for capacity building will be discussed in the chapter.
A quality assurance system must be in place that describes the requirements and the justifications for their use, along with a quality control system that checks that requirements are adhered to and that production is free from unnecessary variation. Few 3MC surveys have all of that in place, and those that do tend to receive quality control information too late to intervene and rectify problems in a timely fashion. In the future we must strive for almost real-time quality control. In this work, the theory and methods of statistical process control must be used so that variability patterns in inflow, nonresponse, interviewer behavior, and response patterns can be diagnosed via control charts displayed on country and “global” dashboards. Technology for real-time monitoring is used in other fields, such as flight tracking, and could be applied to 3MC survey monitoring as well. We attempt to describe some possible future routes to developing and implementing more timely quality assurance and quality control systems, including a discussion of the use of paradata and adaptive designs in 3MC surveys.
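As a hedged illustration of the statistical-process-control idea sketched above, the example below computes Shewhart-style p-chart control limits for interviewer-level refusal rates and flags interviewers who fall outside them. The 3-sigma rule and the limit formulas are standard SPC practice; the function name and the data are invented for this example and are not taken from the chapter.

```python
# Hypothetical sketch of a p-chart check on interviewer refusal rates.
# An interviewer is flagged when their rate falls outside 3-sigma
# control limits around the overall (pooled) rate.

import math

def p_chart_outliers(counts, sigma=3.0):
    """counts maps interviewer -> (refusals, contacts).
    Returns the interviewers outside their control limits."""
    total_refusals = sum(r for r, n in counts.values())
    total_contacts = sum(n for r, n in counts.values())
    p_bar = total_refusals / total_contacts  # centre line of the chart

    out = []
    for interviewer, (r, n) in counts.items():
        # Limits vary per interviewer because workloads (n) differ.
        se = math.sqrt(p_bar * (1 - p_bar) / n)
        lcl = max(0.0, p_bar - sigma * se)  # lower control limit
        ucl = min(1.0, p_bar + sigma * se)  # upper control limit
        p = r / n
        if p < lcl or p > ucl:
            out.append(interviewer)
    return out

workloads = {
    "int_01": (5, 50),   # 10% refusals
    "int_02": (6, 48),
    "int_03": (20, 50),  # 40%, well above the upper limit
    "int_04": (4, 52),
}
print(p_chart_outliers(workloads))  # ['int_03']
```

In a real monitoring system a chart like this would be refreshed as cases flow in and displayed on the country and “global” dashboards the chapter envisages, so that an out-of-control signal can trigger intervention while fieldwork is still under way.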
Few, if any, 3MC surveys are systematically evaluated or audited. There have been a few isolated quality reviews, but what is needed is a more continuous approach. One option is the ASPIRE system, developed for evaluating statistical products in mono surveys. It is based on a mix of assessments of quality risks and actual critical-to-quality performance indicators, and the assessments are performed by external evaluators to enhance objectivity. Assessments are made using a point system that makes it possible to check whether improvement has taken place from one assessment to the next. We will discuss how this might be done in a 3MC context.
Roger Jowell developed ten golden rules for comparative surveys. One rule stated that the number of populations to compare, often countries, should be kept at a reasonable number. There are examples of surveys that comprise 140 countries. It is hard to imagine that such numbers and vast diversity can generate any kind of trustworthy comparability. Jowell stated that instead one should confine cross-national research to the smallest number of nations compatible with each study’s intellectual needs. We will discuss the implications of this rule and some of Jowell’s other golden rules with a future perspective. We will also add some rules of our own.
To gain trust many 3MC surveys need to be more transparent. Many of them lack a proper documentation of the processes and the efforts involved in controlling and improving quality. We fear that in many cases there is not much to report. One reason might be that the surveys are so extensive that all resources go to just collecting data from many countries and very little is left for sound methodology, continuous improvement, and documentation. All 3MC surveys should provide proper documentation and some already do that in excellent ways. We suggest what a minimum documentation standard might entail.
The large costs involved are a major deterrent to high-quality comparative surveys. A well-designed survey with resources allocated to all major design and implementation steps will cost a lot. This is one reason why country participation might decline or requirements might not be fully met. Therefore, many surveys have started to explore potential cost-savers such as mixed-mode designs. The problem is that experimentation so far shows that comparability will suffer. Likewise, abandoning a standardized approach based on input harmonization in favor of a situation where countries are free to use the methods they prefer rather than those required is not a good strategy for achieving comparability. Here we will discuss the pros and cons of input and output harmonization. Nevertheless, the cost situation should be carefully scrutinized, since there are always activities that can be done with fewer resources or perhaps not done at all. New technology such as GIS can be a real cost-saver when locating respondents and helping interviewers administer their work. One obvious way to reduce administrative costs is to reduce the number of countries involved, which will also decrease the cost for individual countries. Of course, there are also examples where generous funding is necessary to reach the research goals, and cases where surveys have been partially funded by private financiers. Many topics studied by 3MC surveys, for example health and education, may be interesting to private sponsors. We will explore some issues related to future funding situations.
One other way to reduce cost might be to explore other data sources as complements to the survey itself. Depending on the survey topic, it might be possible to use administrative data and big data for that purpose. We will explore these possibilities and outline a roadmap for a desirable development of the use of multiple data sources in the 3MC field. New, and not so new, methodological developments such as Bayesian inference and nonprobability sampling will also be briefly discussed.
Lars Lyberg, Ph.D., is former Head of the Research and Development Department at Statistics Sweden and retired Professor at the Department of Statistics, Stockholm University. Currently he is associated with Inizio, a research firm in Sweden. He is the founder of the Journal of Official Statistics (JOS) and served as its Chief Editor for 25 years. He is chief editor of Survey Measurement and Process Quality (Wiley, 1997) and co-editor of Total Survey Error in Practice (Wiley, forthcoming), Survey Methods in Multinational, Multiregional, and Multicultural Contexts (Wiley, 2010), Telephone Survey Methodology (Wiley, 1988) and Measurement Errors in Surveys (Wiley, 1991). He is co-author of Introduction to Survey Quality (Wiley, 2003). He chaired the Leadership Group on Quality of the European Statistical System and chaired the Organizing Committee of the first European Conference on Quality in Official Statistics, Q2001. He is former president of IASS and former chair of the ASA Survey Methods Section. He is a fellow of the American Statistical Association and the Royal Statistical Society and elected member of the International Statistical Institute. He received the 2012 Waksberg Award and the 2013 Helen Dinerman Award. Currently he is Chair of the ESS Methods Advisory Board and member of the Technical Advisory Boards of PISA for Development and PIAAC.