Trends in Survey and Data Science

Big data is everywhere in our lives, and the University of Michigan is on the leading edge when it comes to the future of data science and survey research.

Current Survey Methods and Big Data Dangers

Data is constantly created and collected during our everyday lives.

In fact, approximately 2.5 quintillion bytes of data is produced every day, according to cloud software company Domo. That’s 2.5 × 10^18 bytes, or 25 followed by 17 zeros.

“There are many more types of data available — and more sources of data — than there were even five years ago,” says Fred Conrad, director of the University of Michigan Program in Survey and Data Science and a research professor within the U-M Institute for Social Research. “There’s no question that new big data tools and sources have impacted the field. Not just how we incorporate them into the research process — where innovation is occurring as we speak — but also in the questions we can ask using these new data in conjunction with more traditional survey methods.”

The University of Michigan’s Institute for Social Research, which houses U-M’s master’s degree in survey and data science, is the world’s largest academic social science research organization. ISR employs more than 300 research faculty in over 20 academic disciplines. ISR has helped U-M emerge as a global leader in researching and applying survey methods and technologies, increasingly incorporating big data and big data tools into the research it conducts.

This important work is integrated into U-M’s master’s degree in survey and data science to offer one-of-a-kind opportunities for graduate students to shape the future of data science and computational social science.

We identify nine current trends in survey and data science and how they’ll influence the future of data science and survey methodology.

1. Capturing Data From Hard-to-Reach Populations

Using survey methods in research generates knowledge that is vital to the study of public opinion, political participation, and much more. Yet data is hard to collect from many individuals.

For instance, Sunghee Lee, a research associate professor in U-M’s master’s degree program in survey and data science, is testing data collection methods within hard-to-reach populations such as immigrants and persons who inject drugs. These population segments can be tough to reach, and even when contact succeeds, these individuals may not want to be identified as such or share their thoughts.

To overcome these obstacles and other big data dangers, Lee is using respondent-driven sampling (RDS), an option for reaching populations that have been challenging to capture with other survey methods. RDS is a chain-referral process in which researchers survey a small initial group and then ask those participants to recruit members of their social networks who qualify to participate. The method relies on multiple waves of recruitment and on statistical adjustment so that estimates can approximate those from a probability sample.

“With RDS, you can recruit people that you normally cannot identify or contact,” says Lee. “That’s a huge benefit.”
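The chain-referral process and the adjustment step can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Lee's actual procedure: the toy network, the coupon limit, and the function names are hypothetical, and the adjustment shown is the inverse-degree weighting commonly called the RDS-II (Volz-Heckathorn) estimator, which down-weights respondents who have many contacts and were therefore more likely to be recruited.

```python
import random
from collections import deque


def recruit_waves(network, seeds, coupons=2, max_waves=3, rng=None):
    """Simulate chain-referral recruitment on a toy social network.

    network: dict mapping each person to a list of their contacts.
    Returns a list of (person, wave) tuples in recruitment order.
    """
    rng = rng or random.Random(0)
    recruited = {seed: 0 for seed in seeds}  # person -> wave number
    queue = deque(seeds)
    while queue:
        person = queue.popleft()
        wave = recruited[person]
        if wave >= max_waves:
            continue
        # Each participant may refer up to `coupons` contacts not yet enrolled.
        eligible = [c for c in network[person] if c not in recruited]
        for contact in rng.sample(eligible, min(coupons, len(eligible))):
            recruited[contact] = wave + 1
            queue.append(contact)
    return list(recruited.items())


def rds_ii_estimate(values, degrees):
    """Inverse-degree weighted mean of `values` (RDS-II style adjustment)."""
    weights = [1.0 / d for d in degrees]
    return sum(w * y for w, y in zip(weights, values)) / sum(weights)


if __name__ == "__main__":
    # Hypothetical network: who knows whom in a hidden population.
    toy_network = {
        "a": ["b", "c"],
        "b": ["a", "c"],
        "c": ["a", "b", "d"],
        "d": ["c"],
    }
    sample = recruit_waves(toy_network, seeds=["a"])
    people = [person for person, _ in sample]
    degrees = [len(toy_network[p]) for p in people]
    # Hypothetical yes/no survey answer for each recruited person.
    answers = [1 if p in ("a", "c") else 0 for p in people]
    print(sample)
    print(rds_ii_estimate(answers, degrees))
```

Real RDS studies add many details this sketch omits, such as coupon tracking, eligibility screening, and diagnostics for whether the recruitment chains have run long enough.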

2. How Sensor Data Is Enhancing Survey Methods in Research

In March 2020, college students went on Spring Break amidst the COVID-19 pandemic and took the disease back to their hometowns and communities. How do we know that? Sensor data collected on their smartphones.

Smartphones and their apps, as well as wearable devices such as smartwatches, have proven to be valuable big data tools. The data from these devices is much more detailed and easily collected than traditional self-reports (answers to survey questions). Plus, collecting information with sensors involves less measurement effort than self-reports.

“The availability of sensor data has exploded as smartphone use has become nearly universal in many parts of the world,” says Conrad. “The power of sensor data is magnified when it is analyzed in conjunction with in-the-moment self-reports that provide context to help researchers interpret data like step counts, geolocation, and sleep.”
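As a rough illustration of analyzing sensor data alongside in-the-moment self-reports, the sketch below totals the step counts recorded in the hour before each self-report. The data layout, the one-hour window, and the function name are assumptions made for this example, not a description of any particular study.

```python
from datetime import datetime, timedelta


def steps_before_report(step_readings, reports, window_minutes=60):
    """For each self-report, total the steps recorded in the window before it.

    step_readings: list of (timestamp, step_count) tuples from a device.
    reports: list of (timestamp, answer) tuples from in-the-moment surveys.
    """
    window = timedelta(minutes=window_minutes)
    linked = []
    for report_time, answer in reports:
        total = sum(
            steps
            for reading_time, steps in step_readings
            if report_time - window <= reading_time <= report_time
        )
        linked.append(
            {"time": report_time, "answer": answer, "steps_in_window": total}
        )
    return linked


if __name__ == "__main__":
    # Hypothetical readings and self-reports for one participant.
    readings = [
        (datetime(2020, 3, 1, 9, 0), 100),
        (datetime(2020, 3, 1, 9, 30), 200),
        (datetime(2020, 3, 1, 11, 0), 50),
    ]
    self_reports = [
        (datetime(2020, 3, 1, 10, 0), "walking to class"),
        (datetime(2020, 3, 1, 11, 30), "resting at home"),
    ]
    for row in steps_before_report(readings, self_reports):
        print(row)
```

The self-reported activity gives context that the raw step totals lack, which is the kind of pairing the quote describes.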

Privacy when using sensor data is an active research topic among survey and data scientists. Survey methodologists have pioneered methods for protecting the privacy of survey respondents and mitigating some potential big data dangers, but not all of these safeguards — especially informed consent — translate directly to new data, such as sensor data. Many users may not be aware that their actions will become data — let alone data used for social research.

3. Social Media’s Impact Within Survey Methods and Research

Nearly 4 billion people around the world use social media. As that number continues to grow, there will be increased opportunities to gather data on what the public is thinking about a vast range of topics.

A major research question that U-M faculty members are investigating is whether analyses of social media posts can augment, or even replace, traditional survey data. A great advantage of using social media to conduct social research is the timeliness of posts, especially in response to public events. However, how to generalize posts from the users of a social media platform to a broader population remains one of many big data dangers.

4. Comparing Data From Multiple Countries

Each year, the Organisation for Economic Co-operation and Development (OECD) produces the Better Life Index, which looks at the well-being of countries all around the world.

The OECD uses cross-national surveys to create this index, using survey methods to collect data in multiple nations. These comparative surveys allow researchers to investigate how institutional and cultural contexts shape public opinion and political behavior.

When we look at the future of data science and survey methodology, many researchers are interested in how questionnaire design will continue to work across countries, languages, and cultures. Lee explains why survey responses might vary from country to country or even across population subgroups of one country.

“Health seems to be a straightforward concept; however, that is not the case,” says Lee. “When asked about their own health in a survey, white populations may focus on their physical health, and Hispanic populations may think about family dynamics or relationships in addition to physical health. Depending on how you ask questions, though, you can lead different groups of respondents to think similarly in their formation of a response to the health question.”

Faculty members Jim Lepkowski, Steve Heeringa, and Richard Valliant have long been involved in sampling and data collection activities in the United States and around the globe.

5. Completing Mobile Surveys Using Speech Recognition Software

One of the clearest trends in survey and data science has been the increase in completing surveys on mobile devices. In fact, approximately 30% of all SurveyMonkey surveys in the United States are completed on a smartphone or tablet.

A key reason for this and similar trends in survey and data science is the rich set of methods for filling out surveys on mobile devices, including voice input, leading researchers around the world to ask:

  • How often are individuals using voice input technology during their everyday lives?
  • Do response distributions differ between comparable groups of respondents answering by entering text on their mobile device versus speaking into their device?
  • How likely are individuals to use voice input technology to answer open-ended questions in web surveys and how does that affect what they say?
  • What factors determine an individual’s willingness to use voice input technology in surveys?

6. A Speedy, Affordable Way to Survey

Probability sampling, the gold standard in recruiting participants into survey research, may rely on conditions that are no longer routinely found. It has become increasingly hard to contact sample members and, once they are contacted, even harder to persuade them to participate in a study.

Researchers have turned to big data tools and non-probability sampling methods such as volunteer online survey panels and crowdsourcing, which are cheaper and faster than traditional survey methods.

Faculty members Michael Elliott and Valliant, along with alum Jack Chen, have used an estimation method for non-probability samples, Estimate Controlled LASSO, to effectively predict U.S. gubernatorial and Senate races. U-M faculty members Brady West and Roderick Little have developed model-based indices of non-ignorable selection bias for non-probability samples that have been successfully tested on survey data collected in a fertility study. Another faculty member, Yajuan Si, is working on model-assisted adjustment methods for non-probability sample studies that integrate multilevel regression and poststratification within a Bayesian framework.

Using non-probability sample sources is generally faster and more cost-effective than conventional probability sampling methods, but the quality of the resulting data is still under investigation.
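To give a sense of what adjusting a non-probability sample involves, here is a minimal sketch of plain poststratification: respondents are grouped into cells, and each cell's mean is weighted by that cell's known share of the population. This is a deliberately simplified stand-in, not the faculty methods named above. In multilevel regression and poststratification, the raw cell means would be replaced by predictions from a multilevel model. The cell labels, shares, and function name are hypothetical.

```python
from collections import defaultdict


def poststratified_mean(sample, population_shares):
    """Weight each cell's sample mean by its share of the population.

    sample: list of (cell, outcome) pairs from the survey.
    population_shares: dict mapping cell -> population share (summing to 1).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cell, outcome in sample:
        sums[cell] += outcome
        counts[cell] += 1
    # Plain poststratification cannot handle cells with no respondents.
    missing = [cell for cell in population_shares if counts[cell] == 0]
    if missing:
        raise ValueError(f"no respondents in cells: {missing}")
    return sum(
        share * sums[cell] / counts[cell]
        for cell, share in population_shares.items()
    )


if __name__ == "__main__":
    # Hypothetical volunteer panel that over-represents younger respondents.
    panel = [("young", 1), ("young", 1), ("young", 1), ("young", 0), ("old", 0)]
    shares = {"young": 0.5, "old": 0.5}
    raw_mean = sum(y for _, y in panel) / len(panel)
    print(raw_mean, poststratified_mean(panel, shares))
```

The adjustment only corrects for the variables used to define the cells, which is one reason the quality of data from non-probability sources remains an open question.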

7. Live Video’s Growth Among Survey Methods

In recent years, video communication has grown rapidly, whether for personal interactions, educational opportunities, or remote work.

Survey methodologists have begun to explore the use of live video interviews. When face-to-face questioning is important for a study’s goals, live video may be an option if in-person interviews are not safe, sample members are inaccessible, or sample members are not willing to be interviewed in their homes.

As live video interviews become more commonplace, several questions will need to be addressed. These include how the quality of data collected this way compares with data collected in other survey modes, and how sample members can best be recruited, given that interviewers can’t just knock on someone’s door or cold-call them.

Over the next few years, survey and data science professionals will explore questions such as:

  • What video platforms are best for survey methods and focus groups?
  • Should respondents be able to see the interviewer during a video survey?
  • Must video interviews necessarily be scheduled in advance or can interviews be conducted on demand?
  • How will interviewers be trained to collect data in this survey mode?
  • Who will participate in video interviews, and what kinds of nonresponse bias might be specific to video interviewing?

8. Linking Survey Responses to Administrative Data

Recent trends in survey and data science have included increased use of administrative databases, including retail purchase information and medical history, to improve and enhance surveys. Survey and data science researchers are using administrative databases more frequently because of increased availability of such data sources and substantial progress in the data-matching technology they are creating.
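At its simplest, data matching means finding the administrative record that belongs to each survey respondent. The sketch below shows one basic approach, exact matching on a normalized name and birth date. The field names and the matching rule are assumptions for illustration only. Production linkage typically relies on probabilistic matching and strict privacy safeguards.

```python
def normalize_key(name, birth_date):
    """Build a match key that ignores case and extra whitespace in names."""
    return (" ".join(name.lower().split()), birth_date)


def link_records(survey_rows, admin_rows):
    """Attach admin fields to survey rows that have exactly one match.

    Rows are dicts with hypothetical "name" and "birth_date" fields.
    Returns (linked_rows, unlinked_survey_rows).
    """
    admin_index = {}
    for row in admin_rows:
        key = normalize_key(row["name"], row["birth_date"])
        admin_index.setdefault(key, []).append(row)

    linked, unlinked = [], []
    for row in survey_rows:
        matches = admin_index.get(normalize_key(row["name"], row["birth_date"]), [])
        if len(matches) == 1:
            # Prefix admin fields so they cannot overwrite survey answers.
            extra = {
                f"admin_{field}": value
                for field, value in matches[0].items()
                if field not in ("name", "birth_date")
            }
            linked.append({**row, **extra})
        else:
            # Zero or ambiguous matches are set aside rather than guessed.
            unlinked.append(row)
    return linked, unlinked


if __name__ == "__main__":
    survey = [
        {"name": "Ana  Lopez", "birth_date": "1990-01-02", "self_rated_health": "good"},
        {"name": "Sam Roe", "birth_date": "1985-05-05", "self_rated_health": "fair"},
    ]
    admin = [{"name": "ana lopez", "birth_date": "1990-01-02", "visits": 3}]
    print(link_records(survey, admin))
```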

For instance, survey and data scientists may use electronic health records to:

  • Discover new causes of infectious diseases
  • Address outbreaks that trigger public health alerts
  • Enhance communication between public health practitioners and clinical organizations
  • Control rising health care costs by eliminating unnecessary tests, procedures, and prescriptions

9. Responsive and Adaptive Survey Design

Because of decreases in survey response rates, a set of methods has been developed in recent years to revise the sampling and recruitment for surveys in real time. This process, which happens while surveys are in the field, targets populations that are underrepresented in the sample, rather than relying on statistical adjustment after data collection is complete.

Responsive and adaptive survey design, as this approach is known by practitioners, is changing the future of data science and survey methodology. This process can, for example, involve closely examining operational data concerning the success of recruitment in certain neighborhoods (e.g., patterns of contact attempts) and redoubling efforts where subgroup representation is below census proportions.
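The monitoring step described here can be sketched as a comparison between the respondents collected so far and benchmark proportions. This is a simplified illustration under assumed inputs: the group labels, counts, benchmark shares, and function name are hypothetical. Real responsive designs also weigh costs, contact histories, and response propensities.

```python
def underrepresented_groups(respondent_counts, benchmark_shares, tolerance=0.0):
    """Return groups whose share of respondents falls below the benchmark.

    respondent_counts: dict mapping group -> completed interviews so far.
    benchmark_shares: dict mapping group -> target share (e.g., from a census).
    Groups are ordered from the largest shortfall to the smallest.
    """
    total = sum(respondent_counts.values())
    shortfalls = {}
    for group, target in benchmark_shares.items():
        share = respondent_counts.get(group, 0) / total if total else 0.0
        if share < target - tolerance:
            shortfalls[group] = target - share
    return sorted(shortfalls, key=shortfalls.get, reverse=True)


if __name__ == "__main__":
    # Hypothetical mid-field snapshot for three neighborhoods.
    completed = {"north": 70, "east": 25, "south": 5}
    census = {"north": 0.5, "east": 0.3, "south": 0.2}
    print(underrepresented_groups(completed, census))
```

A field team could rerun a check like this each day and direct additional contact attempts to the groups at the top of the list.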

Major survey organizations, including the U.S. Census Bureau, have incorporated responsive and adaptive survey design into their data collection practices. Several faculty members have been instrumental in developing responsive and adaptive survey design, including James Wagner, Heeringa, and West.

Conduct Research and Contribute to Current and Future Trends With U-M’s Master’s Degree in Survey and Data Science

Gain familiarity with these and other trends and master the necessary skills to stand out in a career in survey and data science. You can get there by choosing a master’s degree from the Michigan Program in Survey and Data Science. You will have the opportunity to:

  • Learn and explore in small graduate-level classes from faculty who are accessible and are thought leaders in survey methodology and data science.
  • Focus your degree path by pursuing concentrations in Social and Psychological Foundations, Statistical Science, and Data Science.
  • Take part in relevant, industry-leading research — whether evaluating existing survey methods, contributing to trends in survey and data science, or helping tackle big data dangers such as the lack of informed consent.
  • Complete hands-on activities such as graduate internships and research assistantships that will complement classroom material and prepare you for a career that drives decision making.

Get started on your master’s degree in survey and data science

Ready to reap the benefits of earning a master’s degree in survey and data science? Call us at 734-647-0038 or fill out a request for more information.

Apply today

Submit your graduate school application online today for the Master of Science in Survey and Data Science degree program from the University of Michigan.

Explore financial options

Explore your options for funding your graduate school training.