About

The Data Usage Explorer (DUE) is an innovative online prototype platform designed to highlight when, where, and how federal data assets—such as federal survey data, administrative data, and derived data products like statistical reports—are being used. As a part of the broader National Secure Data Service (NSDS) Demonstration project, the DUE seeks to bring clarity, transparency, and actionable insights to the federal data ecosystem. 

The DUE was developed for the National Center for Science and Engineering Statistics (NCSES), a principal statistical agency within the U.S. National Science Foundation (NSF), in collaboration with Mathematica and datHere. The DUE is still in its prototyping stage, and ongoing user feedback is important to inform future development under an NSDS before the platform is officially launched. To submit your experiences, feedback, or ideas for dashboard enhancements, please email us at ncsesweb@nsf.gov.

Why the DUE matters 

The DUE supports evidence-based decision making. The platform: 

  • Aggregates and displays federal data usage statistics in intuitive, interactive formats.
  • Enables cross-disciplinary collaboration and knowledge sharing among data users.
  • Promotes federal data usage and discovery.

The DUE documents real-world uses of federal data assets and surfaces trends in data consumption in various contexts, including academic research, data journalism, lawmaking, and state reporting. These insights support both internal agency planning and external stakeholder engagement, driving more informed and efficient investment in federal data programs and infrastructure.  

What the DUE offers 

  • Interactive dashboards show the number of references to data assets in publications by publication type, time, topic, and more.
  • User feedback integration empowers verified agency users to provide real-time validation of data asset references and ensure accurate usage statistics.
  • Scalable architecture supports long-term sustainability across the federal ecosystem.
  • Use of advanced technology, including large language models (LLMs), automatically identifies references to federal data assets in publications and calculates aggregate usage statistics. 

How the DUE works 

The DUE is powered by a semi-automated data ingestion pipeline that gathers and analyzes content from a wide array of sources.

Step 1. Ingestion Pipeline 

The platform pulls in documents from publicly available repositories and curated feeds. These include: 

  • Research articles from open-access scientific journals
  • News stories referencing data in policy or societal contexts
  • Federal legislation and state reports 

Step 2. Machine Learning Identification 

Once ingested, the content is processed using both rule-based approaches and machine learning models trained to detect and classify references to federal data assets. These models leverage techniques such as:

  • Named entity recognition to detect the names of data assets (and common dataset aliases) using machine learning.
  • Few-shot learning, which allows the models to understand and identify federal data asset references based on only a small number of examples.
  • Adversarial modeling, using LLMs to serve as critics of the reference tagging results to improve their accuracy. 
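
To make the rule-based side of this step concrete, here is a minimal Python sketch of alias matching. It is not the DUE's actual implementation: the alias table, the `Reference` record, and the `find_references` function are all hypothetical names chosen for illustration, and a real pipeline would combine a pass like this with the machine learning techniques listed above.

```python
import re
from dataclasses import dataclass

# Illustrative alias table (an assumption for this sketch, not the DUE's real list).
ALIASES = {
    "Survey of Earned Doctorates": ["Survey of Earned Doctorates", "SED"],
    "National Survey of College Graduates": [
        "National Survey of College Graduates",
        "NSCG",
    ],
}


@dataclass(frozen=True)
class Reference:
    asset: str  # canonical data asset name
    alias: str  # exact text that matched
    start: int  # character offset of the match in the document


def _compile(aliases):
    """Build one case-sensitive, word-bounded pattern per data asset."""
    patterns = {}
    for asset, names in aliases.items():
        # Longest alias first so a full name wins over a shorter alias.
        ordered = sorted(names, key=len, reverse=True)
        alternatives = "|".join(re.escape(name) for name in ordered)
        patterns[asset] = re.compile(r"\b(?:" + alternatives + r")\b")
    return patterns


PATTERNS = _compile(ALIASES)


def find_references(text):
    """Return every alias match in the text, ordered by position."""
    found = []
    for asset, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append(Reference(asset, match.group(0), match.start()))
    return sorted(found, key=lambda ref: ref.start)


if __name__ == "__main__":
    sample = "We use the Survey of Earned Doctorates (SED) and NSCG data."
    for ref in find_references(sample):
        print(ref)
```

Matching is case-sensitive on purpose, so that an acronym such as SED does not fire on the lowercase word "sed". Short acronyms can still be ambiguous, which is one reason a rule-based pass like this would typically be followed by model-based review.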

Step 3. Aggregation and Insights 

Identified references are stored and aggregated to provide actionable insights, including most frequently cited datasets, topic-specific usage patterns, and temporal trends in data utilization. 
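
As a rough illustration of the aggregation described above, the following Python sketch counts references by one or more fields. The record layout (`asset`, `pub_type`, `year`), the helper names, and the three sample records are made up for this example; they do not reflect the DUE's actual schema or real usage figures.

```python
from collections import Counter

# Made-up sample records for illustration only; field names are assumptions.
references = [
    {"asset": "Survey of Earned Doctorates", "pub_type": "research article", "year": 2022},
    {"asset": "Survey of Earned Doctorates", "pub_type": "news story", "year": 2023},
    {"asset": "National Survey of College Graduates", "pub_type": "research article", "year": 2023},
]


def count_by(records, *fields):
    """Count records grouped by the given fields; keys are tuples of field values."""
    return Counter(tuple(record[field] for field in fields) for record in records)


def top_assets(records, n=3):
    """Return the n most frequently referenced assets with their counts."""
    return Counter(record["asset"] for record in records).most_common(n)


if __name__ == "__main__":
    print(top_assets(references, n=1))             # most frequently cited asset
    print(count_by(references, "year"))            # temporal trend
    print(count_by(references, "asset", "pub_type"))  # usage by publication type
```

Grouping on a tuple of fields keeps the helper general: the same function can produce counts by year, by publication type, or by any combination of the two.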

These insights feed into the DUE’s dashboards, offering a transparent, data-driven view of how public data powers research, journalism, and policymaking. 

User-centered design 

The DUE project employs a user-centered design approach to incorporate the diverse viewpoints of key user types. We engaged with federal, state, and local agencies, researchers, and members of the public during the iterative development process to understand audience needs and use cases. This information was then used to design a modern and intuitive dashboard experience. The dashboard was further refined through multiple rounds of usability testing.

History and future vision 

The current iteration of the DUE is a prototype. As part of the NSDS Demonstration project, it lays the foundation for a transparent and evidence-driven public data infrastructure that increases the value and effectiveness of federal data assets. The prototype is available for stakeholder feedback, with the goal of delivering a more advanced, sustainable, and scalable solution soon.

The Data Usage Explorer envisions a future where federal data are better understood and generate actionable evidence.  

Do you have questions about using the DUE, or would you like to have your agency's data added to the DUE? Email us at ncsesweb@nsf.gov.