Very large online platforms and search engines (VLOPSEs) collect data in such volume and variety that even newly hired employees can struggle as they learn the internal data systems. The structure of the data, and of the data warehouse used to store it, varies from company to company. Some warehouses are chaotic, with tables that employees have produced organically over time; others are standardized, with employees required to follow a strict framework. In either case, it can be challenging to understand the potentially hundreds or thousands of tables that VLOPSEs maintain in their data warehouses.
Given the role of online platforms in our lives, it is not surprising that civil society and research institutions recognize the importance of these data, in particular how they can help us better understand the risks that online platforms can pose to people, societies, and democracies. The data can also show how effectively a company is managing and minimizing those risks. Researchers and civil society organizations need access to both historical and real-time datasets to study the scale, causes, and nature of platform risks. They also need to be able to monitor the information environment in real time, especially around critical societal events such as elections.
Article 40 of the Digital Services Act (DSA) allows for significant researcher access to VLOPSE data, creating the opportunity to answer research questions surrounding the systemic risks outlined in the DSA. This paper aims to equip vetted academic and civil society researchers with the understanding and tools necessary to best utilize their access to the data for the public good.
The DSA is a landmark piece of European legislation that imposes new obligations on digital platforms, particularly Very Large Online Platforms (VLOPs) and Very Large Online Search Engines (VLOSEs), defined as services with more than 45 million monthly active users in the EU.
Key provisions include:
Very Large Online Platforms (VLOPs): Online platforms with more than 45 million monthly active users in the EU. These include social media platforms, marketplaces, and app stores, all of which must comply with enhanced DSA obligations.
Very Large Online Search Engines (VLOSEs): Search engines with more than 45 million monthly active users in the EU. This category currently includes Google Search and Bing, which are subject to transparency requirements similar to those for VLOPs.
The table below shows all 25 platforms designated as VLOPs or VLOSEs.
| Platform | Type | EU Users (Est.) | API Available | Research Access |
|---|---|---|---|---|
| | VLOP | 255M+ | Graph API | Yes |
| | VLOP | 250M+ | Graph API | Limited |
| TikTok | VLOP | 150M+ | Research API | Application Required |
| YouTube | VLOP | 400M+ | Data API v3 | Yes |
| X (Twitter) | VLOP | 100M+ | X API v2 | Paid Tiers |
| LinkedIn | VLOP | 180M+ | LinkedIn API | Limited |
| Snapchat | VLOP | 100M+ | Marketing API | No |
| Pinterest | VLOP | 130M+ | Pinterest API | Limited |
| Wikipedia | VLOP | 350M+ | MediaWiki API | Yes (Open) |
| Amazon Store | VLOP | 200M+ | Product Advertising API | Limited |
| AliExpress | VLOP | 150M+ | Affiliate API | Limited |
| Google Play | VLOP | 300M+ | Developer API | No |
| Apple AppStore | VLOP | 200M+ | App Store Connect API | No |
| Booking.com | VLOP | 80M+ | Affiliate API | No |
| Google Maps | VLOP | 400M+ | Maps Platform API | Limited |
| Google Shopping | VLOP | 300M+ | Content API | Limited |
| Zalando | VLOP | 50M+ | Partner API | No |
| Shein | VLOP | 100M+ | Affiliate API | No |
| Temu | VLOP | 92M+ | Limited | No |
| Pornhub | VLOP | 130M+ | Limited | No |
| Stripchat | VLOP | 75M+ | No | No |
| XVideos | VLOP | 165M+ | No | No |
| XNXX | VLOP | 150M+ | No | No |
| Google Search | VLOSE | 400M+ | Custom Search API | Limited |
| Bing | VLOSE | 100M+ | Bing Search API | Limited |
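
For readers working with a plain-text copy of this table, the following is a minimal Python sketch of how the rows might be sorted and filtered programmatically. It uses only a subset of rows copied from the table above. The helper names (`parse_users`, `sort_by_users`, `with_research_access`) and the `DSA_THRESHOLD` constant are illustrative assumptions, not part of any platform API. The user figures are the table's own estimates, treated here as lower bounds.

```python
# Subset of rows copied from the table above:
# (platform, type, estimated EU users, research access)
ROWS = [
    ("TikTok", "VLOP", "150M+", "Application Required"),
    ("YouTube", "VLOP", "400M+", "Yes"),
    ("Wikipedia", "VLOP", "350M+", "Yes (Open)"),
    ("Snapchat", "VLOP", "100M+", "No"),
    ("Zalando", "VLOP", "50M+", "No"),
    ("Google Search", "VLOSE", "400M+", "Limited"),
    ("Bing", "VLOSE", "100M+", "Limited"),
]

# Designation threshold stated in the text: 45 million monthly active EU users.
DSA_THRESHOLD = 45_000_000


def parse_users(estimate: str) -> int:
    """Convert an estimate such as '150M+' into a lower-bound integer."""
    text = estimate.strip().rstrip("+")
    if not text.endswith("M"):
        raise ValueError(f"unexpected estimate format: {estimate!r}")
    return int(float(text[:-1]) * 1_000_000)


def sort_by_users(rows, descending=True):
    """Sort rows by estimated EU users; ties keep their original order."""
    return sorted(rows, key=lambda row: parse_users(row[2]), reverse=descending)


def with_research_access(rows):
    """Keep rows whose research-access column is anything other than 'No'."""
    return [row for row in rows if row[3] != "No"]


if __name__ == "__main__":
    for platform, kind, users, access in sort_by_users(with_research_access(ROWS)):
        print(f"{platform:<15} {kind:<6} {users:<6} {access}")
```

Treating every value other than "No" as some form of access is a deliberate simplification; a real analysis would likely distinguish "Limited", "Paid Tiers", and "Application Required".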
Detailed guides for each platform will include: