Show Me The Data

Created by Matt Motyl

© 2025 Matt Motyl. All rights reserved.

API Guide for Platform Data

Learn how to programmatically access platform data through APIs. Practice with real examples and understand authentication, rate limits, and data formats.

What are Platform APIs?

An API (Application Programming Interface) allows you to programmatically request data from platforms instead of manually downloading files. APIs are essential for:

  • Collecting real-time data for research and monitoring
  • Automating data collection and analysis workflows
  • Accessing platform features and data not available through manual downloads
  • Building custom tools and applications that work with platform data

Understanding API Basics

REST APIs

Most platform APIs use REST (Representational State Transfer) architecture. You make HTTP requests to specific URLs (endpoints) to retrieve data.

GET https://api.platform.com/v1/posts
Authorization: Bearer YOUR_TOKEN

Authentication

Most APIs require authentication to track usage and enforce rate limits. Common methods:

  • API Keys
  • OAuth 2.0 tokens
  • Bearer tokens
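
As a rough sketch of how these methods tend to appear in a request: an OAuth 2.0 access token is usually sent as a bearer token in the Authorization header, while an API key is often appended as a query parameter or sent in a custom header. The function names and the default `key` parameter name below are hypothetical; each platform documents its own conventions.

```python
from urllib.parse import urlencode


def bearer_headers(token: str) -> dict:
    """Headers for bearer-token auth (the usual way to send an OAuth 2.0 access token)."""
    return {"Authorization": f"Bearer {token}"}


def url_with_api_key(base_url: str, api_key: str, param_name: str = "key") -> str:
    """Append an API key as a query parameter; the parameter name varies by platform."""
    separator = "&" if "?" in base_url else "?"
    return f"{base_url}{separator}{urlencode({param_name: api_key})}"
```

Keys placed in URLs can end up in logs and browser history, so a header is generally the safer choice when the platform supports both.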

Rate Limits

Platforms limit how many API requests you can make per hour/day to prevent abuse. Always check rate limits and implement appropriate delays in your code.

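Example: the X (Twitter) API Free tier has allowed 1,500 posts per month. Quotas change often, so confirm the current figure in the platform's documentation.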
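
One simple way to add "appropriate delays" is to turn a quota into a minimum spacing between requests. The sketch below assumes a hypothetical quota expressed as a number of requests per time window; the `Throttle` class and its parameters are illustrative, not part of any platform SDK.

```python
import time


def min_interval_seconds(max_requests: int, window_seconds: float) -> float:
    """Smallest even spacing between requests that stays within the quota."""
    if max_requests <= 0:
        raise ValueError("max_requests must be positive")
    return window_seconds / max_requests


class Throttle:
    """Blocks just long enough to keep calls at least `interval` seconds apart."""

    def __init__(self, interval, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous permitted call

    def wait(self) -> float:
        """Sleep if needed; return the number of seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.interval - (now - self._last))
            if delay:
                self._sleep(delay)
        self._last = now + delay
        return delay
```

For a quota of 900 requests per 15 minutes, `Throttle(min_interval_seconds(900, 900))` would space calls one second apart; call `wait()` immediately before each request.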

Data Formats

Platform APIs typically return data in JSON format, which is easy to parse in most programming languages.

{
  "id": "123",
  "text": "Hello world",
  "created_at": "2024-10-21"
}
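
A minimal sketch of parsing that response in Python with the standard library. It assumes `created_at` is an ISO 8601 date string, as in the sample above; real platforms often return full timestamps instead.

```python
import json
from datetime import date

raw = '{"id": "123", "text": "Hello world", "created_at": "2024-10-21"}'

post = json.loads(raw)                            # JSON object -> Python dict
created = date.fromisoformat(post["created_at"])  # ISO 8601 string -> date

print(post["id"], post["text"], created.year)     # prints: 123 Hello world 2024
```

Note that `id` arrives as a string here. Many platforms send IDs as strings because they can exceed the integer range that JavaScript handles safely, so avoid converting them to numbers.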

Try APIs Yourself

Practice making real API calls with these interactive demos. Neither API requires authentication for public data.

Mastodon API Demo

Mastodon provides one of the most accessible public APIs for social media data. Fetch real posts from public timelines without any authentication.

This demo uses Mastodon's public API. No authentication required for public timelines.

Read Mastodon API Documentation →
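
Outside the demo, the same kind of request can be sketched in Python. This assumes Mastodon's public hashtag timeline endpoint (`/api/v1/timelines/tag/{hashtag}`) and its `all[]` parameter for requiring additional hashtags; the function names and the User-Agent string are placeholders, and the standard library is used only to keep the sketch dependency-free.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen


def hashtag_timeline_url(instance, hashtag, also_tagged=(), limit=20):
    """Build the URL for an instance's public hashtag timeline."""
    params = [("limit", str(limit))]
    # all[] restricts results to posts that also carry every listed tag
    params += [("all[]", tag) for tag in also_tagged]
    return (f"https://{instance}/api/v1/timelines/tag/{quote(hashtag, safe='')}"
            f"?{urlencode(params)}")


def fetch_hashtag_posts(instance, hashtag, also_tagged=(), limit=20, timeout=30):
    """Fetch and decode one page of public posts (performs a network call)."""
    url = hashtag_timeline_url(instance, hashtag, also_tagged, limit)
    request = Request(url, headers={"User-Agent": "api-guide-example/0.1"})
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_hashtag_posts("mastodon.social", "dsa", also_tagged=["research"])` should return a list of status objects with fields such as `id`, `created_at`, and `content` (HTML). Individual instances can restrict unauthenticated access, so an authorization error is possible on some servers.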

Wikipedia API Demo

Wikipedia (a VLOP under the DSA) offers a comprehensive API for accessing article content, search results, and page view statistics. No authentication required.

Wikipedia's API requires no authentication. Search for articles, view summaries, and explore page view statistics.

API Endpoints Used:
  • Search: /w/api.php?action=query&list=search
  • Summary: /api/rest_v1/page/summary/{title}
  • Page views: wikimedia.org/api/rest_v1/metrics/pageviews
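
The three endpoints can be assembled into full URLs roughly as follows. This sketch assumes English Wikipedia as the host and the per-article, daily form of the pageviews endpoint; the helper names are hypothetical.

```python
from urllib.parse import quote, urlencode

WIKI = "https://en.wikipedia.org"  # assumption: English Wikipedia


def search_url(term, limit=5):
    """Full-text search through the MediaWiki Action API."""
    query = {"action": "query", "list": "search", "srsearch": term,
             "srlimit": limit, "format": "json"}
    return f"{WIKI}/w/api.php?{urlencode(query)}"


def summary_url(title):
    """Short summary of one article through the REST API."""
    return f"{WIKI}/api/rest_v1/page/summary/{quote(title.replace(' ', '_'), safe='')}"


def pageviews_url(title, start, end, project="en.wikipedia"):
    """Daily per-article views; start and end are YYYYMMDD strings."""
    article = quote(title.replace(" ", "_"), safe="")
    return ("https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/"
            f"{project}/all-access/all-agents/{article}/daily/{start}/{end}")
```

Each URL can be passed to any HTTP client. Wikimedia asks automated clients to send a descriptive User-Agent header that identifies the tool and a way to contact its operator.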

Platform-Specific API Access

Platform | API Access | Authentication
Meta (Facebook/Instagram) | Graph API | OAuth 2.0
X (Twitter) | X API v2 | OAuth 2.0 / API Key
TikTok | Research API | Application Required
YouTube | YouTube Data API | API Key / OAuth 2.0
Mastodon | Mastodon API | OAuth 2.0 (optional for public data)
LinkedIn | LinkedIn API | OAuth 2.0
Pinterest | Pinterest API | OAuth 2.0
Snapchat | Snap Kit | OAuth 2.0
Amazon Store | Product Advertising API | API Key
Apple App Store | App Store Connect API | JWT Token
Google Play | Google Play Developer API | OAuth 2.0
Google Maps | Maps Platform APIs | API Key
Google Shopping | Content API for Shopping | OAuth 2.0
Alibaba (AliExpress) | AliExpress Open Platform | API Key
Booking.com | Affiliate Partner API | API Key
Zalando | Zalando API | API Key
Google Search (VLOSE) | Custom Search JSON API | API Key
Bing (VLOSE) | Bing Web Search API | API Key
Wikipedia | MediaWiki API | No authentication required

Note: Platforms marked (VLOSE) are designated as Very Large Online Search Engines under the Digital Services Act. The others are designated as Very Large Online Platforms (VLOPs), except Mastodon, which has no DSA designation.

What Data are Actually Available?

While many platforms offer APIs, they vary significantly in what data they make accessible. This comparison shows which data fields are publicly available through each platform's API, highlighting critical gaps in transparency and researcher access.

Important Context

This comparison shows data fields that are publicly accessible via APIs as of October 2024. Platforms collect far more data internally than they make available through their APIs. Access requirements vary by platform and data type.

Data fields compared across Facebook/Meta, X (Twitter), TikTok, and Mastodon:

User Data
  • User ID
  • Username
  • Follower Count
  • Verified Status
  • Account Creation Date
  • User Location/Country

Content Data
  • Post ID
  • Post Text/Caption
  • Post Timestamp
  • Media Type (video/image/text)
  • Hashtags
  • Mentions

Engagement Metrics
  • Like/Reaction Count
  • Share/Repost Count
  • Comment Count
  • View Count
  • Time Spent on Content
  • Click-Through Data

Content Moderation
  • Content Warning Labels
  • Fact-Check Labels
  • Removal/Takedown Status
  • Appeal Status
  • Content Visibility Status

Algorithmic Data
  • Recommendation Score/Ranking
  • Virality Score
  • Content Category/Classification
  • Harmful Content Scores

What's Available

Most platforms provide basic user profile data, post content, and public engagement metrics through their APIs. Mastodon, as an open-source platform, tends to be more transparent with its data.

Note: Even when data is "available," access may require special permissions, researcher credentials, or be limited by rate limits and authentication requirements.

What's Missing

Critical data for understanding platform risks is largely unavailable via public APIs:

  • Algorithmic recommendation scores and rankings
  • Content moderation decisions and reasoning
  • User behavior metrics (time spent, scrolling patterns)
  • Harmful content classification scores
  • Detailed demographic inferences

Implications for Researchers

The gaps in publicly available API data have significant consequences:

  1. Limited Transparency: Researchers cannot verify platform transparency reports because the underlying classification data is unavailable
  2. Duplicated Work: Without access to platform content classification systems, researchers must build their own labeling systems from scratch
  3. Incomplete Analysis: Studies of systemic risks under the DSA are hampered by lack of algorithmic and behavioral data
  4. Article 40 Gap: Even Article 40 of the DSA, which grants researcher access, may not bridge these gaps if platforms don't provide the critical variables used in their own internal systems

DSA Article 40 Requirements

Under Article 40 of the Digital Services Act, VLOPs and VLOSEs must provide vetted researchers with access to:

  • Publicly accessible data via APIs
  • Additional data necessary for assessing systemic risks
  • Data on content moderation decisions and actions
  • Information on recommender systems

However, the quality and completeness of this access is still evolving as platforms develop their researcher access programs.

Getting Started: Code Examples

Python Example

Using the popular requests library to fetch data:

import requests

# Set up API credentials
headers = {
    'Authorization': 'Bearer YOUR_API_TOKEN'
}

# Make API request; the timeout keeps a stalled connection from hanging the script
response = requests.get(
    'https://api.platform.com/v1/posts',
    headers=headers,
    params={'limit': 100},
    timeout=30
)

# Stop early on 4xx/5xx responses instead of parsing an error body
response.raise_for_status()

# Parse JSON response
data = response.json()

# Process results
for post in data['posts']:
    print(f"Post ID: {post['id']}")
    print(f"Content: {post['text']}")
    print(f"Likes: {post['likes_count']}\n")

JavaScript/Node.js Example

Using the fetch API:

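// Set up API request (top-level await requires an ES module or an async function)
const response = await fetch(
    'https://api.platform.com/v1/posts',
    {
        headers: {
            'Authorization': 'Bearer YOUR_API_TOKEN'
        }
    }
);

// fetch does not reject on HTTP errors, so check the status explicitly
if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
}

// Parse JSON response
const data = await response.json();

// Process results
data.posts.forEach(post => {
    console.log(`Post ID: ${post.id}`);
    console.log(`Content: ${post.text}`);
    console.log(`Likes: ${post.likes_count}\n`);
});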

R Example

Using the httr package:

library(httr)
library(jsonlite)

# Set up API request
response <- GET(
    "https://api.platform.com/v1/posts",
    add_headers(Authorization = "Bearer YOUR_API_TOKEN"),
    query = list(limit = 100)
)

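# Stop with an informative error on 4xx/5xx responses
stop_for_status(response)

# Parse JSON response
data <- fromJSON(content(response, "text", encoding = "UTF-8"))

# Process results (seq_len avoids iterating over 1:0 when no posts are returned)
for (i in seq_len(nrow(data$posts))) {
    cat("Post ID:", data$posts$id[i], "\n")
    cat("Content:", data$posts$text[i], "\n")
    cat("Likes:", data$posts$likes_count[i], "\n\n")
}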

API Best Practices

1. Respect Rate Limits

Implement delays between requests and monitor your usage. Use exponential backoff when rate limited.
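
Exponential backoff can be sketched as below. The `RateLimited` exception is a hypothetical signal that your own request function would raise when the API answers with HTTP 429 (Too Many Requests); the delay values are illustrative defaults, not recommendations from any platform.

```python
import random
import time


class RateLimited(Exception):
    """Raised by the caller's request function when the API answers 429."""


def call_with_backoff(request_fn, max_attempts=5, base_delay=1.0,
                      max_delay=60.0, sleep=time.sleep, jitter=random.random):
    """Call request_fn, doubling the wait after each RateLimited error."""
    for attempt in range(max_attempts):
        try:
            return request_fn()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of retries: surface the error to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + jitter())  # jitter spreads out simultaneous clients
```

With the defaults, the waits grow as roughly 1, 2, 4, and 8 seconds before the final attempt. If a platform sends a Retry-After header, honoring that value is usually preferable to a computed delay.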

2. Handle Errors Gracefully

Always check response status codes and implement proper error handling. APIs can fail for many reasons.
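
A minimal sketch of that advice using only the standard library. The decision labels ("ok", "retry_later", and so on) are hypothetical names chosen for this example; adapt them to whatever your pipeline needs.

```python
import json
from urllib.error import HTTPError
from urllib.request import urlopen


def classify_status(status: int) -> str:
    """Map an HTTP status code to a coarse handling decision."""
    if 200 <= status < 300:
        return "ok"
    if status in (401, 403):
        return "check_credentials"
    if status == 404:
        return "not_found"
    if status == 429 or 500 <= status < 600:
        return "retry_later"
    if 400 <= status < 500:
        return "client_error"
    return "unexpected"


def fetch_json(url, timeout=30, opener=urlopen):
    """Return (decision, payload); payload is None unless the decision is 'ok'."""
    try:
        with opener(url, timeout=timeout) as response:
            return "ok", json.load(response)
    except HTTPError as err:       # the server answered with a 4xx/5xx status
        return classify_status(err.code), None
    except OSError:                # network-level failure (DNS, refused connection, timeout)
        return "retry_later", None
    except json.JSONDecodeError:   # the body was not valid JSON
        return "bad_payload", None
```

Returning a decision instead of raising lets a collection loop choose between retrying, skipping the item, and stopping. The `opener` parameter exists so the function can be exercised without a network connection.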

3. Store Credentials Securely

Never hardcode API keys in your code. Use environment variables or secure credential management systems.
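
For example, a token can be read from an environment variable at startup. The variable name `PLATFORM_API_TOKEN` is a placeholder chosen for this sketch.

```python
import os


def load_token(var_name: str = "PLATFORM_API_TOKEN") -> str:
    """Read an API token from the environment, failing loudly if it is absent."""
    token = os.environ.get(var_name, "").strip()
    if not token:
        raise RuntimeError(f"Set the {var_name} environment variable before running")
    return token
```

The result can then be used to build the request header, for instance `{'Authorization': f'Bearer {load_token()}'}`, so the token never appears in source control.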

4. Cache Responses When Possible

Save API responses locally to avoid unnecessary requests and reduce costs.
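
One way to do this is a small file cache keyed by the request URL. In this sketch, `fetch` stands for any function of your own that takes a URL and returns JSON-serializable data; the cache directory name is arbitrary and entries never expire.

```python
import hashlib
import json
from pathlib import Path


def cached_fetch(url, fetch, cache_dir="api_cache"):
    """Return the JSON payload for url, calling fetch(url) only on a cache miss."""
    folder = Path(cache_dir)
    folder.mkdir(parents=True, exist_ok=True)
    # Hash the URL so any query string maps to a safe, fixed-length filename
    path = folder / (hashlib.sha256(url.encode("utf-8")).hexdigest() + ".json")
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    payload = fetch(url)
    path.write_text(json.dumps(payload), encoding="utf-8")
    return payload
```

If freshness matters, add an expiry check based on the file's modification time, and confirm that the platform's terms permit storing responses.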

5. Document Your Work

Keep records of API versions, endpoints used, and any data transformations for reproducibility.
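
A lightweight way to keep such records is to write a small provenance file next to each dataset. The field names below are a suggestion made for this sketch, not an established standard.

```python
import json
from datetime import datetime, timezone


def provenance_record(endpoint, api_version, params, transformations):
    """Describe one collection run so it can be repeated or audited later."""
    return {
        "endpoint": endpoint,
        "api_version": api_version,
        "params": params,
        "transformations": list(transformations),
        "retrieved_at": datetime.now(timezone.utc).isoformat(),
    }


def record_to_json(record) -> str:
    """Serialize a record with stable key order so diffs stay readable."""
    return json.dumps(record, indent=2, sort_keys=True)
```

Saving the output of `record_to_json` alongside the raw data makes it possible to tell later which API version and query produced a given file.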

Additional Resources

  • REST API Tutorial - Comprehensive guide to REST APIs
  • Postman API Testing - Test APIs before writing code
  • Platform-Specific Guides - Detailed guides for each VLOP/VLOSE