Learn how to programmatically access platform data through APIs. Practice with real examples and understand authentication, rate limits, and data formats.
An API (Application Programming Interface) allows you to programmatically request data from platforms instead of manually downloading files. APIs are essential for:
Most platform APIs use REST (Representational State Transfer) architecture. You make HTTP requests to specific URLs (endpoints) to retrieve data.
```http
GET https://api.platform.com/v1/posts
Authorization: Bearer YOUR_TOKEN
```

Most APIs require authentication to track usage and enforce rate limits. Common methods include API keys, OAuth 2.0, and bearer tokens like the one in the request above.
Platforms limit how many API requests you can make per hour/day to prevent abuse. Always check rate limits and implement appropriate delays in your code.
Example: Twitter API Free tier allows 1,500 posts/month
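As a rough sketch of what "appropriate delays" can look like, the snippet below spaces requests evenly so a client stays under a quota. The `Throttle` class and its parameters are hypothetical names introduced for illustration, and the quota in the usage note is a made-up figure rather than any platform's actual limit.

```python
import time


class Throttle:
    """Space out calls so that at most `max_requests` happen per `period_seconds`.

    Hypothetical helper for illustration; real quotas come from each
    platform's documentation.
    """

    def __init__(self, max_requests, period_seconds,
                 clock=time.monotonic, sleep=time.sleep):
        if max_requests <= 0:
            raise ValueError("max_requests must be positive")
        # Minimum number of seconds that must separate two consecutive calls
        self.min_interval = period_seconds / max_requests
        self._clock = clock
        self._sleep = sleep
        self._last_call = None

    def wait(self):
        """Block until enough time has passed since the previous call."""
        now = self._clock()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_call = now
```

With an assumed quota of 300 requests per 15 minutes, `throttle = Throttle(300, 900)` followed by `throttle.wait()` before each request keeps calls at least three seconds apart. The `clock` and `sleep` arguments exist only so the timing can be replaced in tests.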
Platform APIs typically return data in JSON format, which is easy to parse in most programming languages.
```json
{
  "id": "123",
  "text": "Hello world",
  "created_at": "2024-10-21"
}
```

Practice making real API calls with these interactive demos. Both APIs require no authentication for public data access.
Mastodon provides one of the most accessible public APIs for social media data. Fetch real posts from public timelines without any authentication.
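The demo takes three inputs: a Mastodon instance (e.g., mastodon.social, fosstodon.org), a required hashtag written without the # symbol, and an optional second hashtag that filters for posts containing both hashtags.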
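The same kind of request can be sketched in Python using only the standard library. This is a minimal sketch based on the hashtag timeline endpoint (`/api/v1/timelines/tag/:hashtag`) and its `limit` and `all[]` parameters as described in Mastodon's API documentation. The function names and the User-Agent string are assumptions made for this example, and some instances may restrict unauthenticated access to public timelines.

```python
import json
import urllib.parse
import urllib.request


def build_tag_timeline_url(instance, hashtag, also_require=None, limit=20):
    """Build the URL for an instance's public hashtag timeline.

    `hashtag` and `also_require` may be given with or without a leading '#'.
    """
    tag = urllib.parse.quote(hashtag.lstrip("#"), safe="")
    params = [("limit", str(limit))]
    if also_require:
        # all[] asks for posts that also carry this second hashtag
        params.append(("all[]", also_require.lstrip("#")))
    query = urllib.parse.urlencode(params)
    return f"https://{instance}/api/v1/timelines/tag/{tag}?{query}"


def fetch_tag_timeline(instance, hashtag, also_require=None, limit=20):
    """Fetch public posts for a hashtag and return them as a list of dicts."""
    url = build_tag_timeline_url(instance, hashtag, also_require, limit)
    request = urllib.request.Request(
        url,
        headers={"User-Agent": "api-tutorial-example/0.1"},  # placeholder value
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

Calling `fetch_tag_timeline("mastodon.social", "opendata")` should return a list of status objects. Each one carries fields such as `id`, `created_at`, and `content`, where `content` is HTML rather than plain text.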
Wikipedia, designated a Very Large Online Platform (VLOP) under the Digital Services Act (DSA), offers a comprehensive API for accessing article content, search results, and page view statistics. No authentication is required.
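For readers who want to try this outside the browser, here is a minimal Python sketch that builds URLs for two public Wikimedia REST endpoints: the article summary endpoint and the per-article page views endpoint. The helper names and the User-Agent value are assumptions made for this example. The date range is passed as YYYYMMDD strings.

```python
import json
import urllib.parse
import urllib.request

# Wikimedia asks API clients to identify themselves; this value is a placeholder.
USER_AGENT = "api-tutorial-example/0.1 (contact: you@example.org)"


def summary_url(title, lang="en"):
    """URL of the REST summary endpoint for one article."""
    page = urllib.parse.quote(title.replace(" ", "_"), safe="")
    return f"https://{lang}.wikipedia.org/api/rest_v1/page/summary/{page}"


def pageviews_url(title, start, end, lang="en"):
    """URL for daily page views between two dates given as YYYYMMDD strings."""
    page = urllib.parse.quote(title.replace(" ", "_"), safe="")
    return (
        "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/"
        f"{lang}.wikipedia/all-access/all-agents/{page}/daily/{start}/{end}"
    )


def get_json(url):
    """Send a GET request and decode the JSON body."""
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

For example, `get_json(summary_url("Digital Services Act"))["extract"]` should give a short plain-text summary, and `get_json(pageviews_url("Digital Services Act", "20241001", "20241007"))["items"]` a list of daily entries, each with a `timestamp` and a `views` count.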
| Platform | API Access | Authentication | Documentation |
|---|---|---|---|
| Meta (Facebook/Instagram) | Graph API | OAuth 2.0 | View Docs |
| X (Twitter) | X API v2 | OAuth 2.0 / API Key | View Docs |
| TikTok | Research API | Application Required | View Docs |
| YouTube | YouTube Data API | API Key / OAuth 2.0 | View Docs |
| Mastodon | Mastodon API | OAuth 2.0 (optional for public data) | View Docs |
| LinkedIn | LinkedIn API | OAuth 2.0 | View Docs |
| Pinterest | Pinterest API | OAuth 2.0 | View Docs |
| Snapchat | Snap Kit | OAuth 2.0 | View Docs |
| Amazon Store | Product Advertising API | API Key | View Docs |
| Apple App Store | App Store Connect API | JWT Token | View Docs |
| Google Play | Google Play Developer API | OAuth 2.0 | View Docs |
| Google Maps | Maps Platform APIs | API Key | View Docs |
| Google Shopping | Content API for Shopping | OAuth 2.0 | View Docs |
| Alibaba (AliExpress) | AliExpress Open Platform | API Key | View Docs |
| Booking.com | Affiliate Partner API | API Key | View Docs |
| Zalando | Zalando API | API Key | View Docs |
| Google Search (VLOSE) | Custom Search JSON API | API Key | View Docs |
| Bing (VLOSE) | Bing Web Search API | API Key | View Docs |
| Wikipedia | MediaWiki API | No authentication required | View Docs |
Note: Platforms marked (VLOSE) are designated as Very Large Online Search Engines under the Digital Services Act. The others are designated Very Large Online Platforms (VLOPs), except Mastodon, which is not a designated service.
While many platforms offer APIs, they vary significantly in what data they make accessible. This comparison shows which data fields are publicly available through each platform's API, highlighting critical gaps in transparency and researcher access.
Important Context
This comparison shows data fields that are publicly accessible via APIs as of October 2024. Platforms collect far more data internally than they make available through their APIs. Access requirements vary by platform and data type.
| Data Field | Facebook/Meta | X (Twitter) | TikTok | Mastodon |
|---|---|---|---|---|
| User Data | ||||
| User ID | ||||
| Username | ||||
| Follower Count | ||||
| Verified Status | ||||
| Account Creation Date | ||||
| User Location/Country | ||||
| Content Data | ||||
| Post ID | ||||
| Post Text/Caption | ||||
| Post Timestamp | ||||
| Media Type (video/image/text) | ||||
| Hashtags | ||||
| Mentions | ||||
| Engagement Metrics | ||||
| Like/Reaction Count | ||||
| Share/Repost Count | ||||
| Comment Count | ||||
| View Count | ||||
| Time Spent on Content | ||||
| Click-Through Data | ||||
| Content Moderation | ||||
| Content Warning Labels | ||||
| Fact-Check Labels | ||||
| Removal/Takedown Status | ||||
| Appeal Status | ||||
| Content Visibility Status | ||||
| Algorithmic Data | ||||
| Recommendation Score/Ranking | ||||
| Virality Score | ||||
| Content Category/Classification | ||||
| Harmful Content Scores | ||||
Most platforms provide basic user profile data, post content, and public engagement metrics through their APIs. Mastodon, as an open-source platform, tends to be more transparent with its data.
Note: Even when data is "available," access may require special permissions, researcher credentials, or be limited by rate limits and authentication requirements.
Critical data for understanding platform risks is largely unavailable via public APIs:
The gaps in publicly available API data have significant consequences:
Under Article 40 of the Digital Services Act, VLOPs must provide vetted researchers with access to:
However, the quality and completeness of this access is still evolving as platforms develop their researcher access programs.
Using the popular requests library (Python) to fetch data:

```python
import requests

# Set up API credentials (placeholder token shown for illustration)
headers = {
    'Authorization': 'Bearer YOUR_API_TOKEN'
}

# Make API request
response = requests.get(
    'https://api.platform.com/v1/posts',
    headers=headers,
    params={'limit': 100},
    timeout=10
)
response.raise_for_status()  # Stop early on HTTP errors (4xx/5xx)

# Parse JSON response
data = response.json()

# Process results
for post in data['posts']:
    print(f"Post ID: {post['id']}")
    print(f"Content: {post['text']}")
    print(f"Likes: {post['likes_count']}\n")
```

Using the fetch API (JavaScript):
```javascript
// Set up API request
const response = await fetch(
  'https://api.platform.com/v1/posts',
  {
    headers: {
      'Authorization': 'Bearer YOUR_API_TOKEN'
    }
  }
);

// Stop early on HTTP errors (fetch does not reject on 4xx/5xx)
if (!response.ok) {
  throw new Error(`Request failed with status ${response.status}`);
}

// Parse JSON response
const data = await response.json();

// Process results
data.posts.forEach(post => {
  console.log(`Post ID: ${post.id}`);
  console.log(`Content: ${post.text}`);
  console.log(`Likes: ${post.likes_count}\n`);
});
```

Using the httr package (R):
```r
library(httr)
library(jsonlite)

# Set up API request
response <- GET(
  "https://api.platform.com/v1/posts",
  add_headers(Authorization = "Bearer YOUR_API_TOKEN"),
  query = list(limit = 100)
)
stop_for_status(response)  # Stop early on HTTP errors

# Parse JSON response
data <- fromJSON(content(response, "text", encoding = "UTF-8"))

# Process results
for (i in seq_len(nrow(data$posts))) {
  cat("Post ID:", data$posts$id[i], "\n")
  cat("Content:", data$posts$text[i], "\n")
  cat("Likes:", data$posts$likes_count[i], "\n\n")
}
```

Implement delays between requests and monitor your usage. Use exponential backoff when rate limited.
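One possible shape for exponential backoff is sketched below. The `RateLimited` exception and the `call_with_backoff` function are hypothetical names for this example. The sketch assumes the caller wraps the actual request in a function that raises `RateLimited` when the API answers with HTTP 429.

```python
import random
import time


class RateLimited(Exception):
    """Raised by the caller-supplied function when the API answers HTTP 429."""


def call_with_backoff(make_request, max_attempts=5, base_delay=1.0,
                      max_delay=60.0, sleep=time.sleep):
    """Call make_request(), retrying with exponentially growing delays.

    The delay before retry number `attempt` is base_delay * 2**attempt,
    capped at max_delay, plus up to 10% random jitter so that parallel
    clients do not all retry at the same moment.
    """
    for attempt in range(max_attempts):
        try:
            return make_request()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: let the caller see the failure
            delay = min(base_delay * 2 ** attempt, max_delay)
            sleep(delay + random.uniform(0, delay * 0.1))
```

With the defaults, the waits grow roughly as 1, 2, 4, and 8 seconds before the fifth and final attempt. If the API returns a `Retry-After` header, honoring that value is usually preferable to a computed delay.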
Always check response status codes and implement proper error handling. APIs can fail for many reasons.
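As an illustration, the sketch below funnels the most common failure modes (an error status code, an unreachable server, a body that is not JSON) into a single exception type. It uses the standard library's `urllib` rather than `requests` so that it runs without extra dependencies. `ApiError` and `fetch_json` are hypothetical names for this example.

```python
import json
import urllib.error
import urllib.request


class ApiError(Exception):
    """Single error type so callers can handle every failure in one place."""


def fetch_json(url, opener=urllib.request.urlopen):
    """Fetch a URL and return parsed JSON, or raise ApiError with a readable message."""
    try:
        with opener(url, timeout=10) as response:
            body = response.read()
    except urllib.error.HTTPError as err:
        # The server answered, but with a 4xx/5xx status code
        raise ApiError(f"HTTP {err.code} from {url}") from err
    except (urllib.error.URLError, TimeoutError) as err:
        # DNS failure, refused connection, timeout, and similar network problems
        raise ApiError(f"Could not reach {url}: {err}") from err
    try:
        return json.loads(body)
    except json.JSONDecodeError as err:
        raise ApiError(f"Response from {url} was not valid JSON") from err
```

The `opener` parameter defaults to `urllib.request.urlopen` and is there so the function can be exercised with a stand-in during testing. `HTTPError` is caught before `URLError` because it is a subclass of it.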
Never hardcode API keys in your code. Use environment variables or secure credential management systems.
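A minimal sketch of the environment-variable approach follows. The variable name `PLATFORM_API_TOKEN` and the `auth_headers` helper are assumptions made for this example, not names required by any platform.

```python
import os


def auth_headers(env_var="PLATFORM_API_TOKEN"):
    """Build an Authorization header from a token stored in the environment."""
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(
            f"Set the {env_var} environment variable before running this script"
        )
    return {"Authorization": f"Bearer {token}"}
```

The token is then set outside the code, for example with `export PLATFORM_API_TOKEN=...` in the shell, and the returned dictionary can be passed as the `headers` argument of a request.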
Save API responses locally to avoid unnecessary requests and reduce costs.
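One simple way to do this is a file-based cache keyed by the request URL and parameters, sketched below. The `cached_fetch` function and the `api_cache` directory name are hypothetical. The sketch never expires entries, so it suits data that does not change between runs.

```python
import hashlib
import json
from pathlib import Path


def cached_fetch(url, params, fetch, cache_dir="api_cache"):
    """Return the response for (url, params), calling fetch() only on a cache miss.

    `fetch` is any function taking (url, params) and returning
    JSON-serializable data.
    """
    # Derive a stable file name from the request so identical requests match
    key_source = json.dumps({"url": url, "params": params}, sort_keys=True)
    key = hashlib.sha256(key_source.encode("utf-8")).hexdigest()
    path = Path(cache_dir) / f"{key}.json"

    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))

    data = fetch(url, params)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(data), encoding="utf-8")
    return data
```

Passing the request function in as `fetch` keeps the cache independent of any particular HTTP library.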
Keep records of API versions, endpoints used, and any data transformations for reproducibility.
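Such records can be as simple as one line of JSON per request. The sketch below shows one possible layout. The field names, the function names, and the `api_requests.jsonl` file name are all assumptions made for this example.

```python
import json
from datetime import datetime, timezone


def request_record(endpoint, api_version, params, transformations):
    """Describe one API call so the data collection can be reproduced later."""
    return {
        "retrieved_at": datetime.now(timezone.utc).isoformat(),
        "endpoint": endpoint,
        "api_version": api_version,
        "params": params,
        "transformations": transformations,
    }


def append_record(record, log_path="api_requests.jsonl"):
    """Append the record as one line of JSON (JSON Lines format)."""
    with open(log_path, "a", encoding="utf-8") as log_file:
        log_file.write(json.dumps(record, sort_keys=True) + "\n")
```

A call such as `append_record(request_record("https://api.platform.com/v1/posts", "v1", {"limit": 100}, ["dropped posts with empty text"]))` leaves a timestamped trail that can be stored alongside the dataset.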