Overview
In this guide, you’ll learn how to use the Oxla database with Social Media datasets through various sample queries. In this case, we’ll utilize the GitHub datasets, from which you can retrieve events in all GitHub repositories since 2011 in a structured format.
Data Source: https://github.com/igrigorik/gharchive.org .
Prerequisite Ensure you’ve created the Data Storage with the appropriate demo dataset, the Cluster is running, and you’re connected to it using the PostgreSQL client. For more details, refer to the
Quickstart - Oxla SaaS page.
Datasets Structure
Let’s explore the tables and their structures to better understand and fully utilize the Web Traffic dataset.
Sample Queries
Sample 1: Identify the Most Active Users
This example identifies the 10 most active users on GitHub by counting their triggered events.
SELECT u.user_id, u.login, COUNT(e.event_id) AS event_count
FROM github_users u
JOIN github_events e ON u.user_id = e.user_id
GROUP BY u.user_id, u.login
ORDER BY event_count DESC
LIMIT 10;
It joins the github_users
table with the github_events
table to show ten active users.
user_id | login | event_count
8517910 | LombiqBot | 16605
14175800 | OpenLocalizationTest | 11963
10810283 | direwolf-github | 4774
3226435 | digimatic | 2596
57483 | tmtmtmtm | 2338
53164 | avsm | 1747
2354108 | coveralls | 1648
13653959 | k8s-merge-robot | 1458
12490588 | suzutan0s2 | 1431
406518 | davecramer | 1417
(10 rows)
Sample 2: Find the Most and Least Used Events
We want to identify the most used event and the least used event in the github_events
table.
(
SELECT event_type, COUNT(event_id) AS event_count, 'most used' AS usage_type
FROM github_events
GROUP BY event_type
LIMIT 1
)
UNION ALL
(
SELECT event_type, COUNT(event_id) AS event_count, 'least used' AS usage_type
FROM github_events
GROUP BY event_type
LIMIT 1
);
This output indicates that the PushEvent
is the most used event and PublicEvent
is the least used event.
event_type | event_count | usage_type
PushEvent | 587617 | most used
PublicEvent | 1140 | least used
(2 rows)
Sample 3: Analyzing Event Types Distribution per User
In this example, we calculates the distribution of different event types each user has participated in. It’s useful for understanding what types of activities users are most involved in, which can inform decision-making.
SELECT
gu.user_id,
gu.login,
ge.event_type,
COUNT(ge.event_id) AS event_count
FROM
github_users gu
JOIN
github_events ge ON gu.user_id = ge.user_id
GROUP BY
gu.user_id,
gu.login,
ge.event_type
ORDER BY
gu.user_id,
event_count DESC LIMIT 20;
The query returns a list of users along with the types of events they used and the count of each event type.
user_id | login | event_type | event_count
21 | technoweenie | IssueCommentEvent | 6
21 | technoweenie | IssuesEvent | 5
21 | technoweenie | ReleaseEvent | 1
21 | technoweenie | CreateEvent | 1
22 | macournoyer | IssueCommentEvent | 1
22 | macournoyer | PullRequestEvent | 1
22 | macournoyer | PushEvent | 1
38 | atmos | PushEvent | 1
45 | mojodna | WatchEvent | 2
69 | rsanheim | IssuesEvent | 2
78 | indirect | IssueCommentEvent | 3
78 | indirect | PushEvent | 1
78 | indirect | PullRequestEvent | 1
78 | indirect | DeleteEvent | 1
81 | engineyard | ForkEvent | 1
81 | engineyard | MemberEvent | 1
82 | jsierles | IssueCommentEvent | 3
82 | jsierles | PushEvent | 3
82 | jsierles | PullRequestEvent | 3
85 | brixen | IssueCommentEvent | 2
(20 rows)