Web Traffic Dataset
Overview
In this guide, you’ll learn how to use the Oxla database with Web Traffic datasets through various sample queries. These datasets contain essential traffic information, including user activity, device, and browser information.
Data Source: https://github.com/ClickHouse/ClickBench .
Ensure you’ve created the Data Storage with the appropriate demo dataset, the Cluster is running, and you’re connected to it using the PostgreSQL client. For more details, refer to the Quickstart - Oxla SaaS page.
Datasets Structure
Let’s explore the tables and their structures to understand better and fully utilize the Web Traffic dataset.
Sample Queries
Sample 1: Top 10 Pages by Hits
This query selects the top 10 pages based on the number of hits they received.
SELECT title, COUNT(*) AS TotalHits
FROM hits
GROUP BY title
ORDER BY TotalHits DESC
LIMIT 10;
This result displays the top 10 pages along with the total number of hits each page received.
title | totalhits
--------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------
| 14910417
Djunglish.RU > Приобрете и б/у. Стихи | 1652089
Прог | 1158515
Яндекс.Видео дня. Лента новый - сравных. Автопоиск по лучшей цены - Форум клуба Chevrolett, номера Соборудованном участные товая ул., д. 17, 3.5 - Свободная | 961538
FLATY.RU - Лучшей цене, в пообедитино — Trashbox.ru#photograd.info: фут-фетиш, женщин - Яндекс.Афиша, 28 - Cherofloor=#ff99> <br>кто хорошее планшетные | 832991
Новостровск, ФОТО - Форум Амур.инфо, 8 из 15 ярко-розочках - сравнение | 809519
Шарарту п.... Москве | 730100
Поиск по продажа | 572331
Участи NEWSru | 486009
Шарарансляция коду | 420539
(10 rows)
Sample 2: Find Most Active Regions
This query extracts the most active regions ordered by the total number of hits.
SELECT regionid, COUNT(*) AS TotalHits
FROM hits
GROUP BY regionid
ORDER BY TotalHits DESC
LIMIT 10;
This helps to pinpoint regions where the highest volume of interactions or activities has occurred.
regionid | totalhits
----------+-----------
229 | 18295832
2 | 6687587
208 | 4261812
169 | 3320229
32 | 1843518
34 | 1792369
184 | 1755192
42 | 1542717
107 | 1516690
51 | 1435578
(10 rows)
Sample 3: Hourly Activity Analysis
This query uses the DATE_TRUNC
function to aggregate user activity by hour.
SELECT DATE_TRUNC('hour', EventTime) AS HourlyActivity,
COUNT(*) AS TotalHits
FROM hits
GROUP BY HourlyActivity
ORDER BY TotalHits DESC LIMIT 30;
The result displays the total hits aggregated by hour, which can help you determine the peak usage times.
hourlyactivity | totalhits
---------------------+-----------
2013-07-21 00:00:00 | 698815
2013-07-20 22:00:00 | 698653
2013-07-20 20:00:00 | 697103
2013-07-21 19:00:00 | 696980
2013-07-20 21:00:00 | 696781
2013-07-21 16:00:00 | 696039
2013-07-21 15:00:00 | 694468
2013-07-21 18:00:00 | 694037
2013-07-21 12:00:00 | 693485
2013-07-21 07:00:00 | 693146
2013-07-21 04:00:00 | 693142
2013-07-21 11:00:00 | 693106
2013-07-21 13:00:00 | 692814
2013-07-21 03:00:00 | 692689
2013-07-21 05:00:00 | 692160
2013-07-21 02:00:00 | 691713
2013-07-21 17:00:00 | 691100
2013-07-21 06:00:00 | 690457
2013-07-21 10:00:00 | 689752
2013-07-21 14:00:00 | 689493
2013-07-20 23:00:00 | 689349
2013-07-21 01:00:00 | 689090
2013-07-21 09:00:00 | 688810
2013-07-21 08:00:00 | 685090
2013-07-09 18:00:00 | 573479
2013-07-09 17:00:00 | 573035
2013-07-08 21:00:00 | 572060
2013-07-09 00:00:00 | 565040
2013-07-08 22:00:00 | 564743
2013-07-09 12:00:00 | 564698
(30 rows)