Urban Mobility Dataset
Overview
In this guide, you’ll learn how to use the Oxla database with Urban Mobility datasets through various sample queries. These datasets include time-series data from New York City Taxi and For-Hire Vehicle Data. It contains several tables, such as trips, taxi zones, and weather observations.
Data Source: https://github.com/toddwschneider/nyc-taxi-data .
Ensure you’ve created the Data Storage with the appropriate demo dataset, the Cluster is running, and you’re connected to it using the PostgreSQL client. For more details, refer to the Quickstart - Oxla SaaS page.
Datasets Structure
Let’s explore the tables and their structures to better understand and fully utilize the Urban Mobility dataset.
Sample Queries
Sample 1: Number of Trips by Payment Type
This example shows the number of trips for each payment type used in the dataset.
SELECT payment_type, count(*) AS count
FROM trips
GROUP BY payment_type
ORDER BY count DESC;
The query displays number of trips grouped by payment type.
payment_type | count
--------------+-----------
1 | 147533966
2 | 147159834
3 | 752296
4 | 242825
5 | 150926
| 14580
(6 rows)
Sample 2: Calculate Revenue per Pickup Point
This example calculates the top 10 pickup locations with the highest revenue.
SELECT t.pickup_location_id AS pickup_location_id,
tz."zone" AS location_name,
COUNT(*) AS trip_count,
SUM(t.total_amount) AS total_revenue
FROM trips t
JOIN taxi_zones tz ON t.pickup_location_id = tz.locationid
GROUP BY t.pickup_location_id, tz."zone"
ORDER BY total_revenue DESC
LIMIT 10;
The result displays the top 10 pickup locations, with JFK Airport having the highest revenue.
pickup_location_id | location_name | trip_count | total_revenue
--------------------+---------------------------+------------+--------------------
132 | JFK Airport | 4781775 | 240929978.91028678
138 | LaGuardia Airport | 5935988 | 219636305.66008496
161 | Midtown Center | 9978810 | 125652682.01979701
79 | East Village | 9407751 | 119170547.52983798
162 | Midtown East | 9297167 | 118483924.92982587
48 | Clinton East | 8872713 | 112228413.58982424
170 | Murray Hill | 9161159 | 111683576.97983585
237 | Upper East Side South | 10378670 | 111360934.71979024
234 | Union Sq | 9156879 | 109465941.75984423
230 | Times Sq/Theatre District | 9522197 | 109449951.09985997
(10 rows)
Sample 3: Manhattan’s Most Popular Cabs
This example provides insights into Manhattan’s most popular cabs by showing key statistics for each cab type.
SELECT c.type AS cab_type,
MIN(t.trip_distance) AS shortest_distance,
MAX(t.trip_distance) AS farthest_distance,
AVG(t.trip_distance) AS average_distance,
AVG(t.fare_amount) AS average_fare,
SUM(t.fare_amount) AS total_fare
FROM trips t
JOIN cab_types c ON t.cab_type_id = c.id
GROUP BY c.type
ORDER BY c.type;
The result shows the shortest and farthest trip distances, average trip distance, and average fare for each cab type.
cab_type | shortest_distance | farthest_distance | average_distance | average_fare | total_fare
----------+-------------------+-------------------+-------------------+--------------------+-------------------
green | 0 | 640 | 2.846760430722446 | 12.214111199481835 | 154370169.6499999
yellow | -21474834 | 16200370 | 5.955657407792438 | 11.567361225195867 | 3276058919.61685
(2 rows)
Sample 4: How Weather Impacts the Transportation Demand Analysis
Here we analyze the average temperature and wind speed from January to April on the 1st and 2nd of each month.
SELECT
w."date" AS pickup_date,
(w.max_temperature + w.min_temperature) / 2 AS average_temperature,
w.average_wind_speed,
COUNT(t.id) AS number_of_trips
FROM
central_park_weather_observations w
LEFT JOIN
trips t ON w."date" = t.pickup_datetime::date
WHERE
w."date" IN (
'2009-01-01', '2009-01-02',
'2009-02-01', '2009-02-02',
'2009-03-01', '2009-03-02',
'2009-04-01', '2009-04-02'
)
GROUP BY
w."date", w.max_temperature, w.min_temperature, w.average_wind_speed
ORDER BY
w."date";
The output helps us understand how weather conditions from January to April impact the number of trips taken on specific dates. It shows that days with higher temperatures and stronger winds see more trips happening.
pickup_date | average_temperature | average_wind_speed | number_of_trips
-------------+---------------------+--------------------+-----------------
2009-01-01 | 20.5 | 11.18 | 65888
2009-01-02 | 28.5 | 6.26 | 75481
2009-02-01 | 39 | 7.61 | 86019
2009-02-02 | 43.5 | 5.14 | 80808
2009-03-01 | 32 | 9.84 | 86320
2009-03-02 | 22 | 10.96 | 65607
2009-04-01 | 46 | 6.71 | 77604
2009-04-02 | 54 | 5.59 | 99310
(8 rows)