Overview

In this guide, you’ll learn how to use the Oxla database with Urban Mobility datasets through various sample queries. These datasets include time-series data from New York City Taxi and For-Hire Vehicle Data. It contains several tables, such as trips, taxi zones, and weather observations.

Data Source: https://github.com/toddwschneider/nyc-taxi-data .

Prerequisite
Ensure you’ve created the Data Storage with the appropriate demo dataset, the Cluster is running, and you’re connected to it using the PostgreSQL client. For more details, refer to the Quickstart - Oxla SaaS page.

Datasets Structure

Let’s explore the tables and their structures to better understand and fully utilize the Urban Mobility dataset.

Sample Queries

Sample 1: Number of Trips by Payment Type

This example shows the number of trips for each payment type used in the dataset.

SELECT payment_type, count(*) AS count
FROM trips
GROUP BY payment_type
ORDER BY count DESC;

The query displays number of trips grouped by payment type.

 payment_type |   count   
--------------+-----------
            1 | 147533966
            2 | 147159834
            3 |    752296
            4 |    242825
            5 |    150926
              |     14580
(6 rows)

Sample 2: Calculate Revenue per Pickup Point

This example calculates the top 10 pickup locations with the highest revenue.

SELECT t.pickup_location_id AS pickup_location_id, 
       tz."zone" AS location_name,
       COUNT(*) AS trip_count,
       SUM(t.total_amount) AS total_revenue 
FROM trips t
JOIN taxi_zones tz ON t.pickup_location_id = tz.locationid 
GROUP BY t.pickup_location_id, tz."zone" 
ORDER BY total_revenue DESC 
LIMIT 10;

The result displays the top 10 pickup locations, with JFK Airport having the highest revenue.

 pickup_location_id |       location_name       | trip_count |   total_revenue    
--------------------+---------------------------+------------+--------------------
                132 | JFK Airport               |    4781775 | 240929978.91028678
                138 | LaGuardia Airport         |    5935988 | 219636305.66008496
                161 | Midtown Center            |    9978810 | 125652682.01979701
                 79 | East Village              |    9407751 | 119170547.52983798
                162 | Midtown East              |    9297167 | 118483924.92982587
                 48 | Clinton East              |    8872713 | 112228413.58982424
                170 | Murray Hill               |    9161159 | 111683576.97983585
                237 | Upper East Side South     |   10378670 | 111360934.71979024
                234 | Union Sq                  |    9156879 | 109465941.75984423
                230 | Times Sq/Theatre District |    9522197 | 109449951.09985997
(10 rows)

This example provides insights into Manhattan’s most popular cabs by showing key statistics for each cab type.

SELECT c.type AS cab_type,
       MIN(t.trip_distance) AS shortest_distance,
       MAX(t.trip_distance) AS farthest_distance,
       AVG(t.trip_distance) AS average_distance,
       AVG(t.fare_amount) AS average_fare,
       SUM(t.fare_amount) AS total_fare
FROM trips t
JOIN cab_types c ON t.cab_type_id = c.id
GROUP BY c.type
ORDER BY c.type;

The result shows the shortest and farthest trip distances, average trip distance, and average fare for each cab type.

 cab_type | shortest_distance | farthest_distance | average_distance  |    average_fare    |    total_fare     
----------+-------------------+-------------------+-------------------+--------------------+-------------------
 green    |                 0 |               640 | 2.846760430722446 | 12.214111199481835 | 154370169.6499999
 yellow   |         -21474834 |          16200370 | 5.955657407792438 | 11.567361225195867 |  3276058919.61685
(2 rows)

Sample 4: How Weather Impacts the Transportation Demand Analysis

Here we analyze the average temperature and wind speed from January to April on the 1st and 2nd of each month.

SELECT
    w."date" AS pickup_date,
    (w.max_temperature + w.min_temperature) / 2 AS average_temperature,
    w.average_wind_speed,
    COUNT(t.id) AS number_of_trips
FROM
    central_park_weather_observations w
LEFT JOIN
    trips t ON w."date" = t.pickup_datetime::date
WHERE
    w."date" IN (
        '2009-01-01', '2009-01-02',
        '2009-02-01', '2009-02-02',
        '2009-03-01', '2009-03-02',
        '2009-04-01', '2009-04-02'
    )
GROUP BY
    w."date", w.max_temperature, w.min_temperature, w.average_wind_speed
ORDER BY
    w."date";

The output helps us understand how weather conditions from January to April impact the number of trips taken on specific dates. It shows that days with higher temperatures and stronger winds see more trips happening.

 pickup_date | average_temperature | average_wind_speed | number_of_trips 
-------------+---------------------+--------------------+-----------------
 2009-01-01  |                20.5 |              11.18 |           65888
 2009-01-02  |                28.5 |               6.26 |           75481
 2009-02-01  |                  39 |               7.61 |           86019
 2009-02-02  |                43.5 |               5.14 |           80808
 2009-03-01  |                  32 |               9.84 |           86320
 2009-03-02  |                  22 |              10.96 |           65607
 2009-04-01  |                  46 |               6.71 |           77604
 2009-04-02  |                  54 |               5.59 |           99310
(8 rows)