• Что бы вступить в ряды "Принятый кодер" Вам нужно:
    Написать 10 полезных сообщений или тем и Получить 10 симпатий.
    Для того кто не хочет терять время,может пожертвовать средства для поддержки сервеса, и вступить в ряды VIP на месяц, дополнительная информация в лс.

  • Пользаватели которые будут спамить, уходят в бан без предупреждения. Спам сообщения определяется администрацией и модератором.

  • Гость, Что бы Вы хотели увидеть на нашем Форуме? Изложить свои идеи и пожелания по улучшению форума Вы можете поделиться с нами здесь. ----> Перейдите сюда
  • Все пользователи не прошедшие проверку электронной почты будут заблокированы. Все вопросы с разблокировкой обращайтесь по адресу электронной почте : info@guardianelinks.com . Не пришло сообщение о проверке или о сбросе также сообщите нам.

Taxi Drivers Efficiency Analysis with SQL & Tableau

Lomanu4 Оффлайн

Lomanu4

Команда форума
Администратор
Регистрация
1 Мар 2015
Сообщения
1,481
Баллы
155
I recently completed a hands-on SQL project using the Chicago Taxi Trips dataset on BigQuery, where I calculated taxi drivers' efficiency based on total fare earned per minute spent on trips for a single day.

? What I Did:

1. Queried the dataset with a Common Table Expression (CTE):

One of the most useful SQL features I leveraged in this project was the Common Table Expression (CTE).

A CTE lets you create a temporary, named result set that can be referenced within your main query. For me, it makes my SQL logic far more readable and manageable.

Instead of cramming all calculations into one long block of code, I broke things down step by step—calculating total trip duration, number of trips, and total fare inside the CTE. This made the final SELECT query cleaner, easier to debug, and more efficient to run.


# Creating a CTE
WITH table_mains AS (
SELECT
taxi_id,
SUM (TIMESTAMP_DIFF(trip_end_timestamp, trip_start_timestamp, MINUTE)) AS total_trip_duration,
COUNT (*) AS trip_count,
SUM (fare) AS total_fare
FROM
`bigquery-public-data.chicago_taxi_trips.taxi_trips`
WHERE
DATE (trip_start_timestamp) = '2013-10-03'

# Filtering out rows that could have problems so as to get a clean result
AND (fare) is NOT NULL
AND (TIMESTAMP_DIFF(trip_end_timestamp, trip_start_timestamp, MINUTE)) > 0
GROUP BY taxi_id
)

2.Filtered out incomplete or unrealistic data while creating the CTE:

To ensure accurate results and insights, I filtered out rows with missing fare values and zero trip durations. This step is crucial—it prevents skewed efficiency scores and keeps the analysis reliable


# Filtering out rows that could have problems so as to get a clean result
AND (fare) is NOT NULL
AND (TIMESTAMP_DIFF(trip_end_timestamp, trip_start_timestamp, MINUTE)) > 0

3. Calculated key metrics: total trip duration, total fare, number of trips, and efficiency score and ranked drivers based on their efficiency score:

I calculated key performance metrics like total trip duration, total fare, and trip count.

Using SAFE_DIVIDE, I computed the efficiency score (fare per minute) to avoid errors from division by zero. Then, I applied the RANK() window function to rank drivers based on this score—making it easy to identify the most efficient drivers at a glance.

Also, I used the WHERE clause to extract results for drivers that had more than 5 trips and ordered the results by efficiency rank.


SELECT
taxi_id,
total_trip_duration,
trip_count,
total_fare,
SAFE_DIVIDE (total_fare, trip_count) AS avg_trip_cost,
SAFE_DIVIDE (total_fare, total_trip_duration) AS efficiency_score,
RANK () OVER (
ORDER BY SAFE_DIVIDE (total_fare, total_trip_duration) DESC
) AS efficiency_rank
FROM
table_mains

# Say we want to see the results for taxis that travelled more than 5 trips
WHERE
trip_count >= 5
ORDER BY efficiency_score DESC

? Results:

After running the query and getting my query result, I saved the result and went ahead to import the saved result into Tableau to create a visualization and draw insights from the result.

The visualization brings the data to life—making it easy to compare driver performance at a glance with its interactivity. It also helps stakeholders quickly identify top performers and make data-driven decisions with clarity


Пожалуйста Авторизируйтесь или Зарегистрируйтесь для просмотра скрытого текста.




Пожалуйста Авторизируйтесь или Зарегистрируйтесь для просмотра скрытого текста.



? Tools Used:


  1. SQL (Google BigQuery Sandbox)


  2. Tableau (for visualization)

? Reflection:
This project helped me reinforce my understanding of analytic functions like SAFE_DIVIDE() and RANK(), and taught me how to turn raw data into actionable insights.

What do you think of this approach? Feedback and ideas are welcome!

Thank you!


Пожалуйста Авторизируйтесь или Зарегистрируйтесь для просмотра скрытого текста.

 
Вверх Снизу