NycTlcYellow Class

Represents the NYC Taxi & Limousine Commission yellow taxi trip public dataset.

The yellow taxi trip records include fields capturing pick-up and drop-off dates/times, pick-up and drop-off locations, trip distances, itemized fares, rate types, payment types, and driver-reported passenger counts. For more information about this dataset, including column descriptions, different ways to access the dataset, and examples, see NYC Taxi & Limousine Commission - yellow taxi trip records in the Microsoft Azure Open Datasets catalog.

Initialize filtering fields.

Inheritance
azureml.opendatasets._nyc_taxi_base.NycTaxiBase
NycTlcYellow

Constructor

NycTlcYellow(start_date: datetime = datetime.datetime(2015, 1, 1, 0, 0), end_date: datetime = datetime.datetime(2024, 10, 18, 0, 0), cols: List[str] | None = None, limit: int | None = -1, enable_telemetry: bool = True)

Parameters

Name Description
start_date

The date at which to start loading data, inclusive. If None, the default_start_date is used.

Default value: 2015-01-01 00:00:00
end_date

The date at which to end loading data, inclusive. If None, the default_end_date is used.

Default value: 2024-10-18 00:00:00
cols

A list of column names to load from the dataset. If None, all columns are loaded. For information on the available columns in this dataset, see NYC Taxi & Limousine Commission - yellow taxi trip records.

Default value: None
limit
int

A value indicating the number of days of data to load with to_pandas_dataframe(). If not specified, the default of -1 means no limit on days loaded.

Default value: -1
enable_telemetry

Whether to enable telemetry on this dataset.

Default value: True
start_date

The inclusive start date of the query.

end_date

The inclusive end date of the query.

cols

A list of the column names to retrieve. None retrieves all columns.

limit
int

to_pandas_dataframe() will load only "limit" months of data. -1 means no limit.

enable_telemetry

Indicates whether to send telemetry.

Remarks

The example below shows how to access the dataset.


   from azureml.opendatasets import NycTlcYellow
   from dateutil import parser

   # Both dates are inclusive bounds on the data to load.
   start_date = parser.parse('2018-05-01')
   end_date = parser.parse('2018-06-06')

   # Build the dataset object, then load the filtered trips into pandas.
   nyc_tlc = NycTlcYellow(start_date=start_date, end_date=end_date)
   nyc_tlc_df = nyc_tlc.to_pandas_dataframe()
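
The cols and limit parameters described above can be combined with the date range to keep a load small. The following is a minimal sketch, not part of the library: build_query and load_trips are hypothetical helpers introduced here for illustration, and the column names (tpepPickupDateTime, tripDistance, totalAmount) are assumed from the dataset's catalog page and should be checked there before use.

```python
from datetime import datetime
from typing import Any, Dict, List, Optional


def build_query(start_date: datetime, end_date: datetime,
                cols: Optional[List[str]] = None,
                limit: int = -1) -> Dict[str, Any]:
    """Hypothetical helper: validate and package NycTlcYellow constructor arguments."""
    if start_date > end_date:
        raise ValueError("start_date must not be later than end_date")
    return {"start_date": start_date, "end_date": end_date,
            "cols": cols, "limit": limit}


def load_trips(query: Dict[str, Any]):
    """Hypothetical helper: run the query and return a pandas DataFrame."""
    # Imported lazily so build_query works without the package installed.
    from azureml.opendatasets import NycTlcYellow
    return NycTlcYellow(**query).to_pandas_dataframe()


# Column names are an assumption taken from the catalog page; verify before use.
query = build_query(
    start_date=datetime(2018, 5, 1),
    end_date=datetime(2018, 6, 6),
    cols=["tpepPickupDateTime", "tripDistance", "totalAmount"],
    limit=1,  # cap the amount of data loaded; -1 would mean no limit
)

# Uncomment to download the data (requires azureml-opendatasets and network access).
# nyc_tlc_df = load_trips(query)
```

Validating the date range before constructing the dataset object makes a reversed range fail immediately, before any download is attempted.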