Datasets

A few popular time series datasets

class darts.datasets.AirPassengersDataset

Bases: DatasetLoaderCSV

Monthly Air Passengers Dataset, from 1949 to 1960.

Methods

load()

Load the dataset in memory, as a TimeSeries.

load()

Load the dataset in memory, as a TimeSeries. Downloads the dataset if it is not present already

Raises

DatasetLoadingException – If loading fails (MD5 Checksum is invalid, Download failed, Reading from disk failed)

Returns

time_series – A TimeSeries object that contains the dataset

Return type

TimeSeries
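The "MD5 Checksum is invalid" failure listed under Raises can be pictured with a short sketch. This is only an illustration of checksum verification in general, not the code darts uses internally; the helper names `md5_of_file` and `checksum_is_valid` and the throwaway file are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of_file(path, chunk_size=8192):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_is_valid(path, expected_md5):
    """Return True when the file's MD5 digest equals the expected value."""
    return md5_of_file(path) == expected_md5.lower()


# Demonstration on a throwaway file standing in for a downloaded CSV.
payload = b"time,value\n"
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = Path(tmp_dir) / "example.csv"
    csv_path.write_bytes(payload)
    expected = hashlib.md5(payload).hexdigest()
    ok = checksum_is_valid(csv_path, expected)        # digest matches
    corrupted = checksum_is_valid(csv_path, "0" * 32)  # digest does not match
```

A loader built along these lines would raise an exception in the second case instead of returning the data.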

class darts.datasets.AusBeerDataset

Bases: DatasetLoaderCSV

Total quarterly beer production in Australia (in megalitres) from 1956:Q1 to 2008:Q3 [1].

References

1

https://rdrr.io/cran/fpp/man/ausbeer.html

Methods

load()

Load the dataset in memory, as a TimeSeries.

load()

Load the dataset in memory, as a TimeSeries. Downloads the dataset if it is not present already

Raises

DatasetLoadingException – If loading fails (MD5 Checksum is invalid, Download failed, Reading from disk failed)

Returns

time_series – A TimeSeries object that contains the dataset

Return type

TimeSeries

class darts.datasets.AustralianTourismDataset

Bases: DatasetLoaderCSV

A single multivariate TimeSeries, containing monthly tourism numbers over 36 months in Australia. The numbers are broken down per region (“NSW”, “VIC”, “QLD”, “SA”, “WA”, “TAS”, “NT”), reason (“Hol”, “VFR”, “Bus”, “Oth”), (region, reason) pairs, and (region, reason, <city>) tuples, where <city> can be either “city” or “noncity”.

This is an augmented version of the Australian tourism dataset available in [1], where we pre-computed the groupings per region (not available in the original dataset).
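The groupings described above can be enumerated with a small sketch. It only counts the combinations implied by the description; it does not load the dataset and makes no claim about the actual component names or the total number of components.

```python
from itertools import product

# Labels taken from the description above.
REGIONS = ["NSW", "VIC", "QLD", "SA", "WA", "TAS", "NT"]
REASONS = ["Hol", "VFR", "Bus", "Oth"]
CITY_TYPES = ["city", "noncity"]

# Every (region, reason) pair and every (region, reason, <city>) tuple.
region_reason_pairs = list(product(REGIONS, REASONS))
region_reason_city = list(product(REGIONS, REASONS, CITY_TYPES))

group_sizes = {
    "region": len(REGIONS),
    "reason": len(REASONS),
    "(region, reason)": len(region_reason_pairs),
    "(region, reason, city)": len(region_reason_city),
}
```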

References

1

https://robjhyndman.com/publications/hierarchical-tourism/

Methods

load()

Load the dataset in memory, as a TimeSeries.

load()

Load the dataset in memory, as a TimeSeries. Downloads the dataset if it is not present already

Raises

DatasetLoadingException – If loading fails (MD5 Checksum is invalid, Download failed, Reading from disk failed)

Returns

time_series – A TimeSeries object that contains the dataset

Return type

TimeSeries

class darts.datasets.ETTh1Dataset

Bases: DatasetLoaderCSV

Data from one electricity transformer at one station, including load and oil temperature. The dataset ranges from 2016/07 to 2018/07 and is sampled hourly. Source: [1] [2]

Field Descriptions:

  • date: The recorded date

  • HUFL: High UseFul Load

  • HULL: High UseLess Load

  • MUFL: Medium UseFul Load

  • MULL: Medium UseLess Load

  • LUFL: Low UseFul Load

  • LULL: Low UseLess Load

  • OT: Oil Temperature (Target)
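As a rough illustration of how the fields above might be separated into covariates and the target, the sketch below parses a tiny CSV with the same column names. The numeric values are made-up placeholders, not real ETT measurements, and `split_target` is a hypothetical helper rather than part of darts.

```python
import csv
import io

# Column names follow the field descriptions above; the values are
# made-up placeholders, not real measurements.
SAMPLE_CSV = """date,HUFL,HULL,MUFL,MULL,LUFL,LULL,OT
2016-07-01 00:00:00,1.0,2.0,3.0,4.0,5.0,6.0,7.0
2016-07-01 01:00:00,1.5,2.5,3.5,4.5,5.5,6.5,7.5
"""

TIME_COLUMN = "date"
TARGET = "OT"


def split_target(csv_text):
    """Return (timestamps, covariate_rows, target_values) parsed from CSV text."""
    timestamps, covariates, target = [], [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        timestamps.append(row.pop(TIME_COLUMN))
        target.append(float(row.pop(TARGET)))
        # Whatever remains after removing the date and the target is a load field.
        covariates.append({name: float(value) for name, value in row.items()})
    return timestamps, covariates, target


timestamps, covariates, target = split_target(SAMPLE_CSV)
```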

References

1

https://github.com/zhouhaoyi/ETDataset

2

https://arxiv.org/abs/2012.07436

Methods

load()

Load the dataset in memory, as a TimeSeries.

load()

Load the dataset in memory, as a TimeSeries. Downloads the dataset if it is not present already

Raises

DatasetLoadingException – If loading fails (MD5 Checksum is invalid, Download failed, Reading from disk failed)

Returns

time_series – A TimeSeries object that contains the dataset

Return type

TimeSeries

class darts.datasets.ETTh2Dataset

Bases: DatasetLoaderCSV

Data from one electricity transformer at one station, including load and oil temperature. The dataset ranges from 2016/07 to 2018/07 and is sampled hourly. Source: [1] [2]

Field Descriptions:

  • date: The recorded date

  • HUFL: High UseFul Load

  • HULL: High UseLess Load

  • MUFL: Medium UseFul Load

  • MULL: Medium UseLess Load

  • LUFL: Low UseFul Load

  • LULL: Low UseLess Load

  • OT: Oil Temperature (Target)

References

1

https://github.com/zhouhaoyi/ETDataset

2

https://arxiv.org/abs/2012.07436

Methods

load()

Load the dataset in memory, as a TimeSeries.

load()

Load the dataset in memory, as a TimeSeries. Downloads the dataset if it is not present already

Raises

DatasetLoadingException – If loading fails (MD5 Checksum is invalid, Download failed, Reading from disk failed)

Returns

time_series – A TimeSeries object that contains the dataset

Return type

TimeSeries

class darts.datasets.ETTm1Dataset

Bases: DatasetLoaderCSV

Data from one electricity transformer at one station, including load and oil temperature. The dataset ranges from 2016/07 to 2018/07 and is recorded every 15 minutes. Source: [1] [2]

Field Descriptions:

  • date: The recorded date

  • HUFL: High UseFul Load

  • HULL: High UseLess Load

  • MUFL: Medium UseFul Load

  • MULL: Medium UseLess Load

  • LUFL: Low UseFul Load

  • LULL: Low UseLess Load

  • OT: Oil Temperature (Target)

References

1

https://github.com/zhouhaoyi/ETDataset

2

https://arxiv.org/abs/2012.07436

Methods

load()

Load the dataset in memory, as a TimeSeries.

load()

Load the dataset in memory, as a TimeSeries. Downloads the dataset if it is not present already

Raises

DatasetLoadingException – If loading fails (MD5 Checksum is invalid, Download failed, Reading from disk failed)

Returns

time_series – A TimeSeries object that contains the dataset

Return type

TimeSeries

class darts.datasets.ETTm2Dataset

Bases: DatasetLoaderCSV

Data from one electricity transformer at one station, including load and oil temperature. The dataset ranges from 2016/07 to 2018/07 and is recorded every 15 minutes. Source: [1] [2]

Field Descriptions:

  • date: The recorded date

  • HUFL: High UseFul Load

  • HULL: High UseLess Load

  • MUFL: Medium UseFul Load

  • MULL: Medium UseLess Load

  • LUFL: Low UseFul Load

  • LULL: Low UseLess Load

  • OT: Oil Temperature (Target)

References

1

https://github.com/zhouhaoyi/ETDataset

2

https://arxiv.org/abs/2012.07436

Methods

load()

Load the dataset in memory, as a TimeSeries.

load()

Load the dataset in memory, as a TimeSeries. Downloads the dataset if it is not present already

Raises

DatasetLoadingException – If loading fails (MD5 Checksum is invalid, Download failed, Reading from disk failed)

Returns

time_series – A TimeSeries object that contains the dataset

Return type

TimeSeries

class darts.datasets.ElectricityConsumptionZurichDataset

Bases: DatasetLoaderCSV

Electricity Consumption of households & SMEs (low voltage) and businesses & services (medium voltage) in the city of Zurich [1], with values recorded every 15 minutes.

The electricity consumption is combined with weather measurements recorded hourly by three different stations in the city of Zurich [2]. Missing time stamps are filled with NaN. Before the weather features are added to the electricity consumption, the weather data is resampled to a 15-minute frequency, and missing values are interpolated.
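The resampling step can be sketched as follows. The sketch assumes linear interpolation, which the description does not specify, and `upsample_linear` is a hypothetical helper; the temperature values are placeholders.

```python
def upsample_linear(values, factor=4):
    """Insert factor - 1 linearly interpolated points between consecutive values."""
    if factor < 1:
        raise ValueError("factor must be at least 1")
    if not values:
        return []
    result = []
    for left, right in zip(values, values[1:]):
        for step in range(factor):
            result.append(left + (right - left) * step / factor)
    result.append(values[-1])  # the final hourly reading has no successor
    return result


# Three hourly readings become nine quarter-hourly readings.
hourly_temperature = [10.0, 14.0, 12.0]  # placeholder values in °C
quarter_hourly = upsample_linear(hourly_temperature)
```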

To simplify the dataset, the measurements from the Zch_Schimmelstrasse and Zch_Rosengartenstrasse weather stations are discarded to keep only the data recorded in the Zch_Stampfenbachstrasse station.

Both dataset sources are updated continuously, but this dataset only retains values between 2015-01-01 and 2022-08-31. The time index was converted from the CET time zone to UTC.
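The time zone conversion can be illustrated with the standard library. The sketch assumes CET is a fixed UTC+1 offset without daylight saving; how the original conversion treated summer time is not stated here. The timestamp is chosen for illustration only.

```python
from datetime import datetime, timedelta, timezone

# Assumption: CET is modelled as a fixed UTC+1 offset (no daylight saving).
CET = timezone(timedelta(hours=1), name="CET")


def cet_to_utc(naive_cet):
    """Attach the fixed CET offset to a naive timestamp and convert it to UTC."""
    return naive_cet.replace(tzinfo=CET).astimezone(timezone.utc)


local_midnight = datetime(2015, 1, 1, 0, 0)
utc_equivalent = cet_to_utc(local_midnight)
```

Under this assumption, midnight CET on 2015-01-01 maps to 23:00 UTC on the previous day.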

Components Descriptions:

  • Value_NE5 : Households & SMEs electricity consumption (low voltage, grid level 7) in kWh

  • Value_NE7 : Business and services electricity consumption (medium voltage, grid level 5) in kWh

  • Hr [%Hr] : Relative humidity

  • RainDur [min] : Duration of precipitation (divided by 4 for conversion from hourly to quarter-hourly records)

  • T [°C] : Temperature

  • WD [°] : Wind direction

  • WVv [m/s] : Wind vector speed

  • p [hPa] : Air pressure

  • WVs [m/s] : Wind scalar speed

  • StrGlo [W/m2] : Global solar irradiation

Note: before 2018, the scalar speeds were calculated from the 30-minute vector data.

References

1

https://data.stadt-zuerich.ch/dataset/ewz_stromabgabe_netzebenen_stadt_zuerich

2

https://data.stadt-zuerich.ch/dataset/ugz_meteodaten_stundenmittelwerte

Methods

load()

Load the dataset in memory, as a TimeSeries.

load()

Load the dataset in memory, as a TimeSeries. Downloads the dataset if it is not present already

Raises

DatasetLoadingException – If loading fails (MD5 Checksum is invalid, Download failed, Reading from disk failed)

Returns

time_series – A TimeSeries object that contains the dataset

Return type

TimeSeries

class darts.datasets.ElectricityDataset(multivariate=True)

Bases: DatasetLoaderCSV

Measurements of electric power consumption with a 15-minute sampling rate. The consumption of 370 clients is recorded in kW. Source: [1]

Loading this dataset provides a multivariate time series with 370 components, one per household. Setting multivariate=False instead returns a list of univariate TimeSeries, one per household.
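Conceptually, turning one multivariate series into a list of univariate ones is a column-wise split. The sketch below shows the idea on plain Python lists; it is not the darts implementation, and the household labels and readings are hypothetical placeholders.

```python
def split_columns(header, rows):
    """Turn a row-oriented multivariate table into one value list per column."""
    series = {name: [] for name in header}
    for row in rows:
        if len(row) != len(header):
            raise ValueError("row width does not match header")
        for name, value in zip(header, row):
            series[name].append(value)
    return series


# Hypothetical household labels and placeholder readings in kW.
header = ["household_a", "household_b", "household_c"]
rows = [
    [0.5, 1.0, 0.0],
    [0.7, 1.2, 0.1],
]
univariate = split_columns(header, rows)
```

Each entry of `univariate` plays the role of one univariate series that shares the time index of the original table.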

References

1

https://archive.ics.uci.edu/ml/datasets/ElectricityLoadDiagrams20112014

Parameters

multivariate (bool) – Whether to return a single multivariate timeseries - if False returns a list of univariate TimeSeries. Default is True.

Methods

load()

Load the dataset in memory, as a TimeSeries.

load()

Load the dataset in memory, as a TimeSeries. Downloads the dataset if it is not present already

Raises

DatasetLoadingException – If loading fails (MD5 Checksum is invalid, Download failed, Reading from disk failed)

Returns

time_series – A TimeSeries object that contains the dataset

Return type

TimeSeries

class darts.datasets.EnergyDataset

Bases: DatasetLoaderCSV

Hourly energy dataset coming from [1].

Contains a time series with 28 hourly components between 2014-12-31 23:00:00 and 2018-12-31 22:00:00

References

1

https://www.kaggle.com/nicholasjhana/energy-consumption-generation-prices-and-weather

Methods

load()

Load the dataset in memory, as a TimeSeries.

load()

Load the dataset in memory, as a TimeSeries. Downloads the dataset if it is not present already

Raises

DatasetLoadingException – If loading fails (MD5 Checksum is invalid, Download failed, Reading from disk failed)

Returns

time_series – A TimeSeries object that contains the dataset

Return type

TimeSeries

class darts.datasets.ExchangeRateDataset(multivariate=True)

Bases: DatasetLoaderCSV

A collection of the daily exchange rates of eight countries (Australia, the United Kingdom, Canada, Switzerland, China, Japan, New Zealand, and Singapore), ranging from 1990 to 2016. Because of some inconsistencies in the dates, the resulting TimeSeries is integer-indexed. Source: [1]

References

1

https://github.com/laiguokun/multivariate-time-series-data

Parameters

multivariate (bool) – Whether to return a single multivariate timeseries - if False returns a list of univariate TimeSeries. Default is True.

Methods

load()

Load the dataset in memory, as a TimeSeries.

load()

Load the dataset in memory, as a TimeSeries. Downloads the dataset if it is not present already

Raises

DatasetLoadingException – If loading fails (MD5 Checksum is invalid, Download failed, Reading from disk failed)

Returns

time_series – A TimeSeries object that contains the dataset

Return type

TimeSeries

class darts.datasets.GasRateCO2Dataset

Bases: DatasetLoaderCSV

Gas Rate CO2 dataset. Two components, length 296 (integer time index).

Methods

load()

Load the dataset in memory, as a TimeSeries.

load()

Load the dataset in memory, as a TimeSeries. Downloads the dataset if it is not present already

Raises

DatasetLoadingException – If loading fails (MD5 Checksum is invalid, Download failed, Reading from disk failed)

Returns

time_series – A TimeSeries object that contains the dataset

Return type

TimeSeries

class darts.datasets.HeartRateDataset

Bases: DatasetLoaderCSV

The series contains 1800 evenly-spaced measurements of instantaneous heart rate from a single subject. The measurements (in units of beats per minute) occur at 0.5 second intervals, so that the length of each series is exactly 15 minutes.

This is the series1 in [1]. It uses an integer time index.
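Because the series uses an integer time index, the elapsed time of each sample has to be derived from the sampling interval. A minimal sketch of that arithmetic, assuming index 0 corresponds to time zero (`elapsed_time` is a hypothetical helper):

```python
from datetime import timedelta

N_MEASUREMENTS = 1800
SAMPLING_INTERVAL = timedelta(seconds=0.5)


def elapsed_time(index):
    """Elapsed time of the sample at an integer index (index 0 is t = 0)."""
    if not 0 <= index < N_MEASUREMENTS:
        raise IndexError("index outside the series")
    return index * SAMPLING_INTERVAL


# 1800 intervals of 0.5 s cover 900 s, which is the 15 minutes stated above.
total_duration = N_MEASUREMENTS * SAMPLING_INTERVAL
last_sample_time = elapsed_time(N_MEASUREMENTS - 1)
```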

References

1

http://ecg.mit.edu/time-series/

Methods

load()

Load the dataset in memory, as a TimeSeries.

load()

Load the dataset in memory, as a TimeSeries. Downloads the dataset if it is not present already

Raises

DatasetLoadingException – If loading fails (MD5 Checksum is invalid, Download failed, Reading from disk failed)

Returns

time_series – A TimeSeries object that contains the dataset

Return type

TimeSeries

class darts.datasets.ILINetDataset(multivariate=True)

Bases: DatasetLoaderCSV

ILI describes the number of patients seen with influenza-like illness (ILI) and the total number of patients. It includes weekly data from the Centers for Disease Control and Prevention of the United States from 1997 to 2022. Source: [1] [2] [3] [4]

Components Descriptions:

  • % WEIGHTED ILI: Combined state-specific data of patient visits to healthcare providers for ILI reported each week, weighted by state population

  • % UNWEIGHTED ILI: Combined state-specific data of patient visits to healthcare providers for ILI reported each week, unweighted by state population

  • AGE 0-4: Number of patients between 0 and 4 years of age

  • AGE 25-49: Number of patients between 25 and 49 years of age

  • AGE 25-64: Number of patients between 25 and 64 years of age

  • AGE 5-24: Number of patients between 5 and 24 years of age

  • AGE 50-64: Number of patients between 50 and 64 years of age

  • AGE 65: Number of patients 65 years of age or older (>= 65)

  • ILITOTAL: Total number of ILI patients. For this system, ILI is defined as fever (temperature of 100°F [37.8°C] or greater) and a cough and/or a sore throat

  • NUM. OF PROVIDERS: Number of outpatient healthcare providers

  • TOTAL PATIENTS: Total number of patients

References

1

https://gis.cdc.gov/grasp/fluview/fluportaldashboard.html

2

https://www.cdc.gov/flu/weekly/overview.htm#Outpatient

3

https://arxiv.org/pdf/2205.13504.pdf

4

https://gis.cdc.gov/grasp/fluview/FluViewPhase2QuickReferenceGuide.pdf

Methods

load()

Load the dataset in memory, as a TimeSeries.

load()

Load the dataset in memory, as a TimeSeries. Downloads the dataset if it is not present already

Raises

DatasetLoadingException – If loading fails (MD5 Checksum is invalid, Download failed, Reading from disk failed)

Returns

time_series – A TimeSeries object that contains the dataset

Return type

TimeSeries

class darts.datasets.IceCreamHeaterDataset

Bases: DatasetLoaderCSV

Monthly sales of heaters and ice cream between January 2004 and June 2020.

Methods

load()

Load the dataset in memory, as a TimeSeries.

load()

Load the dataset in memory, as a TimeSeries. Downloads the dataset if it is not present already

Raises

DatasetLoadingException – If loading fails (MD5 Checksum is invalid, Download failed, Reading from disk failed)

Returns

time_series – A TimeSeries object that contains the dataset

Return type

TimeSeries

class darts.datasets.MonthlyMilkDataset

Bases: DatasetLoaderCSV

Monthly production of milk (in pounds per cow) between January 1962 and December 1975

Methods

load()

Load the dataset in memory, as a TimeSeries.

load()

Load the dataset in memory, as a TimeSeries. Downloads the dataset if it is not present already

Raises

DatasetLoadingException – If loading fails (MD5 Checksum is invalid, Download failed, Reading from disk failed)

Returns

time_series – A TimeSeries object that contains the dataset

Return type

TimeSeries

class darts.datasets.MonthlyMilkIncompleteDataset

Bases: DatasetLoaderCSV

Monthly production of milk (in pounds per cow) between January 1962 and December 1975. Has some missing values.

Methods

load()

Load the dataset in memory, as a TimeSeries.

load()

Load the dataset in memory, as a TimeSeries. Downloads the dataset if it is not present already

Raises

DatasetLoadingException – If loading fails (MD5 Checksum is invalid, Download failed, Reading from disk failed)

Returns

time_series – A TimeSeries object that contains the dataset

Return type

TimeSeries

class darts.datasets.SunspotsDataset

Bases: DatasetLoaderCSV

Monthly Sunspot Numbers, 1749 - 1983

Monthly mean relative sunspot numbers from 1749 to 1983. Collected at Swiss Federal Observatory, Zurich until 1960, then Tokyo Astronomical Observatory.

Source: [1]

References

1

https://www.rdocumentation.org/packages/datasets/versions/3.6.1/topics/sunspots

Methods

load()

Load the dataset in memory, as a TimeSeries.

load()

Load the dataset in memory, as a TimeSeries. Downloads the dataset if it is not present already

Raises

DatasetLoadingException – If loading fails (MD5 Checksum is invalid, Download failed, Reading from disk failed)

Returns

time_series – A TimeSeries object that contains the dataset

Return type

TimeSeries

class darts.datasets.TaxiNewYorkDataset

Bases: DatasetLoaderCSV

Taxi Passengers in New York, from 2014-07 to 2015-01. The data consists of the total number of taxi passengers aggregated into 30-minute buckets. Univariate series. Source: [1]
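Aggregating into 30-minute buckets can be sketched as below. The records are made-up placeholders and the helpers `floor_to_bucket` and `aggregate` are hypothetical; this shows the general technique, not how the published dataset was produced.

```python
from collections import defaultdict
from datetime import datetime


def floor_to_bucket(timestamp):
    """Round a timestamp down to the start of its 30-minute bucket."""
    return timestamp.replace(
        minute=(timestamp.minute // 30) * 30, second=0, microsecond=0
    )


def aggregate(records):
    """Sum passenger counts per 30-minute bucket, sorted by bucket start."""
    totals = defaultdict(int)
    for timestamp, passengers in records:
        totals[floor_to_bucket(timestamp)] += passengers
    return dict(sorted(totals.items()))


# Placeholder trips: (pickup time, number of passengers).
records = [
    (datetime(2014, 7, 1, 0, 5), 2),
    (datetime(2014, 7, 1, 0, 29, 59), 3),
    (datetime(2014, 7, 1, 0, 30), 1),
]
buckets = aggregate(records)
```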

References

1

https://www.kaggle.com/code/julienjta/nyc-taxi-traffic-analysis

Methods

load()

Load the dataset in memory, as a TimeSeries.

load()

Load the dataset in memory, as a TimeSeries. Downloads the dataset if it is not present already

Raises

DatasetLoadingException – If loading fails (MD5 Checksum is invalid, Download failed, Reading from disk failed)

Returns

time_series – A TimeSeries object that contains the dataset

Return type

TimeSeries

class darts.datasets.TaylorDataset

Bases: DatasetLoaderCSV

Half-hourly electricity demand in England and Wales from Monday 5 June 2000 to Sunday 27 August 2000. Discussed in Taylor (2003) [1], and kindly provided by James W Taylor [2]. Units: Megawatts (Uses an integer time index).
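Since the series is integer-indexed, a timestamp for each value can be reconstructed from the stated date range. The sketch below assumes that every half-hour in that range is present and that index 0 falls at 00:00 on the first day; neither assumption is confirmed by the description, and `index_to_timestamp` is a hypothetical helper.

```python
from datetime import date, datetime, time, timedelta

FIRST_DAY = date(2000, 6, 5)   # Monday 5 June 2000
LAST_DAY = date(2000, 8, 27)   # Sunday 27 August 2000
STEP = timedelta(minutes=30)
STEPS_PER_DAY = 48

# Inclusive day count and the number of half-hourly steps it implies.
n_days = (LAST_DAY - FIRST_DAY).days + 1
n_steps = n_days * STEPS_PER_DAY


def index_to_timestamp(index):
    """Map an integer index to a timestamp (assumes index 0 is 00:00 on day one)."""
    if not 0 <= index < n_steps:
        raise IndexError("index outside the assumed range")
    return datetime.combine(FIRST_DAY, time.min) + index * STEP


first_timestamp = index_to_timestamp(0)
last_timestamp = index_to_timestamp(n_steps - 1)
```

Under these assumptions the range spans 84 days, that is 12 full weeks.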

References

1

Taylor, J.W. (2003) Short-term electricity demand forecasting using double seasonal exponential smoothing. Journal of the Operational Research Society, 54, 799-805.

2

https://www.rdocumentation.org/packages/forecast/versions/8.13/topics/taylor

Methods

load()

Load the dataset in memory, as a TimeSeries.

load()

Load the dataset in memory, as a TimeSeries. Downloads the dataset if it is not present already

Raises

DatasetLoadingException – If loading fails (MD5 Checksum is invalid, Download failed, Reading from disk failed)

Returns

time_series – A TimeSeries object that contains the dataset

Return type

TimeSeries

class darts.datasets.TemperatureDataset

Bases: DatasetLoaderCSV

Daily temperature in Melbourne between 1981 and 1990

Methods

load()

Load the dataset in memory, as a TimeSeries.

load()

Load the dataset in memory, as a TimeSeries. Downloads the dataset if it is not present already

Raises

DatasetLoadingException – If loading fails (MD5 Checksum is invalid, Download failed, Reading from disk failed)

Returns

time_series – A TimeSeries object that contains the dataset

Return type

TimeSeries

class darts.datasets.TrafficDataset(multivariate=True)

Bases: DatasetLoaderCSV

This dataset is a collection of two years (2015-2016) of hourly data from the California Department of Transportation. The data describes the road occupancy rates (between 0 and 1) measured by 862 different sensors on San Francisco Bay area freeways. The raw data is available at http://pems.dot.ca.gov. Source: [1]

References

1

https://github.com/laiguokun/multivariate-time-series-data

Parameters

multivariate (bool) – Whether to return a single multivariate timeseries - if False returns a list of univariate TimeSeries. Default is True.

Methods

load()

Load the dataset in memory, as a TimeSeries.

load()

Load the dataset in memory, as a TimeSeries. Downloads the dataset if it is not present already

Raises

DatasetLoadingException – If loading fails (MD5 Checksum is invalid, Download failed, Reading from disk failed)

Returns

time_series – A TimeSeries object that contains the dataset

Return type

TimeSeries

class darts.datasets.USGasolineDataset

Bases: DatasetLoaderCSV

Weekly U.S. Product Supplied of Finished Motor Gasoline between 1991-02-08 and 2021-04-30

Obtained from [1].

References

1

https://www.eia.gov/dnav/pet/hist/LeafHandler.ashx?n=PET&s=wgfupus2&f=W

Methods

load()

Load the dataset in memory, as a TimeSeries.

load()

Load the dataset in memory, as a TimeSeries. Downloads the dataset if it is not present already

Raises

DatasetLoadingException – If loading fails (MD5 Checksum is invalid, Download failed, Reading from disk failed)

Returns

time_series – A TimeSeries object that contains the dataset

Return type

TimeSeries

class darts.datasets.UberTLCDataset(sample_freq='hourly', multivariate=True)

Bases: DatasetLoaderCSV

14.3 million Uber pickups from January to June 2015. The data is resampled to an hourly or daily frequency based on sample_freq, using the locationID as the target. Source: [1]

Loading this dataset provides a multivariate time series with 262 components, one per locationID. Setting multivariate=False instead returns a list of univariate TimeSeries, one per locationID.
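Resampling individual pickups into hourly or daily counts per locationID can be pictured with the sketch below. It only illustrates the general idea on placeholder events; `truncate` and `count_pickups` are hypothetical helpers, not part of darts.

```python
from collections import Counter
from datetime import datetime


def truncate(timestamp, sample_freq):
    """Round a timestamp down to the start of its hour or day."""
    if sample_freq == "hourly":
        return timestamp.replace(minute=0, second=0, microsecond=0)
    if sample_freq == "daily":
        return timestamp.replace(hour=0, minute=0, second=0, microsecond=0)
    raise ValueError("sample_freq must be 'hourly' or 'daily'")


def count_pickups(pickups, sample_freq="hourly"):
    """Count pickups per (period start, locationID) pair."""
    return Counter(
        (truncate(timestamp, sample_freq), location_id)
        for timestamp, location_id in pickups
    )


# Placeholder events: (pickup time, locationID).
pickups = [
    (datetime(2015, 1, 1, 8, 15), 10),
    (datetime(2015, 1, 1, 8, 45), 10),
    (datetime(2015, 1, 1, 9, 5), 10),
    (datetime(2015, 1, 1, 8, 30), 20),
]
hourly = count_pickups(pickups, "hourly")
daily = count_pickups(pickups, "daily")
```

Each distinct locationID would then correspond to one component of the resulting multivariate series.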

References

1

https://github.com/fivethirtyeight/uber-tlc-foil-response

Parameters
  • sample_freq (str) – The sampling frequency of the data. Can be “hourly” or “daily”. Default is “hourly”.

  • multivariate (bool) – Whether to return a single multivariate timeseries - if False returns a list of univariate TimeSeries. Default is True.

Methods

load()

Load the dataset in memory, as a TimeSeries.

load()

Load the dataset in memory, as a TimeSeries. Downloads the dataset if it is not present already

Raises

DatasetLoadingException – If loading fails (MD5 Checksum is invalid, Download failed, Reading from disk failed)

Returns

time_series – A TimeSeries object that contains the dataset

Return type

TimeSeries

class darts.datasets.WeatherDataset(multivariate=True)

Bases: DatasetLoaderCSV

Contains 21 weather indicators, such as air temperature and humidity, recorded every 10 minutes throughout 2020 in Germany. Source: [1] [2]

References

1

https://www.bgc-jena.mpg.de/wetter/

2

https://arxiv.org/pdf/2205.13504.pdf

Parameters

multivariate (bool) – Whether to return a single multivariate timeseries - if False returns a list of univariate TimeSeries. Default is True.

Methods

load()

Load the dataset in memory, as a TimeSeries.

load()

Load the dataset in memory, as a TimeSeries. Downloads the dataset if it is not present already

Raises

DatasetLoadingException – If loading fails (MD5 Checksum is invalid, Download failed, Reading from disk failed)

Returns

time_series – A TimeSeries object that contains the dataset

Return type

TimeSeries

class darts.datasets.WineDataset

Bases: DatasetLoaderCSV

Australian total wine sales by wine makers in bottles <= 1 litre. Monthly between Jan 1980 and Aug 1994. Source: [1]

References

1

https://www.rdocumentation.org/packages/forecast/versions/8.1/topics/wineind

Methods

load()

Load the dataset in memory, as a TimeSeries.

load()

Load the dataset in memory, as a TimeSeries. Downloads the dataset if it is not present already

Raises

DatasetLoadingException – If loading fails (MD5 Checksum is invalid, Download failed, Reading from disk failed)

Returns

time_series – A TimeSeries object that contains the dataset

Return type

TimeSeries

class darts.datasets.WoolyDataset

Bases: DatasetLoaderCSV

Quarterly production of woollen yarn in Australia: tonnes. Mar 1965 – Sep 1994. Source: [1]

References

1

https://www.rdocumentation.org/packages/forecast/versions/8.1/topics/woolyrnq

Methods

load()

Load the dataset in memory, as a TimeSeries.

load()

Load the dataset in memory, as a TimeSeries. Downloads the dataset if it is not present already

Raises

DatasetLoadingException – If loading fails (MD5 Checksum is invalid, Download failed, Reading from disk failed)

Returns

time_series – A TimeSeries object that contains the dataset

Return type

TimeSeries