Version: 1.0.0 (latest)

Getting started

What is dlt?

dlt is an open-source Python library that loads data from various, often messy data sources into well-structured, live datasets. It offers a lightweight interface for extracting data from REST APIs, SQL databases, cloud storage, Python data structures, and many more.

dlt is designed to be easy to use, flexible, and scalable:

dlt infers schemas and data types, normalizes the data, and handles nested data structures.
dlt supports a variety of popular destinations and has an interface to add custom destinations to create reverse ETL pipelines.
dlt can be deployed anywhere Python runs, be it on Airflow, serverless functions or any other cloud deployment of your choice.
dlt automates pipeline maintenance through schema evolution and through schema and data contracts.
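To make the first point above more concrete, here is a simplified sketch of what normalizing nested data can look like. It uses only the standard library and is not dlt's implementation: the `normalize` function, the `__` separator, and the `_id` / `_parent_id` columns are assumptions chosen for illustration. Nested dictionaries are flattened into columns of the parent row, and lists are moved into a child table that points back to the parent.

```python
from typing import Any, Optional

Tables = dict[str, list[dict[str, Any]]]


def normalize(record: dict[str, Any], table: str, tables: Tables,
              parent_id: Optional[str] = None) -> None:
    """Flatten one record into `tables`, splitting lists into child tables.

    Illustrative only: the naming scheme here is an assumption of this sketch.
    """
    rows = tables.setdefault(table, [])
    # Hypothetical row identifier: table name plus position in the table.
    row: dict[str, Any] = {"_id": f"{table}:{len(rows)}"}
    if parent_id is not None:
        row["_parent_id"] = parent_id
    rows.append(row)

    def walk(value: dict[str, Any], prefix: str) -> None:
        for key, item in value.items():
            column = f"{prefix}{key}"
            if isinstance(item, dict):
                # Nested dict: keep flattening into the same row.
                walk(item, f"{column}__")
            elif isinstance(item, list):
                # List: each element becomes a row in a child table.
                for element in item:
                    child = element if isinstance(element, dict) else {"value": element}
                    normalize(child, f"{table}__{column}", tables, row["_id"])
            else:
                row[column] = item

    walk(record, "")


tables: Tables = {}
normalize(
    {"id": 1, "user": {"name": "Ada", "address": {"city": "London"}}, "tags": ["a", "b"]},
    "posts",
    tables,
)
```

After this runs, `tables["posts"]` holds one flat row with the columns `id`, `user__name`, and `user__address__city`, while `tables["posts__tags"]` holds two rows, one per tag, each carrying the `_parent_id` of the post it came from.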

To get started with dlt, install the library using pip:

pip install dlt

tip

We recommend using a clean virtual environment for your experiments! Read the detailed instructions on how to set one up.

Load data with dlt from …

REST APIs
SQL databases
Cloud storage or files
Python data structures

Use dlt's REST API source to extract data from any REST API. Define the API endpoints you'd like to fetch data from, the pagination method, and the authentication, and dlt will handle the rest:

import dlt
from dlt.sources.rest_api import rest_api_source

source = rest_api_source({
    "client": {
        "base_url": "https://api.example.com/",
        "auth": {
            "token": dlt.secrets["your_api_token"],
        },
        "paginator": {
            "type": "json_response",
            "next_url_path": "paging.next",
        },
    },
    "resources": ["posts", "comments"],
})

pipeline = dlt.pipeline(
    pipeline_name="rest_api_example",
    destination="duckdb",
    dataset_name="rest_api_data",
)

load_info = pipeline.run(source)

Follow the REST API source tutorial to learn more about the source configuration and pagination methods.
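The `paginator` setting in the configuration above points at a field in each JSON response (`paging.next`) that holds the URL of the next page. The sketch below illustrates that idea with the standard library only. It is not how dlt implements pagination: `FAKE_API`, `get_path`, `paginate`, and the `data` key that holds the records are all hypothetical names used for the illustration, and the in-memory dictionary stands in for real HTTP calls.

```python
from typing import Any, Callable, Iterator

# Hypothetical in-memory "API": each URL maps to one JSON response.
FAKE_API: dict[str, dict[str, Any]] = {
    "https://api.example.com/posts": {
        "data": [{"id": 1}, {"id": 2}],
        "paging": {"next": "https://api.example.com/posts?page=2"},
    },
    "https://api.example.com/posts?page=2": {
        "data": [{"id": 3}],
        "paging": {"next": None},
    },
}


def get_path(document: dict[str, Any], path: str) -> Any:
    """Resolve a dotted path such as "paging.next"; return None if it is missing."""
    current: Any = document
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def paginate(start_url: str,
             fetch: Callable[[str], dict[str, Any]],
             next_url_path: str = "paging.next",
             data_path: str = "data") -> Iterator[dict[str, Any]]:
    """Yield records page by page, following the next-page URL until it is empty."""
    url = start_url
    while url:
        response = fetch(url)
        yield from get_path(response, data_path) or []
        url = get_path(response, next_url_path)


# FAKE_API.__getitem__ plays the role of an HTTP GET that returns parsed JSON.
posts = list(paginate("https://api.example.com/posts", FAKE_API.__getitem__))
```

Here `posts` ends up containing the three records from both pages, in order. With the real REST API source you only declare the path; the loop itself is handled for you.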

Use the SQL source to extract data from databases such as PostgreSQL, MySQL, SQLite, Oracle, and more.

import dlt
from dlt.sources.sql_database import sql_database

source = sql_database(
    "mysql+pymysql://rfamro@mysql-rfam-public.ebi.ac.uk:4497/Rfam"
)

pipeline = dlt.pipeline(
    pipeline_name="sql_database_example",
    destination="duckdb",
    dataset_name="sql_data",
)

load_info = pipeline.run(source)

Follow the SQL source tutorial to learn more about the source configuration and supported databases.

The filesystem source extracts data from AWS S3, Google Cloud Storage, Google Drive, Azure, or a local file system.

import dlt
from dlt.sources.filesystem import filesystem

source = filesystem(
    bucket_url="s3://example-bucket",
    file_glob="*.csv"
)

pipeline = dlt.pipeline(
    pipeline_name="filesystem_example",
    destination="duckdb",
    dataset_name="filesystem_data",
)

load_info = pipeline.run(source)

Follow the filesystem source tutorial to learn more about the source configuration and supported storage services.

dlt can also load data from Python generators or directly from Python data structures:

import dlt

@dlt.resource
def foo():
    for i in range(10):
        yield {"id": i, "name": f"This is item {i}"}

pipeline = dlt.pipeline(
    pipeline_name="python_data_example",
    destination="duckdb",
)

load_info = pipeline.run(foo)

Check out the Python data structures tutorial to learn about dlt fundamentals and advanced usage scenarios.

tip

If you'd like to try out dlt without installing it on your machine, check out the Google Colab demo.

Join the dlt community

Give the library a ⭐ and check out the code on GitHub.
Ask questions and share how you use the library on Slack.
Report problems and make feature requests here.
