Airflow tutorial

6/13/2023

In this blog post, we are going to take a look at how we can set up Apache Airflow on our systems and get you, as a developer, started with just the bare minimum so you can begin working on it. For detailed documentation, please always refer to the official Airflow Documentation.

Airflow is a workflow management platform for data engineering pipelines. It helps you define workflows with Python code and provides a rich UI to manage and monitor those workflows. Airflow supports easy integration with all the popular external interfaces like DBs (SQL and MongoDB), SSH, FTP, cloud providers, etc. It sounds like it does a lot, and it does, and it can be intimidating, but it is really easy to get started. The biggest plus point about Airflow is its user-friendly UI.

A DAG is a Python (.py) file that defines the steps in our workflow. Each DAG has some common configuration plus the details of what needs to be done at each step. Note: you do not need to know advanced Python to start working on DAGs, but you should know the basics.

Each step of our workflow is called a Task, and we can define different relationships/dependencies between tasks. For example:

1. Task2 should run after Task1.
2. Task4 should run only if Task1, Task2 and Task3 are all successful.
3. Task5 should run if any of Task1, Task2 or Task3 fails.

Tasks are executed using Airflow operators, and each operator has configuration that can be tuned to suit our requirement. For example:

1. a Python operator can call a Python function, and we can even use its output, or
2. a bash operator can execute a specific command.

For connections, there is a layer between the code and the connection details. Airflow already has integrations with many external interfaces, so you do not need to write low-level code. For example, if you need a DB connection, you will create a connection in the Airflow UI with the host name, port, username and password, and give it a conn_id (connection id). In your DAG (Python code) you will ONLY use the conn_id as a reference. This segregates your code from the connection configuration and makes it reusable across environments as well.

XCom (Cross Communication) is the data that flows between Task1 and Task2. Note: the data passed between tasks should be very minimal; we should not pass large objects.

The two sketches below tie these concepts together.
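First, a minimal sketch of a DAG wiring up the task relationships above. It assumes Airflow 2.x; the dag_id, task ids and commands are made up for illustration:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator


def transform():
    # Placeholder for real work; any Python function can be called here.
    print("transforming data")


with DAG(
    dag_id="my_first_dag",            # hypothetical name
    start_date=datetime(2023, 6, 1),
    schedule=None,                    # only runs when triggered manually
    catchup=False,
) as dag:
    task1 = BashOperator(task_id="task1", bash_command="echo extract")
    task2 = PythonOperator(task_id="task2", python_callable=transform)
    task3 = BashOperator(task_id="task3", bash_command="echo load")

    # Runs only if task1, task2 and task3 all succeed (the default rule).
    task4 = BashOperator(task_id="task4", bash_command="echo report")

    # Runs if any one of its upstream tasks fails.
    task5 = BashOperator(
        task_id="task5",
        bash_command="echo alert",
        trigger_rule="one_failed",
    )

    task1 >> task2                    # task2 runs after task1
    [task1, task2, task3] >> task4
    [task1, task2, task3] >> task5
```

The default trigger rule is all_success, which is why task4 needs no extra configuration; task5 overrides it with one_failed.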
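Second, a sketch of XCom together with a connection reference, again assuming Airflow 2.x. The conn_id my_postgres and the audit_log table are hypothetical, the connection would be created in the UI beforehand, and the Postgres hook needs the apache-airflow-providers-postgres package installed:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.postgres.hooks.postgres import PostgresHook


def extract(ti):
    # Push a small value to XCom (returning it would also auto-push it).
    ti.xcom_push(key="row_count", value=42)


def load(ti):
    # Pull the value pushed by task1. XCom is meant for small metadata
    # like this, not for large datasets.
    count = ti.xcom_pull(task_ids="task1", key="row_count")
    # Only the conn_id appears in code; host, port and credentials live
    # in the connection created via the Airflow UI.
    hook = PostgresHook(postgres_conn_id="my_postgres")
    hook.run("INSERT INTO audit_log (rows) VALUES (%s)", parameters=(count,))


with DAG(
    dag_id="xcom_and_connections",    # hypothetical name
    start_date=datetime(2023, 6, 1),
    schedule=None,
    catchup=False,
) as dag:
    task1 = PythonOperator(task_id="task1", python_callable=extract)
    task2 = PythonOperator(task_id="task2", python_callable=load)
    task1 >> task2
```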
Now that we have seen the popular concepts, let us get Airflow up and running on our local machine. I will go through a very simple setup here; you can follow the official documentation for other installation options.

Prerequisite: Docker and Docker Compose should already be installed.

Download the Docker Compose file created by the Airflow team, save it on your local machine in a folder of your choice, and run it.
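The commands below are a sketch of the official quick-start steps at the time of writing; the version in the URL (2.6.1 here) changes over time, so copy the exact commands from the official documentation:

```bash
# Download the docker compose file created by the Airflow team.
curl -LfO 'https://airflow.apache.org/docs/apache-airflow/2.6.1/docker-compose.yaml'

# Create the folders the compose file expects and set the host user id.
mkdir -p ./dags ./logs ./plugins ./config
echo -e "AIRFLOW_UID=$(id -u)" > .env

# Initialize the database and the default user, then start all services.
docker compose up airflow-init
docker compose up
```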
Now go to localhost:8080 and the Airflow home page opens up. Let us now see the various features of Airflow.

Your home screen is auto-loaded with a bunch of pre-defined DAGs. These are for your reference and for you to play around with when creating your own DAG!!

Let's look at the first DAG, example_bash_operator. Clicking it brings up the Tree view, which shows how the steps are linked in a tree-like structure.

Make sure that you activate the DAG by clicking the switch button on the top right (this can be done from the home page as well, if you noticed). Once it is active, you can click the Run button on the top right to execute the DAG. On clicking Run, choose the 'Run DAG' option; the 'Run DAG w/ config' option is for when you have some custom configuration for the DAG, which is not required in this case.

Congrats, you've run your very first DAG!

A lot of the different views are for monitoring purposes once the code is deployed. To see the code itself, click on the last view, the Code View. The Airflow team has given us a couple of pre-built DAGs, and we can now check what is written in them.
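The Code View shows the actual file that ships with Airflow; since the screenshot does not carry over here, below is a trimmed, illustrative sketch of the general shape of example_bash_operator (not the real file), so you know what to expect:

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.empty import EmptyOperator

with DAG(
    dag_id="example_bash_operator_sketch",  # the real dag_id has no suffix
    start_date=datetime(2023, 6, 1),
    schedule="@daily",
    catchup=False,
    dagrun_timeout=timedelta(minutes=60),
) as dag:
    run_this_last = EmptyOperator(task_id="run_this_last")

    run_this = BashOperator(task_id="run_after_loop", bash_command="echo 1")
    run_this >> run_this_last

    # A loop is a perfectly normal way to generate similar tasks.
    for i in range(3):
        task = BashOperator(
            task_id=f"runme_{i}",
            bash_command='echo "run_id={{ run_id }}" && sleep 1',
        )
        task >> run_this
```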