This used to be addressed by restarting every X runs using the SCHEDULER_RUNS config setting, although that setting was recently removed from the default systemd scripts. You might also consider posting to the Airflow dev mailing list; I know this has been discussed there a few times, and one of the core contributors may be able to provide additional context. Related questions:

- Airflow tasks get stuck at "queued" status and never get running (especially see Bolke's answer there)
- Jobs not executing via Airflow that runs Celery with RabbitMQ

**Make sure you don't have `datetime.now()` as your `start_date`.**

For a DAG to be executed, the `start_date` must be a time in the past; otherwise Airflow will assume that it's not yet ready to execute. It's intuitive to think that if you tell your DAG to start "now", it'll execute "now". But that doesn't take into account how Airflow itself actually reads `datetime.now()`. When Airflow evaluates your DAG file, it interprets `datetime.now()` as the current timestamp (i.e. NOT a time in the past) and decides that the DAG isn't ready to run. Since this happens every time Airflow heartbeats (evaluates your DAG), every 5-10 seconds, it'll never run. To properly trigger your DAG, use a fixed time in the past (e.g. `datetime(2019, 1, 1)`) and set `catchup=False` (unless you're looking to run a backfill).

By design, an Airflow DAG executes at the completion of its `schedule_interval`, meaning one `schedule_interval` AFTER the start date. An hourly DAG, for example, executes its 2pm run when the clock strikes 3pm, because Airflow can't ensure that all data corresponding to the 2pm interval is present until the end of that hourly interval. This is a peculiar aspect of Airflow, but an important one to remember, especially if you're using default variables and macros.
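The "one interval after" rule can be sketched in plain Python. `run_fires_at` is an illustrative helper, not part of Airflow's API; it just makes concrete when a run for a given data interval actually triggers:

```python
from datetime import datetime, timedelta

def run_fires_at(execution_date, schedule_interval):
    """A run covering `execution_date` triggers one full
    schedule_interval later, once its data window has closed."""
    return execution_date + schedule_interval

# An hourly DAG's 2pm run fires when the clock strikes 3pm:
fires = run_fires_at(datetime(2019, 1, 1, 14), timedelta(hours=1))
print(fires)  # 2019-01-01 15:00:00
```

The same arithmetic explains why a `start_date` of `datetime.now()` never fires: by the time the first interval closes, "now" has already moved forward.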
The Airflow setup is running on ECS with Redis. There are 4 scheduler threads and 4 Celery worker tasks. The tasks that are not running show in queued state (grey icon); when hovering over the task icon, the operator is null and the task details say: "All dependencies are met but the task instance is not running. In most cases this just means that the task will probably be scheduled soon unless: - The scheduler is down or under heavy load." Metrics on the scheduler do not show heavy load. The DAG is very simple, with 2 independent tasks only dependent on the last run. There are also tasks in the same DAG that are stuck with no status (white icon). An interesting thing to notice is that when I restart the scheduler, tasks change to running state.

I'm running a fork of the puckel/docker-airflow repo as well, mostly on Airflow 1.8, for about a year with 10M+ task instances. I think the issue persists in 1.9, but I'm not positive. For whatever reason, there seems to be a long-standing issue with the Airflow scheduler where performance degrades over time. I've reviewed the scheduler code, but I'm still unclear on what exactly happens differently on a fresh start to kick it back into scheduling normally. One major difference is that scheduled and queued task states are rebuilt. Scheduler Basics in the Airflow wiki provides a concise reference on how the scheduler works and its various states.

Most people solve the diminishing scheduler throughput problem by restarting the scheduler regularly. I've found success with a 1-hour interval personally, but have seen intervals as frequent as every 5-10 minutes used too. Your task volume, task duration, and parallelism settings are worth considering when experimenting with a restart interval. Related reading:

- Airflow: Tips, Tricks, and Pitfalls (section "The scheduler should be restarted frequently")
- Bug 1286825 - Airflow scheduler stopped working silently
- Airflow at WePay (section "Restart everything when deploying DAG changes.")
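If systemd manages the scheduler, one way to get the periodic restart described above is to cap the service's runtime. This is a sketch, not the stock Airflow unit file: the paths and unit name are assumptions, and `RuntimeMaxSec=` requires systemd 229 or newer:

```ini
# /etc/systemd/system/airflow-scheduler.service (illustrative fragment)
[Service]
ExecStart=/usr/local/bin/airflow scheduler
Restart=always
# Kill and restart the scheduler every hour to work around
# the throughput degradation described above.
RuntimeMaxSec=3600
```

Older deployments achieved a similar effect from inside Airflow via the scheduler's `--num_runs` flag, which is what the SCHEDULER_RUNS variable in the old systemd scripts controlled.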
Airflow is randomly not running queued tasks; some tasks don't even get queued status. I keep seeing the following in the scheduler logs: INFO - No tasks to consider for execution. I do see tasks in the database that either have no status or queued status, but they never get started.
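One way to spot such stuck rows is to query the metadata database directly. This sketch emulates that query against an in-memory SQLite stand-in with a heavily simplified schema (the real `task_instance` table has many more columns):

```python
import sqlite3

# Simplified stand-in for Airflow's task_instance metadata table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task_instance (task_id TEXT, dag_id TEXT, state TEXT)")
conn.executemany(
    "INSERT INTO task_instance VALUES (?, ?, ?)",
    [("t1", "my_dag", "queued"),   # queued but never started
     ("t2", "my_dag", None),       # no status at all
     ("t3", "my_dag", "success")],
)

# Rows with no status or queued status are the candidates for being stuck:
stuck = conn.execute(
    "SELECT task_id FROM task_instance "
    "WHERE state IS NULL OR state = 'queued'"
).fetchall()
print(stuck)  # [('t1',), ('t2',)]
```

Running the equivalent SELECT against the real metadata database before and after a scheduler restart makes it easy to confirm whether the restart is what clears the backlog.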