EDA on Formula 1 World Championship Dataset

Kishan Rasikbhai Akbari
4 min read · Jul 2, 2021

Background:

Formula 1, also known as F1, is a competitive sport that combines extreme engineering with split-second decision making. Decisions are made in real time after weighing many parameters. Hence the need for data science: to explore opportunities for performance improvement and to win by that last fraction of a second.

This is a preliminary study comparing some of the basic variables involved in this highly competitive sport. I obtained the dataset from Kaggle and tried to draw some insights from it.

Overview of the Dataset:

The dataset is open source and available on the Kaggle platform. It includes various parameters related to races, circuits, drivers, constructors, lap timings, pit stops, etc. The whole dataset comes in separate files that need to be combined carefully in order to generate meaningful plots. I haven’t provided the full code here. If you are interested in the full code, please do visit this.
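Combining the separate files most likely means joining them on shared ID columns. Here is a minimal pandas sketch of that idea, not the author's actual code. The column names (raceId, driverId, name, year, fastestLapSpeed) are assumptions about the Kaggle files, and the values are made up for illustration.

```python
import pandas as pd

# Toy stand-ins for two of the dataset's files (assumed column names,
# made-up values). In practice these would come from pd.read_csv(...).
races = pd.DataFrame({
    "raceId": [1, 2, 3],
    "year": [2019, 2019, 2020],
    "name": ["Italian Grand Prix", "Monaco Grand Prix", "Italian Grand Prix"],
})
results = pd.DataFrame({
    "raceId": [1, 1, 2, 3],
    "driverId": [10, 11, 10, 11],
    "fastestLapSpeed": [255.0, 252.5, 160.2, 257.3],
})

# Attach race metadata to every result row. validate= raises an error
# if raceId is not unique in `races`, which would silently duplicate rows.
merged = results.merge(races, on="raceId", how="left", validate="many_to_one")
print(merged)
```

A left join keeps every result row even if its race is missing from the lookup table, which makes gaps visible as NaN instead of dropping data.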

Let’s get started…!

Top Speed of the Fastest Lap

This parameter matters a lot, as it reflects the capabilities of an F1 car and its driver. Clocking the top speed doesn’t earn any direct reward in the form of championship points, but it vindicates the engineering behind the car and the driver’s appetite for risk. On the flip side, this parameter is restricted to the fastest lap, so it is also relevant to earning the fastest-lap points.

Top Speed of the Fastest Lap (km/h) Vs Grand Prix (GP)/Race

The above plot indicates that the Italian GP has a higher median and maximum top speed in the fastest lap than the other GPs. This is consistent with Monza Circuit (Italian GP) being known as the Temple of Speed.
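The median and maximum read off such a plot can also be computed directly. The sketch below assumes a table with one row per driver per race, a hypothetical gp column, and fastestLapSpeed in km/h. The numbers are invented and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical merged table: one row per driver per race (made-up values).
laps = pd.DataFrame({
    "gp": ["Italian", "Italian", "Italian", "Monaco", "Monaco", "Monaco"],
    "fastestLapSpeed": [255.0, 252.0, 258.0, 160.0, 158.0, 163.0],
})

# Median and maximum top speed per GP, fastest circuits first.
summary = (
    laps.groupby("gp")["fastestLapSpeed"]
    .agg(["median", "max"])
    .sort_values("median", ascending=False)
)
print(summary)
```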

Monza Circuit, Italy

At the other end, the Monaco and Singapore GPs are known for their sharp turns and tight angles, which result in lower top speeds.

Monte Carlo Circuit, Monaco

Note: Here we ignore observations for GPs such as the Tuscan GP, which has been held only once (in 2020) so far. Drawing conclusions from such low-occurrence events can lead to biased opinions.
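One way to apply such a rule is to count how many distinct seasons each GP appears in and keep only those above a threshold. This is a sketch under assumptions: the name and year columns and the MIN_EDITIONS cutoff are hypothetical, and the rows are toy data.

```python
import pandas as pd

# Toy race calendar (assumed columns: name, year).
races = pd.DataFrame({
    "name": ["Italian Grand Prix", "Italian Grand Prix", "Italian Grand Prix",
             "Tuscan Grand Prix", "Monaco Grand Prix", "Monaco Grand Prix"],
    "year": [2018, 2019, 2020, 2020, 2018, 2019],
})

MIN_EDITIONS = 2  # hypothetical threshold, not a value from the article

# Number of distinct seasons per GP, broadcast back onto every row.
editions = races.groupby("name")["year"].transform("nunique")
frequent = races[editions >= MIN_EDITIONS]
print(frequent["name"].unique())
```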

This interactive plot helps in understanding the differences across various GP circuits. For example, despite its lower top speed, the Monaco GP’s fastest lap times are lower than those of other GPs. The Belgian GP, by contrast, has both high speeds and high lap times. These differences show up in the total number of race laps at each GP: the lap count is set so that the race covers a fixed distance (just over 305 km, with a shorter distance at Monaco) while staying within the maximum two-hour race rule.

The Fastest Lap

Final Positions of the Fastest Lap clocking Driver

The above plot shows that the race winner is not necessarily the driver who clocks the fastest lap. One reason is that the fastest lap carries an extra championship point, which makes it a target for drivers further down the order. If a driver is neither going to win nor at risk of losing the current position, changing tires in the last remaining laps and going for the fastest-lap point is a well-known strategy in F1. The following plot reflects this strategy: most drivers clock their fastest lap near the end of the race.

The Fastest Lap Number in the Race (for individual driver)
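Both observations can be checked with a few lines of pandas. The sketch below assumes a results table where rank == 1 marks the fastest lap of a race, positionOrder is the finishing position, fastestLap is the lap number on which the driver set it, and laps is the number of laps completed. These column names are assumptions and the rows are invented. In the real files, missing values may need cleaning before the comparison works.

```python
import pandas as pd

# Toy results for two races (assumed column names, made-up values).
results = pd.DataFrame({
    "raceId":        [1, 1, 1, 2, 2, 2],
    "positionOrder": [1, 2, 3, 1, 2, 3],
    "rank":          [2, 1, 3, 1, 3, 2],   # 1 = fastest lap of the race
    "fastestLap":    [40, 52, 35, 50, 30, 44],
    "laps":          [53, 53, 53, 52, 52, 52],
})

# Keep only the driver who set the fastest lap in each race.
fastest = results[results["rank"] == 1]

# How often the fastest-lap driver finished in each position.
position_counts = fastest["positionOrder"].value_counts().sort_index()

# How far into the race the fastest lap was set (1.0 = final lap).
lap_fraction = fastest["fastestLap"] / fastest["laps"]

print(position_counts)
print(lap_fraction)
```

Dividing by the race length makes circuits with different lap counts comparable on the same 0 to 1 scale.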

The Dominance: Winning Driver

Driver Vs GP Vs Wins

Some drivers are considered favourites to win a particular GP based on their past performance there. The above heat map shows such dominance. For example, Lewis Hamilton has won a large share of all the Hungarian GPs ever held. Similarly, the legendary German driver Michael Schumacher used to dominate the San Marino GP.
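The table behind a heat map like this is a cross-tabulation of winners by GP. A minimal sketch follows, assuming hypothetical gp, driver and positionOrder columns. The rows are illustrative only and do not reflect real win tallies.

```python
import pandas as pd

# Toy results across several seasons (made-up rows, assumed columns).
results = pd.DataFrame({
    "gp": ["Hungarian", "Hungarian", "Hungarian",
           "San Marino", "San Marino", "British"],
    "driver": ["Hamilton", "Schumacher", "Hamilton",
               "Schumacher", "Schumacher", "Hamilton"],
    "positionOrder": [1, 2, 1, 1, 1, 1],
})

# A win is a first-place finish.
winners = results[results["positionOrder"] == 1]

# Rows are drivers, columns are GPs, cells are win counts (0 if none).
wins = pd.crosstab(winners["driver"], winners["gp"])
print(wins)
```

The resulting table can be passed straight to a heat map function in most plotting libraries.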

The Dominance: Winning Constructor

This is an interesting plot showing the dominant performance of Ferrari at the Australian GP and McLaren at the Brazilian GP. A lot of credit goes to the driver pair, but judging the circuit parameters and defining the winning strategy are the constructor’s contribution.

Constructor Vs GP Vs Wins
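To go from the heat map to a single answer per circuit, one option is to pick the constructor with the most wins at each GP and its share of those wins. This is a sketch with invented rows that merely mirror the pattern described above. The gp and constructor column names are assumptions.

```python
import pandas as pd

# One row per race win (made-up rows, assumed columns).
wins_raw = pd.DataFrame({
    "gp": ["Australian", "Australian", "Australian",
           "Brazilian", "Brazilian", "Brazilian"],
    "constructor": ["Ferrari", "Ferrari", "McLaren",
                    "McLaren", "McLaren", "Ferrari"],
})

# GP x constructor table of win counts.
win_table = wins_raw.groupby(["gp", "constructor"]).size().unstack(fill_value=0)

# Most successful constructor per GP and the fraction of wins it holds.
top_constructor = win_table.idxmax(axis=1)
top_share = win_table.max(axis=1) / win_table.sum(axis=1)

print(top_constructor)
print(top_share)
```

Note that idxmax returns only the first label in case of a tie, so tied circuits would need separate handling.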

Conclusion:

F1 is a sport with intense competition. Even a fraction of a second can separate the winner from the rest. The large number of parameters and the need for the smartest strategy demand statistical and data science techniques.

This study might be just a drop in the ocean of F1 sports analytics, but it can surely help us get some insights into it.

Thanks a lot for reading!
