By Pedro Ardila; Designer at Microsoft, Ironman competitor, Data Geek
Triathlon is a changing sport. Over the past twenty years, there has been a tremendous amount of change to equipment, training methodology, and racing strategy. No detail is too small for those athletes whose main goal is to qualify to a world championship event. There is no bigger stage in triathlon than the Ironman World Championships in Kailua-Kona, Hawaii.
What goes into qualifying to this event? In most cases, it is months—or even years—of training, tens of thousands of dollars in equipment and entry fees, and of course, lots of support from family and friends. One aspect that no athlete should leave out is strategy; and every athlete’s strategy starts with a couple questions:
- What race should I enter in order to have the best chance to make it to the big dance in Kona?
- What times do I need in the swim, bike, and run portions to have a chance to qualify?
I asked myself these questions, and decided to use my Microsoft Power BI knowledge to get some insights into what is needed to qualify. Here are the steps I followed:
Step 1: Get Data
I wanted to get some general demographic information to understand who enters these races. I also needed some recent race results to understand the characteristics of each race, and to get a better feel for the competition. I gathered the demographics data from USA Triathlon followed by a few thousands of rows worth of results for different races from ironman.com, and this is a lot of data. What made working with this data simple was using Excel and Power Query.
Let’s focus on how I gathered results from the Ironman website. Ironman does a great job storing the results for all its races. However, the pagination system makes it hard to export results for further analysis. Here is where Power Query becomes essential. Without Power Query, I would have to manually copy each page to an Excel spreadsheet, which is extremely tedious and time consuming. I decided to write a couple of queries that would go through each page and collect all the results for a race.
Query 1: GetData
Collects the results for a page given the page number. The second query, called KonaResults2023 (see below) will call GetData once for each page of results. We want to disable auto-loading to the worksheet for this query because KonaResults2013 will ultimately be responsible for getting the complete results into my model. Here is the code for the query:
let #"a"= (page) => let Source = Web.Page(Web.Contents("https://www.ironman.com/triathlon/events/americas/ironman/world-championship/results.aspx?p=" & Text.From(page))), /*replace this with the url for your desired race. Make sure to trim the url to the same spot: “https://...?p=” and leave out the other variables*/ Data0 = Source{0}[Data], ChangedType = Table.TransformColumnTypes(Data0,{{"Name", type text}, {"Country", type text}, {"Div Rank", type number}, {"Gender Rank", type number}, {"Overall Rank", type number}, {"Swim", type time}, {"Bike", type time}, {"Run", type time}, {"Finish", type time}, {"Points", type number}}) in ChangedType in a
Query 2: KonaResults2013
This query calls GetData n times, where n is the number of pages we need to go through. In this case, n = 107. The query also expands the results from GetData into a table and adds some formatting. I made sure to load the results straight to the data model as I was planning to visualize it using Power View. Once I ran this query –and a few more for the other races I wanted to analyze—I was ready for some analysis. Check out the code:
let Source = {1..107}, /*replace 107 with the last page on your desired race.*/ TableFromList = Table.FromList(Source, Splitter.SplitByNothing(), null, null, ExtraValues.Error), InsertedCustom = Table.AddColumn(TableFromList, "Custom", each GetData([Column1])), #"Expand Custom" = Table.ExpandTableColumn(InsertedCustom, "Custom", {"Name", "Country", "Div Rank", "Gender Rank", "Overall Rank", "Swim", "Bike", "Run", "Finish", "Points"}, {"Custom.Name", "Custom.Country", "Custom.Div Rank", "Custom.Gender Rank", "Custom.Overall Rank", "Custom.Swim", "Custom.Bike", "Custom.Run", "Custom.Finish", "Custom.Points"}), RemovedColumns = Table.RemoveColumns(#"Expand Custom",{"Column1"}) in RemovedColumns
One great thing about the pattern above is that it can be used for any paginated table as long as the page number is passed through the URL. All of the transformations in the query (renaming columns, etc.) can be done through the Power Query UI, so the amount of code I ended up typing was minimal.
Step 2: Visualize
Next step was to visualize the data. I used Power View to create a series of charts. By using these charts, I could begin to not only see the results, but also begin to gather insights.
Step 3: Gather Insights
Here are some of the interesting things I found, as well as pictures for the different Power View report above.
The first insights came from the demographics of triathlon in the United States:
– There is a heavy concentration of triathletes in the east coast and California
– Triathlon participation is higher in Michigan than Colorado
The next set of insights came from slicing and dicing finisher data for a few different Ironman races. First of all, let’s look at Kona:
Here we can see the size of each age group participating, as well as the average completion time for each age group. To dig deeper into the data, I clicked on the ‘M30-34’ group in the horizontal “Average of OverallInHours by Division” chart, which cross-filters our data. This step shows us that the wining participant for the Age Group of males between age 30 and 34 finished in an astonishing time of 8.62 hours (or 8:37 minutes for those of us without a calculator).
Now, let’s look at some of the races I am considering signing up for, and compare their 2013 results against one another. We will also include Kona for to get some perspective, even though it is a qualifying-only event.
Here we start getting clues about the relative ‘toughness’ of each race. For instance, it is evident that Ironman Lake Tahoe favors strong climbers. It is a tough race overall. Here we can see that 20% of participants DNF’d (did not finish) at Lake Tahoe. Here are some more insights:
– Lots of people sign up but don’t start at Ironman Cozumel. My theory is that lots of people pull out of this race due to its proximity to Thanksgiving.
– The average finishing times for Ironman Canada and Ironman Cozumel are pretty close despite having very different elevation numbers. This doesn’t mean that elevation is not a factor. Instead, it tells us that there may be other challenges at Cozumel not accounted for in our data. In this case, those challenges are the added winds and humidity of Cozumel.
– There is a huge gap between Kona and the other races as far average finish times goes. This is primarily due to the selective nature of a World Championship event.
– Ironman Canada offers a balanced ride and a challenging run. This course seems to suit me, given that I am relatively light—meaning I don’t need to exert lots of effort while climbing on the bike—and I can run well in challenging courses.
With this information at hand, we can now look at some specifics for Ironman Canada, such as the time breakdowns for swimming, biking, and running, and average finishing time for each age group. We will focus in on my Age group (25-29) and compare the average bike vs average run times using a scatter plot.
The scatter plot has a clear trend. It is that most people are able to balance their efforts on the bike and the run. There are, however, some outliers. These could be people who were slow on the bike (perhaps due to mechanical or nutrition issues) but pulled together a great run, or people who had a really strong bike, but faded on the run. Some additional insights:
– Ironman Canada’s fastest age group was different from Kona’s (M35-39 in Canada vs. M30-34 in Kona)
– Age groups M35-39 and M30-34 were faster on average than the Female Pros.
– The first four people finished in under 10 hours, and were all in the top 100 overall. This means that to have a reasonable chance to qualify, I would have to finish in about 9 hours and 45 minutes.
This last insight is extremely important, and it will help me set up objective goals for my next training season.
Actionable Insights: How Should I Move Forward with my Training?
Using Power BI I was able to quickly gather use big data, and then slice it and dice it until I found the answer to my original question: What time do I need to qualify to Kona? The answer is 9:45 or better. Time to start training.
Ready to uncover your own insights?