Custom GPTs - the Middlesex Fells Skyline Trail Analyzer

A case study in the use of Custom GPTs

Dec 09, 2023

In my previous post, I shared initial ideas on how to estimate the technicality of trails using GPX recordings of running data. These ideas were based on my recent race around the Middlesex Fells. Folks on Strava raised good questions and suggestions, and ChatGPT and I were listening. We're planning another rodeo in a week or so.

As preparation, I set up a Custom GPT named the "Middlesex Fells Trail Analyzer", a self-proclaimed "Expert in analyzing the Skyline Trail for runners, focusing on technical challenges", to help me out. I loaded it with GPX data from five years of my races, the ones for which I have Strava GPS recordings.

Figure 1. The front page of the Middlesex Fells Trail Analyzer Custom GPT.

To be honest, I'm still trying to figure out under what circumstances a Custom GPT is useful. I had a few false starts early on, and this is a moving target: Custom GPTs are evolving quickly.

Custom GPTs are an option currently available to ChatGPT Plus and Enterprise users. Users can create a ChatGPT front-end customized with privately held data and knowledge. In this case, my privately held data is my race data. If it goes well, I plan to add prompts, tips, and rules crafted for this application later on.

Additional information can be found in OpenAI’s Custom GPT announcement:

https://openai.com/blog/introducing-gpts

This kind of customization is developing rapidly; Microsoft Copilot Studio already offers a similar feature:

https://www.theverge.com/2023/11/15/23960417/microsoft-copilot-ai-studio-custom-gpts-chatgpt-openai

Yesterday's post was enough of a nuisance to manage with one ChatGPT session and one GPX file, never mind five years’ worth of data. And what if I wanted to incorporate more data and rules? This is where a "Middlesex Fells Trail Analyzer" Custom GPT might help. It has already made managing my GPX files simpler. I added the five years of data to its knowledge base and was immediately able to ask questions about the entire combined data set without having to load it into a ChatGPT session by hand.

I immediately discovered some strangeness. The number of points saved varied between GPX files by up to a factor of two. After some cursory Q&A with ChatGPT and Google, the cause appears to be a difference in sampling rate. I'm told that different phones with different settings can alter how many points are collected. I switched phones in the middle of this series, so that makes sense.
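
One way to check the sampling explanation is to look at the timestamps directly. The sketch below is only an illustration: it assumes each track point carries a time element and that the files use the GPX 1.1 namespace, and the function name median_sampling_interval is made up for this example.

```python
import xml.etree.ElementTree as ET
from datetime import datetime
from statistics import median

# Assumption: the files use the GPX 1.1 namespace.
GPX_NS = {"gpx": "http://www.topografix.com/GPX/1/1"}

def median_sampling_interval(gpx_path):
    """Return the median number of seconds between consecutive track points,
    or None if the file has fewer than two timestamped points."""
    root = ET.parse(gpx_path).getroot()
    times = []
    for el in root.findall(".//gpx:trkpt/gpx:time", GPX_NS):
        # GPX timestamps are ISO 8601 in UTC, e.g. 2023-12-02T14:00:05Z
        stamp = el.text.strip().replace("Z", "+00:00")
        times.append(datetime.fromisoformat(stamp))
    gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
    return median(gaps) if gaps else None
```

If one phone logged a point roughly every second and the other roughly every two seconds, that alone would account for a factor of two in point counts over a race of similar duration.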

Onwards. Some stats on the dataset currently in the "Middlesex Fells Trail Analyzer" knowledge base are provided below.

Figure 2. How many GPX points are in each file?

Figure 3. Distribution of distances between GPX points for each year.

Figure 4. What are the total distances covered by each file?

Appendix - Sampling of ChatGPT-generated code used here
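
The first snippet below calls count_gpx_points_et, which is not included in this sampling. A minimal sketch of what such a helper might look like follows, assuming the GPX 1.1 namespace and track points stored as trkpt elements, as in the other snippets. It is a reconstruction for illustration, not the original ChatGPT output.

```python
import xml.etree.ElementTree as ET

# Assumption: the files use the GPX 1.1 namespace, as in the other snippets.
GPX_NS = {"gpx": "http://www.topografix.com/GPX/1/1"}

def count_gpx_points_et(gpx_path):
    """Count the track points (trkpt elements) in a GPX file."""
    root = ET.parse(gpx_path).getroot()
    return len(root.findall(".//gpx:trkpt", GPX_NS))
```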

# Redefine file paths since they were lost in the previous cell
file_paths = [
    "/mnt/data/TARC_Winter_classic_2023.gpx",
    "/mnt/data/TARC_Winter_Fells_2022.gpx",
    "/mnt/data/Winter_Fells_2021.gpx",
    "/mnt/data/Winter_Fells_2019_snowyear.gpx",
    "/mnt/data/TARC_Winter_Fells_Ultra_2018.gpx"
]

# Re-run the count using ElementTree
# (count_gpx_points_et comes from an earlier cell that is not part of this sampling)
points_counts_et = [count_gpx_points_et(path) for path in file_paths]
points_counts_et



import xml.etree.ElementTree as ET  # needed by the functions below

import matplotlib.pyplot as plt
import numpy as np

def calculate_distances(gpx_path):
    """
    Calculate distances between consecutive GPX points in a file.
    Returns a list of distances in meters.
    """
    tree = ET.parse(gpx_path)
    root = tree.getroot()
    ns = {'gpx': 'http://www.topografix.com/GPX/1/1'}
    points = root.findall('.//gpx:trkpt', ns)

    distances = []
    for i in range(len(points) - 1):
        lat1 = float(points[i].attrib['lat'])
        lon1 = float(points[i].attrib['lon'])
        lat2 = float(points[i+1].attrib['lat'])
        lon2 = float(points[i+1].attrib['lon'])

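        # Equirectangular approximation: scale the longitude difference by
        # cos(latitude), since meridians converge away from the equator
        mean_lat = np.radians((lat1 + lat2) / 2)
        dlat = lat2 - lat1
        dlon = (lon2 - lon1) * np.cos(mean_lat)
        dist = np.sqrt(dlat**2 + dlon**2) * 111319.9  # approx. meters per degree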
        distances.append(dist)

    return distances

# Calculating distances for each file
distances_by_year = [calculate_distances(path) for path in file_paths]

# Plotting
plt.figure(figsize=(15, 8))

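# Labels follow the order of file_paths (there is no 2020 file)
years = [2023, 2022, 2021, 2019, 2018]

for i, distances in enumerate(distances_by_year):
    plt.subplot(2, 3, i + 1)
    plt.hist(distances, bins=50, alpha=0.7)
    plt.title(f'Year: {years[i]}')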
    plt.xlabel('Distance (meters)')
    plt.ylabel('Count')
    plt.yscale('log')  # Using log scale due to wide range of distances

plt.tight_layout()
plt.show()


from geopy.distance import great_circle

def calculate_total_distance(gpx_path):
    """
    Calculate the total distance of the route covered by a GPX file.
    Returns the total distance in kilometers.
    """
    tree = ET.parse(gpx_path)
    root = tree.getroot()
    ns = {'gpx': 'http://www.topografix.com/GPX/1/1'}
    points = root.findall('.//gpx:trkpt', ns)

    total_distance = 0
    for i in range(len(points) - 1):
        coord1 = (float(points[i].attrib['lat']), float(points[i].attrib['lon']))
        coord2 = (float(points[i+1].attrib['lat']), float(points[i+1].attrib['lon']))

        total_distance += great_circle(coord1, coord2).kilometers

    return total_distance

# Calculating total distance for each file
total_distances = [calculate_total_distance(path) for path in file_paths]
total_distances
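
The point counts and total distances can be combined into an average spacing per file, which gives a quick sanity check on the sampling differences noted earlier. This is a small sketch; the function name mean_point_spacing_m is hypothetical and not part of the original session.

```python
def mean_point_spacing_m(total_distance_km, n_points):
    """Average distance in meters between consecutive points of one file."""
    if n_points < 2:
        raise ValueError("need at least two points to compute a spacing")
    # n_points points define n_points - 1 segments
    return total_distance_km * 1000 / (n_points - 1)
```

For two recordings of the same course, a file with about half as many points should show roughly double the mean spacing.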

Nate’s AI-ish Substack
