The Renewal Prediction Playbook: Key learnings from our data-driven journey

by Priyadarshini Tiwari
25 Sep 2024

Introduction

At .ie, our purpose is to enable and empower people, communities, and businesses to thrive online. One way we gauge whether Ireland is thriving digitally is by assessing net growth: the balance between newly created domains and those that are not renewed.

Maintaining a healthy net growth figure is essential for us, as it reflects the vitality of the .ie domain space. This metric provides insight into how well the local domain landscape is performing, helping us assess our contribution to Ireland’s digital growth.

However, domain renewals are hard to predict. Various factors such as market dynamics, user behaviour, and external circumstances affect whether a domain is renewed or allowed to lapse. This unpredictability presents challenges not only for businesses and individuals but also for registries like ours that are dedicated to supporting Ireland’s digital economy.

To address these challenges and maintain a healthy renewal rate, we initiated the Domain Renewal Predictive Model project.

Why predicting domain renewals is critical

Before diving into our project’s details, let’s first understand why predicting domain renewals is so crucial for businesses and organisations:

  • Cost optimisation: Being able to predict domains that are likely to expire allows registrars to focus on retaining ‘at-risk domains’, optimising resource allocation, and avoiding the financial strain of managing expired domains. Registries like .ie can offer predictive insights to support registrars, helping them target these at-risk domains more effectively and improve renewal rates.
  • Risk mitigation: High-value domains are vulnerable to drop-catching when they lapse. Accurate renewal predictions help mitigate this risk by ensuring valuable domains are renewed, safeguarding both the domain portfolio and business stability.
  • Strategic planning: Predictive models help registries and registrars identify domains at risk of expiring, enabling targeted renewal campaigns and proactive customer outreach. This strategic foresight supports financial planning, improves renewal processes, and ensures better anticipation of future cash flows.
  • Enhanced customer experience: Predicting renewals helps prevent service interruptions for domain owners, maintaining reliability and trust. Ensuring timely renewals strengthens customer satisfaction and reinforces the registry’s reputation for dependability.

Why we are doing this project

To address the challenges of domain renewals and better support Ireland’s digital growth, we launched the Domain Renewal Predictive Model project. Our main objectives for this initiative are:

  • Identify key factors: Understand the critical factors influencing domain renewals and deletions, including registration patterns, usage behaviour, and external influences.
  • Study behaviour: Analyse the behaviour of both registrants and registrars to better anticipate whether a domain will renew or lapse, focusing on decision-making trends.
  • Proactive issue management: Develop strategies to identify potential issues early, allowing for timely interventions to prevent domain loss and improve retention rates.

What data we worked with

Our domain renewal prediction model hinges on analysing a wide array of data that helps paint a clear picture of the factors influencing renewal behaviour. The data we utilised for this project included:

  • Domain registration details: Information such as domain age, renewal history, creation dates, and patterns around when domains were initially registered.
  • Technical attributes: Factors like DNS magnitude (domain traffic), SSL certificate usage, and security protocols such as DMARC (domain-based message authentication, reporting and conformance).
  • Registrar behaviour: We also examined patterns in how registrars manage their portfolios—an essential component, as registrars handle pricing and customer communication directly.

The initial stages of data collection revealed several challenges, particularly incomplete data on registrants and our limited direct access to certain variables managed by registrars. However, we gathered enough data to move forward.
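
To make the shape of this dataset concrete, here is a small, purely illustrative sketch of the kind of per-domain table these sources feed into. The column names and values are our own stand-ins, not the actual .ie schema:

```python
# Illustrative only: a toy per-domain table combining the three kinds of data
# described above (column names are hypothetical, not the real .ie schema).
import pandas as pd

domains = pd.DataFrame({
    "domain":         ["example-shop.ie", "oldsite.ie"],
    "creation_date":  pd.to_datetime(["2021-03-14", "2009-07-02"]),
    "renewal_count":  [2, 13],              # registration details
    "dns_magnitude":  [0.72, 0.05],         # technical: authoritative DNS traffic
    "has_ssl":        [True, False],
    "has_dmarc":      [True, False],
    "registrar_size": ["large", "small"],   # proxy for registrar behaviour
    "renewed":        [1, 0],               # target: was the domain renewed?
})

# Derive domain age in years from the creation date
snapshot = pd.Timestamp("2023-10-31")
domains["domain_age_years"] = (snapshot - domains["creation_date"]).dt.days / 365.25
print(domains)
```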

Our approach: How we tackled the problem

Armed with a broad set of data, we began to develop our predictive model using machine learning techniques. Our goal was to explore every possible factor that could influence whether a domain would renew, and through rigorous analysis, refine our model accordingly. Here’s how we approached the task:

  • Data collection, cleaning and preparation: After gathering the data described above, we found that it was often inconsistent or incomplete across sources, so we had to invest considerable time in cleaning and organising it to get a clear picture. Cleaning and standardising this data turned out to be a far bigger task than anticipated, and it taught us the importance of having robust data governance practices in place, including clear data provenance, before starting such a project.
  • Feature engineering: This phase involved creating “features” or data points that we believed would help predict domain renewals. For example, we looked at how old a domain was, whether it was linked to an active website, and if it used certain technologies like content management systems (CMS) or e-commerce platforms.
    However, not all features proved to be as valuable as we anticipated. Some elements that we initially thought would be crucial didn’t significantly impact our model’s accuracy, while others we hadn’t considered turned out to be quite important. This experience underscored a crucial lesson: our own biases and assumptions about what might be important can sometimes mislead us; the real test lies in rigorous testing and validation. Translating intuition into predictive models requires careful examination and an openness to adjusting our approach based on what the data reveals, rather than relying solely on our initial expectations.
  • Model selection & testing: We then experimented with different machine learning models to see which one would predict domain renewals the best. It was a bit like trying out different recipes to see which combinations create the best dish. We tested algorithms like Random Forest, Logistic Regression, AdaBoost, and XGBoost. Some algorithms that are generally reliable didn’t perform as well as expected in our specific context, teaching us that there’s no one-size-fits-all solution in machine learning.
    We followed the standard practice of splitting our data into training and testing sets. However, the key insight came from closely tracking metrics for both sets to avoid overfitting. By monitoring performance on both the training and test data, we identified when certain algorithms were overfitting—performing well on the training set but poorly on unseen data. This allowed us to make adjustments and better manage the issue (see the sketch below).
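
By way of illustration (a simplified sketch, not our actual pipeline), the example below trains several of the candidate algorithms on synthetic, imbalanced data and reports the same metric on both the training and test splits, so that overfitting shows up as a large gap between the two:

```python
# Simplified sketch of the model comparison step on synthetic data that mimics
# the ~83/17 renewal split; illustrative only, not the production pipeline.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import matthews_corrcoef
from xgboost import XGBClassifier

# Synthetic stand-in for the engineered features: class 1 = renewed (~83%)
X, y = make_classification(n_samples=5000, n_features=20,
                           weights=[0.17, 0.83], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=300, random_state=42),
    "adaboost": AdaBoostClassifier(random_state=42),
    "xgboost": XGBClassifier(eval_metric="logloss", random_state=42),
}

for name, model in candidates.items():
    model.fit(X_train, y_train)
    train_mcc = matthews_corrcoef(y_train, model.predict(X_train))
    test_mcc = matthews_corrcoef(y_test, model.predict(X_test))
    # A large gap between train and test scores is a sign of overfitting
    print(f"{name:20s} train MCC={train_mcc:.2f}  test MCC={test_mcc:.2f}")
```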

Key findings

Despite the challenges we faced, several significant discoveries emerged that shed light on the domain renewal landscape:

Key factors influencing domain non-renewals: Analysing domain renewals from November 2022 to October 2023, we explored 292,410 domains with an 83% renewal rate. Our goal was to identify key characteristics that predict whether a domain will be renewed or not. Several crucial factors emerged according to XGBoost, one of the algorithms we experimented with (a sketch of how such a feature ranking can be inspected follows the list):

  • Creation month & age: Older domains (10+ years) and those created in specific months are more likely to be renewed.
  • Redirection status: Domains that redirect from unsecured HTTP to HTTPS are less likely to be abandoned.
  • DNS magnitude: High DNS traffic makes domains much less likely to be deleted compared to low traffic.
  • Security measures: Domains without DMARC, SPF, or DKIM records are significantly more likely to lapse.
  • Content & CMS usage: Low-content domains and those not using a Content Management System (CMS) face higher deletion risks.
  • Domain length: Domains longer than 10 characters have a slightly higher chance of being deleted.
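
To show what this kind of inspection can look like in practice, here is a minimal sketch that reads a feature ranking off a fitted XGBoost model. The feature names and synthetic data are hypothetical stand-ins, and the built-in importances used here are only one of several ways to examine a model:

```python
# Illustrative only: rank engineered features by how heavily a fitted XGBoost
# model relies on them. Feature names and data are hypothetical stand-ins.
import pandas as pd
from sklearn.datasets import make_classification
from xgboost import XGBClassifier

feature_names = ["domain_age_years", "creation_month", "dns_magnitude",
                 "has_dmarc", "has_spf", "uses_cms", "domain_length",
                 "redirects_to_https"]
X, y = make_classification(n_samples=5000, n_features=len(feature_names),
                           weights=[0.17, 0.83], random_state=0)

model = XGBClassifier(eval_metric="logloss", random_state=0).fit(X, y)
ranking = pd.Series(model.feature_importances_, index=feature_names)
print(ranking.sort_values(ascending=False))   # most influential features first
```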

Boosting algorithms as favourites: We found that boosting algorithms, such as AdaBoost and Gradient Boosting, were particularly effective in our scenario. These models outperformed others by enhancing the prediction accuracy through iterative refinement, making them the preferred choice for our domain renewal predictions.

No clear patterns in failures: Initially, we found that failures in predictions lacked clear patterns, suggesting that non-renewal reasons were complex and varied. However, as we delved deeper, certain patterns began to emerge. For example, domains with low DNS traffic, missing security features, or those failing to use CMS were more likely to be at risk.

Feature engineering and evaluation metrics

During this process, we spent most of our time in two key areas: feature engineering and evaluation metrics. Both were crucial to developing an effective domain renewal prediction model, and here’s how each contributed to our process:

Feature engineering: This phase involved creating and refining the data points used by our model. We categorised our features into four main groups:

  • Domain-related: Attributes such as domain age, registration date, and the specific day and year of registration.
  • Registrant-related: Information about the registrant, including the type of registrant.
  • Website-based: Features indicating whether the domain had an active webpage.
  • Technical: Details such as SSL certificates, DNS magnitude (a measure of domain usage based on authoritative DNS traffic), and TLS settings.

We initially worked with 20 features but expanded to 38 as we refined our approach. This included extensive brainstorming sessions to identify useful features while navigating challenges like GDPR restrictions and the onboarding of a new registry system that introduced a new domain life cycle. We employed advanced techniques like Named Entity Recognition (NER) and Part-of-Speech (POS) tagging to analyse domain names. Despite these efforts, adding more features did not always result in significant performance improvements. Some features that we expected to be critical did not substantially impact accuracy, while others turned out to be unexpectedly important.
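
As a small illustration of the domain-name analysis mentioned above, the sketch below derives a few NER and POS based features from a hyphenated domain label using spaCy. Treat it as one possible approach with a deliberately simplified word segmentation, not a description of our exact stack:

```python
# Illustrative sketch: derive NER / POS features from a domain name with spaCy.
# Real labels usually need proper word segmentation; splitting on hyphens keeps
# the example small. Requires: python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")

def name_features(domain: str) -> dict:
    label = domain.removesuffix(".ie")
    # Capitalising the tokens helps spaCy's NER model spot place names
    text = " ".join(part.capitalize() for part in label.split("-"))
    doc = nlp(text)
    return {
        "num_tokens": len(doc),
        "contains_noun": any(t.pos_ == "NOUN" for t in doc),
        "contains_place_entity": any(e.label_ == "GPE" for e in doc.ents),
    }

print(name_features("dublin-coffee-roasters.ie"))
```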

Evaluation metrics: Initially, we used balanced accuracy, a metric designed to account for the balance between classes and give a more comprehensive view of model performance, especially on imbalanced datasets. Given that our data was highly skewed — with approximately 83% of domains renewing and only 17% not renewing — balanced accuracy seemed like a logical choice.

However, as we delved deeper, we realised that balanced accuracy didn’t fully capture the intricacies of our predictions. With our data so heavily skewed, a model that predicted the majority class (renewals) most of the time could still look acceptable on headline accuracy, and balanced accuracy on its own did not tell us how well we were identifying non-renewals or at what cost in false alarms. This limitation highlighted the need for more nuanced metrics to truly understand our model’s performance.

We then adopted the Matthews Correlation Coefficient (MCC), which provides a more balanced view by evaluating all categories of the confusion matrix: true positives, true negatives, false positives, and false negatives. MCC offered a clearer understanding of our model’s performance, particularly with class imbalance.

Subsequently, we introduced the F-score, which combines precision and recall into a single metric. This adjustment was strategically aimed at enhancing our ability to predict non-renewals accurately—an important aspect for making informed domain retention decisions. Tracking the F-score alongside MCC enabled us to better understand the trade-offs between identifying renewals and minimising false positives.
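
A tiny, self-contained example (with illustrative labels, where 1 means renewed and 0 means not renewed) shows how differently these metrics behave on a skewed sample, and why a model that simply predicts ‘renewed’ for everything stands out once MCC and the F-score on the non-renewal class are tracked:

```python
# Toy example of the metrics discussed above on an ~80/20 split.
# 1 = renewed, 0 = not renewed; values are illustrative only.
from sklearn.metrics import balanced_accuracy_score, matthews_corrcoef, f1_score

y_true   = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]    # 8 renewals, 2 lapses
y_lazy   = [1] * 10                          # predicts "renewed" for everything
y_better = [1, 1, 1, 1, 1, 1, 1, 1, 0, 1]    # catches one of the two lapses

for name, y_pred in [("always-renew", y_lazy), ("catches one lapse", y_better)]:
    print(
        name,
        "| balanced acc:", round(balanced_accuracy_score(y_true, y_pred), 2),
        "| MCC:", round(matthews_corrcoef(y_true, y_pred), 2),
        # pos_label=0 focuses the F-score on the non-renewal class
        "| F1 (non-renewals):",
        round(f1_score(y_true, y_pred, pos_label=0, zero_division=0), 2),
    )
```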

By concentrating on these two metrics, we improved our model’s accuracy and gained valuable insights into its performance. This holistic approach highlighted the importance of both thorough feature engineering and nuanced evaluation to meet our business objectives and enhance domain renewal predictions.

Setbacks: What we learned

Developing the domain renewal prediction model gave us valuable insights, especially when the model was tested on unseen data. While it was never deployed in a live environment, the testing phase uncovered several key challenges, which ultimately became significant learning opportunities.

When evaluated on unseen data, the model consistently predicted all domains as renewed. This result highlighted several critical issues:

  • Overfitting: The model performed well on training data but struggled to generalise to unseen data. It had become too attuned to the specifics of the training dataset, which was heavily skewed toward renewal cases.
  • Imbalanced data: With around 83% of domains renewing and only 17% not renewing, the dataset imbalance led the model to favour renewals, failing to predict non-renewals effectively.
  • False positives: The model’s tendency to predict all domains as renewed led to a high number of false positives. While it correctly identified many renewals, it failed to recognise domains that were at risk of non-renewal. This was problematic for making informed decisions about domain retention and resource allocation.
  • Model evaluation: The initial metrics we used, such as balanced accuracy, did not fully capture the model’s limitations; on their own they did not make its poor performance on non-renewals obvious.

Lessons learned:

  1. Handling class imbalance: Perfectly rebalancing a dataset is rarely practical. We tried undersampling and oversampling but saw limited improvement. Instead, we found that using appropriate metrics and selecting algorithms suited to imbalanced data were key (see the sketch after this list). This approach led to a more accurate assessment and better handling of class imbalance.
  2. Evaluation metrics: Relying solely on Balanced Accuracy was insufficient. Incorporating additional metrics, such as the Matthews Correlation Coefficient (MCC) and F-score, provided a more comprehensive view of the model’s performance. MCC helped us understand the model’s ability to handle imbalanced data, while the F-score allowed us to focus on improving the prediction of non-renewals.
  3. The real world is messy: Even after careful planning and testing, our model didn’t perform perfectly when tested against the unseen data. This was a valuable reminder that models need to be continuously monitored and updated to remain effective.
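
As referenced in the first lesson above, here is a minimal sketch (on synthetic data, with the minority class of non-renewals encoded as 1) of handling imbalance through the algorithms’ own weighting options rather than resampling:

```python
# Minimal sketch of handling class imbalance via algorithm options instead of
# resampling; synthetic data, with the minority class (non-renewal) encoded as 1.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from xgboost import XGBClassifier

X, y = make_classification(n_samples=5000, n_features=20,
                           weights=[0.83, 0.17], random_state=1)  # ~17% positives

# Option 1: scikit-learn reweights classes inversely to their frequency
logreg = LogisticRegression(max_iter=1000, class_weight="balanced").fit(X, y)

# Option 2: tell XGBoost how much extra weight to give the rare positive class
pos_weight = (y == 0).sum() / (y == 1).sum()   # roughly 83 / 17, i.e. about 4.9
xgb = XGBClassifier(scale_pos_weight=pos_weight,
                    eval_metric="logloss", random_state=1).fit(X, y)
```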

Looking ahead: Our future plans

While we’ve made substantial progress, we know there’s more work to be done. Moving forward, we plan to:

  • Expand data sources: Seek additional data sources to enhance both the quantity and quality of our data.
  • Enhance features: Develop and refine features to gain deeper insights.
  • Adopt advanced methodologies: Utilise tools like PyCaret to streamline model development and experimentation (see the sketch after this list).
  • Collaborate and innovate: Engage with stakeholders to uncover business insights and explore new areas as they emerge.
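
As an example of the streamlining mentioned above, a few lines of PyCaret can set up a classification experiment and compare candidate models. The dataframe below is a synthetic placeholder for our engineered feature table, and the snippet is a sketch rather than a finalised workflow:

```python
# Hypothetical PyCaret sketch; `domains_df` is a synthetic stand-in for the
# engineered per-domain feature table with a "renewed" target column.
import pandas as pd
from sklearn.datasets import make_classification
from pycaret.classification import setup, compare_models

X, y = make_classification(n_samples=2000, n_features=10,
                           weights=[0.17, 0.83], random_state=7)
domains_df = pd.DataFrame(X, columns=[f"feature_{i}" for i in range(10)])
domains_df["renewed"] = y

setup(data=domains_df, target="renewed", session_id=42)
best_model = compare_models(sort="MCC")   # rank candidates by Matthews corr. coef.
print(best_model)
```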

It’s been a very enjoyable and enriching experience, and we’re delighted to share that there’s much more to the story. Since we can’t fit everything into one blog post, we’ll be following up with more posts that delve deeper into our journey and insights. Stay tuned for more updates as we continue to explore and uncover new facets of domain renewal prediction!

Conclusion: Continuing the journey

This project has been a challenging yet rewarding journey in domain renewal prediction. We’ve learned valuable lessons from analysing critical features and refining our models, and we’ve made significant progress in understanding the factors that influence domain renewals.

Our exploration of various metrics and feature engineering has deepened our insights, and we continue to work on achieving robust predictive capabilities. Along the way, we’ve also challenged our own biases and understanding of the business, pushing the boundaries of what we know.


As the trusted national registry for over 330,000 domain names, .ie protects Ireland’s unique online identity and empowers people, communities and businesses connected with Ireland to thrive and prosper online. A positive driving force in Ireland’s digital economy, .ie serves as a profit for good organisation with a mission to elevate Ireland’s digital identity by providing the Irish online community with a trusted, resilient and accessible .ie internet domain. Working with strategic partners, .ie promotes and invests in digital adoption and advocacy initiatives – including the .ie Digital Town Blueprint and Awards for local towns, communities and SMEs. We provide data analytics and dashboards built by the .ie Xavier team to help with data-led decision-making for the public, registrars and policymakers. The organisation is designated as an Operator of Essential Services (OES) under the EU Cyber directive, and we fulfil a pivotal role in maintaining the security and reliability of part of Ireland’s digital infrastructure.
