Date: 27.03.2024

by Mateusz Mazur

Last update: 27.03.2024 13:32

Scaling Horizons: Navigating High-Traffic Challenges with createIT’s Tech Visionaries

In this exclusive interview, we delve into the intricate world of scaling websites for increased traffic with Alex Fredrych, Co-founder & CTO, and Bartek Borkowski, Co-founder & CSO of createIT. These seasoned experts shed light on the multifaceted challenges and innovative solutions in the realm of web development and performance optimization.

What are the most common technical challenges encountered when scaling websites to handle increased traffic?

Alex Fredrych (AF): Before I delve into the technical details: in my opinion, the biggest challenge is managing scaling comprehensively. Grasping all the technical and business aspects is the kind of challenge that requires preparing both the application and the organization itself.

First and foremost, we think about it too late – usually when the website is already experiencing performance issues, and we are losing not only traffic but also customer trust. Some of the problems can be temporarily solved by scaling the infrastructure, if our hosting allows it.

Then the quality of the code comes into play. Code that used to be harmless, such as loading all users into memory and searching through them for the necessary information, now instantly exhausts available memory. Piecemeal code optimization often introduces new errors, so an optimization plan and a detailed testing protocol should be an integral part of the entire process. Any platform that experiences an increase in traffic, and consequently more challenging updates, should implement so-called zero-downtime deployments. Additionally, Application Performance Monitoring (APM) tools such as New Relic or Datadog are very helpful in diagnosing performance issues.
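To illustrate the memory point above, here is a minimal Python sketch (the data source and function names are hypothetical) showing how streaming records in fixed-size batches keeps memory bounded by the batch size, instead of loading an entire table at once:

```python
from typing import Dict, Iterator, List, Optional

# Hypothetical in-memory data standing in for a real database table.
USERS = [{"id": i, "email": f"user{i}@example.com"} for i in range(10_000)]

def fetch_users_page(offset: int, limit: int) -> List[Dict]:
    """Stands in for a LIMIT/OFFSET (or keyset) query against the database."""
    return USERS[offset:offset + limit]

def iter_users(batch_size: int = 1000) -> Iterator[Dict]:
    """Stream users in fixed-size batches instead of loading the whole table."""
    offset = 0
    while True:
        page = fetch_users_page(offset, batch_size)
        if not page:
            return
        yield from page
        offset += batch_size

def find_user_by_email(email: str) -> Optional[Dict]:
    # Memory usage is bounded by batch_size, not by the size of the table.
    for user in iter_users():
        if user["email"] == email:
            return user
    return None
```

In a real application the paging would be done by the database driver or ORM; the point is that the search never holds more than one batch in memory.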

Another challenge is the application servers and their CPU and RAM usage. We distinguish two types of scaling here: vertical scaling, which involves expanding an existing server with additional processors (CPU) or RAM, and horizontal scaling, which involves adding more servers with similar configurations. While vertical scaling is the easier option because it does not require changes to the application architecture, one must be aware that it is a temporary solution. As the application grows, which is everyone’s goal, the problem will return with multiplied force. Horizontal scaling, in turn, brings a number of other challenges, including load balancers, distributed session management, distributed file systems for uploads, distributed caching, centralized logging and monitoring, and the implementation of security policies across all application layers and infrastructure. Therefore, before starting this process, it is worth checking whether we have the necessary expertise for both the implementation and the subsequent continuous management of an increasingly complex system.
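As a toy illustration of the horizontal-scaling setup described above, the sketch below (with hypothetical server names) shows the round-robin strategy a load balancer might use to spread requests across a pool of identically configured servers. In practice this job is done by dedicated software such as nginx or HAProxy, or a managed cloud load balancer:

```python
import itertools

# Hypothetical pool of identically configured application servers.
SERVERS = ["app-1.internal", "app-2.internal", "app-3.internal"]

class RoundRobinBalancer:
    """Hands each incoming request to the next server in the pool."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def route(self) -> str:
        return next(self._cycle)

balancer = RoundRobinBalancer(SERVERS)
# Six requests are spread evenly: each server receives exactly two.
assignments = [balancer.route() for _ in range(6)]
```

Once requests can land on any server, session state can no longer live in one machine’s local memory, which is exactly why distributed session management appears among the challenges listed above.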

Additionally, as traffic increases, the application will perform more background work asynchronously – from normalizing uploaded files to generating reports. Another natural consequence of application growth is the need for mechanisms such as queues to distribute tasks and handle any exceptions. As we grow, we must also ensure that our third-party providers are able to handle our requests in a timely manner. This is also the moment to plan for graceful degradation, meaning designing the application in such a way that even in the event of component failures, the platform continues to function. We employ this strategy not only for external services but also for internal ones.
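Graceful degradation can be sketched as follows; this is a simplified, hypothetical example (the provider call and the data are invented) in which a failing third-party call falls back to the last known good response instead of taking the page down:

```python
class ThirdPartyTimeout(Exception):
    """Raised when the external provider does not respond in time."""

def fetch_live_odds() -> dict:
    """Stands in for a call to an external provider that may fail or time out."""
    raise ThirdPartyTimeout("provider did not respond in time")

# Last successful response, kept as a degraded fallback.
_last_known_odds = {"match_1": 2.1}

def get_odds() -> dict:
    try:
        odds = fetch_live_odds()
        _last_known_odds.update(odds)
        return odds
    except ThirdPartyTimeout:
        # Graceful degradation: serve stale data, flagged as such,
        # rather than propagating the failure to the whole page.
        return {**_last_known_odds, "stale": True}
```

A real implementation would add timeouts, retries, and a circuit breaker, but the principle is the same: a component failure degrades the feature, not the platform.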

Remember that we will gain much more by employing a mix of different optimization techniques rather than focusing solely on one.

How can modern technologies help in managing the performance of websites with high traffic?

AF: Modern technologies, in general, streamline and optimize the management of website performance. Currently, the most popular solutions in this area are cloud-based solutions. When properly configured, the cloud enables virtually unlimited performance scaling. However, in practice, companies often opt for a hybrid approach, which combines cost-effective operation on physical servers with autoscaling. This approach allows for flexible adjustment of infrastructure to varying loads. Autoscaling itself is one of the key tools that optimize costs and eliminate the risk of resource overutilization.

In addition, performance monitoring tools such as New Relic allow for the identification of issues and performance optimization, resulting in better user experiences and more effective marketing actions. With this approach, systems can effectively respond to dynamic traffic changes and ensure stable operation even in the event of significant increases in user activity.

Which methods can be employed to ensure the security of user data during an increase in website traffic?

Bartek Borkowski (BB): The increase in website traffic brings not only benefits but also challenges, especially in ensuring the security of player and company data. As the platform becomes more popular, the threat level grows and attacks become more sophisticated. Small and medium-sized companies are usually exposed to generic attacks targeting specific technologies or platform types. However, as a casino grows, it becomes more attractive for targeted attacks, both technical and social-engineering-based, aimed at both players and employees.

Defending against such precise attacks and building effective security measures is a comprehensive process involving not only actions at the application level but also organizational and administrative aspects. It is worth noting that higher website traffic means greater consequences in the event of service failure or unavailability. Therefore, systematic monitoring of costs associated with service downtime is essential to understand the criticality of platform stability.

It is also important to consider security beyond deliberate attacks on the platform. It is advisable to spread the infrastructure across several data centers to ensure the stability of the entire system. Implementing such a strategy minimizes the risk of the entire platform going down if a single server room fails due to major and unexpected events such as earthquakes, riots, or changes in government policy. Equally important for security are organization and procedures.

Comprehensive security procedures covering the entire organization, regular training of employees on social engineering threats such as phishing, and routine procedure reviews are necessary. On the technical side, it is important to implement solutions such as a Web Application Firewall (WAF), a Demilitarized Zone (DMZ), firewalls, load balancers, and ACLs, and to run regular exercises verifying that backup and restore procedures actually work. Proper server architecture, including continuous updates and maintenance, also allows effective management of the risk associated with failures and ensures uninterrupted operation in the event of an attack.

How can cloud computing technologies support scaling and managing high traffic on websites?

AF: Cloud computing technologies are the answer to a rapidly changing market and increasing traffic, and they can greatly support scaling and managing high website traffic. Providers now offer numerous tools for solving these problems, each with its own limitations. We will discuss them based on Amazon Web Services (AWS) solutions, but similar services can be found with other providers such as Google Cloud Platform (GCP) or Microsoft Azure.

Currently, there are two popular directions for building scalable solutions: Kubernetes and serverless infrastructure.

Kubernetes, also known as K8s, is an open-source system for automating the deployment, scaling, and management of containerized applications. Applications are launched from previously prepared containers, which contain all the required libraries and dependencies, eliminating any differences between the development and production environments. Additionally, applications can be installed on public/private clouds or in hybrid environments.

Kubernetes can automatically scale pods (groups of one or more containers) based on CPU and memory usage. In case of pod failures, it restarts them and ensures they are running correctly. The scheduler places pods on servers with sufficient resources, while built-in load balancing distributes traffic across the necessary number of pods; the cluster can also automatically add and remove servers from its pool. The entire Kubernetes configuration is stored in YAML files, making it possible to set up a new cluster in a matter of minutes – including a cluster with a different provider, such as GCP, or on bare metal. Therefore, vendor lock-in, which many companies fear, is not an issue here.

An alternative is serverless, a cloud-native development model that allows developers to build and run applications without managing servers. In this case, the cloud takes care of scaling our application and ensuring that it has enough resources. Scaling usually happens near-instantaneously, allowing us to handle virtually any amount of traffic. Additionally, in theory, we only pay for the resources used, such as CPU time or memory, so an application with lower traffic can be much cheaper to maintain compared to traditional hosting.

Serverless has its limitations, such as execution time limits, memory limits, cold starts (if no one uses the application for some time, it may take a few seconds to wake it up), and more difficult debugging. Migrating applications to serverless is also more challenging than migrating them to containers. It also involves vendor lock-in, so awareness of this aspect is crucial when adopting the technology.

In projects, it is common to use a combination of Kubernetes to handle standard requests and Serverless for smaller and faster asynchronous tasks (e.g., handling tasks in queues). This approach allows us to handle unexpected peaks at relatively low cost, fully automated, without delays associated with scaling Kubernetes infrastructure.

Once we have decided how our application will be deployed, the cloud can also assist us with managed services, where we can utilize a service without having to manage it:

  • Database – for example, RDS, supporting read replicas, autoscaling, cross-region deployments, and a managed proxy (RDS Proxy).
  • Queues – SQS – capable of handling millions of messages per minute, distributed across multiple servers, with access control policies.
  • Notifications – SNS – enables notifying thousands of channels simultaneously, including email, SMS, and push notifications.
  • Storage – S3 – virtually unlimited, secure object storage, with optional replication to other regions.
  • Computing – EC2 – server instances suited to various needs, whether CPU- or GPU-intensive.
  • Big Data – Amazon Redshift, Amazon Athena, Amazon Kinesis, and much more.

How can website traffic be effectively monitored and analyzed in order to optimize performance?

BB: There are many ways to effectively monitor and analyze website traffic. We typically encounter two dimensions of analysis.

The first is conducted using analytical tools such as Google Analytics 4, Hotjar, and Microsoft Clarity, which provide insights into user behavior. They reveal what users are looking for, where they are searching, and whether they are following an optimal behavior scenario within the system. Importantly, they also provide information on which elements need improvement to enhance efficiency and help players achieve their goals faster.

The second dimension of analysis is load analysis. We examine the traffic pattern and assess which components of the system are under load, then optimize them. Here, it’s worth mentioning two popular performance optimization methods: caching, which reduces the number of database queries to make the website faster, and implementing a Content Delivery Network (CDN), where multimedia files are stored in multiple geographical locations to shorten the time it takes for users to receive data.
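The caching idea can be sketched with a simple time-to-live (TTL) cache in Python. This is an illustrative example only (real systems typically use Redis or Memcached), with a counter added to demonstrate that the expensive query runs only once within the TTL window:

```python
import time
from typing import Any, Callable, Dict, Tuple

# key -> (timestamp, value)
_cache: Dict[str, Tuple[float, Any]] = {}

def cached(key: str, ttl_seconds: float, compute: Callable[[], Any]) -> Any:
    """Return a cached value if still fresh; otherwise recompute and store it."""
    now = time.monotonic()
    entry = _cache.get(key)
    if entry is not None and now - entry[0] < ttl_seconds:
        return entry[1]
    value = compute()  # e.g. the expensive database query
    _cache[key] = (now, value)
    return value

calls = {"count": 0}

def expensive_query():
    """Stands in for a slow database query; counts how often it runs."""
    calls["count"] += 1
    return ["row1", "row2"]

first = cached("top_games", ttl_seconds=60, compute=expensive_query)
second = cached("top_games", ttl_seconds=60, compute=expensive_query)  # from cache
```

The second call is served from memory, so the database sees one query instead of two; at a million requests, that difference is what keeps the site fast.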

How is the role of UX/UI design changing in the context of high-traffic websites, and what are the best practices?

BB: In the case of high-traffic websites, the role of UX takes on additional significance. It is no longer just about designing an interface that allows users to achieve their goals in the simplest and fastest way possible. It also involves ensuring the computational efficiency of the system.

If generating a certain piece of information for one user takes 0.1 seconds of processor time, then with a million users it adds up to over 27 hours of CPU time. In this context, the UX process must be data-driven and closely aligned with developers and system administrators. Is the provided information crucial in that particular location? What is the ROI: is the cost of generating that information worth the value it brings to the user?
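The arithmetic behind the figure quoted above is simple, using only the numbers from the text:

```python
# Back-of-the-envelope cost of one UI element at scale.
cpu_seconds_per_user = 0.1
users = 1_000_000

total_cpu_hours = cpu_seconds_per_user * users / 3600  # 100,000 s ≈ 27.8 h
print(round(total_cpu_hours, 1))  # prints 27.8
```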

Let us remember that UX processes should not be executed as one-time tasks. It is an ongoing process of observing user behavior on the website. We have seen cases where a seemingly “dead” – rarely visited and unoptimized – subpage suddenly starts generating significant traffic for various unrelated reasons. Perhaps such a page should be handed over to developers for optimization? Or maybe it needs to be divided into smaller sections? Perhaps other parts of the website need to be revamped to reduce its popularity once again?

Considering createIT’s experience in the iGaming sector, what unique challenges does this industry face in relation to increased website traffic, and how do companies tackle them?

AF: As mentioned before, increased website traffic leads to a higher risk of targeted hacking attacks (both technical and social engineering). Only comprehensive solutions at technical and organizational levels can help mitigate these risks.

A significant portion of iGaming operators’ websites run on white-label solutions. Without criticizing these solutions, which can be technically sound, it is hard to ignore that most are not fully prepared for sudden surges in the popularity of the launched casino or betting service. Basic design and hosting mistakes made by white-label creators still have consequences for casino operators and players. In such situations, companies migrate their services to other engines, including fully custom solutions.

An interesting issue is the scalability of the architecture. Some jurisdictions mandate hosting the entire system or parts of it on machines physically located in a specific geographic area. However, it may not be obvious to everyone that such services can still be scaled using conventional cloud solutions (e.g., AWS) by allocating certain aspects of the system there.

How does createIT’s philosophy of “createITzation” influence the company’s approach to problem-solving in this field?

BB: The foundation of our philosophy is the combination of skills and experience with the unique personalities that make up the team. The company’s personality allows us to have better, closer communication with clients, and thus a better understanding of the problem and, in effect, of the expected outcomes. The iGaming sector is one of the key markets we operate in, and we have a dedicated team that quickly adapts solutions to the problem at hand. Their experience also allows us to think technologically a few steps ahead.