Architecting a data platform

2024-10-01

When a company first builds its data platform, the focus is often on quick solutions that meet immediate needs rather than on designing an architecture that can scale. It’s understandable — startups move fast, and data infrastructure can seem secondary to just getting things done. The introduction to a post on the dlthub blog expresses this perfectly:

“Companies that build their first data stack usually start with a small investment — so it is unlikely they start with a senior team and the vision or resources needed to build a platform. Due to this, it’s more likely that the team will start with something that works and go from there. (…) This is far from a consistent, sustainable way to manage data pipelines or data flows”

Adrian Brudaru

This is, of course, true. We have seen many companies with data pipelines built by a Python programmer, a data analyst, an aspiring data scientist or a junior data engineer. They were probably quick and cheap to build, and we believe they did the trick at the time they were developed. But in the longer run they were neither extensible nor scalable. On top of that, they were unreliable and difficult to maintain.

Quick fixes won’t cut it for long

Naturally, as the company grows, not only does the volume of processed data grow, but the demand for information increases and more and more data use cases appear. These ad-hoc, stitched-together data processing systems cannot cope with the changes. They were not designed for that. In fact, often they were not really designed at all.

Very often the initial data stack has to be thrown away and completely reshaped to handle the new use cases and to match the grown organisation. This is a difficult decision to make, and it is typically postponed until the problem worsens. And maintaining the old system — which was not built to be low-maintenance — while building a new data platform is a task that overwhelms most data teams. There is a high risk that data operations will be hindered for many months. Moreover, this would happen exactly when demand is rapidly increasing. It is simply a risk for the business.

And we don’t blame the junior engineer or analyst who built the initial pipelines. They were building for “here and now” and, due to limited experience, could not fully foresee how the demand would evolve over time. They did the best job they could, given their level of expertise. It is just that their capacity to come up with a robust, scalable and extensible architecture was limited.

The other way

It does not have to be like that. You can get a proper data architecture from the beginning. That does not mean you have to build a fully fledged system from day one. It is more about making a few key decisions on what the target system should look like, which parts you have to prepare now, and which can wait. You still can (and should) cut some corners, but that should be a conscious decision.

And you don’t need to hire a full-time data architect to design the data platform — something which can be prohibitively costly at an early stage. There are ways to approach this more strategically, without over-committing resources. The key is to balance immediate needs with long-term growth, recognising that not everything needs to be built at once, but the parts you do build should be capable of scaling when the time comes.

One way to achieve this is by bringing in experienced guidance early on. You can get a Fractional Head of Data to help you build both your data team and your data architecture. By doing so, you ensure that from the start your systems are designed with growth in mind — modular and scalable. Designed not only for existing needs, but taking into account your company’s vision for the future. This kind of foresight allows you to avoid a complete rebuild later, which is often disruptive and costly. Instead, you can scale the system piece by piece when necessary, or replace individual pieces without redesigning the whole.

What about the team?

Moreover, the Fractional Head of Data can guide you in hiring the right talent as your company grows. If you are just starting to build your data team, then very likely there is no one in the company who can properly predict the team’s future needs and assess the technical skills of candidates. Cases where the founder or the CTO has recently moved from, say, a Director of Data position and actually has the domain-specific expertise are rare.

So how do you make sure that you are hiring the right people? An external Head of Data ad interim naturally understands the needs of both the present system and its future evolution. They can ensure that new hires — whether engineers or analysts — are well suited to support and expand the existing infrastructure. This not only reduces the learning curve for new team members but also helps avoid a scenario where the team’s skills don’t align with the architecture of the new system.

Building it smart

It’s quite likely that the workload while building the system will be very different from the typical workload while operating it. So may the set of skills needed to succeed. At the beginning there will be considerable effort in data architecture and data engineering, while analysts will have little to do. Once the system is closer to completion and operates without major hiccups, the focus may shift towards analytics, with engineering effort limited to standard maintenance and occasional enhancements.

It would make sense to get one or two extra data engineers, on a contract basis, at the beginning of the project. Not only can they provide more development capacity; most importantly, they bring in expertise and skills you may not yet have in the team. Ideally, they should be able to stay on for longer, with a much reduced engagement, to train your new hires and help with maintenance until your team is fully capable of taking it over. This ensures a smooth transition, and it is one of the reasons we offer Data Engineering as a Service.

Building it right without breaking the bank

We understand that early-stage companies need to be mindful of resources. Building the data platform perfectly from day one isn’t always realistic, and that’s OK. What’s important is having a clear idea of what the system should eventually look like, and making sure that the early choices — whether in design, tools, or processes — don’t create barriers to growth later on. This allows you to make the most of your initial investment, without needing to constantly rework or replace your early efforts.

By carefully considering how your data needs will evolve, you can avoid the “quick fix” mentality that often leads to headaches down the road. Whether it’s laying out a long-term data strategy or building efficient, scalable pipelines, the right choices early on will save you time, money, and effort as your company grows. A data strategy is not about predicting every potential future use case and requirement. It’s enough to foresee some of them and acknowledge that you will have to change some parts of the system along the way. Build it in a way that makes those changes easy to make without breaking anything.

And it doesn’t have to be overwhelming. With access to the right expertise, even on a part-time basis, you can build systems that serve your current needs without compromising future growth. This approach allows you to move forward with confidence, knowing your data platform will be able to keep pace as your organisation scales.