Clean Architecture: Getting Started

First of all, define exactly what you want to abstract and, most importantly, why.
If there is no clear "why", don't do it!
Abstraction makes your code less readable and potentially slower.

The goal of abstraction is to minimize the cost of future changes by removing as much coupling as possible between different parts of your project.
An added benefit is that it makes your code more modular, which improves its testability.

Repository Pattern

I will now apply clean architecture to the database side of my project. This uses a design pattern called the Repository Pattern, which is about decoupling the business logic and data access layers of the application.

As a result, since there is no coupling between the database logic and its implementation, you will be able to freely make changes in the future, like switching the database from MySQL to MongoDB, without needing to change much in the rest of the application.. which is awesome!

In the following steps we will be dealing with a simple database of users, where each user has an id, a name and an email.

Steps:

  1. Define the Entities

Entities are the core building blocks of your data model.

from dataclasses import dataclass

@dataclass
class User:
    id: int
    name: str
    email: str

entities.py

  2. (Optional) Add basic data validation directly to the Entities

This is my personal preference. You can argue that it's better to keep the entities clean and to add the data validation in another layer.

But I think that having data validation at the entity level means that if there is invalid data, we immediately get an exception; the invalid data will not travel to another layer, it is stopped on the spot.

I'm talking here about basic data validation that does not require importing any external package. Heavier validation that needs external packages should be done in a higher layer, not at the entity level.

Here's how you can implement validation directly in the User dataclass using the __post_init__ method, a special dataclass method that runs right after the __init__ method:

from dataclasses import dataclass

@dataclass
class User:
    id: int
    name: str
    email: str

    def __post_init__(self):
        if len(self.name) > 10:
            raise ValueError("User name must not exceed 10 characters")
        if '@' not in self.email:
            raise ValueError("Email must contain '@'")

entities.py

Note that the __init__ method runs behind the scenes; it is generated by the @dataclass decorator.
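To make this concrete, here is a quick standalone sketch (the dataclass from above is repeated so the snippet runs on its own):

```python
from dataclasses import dataclass

@dataclass
class User:
    id: int
    name: str
    email: str

    def __post_init__(self):
        # Runs right after the generated __init__, so invalid data
        # never makes it past construction.
        if len(self.name) > 10:
            raise ValueError("User name must not exceed 10 characters")
        if '@' not in self.email:
            raise ValueError("Email must contain '@'")

valid = User(1, "sara", "sara@test.com")  # constructed normally

try:
    User(2, "bob", "not-an-email")        # stopped on the spot
except ValueError as e:
    print(e)  # Email must contain '@'
```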
  3. Define the Repository interface

Create an interface for the repositories that will handle database operations. This interface should be defined in terms of domain logic, not specific database operations.

from abc import ABC, abstractmethod
from typing import List
from entities import User

class UserRepository(ABC):
    @abstractmethod
    def add(self, user: User) -> None:
        pass

    @abstractmethod
    def get(self, user_id: int) -> User | None:
        pass

    @abstractmethod
    def list(self) -> List[User]:
        pass

    @abstractmethod
    def remove(self, user_id: int) -> None:
        pass

UserRepository.py

This interface is the blueprint that all database services must adhere to: SQL, NoSQL, a simple JSON file, or whatever database you have.

  4. Create a custom plug adapter for the database

I call it a plug adapter because that is what it practically does: it couples the third-party database methods, with their unique naming and styles, to our own interface.

As the database, I will first use a simple .json file. Here is the plug adapter for it:

import json
from dataclasses import asdict
from typing import List
from entities import User
from repository.UserRepository import UserRepository

class Json_repo(UserRepository):
    def __init__(self, file_path: str, table_name: str):
        print("initializing json DB")
        self.file_path = file_path
        self.table_name = table_name

    def _get_data(self):
        with open(self.file_path) as f:
            data = json.load(f)
            if self.table_name not in data:
                raise ValueError(f"Data in json file does not have the key '{self.table_name}'")
            return data

    def _write_data(self, data):
        with open(self.file_path, "r+") as f:
            f.seek(0)
            json.dump(data, f, indent=4)
            f.truncate()

    def add(self, user: User) -> None:
        all_data = self._get_data()
        all_data[self.table_name].append(asdict(user))
        self._write_data(all_data)

    def get(self, user_id: int) -> User | None:
        for record in self._get_data()[self.table_name]:
            if record["id"] == user_id:
                return User(**record)
        return None

    def list(self) -> List[User]:
        return [User(**record) for record in self._get_data()[self.table_name]]

    def remove(self, user_id: int) -> None:
        all_data = self._get_data()
        all_data[self.table_name] = [
            record for record in all_data[self.table_name] if record["id"] != user_id
        ]
        self._write_data(all_data)

json_repo.py

Note how it implements all of the required methods in our UserRepository interface.
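For this adapter to work, the JSON file must already contain a top-level key matching table_name. A minimal data.json for the "users" table (the exact contents here are an assumption, inferred from the code above) could look like this:

```json
{
    "users": [
        {"id": 1, "name": "sara", "email": "sara@test.com"},
        {"id": 2, "name": "omar", "email": "omar@test.com"}
    ]
}
```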

We can use it directly in our main script, but it's better if we add just one more layer above it 😅

  5. (Optional) Add another layer for additional functionality

from entities import User
from repository.UserRepository import UserRepository
from repository.json_repo import Json_repo

db: UserRepository = Json_repo("./repository/json_files/data.json", "users")

def add_student(student: User):
    db.add(student)

def get_all_students():
    return db.list()

def get_student_by_id(user_id: int):
    return db.get(user_id)

db.py

You are probably wondering what the hell this useless layer is for..

I believe it's better to have separation of concerns when we need additional validations and configurations. Such functionality may not logically fit in the plug adapter script json_repo.py; therefore, we add this layer.

This db.py will be what we'll be using in our main application scripts to interact with the database.
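As an example of the kind of validation that belongs in this layer, here is a sketch of a duplicate-id check. This helper is my own illustration, not part of the original db.py, and an in-memory stand-in replaces Json_repo so the snippet is self-contained:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class User:
    id: int
    name: str
    email: str

# Stand-in repository so the example runs without a JSON file.
class InMemoryRepo:
    def __init__(self):
        self._users: List[User] = []

    def add(self, user: User) -> None:
        self._users.append(user)

    def get(self, user_id: int) -> Optional[User]:
        return next((u for u in self._users if u.id == user_id), None)

    def list(self) -> List[User]:
        return list(self._users)

db = InMemoryRepo()

def add_student(student: User) -> None:
    # Extra validation that doesn't fit in the plug adapter:
    # reject duplicate ids before they reach the storage layer.
    if db.get(student.id) is not None:
        raise ValueError(f"A user with id {student.id} already exists")
    db.add(student)

add_student(User(1, "sara", "sara@test.com"))
print(len(db.list()))  # 1
```

Calling add_student a second time with the same id now raises a ValueError before anything touches the storage layer.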

  6. Use it!

Here we go, let's test it out:

from entities import User
from repository.db import add_student, get_all_students, get_student_by_id

if __name__ == "__main__":
    new_user = User(3, "ahmad", "ahmad@test.com")
    add_student(new_user)

    all_students = get_all_students()
    print(all_students)
    
    our_student: User | None = get_student_by_id(2)
    if our_student:
        print(our_student)

main.py

With this implementation, we don't need to touch our main.py for any database-related logic.

Need to use a different kind of database? No problem: create a custom plug adapter for it, like json_repo.py, and import it in db.py
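For example, here is a sketch of a plug adapter backed by SQLite, using Python's standard sqlite3 module. The class name SqliteRepo and the table layout are my own choices, and the entity and interface are repeated so the snippet runs standalone; in the real project you would import them and only swap the adapter in db.py:

```python
import sqlite3
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class User:
    id: int
    name: str
    email: str

class UserRepository(ABC):
    @abstractmethod
    def add(self, user: User) -> None: ...
    @abstractmethod
    def get(self, user_id: int) -> Optional[User]: ...
    @abstractmethod
    def list(self) -> List[User]: ...
    @abstractmethod
    def remove(self, user_id: int) -> None: ...

class SqliteRepo(UserRepository):
    def __init__(self, db_path: str):
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS users"
            " (id INTEGER PRIMARY KEY, name TEXT, email TEXT)"
        )

    def add(self, user: User) -> None:
        self.conn.execute(
            "INSERT INTO users VALUES (?, ?, ?)", (user.id, user.name, user.email)
        )
        self.conn.commit()

    def get(self, user_id: int) -> Optional[User]:
        row = self.conn.execute(
            "SELECT id, name, email FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        return User(*row) if row else None

    def list(self) -> List[User]:
        return [User(*row) for row in self.conn.execute("SELECT id, name, email FROM users")]

    def remove(self, user_id: int) -> None:
        self.conn.execute("DELETE FROM users WHERE id = ?", (user_id,))
        self.conn.commit()

# The rest of the application is untouched: only this line changes.
db: UserRepository = SqliteRepo(":memory:")
db.add(User(1, "sara", "sara@test.com"))
print(db.get(1))
```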

Conclusion


The principles and practices of clean code architecture, specifically through the Repository Pattern, can greatly enhance the maintainability and flexibility of your codebase. By clearly defining the purpose and necessity of abstraction, you ensure that your code remains readable and efficient. The decoupling of business logic from data access not only simplifies future changes, such as switching databases, but also makes the code more modular and testable.

Implementing entities with built-in data validation ensures data integrity early on, preventing erroneous data from propagating through the system. The Repository interface standardizes database operations, allowing for easy substitution of different data sources without altering the core business logic. Creating custom plug adapters, like the JSON repository example, demonstrates the practicality of this approach, enabling seamless integration of various storage solutions.

Finally, adding an additional layer for functionality above the plug adapter provides a clear separation of concerns, facilitating further customization and configuration without cluttering the main application logic. By following these steps, you can achieve a robust, scalable, and adaptable architecture that simplifies maintenance and fosters continuous improvement in your projects.

I love this! 😋