
YOU Review MY HomeLab and Colocation Architecture!

#Review #HomeLab #Colocation #Architecture

“Techno Tim”

After moving some of my HomeLab servers into the new colocation, I have so many choices to make when it comes to self-hosted services and architecture! From networking, to VPN, to security, to hypervisors, to backups, and even DNS! I NEED YOUR HELP! Help me decide if I have created a solid…



36 Comments

  1. I am actually building a NAS that I am colocating so that I have a proper 3-2-1 backup setup for the multiple networks I manage… so I'll have a few sites connecting to each other. I just have a NAS at home (and on several other networks) that can back up to the NAS in the data center. I'm also adding duplicates of services for high availability for some DCs, etc., versus the migration you are doing.

    I like the firewall on OPNsense/pfSense much more than UniFi; I moved away from my Dream Machine. Some networks I'm using site-to-site with are still UniFi though, and it depends; UniFi is fine for me until I start hitting more advanced features.

  2. Hey Tim. Thanks for the video and the content.

    I personally would be interested in what the storage for your PVE clusters looks like. Do you use Ceph/GlusterFS, or replication with ZFS pools? If so, how do you deal with asynchronous mirroring?

    Thanks for everything, and have a nice Sunday.

  3. I would definitely build your infrastructure so that the colo is your primary "production" environment with no reliance on your home infrastructure. The likelihood of your colo going down is very low, while your home infrastructure has a higher risk. You don't want your colo infrastructure to be degraded because your home internet went out; otherwise you lose out on the benefit of having the colo.

    I would also recommend creating a "jump box" of some kind in the colo and installing Tailscale at least on that. That way, if your site-to-site goes down (because that definitely happens a lot), you have a secure "back door" into your environment to repair the connectivity without having to load up and go to the data center.
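
    Something like this is the kind of watchdog I'd pair with that back door: a rough Python sketch, assuming a colo-side host probes a home-side address over the site-to-site link (the address and port here are made up):

    ```python
    import socket
    import sys

    # Hypothetical: a home-side service reachable only through the site-to-site VPN.
    HOME_GATEWAY = ("10.10.0.1", 22)
    TIMEOUT_S = 5

    def tunnel_up(addr) -> bool:
        """Return True if a TCP connection succeeds across the tunnel."""
        try:
            with socket.create_connection(addr, timeout=TIMEOUT_S):
                return True
        except OSError:
            return False

    if __name__ == "__main__":
        if not tunnel_up(HOME_GATEWAY):
            # Alert however you like (email, ntfy, etc.); a non-zero exit works for cron.
            print("site-to-site tunnel looks down; use the Tailscale jump box", file=sys.stderr)
            sys.exit(1)
    ```

    Run it from cron on the colo side so you find out the tunnel dropped before you need it.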

    – this is all coming from a guy who has worked as a systems infrastructure engineer with multiple datacenters for a while now.

    I'm definitely going to look more into GitOps, because that looks slick!

  4. I think a deep dive into your IaC workflow and file structure would be great. 😁

    Which OS do you use for Kubernetes? I'm experiencing some stability issues on Ubuntu.

  5. You should configure the NFS shares for your backups so that the PVE hosts can't delete their own backups, or at least take snapshots of them that the hosts can't access. Otherwise, if someone compromises your PVE hosts, they can delete the backups, and that would be fatal.

    The ideal option would be to use the Proxmox Backup Server, and maybe have another backup server pull the backups, because then you have 3 locations that hold your backups but can't delete backups from the other systems. (Each system should have different SSH keys, passwords, etc.)
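
    As one rough illustration of the snapshot idea above: if the NFS server backing those shares runs ZFS, a small cron script on that host (not on the PVE nodes) can keep read-only snapshots the hypervisors have no way to delete. Just a sketch; the dataset name and retention window are hypothetical:

    ```python
    import subprocess
    from datetime import datetime, timedelta, timezone

    # Hypothetical dataset behind the NFS export the PVE hosts write into.
    DATASET = "tank/pve-backups"
    KEEP_DAYS = 14

    def zfs(*args):
        return subprocess.run(["zfs", *args], check=True,
                              capture_output=True, text=True).stdout

    # Take today's snapshot; PVE only sees the NFS export, never the snapshots.
    today = datetime.now(timezone.utc).strftime("%Y%m%d")
    zfs("snapshot", f"{DATASET}@daily-{today}")

    # Prune snapshots older than the retention window.
    cutoff = datetime.now(timezone.utc) - timedelta(days=KEEP_DAYS)
    for snap in zfs("list", "-H", "-t", "snapshot", "-o", "name", "-r", DATASET).splitlines():
        suffix = snap.split("@")[-1]
        if suffix.startswith("daily-"):
            taken = datetime.strptime(suffix[len("daily-"):], "%Y%m%d").replace(tzinfo=timezone.utc)
            if taken < cutoff:
                zfs("destroy", snap)
    ```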

    But nice project. 👍🏼 Because of your channel I dived into Kubernetes and some other things. 👀

  6. Hey Tim, very cool video, and interesting that you are taking feedback!

    I thought I would throw a comment here and offer my help to you.

    My background is in enterprise networking, and I have some experience with colos and with best practices around site-to-site VPNs and network segmentation.

    In your colo, I would recommend having a little more segmentation. The VLAN use cases I would separate are:

    1. Public facing (Reverse Proxy/Load Balancers)

    2. Application Servers (Front ends)

    3. Supporting Servers (Backends like MySQL, Postgres, etc.)

    4. Management Segment (Any management traffic plus management GUIs such as Proxmox)

    5. VPN/Semi-Trusted (this is the zone that bridges locations and should be treated almost like a semi-trusted WAN)

    At your home, I would keep a lot of your current segmentation (camera, IoT, guest, personal devices, etc.) and add the ones above (minus public facing, since you don't intend to have that at home).

    For the site-to-site VPN, I would not migrate this to Tailscale. I like Tailscale and Twingate as a convenience layer for mobile access to services, but those UniFi devices will be much more efficient and reliable at providing a site-to-site VPN for you. The firewall on this VPN network should be locked right down to the bare minimum (backup traffic, and maybe access from your trusted home devices to your colo management network). I would even go as far as to say that if you surf to your own website in your colo, you should be going over the internet instead of that VPN connection, as every traffic flow you allow through there is additional attack surface for lateral movement.

    Lastly, it looks like your firewall rules could be slimmed down a bit. Without knowing what all those rules are and why you have them, it's hard to say for sure, but I run a pretty locked-down environment security-wise, and you have comparably a lot of ACL rules.

    Happy to jump on a Discord call if you need a hand with any of this, but those are my recommendations as a general DC design.

  7. Looks like it's been a while since there was a wholesale "wipe, no chance of recovery, no restore from backup, recreate/re-engineer only the needed parts, from scratch" exercise, either literally or as a thought experiment.

    While it's certainly understood (and expected) that you play with, toy with, engineer, and demo quite a bit, that adds a lot of complexity, so the "whoops, it all burned down" scenario can be a good mental filter for stripping things back and refining.

    The other thing I didn't see highlighted is engineering around local<->remote failure assumptions/expectations. For example: when you have two routers and 3 (or more) hops between home and remote for something like home etcd serving a remote k3s/k8s/whatever, that's adding 4+ potential over-network failure points.
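
    Quick back-of-the-envelope on why that matters: availabilities multiply in series, so even good per-hop uptime erodes fast. (The 99.9% figure is just an assumption for illustration.)

    ```python
    # Serial availability: every router/link on the path must be up at once.
    per_hop = 0.999   # assumed availability of each hop
    hops = 5          # e.g., 2 routers + 3 network hops between sites

    path = per_hop ** hops
    print(f"{path:.4%}")  # ~99.50%, i.e. roughly 44 hours of path downtime a year
    ```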

  8. For backing up your colo servers, I would absolutely back them up to another cloud service and then have a machine on your home network pull those down for a copy once complete. I only have a couple of servers in the cloud, and they back up encrypted files to an object storage provider, which a server on my home LAN pulls down periodically for storage here. I only have the keys on the source (for encryption prior to upload) and my home systems (for decryption after download). In my opinion, having your colo system only able to connect to your home LAN where it is absolutely essential is what I would aim for, so if one of those gets compromised, the possibility of someone pivoting into your home network is as low as practical.
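
    A rough sketch of that pull pattern, assuming an S3-compatible bucket and boto3 (the bucket, endpoint, and paths are made up); the decryption key never leaves the home side:

    ```python
    import pathlib

    import boto3

    # Hypothetical S3-compatible store the colo servers upload encrypted files to.
    s3 = boto3.client("s3", endpoint_url="https://objects.example.com")
    BUCKET = "colo-backups"
    DEST = pathlib.Path("/srv/backups/colo")

    # Home-side pull: list and download. Nothing here can reach back into the colo,
    # and the objects were encrypted before upload, so the provider never sees plaintext.
    for page in s3.get_paginator("list_objects_v2").paginate(Bucket=BUCKET):
        for obj in page.get("Contents", []):
            target = DEST / obj["Key"]
            target.parent.mkdir(parents=True, exist_ok=True)
            s3.download_file(BUCKET, obj["Key"], str(target))
    ```

    Decryption then happens locally with the key that only exists at home.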

  9. I want to set up some sort of overlay VPN like Tailscale, or some sort of site-to-site thing like you have going on. The only downside is that, from the little research I've done, there don't seem to be good options for configuring that stuff as infrastructure as code.

    I tried getting Headscale set up at one point, but their only recent releases are alpha releases. It sounds like you're gonna go with Tailscale instead of Headscale, but I wondered if you considered Headscale at all.
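
    For what it's worth, Tailscale's ACL policy can at least be kept in git and pushed through its HTTP API, which gets you partway to IaC without Headscale. A hedged sketch; the tailnet name and token are placeholders, and the endpoint reflects my understanding of their v2 API:

    ```python
    import pathlib

    import requests

    # Hypothetical: the ACL policy file versioned alongside the rest of the IaC repo.
    TAILNET = "example.com"
    API_KEY = "tskey-api-REDACTED"
    policy = pathlib.Path("tailscale/acl.hujson").read_text()

    resp = requests.post(
        f"https://api.tailscale.com/api/v2/tailnet/{TAILNET}/acl",
        auth=(API_KEY, ""),  # API key goes in as the basic-auth username
        data=policy,
    )
    resp.raise_for_status()
    ```

    Wire that into CI and the tailnet policy follows pull requests like everything else.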

  10. How do you like Longhorn? I use OpenEBS on my home Raspberry Pi Kubernetes cluster (3x worker nodes and 1x master). The big problem with OpenEBS, the way I have it configured with openebs-hostpath, is that pods have to be created on the original worker node the volume was provisioned on, i.e. there is no support for migrating pods with PVs across worker nodes.
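
    That pinning shows up right on the PV objects: hostpath/local volumes carry a nodeAffinity that ties the pod to one worker, while replicated backends like Longhorn can attach the volume from any node. A small sketch with the official kubernetes Python client to spot the pinned ones (assumes kubeconfig access):

    ```python
    from kubernetes import client, config

    config.load_kube_config()
    v1 = client.CoreV1Api()

    # Local/hostpath-provisioned PVs set spec.nodeAffinity; that's the pin.
    for pv in v1.list_persistent_volume().items:
        affinity = pv.spec.node_affinity
        if affinity and affinity.required:
            print(pv.metadata.name, "pinned to:",
                  affinity.required.node_selector_terms)
    ```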

  11. I think you are adding too much risk by using your VPN as the security layer for your home data services.

    I would try to restrict access from the colo to home with an Nginx proxy or a full-fledged Kong gateway. Kong provides more auth plugin options but is a lot heavier.

  12. I needed something like this. I am currently designing my network so that it can scale, and this video has helped me see certain things I overlooked. Thanks for making this amazing video.

  13. Geezus Peezus!
    Holy cow, Tim, you have more compute power than Heinz has pickles!

    I'm not sure which country, but there is a country in the world that has less set up!

    Very Amazing, thank you for sharing!

  14. Hopefully you don't run those databases inside Kubernetes.
    If you do, keep the following in mind:
    – Is my application really HA? (Zero-downtime upgrades)
    – Does the DB get rebuilt/restarted when the application gets updated?
    – How can I scale my StatefulSet? (Duplicated DBs, etc.)

  15. Very nice setup, almost like mine; I just don't have anything in a datacenter. I use Argo CD for the CD side, paired with my own Gitea instance, as I can see everything at a glance via the UI from wherever I am.

  16. I really hope that you look into Defguard as an IdP. First-class support for WireGuard VPNs too! Should be easy enough to integrate with plenty of providers.

  17. As always, an interesting video! My background is more in networking, and I want to share some tips:

    – Don't over-engineer by creating additional VPNs with Tailscale. It makes troubleshooting more complicated, as does figuring out which systems/apps go via which remote VPN.

    – You have 2 macro network segments (Home and Public), but I think you need to be more granular with your segmentation inside each site (Home and Public).

    – Define simple rules for network flows based on these additional segments. One idea: everything directly internet-facing is the Red zone (i.e., the interface of your UDM Pro), all traffic that is accessible from the open internet should land in a DMZ (i.e., the Yellow zone), and all internal traffic should be in your internal zone (i.e., the Blue zone). Using color coding will help you easily identify the type of security controls and also what workloads go where (and why).

    – Traffic from Red to Blue is not allowed directly (except for your remote access when you are out and about and want to connect to any of your sites). That means any other traffic that comes from the internet should terminate in your DMZ (Yellow zone) on a proxy server (i.e., your Traefik or NGINX) running the CrowdSec bouncer for extra IPS security.

    – Any traffic into your Internal/Blue network should be allowed from the IPs of your proxy server ONLY.

    – Traffic from your Internal Network (Blue) is allowed to any other zone (Yellow and Red) without additional security controls.

    – Sensitive Data should be stored only in your Blue segment(s).

    – Use different VLANs (at least 1 per zone: one for Yellow, one for Red, one for Blue), and any traffic between VLANs should be inspected by a firewall (i.e., the UDM and/or even a virtual one like pfSense or OPNsense) following the rules above.

    – Replicate this zoning scheme in both Home and Public.

    – Make each site a DR of the other site (Depending on your end goals).

    – Identify what data / workloads are critical so you get backups locally in each site and also in the remote site.

    Following these simple segmentation rules will probably trigger the need for at least two sets of K3s/RKE2 clusters per site, one for the Yellow segment (DMZ) and one for the Blue segment, to honor the security flows based on the zones and allowed flows above.
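
    The zone rules above are small enough to write down as a matrix, which also makes them testable. A toy sketch (zone names as described; the Yellow-to-Blue flow is further restricted to the proxy's IPs in practice):

    ```python
    # Allowed zone-to-zone flows, per the rules above; everything else is denied.
    ALLOWED = {
        ("blue", "yellow"), ("blue", "red"),  # internal may initiate outward
        ("red", "yellow"),                    # internet terminates in the DMZ
        ("yellow", "blue"),                   # DMZ proxy only, by source IP
    }

    def flow_permitted(src: str, dst: str) -> bool:
        """Default-deny: anything not listed is blocked."""
        return src == dst or (src, dst) in ALLOWED

    assert not flow_permitted("red", "blue")  # no direct internet -> internal
    assert flow_permitted("yellow", "blue")   # proxy -> internal, IP-restricted
    ```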

    My 2 cents on this.

  18. Personal opinion here: Ubiquiti is not ready for prime-time deployments. Once you start using a colo, a lot of proper solutions become open to you, BGP being the biggest for fault tolerance in your layer 3 connectivity.

    It seems to me like you are looking to get into HA and always-available services, but your most critical piece of hardware is not fault tolerant.

    When you colo, you have redundant power, and you can get multiple WAN connections so one going down doesn't take your network offline. You can have your own ASN / public address space.

    Personally, in your use case I'd start looking at FortiGate firewalls, as they are proper routing/firewall solutions that support true HA, not a warm standby node like Ubiquiti. They have a tested and trusted BGP/OSPF implementation, so you don't have to manage static routes anymore, and you can create IPsec tunnels using industry-recognized standards.

    Your firewall rules become actually readable and understandable, rather than an arbitrary "LAN out / LAN in" that isn't well explained in its implementation. Your firewall logs are actually useful for troubleshooting, whereas on Ubiquiti you get an iptables rule number instead of the actual policy name and the action taken, and you have to go look up what that iptables rule is doing. And you get 24/7 support should you need it, whereas with Ubiquiti you are waiting who knows how long to get an answer.

    You have entered the realm of needing proper equipment vs. prosumer gear with your change to a colo.

  19. Having done this for decades professionally, my initial reaction is that there is way too much being allowed between "production" and "the office". Site-to-site VPN is convenient, but in the modern design world, audited environments would not leave those nailed up. All access to the production site would be done via an individual workstation VPN that requires MFA and is logged as an admin opening temporary access. No always-on access to production. Normal day-to-day config changes and content updates would be handled by the CI/CD pipeline. Code would not be built in production; the build environment and source repo would not live at the production site, or would be treated as a very, very secured network there, if present at all.

    Essentially, the concept missing in the design is that no successful malware or attack should be able to migrate between sites. The number of connections between sites should be kept to an absolute minimum, and even for you to get in, you must authenticate with MFA for that work session. I believe that with as many moving parts as you have, a proper pentest would both find a way to attack something successfully and then utilize the nailed-up links to attack the other site. Up to you, but we wouldn't be allowed that much full-time access, even if it was more convenient. Carefully assess where you could get to if you successfully gained root on any device.

    Also consider asking your community if someone is a pro pentester (with proper business credentials) who would do an attack surface assessment either free or discounted if they're nice. Especially if any of the items hosted at the colo are part of your business and in your business continuity plan to keep your business operating if, say, the entire home network got a malware/ransomware event. Your critical business items (the ones that make you money) should be heavily guarded and not connected to casual-use networks whatsoever when not necessary, never without logging that access, and with MFA if at all possible. Cheers!

  20. Why NFS for backups of Proxmox and not a dedicated Proxmox Backup Server? You could even run it as a VM on TrueNAS/Proxmox. The big benefit is that you can sync between multiple instances, and the backups are incremental.

    Also, automatic verification and re-verification of each backup. Cheers, Paul
