<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Philip House</title>
    <description>A personal collection of thoughts on programming, learning, and other things.</description>
    <link>http://phizzle.space/</link>
    <atom:link href="http://phizzle.space/feed.xml" rel="self" type="application/rss+xml"/>
    <pubDate>Sat, 30 Dec 2023 16:01:23 -0600</pubDate>
    <lastBuildDate>Sat, 30 Dec 2023 16:01:23 -0600</lastBuildDate>
    <generator>Jekyll v3.7.4</generator>
    
      <item>
        <title>Encrypting Existing RDS Instances</title>
        <description>&lt;p&gt;It’s been a while since I last posted, and life has gotten a bit busier.
My family welcomed a baby girl into the world, and we have
since found out we have another on the way. There have certainly been things
worth writing about, but none stood out so much as this topic.&lt;/p&gt;

&lt;p&gt;Our company recently had to modify all of our RDS instances to be
encrypted-at-rest for compliance reasons. While this is now the default in
AWS, at the time when we started building our infrastructure (late 2017),
this was not the case. Moving our large production instance with
minimal downtime was not as simple as we hoped, and required
experimentation and cobbling together various sources of information and
tooling.&lt;/p&gt;

&lt;p&gt;The AWS documentation has a &lt;a href=&quot;https://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/encrypt-an-existing-amazon-rds-for-postgresql-db-instance.html&quot;&gt;dedicated article&lt;/a&gt; for this topic, which
does a great job of giving a high-level overview of what needs to be done.
Unfortunately, in my opinion some critical details are missing.
I’d like to share the gaps we had to fill to get
everything working smoothly. I recommend doing this migration on a
less-critical database first to build confidence and hands-on experience
before doing it in production. To give you a sense of timing,
the total migration time was about 5 hours, and application downtime lasted
15 minutes.&lt;/p&gt;

&lt;p&gt;If you are a db admin, you will find this elementary. This is
written for those of us who are used to RDS magic, but find ourselves having to
make a more manual operational change than we’re used to.&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;&lt;a href=&quot;#migration-overview&quot;&gt;Migration Overview&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#detailed-walkthrough&quot;&gt;Detailed Walkthrough&lt;/a&gt;
    &lt;ol&gt;
      &lt;li&gt;&lt;a href=&quot;#step-1-encrypting-the-source-snapshot&quot;&gt;Step 1: Encrypting the Source Snapshot&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#step-2-creating-the-encrypted-target-instance&quot;&gt;Step 2: Creating the Encrypted Target Instance&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#step-3-preparing-target-instance-for-replication&quot;&gt;Step 3: Preparing Target Instance for Replication&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#step-4-configuring-and-running-the-dms-task&quot;&gt;Step 4: Configuring and Running the DMS Task&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#step-5-stopping-writes-on-the-source-instance&quot;&gt;Step 5: Stopping Writes on the Source Instance&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#step-6-restoring-foreign-keys-triggers-and-sequences&quot;&gt;Step 6: Restoring Foreign Keys, Triggers and Sequences&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#step-7-ending-downtime-cleaning-up&quot;&gt;Step 7: Ending Downtime, Cleaning Up&lt;/a&gt;&lt;/li&gt;
    &lt;/ol&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#random-notes&quot;&gt;Random Notes&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt; 
 &lt;/p&gt;

&lt;h3 id=&quot;migration-overview&quot;&gt;Migration Overview&lt;/h3&gt;
&lt;p&gt;As mentioned, I highly recommend reading through the AWS tutorial above,
as it gives a good lay of the land. To briefly recap, here are the
big items that need to happen:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Create a new snapshot of the source database. Copy the snapshot, and
 encrypt it.&lt;/li&gt;
  &lt;li&gt;Start a new target database with the encrypted snapshot.&lt;/li&gt;
  &lt;li&gt;Disable foreign keys and triggers on target database.&lt;/li&gt;
  &lt;li&gt;Set up a DMS replication task that replicates source -&amp;gt; target continuously.&lt;/li&gt;
  &lt;li&gt;Once the DMS task is caught up, shut off writes to the source database.&lt;/li&gt;
  &lt;li&gt;Re-enable foreign keys, triggers and sequences on target database.&lt;/li&gt;
  &lt;li&gt;Switch over DNS entry to new target database, resume as normal.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If you are not already familiar with DMS (AWS’s Database Migration Service),
you’ll need to set up a replication instance ahead of time. You also need to
make sure there are replication endpoints for both the source and target
databases. I recommend testing the source endpoint before you start, just to
get it out of the way.&lt;/p&gt;

&lt;p&gt;I’m also assuming you are using CNAME DNS records in your application settings
instead of the RDS endpoint directly. If you are not, go ahead and
set that up first, as it will reduce downtime and make the
cutover easy when the time comes.&lt;/p&gt;

&lt;p&gt; 
 &lt;/p&gt;

&lt;h3 id=&quot;detailed-walkthrough&quot;&gt;Detailed Walkthrough&lt;/h3&gt;

&lt;p&gt;Some of these steps are straightforward, but others have a ton of
configuration and confusing options to sort through. I’ll walk through
each of the decisions we had to make ourselves.&lt;/p&gt;

&lt;p&gt; &lt;/p&gt;

&lt;p&gt; &lt;/p&gt;

&lt;figure class=&quot;image&quot;&gt;
  &lt;img src=&quot;/assets/progress1.png&quot; alt=&quot;&quot; /&gt;
  &lt;figcaption&gt;&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;h4 id=&quot;step-1-encrypting-the-source-snapshot&quot;&gt;Step 1: Encrypting the Source Snapshot&lt;/h4&gt;

&lt;p&gt;Copying a snapshot and encrypting it is straightforward. The only decision
here is whether you want to use the default RDS KMS key for encryption or a
customer-managed key. Note that the encryption checkbox is all the way at the
bottom of the page.&lt;/p&gt;

&lt;p&gt; &lt;/p&gt;

&lt;p&gt; &lt;/p&gt;

&lt;figure class=&quot;image&quot;&gt;
  &lt;img src=&quot;/assets/progress2.png&quot; alt=&quot;&quot; /&gt;
  &lt;figcaption&gt;&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;h4 id=&quot;step-2-creating-the-encrypted-target-instance&quot;&gt;Step 2: Creating the Encrypted Target Instance&lt;/h4&gt;

&lt;p&gt;Creating a new target database is straightforward as well; you just need to
make sure you copy &lt;em&gt;all&lt;/em&gt; the settings over from the existing database.
Make sure you use the same engine settings, network settings, master password
(or IAM auth), parameter group and option group. Double-check these - a
mismatch can cause time-consuming issues later in the process.&lt;/p&gt;

&lt;p&gt;Once this is done, make sure to create a DMS endpoint for the target database
and test the connection. Note that the database won’t appear in the DMS dropdown
until it is fully deployed and available.&lt;/p&gt;

&lt;p&gt; 
 &lt;/p&gt;

&lt;p&gt; &lt;/p&gt;

&lt;figure class=&quot;image&quot;&gt;
  &lt;img src=&quot;/assets/progress3.png&quot; alt=&quot;&quot; /&gt;
  &lt;figcaption&gt;&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;h4 id=&quot;step-3-preparing-target-instance-for-replication&quot;&gt;Step 3: Preparing Target Instance for Replication&lt;/h4&gt;

&lt;p&gt;In order for the DMS replication task to work, foreign keys and triggers on
the target database need to be disabled. Even on a small database, doing this
manually is nearly impossible to do reliably. We created a script that
generates the DROP statements, using the following SQL query to list all of
the foreign keys to drop.&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- list all foreign keys in the public schema
SELECT conrelid::regclass AS table_name,
       conname AS foreign_key
FROM   pg_constraint
WHERE  contype = 'f'
AND    connamespace = 'public'::regnamespace
ORDER  BY conrelid::regclass::text, conname;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;We then templated the &lt;code class=&quot;highlighter-rouge&quot;&gt;table_name&lt;/code&gt; and &lt;code class=&quot;highlighter-rouge&quot;&gt;foreign_key&lt;/code&gt; columns from above into
the following template with a custom Python script. Please note this is
potentially dangerous with unsanitized input and should only be done with known
input that you verify yourself. Our script is too basic to be worth
open-sourcing, so template it however you’d like. I do recommend
scripting this so that you can regenerate it on demand for different
databases. You can also template the statements directly in your SQL
query if you prefer.&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;ALTER TABLE {table} DROP CONSTRAINT {fk};\n
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
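&lt;p&gt;As a rough sketch of what such a generator can look like (the identifier
check is a crude guard against the unsanitized-input risk mentioned above;
the table and constraint names in any example input are hypothetical):&lt;/p&gt;

```python
import re

# crude allowlist for unquoted, lowercase Postgres identifiers;
# adjust if you use quoted or mixed-case names
IDENT = re.compile(r"^[a-z_][a-z0-9_$]*$")

def drop_fk_statements(rows):
    """Render DROP CONSTRAINT statements from (table_name, foreign_key)
    rows, as returned by the pg_constraint query above."""
    stmts = []
    for table, fk in rows:
        # refuse anything that does not look like a plain identifier
        if not (IDENT.match(table) and IDENT.match(fk)):
            raise ValueError(f"suspicious identifier: {table!r} / {fk!r}")
        stmts.append(f"ALTER TABLE {table} DROP CONSTRAINT {fk};")
    return "\n".join(stmts)
```

&lt;p&gt;Run the SELECT above, feed the rows in, and save the output to a .sql file
ahead of the migration window.&lt;/p&gt;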

&lt;p&gt;Triggers are easier to deal with, since they can be disabled without
being dropped entirely. If you have more than a few triggers, I recommend
having these statements ready to go ahead of time. The less you have to
generate on the fly during the migration, the better - prework as much as you can.&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;-- list all triggers
SELECT event_object_table AS tab_name, trigger_name
FROM information_schema.triggers
GROUP BY tab_name, trigger_name
ORDER BY tab_name, trigger_name;

-- disable/enable a trigger manually
ALTER TABLE &amp;lt;table_name&amp;gt; DISABLE TRIGGER &amp;lt;trigger_name&amp;gt;;
ALTER TABLE &amp;lt;table_name&amp;gt; ENABLE TRIGGER &amp;lt;trigger_name&amp;gt;;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
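&lt;p&gt;In the same spirit, the trigger list can be pre-rendered into paired
disable/enable scripts before the window opens. A minimal sketch (the table
and trigger names used in any sample input are placeholders):&lt;/p&gt;

```python
def trigger_statements(rows):
    """Render paired DISABLE and ENABLE TRIGGER scripts from
    (tab_name, trigger_name) rows produced by the query above."""
    disable = "\n".join(f"ALTER TABLE {t} DISABLE TRIGGER {trg};" for t, trg in rows)
    enable = "\n".join(f"ALTER TABLE {t} ENABLE TRIGGER {trg};" for t, trg in rows)
    return disable, enable
```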

&lt;p&gt; 
 &lt;/p&gt;

&lt;p&gt; &lt;/p&gt;

&lt;figure class=&quot;image&quot;&gt;
  &lt;img src=&quot;/assets/progress4.png&quot; alt=&quot;&quot; /&gt;
  &lt;figcaption&gt;&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;h4 id=&quot;step-4-configuring-and-running-the-dms-task&quot;&gt;Step 4: Configuring and Running the DMS Task&lt;/h4&gt;

&lt;p&gt;Once the target database is ready, it’s time to start replication using DMS.
There are a lot of levers in the AWS console here, and it’s important to get
them right.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Migration type&lt;/strong&gt;: Migrate existing data and replicate ongoing changes.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Task Settings&lt;/strong&gt; -&amp;gt; &lt;strong&gt;Target table preparation mode&lt;/strong&gt;: Truncate&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Task Settings&lt;/strong&gt;: Enable validation&lt;/li&gt;
  &lt;li&gt;Check &lt;code class=&quot;highlighter-rouge&quot;&gt;Turn on CloudWatch logs&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Check the box at the bottom to keep the task from starting automatically upon creation.&lt;/li&gt;
&lt;/ul&gt;
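&lt;p&gt;If you create the task via the CLI or an SDK instead of the console, the
same choices live in the task-settings JSON. The sketch below is my reading
of that schema - verify the key names against the current DMS documentation
before relying on them:&lt;/p&gt;

```python
import json

# hedged sketch of DMS replication task settings matching the console choices above;
# key names are assumptions worth double-checking against the DMS docs
task_settings = {
    "FullLoadSettings": {
        # console "Target table preparation mode: Truncate"
        "TargetTablePrepMode": "TRUNCATE_BEFORE_LOAD",
    },
    "ValidationSettings": {
        "EnableValidation": True,
    },
    "Logging": {
        "EnableLogging": True,  # CloudWatch logs
    },
}

print(json.dumps(task_settings, indent=2))
```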

&lt;p&gt;For some reason, the AWS migration guide specifies using the &lt;code class=&quot;highlighter-rouge&quot;&gt;Truncate&lt;/code&gt; mode,
which clears all row data and reloads the rows from scratch (the schema is untouched).
Because of this, we cannot set &lt;code class=&quot;highlighter-rouge&quot;&gt;session_replication_role&lt;/code&gt; to &lt;code class=&quot;highlighter-rouge&quot;&gt;replica&lt;/code&gt; (see
&lt;a href=&quot;#random-notes&quot;&gt;note&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;We also enabled validation, per the AWS documentation. This extends the
migration task by quite some time, but it does give peace of
mind. I recommend turning on the CloudWatch logs as well, as this gives you
good visibility into what is breaking and why. If you forget to remove a
foreign key, for example, the logs will show why certain tables could not
be replicated properly.&lt;/p&gt;

&lt;p&gt;Configuring the task but &lt;em&gt;not&lt;/em&gt; starting it gives you one more chance
to review everything before you go. If you have
already removed foreign keys and triggers, you can also just go for it.&lt;/p&gt;

&lt;p&gt;Once the target database is ready and you’ve configured the DMS task, you are
ready to start it. Depending on your database size, this could take anywhere
from 1-3 hours. In our case, the replication took 30 minutes, but validation
took almost another hour after the replication finished. The AWS console gives a
good overview of which tables are in progress - don’t forget to scroll all
the way to the right to see the full table metrics.&lt;/p&gt;

&lt;p&gt; &lt;/p&gt;

&lt;p&gt; &lt;/p&gt;

&lt;figure class=&quot;image&quot;&gt;
  &lt;img src=&quot;/assets/progress5.png&quot; alt=&quot;&quot; /&gt;
  &lt;figcaption&gt;&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;h4 id=&quot;step-5-stopping-writes-on-the-source-instance&quot;&gt;Step 5: Stopping Writes on the Source Instance&lt;/h4&gt;

&lt;p&gt;Once the DMS task is at 100% and validation is complete, you are ready to
begin the cutover and start downtime. Before switching over
to the target database, we need to stop any new writes to the source
database so that nothing gets lost. How you do this in practice will depend
on your architecture. In our case, it made sense to attach a restrictive
security group that only allowed access from dev machines. You will still
need access to the source database, so make sure you retain a way to
connect.&lt;/p&gt;

&lt;p&gt;In addition to stopping application access, you can also stop the DMS
replication task at this time. It will take a minute or two, but you should see
connection activity and CPU activity drop significantly on the source.&lt;/p&gt;

&lt;p&gt; &lt;/p&gt;

&lt;p&gt; &lt;/p&gt;

&lt;figure class=&quot;image&quot;&gt;
  &lt;img src=&quot;/assets/progress6.png&quot; alt=&quot;&quot; /&gt;
  &lt;figcaption&gt;&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;h4 id=&quot;step-6-restoring-foreign-keys-triggers-and-sequences&quot;&gt;Step 6: Restoring Foreign Keys, Triggers and Sequences&lt;/h4&gt;

&lt;p&gt;This step is the most complex and the most time-critical, as you are now on
the clock with application downtime. I &lt;strong&gt;really recommend going through this step
against a test database at least once&lt;/strong&gt; before you take down your production
environment.&lt;/p&gt;

&lt;p&gt;When searching for details about this process, I came across
&lt;a href=&quot;https://github.com/sinwoobang/dms-psql-post-data&quot;&gt;this project&lt;/a&gt; by &lt;a href=&quot;https://sinwoobang.notion.site/sinwoobang/Sin-Woo-Bang-796475b665ec48c39d721a9343f3dabf&quot;&gt;Sin-Woo Bang&lt;/a&gt;. He built some tooling for
automating the cleanup tasks to get your target database ready for the
switchover. It’s pretty well-documented and I recommend using it to restore
your foreign keys and sequences. An aside: it also attempts to restore
indexes along with the foreign key constraints, but those statements error out
quietly as the indexes already exist. He saved us a bunch of work, and helped
answer some questions we had about the process as well. Thanks Sin-Woo!&lt;/p&gt;

&lt;p&gt;Once you have restored your sequences and foreign keys, you can turn the
triggers back on using the SQL from Step 3. At this point, planned downtime
can come to an end. I hope you can see why rehearsing this step
matters: the faster you finish it, the sooner your users are
back online.&lt;/p&gt;
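&lt;p&gt;If you end up restoring sequences by hand instead of (or alongside) the
tooling above, generating &lt;code class=&quot;highlighter-rouge&quot;&gt;setval&lt;/code&gt; calls from a list of sequences and their
owning columns works well. A sketch with hypothetical names, assuming you have
already listed each sequence with its table and column:&lt;/p&gt;

```python
def setval_statements(rows):
    """Render setval() calls from (sequence, table, column) rows so each
    sequence resumes after the highest migrated value."""
    return "\n".join(
        f"SELECT setval('{seq}', (SELECT COALESCE(MAX({col}), 1) FROM {tbl}));"
        for seq, tbl, col in rows
    )
```

&lt;p&gt;Like the other generators, this is worth pre-rendering before the downtime
window so restoring sequences is a single paste.&lt;/p&gt;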

&lt;p&gt; &lt;/p&gt;

&lt;p&gt; &lt;/p&gt;

&lt;figure class=&quot;image&quot;&gt;
  &lt;img src=&quot;/assets/progress7.png&quot; alt=&quot;&quot; /&gt;
  &lt;figcaption&gt;&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;h4 id=&quot;step-7-ending-downtime-cleaning-up&quot;&gt;Step 7: Ending Downtime, Cleaning Up&lt;/h4&gt;

&lt;p&gt;At this point, you can point your CNAME DNS record at the new target database
and restart your services if necessary. Traffic should pick up on the new
encrypted database as normal, and you should be all set. Once things are
stable, you can go back and clean up the mess left in your wake. A
couple of things to remember:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;
    &lt;p&gt;Take a final snapshot of the source database, and shut it down, once you
feel confident that the target database has taken over without issue.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Clean up the DMS task and replication instance once they are no longer needed.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Create any read-replicas as necessary for the new target database.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Other than that, you are done! Let me know if you have any questions or experience
doing the same - I’m sure there are other notes and tips I could add here.&lt;/p&gt;

&lt;p&gt; &lt;/p&gt;

&lt;p&gt; &lt;/p&gt;

&lt;p&gt; &lt;/p&gt;

&lt;h3 id=&quot;random-notes&quot;&gt;Random Notes&lt;/h3&gt;

&lt;p&gt; &lt;/p&gt;

&lt;h4 id=&quot;why-not-use-session_replication_role-to-disable-foreign-keys&quot;&gt;Why not use &lt;code class=&quot;highlighter-rouge&quot;&gt;session_replication_role&lt;/code&gt; to disable foreign keys?&lt;/h4&gt;

&lt;p&gt;When doing the preparation, I came across &lt;a href=&quot;https://stackoverflow.com/questions/38112379/disable-postgresql-foreign-key-checks-for-migrations/49584660#49584660&quot;&gt;several&lt;/a&gt;
&lt;a href=&quot;https://www.pythian.com/blog/migrate-postgres-database-from-ec2-instance-to-rds-using-aws-dms-data-migration-services&quot;&gt;resources&lt;/a&gt; that mentioned you could set the replication
role to &lt;code class=&quot;highlighter-rouge&quot;&gt;replica&lt;/code&gt; for the DMS migration, without having to disable
foreign keys and triggers. I tried this route, but unfortunately found that
this wouldn’t work when using the &lt;code class=&quot;highlighter-rouge&quot;&gt;TRUNCATE&lt;/code&gt; option in DMS, as mentioned in
the &lt;a href=&quot;https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Target.PostgreSQL.html&quot;&gt;DMS documentation here&lt;/a&gt;. I believe you could use DMS to
replicate without using &lt;code class=&quot;highlighter-rouge&quot;&gt;TRUNCATE&lt;/code&gt;, but given I am not an expert, I opted to
follow the process laid out specifically for the encryption migration. I would
love to hear how to make it work using this, as it is much simpler.&lt;/p&gt;

&lt;div class=&quot;language-sql highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c1&quot;&gt;-- prepare for migration&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SET&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;session_replication_role&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;'replica'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;-- post migration re-enablement&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SET&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;session_replication_role&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;'origin'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;See note about &lt;code class=&quot;highlighter-rouge&quot;&gt;session_replication_role&lt;/code&gt; being incompatible with &lt;code class=&quot;highlighter-rouge&quot;&gt;TRUNCATE&lt;/code&gt;
operations, when foreign key constraints exist.&lt;/p&gt;
&lt;blockquote&gt;
  &lt;p&gt;PostgreSQL has a failsafe mechanism to prevent a table from being truncated, even when
  session_replication_role is set. You can use this as an alternative to
  disabling triggers, to help the full load run to completion. To do this, set the target
  table preparation mode to DO_NOTHING. Otherwise, DROP and TRUNCATE
  operations fail when there are foreign key constraints.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt; 
 &lt;/p&gt;

&lt;h4 id=&quot;how-do-i-migrate-my-source-database-replicas&quot;&gt;How do I migrate my source database replicas?&lt;/h4&gt;

&lt;p&gt;The simplest thing to do is leave your source replicas untouched, and set up
replicas for the new target database once it is generally available.
In most scenarios, you can let applications keep using the original
read replica, as the data will be slightly stale but available until the switch.
If that kind of lag is unacceptable, you will have to either settle for more
downtime on services relying on the replica, or point those services at the
target database temporarily, until new replicas have had time to spin up
from the target database.&lt;/p&gt;

</description>
        <pubDate>Sat, 30 Dec 2023 12:00:00 -0600</pubDate>
        <link>http://phizzle.space/dbadmin/aws/postgres/2023/12/30/rds-encryption-migration.html</link>
        <guid isPermaLink="true">http://phizzle.space/dbadmin/aws/postgres/2023/12/30/rds-encryption-migration.html</guid>
        
        
        <category>dbadmin</category>
        
        <category>aws</category>
        
        <category>postgres</category>
        
      </item>
    
      <item>
        <title>Deploying Static Sites on AWS with Terraform</title>
        <description>&lt;p&gt;Recently I’ve had to deploy a couple of client-side web applications to the
web, and my cloud provider of choice is AWS. If you are familiar with the
various tools provided by AWS, setting up a web stack through the console is
straightforward. It may be tempting to depend on the UI, especially for
something that is usually pretty static, but I highly recommend adopting
Infrastructure-as-Code (IaC) principles and using a management tool. You’ll
find that the ease of deploying new sites and regions is worth the
upfront time spent setting up your deployment, and it will be much
easier to manage.&lt;/p&gt;

&lt;p&gt;If you are a web developer or full-stack developer with
little or no devops experience, you’ll find that this is a great way to get
started. In this post, I’ll walkthrough managing your infrastructure with an
open-source IaC tool called &lt;a href=&quot;https://www.terraform.io/&quot;&gt;Terraform&lt;/a&gt; but these principles will
apply with any other cloud agnostic tool, or AWS’s IaC tool,
&lt;a href=&quot;https://aws.amazon.com/cloudformation/&quot;&gt;CloudFormation&lt;/a&gt;.&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;&lt;a href=&quot;#terraform-introduction&quot;&gt;Terraform Introduction&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#aws-resources&quot;&gt;AWS Resources&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#prerequisites&quot;&gt;Prerequisites&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#writing-the-plan&quot;&gt;Writing the Plan&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#deployments&quot;&gt;Deployments&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#final-notes&quot;&gt;Final Notes&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h3 id=&quot;terraform-introduction&quot;&gt;Terraform Introduction&lt;/h3&gt;

&lt;p&gt;Before I jump into how we’re going to deploy a static site, a brief
introduction to Terraform is required to make sense of the code we’ll write.
Terraform lets engineers write declarative code to create, modify and
destroy cloud assets on platforms such as &lt;a href=&quot;https://cloud.google.com/&quot;&gt;GCP&lt;/a&gt;, AWS and others.
Instead of navigating a platform’s CLI or UI, we write Terraform
files that can be version controlled and added to the CI/CD platform of your
choice.&lt;/p&gt;

&lt;p&gt;This makes for more maintainable cloud infrastructure - doing it
without the IaC approach is the software developer’s equivalent of manually
copying files with FTP or rsync to the production server. We are aiming for
reliable and repeatable deployments, and continuously shipping infrastructure
is a part of the modern stack.&lt;/p&gt;

&lt;p&gt;Below is some sample code from their homepage. The syntax is straightforward
and describes a running AWS instance with some attributes defined outside in
another block. Different types of AWS (and other platform) resources and their
definitions and syntax can be found in their &lt;a href=&quot;https://registry.terraform.io/providers/hashicorp/aws/latest/docs&quot;&gt;documentation&lt;/a&gt;.&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;resource &quot;aws_instance&quot; &quot;iac_in_action&quot; {
  ami               = var.ami_id
  instance_type     = var.instance_type
  availability_zone = var.availability_zone

  // dynamically retrieve SSH Key Name
  key_name = aws_key_pair.iac_in_action.key_name

  // dynamically set Security Group ID (firewall)
  vpc_security_group_ids = [aws_security_group.iac_in_action.id]

  tags = {
    Name = &quot;Terraform-managed EC2 Instance for IaC in Action&quot;
  }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Once you have your infrastructure defined, Terraform gives you two core CLI
commands to get it deployed. The &lt;code class=&quot;highlighter-rouge&quot;&gt;plan&lt;/code&gt; operation compares your defined
infrastructure against what currently exists. In the same way that
&lt;a href=&quot;https://dzone.com/articles/configuration-drift&quot;&gt;configuration drift&lt;/a&gt; occurs on physical servers, it also happens in
your cloud infrastructure. Maybe an engineer makes a change without anyone
knowing, or a resource has new features launched. Either way, &lt;code class=&quot;highlighter-rouge&quot;&gt;terraform plan&lt;/code&gt;
produces an execution plan so you can confirm the upcoming changes are
exactly what you want.&lt;/p&gt;

&lt;p&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;apply&lt;/code&gt; does exactly what you would expect: it rolls out that
execution plan across the resources as defined. I won’t be diving into
integrating these into continuous deployment workflows in this post, but basic
knowledge of the above will let you version control your static site in
preparation for automated deployments in the future. Now, onto the AWS
resources required to host a static site.&lt;/p&gt;

&lt;h3 id=&quot;aws-resources&quot;&gt;AWS Resources&lt;/h3&gt;

&lt;p&gt;Hosting a static website is a common need for any business or
developer, and AWS provides production-grade resources to stand up a new site in
minutes, so that developers don’t need to worry about reliability. I’ll
highlight each of the components and explain how each is used in the toolchain.&lt;/p&gt;

&lt;figure class=&quot;image&quot;&gt;
  &lt;img src=&quot;/assets/static_site_infra.png&quot; alt=&quot;Diagram
of AWS resources required for hosting a static site.&quot; /&gt;
  &lt;figcaption&gt;Diagram
of AWS resources required for hosting a static site.&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;h4 id=&quot;1-s3-buckets&quot;&gt;1. S3 Buckets&lt;/h4&gt;

&lt;p&gt;&lt;a href=&quot;https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingBucket.html&quot;&gt;S3 buckets&lt;/a&gt; are the most critical resource, as they store
your collection of images, JavaScript, and HTML. S3 is AWS’s object storage
offering: essentially a giant key-value store that lets users
reliably store objects of any size under a key, namespaced by bucket.
Deploying a new release of your site involves overwriting the existing assets
in this bucket.&lt;/p&gt;

&lt;h4 id=&quot;2-cloudfront-distributions&quot;&gt;2. CloudFront Distributions&lt;/h4&gt;

&lt;p&gt;&lt;a href=&quot;https://aws.amazon.com/cloudfront/&quot;&gt;CloudFront distributions&lt;/a&gt; are globally available content delivery networks
(CDNs) that allow for the contents of a single S3 bucket to be distributed with
low latency all over the globe, depending on the configuration. Do you want
your content optimized for access in Asia? Managing that is a simple
configuration change with CloudFront.&lt;/p&gt;

&lt;h4 id=&quot;3-route53-routes&quot;&gt;3. Route53 routes&lt;/h4&gt;

&lt;p&gt;&lt;a href=&quot;https://aws.amazon.com/route53/&quot;&gt;Route53&lt;/a&gt; is a DNS web service that lets you programmatically
direct network traffic to internal and external assets under your domain name.
We’ll use Route53 to direct traffic to our CloudFront distribution so that our
static site uses our memorable domain name.&lt;/p&gt;

&lt;h4 id=&quot;4-iam-policies&quot;&gt;4. IAM Policies&lt;/h4&gt;

&lt;p&gt;&lt;a href=&quot;https://aws.amazon.com/iam/&quot;&gt;IAM&lt;/a&gt; stands for Identity and Access Management, and it is AWS’s tool for
managing secure access within their ecosystem. We will use it
to lock down our S3 bucket so that the only way users can
access our content is through our CloudFront distribution, which enforces the
client requirements we set in our
CDN. We’ll write an IAM policy in our Terraform code below.&lt;/p&gt;

&lt;h4 id=&quot;5-aws-certificate-manager&quot;&gt;5. AWS Certificate Manager&lt;/h4&gt;

&lt;p&gt;&lt;a href=&quot;https://aws.amazon.com/certificate-manager/&quot;&gt;ACM&lt;/a&gt; helps us manage the SSL/TLS certificates needed for secure HTTPS access
to our static site. I’ll assume your site requires HTTPS, although deploying a
site with HTTP-only access is just as easy.&lt;/p&gt;

&lt;h3 id=&quot;prerequisites&quot;&gt;Prerequisites&lt;/h3&gt;

&lt;p&gt;There are two prerequisites assumed below, so you will have to modify the
configuration plan accordingly, or configure these assets manually. The reason
they are not included below is that they rarely change, and are much more
“set-it and forget-it” than anything below. They can be automated as well, but
I deemed that out of scope for this example.&lt;/p&gt;

&lt;p&gt;First, this tutorial assumes you have an existing Hosted Zone created in
Route53. For each unique domain you have, you’ll need a hosted zone. You don’t
necessarily need to purchase a domain through AWS, but if you manage a domain
through another domain provider like Namecheap, you’ll have to configure their
portal to point to the AWS name servers provided after you have a hosted zone
created. You will also need the hosted zone id once it is set up.&lt;/p&gt;

&lt;p&gt;Second, I’m assuming you have a valid SSL/TLS certificate created through ACM.
You can create one with a wildcard to your domain, such as
&lt;code class=&quot;highlighter-rouge&quot;&gt;*.customdomain.com&lt;/code&gt;, and this will allow you to use the same certificate in
all future subdomain static sites. Keep track of the &lt;a href=&quot;https://docs.aws.amazon.com/general/latest/gr/aws-arns-and-namespaces.html&quot;&gt;ARN&lt;/a&gt; that comes with
that certificate.&lt;/p&gt;

&lt;h3 id=&quot;writing-the-plan&quot;&gt;Writing the Plan&lt;/h3&gt;

&lt;p&gt;With all of that out of the way, we can get into the details and look at what
such a plan looks like. I’ve pasted the entire plan below, as well as
in &lt;a href=&quot;https://gist.github.com/phouse512/1b9267263e0f8f233fd70d620ba165e0&quot;&gt;this gist&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The parts that should be overridden by your own config are in &lt;code class=&quot;highlighter-rouge&quot;&gt;&amp;lt;&amp;gt;&lt;/code&gt; brackets,
and the brackets themselves should be replaced by whatever text or variable is
specified. The region can also be changed; I just defaulted to &lt;code class=&quot;highlighter-rouge&quot;&gt;us-east-1&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The &lt;code class=&quot;highlighter-rouge&quot;&gt;locals&lt;/code&gt; block lets you parameterize values that get referenced
in multiple places later on. These could also be converted to
&lt;a href=&quot;https://www.terraform.io/docs/language/values/variables.html&quot;&gt;input variables&lt;/a&gt; so that they can be set dynamically.&lt;/p&gt;

&lt;p&gt;Covering every CloudFront distribution parameter would take a full blog post
of its own. I selected some sane defaults for this
distribution: it requires HTTPS, uses &lt;code class=&quot;highlighter-rouge&quot;&gt;PriceClass_100&lt;/code&gt;, which caches your
content only in North America and Europe (the cheapest option), and uses some standard caching TTLs.&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;terraform {
  required_providers {
    aws = {
      source  = &quot;hashicorp/aws&quot;
      version = &quot;3.19.0&quot;
    }
  }
}

provider &quot;aws&quot; {
  # region can be overriden, parameterized if desired
  region = &quot;us-east-1&quot;
}

# PARAMETERS, certificate and hosted zone id required
locals {
  s3_origin_id = &quot;myS3Origin&quot;
  certficate_arn = &quot;&amp;lt;certificate_arn_here&amp;gt;&quot;
  dns_zone_id = &quot;&amp;lt;hosted_zone_id&amp;gt;&quot;
}

# s3 bucket configuration
resource &quot;aws_s3_bucket&quot; &quot;bucket&quot; {
  bucket = &quot;&amp;lt;your_bucket_name_here&amp;gt;&quot;
  acl    = &quot;private&quot;

  website {
    # change this if you have something like root.html or home.html configured instead
    index_document = &quot;index.html&quot;
  }

  # feel free to modify tags for your own use, used for cost analytics
  tags = {
    Service = &quot;&amp;lt;service_name&amp;gt;&quot;
    Operation = &quot;app-hosting&quot;
    Environment = &quot;prod&quot;
  }
}

# cloudfront principal identity for s3 access
resource &quot;aws_cloudfront_origin_access_identity&quot; &quot;s3_access_identity&quot; {
  comment = &quot;Cloudfront user for S3 bucket access.&quot;
}

# cloudfront distribution configuration
resource &quot;aws_cloudfront_distribution&quot; &quot;s3_distribution&quot; {
  origin {
    domain_name = aws_s3_bucket.bucket.bucket_regional_domain_name
    origin_id = local.s3_origin_id

    s3_origin_config {
      origin_access_identity = aws_cloudfront_origin_access_identity.s3_access_identity.cloudfront_access_identity_path
    }
  }

  enabled = true
  is_ipv6_enabled = true
  comment = &quot;Host for Blog&quot;
  default_root_object = &quot;index.html&quot;

  # logging_config {
  #   include_cookies = false
  #   bucket          = &quot;mylogs.s3.amazonaws.com&quot;
  #   prefix          = &quot;myprefix&quot;
  # }

  aliases = [&quot;&amp;lt;domain desired here, ex: blog.customdomain.com&amp;gt;&quot;]

  default_cache_behavior {
    # a static site only needs read methods
    allowed_methods  = [&quot;GET&quot;, &quot;HEAD&quot;, &quot;OPTIONS&quot;]
    cached_methods   = [&quot;GET&quot;, &quot;HEAD&quot;]
    target_origin_id = local.s3_origin_id

    forwarded_values {
      query_string = false

      cookies {
        forward = &quot;none&quot;
      }
    }

    viewer_protocol_policy = &quot;redirect-to-https&quot;
    min_ttl                = 0
    default_ttl            = 3600
    max_ttl                = 86400
  }

  restrictions {
    geo_restriction {
      restriction_type = &quot;none&quot;
    }
  }

  price_class = &quot;PriceClass_100&quot;

  viewer_certificate {
    acm_certificate_arn = local.certficate_arn
    ssl_support_method = &quot;sni-only&quot;
  }

  tags = {
    Service = &quot;&amp;lt;your_service_name&amp;gt;&quot;
    Operation = &quot;cdn&quot;
    Environment = &quot;prod&quot;
  }
}

# json policy for cloudfront -&amp;gt; s3 access
data &quot;aws_iam_policy_document&quot; &quot;s3_policy&quot; {
  statement {
    actions = [&quot;s3:GetObject&quot;]
    resources = [
      &quot;${aws_s3_bucket.bucket.arn}/*&quot;
    ]

    principals {
      type = &quot;AWS&quot;
      identifiers = [ aws_cloudfront_origin_access_identity.s3_access_identity.iam_arn ]
    }
  }
}

# iam policy
resource &quot;aws_s3_bucket_policy&quot; &quot;s3_read_access&quot; {
  bucket = aws_s3_bucket.bucket.id
  policy = data.aws_iam_policy_document.s3_policy.json
}

# dns route to cloudfront
resource &quot;aws_route53_record&quot; &quot;app_route&quot; {
  zone_id = local.dns_zone_id
  name = &quot;&amp;lt;domain desired here, ex: blog.customdomain.com&amp;gt;&quot;
  type = &quot;A&quot;

  alias {
    name = aws_cloudfront_distribution.s3_distribution.domain_name
    zone_id = aws_cloudfront_distribution.s3_distribution.hosted_zone_id
    evaluate_target_health = false
  }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Once you have a plan that you are happy with, you can test it using
&lt;code class=&quot;highlighter-rouge&quot;&gt;terraform plan&lt;/code&gt; to get a full list of the changes that will be made, and deploy it using
&lt;code class=&quot;highlighter-rouge&quot;&gt;terraform apply&lt;/code&gt; if nothing errors out.&lt;/p&gt;
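
&lt;p&gt;If this is a fresh working directory, run &lt;code class=&quot;highlighter-rouge&quot;&gt;terraform init&lt;/code&gt; first so the pinned
AWS provider gets downloaded. The full first run looks something like this:&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# one-time: download the AWS provider pinned in required_providers
$ terraform init

# dry-run: review the resources that will be created
$ terraform plan

# create everything; the CloudFront distribution can take 15+ minutes to deploy
$ terraform apply
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;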

&lt;h3 id=&quot;deployments&quot;&gt;Deployments&lt;/h3&gt;

&lt;p&gt;All that is required to deploy updates to your static site is to sync your
desired build directory to the S3 bucket, and then create an
&lt;a href=&quot;https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/Invalidation.html&quot;&gt;Invalidation&lt;/a&gt; in your CloudFront distribution to let it know that the
cached content needs to be refreshed from the S3 bucket.&lt;/p&gt;
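
&lt;p&gt;As a minimal sketch using the AWS CLI, where the build directory, bucket name,
and distribution id are placeholders to substitute with your own values, the two
steps look like:&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# sync the local build output to the S3 bucket, deleting stale files
$ aws s3 sync ./build s3://&amp;lt;your_bucket_name_here&amp;gt; --delete

# invalidate every cached path so CloudFront refetches from S3
$ aws cloudfront create-invalidation \
    --distribution-id &amp;lt;your_distribution_id&amp;gt; \
    --paths &quot;/*&quot;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;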

&lt;p&gt;For a quick and easy example, see the &lt;a href=&quot;https://github.com/phouse512/blog/blob/master/scripts/deploy.sh&quot;&gt;deployment script&lt;/a&gt; for this very blog, which uses the AWS CLI to sync my
build folder and create the invalidation.&lt;/p&gt;

&lt;p&gt;That script runs automatically with TravisCI so that each merge to my production
branch is deployed without any effort.&lt;/p&gt;

&lt;h3 id=&quot;final-notes&quot;&gt;Final Notes&lt;/h3&gt;

&lt;p&gt;On the topic of pricing: you might be intimidated by all the resources we’ve
created in this post, and wondering what kind of AWS bill you will incur at the
end of the month. I can assure you that for the average site, the usage-based
pricing of S3, CloudFront and Route53 will be competitive with almost any
alternative for hosting websites behind a CDN. If you don’t believe me,
make sure your tagging schema is set correctly, and use the Cost Explorer next
month to see how little it costs. For reference, this site costs less than a
cup of coffee a month to host.&lt;/p&gt;

&lt;p&gt;With the rise of cloud platforms in the past decade, running infrastructure in
the cloud has never been more accessible. I hope this gives you a peek
into the power of IaC for your side project or business. If you have any
questions, feel free to email or ask below; I’m always happy to help if I can.&lt;/p&gt;

</description>
        <pubDate>Sun, 02 May 2021 13:00:00 -0500</pubDate>
        <link>http://phizzle.space/terraform/devops/aws/2021/05/02/terraform-cloudfront-static-site.html</link>
        <guid isPermaLink="true">http://phizzle.space/terraform/devops/aws/2021/05/02/terraform-cloudfront-static-site.html</guid>
        
        
        <category>terraform</category>
        
        <category>devops</category>
        
        <category>aws</category>
        
      </item>
    
      <item>
        <title>Homelab Log: #002</title>
        <description>&lt;h3 id=&quot;home-security-camera-network&quot;&gt;Home Security Camera Network&lt;/h3&gt;

&lt;p&gt;I recently set up the first service for my homelab, a &lt;a href=&quot;https://www.zoneminder.com/&quot;&gt;ZoneMinder&lt;/a&gt; server
that is recording and saving data from a couple of POE cameras.&lt;/p&gt;

&lt;h4 id=&quot;physical-installation&quot;&gt;Physical Installation&lt;/h4&gt;
&lt;p&gt;From a physical installation perspective, I purchased 500ft of outdoor-rated
Cat6 cable to run along the outside of our home inside plastic PVC conduit.
Our home is not new construction and didn’t have any sort of structured
wiring installed. Since I didn’t want to tear out the drywall in our
home, I decided to run the conduit along an exterior wall, out of view.&lt;/p&gt;

&lt;p&gt;From my office, I drilled 1” holes to the exterior and fit PVC conduit through
those holes to junction boxes similar to &lt;a href=&quot;https://www.amazon.com/Thomas-Betts-E987R-JUNCTION-BOX/dp/B000HEIX6W/&quot;&gt;these&lt;/a&gt;. Given that our
length of exterior wall is quite long (at least 120ft), I put several junction
boxes at key points for future use, so I can expand the network with more
cameras and access points as necessary. I purchased PVC conduit in the standard
10ft lengths, and glued and cut to size pieces as necessary. Instead of going
with 3/4” conduit, I went with 1” for more cable space if required in the
future.&lt;/p&gt;

&lt;p&gt;Each stretch had at most 180 degrees of turns, so pulling the
wire wasn’t difficult and didn’t require any special tools. All cables were
terminated at &lt;a href=&quot;https://www.amazon.com/gp/product/B07FNQWGLH/&quot;&gt;specialty junction boxes&lt;/a&gt; that fit my specific
Amcrest cameras. From there, I used an Ethernet cable tester to verify the cables
were all working, and moved on to configuring the network.&lt;/p&gt;

&lt;h4 id=&quot;network-setup&quot;&gt;Network Setup&lt;/h4&gt;

&lt;p&gt;Once the physical installation was done, I moved on to setting up my ZoneMinder
server and switch to handle the new traffic. I’ve included a basic diagram
below that outlines the current state of the network.&lt;/p&gt;

&lt;figure class=&quot;image&quot;&gt;
  &lt;img src=&quot;/assets/homelab_network_v1.png&quot; alt=&quot;Network
diagram of security cameras.&quot; /&gt;
  &lt;figcaption&gt;Network
diagram of security cameras.&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;I had an old PC lying around that I reconfigured as an Ubuntu machine and
installed ZoneMinder. To power the switches and handle packet routing, I am
using a Netgear GS516TP Managed Switch. It powers the cameras, and connects my
main router and ZM server so that I can review footage from any device at home.&lt;/p&gt;

&lt;p&gt;I used the managed functionality of my switch to set up a VLAN specifically for
my POE cameras. I set up an ACL to only allow devices in that VLAN to
communicate with the ZM server by using the service’s static IP address.&lt;/p&gt;

&lt;p&gt;In the best case, security cameras often have manufacturer firmware that will
‘phone home’ and send basic usage data. In the worst case, a compromised device
on my network could be used to access camera data and expose it to the outside
world. Isolating the camera traffic on this VLAN narrows my attack surface
to just the ZM server. As long as I perform regular security updates
and keep that server up-to-date, I can be confident my camera data won’t be
compromised.&lt;/p&gt;

&lt;h4 id=&quot;next-steps&quot;&gt;Next Steps&lt;/h4&gt;

&lt;p&gt;Given my timeline to finish this project, there were a few things I had to leave
as-is in the interest of getting this V1 of the system up and running.&lt;/p&gt;

&lt;p&gt;I’d like to set up a proper server rack that gives me space to install my
switch and patch panels for more reliable and cleaner cable management. Right
now, the corner of my office is a mess and leaves a lot to be desired.&lt;/p&gt;

&lt;p&gt;Next up is purchasing a real server such as a Dell R710 so I can begin
virtualizing my internal services instead of running them on inefficient
hardware. My old PC running ZM is loud, power-hungry, and lacks easy
management and monitoring, something that a hypervisor like Proxmox would
offer.&lt;/p&gt;

&lt;p&gt;Finally, I have Ethernet cables outside my home that could be prone to
lightning strikes or other interference. To prevent something from frying my
expensive internal hardware, I need to install &lt;a href=&quot;https://www.amazon.com/Ethernet-Surge-Protector-Gigabit-1000Mbs/dp/B07GBLFFNK/&quot;&gt;Ethernet surge
protectors&lt;/a&gt; on each incoming line to isolate my homelab.&lt;/p&gt;

</description>
        <pubDate>Sun, 06 Dec 2020 12:00:00 -0600</pubDate>
        <link>http://phizzle.space/homelog/networking/security/2020/12/06/homelab-logs-2.html</link>
        <guid isPermaLink="true">http://phizzle.space/homelog/networking/security/2020/12/06/homelab-logs-2.html</guid>
        
        
        <category>homelog</category>
        
        <category>networking</category>
        
        <category>security</category>
        
      </item>
    
      <item>
        <title>Homelab Log: #001</title>
<description>&lt;p&gt;Being a homeowner has opened up a world of projects, both digital and physical,
especially with a dedicated garage. This post is the first in a long
series about the various projects I take on, both as a personal record and as a way
to share the interesting things I learn along the way. I’m titling the series
the Homelab Logs, as an homage to the &lt;a href=&quot;https://www.reddit.com/r/homelab/&quot;&gt;homelabbers&lt;/a&gt;. Although that
community is centered around sysadmins and DIY technologies, I hope to emulate
that spirit of experimentation and learning in my approach to improving and
upgrading our home.&lt;/p&gt;

&lt;p&gt;These logs will not be as long or include as much writing as my usual posts,
but will be a combination of photos and descriptions of the process. You can
also tell from my numbering schema that I plan on writing many of these, and
checking-in my work &lt;a href=&quot;https://blog.codinghorror.com/check-in-early-check-in-often/&quot;&gt;early and often&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;office-cabinets&quot;&gt;Office Cabinets&lt;/h3&gt;

&lt;p&gt;With the intro out of the way, here is a brief overview of one of my biggest
woodworking projects to date: an office cabinet for storage and organization.
It was based on standard cabinet dimensions; I built 3 drawers into the
cabinet on the left, and put 2 full-length drawer slides in the right one for easy
access.&lt;/p&gt;

&lt;p&gt;I had never built cabinets from scratch before, and it was a great learning
experience for my next time around. The cabinets were built with Baltic birch plywood, and
the drawer faces and doors were all cut from a single piece of maple
plywood, for a continuous edge between them.&lt;/p&gt;

&lt;figure class=&quot;image&quot;&gt;
  &lt;img src=&quot;/assets/cabinet_face_install.jpeg&quot; alt=&quot;Installation of front doors on the right cabinet.&quot; /&gt;
  &lt;figcaption&gt;Installation of front doors on the right cabinet.&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;figure class=&quot;image&quot;&gt;
  &lt;img src=&quot;/assets/cabinet_top.jpeg&quot; alt=&quot;Finishing the cabinet top and one of the doors.&quot; /&gt;
  &lt;figcaption&gt;Finishing the cabinet top and one of the doors.&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;figure class=&quot;image&quot;&gt;
  &lt;img src=&quot;/assets/cabinet_bottom_install.jpeg&quot; alt=&quot;Installation of cabinets before the top went on. I used leveling
feet to get both of these perfectly even and level.&quot; /&gt;
  &lt;figcaption&gt;Installation of cabinets before the top went on. I used leveling
feet to get both of these perfectly even and level.&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;figure class=&quot;image&quot;&gt;
  &lt;img src=&quot;/assets/cabinet_finished.jpeg&quot; alt=&quot;The finished cabinet, you can see the reveal on the left swinging
door had some alignment and warping issues.&quot; /&gt;
  &lt;figcaption&gt;The finished cabinet, you can see the reveal on the left swinging
door had some alignment and warping issues.&lt;/figcaption&gt;
&lt;/figure&gt;

</description>
        <pubDate>Fri, 16 Oct 2020 13:00:00 -0500</pubDate>
        <link>http://phizzle.space/homelog/woodworking/2020/10/16/homelab-logs-1.html</link>
        <guid isPermaLink="true">http://phizzle.space/homelog/woodworking/2020/10/16/homelab-logs-1.html</guid>
        
        
        <category>homelog</category>
        
        <category>woodworking</category>
        
      </item>
    
      <item>
        <title>Recovering Data from a Failed NTFS Drive</title>
        <description>&lt;p&gt;A family member recently came to me with a portable hard drive that Windows could not read. 
After confirming I was seeing the same issue on my
own PC, I went down the rabbit hole of attempting to recover as much
data as possible from the failed drive. I documented the path I took below, as
well as some notes if you ever find yourself in the same position.&lt;/p&gt;

&lt;h3 id=&quot;cloning-the-damaged-drive&quot;&gt;cloning the damaged drive&lt;/h3&gt;

&lt;p&gt;If you have a drive that can’t be read anymore, chances are that only a portion
of it is dead, and many (if not most) sectors are still readable. The more
time that passes between the initial failure and imaging the drive, the higher the
chance the drive will degrade further as it powers on and off.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://www.gnu.org/software/ddrescue/ddrescue.html&quot;&gt;GNU ddrescue&lt;/a&gt; is a great recovery tool that copies data from one block
device to another. I won’t repeat its documentation, since you
can read that yourself, but it does its job very well. Below are some important
notes on installing and using it.&lt;/p&gt;

&lt;p&gt;Be sure to install it with &lt;code class=&quot;highlighter-rouge&quot;&gt;sudo apt-get install gddrescue&lt;/code&gt;; the package named &lt;code class=&quot;highlighter-rouge&quot;&gt;ddrescue&lt;/code&gt; is
an older, incomplete script.&lt;/p&gt;

&lt;p&gt;ddrescue requires that the input device be visible when you run &lt;code class=&quot;highlighter-rouge&quot;&gt;sudo lsblk&lt;/code&gt;, so
if the device doesn’t even register there, unfortunately this won’t help you.&lt;/p&gt;

&lt;p&gt;You will need an output disk that can contain the entire failed disk, not just
the amount you used, so if you have a 1 TB hard drive you are attempting to
recover, I recommend using at least a 2 TB hard drive to store the output.
ddrescue copies block for block, and doesn’t know anything about the contents.&lt;/p&gt;

&lt;p&gt;ddrescue can be run with a mapfile that allows the process to be picked up
at any time, so you can take a break or shut down your computer if desired. I’ve
read that some people limit session length to keep the
damaged disk from getting too hot, but that seems anecdotal.&lt;/p&gt;

&lt;p&gt;The actual command I ended up running was &lt;code class=&quot;highlighter-rouge&quot;&gt;sudo ddrescue -r
2 /dev/damaged_drive /media/large_backup_drive/image
/media/large_backup_drive/logfile&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;In the example above, the final output file is called &lt;code class=&quot;highlighter-rouge&quot;&gt;image&lt;/code&gt; and the mapfile is called &lt;code class=&quot;highlighter-rouge&quot;&gt;logfile&lt;/code&gt;, both
on the &lt;code class=&quot;highlighter-rouge&quot;&gt;large_backup_drive&lt;/code&gt;. The &lt;code class=&quot;highlighter-rouge&quot;&gt;-r 2&lt;/code&gt; flag tells ddrescue to retry
bad sectors two additional times.&lt;/p&gt;

&lt;p&gt;Lastly, be aware that this is a very time-consuming process, especially once
it starts retrying sectors. For my 1 TB hard drive, ddrescue ran for
over 70 hours in total to fully process and retry the failed sectors. In my case,
ddrescue was able to recover all but 60 MB of the original drive.&lt;/p&gt;

&lt;h3 id=&quot;fixing-the-drive-structure&quot;&gt;fixing the drive structure&lt;/h3&gt;

&lt;p&gt;Once you have imaged the drive, you can attempt some of the various tools that
fix the structure of an NTFS partition, like &lt;code class=&quot;highlighter-rouge&quot;&gt;fsck&lt;/code&gt;. Some people have had
success with other tools as well, though none of them worked for me. It’s worth
a shot, and here is a discussion of some of the options you have.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://askubuntu.com/questions/47700/fix-corrupt-ntfs-partition-without-windows&quot;&gt;AskUbuntu - Fix corrupt NTFS partition without Windows&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In my case, the tools were unable to find the superblock and couldn’t repair the partition. I still recommend trying,
as the upside is huge and it’s a quick check once you have created a backup.&lt;/p&gt;
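
&lt;p&gt;If you want to point those repair tools at your image file rather than a raw
device, one approach is to map the image to a loop device first. A sketch, where
the loop device name and partition number will vary on your system:&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# map the image to the first free loop device, scanning for
# partitions (-P) and printing the allocated device name (--show)
$ sudo losetup -fP --show /media/large_backup_drive/image

# attempt basic NTFS repairs on the mapped partition
# (ntfsfix ships with the ntfs-3g package)
$ sudo ntfsfix /dev/loop0p1
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;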

&lt;h3 id=&quot;recovering-data&quot;&gt;recovering data&lt;/h3&gt;

&lt;p&gt;Once I had a full copy of the original drive, I was free to test out the
various open source data recovery tools out there. I originally used
&lt;a href=&quot;http://foremost.sourceforge.net/&quot;&gt;Foremost&lt;/a&gt;, but found it lacking in
progress reporting and in ease of pausing and restarting.&lt;/p&gt;

&lt;p&gt;Ultimately I ended up using &lt;a href=&quot;https://en.wikipedia.org/wiki/PhotoRec&quot;&gt;PhotoRec&lt;/a&gt;, and followed its super
simple usage directions. Despite its name, it is more than just a tool for
recovering image files; it looks for all sorts of documents, text files, etc.
Make sure you have a new output directory created ahead of time, as you
won’t have the option to create one inside photorec.&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ sudo photorec /media/hardrive/backup_image
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Photorec has a detailed terminal UI that will guide you through selecting the
image file (if you ran it without specifying one), selecting the output
directory, and starting the long process of searching. It reports the
progress percentage and the number of each file type found. After running for
many hours, I was able to recover over 10k files and copy them to a new hard
drive.&lt;/p&gt;

&lt;h3 id=&quot;conclusion&quot;&gt;conclusion&lt;/h3&gt;

&lt;p&gt;At this point, you are free to test any data recovery tools on your backup
images; just make sure that you aren’t running them on the damaged disk. The
only thing left to be desired after going through this is the metadata lost
in recovery. After using photorec to find hundreds of gigabytes of
data, many files were missing filenames, and of course any directory
structure was gone as well.&lt;/p&gt;

&lt;p&gt;Given that some file types have more complicated metadata embedded inside,
a tool that parsed metadata when filenames were lost would be a great addition
to photorec. I could imagine a simple rules engine that would look for
filenames, authors, timestamps, locations and add them in priority order for
some context for the drive owner.&lt;/p&gt;

&lt;p&gt;All in all, losing only 60 MB of a damaged 1 TB hard drive is more than I could
ask for, so I can’t complain. Lastly, I will always plug using some sort of backup
tool like &lt;a href=&quot;https://restic.net/&quot;&gt;Restic&lt;/a&gt; or &lt;a href=&quot;https://www.backblaze.com/&quot;&gt;Backblaze&lt;/a&gt;. An ounce of prevention is
worth a pound of cure, and backing up this drive regularly would’ve saved hours
of work and headache.&lt;/p&gt;

</description>
        <pubDate>Sun, 06 Sep 2020 13:00:00 -0500</pubDate>
        <link>http://phizzle.space/linux/ubuntu/data/2020/09/06/recover-data-failed-ntfs-drive.html</link>
        <guid isPermaLink="true">http://phizzle.space/linux/ubuntu/data/2020/09/06/recover-data-failed-ntfs-drive.html</guid>
        
        
        <category>linux</category>
        
        <category>ubuntu</category>
        
        <category>data</category>
        
      </item>
    
      <item>
        <title>Configuring the Raspberry Pi Zero W</title>
<description>&lt;p&gt;I was recently working on setting up the Raspberry Pi Zero W, a tiny
computer with a built-in Wi-Fi chipset. I run these headless, so I need to
manually configure them to be available via SSH right out of the box. This
will be a short overview of what I did, as it took way too long and I plan on
setting up many of these over the coming years.&lt;/p&gt;

&lt;h3 id=&quot;setup&quot;&gt;setup&lt;/h3&gt;

&lt;h4 id=&quot;flashing-raspbian-lite&quot;&gt;flashing Raspbian lite&lt;/h4&gt;

&lt;p&gt;The first step is to download the Raspbian OS Lite image from the &lt;a href=&quot;https://www.raspberrypi.org/downloads/raspberry-pi-os/&quot;&gt;Raspbian
website&lt;/a&gt;. Unzip that and keep the &lt;code class=&quot;highlighter-rouge&quot;&gt;.img&lt;/code&gt; file accessible. You can use
&lt;code class=&quot;highlighter-rouge&quot;&gt;dd&lt;/code&gt; or a tool like &lt;a href=&quot;https://www.balena.io/etcher/&quot;&gt;Balena Etcher&lt;/a&gt; for OSX systems to flash the image
to your formatted SD card.&lt;/p&gt;

&lt;h4 id=&quot;boot-up-the-pi&quot;&gt;boot up the pi&lt;/h4&gt;

&lt;p&gt;Boot up the Pi using the PWR micro USB port, and give it a few minutes to
initialize for the first time.&lt;/p&gt;

&lt;h4 id=&quot;configuring-the-os&quot;&gt;configuring the OS&lt;/h4&gt;

&lt;p&gt;After the Pi boots up, power it off and stick the SD card into your computer
once more. Once the SD card is mounted, we’ll make a few modifications to add
Wi-Fi credentials and enable SSH access.&lt;/p&gt;
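
&lt;p&gt;The changes boil down to two files in the SD card’s boot partition: Raspbian
enables the SSH server if an empty file named &lt;code class=&quot;highlighter-rouge&quot;&gt;ssh&lt;/code&gt; exists there, and it picks up
Wi-Fi credentials from a &lt;code class=&quot;highlighter-rouge&quot;&gt;wpa_supplicant.conf&lt;/code&gt; on first boot. Done by hand (the mount
path and network credentials below are placeholders), that looks like:&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# from the mounted boot partition of the SD card
$ cd /Volumes/boot

# an empty file named ssh enables the SSH server on boot
$ touch ssh

# wpa_supplicant.conf with your country code and network credentials
$ cat &amp;gt; wpa_supplicant.conf &amp;lt;&amp;lt;'EOF'
country=US
ctrl_interface=DIR=/var/run/wpa_supplicant GROUP=netdev
update_config=1

network={
    ssid=&quot;&amp;lt;your_ssid&amp;gt;&quot;
    psk=&quot;&amp;lt;your_wifi_password&amp;gt;&quot;
}
EOF
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;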

&lt;p&gt;In my &lt;a href=&quot;https://github.com/phouse512/circlefiles&quot;&gt;circlefiles&lt;/a&gt; repository, I have a script that uses the
Python &lt;a href=&quot;http://www.pyinvoke.org/&quot;&gt;invoke&lt;/a&gt; library to make it easy to run tasks against your
machine.&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# run rasp pi zero setup to configure Wi-Fi and enable ssh
$ inv rasp-pi-zero-setup
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Once those steps are complete, the SD card can be ejected and added back to the
Pi. The Pi Zero is now ready to boot and should be able to connect to your Wi-Fi
without issue. Give it a few minutes to fully boot up, and then you can find
the IP on your router device list and connect.&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# default user is pi and password is raspberry
$ ssh pi@&amp;lt;fake_ip&amp;gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h4 id=&quot;looking-ahead&quot;&gt;looking ahead&lt;/h4&gt;

&lt;p&gt;These are some simple instructions that outline configuring a Pi Zero with
little to no extras added. I plan on adding Ansible playbooks to set up Docker
and various other utilities in the near future.&lt;/p&gt;

</description>
        <pubDate>Tue, 07 Jul 2020 13:00:00 -0500</pubDate>
        <link>http://phizzle.space/devops/pi/2020/07/07/raspberry-pi-zero-w-configuration.html</link>
        <guid isPermaLink="true">http://phizzle.space/devops/pi/2020/07/07/raspberry-pi-zero-w-configuration.html</guid>
        
        
        <category>devops</category>
        
        <category>pi</category>
        
      </item>
    
      <item>
        <title>Logging iTerm2 Activity</title>
<description>&lt;p&gt;I do most of my software development on OSX, and my terminal of choice is
&lt;a href=&quot;https://iterm2.com/&quot;&gt;iTerm2&lt;/a&gt;. iTerm2 is a full-featured terminal emulator built for Mac that
allows for some incredible customization. The recent 3.3 release added
a new level of control: a Python API with which you can customize almost
any aspect of your terminal.&lt;/p&gt;

&lt;p&gt;I have wanted to do keylogging / tracking on my terminal to get a better idea
of which aliases would be most impactful, spot patterns in my usage, and so
on. Until now, I hadn’t been able to find a useful open-source keylogger
for my environment. With the recent API release, iTerm2 has exposed all of its
internals, including the ability to hook into events on the terminal.&lt;/p&gt;

&lt;p&gt;With the recent change, I decided to bite the bullet and build a small daemon
to begin capturing my usage of iTerm on my development machine.&lt;/p&gt;

&lt;h2 id=&quot;api-introduction&quot;&gt;API Introduction&lt;/h2&gt;

&lt;p&gt;Before I get started, I highly recommend walking through the iTerm2
documentation and tutorials on getting started with the new API. &lt;a href=&quot;https://twitter.com/gnachman&quot;&gt;George
Nachman&lt;/a&gt; and the rest of the iTerm team did a fantastic job documenting
and helping new users get their first script running. The examples listed are
also useful; in particular, the &lt;a href=&quot;https://iterm2.com/python-api/examples/autoalert.html&quot;&gt;Alert on Long-Running Jobs&lt;/a&gt;
script demonstrates the session monitoring capabilities well.&lt;/p&gt;

&lt;p&gt;I recommend downloading one, slightly modifying it, and placing it into your
iTerm2 directory to begin testing. Once you place scripts inside your
&lt;code class=&quot;highlighter-rouge&quot;&gt;~/Library/Application Support/iTerm2/Scripts&lt;/code&gt; directory, you’ll see iTerm load
them in the Scripts menu. Again, there is clear
&lt;a href=&quot;https://iterm2.com/python-api/tutorial/running.html&quot;&gt;documentation&lt;/a&gt; for this portion, so I’ll point you there for
directions.&lt;/p&gt;

&lt;p&gt;Finally, you can take advantage of the Scripts console to monitor currently
running scripts, see exception logging and more. You can easily start new
scripts, restart existing ones and manage what is going on behind the scenes.
See &lt;a href=&quot;https://iterm2.com/python-api/tutorial/troubleshooting.html&quot;&gt;Troubleshooting&lt;/a&gt; for more details.&lt;/p&gt;

&lt;h2 id=&quot;logging-sessions&quot;&gt;Logging Sessions&lt;/h2&gt;

&lt;p&gt;Compared to what you can do, my logging script is relatively simple. I had
a few goals:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;log sessions by name upon creation&lt;/li&gt;
  &lt;li&gt;log commands by session&lt;/li&gt;
  &lt;li&gt;log command exit status by session&lt;/li&gt;
  &lt;li&gt;log command duration&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;My script uses &lt;code class=&quot;highlighter-rouge&quot;&gt;PromptMonitor&lt;/code&gt; and &lt;code class=&quot;highlighter-rouge&quot;&gt;EachSessionOnceMonitor&lt;/code&gt; to log sessions
opening up, and run a function that waits for any command input.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;async&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;connection&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;s&quot;&gt;&quot;&quot;&quot;
    This long-running iTerm2 daemon logs commands, statuses, sessions, etc.
    :param connection: iTerm2 connection obj
    &quot;&quot;&quot;&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;app&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;await&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;iterm2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;async_get_app&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;connection&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;async&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;monitor&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;session_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
        &lt;span class=&quot;s&quot;&gt;&quot;&quot;&quot;
        Monitor a session for commands, log them out.
        :param session_id: str
        &quot;&quot;&quot;&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;session&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;app&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;get_session_by_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;session_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;logger&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;new session: %s&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;session_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;not&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;session&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;logger&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;warning&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;No session with id: %s&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;session_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt;

        &lt;span class=&quot;n&quot;&gt;modes&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;iterm2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;PromptMonitor&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Mode&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;PROMPT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;iterm2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;PromptMonitor&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Mode&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;COMMAND_START&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;iterm2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;PromptMonitor&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Mode&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;COMMAND_END&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;async&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;with&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;iterm2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;PromptMonitor&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;connection&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;session_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;modes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;modes&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;mon&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;while&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;True&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
                &lt;span class=&quot;c1&quot;&gt;# blocks until a status changes, new prompt, command starts, command finishes
&lt;/span&gt;                &lt;span class=&quot;n&quot;&gt;mode&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;info&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;await&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;mon&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;async_get&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
                &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;mode&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;iterm2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;PromptMonitor&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Mode&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;COMMAND_START&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
                    &lt;span class=&quot;n&quot;&gt;logger&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;session-%s-command: %s&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;session_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
                &lt;span class=&quot;k&quot;&gt;elif&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;mode&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;iterm2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;PromptMonitor&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Mode&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;COMMAND_END&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
                    &lt;span class=&quot;n&quot;&gt;logger&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;session-%s-status: %s&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;session_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;info&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;await&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;iterm2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;EachSessionOnceMonitor&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;async_foreach_session_create_task&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;app&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;monitor&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;With &lt;code class=&quot;highlighter-rouge&quot;&gt;PromptMonitor&lt;/code&gt;, you can subscribe to specific modes, so I listened for
commands starting and ending, and logged those along with the session id.
I use the standard Python logging interface, and set up a rotating file handler
to turn over files once a day. If you’d like to see the full script, feel free
to check it out on &lt;a href=&quot;https://github.com/phouse512/piper_compute/blob/master/images/scripts/iterm2_logger.py&quot;&gt;GitHub&lt;/a&gt;.&lt;/p&gt;
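&lt;p&gt;As a rough sketch of that logging setup (the file path and logger name here are illustrative, not the script’s real values):&lt;/p&gt;

```python
import logging
from logging.handlers import TimedRotatingFileHandler

# Illustrative path and logger name -- the real script's values differ.
LOG_PATH = "/tmp/iterm2_sessions.log"

logger = logging.getLogger("iterm2_logger")
logger.setLevel(logging.INFO)

# Roll the log file over once a day, keeping a week of history.
handler = TimedRotatingFileHandler(LOG_PATH, when="D", interval=1, backupCount=7)
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
logger.addHandler(handler)

logger.info("new session: %s", "w0t0p0")
```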

&lt;p&gt;Running it 24/7 is as easy as putting it into the &lt;code class=&quot;highlighter-rouge&quot;&gt;Scripts/AutoLaunch&lt;/code&gt;
directory and letting iTerm2 take care of the rest.&lt;/p&gt;

&lt;h2 id=&quot;next-steps&quot;&gt;Next Steps&lt;/h2&gt;

&lt;p&gt;I plan on running this logger on my machine for the next several months before
beginning to look at the data. Some potential questions I have, for fun and for
utility:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;what are the most common tools I use? which ones do I use for work? personal?&lt;/li&gt;
  &lt;li&gt;which of my current bash aliases are used the most? (or save the most
  keystrokes?) what are some commands that would benefit from being aliased?&lt;/li&gt;
  &lt;li&gt;what commands keep me waiting the longest? If I find myself waiting hours
  a month for test suites to finish, maybe I should make it easy to run
  smaller sets.&lt;/li&gt;
  &lt;li&gt;what series of commands should I combine into a single tool or utility?&lt;/li&gt;
&lt;/ul&gt;
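&lt;p&gt;Answering the first question is a small parsing job once the logs accumulate. A minimal sketch, assuming log lines in the session/command shape emitted above (the sample lines are made up):&lt;/p&gt;

```python
from collections import Counter

# Sample lines in the format emitted by the logger above (contents are made up).
log_lines = [
    "session-abc-command: git status",
    "session-abc-command: git push",
    "session-def-command: ls -la",
    "session-abc-command: git status",
]

# Tally the first word of each logged command to find the most-used tools.
tools = Counter(
    line.split("command: ", 1)[1].split()[0]
    for line in log_lines
    if "command: " in line
)

print(tools.most_common(2))  # [('git', 3), ('ls', 1)]
```

&lt;p&gt;The same tally works for aliases or any other command prefix by changing the filter.&lt;/p&gt;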

&lt;p&gt;Some last thoughts on how to improve this and further customize iTerm2:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;there are some transient errors around sessions closing or being interrupted,
  and I don’t know enough about the Python API to solve them yet&lt;/li&gt;
  &lt;li&gt;for each software project I work on, it’d be nice to issue a single command
  that opens all the sessions required for builds, checks out branches, etc.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With such an open API, the options are limitless.&lt;/p&gt;

</description>
        <pubDate>Mon, 04 Nov 2019 12:00:00 -0600</pubDate>
        <link>http://phizzle.space/analytics/linux/2019/11/04/19-iterm2-session-logging.html</link>
        <guid isPermaLink="true">http://phizzle.space/analytics/linux/2019/11/04/19-iterm2-session-logging.html</guid>
        
        
        <category>analytics</category>
        
        <category>linux</category>
        
      </item>
    
      <item>
        <title>Building a Desktop Linux PC</title>
<description>&lt;p&gt;For the past 7 years, I’ve lived almost exclusively off 3 different laptops:
one Windows 7 machine and two OSX machines. The OSX laptops have been the most
reliable, and they are the ones I’ve used for development over the past 5 years.
The Windows laptop died a few years into usage and I haven’t touched it since.
In the meantime, I’ve wanted to set up a home network with NAS and various
other utilities, and a Linux home base has long been a prerequisite for that.
Over the holidays, I found some good deals on parts I’d been waiting for, so
I decided to spring for it.&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;&lt;a href=&quot;#goals&quot;&gt;goals&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#the-build&quot;&gt;the build&lt;/a&gt;
    &lt;ol&gt;
      &lt;li&gt;&lt;a href=&quot;#parts-list&quot;&gt;parts list&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#order-of-operations&quot;&gt;order of operations&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#linux-install&quot;&gt;linux install&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#lessons-learned&quot;&gt;lessons learned&lt;/a&gt;&lt;/li&gt;
    &lt;/ol&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#configuration-management&quot;&gt;configuration management&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#stability-and-burn-in&quot;&gt;stability and burn-in&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#backup-and-storage&quot;&gt;backup and storage&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#last-thoughts&quot;&gt;last thoughts&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2 id=&quot;goals&quot;&gt;goals&lt;/h2&gt;

&lt;p&gt;This desktop build is meant to satisfy a relatively narrow set of use-cases. My
parts list and setup will optimize for the uses listed below.&lt;/p&gt;

&lt;p&gt;What it’s meant to do:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Run 24/7 with very high reliability&lt;/li&gt;
  &lt;li&gt;Handle multiple storage drives for dual-boot, potential NAS storage&lt;/li&gt;
  &lt;li&gt;Capture analog video signals and transcode them to digital formats&lt;/li&gt;
  &lt;li&gt;Be accessible from trusted devices in the local network&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What it’s not meant to do:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Play games that require lots of computing power&lt;/li&gt;
  &lt;li&gt;Mine cryptocurrencies&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;the-build&quot;&gt;the build&lt;/h2&gt;

&lt;p&gt;Building the PC in total took me about 2 hours for the physical setup and
testing, and another hour to get Linux set up as I wanted. At the time of
writing, I have used these components only for a short time, so I cannot give
a proper review. I plan on following up 6 to 12 months from now with a review
based on what I’ve experienced.&lt;/p&gt;

&lt;h3 id=&quot;parts-list&quot;&gt;parts list&lt;/h3&gt;

&lt;p&gt;One of my unstated goals was to finish the build for under $500, and the parts
below cost about $485 in total at the time of writing. You might be able to
find these cheaper depending on how prices change over time. The links below
are also not sponsored in any way, so please just use them as a reference.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CPU&lt;/strong&gt;: &lt;a href=&quot;https://www.amazon.com/AMD-Ryzen-Processor-Radeon-Graphics/dp/B079D3DBNM/&quot;&gt;AMD Ryzen 3 2200G&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;includes on-board graphics&lt;/li&gt;
  &lt;li&gt;cheap workhorse CPU&lt;/li&gt;
  &lt;li&gt;if using an external PCIe GPU, it only runs in PCIe x8 mode even if your
  GPU supports x16. I’ve heard this is not a big deal, but if you care about
  maximizing performance, do your research.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Motherboard&lt;/strong&gt;: &lt;a href=&quot;https://www.amazon.com/MSI-Crossfire-Motherboard-B450-Tomahawk/dp/B07F7W5KJS/&quot;&gt;MSI B450 Tomahawk ATX AM4&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;I sprung for the B450 over the B350 because it supports more SATA III
  connections. Other than a few minor USB configuration differences, they are
  mostly the same. The B450 also sidesteps some compatibility issues that
  older CPUs can run into, which alone can make the upgrade worth it. At the
  time there was a $10 difference, so I decided to upgrade.&lt;/li&gt;
  &lt;li&gt;Includes a USB Type-C connector which was a non-negotiable for me.&lt;/li&gt;
  &lt;li&gt;From an aesthetics perspective, the board looks really good and the color
  scheme fits in well with my case. It also has a few RGB LED headers you can
  use.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;RAM&lt;/strong&gt;: &lt;a href=&quot;https://www.amazon.com/gp/product/B01ARHBBPS/&quot;&gt;Corsair 1x8GB DDR4 2400MHz&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;I plan on adding another 1x8 GB stick in the future, but this suffices for
  now.&lt;/li&gt;
  &lt;li&gt;Upon researching different RAM speeds, it appears that speeds of 2800MHz
  and above yield small performance improvements, but I opted to stay away for now.&lt;/li&gt;
  &lt;li&gt;There are cheaper 1x8GB sticks out there from other brands, but I decided to
  go with Corsair’s tried and tested model for my own sanity’s sake and pay
  the extra $15.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;SSD&lt;/strong&gt;: &lt;a href=&quot;https://www.amazon.com/Samsung-250GB-Internal-MZ-76E250B-AM/dp/B07864WMK8/&quot;&gt;Samsung 860 Evo 250GB 2.5” SATA III&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;I wanted a smaller drive to run my Linux system; for the NAS storage
  options mentioned above, I plan on getting WD 2TB Red drives in the future.&lt;/li&gt;
  &lt;li&gt;This is the first SSD I’ve installed myself, and they are incredibly light,
  thin and cheap. SSD technology and prices have come a long way in the past
  10 years.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;PSU&lt;/strong&gt;: &lt;a href=&quot;https://www.amazon.com/EVGA-Supernova-Modular-Warranty-220-G3-0750-X1/dp/B005BE058W/&quot;&gt;EVGA SuperNova G3 750W&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;This is a fully modular power supply; if you care about cable management,
  it’s worth the cost.&lt;/li&gt;
  &lt;li&gt;EVGA has great customer service and a 10-year warranty, so make sure to
  register your product.&lt;/li&gt;
  &lt;li&gt;It includes an ECO mode that silences the PSU fan by only running it
  as necessary, though that wasn’t a big selling point for me.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Case&lt;/strong&gt;: &lt;a href=&quot;https://www.amazon.com/Cooler-Master-Computer-Radiator-RC-902XB-KKN2/dp/B00FFJ0H3Q/&quot;&gt;Cooler Master HAF XB EVO ATX Desktop&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;I didn’t want a tower build, preferring a boxier form factor like the EVO.&lt;/li&gt;
  &lt;li&gt;It is incredibly spacious, and makes it easy to route cables with zip tie
  loops throughout the case.&lt;/li&gt;
  &lt;li&gt;Easy access to the motherboard makes it simple to service or modify if you
  keep it on your desk. The top and both side panels are removable, allowing
  access from every side.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;WiFi PCIe Adapter&lt;/strong&gt;: &lt;a href=&quot;https://www.amazon.com/gp/product/B00HF8K0O6&quot;&gt;Gigabyte GC-WB867D-I&lt;/a&gt;&lt;/p&gt;

&lt;h3 id=&quot;order-of-operations&quot;&gt;order of operations&lt;/h3&gt;

&lt;p&gt;When looking up how to build your own PC, you’ll find pretty similar high level
instructions for component order and what to do. There are always smaller
details like test booting and other miscellaneous items that seem to fall
through the cracks, so I thought I’d record what I did for next time. I am by
no means a build expert, so I defer to professional opinions if you find
conflicting information. This is simply a record of my particular build.&lt;/p&gt;

&lt;p&gt;As most guides out there suggest, fully read through the list twice before
removing a single item from its packaging. If this is your first build, cross-check
it with other online resources if your parts are significantly different from mine.
Manuals are also your friend: I highly recommend reading the manuals for your
motherboard, case, GPU and PSU, with priority in that order. Every
build is different, and this is one case where it pays to read the manual
before starting to jam parts together.&lt;/p&gt;

&lt;p&gt;One final note: I did not test-boot my motherboard outside of the case, since
powering it on that way requires either the case’s power switch or manually
shorting the power pins, and I am not experienced enough to do the latter
safely, so I opted to skip the step many experts recommend. My method is more
time-consuming if you misconfigured your motherboard and have to debug it once
it’s in the case, but for me it was the safer option.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;WARNING&lt;/strong&gt; - &lt;em&gt;these steps do not include GPU installation, so please read your
manual if you are installing a GPU.&lt;/em&gt;&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;
    &lt;p&gt;Install CPU onto the motherboard. If this is your first time, please watch
some YouTube videos for your specific processor installation to make sure
you don’t damage your CPU. The CPU is delicate, and there is room for error
here if you aren’t careful. If you are using the same CPU as I am,
I recommend this particular &lt;a href=&quot;https://www.youtube.com/watch?v=9VtH0EJRyAc&quot;&gt;install video&lt;/a&gt;.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Mount CPU fan to the motherboard. My build uses the stock cooling fan for
the Ryzen 3, and already includes thermal paste. I again recommend watching
YouTube videos for your specific cooler for mounting instructions and using
thermal paste. Don’t forget to attach the CPU fan power cable to the
motherboard as specified in the manual.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Install RAM on the motherboard. Make sure you carefully read your
motherboard manual to find the correct slot. For example, I was
installing only 1 stick of RAM, and its spot was the 2nd slot from the left.
Not the most intuitive, so always read the manual.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Install PSU to the case. At this point before the motherboard is installed,
I recommend making sure that you attach all the power cables you need so
that you don’t have to dig under the motherboard later once it has already
been mounted and there is limited space. In my case, I needed the MB cable,
the CPU cable, one SATA power cable and one peripheral power cable for the
front fans.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Install the standoffs to the motherboard mount. Mount the motherboard to the
standoffs for a standard ATX motherboard. Refer to the case manual on
specific instructions on how to do this. The &lt;a href=&quot;https://www.amazon.com/Cooler-Master-Computer-Radiator-RC-902XB-KKN2/dp/B00FFJ0H3Q/&quot;&gt;HAF XB EVO&lt;/a&gt; has
a removable mount that makes it easy to do this without having to fit your
motherboard in the case yet.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Place the IO shield for your motherboard inside the case in the standard
back slot. Make sure that it’s oriented correctly before pressing it in.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Install the motherboard mount to the case, but only screw in a couple screws
so that it’s secure, but not too hard to remove later if necessary.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Read the motherboard and case manuals to figure out how to attach the power
and reset switches to the motherboard. Optionally attach the power and HDD
LEDs if you are confident.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Plug in the power cable, flip on the PSU switch and try turning it on. Your
CPU fan should come on and some LEDs on the board will light up. The B450
board I used includes some easy debug LEDs that can tell you if you are
having CPU, RAM, VGA or boot issues. Once you see the BIOS come up, you made
it!&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;If you successfully test boot, power off the system, turn off the PSU and
fully unplug the PSU before starting to work on it again.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;I mounted the SSD in the back HD mount of the case. The case I was using
has 2 hot swappable drive mounts in the front, but I didn’t want the
primary boot drive to be easily removed. Carefully read your case manual
on how to mount a 2.5” drive; there are usually adapters that make the
mounts compatible with both 2.5” and 3.5” drives.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Connect a SATA data cable from the drive to SATA1 on the motherboard, and
connect the PSU SATA power cable to your drive.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Connect the front USB 3 cables and audio cables to the motherboard. Again,
this is always specific to each motherboard and case, so I highly recommend
reading both manuals to ensure proper connections.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Connect any external fan power cables to your motherboard or PSU. Manuals
come in handy as always!&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Once I was able to boot and test that the system was stable, I then
installed my PCIe adapter for wireless networking.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;At this point, your computer is all set for you to boot from your
OS install drive if you have that ready. I recommend waiting until the very end
of the process to zip-tie and organize cables, but that’s up to you. If
you didn’t fully mount the motherboard or PSU with all of their screws, I highly
recommend you do that now.&lt;/p&gt;

&lt;h3 id=&quot;linux-install&quot;&gt;Linux Install&lt;/h3&gt;

&lt;p&gt;I have the most experience with Ubuntu so I decided to go with the Ubuntu
18.04.1 LTS release. There are many guides that go over setting up bootable
drives so I won’t rehash those here. I was able to boot from my USB
drive easily and go through the standard Ubuntu install without much effort.&lt;/p&gt;

&lt;p&gt;Things only got tricky once I started rebooting my computer for the first time.
I would get a purple splash screen… and then nothing. Upon further
investigation, it looked to be an issue with support for the AMD Ryzen
3 processor with on-board graphics. AMD has not released any official driver
support for Linux, and I have not been able to find well-supported community
drivers either. The only thing I’ve found searching across different forums and
threads is the hint that the newer Linux kernels (4.17+) have better support
for AMD GPUs.&lt;/p&gt;

&lt;p&gt;I decided to use a utility called &lt;a href=&quot;https://github.com/teejee2008/ukuu&quot;&gt;UKUU&lt;/a&gt; to help manage the kernel
upgrade. After installing kernel 4.20, I rebooted and have not had any issues
with the AMD on-board GPU being recognized. To get to the point where
I could even install UKUU and use the system, I had to edit the GRUB boot
settings to disable kernel mode setting.&lt;/p&gt;

&lt;p&gt;In the GRUB loader, select Advanced options for Ubuntu, then press &lt;code class=&quot;highlighter-rouge&quot;&gt;e&lt;/code&gt; on the
default boot entry to edit it. You should see a line that contains
something like this:&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;ro quiet

# change it to:
ro nomodeset quiet
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Press &lt;code class=&quot;highlighter-rouge&quot;&gt;CTRL + X&lt;/code&gt; to boot with your modified entry. The system
should boot fine; you’ll notice that the resolution is poor, but it’s
enough to get what we need done. At this point, you can install UKUU, load
the 4.20 Linux kernel, reboot, and you should be able to properly use your
new system.&lt;/p&gt;
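&lt;p&gt;Before bothering with any of this, it’s worth checking which kernel you’re already running. A quick sketch of the version check, using the 4.17 threshold mentioned above:&lt;/p&gt;

```python
import platform

def kernel_at_least(release, minimum=(4, 17)):
    """Return True if a kernel release string like '4.20.0-42-generic'
    meets the given (major, minor) minimum version."""
    parts = release.split("-")[0].split(".")
    return tuple(int(p) for p in parts[:2]) >= minimum

# Check the currently running kernel.
print(kernel_at_least(platform.release()))
```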

&lt;h3 id=&quot;lessons-learned&quot;&gt;Lessons Learned&lt;/h3&gt;

&lt;p&gt;As this was my first build in quite a while, I made some mistakes and learned
some things that I either forgot or hadn’t experienced before.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Don’t forget to put your IO shield in early; I forgot and had to
  move the motherboard late in the install after all the cables were plugged
  in.&lt;/li&gt;
  &lt;li&gt;Connect all the PSU cables you need if you have a fully modular PSU. You can
  route them out of the side of the case temporarily to preserve space, but
  it’s much easier to do it earlier than later once your motherboard is in.&lt;/li&gt;
  &lt;li&gt;Have a USB keyboard and mouse available. I have been using Bluetooth
  keyboards for so long that I had to find an old one in storage.&lt;/li&gt;
  &lt;li&gt;Research driver support early, before you buy components. In my case, it
  worked out that the later Linux kernel supported my integrated graphics
  card. In hindsight, I should’ve known there might’ve been potential
  compatibility issues before I bought the parts.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;configuration-management&quot;&gt;configuration management&lt;/h2&gt;

&lt;p&gt;In the spirit of making my builds repeatable and to prevent headaches when
building future machines, I wanted to put as much configuration as possible
into version control of some sort. I have the most experience with
&lt;a href=&quot;https://www.ansible.com/&quot;&gt;Ansible&lt;/a&gt;, so I decided to create an Ansible playbook for the machine.&lt;/p&gt;

&lt;p&gt;I’ll highlight the process in more detail in another post one day, but for now
the only dependency I had to install before running my playbook was Ansible
itself. The initial steps are highlighted below:&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ sudo apt-get install software-properties-common
$ sudo apt-add-repository ppa:ansible/ansible
$ sudo apt-get update
$ sudo apt-get install ansible git
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Once Ansible was installed, I cloned the repo that contains my configs,
&lt;a href=&quot;https://github.com/phouse512/circlefiles&quot;&gt;circlefiles&lt;/a&gt;. I then ran my Ansible playbook to bring the
machine state up-to-date.&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ ansible-playbook piper_home.yml --ask-sudo-pass
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Using those Ansible playbooks, you can then add whatever modules you need or
want installed. I also use my playbooks to copy my vim config, dotfiles, bashrc
files and more. If you want to see the latest one that I use on this machine,
it’s on the master branch of my circlefiles repo. The goal is to not
just run commands when configuring and setting up packages, but to put it all
in version control, so it’s easy to set up new systems if the need arises. Two
years later, when you’re trying to remember what you did, you’ll appreciate the
detail.&lt;/p&gt;

&lt;h2 id=&quot;stability-and-burn-in&quot;&gt;stability and burn-in&lt;/h2&gt;

&lt;p&gt;Now that my system was set up, I wanted to test for reliability and make sure
that there were no obvious stability issues. From my research, I came up with
two different tests for the CPU and memory. &lt;a href=&quot;https://www.mersenne.org/download/&quot;&gt;prime95&lt;/a&gt; is a common tool
that can be used to push a CPU to its limits. &lt;a href=&quot;https://www.memtest.org/&quot;&gt;memtest86&lt;/a&gt; can also
be used to thoroughly test your RAM to make sure that there are no damaged
areas.&lt;/p&gt;

&lt;h3 id=&quot;memtest86&quot;&gt;memtest86&lt;/h3&gt;

&lt;p&gt;Memtest86 is a utility that can be run from inside your existing OS or from
a bootable USB drive. I opted to do the latter, as it allows for your RAM to be
fully tested. From my understanding, when memtest is run from a running OS, it
can’t access all the RAM addresses possible. When run as a bootable drive, it
can fully test all of your RAM for any errors.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;http://phizzle.space/assets/memtest_benchmark.png&quot; alt=&quot;memtest benchmark&quot; /&gt;&lt;/p&gt;

&lt;p&gt;You can see the results from my memtest run above. By default, it goes for four
full passes and runs each test four times. For my 8GB stick, it took a little
under 2.5 hours to complete.&lt;/p&gt;

&lt;h3 id=&quot;prime95&quot;&gt;prime95&lt;/h3&gt;

&lt;p&gt;prime95 is a program that searches for prime numbers, but it has evolved into
a standard tool system builders use to stress-test their machines. prime95
is computationally intense and can push your entire system to its limits. mprime is
the Linux version that we will use. I based my test off of Jeff
Atwood’s &lt;a href=&quot;https://blog.codinghorror.com/is-your-computer-stable/&quot;&gt;blog post on system reliability&lt;/a&gt;, so if you are curious
about the details, I would read that first. The only difference is that I use
another &lt;a href=&quot;https://wiki.archlinux.org/index.php/lm_sensors&quot;&gt;tool&lt;/a&gt; to help monitor my CPU temps, as I’m running an AMD
processor and not an Intel CPU. If your computer can run mprime overnight
without crashing, you should be all set.&lt;/p&gt;
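&lt;p&gt;While mprime runs, I keep an eye on CPU temps with lm_sensors. A small hypothetical parser for pulling the CPU temperature out of &lt;code class=&quot;highlighter-rouge&quot;&gt;sensors&lt;/code&gt; output (the label and layout vary by chip and motherboard, so treat this sample format as an assumption):&lt;/p&gt;

```python
import re

# Example `sensors` output; labels and layout vary by motherboard and chip.
sample = """k10temp-pci-00c3
Adapter: PCI adapter
Tdie:         +52.1°C  (high = +70.0°C)
"""

def cpu_temp(sensors_output, label="Tdie"):
    """Extract the first temperature (in Celsius) reported for `label`."""
    for line in sensors_output.splitlines():
        if line.startswith(label):
            match = re.search(r"([0-9]+\.[0-9]+)", line)
            if match:
                return float(match.group(1))
    return None

print(cpu_temp(sample))  # 52.1
```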

&lt;h2 id=&quot;backup-and-storage&quot;&gt;backup and storage&lt;/h2&gt;

&lt;p&gt;The final step I wanted to set up from the start was automated periodic
backups of my user directories. There are dozens of backup tools out there, but
I am most comfortable with a tool called &lt;a href=&quot;https://github.com/restic/restic&quot;&gt;restic&lt;/a&gt;. To start, I just
want to back up my home directory once a week to a remote S3 bucket that I’ve
already configured.&lt;/p&gt;

&lt;p&gt;I’ve set up a weekly cron job that uses restic to create snapshots of
my &lt;code class=&quot;highlighter-rouge&quot;&gt;/home&lt;/code&gt; and &lt;code class=&quot;highlighter-rouge&quot;&gt;/var&lt;/code&gt; directories and stores the encrypted snapshots on S3.&lt;/p&gt;
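
&lt;p&gt;A minimal sketch of such a cron entry (the bucket name, password file and
schedule are placeholders; restic also expects AWS credentials in the
environment):&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# /etc/cron.d/restic-backup -- run every Sunday at 03:00 as root
AWS_ACCESS_KEY_ID=...
AWS_SECRET_ACCESS_KEY=...
0 3 * * 0 root restic -r s3:s3.amazonaws.com/my-backup-bucket \
    --password-file /root/.restic-pass backup /home /var
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;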

&lt;p&gt;It’s not very complex and wouldn’t handle a full system restore if I needed
one, but at the very least I know my user data is backed up if anything goes
wrong. If backups aren’t easy, you won’t ever back up your data, and a tool
like restic removes a lot of the friction.&lt;/p&gt;

&lt;h2 id=&quot;last-thoughts&quot;&gt;last thoughts&lt;/h2&gt;

&lt;p&gt;If you’ve stuck around this far, you’ve seen the entire life-cycle of a new
desktop build. Setting up a Linux system definitely requires a few more tweaks
here and there, but if you’re looking for a free and reliable system you can
use for local development and browsing, I highly recommend Ubuntu 18.&lt;/p&gt;

&lt;p&gt;Lastly, this new desktop is only a couple weeks old, and there will be lots
of things to iron out over the coming months and years. There are many more
things I would like to experiment with on this computer, and I plan on
documenting those projects as the need and time arise.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Installing another SSD that exclusively boots Windows.&lt;/li&gt;
  &lt;li&gt;Installing at least two 1TB HDDs that I can use for media backups.&lt;/li&gt;
  &lt;li&gt;Running &lt;a href=&quot;https://www.minio.io/&quot;&gt;Minio&lt;/a&gt;, an open-source S3-compatible object store, on those HDDs.&lt;/li&gt;
  &lt;li&gt;Digitizing old VHS tapes using &lt;a href=&quot;https://www.ffmpeg.org/&quot;&gt;ffmpeg&lt;/a&gt; or &lt;a href=&quot;https://www.videolan.org/vlc/index.html&quot;&gt;vlc&lt;/a&gt; and a TV tuner
  card.&lt;/li&gt;
&lt;/ul&gt;

</description>
        <pubDate>Sat, 12 Jan 2019 12:00:00 -0600</pubDate>
        <link>http://phizzle.space/linux/diy/build/2019/01/12/linux-desktop-build.html</link>
        <guid isPermaLink="true">http://phizzle.space/linux/diy/build/2019/01/12/linux-desktop-build.html</guid>
        
        
        <category>linux</category>
        
        <category>diy</category>
        
        <category>build</category>
        
      </item>
    
      <item>
        <title>Migrating Hosting to S3 and Cloudfront</title>
        <description>&lt;p&gt;When I began this blog, I decided to host it on a small Digital Ocean
droplet. At the time, it made sense - I was learning about managing Ubuntu
servers, firewalls and dns routing. I’ve learned a bunch since then, and lately
my focus has centered around building reliable data systems. I haven’t had time
to properly manage my blog hosting and maintain the toolchain around it.&lt;/p&gt;

&lt;p&gt;It’s built with Jekyll, and over time as I’ve switched computers, tried
installing new gems and more, my local Ruby installation is completely out of
sync. I was also previously using a Jenkins server for continuous integration
and deployment, but since then I’ve stopped it to cut costs.&lt;/p&gt;

&lt;p&gt;As a result, writing new posts has become a chore that requires me to wrestle
with Jekyll, test and then remember how to deploy manually. One of my 2018
goals is to simplify the projects that I work on and make sure that they are
easily maintainable moving forward. With that in mind, today I’ll be walking
through the process of modernizing all the operations around my blog.&lt;/p&gt;

&lt;h1 id=&quot;development-and-writing&quot;&gt;development and writing&lt;/h1&gt;

&lt;p&gt;My first goal was to make writing new posts and testing locally with Jekyll
as easy as possible. In the past, it was hard to manage my local Ruby
installation and keep everything up-to-date. To deal with this, I decided to
use a Docker image with Jekyll and Ruby already installed.&lt;/p&gt;

&lt;p&gt;Thankfully &lt;a href=&quot;https://github.com/envygeeks/&quot;&gt;envygeeks&lt;/a&gt; maintains a popular &lt;a href=&quot;https://github.com/envygeeks/jekyll-docker/&quot;&gt;Docker
image&lt;/a&gt; that I was able to use out of the box, without building my own
Dockerfile from scratch. From there, it was just a matter of modifying my
Makefile to run a simple bash script inside the image.&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# Makefile

development:
	docker run --rm -p 4000:4000 --volume=&quot;${PWD}:/srv/jekyll&quot; \
        -it jekyll/jekyll ./scripts/development.sh

build:
	docker run --rm --volume=&quot;${PWD}:/srv/jekyll&quot;  \
        -it jekyll/jekyll ./scripts/build.sh
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;There are a few important things to note about how I use Docker to
build:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;I begin by mounting my current directory at &lt;code class=&quot;highlighter-rouge&quot;&gt;/srv/jekyll&lt;/code&gt; in the
  container. The image’s Dockerfile contains &lt;code class=&quot;highlighter-rouge&quot;&gt;WORKDIR
  /srv/jekyll&lt;/code&gt;, so this is where our commands get run from.&lt;/li&gt;
  &lt;li&gt;I use &lt;code class=&quot;highlighter-rouge&quot;&gt;--rm&lt;/code&gt; to delete the container after it shuts down; I don’t need
  a bunch of old containers filling up my hard drive.&lt;/li&gt;
  &lt;li&gt;Port 4000 in the container is published to port 4000 on the host (my
  computer) so I can easily test &lt;code class=&quot;highlighter-rouge&quot;&gt;http://localhost:4000/&lt;/code&gt; in the browser.&lt;/li&gt;

&lt;p&gt;The scripts themselves are even simpler, as shown below. Since Jekyll is
already installed, they are just thin wrappers in case I want to add more tasks
in the future. One last note: since my current directory is shared with the
container as a volume, &lt;code class=&quot;highlighter-rouge&quot;&gt;jekyll serve&lt;/code&gt; detects changes as I write and
immediately regenerates the site, so I can proofread and test quickly.&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;#!/usr/bin/env bash
# scripts/development.sh
jekyll build
jekyll serve


#!/usr/bin/env bash
# scripts/build.sh
jekyll build
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h1 id=&quot;serverless-hosting-with-s3-and-cloudfront&quot;&gt;serverless hosting with s3 and cloudfront&lt;/h1&gt;

&lt;p&gt;Next, I wanted to remove all operational overhead
of hosting my own static website. Amazon’s S3 service is a perfect fit and
allows for static websites to be hosted without any personal maintenance.
On top of that, Cloudfront can be used to offer a CDN service in front of S3 to
improve latency. For many static sites, these tools are a great fit and allow
for you to pay only for what you use and nothing more.&lt;/p&gt;

&lt;p&gt;Deploying on this setup is as simple as syncing jekyll’s static output to your
S3 bucket. After that you only need to invalidate the Cloudfront cache in front
of your bucket to ensure that your changes propagate.&lt;/p&gt;

&lt;h1 id=&quot;continuous-integration-and-deployment&quot;&gt;continuous integration and deployment&lt;/h1&gt;

&lt;p&gt;The final step was to set up &lt;a href=&quot;https://travis-ci.org&quot;&gt;Travis-CI&lt;/a&gt; to automate testing and
deployment. Travis-CI supports a new &lt;a href=&quot;https://docs.travis-ci.com/user/build-stages/#What-are-Build-Stages%3F&quot;&gt;build stages&lt;/a&gt; feature that allows
for serial jobs, which is great for setting up build-and-deploy pipelines.
Here is the barebones configuration I use to only deploy on
merges to master.&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# .travis.yml configuration
sudo: required
services:
  - docker

stages:
  - build
  - name: deploy
    if: branch = master

jobs:
  include:
    - stage: build
      script: make build
    - stage: deploy
      script: make deploy
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Since I already have a Makefile setup for things like building and testing,
I added another &lt;code class=&quot;highlighter-rouge&quot;&gt;make deploy&lt;/code&gt; command so that it’s easy for me to deploy from
my machine and from a CI server. I just have to pass in my AWS creds to my
Docker image as environment variables and let it go.&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;deploy: build
	docker run --rm --volume=&quot;${PWD}:/build&quot; -it \
	-e AWS_ACCESS_KEY_ID=&amp;lt;access_key&amp;gt; \
	-e AWS_SECRET_ACCESS_KEY=&amp;lt;secret_key&amp;gt; \
	-e AWS_DEFAULT_REGION=&amp;lt;region_name&amp;gt; \
	library/python:3.6 ./build/scripts/deploy.sh
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;My deploy script is very simple. It runs in a Python image, so Python and
pip are already installed. From there, installing &lt;code class=&quot;highlighter-rouge&quot;&gt;awscli&lt;/code&gt; is a breeze, and we
only need to sync our local directory with S3 and invalidate the Cloudfront
cache.&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c&quot;&gt;#!/bin/bash&lt;/span&gt;
&lt;span class=&quot;nb&quot;&gt;echo&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;Running deploy&quot;&lt;/span&gt;

&lt;span class=&quot;nb&quot;&gt;echo&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;Install aws-cli&quot;&lt;/span&gt;
pip &lt;span class=&quot;nb&quot;&gt;install &lt;/span&gt;awscli &lt;span class=&quot;nt&quot;&gt;--upgrade&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;--user&lt;/span&gt;

&lt;span class=&quot;nb&quot;&gt;echo&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;Beginning deploy&quot;&lt;/span&gt;
~/.local/bin/aws s3 &lt;span class=&quot;nb&quot;&gt;sync&lt;/span&gt; ./build/_site s3://&amp;lt;bucket_name&amp;gt;
~/.local/bin/aws cloudfront create-invalidation &lt;span class=&quot;nt&quot;&gt;--distribution-id&lt;/span&gt; &amp;lt;distribution_id&amp;gt; &lt;span class=&quot;nt&quot;&gt;--paths&lt;/span&gt; /&lt;span class=&quot;se&quot;&gt;\*&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h1 id=&quot;future-ideas&quot;&gt;future ideas&lt;/h1&gt;

&lt;p&gt;These changes go a long way in helping me post often and without much effort,
but there are a few more nice-to-haves that I’ll save for another weekend.&lt;/p&gt;

&lt;p&gt;I would like to have a command-line spelling/grammar check built into my
testing workflows. The only way I do it now is to copy and paste each blog post
into an online editor once before posting.&lt;/p&gt;

&lt;p&gt;Cloudfront logs all of its requests to files before compressing and storing
them in an S3 bucket. While I already use tools like Google Analytics and
Gaug.es, it would be great practice for my &lt;code class=&quot;highlighter-rouge&quot;&gt;sed&lt;/code&gt;, &lt;code class=&quot;highlighter-rouge&quot;&gt;awk&lt;/code&gt; and &lt;code class=&quot;highlighter-rouge&quot;&gt;grep&lt;/code&gt; skills to build some
basic log-analysis scripts into my Makefile.&lt;/p&gt;

</description>
        <pubDate>Tue, 20 Mar 2018 13:00:00 -0500</pubDate>
        <link>http://phizzle.space/s3/cloudfront/blog/2018/03/20/hosting-with-s3-cloudfront.html</link>
        <guid isPermaLink="true">http://phizzle.space/s3/cloudfront/blog/2018/03/20/hosting-with-s3-cloudfront.html</guid>
        
        
        <category>s3</category>
        
        <category>cloudfront</category>
        
        <category>blog</category>
        
      </item>
    
      <item>
        <title>Version Control with Flyway</title>
        <description>&lt;p&gt;It’s been a while since my last post with the main reason being that things
have gotten really busy at &lt;a href=&quot;https://www.amper.xyz/&quot;&gt;amper&lt;/a&gt;. In the past couple months, we’ve hired
another software engineer and a data scientist. As you can imagine, moving from
engineers working solo to teams requires more processes and tools to help
maintain order and keep people from stepping on toes.&lt;/p&gt;

&lt;p&gt;From the beginning, we’ve used things like continuous integration,
unit-tests and version control for our code. Something that we’ve put off is
applying similar principles to our database. We use a Postgres RDS instance
hosted by AWS, and connect many of our services to it. When we’ve needed to
make changes, it was a matter of jumping on the instance and manually writing
SQL for table/index modifications. This has been fine with only two people on
the team, but not with a team of six.&lt;/p&gt;

&lt;p&gt;Growing our team is the main reason why we’ve started version controlling our
database, but it’s not the only one. When I got started with this concept,
I found Jeff Atwood’s &lt;a href=&quot;https://blog.codinghorror.com/get-your-database-under-version-control/&quot;&gt;blog post&lt;/a&gt; on version controlling the
database to be very helpful. There are other resources as well that give some
solid reasons for moving in that direction, and I won’t rehash them here.&lt;/p&gt;

&lt;p&gt;We ended up going with &lt;a href=&quot;https://flywaydb.org/&quot;&gt;flyway&lt;/a&gt;, a Java-based tool that helps with this
process. It is built around simple, numbered SQL scripts that are applied
sequentially to bring a database schema up-to-date. The documentation is decent
at explaining the core functionality and usage of the tool, but I found
resources on using flyway in a production environment to be lacking. The rest
of this post covers how amper uses flyway and integrates it into our
workflows. I don’t claim that our usage is the standard, but it has been useful
in getting ourselves up and running!&lt;/p&gt;

&lt;h3 id=&quot;flyway-in-practice&quot;&gt;flyway in practice&lt;/h3&gt;

&lt;p&gt;One of the first confusing things that tripped us up was figuring out how to
structure our repository. When you download flyway for the first time, it comes
with many directories: some for config files, jars, sql and more. This is what
the structure of our repository looks like. I’ll briefly go over what each
section is responsible for below.&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;.circleci/  # holds our circleci build configurations
  config.yml
conf/  # holds all of our configuration files for database locations
  factory_dev.conf
  factory_prod.conf
  factory_test_ci.conf
  flyway.conf
seed/  # for storing timestamped dump data when developing
  11_30_17_dump.sql
sql/  # where our sql migrations get stored
  V001__sample_sql.sql
  V002__another_sample.sql
  V003__more_sql.sql
users/  # sql that holds the user accounts
  admin.sql
.gitignore
README.md
initialize.sh  # used to set up the db locally
install_flyway.sh  # used to download, unzip and setup flyway
run_ci.sh  # used by circleci to run sql against test db
seed.sh  # adds the seed data to the local dev db
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h4 id=&quot;configuration-files&quot;&gt;configuration files&lt;/h4&gt;

&lt;p&gt;The first important directory is &lt;code class=&quot;highlighter-rouge&quot;&gt;conf/&lt;/code&gt;. Inside that directory, you’ll find
multiple configuration files that specify different database logins
depending on their name. For example, here is what my
&lt;code class=&quot;highlighter-rouge&quot;&gt;factory_dev.conf&lt;/code&gt; file looks like:&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;flyway.url=jdbc:postgresql://localhost:5432/dev_db
flyway.user=postgres
flyway.locations=filesystem:sql
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
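
&lt;p&gt;To run commands against a given environment, you point the flyway CLI at
the matching config file. A sketch (the &lt;code class=&quot;highlighter-rouge&quot;&gt;-configFiles&lt;/code&gt; flag is what recent
flyway versions use; older releases call it &lt;code class=&quot;highlighter-rouge&quot;&gt;-configFile&lt;/code&gt;):&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# apply any pending migrations to the local dev database
flyway -configFiles=conf/factory_dev.conf migrate

# list applied and pending migrations
flyway -configFiles=conf/factory_dev.conf info
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;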

&lt;p&gt;These configuration files are pretty simple, and if you want to read more about
configuration files, you can find documentation &lt;a href=&quot;https://flywaydb.org/documentation/commandline/&quot;&gt;here&lt;/a&gt;. We use
a separate configuration file for each development environment in order to make
it clear when we are running migrations.&lt;/p&gt;

&lt;h4 id=&quot;sql-migrations&quot;&gt;sql migrations&lt;/h4&gt;

&lt;p&gt;The next important directory is &lt;code class=&quot;highlighter-rouge&quot;&gt;sql/&lt;/code&gt; - this directory is more
self-explanatory. It holds the numbered sql files that flyway uses to migrate
your database. The flyway documentation is pretty clear about how this works,
so I’ll leave that to you to figure out.&lt;/p&gt;

&lt;h4 id=&quot;user-configuration&quot;&gt;user configuration&lt;/h4&gt;

&lt;p&gt;Inside &lt;code class=&quot;highlighter-rouge&quot;&gt;users/&lt;/code&gt; we store sql scripts that hold the configuration for consistent
users for ourselves and our services. I create new credentials for each
service, along with fine-grained permissions based on what each service needs.&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;/* user accounts */
CREATE USER admin_user WITH PASSWORD 'fake_pass';
CREATE USER user1 WITH PASSWORD 'fake_pass';
CREATE USER service1 WITH PASSWORD 'fake_pass';

/* give all priv's to admin_user */
GRANT ALL PRIVILEGES ON DATABASE &quot;test_db&quot; to admin_user;

/* give read-only privileges to user1 */
GRANT SELECT ON ALL TABLES IN SCHEMA test_schema TO user1;
ALTER DEFAULT PRIVILEGES IN SCHEMA test_schema GRANT SELECT ON TABLES TO user1;

/* give read/write privileges to service1 */
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA test_schema TO service1;
ALTER DEFAULT PRIVILEGES IN SCHEMA test_schema GRANT SELECT, INSERT, UPDATE, DELETE ON TABLES TO service1;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;We keep this outside our SQL migrations so that we can maintain consistent
permissions across all of our servers. The credentials created here are what
the configuration files use to connect and run migrations, so it’s a bit of
a chicken-and-egg problem. The &lt;code class=&quot;highlighter-rouge&quot;&gt;initialize.sh&lt;/code&gt; script in the root of the
repository runs these permission commands automatically, saving some time and
&lt;code class=&quot;highlighter-rouge&quot;&gt;psql&lt;/code&gt; syntax lookups. This is definitely one of the more experimental
aspects of our flyway usage, so we’ll see how it evolves over time.&lt;/p&gt;

&lt;h4 id=&quot;scripting&quot;&gt;scripting&lt;/h4&gt;

&lt;p&gt;The rest of the contents of the repository are helper scripts that make
common actions a bit easier. &lt;code class=&quot;highlighter-rouge&quot;&gt;initialize.sh&lt;/code&gt; eases getting a local
database up and running. &lt;code class=&quot;highlighter-rouge&quot;&gt;seed.sh&lt;/code&gt; makes it easy to load seed data
from the &lt;code class=&quot;highlighter-rouge&quot;&gt;seed/&lt;/code&gt; directory. We also include a script that our continuous
integration tools use to automatically validate new migrations.&lt;/p&gt;

&lt;h3 id=&quot;practical-workflows&quot;&gt;practical workflows&lt;/h3&gt;

&lt;p&gt;With the repository structure set up, let’s go over how we actually use
flyway. I’ll cover two basic workflows: 1) initializing a local development
database, and 2) making changes to the production database schema.&lt;/p&gt;

&lt;h4 id=&quot;local-setup&quot;&gt;local setup&lt;/h4&gt;

&lt;p&gt;When a new developer joins the team, or we’re setting up a new development
machine, we follow this workflow. It’s not completely automated, but the most
critical parts have been automated to help reduce human error.&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Set up a local instance of Postgres; this can be done with Docker, brew or
any other package manager.&lt;/li&gt;
  &lt;li&gt;Manually create a database with the name that your dev configuration uses.&lt;/li&gt;
  &lt;li&gt;Run the &lt;code class=&quot;highlighter-rouge&quot;&gt;initialize.sh&lt;/code&gt; script which takes care of user and schema
initialization.&lt;/li&gt;
  &lt;li&gt;Migrate the clean database to the current production schema with &lt;code class=&quot;highlighter-rouge&quot;&gt;flyway
migrate&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Optional: Add some seed data to the database by running the script
&lt;code class=&quot;highlighter-rouge&quot;&gt;seed.sh&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;
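
&lt;p&gt;Assuming Postgres is already running locally, steps 2–5 boil down to
something like this (the database name and config flag are illustrative):&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# 2. create the database your dev configuration points at
createdb dev_db

# 3. set up users and schemas
./initialize.sh

# 4. bring the clean database up to the current schema
flyway -configFiles=conf/factory_dev.conf migrate

# 5. optional: load seed data
./seed.sh
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;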

&lt;p&gt;As you can see, it’s pretty simple, and it handles most of the complicated
aspects for you. Once you have a database installed and running, our process
takes care of almost everything else! We also use a similar flow when we want
to blow away our local DB and bring it back up to the latest schema, especially
after lots of testing.&lt;/p&gt;

&lt;h4 id=&quot;making-changes&quot;&gt;making changes&lt;/h4&gt;

&lt;p&gt;The next most common workflow is actually making changes to the database. These
steps assume that you already have the latest production schema running in your
local database.&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Checkout a new branch in the database repo.&lt;/li&gt;
  &lt;li&gt;Using &lt;code class=&quot;highlighter-rouge&quot;&gt;psql&lt;/code&gt; or your favorite DB admin tool, modify the database to fit your
requirements. Be sure to keep track of exactly what you did if you spent
a bunch of time experimenting.&lt;/li&gt;
  &lt;li&gt;Place all of your new changes in a sql file, and be sure to name it
following the flyway convention. By default, it requires numerically-ordered
files that look like this: &lt;code class=&quot;highlighter-rouge&quot;&gt;V001__some_change.sql&lt;/code&gt;. Add a new file and
increment the latest version so that flyway can pick it up.&lt;/li&gt;
  &lt;li&gt;Open a pull-request, and let the CI server pick up your changes and ensure
that your sql runs without error.&lt;/li&gt;
  &lt;li&gt;Merge the pull-request, and then run &lt;code class=&quot;highlighter-rouge&quot;&gt;flyway migrate&lt;/code&gt; against your
production database.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This flow is also pretty simple, and it makes it very clear for everyone to
review exactly what you’re doing to the database. And while we don’t have
complex migration validation in our CI (it uses an empty DB), we can at least
validate on an outside machine that the SQL is valid. Next, I’ll share a little
more about how we’ve set up our testing flow.&lt;/p&gt;

&lt;h3 id=&quot;continuous-testing&quot;&gt;continuous testing&lt;/h3&gt;

&lt;p&gt;Now that we can programmatically migrate our databases, the next step is to
hook this up to continuous integration tooling to validate builds on a clean
server. Our CI testing flow is pretty naive, so it doesn’t ensure that existing
data can be migrated, but it helps validate that the SQL is valid and can be
run against the existing schema.&lt;/p&gt;

&lt;p&gt;We keep a tiny RDS instance running at all times. When a new commit is
pushed to a remote branch, we run &lt;code class=&quot;highlighter-rouge&quot;&gt;flyway clean&lt;/code&gt; and then &lt;code class=&quot;highlighter-rouge&quot;&gt;flyway migrate&lt;/code&gt; against the
test db. By clearing any existing schemas first, we can be sure that the SQL we
wrote will work against exactly what is in production. As mentioned before,
this doesn’t migrate with seed data, but it gives us enough confidence that we
aren’t missing anything obvious.&lt;/p&gt;
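
&lt;p&gt;The &lt;code class=&quot;highlighter-rouge&quot;&gt;run_ci.sh&lt;/code&gt; script might look something like this sketch (the
config file name follows the repository layout above):&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;#!/usr/bin/env bash
set -e

# wipe any schemas left over from the previous run
flyway -configFiles=conf/factory_test_ci.conf clean

# replay every migration from scratch against the empty test db
flyway -configFiles=conf/factory_test_ci.conf migrate
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;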

&lt;p&gt;I elected not to automatically deploy to production on merges, as there
might be cases where we want to carry out additional spot testing. At our
team size, this hasn’t proven to be an issue, as we aren’t altering the
database multiple times a day. If you elect to move forward with automatic
production deploys, it might be worth investing in automated testing of your
existing APIs against the new schema to make sure everything is compatible.&lt;/p&gt;

&lt;h3 id=&quot;looking-forward&quot;&gt;looking forward&lt;/h3&gt;

&lt;p&gt;So far, I’ve described our simple processes for managing our database schemas.
For a small team, it works decently well, and helps keep everyone on the same
page as we add, refactor, and remove old datastores. It also helps us find
errors and easily see a working history of our databases’ evolution over time.
That being said, there are a few areas that we’d like to improve as time
permits:&lt;/p&gt;

&lt;h4 id=&quot;improving-the-setup-process&quot;&gt;improving the setup process&lt;/h4&gt;

&lt;p&gt;When onboarding new developers and setting up new machines, the setup process
above is a bit complex and requires many steps. It is also quite easy to
botch it if one isn’t careful. In the future we’d like to simplify the process,
and also highlight exactly how flyway commands help us manage our schemas.&lt;/p&gt;

&lt;h4 id=&quot;automating-seed-data&quot;&gt;automating seed data&lt;/h4&gt;

&lt;p&gt;The seed data we currently have in our repo is manually generated using
&lt;code class=&quot;highlighter-rouge&quot;&gt;pg_dump&lt;/code&gt; and the &lt;code class=&quot;highlighter-rouge&quot;&gt;--schema-only&lt;/code&gt; flag. As you can imagine, this gets out of
date as our schema evolves and requires someone to bump it occasionally. In
an ideal world, we would have an automated weekly job that dumps production
data into an S3 bucket. Each snapshot would be tagged with the current schema
version, and when a seed command gets run, the tool would reconcile the schema
version and find the latest valid seed snapshot.&lt;/p&gt;

&lt;h4 id=&quot;ci-testing-with-real-data&quot;&gt;ci testing with real data&lt;/h4&gt;

&lt;p&gt;Similar to the above point, we would want our CI testing to also get seeded
with the most recent production data. Once we’ve validated that our latest
migration works against a prod mirror, we’d like to run integration tests from
our API to ensure that it has the correct code needed for the new schema.&lt;/p&gt;

&lt;p&gt;If you have ideas, feedback or more questions about how we do simple version
control, feel free to reach out! When we developed this process, there wasn’t
as much helpful documentation on the web as I thought there might be, so
hopefully this gives you a concrete example. Good luck!&lt;/p&gt;

</description>
        <pubDate>Sun, 07 Jan 2018 12:00:00 -0600</pubDate>
        <link>http://phizzle.space/db/postgres/flyway/2018/01/07/version-control-with-flyway.html</link>
        <guid isPermaLink="true">http://phizzle.space/db/postgres/flyway/2018/01/07/version-control-with-flyway.html</guid>
        
        
        <category>db</category>
        
        <category>postgres</category>
        
        <category>flyway</category>
        
      </item>
    
  </channel>
</rss>
