diff --git a/.gitignore b/.gitignore
index ce896678be..7e20c7b4ee 100644
--- a/.gitignore
+++ b/.gitignore
@@ -101,4 +101,4 @@ service-definitions.yml
 /config/local*.yml
 
 # Template for the local runtime configuration.
-!/config/local.template.yml
+!/config/local*.template.yml
diff --git a/config/local-shields-io-production.template.yml b/config/local-shields-io-production.template.yml
new file mode 100644
index 0000000000..d891882a23
--- /dev/null
+++ b/config/local-shields-io-production.template.yml
@@ -0,0 +1,10 @@
+private:
+  bintray_user: ...
+  bintray_apikey: ...
+  gh_client_id: ...
+  gh_client_secret: ...
+  sentry_dsn: ...
+  shields_secret: ...
+  sl_insight_userUuid: ...
+  sl_insight_apiToken: ...
+  wheelmap_token: ...
diff --git a/doc/production-hosting.md b/doc/production-hosting.md
index f48a6d7a20..d1916e25cd 100644
--- a/doc/production-hosting.md
+++ b/doc/production-hosting.md
@@ -1,37 +1,245 @@
 # Production hosting
 
-In production, a deploy commit checks in two config files:
+[![operations issues](https://img.shields.io/github/issues/badges/shields/operations.svg?label=open%20operations%20issues)][operations issues]
 
-- `.env`
-- `config/local-shields-io-production.yml`
+[#ops chat room][ops discord]
 
-The `.env` file sets `NODE_CONFIG_ENV` which bootstraps the configuration process. The rest of the configuration is loaded from three sources:
+[operations issues]: https://github.com/badges/shields/issues?q=is%3Aissue+is%3Aopen+label%3Aoperations
+[ops discord]: https://discordapp.com/channels/308323056592486420/480747695879749633
 
-- `config/local-shields-io-production.yml` (secrets)
-- [`config/shields-io-production.yml`](../config/shields-io-production.yml) (non-secrets)
-- [`config/default.yml`](../config/default.yml)
+| Component      | Subcomponent    | People with access                                                                         |
+| -------------- | --------------- | ------------------------------------------------------------------------------------------ |
+| Badge servers  | Account owner   | @espadrine                                                                                 |
+| Badge servers  | ssh, logs       | @espadrine                                                                                 |
+| Badge servers  | Deployment      | @espadrine, @paulmelnikow                                                                  |
+| Badge servers  | Admin endpoints | @espadrine, @paulmelnikow                                                                  |
+| Cloudflare     | Account owner   | @espadrine                                                                                 |
+| Cloudflare     | Admin access    | @espadrine, @paulmelnikow                                                                  |
+| GitHub         | OAuth app       | @espadrine ([could be transferred to the badges org][oauth transfer])                      |
+| DNS            | Account owner   | @olivierlacan                                                                              |
+| Sentry         | Error reports   | @espadrine, @paulmelnikow                                                                  |
+| Frontend       | Deployment      | Technically anyone with push access but in practice must be deployed with the badge server |
+| Metrics server | Owner           | @platan                                                                                    |
+| UptimeRobot    | Account owner   | @paulmelnikow                                                                              |
+| More metrics   | Owner           | @RedSparr0w                                                                                |
 
-These settings are currently set in `config/local-shields-io-production.yml`:
+There are [too many bottlenecks][issue 2577]!
 
-- bintray_apikey
-- bintray_user
-- gh_client_id
-- gh_client_secret
-- gh_oauth_state
-- libraries_io_api_key
-- sentry_dsn
-- shields_secret
-- sl_insight_apiToken
-- sl_insight_userUuid
-- wheelmap_token
+## Badge servers
 
-## Main Server Sysadmin
+There are three public badge servers on OVH VPSes.
 
-- Servers in DNS round-robin:
-  - s0.shields-server.com: 192.99.59.72 (vps71670.vps.ovh.ca)
-  - s1.shields-server.com: 51.254.114.150 (vps244529.ovh.net)
-  - s2.shields-server.com: 149.56.96.133 (vps117870.vps.ovh.ca)
-- Self-signed TLS certificates, but `img.shields.io` is behind CloudFlare, which provides signed certificates.
-- Using systemd to automatically restart the server when it crashes.
+| Cname                       | Hostname             | Type | IP             | Location           |
+| --------------------------- | -------------------- | ---- | -------------- | ------------------ |
+| [s0.shields-server.com][s0] | vps71670.vps.ovh.ca  | VPS  | 192.99.59.72   | Quebec, Canada     |
+| [s1.shields-server.com][s1] | vps244529.ovh.net    | VPS  | 51.254.114.150 | Gravelines, France |
+| [s2.shields-server.com][s2] | vps117870.vps.ovh.ca | VPS  | 149.56.96.133  | Quebec, Canada     |
 
-See https://github.com/badges/ServerScript for helper admin scripts.
+- These are single-core virtual hosts with 2 GB RAM ([VPS SSD 1][]).
+- The Node version (v9.4.0 at time of writing) and dependency versions on the
+  servers can be inspected in Sentry, but only when an error occurs.
+- The servers use self-signed SSL certificates. ([#1460][issue 1460])
+- After accepting the certificate, you can debug an individual server using
+  the links above.
+- The scripts that start the server live in the [ServerScript][] repo. However,
+  updates must be pulled manually. They are not updated as part of the deploy process.
+- The server runs SSH.
+- Deploys are made using a git post-receive hook.
+- The server uses systemd to automatically restart the server when it crashes.
+- Provisioning additional servers is a manual process which has yet to be
+  documented.
+
+[s0]: https://s0.shields-server.com/index.html
+[s1]: https://s1.shields-server.com/index.html
+[s2]: https://s2.shields-server.com/index.html
+[vps ssd 1]: https://www.ovh.com/world/vps/vps-ssd.xml
+[issue 1460]: https://github.com/badges/shields/issues/1460
+[serverscript]: https://github.com/badges/ServerScript
+
+## Attached state
+
+Shields has mercifully little persistent state:
+
+1. The GitHub tokens we collect are saved on each server in JSON files on disk.
+   They can be fetched from the [GitHub auth admin endpoint][] for debugging.
+2. The analytics data is also saved on each server in JSON files on disk.
+3. The server keeps a few caches in memory. These are neither persisted nor
+   inspectable.
+   - The [request cache][]
+   - The [regular-update cache][]
+   - The [raster cache][]
+
+[github auth admin endpoint]: https://github.com/badges/shields/blob/master/services/github/auth/admin.js
+[request cache]: https://github.com/badges/shields/blob/master/lib/request-handler.js#L29-L30
+[regular-update cache]: https://github.com/badges/shields/blob/master/lib/regular-update.js
+[raster cache]: https://github.com/badges/shields/blob/master/gh-badges/lib/svg-to-img.js#L9-L10
+[oauth transfer]: https://developer.github.com/apps/managing-oauth-apps/transferring-ownership-of-an-oauth-app/
+
+## Configuration
+
+To bootstrap the configuration process,
+[the script that starts the server][start-shields.sh] sets a single
+environment variable:
+
+```
+NODE_CONFIG_ENV=shields-io-production
+```
+
+With that variable set, the server ([using `config`][config]) reads these
+files:
+
+- [`local-shields-io-production.yml`][local-shields-io-production.yml].
+  This file contains secrets which are checked in with a deploy commit.
+- [`shields-io-production.yml`][shields-io-production.yml]. This file
+  contains non-secrets which are checked in to the main repo.
+- [`default.yml`][default.yml]. This file contains defaults.
+
+[start-shields.sh]: https://github.com/badges/ServerScript/blob/master/start-shields.sh#L7
+[config]: https://github.com/lorenwest/node-config/wiki/Configuration-Files
+[local-shields-io-production.yml]: ../config/local-shields-io-production.template.yml
+[shields-io-production.yml]: ../config/shields-io-production.yml
+[default.yml]: ../config/default.yml
+
+The project ships with `dotenv`; however, there is no `.env` in production.
+
+## Badge CDN
+
+Sitting in front of the three servers is a Cloudflare Free account which
+provides several services:
+
+- Global CDN, caching, and SSL gateway for `img.shields.io`
+- Analytics through the Cloudflare dashboard
+- DNS hosting for `shields.io`
+
+Cloudflare is configured to respect the servers' cache headers.
+
+## Frontend
+
+The frontend is served by [GitHub Pages][] via the [gh-pages branch][gh-pages]. SSL is enforced.
+
+`shields.io` resolves to the GitHub Pages hosts. It is not proxied through
+Cloudflare.
+
+Technically any maintainer can push to `gh-pages`, but in practice the frontend must be deployed
+with the badge server via the deployment process described below.
+
+[github pages]: https://pages.github.com/
+[gh-pages]: https://github.com/badges/shields/tree/gh-pages
+
+## Deployment
+
+To set things up for deployment:
+
+1. Get your SSH key added to the server.
+2. Clone a fresh copy of the repository, dedicated for deployment.
+   (Not required, but recommended, and it lets you use `npm ci` below.)
+3. Add remotes:
+
+```sh
+git remote add s0 root@s0.shields-server.com:/home/m/shields.git
+git remote add s1 root@s1.shields-server.com:/home/m/shields.git
+git remote add s2 root@s2.shields-server.com:/home/m/shields.git
+```
+
+`origin` should point to GitHub as usual.
+
+4. Since the deploy uses `git worktree`, make sure you have git 2.5 or later.
+
+To deploy:
+
+1. Use `git fetch` to obtain a current copy of
+   `local-shields-io-production.yml` from the server (or obtain the current
+   version of that file some other way). Save it in `config/`.
+2. Check out the commit you want to deploy.
+3. Run `npm ci`. **This is super important for the frontend build!**
+4. Run `make deploy-s0` to make a canary deploy.
+5. Check the canary deploy:
+   - [Visit the server][s0]. Don't forget that most of the preview badges
+     are static!
+   - Look for errors in [Sentry][].
+   - Keep an eye on the [status page][status].
+6. After a little while (usually 10–60 minutes), finish the deploy:
+   `make push-s1 push-s2 deploy-gh-pages`.
+
+To roll back, check out the commit you want to roll back to and repeat those
+steps.
+
+To see which commit is deployed to a server, run `git ls-remote` and then
+`git log` on the `HEAD` ref. There will be two deploy commits preceded by the
+commit which was deployed.
+
+Be careful not to push the deploy commits to GitHub.
+
+`make deploy-s0` does the following:
+
+1. Creates a working tree in `/tmp`.
+2. In that tree, runs `features` and `examples` to generate data files
+   needed for the frontend.
+3. Builds and checks in the built frontend.
+4. Checks in `local-shields-io-production.yml`.
+5. Pushes to s0, which updates dependencies and then restarts itself.
+
+`make push-s1 push-s2 deploy-gh-pages` does the following:
+
+1. Pushes the same working tree to s1 and s2.
+2. Creates a new working tree for the frontend.
+3. Adds a commit cleaning out the index.
+4. Adds another commit with the built frontend.
+5. Pushes to `gh-pages`.
+
+## DNS
+
+I'm not sure where the DNS is registered.
+
+## Logs
+
+Logs are available on the individual servers via SSH.
+
+## Error reporting
+
+[Error reporting][sentry] is one of the most useful tools we have for monitoring
+the server. It's generously donated by [Sentry][sentry home]. We bundle
+[`raven`][raven] into the application, and the Sentry DSN is configured via
+`local-shields-io-production.yml` (see [documentation][sentry configuration]).
+
+[sentry]: https://sentry.io/shields/
+[raven]: https://www.npmjs.com/package/raven
+[sentry home]: https://sentry.io/shields/
+[sentry configuration]: https://github.com/badges/shields/blob/master/doc/self-hosting.md#sentry
+
+## Monitoring
+
+Request performance is monitored in three places:
+
+- [Status][] (using [UptimeRobot][])
+- [Server metrics][] using Prometheus and Grafana
+- [@RedSparr0w's monitor][monitor] which posts [notifications][] to a private
+  [#monitor chat room][monitor discord]
+
+Overall server performance is monitored using Prometheus and Grafana.
+Coming soon! ([#2068][issue 2068])
+
+[status]: https://status.shields.io/
+[server metrics]: https://metrics.shields.io/
+[uptimerobot]: https://uptimerobot.com/
+[monitor]: https://shields.redsparr0w.com/1568/
+[notifications]: http://shields.redsparr0w.com/discord_notification
+[monitor discord]: https://discordapp.com/channels/308323056592486420/470700909182320646
+[issue 2068]: https://github.com/badges/shields/issues/2068
+
+## Analytics
+
+The server analytics data is public and can be fetched from the
+[analytics endpoint][] or using the [analytics script][].
+
+[analytics endpoint]: https://github.com/badges/shields/blob/master/lib/analytics.js
+[analytics script]: https://github.com/badges/ServerScript/blob/master/stats.js
+
+## Known limitations
+
+1. The only way to inspect the commit on the server is with `git ls-remote`.
+2. The production deploy installs `devDependencies`. It does not honor
+   `package-lock.json`. ([#1988][issue 1988])
+
+[issue 2577]: https://github.com/badges/shields/issues/2577
+[issue 1988]: https://github.com/badges/shields/issues/1988