diff --git a/website/docs/_posts/2017-07-26-dependency-pinning.md b/website/docs/_posts/2017-07-26-dependency-pinning.md index 1f4b0ddd9a0c3735900f5ec3d9cc1186987c412a..d8f2b701acc7330652302296c5e52710a3cc7b5c 100644 --- a/website/docs/_posts/2017-07-26-dependency-pinning.md +++ b/website/docs/_posts/2017-07-26-dependency-pinning.md @@ -14,7 +14,7 @@ Once you start using a tool/service like Renovate, probably the biggest decision To ensure we're all talking about the same thing, it's important to define exactly what we mean by dependency "pinning". -Typically, projects use semver ranges in their `package.json`. For instance, if you run `npm i foobar` you will see an entry like `"foobar": "^1.1.0"` added to your `package.json`. Verbosely, this means "any foobar version greater than or equal to 1.1.0 but less than 2". Therefore the project will automatically use 1.1.1 if it's released, or 1.2.0, or 1.2.1, etc - meaning you will get not only patch updates but also feature (minor) releases too. +Historically, projects use semver ranges in their `package.json`. For instance, if you run `npm install foobar` you will see an entry like `"foobar": "^1.1.0"` added to your `package.json`. Verbosely, this means "any foobar version greater than or equal to 1.1.0 but less than 2". Therefore the project will automatically use 1.1.1 if it's released, or 1.2.0, or 1.2.1, etc - meaning you will get not only patch updates but also feature (minor) releases too. Another alternative is ranges like `"foobar": "~1.1.0"` which means "any foobar version greater than or equal to 1.1.0 but less than 1.2". This narrows the range to only patch updates to the 1.1 range. @@ -22,7 +22,7 @@ If instead you "pin" your dependencies rather than use ranges, it means you use ## Why use ranges? -For projects of any type, the main reason to use ranges is so that you can "automatically" get bug fixes - which may even include security fixes. 
By "automatically", we mean that any time you run `npm install` you will get the very latest version matching your semver - assuming you're not using a lock file, that is. +For projects of any type, the main reason to use ranges is so that you can "automatically" get updated releases - which may even include security fixes. By "automatically", we mean that any time you run `npm install` you will get the very latest version matching your semver - assuming you're not using a lock file, that is. #### Tilde vs Caret @@ -32,27 +32,27 @@ If you're familiar with the theory of semver, you might think that you only need A second reason for using ranges applies to "libaries" that are published as npm packages with the intention that they are used/`require()`'d by other packages. In this case, it is usually a bad idea to pin all your dependencies because it will introduce an unnecessarily narrow range (one release!) and cause most users of your package to bloat their `node_modules` with duplicates. -For example, you might have pinned `foobar` to version `1.1.0` and another author pinned his/her `foobar` to dependency to `1.2.2`. Any user of both your packages will end up with npm attempting to install two separate versions of `foobar`, which might not even work. Even if both projects use a service like Renovate to keep their pinned dependencies up to date with the very latest versions, it's still not a good idea - there will always be times when one package has updated/released before the other one and they will be out of sync. e.g. there might be a space of 30 minutes where your package specifies foobar `1.1.0` and the other one specifies `1.1.1`. +For example, you might have pinned `foobar` to version `1.1.0` and another author pinned his/her `foobar` dependency to `1.2.2`. Any user of both your packages will end up with npm attempting to install two separate versions of `foobar`, which might not even work. 
Even if both projects use a service like Renovate to keep their pinned dependencies up to date with the very latest versions, it's still not a good idea - there will always be times when one package has updated/released before the other one and they will be out of sync. e.g. there might be a space of 30 minutes where your package specifies foobar `1.1.0` and the other one specifies `1.1.1` and your joint downstream users end up with a duplicate. ## Why pin dependencies? -The main reason to pin dependencies is "certainty". When you pin dependencies, you know exactly which version of each dependency is installed at any time. This benefits when upgrading versions as well as when rolling back in case of problems. +You mainly pin versions for certainty and visibility. When you have a pinned version of each dependency in your `package.json`, you know exactly which version of each dependency is installed at any time. This benefits when upgrading versions as well as when rolling back in case of problems. Note: we'll cover lock files later, don't worry. #### Upgrading pinned versions Let's say that a "faulty" version `1.2.0` of `foobar` is released and it breaks one of your tests. -If you were using default caret semver ranges, then your `master` branch is now "broken" because its `package.json` says that any version 1.x above 1.1.0 is acceptable, and npm will choose the latest (`1.2.0`). You would need to manually check and work out which dependency caused the failure (`foobar` may not have been the only dependency to have "automatically" upgraded) and then you would need to pin the dependency yourself to stop `npm` installing `1.2.0`. +If you were using default caret semver ranges, then your `master` branch is now "broken" because its `package.json` says that any version 1.x above 1.1.0 is acceptable, and npm will choose the latest (`1.2.0`). 
You would need to manually check and work out which dependency caused the failure (`foobar` may not have been the only dependency to have "automatically" upgraded since the last time your tests passed) and then you would need to pin the dependency yourself to stop `npm` installing `1.2.0`. -Consider if instead you were _pinning_ dependency versions and the same thing happened. Your `master` would not be broken because it's pinned to `foobar@1.1.0` - instead you'd just have a Pull Request for upgrading to `foobar@1.2.0` which would fail. You'd know not to merge it and can wait for `foobar@1.2.1` or later when it's fixed. +Consider the same situation if instead you were _pinning_ dependency versions. Your `master` would not be broken because it's pinned to `foobar@1.1.0` - instead you'd just have a Pull Request for upgrading to `foobar@1.2.0` which would fail. You'd know not to merge it and can wait for `foobar@1.2.1` or later when it's fixed. Therefore you know exactly what you're running and you know exactly what failed - you have great "visibility". Now consider a similar theoretical scenario where `foobar@1.2.0` is faulty but it is _not_ caught by any of your automated tests. This is more common and more dangerous. -If you were using semver ranges then this new version of `foobar` will likely be deployed to production before you notice errors and need to roll back. Like before, you need to manually work out which dependency caused it and pin it manually by editing `package.json` one dependency at a time. +If you were using semver ranges then this new version of `foobar` will likely be deployed to production automatically one day, sometime after which you notice errors and realise you need to fix it. Like before, you need to manually work out which dependency caused it - assuming you guess correctly that it was a new dependency version at fault - and pin it manually by editing `package.json` one dependency at a time. 
-If you were instead pinning `foobar` then you would receive a PR for `foobar@1.2.0` which awaits your approval. So first of all, you can choose to read the release notes and/or visually inspect the branch yourself before merging, hopefully saving you from pushing this faulty code to production. +Alternatively, if you were instead pinning `foobar` then you would receive a PR for `foobar@1.2.0` which awaits your approval. So first of all, you can choose to read the release notes and/or visually inspect the branch yourself before merging, hopefully saving you from pushing this faulty code to production. -If you did not catch the fault before merging, you are still better off with a pinned version. If you discover something wrong in production, you can easily "roll back" commits in your development environment until you find which it was. Then you can simply revert that commit (updating `foobar@1.1.0` to `foobar@1.2.0`) and push that to `master`. When the next release of `foobar` comes out (e.g. `1.2.1`) you will be prompted with a new PR and hopefully inspect it carefully this time before merge! +If you did not catch the fault before merging, you are still better off with a pinned version. If you discover something wrong in production, you can easily "roll back" commits in your development environment until you find which rollback fixes the problem. Then you can simply revert that commit (reversing `foobar@1.1.0` -> `foobar@1.2.0`) and push that to `master`. When the next release of `foobar` comes out (e.g. `1.2.1`) you will be prompted with a new PR and hopefully inspect it carefully this time before merge! As you can see in the above, pinning dependencies makes your build more consistent and predictable as a developer. @@ -68,7 +68,7 @@ The increased volume of Pull Requests for upgrading dependencies may be consider There are some dependencies that either (a) don't have the potential to break something in production, or (b) are fully tested by your tests. 
-For example, it's essentially impossible for `eslint` to break anything in production. If your build/tests pass, then you are fine. Therefore you should consider enabling automerge for all lint packages to save yourself the pointless click. In this case you might wake up to 5/10 of your overnight Pull Requests having already merged themselves. +For example, it's very hard for `eslint` to break anything in production. If your build/tests pass, then you are fine. Therefore you should consider enabling automerge for all lint packages to save yourself the pointless click when you manually approve them each time. In this case you might wake up to 5/10 of your overnight Pull Requests having already merged themselves. Another example of a good candidate for automerging might be a database driver like `node-postgres` (`pg` on npm), if you have 100% test coverage of your API. In that case if the `pg` package has a minor or patch update and passes all tests then you may as well merge it automatically if you were not going to do a manual inspection anyway. @@ -98,7 +98,7 @@ Although it's good to isolate each dependency update for ease of troubleshooting ## Pinning Dependencies and Lock Files -Since both `yarn` and `npm@5` now support lock files, it's a common question to ask "Why should I pin dependencies if I'm already using a lock file?". It's a good question! +Since `yarn` and `npm@5` both support lock files, it's a common question to ask "Why should I pin dependencies if I'm already using a lock file?". It's a good question!  @@ -112,15 +112,43 @@ To reuse an earlier example, this means that you could have `foobar@^1.1.0` in y #### What a lock file doesn't do for you -The lock file has only delayed the inevitable break. +The lock file has only delayed the inevitable problem, and provides much less visibility than `package.json`, because it's not designed to be human readable and is quite dense.  -As soon as anyone needs to update the lock file (e.g. 
to add a new dependency, update a feature release of an existing dependency, or simply to refresh the lock file in order to get important patch updates), then your build will then break, because `foobar@1.2.0` will get installed. Whoever is in that process of updating the `package.json` or refreshing the lock file will be left wondering if it was their change that did it, or something else (the answer is probably: "something else"). And they still need to do what we described earlier before lock files and check dependency-by-dependency and version-by-version until they find which dependency broke. It doesn't make sense that every developer needs to be proficient in this type of troubleshooting just because they are the unlucky person who updates `package.json` the first time after a bad dependency is released. +If the `package.json` contains a range, and a new in-range version is released that would break the build, then essentially your `package.json` is in a state of "broken", even if the lock file is still holding things together. -Once again, if `foobar` had been pinned to `1.1.0` then it would never have been upgraded to the broken `1.2.0` version "by accident" and rolling it back would again be a matter of reverting the offending commit and regenerating the lock file. New features or updates would not have been held back because of this. +The upside is that the lockfile will hold back `foobar` to `1.1.0` unless it's forced to upgrade, so the break is postponed. The downside is _how_ you will discover the break eventually. -Essentially, the lock file does not solve the same semver problems that pinning solves - but it compliments it. For this reason we recommend using a lock file regardless of whether you pin dependencies or not. +The easiest case is if you for some reason _need_ to upgrade `foobar`, e.g. 
for a new feature it contains, so you might run something like `yarn upgrade foobar` to update the lockfile and either discover the break during your development or when you push your new development to CI for testing. In this case, hopefully you'll guess it's `foobar` that broke it and not your own code. + +Alternatively, maybe someone thinks "This lockfile is probably really out of date and might be missing some essential patches" and decides to `yarn upgrade` the whole thing in one go. No doubt the diff will be full of green and red as many direct and indirect dependencies will have changed versions. Then it's pushed to CI for testing, fails, and you have to guess which of the changes caused it until you hit on `foobar`. Perhaps by manually looking through the lock file diffs line by line if you're unlucky and it's not obvious. Maybe dep `blahblah` also broke, to make it even harder. + +By ceding control of direct dependency versions to the lock file, you have lost the ability to _know_ when things are updated. You also may be missing out on really important patches you're not even aware of, because they're "in range" yet locked back to vulnerable or buggy versions in the lock file. + +Further, there's no obvious reason or driver for when _any_ + +Once again, think if instead `foobar` had been pinned to `1.1.0` in addition to the lockfile. The (broken) upgrade to `1.2.0` would have been explicitly proposed to you via a Renovate PR, you would see the break, and know that the version is bad. Meanwhile you could be upgrading all the other essential fixes of other dependencies without worrying about `foobar`. You could even be running `yarn upgrade` regularly to be getting _indirect_ package updates in the lockfile and seeing if everything still passes. + +Essentially, the lock file does not solve the same semver problems that pinning solves - but it complements it. 
For this reason our usual recommendation is to use a lock file regardless of whether you pin dependencies or not, and to pin even if you have a lock file. + +Don't forget though that our motto is "Flexible, so you don't need to be", so go ahead and configure however you want. Also, we're open to ideas for how to make lock file updates more "visible" too. e.g. are you interested in a Renovate feature where you get a lockfile-only PR any time a direct dependency gets an in-range update? + +## What about indirect/sub-dependencies? + +A good argument made by [@LinusU](https://github.com/LinusU) is: + +> pinning will only protect you against breakage in a, in many cases, small percentage of your packages. If you for example have installed express and pinned it, you will only protect yourself against a bad express release, it will not help with the 30 dependencies that express has. + +> Because of this, I personally think that pinning dependencies just creates a false sense of security, and that using a lock file is superior in every way. + +It is true that pinning applies only to direct dependencies, and "indirect" dependencies typically count for a lot more in total in your lockfile than direct. e.g. those 30 ones that Express relies on. + +Does pinning give you "increased" security? Undeniably. The question is not whether it does, but whether that increased security comes at a cost (e.g. "noise"). But Linus also points out that a _false_ sense of security is a cost too. + +Don't forget that there is some form of transitive trust too. You need to pick your direct dependencies carefully, and which versions of them you use. Hopefully in doing that you pick dependencies partly for how well _they_ look after their own dependencies and versions (e.g. do they have good enough test coverage, do they use something like Renovate to keep updated, etc?). 
So the reality is that even if 90% of the entries in your lock file are indirect dependencies, those are ones you have somewhat "delegated" responsibility for to your dependencies. e.g. I'd hope that Express are even better at watching their deps for breaks than I am, to use the example above. + +But certainly "does it give a false sense of security" is not a question we can really answer quantifiably. ## So what's best? @@ -128,7 +156,13 @@ We recommend: 1. Any apps (web or node.js) that aren't `require()`'d by other packages should pin all types of dependencies for greatest reliability/predictability. 2. Browser or dual browser/node.js libraries that are consumed/`required()`'d by others should keep using semver ranges for `dependencies` but can use pinned dependencies for `devDependencies`. -3. Node.js-only libraries can consider pinning all dependencies, because application size/duplicate dependencies are not as much a concern in node.js compared to the browser. -4. Use a lock file if you can. +3. Node.js-only libraries can consider pinning all dependencies, because application size/duplicate dependencies are not as much a concern in node.js compared to the browser. Of course, don't do that if your library is a micro one likely to be consumed in disk-sensitive environments. +4. Use a lock file. As noted earlier, when you pin dependencies then you will see an increase in the raw volume of dependency updates, compared to if you use ranges. If/when this starts bothering you, add Renovate rules to reduce the volume, such as scheduling updates, grouping them, or automerging "safe" ones. + +## References + +This is a "living" document and we plan to update it whenever we think of something new or someone makes a valid point we've missed or misunderstood. 
[GitHub Location](https://github.com/renovateapp/renovate/blob/master/website/docs/_posts/2017-07-26-dependency-pinning.md) + +Updated 2018-01-19 after [excellent feedback on lockfiles](https://github.com/commitizen/cz-conventional-changelog-default-export/pull/4#issuecomment-358038966) by [@LinusU](https://github.com/LinusU)