Commit caf8692

Merge pull request #621 from lostella/fix-typos
Fix typos in documentation
2 parents caaa6fa + 6eb8498

9 files changed: +12 −12 lines changed


docs/make.jl

+1-1
@@ -68,7 +68,7 @@ makedocs(;
         ],
         "How to support ChainRules rules as an AD package author" => [
             "Usage in AD" => "ad_author/use_in_ad_system.md",
-            "Suport calling back into ADs" => "ad_author/call_back_into_ad.md",
+            "Support calling back into ADs" => "ad_author/call_back_into_ad.md",
             "Support opting out of rules" => "ad_author/opt_out.md",
         ],
         "The maths" => [

docs/src/design/changing_the_primal.md

+1-1
@@ -429,7 +429,7 @@ An interesting scenario here that may be of concern to some:
 if Julia changes the algorithm it uses to compute `exp(::Matrix)`, then during an AD primal pass, it will continue to use the old Padé approximation based algorithm.
 This may actually happen, as there are many other algorithms that can compute the matrix exponential.
 Further, perhaps there might be an improvement to the exact coefficient or cut-offs used by Julia's current Padé approximation.
-If Julia made this change it wopuld not be considered breaking.
+If Julia made this change it would not be considered breaking.
 [Exact floating point numerical values are not generally considered part of the SemVer-bound API](http://colprac.sciml.ai/#changes-that-are-not-considered-breaking).
 Rather only the general accuracy of the computed value relative to the true mathematical value (e.g. for common scalar operations Julia promises 1 [ULP](https://en.wikipedia.org/wiki/Unit_in_the_last_place)).

docs/src/design/many_tangents.md

+1-1
@@ -163,7 +163,7 @@ Semantically we can handle these very easily in julia.
 Just put in a few more dispatching on `+`.
 Multiple-dispatch is great like that.
 The down-side is our type-inference becomes hard.
-If you have exactly 1 tangent type for each primal type, you can very easily workout what all the types on your reverse pass will be - you don't really need type inference - but you lose so much expressibility.
+If you have exactly 1 tangent type for each primal type, you can very easily work out what all the types on your reverse pass will be - you don't really need type inference - but you lose so much expressiveness.
 
 ## Appendix: What Swift does

docs/src/maths/nondiff_points.md

+1-1
@@ -112,7 +112,7 @@ Here we have no real choice but to say the derivative at `0` is `Inf`.
 We could consider as an alternative saying some large but finite value.
 However, if too large it will just overflow rapidly anyway; and if too small it will not dominate over finite terms.
 It is not possible to find a given value that is always large enough.
-Our alternatives woud be to consider the derivative at `nextfloat(0.0)` or `prevfloat(0.0)`.
+Our alternatives would be to consider the derivative at `nextfloat(0.0)` or `prevfloat(0.0)`.
 But this is more or less the same as choosing some large value -- in this case an extremely large value that will rapidly overflow.
 

docs/src/rule_author/converting_zygoterules.md

+1-1
@@ -47,7 +47,7 @@ ChainRules as a philosophy avoids magic as much as possible, and thus require yo
 If it is a plain function (like `typeof(sin)`), then the tangent will be [`NoTangent`](@ref).
 
 
-[^1]: unless you write it in functor form (i.e. `@adjoint (f::MyType)(args...)=...`), in that case like for `rrule` you need to include it explictly.
+[^1]: unless you write it in functor form (i.e. `@adjoint (f::MyType)(args...)=...`), in that case like for `rrule` you need to include it explicitly.
 
 ## Tangent Type changes
 ChainRules uses tangent types that must represent vector spaces (i.e. tangent spaces).

docs/src/rule_author/example.md

+1-1
@@ -91,7 +91,7 @@ The tangent of the field `c` is `ZeroTangent()`, because `c` can be perturbed bu
 ```
 The tangent of `b` is `foo.A' * ȳ`, but we have wrapped it into a `Thunk`, a tangent type that represents delayed computation.
 The idea is that in case the tangent is not used anywhere, the computation never happens.
-Use [`InplaceableThunk`](@ref) if you are interested in [accumulating gradients inplace](@ref grad_acc).
+Use [`InplaceableThunk`](@ref) if you are interested in [accumulating gradients in-place](@ref grad_acc).
 Note that in practice one would also `@thunk` the `f̄oo.A` tangent, but it was omitted in this example for clarity.
 
 As a final note, since `b` is an `AbstractArray`, its tangent `b̄` should be projected to the right subspace.

docs/src/rule_author/intro.md

+1-1
@@ -12,4 +12,4 @@ However:
 - If you are writing rules with abstractly typed arguments, read about [`ProjectTo`](@ref projectto).
 - If you want to opt out of using the abstractly typed rule for certain argument types, read about [`@opt_out`](@ref opt_out).
 - If you are writing rules for higher order functions, read about [calling back into AD](@ref config).
-- If you want to accumulate gradients inplace to avoid extra allocations, read about [gradient accumulation](@ref grad_acc).
+- If you want to accumulate gradients in-place to avoid extra allocations, read about [gradient accumulation](@ref grad_acc).

docs/src/rule_author/superpowers/opt_out.md

+2-2
@@ -32,7 +32,7 @@ Thus the sum is always going to be zero.
 As such the author of that matrix type would probably have overloaded `sum(x::SkewSymmetric{T}) where T = zero(T)`.
 ADing this would result in the tangent computed for `x` as `ZeroTangent()` and it would be very fast since AD can see that `x` is never used in the right-hand side.
 In contrast the generic method for `AbstractArray` defined above would have to allocate the fill array, and then compute the skew projection.
-Only to findout the output would be projected to `SkewSymmetric(zeros(T))` anyway (slower, and a less useful type).
+Only to find out the output would be projected to `SkewSymmetric(zeros(T))` anyway (slower, and a less useful type).
 
 To opt-out of using the generic `rrule` and to allow the AD system to do its own thing we use the
 [`@opt_out`](@ref) macro, to say to not use it for sum of `SkewSymmetric`.
@@ -41,7 +41,7 @@ To opt-out of using the generic `rrule` and to allow the AD system to do its own
 @opt_out rrule(::typeof(sum), ::SkewSymmetric)
 ```
 
-Perhaps we might not want to ever use rules for SkewSymmetric, because we have determined that it is always better to leave it to the AD, unless a verys specific rule has been written[^1].
+Perhaps we might not want to ever use rules for SkewSymmetric, because we have determined that it is always better to leave it to the AD, unless a very specific rule has been written[^1].
 We could then opt-out for all 1 arg functions.
 ```@julia
 @opt_out rrule(::Any, ::SkewSymmetric)

docs/src/rule_author/which_functions_need_rules.md

+3-3
@@ -147,10 +147,10 @@ julia> @btime gradient(mse, $y, $ŷ)
 143.697 ns (2 allocations: 672 bytes)
 ```
 
-#### Inplace accumulation
+#### In-place accumulation
 
-Inplace accumulation of gradients is slow in `Zygote`.
-The issue, demonstrated in the folowing example, is that the gradient of `getindex` allocates an array of zeros with a single non-zero element.
+In-place accumulation of gradients is slow in `Zygote`.
+The issue, demonstrated in the following example, is that the gradient of `getindex` allocates an array of zeros with a single non-zero element.
 ```julia
 function sum3(array)
     x = array[1]
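The last hunk above refers to the gradient of `getindex` allocating a full array of zeros. As a minimal sketch of that behaviour (not ChainRules' or Zygote's actual implementation; the function name here is hypothetical), the pullback of `x = array[i]` must produce a tangent for the *whole* input array, so it materialises a one-hot array:

```julia
# Hypothetical sketch: why the gradient of `getindex` allocates.
# Pulling back `x = array[i]` yields a tangent for the entire array,
# so a full array of zeros is allocated with one nonzero entry.
function getindex_pullback_sketch(len::Int, i::Int, dy)
    darray = zeros(typeof(dy), len)  # allocates `len` zeros
    darray[i] = dy                   # only one element is nonzero
    return darray
end

getindex_pullback_sketch(4, 1, 1.0)  # [1.0, 0.0, 0.0, 0.0]
```

Accumulating many such one-hot gradients with `+` allocates a fresh array per element, which is the slowness the hunk describes; in-place accumulation avoids the repeated allocation.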
