Feature #14183: "Real" keyword argument

Sorry for leaving this ticket untouched. Matz, akr, and I have discussed this issue several times since last year, and we have never reached a perfect solution. Still, let me re-summarize the problem, the current proposal, and the migration path.

Problem

The current spec of keyword arguments is broken in several senses.

1. Keyword extension is not always safe

By "keyword extension" we mean adding a keyword parameter to an existing method.
Unfortunately, keyword extension is not safe when the existing method accepts rest arguments.

def foo(*args)
  p args
end
foo(key: 42) #=> [{:key=>42}]

If we add a new keyword parameter to the method (here, output:), the existing call breaks.

def foo(*args, output: $stdout)
  output.puts args.inspect
end
foo(key: 42) #=> unknown keyword: key

Safe keyword extension is a fundamental expectation for keyword arguments, so that is a pity.
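
The breakage can be reproduced as a single script. This is only a sketch: report_v1 and report_v2 are hypothetical names for the "before" and "after" versions of the same method, and the error is rescued so that both results can be inspected side by side.

```ruby
# Hypothetical "before" version: rest arguments only.
def report_v1(*args)
  args
end

# Hypothetical "after" version: the same method with a keyword extension.
def report_v2(*args, output: $stdout)
  output.puts args.inspect
  args
end

# The braceless hash arrives as the last positional argument.
before = report_v1(key: 42)

# The same call is now parsed as keywords, and key: is not a known keyword.
after = begin
  report_v2(key: 42)
rescue ArgumentError => e
  e
end

p before
p after.class
```

The call site did not change between the two versions; only the method signature did, which is exactly why the extension is unsafe.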

2. Explicit delegation of keywords backfires

Suppose you are writing a delegating method and, with keywords in mind, you write:

def foo(*args, **kw, &blk)
  bar(*args, **kw, &blk)
end

However, this does not work correctly.

def bar(*args)
  p args
end

foo() #=> expected:[], actual:[{}]
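
One possible workaround under the current semantics, which is not part of this proposal, is to forward ** only when keywords were actually given. The names target and delegate below are hypothetical, and the sketch assumes it is acceptable to treat delegate(**{}) the same as delegate().

```ruby
# Hypothetical target that only takes positional arguments.
def target(*args)
  args
end

# Delegator that forwards keywords only when some were actually given,
# so an empty kw never turns into a stray positional hash.
def delegate(*args, **kw, &blk)
  if kw.empty?
    target(*args, &blk)
  else
    target(*args, **kw, &blk)
  end
end

p delegate()         # no arguments reach target
p delegate(1, k: 2)  # the keywords reach target as a trailing hash
```

The branch is boilerplate that every delegating method would have to repeat, which is part of why a fix at the language level is preferable.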

3. There are many unintuitive corner cases

There are many bug reports about keyword arguments. One of the weirdest cases:

def foo(opt=42, **kw)
  p [opt, kw]
end

foo({}, **{})  #=> expected:[{}, {}], actual:[42, {}]

All of these issues are caused by a fundamental design flaw in the current keyword arguments: keywords are handled as a last positional argument that is a Hash object. Matz, akr, and I have considered these issues seriously. In fact, matz came up with multiple ideas that would be compatible (or only mildly incompatible) and still solve the issues. However, every one of them turned out to be incompatible, too complex, or unable to solve some of the issues above.

Proposal for 3.X semantics

The current proposal consists of two parts:

A) Separate keyword arguments from positional arguments completely
B) Allow non-Symbol keys as a keyword

(A) is the original proposal of this ticket.

  • A keyword argument is passed only by foo(k: 1) or foo(**opt), and accepted only by def foo(k: 1) or def foo(**opt).
  • A positional Hash argument is passed only by foo({ k: 1 }) or foo(opt), and accepted only by def foo(opt) or def foo(opt={}) or def foo(*args)

See the next section for details.
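
A minimal sketch of what (A) means at a call site, assuming an interpreter that already implements the separation (the comments describe the proposed 3.X behavior, not 2.6). The method name show is hypothetical.

```ruby
# Hypothetical method with one optional positional parameter and keywords.
def show(opt = nil, **kw)
  [opt, kw]
end

h = { k: 1 }

braced   = show({ k: 1 })  # braces: a positional Hash, kw stays empty
bare     = show(k: 1)      # no braces: keywords, opt keeps its default
variable = show(h)         # a Hash variable: positional
splatted = show(**h)       # explicit **: keywords

p braced, bare, variable, splatted
```

Under the separation, the syntax at the call site alone decides whether a Hash is positional or a set of keywords; the method signature no longer influences that decision.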

(B) allows some DSL usages of brace omission:

def where(**kw)
  p kw
end

where("table.id" => 42) #=> {"table.id"=>42}

Actually, this behavior is not new. Ruby 2.0.0-p0 allowed non-Symbol keys.

Typical rewrite cases

This change is incompatible, so existing code needs to be rewritten. There are three typical rewrite cases (plus one):

1. Accept keywords by **opt, not by opt={}

# NG in 3.X
def foo(opt={})
end

# OK in 3.X
def foo(**opt)
end

2. Pass keywords without braces, or with explicit **

def foo(**opt)
end

# NG in 3.X
foo({ k: 1 })
h = { k: 1 }
foo(h)

# OK in 3.X
foo(k: 1)
foo(**h)

3. Delegate keyword arguments explicitly

# NG in 3.X
def foo(*args, &blk)
  bar(*args, &blk)
end

# OK in 3.X
def foo(*args, **kw, &blk)
  bar(*args, **kw, &blk)
end

Plus one. Manually merge the last argument with a keyword argument

If you want to allow both calling styles, you can do it manually.

# NG in 3.X
def foo(opt={})
  p opt
end
foo({ k: 1 }) #=> {:k=>1}
foo(k: 1)     #=> expected:{:k=>1}, actual:error

# OK in 3.X
def foo(opt={}, **kw)
  opt = opt.merge(kw)
  p opt
end
foo({ k: 1 }) #=> {:k=>1}
foo(k: 1)     #=> {:k=>1}

Migration path: 2.7 semantics

Basic approach:

  • If code is valid (raises no exception) in 3.X, Ruby 2.7 should run it in the same way as 3.X
  • If code is invalid (raises an exception) in 3.X, Ruby 2.7 should run it in the same way as 2.6, but print a warning

Typical examples:

def foo(opt)
end
foo(k: 1) #=> test.rb:3: warning: The keyword argument for `foo' (defined at test.rb:1) is used as the last parameter

def foo(**opt)
end
foo({ k: 1 }) #=> test.rb:3: warning: The last argument for `foo' (defined at test.rb:1) is used as the keyword parameter

These warnings tell users how to fix the source code.

(A naive implementation of this approach is not enough. A very subtle hack is required for delegation; it is explained in the appendix at the end.)

Experiment

I have implemented 2.7's candidate semantics:

https://github.com/ruby/ruby/compare/trunk...mame:keyword-argument-separation

I also modified the standard libraries and tests to support the keyword argument separation. Most of the changes fall into one of the three (plus one) typical rewrite cases. There were a few tricky modifications, but in my opinion almost all of them were trivial.

In addition, I tested an internal Rails app at my company (about 10k lines) with my prototype. Honestly speaking, running rake spec produced about 120k (!) warnings, but many of them were duplicates. After removing the duplicates, about 1k warnings remained, and almost all of those were produced in gems. Focusing only on the application itself, we found just five method definitions that needed to be modified. All of the fixes were the first typical rewrite case: def foo(opt={}) -> def foo(**opt). We will need to rewrite some more calls to add an explicit ** if some libraries decide that their APIs accept only keywords.
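
The deduplication and the split between application and gem warnings can be sketched as follows. This is not the script used in the experiment: the sample log is made up for illustration, and treating every path under app/ as application code is an assumption.

```ruby
# Made-up sample of warning output; a real log would come from the test run.
log = <<~'LOG'
  app/models/user.rb:10: warning: The last argument for `foo' (defined at app/models/user.rb:3) is used as the keyword parameter
  gems/somegem/lib/a.rb:7: warning: The keyword argument for `bar' (defined at gems/somegem/lib/a.rb:2) is used as the last parameter
  app/models/user.rb:10: warning: The last argument for `foo' (defined at app/models/user.rb:3) is used as the keyword parameter
LOG

# Keep only the keyword-separation warnings, then drop exact duplicates.
warnings = log.lines.map(&:chomp).grep(/: warning: The (keyword|last) argument for/)
unique = warnings.uniq

# Assumption: application code lives under app/; everything else is a gem.
app, gems = unique.partition { |line| line.start_with?("app/") }

puts "total: #{warnings.size}, unique: #{unique.size}"
puts "app: #{app.size}, gems: #{gems.size}"
```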

Appendix: Special frozen Hash object for delegation

Unfortunately, the naive implementation of the migration path is incomplete with regard to delegation.
Consider the following code.

# in 2.7
def f1(k: 1)
  p k
end

def f2(*args)
  p args
end

def dispatch(target, *args, &blk)
  if target == :f1
    f1(*args, &blk)
  else
    f2(*args, &blk)
  end
end

dispatch(:f1, k: 1) #=> 1
#=> t.rb:17: warning: The keyword argument for `dispatch' (defined at t.rb:9) is used as the last parameter
#   t.rb:11: warning: The last argument for `f1' (defined at t.rb:1) is used as the keyword parameter
#   1

dispatch(:f2, 1, 2, 3) #=> [1, 2, 3]

You see the warnings, so you rewrite dispatch to delegate keywords explicitly:

# in 2.7
def f1(k: 1)
  p k
end

def f2(*args)
  p args
end

def dispatch(target, *args, **kw, &blk)
  if target == :f1
    f1(*args, **kw, &blk)
  else
    f2(*args, **kw, &blk)
  end
end

dispatch(:f1, k: 1)    #=> 1
dispatch(:f2, 1, 2, 3) #=> [1, 2, 3, {}]
#=> t.rb:18: warning: The keyword argument for `f2' (defined at t.rb:4) is used as the last parameter

dispatch(:f1, k: 1) now works perfectly with no warnings. However, the result of dispatch(:f2, 1, 2, 3) has changed, and a new warning is emitted. This is because the empty **kw was automatically converted to a positional argument (due to the 2.6 compatibility layer).

To fix this issue, we introduce a Hash flag to distinguish between "no keyword given" and "empty keyword given".

def foo(**kw)
  p kw
end

foo(**{}) #=> {}
foo()     #=> {(NO KEYWORD)}

{} is a normal empty hash object, and {(NO KEYWORD)} is the special empty hash object that represents "no keyword given".

If we pass the flagged empty hash to another method with the ** operator, it is omitted.

def bar(*args)
  p args
end

def foo(**kw)
  # kw is {(NO KEYWORD)}
  bar(**kw) # **{(NO KEYWORD)} is equal to nothing: bar()
end

foo({}) #=> [{}]
foo()   #=> []

This is akr's idea that was explained at https://bugs.ruby-lang.org/issues/14183#note-41.
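
The rule can be emulated in plain Ruby to make it concrete. This is only a model, not the interpreter change: NO_KEYWORD and forwarded_arguments are hypothetical names, and object identity stands in for the internal flag.

```ruby
# Stand-in for the flagged hash. In the proposal the flag lives inside the
# interpreter; here object identity plays that role.
NO_KEYWORD = {}.freeze

# Emulates what bar(*args, **kw) would hand to bar under the proposed rule:
# the flagged hash vanishes, any other hash (even an empty one) is kept.
def forwarded_arguments(args, kw)
  kw.equal?(NO_KEYWORD) ? args : args + [kw]
end

p forwarded_arguments([1, 2, 3], NO_KEYWORD) # => [1, 2, 3]
p forwarded_arguments([1, 2, 3], {})         # => [1, 2, 3, {}]
```

The two hashes are equal by ==, so only something outside their contents (here identity, in the proposal a flag) can tell "no keyword given" apart from "empty keyword given".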

This special empty hash flag is a temporary hack, needed only during the migration. Once 3.X completes the separation of keyword arguments, it can be removed.

