I really like chaining Array.prototype.map
, filter
and reduce
to define a data transformation. Unfortunately, in a recent project that involved large log files, I could no longer get away with looping through my data multiple times...
I want to create a function that chains .filter
and .map
methods by, instead of mapping over an array immediately, composing a function that loops over the data once. I.e.:
const DataTransformation = () => ({
map: fn => (/* ... */),
filter: fn => (/* ... */),
run: arr => (/* ... */)
});
const someTransformation = DataTransformation()
.map(x => x + 1)
.filter(x => x > 3)
.map(x => x / 2);
// returns [ 2, 2.5 ] without creating [ 2, 3, 4, 5] and [4, 5] in between
const myData = someTransformation.run([ 1, 2, 3, 4]);
Inspired by this answer and this blogpost I started writing a Transduce
function.
const filterer = pred => reducer => (acc, x) =>
pred(x) ? reducer(acc, x) : acc;
const mapper = map => reducer => (acc, x) =>
reducer(acc, map(x));
const Transduce = (reducer = (acc, x) => (acc.push(x), acc)) => ({
map: map => Transduce(mapper(map)(reducer)),
filter: pred => Transduce(filterer(pred)(reducer)),
run: arr => arr.reduce(reducer, [])
});
The problem with the Transduce
snippet above, is that it runs “backwards”... The last method I chain is the first to be executed:
const someTransformation = Transduce()
.map(x => x + 1)
.filter(x => x > 3)
.map(x => x / 2);
// Instead of [ 2, 2.5 ] this returns []
// starts with (x / 2) -> [0.5, 1, 1.5, 2]
// then filters (x < 3) -> []
const myData = someTransformation.run([ 1, 2, 3, 4]);
Or, in more abstract terms:
Go from:
Transducer(concat).map(f).map(g) == (acc, x) => concat(acc, f(g(x)))
To:
Transducer(concat).map(f).map(g) == (acc, x) => concat(acc, g(f(x)))
Which is similar to:
mapper(f) (mapper(g) (concat))
I think I understand why it happens, but I can't figure out how to fix it without changing the “interface” of my function.
How can I make my Transduce
method chain filter
and map
operations in the correct order?
Transduce
term or if there are better ways to describe the problem.for
loop:const push = (acc, x) => (acc.push(x), acc);
const ActionChain = (actions = []) => {
const run = arr =>
arr.reduce((acc, x) => {
for (let i = 0, action; i < actions.length; i += 1) {
action = actions[i];
if (action.type === "FILTER") {
if (action.fn(x)) {
continue;
}
return acc;
} else if (action.type === "MAP") {
x = action.fn(x);
}
}
acc.push(x);
return acc;
}, []);
const addAction = type => fn =>
ActionChain(push(actions, { type, fn }));
return {
map: addAction("MAP"),
filter: addAction("FILTER"),
run
};
};
// Compare to regular chain to check if
// there's a performance gain
// Admittedly, in this example, it's quite small...
const naiveApproach = {
run: arr =>
arr
.map(x => x + 3)
.filter(x => x % 3 === 0)
.map(x => x / 3)
.filter(x => x < 40)
};
const actionChain = ActionChain()
.map(x => x + 3)
.filter(x => x % 3 === 0)
.map(x => x / 3)
.filter(x => x < 40)
const testData = Array.from(Array(100000), (x, i) => i);
console.time("naive");
const result1 = naiveApproach.run(testData);
console.timeEnd("naive");
console.time("chain");
const result2 = actionChain.run(testData);
console.timeEnd("chain");
console.log("equal:", JSON.stringify(result1) === JSON.stringify(result2));
const filterer = pred => reducer => (acc, x) =>
pred(x) ? reducer(acc, x) : acc;
const mapper = map => reducer => (acc, x) => reducer(acc, map(x));
const Transduce = (reducer = (acc, x) => (acc.push(x), acc)) => ({
map: map => Transduce(mapper(map)(reducer)),
filter: pred => Transduce(filterer(pred)(reducer)),
run: arr => arr.reduce(reducer, [])
});
const sameDataTransformation = Transduce()
.map(x => x + 5)
.filter(x => x % 2 === 0)
.map(x => x / 2)
.filter(x => x < 4);
// It's backwards:
// [-1, 0, 1, 2, 3]
// [-0.5, 0, 0.5, 1, 1.5]
// [0]
// [5]
console.log(sameDataTransformation.run([-1, 0, 1, 2, 3, 4, 5]));
before we know better
I really like chaining ...
I see that, and I'll appease you, but you'll come to understand that forcing your program through a chaining API is unnatural, and more trouble than it's worth in most cases.
const Transduce = (reducer = (acc, x) => (acc.push(x), acc)) => ({ map: map => Transduce(mapper(map)(reducer)), filter: pred => Transduce(filterer(pred)(reducer)), run: arr => arr.reduce(reducer, []) });
I think I understand why it happens, but I can't figure out how to fix it without changing the “interface” of my function.
The problem is indeed with your Transduce
constructor. Your map
and filter
methods are stacking map
and pred
on the outside of the transducer chain, instead of nesting them inside.
Below, I've implemented your Transduce
API that evaluates the maps and filters in correct order. I've also added a log
method so that we can see how Transduce
is behaving
const Transduce = (f = k => k) => ({
map: g =>
Transduce(k =>
f ((acc, x) => k(acc, g(x)))),
filter: g =>
Transduce(k =>
f ((acc, x) => g(x) ? k(acc, x) : acc)),
log: s =>
Transduce(k =>
f ((acc, x) => (console.log(s, x), k(acc, x)))),
run: xs =>
xs.reduce(f((acc, x) => acc.concat(x)), [])
})
const foo = nums => {
return Transduce()
.log('greater than 2?')
.filter(x => x > 2)
.log('\tsquare:')
.map(x => x * x)
.log('\t\tless than 30?')
.filter(x => x < 30)
.log('\t\t\tpass')
.run(nums)
}
// keep square(n), forall n of nums
// where n > 2
// where square(n) < 30
console.log(foo([1,2,3,4,5,6,7]))
// => [ 9, 16, 25 ]
untapped potential
Inspired by this answer ...
In reading that answer I wrote, you overlook the generic quality of Trans
as it was written there. Here, our Transduce
only attempts to work with Arrays, but really it can work with any type that has an empty value ([]
) and a concat
method. These two properties make up a category called Monoids and we'd be doing ourselves a disservice if we didn't take advantage of transducer's ability to work with any type in this category.
Above, we hard-coded the initial accumulator []
in the run
method, but this should probably be supplied as an argument – much like we do with iterable.reduce(reducer, initialAcc)
Aside from that, both implementations are essentially equivalent. The biggest difference is that the Trans
implementation provided in the linked answer is Trans
itself is a monoid, but Transduce
here is not. Trans
neatly implements composition of transducers in the concat
method whereas Transduce
(above) has composition mixed within each method. Making it a monoid allows us to rationalize Trans
the same way do all other monoids, instead of having to understand it as some specialized chaining interface with unique map
, filter
, and run
methods.
I would advise building from Trans
instead of making your own custom API
have your cake and eat it too
So we learned the valuable lesson of uniform interfaces and we understand that Trans
is inherently simple. But, you still want that sweet chaining API. OK, ok...
We're going to implement Transduce
one more time, but this time we'll do so using the Trans
monoid. Here, Transduce
holds a Trans
value instead of a continuation (Function
).
Everything else stays the same – foo
takes 1 tiny change and produces an identical output.
// generic transducers
const mapper = f =>
Trans(k => (acc, x) => k(acc, f(x)))
const filterer = f =>
Trans(k => (acc, x) => f(x) ? k(acc, x) : acc)
const logger = label =>
Trans(k => (acc, x) => (console.log(label, x), k(acc, x)))
// magic chaining api made with Trans monoid
const Transduce = (t = Trans.empty()) => ({
map: f =>
Transduce(t.concat(mapper(f))),
filter: f =>
Transduce(t.concat(filterer(f))),
log: s =>
Transduce(t.concat(logger(s))),
run: (m, xs) =>
transduce(t, m, xs)
})
// when we run, we must specify the type to transduce
// .run(Array, nums)
// instead of
// .run(nums)
Expand this code snippet to see the final implementation – of course you could skip defining a separate mapper
, filterer
, and logger
, and instead define those directly on Transduce
. I think this reads nicer tho.
// Trans monoid
const Trans = f => ({
runTrans: f,
concat: ({runTrans: g}) =>
Trans(k => f(g(k)))
})
Trans.empty = () =>
Trans(k => k)
const transduce = (t, m, xs) =>
xs.reduce(t.runTrans((acc, x) => acc.concat(x)), m.empty())
// complete Array monoid implementation
Array.empty = () => []
// generic transducers
const mapper = f =>
Trans(k => (acc, x) => k(acc, f(x)))
const filterer = f =>
Trans(k => (acc, x) => f(x) ? k(acc, x) : acc)
const logger = label =>
Trans(k => (acc, x) => (console.log(label, x), k(acc, x)))
// now implemented with Trans monoid
const Transduce = (t = Trans.empty()) => ({
map: f =>
Transduce(t.concat(mapper(f))),
filter: f =>
Transduce(t.concat(filterer(f))),
log: s =>
Transduce(t.concat(logger(s))),
run: (m, xs) =>
transduce(t, m, xs)
})
// this stays exactly the same
const foo = nums => {
return Transduce()
.log('greater than 2?')
.filter(x => x > 2)
.log('\tsquare:')
.map(x => x * x)
.log('\t\tless than 30?')
.filter(x => x < 30)
.log('\t\t\tpass')
.run(Array, nums)
}
// output is exactly the same
console.log(foo([1,2,3,4,5,6,7]))
// => [ 9, 16, 25 ]
wrap up
So we started with a mess of lambdas and then made things simpler using a monoid. The Trans
monoid provides distinct advantages in that the monoid interface is known and the generic implementation is extremely simple. But we're stubborn or maybe we have goals to fulfill that are not set by us – we decide to build the magic Transduce
chaining API, but we do so using our rock-solid Trans
monoid which gives us all the power of Trans
but also keeps complexity nicely compartmentalised.
dot chaining fetishists anonymous
Here's a couple other recent answers I wrote about method chaining