I'm looking for a good way to find all words that occur more than once in a string. Some constraints apply: I want to avoid $(shell), because it is expensive and the solution must work on Windows (on pure Linux, sort | uniq -d would have solved my problem nicely). Furthermore, the number of duplicates will be small, and words will only contain nice characters, something like [-_+a-zA-Z0-9]+.
I tried two strategies:
Force $(sort) to keep duplicates (add a unique suffix to each word, sort, and strip the suffix). Then find adjacent identical words in the sorted list:
# given 0 1 0 1 0 1 0 1 ... , return 0 0 1 1 0 0 1 1 ...
double=$(wordlist 1,$(words $(1)),$(subst 0,0 0,$(subst 1,1 1,$(1))))
# Produce a list of N unique strings. $(1) contains N words, with a
# repetition cycle of length M, and $(2) contains N words, either 0 or
# 1, alternating between 0 and 1 every Mth word.
binseq=$(if $(findstring 1,$(2)),$(call binseq,$(join $(2),$(1)),$(call double,$(2))),$(1))
# return 0 1 0 1 ..., as many words as $(1)
alternating_bits=$(wordlist 1,$(words $(1)),$(patsubst %,0 1,$(1)))
# Produce as many unique words as there are words in $(1)
unique=$(call binseq,,$(call alternating_bits,$(1)))
# Sort $(1) without eliminating duplicates. $(1) may not contain /.
sorted_keep_dups=$(subst /,,$(dir $(sort $(join $(1:=/),$(call unique,$(1))))))
dups_from_sorted2=$(filter $(patsubst %0,%,$(filter %0,$(1))),$(patsubst %1,%,$(filter %1,$(1))))
# Given a sorted list, return all duplicates.
dups_from_sorted=$(sort $(call dups_from_sorted2,$(join $(1),$(call alternating_bits,$(1)))))
dups=$(call dups_from_sorted,$(call sorted_keep_dups,$(1)))
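The core trick of this first strategy can be seen in isolation. The following is a minimal sketch, not the code above: the word list and the three suffixes are written by hand (assumptions for illustration), whereas the question generates the suffixes with unique.

```makefile
# Minimal sketch of the "unique suffix" trick; the word list and the
# hand-written suffixes 1 2 3 are assumptions for illustration only.
words := b a b

# $(sort) alone drops the duplicate: prints "a b"
$(info $(sort $(words)))

# Tag every word with "/" plus a distinct suffix: b/1 a/2 b/3
tagged := $(join $(addsuffix /,$(words)),1 2 3)

# All tagged words differ, so $(sort) drops nothing. $(dir) keeps the
# "word/" part and $(subst) removes the slash again: prints "a b b"
$(info $(subst /,,$(dir $(sort $(tagged)))))

all: ;
.PHONY: all
```

Once the duplicates sit next to each other, tagging the sorted list with alternating 0 and 1 gives two adjacent equal words different tags, which is what dups_from_sorted relies on.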
Use $(filter) repeatedly with different partitions of the word list, such that each pair of words occurs at least once in different args of $(filter):
# given 0 1 0 1 0 1 0 1 ... , return 0 0 1 1 0 0 1 1 ...
double=$(wordlist 1,$(words $(1)),$(subst 0,0 0,$(subst 1,1 1,$(1))))
# given words with suffix 0 or 1, remove suffixes and return the words
# that occur both with 0 and 1 as suffix
filter_dups=$(filter $(patsubst %0,%,$(filter %0,$(1))),$(patsubst %1,%,$(filter %1,$(1))))
_dups=$(if $(findstring 1,$(2)),$(call filter_dups,$(join $(1),$(2))) \
  $(call _dups,$(1),$(call double,$(2))))
# return 0 1 0 1 ..., as many words as $(1)
alternating_bits=$(wordlist 1,$(words $(1)),$(patsubst %,0 1,$(1)))
# given a list of words, return the list of words that occur twice
dups=$(sort $(call _dups,$(1),$(call alternating_bits,$(1))))
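The reason a logarithmic number of rounds is enough: two different positions in the list differ in at least one bit of their binary index, so some round puts them on opposite sides of the partition. A small sketch with the bit lists written out by hand (split_dups is a hypothetical name for the same expression as filter_dups, and the four-word list is an assumption):

```makefile
# Sketch of two partition rounds for a four-word list. The word list is
# an assumption; split_dups is a hypothetical name that mirrors filter_dups.
words := a b a c

# Words tagged 0 go to one side, words tagged 1 to the other;
# return the words that appear on both sides.
split_dups = $(filter $(patsubst %0,%,$(filter %0,$(1))),$(patsubst %1,%,$(filter %1,$(1))))

# Round 1 uses bit 0 of each position: a0 b1 a0 c1.
# Both copies of "a" land on the 0 side, so nothing is found.
$(info round 1: [$(call split_dups,$(join $(words),0 1 0 1))])

# Round 2 uses bit 1 of each position: a0 b0 a1 c1.
# The two copies of "a" are now separated, so "a" is reported.
$(info round 2: [$(call split_dups,$(join $(words),0 0 1 1))])

all: ;
.PHONY: all
```

This prints "round 1: []" and "round 2: [a]". In the question's code, _dups collects the results of all rounds and $(sort) removes the repeats.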
Both approaches work and are sufficiently fast, but they are rather hard to read and comprehend. Is there a simpler way with acceptable (sub-quadratic) speed?
Not sure about the complexity, but I'd suggest a more readable function:
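# Walk the list once: a word that is already in __duplicates__seen is
# appended to __duplicates__result, then the word is marked as seen.
# The undefine directive needs GNU make 3.82 or later.
define __duplicates__func
  undefine __duplicates__seen
  undefine __duplicates__result
  $$(foreach _v,$1,\
    $$(eval __duplicates__result += $$(filter $$(__duplicates__seen),$$(_v))\
      $$(eval __duplicates__seen += $$(_v))))
endef
duplicates = $(eval $(__duplicates__func))$(sort $(__duplicates__result))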
TEST:= $(file <test.txt)
DUPS:= $(call duplicates,$(TEST))
$(info $(DUPS))
all::
.PHONY: all
With this randomly generated 1000 word test.txt:
Rule male saw said life fourth said void were creepeth thing theyre be fowl which wherein their day rule to seed multiply male beast sixth you Winged void fill face upon First you saying unto Appear shall God yielding is male face kind was blessed waters sea blessed void creepeth called youll beginning darkness over you it may years his second of moveth beginning earth very together day Divided creepeth fly open wont signs day is created Winged male fill Heaven saw dont For upon replenish Gathering i gathering living void Were under and form night seas bearing youre days saw tree fruitful days it unto day deep Tree Be form beginning youre replenish winged dominion grass man years youre Youre lights seasons third yielding fruit fifth for together after itself and youll itself kind without bring heaven itself firmament together their created tree All shed lesser made Stars him without gathering whales whose may itself may without image herb sixth Dominion us is their two from heaven shed brought Whales creeping us us together so forth female set fruitful fly seasons life deep let heaven wherein set wont You beast image two Gathering all so God cant itself Seasons image itself cant herb that brought appear likeness greater shall blessed place two own fourth earth Had greater you morning living unto seed male Every Had made days own face meat under youll grass for creepeth Meat so life divide for multiply blessed youre yielding beast be subdue Fruit greater Us them Meat darkness wherein saying very is yielding saying thing yielding lesser us behold midst there Spirit behold meat saw Image first cattle great heaven had air every created us light great have great Great beast Whose gathered all winged morning it rule days lesser tree bearing form his in divided void dry darkness doesnt hath Third bearing fruit youll there there cattle blessed fifth gathered stars greater above without upon good land in tree winged also youll his multiply midst face whose Moving beginning 
light life saw Deep said day multiply appear a gathered You the him void Fowl third spirit day Greater first firmament for dry lights midst beast day saw third also every cant night fifth made good one greater theyre dry abundantly Tree set Subdue stars waters a created saying Itself light Whales isnt said For years youre he after above itself rule firmament unto together female fly upon may life it stars set whose it doesnt gathered beginning his Creeping let Fruitful beginning earth them Subdue to our yielding be called under Let had beginning day us divided theyre sixth without saw winged divide second Dont night two the firmament Fourth form living our fourth saw seed third were Sixth their isnt Multiply night air yielding own air said midst life that fish meat fill green Open subdue Sea shall fruit whose whales own together them saying was waters Herb hath Is itself two blessed in yielding and It over made day his give moved without divided light created green evening seed image be may fly own herb seed earth be were beast one grass moving signs Upon Over abundantly for morning whose creepeth behold after beginning male created theyre Together said above face bring youre own upon may Multiply whales kind years unto air so above it fly whose Yielding i female moving So i place fruitful were there us fowl Earth seasons moveth over air heaven good waters His rule Which face bearing itself them itself forth tree Gathered it Gathering days doesnt Air Moving called i very first a evening third seas Night Morning Firmament had fruit fruitful unto above is our Second have wont fifth Cattle yielding divided brought seas shed greater living there there sixth upon their void two fish fish Lights them hath heaven their two fowl bearing Saying third waters likeness divide seasons their open very face replenish fourth whales seas seed fourth heaven cant together fowl grass female fill tree one dominion Morning Fill called firmament kind Signs creature evening spirit evening 
cattle winged which them for stars Wherein which Meat dry deep Abundantly waters forth theyre light after fowl in fly green multiply moved i replenish sixth cant creepeth heaven for darkness which us form them Rule grass god without earth seasons herb dominion moveth after created Wherein beginning he days said cant image For said moved divided bring is youll may And days itself Saying bearing male created yielding brought earth together whales hath greater heaven sixth were behold creepeth make Is Moveth brought let Lesser us light winged fly fourth waters moved under youll Whales Form Great moving second air you also youre fill have make stars their of earth above creature beginning winged air Own gathered shall their that in every fish rule together divide face own living dominion forth deep is abundantly hath bring them green him earth days beast all waters moving It which all a great spirit hath theyre grass Upon years Cattle female signs fill moving day the kind Winged green hath also female forth spirit lights behold Thing so after open good fowl to Living divided let Given bearing that he Rule whales Days isnt It deep whales given fly our open kind appear A their evening their sixth I in Unto multiply sea light Firmament seed theyre multiply fifth signs moving Second given spirit Blessed Set moved two bearing dont yielding first moving Female female fish Hath our beast us very seasons kind moved a gathered given sea spirit firmament Itself herb isnt Tree yielding cant winged air together meat theyre moveth Saying there void and bring lights together kind Brought first theyre their had Blessed and fill Brought may first creepeth moving him form behold darkness years greater upon were Let seasons Wherein life our greater And light multiply beast appear together appear seas waters had you make moving let air Heaven is Set seed fourth brought green for rule day Day deep tree yielding
it returns instantly on my machine:
$ make -f dups.mk
And Blessed Brought Cattle Firmament For Gathering God Great Had Heaven Is It Itself Let Meat Morning Moving Multiply Rule Saying Second Set Subdue Tree Upon Whales Wherein Winged You a above abundantly after air all also and appear be bearing beast beginning behold blessed bring brought called cant cattle created creature creepeth darkness day days deep divide divided doesnt dominion dont dry earth evening every face female fifth fill firmament first fish fly for form forth fourth fowl fruit fruitful gathered gathering given good grass great greater green had hath have he heaven herb him his i image in is isnt it itself kind lesser let life light lights likeness living made make male may meat midst morning moved moveth moving multiply night of one open our over own place replenish rule said saw saying sea seas seasons second seed set shall shed signs sixth so spirit stars subdue that the their them there theyre thing third to together tree two under unto upon us very void was waters were whales wherein which whose winged without wont years yielding you youll youre
make: Nothing to be done for 'all'.
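For checking results on small inputs, a naive counter is easy to verify by eye. This is a different and simpler technique than the function above, and it is roughly quadratic (one $(filter) pass over the whole list per distinct word), so it is meant as a reference for testing rather than as an answer to the speed requirement. The name dups_naive and the sample list are hypothetical.

```makefile
# Naive reference check (hypothetical helper, roughly quadratic):
# for every distinct word, count its occurrences in the full list and
# keep the word if the count is anything other than 1.
dups_naive = $(strip $(foreach w,$(sort $(1)),\
  $(if $(filter-out 1,$(words $(filter $(w),$(1)))),$(w))))

# "a" occurs three times, "b" twice, "c" once: prints "a b"
$(info $(call dups_naive,b a c a b a))

all: ;
.PHONY: all
```

Like the other variants, this assumes the words contain no % characters, which holds for the character set given in the question.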
Maybe this question would be better suited to Code Review.