
Using org-element to parse headlines matching a specific tag (or property)


I'm trying to write custom Elisp functions for word counts based on certain parts of an org-mode buffer, and I was wondering if there is a good way for org-element to parse all headlines matching a certain tag (or property). More specifically, I have a buffer like this:

#+TITLE: My manuscript

Authors, affiliations, etc.

* Abstract       :abstract:

Text in the abstract.

* Introduction   :body:

Some sample text goes here.

* Methods        :body:

Some sample text goes here.

* Results        :body:

Some sample text goes here.

* Discussion     :body:

Some sample text goes here.

* References     :refs:

References go here.

And I want to get a word count for all headlines with the :body: tag (i.e., 20 for this example). So far, I have been using the functions from this answer, which do a great job of returning a word count for the headline at point:

(require 'cl-lib)
(require 'org-element)

(defun org-element-parse-headline (&optional granularity visible-only)
  "Parse current headline.
GRANULARITY and VISIBLE-ONLY are like the args of `org-element-parse-buffer'."
  (let ((level (org-current-level)))
    (org-element-map
        (org-element-parse-buffer granularity visible-only)
        'headline
      (lambda (el)
        (and
         (eq (org-element-property :level el) level)
         (<= (org-element-property :begin el) (point))
         (<= (point) (org-element-property :end el))
         el))
      nil 'first-match 'no-recursion)))

(cl-defun org+-count-words-of-heading
    (&key (worthy '(paragraph bold italic underline code footnote-reference
                    link strike-through subscript superscript
                    table table-row table-cell))
          (no-recursion nil))
  "Count words in the section of the current heading.
WORTHY is a list of things worthy to be counted.
This list should at least include the symbols:
paragraph, bold, italic, underline and strike-through.

If NO-RECURSION is non-nil don't count the words in subsections."
  (interactive (and current-prefix-arg
                    (list :no-recursion t)))
  (let ((word-count 0))
    (org-element-map
        (org-element-contents (org-element-parse-headline))
        '(paragraph table)
      (lambda (par)
        (org-element-map
            par
            worthy
          (lambda (el)
            (cl-incf
             word-count
             (cl-loop
              for txt in (org-element-contents el)
              when (eq (org-element-type txt) 'plain-text)
              sum
              (with-temp-buffer
                (insert txt)
                (count-words (point-min) (point-max))))))))
      nil nil (and no-recursion 'headline))
    (when (called-interactively-p 'any)
      (message "Word count in section: %d" word-count))
    word-count))
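A side note on the last argument of org-element-map, since both functions above rely on it: NO-RECURSION is an element type (or list of types) whose contents are not searched, not a boolean flag. So the 'no-recursion passed in org-element-parse-headline names no type and has no effect (the 'first-match argument is what stops the search there), while (and no-recursion 'headline) in the counting function really does skip subheadings. A minimal sketch to see the difference; my-org-headline-titles is a hypothetical helper, not part of Org:

```elisp
(require 'org)
(require 'org-element)

(defun my-org-headline-titles (no-recursion)
  "Return the titles of the headlines in the current buffer.
NO-RECURSION is passed unchanged to `org-element-map'."
  ;; Headline granularity is enough here: we only look at titles.
  (org-element-map (org-element-parse-buffer 'headline) 'headline
    (lambda (hl) (org-element-property :raw-value hl))
    nil nil no-recursion))
```

In an Org buffer containing `* A`, `** A1` and `* B`, passing nil or 'no-recursion returns ("A" "A1" "B"), whereas passing 'headline returns ("A" "B"), because the search does not descend into the top-level headlines.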

I imagine I have to tweak the org-element-parse-headline function to match tags instead of grabbing the headline at point, but does anyone know how to do this? Thanks!


Solution

  • To get results from the whole buffer rather than just the headline at point, replace the (org-current-level) lookup with a fixed level (e.g. 0, which lets every headline through) and don't pass the 'first-match argument to org-element-map. Then, to restrict matches to headlines with a specific tag, add a condition on the headline's :tags property inside the function passed to org-element-map.

    (defun my-org-element-parse-headline (&optional granularity visible-only)
      "Return a list of all headlines tagged \"body\" in the current buffer."
      (let ((level 0))                       ; or restrict level
        (org-element-map
            (org-element-parse-buffer granularity visible-only)
            'headline
          (lambda (el)
            ;; e.g. restrict elements to levels greater than `level'
            (when (< level (org-element-property :level el))
              (and
               ;; match "body" tags
               (member "body" (org-element-property :tags el))
               ;; ...
               el)))
          ;; don't set 'first-match if you want all the matches;
          ;; NO-RECURSION is left unset so nested headlines are searched too
          nil nil)))

    ;; e.g. results for your example org file
    (mapcar (lambda (el) (org-element-property :raw-value el))
            (my-org-element-parse-headline))
    ;; ("Introduction" "Methods" "Results" "Discussion")
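The function above returns the matching headlines but stops short of the word count itself. Below is a minimal self-contained sketch that combines the tag filter with a count. It assumes that only plain text inside paragraphs should be counted (headline titles, tables, and the :value of code/verbatim objects are skipped), and my-org-count-words-with-tag is a hypothetical name. Each headline's own section is counted, but not its subheadings: :tags only holds a headline's local tags, so a subheading contributes only if it carries the tag itself, and nothing is counted twice.

```elisp
(require 'cl-lib)
(require 'org)
(require 'org-element)

(defun my-org-count-words-with-tag (tag)
  "Return the number of words in the sections of headlines tagged TAG.
Only plain text inside paragraphs is counted; titles are not."
  (let ((total 0))
    (org-element-map (org-element-parse-buffer) 'headline
      (lambda (hl)
        ;; :tags lists the headline's local tags as strings.
        (when (member tag (org-element-property :tags hl))
          ;; Find the paragraphs of this headline's own section; the
          ;; NO-RECURSION argument 'headline keeps subheadings out.
          (org-element-map (org-element-contents hl) 'paragraph
            (lambda (par)
              ;; Every string below the paragraph, including text inside
              ;; bold, italic, link descriptions, etc.
              (org-element-map par 'plain-text
                (lambda (txt)
                  (cl-incf total
                           (with-temp-buffer
                             (insert txt)
                             (count-words (point-min) (point-max)))))))
            nil nil 'headline))
        ;; Return nil so org-element-map does not collect results.
        nil))
    total))
```

In the sample buffer from the question, (my-org-count-words-with-tag "body") returns 20. To select by property instead of tag, the `member` test can be swapped for a check on a property-drawer value, e.g. (equal (org-element-property :SECTION hl) "body") for a hypothetical SECTION property, since org-element exposes drawer properties as upcased keywords on the headline.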