Tags: swift, cocoa, spotlight

How can I access the text content (kMDItemTextContent) of an NSMetadataItem in Swift?


I'm trying to access the text representation of NSMetadataQuery results using Swift. However, the attribute kMDItemTextContent, which contains a file's text representation, doesn't appear on the results. I know the attribute must exist in the index, because searching for files by that attribute works flawlessly.

Here's my code so far:

import Foundation
import Cocoa

class Indexer {
    public let spotlight = NSMetadataQuery()
    let backgroundQueue = OperationQueue()

    init() {
        let nc = NotificationCenter.default

        spotlight.searchScopes = []
        spotlight.predicate = NSPredicate(fromMetadataQueryString: "kMDItemKind == *")

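        // Observe only this query, and capture self weakly so the observer does not keep the Indexer alive.
        nc.addObserver(forName: NSNotification.Name.NSMetadataQueryDidFinishGathering, object: spotlight, queue: self.backgroundQueue, using: { [weak self] _ in
            guard let self = self else { return }
            // Pause live updates while iterating over the results.
            self.spotlight.disableUpdates()
            for i in 0..<self.spotlight.resultCount {
                // Skip anything that is not a metadata item instead of crashing on a forced cast.
                guard let result = self.spotlight.result(at: i) as? NSMetadataItem else { continue }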
                print("----- \(result.value(forAttribute: "kMDItemDisplayName") ?? "No title") -----")
                for attribute in result.attributes {
                    print("\(attribute):", result.value(forAttribute: attribute) ?? "No content")
                }
            }
            self.spotlight.enableUpdates()
        })

        spotlight.start()
    }
}

The current result for one file looks like this:

----- n26-csv-transactions.csv -----
kMDItemContentTypeTree: (
    "public.comma-separated-values-text",
    "public.data",
    "public.delimited-values-text",
    "public.plain-text",
    "public.item",
    "public.content",
    "public.text"
)
kMDItemContentType: public.comma-separated-values-text
kMDItemPhysicalSize: 16384
kMDItemDisplayName: n26-csv-transactions.csv
kMDItemKind: CSV Document
kMDItemContentCreationDate: 2019-04-25 17:09:08 +0000
kMDItemContentCreationDate_Ranking: 2019-04-25 00:00:00 +0000
kMDItemContentModificationDate: 2019-04-25 17:09:08 +0000
kMDItemInterestingDate_Ranking: 2019-05-08 00:00:00 +0000
kMDItemUsedDates: (
    "2019-05-07 22:00:00 +0000"
)
kMDItemLastUsedDate: 2019-05-08 10:00:33 +0000
kMDItemLastUsedDate_Ranking: 2019-05-08 00:00:00 +0000
kMDItemUseCount: 3
kMDItemLogicalSize: 591
kMDItemWhereFroms: (
    "https://app.n26.com/download-csv",
    "https://app.n26.com/downloads"
)
kMDItemFSName: n26-csv-transactions.csv
kMDItemFSSize: 591
kMDItemFSCreationDate: 2019-04-25 17:09:08 +0000
kMDItemFSContentChangeDate: 2019-04-25 17:09:08 +0000
kMDItemFSOwnerUserID: 99
kMDItemFSOwnerGroupID: 99
kMDItemFSNodeCount: No content
kMDItemFSInvisible: 0
kMDItemFSTypeCode: 0
kMDItemFSCreatorCode: 0
kMDItemFSFinderFlags: 0
kMDItemFSHasCustomIcon: No content
kMDItemFSIsExtensionHidden: 0
kMDItemFSIsStationery: No content
kMDItemFSLabel: 0

The kMDItemTextContent attribute seems to be missing here.

Is there a way to access that attribute using the NSMetadataItems returned by Spotlight? If not, is there another way to access a file's text representation?


Solution

  • Is there a way to access that attribute using the NSMetadataItems returned by Spotlight? If not, is there another way to access a file's text representation?

    In a word: no. Read the docs on that attribute:

    Contains a text representation of the content of the document. Data in multiple fields should be combined using a whitespace character as a separator. An application's Spotlight importer provides the content of this attribute. Applications can create queries using this attribute, but are not able to read the value of this attribute directly. [Emphasis mine.]

    The text content info goes into the Spotlight index so that, as you have observed, you can search on it. But you cannot obtain it for yourself in any way. It doesn't exist in any public programmer-facing form.
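To make the "search only" side concrete, here is a minimal sketch of building a query on kMDItemTextContent. The helper name textContentQuery(containing:) and the search term are hypothetical, and the escaping rules are my reading of the Spotlight query syntax rather than something stated in the question, so treat them as assumptions.

```swift
import Foundation

/// Builds a Spotlight query string that matches files whose indexed text contains `term`.
/// Hypothetical helper; the escaping below is an assumption about the query syntax.
func textContentQuery(containing term: String) -> String {
    var escaped = ""
    for character in term {
        // Backslash-escape characters that are special inside a quoted query value.
        if character == "\\" || character == "\"" || character == "*" {
            escaped.append("\\")
        }
        escaped.append(character)
    }
    // The "c" and "d" modifiers request case- and diacritic-insensitive matching.
    return "kMDItemTextContent == \"*\(escaped)*\"cd"
}
```

On macOS the string can be handed to the query from the question, for example spotlight.predicate = NSPredicate(fromMetadataQueryString: textContentQuery(containing: "transactions")). The matching files come back as results, but their attribute lists still will not include kMDItemTextContent.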

    (For comparison, the built-in mdls command does essentially what your code does; you could save yourself the trouble by running mdls in a Process. If you run mdls on a file in the Terminal, you won't see kMDItemTextContent listed among the attributes, even if the file's content is indexed.)
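As a rough sketch of the "run mdls in a Process" idea: the function names runMdls(onFileAt:) and attributeNames(inMdlsOutput:) are made up for illustration. The parser assumes that mdls prints each attribute on an unindented line of the form "name = value", which matches what I have seen but is not a documented contract.

```swift
import Foundation

/// Runs /usr/bin/mdls on a file and returns its raw output (macOS only).
func runMdls(onFileAt path: String) throws -> String {
    let process = Process()
    process.executableURL = URL(fileURLWithPath: "/usr/bin/mdls")
    process.arguments = [path]

    let pipe = Pipe()
    process.standardOutput = pipe

    try process.run()
    // Read before waiting so a large output cannot fill the pipe and stall mdls.
    let data = pipe.fileHandleForReading.readDataToEndOfFile()
    process.waitUntilExit()
    return String(decoding: data, as: UTF8.self)
}

/// Extracts attribute names from mdls-style output.
/// Assumption: every attribute starts an unindented line shaped like `name = value`;
/// indented lines are array elements and are skipped.
func attributeNames(inMdlsOutput output: String) -> [String] {
    return output.split(separator: "\n").compactMap { line -> String? in
        guard let first = line.first, !first.isWhitespace,
              let equals = line.firstIndex(of: "=") else { return nil }
        return line[..<equals].trimmingCharacters(in: .whitespaces)
    }
}
```

Calling attributeNames(inMdlsOutput: try runMdls(onFileAt: somePath)) should give roughly the same attribute list as the loop in the question, again without kMDItemTextContent.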

    To see why this is, think about privacy. If you could read a text representation of every file on the user's computer just because you have access to Spotlight, you'd know all the data that's on the user's computer. Unless you are some kind of evil hacker, you shouldn't even want that. To find out what's in a file, open the file — if you can.
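For plain-text files such as the CSV in the question, "open the file" can be as simple as the sketch below. The function name textContent(ofFileAt:) is hypothetical, and the UTF-8 assumption is mine; other encodings and rich formats such as PDF or Word documents need their own decoding.

```swift
import Foundation

/// Returns the text of the file at `url`, or nil if it cannot be read as UTF-8.
/// Assumption: the file is plain text encoded as UTF-8.
func textContent(ofFileAt url: URL) -> String? {
    // Yields nil when the file is missing, unreadable (for example due to
    // permissions or sandboxing), or not valid UTF-8.
    return try? String(contentsOf: url, encoding: .utf8)
}
```

With a Spotlight result, the path can be read from the item with result.value(forAttribute: NSMetadataItemPathKey) and turned into a file URL with URL(fileURLWithPath:). Whether the read succeeds still depends on your app's access to that file.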

    So what is this attribute even for? It exists so that you can supply text to Spotlight for a file type that belongs to you, by way of a custom Spotlight importer.