I have a dataset of PDFs. I want to convert this dataset of raw files to a MediaSet.
I tried to convert the dataset of files in a Code Repository, but I'm unclear on how to proceed and which code to use for this operation.
There are 2 main components:
There are multiple approaches to getting a media set:
You are in case 3, because you already have a dataset with raw files. In a Code Repository, you first need to import the transforms-media library into your repository (via the icon on the left used to import libraries).
Then use a transform like this:
from transforms.api import transform, Input
from transforms.mediasets import MediaSetOutput


@transform(
    output_mediaset=MediaSetOutput("<your path to mediaset>"),
    input_dataset=Input("<your path to dataset with raw files>")
)
def compute(input_dataset, output_mediaset):
    # Copy every raw file from the input dataset into the media set
    output_mediaset.put_dataset_files(input_dataset)
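Since the media set here is meant to hold PDFs, it can help to check beforehand that the raw files really are PDFs. The snippet below is only a minimal sketch in plain Python, not part of the transforms-media library: `looks_like_pdf` and `split_by_pdf` are hypothetical helper names, and the check is a strict one that only looks for the `%PDF-` signature at the very start of the file. Inside a transform, the first bytes of each file could presumably be read through `input_dataset.filesystem()`, but that wiring is left out here.

```python
# Hypothetical helpers (not part of any Foundry library).
PDF_MAGIC = b"%PDF-"  # signature found at the start of a PDF file


def looks_like_pdf(header: bytes) -> bool:
    """Return True if the given leading bytes start with the PDF signature."""
    return header.startswith(PDF_MAGIC)


def split_by_pdf(headers):
    """Split a {path: leading_bytes} mapping into (pdf_paths, other_paths).

    Paths are processed in sorted order so the result is deterministic.
    """
    pdfs, others = [], []
    for path, header in sorted(headers.items()):
        (pdfs if looks_like_pdf(header) else others).append(path)
    return pdfs, others


if __name__ == "__main__":
    # Made-up sample data, for illustration only
    sample = {
        "report.pdf": b"%PDF-1.7\n",
        "notes.docx": b"PK\x03\x04",
    }
    pdf_paths, other_paths = split_by_pdf(sample)
    print("PDFs:", pdf_paths)
    print("Not PDFs:", other_paths)
```

Any file that lands in the second list is worth inspecting before you run the transform above.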
You can then move directly to the next step: creating a media references dataset from this raw dataset (the process would be the same starting from a media set).
Now comes the question: how do you convert a media set into media references?
Multiple ways:
Get Media References (Datasets)
You should try approach 2, as it will be a much simpler way to achieve what you need here.
Once you have media references, you can use them in your Ontology, etc.