I'm trying to build an LSTM model to predict sign language sentences from JSON files in the format below. Each file contains the coordinates of 21 hand landmark joints for each frame and each hand (left and right). Here's a sneak peek of my JSON time-series data:
{
"frame": 123,
"hands": [
{
"hand": "Left",
"landmarks": [
{
"x": 0.4636201858520508,
"y": 0.3758980929851532,
"z": 7.529240519943414e-08,
"body_part": "wrist"
},
...
{
"x": 0.4639311134815216,
"y": 0.2574789524078369,
"z": -0.013109659776091576,
"body_part": "pinky_tip"
}
]
},
{
"hand": "Right",
"landmarks": [
{
"x": 0.5393109321594238,
"y": 0.6190552711486816,
"z": 1.0587137921902467e-07,
"body_part": "wrist"
},
...
{
"x": 0.4721616506576538,
"y": 0.5990280508995056,
"z": -0.006831812672317028,
"body_part": "pinky_tip"
}
]
}
]
},
The same structure, with different coordinates, is repeated for every frame. I'm still in the process of correcting this JSON time-series data, so I haven't started writing the LSTM code yet. However, I'm worried about whether I can use this JSON time-series data for an LSTM at all.
I'm not sure what you mean by
I'm still in the process of correcting this JSON time-series data.
Does that mean you need to "clean" your dataset (for example, remove outliers), or that this is only a rough structure and more elements could be added to it?
In any case, I think this question is more about how to train a model on a JSON-like structure than about the data itself.
First, think about how this data, or any data, can be represented as a tensor, and then work out what dimensions that tensor should have. Given your description, your data would convert to a tensor of shape
[number_samples, number_frames, number_features]
where
number_samples: your dataset size, i.e. the number of examples
number_frames: the number of frames in each example
number_features: 21 landmarks * 3 coordinates * 2 hands = 126
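As a rough sketch of that feature layout, the snippet below flattens one frame of the JSON shown in the question into a 126-value vector. The function names, the fixed Left-then-Right slot order, and the choice to fill a missing hand with zeros are assumptions made for illustration, not something the question specifies.

```python
import numpy as np

NUM_LANDMARKS = 21               # landmarks per hand, as in the question
HAND_ORDER = ("Left", "Right")   # fixed slot order (assumption of this sketch)


def frame_to_features(frame):
    """Flatten one frame dict into a vector of 2 * 21 * 3 = 126 floats.

    A hand that is absent from the frame keeps its slot filled with zeros
    (an assumption; other fill strategies are possible).
    """
    features = np.zeros((len(HAND_ORDER), NUM_LANDMARKS, 3), dtype=np.float32)
    for hand in frame.get("hands", []):
        if hand["hand"] not in HAND_ORDER:
            continue  # ignore unexpected labels
        slot = HAND_ORDER.index(hand["hand"])
        for i, lm in enumerate(hand["landmarks"][:NUM_LANDMARKS]):
            features[slot, i] = (lm["x"], lm["y"], lm["z"])
    return features.reshape(-1)


def frames_to_array(frames):
    """Stack the frames of one example into shape [number_frames, 126]."""
    return np.stack([frame_to_features(f) for f in frames])
```

Calling `frames_to_array` on the list of frame dicts loaded from one recording (for example with `json.load`) would give the `[number_frames, number_features]` part of the shape for that example.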
Once you have those tensors, you can train your LSTM model. For this representation to work, the features must be laid out consistently across the two hands, so that each hand always occupies the same positions in the feature vector.
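Recordings of different sentences will usually have different numbers of frames, while a single tensor needs one fixed number_frames. One common option, sketched here under that assumption, is to zero-pad every example to the length of the longest one and keep a mask of the real frames. The helper name `pad_samples` is hypothetical, and each example is assumed to already be an array of shape [frames, 126].

```python
import numpy as np


def pad_samples(samples, num_features=126):
    """Zero-pad variable-length examples into one 3-D array.

    samples: list of arrays, each of shape [frames_i, num_features].
    Returns (batch, mask), where batch has shape
    [number_samples, max_frames, num_features] and mask is True
    wherever a frame is real data rather than padding.
    """
    max_frames = max(len(s) for s in samples)
    batch = np.zeros((len(samples), max_frames, num_features), dtype=np.float32)
    mask = np.zeros((len(samples), max_frames), dtype=bool)
    for i, sample in enumerate(samples):
        batch[i, :len(sample)] = sample
        mask[i, :len(sample)] = True
    return batch, mask
```

The mask can then be used by whichever framework you choose so that the padded frames do not influence training.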