A Node.js application to parse YAML files

For the past two weeks I have been interning as a Technology Analyst at Infibeam, a company that manages all of India’s public procurement through its online Government eMarketplace (GeM) platform. The company’s Data Warehouse in Bangalore sends all its data as YAML files, but the Analytics team needs this data in a database for analysis. So I was given the job of automating the transfer of the YAML data into an SQL database. To do this, I am building a Node.js service that first converts the YAML to JSON and then inserts it into the database.

In this post, I discuss how I used js-yaml for this task.

JS-YAML - YAML 1.2 parser

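js-yaml is one of the most popular YAML parsers for JavaScript. It parses a YAML document into plain JavaScript objects, which JSON.stringify can then turn into JSON.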

To install js-yaml:

$ npm install js-yaml

Getting started is easy:

const yaml = require("js-yaml");
const fs = require("fs");

// Get document, or throw exception on error.
// js-yaml 4.x: load() is safe by default (3.x used safeLoad()).
try {
  const yamlData = yaml.load(fs.readFileSync("foo/bar", "utf8"));
  console.log(JSON.stringify(yamlData));
} catch (error) {
  console.error(error);
}

The code above reads your YAML file and prints it as JSON. This alone is enough to parse most YAML files. But what if the YAML contains custom tags? Then the loader throws an "unknown tag" error, and life gets a little tougher, because we have to create a schema that tells js-yaml about those tags. The online literature on this topic is sparse, which is why I wanted to share this with you all. So here is a simple fix.

Parsing Custom YAML tags

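const yaml = require("js-yaml");
const fs = require("fs");

const json_data = {
  parsedData: null,
  error: null,
};
try {
  // The Type name must match the tag used in the YAML, including the
  // leading "!" (e.g. `!customTypeName`).
  const CustomType = new yaml.Type("!customTypeName", {
    kind: "mapping",
    // The kind is the YAML node the tag is attached to:
    // 'scalar' (string), 'sequence' (array) or 'mapping' (object).
    // https://yaml.org/spec/1.2/spec.html (goto 3.2.1.1. Nodes)
  });
  // js-yaml 4.x: extend the default schema (3.x used yaml.Schema.create([...])).
  const CUSTOM_SCHEMA = yaml.DEFAULT_SCHEMA.extend([CustomType]);
  // js-yaml 4.x: load() is safe by default (3.x used safeLoad()).
  const parsed_yaml_data = yaml.load(fs.readFileSync("foo/bar", "utf8"), {
    schema: CUSTOM_SCHEMA,
  });
  json_data.parsedData = JSON.stringify(parsed_yaml_data);
} catch (e) {
  json_data.error = e;
}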

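This creates a Type for your custom tag, builds a schema that includes it, and passes that schema to the loader (yaml.load in js-yaml 4, safeLoad in 3.x), so that when the file is parsed, a node tagged !customTypeName is recognized as a custom YAML tag instead of raising an error. Because no construct function is given, the tagged mapping comes through as a plain object. This fixes the unknown tag error for every tag you register a Type for; any tag without a matching Type will still throw.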

Summary

That's it: a simple fix for a very annoying problem you might come across when parsing YAML files. The YAML data I am working with is enormous, so I have had to create a whole host of schemas to parse it. I wish there were a way to automate the schema generation process as well.