One of my favorite components in Jedox Integrator is the TableLoop, with a very easy configuration you can do pretty cool stuff with it. I ran in an example that I will like to describe in this article. If you don’t have a good background in Jedox Integrator, you may need to download the complete example with the link given at the end of the article and refer back to it if needed, you only have to adapt the variable baseDir that points to the folder path on your computer.

Problem:

We have following folder “twitter_daten_sample” and we need to find the best way how to read all .txt files that are under it. Here is the folder structure:

TL_folderStructure-1

As you see from the snapshot, “twitter_daten_sample” folder include n-folders, each one is for one day (e.g. tweets_2015-01-03), and each day folder include n-folders that are simply the same name of the parent folder plus a numeric suffix (e.g. tweets_2015-01-03_0). The text files that I need to read are inside of these subfolders. One possible solution is to write a script with nested for-loops that traverse all possible paths. This would work, but what if I don’t like scripting and I want to do it without writing any code?

Solution:

Use the Integrator transform component TableLoop. It reruns a single data source n times, each time with a new set of variable values read from a loop sourceEach line in the loop source triggers a single evaluation of the data source.  The final result is the union of all these runs. In order for this to work the name of the column in the table loop source has to correspond to a variable name.

Starting to do the project, I added in the variables section (I added default values to easily test the components):

  • baseDir: is simply the path to “twitter_daten_sample”  folder (stays the same during the execution).
  • dayFolder: is the variable representing the day folder (default: tweets_2015-01-03).
  • daySubfolder: is the variable representing the subfolder in the day folder (default: tweets_2015-01-03_0).

Next, I added these connections:

  • C_DaysNames: a directory connection to read folders under ${baseDir} folder.
  • C_DaysSubNames:  a directory connection to read subfolders under ${baseDir}${dayFolder} folder.
  • C_FileNames: a directory connection to read files under ${baseDir}${dayFolder}${daySubfolder} folder.

Now, we add the extracts:

  • E_Days: Metadata extract to get the names of folders for the days (uses C_DaysNames connection).
  • E_DaySubfolders: Metadata extract get the names of folders under a certain day (uses C_DaysSubNames connection).
  • E_FileData: File Extract to get the data from all text files under a single day subfolder (uses C_FileNames connection).

As mentioned before in order to use a source (i.e. an extract or a transform) as a loop source its columns have to correspond to a variable name. The two extracts deliver though the column “Filename”. That is why I added two FieldTransforms after E_Days and E_DaySubfolders with the names T_Days and T_DaySubfolders respectively which only rename the resulting column name of the extract to a variable name: dayFolder and daySubfolder respectively. Now the new TableLoop Transform T_AllFolders, using T_DaySubfolders as a data source and T_Days as a loop source, enables us to get all day subfolders from all days in one column:

TL_datapreview_oneCol-2

Having this alone does not help us to build the path of the folder.That’s why I added the used variable value as a constant Input in the T_DaySubfolders (the data source of our TableLoop).

TL_FT_target-3

We can see now with which value of the variable dayFolder each record came with Now our result in T_AllFolders looks like this:

TL_datapreview_twoCol-4

Now the required information to get the folders containing the text files is there. We are one step away from our goal! E_FileData extract gets the data from text files existing under a single day subfolder, it uses a directory connection that points to a path which is build using the three variables that we defined:

${baseDir}${dayFolder}${daySubfolder}

(e.g. twitter_daten_sampletweets_2015-01-03tweets_2015-01-03_0).

The variable baseDir  has a fixed value and the possible dayFolder and daySubfolder values can be read from the T_AllFolders transform. So creating a new TableLoop transform T_AllData with E_FileData as a data source and T_AllFolders as a loop source, will give us the result that we want.

TL_data-5

Here is the flowgraph of the final Jedox Integrator project:

TL_flowgraph-6

the complete project and data can be found here.