Runner
Splitting samples that have the same weight
- static RunAnalysis.splitSamples(samples, useFilesPerJob=True)
Static method that takes a dictionary of samples and splits them based on their weights and the maximum number of files.
- Parameters:
- Returns:
- list of tuple
each tuple has a length of 5 (6 if subsamples are present): the first element is the name of the sample, the second the list of files, the third the weight, the fourth the index of this tuple among the other tuples of the same sample type, and the fifth the isData flag (True if the sample is data, False otherwise). If subsamples are present, the sixth element is the dict of subsamples.
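The tuple layout described above can be unpacked as in the following minimal sketch. The helper name `unpack_sample` and the example values are hypothetical and not part of mkShapesRDF; only the positional layout is taken from the description.

```python
def unpack_sample(sample_tuple):
    """Turn one splitSamples-style tuple into a dict (hypothetical helper)."""
    if len(sample_tuple) not in (5, 6):
        raise ValueError("expected a tuple of length 5 or 6")
    name, files, weight, index, is_data = sample_tuple[:5]
    # The sixth element is only present when subsamples are defined.
    subsamples = sample_tuple[5] if len(sample_tuple) == 6 else None
    return {
        "name": name,
        "files": files,
        "weight": weight,
        "index": index,
        "isData": is_data,
        "subsamples": subsamples,
    }


# Made-up example values, for illustration only.
example = ("DY", ["file_0.root", "file_1.root"], "1.0", 0, False)
info = unpack_sample(example)
print(info["name"], info["index"], info["subsamples"])
```

Checking the length first keeps code that consumes the tuples working whether or not subsamples are present.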
Instantiate the RunAnalysis class
- RunAnalysis.__init__(samples, aliases, variables, cuts, nuisances, lumi, limit=-1, outputFileMap='output.root')
Stores arguments in the class attributes and creates all the RDataFrame objects
- Parameters:
- samples
list of tuple same type as the return of the splitSamples method
- aliases
dict dict of aliases
- variables
dict dict of variables
- cuts
dict dict of cuts, contains two keys (preselections: str, cuts: dict)
- nuisances
dict dict of nuisances
- lumi
float integrated luminosity in fb^-1
- limit
int, optional, default: -1 limit of events to be processed
- outputFileMap
str, optional, default: 'output.root' full path + filename of the output ROOT file.
- Returns:
The main function:
- RunAnalysis.run()
Runs the analysis:
- load the aliases without the afterNuis option
- load the suffix systematics
- load the alias weight
- load the reweight systematics (they need the weight to be defined)
- finally load the suffix systematics with the afterNuis option
After this important procedure it filters the many dfs with preselection, loads systematics, loads variables, creates the results dict, splits the samples, creates the cuts/var histos, runs the dataframes and saves results.
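The loading order listed for run() can be illustrated with a small stand-in class that only records which step ran when. This is a sketch, not the real implementation: apart from `loadAliases(afterNuis=...)`, the method names are hypothetical, and passing `afterNuis` to the suffix-systematics step is an assumption made for illustration.

```python
class RunOrderSketch:
    """Records the order of the loading steps; it does not touch ROOT."""

    def __init__(self):
        self.steps = []

    def loadAliases(self, afterNuis=False):
        self.steps.append(f"aliases(afterNuis={afterNuis})")

    # Hypothetical method names below, chosen only for this illustration.
    def load_suffix_systematics(self, afterNuis=False):
        self.steps.append(f"suffix systematics(afterNuis={afterNuis})")

    def load_alias_weight(self):
        self.steps.append("alias weight")

    def load_reweight_systematics(self):
        # Reweight systematics need `weight` to be defined already.
        self.steps.append("reweight systematics")

    def run(self):
        self.loadAliases(afterNuis=False)
        self.load_suffix_systematics(afterNuis=False)
        self.load_alias_weight()
        self.load_reweight_systematics()
        self.load_suffix_systematics(afterNuis=True)
        return self.steps


steps = RunOrderSketch().run()
for step in steps:
    print(step)
```

The point of the ordering is the dependency noted in the list: the reweight systematics are loaded only after the alias weight exists.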
The Runner should be provided with samples, aliases and all the other configuration dictionaries. It determines how to split the samples and how to merge the results.
runner module
- class mkShapesRDF.shapeAnalysis.runner.RunAnalysis(samples, aliases, variables, cuts, nuisances, lumi, limit=-1, outputFileMap='output.root')
Bases: object
Class that creates dfs and runs the analysis
Methods
- convertResults(): Gather resulting histograms and fold them if needed.
- createResults(): Create empty dictionary for results, will store all the different histos.
- create_cuts_vars(): Defines Histo1D for each variable and cut and dataframe.
- getNuisanceFiles(nuisance, files): Searches in the provided nuisance folder for the files with the same name as the nominal files.
- getTTreeNomAndFriends(fnom, friends): Create a TChain with the nominal files and the friends files (nuisances TTrees with varied branches).
- Loads only the special alias weight in the dataframes.
- loadAliases([afterNuis]): Load aliases in the dataframes.
- loadBranches(): Loads branches (the ones specified in an alias with the tree key in them), and checks if they are already in the dataframe columns; if so, it adds __ at the beginning of the name.
- Loads systematics of type suffix in the dataframes.
- Loads systematics of type suffix in the dataframes.
- loadVariables(): Loads variables (not the ones with the 'tree' key in them), and checks if they are already in the dataframe columns; if so, it adds __ at the beginning of the name.
- run(): Runs the analysis.
- saveResults(): Save results in a root file.
- splitSamples(samples[, useFilesPerJob]): Static method that takes a dictionary of samples and splits them based on their weights and the maximum number of files.
- splitSubsamples(): Split samples into subsamples if needed.
- index_sub
- mergeAndSaveResults
- mergeSaveResults
- static splitSamples(samples, useFilesPerJob=True)
Static method that takes a dictionary of samples and splits them based on their weights and the maximum number of files.
- Parameters:
- Returns:
- list of tuple
each tuple has a length of 5 (6 if subsamples are present): the first element is the name of the sample, the second the list of files, the third the weight, the fourth the index of this tuple among the other tuples of the same sample type, and the fifth the isData flag (True if the sample is data, False otherwise). If subsamples are present, the sixth element is the dict of subsamples.
- static getTTreeNomAndFriends(fnom, friends)
Create a TChain with the nominal files and the friends files (nuisances TTrees with varied branches)
- Args:
fnom (list): list of nominal files
friends (list of list): list of list of friends files
- Returns:
TChain: TChain with nominal files and friends files
- static getNuisanceFiles(nuisance, files)
Searches in the provided nuisance folder for the files with the same name as the nominal files
- Args:
nuisance (dict): dict with the nuisance information
files (list): list of nominal files
- Returns:
list of list: list with the down and up varied list of files
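The lookup described above, matching varied files to nominal files by name, can be sketched as follows. This is an illustration under assumptions: the function name `find_varied_files`, the nuisance keys `"folderDown"` and `"folderUp"`, and the `available` set (which replaces real disk access) are all hypothetical and not taken from the source.

```python
import posixpath


def find_varied_files(nuisance, files, available):
    """Return [down_files, up_files] matching the nominal file names.

    `nuisance` uses the hypothetical keys "folderDown" and "folderUp";
    `available` is a set of existing paths, so no disk access is needed.
    """
    varied = []
    for key in ("folderDown", "folderUp"):
        folder = nuisance[key]
        matches = []
        for nominal in files:
            # Same file name as the nominal file, but in the varied folder.
            candidate = posixpath.join(folder, posixpath.basename(nominal))
            if candidate in available:
                matches.append(candidate)
        varied.append(matches)
    return varied


# Made-up paths, for illustration only.
nuisance = {"folderDown": "/data/JESdo", "folderUp": "/data/JESup"}
files = ["/data/nominal/part_0.root", "/data/nominal/part_1.root"]
available = {
    "/data/JESdo/part_0.root",
    "/data/JESdo/part_1.root",
    "/data/JESup/part_0.root",
}
down, up = find_varied_files(nuisance, files, available)
print(down, up)
```

The result keeps the down list first and the up list second, matching the documented return value.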
- __init__(samples, aliases, variables, cuts, nuisances, lumi, limit=-1, outputFileMap='output.root')
Stores arguments in the class attributes and creates all the RDataFrame objects
- Parameters:
- samples
list of tuple same type as the return of the splitSamples method
- aliases
dict dict of aliases
- variables
dict dict of variables
- cuts
dict dict of cuts, contains two keys (preselections: str, cuts: dict)
- nuisances
dict dict of nuisances
- lumi
float integrated luminosity in fb^-1
- limit
int, optional, default: -1 limit of events to be processed
- outputFileMap
str, optional, default: 'output.root' full path + filename of the output ROOT file.
- Returns:
- dfs
dfs is a dictionary whose keys are the sample names. The structure should look like this:
dfs = {
    'DY': {
        0: {
            'df': obj,
            'columnNames': [...],
            'usedVariables': [...],
            'ttree': obj2,  # needed otherwise seg fault in root
        },
    }
}
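A nested dictionary of this shape is walked sample by sample and then index by index. The sketch below uses placeholder strings in place of the RDataFrame and TTree objects; the helper `iter_dataframes` and the example column names are hypothetical.

```python
# Placeholder strings stand in for the RDataFrame and TTree objects.
dfs = {
    "DY": {
        0: {
            "df": "rdf_DY_0",
            "columnNames": ["pt", "eta"],
            "usedVariables": [],
            "ttree": "tree_DY_0",
        },
        1: {
            "df": "rdf_DY_1",
            "columnNames": ["pt", "eta", "phi"],
            "usedVariables": [],
            "ttree": "tree_DY_1",
        },
    },
}


def iter_dataframes(dfs):
    """Yield (sample name, index, entry) for every dataframe entry."""
    for sample_name, by_index in dfs.items():
        for index, entry in by_index.items():
            yield sample_name, index, entry


columns = {}
for sample_name, index, entry in iter_dataframes(dfs):
    columns[(sample_name, index)] = entry["columnNames"]
print(columns)
```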
- loadAliases(afterNuis=False)
Load aliases in the dataframes. It does not create the special alias weight, for which a special method is used.
- loadVariables()
Loads variables (not the ones with the 'tree' key in them) and checks whether they are already in the dataframe columns; if so, it adds __ at the beginning of the name.
Since variables are shared but aliases are not, a variable's name or expression may already be defined for a given df but not for another one, so a common and compatible set of variables must be determined for all the dfs.
This is done by gathering the largest set of column names.
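The renaming rule can be sketched in a few lines. This is an illustration only: the function name `resolve_variable_names` is hypothetical, and "the largest set of column names" is interpreted here as the union of the columns of all dataframes, which is an assumption.

```python
def resolve_variable_names(variables, columns_per_df):
    """Map each variable name to the name it would be defined under."""
    # Assumption: the "largest set" is the union over all dataframes.
    all_columns = set()
    for columns in columns_per_df:
        all_columns.update(columns)

    resolved = {}
    for name in variables:
        # Prefix with "__" when the name clashes with an existing column
        # in at least one dataframe, so every dataframe uses the same name.
        resolved[name] = "__" + name if name in all_columns else name
    return resolved


# Made-up variable and column names, for illustration only.
resolved = resolve_variable_names(
    ["mll", "ptl1"],
    [["mll", "run"], ["run", "event"]],
)
print(resolved)
```

Because the check uses the combined column set, a variable that clashes in only one dataframe is renamed everywhere, which keeps the histogram definitions identical across dataframes.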
- loadBranches()
Loads branches (the ones specified in an alias with the tree key in them) and checks whether they are already in the dataframe columns; if so, it adds __ at the beginning of the name.
Since variables are shared but aliases are not, a variable's name or expression may already be defined for a given df but not for another one, so a common and compatible set of variables must be determined for all the dfs.
This is done by gathering the largest set of column names.
- createResults()
Create empty dictionary for results, will store all the different histos
- splitSubsamples()
Split samples into subsamples if needed
After this method the dfs attribute will be modified to contain the subsample names instead of the original sample name
- create_cuts_vars()
Defines Histo1D for each variable and cut and dataframe. It also creates a dictionary for variations through VariationsFor()
- convertResults()
Gather resulting histograms and fold them if needed.
Systematics are also saved.
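The source does not spell out what folding does. A common meaning, assumed here, is moving the underflow and overflow contents into the first and last visible bins. The sketch below works on a plain list laid out like a ROOT TH1 (index 0 is the underflow bin, the last index is the overflow bin); the function name `fold` and its flags are hypothetical, and bin errors are ignored.

```python
def fold(contents, underflow=True, overflow=True):
    """Fold under/overflow into the neighbouring visible bins.

    `contents` is [underflow, bin_1, ..., bin_n, overflow].
    """
    folded = list(contents)  # work on a copy, leave the input untouched
    if len(folded) < 3:
        raise ValueError("need at least one visible bin")
    if underflow:
        folded[1] += folded[0]
        folded[0] = 0.0
    if overflow:
        folded[-2] += folded[-1]
        folded[-1] = 0.0
    return folded


# Made-up bin contents: underflow, two visible bins, overflow.
print(fold([1.0, 2.0, 3.0, 4.0]))
```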
- saveResults()
Save results in a root file.
If Snapshots were created, it merges them into an output file.
- run()
Runs the analysis:
- load the aliases without the afterNuis option
- load the suffix systematics
- load the alias weight
- load the reweight systematics (they need the weight to be defined)
- finally load the suffix systematics with the afterNuis option
After this important procedure it filters the many dfs with preselection, loads systematics, loads variables, creates the results dict, splits the samples, creates the cuts/var histos, runs the dataframes and saves results.