berkatil committed on
Commit 0081443
1 Parent(s): e6d0f8f

first draft

Files changed (3)
  1. README.md +30 -25
  2. map.py +53 -43
  3. requirements.txt +2 -1
README.md CHANGED
@@ -1,11 +1,10 @@
---
title: map
- datasets:
- -
tags:
- evaluate
- metric
- description: "TODO: add a description here"
+ description: "This is the mean average precision (MAP) metric for retrieval systems.
+   It is the average of the precision scores computed after each relevant document is retrieved. See https://amenra.github.io/ranx/metrics/#mean-average-precision for details."
sdk: gradio
sdk_version: 3.19.1
app_file: app.py
@@ -14,37 +13,43 @@ pinned: false

# Metric Card for map

- ***Module Card Instructions:*** *Fill out the following subsections. Feel free to take a look at existing metric cards if you'd like examples.*
-
## Metric Description
- *Give a brief overview of this metric, including what task(s) it is usually used for, if any.*
+ This is the mean average precision (MAP) metric for retrieval systems.
+ It is the average of the precision scores computed after each relevant document is retrieved. See [the ranx documentation](https://amenra.github.io/ranx/metrics/#mean-average-precision) for details.

## How to Use
- *Give general statement of how to use the metric*
-
- *Provide simplest possible example for using the metric*
-
+ ```python
+ >>> import json
+ >>> my_new_module = evaluate.load("map")
+ >>> references = [json.dumps({"q_1": {"d_1": 1, "d_2": 2}}),
+ ...               json.dumps({"q_2": {"d_2": 1, "d_3": 2, "d_5": 3}})]
+ >>> predictions = [json.dumps({"q_1": {"d_1": 0.9, "d_2": 0.8}}),
+ ...                json.dumps({"q_2": {"d_2": 0.9, "d_1": 0.8, "d_5": 0.7, "d_3": 0.3}})]
+ >>> results = my_new_module.compute(references=references, predictions=predictions)
+ >>> print(results)
+ {'map': 0.902777}
+ ```
### Inputs
- *List all input arguments in the format below*
- - **input_field** *(type): Definition of input, with explanation if necessary. State any default value(s).*
+ - **predictions** *(list of str)*: one JSON-encoded dictionary per query, mapping each retrieved document ID to the relevancy score produced by the model.
+ - **references** *(list of str)*: one JSON-encoded dictionary per query, mapping each relevant document ID to its rank in sorted relevance order.

### Output Values
- *Explain what this metric outputs and provide an example of what the metric output looks like. Modules should return a dictionary with one or multiple key-value pairs, e.g. {"bleu" : 6.02}*
-
- *State the range of possible values that the metric's output can take, as well as what in that range is considered good. For example: "This metric can take on any value between 0 and 100, inclusive. Higher scores are better."*
-
- #### Values from Popular Papers
- *Give examples, preferrably with links to leaderboards or publications, to papers that have reported this metric, along with the values they have reported.*
-
- ### Examples
- *Give code examples of the metric being used. Try to include examples that clear up any potential ambiguity left from the metric description above. If possible, provide a range of examples that show both typical and atypical results, as well as examples where a variety of input parameters are passed.*
+ - **map** *(`float`)*: the mean average precision score. The minimum possible value is 0 and the maximum possible value is 1.0; higher is better.

## Limitations and Bias
*Note any known limitations or biases that the metric has, with links and references if possible.*

## Citation
- *Cite the source where this metric was introduced.*
-
- ## Further References
- *Add any useful further references.*
+ ```bibtex
+ @inproceedings{ranx,
+   author    = {Elias Bassani},
+   title     = {ranx: {A} Blazing-Fast Python Library for Ranking Evaluation and Comparison},
+   booktitle = {{ECIR} {(2)}},
+   series    = {Lecture Notes in Computer Science},
+   volume    = {13186},
+   pages     = {259--264},
+   publisher = {Springer},
+   year      = {2022},
+   doi       = {10.1007/978-3-030-99739-7\_30}
+ }
+ ```
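As a quick sanity check on the usage example in the new card, the reported score can be reproduced by hand. The helper below is purely illustrative (it is not part of the module) and just spells out the average-precision arithmetic behind `{'map': 0.902777}`:

```python
# Illustrative hand computation of the card's example; not part of the module.
def average_precision(relevant, ranked):
    """Mean of the precision values taken at the rank of each relevant document."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)

ap_q1 = average_precision({"d_1", "d_2"}, ["d_1", "d_2"])                       # 1.0
ap_q2 = average_precision({"d_2", "d_3", "d_5"}, ["d_2", "d_1", "d_5", "d_3"])  # (1 + 2/3 + 3/4) / 3 ≈ 0.8056
print((ap_q1 + ap_q2) / 2)                                                      # ≈ 0.902777
```

For q_1 both retrieved documents are relevant, so its average precision is 1.0; for q_2 the relevant documents land at ranks 1, 3, and 4, giving (1 + 2/3 + 3/4) / 3 ≈ 0.8056, and the mean over the two queries is ≈ 0.9028.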
map.py CHANGED
@@ -11,58 +11,62 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
- """TODO: Add a description here."""
+ """Mean average precision metric."""

import evaluate
import datasets
+ import json
+ from ranx import Qrels, Run
+ from ranx import evaluate as ran_evaluate


- # TODO: Add BibTeX citation
_CITATION = """\
- @InProceedings{huggingface:module,
- title = {A great new module},
- authors={huggingface, Inc.},
- year={2020}
+ @inproceedings{ranx,
+   author    = {Elias Bassani},
+   title     = {ranx: {A} Blazing-Fast Python Library for Ranking Evaluation and Comparison},
+   booktitle = {{ECIR} {(2)}},
+   series    = {Lecture Notes in Computer Science},
+   volume    = {13186},
+   pages     = {259--264},
+   publisher = {Springer},
+   year      = {2022},
+   doi       = {10.1007/978-3-030-99739-7\_30}
}
"""

- # TODO: Add description of the module here
_DESCRIPTION = """\
- This new module is designed to solve this great ML task and is crafted with a lot of care.
+ This is the mean average precision (MAP) metric for retrieval systems.
+ It is the average of the precision scores computed after each relevant document is retrieved. See https://amenra.github.io/ranx/metrics/#mean-average-precision for details.
"""


- # TODO: Add description of the arguments of the module here
_KWARGS_DESCRIPTION = """
- Calculates how good are predictions given some references, using certain scores
Args:
-     predictions: list of predictions to score. Each predictions
-         should be a string with tokens separated by spaces.
-     references: list of reference for each prediction. Each
-         reference should be a string with tokens separated by spaces.
+     predictions: list of JSON-encoded dictionaries, one per query, each mapping
+         document IDs to the relevancy scores produced by the model.
+     references: list of JSON-encoded dictionaries, one per query, each mapping
+         the relevant document IDs to their ranks in sorted relevance order.
Returns:
-     accuracy: description of the first score,
-     another_score: description of the second score,
+     map (`float`): mean average precision score. Minimum possible value is 0; maximum possible value is 1.0.
Examples:
-     Examples should be written in doctest format, and should illustrate how
-     to use the function.
-
-     >>> my_new_module = evaluate.load("my_new_module")
-     >>> results = my_new_module.compute(references=[0, 1], predictions=[0, 1])
+     >>> import json
+     >>> my_new_module = evaluate.load("map")
+     >>> references = [json.dumps({"q_1": {"d_1": 1, "d_2": 2}}),
+     ...               json.dumps({"q_2": {"d_2": 1, "d_3": 2, "d_5": 3}})]
+     >>> predictions = [json.dumps({"q_1": {"d_1": 0.9, "d_2": 0.8}}),
+     ...                json.dumps({"q_2": {"d_2": 0.9, "d_1": 0.8, "d_5": 0.7, "d_3": 0.3}})]
+     >>> results = my_new_module.compute(references=references, predictions=predictions)
    >>> print(results)
-     {'accuracy': 1.0}
+     {'map': 0.902777}
"""

- # TODO: Define external resources urls if needed
- BAD_WORDS_URL = "http://url/to/external/resource/bad_words.txt"
-
-
@evaluate.utils.file_utils.add_start_docstrings(_DESCRIPTION, _KWARGS_DESCRIPTION)
class map(evaluate.Metric):
- """TODO: Short description of my evaluation module."""
-
    def _info(self):
-         # TODO: Specifies the evaluate.EvaluationModuleInfo object
        return evaluate.MetricInfo(
            # This is the description that will appear on the modules page.
            module_type="metric",
@@ -71,25 +75,31 @@ class map(evaluate.Metric):
            inputs_description=_KWARGS_DESCRIPTION,
            # This defines the format of each prediction and reference
            features=datasets.Features({
-                 'predictions': datasets.Value('int64'),
-                 'references': datasets.Value('int64'),
+                 'predictions': datasets.Value("string"),  # one JSON-encoded dict per query
+                 'references': datasets.Value("string"),   # one JSON-encoded dict per query
            }),
            # Homepage of the module for documentation
-             homepage="http://module.homepage",
-             # Additional links to the codebase or references
-             codebase_urls=["http://github.com/path/to/codebase/of/new_module"],
-             reference_urls=["http://path.to.reference.url/new_module"]
+             reference_urls=["https://amenra.github.io/ranx/"]
        )

-     def _download_and_prepare(self, dl_manager):
-         """Optional: download external resources useful to compute the scores"""
-         # TODO: Download external resources if needed
-         pass
-
    def _compute(self, predictions, references):
        """Returns the scores"""
-         # TODO: Compute the different scores of the module
-         accuracy = sum(i == j for i, j in zip(predictions, references)) / len(predictions)
+         # Merge the per-query JSON strings into single run/qrels dictionaries.
+         preds = {}
+         refs = {}
+         for pred in predictions:
+             preds = preds | json.loads(pred)
+         for ref in references:
+             refs = refs | json.loads(ref)
+
+         run = Run(preds)
+         qrels = Qrels(refs)
+         map_score = ran_evaluate(qrels, run, "map")
        return {
-             "accuracy": accuracy,
+             "map": map_score,
        }
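For context, the new `_compute` merges the per-query JSON strings into plain dictionaries and delegates the scoring to ranx. Below is a minimal standalone sketch of the same computation, assuming ranx is installed and using the same `Qrels`/`Run`/`evaluate` entry points the module imports; the input values are the ones from the card's example:

```python
import json
from ranx import Qrels, Run
from ranx import evaluate as ran_evaluate

# Inputs in the format the metric expects: one JSON-encoded dict per query.
references = [json.dumps({"q_1": {"d_1": 1, "d_2": 2}}),
              json.dumps({"q_2": {"d_2": 1, "d_3": 2, "d_5": 3}})]
predictions = [json.dumps({"q_1": {"d_1": 0.9, "d_2": 0.8}}),
               json.dumps({"q_2": {"d_2": 0.9, "d_1": 0.8, "d_5": 0.7, "d_3": 0.3}})]

# Merge the per-query dictionaries, as _compute does (dict union needs Python 3.9+).
refs, preds = {}, {}
for ref in references:
    refs = refs | json.loads(ref)
for pred in predictions:
    preds = preds | json.loads(pred)

print(ran_evaluate(Qrels(refs), Run(preds), "map"))  # ≈ 0.902777
```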
requirements.txt CHANGED
@@ -1 +1,2 @@
- git+https://github.com/huggingface/evaluate@main
+ git+https://github.com/huggingface/evaluate@main
+ ranx==0.3.19