Your own Gutenberg. Making parallel books

Lingtrain parallel book







If you enjoy learning languages (or you teach them), you have probably come across parallel reading as a way to study. It helps you immerse yourself in context, builds vocabulary, and makes learning enjoyable. In my opinion, reading a text in the original alongside a Russian translation is worthwhile once the basics of grammar and phonetics are in place, so textbooks and teachers are still needed. But when it comes to reading, you want to pick something you like, or something already familiar and loved, and that is often impossible because nobody has published a parallel edition of that book. And if you are studying not English but, say, Japanese or Hungarian, it is hard to find any interesting material with a parallel translation at all.







Today we will take a decisive step toward fixing this situation.























TO KILL A MOCKINGBIRD
by Harper Lee
DEDICATION
for Mr. Lee and Alice
in consideration of Love & Affection
Lawyers, I suppose, were children once.
Charles Lamb
PART ONE
1
When he was nearly thirteen, my brother Jem got his arm badly
broken at the elbow. When it healed, and Jem’s fears of never being
able to play football were assuaged, he was seldom self-conscious about
his injury. His left arm was somewhat shorter than his right; when he
stood or walked, the back of his hand was at right angles to his body,
his thumb parallel to his thigh. He couldn’t have cared less, so long as
he could pass and punt.
      
      








 

 

      
      









The alignment is done with lingtrain-aligner, a Python library.














Sentence splitting is language-specific; for Russian text the library relies on razdel.









%%%%%title.
%%%%%author.
%%%%%h1. %%%%%h2. %%%%%h3. %%%%%h4. %%%%%h5.
%%%%%divider.
%%%%%.




: [.,:,!?] , .









  1. ( , , , ).
  2. .
  3. (H1 , H5 ). , .
  4. , , ( ).


After markup, the beginning of the English text looks like this:







TO KILL A MOCKINGBIRD%%%%%title.
by Harper Lee%%%%%author.

%%%%%divider.

PART ONE%%%%%h1.
1%%%%%h2.

When he was nearly thirteen, my brother Jem got his arm badly
broken at the elbow. When it healed, and Jem’s fears of never being
able to play football were assuaged, he was seldom self-conscious about
his injury. His left arm was somewhat shorter than his right; when he
stood or walked, the back of his hand was at right angles to his body,
his thumb parallel to his thigh. He couldn’t have cared less, so long as
he could pass and punt.

...
      
      








 %%%%%author.
 %%%%%title.

%%%%%divider.

 %%%%%h1.
1%%%%%h2.


...
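The marker convention shown above is easy to apply with a small helper. The sketch below is only an illustration and is not part of lingtrain-aligner: the function name `add_marker` and the `ALLOWED_MARKERS` set are assumptions based on the marker list given earlier.

```python
# Hypothetical helper (not part of lingtrain-aligner): append a structural
# marker to a line, following the "%%%%%<name>." convention shown above.
ALLOWED_MARKERS = {"title", "author", "h1", "h2", "h3", "h4", "h5", "divider"}


def add_marker(line: str, marker: str) -> str:
    """Return the stripped line with '%%%%%<marker>.' appended."""
    if marker not in ALLOWED_MARKERS:
        raise ValueError(f"unknown marker: {marker}")
    return f"{line.strip()}%%%%%{marker}."


raw_header = [
    ("TO KILL A MOCKINGBIRD", "title"),
    ("by Harper Lee", "author"),
    ("PART ONE", "h1"),
    ("1", "h2"),
]

marked = [add_marker(text, marker) for text, marker in raw_header]
print("\n".join(marked))
```

Marking the same structural lines in both texts gives the two files matching headings.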
      
      





"" (" ", " " ..) h1, h2. .









Colab



Colab . , . . html .









Install the library:







pip install lingtrain-aligner
      
      





Import the modules:







from lingtrain_aligner import preprocessor, splitter, aligner, resolver, reader, vis_helper
      
      





Read the two texts:







text1_input = "harper_lee_ru.txt"
text2_input = "harper_lee_en.txt"

with open(text1_input, "r", encoding="utf8") as input1:
  text1 = input1.readlines()

with open(text2_input, "r", encoding="utf8") as input2:
  text2 = input2.readlines()
      
      





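The alignment is stored in an SQLite database at db_path. lang_from and lang_to are the language codes of the two texts, and model_name selects the embedding model: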







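import os

db_path = "db/book.db"
os.makedirs(os.path.dirname(db_path), exist_ok=True)  # make sure the target directory exists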

lang_from = "ru"
lang_to = "en"

models = ["sentence_transformer_multilingual", "sentence_transformer_multilingual_labse"]
model_name = models[0]
      
      





List the languages the splitter supports:







splitter.get_supported_languages()
      
      





If your language is not in the list, use the xx code. The sentence_transformer_multilingual model supports 50+ languages, and sentence_transformer_multilingual_labse supports 100+.







Mark the paragraph boundaries:







text1_prepared = preprocessor.mark_paragraphs(text1)
text2_prepared = preprocessor.mark_paragraphs(text2)
      
      





Split the texts into sentences:







splitted_from = splitter.split_by_sentences_wrapper(text1_prepared, lang_from, leave_marks=True)
splitted_to = splitter.split_by_sentences_wrapper(text2_prepared, lang_to, leave_marks=True)
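To see what sentence splitting involves, here is a deliberately naive regex splitter. It is a sketch for illustration only and is not how the library splits text; the name `naive_split` and the regex rule are assumptions.

```python
import re
from typing import List

# Naive rule (illustration only): a sentence ends at '.', '!' or '?'
# followed by whitespace.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def naive_split(paragraph: str) -> List[str]:
    """Split a paragraph into sentences with a simple regex rule."""
    return [s for s in SENTENCE_END.split(paragraph.strip()) if s]


sample = (
    "When he was nearly thirteen, my brother Jem got his arm badly "
    "broken at the elbow. When it healed, and Jem's fears of never being "
    "able to play football were assuaged, he was seldom self-conscious "
    "about his injury."
)

sentences = naive_split(sample)
for sentence in sentences:
    print(sentence)
```

A rule this simple breaks on abbreviations such as "Mr. Lee", which is one reason to prefer language-aware splitters.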
      
      





, . , , . UI, . , .







aligner.fill_db(db_path, splitted_from, splitted_to)
      
      





Now align the texts. The main parameters are batch_size and window.







batch_ids = [0,1,2,3]

aligner.align_db(
    db_path,
    model_name,
    batch_size=100,
    window=30,
    batch_ids=batch_ids,
    save_pic=False,
    embed_batch_size=50,
    normalize_embeddings=True,
    show_progress_bar=True,
)
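The idea of searching within a window can be shown on toy data. The sketch below is not the library's implementation: `align_with_window` is a hypothetical function, and the random vectors stand in for real sentence embeddings. For each source sentence it looks only at target sentences near the expected position and picks the one with the highest cosine similarity.

```python
import numpy as np


def align_with_window(emb_from, emb_to, window=2):
    """For each source row, pick the most similar target row,
    searching only within `window` positions of the expected location."""
    # Normalize rows so that the dot product equals cosine similarity.
    a = emb_from / np.linalg.norm(emb_from, axis=1, keepdims=True)
    b = emb_to / np.linalg.norm(emb_to, axis=1, keepdims=True)
    sim = a @ b.T

    n_from, n_to = sim.shape
    pairs = []
    for i in range(n_from):
        # Expected position, scaled in case the texts differ in length.
        center = round(i * (n_to - 1) / max(n_from - 1, 1))
        lo = max(0, center - window)
        hi = min(n_to, center + window + 1)
        j = lo + int(np.argmax(sim[i, lo:hi]))
        pairs.append((i, j))
    return pairs


# Toy "embeddings": each target vector is a slightly perturbed source vector.
rng = np.random.default_rng(0)
emb_from = rng.normal(size=(6, 8))
emb_to = emb_from + 0.01 * rng.normal(size=(6, 8))

pairs = align_with_window(emb_from, emb_to, window=2)
print(pairs)
```

A narrower window is faster but can miss the right pair when one text has many extra or missing sentences.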
      
      







! , . vis_helper. 400, , batch_size=400. , , batch_size=50, 4 -.







vis_helper.visualize_alignment_by_db(
    db_path,
    output_path="alignment_vis.png",
    lang_name_from=lang_from,
    lang_name_to=lang_to,
    batch_size=400,
    size=(800, 800),
    plt_show=True,
)
      
      





Primary alignment










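After the primary alignment, the matched line numbers form continuous chains with breaks between them, for example 10,11,12 followed by 15,16,17. These breaks are conflicts, and the resolver module handles them.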
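The chain example 10,11,12 and 15,16,17 can be turned into a small sketch of how breaks might be detected. This is an illustration, not the resolver's actual code: `split_into_chains` and `find_gaps` are hypothetical helpers.

```python
def split_into_chains(target_ids):
    """Group a sequence of target indices into runs of consecutive numbers."""
    chains = []
    for idx in target_ids:
        if chains and idx == chains[-1][-1] + 1:
            chains[-1].append(idx)
        else:
            chains.append([idx])
    return chains


def find_gaps(chains):
    """Return (end of previous chain, start of next chain) for every break."""
    return [(prev[-1], nxt[0]) for prev, nxt in zip(chains, chains[1:])]


aligned_to = [10, 11, 12, 15, 16, 17]
chains = split_into_chains(aligned_to)
gaps = find_gaps(chains)
print(chains)
print(gaps)
```

Each gap marks a stretch of sentences that still has to be matched.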







Get the conflicts:







conflicts_to_solve, rest = resolver.get_all_conflicts(db_path, min_chain_length=2, max_conflicts_len=6)
      
      





conflicts to solve: 46
total conflicts: 47
      
      





conflicts_to_solve holds the conflicts that match the given parameters, and rest holds the remaining ones.







Look at the statistics:







resolver.get_statistics(conflicts_to_solve)
resolver.get_statistics(rest)
      
      





('2:3', 11)
('3:2', 10)
('3:3', 8)
('2:1', 5)
('4:3', 3)
('3:5', 2)
('6:4', 2)
('5:4', 1)
('5:3', 1)
('2:4', 1)
('5:6', 1)
('4:5', 1)
('8:7', 1)
      
      





Most conflicts have the shape 2:3 or 3:2.
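Statistics of this kind amount to counting conflict shapes. The sketch below reproduces the idea on made-up data. The conflict structure (a pair of id lists) and the reading of the shape as source count, then target count, are assumptions for illustration, not the library's internal format.

```python
from collections import Counter

# Hypothetical conflicts: (source sentence ids, target sentence ids).
conflicts = [
    ([124, 125, 126], [122, 123]),
    ([200, 201], [198, 199, 200]),
    ([310, 311, 312], [305, 306]),
]


def shape(conflict):
    """Describe a conflict as '<source count>:<target count>'."""
    ids_from, ids_to = conflict
    return f"{len(ids_from)}:{len(ids_to)}"


stats = Counter(shape(c) for c in conflicts).most_common()
for item in stats:
    print(item)
```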







Show one of the conflicts:







resolver.show_conflict(db_path, conflicts_to_solve[10])
      
      





124      ,         .
125     , , —    .
126        .

122 The Radley Place jutted into a sharp curve beyond our house.
123 Walking south, one faced its porch; the sidewalk turned and ran beside the lot.
      
      





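Here sentences 125 and 126 need to be merged, which gives the pairs [124]-[122] and [125,126]-[123]. How can this be found automatically? The possible groupings are: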







  1. [124,125]-[122] // [126]-[123]
  2. [124]-[122] // [125,126]-[123]
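The two variants above are all the ways to cut three consecutive sentences into two contiguous groups. The sketch below enumerates such groupings for any conflict. `contiguous_partitions` is a hypothetical helper, not the resolver's API; the candidates would presumably then be scored with the embedding model, since resolve_all_conflicts takes model_name.

```python
from itertools import combinations


def contiguous_partitions(ids, k):
    """All ways to cut `ids` into k non-empty contiguous groups."""
    n = len(ids)
    result = []
    for cuts in combinations(range(1, n), k - 1):
        bounds = (0, *cuts, n)
        result.append([ids[a:b] for a, b in zip(bounds, bounds[1:])])
    return result


ids_from = [124, 125, 126]
ids_to = [122, 123]

variants = contiguous_partitions(ids_from, len(ids_to))
for groups in variants:
    print(" // ".join(f"{g}-[{t}]" for g, t in zip(groups, ids_to)))
```

The number of groupings grows quickly with conflict size, which is one reason to cap the conflict length.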


, , — 2 ( ) 6. , . , , .







Resolve the conflicts in several passes:







steps = 3
batch_id = -1  # -1: all available batches

for i in range(steps):
    conflicts, rest = resolver.get_all_conflicts(db_path, min_chain_length=2+i, max_conflicts_len=6*(i+1), batch_id=batch_id)

    resolver.resolve_all_conflicts(db_path, conflicts, model_name, show_logs=False)

    vis_helper.visualize_alignment_by_db(db_path, output_path="img_test1.png", batch_size=400, size=(800,800), plt_show=True)

    if len(rest) == 0:
        break
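The loop above widens the search on each pass. A minimal sketch that prints the parameter values it uses, computed from the same expressions (`2+i` and `6*(i+1)`):

```python
steps = 3

# Same expressions as in the loop above, evaluated for each pass.
schedule = [(2 + i, 6 * (i + 1)) for i in range(steps)]

for step, (min_chain_length, max_conflicts_len) in enumerate(schedule, start=1):
    print(
        f"step {step}: min_chain_length={min_chain_length}, "
        f"max_conflicts_len={max_conflicts_len}"
    )
```

Later passes accept longer conflicts and require longer chains around them.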
      
      





Conflict resolution. Step 1







Conflict resolution. Step 2







The result is stored in book.db.









, , . :







resolver.fix_start(db_path, model_name, max_conflicts_len=20)
      
      











resolver.fix_end(db_path, model_name, max_conflicts_len=20)
      
      







To build the book, use the reader module.







from lingtrain_aligner import reader
      
      





Read the aligned paragraphs from the database:







paragraphs_from, paragraphs_to, meta = reader.get_paragraphs(db_path, direction="from")
      
      





The direction parameter takes one of ["from", "to"] and sets which text's paragraph breakdown is used.







Build the book with create_book():







reader.create_book(paragraphs_from, paragraphs_to, meta, output_path="lingtrain.html")
      
      





:













html . , pdf, .









The look of the book is chosen with the template parameter.







reader.create_book(paragraphs_from, paragraphs_to, meta, output_path="lingtrain.html", template="pastel_fill")
      
      











reader.create_book(paragraphs_from, paragraphs_to, meta, output_path="lingtrain.html", template="pastel_start")
      
      











, , .









With template="custom" you can define your own look through the styles parameter, a list of CSS rule sets written as JSON strings.







, :







my_style = [
    '{}',
    '{"background": "#fafad2"}',
    ]

reader.create_book(paragraphs_from, paragraphs_to, meta, output_path="lingtrain.html", template="custom", styles=my_style)
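To make the styles format more concrete, the sketch below parses such JSON strings and turns them into inline CSS. It assumes the styles repeat cyclically over consecutive sentences; that assumption, the helper `to_inline_css`, and the generated markup are illustrative and do not describe what reader.create_book() does internally.

```python
import json
from itertools import cycle, islice

my_style = [
    '{}',
    '{"background": "#fafad2"}',
]


def to_inline_css(style_json):
    """Turn a JSON style string into an inline CSS declaration list."""
    props = json.loads(style_json)
    return "; ".join(f"{name}: {value}" for name, value in props.items())


sentences = ["First.", "Second.", "Third.", "Fourth."]

# Assumption: styles are applied to sentences in a repeating cycle.
css_cycle = cycle(to_inline_css(s) for s in my_style)
css_per_sentence = list(islice(css_cycle, len(sentences)))

for text, css in zip(sentences, css_per_sentence):
    print(f'<span style="{css}">{text}</span>')
```

Under this assumption, a two-element list with an empty first style highlights every second sentence.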
      
      











span' :







my_style = [
    '{"background": "linear-gradient(90deg, #FDEB71 0px, #fff 150px)", "border-radius": "15px"}',
    '{"background": "linear-gradient(90deg, #ABDCFF 0px, #fff 150px)", "border-radius": "15px"}',
    '{"background": "linear-gradient(90deg, #FEB692 0px, #fff 150px)", "border-radius": "15px"}',
    '{"background": "linear-gradient(90deg, #CE9FFC 0px, #fff 150px)", "border-radius": "15px"}',
    '{"background": "linear-gradient(90deg, #81FBB8 0px, #fff 150px)", "border-radius": "15px"}'
    ]

reader.create_book(paragraphs_from, paragraphs_to, meta, output_path="lingtrain.html", template="custom", styles=my_style)
      
      






















[1] lingtrain-aligner on GitHub.







[2] Google Colab.







[3] Sentence Transformers.







[4] Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation







[5] Language-agnostic BERT Sentence Embedding.







